Add publisher status report notebook #43

duncandewhurst · 2024-07-29T16:00:19Z

@neelima-j I've documented the outstanding tasks for this PR below. Feel free to move these to issues if you want to get the PR merged. You don't need to do all of these before sharing the report with CoST, but it would be good to complete them before the project is wrapped up.

To do:

General

Add new notebooks to the table in README.md
Format SQL code in new notebooks using pgFormatter (can reuse the script from https://github.com/open-contracting/notebooks-ocds/blob/624dcca5d1c360132f0f50af7be03daa119b321a/manage.py#L165-L176)
Ask tech infrastructure group about setting up database backups
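
For the SQL formatting task, a minimal sketch of one possible approach follows. It is not the linked manage.py script, and it makes assumptions: the notebook is nbformat v4 JSON, SQL cells start with a `%%sql` line, and the `pg_format` executable is on the PATH. The names `pg_format` (the Python wrapper) and `format_sql_cells` are hypothetical.

```python
import subprocess
from typing import Callable


def pg_format(sql: str) -> str:
    """Pipe SQL through the pg_format executable (assumed to be installed)."""
    result = subprocess.run(
        ["pg_format", "-"], input=sql, capture_output=True, text=True, check=True
    )
    return result.stdout


def format_sql_cells(notebook: dict, formatter: Callable[[str], str] = pg_format) -> int:
    """Format the body of every %%sql code cell in place; return the number changed."""
    changed = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell["source"]
        # nbformat stores source as either a string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        magic, _, body = text.partition("\n")
        if magic.split()[:1] != ["%%sql"] or not body.strip():
            continue
        formatted = f"{magic}\n{formatter(body).rstrip()}"
        if formatted != text:
            cell["source"] = formatted.splitlines(keepends=True)
            changed += 1
    return changed
```

A notebook could be loaded with `json.load`, passed to `format_sql_cells`, and written back with `json.dump`. The formatter is a parameter so that the cell handling can be checked without pgFormatter installed.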

Data import notebook

Refactor so that all imports can be run at once

Quality criteria, checks and metrics notebook

semantics_coordinates: Update to cover all supported geometry types
criteria_registered: Add code to update registered_prefixes table from https://standard.open-contracting.org/staging/infrastructure/0.9-dev/en/reference/prefixes
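
For the semantics_coordinates task, the sketch below shows one way a coordinate check could cover several geometry types. It assumes the supported types are the GeoJSON geometries from RFC 7946 other than GeometryCollection, and that the check is about longitude and latitude falling within valid ranges; what the notebook's check actually tests may differ. `POSITION_DEPTH`, `iter_positions` and `coordinates_plausible` are hypothetical names.

```python
from typing import Any, Iterator

# Nesting depth of a single position inside "coordinates" for each GeoJSON type.
POSITION_DEPTH = {
    "Point": 0,
    "MultiPoint": 1,
    "LineString": 1,
    "MultiLineString": 2,
    "Polygon": 2,
    "MultiPolygon": 3,
}


def iter_positions(geometry: dict[str, Any]) -> Iterator[Any]:
    """Yield every position in a geometry, whatever its type."""

    def walk(node: Any, depth: int) -> Iterator[Any]:
        if depth == 0:
            yield node
        else:
            for child in node:
                yield from walk(child, depth - 1)

    yield from walk(geometry["coordinates"], POSITION_DEPTH[geometry["type"]])


def coordinates_plausible(geometry: dict[str, Any]) -> bool:
    """Return True if every position is [longitude, latitude, ...] within valid ranges."""
    if geometry.get("type") not in POSITION_DEPTH:
        return False
    try:
        positions = list(iter_positions(geometry))
    except (KeyError, TypeError):
        return False  # missing or malformed "coordinates"
    return bool(positions) and all(
        isinstance(p, list)
        and len(p) >= 2
        and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in p[:2])
        and -180 <= p[0] <= 180
        and -90 <= p[1] <= 90
        for p in positions
    )
```

Driving the traversal from a depth table keeps a single code path for all types, so adding a type means adding one dictionary entry.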

Publisher status report notebook

Replace collection_id in coverage query outputs with source_id. See get_output function definition for an example.
Format coverage query outputs for readability using Styler, e.g. colour scale for coverage scores, collapsing/expanding for objects. See get_results for an example of how to do this.
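
As an illustration of the Styler suggestion, here is a minimal sketch of a colour scale for coverage scores. It is not the notebook's get_results implementation. It assumes scores lie between 0 and 1, and the names `coverage_colour` and `style_coverage` are hypothetical.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Sequence

if TYPE_CHECKING:  # pandas is only needed here for type hints
    import pandas as pd
    from pandas.io.formats.style import Styler


def coverage_colour(score: float | None) -> str:
    """Map a score in [0, 1] to a CSS background running from red (0) to green (1)."""
    if score is None or score != score:  # None or NaN: leave the cell unstyled
        return ""
    clamped = min(max(float(score), 0.0), 1.0)
    hue = round(120 * clamped)  # hue 0 is red, hue 120 is green
    return f"background-color: hsl({hue}, 70%, 80%)"


def style_coverage(df: pd.DataFrame, score_columns: Sequence[str]) -> Styler:
    """Return a Styler that colours the given score columns of df."""
    return df.style.apply(
        lambda column: [coverage_colour(value) for value in column],
        subset=list(score_columns),
    )
```

In a notebook, the Styler returned by `style_coverage` renders as a coloured table when it is the last expression in a cell. Using `Styler.apply` with a plain function avoids the matplotlib dependency that `background_gradient` has.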

review-notebook-app · 2024-07-29T16:10:41Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB


duncandewhurst · 2024-08-01T14:11:07Z

@neelima-j please see the list of outstanding tasks in the PR description.


neelima-j · 2024-10-07T16:45:32Z

Outstanding tasks moved to separate issues: #45, #46 and #47.

odscjen · 2024-10-09T11:39:25Z

OC4IDS_Database_Data_Import.ipynb

@@ -16,9 +16,11 @@
        "\n",


For each data source, choose a data source and run after (Ctrl+F10)

(Ctrl+F10) doesn't run the cell for me. I suspect this might be a browser-specific thing (I use Firefox), or maybe it's to do with the other extensions I've got running in my browser. Either way, as this isn't a universal command, I'd remove it and replace it with an instruction to run the cell (which I assume is what it's supposed to do?).


odscjen · 2024-10-09T11:39:26Z

OC4IDS_Publisher_Status_Report.ipynb

@@ -0,0 +1,1796 @@
+{


Using run_id = 2024-08-02 06:29:25.083245

The cell didn't run. I got this error:

  File "<ipython-input-20-704129bd1c51>", line 5
    source_id,
    ^
IndentationError: unexpected indent


odscjen · 2024-10-09T11:39:26Z

OC4IDS_Publisher_Status_Report.ipynb

@@ -0,0 +1,1796 @@
+{


Line #3. % % sql
should be %%sql, i.e. the spaces need to be removed.
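
A minimal sketch of how a mangled cell magic such as this could be repaired automatically, for example as a step after code formatting. It assumes the only damage is whitespace inserted between and after the two percent signs on the first line of a cell; `repair_cell_magic` is a hypothetical helper, not part of the notebooks.

```python
import re

# A cell magic must be the first thing in the cell, so anchor at the very start.
MANGLED_MAGIC = re.compile(r"\A%[ \t]+%[ \t]*(\w+)")


def repair_cell_magic(source: str) -> str:
    """Rewrite a leading '% % name' as '%%name'; leave any other source unchanged."""
    return MANGLED_MAGIC.sub(r"%%\1", source)
```

For example, `repair_cell_magic("% % sql\nSELECT 1")` returns `"%%sql\nSELECT 1"`, while a cell that already starts with `%%sql` is returned as is.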


odscjen · 2024-10-09T11:39:26Z

OC4IDS_Publisher_Status_Report.ipynb

@@ -0,0 +1,1796 @@
+{


The cell returned an error:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key)
   3804         try:
-> 3805             return self._engine.get_loc(casted_key)
   3806         except KeyError as err:

index.pyx in pandas._libs.index.IndexEngine.get_loc()

index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'check'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)

6 frames

/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key)
   3810             ):
   3811                 raise InvalidIndexError(key)
-> 3812             raise KeyError(key) from err
   3813         except TypeError:
   3814             # If we have a listlike key, _check_indexing_error will raise

KeyError: 'check'


odscjen · 2024-10-09T11:39:26Z

OC4IDS_Publisher_Status_Report.ipynb

@@ -0,0 +1,1796 @@
+{


For this and the following date cells, the column containing the date in the returned table is titled "count", which is confusing.


odscjen · 2024-10-09T11:39:26Z

OC4IDS_Quality_Criteria_Checks_and_Metrics.ipynb

@@ -0,0 +1,1792 @@
+{


I don't understand why 2 collections need to be chosen or what the difference between them should be. Some explanation is needed or, if there is an explanation elsewhere, it needs to be signposted here.


odscjen · 2024-10-09T11:39:26Z

OC4IDS_Quality_Criteria_Checks_and_Metrics.ipynb

@@ -0,0 +1,1792 @@
+{


This is returning an empty table, which means all the other checks also return an empty table. I'm not sure what I'm doing wrong. I used 'dev_1' and 'dev_2' as the default load and comparison_load ids, and then also tried 2 collections I created myself using the import notebook, with the same empty result.


duncandewhurst added 2 commits July 29, 2024 16:59

Add publisher status report notebook

1c26c7c

Publisher Status Report: Add link to data import notebook

4ad33fb

duncandewhurst added 3 commits July 29, 2024 17:21

Add quality criteria, checks and metrics notebook

88304b6

Merge pull request #44 from open-contracting/main

304c24d

Merge main into publisher_status_report

data import notebook: add load_id, refactor

57b49a6

duncandewhurst force-pushed the publisher_status_report branch from 99e9731 to 162e373 July 30, 2024 08:31

Update data import notebook

0c4b846

duncandewhurst force-pushed the publisher_status_report branch from 162e373 to 0c4b846 July 30, 2024 08:31

duncandewhurst added 3 commits August 1, 2024 13:08

quality criteria, checks and metrics: Add coverage checks

1a08076

Publisher status report: Add coverage section

1a4cbb2

Publisher status report: Add coverage docs

5ad93de

duncandewhurst assigned neelima-j Aug 1, 2024

neelima-j added 7 commits September 16, 2024 15:08

README.md: Add links to quality check and status report notebooks

7f3da3f

Replace collection_id in coverage query outputs with source_id, copy …

4913575

…edit to appendix

Publisher status report: Format coverage query outputs for readability

1db6e17

Quality criteria checks and metrics: Update semantics_coordinates to …

e44f007

…cover all supported geometry types

Fix spelling of 'criteria' in text and code

6a75701

Formats SQL code using pgFormatter

2fcaf65

Formats SQL code using pgFormatter

201494e

neelima-j requested a review from odscjen October 7, 2024 16:45

neelima-j marked this pull request as ready for review October 7, 2024 16:46

odscjen reviewed Oct 9, 2024
