Add publisher status report notebook #43

Open: wants to merge 16 commits into main

duncandewhurst
Contributor

@duncandewhurst duncandewhurst commented Jul 29, 2024

@neelima-j I've documented the outstanding tasks for this PR below. Feel free to move these to issues if you want to get the PR merged. You don't need to do all of these before sharing the report with CoST, but it would be good to complete them before the project is wrapped up.

To do:

General

Data import notebook

  • Refactor so that all imports can be run at once
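A minimal sketch of running every import in one pass. It assumes each data source is identified by an ID string and that a per-source import function (called `import_one` here, a hypothetical name) raises an exception on failure. Failures are collected rather than stopping the run, so one broken source doesn't block the rest.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def import_all(
    source_ids: Iterable[str],
    import_one: Callable[[str], object],
) -> Tuple[Dict[str, object], List[Tuple[str, str]]]:
    """Import each source in turn; record failures instead of stopping.

    import_one is a hypothetical per-source import function, not the notebook's.
    """
    results: Dict[str, object] = {}
    failures: List[Tuple[str, str]] = []
    for source_id in source_ids:
        try:
            results[source_id] = import_one(source_id)
        except Exception as exc:  # keep going so one bad source doesn't block the rest
            failures.append((source_id, repr(exc)))
    return results, failures


if __name__ == "__main__":
    # Stand-in import function for demonstration only.
    def fake_import(source_id: str) -> int:
        if source_id == "broken":
            raise ValueError("no data")
        return len(source_id)

    done, failed = import_all(["portal_a", "broken", "portal_b"], fake_import)
    print(sorted(done))  # ['portal_a', 'portal_b']
    print(failed)        # [('broken', "ValueError('no data')")]
```

Printing the failures at the end gives a single summary for the whole run instead of a traceback per source.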

Quality criteria, checks and metrics notebook

Publisher status report notebook

  • Replace collection_id in coverage query outputs with source_id. See get_output function definition for an example.
  • Format coverage query outputs for readability using Styler, e.g. colour scale for coverage scores, collapsing/expanding for objects. See get_results for an example of how to do this.
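A sketch of both output changes, using hypothetical helpers rather than the notebook's get_output and get_results. `to_source_ids` swaps collection_id for source_id given a lookup. `coverage_colour` returns CSS for a red-to-green scale, assuming scores are percentages from 0 to 100. `collapsible` wraps JSON objects in an HTML `<details>` element so they render collapsed.

```python
import html
import json
from typing import Any, Dict, List, Mapping


def to_source_ids(
    rows: List[Dict[str, Any]], source_by_collection: Mapping[int, str]
) -> List[Dict[str, Any]]:
    """Return copies of result rows with collection_id replaced by source_id."""
    converted = []
    for row in rows:
        # Put source_id first so it reads as the row label.
        new_row: Dict[str, Any] = {"source_id": source_by_collection[row["collection_id"]]}
        new_row.update({k: v for k, v in row.items() if k != "collection_id"})
        converted.append(new_row)
    return converted


def coverage_colour(score: float) -> str:
    """CSS for a red (0%) to green (100%) background, for use with Styler.map."""
    fraction = min(max(score / 100.0, 0.0), 1.0)  # clamp scores outside 0-100
    red = round(255 * (1 - fraction))
    green = round(255 * fraction)
    return f"background-color: #{red:02x}{green:02x}00"


def collapsible(obj: Any, summary: str = "show") -> str:
    """Render a JSON-serialisable object as a collapsed HTML <details> element."""
    body = html.escape(json.dumps(obj, indent=2, ensure_ascii=False))
    return f"<details><summary>{html.escape(summary)}</summary><pre>{body}</pre></details>"
```

In a notebook these might be combined as `df.style.map(coverage_colour, subset=["coverage"]).format(collapsible, subset=["example"])`, where the column names are placeholders. `Styler.map` needs pandas 2.1 or later; older versions call it `applymap`.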

@duncandewhurst
Contributor Author

@neelima-j please see the list of outstanding tasks in the PR description.

@neelima-j
Contributor

Outstanding tasks moved to separate issues: #45, #46 and #47.

@neelima-j neelima-j requested a review from odscjen October 7, 2024 16:45
@neelima-j neelima-j marked this pull request as ready for review October 7, 2024 16:46
@@ -16,9 +16,11 @@
"\n",
Contributor

@odscjen odscjen Oct 9, 2024

For each data source, choose a data source and run after (Ctrl+F10)

(Ctrl+F10) doesn't run the cell for me. I suspect this is browser-specific (I use Firefox), or it may be caused by other extensions I have running in my browser. Either way, as this isn't a universal shortcut, I'd remove it and replace it with an instruction to run the cell (which I assume is what it's meant to do?).


@@ -0,0 +1,1796 @@
{
Contributor

@odscjen odscjen Oct 9, 2024

Using run_id = 2024-08-02 06:29:25.083245

The cell didn't run; I got this error:

File "<ipython-input-20-704129bd1c51>", line 5
    source_id,
    ^
IndentationError: unexpected indent
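A guess at how this relates to the later comment about `% % sql`: if IPython doesn't recognise the cell magic, the SQL body is handed to the Python compiler, and the first indented line (here `source_id,`) raises this kind of IndentationError. The snippet reproduces that with plain `compile`; the SQL text is a made-up stand-in for the notebook's query.

```python
# Hypothetical SQL body that reaches the Python compiler because the
# cell magic line was not recognised.
sql_body = (
    "SELECT\n"
    "    source_id,\n"
    "    COUNT(*)\n"
    "FROM hypothetical_table\n"
)


def compile_error(source: str) -> str:
    """Describe the error raised when compiling source as Python, if any."""
    try:
        compile(source, "<cell>", "exec")
    except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
        return f"{type(exc).__name__} on line {exc.lineno}: {exc.msg}"
    return "compiled OK"


print(compile_error(sql_body))
```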

@@ -0,0 +1,1796 @@
{
Contributor

@odscjen odscjen Oct 9, 2024

Line #3.    % % sql

This should be %%sql, i.e. the spaces need to be removed.
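To catch this kind of slip across all the notebooks, a small lint pass over the .ipynb JSON could flag magic lines with stray spaces. This is a hypothetical helper, not part of the project, and its regex only targets the `% % sql` / `% %sql` pattern.

```python
import json
import re
from typing import List, Tuple

# Matches "%" then whitespace then "%" then a name, e.g. "% % sql" or "% %sql",
# but not the correct "%%sql".
MALFORMED_MAGIC = re.compile(r"^\s*%\s+%\s*\w+")


def find_malformed_magics(nb_json: str) -> List[Tuple[int, int, str]]:
    """Return (cell index, 1-based line number, line) for each suspect line."""
    notebook = json.loads(nb_json)
    hits = []
    for cell_index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores cell source either as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        for line_no, line in enumerate(text.splitlines(), start=1):
            if MALFORMED_MAGIC.match(line):
                hits.append((cell_index, line_no, line.strip()))
    return hits
```

Usage might look like `find_malformed_magics(open("report.ipynb", encoding="utf-8").read())`, where the file name is a placeholder.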


@@ -0,0 +1,1796 @@
{
Contributor

@odscjen odscjen Oct 9, 2024

The cell returned an error:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key)
   3804         try:
-> 3805             return self._engine.get_loc(casted_key)
   3806         except KeyError as err:

index.pyx in pandas._libs.index.IndexEngine.get_loc()

index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'check'
The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)

6 frames

/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key)
   3810             ):
   3811                 raise InvalidIndexError(key)
-> 3812             raise KeyError(key) from err
   3813         except TypeError:
   3814             # If we have a listlike key, _check_indexing_error will raise

KeyError: 'check'
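The traceback shows pandas failing to find the label 'check' (raised from Index.get_loc), which usually means the DataFrame being indexed has no column of that name, for instance because an upstream query returned different columns. A hypothetical guard like the one below would report which columns are actually present instead of a bare KeyError.

```python
from typing import Iterable, List


def missing_columns(available: Iterable[str], required: Iterable[str]) -> List[str]:
    """List the required column names that are absent from available."""
    available_set = set(available)
    return [name for name in required if name not in available_set]


def require_columns(available: Iterable[str], required: Iterable[str]) -> None:
    """Raise a KeyError naming both the missing and the available columns."""
    available = list(available)  # allow one-shot iterables
    missing = missing_columns(available, required)
    if missing:
        raise KeyError(
            f"missing column(s) {missing}; available columns: {sorted(map(str, available))}"
        )
```

Calling `require_columns(df.columns, ["check"])` before `df["check"]`, where `df` is whichever DataFrame the cell indexes, would surface the mismatch directly.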


@@ -0,0 +1,1796 @@
{
Contributor

@odscjen odscjen Oct 9, 2024

In this cell and the other date cells that follow, the column containing the date in the returned table is titled "count", which is confusing.
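One possible fix, sketched under stated assumptions: alias each aggregate in the SQL so the output header says what the column holds. The `releases` table and its columns are made up, and SQLite stands in for the project's database. If the header comes from pandas instead, `df.rename(columns={"count": "date"})` would do the same job after the fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE releases (source_id TEXT, release_date TEXT)")
conn.executemany(
    "INSERT INTO releases VALUES (?, ?)",
    [("portal_a", "2024-01-05"), ("portal_a", "2024-03-10"), ("portal_b", "2024-02-01")],
)

# Explicit aliases make the result headers self-describing.
cursor = conn.execute(
    """
    SELECT
        source_id,
        MIN(release_date) AS earliest_release_date,
        MAX(release_date) AS latest_release_date
    FROM releases
    GROUP BY source_id
    ORDER BY source_id
    """
)
header = [column[0] for column in cursor.description]
rows = cursor.fetchall()
print(header)
print(rows)
```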


@@ -0,0 +1,1792 @@
{
Contributor

@odscjen odscjen Oct 9, 2024

I don't understand why two collections need to be chosen or what the difference between them should be. Some explanation is needed here, or, if it is explained elsewhere, it should be signposted here.


@@ -0,0 +1,1792 @@
{
Contributor

@odscjen odscjen Oct 9, 2024

This is returning an empty table, which means all the other checks also return empty tables. I'm not sure what I'm doing wrong. I used 'dev_1' and 'dev_2' as the default load and comparison_load IDs, and then also tried two collections I created myself using the import notebook, with the same empty result.
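One possible explanation, offered only as a guess: if 'dev_1' and 'dev_2' don't match any load in the database, every downstream comparison would come back empty. A quick pre-check like the sketch below would make that case visible. The `loads` table and `load_id` column are assumptions, and SQLite stands in for whatever database the notebook actually queries.

```python
import sqlite3
from typing import Iterable, List


def unknown_load_ids(conn: sqlite3.Connection, load_ids: Iterable[str]) -> List[str]:
    """Return the requested load IDs that don't appear in the (assumed) loads table."""
    known = {row[0] for row in conn.execute("SELECT DISTINCT load_id FROM loads")}
    return [load_id for load_id in load_ids if load_id not in known]


# Toy database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loads (load_id TEXT, collection_id INTEGER)")
conn.executemany("INSERT INTO loads VALUES (?, ?)", [("2024-08-02", 1), ("2024-09-01", 2)])

missing = unknown_load_ids(conn, ["dev_1", "2024-09-01"])
if missing:
    print(f"Load IDs not found (comparisons using them may be empty): {missing}")
```

If both IDs are found and the tables are still empty, the next thing to check would be whether the two loads share any records to compare.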


3 participants