Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add github action for testing notebooks #123

Merged
merged 17 commits into from
Oct 31, 2024
Merged

Add github action for testing notebooks #123

merged 17 commits into from
Oct 31, 2024

Conversation

hevansDev
Copy link
Contributor

No description provided.

@petermarshallio
Copy link
Collaborator

Reckon @techdocsmith @hevansDev we should specific about what tests we are, and are not, attempting to catch with this test suite. Are we trying to be sure, for example, that the notebooks run "as code"? Or are we just looking for quality things?

For example...

  • Should this test do anything that a user wouldn't do if they picked up the notebooks; i.e. I noticed that there's lots of http://localhost:8082/status calls when the tests start the containers, waiting for Druid to actually be live. Therefore, the notebooks themselves should have something in them that will check that all the services are up, and then throw an error after a timeout. Ie replace status_client = druid.status and status_client.version. (Add to this that one of the tests I did was to the wrong port number being put into the notebook, and I'm wondering if this should actually be a notebook based thing.)
  • Some notebooks state that they only require druid-jupyter - if the test is run with all-services and it works, but it doesn't with what we say it does, is that a fail? This might also speed up the actual time it takes to run the tests.

Are we OK with such a long timeout for the tests? When I ran this with the wrong port for Druid, for example, it took ages to fail the test. I was wondering if, realistically, any notebook should take like 5 minutes - as a quality test more than a functional one.

@petermarshallio
Copy link
Collaborator

I'm wondering @hevansDev @techdocsmith if we only really know if this change becomes an issue when we start to commit notebooks. I do think we should find the other angle, though, like what common issues is this change looking to catch?

@petermarshallio
Copy link
Collaborator

@hevansDev what errors does this test throw across all notebooks rn? I wonder cos I found something in a notebook earlier that didn't seem to work any more, and wondered if it was me (!!!) or if it really was an issue.

@petermarshallio petermarshallio requested review from petermarshallio and removed request for petermarshallio September 26, 2024 14:09
@hevansDev hevansDev force-pushed the test-gha branch 2 times, most recently from e6edce8 to d9e92be Compare October 14, 2024 15:40
Copy link
Collaborator

@petermarshallio petermarshallio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catches!! I suggest pulling out the amendments to the notebooks into a PR that gets merged before this one as these fix a separate issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this change is necessary : when I tested this notebook, the table was correctly dropped.
If still required, note that the variable needs to be used rather than the explicit table name. The variable table_name is set in a previous cell.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To conform with the template, all imports should be in a section at the top ("import additional modules").
100 seconds also seems a long time to pause.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, can it be lower than 100 seconds?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the authors intention here was to use latest_session_length throughout, rather than session_length, given that the ingestion is a GROUP BY, and session_length is high cardinality.

@petermarshallio petermarshallio merged commit e707b66 into main Oct 31, 2024
3 checks passed
@petermarshallio petermarshallio deleted the test-gha branch October 31, 2024 09:27
@petermarshallio petermarshallio restored the test-gha branch October 31, 2024 09:28
@petermarshallio
Copy link
Collaborator

Argh I merged the wrong PR and thus deleted the branch. I have reverted the changes and restored the branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants