
Simplify run.py πŸƒβ€β™€οΈ #308

Closed
wants to merge 10 commits

Conversation

KristinaUlicna
Collaborator

@KristinaUlicna KristinaUlicna commented Oct 12, 2023

PR contribution summary

What is this PR useful for? Please describe the problem(s) you're trying to address.

  • Allows node feature & edge property computation at runtime by:
    • Checking all graphs in provided train, valid and infer datasets
    • Checking if graph attribute storage is allowed in the config; error otherwise.
    • Computing & storing the features in the source graph if allowed.
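
The runtime check described above might be sketched roughly like this. This is an illustrative stand-in, not the PR's actual implementation: the function name, the `"features"` attribute key, and the placeholder feature values are all assumptions.

```python
# Illustrative sketch only: 'ensure_node_features', the 'features' key and
# the placeholder feature values are assumptions, not the PR's actual code.
def ensure_node_features(
    nodes: dict[int, dict],
    store_graph_attributes_permanently: bool,
) -> dict[int, dict]:
    """Check one graph's nodes for stored features; compute them if allowed."""
    missing = [n for n, attrs in nodes.items() if "features" not in attrs]
    if not missing:
        return nodes  # all features precomputed; nothing to do
    if not store_graph_attributes_permanently:
        raise ValueError(
            f"{len(missing)} node(s) lack features; set "
            "'store_graph_attributes_permanently: True' to compute & store them."
        )
    for n in missing:
        nodes[n]["features"] = [float(n)]  # placeholder feature vector
    return nodes

# The same check would run over every graph in the train, valid & infer splits.
```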

List of proposed changes / linked issues & discussions

What should a reviewer concentrate their feedback on?

  • πŸƒ Scripts to run - replicating the results with run.py
  • πŸ’» Code quality
  • πŸ“ Everything looks OK?

What type of PR is this? (check all applicable)

  • πŸͺ„ Feature
  • πŸ› Bug fix
  • πŸ§‘β€πŸ’» Code refactor / style
  • πŸ”₯ Performance Improvements

Added tests?

  • πŸ‘ yes, re-vitalised the commented tests + now expects test_run.pyto fail

Hints for the reviewer:

At review time, please try a bunch of files which do not have the node features & edge properties in them.
Set the store_graph_attributes_permanently config hyperparameter to False πŸ‘‡

store_graph_attributes_permanently: False

which should throw an error with instructions on what to do. Please follow the instructions until you get to the point where you can successfully train your model πŸ˜ƒ
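
The failure path the reviewer is asked to exercise could be reproduced with a snippet along these lines. `run_setup` and the message wording are hypothetical stand-ins; only the control flow (missing attributes + storing disabled → error with instructions) mirrors the behaviour described above.

```python
# Hypothetical stand-in for the PR's setup path; the function name and the
# error message wording are assumptions, not the repository's actual code.
def run_setup(graph_has_attributes: bool,
              store_graph_attributes_permanently: bool) -> str:
    if not graph_has_attributes and not store_graph_attributes_permanently:
        raise ValueError(
            "Node features & edge properties not found in graph. Set "
            "'store_graph_attributes_permanently: True' and re-run to "
            "compute & store them before training."
        )
    return "ready to train"

try:
    run_setup(graph_has_attributes=False,
              store_graph_attributes_permanently=False)
except ValueError as err:
    print(err)  # surfaces the instructions the reviewer should follow
```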


PR review summary

Describe what this PR does & how you reviewed the individual items, where needed:

Some helper checks to tick off:

  • Focus on image annotation
  • Focus on model training
  • Could any optimization be applied?
  • Is there any redundant code?
  • Are there any spelling errors?

In conclusion, after my review, I'd like to:

  • πŸ™‹ ask some clarifying questions
  • πŸ™… suggest some specific changes

@KristinaUlicna KristinaUlicna added the methodology Building functional & diverse pipeline label Oct 12, 2023
@KristinaUlicna KristinaUlicna self-assigned this Oct 12, 2023

@KristinaUlicna KristinaUlicna added this to the Merge `dev` -> `main` milestone Oct 12, 2023
@KristinaUlicna KristinaUlicna changed the base branch from main to development October 12, 2023 20:10
@KristinaUlicna KristinaUlicna marked this pull request as draft October 12, 2023 20:10
@KristinaUlicna KristinaUlicna marked this pull request as ready for review October 13, 2023 16:10

return target_list, subgraph_dataset
# Create a transform function with frozen arguments:
check_and_chop_partial = partial(
Collaborator
I find the name of this function a bit confusing; it is not very clear what is happening here. As with the main run.py script, maybe we should add more comments to explain what is happening?

store_permanently: bool = False,
extractor_fn: str | Path = None,
):
# Check if datasets are ready for training:
Collaborator

@crangelsmith crangelsmith Oct 16, 2023

As described for run.py, maybe add a docstring describing what these functions are about?

@crangelsmith
Collaborator

Question:

If I already have the graph attributes saved in my graph but want to update them, should I also set store_graph_attributes_permanently: True?

In that case some suggestions:

  • I think the config name should make it clearer that we are computing as well as storing these attributes.
  • I think we should log clearly what is happening with the attributes in the console. e.g:
    • If we load a graph that already has attributes and the config is False, log that we are using precomputed attributes.
    • If we load a graph that already has attributes and the config is True, log that we are recomputing the attributes and that the stored ones will be overwritten.
    • As with the missing-attributes case: the store_graph_attributes_permanently: False path is very well documented/described, but when it is True we should log it too.
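
The four logging cases suggested above could be sketched as follows. The logger name and the message wording are assumptions for illustration:

```python
import logging

logger = logging.getLogger("grace")  # logger name is an assumption

def log_attribute_plan(has_attributes: bool, store_permanently: bool) -> str:
    """Log which of the four attribute-handling cases applies."""
    if has_attributes and not store_permanently:
        msg = "Using precomputed graph attributes loaded from file."
    elif has_attributes and store_permanently:
        msg = "Recomputing graph attributes; stored values will be overwritten."
    elif store_permanently:
        msg = "Computing graph attributes & storing them permanently."
    else:
        msg = "Graph attributes missing & storing disabled; raising an error."
    logger.info(msg)
    return msg
```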

@KristinaUlicna KristinaUlicna marked this pull request as draft October 23, 2023 20:06
@KristinaUlicna
Collaborator Author

After some offline discussions & a re-consideration of the design, this PR lost its value, as the issues can be approached differently, without needing to store the NODE_FEATURES vector in the written-out graph. This is addressed in #329. Closing this deprecated PR now.

Labels
methodology Building functional & diverse pipeline
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DEVELOPMENT] Allow feature / property extraction to be computed at run-time
2 participants