To Consider: Comprehensive Final Enhancements for Project Efficiency and Maintainability #32

tmushayahama · 2024-04-22T17:21:50Z

Some tasks to consider for the remaining time

Implement Elasticsearch Scrolling for Pagination
Add pagination for large datasets using Elasticsearch scrolling for @quemeb.
Consider creating endpoints like scrollAnnotations, ScrollSNPsByChromosome, and ScrollSnpsById.
Research and implement API security, possibly using an API Guard annotation.
Make the scrollId an optional parameter and extend the Snp class to return a scrollId.
I explain this in more detail below.
Automate Purge of Downloads Folder
Develop a cron job or equivalent to regularly clear the downloads folder.
Enhance Test Coverage
Ensure test coverage includes fields like VEP_refseq_PANTHER_GO_SLIM_cellular_component_list_id.
Add these values to your sample data to ensure comprehensive testing.
Dynamic Column Handling
Implement functionality to test variable column loading, allowing for the addition or removal of columns dynamically.
This will start from your schema generation code.
API Documentation
Research and implement a tool equivalent to Swagger for documenting APIs, including descriptions, required parameters, and optional parameters.
Code Documentation
If time allows, enhance code documentation using docstrings.
Reference: https://testdriven.io/blog/documenting-python/
Standardize Coding Conventions (something to consider)
Ensure consistent naming conventions across the codebase.
Choose and enforce a standard naming convention (preferably snake_case for Python). Currently the names are mixed: sometimes it is GetSNPsByChromosome and sometimes it is search_by_chromosomes.
Good Error Messages
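
For the downloads purge task above, a minimal sketch of the cleanup step, assuming the downloads folder is a flat directory of files and that "old" means a modification time more than a day ago. The function name, the age threshold, and the folder layout are all assumptions, not something the project defines yet.

```python
import time
from pathlib import Path


def purge_old_downloads(folder: Path, max_age_seconds: float = 24 * 3600) -> int:
    """Delete regular files in `folder` older than `max_age_seconds`.

    Returns the number of files removed. Subdirectories are left alone.
    """
    cutoff = time.time() - max_age_seconds
    removed = 0
    for path in folder.iterdir():
        # Only touch plain files whose last modification predates the cutoff.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    return removed
```

A cron entry (or any scheduler) could then call a small script wrapping this function once a day; the schedule is a deployment choice.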

Implementation flow idea: Scrolling in Elasticsearch

Scrolling in Elasticsearch allows you to retrieve large numbers of results from a query in multiple batches without the cost of deep pagination. It's suitable for processing large datasets that exceed typical pagination limits.

When a scroll query is initiated, Elasticsearch provides a scroll_id that you use to fetch the next batch of results. This scroll_id acts like a cursor pointing to a specific place in the dataset.
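
To make the cursor idea concrete, here is a rough sketch of a full scroll loop. It assumes `client` behaves like the official Python Elasticsearch client (keyword-style `search`, `scroll`, and `clear_scroll` calls, with `query=` accepted as a top-level argument as in recent client versions). The function name, batch size, and keep-alive value are placeholders.

```python
from typing import Any, Dict, Iterator


def iter_scroll_hits(client: Any, index: str, query: Dict[str, Any],
                     batch_size: int = 1000,
                     keep_alive: str = "1m") -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, fetching one scroll batch at a time."""
    # The first request opens the scroll context and returns the first batch.
    response = client.search(index=index, query=query,
                             size=batch_size, scroll=keep_alive)
    scroll_id = response["_scroll_id"]
    try:
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            # Each follow-up call passes the latest scroll_id and renews
            # the keep-alive window.
            response = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = response["_scroll_id"]
    finally:
        # Release the server-side search context once iteration ends.
        client.clear_scroll(scroll_id=scroll_id)
```

This version drains the whole result set in one process. The API endpoints discussed here would instead return one batch per request and hand the scroll_id back to the caller.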

Making scrollId an Optional Parameter:

Modify the endpoint that triggers the scrolling query to accept a scrollId as an optional query parameter.
If a scrollId is provided, the API should continue fetching results from where the last batch ended.
If no scrollId is provided, the API should start a new scroll session and return the initial batch of results along with a new scrollId.
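
The two branches above could look roughly like this. It is only a sketch: `scroll_annotations` mirrors the endpoint name suggested earlier, while the index name passed in, the `match_all` query, the batch size, and the keep-alive are assumptions, and `client` is again assumed to behave like the official Python Elasticsearch client.

```python
from typing import Any, Dict, List, Optional, Tuple

KEEP_ALIVE = "1m"  # assumed keep-alive; tune to how long clients take per batch


def scroll_annotations(client: Any, index: str,
                       scroll_id: Optional[str] = None,
                       batch_size: int = 500
                       ) -> Tuple[List[Dict[str, Any]], Optional[str]]:
    """Return (records, next_scroll_id) for one batch of a scroll session."""
    if scroll_id is None:
        # No scrollId supplied: start a new scroll session.
        response = client.search(index=index, query={"match_all": {}},
                                 size=batch_size, scroll=KEEP_ALIVE)
    else:
        # scrollId supplied: continue from where the last batch ended.
        response = client.scroll(scroll_id=scroll_id, scroll=KEEP_ALIVE)

    hits = response["hits"]["hits"]
    records = [hit["_source"] for hit in hits]
    if not hits:
        # An empty batch means the scroll is exhausted: free the context
        # and stop handing out a scrollId.
        client.clear_scroll(scroll_id=response["_scroll_id"])
        return records, None
    return records, response["_scroll_id"]
```

A caller keeps passing back the returned scroll id until it comes back as `None`.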

Extending the Snp Class:
Subclass the Snp class to include a property that can return a scrollId associated with a query session.
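
As an illustration only, the subclass could be as small as the following. The `Snp` shown here is a hypothetical stand-in with made-up fields (`id`, `chr`, `pos`), since the real class lives in the project; `ScrollSnp` is likewise an assumed name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snp:
    """Hypothetical stand-in for the project's Snp class."""
    id: str
    chr: str
    pos: int


@dataclass
class ScrollSnp(Snp):
    """Snp plus the scroll_id of the query session that produced it."""
    # Optional so that non-scrolling queries can still build the same type.
    scroll_id: Optional[str] = None


snp = ScrollSnp(id="rs123", chr="1", pos=12345, scroll_id="abc")
```

An alternative worth weighing is a wrapper type that holds the batch of Snp records plus a single scroll_id, which avoids repeating the id on every record.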

API and Code Adjustments:
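Adjust the API's logic to manage the lifecycle of a scroll session, including the expiration of scrollIds once their keep-alive window lapses. The keep-alive is set on each request through the scroll parameter (for example 1m) and is renewed by every subsequent scroll call, so it only needs to be long enough to process one batch.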
Implement error handling for cases when an expired or invalid scrollId is received.
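
One possible shape for that error handling is sketched below. `ScrollExpiredError` and `fetch_next_batch` are hypothetical names. The exception types to catch are passed in rather than hard-coded. With the official Python client I would expect `NotFoundError` for an expired search context and a bad-request error for a malformed id, but the exact class names vary between client versions, so treat that mapping as an assumption to verify.

```python
from typing import Any, Dict, List, Tuple, Type, Union

ExcTypes = Union[Type[Exception], Tuple[Type[Exception], ...]]


class ScrollExpiredError(Exception):
    """Raised when a scrollId is expired or not recognised by Elasticsearch."""


def fetch_next_batch(client: Any, scroll_id: str, expired_exc: ExcTypes,
                     keep_alive: str = "1m"
                     ) -> Tuple[List[Dict[str, Any]], str]:
    """Fetch the next batch, turning client errors into a clear API error."""
    try:
        response = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
    except expired_exc as err:
        # Translate the low-level client error into a message the API
        # consumer can act on.
        raise ScrollExpiredError(
            "The scrollId is expired or invalid; "
            "start a new scroll session by omitting scrollId."
        ) from err
    records = [hit["_source"] for hit in response["hits"]["hits"]]
    return records, response["_scroll_id"]
```

The API layer can then map `ScrollExpiredError` to whatever error response format the project settles on.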

tagging @akshala @huaiyumi

tmushayahama assigned akshala Apr 22, 2024

tmushayahama commented Apr 22, 2024 • edited by akshala