Add extraction of Connectivity Search PathCount table #4

Merged
merged 5 commits into from
Nov 17, 2024

@d33bs d33bs commented Nov 17, 2024

This PR adds an extraction of the Connectivity Search PathCount table. I attempted to use the SQL credentials under https://github.com/greenelab/connectivity-search-backend?tab=readme-ov-file#database, but my query hung indefinitely when accessing the table dj_hetmech_app_pathcount (this may have been user error; I'm unsure). As a result, I worked from the SQL statement backups instead. I focused on extracting only the dj_hetmech_app_pathcount table and avoided loading the data into a full PostgreSQL database, both to keep things lightweight and to avoid resource consumption or cost (mostly storage) that could exceed what the system has available.

To avoid a full extraction of the database, I filtered the SQL backup statements down to those that create and populate a single table, and ran them against a DuckDB database (pg_restore can restore a single table, but only when the archive is not a plain-text SQL file, so again I avoided using PostgreSQL directly). Then, to simplify data access further, I export the table from the database as a Parquet file. I plan to share the results directly via email, but this code can also regenerate them where needed.
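For readers who want a feel for the approach, here is a minimal sketch of the filter, load, and export steps. It is not the code in this PR. It assumes the backup is a plain-text pg_dump that populates the table through a `COPY ... FROM stdin;` block, that the table lives in a schema such as `public`, and that the `CREATE TABLE` column types are ones DuckDB accepts unchanged. The function names and file paths are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

TABLE = "dj_hetmech_app_pathcount"  # table name taken from the PR description


def _names_table(text: str, keyword: str, table: str) -> bool:
    """True if `text` starts with `keyword [schema.]table`."""
    match = re.match(rf"{keyword}\s+(?:\w+\.)?(\w+)\b", text)
    return bool(match) and match.group(1) == table


def extract_table(lines: Iterable[str], table: str) -> Tuple[str, List[str]]:
    """Return (CREATE TABLE statement, raw COPY data rows) for one table.

    Assumption: plain-text pg_dump layout, where the COPY data block
    ends with a line containing only a backslash and a period.
    """
    create_lines: List[str] = []
    rows: List[str] = []
    state = None  # None, "create", or "copy"
    for line in lines:
        text = line.rstrip("\n")
        if state is None:
            if _names_table(text, "CREATE TABLE", table):
                create_lines.append(text)
                # Stay in "create" until the statement's closing semicolon.
                state = None if text.rstrip().endswith(";") else "create"
            elif _names_table(text, "COPY", table) and text.endswith("FROM stdin;"):
                state = "copy"
        elif state == "create":
            create_lines.append(text)
            if text.rstrip().endswith(";"):
                state = None
        elif state == "copy":
            if text == "\\.":
                state = None
            else:
                rows.append(text)
    return "\n".join(create_lines), rows


def load_and_export(create_sql: str, rows: List[str], table: str,
                    db_path: str, tsv_path: str, parquet_path: str) -> None:
    """Create the table in DuckDB, load the rows, and write a Parquet file."""
    import duckdb  # third-party dependency, imported only when needed

    # DuckDB's default schema is "main", so drop the pg_dump schema prefix.
    create_sql = re.sub(rf"\b\w+\.{table}\b", table, create_sql, count=1)

    # Stage the tab-separated COPY rows in a file DuckDB can read.
    # Assumption: values contain no COPY backslash escapes other than \N.
    with open(tsv_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(rows) + "\n")

    con = duckdb.connect(db_path)
    try:
        con.execute(create_sql)
        con.execute(
            f"COPY {table} FROM '{tsv_path}' "
            "(DELIMITER '\t', HEADER false, NULLSTR '\\N')"
        )
        con.execute(f"COPY {table} TO '{parquet_path}' (FORMAT PARQUET)")
    finally:
        con.close()
```

A run over a hypothetical dump file would look like `create_sql, rows = extract_table(open("backup.sql", encoding="utf-8"), TABLE)` followed by `load_and_export(create_sql, rows, TABLE, "pathcount.duckdb", "pathcount.tsv", "pathcount.parquet")`. Collecting rows in a list keeps the sketch short; for a table of this size, streaming the rows straight to the staging file would likely be the better choice.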

CC @NegarJanani @cgreene

@d33bs d33bs merged commit 3c2a595 into CU-DBMI:main Nov 17, 2024
5 checks passed
@d33bs d33bs deleted the path-table branch November 17, 2024 20:55