Add file level metadata extraction #244
Comments
Here is the issue which describes it: datalad/datalad-metalad#395. It would be nice to first run it across a good sample of datasets to assess how large those dumps would be: would PostgreSQL scale up to store that many JSONBs?
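To get a first-order feel for dump sizes before involving Postgres at all, one could serialize a representative record and multiply by file count. This is a minimal sketch; the record fields and counts below are assumptions for illustration, not actual metalad output. (JSONB is a binary encoding of roughly comparable size to the JSON text, sometimes smaller after TOAST compression, so text size is a reasonable first estimate.)

```python
import json

# Hypothetical per-file metadata record, loosely modeled on what a
# file-level extractor might emit (field names are assumptions).
sample_record = {
    "type": "file",
    "path": "sub-01/anat/sub-01_T1w.nii.gz",
    "contentbytesize": 11534336,
    "extractor_name": "metalad_core",
    "annexed": True,
}

def estimate_dump_size(record: dict, n_files: int) -> int:
    """Rough estimate of the raw JSON payload for n_files records,
    assuming every record is about the same size as `record`."""
    per_record = len(json.dumps(record, separators=(",", ":")).encode("utf-8"))
    return per_record * n_files

# e.g. a dataset with one million annexed files
total = estimate_dump_size(sample_record, 1_000_000)
print(f"~{total / 1024**2:.0f} MiB of JSON payload for 1M files")
```

This ignores per-row overhead and indexes, so it is a lower bound on actual table size, but it answers the "how large would those dumps be" question cheaply.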
I know that JSONB is the best type for storing JSON data in Postgres in many respects. As for how well it scales, below is an answer from ChatGPT.
Sounds relevant. I guess we will see how well or badly it works after we fill this one up with some initial per-file dumps. Then we might want to look into partitioning, etc.
I think JSONB is the best option Postgres offers for JSON data, for our use. If we want anything better, we will have to look into NoSQL databases: CouchDB, Elasticsearch, etc. Many applications use both relational and NoSQL databases.
OK, let's plan to tackle this one soonish after we merge #257. For that one we will enable … Later, after we see how badly this scales for anything (e.g., search, datalad-catalog, etc.), we might want to establish a dedicated table with per-dataset/path metadata records.
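The "dedicated table with per-dataset/path metadata records" idea can be sketched in memory: records keyed by a (dataset, path) pair, with per-dataset lookup as the access pattern a catalog or search front end would need. In Postgres this would presumably be a table with a composite key plus a JSONB column; all names below are assumptions for illustration, not an actual schema from the issue.

```python
class MetadataStore:
    """In-memory stand-in for a table keyed by (dataset_id, path)."""

    def __init__(self):
        self._records = {}

    def upsert(self, dataset_id: str, path: str, record: dict) -> None:
        # Composite key mirrors a composite primary key in SQL.
        self._records[(dataset_id, path)] = record

    def get(self, dataset_id: str, path: str):
        return self._records.get((dataset_id, path))

    def per_dataset(self, dataset_id: str) -> dict:
        # All file records for one dataset, keyed by path.
        return {p: r for (d, p), r in self._records.items() if d == dataset_id}

store = MetadataStore()
store.upsert("ds000001", "sub-01/anat/T1w.nii.gz", {"contentbytesize": 123})
store.upsert("ds000001", "README.md", {"contentbytesize": 456})
print(len(store.per_dataset("ds000001")))  # 2
```

The point of the dedicated table is exactly this keyed access: instead of scanning one big blob per dataset, each file record is individually addressable and indexable.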
One of the possible use cases would be to establish/feed the datalad-catalog as outlined in #123 (comment). For that we should start with metalad_core file-level extraction. But I think we might need some guidance, or even functionality at the metalad level, to make it feasible in reasonable time. One of the relevant issues (with a comment there) is datalad/datalad-metalad#379; I am also inquiring now with @christian-monch on Element.