
MADICES/paas-the-parser

paas-the-parser

(27/09/2022) We are trying to create a MaRDA working group around this topic (link to proposal document) and a corresponding repository: https://github.com/marda-alliance/metadata_extractors

From ongoing discussions:

We discussed how parsing unstructured logs and output files is still very common. Everyone writes their own parser that maps to their own data model. These parsers are usually bundled as part of larger packages that do the analysis (e.g. pymatgen, ASE and AiiDA in computational materials science), and are often rewritten in multiple languages (e.g. the modular cheminfo parsers in JS).

Do we think a simple registry/framework for code objects that operate on files and return structured data would be a useful investment? For example, a Docker image per parser with a unified interface that also emits a schema for the parsed data? Do such things already exist? Would this go any way towards tackling the scalability of our current ecosystem, or would it just create more laborious work? It could also motivate the original creators of the raw files, such as instrument manufacturers and code authors, to develop and maintain parsers themselves.
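To make the "unified interface plus schema" idea a little more concrete, here is a minimal in-process sketch in Python. Everything in it is hypothetical and not an existing API: `ParserEntry`, `ParserRegistry`, `parse_key_value_log` and `KEY_VALUE_SCHEMA` are illustrative names, dispatching on file extension is an assumption, and the output schema is assumed to be a JSON Schema document stored as a plain dict. In the Docker-per-parser variant, each `ParserEntry` would instead wrap a container exposing the same inputs and outputs.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Any, Callable


# Hypothetical registry entry: a parser name, the file extensions it claims,
# a JSON Schema (as a dict) describing its output, and the parse function.
@dataclass(frozen=True)
class ParserEntry:
    name: str
    extensions: tuple[str, ...]
    schema: dict[str, Any]
    parse: Callable[[str], dict[str, Any]]


class ParserRegistry:
    """Hypothetical in-memory registry mapping file extensions to parsers."""

    def __init__(self) -> None:
        self._by_ext: dict[str, ParserEntry] = {}

    def register(self, entry: ParserEntry) -> None:
        # Extensions are stored lower-cased so lookup is case-insensitive.
        for ext in entry.extensions:
            self._by_ext[ext.lower()] = entry

    def parse(self, filename: str, text: str) -> dict[str, Any]:
        # Choose a parser from the file extension, then return the parsed
        # data together with the schema that describes it.
        entry = self._by_ext.get(PurePath(filename).suffix.lower())
        if entry is None:
            raise ValueError(f"no parser registered for {filename!r}")
        return {"parser": entry.name, "schema": entry.schema, "data": entry.parse(text)}


# Hypothetical output schema for the toy parser below.
KEY_VALUE_SCHEMA: dict[str, Any] = {
    "type": "object",
    "additionalProperties": {"type": "string"},
}


def parse_key_value_log(text: str) -> dict[str, Any]:
    """Toy parser: collects `key = value` lines and ignores everything else."""
    result: dict[str, Any] = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            result[key.strip()] = value.strip()
    return result


# Usage: register the toy parser and run it on an in-memory "log file".
registry = ParserRegistry()
registry.register(
    ParserEntry(
        name="toy-key-value",
        extensions=(".log",),
        schema=KEY_VALUE_SCHEMA,
        parse=parse_key_value_log,
    )
)
out = registry.parse("run.log", "energy = -1.23\nstatus = converged\n")
print(out["data"])  # {'energy': '-1.23', 'status': 'converged'}
```

Returning the schema alongside the data is one possible design choice: a downstream ELN or repository service could validate or index the result without knowing anything about the parser that produced it.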

These parsers could then be used across multiple ELN/repository services and for ETL (extract, transform, load), perhaps in a more scalable way than is currently possible. Given the wealth of existing parsers, it should be possible to test this idea quite quickly, and there is potential for integration with many of the existing services present at the workshop.

Prior art
