What form should that database (or repository) be for storing experimental data? #1

Open
mrshirts opened this issue Apr 2, 2014 · 2 comments
@mrshirts
Contributor

mrshirts commented Apr 2, 2014

No description provided.

@leeping
Contributor

leeping commented Apr 2, 2014

Hi Michael,

I think this repository could be highly flexible and hold information as we find it (e.g. mixed-format tables and papers from the literature), or it could be formal and curated, or it could contain both (i.e. separate "raw data" and "curated data" folders).

Yesterday we had a discussion about how best to store experimental data in a ForceBalance calculation, but I am not sure that format is suitable for a repository: leeping/forcebalance#59

John suggested using one of the existing standards for experimental data formats like IASTAB. I don't think it's the best choice for ForceBalance because it could overcomplicate simple jobs - plus it would take me too long to write a parser that fully conforms to the standard - but it might be a good solution for more "long term" data storage.

@davidlmobley

In general, I don't think files as exported by Excel are a good choice
for a flexible data format. Updating them by script (for example) could be
nontrivial, since the script would have to produce output identical to what
Excel itself would have exported.

Also, while delimited text files can be helpful in some cases, they can be
particularly problematic in others. For example, IUPAC names can contain
BOTH spaces AND commas. In a space-delimited file, the spaces obviously
cause problems. Likewise, in a comma-delimited file, the commas cause
problems. When I use a delimited file for chemical information, I typically
end up having to use alternate delimiters (currently I'm using ';'), which
are not particularly Excel-friendly. Presumably, if the data contains URLs
(which it would if linking to papers), special characters in the URLs could
cause similar problems.
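
To make the delimiter problem concrete, here is a minimal Python sketch. The record is hypothetical: the compound name is just one example of an IUPAC-style name containing both a comma and a space, and the numeric value is a made-up placeholder, not real data.

```python
# Hypothetical record: name, property, value.
# The value "1.23" is a placeholder, not a real measurement.
row = ["2,3-dihydroxybutanedioic acid", "density", "1.23"]

# Comma-delimited: the comma inside the name splits it into two fields.
comma_fields = ",".join(row).split(",")

# Space-delimited: the space inside the name splits it into two fields.
space_fields = " ".join(row).split(" ")

# Semicolon-delimited: this name survives, because it contains no ';'.
semicolon_fields = ";".join(row).split(";")

print(len(row), len(comma_fields), len(space_fields), len(semicolon_fields))
```

Both the comma and the space versions come back with four fields instead of three, while the ';' version round-trips. That only holds as long as no field ever contains a ';', which is hard to guarantee once URLs are stored.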

I think a better solution would be some type of XML or XML-like format. I
propose not reinventing the wheel; instead, see what Python libraries are
available (probably XML libraries) and just adopt a format that works with
those. Plan on making one tool that updates the library and another that
dumps the library into a human-readable format for easy perusal. The dump
could use tabs to delimit data; since it would never need to be parsed back
into any other format, there would be no problem with a parser having to
decipher the delimiters.
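
As a rough sketch of what this could look like, the following uses Python's standard `xml.etree.ElementTree` module. The element names (`library`, `measurement`, `compound`, `property`, `value`, `source`), the helper functions, the example URL, and the numeric value are all assumptions made up for illustration; nothing here is an agreed-upon schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names; not an agreed-upon schema.
FIELDS = ["compound", "property", "value", "source"]


def add_measurement(root, compound, prop, value, source):
    """'Update' tool: append one measurement entry to the library."""
    entry = ET.SubElement(root, "measurement")
    for tag, text in zip(FIELDS, [compound, prop, value, source]):
        ET.SubElement(entry, tag).text = text
    return entry


def dump_tab_delimited(root):
    """'Dump' tool: write-only, human-readable view with tab-separated columns."""
    lines = ["\t".join(FIELDS)]
    for entry in root.findall("measurement"):
        lines.append("\t".join(entry.findtext(tag, default="") for tag in FIELDS))
    return "\n".join(lines)


root = ET.Element("library")
# Placeholder value and placeholder URL, for illustration only.
url = "https://example.org/paper?id=1&ref=2"
add_measurement(root, "2,3-dihydroxybutanedioic acid", "density", "1.23", url)

# Serialize and reload: commas, spaces, and '&' in the URL survive the round trip.
xml_text = ET.tostring(root, encoding="unicode")
reloaded = ET.fromstring(xml_text)

print(xml_text)
print(dump_tab_delimited(reloaded))
```

The library takes care of escaping (the `&` in the URL is stored as `&amp;` and restored on load), so the names and URLs need no special handling. The tab-delimited dump is only ever read by people, never parsed back.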

David



3 participants