Tips for Pulling Data from Open Humans Projects

Dana Lewis edited this page Aug 16, 2021 · 4 revisions

Different ways to pull data down from an Open Humans project

Here are some example ways to pull down data from an Open Humans project. For more details, also check out this page.

(Note: In these examples, XitlFDXBqm5TRK8Vuh3Ey2cDFdiTWz7amKpot97H9Xfgak1qpvray0b0arQhvpEP is an example project token; make sure to use your own project's token. Likewise, substitute the appropriate member ID for the example 12345678.)


1. You can pull down a metadata file to see the data included in the project:

ohproj-download-metadata -T XitlFDXBqm5TRK8Vuh3Ey2cDFdiTWz7amKpot97H9Xfgak1qpvray0b0arQhvpEP --output-csv MyProjectMetadata.csv


2. You can download all data at once for your project:

ohproj-download -T XitlFDXBqm5TRK8Vuh3Ey2cDFdiTWz7amKpot97H9Xfgak1qpvray0b0arQhvpEP -d MyProjectData


3. You can pull down individual project members’ data for your project:

ohproj-download -T XitlFDXBqm5TRK8Vuh3Ey2cDFdiTWz7amKpot97H9Xfgak1qpvray0b0arQhvpEP -m 12345678 -d MyProjectData
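
If you need several individual members, the command above can be wrapped in a small helper and called in a loop. This is a minimal sketch, not part of the Open Humans tooling: the names OH_TOKEN, download_member, and the OHPROJ_DOWNLOAD override (useful for a dry run with echo) are hypothetical, and the member IDs are placeholders.

```shell
#!/usr/bin/env bash
# Example token from this page; replace it with your own project's token.
OH_TOKEN="XitlFDXBqm5TRK8Vuh3Ey2cDFdiTWz7amKpot97H9Xfgak1qpvray0b0arQhvpEP"

# download_member MEMBER_ID OUTPUT_DIR
# Runs the same ohproj-download call as above for a single member.
# Set OHPROJ_DOWNLOAD=echo to print the arguments instead of downloading.
download_member() {
    "${OHPROJ_DOWNLOAD:-ohproj-download}" -T "$OH_TOKEN" -m "$1" -d "$2"
}

# Example usage (placeholder member IDs):
# for id in 12345678 01234567; do
#     download_member "$id" MyProjectData
# done
```

Keeping the token in one variable means it only has to be replaced in one place when you switch projects.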


4. You can categorize member data into lists, and pull batches of data based on the lists.

  • This is helpful for excluding members who you know have joined the project but don't have the data required to participate, so their data isn't downloaded every time. Here's an example using --excludelist:

ohproj-download -T XitlFDXBqm5TRK8Vuh3Ey2cDFdiTWz7amKpot97H9Xfgak1qpvray0b0arQhvpEP --excludelist BlackList.txt -d MyProjectData

  • This can also be helpful for tracking which members' data are used for various projects: create a whitelist file for each project, update it if/when you add members to that project, and pass it with --memberlist:

ohproj-download -T XitlFDXBqm5TRK8Vuh3Ey2cDFdiTWz7amKpot97H9Xfgak1qpvray0b0arQhvpEP --memberlist WhiteList.txt -d MyProjectData

or:

ohproj-download -T XitlFDXBqm5TRK8Vuh3Ey2cDFdiTWz7amKpot97H9Xfgak1qpvray0b0arQhvpEP --memberlist Project123ParticipantList.txt -d MyProjectData

TIP: If you get an error saying 'Each line in whitelist or blacklist is expected ', it may be because the list contains some 7-digit member numbers. Add a zero in front of those, re-save your list file, and then the data pull should work as expected.
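
The zero-padding described in this tip can also be done from the command line instead of by hand. The sketch below assumes every non-blank line in the list file contains only a numeric member ID; pad_member_ids and WhiteList.padded.txt are hypothetical names chosen for illustration.

```shell
# pad_member_ids: read member IDs on stdin (one per line) and print each
# one left-padded with zeros to 8 digits. Blank lines are skipped.
pad_member_ids() {
    awk 'NF { printf "%08d\n", $1 }'
}

# Example usage: write a padded copy, review it, then replace the original.
# pad_member_ids < WhiteList.txt > WhiteList.padded.txt
# mv WhiteList.padded.txt WhiteList.txt
```

Writing to a separate file first lets you check the result before overwriting your list.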


How to create a whitelist (--memberlist) or exclude list (--excludelist) file:

  1. Type vi WhiteList.txt on the command line and hit enter. (You can change WhiteList.txt to be WhateverYouWantToCallIt.txt)

  2. This opens a blank file.

  • Type i to enter insert mode.
  • Type, or paste, the list of numbers into the file.
  • When you’re finished, hit the escape key to exit insert mode.
  • To save the file and exit, type colon, w, q, and ! (i.e. :wq!) and hit enter. If you want to save but continue viewing the file, just type :w!. To exit without saving, type :q!.

TIP: Look at the list of numbers. If some are only 7 digits long, they had a leading zero that was dropped. Add a zero in front of those member numbers; otherwise many of the scripts may error out, because they check for an 8-digit member number.

  3. Check where you created the WhiteList.txt file. It needs to be in the directory you run the command from. E.g., if you're in your Desktop directory and create your MyProjectData folder there, your WhiteList.txt file should also be in your Desktop directory.
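
Once the list file is saved, a quick check can confirm that every line is an 8-digit member number before you run a download. This is a minimal sketch; check_member_list is a hypothetical helper name, and the IDs in the usage comment are placeholders. The commented heredoc shows one way to create the file without opening vi.

```shell
# check_member_list FILE
# Prints every line (prefixed with its line number) that is not exactly
# 8 digits. Returns 0 if all lines are valid, 1 if any line is not,
# and 2 if the file cannot be read.
check_member_list() {
    if [ ! -r "$1" ]; then
        echo "cannot read list file: $1" >&2
        return 2
    fi
    # grep -v succeeds only when at least one line fails the pattern.
    if grep -vnE '^[0-9]{8}$' "$1"; then
        return 1
    fi
    return 0
}

# Example usage (placeholder member IDs):
# cat > WhiteList.txt <<'EOF'
# 12345678
# 01234567
# EOF
# check_member_list WhiteList.txt && echo "list looks OK"
```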

Troubleshooting

If you run into an error when running one of the commands above, such as:

Traceback (most recent call last):
  File "/usr/local/bin/ohproj-download-metadata", line 9, in <module>
    load_entry_point('open-humans-api==0.1.3', 'console_scripts', 'ohproj-download-metadata')()
  File "/Library/Python/2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/Library/Python/2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Library/Python/2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/ohapi/command_line.py", line 117, in download_metadata
    project = OHProject(master_access_token=master_token)
  File "/Library/Python/2.7/site-packages/ohapi/projects.py", line 20, in __init__
    self.update_data()
  File "/Library/Python/2.7/site-packages/ohapi/projects.py", line 36, in update_data
    results = get_all_results(url)
  File "/Library/Python/2.7/site-packages/ohapi/utils.py", line 185, in get_all_results
    data = get_page(page)
  File "/Library/Python/2.7/site-packages/ohapi/utils.py", line 168, in get_page
    if 'detail' in response.json():
  File "/Library/Python/2.7/site-packages/requests/models.py", line 892, in json
    return complexjson.loads(self.text, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

...then it may be an issue with your Python versioning/linking.

Consider installing pyenv, following the advice here, so that you can install and use a current Python version instead. Note: the instructions reference 3.7.3, which was the latest Python version when that article was written. Check for the latest version number (e.g. 3.9.6 or later) and use that in place of 3.7.3 in those instructions.

Watch for error messages in the output of each step; installing pyenv may involve some additional commands along the way, such as updating Homebrew, pip, etc.

To run with pyenv, follow these instructions, summarized here:

  1. which python will likely return /usr/local/bin/python, so run PATH=$(pyenv root)/shims:$PATH, then which python should return /Users/my_username/.pyenv/shims/python.
  2. Run echo 'PATH=$(pyenv root)/shims:$PATH' >> ~/.bashrc then python -V to check the version (should be the version you want, e.g. 3.9.6).
  3. Run python -m venv venv (no output) then source ./venv/bin/activate, and a (venv) should show up to the left of your device name now. You're now running the virtual environment for this terminal session and can proceed to use the above OH API calls.
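
The PATH change in step 1 can be wrapped in a small function so that repeating it does not keep stacking the same entry onto PATH. This is only a sketch under the assumption that pyenv is already installed; use_pyenv_shims is a hypothetical name, and the commented lines simply repeat the commands from the steps above.

```shell
# use_pyenv_shims: put pyenv's shims directory at the front of PATH for
# the current shell session, unless it is already the first entry.
use_pyenv_shims() {
    local shims
    shims="$(pyenv root)/shims" || return 1   # fails if pyenv is missing
    case "$PATH" in
        "$shims":*) ;;                 # already first on PATH, nothing to do
        *) PATH="$shims:$PATH" ;;
    esac
}

# Example usage, following the steps above:
# use_pyenv_shims
# which python                 # should now point into .pyenv/shims
# python -V                    # should print the version you installed
# python -m venv venv
# source ./venv/bin/activate
```

A freshly created virtual environment usually starts without the ohproj-* commands, so you will probably need to install them inside it; the traceback above shows the package name as open-humans-api.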