How to load HPDS data from CSV/DBMS, and the format issue #49
Jenkins is configured to use a local directory on the host. The job "Project Load HPDS Data From CSV" uses the folder /usr/local/docker-config/hpds_csv/ on the host, which is mounted into the container at /opt/local/hpds/.
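A minimal sketch of that host-to-container mapping (the variable names here are just for illustration; the actual invocation is whatever the Jenkins job runs):

```shell
# Sketch of the bind mount described above. The -v flag pairs
# host_path:container_path, so a file placed at
# /usr/local/docker-config/hpds_csv/allConcepts.csv on the host shows up
# as /opt/local/hpds/allConcepts.csv inside the container.
HOST_DIR=/usr/local/docker-config/hpds_csv
CONTAINER_DIR=/opt/local/hpds
echo "docker run -v ${HOST_DIR}:${CONTAINER_DIR} ..."
```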
We have examples of how to map and load your data, including how the NHANES data was loaded, in this repo: https://github.com/hms-dbmi/pic-sure-hpds-phenotype-load-example Please let us know if you need additional assistance.
Thank you!
Hi, thanks a lot!
Hi, we want to make sure we understand the question. When you say "multiple databases/projects", does that mean that you want them displayed with different root paths? Can you provide a more detailed example? Thanks!
Dear dmpillion: Yes, projects have different root paths; I'm not sure of the exact meaning you mentioned. For example, one project will have one set of SUBJECT_ID values as its primary key, and another project will have a different set of SUBJECT_ID values, so these two CSV files cannot be combined into a single allConcepts.csv file. There are also several other questions:
```
Feb 21, 2023 4:36:23 AM com.google.common.cache.LocalCache processPendingNotifications
```
Thanks a lot!
Thank you both. I'm the PI on one of the pilot AIM-AHEAD projects (I'm a physician scientist, not a data scientist), and Dr. Paul Avillach advised our group to try installing PIC-SURE linked to AWS SWB (Xiangjun has been working on this for several weeks). The concept mapping is interesting, but it is unclear how feasible it is. I have extracted clinical data on 20,000 patients with likely millions of unique longitudinal lab values and several million unique ICD/CPT/HCPCS codes. Would each one require its own concept mapping for PIC-SURE to function properly? We also have semi-structured and unstructured long clinical notes. I read from the example that the core of PIC-SURE is i2b2, which is a data aggregation/search platform that our institution already has. Personally, I'm trying to understand the benefit of using the PIC-SURE HPDS platform over a standard SQL platform... or just leaving the data as CSV files that we can easily import into any statistical software for merging and analyses. Is PIC-SURE more like i2b2 or SlicerDicer, or does it have any built-in NLP capability similar to the EPIC search engine?
Dear dmpillion: We get an error when importing the CSV into PIC-SURE:

```
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
```

The heap size was assigned the default:

```shell
docker run --name=hpds-etl \
  -v /usr/local/docker-config/hpds_temp:/opt/local/hpds \
  -v /usr/local/docker-config/hpds_csv/allConcepts.csv:/opt/local/hpds/allConcepts.csv \
  -e HEAPSIZE=4096 -e LOADER_NAME=CSVLoader \
  --name hpds_data_load_csv hms-dbmi/pic-sure-hpds-etl:LATEST
```

How can users set the memory size? Thanks
Assuming you are using this job (Load HPDS Data From CSV) to load the data.
Hi,

```shell
docker run --name=hpds-etl \
  -v /usr/local/docker-config/hpds_temp:/opt/local/hpds \
  -v /usr/local/docker-config/hpds_csv/allConcepts.csv:/opt/local/hpds/allConcepts.csv \
  -e HEAPSIZE=100000 -e LOADER_NAME=CSVLoader \
  --name hpds_data_load_csv hms-dbmi/pic-sure-hpds-etl:LATEST
```

```
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00007fd4aa000000, 6325010432, 0) failed; error='Not enough space' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 6325010432 bytes for committing reserved memory.
# An error report file with more information is saved as:
# //hs_err_pid7.log
Build step 'Execute shell' marked build as failure
Finished: FAILURE
```

Thanks
1. Can you confirm the available RAM on your machine?
2. Can you explain the use case for wanting to convert the UNIX timestamps back to date/time?
Hi, dmpillion:
Thanks again!
To further clarify Xiao's comment, our dataset has longitudinal date/time stamps. For example, we need to load every complete blood count result from 1/1/2011 to 1/1/2023. Based on the NHANES tutorial, all date/time stamps must first be converted to UNIX timestamps, since they would otherwise be treated as string characters. However, after they are converted to UNIX timestamps, we can't seem to convert them back into a date/time presentation in PIC-SURE. Thank you.
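Outside PIC-SURE, the conversion round-trips cleanly; a sketch with GNU date (illustration only, not a PIC-SURE feature, and the syntax is GNU-specific):

```shell
# Round-trip a date/time through a UNIX timestamp with GNU date.
TS=$(date -u -d '2011-01-01 00:00:00' +%s)
echo "$TS"                                   # 1293840000
date -u -d "@$TS" '+%Y-%m-%d %H:%M:%S'       # 2011-01-01 00:00:00
```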
The machine has 32 GB of RAM, but the job provisioned HEAPSIZE=100000 (in MB; 100000/1024 ≈ 97 GB).
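The arithmetic behind the failure, as a quick sanity check (the 75% headroom factor below is a rough rule of thumb I'm assuming, not an official recommendation):

```shell
# HEAPSIZE is in megabytes; the failing job asked for far more heap than
# the machine had.
RAM_MB=32768            # 32 GB machine
HEAPSIZE=100000         # value from the failing job
echo "requested heap: $((HEAPSIZE / 1024)) GB"       # 97 GB
echo "suggested HEAPSIZE: $((RAM_MB * 3 / 4)) MB"    # 24576 MB
```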
Thank you, I will try it when our system admin comes back.
@anilk2hms Hi, we upgraded our system to 64 GB of RAM and there is no error message now; please see the attached log file. Thanks
Hi,
I am a new user. I tried to follow the instructions in the "Project Load HPDS Data From CSV" part; however, the variable definitions are not clear to me:
"PATIENT_NUM","CONCEPT_PATH","NVAL_NUM","TVAL_CHAR","TIMESTAMP". Could you give me a real example file, especially for "CONCEPT_PATH"?
You mentioned that "This job requires datafile in csv format in location - /usr/local/docker-config/hpds_csv/allConcepts.csv"; what if I want to upload my own CSV files? After "Run Jenkins job - Start PIC-SURE" finishes, does that mean I will see new samples posted on the PIC-SURE website?
Thanks a lot!
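For readers following along, a hypothetical allConcepts.csv fragment consistent with the column list above (the concept paths, patients, and values are invented for illustration; see the NHANES load-example repo linked in this thread for real ones):

```csv
"PATIENT_NUM","CONCEPT_PATH","NVAL_NUM","TVAL_CHAR","TIMESTAMP"
"1","\demographics\SEX\","","male","0"
"1","\laboratory\Hemoglobin (g/dL)\","14.2","","1293840000"
"2","\demographics\SEX\","","female","0"
```

Numeric observations go in NVAL_NUM, categorical ones in TVAL_CHAR, and TIMESTAMP holds a UNIX timestamp (0 when no date applies).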