got error when processing training data #24
Comments
When I printed t0, t1, and t2 in the code, some of the files were processed successfully, while for others t0, t1, and t2 each came out as (0, 'OS').
Hi there. We used this script to process training data in several different formats, so we may have altered parts of process_data_newdataset.py along the way. One way forward is to find out what the data is composed of: load those files with pickle (the Python package) and check their exact contents. I hope that works. Thanks.
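A minimal sketch of that inspection step, assuming the files in question really are pickled Python objects. The helper name `describe_pickle` and the file name `TR0.pickle` are made up for illustration and are not part of the repository.

```python
import pickle
from pathlib import Path


def describe_pickle(path):
    """Load one pickled file and summarize its top-level structure."""
    with open(path, "rb") as handle:
        obj = pickle.load(handle)
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    if isinstance(obj, dict):
        # Show only a few keys so large files stay readable.
        summary["keys"] = list(obj.keys())[:5]
    elif isinstance(obj, (list, tuple)) and obj:
        # Truncate the first record so long sequences do not flood the terminal.
        summary["first_item"] = repr(obj[0])[:80]
    return summary


if __name__ == "__main__":
    # Hypothetical file name; substitute the file you want to inspect.
    path = Path("TR0.pickle")
    if path.exists():
        print(describe_pickle(path))
```

Comparing the summary of a file that processes cleanly with one that fails should show where the two differ.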
Thank you for your reply! I checked the contents of the data and found that some of it is invalid: it outputs "OS" instead of an RNA sequence, and this affects at least half of the dataset. I wonder whether this is normal or whether something is wrong with my dataset. If something is wrong with my dataset, where else can I get the data?
I wonder if there is a format issue related to the system (like "OS", "Mac", etc.); it seems you used macOS to handle those files. We process the files on Linux (Ubuntu), so you may want to pay attention to that.
Thank you for your reply! I think I have figured out the problem by double-checking the data. In the TR0 folder I downloaded, each RNA sequence comes with two files, named “._bpRNA_XXXXX” and “bpRNA_XXXX” respectively. I suppose it can be fixed by adding a condition that selects only the right files.
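For anyone hitting the same thing: files with a "._" prefix are typically AppleDouble metadata files that macOS writes next to the real ones, so they are not RNA data. A minimal sketch of such a filter, assuming the script collects its inputs by listing a folder; `real_data_files` is a hypothetical helper, not a function from process_data_newdataset.py.

```python
from pathlib import Path


def real_data_files(folder):
    """Return the regular files in `folder`, skipping macOS "._" companion files."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and not p.name.startswith("._")
    )


if __name__ == "__main__":
    # "TR0" is the folder name used in this thread; adjust the path as needed.
    folder = Path("TR0")
    if folder.is_dir():
        for path in real_data_files(folder):
            print(path.name)
```

Filtering by name keeps the original download untouched; deleting the "._" files from the folder would have the same effect.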
Hi, dear developer,
I got an error when processing the training data with the TR0 data provided by MXfold2:
$ python process_data_newdataset.py TR0
Traceback (most recent call last):
  File "process_data_newdataset.py", line 69, in <module>
    pair_dict_all_list = [[int(item_tmp)-1,int(t2[1].split('\n')[index_tmp])-1] for index_tmp,item_tmp in enumerate(t1[1].split('\n')) if int(t2[1].split('\n')[index_tmp]) != 0]
  File "process_data_newdataset.py", line 69, in <listcomp>
    pair_dict_all_list = [[int(item_tmp)-1,int(t2[1].split('\n')[index_tmp])-1] for index_tmp,item_tmp in enumerate(t1[1].split('\n')) if int(t2[1].split('\n')[index_tmp]) != 0]
ValueError: invalid literal for int() with base 10: 'X'
I have no idea what the data actually looks like, so this problem confuses me. Could you please tell me how to fix it? Thank you!
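One way to see what the data looks like at the failing line is to list the tokens that `int()` rejects before the list comprehension runs. This is only a diagnostic sketch: `non_integer_tokens` is a hypothetical helper, and it assumes, as the traceback suggests, that `t1[1]` and `t2[1]` are newline-separated strings of integers.

```python
def non_integer_tokens(column_text):
    """Return (line_index, token) for every newline-separated token that int() rejects."""
    bad = []
    for index, token in enumerate(column_text.split("\n")):
        try:
            int(token)
        except ValueError:
            bad.append((index, token))
    return bad


if __name__ == "__main__":
    # Stand-in for t2[1]; in the script you would pass t1[1] or t2[1] instead.
    sample = "1\n0\nX\n4"
    print(non_integer_tokens(sample))
```

Calling it on `t1[1]` and `t2[1]` just before line 69 shows which input file and which position produce a token such as 'X'.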