Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Double quotes escaped when fetching CSV metadata #126

Open
pscohy opened this issue Aug 12, 2021 · 15 comments
Open

Double quotes escaped when fetching CSV metadata #126

pscohy opened this issue Aug 12, 2021 · 15 comments

Comments

@pscohy
Copy link

pscohy commented Aug 12, 2021

Hello,

We are using your nice library to fetch the data and metadata from Rave. When we fetch the metadata in CSV format, the double quote seems to be escaped with a back-tick, leading to some error on our side since we are checking that the projectname is the same in the response than in the request.

We receive something like:
projectname,viewname,ordinal,varname,vartype,varlength,varformat,varlabel
"{projectName}","{ViewName}","{ordinal}","{varname}","{vartype}","{varlength}","{varformat}","{varlabel}"
...

Did you already do something in the Library to solve this issue ? Or do you know what is the best way to solve this issue?
I saw in the Rave documentation that someone had the same issue, so maybe if it is a common issue you have done something for this.

Already many thanks for the time you'll spent to answer my questions.

Have a nice day

PS: it appears only with the metadata, when fetching the clinical data, the double quote is not escaped with a back-tick

@isparks
Copy link
Collaborator

isparks commented Aug 12, 2021

Hi @pscohy, I am not sure I am getting the same result as you when I test. I created a project called "ESCAPE/PROJECT" - the name includes the quotes and the /. When I ask for the Clinical View columns metadata with:

from rwslib import RWSConnection
from rwslib.rws_requests.biostats_gateway import ProjectMetaDataRequest
r = RWSConnection(host, username, password)
csv_meta = r.send_request(ProjectMetaDataRequest('"ESCAPE/PROJECT"'))

print(csv_meta)

I get this:

projectname,viewname,ordinal,varname,vartype,varlength,varformat,varlabel
"""ESCAPE/PROJECT""","V__ESCAPE_PROJECT__DOV","1","userid","num","8","10.","Internal id for the user"
"""ESCAPE/PROJECT""","V__ESCAPE_PROJECT__DOV","2","projectid","num","8","10.","projectid"
"""ESCAPE/PROJECT""","V__ESCAPE_PROJECT__DOV","3","project","char","255","$255.","project"
...

So I do not get back-ticks, but I do get expansion of quotes. Are you using different quotes like ' ?

@pscohy
Copy link
Author

pscohy commented Aug 13, 2021

Hi @isparks ,

Thanks for your quick answer 😊

It is something strange that we experience with only one study. The thing that is different from the other is that it is a Japanese study. Can it be related?

Also, we've found this in the documentation, do you know if there is something on going to investigate/fix the issue?
documentation where the issue was raised: https://learn.mdsol.com/api/rws/retrieve-clinical-view-form-datasets-as-csv-95587309.html
Screenshot 2021-08-13 at 10 36 01

@isparks
Copy link
Collaborator

isparks commented Aug 13, 2021

Hi @pscohy - it seems likely to be related but when using that endpoint which is for clinical data I get the same kind of escaping. I also tried to enter some clinical data that might need escaping in the output to see if the same concept is being applied. I got output like this:

.......,"","","""DOUBLE QUOTED""","",""
.......,"","","'SINGLE QUOTED'","",""
.......,"","","`Back quoted`","",""
.......,"","","With \ Backslash","",""
.......,"","","Comma and Quote ,""","",""
.......,"","","Quote and Comma "",","",""

So far, was not able to get a backtick to appear in the quoting.

I haven't worked at Medidata for more than 5 years at the time of writing so I can't say if there is any ongoing investigation. After 1 year (from May 28, 2020) it seems unlikely that any more investigation is happening on this? But you could ask again on learn.mdsol.com.

Can you share the name of the study/project? Is it named using only ASCII characters or does it include Japanese characters?

My thinking is that it must be something related to the study name - what special characters does the study name include?

@pscohy
Copy link
Author

pscohy commented Aug 13, 2021

@isparks

I will not be able to share with you the name of the study since it is a study of one of our client and we are not allowed to share this kind of information outside our company. But I can say that the name of the project itself is in ascii not japanese. What is in japanese are the values they entered on the platform, all the structure is defined in english

What I can tell is that there are indeed japanese character in their clinical data.

To be able to fetch their data, we had to change the default encoding to use UTF-8 and use the parameter unicode=true (when we fetch the audit records).

What seems strange is that the back-tick and double quote only appears in the metadata that does not contain any japanese character. We don't have the back-tick and double quote in the response of the clinical data that contains japanese character.

Also, it seems that the unicode=true parameter is not handled in this call , because I only receive the headers and not the content
{hostName}/RaveWebServices/datasets/ClinicalViewMetadata.csv?ProjectName={projectName} (meaning with this method: ProjectMetaDataRequest(project_name)). In the code it looks like this
def get_metadata_csv(self, trialname):
return self._send_request(
ProjectMetaDataRequest(trialname, dataset_format="csv")
)

@isparks
Copy link
Collaborator

isparks commented Aug 13, 2021

@pscohy I understand about not sharing the name and I understand it is all in ASCII. So are there any characters in the name which are not in the A-Z range? Any punctuation like /,*`"'~ etc?

When you make a call like this for a project that doesn't exist:

r = RWSConnection(host, username, password)
csv_meta = r.send_request(ProjectMetaDataRequest('NO SUCH PROJECT'))

you get results like this:

projectname,viewname,ordinal,varname,vartype,varlength,varformat,varlabel
EOF

This is just how that endpoint works for RWS. You can check the last request made like:

r = RWSConnection(host, username, password)
csv_meta = r.send_request(ProjectMetaDataRequest('NO SUCH PROJECT'))
print(r.last_result.url) # <-- What URL did you actually call?

And check it to make sure it is the correct URL

@pscohy
Copy link
Author

pscohy commented Aug 13, 2021

Thanks for the last explanation, it makes sense

For the name of the study, it looks like this X111111-XX111 (where X is an alpha char and 1 is a numeric char).

We have other customer with study name like 1111-XX-1111 and we don't have this issue, so I doubt it is related to the name of the study, but it could be

@isparks
Copy link
Collaborator

isparks commented Aug 13, 2021

Well, it continues to be very mysterious but to confirm, the extra ` only appears in the project name column like:

projectname,viewname,ordinal,varname,vartype,varlength,varformat,varlabel
"X111111-XX111`","V__X111111_XX111__DOV","1","userid","num","8","10.","Internal id for the user"

or does it appear in all the columns?

@pscohy
Copy link
Author

pscohy commented Aug 13, 2021

In all columns like

projectname,viewname,ordinal,varname,vartype,varlength,varformat,varlabel
`"X111111-XX111`",`"prod.V_X111111_XX111_AE`",`"1`",`"userid`",`"num`",`"8`",`"10.`",`"Internal id for the user`"

@isparks
Copy link
Collaborator

isparks commented Aug 13, 2021

This is strange and I am sorry I have not been able to help explain it or debug it. One question, if you enter the url manually into a browser and download the file that is produced, does it have the backticks? I do not believe that rwslib could be introducing these but I would like to be sure.

If rwslib isn't introducing them then I don't think I can help more, You will have to ask Medidata. Probably removing the backticks isn't too painful? If the first character of the second line of the result starts with a ` then you know that its backtick-quote delimited, otherwise its quote delimited?

@pscohy
Copy link
Author

pscohy commented Aug 16, 2021

@isparks , when doing the request in postman, I have the same issue so you are right, it is not introduce by RWSLIB.

Many thanks for your help and time on this, it is really appreciated.

Have a nice day

@glow-mdsol
Copy link
Member

I'm checking with one of the experts here, suspect it's something bubbling up from SQL Server.

@glow-mdsol
Copy link
Member

Is this something new or has it always been like this?

@pscohy
Copy link
Author

pscohy commented Aug 18, 2021

@glow-mdsol ,

I would both of them. We have this issue since we are trying to fetch the data through Rave webservices. But we are trying to fetch the data since 2 weeks. So it was always there for us, but maybe it was not present before?

@glow-mdsol
Copy link
Member

Ok, there's an internal investigation on this I think. Will let you know what is found.

@pscohy
Copy link
Author

pscohy commented Aug 18, 2021

@glow-mdsol ,

Many thanks. I'm waiting for your future investigation and feedback 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants