-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for reading Multiple Response from sav (WIP, DO NOT MERGE) #259
base: dev
Are you sure you want to change the base?
Conversation
updated changelog
version 1.2.4
updating changelog
version 1.2.7
* Reading multiple response data from spss format from sav files
Hey @slobodan-ilic, tried to install your version with poetry and got these two errors: ./src/spss/readstat_sav_read.c:1878:40: error: call to undeclared library function 'toupper' with type 'int (int)'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
sv_name_upper[c] = toupper((unsigned char) mr.subvariables[j][c]);
./src/spss/readstat_sav_read.c:176:51: error: call to undeclared library function 'isdigit' with type 'int (int)'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
for (int i = 0; i < internal_count && isdigit(*next_part); i++) { The exact command: |
54fa7c5
to
f9b0bcc
Compare
f9b0bcc
to
18cd3aa
Compare
@joaquimadraz fixed this, and also some other issues... Will report here when I update the readstat as well... |
hi there, thanks a lot for this PR, highly appreciated! PS: I see that in the corresponding issue there is a sample file, so I guess we can use that one. |
Hi @ofajardo , thanks for your comment! This work is likely to go through at least 1 or 2 more changes, where I'd need to make it more in-line with the rest of the design of readstat library. I will add the test file here, but you can use the one from the issue in the meantime, to test. You can also do some comparison with |
* name is an array and will never be null
3cedfa0
to
8ff323c
Compare
Hi @ofajardo! 👋 We've done some work on this, fixed bugs and tested on bunch of real-life survey data. We also have our pr on readstat ready. Waiting for Evan to provide another round of 👀 on that. Would you be able to provide some more info about the process your releases take? Do you always need to be in sync with readstat? How are the files copied over to pyreadstat, is it manually or a submodule or something else? Any info is more than welcome, so we can plan our roadmap accordingly (whether we fork or wait basically). All the best! @joaquimadraz fyi ☝️ |
hi, thanks a lot for working on this! What I would like to do is the following:
Hope that makes it clearer. Let me know if you have further questions! |
@ofajardo Hi, thanks for the quick response. I'd definitely like to go forward with CI path first, as we might discover new issues more quickly this way. I guess the communication on the readstat is going to happen with higher latency than here. Thanks for the explanation about copying files, that's what I had supposed. Btw, do you know how to run the fuzzy tests locally? I had tried doing that (on readstat lib) but didn't have success, even after following instructions from Evan's repo. I see that the CI there relies on VS17, which seems old (and can't run it since I'm on mac). Any advice is really welcome! Thanks! |
Ok, through combination of perseverance and GPT, I was able to build and run fuzzers locally, even though the instructions on readstat don't seem to be applicable to (at least my own) setup. Found another bug and fixed it (since it was immediately clear what it was). Now it's in better shape then it was :) |
Short update: After fixing all CI issues in readstat, Evan did a thorough review. Other than a couple minor malloc changes, it seemed as if it's gonna be finished quick, until I saw the proposition to completely rewrite the parsing part in ragel. I've been on it ever since, should be done soon-ish. |
5fe9fd6
to
c21f15c
Compare
hi would you mind adding a few short lines in this file describing the new field in the metadata object? also subfields if applicable. Regarding trying the pyreadstat CI/CD ... is this something you want to do on your side, or you prefer me doing it? If me, I would need to merge this PR to a dedicated new branch to launch the CI/CD pointing there. We can wait a bit also until everything is ready on your side. |
This PR implements reading multiple response metadata from sav format. It's intended to be based on a change to the readstat library. The code is included here for convenience, but should eventually be rebased, once the readstat lib is updated.
I'll keep posting here regularly, as updates happen on the readstat side.
Here's also an example on how to use the new functionality:
which gives the following output:
you can test it with the following (very basic) file:
simple_alltypes.sav.zip
but you have to unzip it (GitHub won't allow direct upload of the *.sav extension).