Geomarker Assessment with DeGAUSS
The geomarker assessment images will only work with the output of the geocoding docker image (or a CSV file with columns named lat and lon). As before, navigate to the directory where the geocoded CSV file is located. If you are running geomarker assessment right after geocoding and in the same shell, you are already in the right directory, so no further navigation is necessary.
Run:
docker run --rm -v "$PWD":/tmp degauss/<name-of-image> <name-of-geocoded-file>
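If the CSV file did not come from the geocoding image, it may be worth confirming that it has the required lat and lon columns before running the command above. The following is a minimal sketch, not part of DeGAUSS: the helper name has_lat_lon is hypothetical, and it assumes the header is the first line of the file and that no column name contains a comma.

```shell
# Hypothetical helper (not part of DeGAUSS): succeeds only if the CSV header
# contains columns named exactly "lat" and "lon".
# Assumes the header is the first line and no column name contains a comma.
has_lat_lon() {
  # Strip quotes and carriage returns, then put one column name per line.
  header=$(head -n 1 "$1" | tr -d '"\r' | tr ',' '\n')
  printf '%s\n' "$header" | grep -qx 'lat' &&
    printf '%s\n' "$header" | grep -qx 'lon'
}

# Usage: has_lat_lon my_address_file_geocoded.csv && echo "lat and lon found"
```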
Continuing with our usage example, if we wanted to calculate the median household income of each of the subjects' census tracts, we would use:
docker run --rm -v "$PWD":/tmp degauss/acs_income my_address_file_geocoded.csv
Docker will emit some messages as it progresses through the calculations and will again write the output file to the working directory, the same directory as the input file, with a descriptive suffix appended to the file name (in this case, the median household income from the 2015 American Community Survey based on the census tract of each location). In our example above, the output file will be called my_address_file_geocoded_medianhouseholdincome.csv:
"tract","address","id","street","zip","city","state","score","prenum","number","precision","county","lon","lat","B19013_001E"
"39061027000","2800 Winslow Avenue Cincinnati OH 45206",3,"Winslow Ave",45206,"Cincinnati","OH",0.941,NA,2800,"range","39061",-84.49631,39.130586,"12282"
"39061027000","3333 Burnet Ave Cincinnati OH 45229",1,"Burnet Ave",45229,"Cincinnati","OH",0.949,NA,3333,"range","39061",-84.500402,39.14089,"12282"
"39061027000","660 Lincoln Avenue Cincinnati OH 45229",2,"Lincoln Ave",45206,NA,NA,0.805,NA,660,"range","39061",-84.494724,39.13282,"12282"
Please note that the geomarker assessment programs will not return rows that contain missing coordinate values. Coordinates can be missing if the geocoding container failed to assign them, for example, because of a malformed address string. A warning will be issued and the rows with missing coordinates will be removed before proceeding. Users should verify that the address strings have been recorded correctly; however, geocoding sometimes fails even for a correctly supplied address because of inconsistencies and inaccuracies in the street range files provided by the census.
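To see in advance how many rows would be dropped, the rows with missing coordinates can be counted in the geocoded file. This is only a sketch under stated assumptions: the helper name count_missing_coords is hypothetical, missing values are assumed to appear as NA or as empty fields, and the simple comma splitting assumes that no field (such as an address) contains an embedded comma. For files that do, a real CSV parser is the safer choice.

```shell
# Hypothetical helper (not part of DeGAUSS): print the number of data rows
# whose lat or lon is empty or NA.
# Assumes no field contains an embedded comma.
count_missing_coords() {
  awk -F',' '
    NR == 1 {
      # Locate the lat and lon columns by name in the header.
      for (i = 1; i <= NF; i++) {
        name = $i; gsub(/["\r]/, "", name)
        if (name == "lat") lat = i
        if (name == "lon") lon = i
      }
      if (!lat || !lon) { bad = 1; exit 2 }
      next
    }
    {
      a = $lat; b = $lon
      gsub(/["\r]/, "", a); gsub(/["\r]/, "", b)
      if (a == "" || a == "NA" || b == "" || b == "NA") n++
    }
    END { if (!bad) print n + 0 }
  ' "$1"
}

# Usage: count_missing_coords my_address_file_geocoded.csv
```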
Now that we have our desired geomarkers, we can remove the addresses and coordinates from our output file, leaving only the geomarker information that will be associated with health outcomes in a downstream analysis:
"id","B19013_001E"
3,"12282"
1,"12282"
2,"12282"
B19013_001E is the variable name used by the census to denote median household income in a census tract. Here, the values are all the same because our example addresses are all in the same census tract. Since this file no longer contains any PHI, it is no longer subject to HIPAA and can be shared with others or used with third party online services. (Note: Here, we are applying the "Safe Harbor" method defined by HIPAA for deidentification, but re-identification is certainly possible when enough geomarkers and non-identifying information are combined. Do not take the use of DeGAUSS as a guarantee of deidentification, and please consult your institution for more information about its specific policies.)
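The step of removing the addresses and coordinates described above is not performed by DeGAUSS itself. One possible way to do it from the shell is sketched here. The helper name keep_columns is hypothetical, and the comma splitting again assumes that no field contains an embedded comma, which holds for the example file but may not hold for other address data.

```shell
# Hypothetical helper (not part of DeGAUSS): keep only the named columns of a
# CSV, in the order given.
# Usage: keep_columns input.csv id B19013_001E > output.csv
# Assumes no field contains an embedded comma.
keep_columns() {
  file=$1; shift
  awk -F',' -v OFS=',' -v want="$*" '
    NR == 1 {
      # Map each requested column name to its position in the header.
      n = split(want, names, " ")
      for (k = 1; k <= n; k++) {
        idx[k] = 0
        for (i = 1; i <= NF; i++) {
          name = $i; gsub(/["\r]/, "", name)
          if (name == names[k]) idx[k] = i
        }
        if (!idx[k]) { print "column not found: " names[k] > "/dev/stderr"; exit 1 }
      }
    }
    {
      # Emit the selected fields for the header and every data row.
      line = $(idx[1])
      for (k = 2; k <= n; k++) line = line OFS $(idx[k])
      sub(/\r$/, "", line)
      print line
    }
  ' "$file"
}
```

Called as keep_columns my_address_file_geocoded_medianhouseholdincome.csv id B19013_001E, this would write only the id and B19013_001E columns to standard output, which can be redirected to a new file.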