This repo is for work on Vipul Naik's Donations List Website.
The issue that started this repo is: vipulnaik/donations#2
Data is at http://www.publicwelfare.org/grants-process/our-grants/
Download data into CSV:
# NOTE: make sure to edit the range in the scrape file
# (https://github.com/riceissa/public-welfare-foundation/blob/fe1219f75b97f54434123e0fea61d2647ce27659/scrape.py#L15)
# to match the actual page numbers on the website.
./scrape.py > data.csv
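For reference, the linked line presumably hard-codes which result pages to fetch; a minimal sketch of what such a loop could look like (the variable name and bounds here are assumptions, so check the actual file):

# Hypothetical sketch of the page-range loop in scrape.py; the name and
# bounds are assumptions, not copied from the actual file.
for page in range(1, 13):  # pages 1 through 12; raise the upper bound
    ...                    # if the website has gained pages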
Use CSV file to generate SQL file:
# NOTE: don't run proc.py, which is the processing script for the old version
# of the website.
./proc2.py data.csv > out.sql
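Roughly, proc2.py turns each CSV row into an SQL insert statement for the donations list website's database. A minimal sketch of that kind of conversion, with a hypothetical table name and columns (the actual schema lives in the vipulnaik/donations repo):

#!/usr/bin/env python3
# Hypothetical CSV-to-SQL sketch; the table name and columns below are
# placeholders, not the actual donations-list schema.
import csv
import sys

def sql_quote(value):
    # Double up single quotes and wrap the value for use in SQL.
    return "'" + value.replace("'", "''") + "'"

with open(sys.argv[1], newline="") as f:
    for row in csv.DictReader(f):
        print("insert into donations (donor, donee, amount) values"
              " ({}, {}, {});".format(sql_quote(row["donor"]),
                                      sql_quote(row["donee"]),
                                      row["amount"]))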
The table on PWF's website is loaded using JavaScript, and the URL doesn't change when paginating, so the data is a little annoying to get.
The table script itself is pretty simple, and you can find it at http://www.publicwelfare.org/wp-content/themes/publicwelfare/js/page-grants.js.
What I ended up doing was sending the following request in Chrome's console, with the page parameter varying from 1 to 12 (i.e. sending the request twelve times; there could be more pages by the time you read this):
jQuery.post("http://www.publicwelfare.org/wp-admin/admin-ajax.php", {
action: 'grants_load',
categories: "",
years: "",
page: 1,
nextblock: 0,
prevblock: 0
}, function(response){
if(response) {
if(response === '0') {
/*
* Wrong response
*/
jQuery("#grants-content").html("Ups :(<br>Something went wrong");
}
else {
console.log(response);
}
}
});
This produces a command/result dump in the console. Right-click in the console and choose "Save as …". You then have to remove all the command lines so that only the results remain. Store those in data.html and you're done.
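Alternatively, the whole console dance can be scripted, which also skips the cleanup step. A sketch using Python's requests library, with the request parameters copied from the snippet above (the page range and output filename are assumptions):

#!/usr/bin/env python3
# Hypothetical sketch: replay the admin-ajax.php request for each page
# and concatenate the responses into data.html.
import requests

URL = "http://www.publicwelfare.org/wp-admin/admin-ajax.php"

with open("data.html", "w") as f:
    for page in range(1, 13):  # pages 1 through 12 as of this writing
        resp = requests.post(URL, data={
            "action": "grants_load",
            "categories": "",
            "years": "",
            "page": page,
            "nextblock": 0,
            "prevblock": 0,
        })
        if resp.text == "0":
            break  # the endpoint returns '0' on a bad request
        f.write(resp.text)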
There might be a better way still: clicking on each page in the browser, saving the DOM once that page has loaded, and repeating for every page shouldn't be too bad.
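A sketch of that approach with Selenium (the CSS selector for the "next page" control is a pure guess; inspect the real page to find the right one):

#!/usr/bin/env python3
# Hypothetical Selenium sketch: click through each page of the grants
# table and save the rendered DOM. The selector is a placeholder, not
# taken from the actual site.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://www.publicwelfare.org/grants-process/our-grants/")
time.sleep(5)  # crude wait for the JavaScript-loaded table

pages = []
for _ in range(12):  # twelve pages as of this writing
    pages.append(driver.page_source)
    # Placeholder selector for the "next page" control.
    next_links = driver.find_elements(By.CSS_SELECTOR, "#grants-content a.next")
    if not next_links:
        break
    next_links[0].click()
    time.sleep(5)  # crude wait for the next page of results

driver.quit()
with open("data.html", "w") as f:
    f.write("\n".join(pages))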
The scripts are CC0; I'm not sure about the licensing of the data.