
Pulls lap time PDFs from americanmotocrossresults.com and writes them to a hosted MongoDB on mLab


EGWeeks/AMAPDFtoJSONParser


AMA PDF to JSON parser

Built to feed an API for pro motocross data. The only data format I could find was the PDFs published by American Motocross Results.

Short Description

The Node.js app downloads all the lap time PDFs and parses them to JSON, with some reading and writing to disk in between. The last step sends the JSON up to the hosted database connected to the Pro Motocross API. This is one of the more unusual parts of the parser app: instead of using a driver to connect to my hosted database, I use Node's core Child Process module to execute some shell files. No particular reason other than to test out child process, which is awesome.
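As a rough illustration of the child process idea, here is a minimal sketch of wrapping `execFile` in a Promise so a shell step can sit in a `.then()` chain. The `run` helper and the script path in the usage comment are hypothetical names chosen for this example; they are not taken from the repo, and the real shell files may be invoked differently.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: run an executable and resolve with its trimmed stdout.
function run(file, args = []) {
  return new Promise((resolve, reject) => {
    execFile(file, args, (err, stdout, stderr) => {
      if (err) {
        // Keep stderr on the error so a failure is easier to diagnose.
        err.stderr = stderr;
        return reject(err);
      }
      resolve(stdout.trim());
    });
  });
}

// Hypothetical usage ('./scripts/import.sh' is a placeholder path):
// run('sh', ['./scripts/import.sh', 'lap-times-json/moto0.json'])
//   .then(out => console.log(out))
//   .catch(err => console.error(err));
```

Using `execFile` rather than `exec` passes the arguments as an array, so file paths are not interpolated into a shell command string.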

The beauty of promises

./app.js

The benefit of chaining Promises is that the code reads like instructions on how to make toast. Plus, when we get a rejection, we will know exactly which Promise it came from. No need to preach to the choir. Check out app.js to see the full creation.

// Version 2.0.0

getPDFs()
	.then(pdfs => writePDFs(pdfs))
	.then(pathsPDFs => getToJSON(pathsPDFs))
	.then(allRaceJSON => sendToParser(allRaceJSON))
	.then(parsedJSON => unlinkFile(parsedJSON))
	.then(allParsedJSON => writeJSONData(allParsedJSON))
	.then(pathToJSON => execFiles.dropCollection(pathToJSON[0]))
	.then(pathToJSON => execFiles.toDB(pathToJSON))
	.then(success => console.log(success+' Good To Go!'))
	.catch(err => console.error(err));
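A single `.catch` at the end only says which step failed if the error itself says so. One way to make that explicit, sketched here with stand-in functions rather than the real steps from app.js, is to wrap each step so its rejection is tagged with a name. The `step` wrapper and the `fake*` functions are assumptions made for this example.

```javascript
// Hypothetical wrapper: tag any failure from fn with the step's name.
const step = (name, fn) => input =>
  Promise.resolve()
    .then(() => fn(input))
    .catch(err => {
      const reason = err && err.message ? err.message : String(err);
      const wrapped = new Error(`${name} failed: ${reason}`);
      wrapped.cause = err; // keep the original error for debugging
      throw wrapped;
    });

// Stand-ins for the real pipeline functions.
const fakeDownload = () => ['a.pdf', 'b.pdf'];
const fakeParse = () => { throw new Error('bad PDF'); };
const fakeUpload = () => 'uploaded';

const demo = Promise.resolve()
  .then(step('download', fakeDownload))
  .then(step('parse', fakeParse))
  .then(step('upload', fakeUpload)) // skipped once 'parse' rejects
  .catch(err => err.message);

demo.then(message => console.log(message));
```

Once a step rejects, every later `.then` is skipped and control jumps straight to the `.catch`, which is why the chain stays flat.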

Nothing wrong with nested async callbacks, but they are a bit harder to read and trickier to debug.

// Version 1.0.0

function fetchLapTimesPDF() {
	Promise.all(
		urlArr().map((pdfLink, index) => {
			return download(pdfLink).then(pdf => {
					let pdfFilePath = 'laptimes/moto'+index+'.pdf';
					let jsonFilePath = 'lap-times-json/moto'+index+'.json';
					fs.writeFile(pdfFilePath, pdf, err => {
						if(err) console.error('PDF write file threw '+ err);
						pdfTojson(pdfFilePath, jsonFilePath);
					});				
				});
		}))
	.then(res => console.log('Success! ' + res))
	.catch(err => console.error('Promise all URLs ended in Error: '+ err));
}

function pdfTojson(pdfFilePath, jsonFilePath) {
  let pdfParser = new PDFParser();
  pdfParser.on('pdfParser_dataError', errData => console.error('PDFParser error : '+errData.parserError) );
  pdfParser.on('pdfParser_dataReady', jsonData => {
    fs.writeFile(jsonFilePath, jsonParser(jsonData), err => {
    	if(err) console.error('pdfParser write file error: '+ err);
    });
  });
  pdfParser.loadPDF(pdfFilePath);
}
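One reason Version 1.0.0 is harder to debug: the `fs.writeFile` callback is never returned to the Promise chain, so `Promise.all` can resolve before the files are on disk. A minimal sketch of the alternative, assuming the built-in `fs.promises` API is available, returns each write so the chain waits for it. The `writeAll` name and the `moto<index>.pdf` naming are illustrative only, not the repo's actual `writePDFs`.

```javascript
const fs = require('fs').promises;
const path = require('path');

// Hypothetical helper: write every buffer to dir/moto<index>.pdf and
// resolve with the file paths only after all writes have finished.
function writeAll(buffers, dir) {
  return Promise.all(
    buffers.map((buf, index) => {
      const filePath = path.join(dir, `moto${index}.pdf`);
      return fs.writeFile(filePath, buf).then(() => filePath);
    })
  );
}
```

Because each write's Promise is returned from `map`, a failed write rejects the whole `Promise.all` and lands in the chain's `.catch` instead of being logged from inside a callback.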
