Home
Robb Shecter edited this page Mar 11, 2022 · 24 revisions
The key idea is to split parsing into two stages, analogous to the lexer and parser pair in a compiler. Dividing the work this way lets each stage be simpler.
The first stage (this repo) crawls the original sources and converts them to JSON. The JSON schema mirrors the original content as closely as possible, so each type of source produces very different-looking JSON. But because it is all JSON (instead of PDF, HTML, etc.), every source is easily read by the next stage. The second stage can then focus on converting the source schema to a particular app's needs.
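As a rough illustration of the split, here is a minimal Python sketch of what a second stage might look like. Everything in it is an assumption made for the example: the JSON field names (`title`, `chapters`, `articles`), the sample treaty, and the `to_app_records` function are hypothetical and are not this repo's actual schema or code.

```python
import json

# Hypothetical stage-1 output: JSON that mirrors the structure of one source.
# These field names are invented for illustration, not the repo's real schema.
STAGE_ONE_JSON = json.dumps({
    "title": "Example Treaty",
    "chapters": [
        {
            "number": "I",
            "articles": [
                {"number": "1", "text": "First article."},
                {"number": "2", "text": "Second article."},
            ],
        },
        {
            "number": "II",
            "articles": [{"number": "3", "text": "Third article."}],
        },
    ],
})


def to_app_records(source_json: str) -> list[dict]:
    """Stage 2 (hypothetical): flatten source-shaped JSON into app records."""
    source = json.loads(source_json)
    records = []
    for chapter in source["chapters"]:
        for article in chapter["articles"]:
            records.append({
                "document": source["title"],
                "citation": f"ch. {chapter['number']}, art. {article['number']}",
                "body": article["text"],
            })
    return records


if __name__ == "__main__":
    for record in to_app_records(STAGE_ONE_JSON):
        print(record["citation"], "-", record["body"])
```

The point of the sketch is that the second stage only needs `json.loads` and plain dictionary access. All of the crawling and PDF or HTML handling stays in the first stage, and a different app could write its own stage-2 function against the same JSON.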
Current project: International Law in support of Ukraine