MARCdata

Writes selected parts of a MARC XML bibliographic data file into a .csv.gz file.

Preliminaries

C++11

Requires a compiler that supports the following C++11 language features:

  • range-based for loop
  • auto type specifier
  • nullptr identifier
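
To check that a toolchain handles these features, it may help to compile a small file that uses all three (for example with `-std=c++11` on GCC or Clang). The sketch below is illustrative only: `first_non_empty` is a hypothetical helper, not a function from the MARCdata sources.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of MARCdata): returns a pointer to the
// first non-empty string in `fields`, or nullptr if every entry is empty.
const std::string* first_non_empty(const std::vector<std::string>& fields) {
    for (const auto& field : fields) {  // range-based for loop + auto
        if (!field.empty()) {
            return &field;
        }
    }
    return nullptr;                     // nullptr identifier
}
```

If this compiles without errors, the compiler supports the three features listed above.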

Usage

Place the input XML files in the 'data' folder. The output is written there as well.

For instance, for Fennica, see ~/data/fennica/raw/fennica_*.xml.gz

Note that the XML files must be decompressed before parsing.

ESTC

For the current CSV conversion pipeline, see https://github.com/COMHIS/estc-raw-csv-prepicker.

For the old versions, use:

make estc
./estc

Fennica

make fennica
./fennica

Kungliga

bash split-kungliga.sh
make kungliga
./kungliga

Göttingen

make cerl
./cerl

Author

Niko Ilomäki

Contributions by Leo Lahti

License

MIT License

RapidXML library by Marcin Kalicinski and licensed under the MIT License

Gzstream library by Deepak Bandyopadhyay and Lutz Kettner and licensed under LGPL 2.1

Log

Jul 12: Odd behavior was observed in language field (008) parsing with Kungliga. It turned out that the last character positions (38/39) are sometimes missing from the Kungliga 008 field, so for Kungliga the parser was changed to locate the language code by counting from the beginning of the field instead of from the end (as is still done for the other catalogs). This now yields recognizable language codes for 99.88% of the entries.
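
The difference between the two strategies can be sketched as follows. In MARC 21, a complete 008 field has 40 characters (positions 00-39), with the language code at positions 35-37. The function names and the exact offset arithmetic are assumptions made for illustration, not the parser's actual code.

```cpp
#include <string>

// Assumed sketch: read the language code at fixed positions 35-37,
// counted from the start of the 008 field. Works even when the
// trailing positions 38/39 are missing.
std::string language_from_start(const std::string& field008) {
    const std::string::size_type kLangPos = 35;
    const std::string::size_type kLangLen = 3;
    if (field008.size() < kLangPos + kLangLen) {
        return "";  // field too short to contain a language code
    }
    return field008.substr(kLangPos, kLangLen);
}

// Assumed sketch of the end-relative approach: the language code is taken
// to start five characters before the end (3 language characters followed
// by positions 38 and 39). Only correct for a complete 40-character field.
std::string language_from_end(const std::string& field008) {
    const std::string::size_type kTail = 5;
    if (field008.size() < kTail) {
        return "";
    }
    return field008.substr(field008.size() - kTail, 3);
}
```

For a complete field both functions return the same code. When positions 38/39 are cut off, the end-relative version shifts two characters to the left and returns the wrong substring, while the start-relative version is unaffected.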