Command line application (and Clojure library) for converting CSV to RDF according to the W3C specifications for CSV on the Web (CSVW).
CI-generated native builds of the csv2rdf command line app for Linux (AMD64) and macOS (AMD64) are attached to releases.
csv2rdf can be run from the command line given the location of either a tabular data file or a metadata file that references the tabular file it describes. Each location can be either a path on the local machine or a URI for a document on the web.
To run from a tabular file:
java -jar csv2rdf-standalone.jar -t /path/to/tabular/file.csv
The resulting RDF is written to standard output in Turtle format. The output can instead be written to a file with the -o option:
java -jar csv2rdf-standalone.jar -t /path/to/tabular/file.csv -o output.ttl
The extension of the output file determines the output format. The full list of supported formats is defined by RDF4J; some common formats are listed below:
Extension | Format |
---|---|
.ttl | Turtle |
.nt | N-Triples |
.xml | RDF/XML |
.trig | TriG |
.nq | N-Quads |
Note that when writing quad formats such as TriG and N-Quads, no named graph is assigned: the graph of every statement will be nil.
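Because the output format follows from the file extension, it can be useful to check an output path before starting a long conversion. The Python sketch below is a hypothetical helper (`expected_format` and `is_quad_format` are not part of csv2rdf) that mirrors only the extensions in the table above. RDF4J recognises more formats than these, so an extension this helper does not know about is not necessarily rejected by csv2rdf.

```python
from pathlib import Path
from typing import Optional

# Extension -> format, mirroring the table above. This is only a subset of
# what RDF4J supports and is NOT read from csv2rdf itself.
COMMON_FORMATS = {
    ".ttl": "Turtle",
    ".nt": "N-Triples",
    ".xml": "RDF/XML",
    ".trig": "TriG",
    ".nq": "N-Quads",
}

# Formats from the table that carry a graph component (quads).
QUAD_FORMATS = {"TriG", "N-Quads"}


def expected_format(output_path: str) -> Optional[str]:
    """Return the format for a listed extension, or None if it is not listed."""
    return COMMON_FORMATS.get(Path(output_path).suffix)


def is_quad_format(output_path: str) -> bool:
    """True if the output path uses one of the quad formats listed above."""
    return expected_format(output_path) in QUAD_FORMATS


if __name__ == "__main__":
    for path in ["output.ttl", "output.nq", "output.jsonld"]:
        print(path, expected_format(path), is_quad_format(path))
```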
By default, triples are generated in CSVW standard mode. The mode to use can be specified with the -m option:
java -jar csv2rdf-standalone.jar -t /path/to/tabular/file.csv -m minimal
The supported values for the mode are `standard`, `minimal` and `annotated`. `annotated` mode is a non-standard mode which behaves like `minimal` mode, with the addition that any notes or non-standard annotations defined for table groups and tables will be output if the corresponding metadata element specifies an `@id`.
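When calling csv2rdf from a script, the options described so far can be assembled programmatically. The following is a minimal Python sketch, assuming the standalone jar is named `csv2rdf-standalone.jar` and `java` is on the `PATH`; `build_command` and `run_csv2rdf` are hypothetical helpers, and only the `-t`, `-u`, `-o` and `-m` options documented in this README are used.

```python
import subprocess
from typing import List, Optional

# Mode values listed in this README.
MODES = {"standard", "minimal", "annotated"}


def build_command(
    jar: str = "csv2rdf-standalone.jar",
    tabular: Optional[str] = None,
    metadata: Optional[str] = None,
    output: Optional[str] = None,
    mode: Optional[str] = None,
) -> List[str]:
    """Build a csv2rdf command line from the documented options."""
    if tabular is None and metadata is None:
        raise ValueError("provide a tabular file (-t) or a metadata file (-u)")
    if mode is not None and mode not in MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    cmd = ["java", "-jar", jar]
    if tabular is not None:
        cmd += ["-t", tabular]
    if metadata is not None:
        cmd += ["-u", metadata]
    if output is not None:
        cmd += ["-o", output]  # output format is chosen from the extension
    if mode is not None:
        cmd += ["-m", mode]  # csv2rdf uses standard mode when -m is omitted
    return cmd


def run_csv2rdf(**options) -> str:
    """Run csv2rdf and return its standard output (Turtle when no -o is given)."""
    result = subprocess.run(build_command(**options), check=True,
                            capture_output=True, text=True)
    return result.stdout
```

For example, `build_command(tabular="/path/to/tabular/file.csv", mode="minimal")` produces the same arguments as the `-m minimal` command shown earlier.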
The recommended way to start processing a tabular file is from a metadata document that describes the structure of a referenced tabular file. The tabular file does not need to be provided when processing from a metadata file, since the metadata should contain a reference to the tabular file(s).
java -jar csv2rdf-standalone.jar -u /path/to/metadata/file.json -o output.ttl
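A metadata document is a JSON file using the W3C CSVW metadata vocabulary. The Python sketch below writes a small example CSV and a minimal metadata file describing it, so that the `-u` command above can be tried end to end. The file names, column names, the `write_example` helper and the `http://example.org/...` URI template are placeholders chosen for illustration, not anything csv2rdf requires.

```python
import csv
import json
from pathlib import Path


def write_example(directory: str) -> Path:
    """Write people.csv and people.csv-metadata.json into `directory`.

    Returns the path of the metadata file, suitable for passing to -u.
    """
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)

    # A small tabular file; the first row holds the column titles.
    with open(out / "people.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name"])
        writer.writerow([1, "Alice"])
        writer.writerow([2, "Bob"])

    # Minimal CSVW table description. "url" is resolved relative to the
    # metadata file, so the CSV is expected to sit next to it.
    metadata = {
        "@context": "http://www.w3.org/ns/csvw",
        "url": "people.csv",
        "tableSchema": {
            # Placeholder URI template: one subject per row, keyed by the id column.
            "aboutUrl": "http://example.org/person/{id}",
            "columns": [
                {"name": "id", "titles": "id", "datatype": "integer"},
                {
                    "name": "name",
                    "titles": "name",
                    "datatype": "string",
                    "propertyUrl": "http://xmlns.com/foaf/0.1/name",
                },
            ],
        },
    }
    meta_path = out / "people.csv-metadata.json"
    meta_path.write_text(json.dumps(metadata, indent=2))
    return meta_path


if __name__ == "__main__":
    print(write_example("example"))
```

The generated metadata file can then be processed with `java -jar csv2rdf-standalone.jar -u example/people.csv-metadata.json -o output.ttl`.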
Docker images are published to the public repository europe-west2-docker.pkg.dev/swirrl-devops-infrastructure-1/public/csv2rdf.
These can be run by specifying the image version and mapping volumes into the container to make local files available within it, e.g.
docker run --rm -v .:/data europe-west2-docker.pkg.dev/swirrl-devops-infrastructure-1/public/csv2rdf:v0.7.1 -t /data/input.csv -o /data/output.ttl
Note that file paths must be given as paths inside the container (e.g. under /data in the example above), not as paths on the local system.
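csv2rdf running inside the container only sees the mounted directory, so every host path has to be rewritten to its location under the mount point. The hypothetical Python helpers below (not part of csv2rdf) do that rewrite and assemble a `docker run` command like the one above. They assume POSIX-style absolute host paths, a single directory mounted at `/data`, and use the `v0.7.1` tag from the example only as a default.

```python
from pathlib import PurePosixPath
from typing import List

IMAGE = "europe-west2-docker.pkg.dev/swirrl-devops-infrastructure-1/public/csv2rdf"


def to_container_path(host_path: str, host_dir: str, mount: str = "/data") -> str:
    """Map a file under host_dir to the corresponding path inside the container.

    relative_to raises ValueError if host_path is not under host_dir,
    i.e. the file would not be visible inside the container.
    """
    rel = PurePosixPath(host_path).relative_to(PurePosixPath(host_dir))
    return str(PurePosixPath(mount) / rel)


def docker_command(host_dir: str, tabular: str, output: str,
                   version: str = "v0.7.1", mount: str = "/data") -> List[str]:
    """Build a docker run command that mounts host_dir at `mount`."""
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:{mount}",
        f"{IMAGE}:{version}",
        "-t", to_container_path(tabular, host_dir, mount),
        "-o", to_container_path(output, host_dir, mount),
    ]
```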
csv2rdf also exposes its functionality as a library; see the csv2rdf library documentation for a description of the library and its interface.
- See the overview of the code for a tour of the codebase.
- See Developing csv2rdf itself for a quickstart guide on working on the library and application.
To compile and deploy new native image builds for all supported architectures, create a release in the GitHub UI tagged to a commit.
Copyright © 2018 Swirrl IT Ltd.
Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.