This app uses an algorithm based on analysis of a JSON iterable's schema (generated on the fly) to attempt to convert it into a table.
- As of v6.0, the CSV files generated by JsonTools use an algorithm similar to RFC 4180, in that all values containing the quote character,
\r
, or\n
must be wrapped in quotes, and quote characters are escaped by doubling them up.- Thus the JSON string
"foo,\n\"bar\""
would become"foo,\n""bar""""
(with an actual newline) in a CSV file with,
as delimiter and"
as quote character, because the newline, quote character, and delimiter in the string all necessitate enclosing quotes, and the internal quotes must be doubled up.
- Thus the JSON string
- Floating point numbers are always exported with
.
as the decimal separator, as in JSON. This is not currently configurable, but may become optional in the future. null
is represented as an empty string.- Dates are represented in
YYYY-MM-DD
format (e.g., January 7, 2015 is represented as2015-01-07
) - Datetimes are represented in
yyyy-MM-dd hh:mm:ss
format (e.g., 4:15:35PM April 29 2021 is represented as2021-04-29 16:15:35
)
At present, four strategies for making a table are supported.
- The Key Separator (keysep) is a character that joins together the path to a value. In
Full Recursive
andDefault
modes, the algorithm recursively searches for child keys and indices, and uses the path as a column name instead of a key.
For example, we might have
{"b": [{"a": [1, 2, 3]}]}
which can be tabularized, but requires recursive search from key b
to the first array to key a
in the sub-object, yielding the path
["b", "a"]
.
So with keysep = '.'
, we get the output
b.a
1
2
3
and with keysep = '_'
, we get
b_a
1
2
3
-
The Delimiter in output file (one of
'\t'
,','
, or'|'
) is the character used to separate the fields in the output file. By default this is','
, so you output a CSV.'\t'
would result in a TSV (tab-separated variables). -
The Line terminator in output file (one of
CRLF
,CR
, orLF
) setting (added in v5.8) is the line terminator. By default this is UnixLF
, which was the only line terminator available prior to this release. -
Convert booleans to integers takes all instances of
true
andfalse
and replaces them with1
and0
, respectively. Can be useful sometimes.
Drills down into nested JSON until it finds one of the following:
- an array of arrays of scalars
- an array of same-format objects with scalar values
- an object with at least one key that maps to arrays of scalars
The emitted table will then have all the scalar values along the path to the table, as well as the table itself.
- For example, something that's already tabular, like
{"a": 1, "b": [1, 2, 3], "c": ["a", "b", "c"]}
will be tabularized to
a,b,c
1,1,a
1,2,b
1,3,c
- But it can drill down to find the tabular part of this JSON:
[{"a": 1, "b": [1, 2, 3], "c": {"d": "y"}}, {"a": 2, "b": [4, 5, 6], "c": {"d": "z"}}]
will be tabularized to
a,b,c.d
1,1,y
1,2,y
1,3,y
2,4,z
2,5,z
2,6,z
If the JSON is an array of objects or an array of arrays, recursively search each sub-object or sub-array, adding new columns to the emitted table for each path to a terminal node.
If the JSON is not an array, throw an error.
This can make tables with a lot of columns, as shown in this example:
[{"a": 1, "b": [1, 2, 3], "c": {"d": "y"}}, {"a": 2, "b": [4, 5, 6], "c": {"d": "z"}}]
will be tabularized to
a,b.col1,b.col2,b.col3,c.d
1,1,2,3,y
2,4,5,6,z
If the JSON is one of the following:
- an array of arrays of scalars
- an array of same-format objects with scalar values
- an object with at least one key that maps to arrays of scalars
make a table containing all the rows. No recursive search will be done - the entire JSON must match this specification.
- For example,
{"a": 1, "b": [1, 2, 3], "c": ["a", "b", null]}
will be tabularized to
a,b,c
1,1,a
1,2,b
1,3,
- But
[{"a": 1, "b": [1, 2, 3]}, {"a": 2, "b": [4, 5, 6]}]
will NOT be tabularized because it is an array of objects that don't contain all scalars.
This approach is similar to NO_RECURSION, but instead of refusing to run if it encounters non-scalars in the table, it stringifies them.
WARNING: this option does not work well with ','
as the separator! You should use '\t'
as the separator instead.
This strategy might be useful if you want to partially deconstruct your JSON so that some fields are their own columns and more complex deeply nested fields are left as JSON, which an RDBMS like PostgreSQL can convert into a JSON column type.
For example, consider
[{"a": 1, "b": [1, 2, 3], "c": {"d": "y"}}, {"a": 2, "b": [4, 5, 6], "c": {"d": "z"}}]
This will be tabularized to
a,b,c
1,"[1, 2, 3]","{""d"": ""y""}"
2,"[4, 5, 6]","{""d"": ""z""}"
Meanwhile,
{"a": 1, "b": [1, 2, 3], "c": ["a", "b", "c"]}
will be tabularized to
a,b,c
1,1,a
1,2,b
1,3,c
which is the same behavior as the No recursion
and Default
strategies.