- High performance algorithms for data transformation, serialization and manipulation.
- Based on well established standards.
- ZERO allocation when transforming from one format to another, except for APIs creating golang values from encoded data.
- JSON for web.
- CBOR for machine.
- Binary-Collation for crazy fast comparison/sorting.
This package is under continuous development, but the APIs are fairly stable.
- Slides on gson
- What is what
- Configuration
- Transforms
- Understanding collation
- Getting started
- Play with command line
- Articles related to gson
- How to contribute
JSON
- Java Script Object Notation, also called JSON, RFC-7159
- Fast becoming the internet standard for data exchange.
- Human readable format, not so friendly for machine representation.
Value (aka gson)
- Golang object parsed from JSON, CBOR or collate representation.
- JSON arrays are represented in golang as
[]interface{}
. - JSON objects, aka properties, are presented in golang as
map[string]interface{}
. - Following golang-types can be transformed to JSON, CBOR, or,
Binary-collation -
nil
,bool
,byte, int8, int16, uint16, int32, uint32, int, uint, int64, uint64
,float32, float64
,string
,[]interface{}
,map[string]interface{}
,[][2]interface{}
. - For type
[][2]interface{}
, first item is treated as key (string) and second item is treated as value, hence equivalent tomap[string]interface{}
. - Gson objects support operations like, Get(), Set(), and Delete() on individual fields located by the json-pointer.
CBOR
- Concise Binary Object Representation, also called CBOR, RFC-7049link.
- Machine friendly, designed for IoT, inter-networking of light weight devices, and easy to implement in many languages.
- Can be used for more than data exchange, left to user imagination :) ...
Binary-Collation
- A custom encoding based on a paper and improvised to handle JSON specification.
- Binary representation preserving the sort order.
- Transform back to original JSON from binary representation.
- Numbers can be treated as floating-point, for better performance or either as floating-point or integer, for flexibility.
- More details can be found here
JSON-Pointer
- URL like field locator within a JSON object, RFC-6901.
- For navigating through JSON arrays and objects, but to any level of nesting.
- JSON-pointers shall be unquoted before they are used as path into JSON document.
- Documents encoded in CBOR format using LengthPrefix are not supported by lookup APIs.
Following Benchmark is made on a map data which has a shape similar to:
{"key1": nil, "key2": true, "key3": false,
"key4": "hello world", "key5": 10.23122312}
or,
{"a":null,"b":true,"c":false,"d\"":10,"e":"tru\"e", "f":[1,2]}
BenchmarkVal2JsonMap5-8 3000000 461 ns/op 0 B/op 0 allocs/op
BenchmarkVal2CborMap5-8 5000000 262 ns/op 0 B/op 0 allocs/op
BenchmarkVal2CollMap-8 1000000 1321 ns/op 128 B/op 2 allocs/op
BenchmarkJson2CborMap-8 2000000 838 ns/op 0 B/op 0 allocs/op
BenchmarkCbor2JsonMap-8 2000000 1010 ns/op 0 B/op 0 allocs/op
BenchmarkJson2CollMap-8 1000000 1825 ns/op 202 B/op 2 allocs/op
BenchmarkColl2JsonMap-8 1000000 2028 ns/op 434 B/op 6 allocs/op
BenchmarkCbor2CollMap-8 1000000 1692 ns/op 131 B/op 2 allocs/op
BenchmarkColl2CborMap-8 1000000 1769 ns/op 440 B/op 6 allocs/op
Though converting to golang value incurs cost.
BenchmarkJson2ValMap5 1000000 1621 ns/op 699 B/op 14 allocs/op
BenchmarkCbor2ValMap5 1000000 1711 ns/op 496 B/op 18 allocs/op
BenchmarkColl2ValMap 1000000 2235 ns/op 1440 B/op 33 allocs/op
Configuration APIs are not re-entrant. For concurrent use of Gson, please
create a gson.Config{}
per routine, or protect them with a mutex.
NumberKind
There are two ways to treat numbers in Gson, as integers (upto 64-bit width) or floating-point (float64).
-
FloatNumber configuration can be used to tell Gson to treat all numbers as floating point. This has the convenience of having precision between discrete values, but suffers round of errors and inability to represent integer values greater than 2^53. DEFAULT choice.
-
SmartNumber will use int64, uint64 for representing integer values and use float64 when decimal precision is required. Choosing this option, Gson might incur a slight performance penalty.
Can be configured per configuration instance via SetNumberKind()
.
MaxKeys
Maximum number of keys allowed in a property object. This can be configured
globally via gson.MaxKeys
or per configuration via SetMaxkeys()
.
Memory-pools
Gson uses memory pools:
- Pool of Byte blocks for listing json-pointers.
- Pool of Byte blocks for for encoding / decoding string types, and property keys.
- Pool of set of strings for sorting keys within property.
Memory foot print of gson depends on the pools size, maximum length of input string, number keys in input property-map.
Mempools can be configured globally via
gson.MaxStringLen, gson.MaxKeys, gson.MaxCollateLen, gson.MaxJsonpointerLen
package variables or per configuration instance via ResetPools()
.
CBOR ContainerEncoding
In CBOR both map and array (called container types) can be encoded as length followed by items, or items followed by end-marker.
- LengthPrefix to encode length of container type first, followed by each item in the container.
- Stream to encode each item in the container as it appears in the input stream, finally ending it with a End-Stream-Marker. DEFAULT choice.
Can be configured per configuration instance via SetContainerEncoding
JSON Strict
- If configured as true, encode / decode JSON strings operations will use Golang's encoding / JSON package.
Can be configured per configuration instance via SetStrict
.
JSON SpaceKind
How to interpret space characters ? There are two options:
- AnsiSpace will be faster but does not support unicode.
- UnicodeSpace will be slower but supports unicode. DEFAULT choice.
Can be configured per configuration instance via SetSpaceKind
.
Collate ArrayLenPrefix
While sorting array, which is a container type, should collation algorithm consider the arity of the array ? If ArrayLenPrefix prefix is configured as true, arrays with more number of items will sort after arrays with lesser number of items.
Can be configured per configuration instance via SortbyArrayLen
.
Collate PropertyLenPrefix
While sorting property-map, which is a container type, should collation algorithm consider number of entries in the map ? If PropertyLenPrefix is configured as true, maps with more number of items will sort after maps with lesser number of items.
Can be configured per configuration instance via SortbyPropertyLen
.
JSON-Pointer JsonpointerLength
Maximum length a JSON-pointer string can take. Can be configured globally
via MaxJsonpointerLen or per configuration instance via SetJptrlen
.
NOTE: JSON pointers are composed of path segments, there is an upper limit to the number of path-segments a JSON pointer can have. If your configuration exceeds that limit, try increasing the JsonpointerLength.
Value to CBOR
- Golang types
nil
,true
,false
are encodable into CBOR format. - All Golang
number
types, including signed, unsigned, and floating-point variants, are encodable into CBOR format. - Type
[]byte
is encoded as CBOR byte-string. - Type
string
is encoded as CBOR text. - Generic
array
is interpreted as Golang[]interface{}
and encoded as CBOR array.- With
LengthPrefix
option for ContainerEncoding, arrays and maps are encoded with its length. - With
Stream
option, arrays and maps are encoded using Indefinite and Breakstop encoding.
- With
- Generic
property
is interpreted as golang[][2]interface{}
and encoded as CBOR array of 2-element array, where the first item is key represented as string and second item is any valid JSON value. - Before encoding
map[string]interface{}
type, useGolangMap2cborMap()
function to transform them to[][2]interface{}
. - Following golang data types are encoded using CBOR-tags,
- Type
time.Time
encoded with tag-0. - Type
Epoch
type supplied by CBOR package, encoded with tag-1. - Type
EpochMicro
type supplied by CBOR package, encoded with tag-1. - Type
math/big.Int
positive numbers are encoded with tag-2, and negative numbers are encoded with tag-3. - Type
DecimalFraction
type supplied by CBOR package, encoded with tag-4. - Type
BigFloat
type supplied by CBOR package, encoded with tag-5. - Type
CborTagBytes
type supplied by CBOR package, encoded with tag-24. - Type
regexp.Regexp
encoded with tag-35. - Type
CborTagPrefix
type supplied by CBOR package, encoded with tag-55799.
- Type
- All other types shall cause a panic.
Value to collate
- Types
nil
,true
,false
,float64
,int64
,int
,string
,[]byte
,[]interface{}
,map[string]interface{}
are supported for collation. - All JSON numbers are collated as arbitrary sized floating point numbers.
- Array-length (if configured) and property-length (if configured) are collated as integer.
JSON to Value
- Gson uses custom parser that must be faster than encoding/JSON.
- Numbers can be interpreted as integer, or float64,
FloatNumber
to interpret JSON number as 64-bit floating point.SmartNumber
to interpret JSON number as int64, or uint64, or float64.
- Whitespace can be interpreted, based on configuration type
SpaceKind
. SpaceKind can be one of the followingAnsiSpace
orUnicodeSpace
.AnsiSpace
that should be fasterUnicodeSpace
supports unicode white-spaces as well.
JSON to collate
- All number are collated as float.
- If config.nk is FloatNumber, all numbers are interpreted as float64 and collated as float64.
- If config.nk is SmartNumber, all JSON numbers are collated as arbitrary sized floating point numbers.
- Array-length (if configured) and property-length (if configured) are collated as integer.
JSON to CBOR
- JSON Types
null
,true
,false
are encodable into CBOR format. - Types
number
are encoded based on configuration typeNumberKind
, which can be one of the following.- If config.nk is FloatNumber, all numbers are encoded as CBOR-float64.
- If config.nk is SmartNumber, all JSON float64 numbers are encoded as CBOR-float64, and, all JSON positive integers are encoded as CBOR-uint64, and, all JSON negative integers are encoded as CBOR-int64.
- Type
string
will be parsed and translated into UTF-8, and subsequently encoded as CBOR-text. - Type
arrays
can be encoded inStream
mode, using CBOR's indefinite-length scheme, or inLengthPrefix
mode. - Type
properties
can be encoded either using CBOR's indefinite-length scheme (Stream
), or using CBOR'sLengthPrefix
. - Property-keys are always interpreted as string and encoded as UTF-8 CBOR-text.
CBOR to Value
- Reverse of all
value to CBOR
encoding, described above, are supported. - Cannot decode
float16
type and int64 > 9223372036854775807. - Indefinite byte-string chunks, text chunks shall be decoded outside
this package using
IsIndefinite*()
andIsBreakstop()
APIs.
CBOR to JSON
- CBOR types
nil
,true
,false
are transformed back to equivalent JSON types. - Types
float32
andfloat64
are transformed back to 32 bit JSON-float and 64 bit JSON-float respectively, in non-exponent format. - Type
integer
is transformed back to JSON-integer representation, and integers exceeding 9223372036854775807 are not supported. - Type
array
either with length prefix or with indefinite encoding are converted back to JSON array. - Type
map
either with length prefix or with indefinite encoding are converted back to JSON property. - Type bytes-strings are not supported or transformed to JSON.
- Type CBOR-text with indefinite encoding are not supported.
- Type Simple type float16 are not supported.
For transforming to and from binary-collation refer here
CBOR to Collate
- CBOR Types
null
,true
,false
,float32
,float64
,integer
,string
,[]byte
(aka binary),array
,object
can be collated. - All number are collated as float.
- If config.nk is FloatNumber, all numbers are interpreted as float64 and collated as float64.
- If config.nk is SmartNumber, all JSON numbers are collated as arbitrary sized floating point numbers.
- Array-length (if configured) and property-length (if configured) are collated as integer.
- Indefinite-length encoding for text and binary are not supported.
- LengthPrefix and Stream encoding for array and maps are supported.
Collate to CBOR
Missing
,null
,true
,false
,floating-point
,small-decimal
,integer
,string
,[]byte
(aka binary),array
,object
types from its collated from can be converted back to CBOR.- Since all numbers are collated as float, it is converted back to text representation of float, in format: [+-]x.e[+-].
- If config.nk is FloatNumber, all number are encoded as CBOR-float64.
- If config.nk is SmartNumber, all numbers whose exponent is >= 15 is encoded as uint64 (if number is positive), or int64 (if number is negative). Others are encoded as CBOR-float64.
Collate to JSON
- Since all numbers are collated as float, it is converted back to text representation of float, in format: [+-]x.e[+-].
- If config.nk is FloatNumber, all number are encoded as JSON-float64.
- If config.nk is SmartNumber, all numers whose exponent is >= 15 is encoded as uint64 (if number is positive), or int64 (if number is negative). Others are encoded as JSON-float64.
Collate to Value
- Since all numbers are collated as float, it is converted back to text representation of float, in format: [+-]x.e[+-].
- If config.nk is FloatNumber, all number are encoded as JSON-float64.
- If config.nk is SmartNumber, all numers whose exponent is >= 15 is encoded as uint64 (if number is positive), or int64 (if number is negative). Others are treated as float64.
- Pick an issue, or create an new issue. Provide adequate documentation for the issue.
- Assign the issue or get it assigned.
- Work on the code, once finished, raise a pull request.
- Gson is written in golang, hence expected to follow the global guidelines for writing go programs.
- If the changeset is more than few lines, please generate a report card.
- As of now, branch
master
is the development branch.
Task list
- Binary collation: transparently handle int64, uint64 and float64.
- Support for json.Number.
- UTF-8 collation of strings.
- JSON-pointer.
- JSON pointer for looking up within CBOR map.
- JSON pointer for looking up within value-map.
- Don't change the tag number.
- All supplied APIs will panic in case of error, applications can recover from panic, dump a stack trace along with input passed on to the API, and subsequently handle all such panics as a single valued error.
- For now, maximum integer range shall be within int64.
Config
instances, and its APIs, are neither re-entrant nor thread safe.
list of changes from github.com/prataprc/collatejson
- Codec type is renamed to Config.
- Caller should make sure that the o/p buffer passed to encoding and decoding APIs are adequately sized.
- Name and signature of NewCodec() (now, NewDefaultConfig) has changed.
- Configuration APIs,SortbyArrayLen, SortbyPropertyLen, UseMissing, NumberType all now return the config object back the caller - helps in call-chaining.
- All APIs panic instead of returning an error.
- Output buffer should have its len() == cap(), so that encoder and decoder can avoid append and instead use buffer index.