pitr/jsontokenizer

JSON Tokenizer

Zero-allocation JSON tokenizer.

Features

  • Fast. Roughly 10x to 17x faster than encoding/json.Decoder depending on buffer size (about 13x at the default size). See benchmarks below.
  • Similar API to encoding/json.Decoder.
  • No reflection.
  • No allocations beyond a small read buffer.
  • Can be reused with a call to Reset.

Anti-Features

  • Does NOT parse JSON. It does not verify that the token stream forms a valid document: [} produces 2 tokens and no error.
  • Needs an io.Writer to write numbers and strings into. Depending on the use case, this can be os.Stdout, bytes.Buffer, ByteBuffer, etc.
  • Does not unescape strings. "he is 5'11\\"." will be exactly that.
  • Does not parse numbers into floats/ints. Use strconv.Atoi() or strconv.ParseFloat() if needed.
  • Not safe for concurrent use. Guard it with a sync.Mutex or the like to prevent simultaneous calls.

Quick Start

import (
	"io"

	json "pitr.ca/jsontokenizer"
)

func example(in io.Reader) error {
	tk := json.New(in)

	for {
		tok, err := tk.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		switch tok {
		case json.TokNull:
			println("got null")
		case json.TokTrue, json.TokFalse:
			println("got bool")
		case json.TokArrayOpen, json.TokArrayClose, json.TokObjectOpen, json.TokObjectClose, json.TokObjectColon, json.TokComma:
			println("got delimiter")
		case json.TokNumber:
			println("got number")
			_, err := tk.ReadNumber(io.Discard)
			if err != nil {
				return err
			}
		case json.TokString:
			println("got string")
			_, err := tk.ReadString(io.Discard)
			if err != nil {
				return err
			}
		}
	}
}
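
Because the tokenizer reports delimiters without checking how they nest, a caller that needs that guarantee can track open delimiters on a stack inside a loop like the one above. The following is a minimal sketch of that idea only; tokKind and its constants are hypothetical stand-ins for the library's token type, and the check covers bracket nesting, not the full JSON grammar.

```go
package main

import "fmt"

// tokKind is a hypothetical stand-in for the library's token type.
type tokKind int

const (
	arrayOpen tokKind = iota
	arrayClose
	objectOpen
	objectClose
	other // any token that is not a bracket or brace
)

// checkNesting reports an error if the brackets and braces in toks
// are not balanced and properly nested.
func checkNesting(toks []tokKind) error {
	var stack []tokKind
	for i, t := range toks {
		switch t {
		case arrayOpen, objectOpen:
			stack = append(stack, t)
		case arrayClose, objectClose:
			// Work out which opener this closer must match.
			want := arrayOpen
			if t == objectClose {
				want = objectOpen
			}
			if len(stack) == 0 || stack[len(stack)-1] != want {
				return fmt.Errorf("token %d: unexpected closing delimiter", i)
			}
			stack = stack[:len(stack)-1]
		}
	}
	if len(stack) != 0 {
		return fmt.Errorf("%d unclosed delimiter(s)", len(stack))
	}
	return nil
}

func main() {
	// The "[}" case from Anti-Features: two tokens, mismatched.
	fmt.Println(checkNesting([]tokKind{arrayOpen, objectClose}))
	// A well-nested sequence such as [ 1 ].
	fmt.Println(checkNesting([]tokKind{arrayOpen, other, arrayClose}))
}
```

In real use the stack would be kept across calls to Token rather than built from a slice; a fixed-size array could be used instead of a slice to avoid growing it on deep input.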

Benchmarks

Sizes are buffer sizes, which can be specified with NewWithSize. The default is 64. The tokenizer is reused between benchmark iterations, but this doesn't impact performance.

BenchmarkBuiltinDecoder is encoding/json.Decoder.

BenchmarkTokenizer/size=8-8         	    1419	    788208 ns/op	       0 B/op	       0 allocs/op
BenchmarkTokenizer/size=16-8         	    1668	    688656 ns/op	       0 B/op	       0 allocs/op
BenchmarkTokenizer/size=32-8         	    1792	    628601 ns/op	       0 B/op	       0 allocs/op
BenchmarkTokenizer/size=64-8         	    2040	    571411 ns/op	       0 B/op	       0 allocs/op
BenchmarkTokenizer/size=128-8        	    2228	    520646 ns/op	       0 B/op	       0 allocs/op
BenchmarkTokenizer/size=256-8        	    2392	    482151 ns/op	       0 B/op	       0 allocs/op
BenchmarkTokenizer/size=512-8        	    2516	    460283 ns/op	       0 B/op	       0 allocs/op
BenchmarkTokenizer/size=1024-8       	    2553	    458148 ns/op	       0 B/op	       0 allocs/op
BenchmarkTokenizer/size=2048-8       	    2618	    451937 ns/op	       0 B/op	       0 allocs/op
BenchmarkTokenizer/size=4096-8       	    2499	    451601 ns/op	       0 B/op	       0 allocs/op
BenchmarkTokenizer/size=8192-8       	    2610	    443493 ns/op	       0 B/op	       0 allocs/op

BenchmarkBuiltinDecoder-8            	     157	   7607729 ns/op	 1755495 B/op	  107836 allocs/op
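
To put the numbers above in relative terms, the speedup is the built-in decoder's ns/op divided by the tokenizer's ns/op. The short program below does that arithmetic for three of the rows; the ns/op figures are copied from the output above, and the helper name speedup is made up for illustration.

```go
package main

import "fmt"

// speedup returns how many times faster a run taking fastNs is
// than a run taking slowNs.
func speedup(slowNs, fastNs float64) float64 {
	return slowNs / fastNs
}

func main() {
	const builtinNs = 7607729 // BenchmarkBuiltinDecoder ns/op

	// Tokenizer ns/op for the smallest, default, and largest buffer sizes.
	rows := []struct {
		size int
		ns   float64
	}{
		{8, 788208},
		{64, 571411},
		{8192, 443493},
	}

	for _, r := range rows {
		fmt.Printf("size=%d: %.1fx faster\n", r.size, speedup(builtinNs, r.ns))
	}
}
```

This works out to roughly 9.7x at size 8, 13.3x at the default size of 64, and 17.2x at size 8192.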