A fast and space-efficient Base36 encoding for large data.
Base36_914
encodes every 9
bytes as a block into 14
chars, with a max space efficiency of 99.5% of the theoretical limit.
If the remaining data is less than 9 bytes, the output will be padded to 14 chars. So this algorithm is suitable for longer inputs, or input length close to a multiple of 9.
void base36_encode_block(u8 plain[9], u8 code[14], u8 encode_table[36])
Read 9 bytes from plain
and write 14 bytes to code
.
The built-in BASE36_ENCODE_TABLE_DEFAULT
can be used for encode_table
.(0-9 a-z)
void base36_encode_last_block(u8 plain[], int len, u8 code[14], encode_table[36])
Read len
(0 < len < 9) bytes from plain
and write 14 bytes to code
.
void base36_encode(u8 plain[], int len, u8 code[], encode_table)
Read len
bytes from plain
and write ceil(len / 9) * 14
bytes to code
.
void base36_decode_block(u8 code[14], u8 plain[9], u8 decode_talbe[256])
Read 14 bytes from code
and write 9 bytes to plain
.
The built-in BASE36_DECODE_TABLE_DEFAULT
can be used for decode_talbe
.
int base36_decode_last_block(u8 code[14], u8 plain[], decode_talbe[256])
Read 14 bytes from code
and write len
bytes to plain
.
len
is the return value. (0 < len <= 9)
int base36_decode(u8 code[], int len, u8 plain[], decode_table)
Read len
(must be a multiple of 14) bytes from code
and write len / 14 * 9
bytes to plain
, return len / 14 * 9
.
This function calls base36_decode_last_block
only once.
int base36_decode_stream(u8 code[], int len, u8 plain[], decode_table)
If base36_encode
is called multiple times during encoding, the 14-char-padding(s) may be located in the middle of the codes. In this case, this function must be used for decoding.
This function always uses base36_decode_last_block
, and returns the length of the decoded data.
None of the above decoding functions check the input code, it needs to be checked by the caller.
Even if the input code is out of the code table, the decoding functions can still work, just with wrong result. If the decoded data has an integrity check, the input check can be ignored.
Online: https://etherdream.com/base36/
Test:
./base36 <<< "Hello World 12345"
Print gatk40t14s0idcgzbz9l6nvgmq31
.
Encode:
./base36 < file > file36.txt
Decode:
./base36 -d < file36.txt > file-re
SIMD support.