mlacosta · gagoar · Nov 3, 2021 · Nov 3, 2021 · Nov 5, 2021 · Nov 8, 2021
diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
@@ -0,0 +1,20 @@
+on:
+  push:
+    # Sequence of patterns matched against refs/tags
+    tags:
+      - 'v*' # Push events to matching v*, i.e. v1.0, v20.15.10
+
+jobs:
+  npm:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - uses: actions/setup-node@v2
+        with:
+          node-version: '12.x'
+          registry-url: 'https://registry.npmjs.org'
+      - run: npm install
+      - run: npm run build
+      - run: npm publish
+        env:
+          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
diff --git a/.github/workflows/validation.yml b/.github/workflows/validation.yml
@@ -0,0 +1,50 @@
+name: Validation
+
+on: [pull_request]
+
+jobs:
+  lint:
+    name: Linting
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@master
+      - name: Use Node.js 12.x
+        uses: actions/setup-node@v2
+        with:
+          node-version: 12.x
+      - name: Install dependencies
+        run: npm install
+      - name: ESLint
+        run: npm run lint
+  test:
+    name: Run unit tests
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@master
+      - name: Use Node.js 12.x
+        uses: actions/setup-node@v2
+        with:
+          node-version: 12.x
+      - name: Install dependencies
+        run: npm install
+      - name: Jest
+        run: npm run test -- --coverage
+      - name: Send coverage to codecov
+        uses: codecov/codecov-action@v1
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          flags: unittests
+
+  build:
+    name: Run build
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@master
+      - name: Use Node.js 12.x
+        uses: actions/setup-node@v2
+        with:
+          node-version: 12.x
+      - name: Install dependencies
+        run: npm install
+      - name: Build
+        run: npm run build
diff --git a/.gitignore b/.gitignore
@@ -1 +1,8 @@
-dataset.js
+node_modules
+.vscode/
+# ignore codecoverage output
+coverage/
+# ignore cli binary output
+cli/
+# ignore dist/ output
+dist/
diff --git a/.nvmrc b/.nvmrc
@@ -0,0 +1 @@
+14.17.5
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2020 Gago
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -1,33 +1,217 @@
-# Huffman-URL-Compressor-for-Nodejs
-## Author: Mariano L. Acosta
+<p align="center">
+  <h3 align="center">Huffman URL Compressor</h3>
 
-# Description
-Convert any kind of String into a url-friendly parameter using Huffman Encoding.
+  <p align="center">
+     ⚙ Convert any kind of String into a url-friendly parameter using Huffman Encoding ⚙
+    <br />
+    <a href="https://github.com/mlacosta/huffman-url-compressor#table-of-contents"><strong>Explore the docs »</strong></a>
+    <br />
+    <a href="https://github.com/mlacosta/huffman-url-compressor/issues">Report Bug</a>
+    ·
+    <a href="https://github.com/mlacosta/huffman-url-compressor/issues">Request Feature</a>
+  </p>
+</p>
 
-# Installation
+## Table of Contents
 
-        npm install --save huffman-url-compressor
+- [Built With](#built-with)
+- [Getting Started](#getting-started)
+- [Motivation](#motivation)
+- [Huffman Compression](#huffman-compression)
+- [Examples](#examples)
+- [Contributing](#contributing)
+- [License](#license)
 
-# Usage
+<!-- CONTRIBUTING -->
 
-## Parameters:
+## Built With
 
-- **Train (string)**: Training set used to create the encoder. This is where the algorithm gets the frequency for each char. 
-- **Test (string)**: String that you want to encode.
+- [ncc](https://github.com/vercel/ncc/)
+- [jest](https://github.com/facebook/jest)
+- [ora](https://github.com/sindresorhus/ora)
+- [commander](https://github.com/tj/commander.js/)
+- [cosmiconfig](https://github.com/davidtheclark/cosmiconfig)
 
-## Example:
+## Getting Started
 
-        import {createEncoder, encodeConfig, decodeConfig} from 'huffman-url-compressor';
+To install this dependency in your project:
 
-        //create encoder
+`npm i huffman-url-compressor`
 
-        let Encoder = createEncoder(train);
+## Motivation
 
-        //create a base64 encoded stream
+This library was originally intended to be used as a URL-friendly encoder/decoder. The idea is to take a chunk of text, compress it, and embed the result in a URL as a query parameter. Later on, you can retrieve the original text using the same encoder.
 
-        let encodedParam = encodeConfig(test,Encoder)
-
-        //retrieve the original param 
+A typical application for this library is permalink creation and sharing. For instance, if you want to put a long text in a URL but you are constrained in length, this encoder will output a shorter base64 string that you can use instead. After that, you can reduce the length further using a URL-shortening service.
 
-        let decodParam= decodeConfig(encodedParam,Encoder)
+## Huffman Compression
 
+Huffman compression is a data encoding technique that uses a greedy approach for lossless compression based on how often a character or symbol occurs. Theoretically, it can achieve a compression rate between 20 and 90 percent.
+
+First, suppose we have a set of 6 letters and the number of occurrences (frequency) for each one:
+
+| letter | frequency |
+| ------ | --------- |
+| a      | 45        |
+| b      | 13        |
+| c      | 12        |
+| d      | 16        |
+| e      | 9         |
+| f      | 5         |
+
+Since we have 6 symbols, a naive approach would be to use a 3-bit encoding for each one of them:
+
+| letter | bitstring |
+| ------ | --------- |
+| a      | 000       |
+| b      | 001       |
+| c      | 010       |
+| d      | 011       |
+| e      | 100       |
+| f      | 101       |
+
+For instance, if we want to encode the string 'bacab' using the table from above:
+
+```
+'bacab' transforms into '001000010000001'
+
+```
+
+Then, we can encode it further using a [base64](https://en.wikipedia.org/wiki/Base64) approach:
+
+| binary (6-bits) | base64 (char) |
+| --------------- | ------------- |
+| 010000          | Q             |
+| 001000          | I             |
+
+```
+'001000010000001' transforms into 'IQI='
+```
+
+In this case, the symbol `=` marks the padding added to complete the last group, by convention.
+
+Note that this way of encoding can be represented as a binary tree where each leaf holds a letter and its frequency. More generally, each parent node contains the sum of its children's frequencies and the combination of their symbols. Starting from the root, you search for a symbol and output a `0` or a `1` depending on whether you move to the left or to the right, respectively.
+
+![](https://i.imgur.com/QM2laV5.jpg)
+
+However, a better approach would be to create [prefix codes](https://en.wikipedia.org/wiki/Prefix_code) based on each letter's frequency. In that way, we can generate a variable-length encoding that depends on the number of occurrences (the more frequent a letter, the shorter its representation). This results in a shorter bitstring on average. This particular way of operating is known as Huffman Compression.
+
+First, we need to generate a node for each one of the letters. We can use a data structure like this:
+
+```
+{
+  "symbol": "a",
+  "frequency": 45
+}
+```
+
+Next, we combine all the nodes in a tree-like structure using a greedy algorithm that picks the two least frequent symbols and merges them. In our example, we start with:
+
+```
+{
+  "symbol": "f",
+  "frequency": 5
+}
+
+{
+  "symbol": "e",
+  "frequency": 9
+}
+```
+
+and we create the node:
+
+```
+{
+  "symbol": "fe",
+  "frequency": 14
+}
+```
+
+Finally, we remove the nodes 'e' and 'f' from our pool and replace them with the node 'fe'. Repeating this step until a single node remains gives us a Huffman tree that serves as our encoder:
+
+![](https://i.imgur.com/roKnNFS.jpg)
+
+An efficient way to implement this is with a [min-heap](<https://en.wikipedia.org/wiki/Heap_(data_structure)>) data structure, which keeps the two least frequent nodes cheap to retrieve.
+
+Back to our original example but now using the Huffman tree:
+
+| letter | bitstring |
+| ------ | --------- |
+| a      | 0         |
+| b      | 101       |
+| c      | 100       |
+| d      | 111       |
+| e      | 1101      |
+| f      | 1100      |
+
+```
+'bacab' transforms into '10101000101'
+```
+
+We save 26.67% of space compared with the fixed-length encoding (11 bits instead of 15).
+
+`compression ratio (CR) = 11/15 x 100% = 73.33%`
+
+`saved space = 100% - CR = 26.67%`
+
+Later we can introduce base64 encoding:
+
+```
+'10101000101' transforms into 'qKA='
+```
+
+**Note:** there's a trade-off between the Huffman compression rate and the expansion generated by the base64 encoding (each base64 character carries only 6 bits), so weigh both for your particular case.
+
+For more theoretical background check: [Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to algorithms. MIT press.](https://books.google.com.ar/books?hl=en&lr=&id=aefUBQAAQBAJ&oi=fnd&pg=PR5&dq=introductions+to+algorithms+cormen&ots=dO5uNAXSaZ&sig=IMmhA7_JXSWjGppyqv6UiAMfufI&redir_esc=y#v=onepage&q=introductions%20to%20algorithms%20cormen&f=false)
+
+## Examples
+
+### Fluent bit configuration files
+
+We start by gathering several Fluent Bit configuration files that serve as a training [dataset](https://gist.github.com/mlacosta/b85c4a1788f0210a353b2fcead281403); that is, we obtain the frequency of each symbol across all the configurations.
+
+Then, say you want to compress and embed the following configuration in a URL:
+
+```
+[INPUT]
+    Name        tail
+    Tag         tail.01
+    Path        /var/log/system.log
+
+[FILTER]
+    Name record_modifier
+    Match *
+    Record hostname ${HOSTNAME}
+
+[OUTPUT]
+    Name file
+    Match *
+    Path output.txt
+```
+
+The trained encoder will generate a compressed, base64-encoded version that you can use as a URL parameter.
+
+```
+vz9KnJmEW_yuoj6uIG3_Vxn1cQLYo8t_3nVW-LaueLCkwWxYysUNhSZa1-aPq5kd0It_ldRHvidL0AUJQIOSH2_6HVnW-oW_3xOl6HrUsqV1EbSl7QMOmpDdSjltrX4yTNTkzCLf5XUR45Axb_odWdb6hb_vOqt9K9by9W6idbW
+```
+
+In this case, we obtained a sequence that is 15% shorter. Note that this is below the theoretical threshold of 20% because of the expansion generated by the base64 encoding.
+
+To retrieve the original configuration, decode the encoded string with the same trained encoder.
+
+## Contributing
+
+Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.
+
+1. Fork the Project
+2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
+3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
+4. Push to the Branch (`git push origin feature/AmazingFeature`)
+5. Open a Pull Request
+
+<!-- LICENSE -->
+
+## License
+
+Distributed under the MIT License. See `LICENSE` for more information.
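
The procedure that the README's Huffman Compression section walks through (greedy merge of the two least frequent nodes, `0` for left and `1` for right, then base64 packing) can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: the function names `buildTree`, `buildCodes`, `encode` and `bitsToBase64` are hypothetical and are not this package's API, a re-sorted array stands in for the min-heap to keep the code short, and `Buffer` assumes a Node.js runtime.

```javascript
// Hypothetical helper names; not the huffman-url-compressor API.

// Greedy merge: repeatedly take the two least frequent nodes and combine them.
// Re-sorting an array stands in for a min-heap (simpler, but slower).
function buildTree(frequencies) {
  const pool = Object.entries(frequencies).map(([symbol, frequency]) => ({
    symbol,
    frequency,
  }));
  while (pool.length > 1) {
    pool.sort((x, y) => x.frequency - y.frequency);
    const [left, right] = pool.splice(0, 2);
    pool.push({
      symbol: left.symbol + right.symbol,
      frequency: left.frequency + right.frequency,
      left,
      right,
    });
  }
  return pool[0];
}

// Walk the tree: append '0' when going left and '1' when going right.
function buildCodes(node, prefix = '', codes = {}) {
  if (!node.left && !node.right) {
    codes[node.symbol] = prefix || '0'; // single-symbol alphabet edge case
    return codes;
  }
  buildCodes(node.left, prefix + '0', codes);
  buildCodes(node.right, prefix + '1', codes);
  return codes;
}

// Concatenate the code of every character in the text.
function encode(text, codes) {
  return [...text].map((ch) => codes[ch]).join('');
}

// Zero-pad the bitstring to whole bytes and emit standard base64.
function bitsToBase64(bits) {
  const padded = bits.padEnd(Math.ceil(bits.length / 8) * 8, '0');
  const bytes = (padded.match(/.{8}/g) || []).map((b) => parseInt(b, 2));
  return Buffer.from(bytes).toString('base64');
}

const codes = buildCodes(buildTree({ a: 45, b: 13, c: 12, d: 16, e: 9, f: 5 }));
const bits = encode('bacab', codes);

console.log(bits); // 10101000101
console.log(bitsToBase64(bits)); // qKA=
console.log(((1 - bits.length / 15) * 100).toFixed(2) + '% saved'); // 26.67% saved
```

The sketch uses the standard base64 alphabet. The sample output in the Fluent Bit example contains `-` and `_`, which suggests the library itself emits a URL-safe variant; that detail is not reproduced here.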