Fast Data Processing Using Tries and State Machines

CS 547 High Performance Computing @ Binghamton University

release date 2024-02-11 (piazza)

Standard Implementation

The standard implementation uses the C++17 STL and Regexes to compute results. While convenient, the C++ stdlib can be slow at times.

Optimized Implementation

The optmized implementation uses a state machine for parsing and a table-driven trie for storage and comparison of parse results.

Results

Note: 3 runs per test file and approximated the execution speed.

Lines	File Size	standard.cpp parser (s)	optimized.cpp (ms)
3	76 B	~1 ms	183 ms
1_000	29 KB	4 ms	201 ms
10_000	290 KB	41 ms	424 ms
1_000_000	29 MB	3_181 ms	1_152 ms
10_000_000	289.7 MB	32_130 ms	6_793 ms

Submission

DEADLINE:
- 2024-03-12 11:59PM (piazza)
- ~~2024-03-03 11:59PM~~ (piazza)
- ~~2024-02-29 11:59PM~~

Execution

Standard Implementation

To compile and execute the standard implementation:

make standard && ./build/standard test/data

Optimized Implementation

To compile and execute the optimized implementation:

make optimized && ./build/optimized test/data

Generate Test Data

A C++ program was provided by the professor to generate test data for our program.

assignment_2_gen.cc

The same file was downloaded and added to this repo and renamed to generate.cpp. To generate a test file:

g++ -std=c++17 -o generate generate.cpp && lines= && ./generate "${lines}" > "data_${lines}"

For testing, testing data was redirected into test/. For further description, within the test directory an extension was added to describe which implementation {.optimized, .standard} generated the results.

Source

https://cs.binghamton.edu/~kchiu/cs447/assign/2/

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
.github		.github
.vscode		.vscode
figures		figures
test		test
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
data		data
generate.cpp		generate.cpp
optimized.cpp		optimized.cpp
optimized.hpp		optimized.hpp
requirements.txt		requirements.txt
standard.cpp		standard.cpp
standard.hpp		standard.hpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fast Data Processing Using Tries and State Machines

Standard Implementation

Optimized Implementation

Results

Submission

Execution

Standard Implementation

Optimized Implementation

Generate Test Data

Source

About

Releases

Packages

Contributors 2

Languages

gmaldona/optimized-parsing

Folders and files

Latest commit

History

Repository files navigation

Fast Data Processing Using Tries and State Machines

Standard Implementation

Optimized Implementation

Results

Submission

Execution

Standard Implementation

Optimized Implementation

Generate Test Data

Source

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages