GitHub - kuujis/stream-join

#Data processing excercise

Java 9 and Gradle 4.6 was used to build the project.

Execute "gradle run" to run with default argument values, if missing - output files will be created:

"Views.csv"
"Clicks.csv"
"ViewableViews.csv"
"ViewsWithClicks.csv",
"FilteredViews.csv"
"statistics.csv"
"1200000"
"3"

"gradle jar" will create a fat jar file with included needed dependencies in build/lib folder.

Launching the jar file with "java -jar stream-join-1.0.0-SNAPSHOT.jar" will handle files in the same directory with default names or passed in as parameters

Launching the jar file "java -jar stream-join-1.0.0-SNAPSHOT.jar "Views.csv" "Clicks.csv" "ViewableViews.csv"" will expect that provided input csv files exist and are reachable.

The other three values are treated as 3 output file names (defaults being "ViewsWithClicks.csv", FilteredViews.csv" and "statistics.csv").

Finally - last two parameters are for handling timeWindow and bufferSize constants, defaults are 1200000 (20 minutes) and 3. Chosen after some experimentation to minimize execution time. Max differences found were about 2.4 minutes, so 300000 could be considered safe(?) lower boundary.

timeWindow is an long value in milliseconds, which defines how far from a given view event Clicks and Viewable View events should be searched. It is also used when refreshing ADAmateurishCache to determine its size.
bufferSize value is an integer multiplier, which defines how many timeWindows in advance should be included in ADAmateurishCache.

So, to specify all possible values for a jar file, one could use the below:
java -jar stream-join-1.0.0-SNAPSHOT.jar "Views.csv" "Clicks.csv" "ViewableViews.csv" "ViewsWithClicks.csv", FilteredViews.csv" "statistics.csv" "1200000" "3"

".idea" folder is used by IntelliJ, had to update some settings manually to avoid coverage calculation output folder usage issues, can be safely ignored if building with gradle or running jar files.

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
.idea		.idea
src		src
.gitignore		.gitignore
Clicks.csv		Clicks.csv
ViewableViews.csv		ViewableViews.csv
Views.csv		Views.csv
build.gradle		build.gradle
readme.md		readme.md
settings.gradle		settings.gradle

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

kuujis/stream-join

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages