TeraSort in shared memory, Hadoop and Spark settings.

llgeek/TeraSort-banchmark

Tera Sort benchmarking


How to run the code:

  1. SM_Terasort.java: compile:
javac SM_Terasort.java

run:

java -Xms8192m -Xmx10240m SM_Terasort

Note: -Xms and -Xmx set the JVM's initial and maximum heap sizes. Choose them according to the configuration file: if you configure a larger chunk size, increase these values so that one chunk can be successfully loaded into memory.

  2. Hadoop_Terasort.java: compile:
/home/ubuntu/hadoop/bin/hadoop com.sun.tools.javac.Main Hadoop_Terasort.java
jar cf Hadoop_Terasort.jar *.class

run:

/home/ubuntu/hadoop/bin/hadoop jar Hadoop_Terasort.jar Hadoop_Terasort /sortdata/sorttxt /sortdata/output

  3. Spark_Terasort.scala: no compilation is needed; run the code directly in the Spark shell.

run:

/home/ubuntu/spark/bin/spark-shell 
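
The heap-size note for SM_Terasort.java can be checked before starting a long run. The following is a minimal sketch, not part of this repository: the class HeapCheck, the method chunkFits, the default chunk size, and the overhead factor of 2.0 are all hypothetical and should be replaced with values that match your own configuration file. It compares the maximum heap the JVM was started with (as set by -Xmx) against the memory one chunk is assumed to need.

```java
// Hypothetical helper, not part of SM_Terasort.java.
class HeapCheck {

    // Returns true if a chunk of chunkBytes is assumed to fit in maxHeapBytes.
    // overhead is an assumed multiplier for in-memory representation cost
    // (for example, Java objects usually take more space than the raw bytes).
    static boolean chunkFits(long chunkBytes, long maxHeapBytes, double overhead) {
        return (double) chunkBytes * overhead <= (double) maxHeapBytes;
    }

    public static void main(String[] args) {
        // Chunk size in bytes; the 4 GiB default is an assumption for illustration.
        long chunkBytes = args.length > 0
                ? Long.parseLong(args[0])
                : 4L * 1024 * 1024 * 1024;

        // Maximum heap the JVM will attempt to use, as configured by -Xmx.
        long maxHeapBytes = Runtime.getRuntime().maxMemory();

        double overhead = 2.0; // assumed safety factor, tune for your data

        System.out.println("Max heap (bytes):   " + maxHeapBytes);
        System.out.println("Chunk size (bytes): " + chunkBytes);
        if (chunkFits(chunkBytes, maxHeapBytes, overhead)) {
            System.out.println("Chunk is expected to fit in the heap.");
        } else {
            System.out.println("Chunk may not fit; consider increasing -Xmx.");
        }
    }
}
```

Running it with the same flags as the sort, for example java -Xms8192m -Xmx10240m HeapCheck followed by the chunk size in bytes, shows whether the chosen -Xmx leaves room for one chunk under the assumed overhead.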
