Skip to content

piotrszul/spark-intro

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Spark Examples for the Presentation

#Pre-requisites:

  • java 8
  • maven 3
  • python 2.7 (with numpy and pandas)
  • ipython 3+
  • R/3.2

#Installing Apache Spark:

e.g in your ~/.bash_profile :

export SPARK_HOME=/opt/spark-1.6.1
export PATH=$PATH:$SPARK_HOME/bin	

#Running iPython examples:

In ipython dir run:

IPYTHON_OPTS="notebook" pyspark

#Running interactive shell examples:

For pyspark in shell dir run:

pyspark	
>>>  execfile("WordCount.py")

For spark-shell in shell dir run:

spark-shell
>>> :load WordCount.scala

#Runing java8 application

In java8/wordcount dir run:

mvn clean install  #to build the assembly jar
./run-local.sh     #to run in local mode

#Running spark sql and spark ml example In ml or sql dir run:

IPYTHON_OPTS="notebook" pyspark --packages com.databricks:spark-csv_2.10:1.3.0

#Running R examples

SparkR examples are in R directory.

SPARK_HOME env variable needs to be set. If not set globally can be set for R and RStudion in ~/.Rprofile. e.g:

#File: ~/.Rprofile
Sys.setenv(SPARK_HOME="/opt/spark-1.6.1")

To initialise a local spark context use:

source 'spark.local.R'

The spark context is available in: sc variable and the sql context (data frame context) in: dc

To go through the housing.R example open it in RStudio and run the desired fragments.

#More info:

Contact: piotr.szul@csiro.au

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages