Installing Spark

Installation: Requirements
• Spark binary (version 1.4.1)
• Java JDK 6/7
• Scientific Python (and Jupyter notebook)
• py4j
• (Optional) IRKernel (for Jupyter)

Installation: Requirements
NOTE: Please do not install Spark with
• Homebrew on OS X
• Cygwin on Windows

Installation: Spark
• Find your OS here: http://spark.apache.org/downloads.html
• Select “Pre-built for Hadoop 2.4” or earlier under “Choose a package type”
• Download the tar package for spark-1.4.1-bin-hadoop1.tgz (if you are not sure, pick the latest version)
• Make sure you are downloading the binary version, not the source version.
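If you would rather script the download and unpacking than click through the page, a rough Python 2.7 sketch follows. The archive URL is an assumption based on the Apache release-archive layout (archive.apache.org/dist/spark/); substitute whichever package you actually selected above.

# hypothetical scripted download -- adjust the URL to the package you chose
import os
import tarfile
import urllib

url = "https://archive.apache.org/dist/spark/spark-1.4.1/spark-1.4.1-bin-hadoop1.tgz"  # assumed archive path
fname = url.split("/")[-1]

urllib.urlretrieve(url, fname)                                    # fetch the binary package
tarfile.open(fname, "r:gz").extractall(os.path.expanduser("~"))   # unpack into your home directory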

Installation: Configuration
• Unzip the file and place it in your home directory (/Users/jonathandinu/)
• Set PATH: include the following lines in your ~/.bash_profile (or ~/.bashrc):

export SPARK_HOME=/full/path/to/your/unzipped/spark/folder
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
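To confirm the variables took effect, open a new terminal (so the profile is re-read), start python, and run a minimal check like the sketch below. It only inspects the environment and the directory that the PYTHONPATH line points at.

# quick sanity check for SPARK_HOME / PYTHONPATH -- run in a fresh shell
import os

spark_home = os.environ.get("SPARK_HOME")
print(spark_home)  # should print your unzipped Spark folder, not None

# the PYTHONPATH entry exposes this package directory to Python:
print(os.path.isdir(os.path.join(spark_home or "", "python", "pyspark")))  # expect True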

Installation: Java JDK
• http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
• Find the download for your OS
• Follow the install instructions/wizard
• Make sure you get the JDK instead of the JRE
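One way to tell the JDK and JRE apart after installing: both provide the java launcher, but only the JDK ships the javac compiler. A small illustrative Python check (not part of the original slides):

import subprocess

subprocess.call(["java", "-version"])       # present with either the JRE or the JDK

try:
    subprocess.call(["javac", "-version"])  # ships only with the JDK
except OSError:
    print("javac not found -- you likely installed the JRE rather than the JDK")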

Installation: Requirements
• Spark binary
• Java JDK 6/7
• Scientific Python (and Jupyter notebook)
• py4j
• (Optional) IRKernel (for Jupyter)

Installation: Scientific Python
• http://continuum.io/downloads
• Find the download for your OS (make sure it is Python 2.7)
• Follow the install instructions/wizard
• To make sure it installed correctly: ipython notebook
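Since the deck assumes Python 2.7 from Anaconda, it is also worth confirming which interpreter is on your PATH; inside python (or a notebook cell) run:

import sys
print(sys.version)  # expect a 2.7.x build; Anaconda installs typically mention it in this string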

And finally: pip install py4j

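py4j is the bridge PySpark uses to talk to the JVM, so once the pip install finishes, a bare import is enough to confirm it is visible to your interpreter:

import py4j  # should succeed silently after `pip install py4j`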

Installation: Test It All Out

jonathan$ ipython
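Inside that IPython session (or a notebook), a minimal end-to-end smoke test looks roughly like the sketch below. It assumes the SPARK_HOME/PYTHONPATH settings above and the py4j install succeeded; "local[4]" just runs Spark locally with 4 threads, and "install-test" is an arbitrary application name.

import pyspark as ps

sc = ps.SparkContext("local[4]", "install-test")   # local mode, 4 worker threads
rdd = sc.parallelize(range(100))                   # distribute a small dataset
print(rdd.sum())                                   # 4950 if everything is wired up
sc.stop()                                          # release the context when done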

Installation: Requirements
• Spark binary
• Java JDK 6/7
• Scientific Python (and Jupyter notebook)
• py4j
• (Optional) IRKernel (for Jupyter)

Installation: IRKernel (Jupyter kernel for R)
• Make sure R is installed: https://cran.r-project.org/bin/
• Install the kernel via R (get into an R shell):

install.packages(c('rzmq','repr','IRkernel','IRdisplay'), repos = c('http://irkernel.github.io/', getOption('repos')))
IRkernel::installspec()

Installation: IRKernel (Jupyter kernel for R)
And in the notebook:

# Set this to where Spark is installed
Sys.setenv(SPARK_HOME="/Users/jonathandinu/spark")

# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

library(SparkR)

https://github.com/apache/spark/tree/master/R

Note: If for any reason you cannot get Spark installed on your OS following these instructions, Cloudera and Hortonworks provide Linux VMs with Spark installed.
• http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms/cdh-5-4-x.html
• http://hortonworks.com/products/hortonworks-sandbox/#install

Review
• Command-line Spark shell: ./bin/pyspark
• Spark module: import pyspark as ps
• Jupyter Notebook interface: ipython notebook
• Also R support in the notebook (or RStudio)!