NOTE: Please do not install Spark with:
• Homebrew on OS X
• Cygwin on Windows
Installation: Spark
• Open the downloads page: http://spark.apache.org/downloads.html
• Under “Choose a package type”, select “Pre-built for Hadoop 2.4” or earlier
• Download the tar package, e.g. spark-1.4.1-bin-hadoop1.tgz (if you are not sure, pick the latest version)
• Make sure you are downloading the binary version, not the source version
Installation: Configuration
• Extract the archive and place the resulting folder in your home directory (e.g. /Users/jonathandinu/)
• Set the environment variables: add the following lines to your ~/.bash_profile (or ~/.bashrc):
    export SPARK_HOME=/full/path/to/your/unzipped/spark/folder
    export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
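If editing the shell profile is not an option, the same effect can be approximated from inside Python. The sketch below is a minimal illustration, not part of the original setup: the helper names `spark_python_paths` and `add_spark_to_sys_path` are hypothetical, and it assumes the Spark folder contains a `python/` directory, optionally with a bundled Py4J zip under `python/lib/`.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the entries that should be importable for a given Spark folder.

    Hypothetical helper: assumes the layout <spark_home>/python and,
    if present, <spark_home>/python/lib/py4j-*-src.zip.
    """
    python_dir = os.path.join(spark_home, "python")
    if not os.path.isdir(python_dir):
        raise FileNotFoundError("no python/ directory under " + spark_home)
    # Pick up a bundled Py4J zip if the distribution ships one.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_sys_path(spark_home=None):
    """Prepend Spark's Python entries to sys.path, reading SPARK_HOME by default."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise EnvironmentError("SPARK_HOME is not set")
    paths = spark_python_paths(spark_home)
    # Insert in reverse so the final order on sys.path matches `paths`.
    for entry in reversed(paths):
        if entry not in sys.path:
            sys.path.insert(0, entry)
    return paths
```

Calling `add_spark_to_sys_path()` at the top of a script or notebook would then play the role of the `PYTHONPATH` export for that one process only.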
Installation: IRKernel (Jupyter kernel for R)
• Make sure R is installed: https://cran.r-project.org/bin/
• Install kernel via R (get into an R shell):
And in the notebook:
    # Set this to where Spark is installed
    Sys.setenv(SPARK_HOME="/Users/jonathandinu/spark")
    # Add Spark's bundled R library directory to the library search path
    .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
    # Load SparkR from that directory
    library(SparkR)
https://github.com/apache/spark/tree/master/R
Note: If for any reason you cannot get Spark installed on your OS following these instructions, Cloudera and Hortonworks provide Linux VMs with Spark installed.
• http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms/cdh-5-4-x.html
• http://hortonworks.com/products/hortonworks-sandbox/#install
Review
• Command-line Spark shell: ./bin/pyspark
• Spark module: import pyspark as ps
• Jupyter Notebook interface: ipython notebook
• Also R support in the notebook (or RStudio)!
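One way to confirm that the `import pyspark as ps` route works is a small smoke test. This is a sketch under assumptions rather than a prescribed check: `pyspark_available` and `smoke_test` are hypothetical helper names, the `local[2]` master and the app name are arbitrary choices, and the test only runs if Spark's Python package and its Py4J dependency are importable.

```python
import importlib.util


def pyspark_available():
    """Return True if the pyspark module can be found on the current Python path."""
    return importlib.util.find_spec("pyspark") is not None


def smoke_test():
    """Start a local SparkContext, sum the squares of 0..9, then shut it down."""
    import pyspark as ps  # imported lazily so this file loads without Spark

    sc = ps.SparkContext("local[2]", "install-smoke-test")
    try:
        return sc.parallelize(range(10)).map(lambda x: x * x).sum()
    finally:
        # Always stop the context so a later notebook cell can create a new one.
        sc.stop()


if __name__ == "__main__":
    if pyspark_available():
        print("sum of squares:", smoke_test())
    else:
        print("pyspark not importable; check SPARK_HOME and PYTHONPATH")
```

If the "not importable" branch is reached, the environment variables from the configuration step are the first thing to re-check.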