Monday, October 17, 2016

Spark Setup and Installation on Windows

1. Download R
2. Download Scala
3. Install JDK 1.7 or later
4. Download Python

5. After saving the file, I downloaded the Hadoop binary winutils.exe. Even though Spark runs independently of Hadoop, there is a bug: Spark looks for winutils.exe, which Hadoop needs, and throws an error if it cannot find it.

I downloaded the file from the link below.

I created a folder named winutils in C:\, created a bin directory inside it, and placed winutils.exe there. The file location is as follows.


6. Set up the environment variables
Press WIN + R to open the Run dialog and enter sysdm.cpl.
I then clicked the Advanced tab and then Environment Variables. Under the user variables I clicked New and added the following:

Variable name HADOOP_HOME with value C:\winutils
Variable name SPARK_HOME with value C:\spark
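
To confirm the variables took effect, a small check can be run from a newly opened command prompt (already open prompts do not pick up the change). This is only a sketch: the function name check_spark_env is made up for illustration, and it assumes winutils.exe sits in the bin directory under HADOOP_HOME as described in step 5.

```python
import ntpath
import os


def check_spark_env(env, file_exists=os.path.isfile):
    """Return a list of problems found in the given environment mapping."""
    problems = []
    for name in ("HADOOP_HOME", "SPARK_HOME"):
        if not env.get(name):
            problems.append(name + " is not set")
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        # ntpath joins with Windows separators regardless of the host OS.
        winutils = ntpath.join(hadoop_home, "bin", "winutils.exe")
        if not file_exists(winutils):
            problems.append("missing " + winutils)
    return problems


if __name__ == "__main__":
    # Print each problem, or a single reassuring line if none were found.
    for problem in check_spark_env(os.environ) or ["environment looks fine"]:
        print(problem)
```

An empty result means both variables are set and winutils.exe was found where Spark will look for it.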

7. Install Jupyter

8. Run jupyter notebook
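
If a notebook cell fails on import pyspark, one common workaround is to add Spark's Python sources to sys.path at the top of the notebook. The sketch below assumes the usual layout of a Spark download (a python directory with a py4j-*-src.zip under python\lib); the helper name pyspark_paths is hypothetical.

```python
import glob
import os
import sys


def pyspark_paths(spark_home):
    """Return the directory and archives that make `import pyspark` work."""
    python_dir = os.path.join(spark_home, "python")
    # Spark ships py4j as a versioned zip, e.g. py4j-<version>-src.zip.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home:
        # Prepend each path once so that pyspark and py4j become importable.
        for path in pyspark_paths(spark_home):
            if path not in sys.path:
                sys.path.insert(0, path)
```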




Troubleshooting steps

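Python 3 no longer allows tuple unpacking in lambda parameters, so lambda (v, p) is a syntax error there. Index into the pair instead.

Wrong (Python 2 only) => trainErr = labelsAndPreds.filter(lambda (v, p): v != p).count() / float(parsedData.count())
Right (Python 3) => testErr = labelsAndPredictions.filter(lambda v: v[0] != v[1]).count() / float(testData.count())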
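
The same fix can be tried without Spark. The sketch below uses a plain Python list as a stand-in for the RDD of (label, prediction) pairs; the sample values are made up for illustration.

```python
# Plain Python stand-in for an RDD of (label, prediction) pairs.
labels_and_preds = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Python 2 only: lambda (v, p): v != p   -> SyntaxError on Python 3

# Option 1: index into the pair.
wrong_by_index = list(filter(lambda vp: vp[0] != vp[1], labels_and_preds))


# Option 2: a named function that unpacks inside its body.
def is_wrong(pair):
    label, prediction = pair
    return label != prediction


wrong_by_name = list(filter(is_wrong, labels_and_preds))

# Fraction of pairs where the prediction differs from the label.
error_rate = len(wrong_by_index) / float(len(labels_and_preds))
print(error_rate)
```

Both options select the same pairs; the named function is easier to read when the predicate grows.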

model = DecisionTree.trainClassifier(trainingData, numClasses=2, categoricalFeaturesInfo={}, impurity='gini', maxDepth=5, maxBins=32)

[1] "The data can be downloaded from: "

  java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at
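
This exception usually means the spark-csv package was not on the classpath when Spark started. One way to load it from a notebook is to set PYSPARK_SUBMIT_ARGS before the SparkContext is created. The sketch below is an assumption-laden example: the helper submit_args is hypothetical, and the package coordinate (Scala 2.10, version 1.5.0) is only one possible choice and must match your own Spark build. Spark 2.0 and later include a built-in CSV reader, so the package may not be needed there.

```python
import os


def submit_args(packages):
    """Build a PYSPARK_SUBMIT_ARGS value that loads the given packages."""
    # The trailing "pyspark-shell" token is required by PySpark's launcher.
    return "--packages " + ",".join(packages) + " pyspark-shell"


# Assumed coordinate; pick the artifact matching your Scala and Spark build.
SPARK_CSV = "com.databricks:spark-csv_2.10:1.5.0"

# Must be set before the SparkContext is created.
os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args([SPARK_CSV])
```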