Installing SparkR on Windows with RStudio

In this post we will install SparkR on Windows in standalone mode. There are various ways to install and configure SparkR on your machine, so let's start with one of the simplest.

Install SparkR using RStudio:

To install SparkR, follow the steps below:

1. Install R from cran.r-project.org
2. Install RStudio from the official RStudio site
3. Install Java on your machine (Apache Spark uses it internally)
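
Before moving on, it can help to confirm that R and Java are both visible from an R session. The snippet below is only a quick sanity check, assuming Java is expected to be on your PATH; it uses base R functions only.

```r
# Report the R version of the current session.
cat("R version:", R.version.string, "\n")

# Sys.which() returns an empty string when the program is not on PATH.
java_path <- Sys.which("java")

if (nzchar(java_path)) {
  cat("Java found at:", java_path, "\n")
  # `java -version` writes to stderr, so capture both streams and print them.
  java_info <- system2("java", "-version", stdout = TRUE, stderr = TRUE)
  cat(java_info, sep = "\n")
} else {
  cat("Java was not found on PATH; install it or add it to PATH.\n")
}
```

If Java is reported as missing even though it is installed, the PATH seen by RStudio is the first thing to check.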

Once you have installed RStudio, you will see a tab called “Spark” on the right-hand side (as shown in the screenshot below). If the Spark tab does not appear, you are probably using an old version of RStudio, so I suggest installing the latest version.

[Screenshot: initiating the Spark install]

Click the tab, then click the “New Connection” button and select the Spark version and Hadoop version from the drop-down lists. If that version is not installed already, RStudio will ask you to install a new copy of Apache Spark. Once you click “Install”, it will start downloading Spark to your system.

[Screenshot: confirming the Spark install]

You can follow the progress in the window. On a successful download, Spark is saved in RStudio's temp folder. In my case, that folder is located at: C:\Users\dataxone\AppData\Local\rstudio\spark\Cache

[Screenshot: Spark and Hadoop installation progress]

On a successful download, you will see a folder and a zip file under RStudio's temp directory. For convenience, it is recommended to copy the Spark folder (spark-2.1.0-bin-hadoop2.7 in my case) to the D: drive or some other drive and rename it to spark-2.1.0. In my case that gives F:\spark-2.1.0, as I copied it to the F: drive.
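
The copy and rename can be done by hand in Explorer, but it can also be scripted from R. The following is a rough sketch using base R only; `copy_spark` is a hypothetical helper name of my own, and the paths in the usage part are simply the ones from my machine, so change them to match yours.

```r
# Hypothetical helper: copy a Spark folder into another directory and rename it.
copy_spark <- function(src, dest_parent, new_name) {
  stopifnot(dir.exists(src), dir.exists(dest_parent))
  target <- file.path(dest_parent, new_name)
  if (dir.exists(target)) stop("Target already exists: ", target)

  # With recursive = TRUE, file.copy() places a copy of `src` inside `dest_parent`.
  ok <- file.copy(src, dest_parent, recursive = TRUE)
  if (!ok) stop("Copy failed")

  # Rename the copied folder to the shorter name.
  copied <- file.path(dest_parent, basename(src))
  if (!file.rename(copied, target)) stop("Rename failed")
  target
}

# Usage with the paths from this post (skipped if they do not exist).
src <- "C:/Users/dataxone/AppData/Local/rstudio/spark/Cache/spark-2.1.0-bin-hadoop2.7"
if (dir.exists(src) && dir.exists("F:/")) {
  copy_spark(src, "F:/", "spark-2.1.0")
}
```

The helper stops early if the target folder already exists, so running it a second time will not overwrite an earlier copy.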

Now let’s test our installation:

Run the following code in RStudio (make the necessary changes, such as the path to the Spark folder you just copied):
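
A minimal sketch of such a test, assuming Spark was copied to F:\spark-2.1.0 as described above. The app name and the use of R's built-in `faithful` data set are my own choices for illustration; the SparkR calls (`sparkR.session`, `as.DataFrame`, `head`, `sparkR.session.stop`) are the ones documented for SparkR 2.x.

```r
# Path to the copied Spark folder (F:/spark-2.1.0 in this post); change as needed.
spark_home <- "F:/spark-2.1.0"

if (dir.exists(spark_home)) {
  # Tell SparkR where the Spark distribution lives.
  Sys.setenv(SPARK_HOME = spark_home)

  # Load the SparkR package that ships inside the Spark distribution.
  library(SparkR, lib.loc = file.path(spark_home, "R", "lib"))

  # Start a local Spark session that uses all available cores.
  sparkR.session(master = "local[*]", appName = "SparkR-install-test")

  # Turn a built-in R data set into a Spark DataFrame and show its first rows.
  df <- as.DataFrame(faithful)
  print(head(df))

  # Shut the session down when finished.
  sparkR.session.stop()
} else {
  message("Spark folder not found: ", spark_home)
}
```

If the session starts and the first rows of the data set are printed, the installation is working.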

On a successful run, you will see something similar to the screenshot below in your RStudio console:

[Screenshot]

Stack:
OS: Windows 10 (64-bit)
R Version: 3.2.2
RStudio Version: 1.0.136
Spark Version: Spark 2.1.0 (spark-2.1.0-bin-hadoop2.7)
