How to run Scala in Visual Studio Code with Jupyter Notebook and Spark

4 minute read

Hello everyone, today we are going to set up Visual Studio Code to run Scala code from a Jupyter Notebook.

Scala is a strongly and statically typed, high-level, general-purpose programming language that supports both object-oriented and functional programming. Scala is often considered the best language for Apache Spark thanks to its concise syntax, strong type system, and functional programming features, which allow for efficient and scalable distributed computing. Python is also a popular language for Spark due to its ease of use and extensive libraries. For that reason, the purpose of this blog post is to install a Jupyter Notebook with the capability to run both Scala and Python.
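To give a flavour of that conciseness, here is a small self-contained Scala snippet (plain Scala, no Spark required) of the kind you will be able to run in the notebook we set up below:

// Functional, type-inferred collection processing in a few lines
val words = List("Spark", "Scala", "Jupyter")
val lengths = words.map(_.length) // List(5, 5, 7)
println(s"Total characters: ${lengths.sum}") // Total characters: 17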

Apache Spark is a data processing framework that can quickly perform processing tasks on very large data sets, and can also distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools.
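As a taste of what that looks like in practice, here is a minimal sketch using the Spark DataFrame API in Scala; it assumes a running SparkSession named spark, which the notebook kernel we set up below provides:

// Count the even numbers in a distributed range of one million rows.
// The same code runs unchanged on a laptop or across a cluster.
val evens = spark.range(1, 1000000).filter("id % 2 = 0")
println(evens.count()) // 499999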

Step 1 - Installation of Spark and Scala

We assume you have already properly installed Scala and Spark on your computer; if not, you can follow this tutorial.

Moreover, we assume you have installed Python from here and Visual Studio Code from here; you can follow this tutorial.

Step 2 - Creation of Project

Create your project folder:

For example, we open a terminal

cd \

and we create a folder called myproject

mkdir myproject

we enter the folder

cd myproject

then we open VS Code

code .

then we open a new terminal there and choose Command Prompt as the shell

Step 3 - Installation of the Environment

In the Command Prompt we create a virtual environment by typing

python -m venv .myenv

this creates a virtual environment named ‘.myenv’, and then we activate it

.myenv\Scripts\activate.bat

then we install the notebook

pip install notebook

and the Scala Kernel

pip install spylon-kernel

then we create a kernel spec

python -m spylon_kernel install 
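To verify that the kernelspec was registered, and to see the path where it was installed (we will need it in the next step), you can run:

jupyter kernelspec list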

Step 4 - Update jupyter.kernels.trusted

Open the VS Code User Settings UI from the Command Palette (Ctrl+Shift+P) by running Preferences: Open User Settings

Search for the setting jupyter.kernels.trusted

Copy the fully qualified path to the kernelspec:

C:\ProgramData\jupyter\kernels\spylon-kernel\kernel.json

and add the path to the list using the Add Item button.
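Alternatively, you can add the same entry directly in your settings.json; the path below is the default kernelspec location shown above, so adjust it if yours differs:

"jupyter.kernels.trusted": [
    "C:\\ProgramData\\jupyter\\kernels\\spylon-kernel\\kernel.json"
]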

Then reload VS Code by closing and reopening it. You can return to your directory C:\myproject> and type

code .

Step 5 - Create a Jupyter Notebook for Scala

We create a Jupyter Notebook by running the Create: New Jupyter Notebook command from the Command Palette (Ctrl+Shift+P)

Step 6 - Select a kernel

After the new notebook is created, click Select Another Kernel

then pick Jupyter Kernel

Next, select the spylon_kernel kernel using the kernel picker in the top right.

After selecting a kernel, the language picker located in the bottom right of each code cell automatically updates to the language supported by the kernel. You can see your installed environments here. We have chosen spylon_kernel, which uses the Scala language.

Step 7 - Hello World in Scala

Let us test our Scala environment in the Jupyter Notebook by copying the following cell into the notebook:

println("Hello World from Jupyter Notebook in VS Code")

and you will get the output

Hello World from Jupyter Notebook in VS Code

Additionally, you can see your Spark job in the Spark Web UI (by default at http://localhost:4040).

Checking Spark Version

Like with any other tool or language, you can use the --version option with spark-submit, spark-shell, and spark-sql to find the version.

!spark-submit --version
!spark-shell --version
!spark-sql --version

or simply type

sc.version

or

spark.version

and you will see the Spark version string printed.

Reading files from Scala Notebook

One of the first important things in the creation of any software is environment variables. Environment variables allow us to define values for given keys on the OS, which applications can read.

// Read SPARK_HOME from the environment (empty string if it is not set)
val SPARK_HOME = sys.env.getOrElse("SPARK_HOME", "")
// Windows-style relative path to Spark's bundled JSON sample
val FILE = "\\examples\\src\\main\\resources\\people.json"
val PATH = SPARK_HOME + FILE
// Load the JSON file into a DataFrame and display it
val df = spark.read.json(PATH)
df.show()
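If you are using the people.json sample that ships with Spark, df.show() should print something like:

+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+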

Other languages

Since spylon-kernel makes use of metakernel, you can evaluate normal Python code using the %%python magic. In addition, once the Spark context has been created, the spark variable will be added to your Python environment.

Reading files in Python from Scala Notebook

The Scala Spark metakernel provides a Scala kernel by default. On the first execution of Scala code, a Spark session is constructed so that the user can interact with the interpreter.

%%python
import os
# Build the path to Spark's bundled JSON sample in a platform-independent way
SPARK_HOME = os.environ['SPARK_HOME']
FILE = "examples/src/main/resources/people.json"
PATH = os.path.join(SPARK_HOME, FILE)
# spylon-kernel injects the spark variable into the Python environment
df = spark.read.json(PATH)
df.show()

Step 8 - Create a Jupyter Notebook for Python with Scala Support

If we want to work in Python with Scala support, we can embed Scala code inside the Python code as a magic.

We create a new Jupyter Notebook by running the Create: New Jupyter Notebook command from the Command Palette (Ctrl+Shift+P)

After the new notebook is created, click Select Another Kernel

then pick Jupyter Kernel

Next, select the myproject kernel using the kernel picker in the top right.

After selecting a kernel, you can see your environment. We have chosen the myproject environment, which uses the Python language.

Hello World in Python with Scala support from Jupyter Notebook

Spylon-kernel can be used as a magic in an existing ipykernel. This is the recommended approach when you want to write relatively small blocks of Scala.
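For example, following the spylon-kernel documentation, you first register the magics in a Python cell:

from spylon_kernel import register_ipython_magics
register_ipython_magics()

and then you can write Scala directly in a %%scala cell:

%%scala
val x = 8
x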

If you just want to send a string of Scala code to the interpreter and evaluate it, you can do that too:

from spylon_kernel import get_scala_interpreter

interp = get_scala_interpreter()

# Evaluate the result of a scala code block.
out=interp.interpret("""
val string = "Hello World in Python with Scala support"
string
""")
print(out)
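The interpreter returns the Scala REPL's output as a string, so print(out) should display something like string: String = Hello World in Python with Scala support.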

You can download the notebook from GitHub here.

Congratulations! You have installed Scala and Jupyter Notebook in Visual Studio Code.
