How to Import SparkSession in Scala?

This article focuses on discussing how to import SparkSession in Scala.

What is SparkSession?

When a Spark application runs, the Spark driver creates a SparkSession, which is the entry point for programming with RDDs, DataFrames, and Datasets and for connecting to a Spark cluster. SparkSession was introduced in Spark 2.0 and provides a unified interface for structured data processing. Before SparkSession, SparkContext was the entry point for running Spark. Note that SparkSession does not completely replace SparkContext: SparkSession internally creates a SparkConf and a SparkContext. The APIs that were previously accessed through separate entry points, such as SQLContext and HiveContext, are now available through SparkSession.

SparkSession includes the following APIs:

  1. SparkContext
  2. StreamingContext
  3. SQLContext
  4. HiveContext
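As a sketch of how this unification looks in practice (assuming Spark 2.x or later on the classpath; the application name is arbitrary), the older entry points are reachable directly from a SparkSession instance:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("contextsDemo")
  .getOrCreate()

// The underlying contexts are exposed as fields of the session
val sc  = spark.sparkContext   // old-style entry point for RDD work
val sql = spark.sqlContext     // old-style entry point for DataFrames/SQL

println(sc.appName)            // application name set through the builder
spark.stop()
```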

Prerequisites

The basic prerequisite is Spark version 2.0 or higher. The class org.apache.spark.sql.SparkSession was introduced in that release and unifies all of the contexts discussed above.

Approach to Import SparkSession in Scala

We can create a SparkSession from spark-shell, Scala, or Python. spark-shell provides a SparkSession by default, available through the variable spark. In Scala, a SparkSession is created with the following methods:

  1. builder() method: Returns a builder used to configure the SparkSession.
  2. master() method: Sets the master URL where the Spark application runs (for example, "local[1]" for a single local thread).
  3. appName() method: Sets the name of the Spark application.
  4. getOrCreate() method: Returns the existing SparkSession if one is already running, or creates a new one.

Implementation

To import SparkSession, we import the class ‘org.apache.spark.sql.SparkSession’ with an import statement. Let’s create a SparkSession in Scala as below:

Scala
// Importing SparkSession
import org.apache.spark.sql.SparkSession

// Creating a SparkSession
val spark = SparkSession.builder()
  .master("local[1]")              // master URL: run locally with one thread
  .appName("mySparkApplication")   // name of the application
  .getOrCreate()

Output:

[Screenshot: SparkSession in Scala]
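A minimal, self-contained sketch of the getOrCreate() behavior described above: calling it a second time in the same application returns the already-running session rather than creating a new one (the application name is illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("mySparkApplication")
  .getOrCreate()

// A second call returns the existing session instead of building a new one
val spark2 = SparkSession.builder().getOrCreate()
println(spark eq spark2)   // true: same underlying session object

println(spark.version)     // version string of the Spark runtime
spark.stop()               // release resources when finished
```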

Create a DataFrame Using SparkSession

SparkSession provides various methods, such as createDataFrame(), which creates a DataFrame from a local collection like a List.

Scala
// Create a DataFrame from a list of tuples
val df = spark.createDataFrame(
  List(("Prasad", 50), ("Santosh", 25), ("Sushma", 24)))
df.show()

Output:

[Screenshot: Creating a DataFrame in Scala]
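A DataFrame built from tuples this way gets the default column names _1 and _2. A common variant (a sketch assuming the same session setup; the column names here are illustrative) assigns readable names with toDF():

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("dataFrameDemo")
  .getOrCreate()
import spark.implicits._   // enables .toDF on local collections

// Name the columns instead of keeping the default _1/_2
val df = List(("Prasad", 50), ("Santosh", 25), ("Sushma", 24))
  .toDF("name", "age")
df.show()
spark.stop()
```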

Conclusion

SparkSession is a unified entry point for working with structured data in Spark 2.0 and later versions. It combines functionality from SparkContext, SQLContext, and HiveContext. SparkSession is designed for working with DataFrames and Datasets, which offer more structured and optimized operations compared to RDDs. SparkSession supports SQL queries, structured streaming, and DataFrame-based machine learning APIs. In tools like spark-shell and Databricks, the default SparkSession object is available as the spark variable.
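Since SparkSession supports SQL queries, a short sketch may help (assuming a local session as in the examples above; the view and column names are illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("sqlDemo")
  .getOrCreate()
import spark.implicits._

val df = List(("Prasad", 50), ("Santosh", 25)).toDF("name", "age")
df.createOrReplaceTempView("people")   // register as a temporary SQL view

// Run a SQL query through the session
spark.sql("SELECT name FROM people WHERE age > 30").show()
spark.stop()
```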
