How to Import SparkSession in Scala?

This article focuses on discussing how to import SparkSession in Scala.

What is SparkSession?

When a Spark application runs, the Spark driver creates a SparkSession, which is the entry point for programming with RDDs, DataFrames, and Datasets and for connecting to a Spark cluster. SparkSession was introduced in Spark 2.0 and provides a unified interface for structured data processing. Before SparkSession, SparkContext was the entry point for running Spark. Note that SparkSession does not completely replace SparkContext: SparkSession internally creates a SparkConf and a SparkContext. The APIs that were previously accessed through separate entry points, such as SQLContext and HiveContext, are now available through SparkSession.

SparkSession includes the following APIs:

  1. SparkContext
  2. StreamingContext
  3. SQLContext
  4. HiveContext
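As a sketch of how this unification looks in practice (assuming Spark 2.x or later on the classpath; the application name is arbitrary), the older entry points are reachable directly from a SparkSession instance:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("contextsDemo")
  .getOrCreate()

// The underlying contexts are exposed as fields of the session
val sc  = spark.sparkContext   // old-style entry point for RDD work
val sql = spark.sqlContext     // old-style entry point for DataFrames/SQL

println(sc.appName)            // application name set through the builder
spark.stop()
```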

Prerequisites

The basic prerequisite is Spark version 2.0 or higher. The class org.apache.spark.sql.SparkSession was introduced in that release and unifies all of the contexts discussed above.

Approach to Import SparkSession in Scala

We can create a SparkSession from spark-shell, Scala, or Python. spark-shell provides a SparkSession by default, available through the variable spark. In Scala, a SparkSession is created with the following methods:

  1. builder() method: Returns a builder used to configure the SparkSession.
  2. master() method: Sets the master URL where the Spark application runs (for example, "local[1]" for a single local thread).
  3. appName() method: Sets the name of the Spark application.
  4. getOrCreate() method: Returns the existing SparkSession if one is already running, or creates a new one.

Implementation

To import SparkSession, we import the class ‘org.apache.spark.sql.SparkSession’ with an import statement. Let’s create a SparkSession in Scala as below:

Scala
// Importing SparkSession
import org.apache.spark.sql.SparkSession

// Creating a SparkSession
val spark = SparkSession.builder()
  .master("local[1]")              // master URL: run locally with one thread
  .appName("mySparkApplication")   // name of the application
  .getOrCreate()

Output:

[Screenshot: SparkSession in Scala]
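A minimal, self-contained sketch of the getOrCreate() behavior described above: calling it a second time in the same application returns the already-running session rather than creating a new one (the application name is illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("mySparkApplication")
  .getOrCreate()

// A second call returns the existing session instead of building a new one
val spark2 = SparkSession.builder().getOrCreate()
println(spark eq spark2)   // true: same underlying session object

println(spark.version)     // version string of the Spark runtime
spark.stop()               // release resources when finished
```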

Create a DataFrame Using SparkSession

SparkSession provides various methods, such as createDataFrame(), which creates a DataFrame from a local collection like a List.

Scala
// Create a DataFrame from a list of tuples
val df = spark.createDataFrame(
  List(("Prasad", 50), ("Santosh", 25), ("Sushma", 24)))
df.show()

Output:

[Screenshot: Creating a DataFrame in Scala]
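A DataFrame built from tuples this way gets the default column names _1 and _2. A common variant (a sketch assuming the same session setup; the column names here are illustrative) assigns readable names with toDF():

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("dataFrameDemo")
  .getOrCreate()
import spark.implicits._   // enables .toDF on local collections

// Name the columns instead of keeping the default _1/_2
val df = List(("Prasad", 50), ("Santosh", 25), ("Sushma", 24))
  .toDF("name", "age")
df.show()
spark.stop()
```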

Conclusion

SparkSession is a unified entry point for working with structured data in Spark 2.0 and later versions. It combines functionality from SparkContext, SQLContext, and HiveContext. SparkSession is designed for working with DataFrames and Datasets, which offer more structured and optimized operations compared to RDDs. SparkSession supports SQL queries, structured streaming, and DataFrame-based machine learning APIs. In tools like spark-shell and Databricks, the default SparkSession object is available as the spark variable.
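Since SparkSession supports SQL queries, a short sketch may help (assuming a local session as in the examples above; the view and column names are illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("sqlDemo")
  .getOrCreate()
import spark.implicits._

val df = List(("Prasad", 50), ("Santosh", 25)).toDF("name", "age")
df.createOrReplaceTempView("people")   // register as a temporary SQL view

// Run a SQL query through the session
spark.sql("SELECT name FROM people WHERE age > 30").show()
spark.stop()
```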
