In this article, we will learn how to parse nested JSON using Scala Spark.
To parse nested JSON using Scala Spark, follow these steps:
- Define the schema for your JSON data.
- Read the JSON data into a DataFrame.
- Select and manipulate the DataFrame columns to work with the nested structure.
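For concreteness, the snippets below assume newline-delimited JSON input shaped like the following (sample records matching the schema and output shown later):

```json
{"id": 1, "name": "Alice", "details": {"age": 30, "city": "Paris"}}
{"id": 2, "name": "Bob", "details": {"age": 25, "city": "New York"}}
{"id": 3, "name": "Carol", "details": {"age": 35, "city": "London"}}
```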
Scala Spark Program to parse nested JSON:
// Scala Spark program to parse nested JSON
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, StructType}
// Step 1: Define the schema
val schema = """
{
"type": "struct",
"fields": [
{"name": "id", "type": "integer", "nullable": false},
{"name": "name", "type": "string", "nullable": true},
{"name": "details",
"type": {
"type": "struct",
"fields": [
{"name": "age", "type": "integer", "nullable": true},
{"name": "city", "type": "string", "nullable": true}
]
},
"nullable": true
}
]
}
"""
// Step 2: Create SparkSession
val spark = SparkSession.builder()
.appName("Nested JSON Parsing")
.master("local[*]")
.getOrCreate()
// Step 3: Read JSON data into DataFrame
val df = spark.read
  .schema(DataType.fromJson(schema).asInstanceOf[StructType])
  .json("path_to_your_json_file")
// Step 4: Select and manipulate DataFrame columns
val parsedDF = df.select(
col("id"),
col("name"),
col("details.age").as("age"),
col("details.city").as("city")
)
// Step 5: Show the result
parsedDF.show()
Output:
+---+-----+---+--------+
| id| name|age|    city|
+---+-----+---+--------+
|  1|Alice| 30|   Paris|
|  2|  Bob| 25|New York|
|  3|Carol| 35|  London|
+---+-----+---+--------+
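Nested JSON often also contains arrays. If each record carried, say, a `details.hobbies` array (a hypothetical extension of the data above), `explode` flattens it to one row per element; a sketch, with the file path and column names as assumptions:

```scala
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical input line: {"id": 1, "details": {"hobbies": ["chess", "hiking"]}}
val arrayDF = spark.read.json("path_to_json_with_arrays")

// explode produces one output row per array element,
// e.g. (1, "chess") and (1, "hiking") for the record above
val explodedDF = arrayDF.select(
  col("id"),
  explode(col("details.hobbies")).as("hobby")
)
explodedDF.show()
```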