In this article, we'll learn how to drop columns from a PySpark DataFrame in Python when every value in the column is null.

Creating a Spark DataFrame with null columns

To create the DataFrame, we use the pyspark.sql.SparkSession.createDataFrame() method.
Python3
Output:
+---------+----------+---------+------+------+
|firstname|middlename|lastname |gender|salary|
+---------+----------+---------+------+------+
|James    |null      |Bond     |M     |6000  |
|Michael  |null      |null     |M     |4000  |
|Robert   |null      |Pattinson|M     |4000  |
|Natalie  |null      |Portman  |F     |4000  |
|Julia    |null      |Roberts  |F     |1000  |
+---------+----------+---------+------+------+

Remove all columns where the entire column is null in PySpark DataFrame

Here we want to drop every column in which all values are null. In the DataFrame above, only the middlename column is entirely null, so that is the column we want to drop. The lastname column contains a single null and must be kept. The approach is to count the null values in each column and drop the columns whose null count equals the total number of rows.
Python3
Output:
{'firstname': 0, 'middlename': 5, 'lastname': 1, 'gender': 0, 'salary': 0}
['middlename']
+---------+---------+------+------+
|firstname|lastname |gender|salary|
+---------+---------+------+------+
|James    |Bond     |M     |6000  |
|Michael  |null     |M     |4000  |
|Robert   |Pattinson|M     |4000  |
|Natalie  |Portman  |F     |4000  |
|Julia    |Roberts  |F     |1000  |
+---------+---------+------+------+
Referred: https://www.geeksforgeeks.org