Null values are undefined or empty entries in a DataFrame. They may appear because of errors during data transfer or technical glitches. We should identify null values and make the necessary changes in the DataFrame to address them. In this article, we will learn how to use the isNotNull() method in PySpark to remove NULL values from a DataFrame.

## What is PySpark?

PySpark is the Python API for Apache Spark, an open-source distributed computing system designed for large-scale data processing. It enables data scientists to use Spark's capabilities from Python, allowing seamless data manipulation, analysis, and machine learning at scale.

## The isNotNull Method

The isNotNull() method is provided by Spark SQL and operates on the Column class to check whether a column contains null values. It produces a boolean result for each row: True where the column value is not null and False where it is null. It is typically used with the filter() method of the DataFrame class, which takes a condition as an argument and keeps only the matching rows.

## Using the isNotNull Method in PySpark

**Example:** Here we will create a DataFrame with some null values in PySpark. We use None, Python's built-in representation of a missing value, to stand for nulls. The DataFrame is created from a list of Row objects, where each Row takes column names and their respective values as keyword arguments. To visualize the output as a table, we use the show() method of the DataFrame object.
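The original code listing did not survive in this copy of the article, so the following is a minimal sketch of such a DataFrame. The person names, values, and application name are illustrative assumptions; the Age and Marks columns match the examples that follow.

```python
from pyspark.sql import SparkSession, Row

# Illustrative application name; any name works here.
spark = SparkSession.builder.appName("isNotNullExample").getOrCreate()

# Each Row maps column names to values; None becomes a null in the DataFrame.
rows = [
    Row(Name="Arun", Age=25, Marks=85),
    Row(Name="Bina", Age=None, Marks=72),
    Row(Name="Chandra", Age=30, Marks=None),
    Row(Name="Deepa", Age=None, Marks=None),
]

df = spark.createDataFrame(rows)

# show() prints the DataFrame as a table on the console.
df.show()
```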
**Output:** DataFrame created

**Example:** In this example, we filter out the null values in the Age column of the DataFrame by using the filter() method and passing the isNotNull() condition, which checks whether each value in the column is null. Only the rows that do not contain a null value in Age are displayed.
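A sketch of this step, assuming the df DataFrame created above:

```python
# Keep only the rows whose Age column is not null.
filtered_age = df.filter(df.Age.isNotNull())
filtered_age.show()
```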
**Output:** Filtered DataFrame based on Age

**Example:** In this example, we filter out the rows with a null value in the Marks column and display the remaining rows.
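A corresponding sketch for the Marks column, again assuming the df DataFrame from the first example:

```python
# Keep only the rows whose Marks column is not null.
filtered_marks = df.filter(df.Marks.isNotNull())
filtered_marks.show()
```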
**Output:** Filtered DataFrame based on Marks

**Example:** We can also check multiple columns by using the AND operator to combine isNotNull() conditions on two or more columns.
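A sketch of the combined condition, assuming the same df:

```python
# Keep rows where both Age and Marks are non-null; & is the logical AND for Column expressions.
filtered_both = df.filter(df.Age.isNotNull() & df.Marks.isNotNull())
filtered_both.show()
```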
**Output:** Filtered DataFrame based on Age and Marks

## Conclusion

In this article, we have seen how to filter out null values from one column or from multiple columns using the isNotNull() method provided by the PySpark library. The examples above can be easily adapted to your own use cases.

## FAQs

**Q. What is the isNull() method of the Column object?**
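isNull() is the complement of isNotNull(): it evaluates to True for rows where the column value is null. A small sketch, reusing the df assumed above:

```python
# Select the rows where Age IS null (the opposite of isNotNull()).
missing_age = df.filter(df.Age.isNull())
missing_age.show()
```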
**Q. What is the difference between show() and collect()?**
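In short, show() prints a formatted preview to the console and returns nothing, while collect() brings every row back to the driver as a list of Row objects. A brief sketch with the assumed df:

```python
# Print the first 5 rows as a table (returns None).
df.show(5)

# Materialize all rows on the driver as a Python list of Row objects.
all_rows = df.collect()
print(all_rows[0])   # e.g. Row(Name='Arun', Age=25, Marks=85)
```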
**Q. Is it mandatory to use a Row object to create a DataFrame?**
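No; Row objects are only one option. A list of tuples together with a schema (here just column names, as an illustrative assumption) also works:

```python
# Create a DataFrame without Row objects, using tuples plus column names.
df2 = spark.createDataFrame(
    [("Arun", 25, 85), ("Bina", None, 72)],
    schema=["Name", "Age", "Marks"],
)
df2.show()
```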
Referred: https://www.geeksforgeeks.org