Converting columns to float is a common preparation step for data analysis in Pandas. Numeric values that were read in as strings cannot be used in arithmetic or plotted correctly until they are converted to a float dtype.
In this article, we'll look at different ways to convert a column to float in a DataFrame.
- Using DataFrame.astype()
- Using pandas.to_numeric()
- Handling Non-numeric Values and Missing Values
- Converting Multiple Columns
- Converting the Entire DataFrame
Convert Pandas DataFrame Column To Float DataType
Importing Pandas
Python
import pandas as pd
Creating a sample dataframe
Python
# Sample DataFrame
data = {'IntegerColumn': [10, 20, 30],
'StringColumn': ['15.3', '25.6', '35.8']}
df = pd.DataFrame(data)
Method 1: Using DataFrame.astype()
The astype() method casts a Pandas object (a Series or a whole DataFrame) to a specified dtype and returns the converted copy.
Here, we take the sample DataFrame with one integer column and one string column, and convert the string column to float using astype().
Python
# Print dtypes before conversion
print("Data types before conversion:")
print(df.dtypes)
# Convert 'StringColumn' to float using astype()
df['StringColumn'] = df['StringColumn'].astype(float)
# Print dtypes after conversion
print("\nData types after conversion:")
print(df.dtypes)
# Print the DataFrame
print("\nDataFrame after conversion:")
print(df)
Output:
Data types before conversion:
IntegerColumn     int64
StringColumn     object
dtype: object

Data types after conversion:
IntegerColumn      int64
StringColumn     float64
dtype: object

DataFrame after conversion:
   IntegerColumn  StringColumn
0             10          15.3
1             20          25.6
2             30          35.8
As you can see, the dtype of StringColumn changes from object to float64 after calling astype().
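One detail worth illustrating is that astype() does not modify the column in place; it returns a new object, which is why the example assigns the result back to df['StringColumn']. The following is a minimal sketch of that behaviour, using a small DataFrame that mirrors the sample above.

```python
import pandas as pd

df = pd.DataFrame({"StringColumn": ["15.3", "25.6", "35.8"]})

# astype() returns a new Series; df is untouched until the result is assigned.
converted = df["StringColumn"].astype(float)
still_text = df["StringColumn"].dtype != "float64"

print(converted.dtype)  # float64
print(still_text)       # True

# Assigning the result back is what actually changes the DataFrame.
df["StringColumn"] = converted
```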
Method 2: Using pandas.to_numeric()
pandas.to_numeric() is a top-level Pandas function that converts its argument to a numeric type. Here, we use it to convert the string column to float.
We print the column dtypes before and after the call to show the effect of the conversion.
Python
# Recreate the sample DataFrame so that 'StringColumn' holds strings again
df = pd.DataFrame(data)
# Print dtypes before conversion
print("Data types before conversion:")
print(df.dtypes)
# Convert 'StringColumn' to float using to_numeric()
df['StringColumn'] = pd.to_numeric(df['StringColumn'])
# Print dtypes after conversion
print("\nData types after conversion:")
print(df.dtypes)
# Print the DataFrame
print("\nDataFrame after conversion:")
print(df)
Output:
Data types before conversion:
IntegerColumn     int64
StringColumn     object
dtype: object

Data types after conversion:
IntegerColumn      int64
StringColumn     float64
dtype: object

DataFrame after conversion:
   IntegerColumn  StringColumn
0             10          15.3
1             20          25.6
2             30          35.8
As the output shows, the dtype of StringColumn changes from object to float64 after calling to_numeric().
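to_numeric() also accepts a downcast parameter, which can be useful when memory matters. The sketch below assumes a hypothetical prices Series (not part of the article's sample data) and compares the default result with downcast='float'.

```python
import pandas as pd

# Hypothetical sample data; dtype=object mirrors a text column read from a file.
prices = pd.Series(["15.3", "25.6", "35.8"], dtype=object)

as_float64 = pd.to_numeric(prices)                    # default float result
as_float32 = pd.to_numeric(prices, downcast="float")  # smallest float dtype that fits

print(as_float64.dtype)  # float64
print(as_float32.dtype)  # float32
```

Downcasting halves the memory used per value but also reduces precision, so it is a trade-off rather than a default choice.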
Handling Non-numeric Values or Missing Values While Converting the DataType to Float
We can handle non-convertible and missing values by passing errors='coerce' to pandas.to_numeric(). This parameter instructs Pandas to replace every value it cannot convert with NaN (Not a Number) instead of raising an error.
Here, we create a DataFrame that contains a non-numeric string ('xyz') and an empty string, then apply pandas.to_numeric() with errors='coerce'. The resulting DataFrame has NaN wherever a value could not be converted.
Python
# Sample DataFrame with non-numeric and missing values
data = {'Column1': ['10.5', '20.7', '30.2', 'xyz'],
'Column2': ['15.3', '25.6', '35.8', '']}
df = pd.DataFrame(data)
print('Original DataFrame')
print(df)
# Convert columns to float, handling errors and missing values
df['Column1'] = pd.to_numeric(df['Column1'], errors='coerce')
df['Column2'] = pd.to_numeric(df['Column2'], errors='coerce')
# Print dtypes after conversion
print("\nData types after conversion:")
print(df.dtypes)
# Print the DataFrame
print("\nDataFrame after conversion:")
print(df)
Output:
Original DataFrame
  Column1 Column2
0    10.5    15.3
1    20.7    25.6
2    30.2    35.8
3     xyz

Data types after conversion:
Column1    float64
Column2    float64
dtype: object

DataFrame after conversion:
   Column1  Column2
0     10.5     15.3
1     20.7     25.6
2     30.2     35.8
3      NaN      NaN
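Because errors='coerce' silently turns bad values into NaN, it can help to check which inputs failed before deciding how to fill them. The sketch below uses a hypothetical raw Series; replacing NaN with 0.0 is only one possible policy and is an assumption, not a recommendation from the article.

```python
import pandas as pd

# Hypothetical input: one bad string and one genuinely missing value.
raw = pd.Series(["10.5", "xyz", None, "30.2"], dtype=object)
converted = pd.to_numeric(raw, errors="coerce")

# Values that were present in the input but became NaN are conversion failures.
failed = raw[converted.isna() & raw.notna()]
print(failed.tolist())  # ['xyz']

# One possible policy: replace every NaN with a default value.
filled = converted.fillna(0.0)
print(filled.tolist())  # [10.5, 0.0, 0.0, 30.2]
```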
Converting Multiple Columns
We can convert several columns to float at once by selecting them with a list of column names and calling astype() on the selection. The syntax is:
df[['C1', 'C2']] = df[['C1', 'C2']].astype(float)
Here, C1 and C2 are the columns of the DataFrame to be converted.
Below, we create a DataFrame with two string columns and convert both to float using this syntax.
Python
# Sample DataFrame
data = {'C1': ['10.5', '20.7', '30.2'],
'C2': ['15.3', '25.6', '35.8']}
df = pd.DataFrame(data)
# Print dtypes before conversion
print("Data types before conversion:")
print(df.dtypes)
# Convert multiple columns to float using astype()
df[['C1', 'C2']] = df[['C1', 'C2']].astype(float)
# Print dtypes after conversion
print("\nData types after conversion:")
print(df.dtypes)
# Print the DataFrame
print("\nDataFrame after conversion:")
print(df)
Output:
Data types before conversion:
C1    object
C2    object
dtype: object

Data types after conversion:
C1    float64
C2    float64
dtype: object

DataFrame after conversion:
     C1    C2
0  10.5  15.3
1  20.7  25.6
2  30.2  35.8
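When the selected columns may contain values that cannot be converted, astype(float) raises an error. A common alternative is to apply pandas.to_numeric() to each selected column. The sketch below assumes a hypothetical 'n/a' entry and an extra Label column that should stay untouched.

```python
import pandas as pd

# Hypothetical data: 'n/a' cannot be parsed, and 'Label' is not numeric at all.
df = pd.DataFrame(
    {
        "C1": ["10.5", "20.7", "n/a"],
        "C2": ["15.3", "25.6", "35.8"],
        "Label": ["a", "b", "c"],
    },
    dtype=object,
)

cols = ["C1", "C2"]
# apply() calls pd.to_numeric on each column and forwards errors='coerce' to it.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
print(df)
```

Only the listed columns are converted, so Label keeps its original values.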
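Converting the Entire DataFrame
We can convert every column of a DataFrame in one call by passing float to astype() on the DataFrame itself. This works only when every column holds values that can be converted to float.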
Python
# Sample DataFrame
data = {'C1': ['10.5', '20.7', '30.2'],
'C2': ['15.3', '25.6', '35.8']}
df = pd.DataFrame(data)
# Print dtypes before conversion
print("Data types before conversion:")
print(df.dtypes)
# Convert the entire DataFrame to float
df = df.astype(float)
# Print dtypes after conversion
print("\nData types after conversion:")
print(df.dtypes)
# Print the DataFrame
print("\nDataFrame after conversion:")
print(df)
Output:
Data types before conversion:
C1    object
C2    object
dtype: object

Data types after conversion:
C1    float64
C2    float64
dtype: object

DataFrame after conversion:
     C1    C2
0  10.5  15.3
1  20.7  25.6
2  30.2  35.8
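Converting the whole DataFrame fails as soon as one column contains text that is not a number. The sketch below uses a hypothetical DataFrame with a Name column to show the failure, then converts only the numeric column by passing a dictionary to astype().

```python
import pandas as pd

# Hypothetical data: 'Name' cannot be converted to float, 'Score' can.
df = pd.DataFrame({"Name": ["a", "b"], "Score": ["1.5", "2.5"]})

try:
    df.astype(float)  # attempts to convert every column, including 'Name'
    failed = False
except ValueError as exc:
    failed = True
    print(f"Conversion failed: {exc}")

# Passing a {column: dtype} mapping converts only the named columns.
df = df.astype({"Score": float})
print(df.dtypes)
```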
Conclusion
Converting columns to float in a Pandas DataFrame makes them usable for mathematical operations and data analysis. In this article, we discussed DataFrame.astype() and pandas.to_numeric(), how to use them to convert one column, several columns, or a whole DataFrame to float, and how to handle non-numeric and missing values during the conversion.
Convert Pandas DataFrame Column To Float - FAQs
How to Convert a Column to Float in Pandas?
To convert a column in a Pandas DataFrame to a float data type, use the astype() method:
import pandas as pd
df = pd.DataFrame({ 'A': ['1.1', '2.2', '3.3'] })
# Convert column 'A' to float
df['A'] = df['A'].astype(float)
print(df['A'])
How to Convert Pandas DataFrame Column Float to Int?
When a float column is converted to an integer in Pandas, the fractional part is truncated rather than rounded. Use astype(int) directly, or round first if truncation is not what you want:
# Assuming 'A' is already a float column
df['A'] = df['A'].astype(int)  # Direct conversion, truncates the decimal
# or
df['A'] = df['A'].round().astype(int)  # Round first, then convert
print(df['A'])
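The difference between truncating and rounding is easiest to see on values close to the next whole number. The sketch below uses a hypothetical values Series chosen for that purpose.

```python
import pandas as pd

values = pd.Series([1.9, -1.9, 2.4])  # hypothetical sample values

truncated = values.astype(int)        # drops the fractional part (toward zero)
rounded = values.round().astype(int)  # rounds to the nearest whole number first

print(truncated.tolist())  # [1, -1, 2]
print(rounded.tolist())    # [2, -2, 2]
```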
How to Convert Pandas Column to Integer?
You can convert a column to an integer using the astype() method. If the column contains null values, handle them first, because the standard integer dtype cannot represent NaN. Any decimals are truncated during the conversion:
df = pd.DataFrame({ 'B': [1.0, 2.5, None] })
# Fill NaN with 0 (or another appropriate value) and convert to integer
df['B'] = df['B'].fillna(0).astype(int)
print(df['B'])
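If filling missing values with a placeholder is not acceptable, one alternative is the Pandas nullable integer dtype, written 'Int64' with a capital I. The sketch below assumes a hypothetical Series whose non-missing values are already whole numbers; values with a fractional part would need to be rounded first.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, None])  # stored as float64 because of the missing value
nullable = s.astype("Int64")     # nullable integer dtype keeps the gap as <NA>

print(nullable.dtype)         # Int64
print(nullable.isna().sum())  # 1
```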
What is dtype('O')?
In Pandas, dtype('O') (the capital letter O, not zero) refers to the object dtype. This is the most general dtype. It is typically used for columns that contain mixed types (e.g., numbers alongside strings) or purely string data. It is akin to Python's object type, which can hold any kind of Python object:
df = pd.DataFrame({ 'C': ['text', 1, 2.5, True] })
print(df['C'].dtype)  # Outputs: object
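To see what an object column actually holds, one option is to map each element to the name of its Python type. This is a small illustrative sketch, not a Pandas-specific API for the purpose.

```python
import pandas as pd

s = pd.Series(["text", 1, 2.5, True])  # mixed values, stored with object dtype

# Each element keeps its original Python type inside an object column.
type_names = s.map(lambda value: type(value).__name__)
print(type_names.tolist())  # ['str', 'int', 'float', 'bool']
```

A column that shows more than one type name here usually needs cleaning before it can be converted to float.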
How to Get Only Float Columns in Pandas?
To select only the columns in a DataFrame that are of float type, you can use the select_dtypes() method:
df = pd.DataFrame({ 'A': [1.1, 2.2, 3.3], 'B': [4, 5, 6], 'C': [7.1, 8.2, None] })
# Select only float columns
float_columns = df.select_dtypes(include=['float64'])
print(float_columns)
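Note that include=['float64'] matches only 64-bit floats. If a DataFrame may also contain float32 columns, one way to list every float column regardless of width is pandas.api.types.is_float_dtype(). The sketch below uses a hypothetical DataFrame in which column A has been cast to float32.

```python
import pandas as pd
from pandas.api.types import is_float_dtype

# Hypothetical data: 'A' is float32, 'B' is integer, 'C' is float64 with a gap.
df = pd.DataFrame({"A": [1.1, 2.2], "B": [4, 5], "C": [7.1, None]})
df["A"] = df["A"].astype("float32")

# is_float_dtype() is True for any float width, so both 'A' and 'C' qualify.
float_cols = [col for col in df.columns if is_float_dtype(df[col])]
print(float_cols)  # ['A', 'C']
```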