Use Pandas to Calculate Stats from an Imported CSV File

Pandas is a Python library for analyzing and working with datasets. It lets users build a data frame directly or load one from a CSV file with the read_csv function. Once the data is created or imported, you can calculate statistics from it such as the mean, median, mode, max, min, and sum. In this article, we will discuss calculating statistics from an imported CSV file using Pandas.

Calculate Stats from an Imported CSV file using Pandas

Importing data from a CSV file:

We can read the data from a CSV file using the read_csv function.

Syntax:

df = pd.read_csv('#CSV file to be read'), where df is the resulting data frame

The example below reads the CSV file with the read_csv function and prints the resulting data frame. The sections that follow calculate the mean, median, mode, standard deviation, variance, and other statistics from it.

Python3

# Import Pandas library
import pandas as pd
 
url = 'https://media.geeksforgeeks.org/wp-content/uploads/20240208132839/student_data2.csv'
# Read the CSV file
df=pd.read_csv(url)
 
# Print the data frame
print(df)

Output:

      name         subject  class   fees  fine
0     Arun           Maths      9   9000   400
1   Aniket  Social Science     10  12000   600
2   Ishita         English     11  15000     0
3  Pranjal         Science     12  18000  1000
4  Vinayak        Computer     12  18000   500
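If the URL is not reachable, the same table can be rebuilt offline. The sketch below assumes the file contains exactly the rows printed above; the csv_text variable is a stand-in for the real file and is not part of the original example. It relies on read_csv accepting any file-like object, not just a path or URL.

```python
import io

import pandas as pd

# Same rows as the printed output, embedded as text (an offline stand-in for the URL)
csv_text = """name,subject,class,fees,fine
Arun,Maths,9,9000,400
Aniket,Social Science,10,12000,600
Ishita,English,11,15000,0
Pranjal,Science,12,18000,1000
Vinayak,Computer,12,18000,500
"""

# read_csv accepts any file-like object, so StringIO behaves like a file on disk
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # column types inferred while parsing
```

Checking the shape and the inferred types right after loading is a cheap way to confirm that numeric columns such as fees were parsed as numbers and not as text.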

There are various descriptive statistics, such as mean, median, mode, max, min, standard deviation, and variance, which we can calculate once we have the data.
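For a quick overview before computing each statistic separately, the describe method reports several of them at once for every numeric column. The sketch below rebuilds only the fees and fine columns by hand from the output above, so it runs without the CSV file.

```python
import pandas as pd

# Numeric columns copied from the printed data frame
df = pd.DataFrame({
    "fees": [9000, 12000, 15000, 18000, 18000],
    "fine": [400, 600, 0, 1000, 500],
})

# describe() returns count, mean, std, min, quartiles, and max per column
summary = df.describe()
print(summary)

# Individual entries can be read back by row label and column name
print(summary.loc["mean", "fees"])
```

The row labelled 50% in the summary is the median.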

1. Mean Value:

The mean is the average of the values in a dataset: their sum divided by how many there are. We often need the mean of a single column, which can be calculated as follows:

Syntax:

mean_value = df['#Column Name for which mean is to be calculated'].mean()

Python3

# Calculate and print mean value
mean_value=df['fees'].mean()
print('Mean Value: '+str(mean_value))

Output:

Mean Value: 14400.0
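To see where 14400.0 comes from, the mean can be recomputed as the sum divided by the count. The sketch below uses the fees values from the table; the second Series with a missing entry is a made-up illustration of how mean skips missing values by default.

```python
import pandas as pd

fees = pd.Series([9000, 12000, 15000, 18000, 18000], name="fees")

# Mean by definition: total divided by number of values
manual_mean = fees.sum() / fees.count()
print(manual_mean)   # 14400.0
print(fees.mean())   # 14400.0

# Hypothetical column with a gap: missing values are skipped (skipna=True)
with_gap = pd.Series([10.0, None, 20.0])
print(with_gap.mean())  # 15.0
```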

2. Median Value:

The median is the middle value of the dataset when it is sorted in ascending or descending order. Finding it by hand is impractical for a large dataset, so we can calculate it as follows:

Syntax:

median_value = df['#Column Name for which median is to be calculated'].median()

Python3

# Calculate and print median value
median_value=df['fees'].median()
print('Median Value: '+str(median_value))

Output:

Median Value: 15000.0
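The same result can be checked by sorting the values and picking the middle one. The sketch below uses the fees values from the table; the four-value Series is an illustrative extra showing that an even number of values yields the average of the two middle ones.

```python
import pandas as pd

fees = pd.Series([9000, 12000, 15000, 18000, 18000], name="fees")

# Odd number of values: the median is the middle element after sorting
sorted_fees = fees.sort_values().reset_index(drop=True)
middle = sorted_fees.iloc[len(sorted_fees) // 2]
print(middle)         # 15000
print(fees.median())  # 15000.0

# Even number of values: the median averages the two middle elements
even_fees = pd.Series([9000, 12000, 15000, 18000])
print(even_fees.median())  # 13500.0
```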

3. Mode Value:

The mode is the value that occurs most frequently in the dataset. It is especially useful for categorical data, where a mean or median has no meaning. Because several values can tie for most frequent, mode() returns a Series rather than a single value. We can calculate the mode as follows:

Syntax:

mode_value = df['#Column Name for which mode is to be calculated'].mode()

Python3

# Calculate and print mode value
# mode() returns a Series (there may be ties), so take the first entry
mode_value = df['fees'].mode()[0]
print('Mode Value: ' + str(mode_value))

Output:

Mode Value: 18000
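Since mode returns a Series, it is worth seeing what happens when values tie. In the sketch below, the class values come from the table, while the subjects Series is invented purely to produce a tie.

```python
import pandas as pd

# Class column from the table: 12 appears twice, so it is the only mode
classes = pd.Series([9, 10, 11, 12, 12], name="class")
class_modes = classes.mode().tolist()
print(class_modes)  # [12]

# Hypothetical categorical data with a tie: both values are returned, sorted
subjects = pd.Series(["Maths", "Science", "Maths", "Science", "English"])
subject_modes = subjects.mode().tolist()
print(subject_modes)  # ['Maths', 'Science']
```

Taking only the first entry of the result is fine when a single mode is expected, but it silently drops the other tied values.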

4. Minimum Value:

The minimum is the smallest value in the dataset. It is useful, for example, for finding the lower end of a range or spotting unexpectedly small values. It can be calculated as follows:

Syntax:

min_value = df['#Column Name for which min is to be calculated'].min()

Python3

# Calculate and print min value
min_value=df['fees'].min()
print('Minimum Value: '+str(min_value))

Output:

Minimum Value: 9000

5. Maximum Value:

The maximum is the largest value in the dataset. It is useful, for example, for finding the upper end of a range or spotting unexpectedly large values. It can be calculated as follows:

Syntax:

max_value = df['#Column Name for which max is to be calculated'].max()

Python3

# Calculate and print max value
max_value=df['fees'].max()
print('Maximum Value: '+str(max_value))

Output:

Maximum Value: 18000
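Knowing the minimum and maximum often leads to asking which row they belong to. The sketch below uses the name and fees columns from the table together with idxmin and idxmax, which return the row label of the smallest and largest value. When several rows tie, the first one is returned.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Arun", "Aniket", "Ishita", "Pranjal", "Vinayak"],
    "fees": [9000, 12000, 15000, 18000, 18000],
})

# Row labels of the smallest and largest fee
lowest_name = df.loc[df["fees"].idxmin(), "name"]
highest_name = df.loc[df["fees"].idxmax(), "name"]  # first of the tied rows
print(lowest_name)   # Arun
print(highest_name)  # Pranjal

# The range is the gap between the two extremes
fee_range = df["fees"].max() - df["fees"].min()
print(fee_range)     # 9000
```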

6. Sum Value:

The sum is the result of adding all the values in the dataset. It is one of the most common statistics and can be calculated as follows:

Syntax:

sum_value = df['#Column Name for which sum is to be calculated'].sum()

Python3

# Calculate and print sum value
sum_value = df['fees'].sum()
print('Sum Value: '+str(sum_value))

Output:

Sum Value: 72000
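The sum method also works on several columns at once, and with axis=1 it adds across each row instead of down each column. The sketch below uses the fees and fine columns from the table; treating fees plus fine as a total amount due per student is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "fees": [9000, 12000, 15000, 18000, 18000],
    "fine": [400, 600, 0, 1000, 500],
})

# Column-wise totals: one value per column
column_totals = df[["fees", "fine"]].sum()
print(column_totals["fees"])  # 72000
print(column_totals["fine"])  # 2500

# Row-wise totals: fees plus fine for each student
row_totals = df[["fees", "fine"]].sum(axis=1)
print(row_totals.tolist())    # [9400, 12600, 15000, 19000, 18500]
```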

7. Count Value:

The count is the number of non-missing values in a column. It is a basic statistic that other statistics, such as the mean, build on. We can calculate the count as follows:

Syntax:

count_value = df['#Column Name for which count is to be calculated'].count()

Python3

# Calculate and print count value
count_value=df['fees'].count()
print('Count Value: '+str(count_value))

Output:

Count Value: 5
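In the sample data every row has a fee, so the count matches the number of rows. The two differ once values are missing. The sketch below uses a made-up version of the fine column with one missing entry to show the difference between count and len.

```python
import pandas as pd

# Hypothetical fine column in which one value is missing
fine = pd.Series([400, None, 0, 1000, 500], name="fine")

non_missing = fine.count()  # counts only values that are present
total_rows = len(fine)      # counts every row, missing or not
print(non_missing)  # 4
print(total_rows)   # 5

# Number of missing entries
missing = fine.isna().sum()
print(missing)      # 1
```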

8. Standard Deviation Value:

The standard deviation measures how far the values in a dataset are spread out from their mean. A small standard deviation means the values cluster near the mean, while a large one means they are widely spread. By default, Pandas computes the sample standard deviation (ddof=1). It can be calculated as follows:

Syntax:

std_value = df['#Column Name for which standard deviation is to be calculated'].std()

Python3

# Calculate and print standard deviation value
std_value=df['fees'].std()
print('Standard Deviation Value: '+str(std_value))

Output:

Standard Deviation Value: 3911.521443121589
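The value above is the sample standard deviation, which divides by n - 1. Passing ddof=0 gives the population standard deviation, which divides by n. The sketch below compares the two on the fees values from the table.

```python
import math

import pandas as pd

fees = pd.Series([9000, 12000, 15000, 18000, 18000], name="fees")

# Default: sample standard deviation (ddof=1, divides by n - 1)
sample_std = fees.std()

# Population standard deviation (ddof=0, divides by n)
population_std = fees.std(ddof=0)

print(sample_std)
print(population_std)

# The sample value is always the larger of the two for the same data
print(sample_std > population_std)  # True
```

Which one to use depends on whether the rows are the whole population of interest or a sample drawn from a larger one.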

9. Variance Value:

The variance also measures the spread of the values in a dataset. It is the average of the squared deviations from the mean, and it equals the square of the standard deviation. Like std(), var() uses the sample formula (ddof=1) by default. It can be calculated as follows:

Syntax:

var_value = df['#Column Name for which variance is to be calculated'].var()

Python3

# Calculate and print variance value
var_value=df['fees'].var()
print('Variance Value: '+str(var_value))

Output:

Variance Value: 15300000.0
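To check the result, the variance can be rebuilt from its definition: subtract the mean from each value, square the differences, add them up, and divide by n - 1. The sketch below does this for the fees values from the table and confirms the link to the standard deviation.

```python
import math

import pandas as pd

fees = pd.Series([9000, 12000, 15000, 18000, 18000], name="fees")

# Deviation of each value from the mean (14400)
deviations = fees - fees.mean()

# Sample variance: sum of squared deviations divided by n - 1
manual_var = (deviations ** 2).sum() / (fees.count() - 1)
print(manual_var)  # 15300000.0
print(fees.var())  # 15300000.0

# The variance is the square of the standard deviation (up to rounding)
print(math.isclose(fees.std() ** 2, fees.var()))  # True
```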

Referred: https://www.geeksforgeeks.org
