Quantiles offer valuable insights into the distribution of data and help in many aspects of analysis. This article describes quantiles, looks at how to calculate them, and explains why they matter for machine learning applications. We also discuss the problems with quantiles and how box plots may be used to represent them. For anybody dealing with data in the field of machine learning, a firm understanding of quantiles is crucial.

What are Quantiles?
Quantiles are cut points that divide a dataset into equal-sized parts based on rank. Each quantile is the value at a certain position in the dataset sorted in increasing order. Common quantiles include the median (50th percentile), quartiles (25th, 50th, and 75th percentiles), and percentiles (ranks ranging from 0 to 100). In machine learning and data science, quantiles play an important role in understanding the data, detecting outliers, and evaluating model performance.

Types of Quantiles
Steps to Calculate Quantiles
The steps for calculating quantiles involve:

Example with Mathematical Imputation:
Let's consider a dataset: [5, 10, 15, 20, 25, 30, 35, 40, 45, 50].
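A minimal sketch of the calculation for this dataset, assuming the linear-interpolation convention (the default used by NumPy's np.quantile): the position of quantile q in the sorted data is q × (n − 1), and a fractional position is resolved by interpolating between the two neighbouring values. The function name linear_quantile is illustrative and does not come from the original article.

```python
import math

def linear_quantile(values, q):
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)             # sort in increasing order
    position = q * (len(ordered) - 1)    # fractional index of the quantile
    lower = math.floor(position)         # index of the value just below
    upper = math.ceil(position)          # index of the value just above
    fraction = position - lower          # how far between the two neighbours
    # Interpolate between the two neighbouring values
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

data = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
q1 = linear_quantile(data, 0.25)
median = linear_quantile(data, 0.50)
q3 = linear_quantile(data, 0.75)
print("Q1:", q1, "Median:", median, "Q3:", q3)
```

For Q1 the position is 0.25 × 9 = 2.25, which lies a quarter of the way from 15 to 20, giving 16.25. The median (position 4.5) is 27.5 and Q3 (position 6.75) is 38.75.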
Implementation: Calculating Quantiles using NumPy Library

Quintiles
This code uses NumPy to compute the quintiles (20th, 40th, 60th, and 80th percentiles) of a given dataset.
Python3
import numpy as np
# Different sample data
data = np.array([12, 15, 18, 20, 22, 25, 28, 30, 32, 35, 38, 40])
# Compute the quintiles
quintiles = np.percentile(data, [20, 40, 60, 80])
print("20th percentile (quintile 1):", quintiles[0])
print("40th percentile (quintile 2):", quintiles[1])
print("60th percentile (quintile 3):", quintiles[2])
print("80th percentile (quintile 4):", quintiles[3])
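Output: approximately 18.4, 23.2, 29.2, and 34.4 for the four quintiles (the exact floating-point digits printed may vary slightly).

Quartiles
This code uses NumPy's np.quantile function to compute the median (Q2), first quartile (Q1), and third quartile (Q3) of a dataset.
Python3
import numpy as np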
# Sample data
data = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50])
# Calculating median (Q2)
median = np.quantile(data, 0.5)
# Calculating first quartile (Q1)
q1 = np.quantile(data, 0.25)
# Calculating third quartile (Q3)
q3 = np.quantile(data, 0.75)
print("Median (Q2):", median)
print("First Quartile (Q1):", q1)
print("Third Quartile (Q3):", q3)
Output:
Median (Q2): 27.5
First Quartile (Q1): 16.25
Third Quartile (Q3): 38.75

Percentiles
This code also uses NumPy to compute the 25th, 50th (median), and 75th percentiles of a dataset. The np.percentile function calculates the requested percentiles, and the resulting values are printed to the console.
Python3
import numpy as np
# Sample data
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# Compute the 25th, 50th, and 75th percentiles
percentiles = np.percentile(data, [25, 50, 75])
print("25th percentile:", percentiles[0])
print("50th percentile (median):", percentiles[1])
print("75th percentile:", percentiles[2])
Output:
25th percentile: 3.25
50th percentile (median): 5.5
75th percentile: 7.75

Deciles
This code uses NumPy to compute the deciles (10th, 20th, …, 90th percentiles) of a dataset. The np.percentile function receives an array of percentiles from 10 to 90 in increments of 10, built with np.arange(10, 100, 10). The resulting decile values are printed in a loop, using enumerate with start=1 so that numbering begins at 1 instead of 0.
Python3
import numpy as np
# Sample data
data = np.array([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
# Compute the deciles
deciles = np.percentile(data, np.arange(10, 100, 10))
for i, decile in enumerate(deciles, start=1):
    print(f"{i}0th percentile (decile {i}):", decile)
Output: 10th percentile (decile 1): 19.0
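The quartile, quintile, and decile snippets above differ only in how many equal-sized groups the data is split into. As a hedged sketch, the same idea can be written once with Python's standard-library statistics.quantiles; method="inclusive" is chosen here because it should correspond to the linear interpolation NumPy applies by default. The helper name cut_points is illustrative, not part of any library.

```python
import statistics

def cut_points(values, groups):
    """Return the (groups - 1) cut points that split values into equal-sized groups."""
    # method="inclusive" treats the sample as containing its own minimum and maximum
    return statistics.quantiles(values, n=groups, method="inclusive")

data = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
quartiles = cut_points(data, 4)    # 3 cut points
quintiles = cut_points(data, 5)    # 4 cut points
deciles = cut_points(data, 10)     # 9 cut points

print("Quartiles:", quartiles)
print("Quintiles:", quintiles)
print("Deciles:", deciles)
```

Splitting into k groups always yields k − 1 cut points, which is why there are three quartiles, four quintiles, and nine deciles.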
Uses of Quantiles in Machine Learning
Quantiles play a crucial role in various aspects of machine learning and data analysis. Here are some key uses:
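One use mentioned earlier in this article is outlier detection. A common approach, sketched here under assumptions, is the interquartile range (IQR) rule: values more than 1.5 × IQR below Q1 or above Q3 are flagged, which is also how box plot whiskers are usually drawn. The function name iqr_outliers, the multiplier default, and the sample data are illustrative choices rather than anything prescribed by the article.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                  # spread of the middle 50% of the data
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical sample: the quartile dataset used earlier plus one extreme value
data = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 200]
outliers = iqr_outliers(data)
print("Outliers:", outliers)
```

For this sample, Q1 is 17.5 and Q3 is 42.5, so the fences sit at -20 and 80 and only 200 is flagged.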
Understanding these uses is essential for effectively utilizing quantiles in machine learning and data analysis tasks.

Challenges and Limitations of Quantiles
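One commonly cited difficulty is that there is no single definition of a sample quantile: different interpolation conventions return different values for the same data, and the gap is most visible on small samples. The sketch below illustrates this with the two methods offered by Python's standard-library statistics.quantiles. It is an illustration of the general point, not a claim about which convention any particular tool should use.

```python
import statistics

data = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]

# "inclusive": positions based on (n - 1); the sample is assumed to
# contain the extreme values of the population
inclusive = statistics.quantiles(data, n=4, method="inclusive")

# "exclusive": positions based on (n + 1); allows for population values
# beyond the observed minimum and maximum
exclusive = statistics.quantiles(data, n=4, method="exclusive")

print("Inclusive quartiles:", inclusive)
print("Exclusive quartiles:", exclusive)
```

On this ten-value dataset the inclusive method gives quartiles of 16.25, 27.5, and 38.75, while the exclusive method gives 13.75, 27.5, and 41.25. The median agrees, but Q1 and Q3 each shift by 2.5, so results from different tools are only comparable when the convention is known.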
Conclusion
Quantiles are powerful statistical measures that provide valuable insights into the distribution of data. Understanding and utilizing quantiles effectively in machine learning and data science can enhance data analysis, model building, and decision-making processes. By calculating and interpreting quantiles, data scientists can gain more information about datasets and make informed decisions in various analytical tasks.
Referred: https://www.geeksforgeeks.org