When it comes to data preprocessing, many machine learning algorithms perform better when input variables are transformed to follow a more Gaussian distribution. PowerTransformer is a scikit-learn preprocessing class used to apply such a transformation. This article explores the PowerTransformer technique, its methods, and its implementation in scikit-learn.

## What is a PowerTransformer?

The PowerTransformer is a technique used to make numerical data resemble a Gaussian distribution more closely, which is often required for machine learning models that operate under the assumption of normality. It is especially valuable when data shows significant skewness or kurtosis. By stabilizing variance and reducing skewness, the PowerTransformer helps satisfy these foundational statistical assumptions, thus enhancing the effectiveness of the model.

## How Does PowerTransformer Work?

The PowerTransformer supports two main transformations:

- the Box-Cox transform
- the Yeo-Johnson transform
Both of these methods compute an optimal transformation parameter [Tex]\lambda[/Tex] that normalizes the data.

### Box-Cox Transform

The Box-Cox transformation is a statistical method used to stabilize variance and make data more closely meet the assumptions of normality. It can only be applied to strictly positive data. The transformation is parameterized by a [Tex]\lambda[/Tex] value, which is varied to find the best approximation of a normal distribution. The formula for the Box-Cox transformation is:

[Tex]y(\lambda) = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log(y) & \text{if } \lambda = 0 \end{cases}[/Tex]

This transformation helps improve the validity of many statistical techniques that assume normality.

### Yeo-Johnson Transform

The Yeo-Johnson transformation, an extension of the Box-Cox method, also stabilizes variance and normalizes data distributions, but it accommodates both positive and negative values, making it more adaptable to real-world data. For a value y and parameter [Tex]\lambda[/Tex], it is defined as:

[Tex]y(\lambda) = \begin{cases} \dfrac{(y+1)^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0,\ y \geq 0 \\ \log(y+1) & \text{if } \lambda = 0,\ y \geq 0 \\ -\dfrac{(-y+1)^{2-\lambda} - 1}{2-\lambda} & \text{if } \lambda \neq 2,\ y < 0 \\ -\log(-y+1) & \text{if } \lambda = 2,\ y < 0 \end{cases}[/Tex]

## Implementation: PowerTransformer in Scikit-Learn

### Step 1: Import Libraries

Here, we import the necessary libraries: PowerTransformer from scikit-learn for applying the Yeo-Johnson transformation, numpy for numerical operations, and matplotlib.pyplot for data visualization.
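The original code listing was not preserved in this copy of the article, so the following is a minimal sketch of the imports the step describes:

```python
# PowerTransformer applies the Box-Cox / Yeo-Johnson transformations
from sklearn.preprocessing import PowerTransformer
# numpy for generating and manipulating numerical data
import numpy as np
# matplotlib for plotting histograms of the distributions
import matplotlib.pyplot as plt
```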
### Step 2: Generating Skewed Data

We use numpy to generate a right-skewed dataset and plot its histogram to visualize the skew, as shown in the sketch below.
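The exact data used in the original article is not shown here, so this sketch assumes samples drawn from an exponential distribution, which is a common choice for producing strongly right-skewed data:

```python
# Fix the seed so the example is reproducible
rng = np.random.default_rng(42)

# Draw 1000 samples from an exponential distribution (right-skewed);
# scikit-learn expects a 2-D array, hence the reshape
data = rng.exponential(scale=2.0, size=1000).reshape(-1, 1)

# Plot a histogram to visualize the skewness
plt.hist(data, bins=50)
plt.title("Skewed Dataset")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
```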
Output:

*(Figure: Skewed Dataset)*

### Step 3: Applying PowerTransformer

We initialize the PowerTransformer with the Yeo-Johnson method, fit it to the skewed data, and transform the data into an approximately Gaussian distribution.
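A minimal sketch of this step, continuing from the data generated above:

```python
# Initialize the transformer; method='yeo-johnson' is the default,
# and standardize=True also scales the output to zero mean / unit variance
pt = PowerTransformer(method='yeo-johnson', standardize=True)

# Learn the optimal lambda from the data and apply the transformation
data_transformed = pt.fit_transform(data)

# Inspect the fitted lambda for each input feature
print("Fitted lambda:", pt.lambdas_)

# The transformed data should now look approximately Gaussian
plt.hist(data_transformed, bins=50)
plt.title("Normally Distributed Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
```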
Output:

*(Figure: Normally Distributed Data)*

## Advantages of PowerTransformer

- Handles skewed data effectively by reducing skewness and stabilizing variance.
- Robust to outliers, since extreme values are compressed by the power transform.
- Preserves the rank order of the data, as the transformation is monotonic.
- The Yeo-Johnson method accommodates both positive and negative values.

These properties can be verified directly in code, as shown below.
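As a quick check of the rank-preservation claim (and of invertibility), the following snippet continues from the `data`, `data_transformed`, and `pt` variables defined in the sketches above:

```python
# Monotonicity check: sorting order is unchanged by the transform
print(np.array_equal(np.argsort(data.ravel()),
                     np.argsort(data_transformed.ravel())))  # True

# The transform is invertible, so the original values can be recovered
recovered = pt.inverse_transform(data_transformed)
print(np.allclose(recovered, data))  # True
```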
## Conclusion

The PowerTransformer technique, through its Box-Cox and Yeo-Johnson transformations, normalizes numerical data, which is vital for improving the performance of machine learning models that assume Gaussian distributions. Its robustness to outliers, preservation of rank order, and effectiveness in handling skewed data make it a valuable tool in data preprocessing for a wide range of machine learning applications.