Time series data transformation is a crucial step in time series analysis and forecasting. It involves converting raw time series data into a format that is suitable for analysis and modeling. In this article, we will look at the main time series transformations and how each one can benefit our analysis.

Types of transformations

For univariate time series data, four main types of transformation are used to make the data fit for model building. They are:

1. Power transform
2. Difference transform
3. Standardization
4. Normalization
Generating the dataset

This code uses the Pandas and NumPy libraries to create a synthetic dataset representing weather conditions over a 100-hour period, starting from April 1, 2006. Random values for temperature, humidity, wind speed, pressure, visibility, and apparent temperature are generated and stored in a Pandas DataFrame. Each row in the DataFrame corresponds to a specific hour, with columns for the date, the weather conditions, and the various meteorological parameters.
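The original listing is not preserved here, so the following is only a minimal sketch of how such a dataset could be generated. The column names follow the article's output; the seed, the value ranges, and the category labels are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Seed, value ranges and category labels are assumptions for illustration only.
rng = np.random.default_rng(42)
n_hours = 100

df = pd.DataFrame({
    # One timestamp per hour, starting on April 1, 2006
    "Formatted Date": pd.date_range(start="2006-04-01", periods=n_hours, freq="h"),
    "Weather Conditions": rng.choice(["Clear", "Cloudy", "Rainy"], size=n_hours),
    "Temperature (C)": rng.integers(-10, 40, size=n_hours),
    "Humidity": rng.integers(20, 100, size=n_hours),
    "Wind Speed (km/h)": rng.integers(0, 50, size=n_hours),
    "Pressure (mbar)": rng.integers(980, 1040, size=n_hours),
    "Visibility (km)": rng.integers(1, 20, size=n_hours),
    "Apparent Temperature (C)": rng.integers(-15, 45, size=n_hours),
    "Daily Summary": rng.choice(["Sunny", "Overcast", "Showers"], size=n_hours),
})

# Show the first few rows of the synthetic dataset
print(df.head())
```

Because the values are random, the numbers printed by this sketch will not match the outputs quoted in the article.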
Output (column headers of the resulting DataFrame): Formatted Date, Weather Conditions, Temperature (C), Humidity, Wind Speed (km/h), Pressure (mbar), Visibility (km), Apparent Temperature (C), Daily Summary

Applying transformations to our data

1. Time Series Data Transformation Using Power Transform:

The power transform is mainly used to stabilize the variance of the data. It applies a mathematical (power) function to the values so that their distribution becomes more Gaussian (normal). This is particularly useful when the data has a skewed distribution or heteroscedasticity (variance that is not constant). The code uses the Yeo-Johnson power transformation, which, unlike the Box-Cox transformation, also accepts zero and negative values, and applies it to the 'Temperature (C)' column.
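As a hedged illustration of this step, the sketch below applies a Yeo-Johnson transform with scikit-learn's `PowerTransformer`. The choice of scikit-learn, the random temperatures, and the column name 'Temperature transformed' are assumptions; the article's own listing is not preserved.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer

# Hypothetical stand-in data: 100 random hourly temperatures (ranges assumed).
rng = np.random.default_rng(0)
df = pd.DataFrame({"Temperature (C)": rng.integers(-10, 40, size=100)})

# Yeo-Johnson handles zero and negative values.
# Note: standardize=True is the default, so the output is also rescaled
# to zero mean and unit variance after the power transform.
pt = PowerTransformer(method="yeo-johnson")
df["Temperature transformed"] = pt.fit_transform(df[["Temperature (C)"]]).ravel()

print("Original Variance:", df["Temperature (C)"].var())
print("Transformed Variance:", df["Temperature transformed"].var())
print("Estimated lambda:", pt.lambdas_[0])
```

Passing `standardize=False` would return the power-transformed values without the final rescaling, which makes it easier to see the effect of the transform itself.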
Output: Original Variance: 217.82828282828282

The original variance of the 'Temperature (C)' column was 217.82828282828282, and after applying the Yeo-Johnson power transform the variance became 1.0101010101010097. This value equals 100/99, which is consistent with the 100 transformed observations having been rescaled to zero mean and unit variance. The drop in variance therefore reflects the scaling of the output and does not, by itself, demonstrate variance stabilization. To judge whether the transform helped, compare the skewness or the spread of the series over time before and after the transform.

2. Time Series Data Transformation Using Difference Transform:

The difference transform makes a time series stationary by computing the differences between consecutive observations. It is useful for removing trends (and, with a seasonal lag, seasonal patterns), making the data easier to model with techniques like ARIMA. This code applies first-order differencing to the 'Humidity' column of a DataFrame and performs the Augmented Dickey-Fuller (ADF) test to check for stationarity.
Output: Humidity difference ADF Statistic: -6.594772523405528

Here, we performed the Augmented Dickey-Fuller test to check for stationarity after applying the differencing transformation. The results for the 'Humidity difference' column indicate that the data is likely stationary: the p-value (6.969838186303788e-09) is far below the typical significance level of 0.05. Additionally, the ADF statistic is lower (more negative) than the critical values at the 1%, 5%, and 10% levels, so we can reject the null hypothesis that the series has a unit root, that is, that it is non-stationary.

3. Time Series Data Transformation Using Standardization:

Standardization, also known as z-score normalization, is a preprocessing technique that rescales each feature of a dataset to have a mean of 0 and a standard deviation of 1, by subtracting the feature's mean and dividing by its standard deviation. This is useful when working with features that have different scales, as it brings all features to a similar scale. The code for this step standardizes the humidity and pressure columns of the DataFrame.
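A minimal sketch of standardizing two columns follows. It assumes scikit-learn's `StandardScaler`; the tool used in the original listing is not preserved, and the input data here is random stand-in data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data with two features on very different scales.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "Humidity": rng.integers(20, 100, size=100),
    "Pressure (mbar)": rng.integers(980, 1040, size=100),
})

# z = (x - mean) / std, computed per column (population std, ddof=0).
scaler = StandardScaler()
scaled = scaler.fit_transform(df[["Humidity", "Pressure (mbar)"]])
df["Humidity standardized"] = scaled[:, 0]
df["Pressure standardized"] = scaled[:, 1]

print(df[["Humidity standardized", "Pressure standardized"]].head())
```

When forecasting, the scaler would normally be fitted on the training portion only and then applied to later data, so that information from the future does not leak into the model.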
Output: Humidity standardized Pressure standardized

The columns 'Humidity standardized' and 'Pressure standardized' now have a mean of 0 and a standard deviation of 1, which brings them to a similar scale.

4. Time Series Data Transformation Using Normalization:

Normalization is another preprocessing technique, used to scale the features of a dataset to a fixed range, typically 0 to 1. This is achieved by subtracting the minimum value of the feature and then dividing by the range of the feature (maximum minus minimum). Normalization is particularly useful when the features have different ranges and units. The code fits a scaler to the data, transforms the 'Humidity' column, and then prints the first few rows of the transformed data.
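The min-max scaling described above can be sketched as follows, assuming scikit-learn's `MinMaxScaler` as the scaler and random stand-in humidity values; neither is confirmed by the surviving text.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in data: 100 random hourly humidity readings (range assumed).
rng = np.random.default_rng(3)
df = pd.DataFrame({"Humidity": rng.integers(20, 100, size=100)})

# x_scaled = (x - min) / (max - min), mapping the column onto [0, 1].
scaler = MinMaxScaler()
df["Humidity normalized"] = scaler.fit_transform(df[["Humidity"]]).ravel()

# Print the first few rows of the transformed column
print(df["Humidity normalized"].head())
```

Min-max scaling is sensitive to outliers, since a single extreme value sets the range for the whole column.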
Output: 0 0.102041

The 'Humidity normalized' column now lies in a fixed range, which can help models that are sensitive to the scale of their inputs.

Conclusion

Time series data transformation is a crucial step in time series analysis and forecasting. It converts raw time series data into a format that is suitable for analysis and modeling. We applied the power, difference, standardization, and normalization transforms to a sample dataset, showing how each one affects the data and its suitability for modeling, and how it helps prepare the data for accurate and effective forecasting.
Referred: https://www.geeksforgeeks.org