Time series data is prevalent in fields such as finance, healthcare, and engineering. Extracting meaningful features from this data is crucial for building predictive models. The tsfresh Python package simplifies this process by automatically calculating a wide range of features. This article is a guide to using tsfresh to extract features from time series data.

Introduction to tsfresh

tsfresh (Time Series Feature extraction based on scalable hypothesis tests) is a Python package designed to automate the extraction of a large number of features from time series data. It is particularly useful for tasks such as classification, regression, and clustering of time series data. The package integrates with pandas and scikit-learn, making it easy to incorporate into existing workflows.
How to Use tsfresh for Feature Extraction

Installation

To install tsfresh, you can use pip:

    pip install tsfresh

pip also installs the required dependencies, such as pandas and numpy, which are commonly used in conjunction with tsfresh.

Basic Usage: Step-by-Step Procedure

The basic usage of tsfresh involves three main steps: preparing the data, extracting features, and filtering relevant features.
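Step 1: Preparing the Data

Your data should be in a long-format pandas DataFrame where each row corresponds to a single observation at a specific time point. The DataFrame should have at least the following columns: an id column that identifies which time series a row belongs to, a time column used to sort the observations, and one or more value columns holding the measurements.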
Example:
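As a minimal sketch, a frame in this long format could look like the following. The column names match the `column_id='id'` and `column_sort='time'` arguments used later in the article. The time stamps and the values are assumptions: the values were chosen to be consistent with the `value__sum_values` and `value__abs_energy` figures reported in the Step 2 output.

```python
import pandas as pd

# Long format: one row per (series id, time point) observation.
# The time stamps and values are assumptions; the values were picked to
# reproduce the sum_values and abs_energy figures in the Step 2 output.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2],
    "time":  [1, 2, 3, 1, 2, 3],
    "value": [10, 15, 14, 7, 8, 9],
})

print(df)
```

Each series here has three observations, which is enough to run the extraction but too short for many of the more elaborate features to be defined.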
Step 2: Extracting Features

To extract features, use the extract_features function:
Output: Feature Extraction: 100%|██████████| 2/2 [00:00<00:00, 6.65it/s]
value__variance_larger_than_standard_deviation value__has_duplicate_max \
1 1.0 0.0
2 0.0 0.0
value__has_duplicate_min value__has_duplicate value__sum_values \
1 0.0 0.0 39.0
2 0.0 0.0 24.0
value__abs_energy value__mean_abs_change value__mean_change \
1 521.0 3.0 2.0
2 194.0 1.0 1.0
value__mean_second_derivative_central value__median ... \
1 -3.0 14.0 ...
2 0.0 8.0 ...
value__fourier_entropy__bins_5 value__fourier_entropy__bins_10 \
1 0.693147 0.693147
2 0.693147 0.693147
value__fourier_entropy__bins_100 \
1 0.693147
2 0.693147
value__permutation_entropy__dimension_3__tau_1 \
1 -0.0
2 -0.0
value__permutation_entropy__dimension_4__tau_1 \
1 NaN
2 NaN
value__permutation_entropy__dimension_5__tau_1 \
1 NaN
2 NaN
value__permutation_entropy__dimension_6__tau_1 \
1 NaN
2 NaN
value__permutation_entropy__dimension_7__tau_1 \
1 NaN
2 NaN
value__query_similarity_count__query_None__threshold_0.0 \
1 NaN
2 NaN
value__mean_n_absolute_max__number_of_maxima_7
1 NaN
2 NaN
Step 3: Filtering Relevant Features

After feature extraction, you may want to filter out features that are irrelevant to your prediction target. Use the select_features function, which takes the extracted features and a target vector, to keep only the statistically significant ones:
Output: 'value__autocorrelation__lag_4' 'value__autocorrelation__lag_5'
'value__autocorrelation__lag_6' 'value__autocorrelation__lag_7'
'value__autocorrelation__lag_8' 'value__autocorrelation__lag_9'
'value__partial_autocorrelation__lag_0'
'value__partial_autocorrelation__lag_1'
'value__partial_autocorrelation__lag_2'
'value__partial_autocorrelation__lag_3'
'value__partial_autocorrelation__lag_4'
'value__partial_autocorrelation__lag_5'
'value__partial_autocorrelation__lag_6'
'value__partial_autocorrelation__lag_7'
'value__partial_autocorrelation__lag_8'
'value__partial_autocorrelation__lag_9'

Note: This is only a glimpse of the output.

This step ensures that only features relevant to your prediction task are retained.

Step 4: Visualizing Results

Visualizing the selected features can help you understand their importance and distribution. Use visualization libraries such as matplotlib or seaborn to create plots:
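Output: Empty DataFrame
Columns: []
Index: [1, 2]

The resulting DataFrame has no columns here, most likely because the example contains only two series, which is too few for any feature to pass the significance tests.

tsfresh to Extract Features from Time Series Data: Advanced Usage

tsfresh offers several advanced features, including custom feature extraction settings.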
Example of custom settings:

    from tsfresh import extract_features
    from tsfresh.feature_extraction import ComprehensiveFCParameters

    # Start from the full default feature set
    settings = ComprehensiveFCParameters()
    # Delete an entry to disable it; assigning None would keep the feature
    del settings['maximum']
    extracted_features = extract_features(df, column_id='id', column_sort='time', default_fc_parameters=settings)

Conclusion

The tsfresh package is a robust tool for extracting and selecting features from time series data. By automating the feature extraction process, it allows you to focus on building and optimizing your machine learning models. Its flexibility and ease of use make it a practical choice for anyone working with time series data.
Referred: https://www.geeksforgeeks.org