Slicing Column Values in Pandas

Slicing column values is a common operation in data manipulation and analysis. Pandas provides several ways to slice and extract specific data from a DataFrame: positional and label-based indexing to select rows and columns, and string methods to extract substrings from the values in a column. This article covers each technique with its syntax, examples, and applications.

Introduction to Pandas DataFrame

A Pandas DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns). It is similar to a spreadsheet or SQL table and is one of the most commonly used data structures in data analysis.

To get started, let’s create a simple DataFrame:

Python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)

Output:

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago

Slicing Column Values using Indexing

1. Positional Indexing with iloc

The iloc indexer is used for positional indexing, which allows you to slice data based on integer positions. As with ordinary Python slices, the stop position is excluded, so :2 selects positions 0 and 1.

Python
# Slicing the first two rows of the 'Name' column
names = df.iloc[:2, 0]
print(names)

Output:

0    Alice
1      Bob
Name: Name, dtype: object
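As a minimal sketch of a few other positional patterns, the snippet below rebuilds the same DataFrame and shows a two-dimensional slice, negative positions, and a list of column positions. The variable names (subset, last_city, name_city) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Stop positions are excluded: rows 0-1 and columns 0-1
subset = df.iloc[:2, 0:2]

# Negative positions count from the end: last row, last column
last_city = df.iloc[-1, -1]

# A list of positions selects non-adjacent columns
name_city = df.iloc[:, [0, 2]]

print(subset)
print(last_city)
print(name_city)
```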

2. Label-based Indexing with loc

The loc indexer is used for label-based indexing, which allows you to slice data based on row and column labels. Unlike iloc, a loc slice includes both endpoints, so :1 selects the rows labeled 0 and 1.

Python
# Slicing the 'Name' column for the first two rows
names = df.loc[:1, 'Name']
print(names)

Output:

0    Alice
1      Bob
Name: Name, dtype: object
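The difference between the two indexers is easier to see when the row labels are not integers. The sketch below assumes hypothetical row labels 'a', 'b', and 'c' (they are not part of the original example) and also shows loc with a boolean condition.

```python
import pandas as pd

# Same data as above, but with hypothetical string row labels
df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago'],
    },
    index=['a', 'b', 'c'],
)

# loc slices by label and includes both endpoints: rows 'a' and 'b'
by_label = df.loc['a':'b', 'Name']

# iloc slices by position and excludes the stop: row 0 only
by_position = df.iloc[0:1, 0]

# loc also accepts a boolean mask for rows and a list of column labels
older = df.loc[df['Age'] > 26, ['Name', 'City']]

print(by_label)
print(by_position)
print(older)
```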

Slicing Column Values using String Methods

1. Accessing Substrings

You can access substrings of column values using the str accessor.

Python
# Extracting the first three characters of each name
df['Name_Short'] = df['Name'].str[:3]
print(df)

Output:

      Name  Age         City Name_Short
0    Alice   25     New York        Ali
1      Bob   30  Los Angeles        Bob
2  Charlie   35      Chicago        Cha
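The str accessor accepts the same slice notation as a Python string, including negative indices and a step. The sketch below uses a small illustrative Series with one missing value to show that missing entries are passed through as missing rather than raising an error.

```python
import pandas as pd

# Illustrative data with one missing value
names = pd.Series(['Alice', 'Bob', None, 'Charlie'])

# Last two characters of each name
last_two = names.str[-2:]

# Every second character, starting from the first
every_other = names.str[::2]

# str.slice is the method form of the same operation
first_three = names.str.slice(0, 3)

print(last_two)
print(every_other)
print(first_three)
```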

2. Using Regular Expressions

Regular expressions can be used for more complex slicing.

Python
# Extracting the first run of digits from each 'City' value
# (none of these cities contain digits, so every result is NaN)
df['City_Digits'] = df['City'].str.extract(r'(\d+)', expand=False)
print(df)

Output:

      Name  Age         City Name_Short City_Digits
0    Alice   25     New York        Ali         NaN
1      Bob   30  Los Angeles        Bob         NaN
2  Charlie   35      Chicago        Cha         NaN
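Because the City column contains no digits, the result above is all NaN. As a sketch of what str.extract returns when there is something to match, the example below uses a hypothetical Series of address strings. It also uses named groups, which become column names in the result.

```python
import pandas as pd

# Hypothetical address strings, used only for illustration
addresses = pd.Series(['221B Baker Street', 'Route 66', 'Main Street'])

# One capture group with expand=False returns a Series
first_number = addresses.str.extract(r'(\d+)', expand=False)

# Several named groups return a DataFrame with one column per group
parts = addresses.str.extract(r'(?P<number>\d+)(?P<suffix>[A-Z]?)')

print(first_number)
print(parts)
```

Rows with no match, such as 'Main Street', come back as missing values.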

Slicing Column Values in Pandas: Advanced Techniques

1. Slicing with apply and lambda

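The apply method combined with a lambda function provides a flexible way to slice column values, because the lambda can contain arbitrary Python logic. The trade-off is that the function runs once per element.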

Python
# Extracting the first letter of each city name
df['City_First_Letter'] = df['City'].apply(lambda x: x[0])
print(df)

Output:

      Name  Age         City Name_Short City_Digits City_First_Letter
0    Alice   25     New York        Ali         NaN                 N
1      Bob   30  Los Angeles        Bob         NaN                 L
2  Charlie   35      Chicago        Cha         NaN                 C
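One caveat with a plain lambda such as x[0] is that it fails on missing values, since a missing entry is not a string. The sketch below, using an illustrative Series with a missing city, compares the str accessor (which skips missing values) with a lambda that guards against them explicitly.

```python
import pandas as pd

# Illustrative data with one missing value
cities = pd.Series(['New York', None, 'Chicago'])

# The str accessor leaves missing values as missing
first_letters = cities.str[0]

# With apply, the lambda has to handle non-string values itself
first_letters_apply = cities.apply(
    lambda x: x[0] if isinstance(x, str) else x
)

print(first_letters)
print(first_letters_apply)
```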

2. Using str.split for Complex Slicing

The str.split method splits each string on a specified delimiter and returns a Series whose elements are lists. You can then index or slice these lists through the str accessor to extract specific parts.

Python
# Splitting the 'Name' column by the letter 'l' and taking the first part
df['Name_Split'] = df['Name'].str.split('l').str[0]
print(df)

Output:

      Name  Age         City Name_Short City_Digits City_First_Letter  \
0    Alice   25     New York        Ali         NaN                 N   
1      Bob   30  Los Angeles        Bob         NaN                 L   
2  Charlie   35      Chicago        Cha         NaN                 C   

  Name_Split  
0          A  
1        Bob  
2       Char  
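Splitting on a letter is mainly a demonstration. A more typical case is splitting on a separator such as a space. The sketch below uses hypothetical full names and shows the n and expand parameters of str.split, which limit the number of splits and spread the pieces into separate columns.

```python
import pandas as pd

# Hypothetical full names, used only for illustration
full_names = pd.Series(['Alice Smith', 'Bob Lee Jones', 'Charlie'])

# Take the first piece of each split
first = full_names.str.split(' ').str[0]

# Split at most once and expand the pieces into columns
parts = full_names.str.split(' ', n=1, expand=True)
parts.columns = ['First', 'Rest']

print(first)
print(parts)
```

Values with fewer pieces than columns, such as 'Charlie', are padded with missing values.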

Practical Examples: Slicing Columns in a Real-World Dataset

Example 1: Analyzing Titanic Passenger Data

Let’s consider a dataset of Titanic passengers:

Python
import pandas as pd

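# Load the Titanic dataset
# (the output shown below assumes a copy with Kaggle-style columns such as
# PassengerId, Cabin, and Embarked; other copies may use different columns)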
url = 'https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv'
df = pd.read_csv(url)

# Display the first few rows of the dataset
print(df.head())

Output:

   PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
1            2         1       1  ...  71.2833   C85         C
2            3         1       3  ...   7.9250   NaN         S
3            4         1       1  ...  53.1000  C123         S
4            5         0       3  ...   8.0500   NaN         S

1. Slicing Specific Columns:

Python
# Slice columns 'Name', 'Age', and 'Sex'
df_sliced = df.loc[:, ['Name', 'Age', 'Sex']]
print(df_sliced.head())

Output:

                                                Name   Age     Sex
0                            Braund, Mr. Owen Harris  22.0    male
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  38.0  female
2                             Heikkinen, Miss. Laina  26.0  female
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  35.0  female
4                           Allen, Mr. William Henry  35.0    male
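A related detail is whether a selection returns a Series or a DataFrame. The sketch below uses a two-row stand-in built from the values shown above, so that it runs without downloading the dataset. The variable names are illustrative.

```python
import pandas as pd

# Two-row stand-in using values from the output above
df = pd.DataFrame({
    'Name': ['Braund, Mr. Owen Harris', 'Heikkinen, Miss. Laina'],
    'Age': [22.0, 26.0],
    'Sex': ['male', 'female'],
})

# A single label returns a Series
as_series = df['Name']

# A list of labels returns a DataFrame, even with one column
as_frame = df[['Name']]

# For a list of column labels, df[[...]] and df.loc[:, [...]] agree
same = df.loc[:, ['Name', 'Age']].equals(df[['Name', 'Age']])

print(type(as_series).__name__, type(as_frame).__name__, same)
```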

2. Slicing Columns by Index:

Python
# Slice columns at positions 1 through 3 (the stop position, 4, is excluded)
df_sliced = df.iloc[:, 1:4]
print(df_sliced.head())

Output:

   Survived  Pclass                                                 Name
0         0       3                              Braund, Mr. Owen Harris
1         1       1  Cumings, Mrs. John Bradley (Florence Briggs Thayer)
2         1       3                               Heikkinen, Miss. Laina
3         1       1         Futrelle, Mrs. Jacques Heath (Lily May Peel)
4         0       3                             Allen, Mr. William Henry
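Columns can also be sliced as a range of labels with loc, in which case both endpoints are included. The sketch below contrasts this with a positional slice on a small stand-in DataFrame built from values shown above. The column order is an assumption made for the example.

```python
import pandas as pd

# Small stand-in; the column order here is assumed for illustration
passengers = pd.DataFrame({
    'Survived': [0, 1],
    'Pclass': [3, 3],
    'Name': ['Braund, Mr. Owen Harris', 'Heikkinen, Miss. Laina'],
    'Sex': ['male', 'female'],
})

# Positional slice: positions 1 and 2, stop excluded
by_position = passengers.iloc[:, 1:3]

# Label slice: from 'Pclass' through 'Sex', both endpoints included
by_label = passengers.loc[:, 'Pclass':'Sex']

print(list(by_position.columns))
print(list(by_label.columns))
```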

Example 2: Slicing Substrings in a Product Codes Dataset

Consider a dataset with product codes:

Python
import pandas as pd

# Create a DataFrame with product codes
data = {
    'ProductCode': ['A12345', 'B67890', 'C54321', 'D98765'],
    'Price': [100, 150, 200, 250]
}

df = pd.DataFrame(data)
print(df)

Output:

  ProductCode  Price
0       A12345    100
1       B67890    150
2       C54321    200
3       D98765    250

1. Extracting Product Category:

Python
# Slice the first character to get the product category
df['Category'] = df['ProductCode'].str.slice(0, 1)
print(df)

Output:

  ProductCode  Price Category
0       A12345    100        A
1       B67890    150        B
2       C54321    200        C
3       D98765    250        D

2. Extracting Product Number:

Python
# Slice the numeric part of the product code
df['ProductNumber'] = df['ProductCode'].str.slice(1)
print(df)

Output:

  ProductCode  Price Category ProductNumber
0       A12345    100        A         12345
1       B67890    150        B         67890
2       C54321    200        C         54321
3       D98765    250        D         98765
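Note that the sliced product number is still a string. As a sketch of two possible follow-ups, the example below converts the sliced values to integers and, as an alternative, uses str.extract to pull out both parts in one step. The pattern assumes every code is one uppercase letter followed by digits, as in the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'ProductCode': ['A12345', 'B67890', 'C54321', 'D98765'],
    'Price': [100, 150, 200, 250],
})

# Slice off the leading letter, then convert the rest to integers
df['ProductNumber'] = df['ProductCode'].str.slice(1).astype(int)

# Alternative: extract both parts at once with named groups
# (assumes one uppercase letter followed only by digits)
parts = df['ProductCode'].str.extract(r'^(?P<Category>[A-Z])(?P<Number>\d+)$')

print(df)
print(parts)
```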

Conclusion

Slicing column values in Pandas is a fundamental skill for data manipulation and analysis. Whether you need to select entire columns with loc and iloc or extract substrings with the str accessor, regular expressions, apply, or str.split, Pandas provides a method for the task. Knowing which one fits each case makes it easier to preprocess and analyze your data.




Referred: https://www.geeksforgeeks.org

