pandas.crosstab() function in Python

This function computes a simple cross-tabulation of two (or more) factors. By default, it computes a frequency table of the factors unless an array of values and an aggregation function are passed.

Syntax: pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)

Arguments :

  • index : array-like, Series, or list of arrays/Series, Values to group by in the rows.
  • columns : array-like, Series, or list of arrays/Series, Values to group by in the columns.
  • values : array-like, optional, array of values to aggregate according to the factors. Requires `aggfunc` be specified.
  • rownames : sequence, default None, If passed, must match number of row arrays passed.
  • colnames : sequence, default None, If passed, must match number of column arrays passed.
  • aggfunc : function, optional, If specified, requires `values` be specified as well.
  • margins : bool, default False, Add row/column margins (subtotals).
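  • margins_name : str, default 'All', Name of the row/column that will contain the totals when margins is True.
  • dropna : bool, default True, Do not include columns whose entries are all NaN.
  • normalize : bool, {'all', 'index', 'columns'}, or {0, 1}, default False, Normalize by dividing all values by the sum of values.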
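
As a minimal sketch of how values, aggfunc, margins and margins_name interact, the snippet below first counts pairs and then sums a numeric column. The region, product and units data are hypothetical and invented purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
region = pd.Series(["north", "north", "south", "south", "south"], name="region")
product = pd.Series(["pen", "ink", "pen", "pen", "ink"], name="product")
units = pd.Series([10, 4, 7, 3, 5], name="units")

# Default behaviour: count how often each (region, product) pair occurs,
# and add a totals row and column named "Total"
counts = pd.crosstab(region, product, margins=True, margins_name="Total")
print(counts)

# values + aggfunc: sum the units for each (region, product) pair
# instead of counting occurrences
totals = pd.crosstab(region, product, values=units, aggfunc="sum")
print(totals)
```

Passing values without aggfunc (or the reverse) raises an error, which is why the two arguments are described together above.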

The examples below show the function in use:

Example 1 :

Python
# importing packages
import pandas
import numpy

# creating some data
a = numpy.array(["foo", "foo", "foo", "foo",
                 "bar", "bar", "bar", "bar",
                 "foo", "foo", "foo"],
                dtype=object)

b = numpy.array(["one", "one", "one", "two",
                 "one", "one", "one", "two",
                 "two", "two", "one"],
                dtype=object)

c = numpy.array(["dull", "dull", "shiny",
                 "dull", "dull", "shiny",
                 "shiny", "dull", "shiny",
                 "shiny", "shiny"],
                dtype=object)

# form the cross tab
pandas.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

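Output : a table with one row for each value of a (bar, foo) and a two-level column header (b above c). In the column order (one, dull), (one, shiny), (two, dull), (two, shiny), the counts are 1, 2, 1, 0 for bar and 2, 2, 1, 2 for foo.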

Example 2 :

Python
# importing package
import pandas

# create some data
foo = pandas.Categorical(['a', 'b'], 
                         categories=['a', 'b', 'c'])

bar = pandas.Categorical(['d', 'e'], 
                         categories=['d', 'e', 'f'])

# form crosstab with dropna=True (default)
pandas.crosstab(foo, bar)

# form crosstab with dropna=False
pandas.crosstab(foo, bar, dropna=False)

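Output : with dropna=True the table contains only the observed categories (rows a, b and columns d, e). With dropna=False the unobserved categories c and f also appear, filled with zeros.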
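
The effect of dropna in Example 2 can also be checked programmatically. The following is a small sketch that reuses the same categoricals and compares the shapes of the two tables; the variable names default_tab and full_tab are illustrative only.

```python
import pandas as pd

# Same data as Example 2: each categorical declares one unused category
foo = pd.Categorical(["a", "b"], categories=["a", "b", "c"])
bar = pd.Categorical(["d", "e"], categories=["d", "e", "f"])

# dropna=True (default) keeps only categories that actually occur
default_tab = pd.crosstab(foo, bar)

# dropna=False keeps every declared category, including unused ones
full_tab = pd.crosstab(foo, bar, dropna=False)

print(default_tab.shape)
print(full_tab.shape)
print(list(full_tab.columns))
```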

pandas.crosstab() function in Python – FAQs

What is the Crosstab Function in Pandas?

The crosstab function in pandas is used to compute a simple cross-tabulation of two (or more) factors. Essentially, it is used to create a table that shows the frequency with which certain groups of data appear. It can be used to summarize the data in a way that provides a two-dimensional view of the relationships within it.

import pandas as pd

# Example usage of pd.crosstab()
df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Male'],
    'Handedness': ['Right-handed', 'Left-handed', 'Right-handed', 'Right-handed', 'Left-handed']
})

ctab = pd.crosstab(df['Gender'], df['Handedness'])
print(ctab)

What is the Primary Purpose of Crosstab Functions in Python?

The primary purpose of the crosstab function is to provide a way to quantitatively analyze the relationship between multiple variables within a dataset. It helps in identifying trends, patterns, and anomalies in data by presenting the frequencies of variables in a tabular form, making it easier for data analysts to draw insights.
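
Raw counts can be hard to compare when groups differ in size, so one way to examine the relationship between two variables is to convert the counts to proportions. The sketch below assumes the same Gender/Handedness data as the earlier FAQ example and uses normalize="index" so that each row sums to 1.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Male"],
    "Handedness": ["Right-handed", "Left-handed", "Right-handed",
                   "Right-handed", "Left-handed"],
})

# normalize="index" divides each count by its row total,
# giving the share of each handedness within a gender
row_share = pd.crosstab(df["Gender"], df["Handedness"], normalize="index")
print(row_share)
```

Using normalize="columns" or normalize="all" instead would divide by column totals or by the grand total.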

What is the Difference Between Crosstab and Pivot Table in Python?

Both crosstab and pivot_table functions in pandas are used to summarize data, but they serve slightly different purposes and have different defaults:

  • Crosstab: Mainly used for frequency tables, ideal for counting occurrences and examining relationships between categorical variables.
  • Pivot Table: More flexible than crosstab and can perform complex aggregations. You can define multiple aggregation functions and it works well with numerical data.

Here’s a basic comparison using an example:

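# Using pivot_table to perform the same task as crosstab.
# df is the Gender/Handedness DataFrame from the earlier FAQ example.
# aggfunc='size' counts the rows in each group; aggfunc=len needs a value
# column to aggregate, and df has none besides the two grouping keys.
pivot = df.pivot_table(index='Gender', columns='Handedness', aggfunc='size', fill_value=0)
print(pivot)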

Is Crosstab Also Called Pivot Table?

While similar, they are not exactly the same. “Crosstab” specifically refers to a type of table showing the relationship between two or more variables. A “pivot table” is a more general term used in data analysis to summarize data in a tabular format, which can include crosstabulation but also supports more complex aggregations and multi-dimensional pivoting.
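
To illustrate how closely the two are related, here is a hedged sketch that computes the same aggregated table both ways. The Score column is hypothetical and added only for this comparison.

```python
import pandas as pd

# Hypothetical data; the Score column is invented for this sketch
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Male"],
    "Handedness": ["Right-handed", "Left-handed", "Right-handed",
                   "Right-handed", "Left-handed"],
    "Score": [70, 85, 90, 60, 80],
})

# crosstab receives the arrays (here, Series) themselves
via_crosstab = pd.crosstab(df["Gender"], df["Handedness"],
                           values=df["Score"], aggfunc="mean")

# pivot_table is called on the DataFrame and receives column names
via_pivot = df.pivot_table(index="Gender", columns="Handedness",
                           values="Score", aggfunc="mean")

print(via_crosstab)
# Compare the two results cell by cell
print((via_crosstab.values == via_pivot.values).all())
```

The main practical difference in this sketch is the calling convention: crosstab works on loose arrays, while pivot_table works on columns of an existing DataFrame.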

Why Do We Use Pivot Tables in Pandas?

Pivot tables are used in pandas for several reasons:

  • Data Summarization: They provide a quick way of summarizing large datasets in a comprehensible form.
  • Aggregation: Pivot tables allow the aggregation of data according to any function you define (e.g., sum, mean, count).
  • Flexibility: You can quickly rearrange, sort, and filter the data depending on what insights you need.
  • Analytical Depth: Pivot tables make it easy to explore the nuances of data, spotting trends, and correlations that might not be apparent from raw data.
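
The summarization and aggregation points above can be sketched as follows. The sales data is hypothetical, and the choice of sum, mean and count is only one possible combination of aggregation functions.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration
sales = pd.DataFrame({
    "Region": ["East", "East", "West", "West", "West"],
    "Product": ["A", "B", "A", "A", "B"],
    "Amount": [100, 150, 200, 50, 300],
})

# Apply several aggregation functions to Amount in one call;
# the result has a two-level column header: (function, column)
summary = sales.pivot_table(index="Region", values="Amount",
                            aggfunc=["sum", "mean", "count"])

# Rearrange the summary: largest total first
summary = summary.sort_values(("sum", "Amount"), ascending=False)
print(summary)
```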



Referred: https://www.geeksforgeeks.org

