Selected Reading

Python Pandas - Pivoting



Pivoting in Python Pandas is a powerful data transformation technique that reshapes data for easier analysis and visualization. It changes the data representation from a "long" format to a "wide" format, making it simpler to perform aggregations and comparisons.

This technique is particularly useful when dealing with time series data or datasets with multiple columns. Pandas provides two primary methods for pivoting −

  • pivot(): Reshapes data according to specified column or index values.

  • pivot_table(): It is a more flexible method that allows you to create a spreadsheet-style pivot table as a DataFrame.

In this tutorial, we will learn about the pivoting in Pandas using these methods with examples to demonstrate their applications in data manipulation.

Pivoting with pivot()

The Pandas df.pivot() method is used to reshape data when there are unique values for the specified index and column pairs. It is straightforward and useful when your data is well-structured without duplicate entries for the index/column combination.

Example

Here is a basic example demonstrating pivoting a Pandas DataFrame with the Pandas df.pivot() method.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({"Col1": range(12),"Col2": ["A", "A", "A", "B", "B","B", "C", "C", "C", "D", "D", "D"],
"date": pd.to_datetime(["2024-01-03", "2024-01-04", "2024-01-05"] * 4)})

# Display the Input DataFrame
print('Original DataFrame:\n', df)

# Pivot the DataFrame
pivoted = df.pivot(index="date", columns="Col2", values="Col1")

# Display the output
print('Pivoted DataFrame:\n', pivoted)

Output

Following is the output of the above code −

Original DataFrame:
     Col1 Col2       date
0      0    A 2024-01-03
1      1    A 2024-01-04
2      2    A 2024-01-05
3      3    B 2024-01-03
4      4    B 2024-01-04
5      5    B 2024-01-05
6      6    C 2024-01-03
7      7    C 2024-01-04
8      8    C 2024-01-05
9      9    D 2024-01-03
10    10    D 2024-01-04
11    11    D 2024-01-05
Pivoted DataFrame:
 Col2        A  B  C   D
date                   
2024-01-03  0  3  6   9
2024-01-04  1  4  7  10
2024-01-05  2  5  8  11

Note: The pivot() method requires that the index and columns specified have unique values. If your data contains duplicates, you should use the pivot_table() method instead.

Pivoting with pivot_table()

The pivot() method is a straightforward way to reshape data, while pivot_table() offers flexibility for aggregation, making it suitable for more complex data manipulation tasks. This is particularly useful for summarizing data when dealing with duplicates and requires aggregation of data.

Example

This example demonstrates pivoting a DataFrame using the df.pivot_table() method.

import numpy as np
import pandas as pd
import datetime

# Create a DataFrame
df = pd.DataFrame({"A": [1, 1, 2, 3] * 6,
"B": ["A", "B", "C"] * 8,
"C": ["x", "x", "x", "y", "y", "y"] * 4,
"D": np.random.randn(24),
"E": np.random.randn(24),
"F": [datetime.datetime(2013, i, 1) for i in range(1, 13)] +[datetime.datetime(2013, i, 15) for i in range(1, 13)]})

# Display the Input DataFrame
print('Original DataFrame:\n', df)

# Pivot the DataFrame
pivot_table = pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"])

# Display the output
print('Pivoted DataFrame:\n', pivot_table)

Output

Following is the output of the above code −

Original DataFrame:
     A  B  C         D         E          F
0   1  A  x  1.326898  0.106289 2013-01-01
1   1  B  x  0.173894  1.815661 2013-02-01
2   2  C  x -1.244485  2.225515 2013-03-01
3   3  A  y  1.333984  0.440766 2013-04-01
4   1  B  y  0.084371 -0.287495 2013-05-01
5   1  C  y -1.005378 -0.778836 2013-06-01
6   2  A  x  0.028336  0.074827 2013-07-01
7   3  B  x -0.726482  0.911117 2013-08-01
8   1  C  x -0.136928 -1.215960 2013-09-01
9   1  A  y -0.854257  1.398881 2013-10-01
10  2  B  y -0.339238  0.286684 2013-11-01
11  3  C  y -0.036610  1.820238 2013-12-01
12  1  A  x -0.589002  0.386662 2013-01-15
13  1  B  x -0.097922 -0.284418 2013-02-15
14  2  C  x -0.259232  0.751310 2013-03-15
15  3  A  y -0.685608  0.048374 2013-04-15
16  1  B  y  0.293147 -1.217476 2013-05-15
17  1  C  y  0.491561 -0.050036 2013-06-15
18  2  A  x -1.404094  1.101318 2013-07-15
19  3  B  x -0.551091 -1.400021 2013-08-15
20  1  C  x  0.300324 -1.230676 2013-09-15
21  1  A  y  1.278413 -1.970118 2013-10-15
22  2  B  y -0.866687  1.002037 2013-11-15
23  3  C  y -1.154852  1.245371 2013-12-15
Pivoted DataFrame:
 C           x         y
A B                    
1 A  0.368948  0.212078
  B  0.037986  0.188759
  C  0.081698 -0.256909
2 A -0.687879       NaN
  B       NaN -0.602962
  C -0.751859       NaN
3 A       NaN  0.324188
  B -0.638786       NaN
  C       NaN -0.595731

Pivoting with Aggregation

The Pandas pivot_table() method can be used to specify an aggregation function. By default it calculates the mean, but you can also use functions like sum, count, or even custom functions for applying aggregation to the pivoting.

Example

This example demonstrates how to apply aggregation function with pivoting a DataFrame using the df.pivot_table() method.

import numpy as np
import datetime
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({"A": [1, 1, 2, 3] * 6,
"B": ["A", "B", "C"] * 8,
"C": ["x", "x", "x", "y", "y", "y"] * 4,
"D": np.random.randn(24),
"E": np.random.randn(24),
"F": [datetime.datetime(2013, i, 1) for i in range(1, 13)] +[datetime.datetime(2013, i, 15) for i in range(1, 13)]})

# Display the Input DataFrame
print('Original DataFrame:\n', df)

# Pivot the DataFrame with a aggregate function
pivot_table = pd.pivot_table(df, values=["D", "E"], index=["B"], columns=["A", "C"], aggfunc="sum")

# Display the output
print('Pivoted DataFrame:\n', pivot_table)

Output

Following is the output of the above code −

Original DataFrame:
     A  B  C         D         E          F
0   1  A  x  0.122689 -1.893287 2013-01-01
1   1  B  x  0.224266 -0.233766 2013-02-01
2   2  C  x  0.054157 -1.079204 2013-03-01
3   3  A  y  0.340879 -0.388190 2013-04-01
4   1  B  y  1.104747  1.402694 2013-05-01
5   1  C  y -1.788430 -1.925411 2013-06-01
6   2  A  x -0.445921 -0.835855 2013-07-01
7   3  B  x -0.866662 -0.110326 2013-08-01
8   1  C  x  0.881280 -0.828522 2013-09-01
9   1  A  y  0.270564 -0.484500 2013-10-01
10  2  B  y -1.674650 -0.121973 2013-11-01
11  3  C  y  0.887327  1.591251 2013-12-01
12  1  A  x -0.233174  0.387652 2013-01-15
13  1  B  x  0.224098 -1.580638 2013-02-15
14  2  C  x  0.872398  0.389193 2013-03-15
15  3  A  y -0.576675  0.178034 2013-04-15
16  1  B  y  0.812351  0.188114 2013-05-15
17  1  C  y  0.930502 -1.714477 2013-06-15
18  2  A  x -0.247967  1.381405 2013-07-15
19  3  B  x  0.556805 -1.048162 2013-08-15
20  1  C  x -1.040586  1.107391 2013-09-15
21  1  A  y -0.648565  1.118233 2013-10-15
22  2  B  y  1.660616  0.252881 2013-11-15
23  3  C  y  1.745446 -0.541543 2013-12-15
Pivoted DataFrame:
           D                      ...         E                    
A         1                   2  ...         2         3          
C         x         y         x  ...         y         x         y
B                                ...                              
A -0.110485 -0.378000 -0.693888  ...       NaN       NaN -0.210156
B  0.448364  1.917098       NaN  ...  0.130908 -1.158488       NaN
C -0.159307 -0.857928  0.926556  ...       NaN       NaN  1.049708

[3 rows x 12 columns]
Advertisements