
Python Pandas - GroupBy



Pandas groupby() is an essential method for data aggregation and analysis in Python. It follows the "Split-Apply-Combine" pattern, which lets you −

  • Split data into groups based on specific criteria.

  • Apply functions independently to each group.

  • Combine the results into a structured format.
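To see what groupby() does for you, the three steps can be spelled out by hand. The sketch below assumes the same IPL sample data used in the examples of this tutorial; the names `pieces`, `manual`, and `automatic` are illustrative only. It computes the mean points per team once with explicit split, apply, and combine steps and once with groupby(), and both produce the same values.

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Split: one sub-DataFrame per distinct team name
pieces = {team: df[df['Team'] == team] for team in df['Team'].unique()}

# Apply: compute a summary statistic on each piece independently
means = {team: piece['Points'].mean() for team, piece in pieces.items()}

# Combine: gather the per-team results into one Series, sorted by key
manual = pd.Series(means).sort_index()

# groupby() performs all three steps in one call (keys are sorted by default)
automatic = df.groupby('Team')['Points'].mean()

print(manual)
print(automatic)
```

The hand-written version only differs in naming: groupby() labels the result Series `Points` and its index `Team`.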

In this tutorial, we will learn the basics of groupby operations in pandas, such as splitting data, viewing groups, and selecting specific groups, using an example dataset.

Introduction to GroupBy Operations

Every groupby() operation involves three key steps: splitting the data into groups based on some criteria, applying a function independently to each group, and combining the results back into a meaningful structure.

In the apply step, the function usually performs one of the following kinds of operations −

  • Aggregation: Computing summary statistics like mean, sum, etc.

  • Transformation: Applying a function to each group and returning a result aligned with the original rows.

  • Filtration: Discarding entire groups based on a group-level condition.
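The three kinds of operations above can be sketched on the same IPL sample data used throughout this tutorial. In this minimal example, `agg()` returns one row per team, `transform()` returns a Series with the same index as the original DataFrame, and `filter()` keeps or drops whole groups. The threshold of three rows per team is an arbitrary choice for illustration.

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
grouped = df.groupby('Team')

# Aggregation: one row per team with several summary statistics of Points
totals = grouped['Points'].agg(['sum', 'mean', 'max'])
print(totals)

# Transformation: each row's points minus its team's mean; same index as df
centered = df['Points'] - grouped['Points'].transform('mean')
print(centered)

# Filtration: keep all rows of teams with at least three entries, drop the rest
frequent = grouped.filter(lambda g: len(g) >= 3)
print(frequent)
```

Only the Kings and Riders rows survive the filter, and they keep their original index labels.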

Split Data into Groups

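Pandas objects can be split into groups based on the values in one or more columns using the groupby() method. On its own, groupby() returns a GroupBy object rather than a new DataFrame; results are produced only when you apply an operation, such as an aggregation, to that object.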

Example

Let us now see how to group a Pandas DataFrame by a single column using the groupby() method.

# import the pandas library
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}

df = pd.DataFrame(ipl_data)

# Display the Original DataFrame
print("Original DataFrame:")
print(df)

# Display the Grouped Data
print('\nGrouped Data:')
print(df.groupby('Team'))

Output

Following is the output of the above code −

Original DataFrame:
      Team  Rank  Year  Points
0   Riders     1  2014     876
1   Riders     2  2015     789
2   Devils     2  2014     863
3   Devils     3  2015     673
4    Kings     3  2014     741
5    kings     4  2015     812
6    Kings     1  2016     756
7    Kings     1  2017     788
8   Riders     2  2016     694
9   Royals     4  2014     701
10  Royals     1  2015     804
11  Riders     2  2017     690

Grouped Data:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7fca22795340>
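Printing the GroupBy object only shows its type and memory address, and the address differs from run to run. To see the groups themselves, you apply an operation to the object. The sketch below, assuming the same sample data, counts the groups with `ngroups`, the rows per group with `size()`, and iterates over `(key, sub-DataFrame)` pairs; the `counts` dictionary is an illustrative name, not part of pandas.

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
grouped = df.groupby('Team')

# How many groups there are, and how many rows fall into each one
print('Number of groups:', grouped.ngroups)
print(grouped.size())

# Iteration yields (key, sub-DataFrame) pairs; keys come out sorted by default
counts = {}
for name, group in grouped:
    counts[name] = len(group)
    print(name, len(group))
```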

GroupBy with Multiple Columns

You can group data by multiple columns by passing a list of column names to the groupby() method.

Example

Here is an example where the data is grouped by multiple columns.

# import the pandas library
import pandas as pd

# Create a DataFrame
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Display the Grouped Data
print('Grouped Data:')

print(df.groupby(['Team','Year']).groups)

Output

Its output is as follows −

Grouped Data:
{('Devils', 2014): [2], ('Devils', 2015): [3], ('Kings', 2014): [4], 
('Kings', 2016): [6], ('Kings', 2017): [7], ('Riders', 2014): [0], 
('Riders', 2015): [1], ('Riders', 2016): [8], ('Riders', 2017): [11], 
('Royals', 2014): [9], ('Royals', 2015): [10], ('kings', 2015): [5]}
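Grouping by several columns usually feeds an aggregation, whose result is indexed by a MultiIndex with one level per key. A minimal sketch, assuming the same sample data: a single group is then addressed with a tuple key, and `reset_index()` turns the key levels back into ordinary columns. The variable names are illustrative.

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
grouped = df.groupby(['Team', 'Year'])

# Aggregate: total points per (Team, Year) pair, indexed by a two-level MultiIndex
points = grouped['Points'].sum()
print(points)

# With several keys, a single group is selected with a tuple
riders_2014 = grouped.get_group(('Riders', 2014))
print(riders_2014)

# Turn the index levels back into regular columns
flat = points.reset_index()
print(flat.head())
```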

Viewing Grouped Data

Once your data is split into groups, you can inspect them in several ways. One of the simplest is the .groups attribute, which maps each group key to the row labels (index values) that belong to that group.

Example

The following example demonstrates how to view the grouped data using the .groups attribute.

# import the pandas library
import pandas as pd

# Create DataFrame 
ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

print('Viewing Grouped Data:')
print(df.groupby('Team').groups)

Output

Its output is as follows −

Viewing Grouped Data:
{'Devils': [2, 3], 'Kings': [4, 6, 7], 'Riders': [0, 1, 8, 11], 
'Royals': [9, 10], 'kings': [5]}
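Notice that 'kings' (row 5) forms its own group, separate from 'Kings': group keys are compared exactly, so capitalization matters. If that lowercase entry is meant to be the same team (an assumption; the sample data does not say), one option is to normalize the labels before grouping, for example with `str.title()`. The `normalized` DataFrame below is a hypothetical clean-up step, not part of the original example.

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Hypothetical clean-up: title-case every team name so 'kings' becomes 'Kings';
# assign() returns a new DataFrame and leaves df unchanged
normalized = df.assign(Team=df['Team'].str.title())

groups = normalized.groupby('Team').groups
print(groups)
```

After normalization, row 5 joins the Kings group and only four groups remain.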

Selecting a Specific Group

The get_group() method selects a single group and returns its rows as a DataFrame.

Example

The following example groups the data by Year and selects the 2014 group using the get_group() method.

# import the pandas library
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

grouped = df.groupby('Year')

# Display the Selected Data
print('Selected Group Data:')
print(grouped.get_group(2014))

Output

Its output is as follows −

Selected Group Data:
     Team  Rank  Year  Points
0  Riders     1  2014     876
2  Devils     2  2014     863
4   Kings     3  2014     741
9  Royals     4  2014     701
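For a single grouping column, get_group() returns the same rows, with their original index labels, as an ordinary boolean filter on that column; it is mainly convenient when you already hold a GroupBy object. The sketch below, assuming the same sample data, checks this equivalence and shows that asking for a key with no rows (2013 here, which does not occur in the data) raises a KeyError.

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Year': [2014, 2015, 2014, 2015, 2014, 2015, 2016, 2017, 2016, 2014, 2015, 2017],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
grouped = df.groupby('Year')

# The selected group and an equivalent boolean mask on the Year column
via_group = grouped.get_group(2014)
via_mask = df[df['Year'] == 2014]
print(via_group.equals(via_mask))

# A key that does not exist in the grouping column raises KeyError
try:
    grouped.get_group(2013)
    missing_raised = False
except KeyError:
    missing_raised = True
print('KeyError raised:', missing_raised)
```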