How to Work with MultiIndex DataFrames in Pandas

In this blog post, we look at how to handle MultiIndex DataFrames in Pandas, the powerful Python library for data analysis and manipulation. A MultiIndex DataFrame has more than one level of row (or column) labels. This lets a two-dimensional table hold hierarchical data, such as values broken down first by group and then by category. We will look at how to create, access, and manipulate MultiIndex DataFrames in Pandas.

1 – Creating MultiIndex DataFrames

To create a MultiIndex DataFrame in Pandas, we can use pd.MultiIndex.from_tuples(). It takes a list of tuples in which each tuple labels one row and holds one value per index level. In this example, the first element is the group and the second is the category. We then pass this MultiIndex to the DataFrame constructor through its index argument.

import pandas as pd
# Create a list of tuples for the MultiIndex
index = [('Group A', 'Category 1'),
         ('Group A', 'Category 2'),
         ('Group B', 'Category 1'),
         ('Group B', 'Category 2'),
         ('Group C', 'Category 1'),
         ('Group C', 'Category 2')]

# Create the MultiIndex from the list of tuples
multi_index = pd.MultiIndex.from_tuples(index)

# Create a DataFrame with the MultiIndex
df = pd.DataFrame(data=[11, 22, 33, 44, 55, 66], index=multi_index, columns=['Values'])
print(df)

We get this output:
                    Values
Group A Category 1      11
        Category 2      22
Group B Category 1      33
        Category 2      44
Group C Category 1      55
        Category 2      66
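Every group/category combination is present here, so the same index can also be built with pd.MultiIndex.from_product(). It takes one list per level and forms their Cartesian product. The minimal sketch below also passes the optional names= argument so the levels can later be referred to by name. The level names 'Group' and 'Category' are our own illustrative choice and are not part of the example above.

```python
import pandas as pd

# One list per level; from_product pairs every group with every category,
# outer level first, in the same order as the tuples above
multi_index = pd.MultiIndex.from_product(
    [['Group A', 'Group B', 'Group C'], ['Category 1', 'Category 2']],
    names=['Group', 'Category'],  # illustrative level names
)

df = pd.DataFrame(data=[11, 22, 33, 44, 55, 66], index=multi_index, columns=['Values'])
print(df)
```

from_tuples() remains the more flexible choice when only some combinations exist. from_product() saves typing when the index is a full grid.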

2 – Accessing MultiIndex DataFrames

To access data in a MultiIndex DataFrame, we can use the .loc[] accessor with a tuple of index values. The tuple should have one value for each level of the MultiIndex, in the order they were defined.

For example, let’s access the value in the ‘Group A’/‘Category 1’ cell:

value = df.loc[('Group A', 'Category 1'), 'Values']
print(value)

output:
11

Another example:

value2 = df.loc[('Group C', 'Category 2'), 'Values']
print(value2)

output:
66

We can also use .loc[] to select multiple rows or columns based on their index values. For example, let’s select all rows with ‘Group A’ in the first level of the index:

group_a = df.loc['Group A']
print(group_a)

output:
            Values
Category 1      11
Category 2      22
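Selecting on the first level with .loc[] is straightforward, but selecting on an inner level needs a different tool. The sketch below shows two options from the pandas API: DataFrame.xs() with its level argument, and pd.IndexSlice inside .loc[]. It rebuilds the df from the first example so it runs on its own. The variable names category_1 and category_2 are just illustrative.

```python
import pandas as pd

# Rebuild the DataFrame from the first example
multi_index = pd.MultiIndex.from_tuples([
    ('Group A', 'Category 1'), ('Group A', 'Category 2'),
    ('Group B', 'Category 1'), ('Group B', 'Category 2'),
    ('Group C', 'Category 1'), ('Group C', 'Category 2'),
])
df = pd.DataFrame(data=[11, 22, 33, 44, 55, 66], index=multi_index, columns=['Values'])

# xs(): every row whose second level (level=1) is 'Category 1';
# by default that level is dropped from the result's index
category_1 = df.xs('Category 1', level=1)
print(category_1)

# pd.IndexSlice: ':' keeps every value of the first level and
# 'Category 2' filters the second; both levels stay in the result
idx = pd.IndexSlice
category_2 = df.loc[idx[:, 'Category 2'], :]
print(category_2)
```

xs() is convenient for a quick cross-section. IndexSlice is useful when you want to keep the full MultiIndex or combine conditions on several levels.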

3 – Manipulating MultiIndex DataFrames

We can use various Pandas methods to reshape MultiIndex DataFrames. Two of the most useful are .unstack(), which pivots a level of the row index into the columns, and .stack(), which does the reverse and pivots a level of the columns into the row index.

Let’s use .unstack() to move the second level of the MultiIndex (‘Category 1’ and ‘Category 2’) into the columns:

unstacked = df.unstack()
print(unstacked)

output:
            Values
        Category 1 Category 2
Group A         11         22
Group B         33         44
Group C         55         66
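By default, .unstack() moves the innermost index level, which is why the categories became columns. Its level argument picks a different level. The sketch below rebuilds the df from the first example and passes level=0, which moves the groups into the columns and leaves the categories as rows. The name by_group is illustrative.

```python
import pandas as pd

# Rebuild the DataFrame from the first example
multi_index = pd.MultiIndex.from_tuples([
    ('Group A', 'Category 1'), ('Group A', 'Category 2'),
    ('Group B', 'Category 1'), ('Group B', 'Category 2'),
    ('Group C', 'Category 1'), ('Group C', 'Category 2'),
])
df = pd.DataFrame(data=[11, 22, 33, 44, 55, 66], index=multi_index, columns=['Values'])

# level=0 pivots the outer level (the groups) into the columns;
# the remaining level (the categories) becomes the row index
by_group = df.unstack(level=0)
print(by_group)
```

The level argument only decides which labels end up as rows and which as columns. The values themselves are unchanged.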

Let’s look at an example of how to use .stack() to move the columns of a DataFrame with a MultiIndex into its row index:

import pandas as pd

# create a DataFrame with a MultiIndex
df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [5, 6, 7, 8],
                   'C': [9, 10, 11, 12],
                   'D': [13, 14, 15, 16]},
                  index=pd.MultiIndex.from_tuples([('Group1', 'TeamA'),
                                                   ('Group1', 'TeamB'),
                                                   ('Group2', 'TeamC'),
                                                   ('Group2', 'TeamD')],
                                                  names=['Group', 'Team']))

# print the original DataFrame
print("Original DataFrame:\n", df)

# stack the columns (A-D) into a new innermost level of the row index
df_stacked = df.stack()

# print the stacked result (a Series, because df has a single column level)
print("Stacked DataFrame:\n", df_stacked)

In this example, we first create a DataFrame whose row index is a two-level MultiIndex. We build it with pd.MultiIndex.from_tuples() and name its levels ‘Group’ and ‘Team’ through the names argument. Then we print the original DataFrame.

Next, .stack() pivots the column labels (A, B, C, D) into a new innermost level of the row index. Because the DataFrame has only one level of columns, the result is a Series rather than a DataFrame. We then print that stacked result.

The output looks like this:

Original DataFrame:
              A  B   C   D
Group  Team               
Group1 TeamA  1  5   9  13
       TeamB  2  6  10  14
Group2 TeamC  3  7  11  15
       TeamD  4  8  12  16

Stacked DataFrame:
Group   Team
Group1  TeamA  A     1
               B     5
               C     9
               D    13
        TeamB  A     2
               B     6
               C    10
               D    14
Group2  TeamC  A     3
               B     7
               C    11
               D    15
        TeamD  A     4
               B     8
               C    12
               D    16
dtype: int64

As we can see, .stack() moved the column labels into the row index. The result is a Series with a three-level MultiIndex: the two original levels (Group and Team), plus a new innermost level holding the stacked column names A, B, C, and D.
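Because the stacked labels are kept in the index, .unstack() can undo the .stack() call. The minimal round-trip sketch below rebuilds the df from the example above. df_restored is an illustrative name.

```python
import pandas as pd

# Rebuild the DataFrame from the .stack() example
df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [5, 6, 7, 8],
                   'C': [9, 10, 11, 12],
                   'D': [13, 14, 15, 16]},
                  index=pd.MultiIndex.from_tuples([('Group1', 'TeamA'),
                                                   ('Group1', 'TeamB'),
                                                   ('Group2', 'TeamC'),
                                                   ('Group2', 'TeamD')],
                                                  names=['Group', 'Team']))

# stack(): columns A-D become a third, innermost index level
df_stacked = df.stack()
print(df_stacked.index.nlevels)

# A full (Group, Team, column) tuple picks out a single value
print(df_stacked[('Group2', 'TeamC', 'B')])

# unstack(): the innermost level moves back into the columns
df_restored = df_stacked.unstack()
print(df_restored.equals(df))
```

The round trip is exact here because every cell has a value. With missing data, stacking can drop NaN entries, so unstacking may not reproduce the original shape.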

Conclusion:

In this article, we covered how to create MultiIndex DataFrames, how to select data from them with .loc[], and how to reshape them with .stack() and .unstack(). I hope you found it useful; if you have any questions, let me know.


Author

  • Naveen Pandey, Data Scientist and Machine Learning Engineer

    Naveen Pandey has more than 2 years of experience in data science and machine learning. He is an experienced Machine Learning Engineer with a strong background in data analysis, natural language processing, and machine learning. Holding a Bachelor of Science in Information Technology from Sikkim Manipal University, he excels in leveraging cutting-edge technologies such as Large Language Models (LLMs), TensorFlow, PyTorch, and Hugging Face to develop innovative solutions.
