Python code: Multi-index pandas DataFrame
Published:
Although there several takes on using multi-index DataFrames around the internet, it still takes some time to get to a simple and elegant solution. Therefore, with this post I would like to show what at
Lets start with the definition on multi-index DataFrame. I will address only column-multi-index, since I find them more useful compared to row-multi-index DataFrames.
import pandas as pd
# set multi-index levels
levels = ['level_0', 'level_1']
# prepare multi-index
multi_index_column = pd.MultiIndex( levels=[[] for level in levels],
codes=[[] for level in levels],
names=levels)
# set columns to multi-index
df = pd.DataFrame(index=[], columns=multi_index_column)
The following test script:
import pandas as pd
import numpy as np
# set multi-index levels
levels = ['level_0', 'level_1']
# prepare multi-index
multi_index_column = pd.MultiIndex( levels=[[] for level in levels],
codes=[[] for level in levels],
names=levels)
# set columns to multi-index
df = pd.DataFrame(index=[], columns=multi_index_column)
data_len = 3
# fill in the columns
df[('A', 0)] = pd.Series( np.random.normal(0, 1, data_len) )
df[('B', 0)] = pd.Series( np.random.normal(0, 1, data_len) )
df[('B', 1)] = pd.Series( np.random.normal(0, 1, data_len) )
df[('A', 1)] = pd.Series( np.random.normal(0, 1, data_len) )
df[('B', 2)] = pd.Series( np.random.normal(0, 1, data_len) )
# set axis=0 to DatetimeIndex
start = pd.Timestamp('2021-01-01')
end = pd.Timestamp('2022-01-01')
time_vector = np.linspace(start.value, end.value, data_len)
datetime = pd.to_datetime(time_vector)
df.index = datetime
# show
print(df)
# sort axis=1 index
df = df.sort_index(axis=1)
# show
print(df)
illustrates the re-ordering of the columns:
level_0 A B A B
level_1 0 0 1 1 2
2021-01-01 00:00:00 -0.505519 0.021671 -0.741799 0.399135 -0.086691
2021-07-02 12:00:00 -1.011332 0.946652 -0.237040 -0.485605 0.098137
2022-01-01 00:00:00 1.450697 0.121733 -0.280538 -0.513861 2.013553
level_0 A B
level_1 0 1 0 1 2
2021-01-01 00:00:00 -0.505519 0.399135 0.021671 -0.741799 -0.086691
2021-07-02 12:00:00 -1.011332 -0.485605 0.946652 -0.237040 0.098137
2022-01-01 00:00:00 1.450697 -0.513861 0.121733 -0.280538 2.013553
The DataFrame is illustrated in the image below.