Slicing a pandas data frame into multiple smaller data frames

Question:

I have a pandas data frame with 51034 rows and 10 columns. I want to slice it into 158 smaller data frames based on a list that contains the rows at which to slice.

How can I slice a pandas data frame into smaller data frames?

For example, if I have a data frame with 10 rows and 4 columns:

      A    B    C    D
0     1    2    3    4
1     5    6    7    8
2     9    10   11   12
3     13   14   15   16
4     17   18   19   20
5     21   22   23   24
6     25   26   27   28
7     29   30   31   32
8     33   34   35   36
9     37   38   39   40

This example data frame would be sliced every 2 rows to create 5 new, smaller data frames:

DataFrame1:

      A    B    C    D
0     1    2    3    4
1     5    6    7    8

DataFrame2:

      A    B    C    D
0     9    10   11   12
1     13   14   15   16

DataFrame3:

      A    B    C    D
0     17   18   19   20
1     21   22   23   24

DataFrame4:

      A    B    C    D
0     25   26   27   28
1     29   30   31   32

DataFrame5:

      A    B    C    D
0     33   34   35   36
1     37   38   39   40

I am not sure how to slice the larger data frame to create the smaller data frames.

Any suggestions on how to accomplish this?

Thanks.

Rodrigo

You can use groupby with a simple index-to-group mapping function, assuming that the index is consecutive and starts from 0:

# Integer division maps index labels 0,1 -> 0, then 2,3 -> 1, and so on.
for _, df_k in df.groupby(lambda x: x // 2):
    print(df_k.reset_index(drop=True))

Output:

   A  B  C  D
0  1  2  3  4
1  5  6  7  8
    A   B   C   D
0   9  10  11  12
1  13  14  15  16
    A   B   C   D
0  17  18  19  20
1  21  22  23  24
    A   B   C   D
0  25  26  27  28
1  29  30  31  32
    A   B   C   D
0  33  34  35  36
1  37  38  39  40
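
If the index is not consecutive or does not start from 0, the same idea can be applied to row positions instead of index labels. The following is only a sketch under that assumption: it rebuilds the 10-row example frame from the question, and the names `chunk_size`, `positions` and `chunks` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the 10x4 example frame from the question.
df = pd.DataFrame(np.arange(1, 41).reshape(10, 4), columns=list("ABCD"))

chunk_size = 2  # rows per piece (hypothetical name)

# Group on row position rather than index label, so the result
# does not depend on what the index looks like.
positions = np.arange(len(df)) // chunk_size
chunks = [part.reset_index(drop=True) for _, part in df.groupby(positions)]

for part in chunks:
    print(part)
```

Collecting the pieces in a list keeps them available after the loop, which may be more convenient than printing when there are many of them.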

If you have a list of numbers indicating the slicing positions, you can pass in a dictionary as the group mapping:

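import numpy as np

slice_at = [3, 5]  # row positions where a new piece starts
# Size of each piece: the gaps between consecutive cut points.
group_sizes = np.diff([0] + slice_at + [len(df)])
# Map every index label to the number of the piece it belongs to.
mapping = dict(zip(df.index, np.repeat(range(len(group_sizes)), group_sizes)))
for _, df_k in df.groupby(mapping):
    print(df_k.reset_index(drop=True))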

Output:

   A   B   C   D
0  1   2   3   4
1  5   6   7   8
2  9  10  11  12
    A   B   C   D
0  13  14  15  16
1  17  18  19  20
    A   B   C   D
0  21  22  23  24
1  25  26  27  28
2  29  30  31  32
3  33  34  35  36
4  37  38  39  40
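
As an alternative to building a group mapping, the cut points can be turned into start/stop pairs and passed to `iloc` directly. This is a minimal sketch rather than the method from the answer above; it assumes `slice_at` is sorted and lies within the frame, and the names `bounds` and `pieces` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the 10x4 example frame from the question.
df = pd.DataFrame(np.arange(1, 41).reshape(10, 4), columns=list("ABCD"))

slice_at = [3, 5]  # row positions where a new piece starts (assumed sorted)

# Add the start and end of the frame, then pair up neighbouring boundaries.
bounds = [0] + slice_at + [len(df)]
pieces = [
    df.iloc[start:stop].reset_index(drop=True)
    for start, stop in zip(bounds[:-1], bounds[1:])
]

for part in pieces:
    print(part)
```

Because `iloc` works on positions, this version does not depend on the index labels at all.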