Splitting a DataFrame into multiple DataFrames in pandas

Problem description:

I have a pandas data frame that has 51034 rows and 10 columns. I want to slice this data frame into 158 smaller data frames, based on a list of the row positions at which to split it.

How can I slice a pandas data frame into smaller data frames?

For example, if I have a data frame with 10 rows and 4 columns:

      A    B    C    D
0     1    2    3    4
1     5    6    7    8
2     9    10   11   12
3     13   14   15   16
4     17   18   19   20
5     21   22   23   24
6     25   26   27   28
7     29   30   31   32
8     33   34   35   36
9     37   38   39   40
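
To reproduce the example, the 10 x 4 frame above can be built as follows. This is a minimal sketch: it lays the values 1 through 40 out row by row, and the variable name df is an assumption matching the name used in the answer below.

```python
import numpy as np
import pandas as pd

# Values 1..40 laid out row by row into 10 rows and the 4 columns A-D
df = pd.DataFrame(np.arange(1, 41).reshape(10, 4), columns=list("ABCD"))
print(df)
```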

This example data frame should be sliced every 2 rows to create 5 new, smaller data frames:

DataFrame1:

      A    B    C    D
0     1    2    3    4
1     5    6    7    8

DataFrame2:

      A    B    C    D
0     9    10   11   12
1     13   14   15   16

DataFrame3:

      A    B    C    D
0     17   18   19   20
1     21   22   23   24

DataFrame4:

      A    B    C    D
0     25   26   27   28
1     29   30   31   32

DataFrame5:

      A    B    C    D
0     33   34   35   36
1     37   38   39   40

I am not sure how to slice the larger data frame to create the smaller data frames.

Any suggestions on how to accomplish this goal?

Thanks.

Rodrigo

Answer

You can use groupby with a simple function that maps each index label to a group number, assuming the index consists of consecutive integers starting from 0 (the default index):

# Integer-divide each index label by 2: rows 0-1 -> group 0, rows 2-3 -> group 1, ...
for _, df_k in df.groupby(lambda x: x // 2):
    print(df_k.reset_index(drop=True))

Output:

   A  B  C  D
0  1  2  3  4
1  5  6  7  8
    A   B   C   D
0   9  10  11  12
1  13  14  15  16
    A   B   C   D
0  17  18  19  20
1  21  22  23  24
    A   B   C   D
0  25  26  27  28
1  29  30  31  32
    A   B   C   D
0  33  34  35  36
1  37  38  39  40
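
The lambda above groups by index label, so it relies on the index being 0, 1, 2, ... If that may not hold (for example after rows have been filtered out), one option is to group by row position instead, passing a NumPy array with one group id per row as the grouper. A minimal sketch; chunk_size is a hypothetical variable name, and the frame is given a non-default index only to show that labels no longer matter:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 41).reshape(10, 4), columns=list("ABCD"))
df.index = range(100, 110)  # non-default index labels

chunk_size = 2  # hypothetical: number of rows per piece
# Row position // chunk_size gives 0, 0, 1, 1, 2, 2, ... (one group id per row)
positions = np.arange(len(df)) // chunk_size
pieces = [g.reset_index(drop=True) for _, g in df.groupby(positions)]

for piece in pieces:
    print(piece)
```

Because the grouper is an array of the same length as the frame, pandas aligns it by position rather than by label.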

If you have a list of numbers indicating the slicing positions, you can pass in a dictionary as the group mapping:

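import numpy as np

slice_at = [3, 5]  # row positions where a new piece starts
# Size of each piece: gaps between consecutive boundaries -> [3, 2, 5]
group_sizes = np.diff([0] + slice_at + [len(df)])
# Map each index label to its piece number: 0, 0, 0, 1, 1, 2, 2, 2, 2, 2
mapping = dict(zip(df.index, np.repeat(range(len(group_sizes)), group_sizes)))
for _, df_k in df.groupby(mapping):
    print(df_k.reset_index(drop=True))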

Output:

   A   B   C   D
0  1   2   3   4
1  5   6   7   8
2  9  10  11  12
    A   B   C   D
0  13  14  15  16
1  17  18  19  20
    A   B   C   D
0  21  22  23  24
1  25  26  27  28
2  29  30  31  32
3  33  34  35  36
4  37  38  39  40
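
A groupby is not strictly necessary here. Since the split points are row positions, another option is to slice with iloc between each pair of consecutive boundaries and keep the pieces in a list; the same code works unchanged for a longer list of split points. A minimal sketch, assuming slice_at is sorted and holds positions strictly between 0 and len(df):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 41).reshape(10, 4), columns=list("ABCD"))
slice_at = [3, 5]  # row positions where a new piece starts (assumed sorted)

# Boundaries: 0, each split position, then the end of the frame
bounds = [0] + slice_at + [len(df)]
# Slice by position between each pair of consecutive boundaries
pieces = [df.iloc[start:stop].reset_index(drop=True)
          for start, stop in zip(bounds[:-1], bounds[1:])]

for piece in pieces:
    print(piece)
```

One behavioral difference: if slice_at contains a repeated position, this version yields an empty frame for it, whereas the groupby version above simply has no group for that piece.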
