Split a dataframe into multiple dataframes by number of rows

Question:

I have a dataframe df:

        a              b          c
0   0.897134    -0.356157   -0.396212
1   -2.357861   2.066570    -0.512687
2   -0.080665   0.719328    0.604294
3   -0.639392   -0.912989   -1.029892
4   -0.550007   -0.633733   -0.748733
5   -0.712962   -1.612912   -0.248270
6   -0.571474   1.310807    -0.271137
7   -0.228068   0.675771    0.433016
8   0.005606    -0.154633   0.985484
9   0.691329    -0.837302   -0.607225
10  -0.011909   -0.304162   0.422001
11  0.127570    0.956831    1.837523
12  -1.074771   0.379723    -1.889117
13  -1.449475   -0.799574   -0.878192
14  -1.029757   0.551023    2.519929
15  -1.001400   0.838614    -1.006977
16  0.677216    -0.403859   0.451338
17  0.221596    -0.323259   0.324158
18  -0.241935   -2.251687   -0.088494
19  -0.995426   0.665569    -2.228848
20  1.714709    -0.353391   0.671539
21  0.155050    1.136433    -0.005721
22  -0.502412   -0.610901   1.520165
23  -0.853906   0.648321    1.124464
24  1.149151    -0.187300   -0.412946
25  0.329229    -1.690569   -2.746895
26  0.165158    0.173424    0.896344
27  1.157766    0.525674    -1.279618
28  1.729730    -0.798158   0.644869
29  -0.107285   -1.290374   0.544023

I need to split it into multiple dataframes, each containing 10 rows of df, and write each small dataframe to a separate file. So I decided to create a multilevel dataframe, and as a first step I tried to assign an index to every 10 rows of df with this method:

df['split'] = df['split'].apply(lambda x: np.searchsorted(df.iloc[::10], x, side='right')[0])

This throws:

TypeError: 'function' object has no attribute '__getitem__'



Do you have any idea how to fix it? Where is my method wrong?

If you have another approach for splitting my dataframe into multiple dataframes, each containing 10 rows of df, that is also welcome. This approach was just the first one I thought of, and I'm not sure it's the best one.

There are many ways to do what you want, and your method looks overcomplicated. A groupby that uses the row position scaled down by 10 as the grouping key would work:

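import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.rand(100, 3), columns=list('ABC'))
# Floor division (//) maps rows 0-9 to key 0, rows 10-19 to key 1, and so on.
# A plain / is true division on Python 3 and would put every row in its own group.
groups = df.groupby(np.arange(len(df.index)) // 10)
for frameno, frame in groups:
    frame.to_csv("%s.csv" % frameno)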
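If you would rather not go through groupby, plain positional slicing is another option. The following is only a minimal sketch, not the one right way to do it: the helper name split_every_n_rows is made up for illustration, and the 30-row random frame just stands in for the df from the question.

```python
import numpy as np
import pandas as pd


def split_every_n_rows(df, n):
    """Return a list of consecutive slices of df, each with at most n rows.

    Hypothetical helper for illustration; it relies only on DataFrame.iloc.
    """
    return [df.iloc[start:start + n] for start in range(0, len(df), n)]


# Stand-in for the 30-row frame in the question (random values, assumed shape).
df = pd.DataFrame(np.random.rand(30, 3), columns=list("abc"))
chunks = split_every_n_rows(df, 10)

# Each piece keeps its original row labels; the last piece may be shorter
# when len(df) is not a multiple of n.
sizes = [len(chunk) for chunk in chunks]
```

Each element of chunks is an ordinary dataframe, so writing the pieces to separate files is a loop over enumerate(chunks) that calls to_csv on each one, just as in the groupby version.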