Add headers to a CSV file
I have a CSV file with dimensions 100*512 that I want to process further in Spark. The problem with the file is that it doesn't contain a header, i.e. column names. I need these column names for further ETL in machine learning. I have the column names in another file (a text file), and I have to put these column names as headers in the CSV file mentioned above.
e.g.
CSV file:-
ab 1 23 sf 23 hjh
hs 6 89 iu 98 adf
gh 7 78 pi 54 ngj
jh 5 22 kj 78 jdk
Column header file:-
one,two,three,four,five,six
I want the output like this:-
one two three four five six
ab 1 23 sf 23 hjh
hs 6 89 iu 98 adf
gh 7 78 pi 54 ngj
jh 5 22 kj 78 jdk
Please suggest some method to add the column headers to the CSV file (without replacing the first row of the CSV file). I tried converting it to a pandas dataframe but couldn't get the expected output.
First, read your CSV file:
from pandas import read_csv

# header=None keeps pandas from consuming the first data row as the header
df = read_csv('test.csv', header=None)
If there are two columns in your dataset (column a and column b), use:
df.columns = ['a', 'b']
Write this new dataframe to CSV:
# index=False avoids writing the row index as an extra column
df.to_csv('test_2.csv', index=False)
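Putting the steps together for the question's actual setup, here is a minimal sketch that reads the column names from the separate text file and attaches them to the headerless data. The filenames (`data.csv`, `headers.txt`), the space separator, and the sample rows are assumptions based on the example in the question, not the asker's real files.

```python
from pandas import read_csv

# Hypothetical sample files mirroring the question's layout:
# a headerless, space-separated data file and a comma-separated name file.
with open('data.csv', 'w') as f:
    f.write('ab 1 23 sf 23 hjh\n'
            'hs 6 89 iu 98 adf\n')
with open('headers.txt', 'w') as f:
    f.write('one,two,three,four,five,six')

# header=None preserves the first data row instead of promoting it to a header
df = read_csv('data.csv', sep=' ', header=None)

# Read the column names from the text file, stripping stray whitespace
with open('headers.txt') as f:
    names = [c.strip() for c in f.read().split(',')]

df.columns = names  # attach the headers to the dataframe

# Write out with the header line on top and no extra index column
df.to_csv('data_with_header.csv', index=False)

print(df.shape)  # (2, 6) for this sample; (100, 512) for the real file
```

The same code works unchanged for the real 100*512 file, since `df.columns = names` only requires that the name list has as many entries as the dataframe has columns.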