Add headers to a CSV file

Problem description:

I have a CSV file with dimensions 100*512 that I want to process further in Spark. The problem is that the file does not contain a header, i.e. column names. I need these column names for further ETL in machine learning. The column names are in a separate text file, and I have to put them as a header on the CSV file mentioned above, e.g.

CSV file:

ab 1 23 sf 23 hjh

hs 6 89 iu 98 adf

gh 7 78 pi 54 ngj

jh 5 22 kj 78 jdk

Column header file:

one,two,three,four,five,six

I want output like this:

one two three four five six

ab 1 23 sf 23 hjh

hs 6 89 iu 98 adf

gh 7 78 pi 54 ngj

jh 5 22 kj 78 jdk

Please suggest a method to add the column headers to the CSV file (without replacing the first row of the CSV file). I tried converting it to a pandas DataFrame but couldn't get the expected output.

First, read your CSV file:

from pandas import read_csv
# header=None: the file has no header row, so the first data row stays as data
df = read_csv('test.csv', header=None)

If there are two columns in your dataset (column a and column b), use:

df.columns = ['a', 'b']
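
Since your column names live in a separate text file rather than hard-coded strings, you can read them from that file and assign them instead. A minimal sketch, assuming the header file is called headers.txt (a hypothetical name) and holds a single comma-separated line of names:

# headers.txt is a hypothetical name for your column-name text file;
# it is assumed to contain one comma-separated line such as: one,two,three,...
with open('headers.txt') as f:
    column_names = [name.strip() for name in f.read().strip().split(',')]

# Assign the names from the file as the DataFrame header
df.columns = column_names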

Write this new DataFrame back to CSV:

# index=False avoids writing the pandas row index as an extra column
df.to_csv('test_2.csv', index=False)
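
If the end goal is further processing in Spark, you can also skip rewriting the CSV and attach the names while reading. A minimal PySpark sketch, assuming the same hypothetical file names as above (test.csv for the data, headers.txt for the names) and a comma-delimited file (pass a different sep to spark.read.csv if yours is delimited differently):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the comma-separated column names from the header file
with open('headers.txt') as f:
    column_names = [name.strip() for name in f.read().strip().split(',')]

# Read the headerless CSV (header=False keeps the first row as data)
# and attach the names with toDF
df = spark.read.csv('test.csv', header=False, inferSchema=True).toDF(*column_names)
df.show()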