Python 2.7 CSV file read/write and the \xef\xbb\xbf code

Question:

I have a question about reading and writing a CSV file with the 'utf-8-sig' encoding in Python 2.7. The header of my CSV file is

['\xef\xbb\xbfID;timestamp;CustomerID;Email']

The header I read from file A.csv starts with the bytes "\xef\xbb\xbfID", and I want to write the same bytes and header to file B.csv.

My print log shows:

['\xef\xbb\xbfID;timestamp;CustomerID;Email']

But the header in the actual output file looks like

ÔªøID;timestamp

Here is the code:

def remove_gdpr_info_from_csv(file_path, file_name, temp_folder, original_header):
    new_temp_folder = tempfile.mkdtemp()
    new_temp_file = new_temp_folder + "/" + file_name
    # Blanked new file
    with open(new_temp_file, 'wb') as outfile:
        writer = csv.writer(outfile, delimiter=";")
        print original_header
        writer.writerow(original_header)
        # File from SFTP
        with open(file_path, 'r') as infile:
            reader = csv.reader(infile, delimiter=";")
            first_row = next(reader)
            email = first_row.index('Email')
            contract_detractor1 = first_row.index('Contact Detractor (Q21)')
            contract_detractor2 = first_row.index('Contact Detractor (Q20)')
            contract_detractor3 = first_row.index('Contact Detractor (Q43)')
            contract_detractor4 = first_row.index('Contact Detractor(Q26)')
            contract_detractor5 = first_row.index('Contact Detractor(Q27)')
            contract_detractor6 = first_row.index('Contact Detractor(Q44)')
            indexes = []
            for column_name in header_list:
                ind = first_row.index(column_name)
                indexes.append(ind)

            for row in reader:
                output_row = []
                for ind in indexes:
                    data = row[ind]
                    if ind == email:
                        data = ''
                    elif ind == contract_detractor1:
                        data = ''
                    elif ind == contract_detractor2:
                        data = ''
                    elif ind == contract_detractor3:
                        data = ''
                    elif ind == contract_detractor4:
                        data = ''
                    elif ind == contract_detractor5:
                        data = ''
                    elif ind == contract_detractor6:
                        data = ''
                    output_row.append(data)
                writer.writerow(output_row)
    s3core.upload_files(SPARKY_S3, DESTINATION_PATH, new_temp_file)
    shutil.rmtree(temp_folder)
    shutil.rmtree(new_temp_folder)

Answer:

'\xef\xbb\xbf' is the UTF-8 encoded form of the Unicode character ZERO WIDTH NO-BREAK SPACE (U+FEFF). It is often used as a Byte Order Mark (BOM) at the beginning of Unicode text files:

when the file starts with the 3 bytes '\xef\xbb\xbf', it is UTF-8 encoded
when the file starts with the 2 bytes '\xff\xfe', it is UTF-16 little endian
when the file starts with the 2 bytes '\xfe\xff', it is UTF-16 big endian
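
The three cases above can be sketched as a small check on the leading bytes of a file. This is only an illustration: detect_bom is a hypothetical helper name, not something from the question or from the standard library, and it looks only at the three BOMs listed here (it does not try to tell UTF-32 apart, for example).

```python
import codecs

# Hypothetical helper (not from the original post): guess the encoding
# from the BOM at the start of a byte string, or return None if none is found.
def detect_bom(data):
    if data.startswith(codecs.BOM_UTF8):       # '\xef\xbb\xbf'
        return 'utf-8-sig'
    if data.startswith(codecs.BOM_UTF16_LE):   # '\xff\xfe'
        return 'utf-16-le'
    if data.startswith(codecs.BOM_UTF16_BE):   # '\xfe\xff'
        return 'utf-16-be'
    return None

# The header bytes from the question start with the UTF-8 BOM.
header_encoding = detect_bom(b'\xef\xbb\xbfID;timestamp;CustomerID;Email')
```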

The 'utf-8-sig' encoding explicitly asks for this BOM to be written at the beginning of the file.
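
A short sketch of that behaviour, using a made-up header string: encoding with 'utf-8-sig' prepends the BOM, and decoding with 'utf-8-sig' drops it again. The last line is an assumption about the symptom in the question: the characters "Ôªø" are what the three BOM bytes look like when a viewer interprets them as Mac Roman, which suggests the BOM was in fact written and only displayed in the wrong encoding.

```python
import codecs

text = u'ID;timestamp'

# Encoding with utf-8-sig puts the 3-byte BOM in front of the data.
encoded = text.encode('utf-8-sig')
has_bom = encoded.startswith(codecs.BOM_UTF8)

# Decoding with utf-8-sig strips the BOM; plain utf-8 keeps it as U+FEFF.
decoded_sig = encoded.decode('utf-8-sig')
decoded_plain = encoded.decode('utf-8')

# Assumption: the viewer in the question decoded the bytes as Mac Roman.
seen_in_mac_roman = codecs.BOM_UTF8.decode('mac_roman')
```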

To handle the BOM automatically when reading a CSV file in Python 2, you can use the codecs module:

with open(file_path, 'r') as infile:
    # EncodedFile(file, data_encoding, file_encoding): the file is utf-8-sig, the reader gets utf-8
    reader = csv.reader(codecs.EncodedFile(infile, 'utf-8', 'utf-8-sig'), delimiter=";")

EncodedFile wraps the original file object: it decodes the file's bytes as utf-8-sig, which skips the BOM, and re-encodes them as UTF-8 with no BOM.
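
As a rough check of that behaviour, the sketch below wraps an in-memory stand-in for A.csv instead of a real file. The sample row is invented for illustration, and the argument order assumes the documented signature codecs.EncodedFile(file, data_encoding, file_encoding), where file_encoding describes the bytes stored in the file.

```python
import codecs
import io

# In-memory stand-in for A.csv: UTF-8 bytes with a leading BOM (made-up data).
raw = io.BytesIO(
    b'\xef\xbb\xbfID;timestamp;CustomerID;Email\n'
    b'1;2020-01-01;42;a@example.com\n'
)

# data_encoding='utf-8' is what the caller reads,
# file_encoding='utf-8-sig' is how the underlying bytes are decoded.
wrapped = codecs.EncodedFile(raw, 'utf-8', 'utf-8-sig')
content = wrapped.read()

# The first field of the header no longer carries the BOM.
header = content.split(b'\n')[0].split(b';')
```

With the BOM gone, a lookup such as first_row.index('ID') works on the header row. If the output file must start with a BOM, it has to be written explicitly.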
