Split dictionary keys into separate csv files based on their values

Question:

[Using Python 3] I have a csv file with two columns (an email address and a country code; the script roughly forces it into two columns if the original file isn't already laid out that way). I want to split the rows by the value in the second column and write them to separate csv files:

eppetj@desrfpkwpwmhdc.com       us      ==> output-us.csv
uheuyvhy@zyetccm.com            de      ==> output-de.csv
avpxhbdt@reywimmujbwm.com       es      ==> output-es.csv
gqcottyqmy@romeajpui.com        it      ==> output-it.csv
qscar@tpcptkfuaiod.com          fr      ==> output-fr.csv
qshxvlngi@oxnzjbdpvlwaem.com    gb      ==> output-gb.csv
vztybzbxqq@gahvg.com            us      ==> output-us.csv
...                             ...     ...

Currently my code more or less does this, but instead of adding each email address to the csv, it overwrites the email written before it. Can someone help me out with this?

I am very new to programming and Python, and I might not have written the code in the most Pythonic way, so I would really appreciate any feedback on the code in general!

Thanks in advance!

Code:

import csv

def tsv_to_dict(filename):
    """Creates a reader of a specified .tsv file."""
    with open(filename, 'r') as f:
        reader = csv.reader(f, delimiter='\t') # '\t' implies tab
        email_list = []
        # Checks each list in the reader list and removes empty elements
        for lst in reader:
            email_list.append([elem for elem in lst if elem != '']) # List comprehension
        # Stores the list of lists as a dict
        email_dict = dict(email_list)
    return email_dict

def count_keys(dictionary):
    """Counts the number of entries in a dictionary."""
    return len(dictionary.keys())

def clean_dict(dictionary):
    """Removes all whitespace in keys from specified dictionary."""
    return { k.strip():v for k,v in dictionary.items() } # Dictionary comprehension

def split_emails(dictionary):
    """Splits out all email addresses from dictionary into output csv files by country code."""
    # Creating a list of unique country codes
    cc_list = []
    for v in dictionary.values():
        if not v in cc_list:
            cc_list.append(v)

    # Writing the email addresses to a csv based on the cc (value) in dictionary
    for key, value in dictionary.items():
        for c in cc_list:
            if c == value:
                with open('output-' +str(c) +'.csv', 'w') as f_out:
                    writer = csv.writer(f_out, lineterminator='\r\n')
                    writer.writerow([key])


You can use a defaultdict:

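import csv
from collections import defaultdict

# Maps each country code to the list of email addresses seen for it
emails = defaultdict(list)

with open('email.tsv', 'r') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        # Skip blank rows and rows whose first field is not an email address
        if row and '@' in row[0]:
            emails[row[1].strip()].append(row[0].strip() + '\n')

# Write one output file per country code
for key, values in emails.items():
    with open('output-{}.csv'.format(key), 'w') as f:
        f.writelines(values)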

Because each of your output files holds a single column rather than comma-separated fields, you don't need the csv module to write them; you can simply write the lines directly.
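To illustrate that point, here is a small sketch comparing the two ways of writing a single-column row. It uses an in-memory buffer instead of a real file, and the address is a made-up example; it only covers plain values with no commas or quotes in them, which would need the csv module's quoting.

```python
import csv
import io

address = "someone@example.com"  # hypothetical sample value

# Writing the row through csv.writer
csv_buf = io.StringIO()
csv.writer(csv_buf, lineterminator="\n").writerow([address])

# Writing the same row as a plain line
plain_buf = io.StringIO()
plain_buf.write(address + "\n")

# For a single plain field, both approaches produce the same text
same_output = csv_buf.getvalue() == plain_buf.getvalue()
print(same_output)
```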

The emails dictionary contains a key for each country code, with a list of all the matching email addresses as its value. To make sure the email addresses are written out correctly, we strip any surrounding whitespace and add a line break to each one (this is so we can use writelines later).
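If the grouping step is unclear, here is a minimal sketch of what defaultdict(list) does with a few rows. The rows are invented for illustration and stand in for what the csv reader would yield.

```python
from collections import defaultdict

# Hypothetical sample rows (email, country code), made up for illustration
rows = [
    ("a@example.com ", "us"),
    ("b@example.com", "de"),
    (" c@example.com", "us"),
]

emails = defaultdict(list)
for address, country in rows:
    # Looking up a missing key creates an empty list automatically,
    # so there is no need to check whether the country was seen before
    emails[country.strip()].append(address.strip() + "\n")

print(dict(emails))
```

Each country code ends up with its own list, in the order the addresses were read.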

Once the dictionary is populated, it's simply a matter of stepping through the keys to create the files and then writing out the resulting list for each one.
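As for why the original split_emails overwrote earlier addresses: it calls open(..., 'w') inside the loop, and 'w' mode truncates the file every time it is opened, so each file only ever keeps the last address written. If you would rather keep your dictionary-based structure, a rough sketch of the same grouping idea might look like the following. The out_dir parameter and the returned dictionary of paths are my own additions for illustration, not part of your code.

```python
import os
from collections import defaultdict


def split_emails(dictionary, out_dir="."):
    """Write each email (key) to output-<cc>.csv based on its country code (value)."""
    # Group the emails by country code first
    by_country = defaultdict(list)
    for email, cc in dictionary.items():
        by_country[cc].append(email)

    # Open each output file exactly once, so 'w' mode cannot wipe earlier rows
    paths = {}
    for cc, addresses in by_country.items():
        path = os.path.join(out_dir, "output-{}.csv".format(cc))
        with open(path, "w") as f_out:
            f_out.writelines(address + "\n" for address in addresses)
        paths[cc] = path
    return paths
```

Calling split_emails(clean_dict(tsv_to_dict('email.tsv'))) would then write the files into the current directory. Opening the files in append mode ('a') inside your original loop would also stop the overwriting, but it reopens a file for every address and keeps adding to files left over from earlier runs.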