Is shelve really slow and memory-hungry, or am I doing something wrong?

Problem description:

I'm trying to write a program that uses a shelve database with sorted letters as keys and, as values, lists of the words that can be formed from those letters. E.g.:

db['mnoo'] = ['moon', 'mono']
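A key like `'mnoo'` is just a word's letters in sorted order. The question's code calls `anagrams_from_list`, which is not shown, so the following is a hypothetical sketch of how such a mapping could be built, assuming the input is an iterable of words:

```python
from collections import defaultdict


def anagrams_from_list(words):
    """Hypothetical helper: map each sorted-letter key to the words built from it."""
    groups = defaultdict(list)
    for word in words:
        key = ''.join(sorted(word))  # 'moon' -> 'mnoo'
        groups[key].append(word)
    return dict(groups)


print(anagrams_from_list(['moon', 'mono', 'star', 'rats']))
# {'mnoo': ['moon', 'mono'], 'arst': ['star', 'rats']}
```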

So I wrote a function that takes a filename and loads its contents into a shelve. The first part, which turns the file into a dictionary with the same layout as the shelve, works fine, but the shelve part takes a really long time.

I'm trying it with a dictionary of ~100k entries, each value being a list. It seems to take 15-20 seconds for every 1000 entries, and each entry seems to take ~1 kB of space. Is this normal?
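For scale, extrapolating the reported rates to the full dictionary (rough arithmetic, taking 1 kB as 1000 bytes):

```latex
\begin{aligned}
\text{time}  &\approx \frac{100\,000}{1000} \times (15 \text{ to } 20)\ \text{s} = 1500 \text{ to } 2000\ \text{s} \approx 25 \text{ to } 33\ \text{min},\\
\text{space} &\approx 100\,000 \times 1\ \text{kB} \approx 100\ \text{MB}.
\end{aligned}
```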
The code:

import shelve

def save_to_db(filename, shelve_in='anagram_db'):
    # anagrams_from_list() and process_file() are defined elsewhere (not shown)
    dct = anagrams_from_list(process_file(filename))

    with shelve.open(shelve_in, 'c') as db:
        for key, wordlist in dct.items():
            if key not in db:
                db[key] = wordlist
            else:
                db[key].extend(wordlist)
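A related detail about the `else` branch above: by default a `shelve.Shelf` unpickles a fresh copy on every `db[key]` read, so calling `.extend()` on that copy is not saved. This standard-library behaviour can be checked with a throwaway shelf in a temporary directory (the file name `demo` and the helper `extend_in_place` are arbitrary choices for this demonstration):

```python
import os
import shelve
import tempfile


def extend_in_place(writeback):
    """Store a list, extend it via db[key], reopen the shelf, and return what was persisted."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'demo')
        with shelve.open(path, 'c', writeback=writeback) as db:
            db['mnoo'] = ['moon']
            db['mnoo'].extend(['mono'])  # mutates whatever object db['mnoo'] returned
        with shelve.open(path, 'r') as db:
            return db['mnoo']


print(extend_in_place(writeback=False))  # ['moon']: the extend was applied to a temporary copy
print(extend_in_place(writeback=True))   # ['moon', 'mono']: the cached entry is written back on close
```

`writeback=True` makes in-place edits stick, but it caches every accessed entry until `sync()` or `close()`, which the `shelve` docs note can use a lot of memory and make closing slow.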

Edit: a quick clarification: each list in the dict is only about 1-3 words long, so it shouldn't be too large.

First -- yes, shelve's default pickle backend is slow and inefficient, and your best choice is to use something different.
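`shelve` stores its pickled values in whichever `dbm` module is available, and the fallback `dbm.dumb` is described in the Python docs as not written for speed. A quick way to see which backend a shelf file actually uses is `dbm.whichdb`. This sketch creates a throwaway shelf to demonstrate; in practice you would pass the path of your own shelf (e.g. `anagram_db`):

```python
import dbm
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'anagram_db')
    with shelve.open(path, 'c') as db:
        db['mnoo'] = ['moon', 'mono']
    # whichdb() inspects the files on disk and returns a module name such as
    # 'dbm.gnu' or 'dbm.dumb'; the result depends on this Python build
    backend = dbm.whichdb(path)
    print(backend)
```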

Second -- you're making it worse by editing entries once they're there, rather than getting them into their final state in-memory before serializing them only once.

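import shelve

# inside save_to_db(filename, shelve_in) from the question
dct = anagrams_from_list(process_file(filename))

# Merge everything in memory first...
content = {}
for key, wordlist in dct.items():
    if key not in content:
        content[key] = list(wordlist)
    else:
        content[key].extend(wordlist)

# ...then write each key to the shelf exactly once.
with shelve.open(shelve_in, 'c') as db:
    for k, v in content.items():
        db[k] = v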

If you want an efficient database, I'd look elsewhere. tokyocabinet, kyotocabinet, SQLite, BDB; the options are numerous.
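As one illustration of those alternatives, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names, the helpers `save_anagrams`/`lookup`, and storing each word list as JSON text are assumptions made for this example. All rows go in through one `executemany` inside a single transaction, so the database commits once rather than after every key:

```python
import json
import sqlite3


def save_anagrams(conn, anagrams):
    """Write a {sorted_letters: [words]} dict in one transaction."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS anagrams (letters TEXT PRIMARY KEY, words TEXT NOT NULL)'
    )
    with conn:  # commits once at the end, or rolls back on error
        conn.executemany(
            'INSERT OR REPLACE INTO anagrams (letters, words) VALUES (?, ?)',
            ((key, json.dumps(words)) for key, words in anagrams.items()),
        )


def lookup(conn, letters):
    """Return the word list stored for a sorted-letter key, or [] if absent."""
    row = conn.execute(
        'SELECT words FROM anagrams WHERE letters = ?', (letters,)
    ).fetchone()
    return json.loads(row[0]) if row else []


conn = sqlite3.connect(':memory:')  # pass a file path instead to persist the data
save_anagrams(conn, {'mnoo': ['moon', 'mono'], 'arst': ['star', 'rats']})
print(lookup(conn, 'mnoo'))  # ['moon', 'mono']
print(lookup(conn, 'xyz'))   # []
```

Keeping one JSON-encoded row per key mirrors the shelf's layout; a table with one row per word would also work if individual words ever need to be queried.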