How do I remove duplicates from a list while preserving order?

Question:

Is there a built-in that removes duplicates from a list in Python, whilst preserving order? I know that I can use a set to remove duplicates, but that destroys the original order. I also know that I can roll my own like this:

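def uniq(seq):
    # Keep the first occurrence of each item, in its original order.
    # `x not in output` scans the whole list, so this is O(n^2) overall.
    output = []
    for x in seq:
        if x not in output:
            output.append(x)
    return output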
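The loop above is quadratic because `x not in output` is a linear scan of a list. Here is a minimal sketch of the same loop with a set tracking what has already been seen, assuming every element is hashable (the name `uniq_hashable` is hypothetical):

```python
def uniq_hashable(seq):
    """Return the items of seq in first-seen order, without duplicates.

    Assumes every item is hashable: membership tests against a set are
    O(1) on average, instead of O(n) for a list.
    """
    seen = set()
    output = []
    for x in seq:
        if x not in seen:
            seen.add(x)
            output.append(x)
    return output


print(uniq_hashable([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

The list still determines the output order. The set only speeds up the "have I seen this?" check.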

(Thanks to unwind for this code sample.)

But I'd like to avail myself of a built-in or a more Pythonic idiom if possible.
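One built-in idiom that fits this request is not part of the original answer. Since Python 3.7, dicts are guaranteed to preserve insertion order, so `dict.fromkeys` drops duplicates while keeping first occurrences. This sketch assumes Python 3.7+ and hashable elements (the wrapper name `uniq_builtin` is hypothetical):

```python
def uniq_builtin(seq):
    # dict keys are unique and, from Python 3.7 on, keep insertion order,
    # so the first occurrence of each item determines its position.
    return list(dict.fromkeys(seq))


print(uniq_builtin(["b", "a", "b", "c", "a"]))  # ['b', 'a', 'c']
```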

Related question: In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?

Answer

Here are some alternatives: http://www.peterbe.com/plog/uniqifiers-benchmark

Fastest one:

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]
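Here is a quick usage sketch of `f7`, copied from above so the block runs on its own. It accepts any iterable, including a generator. Every element must be hashable, though, because each one is tested against a set:

```python
def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]


print(f7([1, 2, 1, 3, 2]))           # [1, 2, 3]
print(f7(c for c in "mississippi"))  # ['m', 'i', 's', 'p']

try:
    f7([[1], [1]])                   # lists are unhashable
except TypeError as exc:
    print("TypeError:", exc)
```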

Why assign seen.add to seen_add instead of just calling seen.add? Python is a dynamic language, and resolving seen.add on each iteration is more costly than resolving a local variable. seen.add could have changed between iterations, and the runtime isn't smart enough to rule that out. To play it safe, it has to check the object each time.
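You can measure the effect of the `seen_add` alias on your own interpreter with the standard `timeit` module. The function names here are hypothetical, and the timings depend heavily on the Python version and the input. Treat this as a measurement sketch, not a result:

```python
import timeit


def with_alias(seq):
    seen = set()
    seen_add = seen.add  # bound method looked up once and stored in a name
    return [x for x in seq if not (x in seen or seen_add(x))]


def without_alias(seq):
    seen = set()
    # seen.add is resolved as an attribute on every call
    return [x for x in seq if not (x in seen or seen.add(x))]


data = list(range(1000)) * 10

# Both variants must agree before comparing their speed means anything.
assert with_alias(data) == without_alias(data)

for fn in (with_alias, without_alias):
    t = timeit.timeit(lambda: fn(data), number=100)
    print(f"{fn.__name__}: {t:.3f} s")
```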

If you plan on using this function a lot on the same dataset, perhaps you would be better off with an ordered set: http://code.activestate.com/recipes/528878/

O(1) insertion, deletion and membership check per operation.
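This is not the linked recipe. It is a minimal sketch of the same idea, assuming Python 3.7+ (where `dict` preserves insertion order). The ordered set is built on a dict's keys, which gives average O(1) add, discard and membership checks. The class name `OrderedSet` and its layout are illustrative and do not reproduce the recipe's API:

```python
from collections.abc import MutableSet


class OrderedSet(MutableSet):
    """Set that remembers insertion order (hypothetical minimal sketch)."""

    def __init__(self, iterable=()):
        # The dict's keys are the elements; the values are unused.
        self._items = dict.fromkeys(iterable)

    def __contains__(self, item):
        return item in self._items      # average O(1)

    def __iter__(self):
        return iter(self._items)        # yields items in insertion order

    def __len__(self):
        return len(self._items)

    def add(self, item):
        self._items[item] = None        # existing keys keep their position

    def discard(self, item):
        self._items.pop(item, None)     # average O(1); missing items ignored

    def __repr__(self):
        return f"{type(self).__name__}({list(self._items)!r})"


s = OrderedSet([3, 1, 3, 2, 1])
print(s)        # OrderedSet([3, 1, 2])
s.discard(1)
s.add(4)
print(list(s))  # [3, 2, 4]
```

Subclassing `collections.abc.MutableSet` supplies the remaining set operations (`|`, `&`, `remove`, `pop`, ...) from these five methods.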

(Small additional note: seen.add() always returns None, so the or above is there only as a way to attempt a set update, and not as an integral part of the logical test.)
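Here is a small sketch of how the condition `not (x in seen or seen_add(x))` evaluates for a new item versus a repeat. The helper name `keep` is hypothetical:

```python
seen = set()
seen_add = seen.add

print(seen_add(1))  # None: set.add mutates the set and returns None


def keep(x):
    # If x is already in `seen`, `or` short-circuits to True, and `not`
    # turns that into False, so x is dropped.
    # Otherwise seen_add(x) runs (recording x) and returns None, which is
    # falsy, so `not None` is True and x is kept.
    return not (x in seen or seen_add(x))


print(keep(2))  # True: 2 is new and gets added to seen
print(keep(2))  # False: 2 was just added
print(keep(1))  # False: 1 was added above
```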
