Best and/or fastest way to create lists in Python

Question:

In Python, as far as I know, there are at least three or four ways to create and initialize a list of a given size:

Simple loop with append:

my_list = []
for i in range(50):
    my_list.append(0)

Simple loop with +=:

my_list = []
for i in range(50):
    my_list += [0]

List comprehension:

my_list = [0 for i in range(50)]

List and integer multiplication:

my_list = [0] * 50

In these examples I don't think there would be any performance difference, given that the lists have only 50 elements. But what if I need a list of a million elements? Would using xrange make any improvement? Which is the preferred/fastest way to create and initialize lists in Python?

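Let's run some timing tests* with timeit.timeit (each figure is the total time in seconds for timeit's default of 1,000,000 executions of the statement):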

>>> from timeit import timeit
>>>
>>> # Test 1
>>> test = """
... my_list = []
... for i in xrange(50):
...     my_list.append(0)
... """
>>> timeit(test)
22.384258893239178
>>>
>>> # Test 2
>>> test = """
... my_list = []
... for i in xrange(50):
...     my_list += [0]
... """
>>> timeit(test)
34.494779364416445
>>>
>>> # Test 3
>>> test = "my_list = [0 for i in xrange(50)]"
>>> timeit(test)
9.490926919482774
>>>
>>> # Test 4
>>> test = "my_list = [0] * 50"
>>> timeit(test)
1.5340533503559755
>>>

As you can see above, the last method is by far the fastest: in this run it is roughly 6 times faster than the list comprehension and about 15 times faster than the append loop.
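To address the million-element case from the question, here is a minimal sketch of the same comparison for Python 3, where range is already lazy and xrange does not exist. The names N, RUNS, candidates and timings are only illustrative, and the run count is kept small so the script finishes quickly; absolute numbers will differ between machines and interpreter versions.

```python
from timeit import timeit

N = 1_000_000  # list size asked about in the question
RUNS = 3       # illustrative; raise it for more stable numbers

# The four approaches from the question, written for Python 3.
candidates = {
    "append loop": "my_list = []\nfor i in range(N):\n    my_list.append(0)",
    "+= loop": "my_list = []\nfor i in range(N):\n    my_list += [0]",
    "comprehension": "my_list = [0 for i in range(N)]",
    "multiplication": "my_list = [0] * N",
}

# timeit returns the total time in seconds for `number` executions.
timings = {
    name: timeit(stmt, globals={"N": N}, number=RUNS)
    for name, stmt in candidates.items()
}

# Print the approaches from fastest to slowest.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:15s} {seconds:.4f} s for {RUNS} runs")
```

Passing N through the globals argument keeps the statements identical to what you would write in real code, instead of formatting the size into each string.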

However, it should only be used with immutable items (such as integers), because it creates a list whose elements are all references to the same object.

Here is a demonstration:

>>> lst = [[]] * 3
>>> lst
[[], [], []]
>>> # The ids of the items in `lst` are the same
>>> id(lst[0])
28734408
>>> id(lst[1])
28734408
>>> id(lst[2])
28734408
>>>

This behavior is usually undesirable and can lead to bugs in the code.
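A short sketch of how such a bug typically shows up (the variable names rows and all_same are only illustrative): mutating what looks like one inner list changes every slot, because all slots refer to the same list object.

```python
# Three references to ONE inner list, not three separate lists.
rows = [[]] * 3

# Appending through the first slot...
rows[0].append("x")

# ...is visible through every slot.
print(rows)  # [['x'], ['x'], ['x']]

# Identity check: every element is the very same object.
all_same = all(row is rows[0] for row in rows)
print(all_same)  # True
```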

If you have mutable items (such as lists), use a list comprehension instead, which is still very fast:

>>> lst = [[] for _ in xrange(3)]
>>> lst
[[], [], []]
>>> # The ids of the items in `lst` are different
>>> id(lst[0])
28796688
>>> id(lst[1])
28796648
>>> id(lst[2])
28736168
>>>
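As a sketch of how the two techniques can be combined (written for Python 3; the names rows and grid are only illustrative), a comprehension can build the outer list while multiplication fills each inner list with immutable values. Each inner list is then a distinct object.

```python
# Independent inner lists: the comprehension evaluates [] once per iteration.
rows = [[] for _ in range(3)]
rows[0].append("x")
print(rows)  # [['x'], [], []]

# A 2 x 4 grid of zeros: multiplication is safe for the immutable ints,
# and the comprehension creates a fresh row each time.
grid = [[0] * 4 for _ in range(2)]
grid[0][1] = 7
print(grid)  # [[0, 7, 0, 0], [0, 0, 0, 0]]
```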


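*Note: In all of the tests, I replaced range with xrange. In Python 2, range builds the whole list up front, whereas xrange produces its values lazily, so xrange uses less memory and is usually faster in a loop like this. (Strictly speaking, xrange returns a lazy sequence object rather than an iterator. In Python 3, xrange no longer exists and range itself behaves this way.)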
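To illustrate the laziness point in Python 3 terms, here is a small sketch comparing the size of a range object with the size of the fully built list. The names lazy, eager, lazy_size and eager_size are only illustrative, and sys.getsizeof reports shallow sizes that depend on the interpreter, so no specific numbers are claimed.

```python
import sys

# In Python 3, range is lazy: it stores only start, stop and step.
lazy = range(1_000_000)

# Materializing it builds a list with one slot per element.
eager = list(lazy)

lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)

print("range object:", lazy_size, "bytes")
print("list object: ", eager_size, "bytes")
```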