List vs generator comprehension speed with join function

Question:

So I got these examples from the official documentation. https://docs.python.org/2/library/timeit.html

What exactly makes the first example (generator expression) slower than the second (list comprehension)?

>>> import timeit
>>> timeit.timeit('"-".join(str(n) for n in range(100))', number=10000)
0.8187260627746582
>>> timeit.timeit('"-".join([str(n) for n in range(100)])', number=10000)
0.7288308143615723

Answer:

The str.join method converts its iterable parameter to a list if it's not a list or tuple already. This lets the joining logic iterate over the items multiple times (it makes one pass to calculate the size of the result string, then a second pass to actually copy the data).
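
The two passes described above can be sketched in pure Python. This is only an illustration of the idea, not the actual CPython code: `join_sketch` is a hypothetical name, the real implementation works on raw character buffers in C, and it has fast paths that are left out here.

```python
def join_sketch(separator, iterable):
    """Illustrative two-pass join; a sketch, not CPython's implementation."""
    # A generator can only be consumed once, so anything that is not already
    # a list or tuple is materialized before the two passes below.
    items = iterable if isinstance(iterable, (list, tuple)) else list(iterable)

    # Pass 1: compute the size of the result.
    total = sum(len(item) for item in items)
    if items:
        total += len(separator) * (len(items) - 1)

    # Pass 2: copy each piece into a buffer allocated once at its final size.
    buffer = [""] * total
    pos = 0
    for index, item in enumerate(items):
        if index:
            buffer[pos:pos + len(separator)] = separator
            pos += len(separator)
        buffer[pos:pos + len(item)] = item
        pos += len(item)

    # Stand-in for handing the filled C buffer back as a str object.
    return "".join(buffer)
```

Knowing the total size up front is what allows a single allocation; without the first pass, the result would have to grow as pieces arrive.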

You can see this in the CPython source code:

PyObject *
PyUnicode_Join(PyObject *separator, PyObject *seq)
{
    /* lots of variable declarations at the start of the function omitted */

    fseq = PySequence_Fast(seq, "can only join an iterable");

    /* ... */
}

The PySequence_Fast function in the C API does just what I described. It converts an arbitrary iterable into a list (essentially by calling list on it), unless it already is a list or tuple.
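
A rough Python equivalent of that behavior, as a sketch only: `sequence_fast_sketch` is a hypothetical name, and the exact-type check is my reading of the C function rather than something stated above.

```python
def sequence_fast_sketch(obj, message):
    """Approximate, in Python, what PySequence_Fast does in C (a sketch)."""
    # Lists and tuples are returned unchanged, with no copy.
    # Assumption: the check is on the exact type, so subclasses are copied.
    if type(obj) is list or type(obj) is tuple:
        return obj
    try:
        iterator = iter(obj)
    except TypeError:
        # Non-iterable arguments get the caller-supplied error message.
        raise TypeError(message) from None
    # Everything else (generators, sets, dict views, ...) is copied into a list.
    return list(iterator)
```

For example, `sequence_fast_sketch((str(n) for n in range(3)), "can only join an iterable")` builds the list `['0', '1', '2']`, while passing a list returns that same list object.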

The conversion of the generator expression to a list means that the usual benefits of generators (a smaller memory footprint and the potential for short-circuiting) don't apply to str.join, and so the (small) additional overhead that the generator has makes its performance worse.
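
To reproduce the comparison, a small script along these lines may help. It is a sketch written for Python 3; the function names are made up for illustration, and the absolute numbers (and possibly the size of the gap) will vary with the machine and the Python version.

```python
import timeit


def join_generator():
    # str.join has to turn the generator into a list before joining.
    return "-".join(str(n) for n in range(100))


def join_list():
    # The list is built directly, and str.join uses it as-is.
    return "-".join([str(n) for n in range(100)])


def time_variants(number=10000):
    """Return the total seconds for `number` calls of each variant."""
    return {
        "generator expression": timeit.timeit(join_generator, number=number),
        "list comprehension": timeit.timeit(join_list, number=number),
    }
```

Calling `time_variants()` and printing the resulting dictionary shows both timings side by side. Both functions return the same string, so any difference comes from how the items are produced, not from the join itself.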
