Lazy evaluation and time complexity

Problem description:

I was looking at the Stack Overflow question Non-Trivial Lazy Evaluation, which led me to Keegan McAllister's presentation: Why learn Haskell. In slide 8, he shows the minimum function, defined as:

minimum = head . sort

and states that its complexity is O(n). I don't understand why the complexity is said to be linear, if comparison-based sorting is O(n log n). The sort referred to in the post can't be linear, as it makes no assumptions about the data, as would be required by linear sorting methods such as counting sort.

Is lazy evaluation playing a mysterious role here? If so, what is the explanation behind it?

In minimum = head . sort, the sort won't be done fully, because it won't be done upfront. The sort will only be done as much as needed to produce the very first element, demanded by head.

In mergesort, for example, the list's n numbers are first compared pairwise, then the winners are paired up and compared (n/2 numbers), then the new winners (n/4), and so on. In all, that's O(n) comparisons to produce the minimal element.

-- bottom-up mergesort: start from singleton runs, repeatedly merge
-- adjacent runs pairwise until a single sorted run remains
mergesortBy less [] = []
mergesortBy less xs = head $ until (null.tail) pairs [[x] | x <- xs]
  where
    pairs (x:y:t) = merge x y : pairs t
    pairs xs      = xs
    merge (x:xs) (y:ys) | less y x  = y : merge (x:xs) ys
                        | otherwise = x : merge  xs (y:ys)
    merge  xs     []                = xs
    merge  []     ys                = ys
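For illustration, the slide's definition can be used as-is (minimum' is a hypothetical name; this uses GHC's Data.List.sort, which is likewise an incremental bottom-up mergesort, so head forces only O(n) comparisons):

```haskell
import Data.List (sort)

-- the slide's definition: `head` demands only as much of the sort
-- as is needed to surface the least element
minimum' :: Ord a => [a] -> a
minimum' = head . sort
```

For example, `minimum' [3,1,4,1,5,9,2,6]` evaluates to `1` without the full sort ever being completed.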

The above code can be augmented to tag each number it produces with the number of comparisons that went into its production:

mgsort xs = go $ map ((,) 0) xs  where
  go [] = []
  go xs = head $ until (null.tail) pairs [[x] | x <- xs]   where
    ....
    merge ((a,b):xs) ((c,d):ys) 
            | (d < b)   = (a+c+1,d) : merge ((a+1,b):xs) ys    -- cumulative
            | otherwise = (a+c+1,b) : merge  xs ((c+1,d):ys)   --   cost
    ....

g n = concat [[a,b] | (a,b) <- zip [1,3..n] [n,n-2..1]]   -- a little scrambler
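Since the snippet above elides parts with `....`, here is one complete, runnable rendering of this cumulative-cost sort (mgsortFull is a hypothetical name; the skeleton simply mirrors mergesortBy from earlier):

```haskell
-- cumulative-cost sort: each produced pair (cost, value) carries the
-- total number of comparisons that went into producing that value
mgsortFull :: Ord a => [a] -> [(Int, a)]
mgsortFull xs = go (map ((,) 0) xs)
  where
    go []  = []
    go ys  = head $ until (null . tail) pairs [[y] | y <- ys]
    pairs (x:y:t) = merge x y : pairs t
    pairs t       = t
    merge ((a,b):xs') ((c,d):ys')
        | d < b     = (a+c+1, d) : merge ((a+1,b):xs') ys'   -- cumulative
        | otherwise = (a+c+1, b) : merge xs' ((c+1,d):ys')   --   cost
    merge xs' []    = xs'
    merge []  ys'   = ys'
```

On an 8-element list the head's tag comes out to 7, i.e. n - 1 comparisons, in line with the measurements below.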

Running it for several list lengths we see that it is indeed ~ n:

*Main> map (fst . head . mgsort . g) [10, 20, 40, 80, 160, 1600]
[9,19,39,79,159,1599]

To see whether the sorting code itself is ~ n log n, we change it so that each produced number carries along just its own cost, and the total cost is then found by summation over the whole sorted list:

    merge ((a,b):xs) ((c,d):ys) 
            | (d < b)   = (c+1,d) : merge ((a+1,b):xs) ys      -- individual
            | otherwise = (a+1,b) : merge  xs ((c+1,d):ys)     --   cost
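A complete, runnable rendering of this individual-cost variant (again a sketch; mgsortIndiv is a hypothetical name) could be:

```haskell
-- individual-cost sort: each pair (cost, value) carries only the
-- comparisons charged to that value, so summing the tags over the
-- fully sorted list yields the total cost of the sort
mgsortIndiv :: Ord a => [a] -> [(Int, a)]
mgsortIndiv xs = go (map ((,) 0) xs)
  where
    go []  = []
    go ys  = head $ until (null . tail) pairs [[y] | y <- ys]
    pairs (x:y:t) = merge x y : pairs t
    pairs t       = t
    merge ((a,b):xs') ((c,d):ys')
        | d < b     = (c+1, d) : merge ((a+1,b):xs') ys'     -- individual
        | otherwise = (a+1, b) : merge xs' ((c+1,d):ys')     --   cost
    merge xs' []    = xs'
    merge []  ys'   = ys'
```

Note that each comparison is charged to both participants, so the sum is twice the number of comparisons performed; a constant factor that does not affect the order of growth.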

Here are the results for lists of various lengths,

*Main> let xs = map (sum . map fst . mgsort . g) [20, 40, 80, 160, 320, 640]
*Main> xs
[138,342,810,1866,4218,9402]

*Main> map (logBase 2) $ zipWith (/) (tail xs) xs
[1.309328,1.2439256,1.2039552,1.1766101,1.1564085]

The above shows empirical orders of growth for increasing lengths of list, n, which are rapidly diminishing, as is typically exhibited by ~ n log n computations. See also this blog post. Here's a quick correlation check:

*Main> let xs = [n*log n | n<- [20, 40, 80, 160, 320, 640]] in 
                                    map (logBase 2) $ zipWith (/) (tail xs) xs
[1.3002739,1.2484156,1.211859,1.1846942,1.1637106]

edit: Lazy evaluation can metaphorically be seen as a kind of producer/consumer idiom1, with an independent memoizing storage as an intermediary. Any productive definition we write defines a producer, which will produce its output bit by bit, as and when demanded by its consumer(s) - but not sooner. Whatever is produced is memoized, so that if another consumer consumes the same output at a different pace, it accesses the same storage, filled previously.

When no more consumers remain that refer to a piece of storage, it gets garbage collected. Sometimes, with optimizations, the compiler is able to do away with the intermediate storage completely, cutting out the middle man.
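As a tiny illustration of this sharing (a sketch with made-up names; Debug.Trace is used only to make production visible), a producer consumed by two consumers produces each element exactly once:

```haskell
import Debug.Trace (trace)

-- a toy producer: each element announces (on stderr) when it is
-- actually produced; the list cells themselves act as the memoizing
-- storage shared by all consumers
squares :: [Int]
squares = map (\i -> trace ("producing " ++ show i) (i * i)) [1 .. 3]

demo :: IO ()
demo = do
  print (head squares)  -- demands only the first element
  print (sum squares)   -- produces the remaining two; the first is reused
```

Running demo, "producing 1" appears only once on stderr, even though both head and sum walk the same list.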

1 see also: Simple Generators v. Lazy Evaluation by Oleg Kiselyov, Simon Peyton-Jones and Amr Sabry.