Does Haskell have tail-recursive optimization?
I discovered the "time" command in Unix today and thought I'd use it to check the difference in runtimes between tail-recursive and normal recursive functions in Haskell.
I wrote the following functions:
-- tail recursive
fac :: (Integral a) => a -> a
fac x = fac' x 1 where
    fac' 1 y = y
    fac' x y = fac' (x-1) (x*y)
--normal recursive
facSlow :: (Integral a) => a -> a
facSlow 1 = 1
facSlow x = x * facSlow (x-1)
These are valid, keeping in mind they were solely for use with this project, so I didn't bother to check for zero or negative numbers.
However, upon writing a main method for each, compiling them, and running them with the "time" command, both had similar runtimes, with the normal recursive function edging out the tail-recursive one. This is contrary to what I'd heard with regard to tail-recursive optimization in Lisp. What's the reason for this?
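For concreteness, the driver was along these lines (a sketch only; the exact main and input size used aren't reproduced here). Put it in the same file as the fac definition above (or swap in facSlow for the other run), compile with ghc, and run under time:
-- Minimal timing driver (sketch): assumes the `fac` definition above is in
-- the same module; the input 50000 is an arbitrary choice.
-- Build with `ghc Main.hs` and run as `time ./Main`.
main :: IO ()
main = print (fac 50000 :: Integer)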
Haskell uses lazy evaluation to implement recursion, so it treats anything as a promise to provide a value when needed (this is called a thunk). Thunks get reduced only as much as necessary to proceed, no more. This resembles the way you simplify an expression mathematically, so it's helpful to think of it that way. The fact that evaluation order is not specified by your code allows the compiler to do lots of even cleverer optimisations than just the tail-call elimination you're used to. Compile with -O2 if you want optimisation!
Let's see how we evaluate facSlow 5 as a case study:
facSlow 5
5 * facSlow 4 -- Note that the `5-1` only got evaluated to 4
5 * (4 * facSlow 3) -- because it has to be checked against 1 to see
5 * (4 * (3 * facSlow 2)) -- which definition of `facSlow` to apply.
5 * (4 * (3 * (2 * facSlow 1)))
5 * (4 * (3 * (2 * 1)))
5 * (4 * (3 * 2))
5 * (4 * 6)
5 * 24
120
So just as you worried, we have a build-up of numbers before any calculations happen, but, contrary to your worry, there's no stack of facSlow function calls hanging around waiting to terminate - each reduction is applied and goes away, leaving a stack frame in its wake (that is because (*) is strict and so triggers the evaluation of its second argument).
Haskell's recursive functions aren't evaluated in a very recursive way! The only stack of calls hanging around are the multiplications themselves. If (*) is viewed as a strict data constructor, this is what's known as guarded recursion (although it is usually referred to as such with non-strict data constructors, where what's left in its wake are the data constructors - only forced when accessed further).
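As an illustrative sketch of guarded recursion with a genuinely lazy constructor (my own example, not part of the original answer): the recursive call sits behind (:), so a consumer can take each element before the next recursive call is ever forced.
-- Guarded recursion: the recursive call is guarded by the (:) constructor,
-- so the list is produced incrementally rather than via a stack of pending calls.
countdown :: Int -> [Int]
countdown 0 = []
countdown n = n : countdown (n - 1)
-- e.g. `take 3 (countdown 1000000)` forces only the first three steps.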
Now let's look at the tail-recursive fac 5:
fac 5
fac' 5 1
fac' 4 {5*1} -- Note that the `5-1` only got evaluated to 4
fac' 3 {4*{5*1}} -- because it has to be checked against 1 to see
fac' 2 {3*{4*{5*1}}} -- which definition of `fac'` to apply.
fac' 1 {2*{3*{4*{5*1}}}}
{2*{3*{4*{5*1}}}} -- the thunk "{...}"
(2*{3*{4*{5*1}}}) -- is retraced
(2*(3*{4*{5*1}})) -- to create
(2*(3*(4*{5*1}))) -- the computation
(2*(3*(4*(5*1)))) -- on the stack
(2*(3*(4*5)))
(2*(3*20))
(2*60)
120
So you can see how tail recursion by itself hasn't saved you any time or space. Not only does it take more steps overall than facSlow 5, it also builds a nested thunk (shown here as {...}) - needing extra space for it - which describes the future computation, the nested multiplications to be performed.
This thunk is then unraveled by traversing it to the bottom, recreating the computation on the stack. There is also a danger here of causing a stack overflow with very long computations, for both versions.
If we want to hand-optimise this, all we need to do is make it strict. You could use the strict application operator $! to define
facSlim :: (Integral a) => a -> a
facSlim x = facS' x 1 where
    facS' 1 y = y
    facS' x y = facS' (x-1) $! (x*y)
This forces facS' to be strict in its second argument. (It's already strict in its first argument, because that has to be evaluated to decide which definition of facS' to apply.)
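An equivalent way to express the same strictness (a variant added here for comparison, not from the original answer) is a bang pattern on the accumulator, enabled by GHC's BangPatterns extension:
{-# LANGUAGE BangPatterns #-}
-- Same idea as facSlim, but the strictness is declared on the accumulator
-- pattern itself instead of using ($!) at the call site.
facBang :: (Integral a) => a -> a
facBang x = go x 1 where
    go 1 !acc = acc
    go n !acc = go (n - 1) (n * acc)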
Sometimes strictness can help enormously, sometimes it's a big mistake because laziness is more efficient. Here it's a good idea:
facSlim 5
facS' 5 1
facS' 4 5
facS' 3 20
facS' 2 60
facS' 1 120
120
Which is what you wanted to achieve, I think.
- If you want to optimise your code, step one is to compile with -O2
- Tail recursion is only good when there's no thunk build-up, and adding strictness usually helps to prevent it, if and where appropriate. This happens when you're building a result that is needed later on all at once.
- Sometimes tail recursion is a bad plan and guarded recursion is a better fit, i.e. when the result you're building will be needed bit by bit, in portions. See this question about foldr and foldl for example, and test them against each other.
Try these two:
length $ foldl1 (++) $ replicate 1000 "The size of intermediate expressions is more important than tail recursion."
length $ foldr1 (++) $ replicate 1000 "The number of reductions performed is more important than tail recursion!!!"
foldl1 is tail recursive, whereas foldr1 performs guarded recursion, so that the first item is immediately presented for further processing/access. (The first "parenthesizes" to the left all at once, (...((s+s)+s)+...)+s, forcing its input list fully to its end and building a big thunk of future computation much sooner than its full result is needed; the second parenthesizes to the right gradually, s+(s+(...+(s+s)...)), consuming the input list bit by bit, so the whole thing is able to operate in constant space, with optimizations.)
You might need to adjust the number of zeros depending on what hardware you're using.
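As a closing aside (an addition here, not part of the original answer), the standard library already packages the "tail recursion plus strict accumulator" pattern as foldl' from Data.List, which is usually the right default when the whole result really is needed at once:
import Data.List (foldl')
-- foldl' forces the accumulator at every step, so this runs in constant space,
-- unlike plain foldl, which would first build a chain of (+) thunks.
sumStrict :: [Integer] -> Integer
sumStrict = foldl' (+) 0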