从Haskell的列表中删除重复项
我试图定义一个将从列表中删除重复项的函数。到目前为止,我有一个工作实现:
I'm trying to define a function which will remove duplicates from a list. So far I have a working implementation:
rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x:xs) | x `elem` xs = rmdups xs
| otherwise = x : rmdups xs
然而,我想重做这个而不使用 ELEM
。什么是最好的方法呢?
However I'd like to rework this without using elem
. What would be the best method for this?
我想用我自己的函数做这个,而不是 nub
或 nubBy
。
I'd like to do this using my own function and not nub
or nubBy
.
我不认为你会无需 elem
(或者你自己重新实现它)。
I don't think you'll be able to do it without elem
(or your own re-implementation of it).
然而,你的实现的语义问题。当元素被复制时,您保留 last 一个。就我个人而言,我希望它保留第一个重复的项目,并放弃其余的。
However, there is a semantic issue with your implementation. When elements are duplicated you're keeping the last one. Personally, I'd expect it to keep the first duplicate item and drop the rest.
*Main> rmdups "abacd"
"bacd"
解决方案是将'seen'元素通过作为状态变量。
The solution is to thread the 'seen' elements through as a state variable.
removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = rdHelper []
where rdHelper seen [] = seen
rdHelper seen (x:xs)
| x `elem` seen = rdHelper seen xs
| otherwise = rdHelper (seen ++ [x]) xs
这或多或少是 nub 在标准库中实现(阅读源代码这里)。 nub
的实现中的小差别确保它是非严格,而上面的 removeDuplicates
是严格的(它在返回之前会占用整个列表)。
This is more-or-less how nub
is implemented in the standard library (read the source here). The small difference in nub
's implementation ensures that it is non-strict, while removeDuplicates
above is strict (it consumes the entire list before returning).
原始递归如果你不担心严格的话,这里实际上是过度杀伤力。 removeDuplicates
可以用 foldl
:
Primitive recursion is actually overkill here, if you're not worried about strictness. removeDuplicates
can be implemented in one line with foldl
:
removeDuplicates2 = foldl (\seen x -> if x `elem` seen
then seen
else seen ++ [x]) []