从Haskell的列表中删除重复项

问题描述：

我试图定义一个将从列表中删除重复项的函数。到目前为止，我有一个工作实现：

I'm trying to define a function which will remove duplicates from a list. So far I have a working implementation:

rmdups :: Eq a => [a] -> [a]
rmdups [] = []
rmdups (x:xs)   | x `elem` xs   = rmdups xs
                | otherwise     = x : rmdups xs

然而，我想重做这个而不使用 ELEM 。什么是最好的方法呢？

However I'd like to rework this without using elem. What would be the best method for this?

我想用我自己的函数做这个，而不是 nub 或 nubBy 。

I'd like to do this using my own function and not nub or nubBy.

答

我不认为你会无需 elem （或者你自己重新实现它）。

I don't think you'll be able to do it without elem (or your own re-implementation of it).

然而，你的实现的语义问题。当元素被复制时，您保留 last 一个。就我个人而言，我希望它保留第一个重复的项目，并放弃其余的。

However, there is a semantic issue with your implementation. When elements are duplicated you're keeping the last one. Personally, I'd expect it to keep the first duplicate item and drop the rest.

*Main> rmdups "abacd"
"bacd"

解决方案是将'seen'元素通过作为状态变量。

The solution is to thread the 'seen' elements through as a state variable.

removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = rdHelper []
    where rdHelper seen [] = seen
          rdHelper seen (x:xs)
              | x `elem` seen = rdHelper seen xs
              | otherwise = rdHelper (seen ++ [x]) xs

这或多或少是 nub 在标准库中实现（阅读源代码这里）。 nub 的实现中的小差别确保它是非严格，而上面的 removeDuplicates 是严格的（它在返回之前会占用整个列表）。

This is more-or-less how nub is implemented in the standard library (read the source here). The small difference in nub's implementation ensures that it is non-strict, while removeDuplicates above is strict (it consumes the entire list before returning).

原始递归如果你不担心严格的话，这里实际上是过度杀伤力。 removeDuplicates 可以用 foldl ：

Primitive recursion is actually overkill here, if you're not worried about strictness. removeDuplicates can be implemented in one line with foldl:

removeDuplicates2 = foldl (\seen x -> if x `elem` seen
                                      then seen
                                      else seen ++ [x]) []

从Haskell的列表中删除重复项

相关推荐