test :: [String] -> [String]

test = foldr step []

  where step x ys

          | elem x ys = x : ys

          | otherwise = ys

I am trying to build a new list consisting of all the distinct strings being input. My test data is:

test ["one", "one", "two", "two", "three"]

expected result:

["one", "two", "three"]

I am new to Haskell, and I am sure that I am missing something very fundamental and obvious, but have run out of ways to explore this. Could you provide pointers to where my thinking is deficient?

The actual response is []. It seems that the first guard condition is never met (if I replace it with True, the original list is replicated), so the output list is never built.

My understanding was that the fold would accumulate the result of step on each item of the list, adding it to the empty list. I anticipated that step would test each item for its inclusion in the output list (the first element tested not being there) and would add anything that was not already included to the output list. Obviously not :-)

upvote
  flag
Thank you Travis. As I said - something totally obvious missed. – Chris Walton
1 upvote
  flag
@Travis: Suggest you make your comment into an answer, so this question doesn't float around the unanswered pile until the end of days. Also the free rep would be a plus. @Chris: If you add an extra four spaces to the start of each line, your code will be rendered correctly. – Jack Kelly
upvote
  flag
@Jack: I've fixed the code formatting. – bcat
upvote
  flag
@Jack - thanks for sorting out format and for informing me how to render it correctly. @Travis - thanks for making this an answer. – Chris Walton

2 Answers 11

up vote 7 down vote accepted

Your reasoning is correct: you just need to switch = x : ys and = ys so that you add the x when it's not an element of ys. Also, Data.List.nub does this exact thing.

Think about it: your code is saying "when x is in the remainder, prepend x to the result", i.e. creating a duplicate. You just need to change it to "when x is not in the remainder, prepend x to the result" and you get the correct function.

This function differs from Data.List.nub in an important way: this function is more strict. Thus:

test [1..] = _|_   -- infinite loop (try it)
nub [1..] = [1..]

nub gives the answer correctly for infinite lists -- this means that it doesn't need the whole list to start computing results, and thus it is a nice player in the stream processing game.

The reason it is strict is because elem is strict: it searches the whole list (presuming it doesn't find a match) before it returns a result. You could write that like this:

nub :: (Eq a) => [a] -> [a]
nub = go []
    where
    go seen [] = []
    go seen (x:xs) | x `elem` seen = go seen xs
                   | otherwise     = x : go (x:seen) xs

Notice how seen grows like the output so far, whereas yours grows like the remainder of the output. The former is always finite (starting at [] and adding one at a time), whereas the latter may be infinite (eg. [1..]). So this variant can yield elements more lazily.

This would be faster (O(n log n) instead of O(n^2)) if you used a Data.Set instead of a list for seen. But it adds an Ord constraint.

upvote
  flag
I keep coming back to this response - as I continue with my attempts to learn Haskell, it is giving me deeper insights into the power of Haskell. – Chris Walton

Not the answer you're looking for? Browse other questions tagged or ask your own question.