I was trying to write a function that reads a list with lists in it and gives back the same list structure but without elements.

def remove_from_list(l):
    for e in l:
        if isinstance(e, list):
            remove_from_list(e)
        else:
            l.remove(e)
    return l

So for input like this:

[1, 2, [], [3,[4]], 5]

I should get something like this:

[[], [[]]]

I tried with these inputs:

[1, 2, [], [3,[4]], 5]
[1, 2, [], [2,[3]], 2]

And got these outputs:

[2, [], [[4]]]
[[], [[3]], 2]

Which is very confusing because the structure of both lists is the same; only the elements differ. So not only do I get a mistake, but I get a different one. Help with the function and explanation of my mistake would be very appreciated.

upvote
  flag
Indeed, this is a beginner mistake. You shouldn't remove list elements during iteration. – cᴏʟᴅsᴘᴇᴇᴅ
1 upvote
  flag
Do you necessarily want to modify the list you pass in in place? – juanpa.arrivillaga
1 upvote
  flag
@cᴏʟᴅsᴘᴇᴇᴅ Not true. It's possible to use reverse-iterate and del, no copies required. – wim
upvote
  flag
@wim interesting challenge you mentioned about recursive lists. What do you think about how I tried to address it in my answer? – גלעד ברקן
upvote
  flag
@גלעדברקן good! +1 – wim
1 upvote
  flag
Possible duplicate of Recursion removing list elements Python – BDL

5 Answers 11

The issue is that you're removing elements from a list while iterating over that same list. One solution is to make a copy and iterate over that, like so:

def remove_from_list(l):
    for e in l[:]: # make a copy by slicing
        if isinstance(e, list):
            remove_from_list(e)
        else:
            l.remove(e)
    return l

Result:

>>> remove_from_list([1, 2, [], [3,[4]], 5])
[[], [[]]]

An explanation for this behavior from the Python documentation:

There is a subtlety when the sequence is being modified by the loop (this can only occur for mutable sequences, i.e. lists). An internal counter is used to keep track of which item is used next, and this is incremented on each iteration. When this counter has reached the length of the sequence the loop terminates. This means that if the suite deletes the current (or a previous) item from the sequence, the next item will be skipped (since it gets the index of the current item which has already been treated). Likewise, if the suite inserts an item in the sequence before the current item, the current item will be treated again the next time through the loop.

up vote 5 down vote accepted

The key issue here is the fact that the for loop has a set number of iterations, but when you remove lists inside the loop, you're shrinking them. So, the loop pointer remains fixed while the list becomes smaller. At one point, the loop does not get the opportunity to finish iteration.

Option 1
As an easy fix, you can just create a new list inside the function. This should be simpler, and does not mutate your original list.

def remove_from_list(l):
    r = []
    for e in l:
        if isinstance(e, list):
            r.append(remove_from_list(e))
    return r

>>> remove_from_list([1, 2, [], [3,[4]], 5])
[[], [[]]]

Furthermore, since you're only appending empty structures, this should be faster than creating copies of your data and subsequent remove calls.


Option 2
Building on wim's idea, iterate in reverse and use del if you want to mutate the list in place.

def remove_from_list(l):
    r = []
    for i in range(len(l) - 1, -1, -1):
        if isinstance(l[i], list):
            remove_from_list(l[i])
        else:
            del l[i]

>>> l = [1, 2, [], [3,[4]], 5]
>>> remove_from_list(l)
>>> l
[[], [[]]]

I would recommend, from the perspective of good practice, to either return a copy, or modify in place without returning, but not both.


You can perform a timeit comparison to determine which method works faster on your data.

First, the setup -

def remove_from_list(l):
    r = []
    for e in l:
        if isinstance(e, list):
            r.append(remove_from_list(e))
    return r

def remove_from_list_reverse_del(l):
    r = []
    for i in range(len(l) - 1, -1, -1):
        if isinstance(l[i], list):
            remove_from_list(l[i])
        else:
            del l[i]


def remove_from_list_copy(l):
    for e in l[:]: # make a copy by slicing
        if isinstance(e, list):
            remove_from_list_copy(e)
        else:
            l.remove(e)
    return l

y = [1, 2, [], [3,[4]], 5]
z = copy.deepcopy(y  * 10000)

Next, the timings -

%timeit remove_from_list(z)
19.3 ms ± 334 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
z2 = copy.deepcopy(z)    # copying because this function mutates the original
remove_from_list_reverse_del(z2)

78.6 ms ± 157 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Although a big chunk of time is wasted in creating z2.

# Jacob Wood's function

%%timeit
z2 = copy.deepcopy(z)    # copying because this function mutates the original
remove_from_list_copy(z2)

5.08 s ± 33.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Also,

%timeit copy.deepcopy(z)
27.8 ms ± 313 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

The second function is inefficient for 2 main reasons: creating copies of your sub-lists is inefficient, and, calling remove on a list is O(N) in time.

Effectively, the latter is a quadratic solution or worse.

upvote
  flag
Downvoter, I'd appreciate voting on answers based on merit, not politics. In other words, if my answer is technically unsound in some way, please let me know so I can correct it. – cᴏʟᴅsᴘᴇᴇᴅ
1 upvote
  flag
Hi, this crashes when the list contains a reference to itself. – wim
1 upvote
  flag
Well, they all would. – cᴏʟᴅsᴘᴇᴇᴅ
2 upvote
  flag
Yeah, well I never said any other answer here was better. You asked if it was technically unsound in some way. – wim
upvote
  flag
@wim Ah, well. True, I did ask for it. Though if my answer was downvoted for that reason, they should all be. Anyway, I'll see if I can think of something. – cᴏʟᴅsᴘᴇᴇᴅ
upvote
  flag
@wim Well, added the reverse delete, but handling recursive lists is outside the scope of how much attention I'm willing to devote to a question like this. – cᴏʟᴅsᴘᴇᴇᴅ

Here's another way you can do it using recursion

def remove_from_list (l):
  if not l:
    return []
  elif isinstance (l[0], list):
    return [remove_from_list (l[0])] + remove_from_list (l[1:])
  else:
    return remove_from_list (l[1:])

print (remove_from_list ([1, 2, [], [3, [4]], 5]))
# [[], [[]]]

If you find it nice to think about problems in this way, you will find some generic functions make things nicer

def is_empty (l):
  return not l

def is_list (l):
  return isinstance (l, list)

def head (l):
  return l[0]

def tail (l):
  return l[1:]

def remove_from_list (l):
  if is_empty (l):
    return []
  elif is_list (head (l)):
    return [remove_from_list (head (l))] + remove_from_list (tail (l))
  else:
    return remove_from_list (tail (l))

print (remove_from_list ([1, 2, [], [3, [4]], 5]))
# [[], [[]]]

It does not mutate the input

data = [1, 2, [], [3, [4]], 5]

print (remove_from_list (data))
# [[], [[]]]

print (data)
# [1, 2, [], [3, [4]], 5]

And a tail-recursive version, that can be made stack-safe (changes in bold)

def identity (x):
  return x

def remove_from_list (l, k = identity):
  if is_empty (l):
    return k ([])
  elif is_list (head (l)):
    return remove_from_list (head (l), lambda x:
      remove_from_list (tail (l), lambda y:
        k ([x] + y)))
  else:
    return remove_from_list (tail (l), k)

print (remove_from_list (data))
# [[], [[]]]

What happened to good old fashioned recursion? (By the way, this answer also handles lists that contain references to themselves.)

def f(l, i, ids):
  if i >= len(l):
    return l
  if isinstance(l[i], list):
    if not id(l[i]) in ids:
      ids.add(id(l[i]))
      f(l[i], 0, ids)
    return f(l, i + 1, ids)
  else:
    del l[i]
    return f(l, i, ids)

a = [1, 2, [], [3,[4]], 5]
a.append(a)
a[3].append(a[3])

print a # [1, 2, [], [3, [4], [...]], 5, [...]]
print f(a, 0, set([id(a)])) # [[], [[], [...]], [...]]

(As for your misunderstanding - as cᴏʟᴅsᴘᴇᴇᴅ mentioned, deleting parts of a list during the for loop can cause unexpected results since the range of iteration is set before it starts, yet the list gets modified partway through.)

upvote
  flag
Nothing happened. All the answers here are recursive? – cᴏʟᴅsᴘᴇᴇᴅ
upvote
  flag
@cᴏʟᴅsᴘᴇᴇᴅ recursive, yes; old fashioned, no. – גלעד ברקן
upvote
  flag
Well, in that case, have an upvote from me. I love old fashioned. – cᴏʟᴅsᴘᴇᴇᴅ

Simple recursion

def remove_from_list(l):
  if l == []:
    return []
  elif not isinstance(l[0], list):
    return remove_from_list(l[1:])
  else:
    return [remove_from_list(l[0])] + remove_from_list(l[1:])

A bit more complicated

def remove_from_list(l):
  def remove_inner(l_in,l_out):
    if l_in == []:
      return l_out
    elif not isinstance(l_in[0], list) or l[0] == []:
      return remove_inner(l_in[1:],l_out)
    else:
      return remove_inner(l_in[1:], l_out + [remove_from_list(l_in[0])])
  return remove_inner(l,[])

print(remove_from_list([1, 2, [], [3,[4]], 5]))

Not the answer you're looking for? Browse other questions tagged or ask your own question.