
I have a list of arbitrary length, and I need to split it up into equal-size chunks and operate on it. There are some obvious ways to do this, like keeping a counter and two lists: when the second list fills up, add it to the first list and empty it for the next round of data. But this is potentially extremely expensive.

I was wondering if anyone had a good solution to this for lists of any length, e.g. using generators.

I was looking for something useful in itertools but I couldn't find anything obviously useful. Might've missed it, though.
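For concreteness, the counter-and-two-lists approach I mean looks roughly like this (an illustrative sketch only, names mine):

```python
def naive_chunks(data, size):
    # Accumulate items into a temporary list; when it fills up,
    # append it to the result and start a fresh one.
    result = []
    current = []
    for item in data:
        current.append(item)
        if len(current) == size:
            result.append(current)
            current = []
    if current:  # leftover items form a final, shorter chunk
        result.append(current)
    return result
```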

Related question: What is the most “pythonic” way to iterate over a list in chunks?

An optimized solution (more memory friendly) here: //allinonescript.com/questions/7133179/python-yield-and-delete – Radim
Might be able to do this better with slices... – user3917838
FWIW, the library more_itertools offers a chunked function that does this in an efficient way. – bgusach

55 Answers

Accepted answer

Here's a generator that yields the chunks you want:

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

import pprint
pprint.pprint(list(chunks(list(range(10, 75)), 10)))  # list(...) so Python 3 yields lists, not ranges
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

If you're using Python 2, you should use xrange() instead of range():

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in xrange(0, len(l), n):
        yield l[i:i + n]

You can also simply use a list comprehension instead of writing a function. Python 3:

[l[i:i + n] for i in range(0, len(l), n)]

Python 2 version:

[l[i:i + n] for i in xrange(0, len(l), n)]
What happens if we can't tell the length of the list? Try this on itertools.repeat([ 1, 2, 3 ]), e.g. – jespern
That's an interesting extension to the question, but the original question clearly asked about operating on a list. – Ned Batchelder
The 2to3 porting program changes all xrange calls to range since in Python 3.0 the functionality of range will be equivalent to that of xrange (i.e. it will return an iterator). So I would avoid using range and use xrange instead. – Tomi Kyöstilä
@attz actually range was removed from Python 3.0 and xrange was renamed to range. – Kos
With a tuple comprehension: chunks = (l[i:i+n] for i in xrange(0, len(l), n)) – zedr
@zedr, that "tuple comprehension" is actually a "generator expression". A tuple comprehension would be more like tuple(l[i:i+n] for i in xrange(0, len(l), n)). :-) – Ben Hoyt
@BenHoyt I'm gonna add that to the solution because it's faster and more succinct, it should be there already – jamylak
@jamylak: Hmmm, that's not what I was suggesting at all. For one thing, people were upvoting this answer based on @Ned's original yield answer, not the generator expression. I think I prefer the more explicit yield version, and I think your edit makes the answer more confusing, as now there are two different options within one answer. Perhaps better to put this in a different answer. – Ben Hoyt
@BenHoyt IMHO the generator expression is much better, it's faster and it's meant for these simple operations. yield is unneeded here, I agree that it is confusing having two different answers so I will move it. I thought it would be best to teach people the best method. – jamylak
@jespern I guess with an infinite or indefinite-length list you go to the related question that J.F. Sebastian linked: What is the most “pythonic” way to iterate over a list in chunks? – n611x007
neat and clean approach, used the same but with for in so it would not be obscure for new programmers :) +1 – Vitaliy Terziev

If you know the list size:

def SplitList(lst, chunk_size):
    return [lst[offs:offs+chunk_size] for offs in range(0, len(lst), chunk_size)]

If you don't (an iterator):

def IterChunks(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res  # yield the last, incomplete, portion

In the latter case, it can be rephrased in a more beautiful way if you can be sure that the sequence always contains a whole number of chunks of the given size (i.e. there is no incomplete last chunk).
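For instance, one such rephrasing (my sketch, not the answerer's; it silently assumes len(sequence) is an exact multiple of chunk_size) is the same-iterator zip trick:

```python
def iter_whole_chunks(sequence, chunk_size):
    # Assumes the sequence length is an exact multiple of chunk_size:
    # zip over chunk_size references to the SAME iterator, so each
    # output tuple consumes chunk_size successive items.
    args = [iter(sequence)] * chunk_size
    return zip(*args)
```

Any trailing remainder shorter than chunk_size is silently dropped, which is why the whole-number precondition matters.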

I am sad this is buried so far down. The IterChunks works for everything and is the general solution and has no caveats that I know of. – Jason Dunkelberger
All questions about the general solution (non-lists) are marked as duplicates of this question, which they are not, so I have to push this up. – Jason Dunkelberger

Here is a generator that works on arbitrary iterables:

import itertools

def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))

Example:

>>> import pprint
>>> pprint.pprint(list(split_seq(xrange(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

heh, one line version

In [48]: chunk = lambda ulist, step:  map(lambda i: ulist[i:i+step],  xrange(0, len(ulist), step))

In [49]: chunk(range(1,100), 10)
Out[49]: 
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
 [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
 [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
 [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
 [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]
Please, use "def chunk" instead of "chunk = lambda". It works the same. One line. Same features. MUCH easier to the n00bz to read and understand. – S.Lott
@S.Lott: not if the n00bz come from scheme :P this isn't a real problem. there's even a keyword to google! what other features show we avoid for the sake of the n00bz? i guess yield isn't imperative/c-like enough to be n00b friendly either then. – Janus Troelsen
The function object resulting from def chunk instead of chunk=lambda has .__name__ attribute 'chunk' instead of '<lambda>'. The specific name is more useful in tracebacks. – Terry Jan Reedy
@Alfe: I'm not sure if it could be called a main semantic difference, but whether there's a useful name in a traceback instead of <lambda> or not is, at least, a notable difference. – martineau

Directly from the (old) Python documentation (recipes for itertools):

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

The current version, as suggested by J.F.Sebastian:

#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

I guess Guido's time machine works—worked—will work—will have worked—was working again.

These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin of "each" iterator; because this is the same iterator, it is advanced by each such call, resulting in each such zip-roundrobin generating one tuple of n items.
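A quick demonstration of that mechanism (my illustration, Python 3):

```python
it = iter("abcdef")
args = [it] * 3                     # three references to the SAME iterator
assert all(a is it for a in args)   # not three independent copies

# Each zip round pulls one item from "each" argument, i.e. three
# successive items from the single underlying iterator:
print(list(zip(*args)))             # [('a', 'b', 'c'), ('d', 'e', 'f')]
```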

It is izip_longest(*[iter(iterable)]*n, fillvalue=fillvalue) nowadays. – jfs
@ninjagecko: list(grouper(3, range(10))) returns [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)], and all tuples are of length 3. Please elaborate on your comment because I can't understand it; what do you call a thing and how do you define it being a multiple of 3 in “expecting your thing to be a multiple of 3”? Thank you in advance. – tzot
If it is incorrect behavior for the user's code to have a tuple with None, they need to explicitly raise an error if len('0123456789')%3 != 0. This is not a bad thing, but a thing which could be documented. Oh wait my apologies... it is documented implicitly in by the padvalue=None argument. (Also by '3' I meant 'n') Nice code. – ninjagecko
upvoted this because it works on generators (no len) and uses the generally faster itertools module. – Michael Dillon
You can combine this all into a short one-liner: zip(*[iter(yourList)]*n) (or izip_longest with fillvalue) – ninjagecko
@ninjagecko: I can only assume you didn't read my answer to the end, because what you suggest is what the “alternate take” is. – tzot
For the record I did read tzot's answer to the end. I merely thought that defining a function for such an operation was not usually necessary (in imperative style) if you used a short idiom and the built-in zip, asserting that your list length is a multiple of n. (Yes, I am also aware the OP's question says arbitrary length, though I interpreted that term loosely.) – ninjagecko
A classic example of fancy itertools functional approach turning out some unreadable sludge, when compared to a simple and naive pure python implementation – wim
@wim Given that this answer began as a snippet from the Python documentation, I'd suggest you open an issue on bugs.python.org . – tzot
Use from itertools import zip_longest for Python 3, or from six.moves import zip_longest if you want to support 2/3. – Nick T
@tzot Apparently it's been brought up and rejected many times: grokbase.com/t/python/python-ideas/126tzj5djb/… – endolith
Wow, I recommend reading that thread if only to learn how not to discuss ideas. I don't have all of the context, but it read very toxic to me. – TNi
def split_seq(seq, num_pieces):
    start = 0
    for i in xrange(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

usage:

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for seq in split_seq(seq, 3):
    print seq
def chunk(lst):
    out = []
    for x in xrange(2, len(lst) + 1):
        if not len(lst) % x:
            factor = len(lst) / x
            break
    while lst:
        out.append([lst.pop(0) for x in xrange(factor)])
    return out
>>> f = lambda x, n, acc=[]: f(x[n:], n, acc+[(x[:n])]) if x else acc
>>> f("Hallo Welt", 3)
['Hal', 'lo ', 'Wel', 't']
>>> 

If you are into brackets - I picked up a book on Erlang :)

This is by far the least readable, and would never pass a code review (“go back and re-write it so it's clear”). Clever code is hard-to-maintain code; meaningful names and simple statements are far better. – bignose

If you want something super simple:

def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in xrange(0, len(l), n))
Or (if we're doing different representations of this particular function) you could define a lambda function via: lambda x,y: [ x[i:i+y] for i in range(0,len(x),y)] . I love this list-comprehension method! – J-P
after return there must be [, not ( – alwbtc
@alwbtc - no it's correct it's a generator – Mr_and_Mrs_D

Without calling len(), which is good for large lists:

def splitter(l, n):
    i = 0
    chunk = l[:n]
    while chunk:
        yield chunk
        i += n
        chunk = l[i:i+n]

And this is for iterables (Python 2, given the imap usage below):

from itertools import islice, imap, repeat, takewhile

def isplitter(l, n):
    l = iter(l)
    chunk = list(islice(l, n))
    while chunk:
        yield chunk
        chunk = list(islice(l, n))

The functional flavour of the above:

def isplitter2(l, n):
    return takewhile(bool,
                     (tuple(islice(start, n))
                            for start in repeat(iter(l))))

OR:

def chunks_gen_sentinel(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return iter(imap(tuple, continuous_slices).next,())

OR:

def chunks_gen_filter(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return takewhile(bool,imap(tuple, continuous_slices))
There is no reason to avoid len() on large lists; it's a constant-time operation. – Thomas Wouters
def chunk(input, size):
    return map(None, *([iter(input)] * size))
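map(None, ...) is Python 2 only; a roughly equivalent Python 3 sketch (mine, not the answerer's) uses itertools.zip_longest, which likewise pads the last tuple with None:

```python
from itertools import zip_longest

def chunk(seq, size):
    # Same same-iterator trick as above; zip_longest pads the final
    # tuple with None instead of dropping the remainder.
    return list(zip_longest(*[iter(seq)] * size))
```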
map(None, iter) equals izip_longest(iter). – Thomas Ahle
@TomaszWysocki Can you explain the * in front of your iterator tuple? Possibly in your answer text, but I have not seen that * used that way in Python before. Thanks! – theJollySin
@theJollySin In this context, it is called the splat operator. Its use is explained here - //allinonescript.com/questions/5917522/unzipping-and-the-operator. – rlms
Close but the last chunk has None elements to fill it out. This may or may not be a defect. Really cool pattern though. – user1969453

Simple yet elegant

l = range(1, 1000)
print [l[x:x+10] for x in xrange(0, len(l), 10)]

or if you prefer:

chunks = lambda l, n: [l[x: x+n] for x in xrange(0, len(l), n)]
chunks(l, 10)
Thou shalt not dub a variable in the likeness of an Arabic number. In some fonts, 1 and l are indistinguishable. As are 0 and O. And sometimes even I and 1. – Alfe
@Alfe Defective fonts. People shouldn't use such fonts. Not for programming, not for anything. – Jerry B
Lambdas are meant to be used as unnamed functions. There is no point in using them like that. In addition it makes debugging more difficult as the traceback will report "in <lambda>" instead of "in chunks" in case of error. I wish you luck finding a problem if you have whole bunch of these :) – Chris Koston
it should be 0 and not 1 inside xrange in print [l[x:x+10] for x in xrange(1, len(l), 10)] – scottydelta
NOTE: For Python 3 users use range. – Christian Dean

If you had a chunk size of 3 for example, you could do:

zip(*[iterable[i::3] for i in range(3)]) 

source: http://code.activestate.com/recipes/303060-group-a-list-into-sequential-n-tuples/

I would use this when my chunk size is a fixed number I can type, e.g. 3, and would never change.

This doesn't work if len(iterable)%3 != 0. The last (short) group of numbers won't be returned. – sherbang

Consider using matplotlib.cbook pieces

for example:

import numpy as np
import matplotlib.cbook as cbook

segments = cbook.pieces(np.arange(20), 3)
for s in segments:
     print s
def chunks(iterable,n):
    """assumes n is an integer>0
    """
    iterable=iter(iterable)
    while True:
        result=[]
        for i in range(n):
            try:
                a=next(iterable)
            except StopIteration:
                break
            else:
                result.append(a)
        if result:
            yield result
        else:
            break

g1=(i*i for i in range(10))
g2=chunks(g1,3)
print g2
# <generator object chunks at 0x0337B9B8>
print list(g2)
# [[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]
While this may not look as short or as pretty as many of the itertools based responses this one actually works if you want to print out the second sub-list before accessing the first, i.e., you can set i0=next(g2); i1=next(g2); and use i1 before using i0 and it doesn't break!! – Peter Gerdes

I realise this question is old (stumbled over it on Google), but surely something like the following is far simpler and clearer than any of the huge complex suggestions and only uses slicing:

def chunker(iterable, chunksize):
    for i,c in enumerate(iterable[::chunksize]):
        yield iterable[i*chunksize:(i+1)*chunksize]

>>> for chunk in chunker(range(0,100), 10):
...     print list(chunk)
... 
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
... etc ...

Has no one used the tee() function from itertools?

http://docs.python.org/2/library/itertools.html#itertools.tee

>>> import itertools
>>> itertools.tee([1,2,3,4,5,6],3)
(<itertools.tee object at 0x02932DF0>, <itertools.tee object at 0x02932EB8>, <itertools.tee object at 0x02932EE0>)

This will split the list into 3 iterators; looping over each iterator will get a sublist of equal length.

I don't think this does what you think it does. Each of the iterators in tee (at least for me) has the full list in it: >>> map(list, itertools.tee([1,2,3,4,5,6],3)) [[1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]] – Christopher Schmidt

See this reference

>>> orange = range(1, 1001)
>>> otuples = list( zip(*[iter(orange)]*10))
>>> print(otuples)
[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ... (991, 992, 993, 994, 995, 996, 997, 998, 999, 1000)]
>>> olist = [list(i) for i in otuples]
>>> print(olist)
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ..., [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]
>>> 

Python3

Nice, but drops elements at the end if the size does not match whole numbers of chunks, e. g. zip(*[iter(range(7))]*3) only returns [(0, 1, 2), (3, 4, 5)] and forgets the 6 from the input. – Alfe

A generator expression:

def chunks(seq, n):
    return (seq[i:i+n] for i in xrange(0, len(seq), n))

eg.

print list(chunks(range(1, 1000), 10))

more-itertools has a chunked iterator.

It also has a lot more things, including all the recipes in the itertools documentation.

The large number of different solutions on this page is a demonstration that this function ought to be standard. Every iterable in Scala has a grouped method that does exactly this and a sliding that provides more flexibility (grouped(N) is sliding(N, N)). – Jim Pivarski

Using Python list comprehensions:

[range(t,t+10) for t in range(1,1000,10)]

[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],....
 ....[981, 982, 983, 984, 985, 986, 987, 988, 989, 990],
 [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]

Visit this link to learn about list comprehensions.

How would you apply your approach on an existing list which comes as input? – Alfe
@Alfe for chunk in [some_list[i:i + 10] for i in range(0, len(some_list), 10)]: print chunk – flexd
This way it looks a lot like the accepted top-answer ;-) – Alfe

I know this is kind of old, but I don't know why nobody mentioned numpy.array_split:

In [25]: import numpy as np

In [26]: lst = range(50)

In [27]: np.array_split(lst, 5)
Out[27]: 
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
 array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
 array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
 array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]
This allows you to set the total number of chunks, not the number of elements per chunk. – FizxMike
you can do the math yourself. If you have 10 elements you can group them into two 5-element chunks or five 2-element chunks – Moj
+1 This is my favorite solution, as it splits the array into evenly sized arrays, while other solutions don't (in all other solutions I looked at, the last array may be arbitrarily small). – MiniQuark
I assume that b should be lst – Gian Luca Scoccia
@GianLucaScoccia b should be array_like – endolith

Not exactly the same but still nice

def chunks(l, chunks):
    return zip(*[iter(l)]*chunks)

l = range(1, 1000)
print chunks(l, 10) -> [ ( 1..10 ), ( 11..20 ), .., ( 991..999 ) ]
Nice, but drops end-elements if the sizes don't match exactly, e.g. zip(*[iter(range(7))]*3) only returns [(0, 1, 2), (3, 4, 5)] and forgets the 6 from the input. – Alfe
Nice catch. But it's easy to add those extra elements directly. – Moss
Easy (necessary) things should be part of the answer ;-) – Alfe
Because this drops any non-multiple group of elements at the end, it won't produce the output shown -- the end would be ... ( 981..990 ). It's also considered a poor practice to use identifiers that are already assigned to built-ins like list. – martineau
  • Works with any iterable
  • Inner data is generator object (not a list)
  • One liner
In [259]: get_in_chunks = lambda itr,n: ( (v for _,v in g) for _,g in itertools.groupby(enumerate(itr),lambda (ind,_): ind/n))

In [260]: list(list(x) for x in get_in_chunks(range(30),7))
Out[260]:
[[0, 1, 2, 3, 4, 5, 6],
 [7, 8, 9, 10, 11, 12, 13],
 [14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27],
 [28, 29]]
g = get_in_chunks(range(30),7); i0=next(g);i1=next(g);list(i1);list(i0); Last evaluation is empty. Hidden requirement about accessing all the sublists in order seems really bad here to me because the goal with these kind of utils is often to shuffle data around in various ways. – Peter Gerdes
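For reference, the tuple-unpacking lambda above is Python 2 only; a Python 3 port might look like this (my sketch; the in-order-consumption caveat from the comment still applies):

```python
import itertools

def get_in_chunks(itr, n):
    # Group items by index // n; each group is itself a generator,
    # valid only until the next group is requested (a groupby property).
    return ((v for _, v in g)
            for _, g in itertools.groupby(enumerate(itr), lambda t: t[0] // n))
```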
def chunked(iterable, size):
    chunk = ()

    for item in iterable:
        chunk += (item,)
        if len(chunk) % size == 0:
            yield chunk
            chunk = ()

    if chunk:
        yield chunk

I like the Python doc's version proposed by tzot and J.F.Sebastian a lot, but it has two shortcomings:

  • it is not very explicit
  • I usually don't want a fill value in the last chunk

I'm using this one a lot in my code:

from itertools import islice

def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        yield tuple(islice(iterable, n)) or iterable.next()

UPDATE: A lazy chunks version:

from itertools import chain, islice

def chunks(n, iterable):
   iterable = iter(iterable)
   while True:
       yield chain([next(iterable)], islice(iterable, n-1))

The toolz library has the partition function for this:

from toolz import partition

list(partition(2, [1, 2, 3, 4]))
[(1, 2), (3, 4)]
This looks like the simplest of all the suggestions. I am just wondering if it really can be true that one has to use a third party library to get such a partition function. I would have expected something equivalent with that partition function to exist as a language builtin. – kasperd
you can do a partition with itertools. but I like the toolz library. its a clojure-inspired library for working on collections in a functional style. you don't get immutability but you get a small vocabulary for working on simple collections. As a plus, cytoolz is written in cython and gets a nice performance boost. github.com/pytoolz/cytoolz matthewrocklin.com/blog/work/2014/05/01/Introducing-CyToolz – zach

Yes, it is an old question, but I had to post this one, because it is even a little shorter than the similar ones. Yes, the result looks scrambled, but if it is just about even length...

>>> n = 3 # number of groups
>>> biglist = range(30)
>>>
>>> [ biglist[i::n] for i in xrange(n) ]
[[0, 3, 6, 9, 12, 15, 18, 21, 24, 27],
 [1, 4, 7, 10, 13, 16, 19, 22, 25, 28],
 [2, 5, 8, 11, 14, 17, 20, 23, 26, 29]]

Critique of other answers here:

None of these answers produce evenly sized chunks; they all leave a runt chunk at the end, so they're not completely balanced. If you were using these functions to distribute work, you've built in the prospect of one worker likely finishing well before the others, so it would sit around doing nothing while the others continued working hard.

For example, the current top answer ends with:

[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]

I just hate that runt at the end!

Others, like list(grouper(3, xrange(7))), and chunk(xrange(7), 3) both return: [(0, 1, 2), (3, 4, 5), (6, None, None)]. The None's are just padding, and rather inelegant in my opinion. They are NOT evenly chunking the iterables.

Why can't we divide these better?

My Solution(s)

Here's a balanced solution, adapted from a function I've used in production (Note in Python 3 to replace xrange with range):

def baskets_from(items, maxbaskets=25):
    baskets = [[] for _ in xrange(maxbaskets)] # in Python 3 use range
    for i, item in enumerate(items):
        baskets[i % maxbaskets].append(item)
    return filter(None, baskets) 

And I created a generator that does the same if you put it into a list:

def iter_baskets_from(items, maxbaskets=3):
    '''generates evenly balanced baskets from indexable iterable'''
    item_count = len(items)
    baskets = min(item_count, maxbaskets)
    for x_i in xrange(baskets):
        yield [items[y_i] for y_i in xrange(x_i, item_count, baskets)]

And finally, since I see that all of the above functions return elements in a contiguous order (as they were given):

def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
    '''
    generates balanced baskets from iterable, contiguous contents
    provide item_count if providing a iterator that doesn't support len()
    '''
    item_count = item_count or len(items)
    baskets = min(item_count, maxbaskets)
    items = iter(items)
    floor = item_count // baskets 
    ceiling = floor + 1
    stepdown = item_count % baskets
    for x_i in xrange(baskets):
        length = ceiling if x_i < stepdown else floor
        yield [items.next() for _ in xrange(length)]

Output

To test them out:

print(baskets_from(xrange(6), 8))
print(list(iter_baskets_from(xrange(6), 8)))
print(list(iter_baskets_contiguous(xrange(6), 8)))
print(baskets_from(xrange(22), 8))
print(list(iter_baskets_from(xrange(22), 8)))
print(list(iter_baskets_contiguous(xrange(22), 8)))
print(baskets_from('ABCDEFG', 3))
print(list(iter_baskets_from('ABCDEFG', 3)))
print(list(iter_baskets_contiguous('ABCDEFG', 3)))
print(baskets_from(xrange(26), 5))
print(list(iter_baskets_from(xrange(26), 5)))
print(list(iter_baskets_contiguous(xrange(26), 5)))

Which prints out:

[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]

Notice that the contiguous generator provides chunks in the same length patterns as the other two, but the items are all in order, and they are divided as evenly as one may divide a list of discrete elements.

You say that none of the above provides evenly-sized chunks. But this one does, as does this one. – senderle
@senderle, The first one, list(grouper(3, xrange(7))), and the second one, chunk(xrange(7), 3) both return: [(0, 1, 2), (3, 4, 5), (6, None, None)]. The None's are just padding, and rather inelegant in my opinion. They are NOT evenly chunking the iterables. Thanks for your vote! – Aaron Hall
You raise the question (without doing it explicitly, so I do that now here) whether equally-sized chunks (except the last, if not possible) or whether a balanced (as good as possible) result is more often what will be needed. You assume that the balanced solution is to prefer; this might be true if what you program is close to the real world (e. g. a card-dealing algorithm for a simulated card game). In other cases (like filling lines with words) one will rather like to keep the lines as full as possible. So I can't really prefer one over the other; they are just for different use cases. – Alfe
@ChristopherBarrington-Leigh Good point, for DataFrames, you should probably use slices, since I believe DataFrame objects do not usually copy on slicing, e.g. import pandas as pd; [pd.DataFrame(np.arange(7))[i::3] for i in xrange(3)] – Aaron Hall
@AaronHall Oops. I deleted my comment because I second-guessed my critique, but you were quick on the draw. Thanks! In fact, my claim that it doesn't work for dataframes is true. If items is a dataframe, just use yield items[range(x_i, item_count, baskets)] as the last line. I offered a separate (yet another) answer, in which you specify the desired (minimum) group size. – CPBL
@ChristopherBarrington-Leigh Thanks, very nice of you. I wouldn't use the code from my answer to do this, though. If you're iterating over a DataFrame, you can use iterrows. I wouldn't use range to slice, it creates an object in memory. I'd prefer a slice object, created with the slicing syntax e.g. i::3, or equivalently, slice(i, None, 3). – Aaron Hall
I like this, but I wish it would work on an arbitrary length lambda instead of just len() – Steve Yeago
Don't be tempted to use [[]]*maxbaskets; it is not the same thing as [[] for _ in range(maxbaskets)]. In the first case there is really only a single instance of a bucket referenced multiple times. – qbolec

I'm surprised nobody has thought of using iter's two-argument form:

from itertools import islice

def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]

This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad; if you want padding, a simple variation on the above will suffice:

from itertools import islice, chain, repeat

def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)

Demo:

>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

Like the izip_longest-based solutions, the above always pads. As far as I know, there's no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close:

_no_padding = object()

def chunk(it, size, padval=_no_padding):
    if padval is _no_padding:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(padval))
        sentinel = (padval,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

I believe this is the shortest chunker proposed that offers optional padding.

Wonderful, your simple version is my favorite. Others too came up with the basic islice(it, size) expression and embedded it (like I had done) in a loop construct. Only you thought of the two-argument version of iter() (I was completely unaware of), which makes it super-elegant (and probably most performance-effective). I had no idea that the first argument to iter changes to a 0-argument function when given the sentinel. You return a (pot. infinite) iterator of chunks, can use a (pot. infinite) iterator as input, have no len() and no array slices. Awesome! – ThomasH
upvote
  flag
This is why I read down through the answers rather than scanning just the top couple. Optional padding was a requirement in my case, and I too learned about the two-argument form of iter. – Kerr
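For readers who, like the commenters above, haven't met the two-argument form of iter() before, here is a minimal stand-alone sketch of the idiom: iter(callable, sentinel) calls the zero-argument callable repeatedly and stops as soon as it returns the sentinel.

```python
from itertools import islice

nums = iter(range(7))
# iter(callable, sentinel): keeps calling the zero-argument callable and
# stops as soon as it returns the sentinel -- here the empty tuple.
result = list(iter(lambda: tuple(islice(nums, 3)), ()))
print(result)  # → [(0, 1, 2), (3, 4, 5), (6,)]
```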

I wrote a small library expressly for this purpose, available here. The library's chunked function is particularly efficient because it's implemented as a generator, so a substantial amount of memory can be saved in certain situations. It also doesn't rely on the slice notation, so any arbitrary iterator can be used.

import iterlib

print list(iterlib.chunked(xrange(1, 1000), 10))
# prints [(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), (11, 12, 13, 14, 15, 16, 17, 18, 19, 20), ...]

Like @AaronHall I got here looking for roughly evenly sized chunks. There are different interpretations of that. In my case, if the desired size is N, I would like each group to be of size>=N. Thus, the orphans which are created in most of the above should be redistributed to other groups.

This can be done using:

def nChunks(l, n):
    """ Yield n successive chunks from l.
    Works for lists,  pandas dataframes, etc
    """
    newn = int(1.0 * len(l) / n + 0.5)
    for i in xrange(0, n-1):
        yield l[i*newn:i*newn+newn]
    yield l[n*newn-newn:]

(from Splitting a list into N parts of approximately equal length), by simply calling it as nChunks(l, len(l)//n)

upvote
  flag
seems to yield some empty chunks (len=26, 10) , or a final very unbalanced chunk (len=26, 11). – idij

Letting r be the chunk size and L be the initial list, you can do

chunkL = [ L[r*k:r*(k+1)] for k in range(len(L)//r) ]

Note this uses integer division, and it silently drops the final partial chunk when len(L) is not a multiple of r.

Use list comprehensions:

l = [1,2,3,4,5,6,7,8,9,10,11,12]
k = 5 #chunk size
print [tuple(l[x:y]) for (x, y) in [(x, x+k) for x in range(0, len(l), k)]]

Another more explicit version.

def chunkList(initialList, chunkSize):
    """
    This function chunks a list into sub lists 
    that have a length equals to chunkSize.

    Example:
    lst = [3, 4, 9, 7, 1, 1, 2, 3]
    print(chunkList(lst, 3)) 
    returns
    [[3, 4, 9], [7, 1, 1], [2, 3]]
    """
    finalList = []
    for i in range(0, len(initialList), chunkSize):
        finalList.append(initialList[i:i+chunkSize])
    return finalList
upvote
  flag
(2016 Sep 12) This answer is the most language independent and easiest to read. – D Adams

I saw the most awesome Python-ish answer in a duplicate of this question:

from itertools import zip_longest

a = range(1, 16)
i = iter(a)
r = list(zip_longest(i, i, i))
>>> print(r)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]

You can create n-tuples for any n. If a = range(1, 15), then the result will be:

[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]

If the list is divided evenly, then you can replace zip_longest with zip, otherwise the triplet (13, 14, None) would be lost. Python 3 is used above. For Python 2, use izip_longest.

upvote
  flag
that is nice if your list and chunks are short, how could you adapt this to split your list in to chunks of 1000 though? you"re not going to code zip(i,i,i,i,i,i,i,i,i,i.....i=1000) – Tom Smith
8 upvote
  flag
zip(i, i, i, ... i) with "chunk_size" arguments to zip() can be written as zip(*[i]*chunk_size) Whether that's a good idea or not is debatable, of course. – Wilson F
1 upvote
  flag
The downside of this is that if you aren't dividing evenly, you'll drop elements, as zip stops at the shortest iterable - & izip_longest would add default elements. – Aaron Hall
upvote
  flag
zip_longest should be used, as done in: //allinonescript.com/a/434411/1959808 – Ioannis Filippidis
upvote
  flag
The answer with range(1, 15) is already missing elements, because there are 14 elements in range(1, 15), not 15. – Ioannis Filippidis
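Following up on the comments above: passing n references to the same iterator generalizes this to any chunk size without writing zip(i, i, ..., i) by hand. A sketch (grouper is the name the itertools recipes use for this; the padding behavior discussed above still applies):

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # n references to the *same* iterator: zip_longest pulls from each
    # argument in turn, which is equivalent to zip(i, i, ..., i) with n args.
    args = [iter(iterable)] * n
    return list(zip_longest(*args, fillvalue=fillvalue))

print(grouper(range(1, 15), 3))
# → [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]
```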

The answer above (by koffein) has a little problem: the list is always split into an equal number of splits, not an equal number of items per partition. This is my version. The ceiling division takes into account that the number of items may not be exactly divisible by the partition size, so the last partition will only be partially filled (and no empty partition is produced when it is divisible).

# Given 'l' is your list

chs = 12 # Your chunk size
partitioned = [ l[i*chs:(i*chs)+chs] for i in range((len(l) + chs - 1) // chs) ]

code:

def split_list(the_list, chunk_size):
    result_list = []
    while the_list:
        result_list.append(the_list[:chunk_size])
        the_list = the_list[chunk_size:]
    return result_list

a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print split_list(a_list, 3)

result:

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
upvote
  flag
Looks nice and easy to understand. Much better than the answers before. – buhtz
upvote
  flag
Don't use list as a variable/object name because it is a built-in type. – buhtz
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
CHUNK = 4
[a[i*CHUNK:(i+1)*CHUNK] for i in xrange((len(a) + CHUNK - 1) / CHUNK )]
upvote
  flag
Can you explain more your answer please ? – Zulu
upvote
  flag
Working from backwards: (len(a) + CHUNK -1) / CHUNK Gives you the number of chunks that you will end up with. Then, for each chunk at index i, we are generating a sub-array of the original array like this: a[ i * CHUNK : (i + 1) * CHUNK ] where, i * CHUNK is the index of the first element to put into the subarray, and, (i + 1) * CHUNK is 1 past the last element to put into the subarray. This solution uses list comprehension, so it might be faster for large arrays. – AdvilUser

I have come up with the following solution, without creating a temporary list object; it should work with any iterable object. Please note that this version is for Python 2.x:

def chunked(iterable, size):
    stop = []
    it = iter(iterable)
    def _next_chunk():
        try:
            for _ in xrange(size):
                yield next(it)
        except StopIteration:
            stop.append(True)
            return

    while not stop:
        yield _next_chunk()

for it in chunked(xrange(16), 4):
   print list(it)

Output:

[0, 1, 2, 3]
[4, 5, 6, 7]
[8, 9, 10, 11]
[12, 13, 14, 15] 
[]

As you can see, if len(iterable) % size == 0 then we get an additional empty iterator object. But I do not think that is a big problem.

upvote
  flag
What do you think the following code should produce? i=0 – Peter Gerdes
upvote
  flag
Try only executing list(it) on every other iteration through the loop, i.e. add a counter and check if it 0 mod 2. The expected behavior is to only print every other line of your output. The actual behavior is to print every line. – Peter Gerdes
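For comparison, here is a Python 3 variant of the same idea (my own sketch, not the answer's code) that avoids both the trailing empty iterator and the module-level stop list, by materializing each chunk with itertools.islice:

```python
from itertools import islice

def chunked(iterable, size):
    """Yield lists of up to `size` items from any iterable; no empty tail."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

for part in chunked(range(16), 4):
    print(part)
```

Because each chunk is a concrete list, the sub-chunks can also be consumed in any order.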

Since I had to do something like this, here's my solution given a generator and a batch size:

def pop_n_elems_from_generator(g, n):
    elems = []
    try:
        for idx in xrange(0, n):
            elems.append(g.next())
        return elems
    except StopIteration:
        return elems

At this point, I think we need a recursive generator, just in case...

In python 2:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e

In python 3:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    yield from chunks(li[n:], n)

Also, in case of massive Alien invasion, a decorated recursive generator might become handy:

def dec(gen):
    def new_gen(li, n):
        for e in gen(li, n):
            if e == []:
                return
            yield e
    return new_gen

@dec
def chunks(li, n):
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e
2 upvote
  flag
You are the alien. Double yields? Haha nice one. +1 – PascalvKooten

At this point, I think we need the obligatory anonymous-recursive function.

Y = lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
chunks = Y(lambda f: lambda n: [n[0][:n[1]]] + f((n[0][n[1]:], n[1])) if len(n[0]) > 0 else [])
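A usage note, since the answer shows no call: as written, the function takes a single (list, size) tuple rather than two separate arguments. Restated here verbatim so the sketch is self-contained:

```python
# Same anonymous-recursive chunker as above; note it takes one
# (list, size) tuple, not two arguments.
Y = lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
chunks = Y(lambda f: lambda n: [n[0][:n[1]]] + f((n[0][n[1]:], n[1])) if len(n[0]) > 0 else [])

print(chunks(([1, 2, 3, 4, 5], 2)))  # → [[1, 2], [3, 4], [5]]
```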
[AA[i:i+SS] for i in range(len(AA))[::SS]]

Where AA is array, SS is chunk size. For example:

>>> AA=range(10,21);SS=3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3

As per this answer, the top-voted answer leaves a 'runt' at the end. Here's my solution to really get about as evenly-sized chunks as you can, with no runts. It basically tries to pick exactly the fractional spot where it should split the list, but just rounds it off to the nearest integer:

from __future__ import division  # not needed in Python 3
def n_even_chunks(l, n):
    """Yield n as even chunks as possible from l."""
    last = 0
    for i in range(1, n+1):
        cur = int(round(i * (len(l) / n)))
        yield l[last:cur]
        last = cur

Demonstration:

>>> pprint.pprint(list(n_even_chunks(list(range(100)), 9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55],
 [56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66],
 [67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77],
 [78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88],
 [89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]
>>> pprint.pprint(list(n_even_chunks(list(range(100)), 11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
 [9, 10, 11, 12, 13, 14, 15, 16, 17],
 [18, 19, 20, 21, 22, 23, 24, 25, 26],
 [27, 28, 29, 30, 31, 32, 33, 34, 35],
 [36, 37, 38, 39, 40, 41, 42, 43, 44],
 [45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
 [55, 56, 57, 58, 59, 60, 61, 62, 63],
 [64, 65, 66, 67, 68, 69, 70, 71, 72],
 [73, 74, 75, 76, 77, 78, 79, 80, 81],
 [82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]

Compare to the top-voted chunks answer:

>>> pprint.pprint(list(chunks(list(range(100)), 100//9)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54],
 [55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65],
 [66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76],
 [77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87],
 [88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98],
 [99]]
>>> pprint.pprint(list(chunks(list(range(100)), 100//11)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8],
 [9, 10, 11, 12, 13, 14, 15, 16, 17],
 [18, 19, 20, 21, 22, 23, 24, 25, 26],
 [27, 28, 29, 30, 31, 32, 33, 34, 35],
 [36, 37, 38, 39, 40, 41, 42, 43, 44],
 [45, 46, 47, 48, 49, 50, 51, 52, 53],
 [54, 55, 56, 57, 58, 59, 60, 61, 62],
 [63, 64, 65, 66, 67, 68, 69, 70, 71],
 [72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89],
 [90, 91, 92, 93, 94, 95, 96, 97, 98],
 [99]]
1 upvote
  flag
This solution seems to fail in some situations: - when n > len(l) - for l = [0,1,2,3,4] and n=3 it returns [[0], [1], [2]] instead of [[0,1], [2,3], [4]] – DragonTux
upvote
  flag
@DragonTux: Ah I wrote the function for Python 3 - it gives [[0, 1], [2], [3, 4]]. I added the future import so it works in Python 2 as well – Claudiu
1 upvote
  flag
Thanks a lot. I keep forgetting the subtle differences between Python 2 and 3. – DragonTux

Since everybody here is talking about iterators: the boltons package has a perfect method for that, called iterutils.chunked_iter.

from boltons import iterutils

list(iterutils.chunked_iter(list(range(50)), 11))

Output:

[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49]]

But if you don't need to be frugal with memory, you can use the old way and store the full list up front with iterutils.chunked.

upvote
  flag
And this one actually works regardless of order one looks at the subiterators!! – Peter Gerdes

You could use numpy's array_split function e.g., np.array_split(np.array(data), 20) to split into 20 nearly equal size chunks.

To make sure chunks are exactly equal in size use np.split.
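For example, a sketch (assuming NumPy is installed):

```python
import numpy as np

data = list(range(10))
# array_split tolerates uneven division: chunk sizes differ by at most 1.
parts = np.array_split(np.array(data), 3)
result = [p.tolist() for p in parts]
print(result)  # → [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```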

I have one solution below which does work, but more important than that solution are a few comments on other approaches. First, a good solution shouldn't require that one loop through the sub-iterators in order. If I run

g = paged_iter(list(range(50)), 11)
i0 = next(g)
i1 = next(g)
list(i1)
list(i0)

The appropriate output for the last command is

 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

not

 []

As most of the itertools based solutions here return. This isn't just the usual boring restriction about accessing iterators in order. Imagine a consumer trying to clean up poorly entered data which reversed the appropriate order of blocks of 5, i.e., the data looks like [B5, A5, D5, C5] and should look like [A5, B5, C5, D5] (where A5 is just five elements not a sublist). This consumer would look at the claimed behavior of the grouping function and not hesitate to write a loop like

i = 0
out = []
for it in paged_iter(data, 5):
    if (i % 2 == 0):
         swapped = it
    else: 
         out += list(it)
         out += list(swapped)
    i = i + 1

This will produce mysteriously wrong results if the chunker sneakily assumes that its sub-iterators are always fully consumed, in order. It gets even worse if you want to interleave elements from the chunks.
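To make the hazard concrete, here is a self-contained demonstration; batch is a hypothetical islice-based chunker of the kind criticized above, whose lazy sub-iterators all share one underlying iterator:

```python
from itertools import islice, chain

# Hypothetical islice-based chunker: its lazy sub-iterators share one
# underlying iterator, so out-of-order consumption scrambles the data.
def batch(iterable, size):
    it = iter(iterable)
    while True:
        head = list(islice(it, 1))
        if not head:
            return
        yield chain(head, islice(it, size - 1))

g = batch(range(10), 5)
a = next(g)
b = next(g)
b_items = list(b)  # consumed first: it steals the elements meant for `a`
a_items = list(a)
print(b_items)  # → [1, 2, 3, 4, 5]
print(a_items)  # → [0, 6, 7, 8, 9], not the expected [0, 1, 2, 3, 4]
```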

Second, a decent number of the suggested solutions implicitly rely on the fact that iterators have a deterministic order (they don't in general; consider set), and while some of the solutions using islice may be OK, it worries me.

Third, the itertools grouper approach works but the recipe relies on internal behavior of the zip_longest (or zip) functions that isn't part of their published behavior. In particular, the grouper function only works because in zip_longest(i0...in) the next function is always called in order next(i0), next(i1), ... next(in) before starting over. As grouper passes n copies of the same iterator object it relies on this behavior.

Finally, the solution below can be improved if you make the assumption criticized above, that sub-iterators are accessed in order and fully perused. Without that assumption, one MUST store elements for each sub-iterator somewhere, either implicitly (via the call chain) or explicitly (via deques or another data structure). So don't bother wasting time (as I did) assuming one could get around this with some clever trick.

import collections

def paged_iter(iterat, n):
    itr = iter(iterat)
    deq = None
    try:
        while True:
            deq = collections.deque(maxlen=n)
            for q in range(n):
                deq.append(next(itr))
            yield (i for i in deq)
    except StopIteration:
        if deq:  # skip the empty trailing chunk when len is an exact multiple of n
            yield (i for i in deq)

You may also use the get_chunks function of the utilspie library as:

>>> from utilspie import iterutils
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> list(iterutils.get_chunks(a, 5))
[[1, 2, 3, 4, 5], [6, 7, 8, 9]]

You can install utilspie via pip:

sudo pip install utilspie

Disclaimer: I am the creator of utilspie library.

1 upvote
  flag
Looks cool. How about the performance of this lib? – endle

Here's an idea using itertools.groupby:

def chunks(l, n):
    c = itertools.count()
    return (it for _, it in itertools.groupby(l, lambda x: next(c)//n))

This returns a generator of generators. If you want a list of lists, just replace the last line with

    return [list(it) for _, it in itertools.groupby(l, lambda x: next(c)//n)]

Example returning list of lists:

>>> chunks('abcdefghij', 4)
[['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j']]

(So yes, this suffers from the "runt problem", which may or may not be a problem in a given situation.)

upvote
  flag
Again this fails if the sub-iterators are not evaluated in order in the generator case. Let c = chunks('abcdefghij', 4) (as generator). Then set i0 = next(c); i1 = next(c); list(i1) //FINE; list(i0) //UHHOH – Peter Gerdes
upvote
  flag
@PeterGerdes, thank you for noting that omission; I forgot because I always used the groupby generators in order. The documentation does mention this limitation: "Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible." – itub

One more solution

def make_chunks(data, chunk_size): 
    while data:
        chunk, data = data[:chunk_size], data[chunk_size:]
        yield chunk

>>> for chunk in make_chunks([1, 2, 3, 4, 5, 6, 7], 2):
...     print chunk
... 
[1, 2]
[3, 4]
[5, 6]
[7]
>>> 
1 upvote
  flag
This would be a lot more useful if you showed how to use it. It's currently useless as-is. – Anthony

This works in v2/v3, is inlineable, generator-based and uses only the standard library:

import itertools
def split_groups(iter_in, group_size):
    return ((x for _, x in item) for _, item in itertools.groupby(enumerate(iter_in), key=lambda x: x[0] // group_size))
upvote
  flag
Fails exactly in same way as above when sub-iterators are not evaluated in order. c = split_groups('abcdefghij', 4); i0 = next(c); i1 = next(c); list(i1); list(i0); – Peter Gerdes
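For completeness, a usage sketch of split_groups (consuming the inner generators strictly in order, which, as the comment above notes, this groupby-based approach requires):

```python
import itertools

def split_groups(iter_in, group_size):
    return ((x for _, x in item) for _, item in itertools.groupby(
        enumerate(iter_in), key=lambda x: x[0] // group_size))

# The inner generators must be consumed in order, since groupby invalidates
# the previous group as soon as the outer iterator advances.
result = [list(g) for g in split_groups("abcdefghij", 4)]
print(result)  # → [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j']]
```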

No magic, but simple and correct:

def chunks(iterable, n):
    """Yield successive n-sized chunks from iterable."""
    values = []
    for i, item in enumerate(iterable, 1):
        values.append(item)
        if i % n == 0:
            yield values
            values = []
    if values:
        yield values

I don't think I saw this option, so just to add another one :)) :

def chunks(iterable, chunk_size):
    i = 0
    while i < len(iterable):
        yield iterable[i:i + chunk_size]
        i += chunk_size

I was curious about the performance of different approaches and here it is:

Tested on Python 3.5.1

import time
batch_size = 7
arr_len = 298937

#---------slice-------------

print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break

    tmp = arr[0:batch_size]
    arr = arr[batch_size:]
print(time.time() - start)

#-----------index-----------

print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)

#----------batches 1------------

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#----------batches 2------------

from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        yield chain([next(batchiter)], batchiter)


print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
    tmp = x
print(time.time() - start)

#-----------grouper-----------

from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(iterable, n, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
    tmp = x
print(time.time() - start)

Results:

slice
31.18285083770752

index
0.02184295654296875

batches 1
0.03503894805908203

batches 2
0.22681021690368652

chunks
0.019841909408569336

grouper
0.006506919860839844
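As a side note, the timeit module usually gives steadier numbers than time.time() deltas, since it repeats the statement and disables garbage collection. A minimal sketch using the top-voted chunks function:

```python
import timeit

# Top-voted chunks function, restated so the snippet is self-contained.
def chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i + n]

arr = list(range(298937))
# timeit repeats the statement `number` times and is less sensitive to
# clock resolution and one-off noise than a single time.time() delta.
elapsed = timeit.timeit(lambda: list(chunks(arr, 7)), number=10)
print(elapsed)
```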
