Flatten... or How to determine sequenceability?

Noel Rappin

unread,

May 25, 2001, 10:44:57 AM5/25/01

to pytho...@python.org

I'm writing code that needs to flatten a multi-dimension list or tuple into
a single dimension list.

[1, [2, 3], 4] => [1, 2, 3, 4]

As part of the algorithm, I need to determine whether each object in the
list is itself a sequence or whether it is an atom. Using the types module
would cause me to miss any sequence-like object that isn't actually the
basic type. So, what's the best (easiest, most foolproof) way to determine
whether a Python object is a sequence?

Thanks,

Noel Rappin

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

Leighton Pritchard

unread,

May 25, 2001, 11:52:28 AM5/25/01

to

On Fri, 25 May 2001 10:44:57 -0400, Noel Rappin <noe...@yahoo.com>
made a brave stab at grammar with:

>I'm writing code that needs to flatten a multi-dimension list or tuple into
>a single dimension list.
>
>[1, [2, 3], 4] => [1, 2, 3, 4]
>
>As part of the algorithm, I need to determine whether each object in the
>list is itself a sequence or whether it is an atom. Using the types module
>would cause me to miss any sequence-like object that isn't actually the
>basic type. So, what's the best (easiest, most foolproof) way to determine
>whether a Python object is a sequence?
>
>Thanks,
>
>Noel Rappin

Funny you should ask this, as there's been a similar thread on the
Python-Tutor list, in which the following link:

http://aspn.activestate.com/ASPN/Mail/msg/python-Tutor:454781

was mentioned. It contains an excellent flatten() function that might
do just what you're looking for.

--
Leighton Pritchard
l...@spamfree.aber.ac.uk

Thomas Wagner

unread,

May 25, 2001, 11:58:09 AM5/25/01

to

Hi Noel,

try this ...:

import types

def additem(li, item):
if type(item)==types.ListType:
for litem in item:
additem(li,litem)
else:
li.append(item)

def flatten(li):
dest = []
for item in li:
additem(dest,item)
return dest

# source list
src = ['1', [2, ['a', 'b']], 4]

# destination list ...
dest = flatten(src)

Regards,
Thomas

"Noel Rappin" <noe...@yahoo.com> schrieb im Newsbeitrag
news:mailman.990802119...@python.org...

Roman Suzi

unread,

May 25, 2001, 10:59:58 AM5/25/01

to Noel Rappin, pytho...@python.org

On Fri, 25 May 2001, Noel Rappin wrote:

> I'm writing code that needs to flatten a multi-dimension list or tuple into
> a single dimension list.
>
> [1, [2, 3], 4] => [1, 2, 3, 4]
>
> As part of the algorithm, I need to determine whether each object in the
> list is itself a sequence or whether it is an atom. Using the types module
> would cause me to miss any sequence-like object that isn't actually the
> basic type. So, what's the best (easiest, most foolproof) way to determine
> whether a Python object is a sequence?

What is your definition of sequence-object?

Probably you could check if

try:
s[0]
# sequence
except:
# not sequence

> Thanks,
>
> Noel Rappin

Sincerely yours, Roman A.Suzi
--
- Petrozavodsk - Karelia - Russia - mailto:r...@onego.ru -

Chad Everett

unread,

May 25, 2001, 12:24:27 PM5/25/01

to

and here it is:

def flatten(L):
if type(L) != type([]): return [L]
if L == []: return L
return flatten(L[0]) + flatten(L[1:])

-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
-----== Over 80,000 Newsgroups - 16 Different Servers! =-----

Alex Martelli

unread,

May 25, 2001, 12:08:35 PM5/25/01

to

"Noel Rappin" <noe...@yahoo.com> wrote in message
news:mailman.990802119...@python.org...

> I'm writing code that needs to flatten a multi-dimension list or tuple
into
> a single dimension list.
>
> [1, [2, 3], 4] => [1, 2, 3, 4]
>
> As part of the algorithm, I need to determine whether each object in the
> list is itself a sequence or whether it is an atom. Using the types
module
> would cause me to miss any sequence-like object that isn't actually the
> basic type. So, what's the best (easiest, most foolproof) way to
determine
> whether a Python object is a sequence?

Hmmm -- good question. Checking whether x[:] raises an exception
is reasonable, but a mapping that accepts slice objects would fool
this test. I _suspect_ the most-foolproof test today might be:

def is_sequence(x):
try:
for _ in x:
return 1
else: return 1 # empty sequences are sequences too
except: return 0

However, if I understand correctly, this would break in 2.2,
since mappings then also become usable in a for statement,
not just sequences. This would then incorrectly identify
_any_ mapping as "a sequence", too. What *IS* there that
you can do ONLY to a sequence, ANY sequence, but NOT to
any mapping...? zip(), which seems like it could serve,
apparently does NOT test its arguments for "sequencehood"
(well, depending how you define that, I guess...): it
does a PySequence_GetItem, and fails iff that fails with
other than an IndexError.

>>> class x:
... def __getitem__(self, k): return k
...
>>> a=x()
>>> zip(a,'x')
[(0, 'x')]

map() and friends would be slow on long sequences, AND do
require a sequence to have a length to accept it, too.

I'm almost tempted to propose a tiny extension module, since
the C-API level *DOES* expose an "is-a-sequence" test...:

#include "Python.h"
static PyObject *is_seq(PyObject* self, PyObject* args) {
int i, rc;
PyObject *arg;
if(!PyArg_ParseTuple(args, "O", &arg)) return 0;
return Py_BuildValue("i",PySequence_Check(arg));
}
static PyMethodDef sq_module_functions[] =
{{ "is_seq", (PyCFunction)is_seq, METH_VARARGS },
{ 0, 0 }};
voidinitsq(void) {
PyObject* sq_module = Py_InitModule("sq", sq_module_functions);
}

and of course the setup.py to go with it:

from distutils.core import setup, Extension
sq_ext=Extension('sq', sources=['sq.c'])
setup(name="sq", ext_modules=[sq_ext])

but that doesn't seem fair...

...plus, it ALSO gets thrown by above-exemplified
object a from class x...!

So, maybe zip IS best after all, something like:

def is_sequence(x):
try: zip(x,'')
except: return 0
else: return 1

Oh BTW -- all of these methods will see a string or
unicode object as a 'sequence', because it IS one by
Python's rules (you can index and slice it, loop on
it with for, &c). Most often in such tasks one
wants to consider strings as atoms, but that will
require some further special-casing, anyway.

Alex

Nick Perkins

unread,

May 25, 2001, 2:13:53 PM5/25/01

to

>
> def flatten(L):
> if type(L) != type([]): return [L]
> if L == []: return L
> return flatten(L[0]) + flatten(L[1:])

...hmm
this might make a good Scheme program,
but I would consider this excessive and
unnecessary recursion in any language that
does not do proper tail recursion.
Python performance on any long list would
probably be very bad.
(hmm...Stackless?...i wonder...)

..what about:

def flatten(L):
if type(L) != type([]): return [L]
if L == []: return L

return reduce(lambda L1,L2:L1+L2,map(flatten,L))

...maybe someone knows how to avoid the lambda,
but I think this should be much faster.

Bob Weiner

unread,

May 25, 2001, 2:20:31 PM5/25/01

to

On Fri, 25 May 2001 15:52:28 GMT, l...@aber.ac.uk (Leighton Pritchard) wrote:
> On Fri, 25 May 2001 10:44:57 -0400, Noel Rappin <noe...@yahoo.com>
> made a brave stab at grammar with:
>
> >I'm writing code that needs to flatten a multi-dimension list or tuple into
> >a single dimension list.

> Funny you should ask this, as there's been a similar thread on the

> Python-Tutor list, in which the following link:
>
> http://aspn.activestate.com/ASPN/Mail/msg/python-Tutor:454781
>
> was mentioned. It contains an excellent flatten() function that might
> do just what you're looking for.
> --
> Leighton Pritchard
> l...@spamfree.aber.ac.uk

My interface.py module in http://www.deepware.com/pub/python/bwcto-interface-00.80.tgz
includes a somewhat more generic and expandable flatten function plus a unique function
(eliminates successive duplicate entries after flattening) and an ancestors function (returns
a flattened inheritance tree for a class or instance).

Mike C. Fletcher

unread,

May 25, 2001, 3:44:08 PM5/25/01

to Nick Perkins, pytho...@python.org

Oh, let's pull out this hoary old nugget anyway...

def collapse(inlist, type=type, ltype=types.ListType, maxint= sys.maxint):
'''
Destructively flatten a list hierarchy to a single level.
Non-recursive, and (as far as I can see, doesn't have any
glaring loopholes).
Further speedups and obfuscations by Tim Peters :)
'''
try:
# for every possible index
for ind in xrange( maxint):
# while that index currently holds a list
while type(inlist[ind]) is ltype:
# expand that list into the index (and subsequent indicies)
inlist[ind:ind+1] = inlist[ind]
#ind = ind+1
except IndexError:
pass
return inlist

Who knows, maybe there's even ways to make it faster these days. Enjoy,
Mike

..what about:

--
http://mail.python.org/mailman/listinfo/python-list

Fredrik Lundh

unread,

May 25, 2001, 3:47:42 PM5/25/01

to

Alex Martelli wrote:
> I'm almost tempted to propose a tiny extension module, since
> the C-API level *DOES* expose an "is-a-sequence" test...:

or you could import the "operator" module, and use
the operator.isSequenceType(obj) predicate.

Cheers /F

Tim Peters

unread,

May 25, 2001, 5:39:53 PM5/25/01

to pytho...@python.org

[Alex Martelli]

> Hmmm -- good question. Checking whether x[:] raises an exception
> is reasonable, but a mapping that accepts slice objects would fool
> this test. I _suspect_ the most-foolproof test today might be:
>
> def is_sequence(x):
> try:
> for _ in x:
> return 1
> else: return 1 # empty sequences are sequences too
> except: return 0
>
> However, if I understand correctly, this would break in 2.2,
> since mappings then also become usable in a for statement,
> not just sequences.

Correct, and it's that any iterable object becomes usable in a "for",
including dicts and even files. However, that also applies in 2.2 to every
other idea you're going to have:

> ...

> What *IS* there that you can do ONLY to a sequence, ANY sequence,
> but NOT to any mapping...? zip(), which seems like it could serve,
> apparently does NOT test its arguments for "sequencehood"
> (well, depending how you define that, I guess...): it
> does a PySequence_GetItem, and fails iff that fails with
> other than an IndexError.

True today but 2.2 zip() also works with any iterable object(s):

Python 2.2a0 (#16, May 14 2001, 19:54:03) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> d = {1:3, 2:4}
>>> zip(d, d)
[(1, 1), (2, 2)]
>>>

...

> I'm almost tempted to propose a tiny extension module, since
> the C-API level *DOES* expose an "is-a-sequence" test...:

This is already available, via the operator.isSequenceType() function (which
just calls PySequence_Check()). However, again in 2.2,

>>> import operator
>>> operator.isSequenceType({})
1
>>>

dict objects grow a sequence interpretation in 2.2 in order to support
containment testing, i.e. so that you can do

if key in dict:

instead of

if dict.has_key(key)

> ...
> ...plus, it ALSO gets thrown by above-exemplified
> object a from class x...!

Yup! Instances fill in *every* type-object slot, "just in case".

> So, maybe zip IS best after all, something like:
>
> def is_sequence(x):
> try: zip(x,'')
> except: return 0
> else: return 1

This is so clever I'm tempted to revert the 2.2 zip() enhancement <wink>.

> Oh BTW -- all of these methods will see a string or
> unicode object as a 'sequence', because it IS one by
> Python's rules (you can index and slice it, loop on
> it with for, &c). Most often in such tasks one
> wants to consider strings as atoms, but that will
> require some further special-casing, anyway.

All of which is why Guido is more than half in favor of throwing things like
operator.isSequenceCheck away: they don't do what people expect them to do,
but then that's because people come to Python expecting stuff like "is a
sequence" to be crisp concepts. But they aren't in Python, and after
introducing iterable objects, they're even fuzzier.

but-fuzzy-in-a-warm-and-loving-way-ly y'rs - tim

Alex Martelli

unread,

May 26, 2001, 3:18:24 AM5/26/01

to

"Tim Peters" <tim...@home.com> wrote in message
news:mailman.990826839...@python.org...
...

> This is already available, via the operator.isSequenceType() function
(which

Ooops (and thanks to /F for the same remark, too). I wonder if and
ever I'll ever know by heart everything that's in the standard lib!-)

[snippage]

> but then that's because people come to Python expecting stuff like "is a
> sequence" to be crisp concepts. But they aren't in Python, and after
> introducing iterable objects, they're even fuzzier.
>
> but-fuzzy-in-a-warm-and-loving-way-ly y'rs - tim

OK, "is-a sequence" is a typical Korzybskian "is of identity", thus
I guess it's quite reasonable to have it be a fuzzy concept.

But a need for something sharper does arise. "Behaves like a
sequence" (or "can adapt to sequence protocol", or "respects
sequence interface"). A dictionary "behaves" like a sequence
in a very partial way: if I do a "for k in dict" and just process
k I've lost substantial information (all of the values) and have
warmly and lovingly been fed non-information (an arbitrary
ordering of the keys).

I believe the normal concept of sequence does include ordering
as being significant, and no information loss -- list, tuple,
array.array, UserList -- no doubt files and (special cases that
it would be nice to have another sharp test for:-) strings and
unicode-strings, too.

With distinctions within the overall category of sequences, sure
(mutable vs not, with predefined length vs without). But it seems
to me that if either or both of the interfaces and protocol-adaptation
PEPs passed, 'such-and-such-a-sequence's would be useful
interfaces/protocols to have.

Alex

Marcin 'Qrczak' Kowalczyk

unread,

May 26, 2001, 3:39:29 PM5/26/01

to

Fri, 25 May 2001 10:44:57 -0400, Noel Rappin <noe...@yahoo.com> pisze:

> As part of the algorithm, I need to determine whether each object in
> the list is itself a sequence or whether it is an atom. Using the
> types module would cause me to miss any sequence-like object that
> isn't actually the basic type. So, what's the best (easiest, most
> foolproof) way to determine whether a Python object is a sequence?

There is no universal and formal distinction between sequences and
atoms. For example strings may be considered either.

IMHO data should not be encoded as a mix of unknown types of atoms
and unknown types of sequences in the first place if it's necessary
to distinguish atoms from sequences, but as something with more
explicitly defined structure.

--
__("< Marcin Kowalczyk * qrc...@knm.org.pl http://qrczak.ids.net.pl/
\__/
^^ SYGNATURA ZASTĘPCZA
QRCZAK

Martin von Loewis

unread,

May 27, 2001, 8:25:14 AM5/27/01

to

"Nick Perkins" <nper...@home.com> writes:

> def flatten(L):
> if type(L) != type([]): return [L]
> if L == []: return L
> return reduce(lambda L1,L2:L1+L2,map(flatten,L))

[...]

> ...maybe someone knows how to avoid the lambda,

Sure: Use operator.add.

Regards,
Martin

Delaney, Timothy

unread,

May 27, 2001, 9:28:33 PM5/27/01

to pytho...@python.org

> Alex Martelli wrote:
> > I'm almost tempted to propose a tiny extension module, since
> > the C-API level *DOES* expose an "is-a-sequence" test...:
>

> or you could import the "operator" module, and use
> the operator.isSequenceType(obj) predicate.

Another option:

def f (l)

try:
l = list(l)
except TypeError:
# not a sequence

Tim Delaney

Tim Peters

unread,

May 28, 2001, 12:46:31 AM5/28/01

to pytho...@python.org

[Alex Martelli]

> OK, "is-a sequence" is a typical Korzybskian "is of identity", thus
> I guess it's quite reasonable to have it be a fuzzy concept.
>
> But a need for something sharper does arise.

I agree, but haven't seen anything convincingly useful toward that end.
Call that "something sharper" S. Is S more akin to universal truth, or to
something merely expedient depending on context? In my experience, it's the
latter.

> "Behaves like a sequence" (or "can adapt to sequence protocol", or
> "respects sequence interface"). A dictionary "behaves" like a sequence
> in a very partial way: if I do a "for k in dict" and just process
> k I've lost substantial information (all of the values) and have
> warmly and lovingly been fed non-information (an arbitrary
> ordering of the keys).

Except we can't know that *just* from knowing "it's a dict". For example,
dicts are often used to implement sets, typically using 1 or None as the
only value. In that context all the info is in the keys, and hiding the
values is a helpful weeding out of irrelevant implementation detail. We can
agree to say that a dict does, or does not, implement the sequence
interface, but for some apps some of the time either choice will get in the
way.

Overall I expect it's more useful more often to say that a dict does not
implement the sequence interface. But in 2.2 it does implement the
iteration interface, and people will have to get used to that sequences are
a proper subset of iterable objects.

> I believe the normal concept of sequence does include ordering
> as being significant, and no information loss -- list, tuple,
> array.array, UserList -- no doubt files and (special cases that
> it would be nice to have another sharp test for:-) strings and
> unicode-strings, too.

Files are alone there in not supporting integer indexing (__getitem__(i)),
and that's "the essence" of a sequence to many apps. So are files "really"
sequences? Strings are alone in that s[i] returns an object of the same
type as s, and that's been the cause of many an unbounded recursion in
"generic sequence" routines. So I fear that no matter how we define "it's a
sequence", it won't be particularly useful. Perhaps we should have a large
set of predefined tag classes (IsIterable, IsOrdered, IsIntegerIndexable,
IsMutable, IsFixedLength, IsHashable, IsLexicographicallyOrdered, IsAString
<wink -- but it reads better than IsElementTypeSameAsContainerType!>, ...)
and let people derive from them however best suits their apps. Not
particularly useful today, but Guido is working on changes that should allow
efficient subclassing of the builtin types.

> With distinctions within the overall category of sequences, sure
> (mutable vs not, with predefined length vs without). But it seems
> to me that if either or both of the interfaces and protocol-adaptation
> PEPs passed, 'such-and-such-a-sequence's would be useful
> interfaces/protocols to have.

I do wonder. If the goal is to document assumptions-- which is a good
thing --people could write comments today. If the goal is to automate
error-checking, there's more value but it's likely to slow calls
significantly -- and then the audience in practice is slashed. If the goal
is to speed programs by hoping Python will exploit type information,
somebody's smoking grass again <wink>. Maybe the greatest value would be to
give us *a* documented answer to "what's a sequence?", which, perfect or
not, would be something everyone can rely on.

otoh-you-can-just-try-it-and-see-whether-it-breaks-ly y'rs - tim