which new library modules?

srepmub

Nov 29, 2009, 1:01:51 PM
to shedskin-discuss
hi all,

for the next release, I'd like to support at least three more standard
library modules. interesting candidates at this point are csv and
hashlib. any suggestions for a third one? modules shouldn't be too
big, should be somewhat associated with efficient programming, and
preferably should have pure Python implementations that look like they
might be compiled with Shedskin (after some plumbing, typically).


thanks!
mark.

Jérémie Roquet

Nov 30, 2009, 5:58:04 AM
to shedskin...@googlegroups.com
Hello (again),

2009/11/29 srepmub <mark....@gmail.com>:
I worked on the itertools module a few weeks ago, but haven't found
enough time since then to finish it (it's on a long TODO list).
While there is still a lot of work to do before it's finished, some
functions are already usable, so you might want to add it to the next
release (I'll post improvements as I find time to work on them).
Actually, it's impossible to get everything to work, because this
module makes heavy use of argument unpacking (which is not yet supported
by Shed Skin -- or at least wasn't back in October), and also because
I found no way to distinguish between None and 0 in the
generated C++ code.
BTW, some broken GCC versions (at least 4.1.x, as mentioned in the
code) cannot compile the islice function because of type issues.

The attached patch adds the (partial) itertools module, the __GC_DEQUE
macro and a paragraph to the LICENSE file.
There are no clean unit tests yet, so I didn't add them.

Functions fully compliant with CPython 2.7 (trunk): count, cycle,
repeat, compress, dropwhile, groupby, ifilter, ifilterfalse and
takewhile (CPython 2.6.x does not support compress, and its count
support is limited).
chain requires argument unpacking to be fully compliant, but I
hard-coded support for up to five iterables.
tee requires arbitrary-sized tuples (I don't know whether these are
already supported), so for now only n == 2 is supported.
islice requires distinguishing None from 0 in C++; for now the
case where the stop argument is 0 is not CPython 2.7 compliant, the
others should be.

imap, starmap, izip, izip_longest, product, permutations, combinations
and combinations_with_replacement are not yet available
(combinations_with_replacement is not supported by CPython 2.6.x either).

Best regards,

--
Jérémie Roquet - Arkanosis
Programming artist
Developer in natural language processing - Exalead
itertools.1.patch

Mark Dufour

Nov 30, 2009, 10:46:49 AM
to shedskin...@googlegroups.com
nice work, thanks!!!

I will have a good look at your code next week, after an exam on sunday, and probably commit it to SVN.

before releasing a new module, however, I want to make sure we support it almost completely (one or two functions that cannot be supported is fine), so we can be very clear in the tutorial about which modules work and which don't.

some notes about the limitations you mention:

-while argument (un)packing is not supported in 'user space', it _is_ possible to add minor hacks to the compiler to improve things for library functions.. there are several other hacks already for other normally unsupported situations.

-true, tuples with different types of elements and length > 2 are currently not supported.. but because this is clearly mentioned in the tutorial, it's fine imo if itertools doesn't support these either for now.

-mixing of None and 0 is more worrisome. I'm afraid this cannot be supported at all, nor at any time in the near future. is this only a problem for islice? could we perhaps add a warning here, saying that 0 is used instead of None, for integers?


thanks again!
mark.

2009/11/30 Jérémie Roquet <arka...@gmail.com>

--
"One of my most productive days was throwing away 1000 lines of code" - Ken Thompson

Jérémie Roquet

Nov 30, 2009, 2:03:57 PM
to shedskin...@googlegroups.com
2009/11/30 Mark Dufour <mark....@gmail.com>:
> before releasing a new module, however, I want to make sure we support it
> almost completely (one or two functions that cannot be supported is fine),
> so we can be very clear in the tutorial about which modules work and which
> don't.

Makes sense. I'll try to work a bit more on this, maybe this week-end,
but I'm unable to guarantee anything.

> some notes about the limitations you mention:
> -while argument (un)packing is not supported in 'user space', it _is_
> possible to add minor hacks to the compiler to improve things for library
> functions.. there are several other hacks already for other normally
> unsupported situations.

Great, I'll have a look at it.
But it'd be very nice to have generalized support for this in the
future ; maybe I'll look for a clean solution -- if you've not planned
to do it yourself, of course.

> -true, tuples with different types of elements and length > 2 are currently
> not supported.. but because this is clearly mentioned in the tutorial, it's
> fine imo if itertools doesn't support these either for now.

It's fine for me too, at least for itertools, because I rarely use
more than two sequences at a time.

> -mixing of None and 0 is more worrisome. I'm afraid this cannot be supported
> at all, nor at any time in the near future. is this only a problem for
> islice? could we perhaps add a warning here, saying that 0 is used instead
> of None, for integers?

Yes, for now this is the only case where I had the problem, so this is
not critical (a workaround is possible in Python), and a warning would
suffice.
I wonder if it would be possible to use __null for None, which is a
typed wrapper for a null integer, but faking the C++ type-checker
(I've to check GCC's template issues).

Anyway, thanks for your reply and your comments about the limitations.

Mark Dufour

Nov 30, 2009, 2:33:13 PM
to shedskin...@googlegroups.com
> Great, I'll have a look at it.
> But it'd be very nice to have generalized support for this in the
> future ; maybe I'll look for a clean solution -- if you've not planned
> to do it yourself, of course.

I think we can just use the normal '*args' syntax in lib/*.py, and let shedskin handle this somewhat correctly, good enough to perform type inference. this should also be useful to improve some builtins at least (such as zip?)

on the C++ side, it would be nice to be able to avoid special-casing for 1 argument, 2 arguments etc., and to use va_list macros instead. but then I think we would have to let shedskin check that all arguments are of the same type.


thanks,
mark.

Mark Dufour

Dec 1, 2009, 8:13:21 AM
to shedskin...@googlegroups.com
>> -mixing of None and 0 is more worrisome. I'm afraid this cannot be supported
>> at all, nor at any time in the near future. is this only a problem for
>> islice? could we perhaps add a warning here, saying that 0 is used instead
>> of None, for integers?
>
> Yes, for now this is the only case where I had the problem, so this is
> not critical (a workaround is possible in Python), and a warning would
> suffice.

to be clear, I think you meant 'izip_longest' instead of 'islice'..? I don't see where None and 0 are mixed in 'islice'.
 
> I wonder if it would be possible to use __null for None, which is a
> typed wrapper for a null integer, but faking the C++ type-checker
> (I've to check GCC's template issues).

I don't think it's possible to efficiently mix None and int in C/C++ programs (though I'd love to be proven wrong!), so for maximum speed I prefer to just disallow it. in many cases it's also the result of spaghetti programming (of course not all cases), and -1 or 0 should work just as well. I'm hoping for hardware support at some point, to support the cases where it's actually useful, but I don't see that happening really.


thanks,
mark.

Jérémie Roquet

Dec 1, 2009, 9:56:32 AM
to shedskin...@googlegroups.com
2009/11/30 Mark Dufour <mark....@gmail.com>:
>> Great, I'll have a look at it.
>> But it'd be very nice to have generalized support for this in the
>> future ; maybe I'll look for a clean solution -- if you've not planned
>> to do it yourself, of course.
> I think we can just use the normal '*args' syntax in lib/*.py, and let
> shedskin handle this somewhat correctly, good enough to perform type
> inference. this should also be useful to improve some builtins at least
> (such as zip?)

Sure, this would be of great help.

> on the C++ side, it would be nice to be able to avoid special-casing for 1
> argument, 2 arguments etc., and to use va_list macros instead. but then I
> think we would have to let shedskin check that all arguments are of the same
> type.

Agreed, hard-coded functions are only an ugly and limited workaround, and
actual variadics would be far better.
There are alternatives to stdarg with some type safety: C++0x variadic
templates or boost::mpl pseudo-variadic templates. I don't know if
this is desirable, though.

2009/12/1 Mark Dufour <mark....@gmail.com>:
>> > -mixing of None and 0 is more worrisome. I'm afraid this cannot be
>> > supported at all, nor at any time in the near future. is this only a problem for
>> > islice? could we perhaps add a warning here, saying that 0 is used
>> > instead of None, for integers?
>> Yes, for now this is the only case where I had the problem, so this is
>> not critical (a workaround is possible in Python), and a warning would
>> suffice.
> to be clear, I think you meant 'izip_longest' instead of 'islice'..? I don't
> see where None and 0 are mixed in 'islice'.

You may use both of them for islice's 'stop' argument (running CPython
2.7 trunk) :

>>> import itertools
>>> for i in itertools.islice('ABCDEFG', 2, 0): print i
...
>>> for i in itertools.islice('ABCDEFG', 2, None): print i
...
C
D
E
F
G
>>>

But izip_longest may have the same problem, I haven't tried to implement it yet.

>> I wonder if it would be possible to use __null for None, which is a
>> typed wrapper for a null integer, but faking the C++ type-checker
>> (I've to check GCC's template issues).
> I don't think it's possible to efficiently mix None and int in C/C++
> programs (though I'd love to be proven wrong!), so for maximum speed I
> prefer to just disallow it. in many cases it's also the result of spaghetti
> programming (of course not all cases), and -1 or 0 should work just as well.
> I'm hoping for hardware support at some point, to support the cases where
> it's actually useful, but I don't see that happening really.

The idea is to just have a class containing the integer value (a
workaround for the strong typedef available in D, which C++ lacks).
This would allow selecting the right template specialization at
compile time, so there should be no overhead at all at runtime.
I'll try to get a working sample for illustration.

Mark Dufour

Dec 1, 2009, 10:19:49 AM
to shedskin...@googlegroups.com
> There are alternatives to stdarg with some type safety: C++0x variadic
> templates or boost::mpl pseudo-variadic templates. I don't know if
> this is desirable, though.

thanks, I didn't know that. but I'd much rather wait for c++0x (no idea what its status is now) than depend on boost. in this case, it's also quite easy for shedskin to check if all *arguments have the same type.

> You may use both of them for islice's 'stop' argument

oic, yes. then izip_longest has a larger problem, because it may actually mix ints and None in its output (default fillvalue)..
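
A minimal illustration of that output-mixing problem, assuming Python 3, where izip_longest is named zip_longest. The variable names are only for this example:

```python
from itertools import zip_longest

# Two int sequences of different lengths: the shorter one gets padded.
pairs = list(zip_longest([1, 2, 3], [4]))

# With the default fillvalue the padding is None, so the second column
# holds both ints and None, which a plain C++ int cannot represent.
second_column = [b for _, b in pairs]

# An explicit int fillvalue keeps every element an int.
padded = list(zip_longest([1, 2, 3], [4], fillvalue=0))
```

Passing an int fillvalue explicitly is one way to keep the element type uniform in user code.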
 
by now I looked at all itertools functions, and found three problematic ones:

-groupby. looks like we would have to add a template variable for the return type of the key function, and let shedskin instantiate it explicitly. I'm not sure how you solved this?

-izip_longest. fillvalue is also a keyword argument :/

-starmap. I don't think we can easily support this one for now.. I would be quite satisfied if everything else can be supported though.


thanks,

Mark Dufour

Dec 1, 2009, 10:39:28 AM
to shedskin...@googlegroups.com
> -starmap. I don't think we can easily support this one for now.. I would be quite satisfied if everything else can be supported though.

no, wait, map and imap have the same problem: the call to the passed function has to be modeled correctly somehow. I will have to think about this..

Mark Dufour

Dec 1, 2009, 5:03:44 PM
to shedskin...@googlegroups.com
I've just refactored things a bit, so '*args' syntax can now be used in lib/, and made os.path.join, os.execl*, os.spawnl* use this..

it's still a bit rough, but I will probably make some further improvements over the coming days. something I won't improve is that the type of 'args' is just modeled as the type of the first element of what it would be in python.. which should be enough to do type inference on itertools.

the following type model in lib/itertools.py now works at least:

def test(func, *iterable):
    return __iter(func(iter(iterable).next()))

(__iter is the name of the builtin iterator class)

on the C++ side, an argument is added to every call, giving the total number of arguments (this will probably become the number of only the variadic arguments).

the following code now at least is analyzed correctly:

import itertools
def hoppa(x): return x
y = itertools.test(hoppa, [1,2,3], [3,4,5], [5,6,7])

this is the first step to modeling map/imap/starmap I guess. for these to work properly, I think argument _packing_ has to be somewhat supported as well.. anyway, so far it all seems relatively easy to add.

but yes I also seem to remember that variadic arguments and templates don't play well together at the moment. so we'll probably still have to do some copy-pasting in itertools.?pp.. ?


thanks,
mark.

Jérémie Roquet

Dec 2, 2009, 4:38:53 AM
to shedskin...@googlegroups.com
Hello,

2009/12/1 Mark Dufour <mark....@gmail.com>:
> I've just refactored things a bit, so '*args' syntax can now be used in
> lib/, and made os.path.join, os.execl*, os.spawnl* use this..

Great :D

> it's still a bit rough, but I will probably make some further improvements
> over the coming days. something I won't improve is that the type of 'args'
> is just modeled as the type of the first element of what it would be in
> python.. which should be enough to do type inference on itertools.

AFAIR, that's fine for itertools: it works on sequences of only one type.

> the following type model in lib/itertools.py now works at least:
> def test(func, *iterable):
>     return __iter(func(iter(iterable).next()))
> (__iter is the name of the builtin iterator class)
> on the C++ side, an argument is added to every call, giving the total number
> of arguments (this will probably become the number of only the variadic
> arguments).
> the following code now at least is analyzed correctly:
> import itertools
> def hoppa(x): return x
> y = itertools.test(hoppa, [1,2,3], [3,4,5], [5,6,7])

Looks good, I'll play with this soon.

> this is the first step to modeling map/imap/starmap I guess. for these to
> work properly, I think argument _packing_ has to be somewhat supported as
> well.. anyway, so far it all seems relatively easy to add.
> but yes I also seem to remember that variadic arguments and templates don't
> play well together at the moment. so we'll probably still have to do some
> copy-pasting in itertools.?pp.. ?

Maybe in some cases like this one, it'd be cool to have the C++ code
generated by Shed Skin when needed, instead of pure C++ templates.
I mean: to have, for some builtins, a Python function that Shed Skin
calls with the arguments the C++ function is expected to accept, and
that generates the latter with the right types and number of
arguments.

BTW, I don't remember having had any blocking problem with the
'groupby' function (but I may have missed something).
And yes, it looks like the return value of izip_longest will be
problematic... :s

Best regards,

Mark Dufour

Dec 3, 2009, 6:35:08 AM
to shedskin...@googlegroups.com
>> is just modeled as the type of the first element of what it would be in
>> python.. which should be enough to do type inference on itertools.
>
> AFAIR, that's fine for itertools: it works on sequences of only one type.

but imap, for example, can return a tuple iterator with tuples with different types (tuple2<A,B>). so we'd need to add a few exceptions to handle this, at least with the way things are modeled at the moment.
 
> Looks good, I'll play with this soon.

between studying I also added some _unpacking_ improvements, so the following type model in lib/builtin.py now works:

def map(func, *iterable):
    return [func(*iter(iterable).next())]

next thing is to see what will happen on the C++ side for the return type of 'func'. we'll need two template variables, but hopefully shedskin won't have to instantiate these explicitly..

> Maybe in some cases like this one, it'd be cool to the have C++ code
> generated by Shed Skin when needed, instead of pure C++ templates.

yes, but (quite) a bit of ugliness in lib/ is fine.. and it may be less effort to just wait for variadic templates (if that's the correct term) in c++0x.

> And yes, it looks like the return value of izip_longest will be
> problematic... :s
 
we'll probably run into a few more of these (relatively obscure cases). while looking at 'map' I found it does the same thing..


thanks,

Jérémie Roquet

Dec 3, 2009, 8:42:57 AM
to shedskin...@googlegroups.com
2009/12/3 Mark Dufour <mark....@gmail.com>:
>> > is just modeled as the type of the first element of what it would be in
>> > python.. which should be enough to do type inference on itertools.
>> AFAIR, that's fine for itertools: it works on sequences of only one type.
> but imap, for example, can return a tuple iterator with tuples with
> different types (tuple2<A,B>). so we'd need to add a few exceptions to
> handle this, at least with the way things are modeled at the moment.

Oh, you're right. In fact, even the arguments' type is problematic, if
I do something like:

import itertools
def foo(a, b):
    a += 21
    b += 'foo'
    return '%i %s' % (a, b)
for i in itertools.imap(foo, (21, 12, 42), ('foo', 'bar', 'baz')):
    print i

I not only need a variadic imap function but its variadic parameters
are tuples of different types :s

>> Looks good, I'll play with this soon.
> between studying I also added some _unpacking_ improvements,

Please, don't spend too much time on Shed Skin if you have an exam on Sunday ;-)

> so the following type model in lib/builtin.py now works:
> def map(func, *iterable):
>     return [func(*iter(iterable).next())]

Looks like this is perfect for map.

For imap, we'd need something like :

def imap(func, *iterables):
    iters = map(iter, iterables) # 1
    args = map(next, iters) # 2
    return __iter(func(*args)) # 3

With 'func' defaulting to 'tuple'

Each line causes a problem:
#1: iterables may be of different types, so may be iters
#2: again, iters may be of different types, so may be args
#3: args may be of different types

Everything looks fine if all of func's parameters are of the same type;
otherwise it's not the case at all :s

> next thing is to see what will happen on the C++ side for the return type of
> 'func'. we'll need two template variables, but hopefully shedskin won't have
> to instantiate these explicitly..

This works:

template <typename T> struct iter
{ iter(T) {} };

template <typename T> iter<T> imap(T (*func)(int, ...), int argc, ...)
{ return iter<T>(func(argc, "bar")); } // oops

double foo(int /* argc */, ...) { return .0; }

int main()
{ imap(foo, 1, "baz"); }

The return type of 'func' is known, but the type of '...' is unknown,
and apart from explicit instantiation, I don't see how to indicate
it...
The line I commented ("oops"), is another (bigger) problem: how can we
call the variadic function 'func', without knowing in advance the
number of parameters?

>> Maybe in some cases like this one, it'd be cool to the have C++ code
>> generated by Shed Skin when needed, instead of pure C++ templates.
> yes, but (quite) a bit of ugliness in lib/ is fine.. and it may be less
> effort to just wait for variadic templates (if that's the correct term) in
> c++0x.

I don't know... variadic templates (that's the correct term, btw)
should already be usable with recent versions of gcc (with
-std=c++0x), but I've no idea when 0x will become standard.

Actually, I think it might take less effort to write small code
generators than to try to overcome C++ limitations...

Mark Dufour

Dec 3, 2009, 10:46:53 AM
to shedskin...@googlegroups.com
> Please, don't spend too much time on Shed Skin if you have an exam on Sunday ;-)

most of the machinery to handle itertools is already available.. I just have to 'nudge' it a bit.. :)

> Everything looks fine if all 'func' s parameters are of the same type
> ; otherwise it's not the case at all :s

I'm playing a bit with 'map', and now have the following in SVN (see the new test in unit.py):


def map(func, *iterable):
    return [func(*iter(iterable).next())]
def __map3(func, iter1, iter2):
    return [func(iter(iter1).next(), iter(iter2).next())]

shedskin automatically selects __map3 if map gets three arguments (hard-coded for now like for __zip2, but probably a good idea to make into a general rule).

this is not extremely pretty, but works well when there's only tuple<A,B> to worry about. I'd like to go to tuple<A,B,C> at least later, but we'll see about that  then..

I don't think we want to model the case where func is None, at least for now.. it's a bit of a silly case, and we can just disallow it (again, for now) with an error message. 
 
> template <typename T> iter<T> imap(T (*func)(int, ...), int argc, ...)

okay, phew thanks :) I just used this to implement map for 2 and 3 arguments.

but now I'm wondering what will happen when we replace the function-pointer types with pointers to pyobj-inheriting objects.. which I'd also like to do for the next release (so function pointers can be contained in lists and such). 

> The line I commented ("oops"), is another (bigger) problem: how can we
> call the variadic function 'func', without knowing in advance the
> number of parameters?

yes, I'm afraid this will require a bit of copy-pasting or auto-generating..

> Actually, I think there might be less effort to code small code
> generators than to try to overcome C++ limitations...

yes, but I'm not looking forward to building something like that into shedskin at the moment.. perhaps it's possible to use a separate itertools-specific little script to do this for now?
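
A rough sketch of what such a standalone generator script might look like, assuming it only needs to emit one C++ declaration per supported number of iterables. The function name generate_imap_overload and the emitted C++ identifiers (pyiter, the imap signature) are placeholders for illustration, not the actual contents of the Shed Skin library:

```python
def generate_imap_overload(n):
    """Return C++ source text declaring an imap overload for n iterables.

    The C++ names used in the emitted text are placeholders.
    """
    indices = range(1, n + 1)
    type_params = ", ".join("typename T%d" % i for i in indices)
    func_args = ", ".join("T%d" % i for i in indices)
    iter_args = ", ".join("pyiter<T%d> *iter%d" % (i, i) for i in indices)
    return (
        "template <typename R, %s>\n"
        "__iter<R> *imap(R (*func)(%s), %s);\n"
        % (type_params, func_args, iter_args)
    )


# Emit declarations for one, two and three iterables.
declarations = "\n".join(generate_imap_overload(n) for n in range(1, 4))
print(declarations)
```

The copy-pasting then happens in the script's output rather than by hand, and raising the supported arity is a one-number change.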


thanks,

srepmub

Dec 4, 2009, 8:30:29 AM
to shedskin-discuss
okay, I've just generalized the __%s%d naming scheme, so you can now
redirect calls during type inference based on actual numbers of
arguments. see for example the 'map' and 're.match_object.group' type
models in lib/.
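
The redirection rule can be pictured with a small sketch. It assumes a plain dictionary stands in for the type models found in lib/; select_type_model and the registry contents are hypothetical and not taken from the Shed Skin source:

```python
def select_type_model(models, name, nargs):
    """Pick the type model for a call to `name` with `nargs` arguments.

    A specialised model named __<name><nargs> wins when it exists;
    otherwise the general model is used.
    """
    specialised = "__%s%d" % (name, nargs)
    return models.get(specialised, models[name])


# Hypothetical registry standing in for the models defined in lib/.
models = {
    "map": "general model: map(func, *iterable)",
    "__map3": "specialised model: __map3(func, iter1, iter2)",
}

chosen_for_three = select_type_model(models, "map", 3)
chosen_for_two = select_type_model(models, "map", 2)
```

Here the count includes the function argument, which matches __map3 being selected when map gets three arguments.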

while hacking away at this, I also added 'filter', 'reduce' and 'next'
implementations, and am preparing to add support for the 'key'
argument to 'sorted' (and list.sort).. btw, the 'map' implementation
is not yet complete, because of None-issues. I will get back to that
soon..

srepmub

Dec 9, 2009, 1:27:47 PM
to shedskin-discuss
hello jeremie,

> I wonder if it would be possible to use __null for None, which is a
> typed wrapper for a null integer, but faking the C++ type-checker
> (I've to check GCC's template issues).

I've been banging my head on this issue today, and it seems hard to
solve without some kind of help from the compiler. I will have a
good look at __null - thanks for the suggestion!

would you like me to look into merging your patch next, or would you
prefer to send in a new version first..?


thanks,
ma

Jérémie Roquet

Dec 9, 2009, 2:42:09 PM
to shedskin...@googlegroups.com
Hello Mark,

How was your exam ?

2009/12/9 srepmub <mark....@gmail.com>:
>> I wonder if it would be possible to use __null for None, which is a
>> typed wrapper for a null integer, but faking the C++ type-checker
>> (I've to check GCC's template issues).
> I've been banging my head on this issue today, and it seems hard to
> solve without some kind of such help from the compiler. I will have a
> good look at _null - thanks for the suggestion!

It may help to choose the right template specialization, but I fear
this will sadly not solve the return type problem at all (when 0 and
None are two possible and distinct return values) :s

> would you like me to look into merging your patch next, or would you
> prefer to send in a new version first..?

I should have sent a mail before... looks like you made some great
improvements :-)

I'm really busy, and the remaining functions are a bit harder than
the previous ones, so you may consider merging the patch before it's
finished (I'll update my checkout and patch against the merged version
later): this will at least add partial support for itertools.

I attached a short test suite for itertools you might want to add to unit.py.
I commented the tests for islice, because it does not compile for me
right now (it did back in October, so I don't know if it's the updated
version of shedskin or my updated version of g++ that broke it -- I'll
investigate this later).

Btw, you'll see that the UT for groupby does not pass, because
shedskin displays booleans as integers while Python displays them as,
well, booleans ("True" and "False") -- otherwise the results are
correct.
It looks like shedskin considers the following lambda :
key = lambda x: x > 5
as a function returning an integer :
typedef int (*lambda0)(int);

I didn't look further for the cause of this...
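
For reference, a small sketch of the difference in displayed keys, assuming Python 3 and made-up input data:

```python
from itertools import groupby

data = [1, 2, 6, 7, 3]

# The key function returns a bool, so CPython reports the keys as True/False.
groups = [(key, list(group)) for key, group in groupby(data, key=lambda x: x > 5)]

# If bools are modeled as ints, the same keys come out as 0 and 1.
keys_as_ints = [int(key) for key, _ in groups]
```

The grouping itself is identical in both cases; only the printed form of the keys differs.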

Note that you'll maybe prefer not to include the UT, because I tested
my implementation against the CPython trunk (ie. version 2.7), so
CPython 2.6 and earlier versions do not support everything.

Best regards,

--
Jérémie
unit.itertools.py

srepmub

Dec 9, 2009, 2:43:40 PM
to shedskin-discuss

btw, c++0x apparently has a nullptr type:

http://www.devx.com/cplus/10MinuteSolution/35167/1954

looks like they're solving a lot of long-standing ugliness with c++0x!


mark.

srepmub

Dec 10, 2009, 7:08:38 AM
to shedskin-discuss

> How was your exam ?

not that great - it was a language exam, and I completely failed at
the listening part, because I usually only read and never listen. so
if I fail it's fortunately not because of working on shedskin too
much.. ^^

> Btw, you'll see that the UT for groupby does not pass, because
> shedskin displays booleans as integers while Python displays them as,
> well, booleans ("True" and "False") -- otherwise the results are
> correct.

yes, bools are currently modeled as integers to simplify type
inference. I should probably do an experiment (maybe for the next
release) to see if this really (still) simplifies things.. I'm
guessing we can make them into real booleans without too much
problems.

I'm looking forward to your new patch! note that in the previous
patch, there were many differences that were not really different (if
you understand what I mean).. it would be nice if those could be
avoided this time.


thanks!
mark.

Jérémie Roquet

Dec 10, 2009, 7:45:47 AM
to shedskin...@googlegroups.com
2009/12/10 srepmub <mark....@gmail.com>:
> note that in the previous patch, there were many differences that were not really different (if
> you understand what I mean).. it would be nice if those could be avoided this time.

Trailing whitespaces... I know :s
I'll set my editor to avoid auto-removal for next time.

(The last file I sent is not a patch, just the part you may want -- or
not -- to add.)

Best regards,

--
Jérémie

srepmub

Dec 11, 2009, 5:08:33 AM
to shedskin-discuss

> Trailing whitespaces... I know :s
> I'll set my editor to avoid auto-removal for next time.

it's probably better if there are no trailing whitespaces in the
shedskin code. so I used sed to remove all of them. thanks for
triggering this ;)

> (The last file I sent is not a patch, just the part you may want -- or
> not -- to add.)

I think I prefer to add the patch + unit tests at the same time.


thanks!
mark.

srepmub

Dec 11, 2009, 5:38:12 AM
to shedskin-discuss

> You may use both of them for islice's 'stop' argument (running CPython
> 2.7 trunk) :
>
> >>> import itertools
> >>> for i in itertools.islice('ABCDEFG', 2, 0): print i
> ...
> >>> for i in itertools.islice('ABCDEFG', 2, None): print i

note that an indirect 'None' argument does have a different type from
0 (void *):

a = None
itertools.islice('ABCDEFG', 2, a)

so the distinction problem is only there when passing None directly.

I added a small exception to shedskin, to cast None (NULL) to void *
when passed directly to itertools.islice. if we encounter more
builtins that need this later, we can make the exception more
general.

so to support all of itertools, I think I still need to add:
-a warning for izip_longest, saying that 0 is used as fillvalue for
integers
-something to make the keyword args of izip_longest and product work

did I miss anything?


thanks,
mark.

srepmub

Dec 13, 2009, 5:37:46 AM
to shedskin-discuss

> so to support all of itertools, I think I still need to add:
> -a warning for izip_longest, saying that 0 is used as fillvalue for
> integers
> -something to make the keyword args of izip_longest and product work

I added the warning, and the following trick to support **kwargs for
predefined keywords in builtins:

def product(*iterable, **kwargs):
    __kw_repeat = 1
    return __iter((iter(iterable).next(),))

this has the effect of putting any 'repeat' keyword argument in the
front, defaulting to '1'. it should also work for multiple keyword
arguments and of course izip_longest (__kw_fillvalue = None).
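
One way to picture the effect on a call, as a sketch only: the helper lower_call and its kw_defaults list are hypothetical and merely mimic the "keyword arguments go in front, with defaults" behaviour described above, not the compiler's real code:

```python
def lower_call(positional, keywords, kw_defaults):
    """Rewrite a call so predefined keyword arguments come first.

    kw_defaults is an ordered list of (name, default) pairs, playing the
    role of the __kw_<name> = <default> assignments in the type model.
    """
    known = {name for name, _ in kw_defaults}
    unknown = set(keywords) - known
    if unknown:
        raise TypeError("unexpected keyword argument(s): %s"
                        % ", ".join(sorted(unknown)))
    front = [keywords.get(name, default) for name, default in kw_defaults]
    return front + list(positional)


product_defaults = [("repeat", 1)]
with_keyword = lower_call(["ab", "cd"], {"repeat": 2}, product_defaults)
without_keyword = lower_call(["ab", "cd"], {}, product_defaults)
```

With several predefined keywords, the list order fixes their position at the front of the lowered call.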


mark.

Jérémie Roquet

Dec 14, 2009, 9:11:50 AM
to shedskin...@googlegroups.com
2009/12/11 srepmub <mark....@gmail.com>:
> note that an indirect 'None' argument does have a different type from
> 0 (void *):
> a = None
> itertools.islice('ABCDEFG', 2, a)
> so the distinction problem is only there when passing None directly.
> I added a small exception to shedskin, to cast None (NULL) to void *
> when passed directly to itertools.islice. if we encounter more
> builtins that need this later, we can make the exception more
> general.

Then template specialization should be enough to support both None and
0, great :-)
Would it be a problem to do this additional cast everywhere ?

> so to support all of itertools, I think I still need to add:
> -a warning for izip_longest, saying that 0 is used as fillvalue for
> integers
> -something to make the keyword args of izip_longest and product work
> did I miss anything?

I think that's all (but I'm sure I'm wrong ^^).

2009/12/13 srepmub <mark....@gmail.com>:
> I added the warning, and the following trick to support **kwargs for
> predefined keywords in builtins:
> def product(*iterable, **kwargs):
> __kw_repeat = 1
> return __iter((iter(iterable).next(),))
> this has the effect of putting any 'repeat' keyword argument in the
> front, defaulting to '1'. it should also work for multiple keyword
> arguments and of course izip_longest (__kw_fillvalue = None).

Great work, thanks!

Best regards,

--
Jérémie

Mark Dufour

Dec 14, 2009, 9:19:35 AM
to shedskin...@googlegroups.com
> Then template specialization should be enough to support both None and
> 0, great :-)
> Would it be a problem to do this additional cast everywhere ?

I'm afraid so, because in many cases we need to use NULL to automatically pick the correct specialization. consider this:

l = ['ha']
l->append(None)

if 'None' is of type void *, the C compiler will complain that list<str *>->append doesn't accept a void * without a cast to str *..

so, the following relatively silly example actually doesn't work at the moment with shedskin:

l = ['ha']
a = None
l->append(a)

I think we need something like c++0x's nullptr to support None properly. and from what I read about  nullptr, we may actually be able to define it ourselves without needing c++0x..


thanks,

Jérémie Roquet

Dec 15, 2009, 2:17:13 PM12/15/09
to shedskin...@googlegroups.com
Hello,

2009/12/14 Mark Dufour <mark....@gmail.com>:
>> Would it be a problem to do this additional cast everywhere ?
> I'm afraid so, because in many cases we need to use NULL to automatically
> pick the correct specialization. consider this:
> l = ['ha']
> l->append(None)
> if 'None' is of type void *, the C compiler will complain that list<str
> *>->append doesn't accept a void * without a cast to str *..
> so, the following relatively silly example actually doesn't work at the
> moment with shedskin:
> l = ['ha']
> a = None
> l->append(a)

OK, I see...

~~~~~~~

I managed to get a working itertools.izip implementation (you've
really done awesome improvements :p), but I needed some changes to the
current unpacking handler.

For now the Python code
f(a, b, c, d)
for the function
f(*args)
is translated to the C++ code
f(4, a, b, c, d)
which is enough for hard-coded "variadics", but not for real ones,
because it's not possible to instantiate a template with something
like
template <typename T> void f(int count, ...);
where "..." is of type "zero or more T"

What I used and would be nice to have instead of
f(4, a, b, c, d)
is
f(a, 3, b, c, d)
which I use with
template <typename T> void f(T first, int countMinusOne, ...);
which is OK for template instantiation (the first parameter being mandatory).

It means that unpacking 0-length parameters is not supported anymore,
but I think it's far less important than the support for an arbitrary
number of arguments...
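
To make the two call layouts concrete, here is a small Python sketch of how a code generator could emit them. It is not the real cpp.py code: `emit_count_first` and `emit_first_then_count` are hypothetical helpers, and emitting `f(a, 0)` for a single argument is an assumption on my side.

```python
def emit_count_first(name, args):
    """Current layout: the number of arguments comes first, e.g. f(4, a, b, c, d)."""
    parts = [str(len(args))] + list(args)
    return "%s(%s)" % (name, ", ".join(parts))


def emit_first_then_count(name, args):
    """Proposed layout: first argument, then the count of the rest, e.g. f(a, 3, b, c, d)."""
    if not args:
        # nothing to deduce the template parameter from, so no count either
        return "%s()" % name
    first, rest = args[0], list(args[1:])
    parts = [first, str(len(rest))] + rest
    return "%s(%s)" % (name, ", ".join(parts))


print(emit_count_first("f", ["a", "b", "c", "d"]))
print(emit_first_then_count("f", ["a", "b", "c", "d"]))
print(emit_first_then_count("f", []))
```

The point of the second layout is only that the first argument is a named parameter of type T, so the compiler has something to deduce T from.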

What's your opinion about this?

BTW, it also means that there is no support for variadics with
parameters of different types, but anyway, I don't see how we could
support this without code generation (and even if it was possible,
this would not be usable for izip, since we'd also need tuples of
arbitrary length with different types for the return value).
It is still possible to keep a hard-coded version when there are
only two parameters to izip, to support parameters of different types
in this case (since in that case there is no instantiation problem,
and we have tuple2<A, B>).

Best regards,

--
Jérémie

Jérémie Roquet

Dec 16, 2009, 12:02:52 PM12/16/09
to shedskin...@googlegroups.com
2009/12/15 Jérémie Roquet <arka...@gmail.com>:
> I managed to get a working itertools.izip implementation (you've
> really done awesome improvements :p), but I needed some changes to the
> current unpacking handler.
> [...]

The attached patch modifies cpp.py to generate variadic function calls
the way I explained in my previous message.

It also fixes a bug (when nrvarargs is 0, the first comma is not
added, but 0 still is).

I didn't spend much time on it, so a quick code review would not be a luxury ;-)

Best regards,

--
Jérémie
cpp-variadics.patch

Mark Dufour

Dec 17, 2009, 3:38:57 AM12/17/09
to shedskin...@googlegroups.com
> I managed to get a working itertools.izip implementation (you've
> really done awesome improvements :p), but I needed some changes to the
> current unpacking handler.

I'm glad your work on itertools motivated me to add these improvements.. I wouldn't call them awesome, because they weren't particularly hard to do, but I too think they can be quite useful.. :-)


> which is enough for hard-coded "variadics", but not for real ones,
> because it's not possible to instantiate a template with something
> like
>  template <typename T> void f(int count, ...);
> where "..." is of type "zero or more T"
>
> What I used and would be nice to have instead of
>  f(4, a, b, c, d)
> is
>  f(a, 3, b, c, d)

although generated code is typically not something you want to look at, putting a number in the middle of the arguments would be particularly ugly of course.. wouldn't the following work, too..?

template<class A> __iter<tuple2<A, A> *> *izip(int n, pyiter<A> *a, ...) { .. }
template<class A, class B> __iter<tuple2<A, B> *> *izip(int n, pyiter<A> *a, pyiter<B> *b) { .. }


> It means that unpacking 0-length parameters is not supported anymore,
> but I think it's far less important than the support for an arbitrary
> number of arguments...

yes, 0 arguments to izip only makes sense with starred arguments, which is not supported in user code. this will give an error prior to running 'make'.


> BTW, it also means that there is no support for variadics with
> parameters of different types, but anyway, I don't see how we could
> support this without code generation (and even if it was possible,
> this would not be usable for izip, since we'd also need tuples of
> arbitrary length with different types for the return value).

yes, and this should also lead to early warnings from shedskin (dynamic tuple (sub)type).

btw, we will lose type checking of course with variadics. also, we probably want to optimize for  sequence types (pyseq, as done in places in lib/builtin.?pp), because iterating over them can be done much faster than using the normal iterator protocol. so I'm thinking along the lines of adding some kind of attribute to be able to identify types at runtime.. what would be your thoughts on this?


thanks,
mark.

Mark Dufour

Dec 17, 2009, 3:58:01 AM12/17/09
to shedskin...@googlegroups.com
> It also fixes a bug (when nrvarargs is 0, the first comma is not
> added, but 0 still is).

thanks for the patch! but could you please also give me an example for which the bug occurs..? I would think, if there are no varargs, then there are no following args, so we don't need a comma after '0'..? in any case, I cannot reproduce the problem..


Jérémie Roquet

Dec 17, 2009, 8:42:22 AM12/17/09
to shedskin...@googlegroups.com
2009/12/17 Mark Dufour <mark....@gmail.com>:

> although generated code is typically not something you want to look at,
> putting a number in the middle of the arguments would be particularly ugly
> of course.. wouldn't the following work, too..?
> template<class A> __iter<tuple2<A, A> *> *izip(int n, pyiter<A> *a, ...) {
> .. }
> template<class A, class B> __iter<tuple2<A, B> *> *izip(int n, pyiter<A> *a,
> pyiter<B> *b) { .. }

That's the first thing I wanted to do (I agree it looks better, and it
doesn't require changes in shedskin), but unfortunately, va_args
requires the number of variadic arguments to be put on the stack just
after the variadic arguments themselves, so the compiler complains
with this version.

> btw, we will lose type checking of course with variadics. also, we probably
> want to optimize for  sequence types (pyseq, as done in places in
> lib/builtin.?pp), because iterating over them can be done much faster than
> using the normal iterator protocol. so I'm thinking along the lines of
> adding some kind of attribute to be able to identify types at runtime.. what
> would be your thoughts on this?

Yes, there is room for optimization there.
But, actually, isn't it possible to make the distinction at compile time?
Removing the virtual calls would allow inlining the iterator
protocol, and template template parameters could be used to replace the
dynamic polymorphism (in case we don't mind allowing a function to accept
parameters with different sequence types at runtime).
If it isn't possible for some reason, then yes, I second your idea.

2009/12/17 Mark Dufour <mark....@gmail.com>:


>> It also fixes a bug (when nrvarargs is 0, the first comma is not
>> added, but 0 still is).
> thanks for the patch! but could you please also give me an example for which
> the bug occurs..? I would think, if the are no varargs, then there are no
> following args, so we don't need a comma after '0'..? in any case, I cannot
> reproduce the problem..

You're right, I thought about it yesterday evening. In fact I should
have written:

> It also *adds* a bug

Sorry for this one ;-)

But, while this doesn't solve a problem that didn't exist, it's not
really a bug, it just changes the behaviour of shedskin by generating
calls like f() instead of f(0) when no parameter is supplied to a
function.
Playing a bit with this, I found that it may be desirable, because it
allows me to handle the "no parameter" case of izip, by having three C++
functions:

inline izipiter<void*> *izip() {
    return new izipiter<void*>();
}
template<class T> inline izipiter<T> *izip(pyiter<T> *iterable) {
    return new izipiter<T>(iterable);
}
template<class T> inline izipiter<T> *izip(pyiter<T> *iterable, int iterable_count, ...) {
    // ...
}

We can handle any number of parameters, including 0.
Which actually would have been possible without the change by testing
if "iterable" is 0, but I personally prefer with the change because
the compiler would complain if the "no parameter" function was not
implemented, while it would say nothing if the "0" case was not
properly handled.
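
As a reference for what those three overloads need to reproduce, this is how CPython behaves for zero, one and several iterables. Python 3's zip is used here as a stand-in for Python 2's itertools.izip (both are lazy), and the variable names are only for illustration:

```python
# no iterable at all: an iterator that is empty right away
no_args = list(zip())

# a single iterable: 1-tuples
one_arg = list(zip('ab'))

# several iterables of the same element type: n-tuples
many_args = list(zip('ab', 'cd', 'ef'))

print(no_args)
print(one_arg)
print(many_args)
```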

Best regards,

--
Jérémie

Mark Dufour

Dec 17, 2009, 10:08:27 AM12/17/09
to shedskin...@googlegroups.com
> That's the first thing I wanted to do (I agree it looks better, and it
> doesn't require changes in shedskin), but unfortunately, va_args
> requires the number of variadic arguments to be put on the stack just
> after the variadic arguments themselves, so the compiler complains
> with this version.

am I missing something..? this seems to work okay here:


template<class A> __iter<tuple2<A, A> *> *izip(int n, pyiter<A> *a, ...) {
    va_list ap;
    va_start(ap, a);
    print(1, __list(a));
    for(int i=1; i<n; i++) {
        pyiter<A> *p = va_arg(ap, pyiter<A> *);
        print(1, __list(p));
    }
    va_end(ap);
    return NULL;
}

I will reply to the rest of your mail tomorrow. btw, note that this works since a few hours:

print sorted([[2,3,4], [5,6], [7]], key=len)
print map(len, ['a','bc'])
print map(str, range(12))
print map(list, 'abc')

multi-parameter (or variadic, yikes.. :/) builtins cannot be used this way yet.. this is next on my todo..
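
For comparison, these are the values CPython gives for the same four expressions. They are written in Python 3 form here, where map returns an iterator and needs a list() around it, and the variable names are only for illustration:

```python
# sorted with a key function: shortest list first
by_length = sorted([[2, 3, 4], [5, 6], [7]], key=len)

# builtins passed directly as the function argument of map
lengths = list(map(len, ['a', 'bc']))
strings = list(map(str, range(12)))
lists = list(map(list, 'abc'))

print(by_length)
print(lengths)
print(strings)
print(lists)
```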

thanks,
mark.
 

--
"Overdesigning is a SIN. It's the archetypal example of what I call 'bad taste'" - Linus Torvalds

Jérémie Roquet

Dec 17, 2009, 12:27:12 PM12/17/09
to shedskin...@googlegroups.com
2009/12/17 Mark Dufour <mark....@gmail.com>:

> am I missing something..? this seems to work okay here:
> template<class A> __iter<tuple2<A, A> *> *izip(int n, pyiter<A> *a, ...) {
> va_list ap;
> va_start(ap, a);
> print(1, __list(a));
> for(int i=1; i<n; i++) {
> pyiter<A> *p = va_arg(ap, pyiter<A> *);
> print(1, __list(p));
> }
> va_end(ap);
> return NULL;
> }

Did you mean
va_start(ap, n);
instead of
va_start(ap, a);
?

Doing "va_start(ap, n);", gives me the following warning:
itertools.hpp:659: warning: second parameter of ‘va_start’ not last
named argument

And the C++98 standard says (18.7.23.3):

> The restrictions that ISO C places on the second parameter to the va_start() macro in header
> <stdarg.h> are different in this International Standard. The parameter parmN is the identifier of the
> rightmost parameter in the variable parameter list of the function definition (the one just before the ...).

So, unless I missed something, we have to pass the number of variadic
arguments just before them, ie. after the "non-variadic" first
iterable.

> I will reply to the rest of your mail tomorrow. btw, note that this works
> since a few hours:
>
> print sorted([[2,3,4], [5,6], [7]], key=len)
> print map(len, ['a','bc'])
> print map(str, range(12))
> print map(list, 'abc')

\o/

Best regards,

--
Jérémie

Mark Dufour

Dec 17, 2009, 1:11:37 PM12/17/09
to shedskin...@googlegroups.com
> Did you mean
>  va_start(ap, n);
> instead of
>  va_start(ap, a);
> ?

no.. the second argument to 'va_start' should be the last named parameter, which is 'a' in this case, so it knows where to look for the variadic arguments. it doesn't matter where you pass 'n'.. :)
 

mark.

Jérémie Roquet

Dec 17, 2009, 1:27:07 PM12/17/09
to shedskin...@googlegroups.com
2009/12/17 Mark Dufour <mark....@gmail.com>:

>> Did you mean
>>  va_start(ap, n);
>> instead of
>>  va_start(ap, a);
>> ?
> no.. the second argument to 'va_start' should be the last named parameter,
> which is 'a' in this case, so it knows where to look for the variadic
> arguments. it doesn't matter where you pass 'n'.. :)

Oh! My mistake...

For some reason I thought the second parameter to va_start had to be
the number of variadic parameters...
This is what happens when I don't read the man page ;-)

So, the first version should work, and I learned something new, thanks ;-)

--
Jérémie

Mark Dufour

Dec 18, 2009, 8:26:33 AM12/18/09
to shedskin...@googlegroups.com

> Yes, there is room for optimization there.
> But, actually, isn't it possible to make the distinction at compile time?
> Removing the virtual calls would allow inlining the iterator
> protocol, and template template parameters could be used to replace the
> dynamic polymorphism (in case we don't mind allowing a function to accept
> parameters with different sequence types at runtime).
> If it isn't possible for some reason, then yes, I second your idea.

more generally, there is probably a lot of stuff that is now done using virtuals that could be moved to compile-time.. but I admit I cannot fully oversee the situation at the moment. specializing for sequence types usually solves the performance issue, so I typically don't think much further.. :-)


thanks,
mark.