XML pull parser

Dirk Lattermann

20 Jul 2016, 9:05:03 a.m.
to ceylon...@googlegroups.com
As another topic under the rubric of "Old-Fashioned Things That Yet
Might Still Prove Useful", I have written a non-validating,
namespace-aware XML 1.0 (nearly, see below) pull parser.

The API is loosely taken from StAX, and I have implemented it in a mad
attempt at an ad-hoc state machine, hence the module name
de.dlkw.madstax.

One big thing is still missing, namely the parsing of document type
declarations (<!DOCTYPE ...>), but if you don't need that (and need XML
parsing), it should fit.

Have fun,
Dirk Lattermann

Gavin King

20 Jul 2016, 9:11:01 a.m.
to ceylon...@googlegroups.com
Link?
> --
> You received this message because you are subscribed to the Google Groups "ceylon-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to ceylon-users...@googlegroups.com.
> To post to this group, send email to ceylon...@googlegroups.com.
> Visit this group at https://groups.google.com/group/ceylon-users.
> To view this discussion on the web visit https://groups.google.com/d/msgid/ceylon-users/20160720150501.46820abc%40dinu.
> For more options, visit https://groups.google.com/d/optout.



--
Gavin King
ga...@ceylon-lang.org
http://profiles.google.com/gavin.king
http://ceylon-lang.org
http://hibernate.org
http://seamframework.org

Dirk Lattermann

20 Jul 2016, 9:11:03 a.m.
to ceylon...@googlegroups.com
On Wed, 20 Jul 2016 15:05:01 +0200, Dirk Lattermann <dl...@alqualonde.de> wrote:

Aww, sorry, I forgot: it's at https://github.com/dlkw/ceylon-stax/

Gavin King

20 Jul 2016, 9:12:15 a.m.
to ceylon...@googlegroups.com
thanks :)

John Vasileff

20 Jul 2016, 11:06:25 a.m.
to ceylon...@googlegroups.com
Very nice!

Any reason this couldn’t be made cross platform by dropping the 'ceylon.io' in favor of 'ceylon.buffer'?

Dirk Lattermann

20 Jul 2016, 11:27:24 a.m.
to ceylon...@googlegroups.com
On Wed, 20 Jul 2016 11:06:23 -0400, John Vasileff <jo...@vasileff.com> wrote:

> Very nice!
>
> Any reason this couldn’t be made cross platform by dropping the
> 'ceylon.io' in favor of 'ceylon.buffer'?

It seems there isn't. I guess I was confused (again (and again)) by the buffer
and io dependencies. That maybe comes from the fact that in 1.2.1,
buffer was part of io. It keeps confusing me.

I just changed the module import of ceylon.io to ceylon.buffer, removed
the native("jvm") and no compilation errors appear...
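
For readers following along, the change described here might look roughly like this in module.ceylon. This is only a sketch: the module version "0.0.1" and the ceylon.buffer version "1.2.2" are assumptions for illustration and are not taken from the repository.

```ceylon
// Before (JVM only), roughly:
//   native("jvm")
//   module de.dlkw.madstax "0.0.1" {
//       import ceylon.io "1.2.2";
//   }

// After (cross-platform); both version strings are assumed.
module de.dlkw.madstax "0.0.1" {
    import ceylon.buffer "1.2.2";
}
```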

Thanks!

Tako Schotanus

20 Jul 2016, 12:01:53 p.m.
to ceylon-users
Very nice job!!!

A couple of remarks:

 - I'd extend the example code in the README to include the necessary `import` statements (making it much more copy&pastable)
 - Maybe even include the necessary import for the `module.ceylon` file (beginners might struggle a bit if they have to figure out exactly which import to use and where to put it)
 - Would using `for (event in reader) {` be more Ceylonic?
 - Is it ready yet to be published to the Herd? :)




-Tako


Dirk Lattermann

20 Jul 2016, 12:50:32 p.m.
to ceylon...@googlegroups.com
On Wed, 20 Jul 2016 18:01:33 +0200, Tako Schotanus <ta...@codejive.org> wrote:

> Very nice job!!!
>
> A couple of remarks:
>
> - I'd extend the example code in the README to include the necessary
> `import` statements (making it much more copy&pastable)
> - Maybe even include the necessary import for the `module.ceylon`
> file (beginners might struggle a bit if they have to figure out
> exactly which import to use and where to put it)

Ok, these are good ideas!


> - Would using `for (event in reader) {` be more Ceylonic?

I made the reader satisfy Iterator instead of Iterable to make it clear
that it can be iterated only once.
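
To illustrate what consuming an Iterator looks like compared with a `for` loop over an Iterable, here is a minimal sketch. The helper `printEvents` is hypothetical and not part of de.dlkw.madstax; it works on any `Iterator`, and a tuple of strings stands in for real parser events.

```ceylon
// Hypothetical helper, not part of de.dlkw.madstax: drains any Iterator.
void printEvents<Event>(Iterator<Event> events) {
    // next() returns either an element or the singleton `finished`;
    // `!is Finished` narrows the type and ends the loop at end of input.
    while (!is Finished event = events.next()) {
        print(event);
    }
}

shared void run() {
    // Stand-in for a stream of parser events.
    value events = ["start", "text", "end"].iterator();
    printEvents(events);
    // The same iterator is now exhausted, so there is no second pass.
    assert (events.next() is Finished);
}
```

The type itself signals the one-shot nature: there is no `iterator()` to call again, only `next()`.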

I still have no good clue about the choice of input structure to play
along with that. The main use case will be reading a file or from a
socket which also can be done only once. So, the input Iterable<Byte>
may also not be the best data structure. I recently tried to bring that
topic up here, but couldn't get good ideas from it.

> - Is it ready yet to be published to the Herd? :)

I intend to at least make the parser skip over DOCTYPE without raising a
ParseError before publishing it in such a prominent place.

Thanks for your remarks and encouragement!

Dirk

John Vasileff

20 Jul 2016, 1:26:00 p.m.
to ceylon...@googlegroups.com
A very minor point: the “else” at https://github.com/dlkw/ceylon-stax/blob/15913b160fe26785548a46502452fdee8d7e7e3d/source/de/dlkw/madstax/XMLReader.ceylon#L701 actually isn’t necessary. If the typechecker can prove that the cases are exhaustive, the “else” becomes optional, and if not provided, the backends automatically add code that throws if for some reason the cases turn out to be not exhaustive at runtime.



Lucas Werkmeister

20 Jul 2016, 1:27:25 p.m.
to ceylon...@googlegroups.com

Only the JVM backend does, actually. On the JS backend, this can still be a good idea. See https://github.com/ceylon/ceylon/issues/6312

Dirk Lattermann

20 Jul 2016, 1:31:54 p.m.
to ceylon...@googlegroups.com
On Wed, 20 Jul 2016 13:25:58 -0400, John Vasileff <jo...@vasileff.com> wrote:

> A very minor point: the “else” at
> https://github.com/dlkw/ceylon-stax/blob/15913b160fe26785548a46502452fdee8d7e7e3d/source/de/dlkw/madstax/XMLReader.ceylon#L701
> <https://github.com/dlkw/ceylon-stax/blob/15913b160fe26785548a46502452fdee8d7e7e3d/source/de/dlkw/madstax/XMLReader.ceylon#L701>
> actually isn’t necessary. If the typechecker can prove that the cases
> are exhaustive, the “else” becomes optional, and if not provided, the
> backends automatically add code that throws if for some reason the
> cases turn out to be not exhaustive at runtime.
>

Ah, yes I know. That's a remnant from the time when I hadn't
implemented all states in the switch. The IDE even produces a warning
that state has type nothing there!

While developing, I seem to prefer the explicit default so that adding
another subclass or three won't force me to add the default and then
remove it again when they are implemented, until the next few are
added again...
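
As a small illustration of the point about exhaustive switches, here is a sketch with a made-up enumerated type. The states `beforeRoot`, `inElement` and `afterRoot` are hypothetical and are not the states used in XMLReader.ceylon.

```ceylon
// Hypothetical parser states, used only for illustration.
abstract class State() of beforeRoot | inElement | afterRoot {}
object beforeRoot extends State() {}
object inElement extends State() {}
object afterRoot extends State() {}

String describe(State state) {
    // The three cases cover every instance of State, so the
    // typechecker accepts this switch without an else clause.
    switch (state)
    case (beforeRoot) { return "before root"; }
    case (inElement) { return "in element"; }
    case (afterRoot) { return "after root"; }
}
```

Adding a fourth case to the `of` clause turns the switch into a compile error until it is handled, which is the churn that an explicit `else` avoids during development.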

Tako Schotanus

20 Jul 2016, 4:07:37 p.m.
to ceylon-users

On Wed, Jul 20, 2016 at 6:50 PM, Dirk Lattermann <dl...@alqualonde.de> wrote:
I made the reader satisfy Iterator instead of Iterable to make it clear
that it can be iterated only once.

Ah, understood. Still, I wish we had some kind of Iterable that we could use for these kinds of one-shot data streams.
Because functions like filters and maps are still very useful for them.

-Tako

Gavin King

20 Jul 2016, 4:21:11 p.m.
to ceylon...@googlegroups.com
Dirk, it seems to me that an Iterator<T> is just a complicated way to
write the function type T(). So I would just use a function.



Dirk Lattermann

20 Jul 2016, 4:35:04 p.m.
to ceylon...@googlegroups.com
On Wed, 20 Jul 2016 22:20:51 +0200, Gavin King <gavin...@gmail.com> wrote:

> Dirk, it seems to me that an Iterator<T> is just a complicated way to
> write the function type T(). So I would just use a function.

Isn't there some value by using the next() function defined by the
Iterator interface instead of a function (maybe with the same name)
that does the same thing without stemming from a well-known interface?

I mean, there is much "implicit documentation" by using the common
Iterator semantics.

But maybe you mean something else?

John Vasileff

20 Jul 2016, 4:37:29 p.m.
to ceylon...@googlegroups.com
I think I’d still go with Iterable.

1) Iterables are much more convenient
2) Iterables don’t necessarily have to be re-iterable
3) An error attempting to re-iterate is likely to be found during development
4) It looks like XMLEventReader accepts a "{Byte*}" which may possibly be re-iterable. So perhaps XMLEventReader can just “pass the buck” on re-iterability?

John


Gavin King

20 Jul 2016, 4:40:52 p.m.
to ceylon...@googlegroups.com
On Wed, Jul 20, 2016 at 10:35 PM, Dirk Lattermann <dl...@alqualonde.de> wrote:
> On Wed, 20 Jul 2016 22:20:51 +0200, Gavin King <gavin...@gmail.com> wrote:
>
>> Dirk, it seems to me that an Iterator<T> is just a complicated way to
>> write the function type T(). So I would just use a function.
>
> Isn't there some value by using the next() function defined by the
> Iterator interface instead of a function (maybe with the same name)
> that does the same thing without stemming from a well-known interface?

Well, I dunno, I can't see what that value might be...

John Vasileff

20 Jul 2016, 4:54:37 p.m.
to ceylon...@googlegroups.com
I’d generally prefer Iterables, but if "<T | Finished>()" is seen as a nice alternative, perhaps the language module should have:

shared {T*} functionIterable<T>(<T|Finished>()() f)
        => object satisfies {T*} {
    iterator() => object satisfies Iterator<T> {
        next = f();
    };
};

The Dart backend relies on something like this.

John


Dirk Lattermann

20 Jul 2016, 4:54:42 p.m.
to ceylon...@googlegroups.com
On Wed, 20 Jul 2016 16:37:27 -0400, John Vasileff <jo...@vasileff.com> wrote:

> I think I’d still go with Iterable.
>
> 1) Iterables are much more convenient
> 2) Iterables don’t necessarily have to be re-iterable
> 3) An error attempting to re-iterate is likely to be found during
> development
> 4) It looks like XMLEventReader accepts a "{Byte*}" which
> may possibly be re-iterable. So perhaps XMLEventReader can just “pass
> the buck” on re-iterability?

That might be possible. I'll give it a try.

Tako Schotanus

20 Jul 2016, 7:11:05 p.m.
to ceylon-users
But don't a whole bunch of methods in Iterable depend on being able to recreate the underlying Iterator?

-Tako

John Vasileff

20 Jul 2016, 7:18:07 p.m.
to ceylon...@googlegroups.com
Which ones?

You can’t do something like call 'count' and then 'sequence()', but I think that’s to be expected.

Gavin King

20 Jul 2016, 7:27:36 p.m.
to ceylon...@googlegroups.com
Actually I think we went through at some stage and made sure that the
operations of Iterable *don't* require re-iteration (since the
iterable may not be "stable").

Tako Schotanus

20 Jul 2016, 7:29:31 p.m.
to ceylon-users
Any of them that access `size` for example
(a quick look seems to suggest that at least affects: scan, repeat, paired, partition, interpose, group and string)

-Tako


John Vasileff

20 Jul 2016, 7:41:03 p.m.
to ceylon...@googlegroups.com
Aside from ‘repeat’ not making sense for non-re-iterables, I’m not seeing a problem with those functions (based on a quick look). Of course, the returned Iterables won’t be re-iterable.

Tako Schotanus

20 Jul 2016, 8:08:31 p.m.
to ceylon-users
So you mean that the results returned from those methods will exhibit the same characteristics as the stream that was used to create it?

I'm still not convinced. I look at your `functionIterable` and I look at how `Iterable.first` is implemented and AFAICT if you call it repeatedly it would basically function like a `next`.

To me this is different than having an unstable iteration order and IMO an Iterable that can only be used once should either give correct results or fail with an exception.

And from a user's perspective it's not very clear which ones are "safe" to use and which ones aren't (eg. you can safely filter, map and then iterate, but you can't get the value of "first" and then iterate over "rest", at least not with the default implementation).

So it seems to me that at the very least the `functionIterable` would need some more work to make sure the Iterable doesn't behave in all kinds of strange ways when not used exactly right.



-Tako

John Vasileff

20 Jul 2016, 8:30:06 p.m.
to ceylon...@googlegroups.com
On Jul 20, 2016, at 8:08 PM, Tako Schotanus <ta...@codejive.org> wrote:

So you mean that the results returned from those methods will exhibit the same characteristics as the stream that was used to create it?

I'm still not convinced. I look at your `functionIterable` and I look at how `Iterable.first` is implemented and AFAICT if you call it repeatedly it would basically function like a `next`.


Note that the parameter for the functionIterable that I pasted is a function that returns a function - <T | Finished>()(). So it’s more tailored for re-iterables. The parameter is analogous to an Iterable, not an Iterator.

I agree that for non-re-iterables, it might be better to take a  <T | Finished>() and throw on all but the first call to iterator().
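
A minimal sketch of that one-shot variant, under stated assumptions: the name `onceIterable` and the choice of a failed assertion as the error on re-iteration are illustrative, not an existing language module API.

```ceylon
// Hypothetical one-shot wrapper; not part of ceylon.language.
shared {T*} onceIterable<T>(T|Finished nextElement())
        => object satisfies {T*} {
    variable Boolean consumed = false;

    shared actual Iterator<T> iterator() {
        // Any call after the first fails loudly instead of
        // silently continuing in the middle of the stream.
        "a one-shot stream can be iterated only once"
        assert (!consumed);
        consumed = true;
        return object satisfies Iterator<T> {
            next() => nextElement();
        };
    }
};

shared void run() {
    value source = [1, 2, 3].iterator();
    value stream = onceIterable<Integer>(source.next);
    // map is lazy; sequence() performs the single permitted iteration.
    print(stream.map((Integer i) => i * 2).sequence());
}
```

Whether inherited operations such as `first` followed by `rest` then fail cleanly depends on how each default implementation obtains its iterator, which is the concern raised in the quoted text.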

To me this is different than having an unstable iteration order and IMO an Iterable that can only be used once should either give correct results or fail with an exception.

And from a user's perspective it's not very clear which ones are "safe" to use and which ones aren't (eg. you can safely filter, map and then iterate, but you can't get the value of "first" and then iterate over "rest", at least not with the default implementation).

I’m not sure it’s so important to distinguish the “safe” ones. I “never” assume re-iterability in my code, and instead either memoize or call sequence() when I have to.
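
The "call sequence() when I have to" approach can be sketched as follows. The function `report` is a hypothetical example, not code from the thread.

```ceylon
// Hypothetical example: never iterate the argument more than once.
void report({String*} lines) {
    // One pass over the source; everything below works on the copy.
    value cached = lines.sequence();
    if (nonempty cached) {
        for (line in cached) {
            print(line);
        }
    } else {
        print("nothing to see here");
    }
}
```

The cost is that the whole stream is held in memory, so this does not suit infinite or very large sources.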

If I’m wrong, and we should always use iterator-like-functions for non-re-iterables, then at least the functionIterable idea would provide some relief for when we want to *immediately* call something like map, filter, or sequence.

But, really, I “can’t” be wrong, because inevitably, you're going to want to pass a non-re-iterable to some really useful third party utility. And then we’ll just “abuse”  functionIterable to do so and be back to square one not knowing which Iterables can only be used once!


So it seems to me that at the very least the `functionIterable` would need some more work to make sure the Iterable doesn't behave in all kinds of strange ways when not used exactly right.


I agree. Or at least I agree that this would be worth debating if we did want to add something like this.

Gavin King

20 Jul 2016, 8:31:29 p.m.
to ceylon...@googlegroups.com
On Thu, Jul 21, 2016 at 1:29 AM, Tako Schotanus <ta...@codejive.org> wrote:
> Any of them that access `size` for example
> (a quick look seems to suggest that at least affects: scan, repeat, paired,
> partition, interpose, group and string)

I just reviewed these and they all look good to me except for paired()
and partition() which, for an unstable stream, return Iterables which
violate the general contract for an immutable stream. One can somewhat
rescue that by claiming that they return *mutable* streams, but in
fact the truth is that for an unstable stream they return streams with
no well-defined element set.

Tako Schotanus

20 Jul 2016, 8:58:08 p.m.
to ceylon-users
On Thu, Jul 21, 2016 at 2:30 AM, John Vasileff <jo...@vasileff.com> wrote:

I'm still not convinced. I look at your `functionIterable` and I look at how `Iterable.first` is implemented and AFAICT if you call it repeatedly it would basically function like a `next`.


Note that the parameter for the functionIterable that I pasted is a function that returns a function - <T | Finished>()(). So it’s more tailored for re-iterables. The parameter is analogous to an Iterable, not an Iterator.

Ah indeed, I had missed that. But that's outside the context of this discussion then. It's as if `f` returns a result only the first time.

And from a user's perspective it's not very clear which ones are "safe" to use and which ones aren't (eg. you can safely filter, map and then iterate, but you can't get the value of "first" and then iterate over "rest", at least not with the default implementation).

I’m not sure it’s so important to distinguish the “safe” ones. I “never” assume re-iterability in my code, and instead either memoize or call sequence() when I have to.

Well `Iterable.string` says you're wrong to think everybody else does so too ;)
    

If I’m wrong, and we should always use iterator-like-functions for non-re-iterables, then at least the functionIterable idea would provide some relief for when we want to *immediately* call something like map, filter, or sequence.

But, really, I “can’t” be wrong, because inevitably, you're going to want to pass a non-re-iterable to some really useful third party utility. And then we’ll just “abuse”  functionIterable to do so and be back to square one not knowing which Iterables can only be used once!

Well perhaps there's a way to have a super interface of Iterable that only guarantees single-use and only defines those methods that support that? Iterable would then be an extension to that where reusability is guaranteed. We could then refactor things like for loops and comprehensions to use that super interface because they never need to restart their data streams. Is that doable?

Gavin King

20 Jul 2016, 9:05:08 p.m.
to ceylon...@googlegroups.com
On Thu, Jul 21, 2016 at 2:57 AM, Tako Schotanus <ta...@codejive.org> wrote:

>
> Well `Iterable.string` says you're wrong to think everybody else does so too
> ;)

Huh?!

Iterable.string *does* call sequence() to memoize the stream. It does
exactly the thing John is describing.


> Well perhaps there's a way to have a super interface of Iterable that only
> guarantees single-use and only defines those methods that support that?
> Iterable would then be an extension to that where reusability is guaranteed.
> We could then refactor things like for loops and comprehensions to use that
> super interface because they never need to restart their data streams. Is
> that doable?

I don't understand. How would this interface be different to Iterable?
As far as I can tell it would have exactly the same operations.
(Except, perhaps, for paired and partition().)

How would you represent its singleshotedness within the type system?

I can't quite see how such a thing could be defined.

Dirk Lattermann

21 Jul 2016, 2:58:35 a.m.
to ceylon...@googlegroups.com
On Thu, 21 Jul 2016 03:04:47 +0200, Gavin King <gavin...@gmail.com> wrote:

> On Thu, Jul 21, 2016 at 2:57 AM, Tako Schotanus <ta...@codejive.org>
> wrote:
>
> >
> > Well `Iterable.string` says you're wrong to think everybody else
> > does so too ;)
>
> Huh?!
>
> Iterable.string *does* call sequence() to memoize the stream. It does
> exactly the thing John is describing.
>
>
> > Well perhaps there's a way to have a super interface of Iterable
> > that only guarantees single-use and only defines those methods that
> > support that? Iterable would then be an extension to that where
> > reusability is guaranteed. We could then refactor things like for
> > loops and comprehensions to use that super interface because they
> > never need to restart their data streams. Is that doable?
>
> I don't understand. How would this interface be different to Iterable?
> As far as I can tell it would have exactly the same operations.
> (Except, perhaps, for paired and partition().)
>
> How would you represent its singleshotedness within the type system?
>
> I can't quite see how such a thing could be defined.
>

That's a bit the "value" I wrote about earlier, which you said you
didn't see.

I think the "value" in an interface is not only about defining some
type, but also to define an API with an expected behaviour. Of course,
if you only take into account differences that can be seen by the
typechecker, the behaviour semantics are invisible, but in my opinion,
that would be a loss in the language's API definition.

Tako Schotanus

21 Jul 2016, 5:55:52 a.m.
to ceylon-users
On Thu, Jul 21, 2016 at 3:04 AM, Gavin King <gavin...@gmail.com> wrote:
On Thu, Jul 21, 2016 at 2:57 AM, Tako Schotanus <ta...@codejive.org> wrote:

>
> Well `Iterable.string` says you're wrong to think everybody else does so too
> ;)

Huh?!

Iterable.string *does* call sequence() to memoize the stream. It does
exactly the thing John is describing.

Ok true, bad example. But then something as simple as:

    if (!iter.empty) {
        for (item in iter) { }
    }

for example can't work. So you'd at least have to write special implementations for a bunch of methods to make Iterable with a one-shot data source somewhat usable. And since code dealing with Iterables might not know it is dealing with a limited source like that, it might very well use it the wrong way. Expecting everybody to memoize and use sequence and stuff is not the solution IMO. It's putting the responsibility where it doesn't belong. It doesn't say anywhere that dealing with one-shot data sources is expected of users of Iterable, for example.
 


> Well perhaps there's a way to have a super interface of Iterable that only
> guarantees single-use and only defines those methods that support that?
> Iterable would then be an extension to that where reusability is guaranteed.
> We could then refactor things like for loops and comprehensions to use that
> super interface because they never need to restart their data streams. Is
> that doable?

I don't understand. How would this interface be different to Iterable?
As far as I can tell it would have exactly the same operations.
(Except, perhaps, for paired and partition().)

How would you represent its singleshotedness within the type system?

Just by its type?
 

I can't quite see how such a thing could be defined.

Just by saying that that is the defined behaviour of that API.
Something like "when using this API only a single call is guaranteed to return a result, further calls will result in an 'input exhausted' exception. Subtypes may override this behaviour to provide extended behaviour. See Iterable."

Of course you can also extend Iterator with a new type that adds that behaviour, but that seems the wrong way around: the subtype shouldn't be more limited in its behaviour than the super type, because it would be too easy for people writing code for Iterables to forget that they might be dealing with the one-shot form. By making it a super type you just can't pass it to any code dealing with Iterables in general. But APIs that want to specifically target the one-shot super type can do so.

Gavin King

21 Jul 2016, 7:11:31 a.m.
to ceylon...@googlegroups.com
I don't see how this would be better. It's completely untypesafe,
since nothing in the type system enforces singleshotedness. So you're
introducing a completely useless type to, what, add a *comment* to it?
Definitely not worth the additional complexity.


On Thu, Jul 21, 2016 at 11:55 AM, Tako Schotanus <ta...@codejive.org> wrote:

>> How would you represent its singleshotedness within the type system?
>
>
> Just by its type?
>
>>
>>
>> I can't quite see how such a thing could be defined.
>
>
> Just by saying that that is the defined behaviour of that API.
> Something like "when using this API only a single call is guaranteed to
> return a result, further calls will result in an "input exhausted"
> exception. Subtypes may override this behaviour to provide extended
> behaviour. See Iterable."
>
> Of course you can also extend Iterator with a new type that adds that
> behaviour, but that seems the wrong way around, the subtype shouldn't be
> more limited in its behaviour than the super type because it would be too
> easy for people writing code for Iterables to forget that they might be
> dealing with the one-shot form. By making it a super type you just can't
> pass it to any code dealing with Iterables in general. But APIs that want to
> specifically target the one-shot super type can do so.


Tako Schotanus

21 Jul 2016, 7:37:55 a.m.
to ceylon-users
On Thu, Jul 21, 2016 at 1:11 PM, Gavin King <gavin...@gmail.com> wrote:
I don't see how this would be better. It's completely untypesafe,
since nothing in the type system enforces singleshotedness. So you're
introducing a completely useless type to, what, add a *comment* to it?

That's not what I said and you know it.

And often that's exactly what types do, they don't add much but they have a contract that you have to adhere to that can't be enforced by the typechecker (equals? hash?)

And we know the two types wouldn't be the same because there are methods that cannot be implemented with single-shot sources, you yourself mentioned a couple. So those would be implemented on Iterable and not on the super type.

 
Definitely not worth the additional complexity.

It would be a single type and most defaulted methods would retain their current implementations but would just move up to the super type. We would have some refinements and that's it. That's hardly complex.

Think about possible advantages here. One could treat, let's say, a connection to a socket as we now treat Iterables and use filter, map, etc. The same for the contents of files and such. We have several places in the SDK where we could improve the API by having a much more powerful interface.

Just wrapping a socket in an Iterable isn't a solution because you're not giving potential users of that Iterable any hints that it is special in any way.

Now maybe my suggestion isn't the best one, maybe it's a lousy one, okay, but then let's come up with something better. Because it's a problem well worth solving IMO.


Gavin King

21 Jul 2016, 8:10:18 a.m.
to ceylon...@googlegroups.com
Again: there would be zero difference between the interface
OnceIterable and Iterable. They would be the same type.



Gavin King

21 Jul 2016, 8:19:10 a.m.
to ceylon...@googlegroups.com
i.e. You would have:

interface OnceIterable<Element> {
    // every single one of the operations that are currently defined on Iterable
}

interface Iterable<Element>
        satisfies OnceIterable<Element> {
    // noop
}

Not every distinction one can make in one's head can be meaningfully
encoded into the type system.

Gavin King

21 Jul 2016, 8:49:52 a.m.
to ceylon...@googlegroups.com
And, of course, then we would have to go through and redefine every
single stream utility function we have that accepts Iterable to now
accept OnceIterable, except, ooops, now they would mostly return
OnceIterables instead of Iterables, breaking oodles of code.

So in fact we would need "overloaded" versions of every stream utility
function zip(), zipOnce(), expand(), expandOnce(), etc.

Note that this is a problem that can be solved with higher-order
polymorphism by abstracting over a generic Stream<Element> type. But
the Java backend doesn't yet support this, and anyway it would be a
massive break to BC.

Gavin King

21 Jul 2016, 8:50:35 a.m.
to ceylon...@googlegroups.com
"Stream<Element> type"

I mean "Stream<Element> type parameter", of course.

John Vasileff

21 Jul 2016, 10:53:20 a.m.
to ceylon...@googlegroups.com

On Jul 21, 2016, at 5:55 AM, Tako Schotanus <ta...@codejive.org> wrote:

Ok true, bad example. But then something as simple as:

    if (!iter.empty) {
        for (item in iter) { }
    }

for example can't work.

I’ve come to believe that Iterables are a pretty low level thing (indeed they are very close to Anything), and if you want to write code in the above style, you’re better off using a List, Sequential, or something else.

Note that even for re-iterables, the ‘iter.empty’ may be expensive, possibly involving opening a file, creating a network connection, querying a database, or performing an expensive calculation.

Rather than trying to come up with yet another Iterable type, I think time is better spent developing utilities and techniques to deal with iterables. For example, the memoizing stream that I posted somewhere before (something that can even be used with infinite streams).

Regarding techniques, an interesting real-world example is ‘printAll’, which used to use first & rest, but was modified to only iterate once:


For non-library code, a more functional style is probably better (more straightforward), and eliminates the risk of re-iterating:

    void printAll({Anything*} values)
        =>  values.map(stringify)
                  .interpose(", ")
                  .each(process.write);

John

Tako Schotanus

21 Jul 2016, 11:57:18 a.m.
to ceylon-users

I’ve come to believe that Iterables are a pretty low level thing (indeed they are very close to Anything), and if you want to write code in the above style, you’re better off using a List, Sequential, or something else.

It still wouldn't be a strange thing to do, even when writing in a more functional style (but not entirely functional, this isn't Haskell):

    if (!iter.empty) {
        iter.map(stringify).each(process.write);
    } else {
        process.write("nothing to see here");
    }


> Note that even for re-iterables, the ‘iter.empty’ may be expensive, possibly involving opening a file, creating a network connection, querying a database, or performing an expensive calculation.

Sure, but there's a difference between expensive and impossible.
 

> Rather than trying to come up with yet another Iterable type, I think time is better spent developing utilities and techniques to deal with iterables. For example, the memoizing stream that I posted somewhere before (something that can even be used with infinite streams).

That sounds interesting too, where can I take a look at it? And if it's so good why isn't it part of the language module yet? ;)
But wouldn't that stream implement Iterable? If not it doesn't seem that useful, because I'd expect it to implement the whole caboodle of filter, map, scan, count, etc.

John Vasileff

Jul 21, 2016, 12:36:39 PM
to ceylon...@googlegroups.com
On Jul 21, 2016, at 11:56 AM, Tako Schotanus <ta...@codejive.org> wrote:

>> Note that even for re-iterables, the ‘iter.empty’ may be expensive, possibly involving opening a file, creating a network connection, querying a database, or performing an expensive calculation.
>
> Sure, but there's a difference between expensive and impossible.

The point is, it’s probably a bad idea for a public API to read a lazy data source multiple times if it doesn’t have to, so don’t do it!

 

>> Rather than trying to come up with yet another Iterable type, I think time is better spent developing utilities and techniques to deal with iterables. For example, the memoizing stream that I posted somewhere before (something that can even be used with infinite streams).
>
> That sounds interesting too, where can I take a look at it? And if it's so good why isn't it part of the language module yet? ;)
> But wouldn't that stream implement Iterable? If not it doesn't seem that useful, because I'd expect it to implement the whole caboodle of filter, map, scan, count, etc.



It allows you to:

    variable value i = 0;
    value it = { ++i }.cycled;

    NonemptyStream<Integer> s1 = streamOf(it);
    printAll(s1.take(3)); // 1, 2, 3
    printAll(s1.take(4)); // 1, 2, 3, 4
    printAll(s1.filter(4.smallerThan).take(3)); // 5, 6, 7

    Stream<Integer> s2 = s1;
    if (is NonemptyStream<Integer> s2) {
        Integer one = s2.first; // non-null
        printAll(s2.take(5)); // 1, 2, 3, 4, 5
    }

    if (!s2.empty) {
        printAll(s2.take(6)); // 1, 2, 3, 4, 5, 6
    }

John

Tako Schotanus

Jul 21, 2016, 3:38:27 PM
to ceylon-users
Ok, but doesn't this load the entire stream into memory? As a cons-list on top of that which seems quite wasteful. Or am I missing something here?





-Tako


John Vasileff

Jul 21, 2016, 4:09:11 PM
to ceylon...@googlegroups.com

> On Jul 21, 2016, at 3:38 PM, Tako Schotanus <ta...@codejive.org> wrote:
>
> Ok, but doesn't this load the entire stream into memory? As a cons-list on top of that which seems quite wasteful. Or am I missing something here?
>

WTF, of course you can’t load an entire { ++i }.cycled into memory! But yes, it does use *some* memory. How else can you avoid recalculating results but to use memory?

Note that if you follow the pattern ‘stream = current.skip(consumed);’ in the function transform() [1], no-longer-relevant leading portions of the stream will be gc'd.

Obviously having to deal with one-shot, mutable, or expensive lazy streams is less convenient than working with arrays. But you seem to be disagreeing with all potential solutions!

[1] https://gist.github.com/jvasileff/d27770ede6d6954d16bd60b59da84ff0#file-enhancediterable-ceylon-L7

Tako Schotanus

Jul 21, 2016, 4:40:11 PM
to ceylon-users
Let me first start with this one before things go further out of hand:

> Obviously having to deal with one-shot, mutable, or expensive lazy streams is less convenient than working with arrays. But you seem to be disagreeing with all potential solutions!

I'm sorry if I seem to be disagreeing, that's not my intention. I'm asking questions to understand the potential advantages and pitfalls of any suggested solution.
So please take any seeming negativity more as me not understanding things and trying to figure things out by saying things like "but doesn't this mean it won't work if you do A?" :)

(Also I'm only looking at one-shot streams, the ones this thread started with)


On Thu, Jul 21, 2016 at 10:09 PM, John Vasileff <jo...@vasileff.com> wrote:

>> On Jul 21, 2016, at 3:38 PM, Tako Schotanus <ta...@codejive.org> wrote:
>>
>> Ok, but doesn't this load the entire stream into memory? As a cons-list on top of that which seems quite wasteful. Or am I missing something here?
>>
>
> WTF, of course you can’t load an entire { ++i }.cycled into memory! But yes, it does use *some* memory.

Well this thread started with an XML parser, and the whole idea of having a pull parser is that you do NOT load it all in memory of course. And all *I*, personally, am looking for is a way to treat that Iterator that the parser gives us as a kind of Iterable, giving me the ability to do filters and maps etc. without loading it all into memory at once. So if passing the Iterator to that Stream and filtering for let's say Comment nodes and printing them would result in the entire document being in memory when it prints the last node then I'd say that is not what I want. If it does *not* do that then that's great! But then I'd like to know how it does that (see next).


> Note that if you follow the pattern ‘stream = current.skip(consumed);’ in the function transform() [1], no-longer-relevant leading portions of the stream will be gc'd.

That sounds interesting, but how does it work exactly? Can't you give a short explanation how your Stream does what it does?
 

John Vasileff

Jul 21, 2016, 5:15:41 PM
to ceylon...@googlegroups.com
I’m not arguing that the Stream idea is the most efficient possible solution, but it is perhaps the most convenient option if you want to take an Iterable of unknown origin and freely call its methods multiple times, which is what your goal seemed to be with “if (!iter.empty) { … }”.

It was also designed with a particular usage in mind (see below about first/rest/skip).

On Jul 21, 2016, at 4:39 PM, Tako Schotanus <ta...@codejive.org> wrote:

> (Also I'm only looking at one-shot streams, the ones this thread started with)

But it’s *also* wrong to consume expensive or mutable streams more times than necessary. So saying, “don’t call ‘first’ on one-shot streams” isn’t sufficient, and therefore a solution to *just* that isn’t valuable.


>> Note that if you follow the pattern ‘stream = current.skip(consumed);’ in the function transform() [1], no-longer-relevant leading portions of the stream will be gc'd.
>
> That sounds interesting, but how does it work exactly? Can't you give a short explanation how your Stream does what it does?
 

Stream is a linked list.

If you look at skip() [1], it eagerly evaluates the “cdr”s you are skipping, and returns just the desired tail portion of the stream. If your program discards the previous head (i.e. Stream), then previous memoized results will no longer be referenced and will be gc’d.

You can also get this memory efficiency using first & rest (rest is implemented like skip(1)).

Of course, if you hold on to an Iterator, it will prevent its containing Stream from being collected, so “for (el in streamOf(infiniteStream)) { }” will eventually exhaust all memory.

[1] https://gist.github.com/jvasileff/d27770ede6d6954d16bd60b59da84ff0#file-memostream-ceylon-L62

John
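
A minimal Ceylon sketch of the linked-list idea described above. This is not the code from the gist: `MemoCell` and its members are hypothetical names, and the sketch assumes the source iterator is only ever advanced through these cells.

```ceylon
"Hypothetical memoizing cons cell: `rest` pulls from the shared
 source iterator at most once and remembers the result."
class MemoCell<Element>(shared Element first, Iterator<Element> source) {
    variable MemoCell<Element>|Finished|Null memo = null;

    shared MemoCell<Element>|Finished rest {
        if (exists known = memo) {
            return known; // already computed, do not touch the source again
        }
        value next = source.next();
        MemoCell<Element>|Finished result
                = if (!is Finished next) then MemoCell(next, source) else finished;
        memo = result;
        return result;
    }

    "Eagerly walks past `count` cells and returns the remaining tail."
    shared MemoCell<Element>|Finished skip(Integer count) {
        variable MemoCell<Element>|Finished current = this;
        variable value remaining = count;
        while (remaining > 0, !is Finished cell = current) {
            current = cell.rest;
            remaining--;
        }
        return current;
    }
}

shared void run() {
    value source = (1..5).iterator();
    assert (!is Finished head = source.next());
    variable MemoCell<Integer>|Finished stream = MemoCell(head, source);
    if (!is Finished current = stream) {
        // Reassigning drops the only reference to the skipped cells,
        // so they become eligible for garbage collection.
        stream = current.skip(2);
    }
}
```

Anything that keeps a reference to an earlier cell, such as an iterator created from the head, keeps every memoized cell after it reachable, which is the memory caveat mentioned above.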

Tako Schotanus

Jul 21, 2016, 6:46:04 PM
to ceylon-users

On Thu, Jul 21, 2016 at 11:15 PM, John Vasileff <jo...@vasileff.com> wrote:

>> (Also I'm only looking at one-shot streams, the ones this thread started with)
>
> But it’s *also* wrong to consume expensive or mutable streams more times than necessary. So saying, “don’t call ‘first’ on one-shot streams” isn’t sufficient, and therefore a solution to *just* that isn’t valuable.

Well that was my whole point in suggesting a super type with only those methods that make sense for a one-shot stream.

Of course I could make an Iterable that takes an Iterator and just implements those methods and throws NotImplementedExceptions for anything else. But I could never be sure that some change in the future of how Ceylon implements for loops/comprehensions wouldn't break looping over them. A dedicated super type *could* solve that. But Gavin already explained why that wouldn't be a good idea either.

Anyway, all this hasn't gotten us one inch closer to something nicer than what Dirk wrote in his first message which sucks.

-Tako

John Vasileff

Jul 21, 2016, 6:58:46 PM
to ceylon...@googlegroups.com
On Jul 21, 2016, at 6:45 PM, Tako Schotanus <ta...@codejive.org> wrote:


> On Thu, Jul 21, 2016 at 11:15 PM, John Vasileff <jo...@vasileff.com> wrote:
>
>>> (Also I'm only looking at one-shot streams, the ones this thread started with)
>>
>> But it’s *also* wrong to consume expensive or mutable streams more times than necessary. So saying, “don’t call ‘first’ on one-shot streams” isn’t sufficient, and therefore a solution to *just* that isn’t valuable.
>
> Well that was my whole point in suggesting a super type with only those methods that make sense for a one-shot stream.


All the methods (except cycled :) ) make sense for a 1-shot stream, including parsing XML, for which I’d like the source to possibly be a network connection.

> Of course I could make an Iterable that takes an Iterator and just implements those methods and throws NotImplementedExceptions for anything else. But I could never be sure that some change in the future of how Ceylon implements for loops/comprehensions wouldn't break looping over them.

I can’t imagine why a [] comprehension would ever need to iterate more than once.

> A dedicated super type *could* solve that. But Gavin already explained why that wouldn't be a good idea either.
>
> Anyway, all this hasn't gotten us one inch closer to something nicer than what Dirk wrote in his first message which sucks.


Darn. I thought we were making progress discussing various techniques to handle Iterables (as opposed to Lists).

John

Dirk Lattermann

Jul 22, 2016, 3:22:49 AM
to ceylon...@googlegroups.com
On Fri, 22 Jul 2016 00:45:43 +0200,
Tako Schotanus <ta...@codejive.org> wrote:
Trying to transform my XMLEventReader from an Iterator to an Iterable,
I stumbled over the Charset decoder again, and it's also connected to
the Iterable / (possibly) one-shot / Iterator situation.
Here's what I'm trying to do:

The XML parser reads the first few Bytes from the input and tries to
guess the character encoding (for now restricted to UTF-8 or UTF-16).

Then, the input from the start needs to be passed into a Charset
decoder. As that should work for non-reiterable inputs, the bytes read
so far must be prepended to the Iterator that was used to read them and
the whole passed to the decoder.

It feels like a bit of overkill to create some custom Byte Iterable for
that instead of just an Iterator, but the Charset decoder needs an
Iterable where IMO an Iterator would suffice.

Is it really necessary to have the decode method take an Iterable
instead of an Iterator?
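
The prepending step described above might look roughly like the following sketch. `PrependedIterator` is a hypothetical helper, not part of de.dlkw.madstax or the Ceylon SDK, and the byte values are made up for illustration.

```ceylon
"Hypothetical helper: yields the elements that were already consumed,
 then continues with the iterator they were read from."
class PrependedIterator<Element>({Element*} consumed, Iterator<Element> remaining)
        satisfies Iterator<Element> {
    value head = consumed.iterator();
    variable Boolean headDone = false;

    shared actual Element|Finished next() {
        if (!headDone) {
            value element = head.next();
            if (!is Finished element) {
                return element;
            }
            headDone = true; // buffered part exhausted, switch to the source
        }
        return remaining.next();
    }
}

shared void run() {
    // Stand-in for a one-shot byte source.
    value input = { 60.byte, 63.byte, 120.byte, 109.byte }.iterator();
    // Read two bytes, for example to guess the encoding.
    assert (!is Finished b0 = input.next(), !is Finished b1 = input.next());
    // Hand the complete input, from the start, to whatever consumes it next.
    value whole = PrependedIterator([b0, b1], input);
    while (!is Finished b = whole.next()) {
        print(b);
    }
}
```

This only yields an Iterator. A decoder that insists on an Iterable would still need a wrapper around it, which is the inconvenience raised in the post above.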

Gavin King

Jul 22, 2016, 4:51:57 AM
to ceylon...@googlegroups.com
On Fri, Jul 22, 2016 at 9:22 AM, Dirk Lattermann <dl...@alqualonde.de> wrote:

> Is it really necessary to have the decode method take an Iterable
> instead of an Iterator?

I again advise against using Iterator in this way. It was never an
abstraction that is intended to be used in APIs. Iterator<T> offers no
advantages over T().
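
One possible reading of the `T()` suggestion, sketched in Ceylon: an API can accept a plain function that produces the next element instead of an Iterator. The `Byte|Finished()` signature and the name `countBytes` are assumptions made for this illustration. The post only says `T()`, so some end-of-input signal has to be chosen.

```ceylon
"Hypothetical consumer that pulls from a function rather than an Iterator."
Integer countBytes(Byte|Finished nextByte()) {
    variable value count = 0;
    while (!is Finished b = nextByte()) {
        count++;
    }
    return count;
}

shared void run() {
    value source = { 60.byte, 63.byte }.iterator();
    // A method reference to next() already has the required function type.
    print(countBytes(source.next));
}
```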

Dirk Lattermann

Jul 22, 2016, 5:17:51 AM
to ceylon...@googlegroups.com
On Fri, 22 Jul 2016 10:51:36 +0200,
Gavin King <gavin...@gmail.com> wrote:

> On Fri, Jul 22, 2016 at 9:22 AM, Dirk Lattermann
> <dl...@alqualonde.de> wrote:
>
> > Is it really necessary to have the decode method take an Iterable
> > instead of an Iterator?
>
> I again advise against using Iterator in this way. It was never an
> abstraction that is intended to be used in APIs. Iterator<T> offers no
> advantages over T().
>

Ok, but replacing my use of a Byte Iterator with a nextByte()
function doesn't answer or help with anything in my last post,
does it?

I fear I may not be understanding what you're saying.


Just now, I realized I cannot use the utf8.decode() to decode from a
one-shot Iterable, it wants to determine the input size first!

Is there a simple way to decode obtaining one character at a time?
Using chunkDecoder with a CharacterBuffer of size 1 seems overkill.

Gavin King

Jul 22, 2016, 5:49:35 AM
to ceylon...@googlegroups.com
On Fri, Jul 22, 2016 at 11:17 AM, Dirk Lattermann <dl...@alqualonde.de> wrote:

> Just now, I realized I cannot use the utf8.decode() to decode from a
> one-shot Iterable, it wants to determine the input size first!

That seems ... very wrong. Does it *really* need to do that?

Dirk Lattermann

Jul 22, 2016, 6:00:58 AM
to ceylon...@googlegroups.com
On Fri, 22 Jul 2016 11:49:15 +0200,
Gavin King <gavin...@gmail.com> wrote:

> On Fri, Jul 22, 2016 at 11:17 AM, Dirk Lattermann
> <dl...@alqualonde.de> wrote:
>
> > Just now, I realized I cannot use the utf8.decode() to decode from a
> > one-shot Iterable, it wants to determine the input size first!
>
> That seems ... very wrong. Does it *really* need to do that?
>

It works via writing into a CharacterBuffer and allocates that using a
size estimation which involves the input size and the average output
size per input units.

Gavin King

Jul 22, 2016, 6:12:15 AM
to ceylon...@googlegroups.com
The problem is that size can be a very expensive operation in Ceylon
... even for *String* it is expensive.



Dirk Lattermann

Jul 22, 2016, 6:22:11 AM
to ceylon...@googlegroups.com
On Fri, 22 Jul 2016 12:00:57 +0200,
Dirk Lattermann <dl...@alqualonde.de> wrote:
I think my view is sharpening. While looking at ChunkConvert.convert, I
saw it's using for (inputElement in input). But that makes sense only
if it reads the next input unit from input, even on consecutive calls
of convert!

If I'm not wrong, for (a in b) creates a new Iterator from b and
iterates over the elements until finished. That means here,
b.iterator() must return the same already started iterator on
consecutive calls.

This seems to imply that for re-iterable Iterables, a call to
iterator() returns a new Iterator that starts from the beginning, while
for non-reiterable Iterables, only one instance of an Iterator exists
which is returned from iterator().

If that really is the intention, I think that's very error-prone.
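
The two behaviours contrasted above can be sketched as follows. `oneShot` is a hypothetical wrapper written only for this illustration, not something taken from ceylon.buffer.

```ceylon
shared void run() {
    // Re-iterable: each for loop obtains a fresh iterator.
    {Integer*} reiterable = [1, 2, 3];
    for (x in reiterable) { print(x); break; }
    for (x in reiterable) { print(x); break; } // starts again at the first element

    // Hypothetical one-shot wrapper: iterator() always hands back the
    // same, possibly already advanced, iterator.
    value underlying = [1, 2, 3].iterator();
    object oneShot satisfies {Integer*} {
        shared actual Iterator<Integer> iterator() => underlying;
    }
    for (x in oneShot) { print(x); break; }
    for (x in oneShot) { print(x); } // resumes where the first loop stopped
}
```

Both values have the static type `{Integer*}`, so a caller cannot tell from the type which behaviour it will get.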

Tako Schotanus

Jul 22, 2016, 6:41:12 AM
to ceylon-users

On Fri, Jul 22, 2016 at 12:22 PM, Dirk Lattermann <dl...@alqualonde.de> wrote:
> If that really is the intention, I think that's very error-prone.

If that's really what happens I actually think it's completely wrong.
To me the contract of `Iterable` requires you to create a new `Iterator` on each call to `iterator()`.

Seems to me someone has already tried to subvert Iterables to be able to use them with one-shot streams.

-Tako