Iterate over Strings, Maps


Clement

Mar 8, 2012, 7:50:43 AM
to General Dart Discussion
Hi,

After a bit of searching through this group I was wondering if it's
ever been discussed to make Strings iterable.

I'm a Pythonista so I'm quite used to having convenient shortcuts like:
for c in u'My Unicode string with あいうえお': print(c)

Doing the same in Dart seems to be something like:
for (var c in 'My Unicode String with あいうえお'.splitChars()) { print(c); }

What I'd love to be able to do is something like the Python example
above. i.e.:
for (var c in 'My Unicode String with あいうえお') { print(c); }


Secondly, while on the topic of iteration, I've noticed iterating over
Maps is also a bit awkward.

While I would usually do:
myMap = {'a':1, 'b':2}
for k in myMap: print(myMap[k])

The only way to go in Dart seems to be:
var myMap = {'a':1, 'b':2};
myMap.forEach((k, v) => print(v));

What I would like to do in Dart:
var myMap = {'a':1, 'b':2};
for (var k, v in myMap) { print(v); }
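
For readers on a current Dart SDK (an assumption of this example; it is not the 2012 API discussed in this thread), Map exposes keys, values and entries as iterables, so for..in over a map works without new syntax. A minimal sketch, where `describe` is a hypothetical helper name:

```dart
// Hypothetical helper: builds "key=value" strings with for..in over entries.
List<String> describe(Map<String, int> map) {
  final out = <String>[];
  for (final entry in map.entries) {
    out.add('${entry.key}=${entry.value}');
  }
  return out;
}

void main() {
  final myMap = {'a': 1, 'b': 2};

  // Keys only, close to the Python `for k in myMap`.
  for (final k in myMap.keys) {
    print(myMap[k]);
  }

  // Key and value together, via the helper above.
  describe(myMap).forEach(print);

  // The callback form from the post still works.
  myMap.forEach((k, v) => print(v));
}
```

This keeps a single looping construct for lists and maps, at the cost of naming which view of the map is being iterated.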


I would argue that a less fragmented experience can be achieved with
little cost on the compiler end by adopting some of these changes.
In my opinion having to remember every single iteration convention for
each type is something the developer shouldn't have to worry about.

I'll be glad to get some feedback and opposing opinions on this, and
if I've missed something obvious please let me know.

Sam McCall

Mar 8, 2012, 8:23:35 AM
to Clement, General Dart Discussion
Here are a couple of opposing opinions :-)
(You might notice I'm not a big fan of Python syntax; lots and lots of
other people are, so I may be way off base here...)

On Thu, Mar 8, 2012 at 1:50 PM, Clement <cleme...@gmail.com> wrote:
> After a bit of searching through this group I was wondering if it's
> ever been discussed to make Strings iterable.
>
> I'm a Pythonista so I'm quite used to having convenient shortcuts like:
> for c in u'My Unicode string with あいうえお': print(c)
> Doing the same in Dart seems to be something like:
> for (var c in 'My Unicode String with あいうえお'.splitChars()) { print(c); }
>
> What I'd love to be able to do is something like the Python example
> above. i.e.:
> for (var c in 'My Unicode String with あいうえお') { print(c); }

My experience with this in Ruby was a mixed bag; people often want
iteration over characters, lines, or words, and it's not obvious which
is which.

I'd like to see properties that return lazy iterables (vs splitChars()
which splits eagerly). For example:

for (var c in "My unicode string".chars) {...}
for (var c in "My unicode string".words) {...}
for (var c in "My unicode string".lines) {...}

(On a more abstract level, I think this is just another form of
preferring aggregation over composition on the interface level, which
often makes APIs easier to understand).
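
The lazy properties suggested above could be sketched as follows on a current Dart SDK. This assumes extension methods and sync* generators, neither of which existed when this was written; `chars`, `words` and `lines` are the hypothetical names from the proposal, not members of the core String class:

```dart
import 'dart:convert' show LineSplitter;

// Hypothetical extension: the three getters are illustrative only.
extension LazyStringViews on String {
  // One String per Unicode code point, produced on demand.
  Iterable<String> get chars sync* {
    for (final rune in runes) {
      yield String.fromCharCode(rune);
    }
  }

  // Naive definition of a word: a maximal run of non-whitespace.
  Iterable<String> get words =>
      RegExp(r'\S+').allMatches(this).map((m) => m.group(0)!);

  // Splits on \n, \r\n or \r.
  Iterable<String> get lines => LineSplitter.split(this);
}

void main() {
  for (final c in 'あいう'.chars) {
    print(c);
  }
  print('to be or not to be'.words.toList());
  print('one\ntwo\r\nthree'.lines.toList());
}
```

Nothing is split until the loop asks for the next element, which is the difference from an eager splitChars().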

> Secondly, while on the topic of iteration, I've noticed iterating over
> Maps is also a bit awkward.
>
> While I would usually do:
> myMap = {'a':1, 'b':2}
> for k in myMap: print(myMap[k])
>
> The only way to go in Dart seems to be:
> var myMap = {'a':1, 'b':2};
> myMap.forEach((k, v) => print(v));
>
> What I would like to do in Dart:
> var myMap = {'a':1, 'b':2};
> for (var k, v in myMap) { print(v); }
>
> I would argue that a less fragmented experience can be achieved with
> little cost on the compiler end by adopting some of these changes.

I actually prefer it the other way around: I think iterating over Lists
should also be
list.forEach((v) => print(v));

The reason is that this mechanism is available to user code too, so
libraries can have their own control structures that look like part of
the language.

It also composes better: note that it can be written as:
list.forEach(print)
and your example is
myMap.values.forEach(print)

and is consistent with other operations that can make use of chaining:
myMap.values.transform((x) => 2*x).forEach(myList.add)
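
A runnable sketch of the same chain, assuming a current Dart SDK and using Iterable.map where the line above writes transform; `doubledValues` is a hypothetical name:

```dart
// Hypothetical helper: doubles every value of the map into a new list.
List<int> doubledValues(Map<String, int> myMap) {
  final myList = <int>[];
  // values -> lazy map -> forEach drives the iteration.
  myMap.values.map((x) => 2 * x).forEach(myList.add);
  return myList;
}

void main() {
  final myMap = {'a': 1, 'b': 2};
  // Passing a function by name instead of wrapping it in a closure.
  myMap.values.forEach(print);
  print(doubledValues(myMap));
}
```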

> In my opinion having to remember every single iteration convention for
> each type is something the developer shouldn't have to worry about.

I don't think that "m.forEach(k,v)" is any harder to remember than
"for (k,v in m)". In fact it's easier, because you can find the method
in the documentation for Map.

I think as with character iteration over strings, what seems obvious
is perhaps just familiar...

Cheers,
Sam

Don

Mar 8, 2012, 12:19:40 PM
to General Dart Discussion
previousPost.arguments.forEach((arg) => I.agreeWith(arg));

That said, if only we had lexical block return in Dart, then we could
really use methods for all control flow. So...why don't we again?


Ladislav Thon

Mar 8, 2012, 12:34:33 PM
to Don, General Dart Discussion
> That said, if only we had lexical block return in Dart, then we could
> really use methods for all control flow. So...why don't we again?

IIRC, this was explained as impossible to implement efficiently when compiling to JavaScript.

And to the topic: I'd much rather see iterator methods on String, like forEachChar (and maybe also forEachCodePoint, where forEachChar iterates over Strings and forEachCodePoint iterates over ints? Not sure about this), forEachWord and forEachLine.

LT

Don

Mar 8, 2012, 12:46:59 PM
to General Dart Discussion
> IIRC, this was explained as *impossible to implement efficiently when
> compiling to JavaScript*.

I think for 95% of scenarios the overhead would be acceptable. (I'm
assuming this would be implemented via exceptions). But of course I'm
not suggesting removing the existing for/while etc. statements, and
people could choose to go back to those if it makes a big difference.
Thanks for the response, though; I was genuinely curious.

Anyway,

> And to the topic: I'd much rather see iterator methods on String, like
> forEachChar (and maybe also forEachCodePoint, where forEachChar iterates
> over Strings and forEachCodePoint iterates over ints? Not sure about this),
> forEachWord and forEachLine.

I like the distinct String methods option as well. Don't think there
is much of a difference between that and the property methods though.
And by forEachCodePoint, do you mean like forEachByte or something?


Ladislav Thon

Mar 8, 2012, 12:50:47 PM
to Don, General Dart Discussion
> And by forEachCodePoint, do you mean like forEachByte or something?

I was actually thinking

// Each character gets passed to the callback as a String.
void forEachChar(void callback(String char));
// Each character gets passed to the callback as an integral Unicode
// code point (not really a byte).
void forEachCodePoint(void callback(int codePoint));
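
The difference between the two callbacks shows up for characters outside the Basic Multilingual Plane. A sketch assuming a current Dart SDK, where String.runes yields code points; the two top-level functions are hypothetical stand-ins for the methods above, not existing String members:

```dart
// Hypothetical stand-in: passes each Unicode code point as an int.
void forEachCodePoint(String s, void Function(int codePoint) callback) {
  s.runes.forEach(callback);
}

// Hypothetical stand-in: passes each code point as a one-character String.
void forEachChar(String s, void Function(String char) callback) {
  for (final rune in s.runes) {
    callback(String.fromCharCode(rune));
  }
}

void main() {
  const s = 'a\u{1D11E}'; // 'a' followed by MUSICAL SYMBOL G CLEF
  print(s.length); // counts UTF-16 code units
  print(s.runes.length); // counts code points
  forEachCodePoint(s, (cp) => print(cp.toRadixString(16)));
  forEachChar(s, print);
}
```

The clef needs two UTF-16 code units but is a single code point, so the two counts differ for this string.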

But that was just me thinking out loud, I'm not sure about it at all.

LT

Clement

Mar 8, 2012, 2:19:30 PM
to General Dart Discussion
On Mar 8, 2:23 pm, Sam McCall <sammcc...@google.com> wrote:
> [...]
>
> My experience with this in ruby was a mixed bag; people often want
> iteration over characters, lines, or words, and it's not obvious which
> is which.

Surely we're just talking about familiarity here, but the most obvious to me
would be iterating over the smallest possible block that makes sense,
i.e. the Unicode character. (I always hated languages that decide to
split strings into a byte array.)
Lines and words are easily made available through functions for those
specific purposes. (something like .split('\n') and .split(' '))

> I'd like to see properties that return lazy iterables (vs splitChars()
> which splits eagerly). For example:
>
> for (var c in "My unicode string".chars) {...}
> for (var c in "My unicode string".words) {...}
> for (var c in "My unicode string".lines) {...}

I whole-heartedly agree with this. Even better than the last part of
my above proposal.
But while we're at it anyway, why not default to the simple case and
let for..in iterations over a string be over the .chars property?

> [...]

> I actually prefer the other way around, I think iterating over Lists
> should also be
>   list.forEach((v) => print(v));
>
> The reason is that this mechanism is available to user code too, so
> libraries can have their own control structures that look like part of
> the language.
>
> It also composes better: note that it can be written as:
>   list.forEach(print)
> and your example is
>   myMap.values.forEach(print)
>
> and is consistent with other operations that can make use of chaining:
>   myMap.values.transform((x) => 2*x).forEach(myList.add)

Again, I agree with your arguments for the forEach functions.

But I'm not quite sure whether you want to get rid of for..in and
perhaps its related Iterator class?

> > In my opinion having to remember every single iteration convention for
> > each type is something the developer shouldn't have to worry about.
>
> I don't think that "m.forEach(k,v)" is any harder to remember than
> "for (k,v in m)". In fact it's easier, because you can find the method
> in the documentation for Map.

While the two cases listed here are not difficult to remember, what I
expect to become a nuisance is having to remember which method to use
when.
If I've got a Map in hand do I need to use for..in or forEach? What
about Lists, Strings?
Old rule of thumb: Don't make me think.

It's not that I'm arguing for..in is in any way better than forEach.
All I want is a uniform way of iterating over my objects and data.
The easiest case seems to be making things like Maps and Strings
iterable too.


Hm.. there are quite a lot of issues up in the air here.
I think it's going to take some effort not to get everything too
mixed up.

Sam McCall

Mar 8, 2012, 9:10:09 PM
to Don, General Dart Discussion
On Thu, Mar 8, 2012 at 6:46 PM, Don <dga...@gmail.com> wrote:
>> And to the topic: I'd much rather see iterator methods on String, like
>> forEachChar (and maybe also forEachCodePoint, where forEachChar iterates
>> over Strings and forEachCodePoint iterates over ints? Not sure about this),
>> forEachWord and forEachLine.
>
> I like the distinct String methods option as well. Don't think there
> is much of a difference between that and the property methods though.

I think the properties compose better, because forEach isn't the only
interesting thing to do with an iterator:

"quick brown fox".chars.transform((c) => rot13[c]).join('');
"to be or not to be".words.forEach((word) => frequency(word));
sourceCode.lines.filter((line) => line.length > 80);
novel.words.foldl(0, (x, y) => x + y.length) / novel.words.count();

and you don't want to have to define forEachChar, forEachWord,
forEachLine, transformChar, transformWord, transformLine, ...
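
A rough equivalent on a current Dart SDK, as a sketch: where and fold play the part of filter and foldl, and splitting on whitespace stands in for the proposed .words. All of that, along with the function names, is an assumption of this example:

```dart
import 'dart:convert' show LineSplitter;

// Hypothetical stand-in for the proposed `.words` property.
Iterable<String> wordsOf(String text) =>
    text.split(RegExp(r'\s+')).where((w) => w.isNotEmpty);

// Mean word length, folding the lengths into a running total.
double averageWordLength(String text) {
  final words = wordsOf(text).toList();
  if (words.isEmpty) return 0.0;
  final total = words.fold<int>(0, (sum, w) => sum + w.length);
  return total / words.length;
}

// Lines longer than `limit` characters.
List<String> longLines(String source, int limit) =>
    LineSplitter.split(source).where((line) => line.length > limit).toList();

void main() {
  print(averageWordLength('to be or not to be'));
  print(longLines('short\nthis line is longer', 10));
}
```

Each operation is defined once on the iterable, so adding a new view of a string does not require a matching family of forEachX and transformX methods.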

Clement wrote:
> But I'm not quite sure whether you want to get rid of for..in and
> perhaps its related Iterator class?

Yeah, if I had my 'druthers:
* for..in would go away, replaced by the forEach method
* Iterator would stay, because...
* Iterable would gain some mixin methods. Any class implementing it
would get forEach, transform, count, foldl, foldr, filter.
* Map would look something like:
    class Map<K,V> implements Iterable<Entry<K,V>> {
      Collection<K> get() keys;
      Collection<V> get() values;
    }
    class Entry<K,V> {
      K get key();
      V get value();
      void set value(V newValue);
    }

Ladislav Thon

Mar 9, 2012, 2:24:51 AM
to Sam McCall, Don, General Dart Discussion
> I think the properties compose better, because forEach isn't the only
> interesting thing to do with an iterator:

I stand corrected here. Yes, having something like List<String> get chars(), List<int> get codePoints(), List<String> get words(), List<String> get lines() is definitely better. The main point stays, though: I don't think that having String implement Iterable<whatever> is a good idea.

LT

Clement

Mar 10, 2012, 7:09:14 AM
to General Dart Discussion
On 9 Mar., 08:24, Ladislav Thon <ladi...@gmail.com> wrote:
> I stand corrected here. Yes, having something like List<String> get
> chars(), List<int> get codePoints(), List<String> get words(), List<String>
> get lines() is definitely better. The main point stays, though: I don't
> think that having String implement Iterable<whatever> is a good idea.

Care to elaborate?

Personally I find that one of the basic things I expect of any modern
language is an ability to deal with strings and common operations on
them. Especially any language aimed at the web.
Iteration over strings is just one of those things that have cropped
up often enough for me to expect it to be there.

I hope I'm not repeating myself too much, but I'd argue Unicode
characters are the only sensible choice for iteration. [1]
And with that I'd still argue it's a net gain to make Strings iterable
over these chars. It won't add complexity or obscurity, and would be
as easy as having String extend Iterable.

[1]
* Words are highly ambiguous, and there is no simple, short regex
that can capture word boundaries. Words can have spaces, hyphens,
dots and even other symbols in them. Think: San Francisco, Ph.D., etc.
Word boundaries are even language dependent.
* Lines can also be slightly ambiguous with \n, \r and \r\n, and if
we're super lucky we get them all mixed in one big mess. However in
most cases it's probably safe to assume that each of them is a newline
no matter how they're mixed.


On 9 Mar., 03:10, Sam McCall <sammcc...@google.com> wrote:
> Yeah, if I had my 'druthers:
>   * for..in would go away, replaced by the forEach method
>   * Iterator would stay, because...
>   * Iterable would gain some mixin methods. Any class implementing it
> would get forEach, transform, count, foldl, foldr, filter.
>   * Map would look something like:
>     class Map<K,V> implements Iterable<Entry<K,V>> {
>       Collection<K> get() keys;
>       Collection<V> get() values;
>     }
>     class Entry<K,V> {
>       K get key();
>       V get value();
>       void set value(V newValue);
>     }

While I love the idea of forEach and friends (please, when can we
have this?), I can't help wondering why do away with for..in now that
it's there? I can easily see a world where the two co-exist.

I think code clarity might also be worth considering:

for (var e in myVariable) {
  // code ..
}

myVariable.forEach((e) {
  // code ..
});

For simple loops like these the for..in feels cleaner and therefore
more readable.
When scanning a large piece of code the second you see "for .. " you
know you've got a for loop followed by a block of code for the
iteration.
When you see the other one you first have to parse the variable, then
you will have to assume it's of an iterable type because it has a
forEach method, at which point you'd know the argument is a code block
belonging to the iteration over the elements in the variable.

Just a thought.

/Clement

Sam McCall

Mar 10, 2012, 11:11:36 AM
to Clement, General Dart Discussion
On Sat, Mar 10, 2012 at 1:09 PM, Clement <cleme...@gmail.com> wrote:
> On 9 Mar., 08:24, Ladislav Thon <ladi...@gmail.com> wrote:
>> I stand corrected here. Yes, having something like List<String> get
>> chars(), List<int> get codePoints(), List<String> get words(), List<String>
>> get lines() is definitely better. The main point stays, though: I don't
>> think that having String implement Iterable<whatever> is a good idea.
>
> Care to elaborate?
Strings are very commonly used for tasks that look at them as atomic
units, rather than sequences of characters. For example: UI text,
identifiers in protocols, etc.
So while you can argue that a string is-a rather than has-a sequence
of characters, in practical use that's often not the case; so an
explicit conversion makes the code easier to read and less ambiguous.

(And yes, it's ambiguous: Ruby used to iterate over lines by default,
which is perfectly sensible for a language with Perl heritage that was
initially largely used for similar tasks. Eventually they deprecated
it in favor of each_line and each_char to remove the ambiguity. So
iterating characters will confuse at least Rubyists and likely Perl
hackers.)

> Personally I find that one of the basic things I expect of any modern
> language is an ability to deal with strings and common operations on
> them. Especially any language aimed at the web.
> Iteration over strings is just one of those things that have cropped
> up often enough for me to expect it to be there.

Nobody's suggesting it shouldn't exist, just that it doesn't need to
live in a particular spot in the API just because Python has it there
:-)


> While I love the idea of forEach and friends (please, when can we
> have this?), I can't help wondering why do away with for..in now that
> it's there? I can easily see a world where the two co-exist.

There's a cost to having the for..in language feature (spec, parser,
compiler support, plus the complexity of learning the language) and
little to no benefit, IMO.

> I think code clarity might also be worth considering:
>
>  for (var e in myVariable) {
>    // code ..
>  }
>
>  myVariable.forEach((e) {
>    // code ..
>  });
>
> For simple loops like these the for..in feels cleaner and therefore
> more readable.
> When scanning a large piece of code the second you see "for .. " you
> know you've got a for loop followed by a block of code for the
> iteration.
> When you see the other one you first have to parse the variable, then
> you will have to assume it's of an iterable type because it has a
> forEach method, at which point you'd know the argument is a code block
> belonging to the iteration over the elements in the variable.

I think this is just familiarity: I can quickly infer the loop from
the indentation, and it's much easier to find the data being operated on
when it's on the left before the dot (just like any other method
call).
Basically it's a wash: you've got exactly the same elements, just in a
different order.
(Except for the extra closing paren after .forEach; I wish we could do
something about that.)

Familiarity is worth considering, but bear in mind that in JS, this is
iteration:
list.forEach(function(x) { ... });
while this is something different (and usually wrong):
for (var x in list) { ... }

Given the positioning of Dart, familiarity to JSers is quite important.

Cheers,
Sam

Mark Bennett

Mar 19, 2012, 3:39:53 PM
to General Dart Discussion
Just wanted to pipe in and say that I definitely support moving away
from a for..in syntax to a forEach() function. My main reasoning being
that this allows other objects to implement this behaviour in the
future without special syntax support from the parser. Though for..in
isn't complicated, anything that can be removed from the parser
without losing features should be considered. I also agree that
forEach() may be more familiar to programmers used to JS.

On another note, the Iterable mixin is the main reason I support
adding some form of mixins or default implementations to the language,
as I've found them invaluable in Ruby and JS (via extend()).


Ladislav Thon

Mar 19, 2012, 4:22:52 PM
to Mark Bennett, General Dart Discussion
> Just wanted to pipe in and say that I definitely support moving away
> from a for..in syntax to a forEach() function. My main reasoning being
> that this allows other objects to implement this behaviour in the
> future without special syntax support from the parser.

I love forEach(), but remember that you can't break/continue in it. You don't always need it, but you need it a lot. Non-local return is a no-go in Dart, that's why we need for..in.
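
A sketch of the difference, assuming a current Dart SDK; both function names are hypothetical. In the callback version, return only ends the current callback, so the list is still walked to the end:

```dart
// for..in: `return` leaves the loop and the enclosing function at once.
int? firstNegativeForIn(List<int> xs) {
  for (final x in xs) {
    if (x < 0) return x;
  }
  return null;
}

// forEach: `return` inside the closure behaves like `continue`, not `break`,
// so a flag variable is needed and every element is still visited.
int? firstNegativeForEach(List<int> xs) {
  int? found;
  xs.forEach((x) {
    if (found != null) return;
    if (x < 0) found = x;
  });
  return found;
}

void main() {
  final xs = [3, -1, 4, -5];
  print(firstNegativeForIn(xs));
  print(firstNegativeForEach(xs));
}
```

Both functions give the same answer, but only the for..in version stops early.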
 
> On another note, the Iterable mixin is the main reason I support
> adding some form of mixins or default implementations to the language,
> as I've found them invaluable in Ruby and JS (via extend()).

+1, Ruby's Enumerable FTW!

LT