String.split option to ignore empty parts?


Toby

Apr 15, 2013, 10:32:38 AM
to mi...@dartlang.org
Basically String.split works great if your data is 'A,B,C' and you do .split(','), but it doesn't behave as desired if your data is 'A\nB\nC\n' and you do .split('\n'), always leaving you with an empty element at the end of your resulting list.

As splitting strings that are 'end-delimited' instead of 'mid-delimited' is a very common operation, and virtually every other language/library in existence handles this as expected, it'd be nice if this behavior changed for Dart as well (or at the very least there was an explicit option for it).

Having the explicit option is also very convenient for scrubbing out stuff like 'A,B,,,E' and 'A\nB\n\n\n\n\nC\n', although I'd personally prefer the default behavior to not give me empty split parts unless I explicitly ask for them.
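A quick way to see the behavior described above, as a minimal sketch assuming a current Dart SDK: every delimiter marks a boundary, so a trailing delimiter or a run of delimiters yields empty strings.

```dart
void main() {
  // "Mid-delimited" input: one piece per gap between delimiters, no empties.
  print('A,B,C'.split(',')); // [A, B, C]

  // "End-delimited" input: the trailing '\n' produces a final ''.
  final lines = 'A\nB\nC\n'.split('\n');
  print(lines.length); // 4
  print(lines.last.isEmpty); // true

  // Consecutive delimiters produce empty strings between them.
  print('A,B,,,E'.split(',')); // [A, B, , , E]
}
```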

Regards,
Toby

Aza Tek

Apr 15, 2013, 10:40:01 AM
to mi...@dartlang.org

Whilst on this subject, would it not be best if the split(...) method was renamed to toList(...)?


Ahmet A. Akın

Apr 15, 2013, 11:28:01 AM
to mi...@dartlang.org
Perhaps Guava's Splitter (http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/base/Splitter.html) could serve as an example here, although that is a whole separate class. Perhaps, as you said, some sensible defaults and optional parameters would help (the default pattern should be whitespace, results trimmed, etc.). Also, I think that method should return an Iterable, not a List.
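To make the Splitter idea concrete, here is a minimal sketch assuming a current, null-safe Dart SDK. `splitLazy` and its `separator`, `trimResults` and `omitEmptyStrings` parameters are hypothetical names chosen for illustration (loosely echoing Guava's option names); nothing like this exists in dart:core. It returns a lazy `Iterable` via `sync*` and walks the separator matches with `Pattern.allMatches`.

```dart
/// Hypothetical Guava-Splitter-style helper; NOT part of dart:core.
/// Lazily yields the pieces of [input] between matches of [separator].
/// Assumes [separator] never matches the empty string.
Iterable<String> splitLazy(
  String input, {
  Pattern? separator,
  bool trimResults = false,
  bool omitEmptyStrings = false,
}) sync* {
  // Assumed default: runs of whitespace, as suggested in the thread.
  final Pattern sep = separator ?? RegExp(r'\s+');

  // Applies the optional post-processing; returns null to drop a piece.
  String? finish(String piece) {
    final value = trimResults ? piece.trim() : piece;
    return (omitEmptyStrings && value.isEmpty) ? null : value;
  }

  var start = 0;
  for (final match in sep.allMatches(input)) {
    final piece = finish(input.substring(start, match.start));
    if (piece != null) yield piece;
    start = match.end;
  }
  // Whatever follows the last separator is the final piece.
  final last = finish(input.substring(start));
  if (last != null) yield last;
}

void main() {
  // Nothing is split until the Iterable is actually iterated.
  final lines =
      splitLazy('A\nB\nC\n', separator: '\n', omitEmptyStrings: true);
  print(lines.toList()); // [A, B, C]

  print(splitLazy(' a , b ,, c ',
          separator: ',', trimResults: true, omitEmptyStrings: true)
      .toList()); // [a, b, c]
}
```

With both options off, the helper behaves like the built-in `split` for non-empty separators, so the "inverse of join" property is kept unless the caller opts out.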

 

Jos Hirth

Apr 15, 2013, 1:14:03 PM
to mi...@dartlang.org
split() is the inverse of join().

So, if you split ",,", you do indeed want a list of 3 empty strings, because joining 3 empty strings with "," results in ",,".

If you want to get rid of empty strings, you can use where() to filter them out:
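The split/join argument can be checked directly. This is a small sketch assuming a current Dart SDK: for each sample, splitting on "," and joining the parts with "," gives back the original string, which only works because the empty strings are kept.

```dart
void main() {
  for (final s in const [',,', 'a,,b', 'a,b,']) {
    final parts = s.split(',');
    // Joining with the same separator restores the original string,
    // which only works because the empty strings were kept.
    final restored = parts.join(',');
    print('"$s" -> ${parts.length} parts, round trip: ${restored == s}');
  }
}
```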

print('a,,,b,c'.split(',').where((s) => !s.isEmpty).toList());

Output:

[a, b, c]

Toby

Apr 15, 2013, 3:22:11 PM
to mi...@dartlang.org
I don't think split/join symmetry is anywhere near as useful as splitting into scrubbed, significant data. Especially in a web environment, where string data has until recently been the only dynamically available data type, it's a very common operation to grab text with an HTTP request and split/tokenize it into actually useful data. It is, IMO, much less common to split that data and then join it back into an identically formed string.

Of course I could filter out what I don't want by writing more code; that's what I'm currently forced to do. But that goes for any problem, doesn't it? Why provide a LinkedHashMap? I could easily build that myself using a Map and a List.

Obviously, this being optional would accommodate both our illusions of how it ought to work :)

Alex Tatumizer

Apr 15, 2013, 3:46:24 PM
to mi...@dartlang.org
Split is just one part of the functionality related to parsing. Other things can be accomplished with trim(), replaceMatching() and the other regexp methods; they are VERY POWERFUL. Note that "replace" doesn't necessarily mean you have to replace anything: it simply calls you back with each matching token.
Split itself is very easy to accomplish using "replace".

I'm for keeping split simple; when it's not enough, use regexps.
As for the original problem (removing the extra \n), trim + split alone would be enough.
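Both suggestions can be sketched against the messier input from the first post, assuming a current Dart SDK. `trim()` takes care of the trailing newline, and because `split` accepts any `Pattern`, a `RegExp` separator can collapse runs of blank lines as well.

```dart
void main() {
  const doc = 'A\nB\n\n\n\n\nC\n';

  // trim() removes the trailing newline, so there is no empty last
  // element, but the blank lines in the middle are still there.
  print(doc.trim().split('\n').length); // 7

  // A RegExp separator also collapses runs of newlines into a single
  // boundary.
  print(doc.trim().split(RegExp(r'\n+'))); // [A, B, C]
}
```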


Jos Hirth

Apr 15, 2013, 3:48:38 PM
to mi...@dartlang.org
Well, if you don't want empty strings in your list, you shouldn't output them in the first place. How did they get there anyway?

Outputting empty strings only makes sense if the index is important. If the index is important, you can't just throw them away.

Toby

Apr 15, 2013, 4:56:24 PM
to mi...@dartlang.org
Perhaps you should read the original post :)  I'm splitting a document into lines, i.e. splitting on \n. If the document ends with a newline, you end up with an additional empty 'line' in the split list.

Of course I can write code to throw that away, but there are a number of cases where data is naturally structured so that split yields an empty last element. The fact that virtually every other programming environment has a built-in solution for this suggests that I'm not the first, or only, person to find a use for such functionality.
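For the specific case of splitting a document into lines, current Dart SDKs ship `LineSplitter` in dart:convert, which does not emit an empty line for a final terminator. The sketch below assumes a recent SDK; blank lines in the middle of the text are still reported, and `\r\n` and `\r` endings are also treated as terminators.

```dart
import 'dart:convert';

void main() {
  // A final '\n' does not produce an extra empty line.
  print(LineSplitter.split('A\nB\nC\n').toList()); // [A, B, C]

  // Windows-style line endings are handled too.
  print(const LineSplitter().convert('A\r\nB\r\n')); // [A, B]

  // Blank lines inside the text are kept as empty strings.
  print(LineSplitter.split('A\n\nB\n').toList()); // [A, , B]
}
```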

Alex Tatumizer

Apr 15, 2013, 5:32:41 PM
to mi...@dartlang.org
Why don't you just use
replaceFirst(RegExp(r'\n*$'), '').split('\n');

???

Toby

Apr 15, 2013, 5:49:08 PM
to mi...@dartlang.org
Partially because I think priming a very long string with a regexp replace is a bad idea, but primarily because I'd prefer not to worry about it at all. Incidentally, that's what e.g. .NET's StringSplitOptions.RemoveEmptyEntries does for me in nearly every applicable case.

It's not like removing that/those empty elements is some kind of insurmountable programming challenge, nor is this some kind of end-all, be-all feature request. It's rather that I find a built-in option massively usable, and much clearer in intent than the manual alternatives, and I would love it if Dart offered the same convenience.
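Until something like that is built in, the intent can at least be named at the call site. This is a minimal sketch assuming Dart 2.7+ (extension methods); `SplitNonEmpty` and `splitNonEmpty` are hypothetical names, not a Dart API, and the body is simply `split` followed by the `where()` filter suggested earlier in the thread.

```dart
/// Hypothetical extension approximating .NET's RemoveEmptyEntries;
/// NOT part of dart:core.
extension SplitNonEmpty on String {
  /// Splits on [pattern] and drops every empty piece.
  List<String> splitNonEmpty(Pattern pattern) =>
      split(pattern).where((part) => part.isNotEmpty).toList();
}

void main() {
  print('A\nB\nC\n'.splitNonEmpty('\n')); // [A, B, C]
  print('A,B,,,E'.splitNonEmpty(',')); // [A, B, E]
  print('A\nB\n\n\n\n\nC\n'.splitNonEmpty('\n')); // [A, B, C]
}
```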

Bob Nystrom

Apr 15, 2013, 6:40:48 PM
to General Dart Discussion


Personally, I'm OK with Jos' suggestion of just using .where(). Another option is:

 'A\nB\nC\n'.trim().split('\n')

But I also wouldn't mind something like: "a,,b".split(',', skipEmpty: true). File a bug?
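The two workarounds mentioned here are not interchangeable. As a small sketch assuming a current Dart SDK: `trim()` strips all leading and trailing whitespace of the whole string (including the first line's indentation) but keeps interior blank lines, while the `where()` filter keeps each line untouched but drops every empty one.

```dart
void main() {
  const doc = '  A\n\nB\n';

  // trim() affects the whole string: indentation of the first line is
  // lost, but the interior blank line survives.
  final viaTrim = doc.trim().split('\n');
  print(viaTrim.map((s) => '[$s]').join(' ')); // [A] [] [B]

  // where() leaves every line as-is but removes all empty ones.
  final viaWhere = doc.split('\n').where((s) => s.isNotEmpty).toList();
  print(viaWhere.map((s) => '[$s]').join(' ')); // [  A] [B]
}
```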

Cheers,

- bob

Matthew Butler

Apr 15, 2013, 8:37:55 PM
to mi...@dartlang.org
For me, str.trim().split('\n'); is almost always the solution I'd want. I never want split to omit empty values without being told to. Take CSV (or TSV) files, for instance: I'd rather have that last value there, even if it's empty, than get a RangeError because one of the values was skipped (which one?).
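The CSV point can be illustrated with a toy example, assuming a current Dart SDK. This is naive comma splitting, not a real CSV parser (no quoting or escaping); the sample rows are made up. Because `split(',')` keeps empty fields, each field index stays aligned with its header, whereas filtering out empties shifts the columns.

```dart
void main() {
  const csv = 'name,email,phone\nAnn,,555-0100\nBob,bob@example.com,\n';
  final rows = csv.trim().split('\n');
  final header = rows.first.split(',');

  for (final row in rows.skip(1)) {
    // Empty fields are kept, so fields[i] always lines up with header[i].
    final fields = row.split(',');
    print('${fields[0]}: ${header[1]}="${fields[1]}", '
        '${header[2]}="${fields[2]}"');
  }

  // Dropping empties shifts columns: Ann's phone now sits at index 1,
  // where the email is expected.
  final shifted =
      'Ann,,555-0100'.split(',').where((s) => s.isNotEmpty).toList();
  print(shifted); // [Ann, 555-0100]
}
```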