Documentation of clojure.string/split is confusing


Oliver

Jan 12, 2021, 3:17:28 PM
to Clojure
Happy new year, folks.

(This might not be the best place to post this, but I failed to find a better one; please advise if there is one. I've also tried to look for previous comments about this. If there are any, then my Google-fu was simply too weak.)

------
Usage:
(split s re)
(split s re limit)

Splits string on a regular expression. Optional argument limit is the maximum number of splits. Not lazy. Returns vector of the splits.
------

I initially understood "number of splits" to mean the number of matches acted upon, that is, the places where a split occurs. I realize "split" has another meaning, namely fragment, and that is the one meant here. "Returns vector of the splits." hints at that as well, but the docstring literally says "Splits string on..., ... maximum number of splits.".

And even in the same docs a few lines later the example for this is phrased like this:
------
;; Note that the 'limit' arg is the maximum number of strings to
;; return (not the number of splits)
user=> (str/split "q1w2e3r4t5y6u7i8o9p0" #"\d+" 5)
["q" "w" "e" "r" "t5y6u7i8o9p0"]
------
With "(not the number of splits)", this contradicts the description above, which calls limit the "maximum number of splits".
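To make the off-by-one concrete, here is a minimal REPL sketch of how limit behaves, assuming a JVM Clojure with clojure.string available. The helper split-at-most is hypothetical (it is not part of clojure.string) and only shows how one might express "cut at most n times" in terms of the existing limit argument, for non-negative n.

```clojure
(require '[clojure.string :as str])

;; limit = 2 yields at most 2 strings, so the input is cut at most once.
(def two-parts (str/split "a,b,c,d" #"," 2))
;; => ["a" "b,c,d"]

;; limit = 3 yields at most 3 strings, so the input is cut at most twice.
(def three-parts (str/split "a,b,c,d" #"," 3))
;; => ["a" "b" "c,d"]

;; Hypothetical helper (not in clojure.string): cut at most n times
;; by asking for at most (inc n) strings. Assumes n is non-negative.
(defn split-at-most
  [s re n]
  (str/split s re (inc n)))

(split-at-most "a,b,c,d" #"," 1)
;; => ["a" "b,c,d"]
```

So reading limit as "number of matches acted upon" gives one string more than expected, which is exactly the off-by-one described here.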

I've run into this more than once now, because different languages do this differently. Whenever I go to the documentation, I read the description and am satisfied that I've grokked it, just to be mocked by my off-by-one code minutes later.

Does anyone agree that this might be worth changing?
I think at least the contradiction in the example should be addressed, but I think a better solution would be to rephrase the main description to be unambiguous.

I've also looked up the documentation for a few other languages. Those that limit the number of "splits" in my sense (number of matches at which to split) use the word "splits"; those with a behaviour similar to that of Clojure explicitly talk about the maximum number of resulting elements/substrings/strings, and I think that would make it much clearer.

Apologies for the length!

Cheerio
  Oliver

Alex Miller

Jan 12, 2021, 7:58:55 PM
to Clojure
The best place to ask questions like this is https://ask.clojure.org, which is the official forum for filing requests and bug reports (these get turned into tickets after triage there).

It turns out you are not the first to point this out. It has already been filed at https://clojure.atlassian.net/browse/CLJ-1857, which was then marked as a duplicate so that its changes are included in https://clojure.atlassian.net/browse/CLJ-1360, which is still open. I will try to get that included in 1.11.

You can (and should!) vote for this issue at https://ask.clojure.org/index.php/4282/doc-that-clojure-string-split-strips-trailing-delimiters (which is the ask.clojure question corresponding to that last ticket). Votes really do matter in our prioritization!
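For readers who have not run into it, the trailing-delimiter behaviour named in that ticket title can be seen in a short REPL sketch. This assumes JVM Clojure, where clojure.string/split passes limit through to java.util.regex.Pattern/split; the input string is made up for illustration.

```clojure
(require '[clojure.string :as str])

;; With no limit, trailing empty strings are dropped from the result.
(def default-split (str/split "a,b,," #","))
;; => ["a" "b"]

;; A negative limit keeps them, following the semantics of
;; java.util.regex.Pattern/split on the JVM.
(def keep-trailing (str/split "a,b,," #"," -1))
;; => ["a" "b" "" ""]
```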

Alex

Peter Strömberg

Jan 13, 2021, 2:40:03 AM
to clo...@googlegroups.com
About the examples: clojuredocs.org is a community-driven, crowd-sourced service. Clarifying docstring ambiguities is exactly where an example can help.


Oliver

Jan 13, 2021, 9:11:04 AM
to Clojure
Awesome, thank you.

I'll use that in the future, and I voted for the issue with the proposed patch. I think "parts" makes it much clearer, and then the example also is spot on as it is.

Ciao!
