Proposed Change to str-utils

Sean

Mar 23, 2009, 9:18:14 PM
to Clojure
Hello Everyone,
I've been reviewing the str-utils package, and I'd like to propose a
few changes to the library. I've included the code at the bottom.

USE MULTI-METHODS

I'd like to propose rewriting the following methods to use multimethods.
Each one takes input-string as its first argument.

*re-split [input-string & remaining-inputs]*

The method dispatches on the class of the first remaining input, which
can be a regex pattern, a list of patterns, or a map.

regex pattern - splits a string into a list, like it does now.
e.g. (re-split "1 2 3\n4 5 6" #"\n") => ("1 2 3" "4 5 6")

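list - the string is split by each element of the list in turn, and each
element can be a regex or a map. The first element splits the input
string, and each later element is applied recursively to every piece the
previous one produced.
e.g. (re-split "1 2 3\n4 5 6" (list #"\n" #"\s+")) => (("1" "2" "3") ("4" "5" "6"))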

map - this splits the string according to the options in the map. It is
how options are passed to the method.
e.g. (re-split "1 2 3" {:pattern #"\s+" :limit 2 :marshal-fn #(java.lang.Double/parseDouble %)}) => (1.0 2.0)
The :pattern and :limit options are relatively straightforward.
The :marshal-fn is mapped over the pieces after the string is split.

These items can be chained together, as the following example shows:
e.g. (re-split "1 2 3\n4 5 6" (list #"\n" {:pattern #"\s+" :limit 2 :marshal-fn #(java.lang.Double/parseDouble %)})) => ((1.0 2.0) (4.0 5.0))

In my opinion, the :marshal-fn is best used at the end of the list. It
could be used earlier in the list, but an exception will most likely be
thrown, since the next split would receive whatever :marshal-fn returned
instead of a string.
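A minimal sketch of the three dispatch cases described above, written as a standalone multimethod with the hypothetical name split-by (it is not the proposal's re-split). It dispatches on a keyword computed from the spec instead of on (class ...), so vectors work as well as lists; that choice belongs to the sketch, not to the proposal.

```clojure
;; Hypothetical sketch: split-by and spec-kind are illustrative names only.
(defn- spec-kind [spec]
  (cond (instance? java.util.regex.Pattern spec) :pattern
        (map? spec) :options
        :else :chain))

(defmulti split-by
  "Split s according to spec: a regex, an options map, or a list of specs."
  (fn [s spec] (spec-kind spec)))

;; A single regex: split into a seq of strings.
(defmethod split-by :pattern [^String s ^java.util.regex.Pattern re]
  (seq (.split re s)))

;; An options map: split on :pattern (required), keep at most :limit pieces,
;; then map :marshal-fn over what remains.
(defmethod split-by :options [s {:keys [pattern limit marshal-fn]}]
  (let [pieces (split-by s pattern)
        pieces (if limit (take limit pieces) pieces)]
    (if marshal-fn (map marshal-fn pieces) pieces)))

;; A non-empty list of specs: split by the first spec, then split every piece
;; by the remaining specs. A :marshal-fn on any level but the last would hand
;; non-strings to the next .split call.
(defmethod split-by :chain [s specs]
  (let [[spec & more] specs
        pieces (split-by s spec)]
    (if more
      (map #(split-by % more) pieces)
      pieces)))
```

With the options map on the last level, each row is split and its fields are converted to doubles, matching the chained example above.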


*re-partition [input-string & remaining-inputs]*

This method behaves like the original re-partition, except that the
remaining input can be a list or a pattern. I don't see a need to
change the behavior of this method at the moment.
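For reference, the shape of a re-partition result is an alternating sequence: non-matching text, match, non-matching text, and so on. The sketch below uses the hypothetical name partition-re and takes the string first, as in the proposal. As in the proposal's code, each match is whatever re-groups returns: a string, or a vector when the regex has groups.

```clojure
;; Hypothetical sketch: lazily split s into [text, match, text, match, ...].
(defn partition-re [^String s ^java.util.regex.Pattern re]
  (let [m (re-matcher re s)]
    ((fn step [prev-end]
       (lazy-seq
         (if (.find m)
           ;; text before this match, then the match itself
           (cons (subs s prev-end (.start m))
                 (cons (re-groups m)
                       (step (.end m))))
           ;; no more matches: emit the remaining tail, if any
           (when (< prev-end (count s))
             (list (subs s prev-end))))))
     0)))
```

Taking the first element gives the text before the first match, which is how helpers like str-before can be built on top of it.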

*re-gsub [input-string & remaining-inputs]*

This method can take either a list or two atoms (a regex and a
replacement) as the remaining inputs.

Two atoms -
e.g. (re-gsub "1 2 3 4 5 6" #"\s" "") => "123456"

A paired list
e.g. (re-gsub "1 2 3 4 5 6" '((#"\s" "") (#"\d" "D"))) => "DDDDDD"
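A paired list amounts to a left fold over the pairs: each (regex replacement) pair runs replaceAll on the output of the previous one. A minimal sketch with the hypothetical name gsub-all, accepting any sequence of pairs:

```clojure
;; Hypothetical helper: apply [regex replacement] pairs left to right with
;; java.util.regex.Matcher.replaceAll, each pair seeing the previous output.
(defn gsub-all [^String s pairs]
  (reduce (fn [acc [re replacement]]
            (.replaceAll (re-matcher re acc) ^String replacement))
          s
          pairs))
```

Because the pairs are applied in order, a later pair can match text inserted by an earlier one, so the order of the list matters in general.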

*re-sub [input-string & remaining-inputs]*

Again, this method can take a list or two atoms as the remaining
inputs.

Two atoms
e.g. (re-sub "1 2 3 4 5 6" #"\d" "D") => "D 2 3 4 5 6"

A paired list
e.g. (re-sub "1 2 3 4 5 6" '((#"\d" "D") (#"\d" "E"))) => "D E 3 4 5 6"
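The re-sub code in this post also accepts a function as the replacement, called with the match. A hedged sketch of that behavior with hypothetical names (sub-first, sub-pairs); it tests fn? where the proposal tests ifn?:

```clojure
;; Hypothetical sketch: replace the first match of re in s. A function
;; replacement receives the match as returned by re-groups.
(defn sub-first [^String s ^java.util.regex.Pattern re replacement]
  (let [m (re-matcher re s)]
    (cond
      (not (.find m)) s
      (fn? replacement) (str (subs s 0 (.start m))
                             (replacement (re-groups m))
                             (subs s (.end m)))
      :else (.replaceFirst (re-matcher re s) ^String replacement))))

;; A paired list is a left fold: each pair sees the previous pair's output,
;; so the second #"\d" finds the "2" after the "1" has become "D".
(defn sub-pairs [s pairs]
  (reduce (fn [acc [re replacement]] (sub-first acc re replacement))
          s pairs))
```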

NEW PARSING HELPERS
I've created four methods: str-before, str-before-inc, str-after, and
str-after-inc. They split a string at the first match of a regex and
return the part before or after it; the -inc versions include the
match itself.

(str-before "Clojure Is Awesome" #"\s") => "Clojure"
(str-before-inc "Clojure Is Awesome" #"\s") => "Clojure "
(str-after "Clojure Is Awesome" #"\s") => "Is Awesome"
(str-after-inc "Clojure Is Awesome" #"\s") => " Is Awesome"

These methods can be used to help parse strings:

(str-before (str-after "<h4 ... >" #"<h4") #">") => " ... " ; the stuff in the middle
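A sketch of the four helpers that finds the first match with a Matcher directly instead of going through re-partition (first-match is a hypothetical helper). When the regex does not match, the "before" functions return the whole string and the "after" functions return "", which is what the re-partition-based code in this post does for a non-empty input.

```clojure
;; Hypothetical helper: [start end] of the first match of re in s, or nil.
(defn- first-match [^java.util.regex.Pattern re ^String s]
  (let [m (re-matcher re s)]
    (when (.find m) [(.start m) (.end m)])))

(defn str-before [s re]
  (if-let [[start _] (first-match re s)] (subs s 0 start) s))

(defn str-before-inc [s re]
  (if-let [[_ end] (first-match re s)] (subs s 0 end) s))

(defn str-after [s re]
  (if-let [[_ end] (first-match re s)] (subs s end) ""))

(defn str-after-inc [s re]
  (if-let [[start _] (first-match re s)] (subs s start) ""))
```

Composing str-after and str-before pulls out the attributes of an opening tag, as in the h4 example.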

NEW INFLECTORS
I've added a few inflectors that I am familiar with from Rails. My
apologies if they originated in another language; I'd be interested in
knowing where these methods came from.

str-reverse
This method reverses a string
e.g. (str-reverse "Clojure") => "erujolC"

trim
This is a convenience wrapper for the trim method Java supplies
e.g. (trim " Clojure ") => "Clojure"

strip
This is an alias for trim. I accidentally switch between *trim* and
*strip* all the time.
e.g. (strip " Clojure ") => "Clojure"

ltrim
This method removes the leading whitespace
e.g. (ltrim " Clojure ") => "Clojure "

rtrim
This method removes the trailing whitespace
e.g. (rtrim " Clojure ") => " Clojure"
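A minimal sketch of ltrim and rtrim using anchored regexes with String.replaceFirst. Anchoring matters: a version that simply cuts at the first \s+ match would, for a string with no leading whitespace, cut at the first interior space instead. \A and \z anchor to the very start and end of the input.

```clojure
;; Sketch: remove leading / trailing whitespace only.
(defn ltrim [^String s] (.replaceFirst s "\\A\\s+" ""))
(defn rtrim [^String s] (.replaceFirst s "\\s+\\z" ""))
```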

downcase
This is a convenience wrapper for the toLowerCase method Java supplies
e.g. (downcase "Clojure") => "clojure"

upcase
This is a convenience wrapper for the toUpperCase method Java supplies
e.g. (upcase "Clojure") => "CLOJURE"

capitalize
This method capitalizes a string: the first character is upcased and
the rest are downcased
e.g. (capitalize "clojure") => "Clojure"

titleize, camelize, dasherize, underscore
These methods manipulate "sentences", producing a consistent output.
Check the unit tests for more examples:
(titleize "clojure iS Awesome") => "Clojure Is Awesome"
(camelize "clojure iS Awesome") => "clojureIsAwesome"
(dasherize "clojure iS Awesome") => "clojure-is-awesome"
(underscore "clojure iS Awesome") => "clojure_is_awesome"
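All four inflectors share one pattern: split on runs of whitespace, dashes, and underscores, transform each word, and join with a separator. A hedged sketch with hypothetical helpers words and capitalize-word; dropping empty words (which a leading separator would otherwise produce) is an addition of the sketch.

```clojure
;; Hypothetical helper: split on runs of whitespace, "_" and "-".
(defn- words [^String s]
  (remove empty? (.split s "[\\s_-]+")))

;; Hypothetical helper: upcase the first character, downcase the rest.
(defn- capitalize-word [^String w]
  (str (.toUpperCase (subs w 0 1)) (.toLowerCase (subs w 1))))

(defn titleize [s]
  (apply str (interpose " " (map capitalize-word (words s)))))

(defn camelize [s]
  (let [[w & ws] (words s)]
    (if w
      (apply str (.toLowerCase ^String w) (map capitalize-word ws))
      "")))

(defn dasherize [s]
  (apply str (interpose "-" (map #(.toLowerCase ^String %) (words s)))))

(defn underscore [s]
  (apply str (interpose "_" (map #(.toLowerCase ^String %) (words s)))))
```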

*FINAL THOUGHTS*
There are three more methods, str-join, chop, and chomp, that were
already in str-utils. I changed their implementations, but the
behavior should be the same.

There is a big catch with my proposed change: the signatures of
re-split, re-partition, re-gsub and re-sub change. They will not be
backwards compatible, and will break existing code. However, I think
the flexibility is worth it.

*TO-DOs*
There are a few more things I'd like to add, but they could be done at
a later date.

*Add more inflectors

The following additions become pretty easy if the proposed re-gsub is
included:

*Add HTML-escape function (like Rails' h method)
*Add Javascript-escape function (like Rails' javascript-escape method)
*Add SQL-escape function
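As an illustration of why a paired re-gsub makes these easy, here is a hypothetical sketch of html-escape as an ordered list of (regex replacement) pairs folded over the input. The set of escaped characters is an assumption, not a statement of what Rails' h does. "&" must come first so the entities added by later pairs are not escaped again.

```clojure
;; Hypothetical sketch: ordered [regex replacement] pairs; "&" goes first.
(def html-escapes
  [[#"&" "&amp;"] [#"<" "&lt;"] [#">" "&gt;"] [#"\"" "&quot;"] [#"'" "&#39;"]])

(defn html-escape [^String s]
  (reduce (fn [acc [re replacement]]
            (.replaceAll (re-matcher re acc) ^String replacement))
          s
          html-escapes))
```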

Okay, that's everything I can think of for now. I'd like to thank
Stuart Sierra and all of the contributors to this library. This is
possible because I'm standing on their shoulders.

Oh, and I apologize for not putting this up on github, especially
after I asked someone else to do the same yesterday. I'll try not to
be so hypocritical going forward.

*CODE*

(ns devlinsf.str-utils)

;;; String Merging & Slicing

(defn str-join
"Returns a string of all elements in 'sequence', separated by
'separator'. Like Perl's 'join'."
[separator sequence]
(apply str (interpose separator sequence)))


(defmulti re-split (fn[input-string & remaining-inputs] (class (first
remaining-inputs))))

(defmethod re-split java.util.regex.Pattern
([string #^java.util.regex.Pattern pattern] (seq (. pattern (split
string)))))

(defmethod re-split clojure.lang.PersistentList
[input-string patterns]
(let [reversed (reverse patterns)
pattern (first reversed)
remaining (rest reversed)]
(if (empty? remaining)
(re-split input-string pattern)
(map #(re-split % pattern) (re-split input-string (reverse
remaining))))))

(defmethod re-split clojure.lang.PersistentArrayMap
[input-string map-options]
(cond (:limit map-options) (take (:limit map-options) (re-split
input-string (dissoc map-options :limit)))
(:marshal-fn map-options) (map (:marshal-fn map-options) (re-split
input-string (dissoc map-options :marshal-fn)))
:else (re-split input-string (:pattern map-options))))

(defmulti re-partition (fn[input-string & remaining-inputs] (class
(first remaining-inputs))))

(defmethod re-partition java.util.regex.Pattern
[string #^java.util.regex.Pattern re]
(let [m (re-matcher re string)]
((fn step [prevend]
(lazy-seq
(if (.find m)
(cons (.subSequence string prevend (.start m))
(cons (re-groups m)
(step (+ (.start m) (count (.group m))))))
(when (< prevend (.length string))
(list (.subSequence string prevend (.length string)))))))
0)))

(defmethod re-partition clojure.lang.PersistentList
[input-string patterns]
(let [reversed (reverse patterns)
pattern (first reversed)
remaining (rest reversed)]
(if (empty? remaining)
(re-partition input-string pattern)
(map #(re-partition % pattern) (re-partition input-string
(reverse remaining))))))

(defmulti re-gsub (fn[input-string & remaining-inputs] (class (first
remaining-inputs))))

(defmethod re-gsub java.util.regex.Pattern
[#^String string #^java.util.regex.Pattern regex replacement]
(if (ifn? replacement)
(let [parts (vec (re-partition string regex))]
(apply str
(reduce (fn [parts match-idx]
(update-in parts [match-idx] replacement))
parts (range 1 (count parts) 2))))
(.. regex (matcher string) (replaceAll replacement))))

(defmethod re-gsub clojure.lang.PersistentList
[input-string regex-pattern-pairs]
(let [reversed (reverse regex-pattern-pairs)
pair (first reversed)
remaining (rest reversed)]
(if (empty? remaining)
(re-gsub input-string (first pair) (second pair))
(re-gsub (re-gsub input-string (reverse remaining)) (first pair)
(second pair)))))


(defmulti re-sub (fn[input-string & remaining-inputs] (class (first
remaining-inputs))))

(defmethod re-sub java.util.regex.Pattern
[#^String string #^java.util.regex.Pattern regex replacement ]
(if (ifn? replacement)
(let [m (re-matcher regex string)]
(if (.find m)
(str (.subSequence string 0 (.start m))
(replacement (re-groups m))
(.subSequence string (.end m) (.length string)))
string))
(.. regex (matcher string) (replaceFirst replacement))))

(defmethod re-sub clojure.lang.PersistentList
[input-string regex-pattern-pairs]
(let [reversed (reverse regex-pattern-pairs)
pair (first reversed)
remaining (rest reversed)]
(if (empty? remaining)
(re-sub input-string (first pair) (second pair))
(re-sub (re-sub input-string (reverse remaining)) (first pair)
(second pair)))))

;;; Parsing Helpers
(defn str-before [input-string regex]
(let [matches (re-partition input-string regex)]
(first matches)))

(defn str-before-inc [input-string regex]
(let [matches (re-partition input-string regex)]
(str (first matches) (second matches))))

(defn str-after [input-string regex]
(let [matches (re-partition input-string regex)]
(str-join "" (rest (rest matches)))))

(defn str-after-inc [input-string regex]
(let [matches (re-partition input-string regex)]
(str-join "" (rest matches))))


;;; Inflectors
;;; These methods only take the input string.
(defn str-reverse
"This method accepts a string and returns the reversed string as a
result."
[input-string]
(apply str (reverse input-string)))


(defn upcase
"Converts the entire string to upper case"
[input-string]
(. input-string toUpperCase))

(defn downcase
"Converts the entire string to lower case"
[input-string]
(. input-string toLowerCase))

(defn trim
"Shortcut for String.trim"
[input-string]
(. input-string trim))

(defn strip
"Alias for trim, like Ruby."
[input-string]
(trim input-string))

(defn ltrim
"This method chops all of the leading whitespace."
[input-string]
;; Anchored at the start, so a string with no leading whitespace is unchanged
(re-sub input-string #"^\s+" ""))

(defn rtrim
"This method chops all of the trailing whitespace."
[input-string]
;; Anchored at the end, so a string with no trailing whitespace is unchanged
(re-sub input-string #"\s+$" ""))

(defn chop
"Removes the last character of string."
[input-string]
(subs input-string 0 (dec (count input-string))))

(defn chomp
"Removes all trailing newline \\n or return \\r characters from
string. Note: String.trim() is similar and faster."
[input-string]
;; Only the trailing run of \r / \n characters is removed
(re-sub input-string #"[\r\n]+$" ""))

(defn capitalize
"This method turns a string into a capitalized version, Xxxx"
[input-string]
(str-join "" (list
(upcase (str (first input-string)))
(downcase (apply str (rest input-string))))))

(defn titleize
"This method takes an input string, splits it across whitespace,
dashes, and underscores. Each word is capitalized, and the result is
joined with \" \"."
[input-string]
(let [words (re-split input-string #"[\s_-]+")]
(str-join " " (map capitalize words))))

(defn camelize
"This method takes an input string, splits it across whitespace,
dashes, and underscores. The first word is downcased, the rest are
capitalized, and the result is joined with \"\"."
[input-string]
(let [words (re-split input-string #"[\s_-]+")]
(str-join "" (cons (downcase (first words)) (map capitalize (rest
words))))))

(defn dasherize
"This method takes an input string, splits it across whitespace,
dashes, and underscores. Each word is downcased, and the result is
joined with \"-\"."
[input-string]
(let [words (re-split input-string #"[\s_-]+")]
(str-join "-" (map downcase words))))

(defn underscore
"This method takes an input string, splits it across whitespace,
dashes, and underscores. Each word is downcased, and the result is
joined with \"_\"."
[input-string]
(let [words (re-split input-string #"[\s_-]+")]
(str-join "_" (map downcase words))))

;;; Escapees

;TO-DO

;(defn sql-escape[x])
;(defn html-escape[x])
;(defn javascript-escape[x])
;(defn pdf-escape)


*UNIT TESTS*
(ns devlinsf.test-contrib.str-utils
(:use clojure.contrib.test-is
devlinsf.str-utils))

(deftest test-str-reverse
(is (= (str-reverse "Clojure") "erujolC")))

(deftest test-downcase
(is (= (downcase "Clojure") "clojure")))

(deftest test-upcase
(is (= (upcase "Clojure") "CLOJURE")))

(deftest test-trim
(is (= (trim " Clojure ") "Clojure")))

(deftest test-strip
(is (= (strip " Clojure ") "Clojure")))

(deftest test-ltrim
(is (= (ltrim " Clojure ") "Clojure ")))

(deftest test-rtrim
(is (= (rtrim " Clojure ") " Clojure")))

(deftest test-chop
(is (= (chop "Clojure") "Clojur")))

(deftest test-chomp
(is (= (chomp "Clojure \n") "Clojure "))
(is (= (chomp "Clojure \r") "Clojure "))
(is (= (chomp "Clojure \n\r") "Clojure ")))

(deftest test-capitalize
(is (= (capitalize "clojure") "Clojure")))

(deftest test-titleize
(let [expected-string "Clojure Is Awesome"]
(is (= (titleize "clojure is awesome") expected-string))
(is (= (titleize "clojure is awesome") expected-string))
(is (= (titleize "CLOJURE IS AWESOME") expected-string))
(is (= (titleize "clojure-is-awesome") expected-string))
(is (= (titleize "clojure- _ is---awesome") expected-string))
(is (= (titleize "clojure_is_awesome") expected-string))))

(deftest test-camelize
(let [expected-string "clojureIsAwesome"]
(is (= (camelize "clojure is awesome") expected-string))
(is (= (camelize "clojure is awesome") expected-string))
(is (= (camelize "CLOJURE IS AWESOME") expected-string))
(is (= (camelize "clojure-is-awesome") expected-string))
(is (= (camelize "clojure- _ is---awesome") expected-string))
(is (= (camelize "clojure_is_awesome") expected-string))))

(deftest test-underscore
(let [expected-string "clojure_is_awesome"]
(is (= (underscore "clojure is awesome") expected-string))
(is (= (underscore "clojure is awesome") expected-string))
(is (= (underscore "CLOJURE IS AWESOME") expected-string))
(is (= (underscore "clojure-is-awesome") expected-string))
(is (= (underscore "clojure- _ is---awesome") expected-string))
(is (= (underscore "clojure_is_awesome") expected-string))))

(deftest test-dasherize
(let [expected-string "clojure-is-awesome"]
(is (= (dasherize "clojure is awesome") expected-string))
(is (= (dasherize "clojure is awesome") expected-string))
(is (= (dasherize "CLOJURE IS AWESOME") expected-string))
(is (= (dasherize "clojure-is-awesome") expected-string))
(is (= (dasherize "clojure- _ is---awesome") expected-string))
(is (= (dasherize "clojure_is_awesome") expected-string))))

(deftest test-str-before
(is (= (str-before "Clojure Is Awesome" #"Is") "Clojure ")))

(deftest test-str-before-inc
(is (= (str-before-inc "Clojure Is Awesome" #"Is") "Clojure Is")))

(deftest test-str-after
(is (= (str-after "Clojure Is Awesome" #"Is") " Awesome")))

(deftest test-str-after-inc
(is (= (str-after-inc "Clojure Is Awesome" #"Is") "Is Awesome")))

(deftest test-str-join
(is (= (str-join " " '("A" "B")) "A B")))

(deftest test-re-split-single-regex
(let [source-string "1\t2\t3\n4\t5\t6"]
(is (= (re-split source-string #"\n") '("1\t2\t3" "4\t5\t6")))))

(deftest test-re-split-single-map
(let [source-string "1\t2\t3\n4\t5\t6"]
(is (= (re-split source-string {:pattern #"\n"}) '("1\t2\t3"
"4\t5\t6")))
(is (= (re-split source-string {:pattern #"\n" :limit 1})
'("1\t2\t3")))
(is (= (re-split source-string {:pattern #"\n" :marshal-fn #(str %
"\ta")}) '("1\t2\t3\ta" "4\t5\t6\ta")))
(is (= (re-split source-string {:pattern #"\n" :limit 1 :marshal-fn #(str % "\ta")}) '("1\t2\t3\ta")))
))

(deftest test-re-split-single-element-list
(let [source-string "1\t2\t3\n4\t5\t6"]
(is (= (re-split source-string (list #"\n")) '("1\t2\t3"
"4\t5\t6")))))

(deftest test-re-split-pure-list
(let [source-string "1\t2\t3\n4\t5\t6"]
(is (= (re-split source-string (list #"\n" #"\t")) '(("1" "2" "3")
("4" "5" "6"))))))

(deftest test-re-split-mixed-list
(let [source-string "1\t2\t3\n4\t5\t6"]
(is (= (re-split source-string (list {:pattern #"\n" :limit 1}
#"\t")) '(("1" "2" "3"))))
(is (= (re-split source-string (list {:pattern #"\n" :limit 1}
{:pattern #"\t" :limit 2})) '(("1" "2"))))
(is (= (re-split source-string (list
{:pattern #"\n" :limit 1}
{:pattern #"\t" :limit 2 :marshal-fn #(java.lang.Double/parseDouble %)}))
'((1.0 2.0))))
(is (= (re-split source-string (list
{:pattern #"\n"}
{:pattern #"\t" :marshal-fn #(java.lang.Double/parseDouble
%)}))
'((1.0 2.0 3.0) (4.0 5.0 6.0))))
(is (= (map #(reduce + %) (re-split source-string (list
{:pattern #"\n"}
{:pattern #"\t" :marshal-fn #(java.lang.Double/parseDouble %)})))
'(6.0 15.0)))
(is (= (reduce +(map #(reduce + %) (re-split source-string (list
{:pattern #"\n"}
{:pattern #"\t" :marshal-fn #(java.lang.Double/parseDouble
%)}))))
'21.0))
))

(deftest test-re-partition
(is (= (re-partition "Clojure Is Awesome" #"\s+") '("Clojure" " "
"Is" " " "Awesome"))))

(deftest test-re-gsub
(let [source-string "1\t2\t3\n4\t5\t6"]
(is (= (re-gsub source-string #"\s+" " ") "1 2 3 4 5 6"))
(is (= (re-gsub source-string '((#"\s+" " "))) "1 2 3 4 5 6"))
(is (= (re-gsub source-string '((#"\s+" " ") (#"\d" "D"))) "D D D D D D"))))

(deftest test-re-sub
(let [source-string "1 2 3 4 5 6"]
(is (= (re-sub source-string #"\d" "D") "D 2 3 4 5 6"))
(is (= (re-sub source-string '((#"\d" "D") (#"\d" "E"))) "D E 3 4 5 6"))))

David Nolen

Mar 23, 2009, 9:27:02 PM
to clo...@googlegroups.com
Looks interesting and maybe even very useful. Why not put your code on Github or some other public repo of your liking. It's much nicer than pasting all this code ;)

Sean

Mar 23, 2009, 9:46:48 PM
to Clojure
Okay, it's up. I'm still new to GitHub, so sorry about that. I *think*
it's here:

http://github.com/francoisdevlin/clojure-str-utils-proposal/tree/master

I'm still not sure what the directory structure should be. Perhaps
somebody can point out how it should be done.

I'll put the original post in the README

Long story short: multi-methods could be awesome

Sean

On Mar 23, 9:27 pm, David Nolen <dnolen.li...@gmail.com> wrote:
> Looks interesting and maybe even very useful. Why not put your code on
> Github or some other public repo of your liking. It's much nicer than
> pasting all this code ;)

Stuart Sierra

Mar 24, 2009, 2:22:03 PM
to Clojure
On Mar 23, 9:46 pm, Sean <francoisdev...@gmail.com> wrote:
> http://github.com/francoisdevlin/clojure-str-utils-proposal/tree/master
>
> I'm not sure what the directory structure should be for everything
> still.  Perhaps somebody can point out how it should be done.
>
> I'll put the original post in the README
>
> Long story short:  multi-methods could be awesome


Hi Sean,

This is interesting. One quick comment, though: the trim, strip,
upcase, and downcase functions don't add anything to the original Java
methods. Rich and others have warned against adding wrapper functions
just to hide Java.

Some of these functions, like capitalize, are available in the
StringUtils class of Apache Commons Lang.

-Stuart Sierra

Sean

Mar 24, 2009, 10:05:50 PM
to Clojure
Okay, I've made some changes to my proposed str-utils. I've also got
a few answers to some of the issues Stuart raised.

New Changes

1. re-split is now lazy
I rewrote this method to use the re-partition method in str-utils.
This makes it lazy, and it helped me consolidate my Java interop into
fewer functions.

2. re-split options changed
I removed the ability to pass a :limit option and replaced it with
:offset and :length options. This seemed to make the function more
flexible :)
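A hedged sketch of how a lazy split with :offset and :length could behave. Sean's version reuses re-partition; this one walks a Matcher directly so it stands alone, and the names lazy-pieces and lazy-split are hypothetical. Only as many matches are searched for as the caller consumes.

```clojure
;; Hypothetical helper: lazily yield the pieces of s between matches of re.
(defn- lazy-pieces [^String s ^java.util.regex.Pattern re]
  (let [m (re-matcher re s)]
    ((fn step [start]
       (lazy-seq
         (if (.find m)
           ;; text between the previous match and this one
           (cons (subs s start (.start m)) (step (.end m)))
           ;; no more matches: emit the tail, if any
           (when (< start (count s))
             (list (subs s start))))))
     0)))

(defn lazy-split
  "Split s on :pattern, drop :offset pieces (default 0), and keep at most
  :length pieces when :length is given."
  [s {:keys [pattern offset length] :or {offset 0}}]
  (let [pieces (drop offset (lazy-pieces s pattern))]
    (if length (take length pieces) pieces)))
```

Skipping header lines then becomes an :offset, for example {:pattern #"\n" :offset 2}.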

3. created nearby function
The nearby function returns a lazy sequence of strings "nearby" the
input string. It's inspired by the Norvig spellchecker example. I'd
like to propose adding this method to the library, because I'm curious
what uses creative people will find for it.
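The thread doesn't show Sean's nearby implementation. As a rough idea of what "nearby" strings could mean, here is a hypothetical sketch modeled on the one-edit-away step of Norvig's spelling corrector: every deletion, adjacent transposition, replacement, and insertion. The lowercase a-z alphabet is an assumption.

```clojure
;; Hypothetical sketch: lazily produce all strings one edit away from word.
(def alphabet "abcdefghijklmnopqrstuvwxyz")

(defn nearby [^String word]
  (let [splits (for [i (range (inc (count word)))]
                 [(subs word 0 i) (subs word i)])]
    (concat
      ;; deletions: drop one character
      (for [[a b] splits :when (seq b)] (str a (subs b 1)))
      ;; transpositions: swap two adjacent characters
      (for [[a b] splits :when (> (count b) 1)]
        (str a (nth b 1) (nth b 0) (subs b 2)))
      ;; replacements: change one character
      (for [[a b] splits :when (seq b), c alphabet] (str a c (subs b 1)))
      ;; insertions: add one character anywhere
      (for [[a b] splits, c alphabet] (str a c b)))))
```

A spellchecker would then keep only the candidates found in a dictionary; for a word of length n the sequence has n + (n - 1) + 26n + 26(n + 1) entries, duplicates included.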

4. Added README.html
This file contains usage examples for every method in my proposed
str-utils

Response to Stuart's Issues
You've raised some good points with my proposal. However, I think
there is some hidden value in including the functions in the library.

1. Repeated methods
The main reason I have for including the repeated methods (trim,
strip, downcase, upcase) in the str-utils library is program flow.
Take the following two examples:

(map downcase a-list)

(map #(. % toLowerCase) a-list)

It's my purely subjective opinion that the first method reads much
nicer. I'm also a web developer, so I do a lot of string processing.
I have a subjective preference for functions that make code more
concise, and my code will read a little shorter with the downcase
function in it.

I guess my main argument is that a more concise way of stating the
same thing does add value. For example, jQuery's selector function can
be called as jQuery(), but it is almost always written $() to save
time. In my view, the $() shortcut does add value. This is why I think
you should consider adding my repeated methods to str-utils.

2. Methods in Apache Commons
The second issue you raise is with methods like capitalize that are
available in Apache Commons. First of all, the exact reasoning I went
through above applies equally to the following s-exp:

(map capitalize a-list)

However, I believe there is an additional reason for including these
methods in the library. By requiring Apache Commons, you've increased
the number of jars I need to maintain. Granted, a lot of people are
used to using the commons. It is still one more thing that requires
maintenance, though. By making capitalize (and others) part of
contrib, it reduces the amount of work I have to do to get this
functionality.

In conclusion, I'd recommend making str-utils one of the slickest
libraries in Clojure. As developers, we know how much easier strong
string manipulation makes writing code. If Clojure has incredible
string support, developers will be more impressed. Let's go out of
our way to make string manipulation in Clojure easier than in any
other language.

Besides, once we get a kick-ass string library working, we can then
abstract the routines to work on any pattern of symbols, and give our
macro writers a boost.

Sean

pc

Mar 24, 2009, 10:21:54 PM
to Clojure
Hi,

I would generally agree with Stuart that wrapping Java functions is
not a good idea.

However, string functions come up so often that I think that this is
one area where the rule should be broken, if only for readability.

Making str-utils kick-ass is a great idea.

pc

Perry Trolard

Mar 25, 2009, 3:08:09 PM
to Clojure
Whatever it's worth as a datum, my experience is that I usually find
myself writing upcase, downcase, titlecase functions in most
applications, because

(1) they're prettier & more succinct when passed as first-class
(downcase vs. #(.toLowerCase %))
(2) I can add type hints once, in the downcase, upcase, etc.
functions, instead of doing so at each invocation (#(.toLowerCase
#^String %))

I think (2)'s the most compelling reason. The type-hinting situation
in Clojure is currently pretty impressive, I've found; a relatively
small number of hints strategically placed usually eliminate most or
all of the reflection that occurs in my first draft of functions. But
many string-processing operations -- for whatever reason -- usually
need a manual hint.
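A small sketch of Perry's point (2): the ^String hint lives inside the named wrapper once, so call sites such as (map downcase ...) stay short and reflection-free. To see the difference, evaluate (set! *warn-on-reflection* true) at the REPL before loading this code; only the unhinted anonymous function at the end should trigger a reflection warning.

```clojure
;; Sketch: hint once inside the wrapper instead of at every call site.
(defn downcase
  "Lower-cases s; the ^String hint lets the compiler call
  String.toLowerCase directly instead of via reflection."
  [^String s]
  (.toLowerCase s))

(defn upcase
  "Upper-cases s, with the same hint."
  [^String s]
  (.toUpperCase s))

;; Unhinted, for comparison: it works, but .toLowerCase is resolved by
;; reflection at runtime.
(def downcase-reflective #(.toLowerCase %))
```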

I agree that it's not desirable to balloon the Clojure API with thin
wrappers on the Java APIs, but, like pc, think this might be an
exception.

I'm less sure about the other proposed changes to str-utils -- the
variable-arity versions of re-split, -partition, -sub, -gsub. Maybe a
regex-parse lib in contrib would be a better place?

Perry

Sean

Mar 25, 2009, 4:54:26 PM
to Clojure
Perry,

1. Thanks for the tip on using type hints! I just added them to my
code and pushed it to github

2. If you take a close look at my re-* methods, I actually tried to
enforce an arity of 2 on as many methods as I could. This way the
methods would read like so:

(re-split input-string work-instructions)
(re-partition input-string work-instructions)
(re-gsub input-string work-instructions)
(re-sub input-string work-instructions)

However, this didn't quite work with the lowest levels of re-gsub and
re-sub, and forcing a map at the lowest level didn't feel right.

3. Library location is a slight issue. I agree these methods are in
a completely different category than downcase, upcase, etc. The
current str-utils.clj file has these methods in it. That is why they
started there. There may be a case for creating a separate regex-
utils library, and I know I have a few more parsing methods I'd like
to propose in the near future. At the current moment, I personally
prefer to have everything in one file. We'll see how big things get,
though.

To Everyone,
I'd like to add Perry's type-hinting argument to the list of reasons
these changes should be in contrib. Pooling our efforts to create a
high performance version of the code does add value beyond a simple
wrapper.

A *fast*, tested and slick string library is even better than a
tested and slick string library.

Sean

linh

Mar 25, 2009, 5:56:40 PM
to Clojure

> Hi,
>
> I would generally agree with Stuart that wrapping Java functions is
> not a good idea.
>
> However, string functions come up so often that I think that this is
> one area where the rule should be broken, if only for readablility.
>

I agree, I use these string functions frequently. Maybe these String
wrapper functions can be in their own namespace to make it explicit
that these are wrapper functions.

Tom Faulhaber

Mar 26, 2009, 1:55:36 AM
to Clojure
Having great string and regex manipulation is a must for anything that
will be used as a scripting language and I think this should be
conveniently available in Clojure.

So, yes, I agree that these functions are ones that it makes sense to
wrap.

I'm not (yet, at least) commenting specifically on Sean's proposal,
because I haven't had time to look at it in depth.

Tom

Sean

Mar 28, 2009, 12:07:33 PM
to Clojure
Yesterday a coworker needed a few Excel spreadsheets (tab delimited)
stitched together. I took this as an opportunity to test run my
proposed string functions. Here's what I found:

Good stuff
* Having re-split be lazy is awesome. This made partially traversing
a row super quick.
* The str-before/after methods were useful for trimming headers.
* Having :offset in re-split was also useful for skipping the first
few lines.
* Passing an options map is useful.
* I needed to create a hashmap with one of the columns as a key.
Having trim as a clojure function made the code cleaner.
* I couldn't have delivered what my coworker needed without the
currently existing chop and chomp functions. Thanks to the original
str-utils authors!

Bad Stuff
* Some of the proposed multimethod functions look cool on paper, but
don't get used in practice. Supplying a list to re-split and
re-partition is not as useful as I thought, and I needed to use a
normal map operation instead.
* The :marshal-fn parameter wasn't as useful as I thought, either.
Again, I used a normal map operation instead.
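To make the "normal map operation" approach concrete, here is a hypothetical sketch of stitching two tab-delimited exports on a key column with plain seq functions and trim. The column positions, number of header lines, and the rule of appending the whole matching row are assumptions for illustration, not Sean's stitcher.

```clojure
;; Hypothetical sketch: join two tab-delimited texts on column k.
(defn rows
  "Split tab-delimited text into vectors of fields, skipping `skip` header lines."
  [^String text skip]
  (->> (.split text "\\r?\\n")
       (drop skip)
       ;; -1 keeps trailing empty fields (empty cells at the end of a row)
       (map #(vec (.split ^String % "\t" -1)))))

(defn index-by
  "Map from the trimmed value in column k to its row."
  [k rs]
  (into {} (map (fn [row] [(.trim ^String (nth row k)) row]) rs)))

(defn stitch
  "Append to each left row the right row whose column k matches after
  trimming; left rows with no match are returned unchanged."
  [left right k]
  (let [lookup (index-by k right)]
    (map (fn [row] (into row (get lookup (.trim ^String (nth row k)))))
         left)))
```

Trimming the key column is what lets a cell like "1 " in one export match "1" in the other.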

The biggest challenge was learning to write the stitcher application
in a functional style. Once my brain started to make the switch from
Ruby to Clojure, it really helped. Anyway, the code is on GitHub and
now has a build.xml file, so you can build the jar. I'd really
appreciate more feedback on the library.

I hope other people find this useful. Happy Hacking!