Prefix tree creation in clojure

89 views
Skip to first unread message

rahulpilani

unread,
Dec 30, 2011, 6:21:41 PM12/30/11
to Clojure
I am relatively new to clojure and still trying to learn my way
around. In my day job, my primary language is Java, so breaking the
imperative and mutable habit was kind of tough for me.

To get my feet wet, I wrote a small set of functions to implement a
prefix tree. It's larger than something I feel comfortable posting
here, so I created a gist instead:
https://gist.github.com/1541958

I would appreciate any pointers on style or if I could implement it
any better.

Thanks,
Rahul

Stephen Compall

unread,
Jan 1, 2012, 5:55:44 PM1/1/12
to clo...@googlegroups.com
On Fri, 2011-12-30 at 15:21 -0800, rahulpilani wrote:
> https://gist.github.com/1541958
>
> I would appreciate any pointers on style or if I could implement it
> any better.

Here are some style pointers on 74487e9, though they may be more
specific than you were looking for.

> 1: (ns prefix-tree)

While this is just a sample, namespaces without at least one `.' are
discouraged. I favor the Java convention (prefix with backwards
Internet domain that you actually have), but I'm starting to see
things like `slingshot.slingshot'.

> 3: ;Record representing a node in the prefix tree.

You'll find autoindentation easier if you follow something like the
Emacs conventions for semicolon count:

https://www.gnu.org/software/emacs/manual/html_node/elisp/Comment-Tips.html

> 11: (let [a-length (.length a) b-length (.length b)]

Despite its name, the builtin `count' function is better for this.

> 15:(defn match-string [label string]

You may find a simpler definition overall by taking advantage of the
fact that `map' takes more than one sequence; start with `(map = label
string)'.

> 17: (defn mismatch-index [seq]

Unlike Scheme, this won't make a lexical `mismatch-index'; it'll
replace the current top-level binding for `mismatch-index'. So this
makes `match-string' non-reentrant.

> 24: (.charAt string index)))) seq)))

Having `seq' here obfuscates its relationship with the rest of the
expression. I favor:

(filter (fn [index]
(not (= (.charAt label index)
(.charAt string index))))
seq)

In other words, always starting the first argument to a function on
the same line as the function, except as a last resort. As you saw
later in the file, `->' and `->>' can help with indentation problems
caused by heavy nesting.

> 33: (let [substring (.substring string start end)]

Try the builtin `subs'.

> 34: (if-not (empty? substring) substring nil))))

Try `(not-empty substring)'.

> 41: (hash-map :common (substring-or-nil label 0 match) :label-substring (substring-or-nil label match) :string-substring (substring-or-nil string match))))

Unlike other Lisps with literal vector/hash syntax, Clojure evaluates
the contents, making these syntaxes not so "literal" in Clojure. So
{:common ...} is better here.

> 47: (defn first-char [string]
> 48: "Return the first character of a string."
> 49: (.charAt string 0))

This is `first', except that it doesn't do the same thing for empty
strings.

> 63: (assoc node :edges (apply assoc (.edges node) (edge-entry label child)))

:edges is better than .edges here. Accordingly, the builtin
`update-in' can simplify this.

> 70: (Node. (conj (.strings node) string) edges)))

It remains to be seen, but it seems likely to me that the new (1.3)
`map->Node' and `->Node' will become idiomatic here.


--
Stephen Compall
^aCollection allSatisfy: [:each|aCondition]: less is better

Cedric Greevey

unread,
Jan 1, 2012, 10:10:08 PM1/1/12
to clo...@googlegroups.com
On Sun, Jan 1, 2012 at 5:55 PM, Stephen Compall
<stephen...@gmail.com> wrote:
> On Fri, 2011-12-30 at 15:21 -0800, rahulpilani wrote:
>>   1: (ns prefix-tree)
>
> While this is just a sample, namespaces without at least one `.' are
> discouraged.  I favor the Java convention (prefix with backwards
> Internet domain that you actually have), but I'm starting to see
> things like `slingshot.slingshot'.

Probably because not everyone actually has their own Internet domain. :)

>>  34:       (if-not (empty? substring) substring nil))))
>
> Try `(not-empty substring)'.

Odd that this isn't named not-empty? with a ? character.

> It remains to be seen, but it seems likely to me that the new (1.3)
> `map->Node' and `->Node' will become idiomatic here.

Where are these documented? Better yet, all the differences from 1.2
in 1.3? There's one frequently-referenced page about the numerics
changes, but that's obviously not the full extent of 1.3's changes.

Stephen Compall

unread,
Jan 1, 2012, 10:31:49 PM1/1/12
to clo...@googlegroups.com
On Sun, 2012-01-01 at 22:10 -0500, Cedric Greevey wrote:
> Odd that this isn't named not-empty? with a ? character.

Not at all; it's not a predicate. See also `every?' versus `some'.

> Where are these documented? Better yet, all the differences from 1.2
> in 1.3? There's one frequently-referenced page about the numerics
> changes, but that's obviously not the full extent of 1.3's changes.

http://dev.clojure.org/display/design/defrecord+improvements

The mostly complete master list is changes.txt or changes.md in the
source.

Cedric Greevey

unread,
Jan 1, 2012, 11:09:31 PM1/1/12
to clo...@googlegroups.com
On Sun, Jan 1, 2012 at 10:31 PM, Stephen Compall
<stephen...@gmail.com> wrote:
> On Sun, 2012-01-01 at 22:10 -0500, Cedric Greevey wrote:
>> Odd that this isn't named not-empty? with a ? character.
>
> Not at all; it's not a predicate.  See also `every?' versus `some'.

It's useful as an if or cond clause; that to me says "predicate".
"Some" is a more borderline case, since its name refers to the element
it returns if one fits, rather than to a yes-or-no property like
"empty" or "not empty". A question-name for using the behavior of the
some function as a predicate would probably be better off as "any?"
rather than "some?".

>> Where are these documented? Better yet, all the differences from 1.2
>> in 1.3? There's one frequently-referenced page about the numerics
>> changes, but that's obviously not the full extent of 1.3's changes.
>
> http://dev.clojure.org/display/design/defrecord+improvements

That's obviously not the full extent of 1.3's changes any more than
the numerics changes are.

> The mostly complete master list is changes.txt or changes.md in the
> source.

And that will obviously be chock-full of internals changes and
miscellaneous tweaks and not just the user-visible feature
changes/additions, aimed more at developers of Clojure itself than at
developers using Clojure to make other things.

I'm asking where there's a page oriented at Clojure's *users* and
listing all of the feature/semantic/API changes from 1.2. Something to
get someone more up to speed on the new version's differences without
having to pore over all of the API docs and other docu looking for
things they don't remember from 1.2.

Cedric Greevey

unread,
Jan 1, 2012, 11:16:19 PM1/1/12
to clo...@googlegroups.com
On Sun, Jan 1, 2012 at 11:09 PM, Cedric Greevey <cgre...@gmail.com> wrote:
> On Sun, Jan 1, 2012 at 10:31 PM, Stephen Compall
> <stephen...@gmail.com> wrote:
>> On Sun, 2012-01-01 at 22:10 -0500, Cedric Greevey wrote:
>>> Odd that this isn't named not-empty? with a ? character.
>>
>> Not at all; it's not a predicate.  See also `every?' versus `some'.
>
> It's useful as an if or cond clause; that to me says "predicate".
> "Some" is a more borderline case, since its name refers to the element
> it returns if one fits, rather than to a yes-or-no property like
> "empty" or "not empty". A question-name for using the behavior of the
> some function as a predicate would probably be better off as "any?"
> rather than "some?".

Just to add to this, it's also the case that you're likely to be using
the return value of some, beyond conditioning on its truthiness; you
usually use "some" because you want the element itself right away, and
only occasionally because (for now) you just want to see if the
collection has a suitable element. On the other hand, the return value
of not-empty is generally an object you already had in hand. Other
than as a shorthand for (if (not-empty x) x) the return value is
unlikely to be used very often for more than its truthiness.

> And that will obviously be chock-full of internals changes and
> miscellaneous tweaks and not just the user-visible feature
> changes/additions, aimed more at developers of Clojure itself than at
> developers using Clojure to make other things.

And likely organized in some less-than-desirable manner, such as
chronologically or even more or less randomly (e.g. alphabetically by
the name of an affected component or function), rather than grouped by
related changes/feature set (numerics changes in one spot; record
changes in one spot; new seq API functions, if any, in one spot; new
map API functions, if any, in one spot; etc.).

Stephen Compall

unread,
Jan 2, 2012, 5:28:59 PM1/2/12
to clo...@googlegroups.com
On Sun, 2012-01-01 at 23:16 -0500, Cedric Greevey wrote:
> And that will obviously be chock-full of internals changes and
> miscellaneous tweaks and not just the user-visible feature
> changes/additions, aimed more at developers of Clojure itself than at
> developers using Clojure to make other things.

No, it's not. That stuff is in `git log'.

> Other than as a shorthand for (if (not-empty x) x) the return value is
> unlikely to be used very often for more than its truthiness.

Most library functions are shorthand for something else, and much of
their value lies in capturing a common idiom. `not-empty' is used
several times in core.clj itself, and only once as a truth test.
(Moreover, that instance, in `underive', deviates from common Clojure
style in other ways.)

The usual function to use for truth in this situation is `seq', not
`not-empty'. See core.clj:

(defn empty?
"Returns true if coll has no items - same as (not (seq coll)).
Please use the idiom (seq x) rather than (not (empty? x))"

...and several past discussions on this list not otherwise relevant to
this thread.

Cedric Greevey

unread,
Jan 2, 2012, 11:47:42 PM1/2/12
to clo...@googlegroups.com
On Mon, Jan 2, 2012 at 5:28 PM, Stephen Compall
<stephen...@gmail.com> wrote:
> On Sun, 2012-01-01 at 23:16 -0500, Cedric Greevey wrote:
>> And that will obviously be chock-full of internals changes and
>> miscellaneous tweaks and not just the user-visible feature
>> changes/additions, aimed more at developers of Clojure itself than at
>> developers using Clojure to make other things.
>
> No, it's not.  That stuff is in `git log'.

Anything that's packaged with the source rather than with the binary,
and/or is on the repo site and/or the dev.* site rather than on the
main web site, is thereby more or less being *stated* to be intended
for the software's developers rather than its users.

> Most library functions are shorthand for something else, and much of
> their value lies in capturing a common idiom.  `not-empty' is used
> several times in core.clj itself, and only once as a truth test.

Not a representative sample; core.clj does a lot of things weirdly due
to things not yet being defined at a particular point, etc.

> The usual function to use for truth in this situation is `seq', not
> `not-empty'.

Yes; not-empty does seem rather redundant with that, other than for
the narrow and unusual case where you'd otherwise have (if (seq x) x).
But it's more likely you'd have (if (seq x) (...more complex
expression using x) (...alternative)) or (if-let [x (seq x)] ...).
About the only spot I can think of for using it, offhand, is when
calling a function and passing an argument that should be a collection
or nil, and wanting the collection type unmodified if it isn't nil, so
(seq x) won't do. All of the subcases I can think of offhand, in turn,
*do* work fine with (seq x), or even with a non-nil empty collection,
for example concat.

The case above, where you want to nil out substring if it's not empty
but not disturb its stringness, is actually about the only one I think
likely, and there are probably other ways to solve the problem,
probably by accepting an empty string instead of nil or by moving the
empty check to where a nil check is currently used. Nil punning is
clever, but sometimes it's cleverness purely for its own sake and at
the expense of tangling some of the rest of the code.

> ^aCollection allSatisfy: [:each|aCondition]: less is better

Eh, another (ex-)Smalltalk hacker?

Reply all
Reply to author
Forward
0 new messages