Yes, much of the initial excitement around Clojure comes from the
feeling of "Wow, I can do so much with so little code". But at some
point, all projects grow. I'm thinking that by now, there may be
enough people using Clojure in large projects and on large teams to
offer some good feedback about how well that works.
My Clojure codebase is somewhere around 2-3kloc and I already feel
like I'm bumping up against some frustration when it comes time to
refactor, maintain, and extend the code, all while keeping up with
ongoing changes to libraries, contrib structures, and Clojure
versions.
I want to hear war stories from those with even larger code bases than
mine. Has it proven to be a major hassle on large projects to avoid
circular dependencies in the modules? Is the lack of debugging
tools, documentation tools, and refactoring tools holding you back?
Anyone miss static typing?
One of my main gripes is that some of Clojure's built-ins return
nonsensical results (or nil), rather than errors, for certain classes
of invalid inputs. To me, one of the main benefits of functional
programming is that debugging is generally easier, in large part
because failures usually occur within close proximity of the flaw that
triggered the failure. Erlang, in particular, has really promoted the
idea of "fail fast" as a way to build robust systems. But Clojure's
lack of a "fail-fast" philosophy has burned me several times, with
hard-to-track-down bugs that were far-removed from the actual cause.
The larger my code grows, the more this annoys me, reminding me too
much of my days tracking down bugs in imperative programs.
One specific example of this is get, which returns nil whenever the
first input isn't something that supports get. For example, (get 2 2)
produces nil. This becomes especially problematic when you pass
something to get that seems like it should support get, but doesn't.
For example, (get (transient #{1}) 1) produces nil, when there's
absolutely no reason to think that (get (transient #{1}) 1) would
behave any differently from ((transient #{1}) 1).
I have a codebase with 2.6kloc of production code and 4.8kloc of tests, and I feel your pain (even despite having been a Lisp programmer in the early 80's). I'm not sure yet how to navigate the transition to 1.3 while retaining backwards compatibility. And organizing things into namespaces is something I still haven't figured out.
Russ Olsen said on this list: "The community behind a language and the techniques that it develops are as much a part of the language as the syntax." I think we, the community, need to step up and figure out these techniques and *publicize* them. I hope the core team can provide the infrastructure/support to make that work.
I was moderately heavily involved in the Ruby world starting in 2001 up until some time before Rails took the world by storm. There was a ton of inadvertent preparatory work done by people like Pragmatic Dave Thomas, Chad Fowler, Nathaniel Talbott, and Jim Weirich. We'd do well to learn from their oral histories of the early days of Ruby.
-----
Brian Marick, Artisanal Labrador
Contract programming in Ruby and Clojure
Occasional consulting on Agile
www.exampler.com, www.twitter.com/marick
> Ideally, I was hoping to start a more in-depth discussion about the
> pros and cons of "programming in the large" in Clojure than just
> waxing poetic about Clojure/Lisp's capabilities in the abstract :)
>
> Yes, much of the initial excitement around Clojure comes from the
> feeling of "Wow, I can do so much with so little code". But at some
> point, all projects grow. I'm thinking that by now, there may be
> enough people using Clojure in large projects and on large teams to
> offer some good feedback about how well that works.
>
> My Clojure codebase is somewhere around 2-3kloc and I already feel
> like I'm bumping up against some frustration when it comes time to
> refactor, maintain, and extend the code, all while keeping up with
> ongoing changes to libraries, contrib structures, and Clojure
> versions.
We have over 6.5K lines of Clojure (src only) and growing, and it's all structured with namespaces.
We still have a mixed code base here (Java + Clojure + JRuby), and we already had
namespaces to structure the code.
The code base is structured in 10 different projects.
We use Eclipse and Counterclockwise for dev. Dev coding/testing is done in Eclipse
by specifying projects in dependencies.
We use Leiningen to build these for Q/A and prod.
Moving from 1.0 to 1.2 was not painful. We did it methodically. With basic tests in each
project, we spotted issues quite fast. We rolled this over a week roughly.
>
> I want to hear war stories from those with even larger code bases than
> mine. Has it proven to be a major hassle on large projects to avoid
> circular dependencies in the modules? Is the lack of debugging
> tools, documentation tools, and refactoring tools holding you back?
> Anyone miss static typing?
Again, using namespaces/individual projects is the key to avoiding circular dependencies.
We do not miss static typing at all, in fact we are in the process of
getting rid of the Java code. The goal is to clear this by next fall.
For debugging when it's serious, we use the Eclipse JVM debugger
and look at the Clojure runtime context when needed.
As for documentation tools, we rely on (doc ...) and document our code accordingly.
Since the code ratio versus Java is around one to ten, refactoring is not
a big deal even without the heavy assistance you may get in Java from
your IDE.
>
> One of my main gripes is that some of Clojure's built-ins return
> nonsensical results (or nil), rather than errors, for certain classes
> of invalid inputs. To me, one of the main benefits of functional
> programming is that debugging is generally easier, in large part
> because failures usually occur within close proximity of the flaw that
> triggered the failure. Erlang, in particular, has really promoted the
> idea of "fail fast" as a way to build robust systems. But Clojure's
> lack of a "fail-fast" philosophy has burned me several times, with
> hard-to-track-down bugs that were far-removed from the actual cause.
> The larger my code grows, the more this annoys me, reminding me too
> much of my days tracking down bugs in imperative programs.
Where did you find the link between functional languages and close proximity of
errors? That's a language design decision. You may want to use assertions
on your fns to validate inputs. That should improve your ability to track errors
before they carry things too far from the spot where it failed.
I would not trade this for systematic exception reporting.
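
As a rough illustration of the assertion approach, here is a minimal sketch using a :pre condition. The function name member? is hypothetical and not from anyone's codebase in this thread. The precondition checks the caller's own expectation (a persistent set) rather than whatever interface get requires, and it assumes assertions are enabled (*assert* is true, the default).

```clojure
;; Hypothetical helper for illustration only.
(defn member?
  "Returns true if x is an element of the persistent set s."
  [s x]
  {:pre [(set? s)]}   ; AssertionError unless s is a persistent set
  (contains? s x))

(member? #{1 2 3} 2)            ; a persistent set passes the check

;; These fail at the call site instead of returning nil:
;; (member? 2 2)                  ; AssertionError
;; (member? (transient #{1}) 1)   ; AssertionError, transients are not set?
```

The trade-off is that every transient is rejected too, so this only fits functions that are meant to work on persistent sets.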
>
> One specific example of this is get, which returns nil whenever the
> first input isn't something that supports get. For example, (get 2 2)
> produces nil. This becomes especially problematic when you pass
> something to get that seems like it should support get, but doesn't.
> For example, (get (transient #{1}) 1) produces nil, when there's
> absolutely no reason to think that (get (transient #{1}) 1) would
> behave any differently from ((transient #{1}) 1).
>
The choice was made not to throw exceptions. Agreed, it may feel frustrating
at the beginning. It's a choice that accommodates some users while frustrating
others.
For your specific case, the first arg does not support the interface that get expects,
however you may do this:
(get 1 1 "WHATTHE...")
The third parameter is the "not found" value. That may shed some light if your code starts to carry this value
elsewhere. Or add assertions to your fns or create a wrapper fn.
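
One way the wrapper idea could look, as a sketch only: the names not-found and get-or-throw are made up for this example. It passes a unique sentinel object as the "not found" value, so a missing key (or a first argument that get cannot look into) is distinguishable from a stored nil.

```clojure
;; Hypothetical wrapper; neither name exists in clojure.core.
(def not-found (Object.))   ; unique sentinel, never equal to real data

(defn get-or-throw
  "Like get, but throws when nothing is found for k."
  [coll k]
  (let [v (get coll k not-found)]
    (if (identical? v not-found)
      (throw (IllegalArgumentException.
              (str "No value for key " (pr-str k)
                   " in " (pr-str (class coll)))))
      v)))

(get-or-throw {:a 1} :a)     ; the stored value
(get-or-throw {:a nil} :a)   ; a stored nil is still returned as nil
;; (get-or-throw 2 2)        ; IllegalArgumentException instead of nil
```

Note that this also throws for keys that are legitimately absent, so it only suits lookups where the key is expected to be there.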
As for transient sets, pretty sure this is a bug in 1.2.1:
user=> (get (transient {:a 1}) :a)
1
user=> ((transient {:a 1}) :a)
1
user=> (get [1] 0)
1
user=> (get (transient [1]) 0)
1
user=> (get #{1} 1)
1
user=> (get (transient #{1}) 1)
nil <--- Oops...
user=>
Dunno if it is fixed in 1.3, no time to play with it these days.
--
Luc P.
================
The rabid Muppet
Since I mostly work with 50-100kloc projects, I think 5-10kloc
projects are kinda small :)
Given the compression ratio between Clojure and other languages, I
have to say that I'm not very worried about dealing with 10kloc of
Clojure. It was reassuring to see comments about Emacs being 3 million
lines of code.
--
Sean A Corfield -- (904) 302-SEAN
An Architect's View -- http://corfield.org/
World Singles, LLC. -- http://worldsingles.com/
Railo Technologies, Inc. -- http://www.getrailo.com/
"Perfection is the enemy of the good."
-- Gustave Flaubert, French realist novelist (1821-1880)
At World Singles we moved to 1.3 pretty much as a matter of course,
mostly because I'm used to "planning for the future" and trying to
work with (b)leading edge builds. When I was at Macromedia, I pushed
hard for us to take prerelease versions of our own products live so
that we could get early real world feedback on them. Since then I've
always tried to work with the latest version of tools because that
brings both the best set of features as well as allowing more
influence and more input on tools - and the feedback is useful to the
projects.
The biggest problem has been 3rd party libraries being slower to move
to 1.3. I was pleased to see Chas Emerick's tweet about this recently,
because he's working hard to ensure all the libraries referenced in
his book are all up to date. At World Singles, we're using CongoMongo
and I approached that team and they were very open about changes to
enable it to run on 1.2 and 1.3. More recently we wanted to use
clojure-csv and, again, the folks behind that were keen to get
compatible with 1.3. Both libraries are working great for us on 1.3
now.
Overall, whilst there's clearly going to be a lot of churn getting
everyone up to 1.3, I think it's still early in Clojure's cycle and we
should all be a bit more aggressive about getting on to the latest
version.
My point was that I'm running into interesting questions even with a small program. The answers are not obvious to me. There's evidence I'm not alone, so those to whom the answers *are* obvious would help the community by describing them.
* An example: organizing code into namespaces (skippable)
I was uncertain that Midje's "sweet" (syntactically sugared) interface would catch on, so I organized it by translation layers. I wrote the "unprocessed" layer first; it had functions that worked solely on maps. The "semi-sweet" layer provided macros that introduced some useful conventions but had only one syntactic innovation. It was easy to translate the `expect` and `fake` macros into "unprocessed" function calls on maps. Then I added the "sweet" layer that has a considerably more ambitious set of macros that translate `facts` into `expects` and `fakes`.
As time went on, I pulled out utility functions into namespaces like [midje.util thread-safe-var-nesting laziness file-position]. But that organization failed. When I divide things up into files, I want the division such that I usually find things in the first place I look. That wasn't happening.
So I started migrating to an organization based on verbs (this is a functional language, right?). So I have namespaces like [midje.midje-forms recognizing translating building]. Two problems: 1) New features require recognizing, translating, and building, so all the hopping around files was annoying. 2) The functions didn't fall into such clear-cut categories that I could reliably find things in the first place I look. (Unsurprising, since clear-cut categories are rare in nature: http://www.exampler.com/testing-com/writings/pnsqc-2005-communication.pdf)
Now I'm moving toward an organization around nouns, which feels a bit too OO to me, but at least I'm far enough in the project that the key concepts/nouns are likely to stay stable.
This progression feels a lot more wasteful than it would have been in Java (which has IDE support) or Ruby (which lets you mention a file once and have it be available throughout the program). So I'd have preferred to get it (more) right in the first place.
* What would help
It'd be useful for people happy with their multi-namespace codebases to volunteer them as exemplars. What's grouped together and why? What are the dependencies? How'd you arrive at this structure? A really interesting thing to do would be to implement a feature and narrate how you decide where to put things, where existing things must be, and so forth. [I spend a fair amount of time parachuting into projects and learning the code structure by pairing. Works pretty nicely.]
I don't think there are obvious answers to most questions around large
programs. If those answers were obvious, we wouldn't have shelves full
of books talking about how to tackle the problems of large scale
software development :)
FWIW, at World Singles, we have namespaces for high-level concerns -
config, data, interop, logging - and nested namespaces either for
implementation (indicating only intended to be used from the API
namespace, e.g., worldsingles.config.impl.* files are only used by
worldsingles.config.* files) or specialization / layering, much like
you describe in midje (e.g., we have worldsingles.data.crud for a
high-level CRUD API for persistence that is exposed to our non-Clojure
code and worldsingles.data.crud.core which implements it and is
intended to be used elsewhere in our Clojure code). I expect we'll add
namespaces for more of our business concerns as our use of Clojure
expands: worldsingles.membership, worldsingles.search and
worldsingles.commerce are probably the three most obvious candidates
right now.
We're in an unusual place, I suspect, since we're inherently polyglot
so our top-level namespaces contain code we expose to non-Clojure code
and nested namespaces contain code we use internally within our
Clojure code. That said, I wouldn't be surprised if we refactored
extensively as our Clojure codebase grows larger and larger.
> This progression feels a lot more wasteful than it would have been in
> Java (which has IDE support) or Ruby (which lets you mention a file
> once and have it be available throughout the program). So I'd have
> preferred to get it (more) right in the first place.
Have you tried Slamhound? http://technomancy.us/148
It allows you to rebuild your ns clauses based on searching the
classpath for the vars that are referenced. I wrote it because we were
going through similar pains at work: shuffling things around in order to
improve modularity while trying not to break things. Basically it lets
you move a given defn and then it can automate the modifications needed
to ns forms to deal with the move. Of course, if you're renaming
functions or splitting them up it won't help, but it eased a lot of the
pain we were having.
-Phil
Sorry if I wasn't clear about this. One time I was rereading a book
about the art of debugging (I think it was this book:
http://www.amazon.com/Why-Programs-Fail-Second-Systematic/dp/0123745152),
and realized that the main theme of the book is that the #1 reason
that debugging is hard is that most bugs result from some sort of
mutation of state in one part of your code that inadvertently violates
some assumption or invariant you had in your mind. But your program
doesn't crash right away, it keeps quietly chugging along with that
corrupted state until some completely separate portion of your program
tries to do something with that data that no longer makes sense and
KA-BOOM. But the line your debugger shows you just shows you where
the crash happened; it can't show you the series of steps that led to
the corruption of state that actually caused the crash. Thus, you
need to do a lot of detective work and step through the program. This
is precisely why, for example, most programmers will gladly pay the
performance penalty for bounds-checking on array reads and writes --
it's incredibly valuable to have your program crash where the problem
actually occurs, rather than continuing for a while with spurious
values or corrupted memory and getting a delayed crash with no clear
connection to the cause.
I had a personal a-ha moment when I read that, which made me realize
that one of the reasons I enjoy functional programming so much more is
that this class of bug just doesn't happen. Generally speaking,
crashes have good locality with respect to the flaw in the code that
causes them because there's no "state" to get corrupted and eventually
cause a delayed crash.
Of course, often the hardest bugs of all to find are the ones that are
the result of deep logical flaws. The program may be an exact
implementation of what you had in mind, but what you had in mind
doesn't quite accomplish what you expected it to.
And that's the problem I have with some of Clojure's core functions --
they can turn a blatant mismatch (between a function's input
requirements and the inputs that actually get passed) into a deep
logical flaw. The get example I raised is a perfect example of this.
When I passed a transient set to a function that used get, I
reasonably assumed that transient sets implement whatever interface
get requires. But rather than raise an error because the object
didn't support the desired interface, get just returned nil -- which
is the exact same value that is returned in ordinary usage when you
test whether something is in the set and it isn't! So now, I have
sets that are quietly being passed around, and returning sensible
values but behaving as if they don't have any elements. What should
be an easy bug has turned into a deep logical flaw in my program.
Everything appears to be working, but my program generates completely
bogus outputs because at some stage of its processing it tested for
membership in a set and got back nil for something that was actually
in the set. This is the kind of thing that is a real nuisance to
track down, requiring detailed detective work and a careful analysis
of the entire chain of logic to find the spot where things actually go
wrong. Given that get creates the illusion of working even when it
doesn't, I fail to see how a pre or post condition in my own code
could have picked up on this or validated the input, short of having a
deep understanding of all the interfaces required by every core
function and testing every input explicitly for support of those
interfaces (in which case, I might as well be using a statically typed
language).
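
To make the ambiguity concrete, here is a small sketch. The function in-set? is hypothetical, and it uses a plain number as the bad input because that behaviour of get (returning nil) is the one described earlier in the thread.

```clojure
;; Hypothetical membership test built on get, for illustration only.
(defn in-set?
  [s x]
  (not (nil? (get s x))))

(in-set? #{:a :b} :a)   ; true:  :a is in the set
(in-set? #{:a :b} :c)   ; false: :c is not in the set
(in-set? 42 :a)         ; false: 42 is not a set at all, yet no error
```

The last two calls are indistinguishable to the caller, which is how a type mismatch can turn into a logic bug further downstream.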
Yes, and this is IMHO driven by the fact that there are fewer
dependencies between two functions in a namespace than between two
methods in a class (which may share state via the instance).
One problem with scaling up namespaces, though, is that ongoing
"invalid constant tag 32" issue with big enough input files (see other
thread). For now, until it's fixed, there's an effective size cap on
namespaces that is hit at around 1kloc (typically no more than a few
hundred functions).
--
Protege: What is this seething mass of parentheses?!
Master: Your father's Lisp REPL. This is the language of a true
hacker. Not as clumsy or random as C++; a language for a more
civilized age.
I've also started leaning toward that approach. At first I tended to
:use clojure.* namespaces and :require our own code with aliases but
now I'm moving more to :require on all namespaces, often without an
alias (on short ns names) and then using the long form in calls. In
other words, only using an alias if it really cleans up the code (one
tooling deficiency I noticed is that CCW won't recognize clojure.*
namespace functions if you use an alias and I find the color-coding is
worth more than the conciseness of the code).
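
For readers who haven't seen the two styles side by side, a minimal sketch of what is described above. The namespace example.report and its functions are invented for illustration; only clojure.set and clojure.string are real.

```clojure
;; Hypothetical namespace; everything is pulled in with :require, nothing with :use.
(ns example.report
  (:require clojure.set                   ; short name, no alias
            [clojure.string :as string])) ; alias only where it tidies the code

(defn shared-tags
  "Tags present in both collections."
  [a b]
  (clojure.set/intersection (set a) (set b)))   ; long form at the call site

(defn shout
  [s]
  (string/upper-case s))                        ; aliased form
```

With this style every call site shows which namespace a function comes from, at the cost of some verbosity.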
> In general, I have found that namespaces should be larger than my OO
> intuition would have them be.
I'm beginning to find that. At first I was creating namespaces much as
I would have for classes but that soon produced long (ns) forms
requiring all the small namespaces so I backed off to less granular
namespaces and I'm finding that easier to manage.
I just saw Ken's note come in about "invalid constant tag 32" and
looking at the threads behind that, it looks like folks hit it when
they have "large" files but I'd be concerned about any single
namespace-based-API that grew that large - I would have expected to
break it down into a "public" API and a "private" implementation
namespace before files got that large. I guess it will be interesting
to see how this pans out as Clojure adoption continues to grow...
maybe that limitation should be endorsed and the compiler could issue
an "Error: your namespace is too big - please modularize your code!"
message as a way to keep namespaces to a maintainable length... :)
On Tue, Jul 5, 2011 at 9:01 AM, Stuart Halloway
<stuart....@gmail.com> wrote:
> In general, I have found that namespaces should be larger than my OO
> intuition would have them be.
>
> One problem with scaling up namespaces, though, is that ongoing
> "invalid constant tag 32" issue with big enough input files (see other
> thread). For now, until it's fixed, there's an effective size cap on
> namespaces that is hit at around 1kloc (typically no more than a few
> hundred functions).
I have no idea. But clojure.core is hardly typical; for one thing, it
loads during rather than after bootstrap. Possibly AOT-compiled
namespaces don't have the problem, or hit it only at larger sizes than
ones JIT-compiled with load-file. It *is* interesting that core.clj is over
200k and doesn't fail whereas others have reported the error happening
consistently for any .clj file exceeding exactly 64k. Perhaps there is
some way to make larger .clj files palatable that could be used by us
normal folk, though if so it isn't obvious what.
So far, the Clojure culture has strongly encouraged a sense for
particular idiomatic coding conventions for most common tasks; so
hopefully "10 different coding styles competing with one another"
won't be the sort of issue it might be if you were using, say, Common
Lisp.
for those of you who
(a) find such reports tantalizing
and
(b) don't totally dislike static typing, especially when done with inference
i suggest checking into things in the ML family of languages. i found
that when i used SML it gave me the same amazing feeling only more so.
sincerely.
$0.02.
I agree that namespaces should be designed to be consumed, but that can be pretty taxing on the developer. In my libraries, I tend to split the functions into whatever sub-namespaces I want to keep the organization easy for me, and then import all the functions I want to expose into a higher-level namespace.

For example, in Aleph I have HTTP functionality implemented in aleph.http.client, aleph.http.server, aleph.http.websocket, etc. but all the useful functions are gathered together into aleph.http. This means that I don't have to navigate a monolithic namespace, but the users of my library don't have to declare a dozen namespaces to get anything done. I find this approach scales for me pretty well, and I haven't heard any complaints from the people using my libraries about the organization.
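
A rough sketch of the "gather into a higher-level namespace" idea. This is not Aleph's actual code: the example.http.* namespaces and their functions are invented, and the three ns forms are shown in one listing only for brevity (normally each would live in its own file). Re-exporting with a plain def is just one assumed way to do it.

```clojure
;; Hypothetical implementation namespaces.
(ns example.http.client)
(defn request [url] {:url url :method :get})

(ns example.http.server)
(defn start [port] {:port port :running true})

;; Hypothetical facade: users only need to require this one namespace.
(ns example.http
  (:require [example.http.client :as client]
            [example.http.server :as server]))

(def request client/request)      ; re-export under the facade's name
(def start-server server/start)   ; the public name may differ from the internal one
```

One caveat with a plain def is that the new var does not carry over the original's docstring or arglists metadata, so those would need to be copied separately if they matter.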
In case you're not familiar with these (not saying they're full-featured):
https://github.com/pallet/ritz
http://www.youtube.com/watch?v=d_L51ID36w4
https://github.com/tcrayford/clojure-refactoring
Scott