One of the biggest issues we have for 1.3.0 is adoption and a large
part of that in my opinion is shifting people from contrib 1.2.0 to
the new contrib libraries. So Stuart's right that people need to find
the libraries (or be told about them) etc.
In order to solve that problem, I think we need clear communication
(from Clojure/core) about the libraries - which means we first need
consensus on a number of problems and how to solve them:
On discoverability, Clojure is available as a download, and Contrib
used to be available as a download. That's not practical for 30+
individual contrib libraries. That means we have to acknowledge build
tools and we somehow need to tell people which libraries are
available, how to get them into your projects and how to figure out
what versions are available (seems like that should be done
automatically from Maven Central?).
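As a rough illustration of what "getting them into your project" could look like with one build tool, here is a minimal Leiningen project.clj sketch. The project name is hypothetical; the two contrib coordinates use the org.clojure group ID and the version numbers mentioned later in this message, and they are only examples, not recommendations.

```clojure
;; Hypothetical project; only the dependency vectors matter here.
(defproject example/my-app "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.3.0"]
                 ;; each new contrib library is its own artifact with its own version
                 [org.clojure/core.logic "0.6.3"]
                 [org.clojure/data.finger-tree "0.0.1"]])
```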
On guidelines for contrib library developers...
For a given new contrib library, how should users expect it to relate
to its stated old contrib counterpart?
e.g., clojure.tools.cli is indicated as a replacement for
clojure.contrib.command-line but it is a completely different library.
For a given old contrib library that has been migrated to new contrib,
how should "dropped" features be documented?
e.g., clojure.core.incubator is claimed to have "contrib.def stuff"
but defnk has not been migrated (because Clojure now supports map
destructuring).
In the past, users expected contrib to have a version number that
lined up with Clojure. In the new world, contrib libraries move
forward at their own pace but what expectations should users have
around versions? Should all new contrib libraries be aiming for 1.0.0
releases to indicate stability? Saying "it's all semantic versioning"
is fine but, really, what will a user think when confronted with
data.finger-tree 0.0.1 and core.logic 0.6.3? (and everything in
between)
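For what it's worth, the list of versions a user will be confronted with is already machine-readable: Maven repositories publish a maven-metadata.xml file per artifact. Below is a minimal sketch of pulling the version list out of such a file using only clojure.xml. The function name, the org.example coordinates and the sample XML are all made up for illustration; the sample is hand-written in the shape of a metadata file, not fetched from anywhere.

```clojure
(require '[clojure.xml :as xml])

(defn metadata-versions
  "Returns the version strings listed under <versions> in the text of a
  maven-metadata.xml file. Hypothetical helper, not part of any library."
  [^String xml-text]
  (let [in (java.io.ByteArrayInputStream. (.getBytes xml-text "UTF-8"))]
    (->> (xml-seq (xml/parse in))
         (filter #(= :versions (:tag %)))
         (mapcat :content)   ; the <version> child elements
         (mapcat :content)   ; their text content
         vec)))

;; Hand-written sample for a made-up artifact.
(def sample-metadata
  (str "<metadata><groupId>org.example</groupId>"
       "<artifactId>example-lib</artifactId>"
       "<versioning><versions>"
       "<version>0.0.1</version><version>0.1.0</version>"
       "</versions></versioning></metadata>"))

(metadata-versions sample-metadata)
```

A page listing the contrib libraries could be generated from this kind of data rather than maintained by hand.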
There was some discussion that a "first version" of a new contrib
library should be identical to the contrib 1.2.0 version (except for
the namespace) but that's clearly not happening. Should the version
number give any indication of compatibility with old contrib?
What really are the criteria for a library becoming a new contrib candidate?
e.g., clojure.data.csv is cljcsv which seems to be used far less
widely than clojure-csv (which was offered up as a contrib library in
the past) - cljcsv has had no discussion on the list (and only one
thread on the Clojure list mentioned it at all, compared to several
dozen threads mentioning clojure-csv).
e.g., what about clj-http? This seems to be very heavily used but
hasn't been updated since February (and there are active forks with a
lot of updates since). Clearly there are the usual concerns about
copyright and the CA but this seems like a library just crying out for
a place in new contrib.
Finally, are there areas that we feel need a new contrib library to
address, in order to improve the overall richness of the Clojure
platform? I mentioned clj-http above but a nice date/time library
seems like a good candidate too. Should we maintain a list of themes /
areas needing attention?
There are probably other issues not listed here so please pile in!
--
Sean A Corfield -- (904) 302-SEAN
An Architect's View -- http://corfield.org/
World Singles, LLC. -- http://worldsingles.com/
Railo Technologies, Inc. -- http://www.getrailo.com/
"Perfection is the enemy of the good."
-- Gustave Flaubert, French realist novelist (1821-1880)
> On discoverability, Clojure is available as a download, and Contrib
> used to be available as a download. That's not practical for 30+
> individual contrib libraries. That means we have to acknowledge build
> tools and we somehow need to tell people which libraries are
> available, how to get them into your projects and how to figure out
> what versions are available (seems like that should be done
> automatically from Maven Central?).
Is there any reason not to provide a single jar containing all of
Contrib, like we did before? Contrib was split up into parts in order
to permit more fine-grained dependency handling, but not to enforce
it. Especially for newcomers, a single jar with all of Contrib, or
even a "batteries included" Clojure+Contrib jar, would be much easier
to handle.
However, this does require a decision about the following point:
> In the past, users expected contrib to have a version number that
> lined up with Clojure. In the new world, contrib libraries move
> forward at their own pace but what expectations should users have
> around versions?
I agree that a common policy would help users, but I don't have a good
idea for what would be a good and maintainable scheme.
> What really are the criteria for a library becoming a new contrib
> candidate?
My best guess is that there aren't any. It comes down to a kind of
vote. Still, I don't see this as a major problem for the moment.
Konrad.
+1
Also for rapid prototyping. One doesn't want to bother with
fine-grained dependencies, before knowing the code will actually be
used.
> However, this does require a decision about the following point:
>
>> In the past, users expected contrib to have a version number that
>> lined up with Clojure. In the new world, contrib libraries move
>> forward at their own pace but what expectations should users have
>> around versions?
>
> I agree that a common policy would help users, but I don't have a good idea
> for what would be a good and maintainable scheme.
How about an org.clojure/batteries artefact, whose version number is
kept in sync with the org.clojure/clojure version?
It would depend on the most recent stable versions (at the time of
release) of the contrib projects.
It could also be curated, to only include the more mature projects
under the org.clojure umbrella.
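A sketch of what such an artefact's own project.clj might look like under Leiningen. Everything here is an assumption: org.clojure/batteries does not exist, and the two contrib versions are just the ones mentioned earlier in this thread.

```clojure
;; Hypothetical umbrella artefact: no code of its own, only dependencies.
(defproject org.clojure/batteries "1.3.0"   ; version tracks org.clojure/clojure
  :dependencies [[org.clojure/clojure "1.3.0"]
                 ;; pinned to the latest stable contrib releases at release time
                 [org.clojure/core.logic "0.6.3"]
                 [org.clojure/data.finger-tree "0.0.1"]])
```

A project depending on this one artefact would pick up the pinned contrib libraries transitively.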
kind regards
--
__________________________________________________________________
Herwig Hochleitner
> One of the biggest issues we have for 1.3.0 is adoption and a large
> part of that in my opinion is shifting people from contrib 1.2.0 to
> the new contrib libraries. So Stuart's right that people need to find
> the libraries (or be told about them) etc.
One thing that would greatly simplify conversion to 1.3.0 would be an
"old-contrib" compatibility lib, containing all the old contrib code that
hasn't made it into new contrib, or that has made it into 1.3 core.
Obviously this would be a fair amount of work to set up and make work with
1.3.0, but if driving 1.3.0 adoption is a priority, this would, I think,
definitely facilitate the process for users.
--
Hugo Duncan
That way you would only need to put two deps in your (for example)
Leiningen project:
    :dependencies [[org.clojure/clojure "1.3.0"]
                   [org.clojure/contrib-all "1.3.0"]]
And you would get all of the libs via dependency. You *could* even
name this project in an identical fashion to the old contrib
(clojure-contrib). I go back and forth on whether that would be
better or worse for helping users understand what's happening.
I imagine that this umbrella project would probably rev frequently
(this could even be made automatic when the contrib projects release
new versions).
Alex
The first thing is, you aren't really calling parse-csv the way it's intended to be called. It takes a CharSequence, but if you use the reader to char-seq adapter, you're creating a list of chars out of a file. Loading the file into memory with slurp and passing that in as a string really improves the speed without changing any code. When I tried your benchmark that way, clojure-csv was only about 4x slower for me.
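To make the difference concrete, here is a minimal sketch of the slurp approach. It assumes clojure-csv is on the classpath and that parse-csv lives in the clojure-csv.core namespace; the helper name is made up.

```clojure
(require '[clojure-csv.core :as csv])

(defn parse-whole-file
  "Reads the entire file at path into a String and parses it in one go.
  Hypothetical helper; only suitable for files that fit in memory."
  [path]
  ;; slurp returns a String, which is the CharSequence parse-csv expects
  (csv/parse-csv (slurp path)))
```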
Still, reading ridiculously large csv files isn't a use case that I've given much thought to, and being able to stream the data in from a reader would be nice, so I converted the code to work off of a reader instead of a char seq (plus a few other tweaks here and there), and your benchmark program turns in best times of 26409msecs for data.csv vs. 37734msecs for clojure-csv (apparently my laptop is less awesome than yours).
So anyways, I bring all this up just to point out that had there been a public discussion about adding csv support to contrib, we could have had an interesting discussion about how to best serve users' needs and how far each of the projects was from the goals. As it happens, I signed a CA last summer, anticipating a proposal for contrib after 1.3 was out (it was my understanding, though perhaps I have not paid close enough attention here, that new contrib projects were on hold until 1.3 was done). On the other hand, it's been interesting to hear about a shortcoming that I wasn't aware was turning people off, and now I can go back to my users with some big speedups.
I'm handling 200MB files there with half a million lines so I took the
simple approach and it performed well enough for my needs. I very
specifically did not want to load the whole file into memory.
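For reference, a minimal sketch of the streaming approach with data.csv, assuming org.clojure/data.csv is on the classpath. read-csv returns a lazy sequence of rows, so the rows have to be consumed while the reader is still open. The helper name is made up.

```clojure
(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io])

(defn count-rows
  "Counts CSV rows from source (a path, File, or Reader) without holding
  the whole file in memory. Hypothetical helper for illustration."
  [source]
  (with-open [rdr (io/reader source)]
    ;; reduce walks the lazy seq row by row and keeps only the running count
    (reduce (fn [n _row] (inc n)) 0 (csv/read-csv rdr))))
```

Any per-row processing would go inside the reducing function in place of the simple counter.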
As David notes, performance wasn't a problem here. Flexibility might have been.
The issue, from my point of view, is that there was ZERO discussion on
creating data.csv and of the two CSV libraries, the one that is hardly
used has been promoted over the more heavily used and feature-rich
library:
http://clojuresphere.herokuapp.com/cljcsv
http://clojuresphere.herokuapp.com/data.csv
http://clojuresphere.herokuapp.com/clojure-csv
>> Still, reading ridiculously large csv files isn't a use case that I've
>> given much thought to, and being able to stream the data in from a reader
>> would be nice, so I converted the code to work off of a reader instead of a
>> char seq (plus a few other tweaks here and there), and your benchmark
>> program turns in best times of 26409msecs for data.csv vs. 37734msecs for
>> clojure-csv (apparently my laptop is less awesome than yours).
Sounds like that new version would be a good upgrade for us at World Singles.
> I agree. Maybe we could work together to improve data.csv? It would be nice
> to walk through the feature-set of clojure-csv to see what data.csv lacks.
> If you were given commit access to the data.csv repo we could both enhance
> and maintain the code together with ideas from clojure-csv moved over to
> data.csv. Maybe someone from Clojure/core can make a design page on
> http://dev.clojure.org, then much of the decision making and design
> concerning the project would be public.
Hopefully this situation works out well, but it really does point to a
fundamental flaw in how contrib is being managed that could easily
have been solved by some public discussion on the selection of the
library.
That begs another question for which contrib maintainers need
guidance: right now new contrib must run on Clojure 1.2 and Clojure
1.3 - what does the future compatibility path look like? Will new
contrib always be able to run on Clojure 1.2? If not, what is
acceptable in terms of backward compatibility and how should
maintainers indicate this (or is it a build system responsibility to
tag compatibility somehow for each build?).
> That begs another question for which contrib maintainers need
> guidance: right now new contrib must run on Clojure 1.2 and Clojure
> 1.3 - what does the future compatibility path look like? Will new
> contrib always be able to run on Clojure 1.2?
One consequence of decoupling the contrib version numbers from Clojure proper is that it will not be immediately evident which version(s) of Clojure are compatible with a particular version of an individual contrib library.