[groovy-user] Groovy for data science, a killer application for Groovy?


dracodoc

Jan 4, 2015, 12:18:45 PM
to us...@groovy.codehaus.org
I've been using Groovy for my side projects for several years and I love it.
However, Groovy is still not popular enough.

Recently I looked at the very hot Data Science/Big Data field and found that
there could be a great opportunity for Groovy. Python is currently very
popular in that field because it makes it easy to explore, prototype and
process data, and there is now a lot of good library support for Python. I
think Python does have some limitations: one is Python 3 compatibility;
another is that people often need to move to another language to build
larger-scale systems after the prototyping phase.

I believe Groovy could be a much better language for Data Science than
Python, because:
1. Groovy has all the dynamic language features of Python, or more. It can
be used interactively, too.
2. Web support: visualization could be built on top of Grails.
3. All the JVM languages can be integrated more easily, so the later
transition to a larger-scale production system will be much easier. Done
right, the transition effort could be minimized with static typing,
performance tuning, etc.

The only disadvantage is library support, which is vital for the ecosystem
and for language adoption. There are many Java libraries available, though,
so the library problem is not very critical.

I think Groovy developers could explore this direction. The first thing I
think would be very useful is a notebook-style environment for the Groovy
interpreter. IPython is a great example: it embeds text, scripts,
visualization and language integration. RStudio has Shiny and other tools to
support so-called Reproducible Research. Groovy has great potential in this
direction. There is also an IPython-based product, Beaker, that supports
multiple languages including Groovy.





-----
http://dracodoc.blogspot.com/
--
View this message in context: http://groovy.329449.n5.nabble.com/Groovy-for-data-science-a-killer-application-for-Groovy-tp5722061.html
Sent from the groovy - user mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email


Mark Fortner

Jan 4, 2015, 5:41:45 PM
to us...@groovy.codehaus.org

I couldn't agree more. You have libraries like POI that can extract data from spreadsheets, Apache Commons Math for a lot of the typical mathematical analyses, and JavaFX's charting library or JFreeChart to render the results. The Groovy Console is a passable tool for some basic scripting tasks and might make a decent starting point. I saw a demo several years ago of embedding executable Groovy code in an OpenOffice presentation.

Anyway, the topic might make a good subject for some blog posts.

Mark

Søren Berg Glasius

Jan 4, 2015, 11:50:13 PM
to Groovy User
It could potentially also make a good topic for conference presentations. The GR8Conf CFP is still open: http://cfp.gr8conf.org



Best regards / Med venlig hilsen,
Søren Berg Glasius

40 Stevenson Ave, Berkeley, CA 94708
Mobile: (+1)510 984 8362, Skype: sbglasius
--- Press ESC once to quit - twice to save the changes.

Jim White

Jan 5, 2015, 1:18:58 AM
to us...@groovy.codehaus.org
I agree that Groovy is a great language for scientific programming and have been using it for that purpose for years. The reason I adopted Groovy back at 1.0 was to develop IFCX Wings, a literate scripting tool for any JVM-based language. More recently I've been developing Gondor, a Groovy DSL for HTCondor DAGman workflows (compute-intensive batch systems, machine learning for NLP in my case). I also gave solutions in Groovy when I taught Computational Linguistics Fundamentals at the University of Washington, and have used it in talks at user groups about my research.

Some others using Groovy for scientific work that I know of are Paolo Di Tommaso's Nextflow (http://www.nextflow.io/) which is used primarily in bioinformatics research but (like Gondor) is a general purpose tool for making batch processing on grids Groovy.  Stergios Papadimitriou is developing GroovyLab (https://code.google.com/p/jlabgroovy/wiki/SciLabInGroovyLab) and the Groovy web site has an example using JScience.  I'm sure there are more that I don't know about.

That said, I don't think there is an especially more or less compelling case for Groovy in science than in any other domain.  The challenges remain the same and I don't see anything that is going to shift its adoption rate in a big way.  For example, a lot of scientific computing is done in academic environments where Python is quite strong (and has displaced Java in many institutions) and the domain-specific language is often R or Matlab.

Having said *that*, I do agree that blogging and doing more presentations that feature Groovy at work in science and machine learning applications is the way to get the word out and raise Groovy's profile.  As I say, I've been doing that for my part (including starting a new blog: http://jimwhite.github.io/ which will be getting some Groovy content RSN) and may have won a few converts but no signs of any big waves building (yet).

Jim

dracodoc

Jan 5, 2015, 9:35:41 AM
to us...@groovy.codehaus.org
I love many features of Groovy as a language that are not found in others.
However, it takes much more than language features to be successful,
especially a good marketing pitch and good timing. Over time, library
support and the ecosystem become the key factors.

I find Groovy is the best option if I only need core language features and
some Java libraries. For scientific programming, though, my limited
experience with Java libraries for numerical computing was not very smooth.
I ported a medium-sized Matlab script to Java/Groovy, so I needed to find
Java equivalents for matrix computation, linear regression, charts, etc. I
found that:
1. Writing a medium-complexity GUI with Groovy's SwingBuilder and MigLayout
is not hard. There is very little documentation on this specific topic,
though, so I had to explore by myself and only succeeded once I had a deeper
understanding of how SwingBuilder works. I guess most newcomers will not
choose this path, because they may want a GUI designer and be afraid of the
MigLayout learning curve.
2. The Apache Commons Math library is not easy to use. The V3 documentation
is incomplete and V3 changed a lot from V2, so in the end I used V2. The API
design often conflicts with intuition and changes arbitrarily from place to
place. I also had to search for and use different libraries just to
replicate one Matlab function. In the end I used several math libraries.
3. JFreeChart is very mature and has good documentation. However, it is not
actively updated; maybe it doesn't need a major update.
4. I tried GroovyLab, but it doesn't really solve my problems.

None of this looks as shiny as the newer options in Python, JavaScript web
visualization, etc.

That being said, what I meant is actually a little different from scientific
programming.

There is the new Data Science/Big Data hype, and there is classic numeric
computing in science. They are pretty different in their tasks and
requirements. However, I think more and more scientists now need to do some
Data Science tasks. They have much more experimental data to import, clean
and process. They can use many advanced statistical and machine learning
methods relatively easily thanks to all kinds of new libraries, and there is
plenty of documentation and tutorials available. So Data Science is not just
for internet companies; practically anybody can run some data analysis
quickly with free tools and open data.

I believe Groovy could have a much better position in this trend.







Jochen Theodorou

Jan 5, 2015, 11:16:58 AM
to us...@groovy.codehaus.org
The problem in all this is knowing what is needed and liked.
... more inline:

Am 05.01.2015 15:34, schrieb dracodoc:
[...]
> 1. Using Groovy SwingBuilder and Miglayout to write a medium complexity GUI
> is not hard, although there is very little documentation on this specific
> topic, I have to explore by myself and only succeeded after I had deeper
> understanding on how SwingBuidler worked. Though I guess this path will not
> be chosen by most newcomers because they may want a GUI designer and afraid
> of the learning curve of Miglayout.

OK, point (1): a data science application needs something to easily
build GUIs with... what does Python provide here?

> 2. The apache commons math library is not easy to use. V3 documentation is
> not complete and changed a lot from V2, I have to use V2 at last. The design
> of API often conflict with your intuition and change randomly in different
> places. I also need to search and use different libraries just to replicate
> one function in Matlab. At last I used several math libraries.

Point (2): people coming from Matlab would need some kind of API that is
quite similar to what Matlab provides. Commons Math seems not to be
enough... You say that is mostly because the documentation is not so good.
Is http://commons.apache.org/proper/commons-math/userguide/index.html
not good?

> 3. Jfreechart is very mature and have good documentation. However it is not
> actively updated, maybe it doesn't need major update.

Haven't used it in the past either... when I needed charts of data it
was mostly done with gnuplot ;)

> 4. I tried GroovyLab but it doesn't really solve my problems.

here it would be good if you could tell us why.

[...]
> That being said, what I meant is actually a little different from scientific
> programming.
>
> There are this new Data Science/Big Data hype and classic numeric computing
> in science. They are pretty different in the tasks and requirements. However
> I think more and more scientists need some Data Science tasks now.They have
> much more experimental data to import, clean, process. They can use many
> advanced statistical methods and machine learning methods relatively easily
> thanks to all kinds of new libraries. There are numerous documentation and
> tutorial available. So Data Science is not just for internet companies,
> actually everybody can run some data analysis quickly with free tools and
> open data.
>
> I believe Groovy could have a much better position in this trend.

Yeah, I would like to improve Groovy here to make it a good alternative
to Python. Too bad nobody will pay me for that ;)

bye blackdrag

--
Jochen "blackdrag" Theodorou - Groovy Project Tech Lead
blog: http://blackdragsview.blogspot.com/
german groovy discussion newsgroup: de.comp.lang.misc
For Groovy programming sources visit http://groovy-lang.org

dracodoc

Jan 5, 2015, 11:38:36 AM
to us...@groovy.codehaus.org
Sorry if I didn't make myself clear. My original point was that Groovy could
have an opportunity in the new Data Science/Big Data hype. This is actually
quite different from scientific computing.

My second post talked about my experience with scientific computing in
Groovy, in response to an earlier post about it. However, my original point
is not about classic scientific computing; you can actually disregard my
comments about porting the Matlab program, which are not very relevant to
the original discussion.

I didn't talk much about Data Science/Big Data in my original post because
there are so many resources about it available. As I said before, it is
used in all kinds of companies, and everybody, including scientists, can use
some Data Science tools. I didn't intend to suggest that Groovy replace
Python, which is impossible with the existing libraries and ecosystems.

My idea is simple: Python has enjoyed a lot of growth in interest from Data
Science/Big Data, and Groovy could use this opportunity too.

Actually, I think Pivotal provides some Data Science services, so it's not
impossible to find some support from the company. Of course, these are just
an outsider's guesses; they could well be just wishful thinking.




Russel Winder

Jan 5, 2015, 6:40:09 PM
to us...@groovy.codehaus.org
I hate to be a damper on this and all the other supportive posts, but
wishing things to be true will achieve nothing. Data science is full of
PhD statistics folks who like R or perhaps
Python/SciPy/Matplotlib/Pandas. Trust me, I run workshops for these folk.
Currently "data science" is an R, Python, Julia place, mostly because
the infrastructure is already there and everyone already uses R, Python,
Julia.

The core issue is that R, Python, Julia already have the infrastructure
for analysing and (more importantly) visualizing data and algorithms
over it. I am sure the JVM and Groovy can do this, but they don't have the
systems these folk use today.

The core issue is really that the frameworks require seriously fast
computation libraries, and these are available now in Fortran and C++
but not in Java, and the JVM is rubbish at making use of Fortran and C++
libraries.

For anecdotal evidence, the London PyData meeting this month has 200+
people turning up; there is no JVM-based equivalent even scheduled.

Data science is not big data. Big data isn't really doing anything
sophisticated. If the JVM and Groovy are to make inroads into data science
activity, they need to find a new milieu to bring to the community rather
than trying to compete with the extant situation.

I do not have an explicit suggestion even though I have "skin in the
game".

--
Russel.
=============================================================================
Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel...@ekiga.net
41 Buckmaster Road m: +44 7770 465 077 xmpp: rus...@winder.org.uk
London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder

Jochen Theodorou

Jan 6, 2015, 2:54:38 AM
to us...@groovy.codehaus.org
Am 06.01.2015 00:38, schrieb Russel Winder:
> I hate to be a damper on this and all the other supportive posts but
> wishing things to be true will achieve nothing. Data science is full of
> PhD statistics folks who like R or perhaps
> Python/SciPy/Matplotlib/Pandas. Trust me I run workshops for these folk.
> Currently "data science" is an R, Python, Julia, place mostly because
> the infrastructure is already there and everyone already uses R, Python,
> Julia.

That's why I was trying to figure out what is missing. If nothing is
offered, nothing will happen. But if the key point here is Fortran and C++
computation frameworks, then things do look bad. The funny thing is that
I talked with Cedric about native code binding just a few weeks ago
and wondered whether it is possible to provide something better than
what the JVM has today... basically I arrived at ideas similar to those
that led to the creation of http://openjdk.java.net/jeps/191 quite a
while ago. But I'm not sure that is enough... Back then I really liked
the integration of native code done by gcj... but even that, I think, is
not suitable here.

Anyway, I wanted to see what people come up with before I try to spread
some realism ;)

[...]
> Data science is not big data. Big data isn't really doing anything
> sophisticated.

Well... I guess this is debatable... data science is not, for example,
machine learning... but enough data scientists use machine learning.
Also, data science is not statistics... but frankly, what would data
science be without it? At the same time... what is big data without
data science? They might not be the same, but I see strong connections.

bye blackdrag



Dylan Cali

Jan 6, 2015, 5:24:27 AM
to us...@groovy.codehaus.org
On Tue, Jan 6, 2015 at 1:57 AM, Jochen Theodorou <blac...@gmx.org> wrote:
> Am 06.01.2015 00:38, schrieb Russel Winder:
> But if the keypoint here is Fortran and C++ computation
> frameworks, then things look indeed bad. Funny thing is that I did talk with
> Cedric about native code binding just a few weeks ago and that I was
> wondering if it is not possible to provide something better than what the
> JVM has today...

Is JNA not an option here? I've had good success using it to leverage
native libraries, and it is much, much easier to get going with than
JNI.

Jochen Theodorou

Jan 6, 2015, 7:50:53 AM
to us...@groovy.codehaus.org
Am 06.01.2015 11:19, schrieb Dylan Cali:
> On Tue, Jan 6, 2015 at 1:57 AM, Jochen Theodorou <blac...@gmx.org> wrote:
>> Am 06.01.2015 00:38, schrieb Russel Winder:
>> But if the keypoint here is Fortran and C++ computation
>> frameworks, then things look indeed bad. Funny thing is that I did talk with
>> Cedric about native code binding just a few weeks ago and that I was
>> wondering if it is not possible to provide something better than what the
>> JVM has today...
>
> Is JNA not an option here? I've had good success using it to leverage
> native libraries, and it is much, much easier to get going with than
> JNI.

Well, you want both easy usage and good performance. JNA performs even
worse than JNI.

bye blackdrag



sterg

Jan 6, 2015, 8:46:01 AM
to us...@groovy.codehaus.org

Hi all,

From my experience, pure Java generally performs better than
unoptimized C, and optimized C is only slightly faster.

In GroovyLab, matrix multiplication is multithreaded.
On Linux it uses native BLAS, and only MATLAB offers higher speed;
on Windows, however, the native routines do not work well,
so I use a pure Java multithreaded matrix multiplication.
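Stergios doesn't show GroovyLab's code. As a rough illustration of what a pure Java multithreaded matrix multiplication can look like, here is a minimal sketch; the class name `ParallelMatMul` is hypothetical and this is not GroovyLab's implementation. It assumes dense row-major `double[][]` matrices and parallelizes over output rows with a parallel stream, with no cache blocking or other tuning.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Hypothetical sketch: dense matrix multiplication, parallelized over output rows. */
public final class ParallelMatMul {

    /** Returns a * b for a (n x k) and b (k x m), both stored as row-major double[][]. */
    public static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        int k = b.length;
        int m = b[0].length;
        if (a[0].length != k) {
            throw new IllegalArgumentException("inner dimensions differ");
        }
        double[][] c = new double[n][m];
        // Each output row depends only on one row of a, so rows can be filled
        // concurrently by the common fork-join pool without any locking.
        IntStream.range(0, n).parallel().forEach(i -> {
            double[] ai = a[i];
            double[] ci = c[i];
            // i-p-j loop order walks through rows of b, which suits row-major storage.
            for (int p = 0; p < k; p++) {
                double aip = ai[p];
                double[] bp = b[p];
                for (int j = 0; j < m; j++) {
                    ci[j] += aip * bp[j];
                }
            }
        });
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        System.out.println(Arrays.deepToString(multiply(a, b))); // [[19.0, 22.0], [43.0, 50.0]]
    }
}
```

Splitting by output row needs no synchronization because no two threads write the same row; a serious library would add blocking for cache reuse and, where it works, a native BLAS path as described above.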


.. today I tried Scavis, which is Jython-based.
I didn't like the Python programming style, and it doesn't seem fast
either ..

Regards

Stergios



dracodoc

Jan 6, 2015, 9:40:07 AM
to us...@groovy.codehaus.org
Actually, Julia is a perfect example. Julia is very new, created by a small
group, without much library support.

Of course Groovy would be a very different case from Julia; however, I think
as a language Groovy has both advantages and disadvantages compared to
Julia. As for library support, Groovy actually has much better potential
than Julia.

If somebody had told you in 2012 that they would create Julia from scratch
in an already crowded field, without any library support, would you have
believed they would succeed?




Paolo Di Tommaso

Jan 6, 2015, 2:01:45 PM
to us...@groovy.codehaus.org
In general I would agree with Russel: data science or, more generally, scientific computing is a very competitive field that is difficult to enter.

But it is also true that this is a very broad area that includes many different requirements.

The people who need top performance generally use supercomputers and program in C/C++/Fortran/MPI.

Then there's a vast area in which performance is not critical, and more importance is given to problem modelling or fast application prototyping. Here most people use Perl, Python, R and, more recently, Julia, because they are easy to use and (almost) portable.

The scientific community has made a huge investment to develop or integrate data analysis and visualisation libraries, above all for Python, so it's very unlikely that they will switch to a different environment without a clear benefit.

Also, for this class of applications the ability to compose them within other scripts, or simply by using the Unix command-line pipe, is very important.

Here, unfortunately, the slow bootstrap time of the Groovy runtime represents a serious handicap, because it makes it inefficient to invoke Groovy scripts from another tool or from the command line in a timely manner (and it makes people *perceive* it as slow). In my opinion, if this were resolved, it would be much easier to propose Groovy as an alternative scripting language in the data science community.


However, this does not mean that Groovy cannot be used with profit for data science. Indeed, together with GPars, it provides a compelling set of features valuable in a range of scientific applications.

As Jim said in a previous email, I'm using both of them in Nextflow, a DSL for data analysis pipelines, with good feedback. http://www.nextflow.io

Also, though Python is the reference language in this field, it is also true that there are plenty of libraries/resources (for ML, big data, genomics, etc.) that run on the JVM and for which Groovy is, by definition, a perfect match, for reasons we all know.



Cheers,
Paolo

Dylan Cali

Jan 11, 2015, 4:54:05 PM
to us...@groovy.codehaus.org
On Tue, Jan 6, 2015 at 12:59 PM, Paolo Di Tommaso
<paolo.d...@gmail.com> wrote:
> Also for this class of applications it is very important the ability to
> compose them within other scripts or simply by using the Unix command line
> pipe.
>
> Here, unfortunately, the slow bootstrap time of Groovy runtime represents a
> serious handicap, because it makes inefficient to invoke Groovy scripts from
> another tool or from the command line in a timely manner (and it makes
> *perceive* it as a slow). In my opinion resolving this, it would be much
> easier to propose Groovy as an alternative scripting language in the data
> science community.

This is probably a silly idea, but thought I'd throw it out there:
could the bootstrap time be addressed by using a 'daemon' approach,
similar to how the Gradle daemon is used to speed up launching Gradle
build scripts?
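A minimal sketch of the "daemon" idea in plain Java, assuming a one-line request/reply protocol over a loopback socket: the long-lived JVM pays its startup cost once, and each later request costs only a local connection. `WarmDaemon` and its protocol are hypothetical and say nothing about how any existing tool implements this; the "work" is just upper-casing the request.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical sketch of a warm JVM serving one-line requests on a loopback port. */
public final class WarmDaemon {

    /** Starts the server loop on a background thread and returns the port it listens on. */
    public static int start() {
        final ServerSocket server;
        try {
            // Port 0 lets the OS choose a free port; bind to loopback only.
            server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Thread worker = new Thread(() -> {
            while (!server.isClosed()) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(
                             client.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(new OutputStreamWriter(
                             client.getOutputStream(), StandardCharsets.UTF_8), true)) {
                    String line = in.readLine();
                    out.println(line == null ? "" : line.toUpperCase(Locale.ROOT));
                } catch (IOException e) {
                    // A failed connection should not take the daemon down; keep serving.
                }
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive just for this thread
        worker.start();
        return server.getLocalPort();
    }

    /** What a thin client does: connect, send one request line, read one reply line. */
    public static String request(int port, String line) {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(new OutputStreamWriter(
                     socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(line);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int port = start();
        System.out.println(request(port, "hello from a warm jvm")); // HELLO FROM A WARM JVM
    }
}
```

For this to help on the command line, the client has to be something that starts quickly (for example a small native program); a Java client would just move the JVM startup cost to the client side.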

Anther Astimony

Jan 11, 2015, 6:03:31 PM
to us...@groovy.codehaus.org
> This is probably a silly idea, but thought I'd throw it out there:
> could the bootstrap time be addressed by using a 'daemon' approach,
> similar to how the Gradle daemon is used to speed up launching Gradle
> build scripts?


Already accomplished by the GroovyServ project:

http://kobo.github.io/groovyserv/

~~~ astimony

Dylan Cali

Jan 11, 2015, 6:08:14 PM
to us...@groovy.codehaus.org
Nice!

Paolo Di Tommaso

Jan 12, 2015, 5:32:12 AM
to us...@groovy.codehaus.org
I think this is not the right solution if we want to propose Groovy as an alternative programming language in the data science community. These people work most of the time on the command line, and they need tools that are fast and can be composed efficiently.

In my experience, when you propose Groovy to some of these guys and they notice that it takes 700 ms to print "Hello world", most of them stop there, because they perceive it as slow. You won't have a second chance.
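One way to see how much of a short program's run time passes before user code even starts: the JVM can report its own uptime through the standard `java.lang.management` API. This minimal sketch measures plain Java only; a Groovy script adds the Groovy runtime's own initialization on top, so the 700 ms figure is Paolo's observation for Groovy and is not what this class reports. The name `StartupCost` is hypothetical.

```java
import java.lang.management.ManagementFactory;

/** Hypothetical helper: reports how long the JVM has been running. */
public final class StartupCost {

    /** Milliseconds since this JVM started, from the standard runtime MXBean. */
    public static long uptimeMillis() {
        return ManagementFactory.getRuntimeMXBean().getUptime();
    }

    public static void main(String[] args) {
        // Read first thing in main(): roughly the time spent before user code ran
        // (slightly inflated by loading the management classes themselves).
        long beforeUserCode = uptimeMillis();
        System.out.println("Hello world");
        System.out.println("JVM uptime when main() began: " + beforeUserCode + " ms");
    }
}
```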



Cheers, p

Russel Winder

Jan 12, 2015, 6:36:00 AM
to us...@groovy.codehaus.org

On Mon, 2015-01-12 at 11:28 +0100, Paolo Di Tommaso wrote:
> […]
> won't have a second chance.

s/second //


Paolo Di Tommaso

Jan 12, 2015, 8:22:42 AM
to us...@groovy.codehaus.org
:) 


However, I disagree on this. For example, in the genomics field, though the most used programming languages are C and Python, there are really important pieces of software written in Java.

In this context Groovy could really play a role, as an easy-to-use scripting alternative to Java, with well-founded parallelisation libraries and good performance (not as fast as C code, but much better than Python).


Cheers,
Paolo

 

Russel Winder

Jan 12, 2015, 9:30:34 AM
to us...@groovy.codehaus.org

On Mon, 2015-01-12 at 14:20 +0100, Paolo Di Tommaso wrote:
> :)
>
>
> However, I disagree on this. For example in the genomics field,
> though the
> most used programming languages are C and Python, there are really
> important pieces of software that are written in Java.
>
> In this context Groovy could really play a role, as an easy-to-go
> scripting
> alternative to Java, well founded parallelisation libraries, and
> good performance (not fast as C code, but much better then Python).
>

Java performance often surpasses that of C, C++, D and Fortran. There
are no "always" on this any more. Indeed Python + Numba is as fast as C
as well for many CPU bound activities.

And PyPy runs CPU-bound Python code 5 to 30 times faster than CPython.

Dylan Cali

Jan 12, 2015, 9:33:21 PM
to us...@groovy.codehaus.org
On Mon, Jan 12, 2015 at 8:28 AM, Russel Winder <rus...@winder.org.uk> wrote:
>
> Java performance often surpasses that of C, C++, D and Fortran. There
> are no "always" on this any more. Indeed Python + Numba is as fast as C
> as well for many CPU bound activities.

To follow up on the performance aspect of this conversation, I took
the time to read JEP 191, which Jochen mentioned. Unless I'm missing
something, this seems to be just a more streamlined, first-class libffi
interface. I'm confused about the performance claims they make, since
FFI calls have a definite overhead... do they just mean it will perform
better than the 'bolt-on' libffi JNI interfaces currently being used
(JNA/JNR)?

To use libffi everywhere that JNI is currently used seems...
dubious. Sometimes you need 'real' native-level bindings, which is
why Python has both ctypes and native extensions, after all.

It seems like there should be two JEPs: one for a first-class JVM FFI,
sure, but also one focused on improving (or replacing) JNI itself...

Jochen Theodorou

Jan 13, 2015, 5:30:13 AM
to us...@groovy.codehaus.org
In the end, what makes JNI slow is, IMHO, the conversion of objects, call
conventions and primitive conversion... and of course there is the issue
of multithreading.

libffi would probably be able to address the call-convention part.
What remains is the problem of converting objects... though... if we
degrade the JVM to a mere frontend, then I think we can do a lot here
with invokedynamic and work "directly" on the object or struct generated
by the native code. Again, ffi helps here.

Any object or primitive from Java that you would use to talk to the
backend might require conversion... I think primitives won't hurt too
much, though doing a complete conversion is something to think about,
but Strings, for example, can be a problem. I am also thinking about,
for example, an unsigned 32-bit value. Would you keep that in an integer
and live with the number appearing wrong, to avoid the overhead of
conversion? And let us not forget that there are numbers with more than
64 bits too.
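To make the unsigned 32-bit and wider-than-64-bit cases concrete: Java's `int` is signed, so a C `uint32_t` holding 0xFFFFFFFF shows up as -1 if its bits are copied unchanged. Since Java 8, `Integer.toUnsignedLong` and `Integer.toUnsignedString` reinterpret the same bits as unsigned, and `BigInteger` covers values beyond 64 bits. A minimal sketch; the class and method names are hypothetical.

```java
import java.math.BigInteger;

/** Hypothetical sketch of the unsigned and >64-bit cases using JDK 8+ APIs. */
public final class NativeNumbers {

    /** A uint32_t of 0xFFFFFFFF with its bits copied unchanged into a Java int. */
    public static int rawUint32Max() {
        return 0xFFFFFFFF; // the same 32 bits, read by Java as -1
    }

    /** Reinterprets the 32 bits as unsigned, widening to long so no value is lost. */
    public static long asUnsigned(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    /** 2^128 - 1 does not fit in any primitive type; BigInteger holds it exactly. */
    public static BigInteger uint128Max() {
        return BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);
    }

    public static void main(String[] args) {
        int raw = rawUint32Max();
        System.out.println(raw);                           // -1: right bits, "wrong" number
        System.out.println(asUnsigned(raw));               // 4294967295
        System.out.println(Integer.toUnsignedString(raw)); // 4294967295, no widening needed
        System.out.println(uint128Max().bitLength());      // 128
    }
}
```

This is the trade-off in the question above: keeping the raw `int` costs nothing, while every unsigned view or `BigInteger` is an extra conversion step.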

And of course there is a lot more...

bye Jochen


Eric MacAdie

Jan 13, 2015, 9:09:55 PM
to us...@groovy.codehaus.org
To take this in another direction:

Could use by sysadmins be a killer app/niche for Groovy?

Anecdotally, I have noticed there seem to be some shops that have their main app in Java, and their sysadmins/dev-ops people use Ruby or Python. It seems that Groovy would be a good fit for those Java shops.

= Eric MacAdie

Jochen Theodorou

Jan 17, 2015, 3:39:48 PM
to us...@groovy.codehaus.org, Russel Winder
Am 06.01.2015 00:38, schrieb Russel Winder:
> I hate to be a damper on this and all the other supportive posts but
> wishing things to be true will achieve nothing. Data science is full of
> PhD statistics folks who like R or perhaps
> Python/SciPy/Matplotlib/Pandas. Trust me I run workshops for these folk.
> Currently "data science" is an R, Python, Julia, place mostly because
> the infrastructure is already there and everyone already uses R, Python,
> Julia.
>
> The core issue is that R, Python, Julia already have the infrastructure
> for analysing and (more importantly) visualizing data and algorithms
> over it. I am sure JVM and Groovy can do this, but it doesn't have the
> systems these folk use today.
>


Coming back to that again... Did you ever look at projects like, for
example, Renjin (R on the JVM)?

bye blackdrag

Simon

Jan 22, 2015, 8:07:41 PM
to us...@groovy.codehaus.org
I am a bit late to this conversation, but I thought I would mention that I did make a kind of experimental framework for enabling some "groovy"-ish features for data science on top of commons-math:


It's pretty simple but gives a basic "data frames" style framework for Groovy.

As I said, I consider this an "experiment", but I do use it extensively myself as part of a broader bioinformatics package for Groovy. Having done so, I tend to agree with others that there are some basic challenges for Groovy, and really for any JVM-based package. Firstly, there is not a wealth of established, peer-reviewed algorithms for data analysis. There are some, but nowhere near what you observe for Python and R. The second problem, though, is deeper: the actual data types and computational operations employed by the JVM are not designed to be numerically robust and stable in the way that true numerical computing requires. I run the same algorithm in R and in Groovy and I get different results, sometimes highly divergent ones. More often than not the Groovy version has gone to infinity or NaN while R or Python sail through, producing a reasonable result from the same algorithm. Python achieves this by completely re-inventing data types and operations via NumPy. Essentially, we need the same kind of thing for the JVM if we are to be serious in the scientific world.
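One common way a computation reaches Infinity or NaN in IEEE 754 double arithmetic (which Java's `double`, R's numeric vectors and NumPy's default float64 all use) is overflow in an intermediate step. The sketch below uses softmax purely as an illustrative example; it is not the algorithm Simon ran. The naive form produces NaN, while the algebraically equivalent form that subtracts the maximum first stays finite. `StableSoftmax` is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical sketch: the same doubles, two arrangements of the same formula. */
public final class StableSoftmax {

    /** Naive softmax exp(x_i) / sum_j exp(x_j); exp(1000) overflows to Infinity. */
    public static double[] naive(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += Math.exp(v);
        }
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = Math.exp(x[i]) / sum; // Infinity / Infinity is NaN
        }
        return out;
    }

    /** Shifted softmax: subtracting max(x) leaves the exact result unchanged. */
    public static double[] shifted(double[] x) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : x) {
            max = Math.max(max, v);
        }
        double sum = 0.0;
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = Math.exp(x[i] - max); // largest term is exp(0) = 1, so no overflow
            sum += out[i];
        }
        for (int i = 0; i < x.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {1000.0, 1000.0};
        System.out.println(Arrays.toString(naive(x)));   // [NaN, NaN]
        System.out.println(Arrays.toString(shifted(x))); // [0.5, 0.5]
    }
}
```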

Cheers,

Simon

Dylan Cali

Jan 22, 2015, 9:37:48 PM
to us...@groovy.codehaus.org
On Thu, Jan 22, 2015 at 7:06 PM, Simon <simon...@gmail.com> wrote:
> The second problem though is
> deeper, and it is that the actual data types and computational operations
> employed by the JVM are not designed to be numerically robust and stable in
> the way that true numerical computing requires. I run the same algorithm in
> R and in Groovy and I get different results. Sometimes highly divergent
> results, more often than not it's either gone to infinity or NaN while R or
> Python are sailing through producing a reasonable result from the same
> algorithm. Python achieves this by completely re-inventing data types and
> operations via Numpy. Essentially we need the same kind of thing for the JVM
> if we are to be serious in the scientific world.

So I have zero knowledge of what goes into the Groovy internals, but
does Groovy have to _only_ be a JVM language? It sounds like the
limitations people are raising are more limitations with the JVM, and
not Groovy itself.

Obviously this is a pretty out-there idea, but if you look at Python,
Ruby, etc... these languages have versions that can run on the JVM, on
the CLR, as well as independent implementations. Is it impossible for
there to ever be a non-JVM version of Groovy?

Yes, being able to use Groovy to leverage the JVM ecosystem is
definitely a killer feature, but IMHO Groovy itself is also a killer
feature :). As part of my job I have to program in a wide variety of
languages: Perl, Python, Java, Groovy, C#, C, C++, Tcl. But Groovy
is my favorite (hint: it's the only language whose mailing list I'm
on!). The language just makes really sensible design decisions, and
to the credit of the core devs Groovy just continues to get better and
better.

Seamless integration with Java is a bonus (a really, really big bonus),
but what makes Groovy a pleasure to program in is Groovy itself.

Jochen Theodorou

Jan 23, 2015, 4:55:13 AM
to us...@groovy.codehaus.org
Am 23.01.2015 02:06, schrieb Simon:
[...]
> The second problem though is deeper, and it is that the actual data
> types and computational operations employed by the JVM are not designed
> to be numerically robust and stable in the way that true numerical
> computing requires.

If the problem is understood, we can do something about it. Whether it
will then be fast or not is another story, of course. Normally "true"
numerical computing takes the inexactness of number representations into
account... at least it was like this when I did physics simulations at
university. If you need "arbitrary precision", then of course you
should not use data types like float or double. BigDecimal would be a
better choice then, to get a more stable result.
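A minimal sketch of the two approaches mentioned above, assuming the task is simply to add 0.1 ten times: plain `double` accumulation, Kahan compensated summation (one standard way numerical code accounts for rounding error, chosen here as an illustration rather than taken from the thread), and `BigDecimal`. The class name `SummationDemo` is hypothetical.

```java
import java.math.BigDecimal;

/** Hypothetical sketch: three ways to add the same value several times. */
public final class SummationDemo {

    /** Plain double accumulation: every addition rounds to the nearest binary64 value. */
    public static double naiveSum(double value, int times) {
        double sum = 0.0;
        for (int i = 0; i < times; i++) {
            sum += value;
        }
        return sum;
    }

    /** Kahan summation: carries each step's rounding error into the next step. */
    public static double kahanSum(double value, int times) {
        double sum = 0.0;
        double c = 0.0; // running compensation for lost low-order bits
        for (int i = 0; i < times; i++) {
            double y = value - c;
            double t = sum + y;
            c = (t - sum) - y;
            sum = t;
        }
        return sum;
    }

    /** BigDecimal: the decimal string "0.1" is represented exactly, so the sum is exact. */
    public static BigDecimal decimalSum(String value, int times) {
        BigDecimal step = new BigDecimal(value);
        BigDecimal sum = BigDecimal.ZERO;
        for (int i = 0; i < times; i++) {
            sum = sum.add(step);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(naiveSum(0.1, 10));     // 0.9999999999999999
        System.out.println(kahanSum(0.1, 10));
        System.out.println(decimalSum("0.1", 10)); // 1.0
    }
}
```

Kahan summation keeps the speed of `double` and only reduces accumulated rounding error; `BigDecimal` is exact for decimal inputs like "0.1" but much slower, and division still requires choosing a scale or rounding mode.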

> I run the same algorithm in R and in Groovy and I
> get different results. Sometimes highly divergent results, more often
> than not it's either gone to infinity or NaN while R or Python are
> sailing through producing a reasonable result from the same algorithm.

Maybe you can show an example?

> Python achieves this by completely re-inventing data types and
> operations via Numpy. Essentially we need the same kind of thing for the
> JVM if we are to be serious in the scientific world.

Letting Groovy understand a new number type should be the smallest
problem, as long as it is 100% clear how it is supposed to behave.
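As an illustration of a number type whose behavior is fully specified, here is a minimal sketch of an exact rational type written in plain Java. Groovy maps `a + b`, `a - b` and `a * b` to `a.plus(b)`, `a.minus(b)` and `a.multiply(b)`, so naming the methods this way should let Groovy code use operator syntax on it. `Rational` is a hypothetical class written for this example.

```java
import java.math.BigInteger;

/** Hypothetical exact rational type; method names follow Groovy's operator convention. */
public final class Rational {
    private final BigInteger num;
    private final BigInteger den; // invariant: den > 0 and gcd(num, den) == 1

    public Rational(long num, long den) {
        this(BigInteger.valueOf(num), BigInteger.valueOf(den));
    }

    private Rational(BigInteger n, BigInteger d) {
        if (d.signum() == 0) {
            throw new ArithmeticException("zero denominator");
        }
        if (d.signum() < 0) { // keep the sign on the numerator
            n = n.negate();
            d = d.negate();
        }
        BigInteger g = n.gcd(d); // gcd(0, d) == d, so zero normalizes to 0/1
        this.num = n.divide(g);
        this.den = d.divide(g);
    }

    /** Groovy: a + b */
    public Rational plus(Rational o) {
        return new Rational(num.multiply(o.den).add(o.num.multiply(den)), den.multiply(o.den));
    }

    /** Groovy: a - b */
    public Rational minus(Rational o) {
        return new Rational(num.multiply(o.den).subtract(o.num.multiply(den)), den.multiply(o.den));
    }

    /** Groovy: a * b */
    public Rational multiply(Rational o) {
        return new Rational(num.multiply(o.num), den.multiply(o.den));
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Rational)) {
            return false;
        }
        Rational r = (Rational) other;
        return num.equals(r.num) && den.equals(r.den);
    }

    @Override
    public int hashCode() {
        return 31 * num.hashCode() + den.hashCode();
    }

    @Override
    public String toString() {
        return num + "/" + den;
    }

    public static void main(String[] args) {
        System.out.println(new Rational(1, 3).plus(new Rational(1, 6))); // 1/2
    }
}
```

From Groovy, `new Rational(1, 3) + new Rational(1, 6)` would then dispatch to `plus` and evaluate to 1/2. Keeping every value in lowest terms with a positive denominator is what makes `equals` and `toString` unambiguous.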

bye blackdrag


