[ANN] thurber: Clojure on Apache Beam (distributed batch/streaming)

atdixon

Jan 21, 2020, 4:10:25 PM
to Clojure
Here is thurber (https://github.com/atdixon/thurber), now at an early alpha release, which enables Clojure on Apache Beam platforms like Google Dataflow.

thurber's goals include:

- Full support for Beam capabilities
- AOT-less (AOT not required; full dynamic support for serializing functions, including inlined functions, and proxies)
- Macro-less (very few, always optional, macros)
- Performance focus (core optimized for large volume data streaming)
- Idiomatic Clojure focus (Clojure functions are automatically distributable functional transforms, lazy sequences over iterative output, etc.)

When coming to Apache Beam and wanting to use Clojure, there are a few hurdles to overcome, some of which have been discussed here in the past. Clojure's Java interop commonly falls short in the domain of distributed big-data Java platforms (proxies and functions not serializable, no support for generating generic type signatures, minimal/insufficient support for method annotations, suboptimal dynamic binding performance, etc.).
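
To make the serialization hurdle concrete, here is a minimal plain-Java sketch of what a distributed runner needs from user functions: they must survive a Java serialization round trip before being shipped to workers. This is not thurber or Beam code. The class name `SerializableFnDemo`, the interface `SerFn`, and the helper methods are hypothetical names made up for illustration, and the sketch uses only the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class SerializableFnDemo {
    // Hypothetical stand-in for the serializable function types that
    // distributed platforms expect user code to implement.
    interface SerFn<A, B> extends Function<A, B>, Serializable {}

    // Serialize an object to bytes, as a runner would before shipping it.
    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    // Rebuild an object from bytes, as a worker would on receipt.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // True if the object can be written with Java serialization.
    static boolean canSerialize(Object o) {
        try {
            toBytes(o);
            return true;
        } catch (IOException e) { // includes NotSerializableException
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Function<Integer, Integer> plain = x -> x + 1;   // not Serializable
        SerFn<Integer, Integer> shippable = x -> x + 1;  // Serializable

        System.out.println(canSerialize(plain));      // false
        System.out.println(canSerialize(shippable));  // true

        @SuppressWarnings("unchecked")
        SerFn<Integer, Integer> copy =
            (SerFn<Integer, Integer>) fromBytes(toBytes(shippable));
        System.out.println(copy.apply(41));           // 42
    }
}
```

Note that the round trip here stays inside one JVM. On a real cluster the worker also needs the function's class on its classpath, which is where dynamically generated classes and AOT compilation come into the picture.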

thurber bridges these issues internally, giving a full dynamic/Clojure experience on top of Apache Beam.

(For Onyx users, thurber + Beam meet the same ideals as Onyx on a well-backed platform.)

This is an early alpha release, and feedback on the API and facilities is welcome.

For the curious, the walkthrough covers most of thurber's capabilities: https://github.com/atdixon/thurber/blob/master/demo/walkthrough.clj

Dominic Parry

Jan 22, 2020, 8:30:17 AM
to 'Dirk Wetzel' via Clojure
Hi!

Congratulations on the library! It makes me super happy when people build Clojure libraries for the Google Cloud ecosystem. I wanted to draw your attention to datasplash (https://github.com/ngrunwald/datasplash), which has made a start on this. I thought perhaps you could leverage some of it.

Hope you have a great day!
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/clojure/c18cc8e1-01c9-4688-bff3-6d50f128d0e4%40googlegroups.com.

Aaron D.

Jan 22, 2020, 11:21:57 AM
to Clojure

Hi Dominic, thank you!

Are you a maintainer of or contributor to datasplash? I would be happy to swap notes and synthesize ideas.

My org looked at datasplash. The biggest dealbreaker for us was datasplash's AOT orientation: its AOT packaging meant we couldn't float its dependencies, and we didn't like the requirement to AOT-compile our own code.

With thurber I started with the goal of avoiding AOT and staying highly dynamic at the REPL, but I was also able to focus on certain performance areas from the bottom up. thurber also eschews a sugared, DSL-like API in favor of more direct, explicit interop with the Beam SDK, leaving that concern to layers above (though I may implement one if there is interest). It's just a different 'opinionated' take.

My org had been using Onyx for streaming use cases but its original developers have moved on and we were concerned with its long-term viability. Many of the ideals of thurber are consistent with Onyx's and reaching previous Onyx users like ourselves was another line of sight for thurber.

We'd also looked at clj-headlights, the other Clojure Beam library we surveyed in this space.


Dominic Parry

Jan 22, 2020, 12:44:33 PM
to Clojure
Hi Aaron,

I’m just a minor contributor to datasplash at present, because we’re using Beam as the centre of our data streaming / processing functions. I like the idea you’re presenting, but haven’t really stretched our solutions to the point where AOT was a problem for us.

I look forward to further developments on Thurber though!




Beau Fabry

Jan 24, 2020, 3:01:08 PM
to Clojure
Hi Aaron,

I was, I guess, the lead developer on clj-headlights. If you have any questions, feel free to hit me up. As far as I know, it is no longer actively maintained by anyone.

Cheers,
Beau