Voldemort Dependencies


ctasada

Sep 5, 2014, 7:14:13 AM
to project-...@googlegroups.com
Hi guys,

Over the last few days I've started checking the changes in Voldemort 1.8.x compared with 1.6, which is the version we're using now.

The migration to Gradle is great, but I still miss the possibility of deploying the generated jars to a Maven repository. While trying to find out how to do that, I looked at the dependencies that are still not retrieved from a repository. That led me to review all the dependencies, and these are my findings and questions:
  • JE 5.0.88: This version was never officially released by Oracle. The closest version is 5.0.97, and the newest is 5.0.104. Oracle only publishes official versions to its Maven repository, so the version Voldemort uses is not only unavailable there, it is also neither official nor the latest.
  • Azkaban-Common 0.05: This version is now legacy and no longer maintained. Is it even used somewhere? Why this old version?
  • Catalina-Ant: Again an old version. Something similar can be found in Maven, but I'm not really sure it makes sense to maintain it as a dependency. My point would be that if someone is deploying in Tomcat, this library will already be available there.
  • Tehuti: Looks great, but seems to be maintained only by FelixGV. What's this library doing that cannot be done with http://metrics.dropwizard.io?
  • libthrift 0.5: Again an old version. Maven has 0.6.x, and the latest is 0.9.1.
  • Tusk: I could not find anything about it.
The versions of apache-commons, joda and other libraries are also really old. Some of those libraries include interesting performance improvements in their newer versions.

Why are we using so many old dependencies? The only answer that comes to mind is that, for some reason, Voldemort still needs to be compatible with Java 1.5. Is that the reason? Shouldn't we move to Java 6 at least?

I would volunteer to tackle those changes (upgrade to Java 6, update dependencies, ...) from a code perspective, but I would need help with the performance and stability tests.

Looking forward to your comments.

Carlos.

Arunachalam

Sep 5, 2014, 11:26:00 AM
to project-...@googlegroups.com
Carlos,
    Those are great questions, and I had the same ones when I moved the build from Ant to Gradle. We are going to move Voldemort from Java 1.5 to 1.6 soon (probably within a week or so). Please see my other answers inline.

  • JE 5.0.88: This version had all the fixes we requested from them. I also believe this is what the BDB guys built specifically for us. Not sure if the latest version contains all the fixes included in this one.
  • Azkaban-Common 0.05: If required we can upgrade. If you can send a pull request for this change I can work with you on the upgrade.
  • Catalina-Ant: We don't use it. If you want, it can be upgraded.
  • Tehuti: Felix is one of our fellow Voldemort developers at LinkedIn. Kafka at LinkedIn tried to use CodaHale's metrics. I don't remember whether it started consuming too much memory or cracked under heavy load, but it became a problem, so Kafka wrote their own implementation. Tehuti is actually the Kafka metrics code refactored into a library.
  • libthrift 0.5: We don't use it. Please feel free to upgrade. We only use Avro.
  • Tusk: This is a LinkedIn internal library. I will look into where the dependency is.

If you can get it to publish to a Maven repository, that would be great.

Thanks,
Arun.



Carlos Tasada

Sep 9, 2014, 6:53:12 AM
to project-...@googlegroups.com
Hi Aru,

I've made some progress and can now install to my local repo, but I still need to check how to publish to an external/central repository. I'm new to Gradle, so it's taking me a bit longer to get used to the way it works.

Should the groupId be com.linkedin or com.project-voldemort?

I'll continue with the work and send a pull request when ready.

Once this part is done I'll tackle upgrading the dependencies, but I will need your help to work out which BDB version is the proper one to use ;)

Thanks,
Carlos.

Felix GV

Oct 7, 2014, 1:36:50 PM
to project-...@googlegroups.com
Hi,

Sorry to reply late to this.

Regarding Tehuti: it has been extracted from Kafka's metrics implementation. The code was originally written by Jay Kreps, and then maintained and improved by some Kafka and Voldemort devs, so it definitely is not the work of just one person. It is in my repo at the moment, but I'd like to put it in a more generally available (git and Maven) repo in the future. I just haven't had the time yet...

As for comparing with CodaHale/Yammer, there were a few concerns with it, but the main one was that we didn't like the exponentially decaying histogram implementation. While that implementation is very appealing in terms of (low) memory usage, it has several misleading characteristics: a lack of incoming data points makes old measurements linger longer than they should, and there's also a fairly high possibility of losing interesting outlier data points. This makes the exponentially decaying implementation robust in high-throughput, fairly constant workloads, but unreliable in sparse or spiky workloads. The Tehuti implementation provides semantics that we find easier to reason about, with a small code footprint (which we consider a plus in terms of maintainability). Of course, it is still a fairly young project, so it could be improved further.
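As a rough illustration of the "lingering" problem on sparse workloads, here is a minimal Java sketch. It is not the CodaHale/Yammer algorithm and not Tehuti's code: EvictOnInsertSample is a simplified stand-in for any sample that only drops old values when new ones arrive, and TimeWindowSample is a hypothetical alternative that expires values by age at read time. All class and method names are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StaleSampleDemo {

    /** Simplified stand-in: old values leave only when new values push them out. */
    static final class EvictOnInsertSample {
        private final int capacity;
        private final Deque<Long> values = new ArrayDeque<>();

        EvictOnInsertSample(int capacity) {
            this.capacity = capacity;
        }

        void record(long value) {
            if (values.size() == capacity) {
                values.removeFirst();
            }
            values.addLast(value);
        }

        /** Largest retained value, or -1 if nothing is retained. */
        long max() {
            long max = -1;
            for (long v : values) {
                max = Math.max(max, v);
            }
            return max;
        }
    }

    /** Hypothetical alternative: values older than the window are dropped when read. */
    static final class TimeWindowSample {
        private final long windowMs;
        private final Deque<long[]> entries = new ArrayDeque<>(); // {timestampMs, value}

        TimeWindowSample(long windowMs) {
            this.windowMs = windowMs;
        }

        void record(long nowMs, long value) {
            entries.addLast(new long[] {nowMs, value});
        }

        /** Largest value recorded within the window ending at nowMs, or -1 if none. */
        long max(long nowMs) {
            while (!entries.isEmpty() && nowMs - entries.peekFirst()[0] >= windowMs) {
                entries.removeFirst();
            }
            long max = -1;
            for (long[] e : entries) {
                max = Math.max(max, e[1]);
            }
            return max;
        }
    }

    public static void main(String[] args) {
        EvictOnInsertSample evictOnInsert = new EvictOnInsertSample(100);
        TimeWindowSample timeWindow = new TimeWindowSample(60_000);

        // One 900 ms latency spike at t = 0, then traffic stops completely.
        evictOnInsert.record(900);
        timeWindow.record(0, 900);

        long tenMinutesLaterMs = 600_000;
        // The first sample still reports the spike; the windowed one has expired it.
        System.out.println("evict-on-insert max: " + evictOnInsert.max());
        System.out.println("time-window max:     " + timeWindow.max(tenMinutesLaterMs));
    }
}
```

Ten minutes after the spike, the first sample still reports it because nothing new has arrived to displace it, while the windowed sample reports no data. The trade-off is that the windowed version has to store timestamps and keeps every value inside the window.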

Hope that clears things up regarding Tehuti.

--
 
Felix GV
Data Infrastructure Engineer
Distributed Data Systems
LinkedIn
 
f...@linkedin.com
linkedin.com/in/felixgv


Justin Mason

Oct 8, 2014, 5:33:43 AM
to project-...@googlegroups.com

On Tue, Oct 7, 2014 at 6:36 PM, 'Felix GV' via project-voldemort <project-...@googlegroups.com> wrote:
As for comparing with CodaHale/Yammer, there were a few concerns with it, but the main one was that we didn't like the exponentially decaying histogram implementation. While that implementation is very appealing in terms of (low) memory usage, it has several misleading characteristics: a lack of incoming data points makes old measurements linger longer than they should, and there's also a fairly high possibility of losing interesting outlier data points. This makes the exponentially decaying implementation robust in high-throughput, fairly constant workloads, but unreliable in sparse or spiky workloads. The Tehuti implementation provides semantics that we find easier to reason about, with a small code footprint (which we consider a plus in terms of maintainability). Of course, it is still a fairly young project, so it could be improved further.

As a matter of interest, how self-contained is Tehuti?  And what license terms does it use?

I've long been worried about the same issues with Yammer/Dropwizard Metrics -- it'd be great to be able to replace the builtin Meter/Timer implementations with ones wrapping a more reliable impl, and Tehuti could be a candidate.

--j.

Felix GV

Oct 8, 2014, 1:54:54 PM
to project-...@googlegroups.com
Hi,

Currently, the main Tehuti repo is here: https://github.com/FelixGV/tehuti

Like I said, I'd like to move it to a more general / non-personal repo in the future, but haven't had the time yet. Anyway, you can still browse the code there for now. It is not a big code base, so it is not that hard to wrap one's mind around.

It is Apache licensed, and both Kafka and Voldemort are using it, so I would say it is pretty self-contained (although Kafka has not moved to Tehuti proper, it is essentially the same code they're using, minus a few small fixes that we added).

Tehuti is a bit lower level than CodaHale (i.e.: you need to choose exactly which stats you want to measure and the boundaries of your histograms), but this is the type of stuff you would build a wrapper for and then re-use within your code base. For example: the Voldemort RequestCounter class.
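To give an idea of what "choosing the boundaries of your histograms" and wrapping them for re-use could look like, here is a minimal, self-contained Java sketch. It does not use the Tehuti API and is not the Voldemort RequestCounter class; RequestLatencyStats and its methods are hypothetical names invented for this example, and the bucket boundaries are arbitrary.

```java
import java.util.Arrays;

/** Hypothetical wrapper: a latency histogram whose bucket boundaries the caller picks up front. */
public class RequestLatencyStats {
    private final long[] upperBoundsMs; // ascending and assumed distinct
    private final long[] counts;        // one per bound, plus a final overflow bucket
    private long total;

    public RequestLatencyStats(long... upperBoundsMs) {
        this.upperBoundsMs = upperBoundsMs.clone();
        Arrays.sort(this.upperBoundsMs);
        this.counts = new long[upperBoundsMs.length + 1];
    }

    public synchronized void record(long latencyMs) {
        int idx = Arrays.binarySearch(upperBoundsMs, latencyMs);
        if (idx < 0) {
            idx = -idx - 1; // insertion point: first bound greater than the latency
        }
        counts[idx]++;      // idx == upperBoundsMs.length means "above every bound"
        total++;
    }

    /**
     * Upper bound of the bucket containing the requested quantile (0 to 1).
     * Returns -1 when nothing was recorded, Long.MAX_VALUE for the overflow bucket.
     */
    public synchronized long percentileUpperBoundMs(double quantile) {
        if (total == 0) {
            return -1;
        }
        long target = Math.max(1, (long) Math.ceil(quantile * total));
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= target) {
                return i < upperBoundsMs.length ? upperBoundsMs[i] : Long.MAX_VALUE;
            }
        }
        return Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Boundaries are an explicit choice; values above 100 ms land in the overflow bucket.
        RequestLatencyStats getLatency = new RequestLatencyStats(1, 5, 10, 50, 100);
        for (long latencyMs : new long[] {2, 3, 4, 7, 250}) {
            getLatency.record(latencyMs);
        }
        System.out.println("p50 <= " + getLatency.percentileUpperBoundMs(0.50) + " ms");
        System.out.println("p99 overflowed: "
                + (getLatency.percentileUpperBoundMs(0.99) == Long.MAX_VALUE));
    }
}
```

The point of a wrapper like this is that the boundary choice lives in one place, and the rest of the code base only calls record(). If the boundaries are too coarse, outliers collapse into the overflow bucket, which is the cost of picking them by hand.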

Let us know if you have any questions or concerns regarding Tehuti.




Arunachalam

Oct 8, 2014, 4:00:55 PM
to project-...@googlegroups.com
As for how self-contained it is: it depends only on log4j.


Thanks,
Arun.

Justin Mason

Oct 9, 2014, 6:52:05 AM
to project-...@googlegroups.com
great, thanks!