Hopsworks with standard Hadoop


geeka...@gmail.com

Jan 2, 2019, 8:36:02 AM
to Hops
Hi,

We have recently come across Hops. Congratulations on your work, it is really a great product.
We are interested in experimenting with it and have some questions:
- Is a multi-year maintenance plan foreseen for HopsFS?
- Is it possible to use Hops with standard HDFS?
- Is it possible to integrate Hops with message brokers such as ActiveMQ or RabbitMQ, or with HTTP REST/SOAP services, in order to develop real-time applications?

Theofilos Kakantousis

Jan 2, 2019, 10:11:45 AM
to Hops
Hi,

Thank you for your kind words. 
- Yes, there is. Logical Clocks are the main developers behind Hops (HopsFS, HopsYARN, etc.) and behind Hopsworks, the data science platform built on top of Hops (we are in the process of renaming documentation and support channels to reflect both Hops and Hopsworks). The company recently raised funding to accelerate development of the product. Interesting features coming in future releases are datacenter replication for Hops and the Feature Store for Hopsworks.
- If you mean Hopsworks, then no: it relies on Hops-specific features such as extendable metadata, so it cannot work with Apache HDFS. HopsFS itself, however, is compatible with clients written against Apache HDFS, since its client interface matches that of Apache HDFS.
- Hopsworks already has first-class support for Apache Kafka, and other message brokers could be added in the future if there is enough demand. Most of Hopsworks is open source, so contributions are always welcome! Hopsworks exposes a REST API, documented with Swagger and currently being refactored to make it more compliant with web standards. There is first-class support for Spark, and we are currently upgrading our stack to the latest Flink, so you can develop real-time applications. Is there a particular use case or domain you have in mind?

Thanks,
Theo

geeka...@gmail.com

Jan 7, 2019, 11:35:03 AM
to Hops
Thank you for your prompt response, Theofilos.
Regarding the use cases, we are interested in the big data / fast data and ML capabilities offered by Hops. To support different data sources, an extension able to ingest from relational/non-relational databases and IoT sensors would be great.
To extend Hopsworks with other ingestion components such as Sqoop, Kafka Connect, NiFi, ActiveMQ, etc., would you have any suggestions on how to implement such an extension while respecting the application multi-tenancy realized by Hops?
Is there already a fork of the project that extends Hopsworks with new capabilities?
Thank you

Theofilos Kakantousis

Jan 7, 2019, 2:37:02 PM
to Hops
Hi,

For ingesting data into the platform there are a few options. One is the hopsworks-cli library, a Java client for the Hopsworks REST API: users can upload data through the DatasetService directly into their project's datasets. Another approach is to open the Kafka port in Hopsworks so that external applications, for example running on IoT devices, can push to it directly using a Kafka client. It is then trivial to run a Spark app in Hopsworks that ingests and processes data from Kafka and potentially persists it to Datasets, where it can be shared as well. An example is shown on slide 18 here. Ingestion from AWS S3 buckets is also feasible by using Spark.
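To make the "open the Kafka port and push with a Kafka client" option above concrete, here is a minimal sketch of the SSL settings such an external producer would need. This assumes the project's JKS keystore and truststore have first been converted to PEM files (e.g. with keytool/openssl); the hostnames, ports, and file paths are placeholders, not values taken from the Hopsworks documentation.

```python
# Hypothetical sketch: SSL settings for an external Kafka producer pushing
# to a Hopsworks broker whose port has been opened. All hostnames, ports,
# and paths below are placeholders.
def external_producer_config(broker, ca_pem, cert_pem, key_pem):
    """Return kafka-python style keyword arguments for an SSL producer."""
    return {
        "bootstrap_servers": broker,
        "security_protocol": "SSL",
        "ssl_cafile": ca_pem,      # CA extracted from the project truststore
        "ssl_certfile": cert_pem,  # client cert extracted from the project keystore
        "ssl_keyfile": key_pem,    # private key extracted from the project keystore
    }

config = external_producer_config(
    "hopsworks.example.com:9091", "ca.pem", "client.pem", "client.key")
# These kwargs would then be passed to kafka.KafkaProducer(**config).
```

The Spark app running inside Hopsworks would consume from the same topic using the project credentials that the platform provisions for it.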

On IoT, we have done work on ingesting data from Android devices into Hopsworks via Kafka. That is achieved by having the mobile devices authenticate to Hopsworks using the project's X.509 certificate. We are interested in adding MQTT support and are currently investigating whether we can use the same mechanism as the one used by the Android devices to authenticate to Hopsworks.

About the specific tools: support for Sqoop is coming to Hopsworks in February as part of Airflow on Hopsworks; we don't support Kafka Connect (part of the reason), but since we run Apache Kafka its satellite tools can be added to Hopsworks; and we don't support NiFi.
In general, the thing to pay attention to when adding new services to Hopsworks is making them compliant with the project-user multi-tenancy model of Hopsworks. That is, a user in one project should not be able to access, for example, the NiFi workflows of other projects, even if the user is a member of both projects. Of course, the implementation details vary depending on the service itself: whether it has any notion of users and, if so, how access management is performed, where users are stored, and so on.
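The project-scoped access rule described above can be illustrated with a toy check: a resource belongs to exactly one project, and a user may only touch it while acting inside that same project, even if the user is a member of several projects. The class and field names below are invented for illustration and do not reflect the actual Hopsworks implementation.

```python
# Toy illustration of the project-user multi-tenancy rule: membership in
# some other project is never enough to reach a resource.
from dataclasses import dataclass, field

@dataclass
class Project:
    name: str
    members: set = field(default_factory=set)

@dataclass
class Resource:          # e.g. a workflow owned by one project
    name: str
    project: Project

def can_access(user: str, acting_project: Project, resource: Resource) -> bool:
    # Access requires acting inside the resource's own project *and*
    # being a member of it.
    return resource.project is acting_project and user in acting_project.members

a = Project("projA", {"alice"})
b = Project("projB", {"alice"})          # alice is a member of both projects
wf = Resource("workflow1", b)            # workflow owned by projB

print(can_access("alice", b, wf))        # True: acting inside projB
print(can_access("alice", a, wf))        # False: projB resource via projA
```

A real integration would enforce the same invariant inside the added service's own user and access-management model.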

The most active Hopsworks repository is the one of Logical Clocks, so you could fork that if its AGPLv3 license is compatible with your project; otherwise there is also this repo with a different license. Likewise, this Hops repo is the most heavily maintained and developed, and it is the one Logical Clocks contributes to as well.

Thanks,
Theo

geeka...@gmail.com

Jan 10, 2019, 8:33:45 AM
to Hops

Thank you, Theo.
We did some tests sending messages to the Hops Kafka broker with the certificate provided for the user, and we found the following issue:
the key is generated without a password, which seems to cause problems for both the JDK client and the Kafka broker.
Is it possible to generate certificates with a password?
Do you have documentation of the tests made with the Android devices?
Thank you very much for supporting us.

Theofilos Kakantousis

Jan 11, 2019, 5:38:08 AM
to Hops
Hi,

Which key are you referring to? The user certificate (one for each project they are a member of) can be downloaded by navigating to Project Settings - Export Certificates. There you can download keystore.jks and truststore.jks, and you receive an email with the password, which is the same for both. If you try to unlock the keystore without the password, you will get back the public key, but you cannot unlock the private one.

In our case, the Android devices were pushing messages to Hopsworks via its REST API, and Hopsworks was relaying them to Kafka. One reason for following this approach was issues we had getting the Kafka clients to run on the Android SDK. But that was a while ago; maybe it works better now.

geeka...@gmail.com

Feb 4, 2019, 10:04:41 AM
to Hops
Thanks Theofilos,
We have tried to install the 0.6.0-SNAPSHOT version available at https://github.com/hopshadoop/hopsworks. At the moment we are having trouble with the installation because we can't find the corresponding Chef script, and the ones available for version 0.6 at Logical Clocks don't seem to be compatible: we had consistency problems with the MySQL tables when the REST API tries to access the MySQL cluster.
We also tried the AWS AMI, but its version is 0.6.1.
Do you know if there is any Chef installation available for version 0.6.0-SNAPSHOT?
thank you very much.
Federico


Theofilos Kakantousis

Feb 4, 2019, 10:29:28 AM
to Hops
Hi Federico,

The hopshadoop organization is not maintained by Logical Clocks anymore. The correct one would be https://github.com/logicalclocks/hopsworks/
We have a cluster definition, but by default you get the latest bugfix release, which is 0.6.1 (the name of the cluster definition is misleading): https://github.com/logicalclocks/karamel-chef/blob/master/cluster-defns/1.hopsworks-0.6.0.yml
So from your karamel-chef root you can run "./run.sh ubuntu 1 hopsworks-0.6.0". If you really want 0.6.0, you can compile it from https://github.com/logicalclocks/hopsworks/releases/tag/v0.6.0 and deploy it on your newly created VM.

Kind regards,
Theo 

geeka...@gmail.com

Feb 4, 2019, 11:37:31 AM
to Hops

Thanks for the prompt answer, Theofilos.
Right now we need an MIT license in order to integrate it for our purposes.
For this reason we tried to install the 0.6.0-SNAPSHOT.
Do you think there are any resources that would allow us to replicate that version?
Thank you very much

Jim Dowling

Feb 5, 2019, 1:18:01 AM
to Hops
Hi Federico
Hopsworks is available under the AGPL-v3 license. If you want a different license, you will have to contact Logical Clocks AB.

Regards
Jim