Using Hazelcast as a Distributed In-Memory File System


Miere Teixeira

Nov 13, 2014, 7:17:54 AM
to Hazelcast Google Group
Hi team,

I'm working on a project where I need to process thousands of PDF files uploaded by our users. In order to achieve a nice response time, we want to split the PDF processing work across separate nodes, adding nodes when more processing resources are needed.

We expect to use IQueue as a queue of the PDF processing jobs we want to execute. We will also use MultiMap as a repository for the processed data from each PDF document.
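Roughly, this is the shape we have in mind (a rough sketch against the Hazelcast 3.x API; the names "pdf-jobs" and "pdf-results" and the S3-style job key are just placeholders):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IQueue;
    import com.hazelcast.core.MultiMap;

    public class PdfJobSketch {
        public static void main(String[] args) throws InterruptedException {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();

            // Producer side: enqueue a job identifier for each uploaded PDF.
            IQueue<String> jobs = hz.getQueue("pdf-jobs");
            jobs.put("s3://bucket/invoices/doc-001.pdf");

            // Worker side: take a job, process it, and store the extracted
            // data per document in the MultiMap.
            MultiMap<String, String> results = hz.getMultiMap("pdf-results");
            String jobKey = jobs.take();
            results.put(jobKey, "page-1: extracted text ...");

            hz.shutdown();
        }
    }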

Our question is: is there an easy way to keep these PDF files persisted in memory in the Hazelcast cluster?

By bringing these files into memory we want to reduce the I/O time we face when consuming them from AWS S3. We also considered HDFS, but we want to avoid maintaining another tech stack just to fill this gap.

Regards,
Miere

Miere Teixeira

Nov 17, 2014, 10:36:13 AM
to haze...@googlegroups.com, miere.t...@gmail.com
Hi...

Does anybody have any clues?
Is this the right place to ask this kind of question?

Please feel free to correct me if this is the wrong place to ask this kind of question. I've been following this group for about a year, but this is my first participation since then.

Regards

Enes Akar

Nov 19, 2014, 6:08:52 AM
to haze...@googlegroups.com, miere.t...@gmail.com
Hi Miere;

This is the correct place to ask questions to the Hazelcast community.

It is possible, but storing files in the Hazelcast data grid is not a common use case. You should create a wrapper object (which may implement DataSerializable) that properly serializes and deserializes your PDF files.
Because of the serialization cost and memory overhead, I think it is best to keep the metadata in memory (Hazelcast) and the files in persistent storage.
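
A minimal sketch of such a wrapper might look like this (assuming the Hazelcast 3.x serialization API; the class and field names are made up for illustration):

    import com.hazelcast.nio.ObjectDataInput;
    import com.hazelcast.nio.ObjectDataOutput;
    import com.hazelcast.nio.serialization.DataSerializable;

    import java.io.IOException;

    // Hypothetical wrapper holding the raw bytes of one PDF plus minimal metadata.
    public class PdfDocument implements DataSerializable {

        private String fileName;
        private byte[] content;

        // Hazelcast instantiates the class reflectively, so a no-arg
        // constructor is required for deserialization.
        public PdfDocument() {
        }

        public PdfDocument(String fileName, byte[] content) {
            this.fileName = fileName;
            this.content = content;
        }

        @Override
        public void writeData(ObjectDataOutput out) throws IOException {
            out.writeUTF(fileName);
            out.writeByteArray(content); // length prefix followed by the raw bytes
        }

        @Override
        public void readData(ObjectDataInput in) throws IOException {
            fileName = in.readUTF();
            content = in.readByteArray();
        }
    }

Note that every put and get of such an object pays the full cost of serializing and copying the whole byte array, which is exactly the overhead mentioned above.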

 

--

Enes Akar
Director of Engineering
Mahir İz Cad. No:35, Altunizade, İstanbul

Miere Teixeira

Nov 21, 2014, 7:41:23 AM
to haze...@googlegroups.com, miere.t...@gmail.com
Hi Enes,

Thank you for your clarification. I was expecting to achieve something like this.
I know this is an unusual approach, but I'm facing terrible latencies when fetching the files I need to process from Amazon S3.

When I asked the question, I didn't know exactly what the implications of persisting byte arrays or BLOBs in HZ could be.
As you pointed out, the serialization cost and memory overhead could lead to awful issues. Is there any workaround to deal with this if I insist on the initial approach?

Regards.