What is the best way to connect Redis and HDFS


JUNHEE PARK

Dec 9, 2017, 5:54:57 PM
to Redis DB
We are considering using Redis as the primary database and using HDFS to store the sensor's measurement data.
I want to build the service with HDFS (accessed via WebHDFS from Node.js) and Redis (accessed via a Redis client in Node.js).
Or would you recommend moving the business logic from Node.js to a Java framework such as Spring with Jedis?

Both options are likely to be similar in extensibility, but I wonder how the service processing speed and pipelining speed will differ.

Stefano Fratini

Dec 9, 2017, 11:27:45 PM
to Redis DB
Hey Junhee

I am not sure about the performance of using HDFS (an append only filesystem) to support the persistence needs of Redis...

Re: to store the sensor's measurement data
Redis and IoT go hand in hand, good choice

Re: Or would you recommend moving the business logic from Node.js to a Java framework such as Spring with Jedis?
You get more performance from Java, but do you need it? If all you need is to store data measurements in Redis, Node.js + ioredis (client library) works flawlessly for this kind of stuff.
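
To make the Node.js + ioredis suggestion concrete, here is a minimal sketch of storing measurements in one sorted set per sensor, with the timestamp as the score. It is an illustration, not the approach from the talk. The key layout (`sensor:<id>:measurements`), the `Measurement` shape, and the helper names are all hypothetical. The narrow `SortedSetClient` interface is modeled on the `zadd` and `zrangebyscore` commands that ioredis exposes, and the in-memory class is only a stand-in so the sketch runs without a Redis server.

```typescript
// Minimal subset of Redis sorted-set commands used by this sketch.
// Assumption: an ioredis client can be passed wherever this interface is expected.
interface SortedSetClient {
  zadd(key: string, score: number, member: string): Promise<number>;
  zrangebyscore(key: string, min: number, max: number): Promise<string[]>;
}

// Hypothetical shape of one sensor reading.
interface Measurement {
  sensorId: string;
  timestampMs: number;
  value: number;
}

// Hypothetical key layout: one sorted set per sensor.
function sensorKey(sensorId: string): string {
  return `sensor:${sensorId}:measurements`;
}

// Store one reading: score = timestamp, member = JSON payload.
// The timestamp is part of the payload so equal values at different times stay distinct members.
async function recordMeasurement(client: SortedSetClient, m: Measurement): Promise<void> {
  const member = JSON.stringify({ t: m.timestampMs, v: m.value });
  await client.zadd(sensorKey(m.sensorId), m.timestampMs, member);
}

// Read all readings of one sensor with fromMs <= timestamp <= toMs, oldest first.
async function readMeasurements(
  client: SortedSetClient,
  sensorId: string,
  fromMs: number,
  toMs: number
): Promise<Measurement[]> {
  const members = await client.zrangebyscore(sensorKey(sensorId), fromMs, toMs);
  return members.map((raw) => {
    const parsed = JSON.parse(raw) as { t: number; v: number };
    return { sensorId, timestampMs: parsed.t, value: parsed.v };
  });
}

// In-memory stand-in for Redis, for local experiments only.
// It orders by score but does not reproduce Redis tie-breaking by member.
class InMemorySortedSets implements SortedSetClient {
  private sets = new Map<string, Map<string, number>>();

  async zadd(key: string, score: number, member: string): Promise<number> {
    let set = this.sets.get(key);
    if (set === undefined) {
      set = new Map<string, number>();
      this.sets.set(key, set);
    }
    const added = set.has(member) ? 0 : 1;
    set.set(member, score);
    return added;
  }

  async zrangebyscore(key: string, min: number, max: number): Promise<string[]> {
    const set = this.sets.get(key);
    if (set === undefined) return [];
    return Array.from(set.entries())
      .filter(([, score]) => score >= min && score <= max)
      .sort((a, b) => a[1] - b[1])
      .map(([member]) => member);
  }
}
```

With a real server, the stand-in would be replaced by a client created with `new Redis()` from ioredis; the two helper functions stay the same.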

Have a look at this video (a presentation from RedisConf17. Disclaimer: I am the guy talking in the video)

Stefano

hva...@gmail.com

Dec 11, 2017, 2:21:01 PM
to Redis DB

I think there's a misunderstanding here.

Redis does not store data on a disk filesystem.  Redis keeps its data in RAM and reads/writes its data to RAM.  The only thing Redis uses a disk filesystem for is to save a backup of data in case of a crash (called "persistence" because the data isn't lost in a crash).  Having Redis save to HDFS, which is a disk filesystem designed to make data searchable by Hadoop queries, isn't useful.  Redis doesn't search for data on disk, only in RAM.

JUNHEE PARK

Jan 7, 2018, 10:49:17 PM
to Redis DB
Thank you for the good reply.

I know Redis stores data in RAM.

But RAM is expensive hardware.
If I store all of the sensor data in Redis, it will cost a lot.

So I think user data and real-time data should be stored in Redis, and historical data should be stored in another disk-based store such as Hadoop or MongoDB.

The historical data will be used occasionally for user requests such as research.

In this situation, I am not sure which is better, Node.js or Java.

On Monday, December 11, 2017 at 11:21:01 AM UTC-8, hva...@gmail.com wrote:

JUNHEE PARK

Jan 7, 2018, 11:06:44 PM
to Redis DB
Thank you for the good reply.

I will watch your conference talk.

Never mind about HDFS; I really appreciate the good suggestion (ioredis).

Could you also let me know how you store large data such as raw time-series data?

I am worried about the cost of RAM for the large amount of raw data needed for research or machine learning.

Best,
Jack

On Saturday, December 9, 2017 at 8:27:45 PM UTC-8, Stefano Fratini wrote:

JUNHEE PARK

Jan 8, 2018, 3:34:55 AM
to Redis DB
Oh, you used the S3 service!


On Saturday, December 9, 2017 at 8:27:45 PM UTC-8, Stefano Fratini wrote:

CharSyam

Jan 8, 2018, 4:10:47 AM
to redi...@googlegroups.com
I'm sorry, I don't understand your question. I just think that if you use Redis on disk, there will be a performance gap compared to using memory.

On Monday, January 8, 2018, JUNHEE PARK <j.job...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+unsubscribe@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at https://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

JUNHEE PARK

Jan 8, 2018, 1:19:14 PM
to Redis DB
Thank you for the reply.
Sorry for the unclear question.

To summarize:

I want to use Redis together with disk storage (Hadoop, MongoDB, or something similar). The reason is that storing all the data in Redis is too expensive.

1. I wonder whether it would be better to build the web server with Java or with Node.js in this situation.
(I think the answer may depend on which disk storage is chosen.)

2. I am going to use either Hadoop or MongoDB as disk storage for historical time-series measurement data. If you have experience building a system with either of these, I would like a recommendation on what worked well.

Best,
Jack


On Monday, January 8, 2018 at 1:10:47 AM UTC-8, CharSyam wrote:

CharSyam

Jan 8, 2018, 8:51:06 PM
to redi...@googlegroups.com
In the attached video,
I think he didn't query S3 directly; perhaps the S3 file name serves as a kind of index.

For example (in the video):

He divided the dataset by day
and dumped each day's data into S3. So a query on historical data can
probably only be handled at one-day granularity (not by hour or minute).
Recent data can be queried by timestamp, but historical data cannot.
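
The "file name as index" idea can be sketched as follows. This is a guess at the general technique, not the layout used in the video. The prefix, the `<prefix>/<sensorId>/<YYYY-MM-DD>.json` naming scheme, and both function names are hypothetical. The point is that a time range maps directly to a list of object names, so no search inside S3 is needed, but the finest granularity is one whole day.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical naming scheme: one object per sensor per UTC day.
function dayObjectKey(prefix: string, sensorId: string, timestampMs: number): string {
  // toISOString() is always UTC, so the first 10 characters are the UTC date.
  const day = new Date(timestampMs).toISOString().slice(0, 10);
  return `${prefix}/${sensorId}/${day}.json`;
}

// List the object names that cover the range fromMs..toMs (inclusive).
// Each name covers a whole day, so filtering by hour or minute
// would have to happen after the objects are downloaded.
function objectKeysForRange(
  prefix: string,
  sensorId: string,
  fromMs: number,
  toMs: number
): string[] {
  const keys: string[] = [];
  // Round the start down to UTC midnight, then step one day at a time.
  for (let dayStart = Math.floor(fromMs / DAY_MS) * DAY_MS; dayStart <= toMs; dayStart += DAY_MS) {
    keys.push(dayObjectKey(prefix, sensorId, dayStart));
  }
  return keys;
}
```

For a range that starts late on one day and ends early two days later, `objectKeysForRange` returns three object names, one per day touched by the range.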

hva...@gmail.com

Jan 8, 2018, 9:32:16 PM
to Redis DB
This discussion thread has already shown that Redis does not use disk storage.

So it sounds like you are talking about moving your time-series data through two different software systems:
  1. First your time-series data will be written into Redis for fastest access.  Since Redis has limited space, the data will be deleted from Redis after a period of time.
  2. Next, your time-series data will be written into other software (perhaps Hadoop or MongoDB) which is slower, but can use disk to keep the data much longer.
I don't have suggestions on which software is better for #2.  I think it depends on the queries you will use to search the older data.  For example, how fast the query must finish and how much data it must search through.

My suggestion is to avoid writing new data samples only to Redis and then, when they expire from Redis, writing them to the other software.  I believe it's better to write them to both Redis and the other software from the start.  When the same samples are in both, your query clients can still query Redis for fast answers from the data there.  However, they can also query the other software for the full data set, including data that's no longer in Redis.  Some queries need that.  If the data is divided, the query clients must run two queries and fit the two results together like puzzle pieces in order to get the final answer.
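
A minimal sketch of this write-to-both pattern, under stated assumptions: `SampleStore`, `MemoryStore`, and `DualWriteRepository` are hypothetical names, and both stores are in-memory stand-ins (the "hot" one playing the role of Redis, the "cold" one the role of Hadoop or MongoDB). The retention window and the rule "use the hot store only when the whole range is recent enough" are illustrative choices.

```typescript
interface Sample {
  timestampMs: number;
  value: number;
}

// Assumed common abstraction over the fast store and the disk-based store.
interface SampleStore {
  append(sensorId: string, sample: Sample): Promise<void>;
  range(sensorId: string, fromMs: number, toMs: number): Promise<Sample[]>;
}

// In-memory stand-in used for both roles in this sketch.
class MemoryStore implements SampleStore {
  private data = new Map<string, Sample[]>();

  async append(sensorId: string, sample: Sample): Promise<void> {
    const list = this.data.get(sensorId) || [];
    list.push(sample);
    this.data.set(sensorId, list);
  }

  async range(sensorId: string, fromMs: number, toMs: number): Promise<Sample[]> {
    const list = this.data.get(sensorId) || [];
    return list.filter((s) => s.timestampMs >= fromMs && s.timestampMs <= toMs);
  }

  // Delete samples older than cutoffMs; returns how many were removed.
  async trimBefore(sensorId: string, cutoffMs: number): Promise<number> {
    const list = this.data.get(sensorId) || [];
    const kept = list.filter((s) => s.timestampMs >= cutoffMs);
    this.data.set(sensorId, kept);
    return list.length - kept.length;
  }
}

class DualWriteRepository {
  private hot: SampleStore;
  private cold: SampleStore;
  private retentionMs: number;

  constructor(hot: SampleStore, cold: SampleStore, retentionMs: number) {
    this.hot = hot;
    this.cold = cold;
    this.retentionMs = retentionMs;
  }

  // Every sample goes to both stores from the start.
  async write(sensorId: string, sample: Sample): Promise<void> {
    await Promise.all([this.hot.append(sensorId, sample), this.cold.append(sensorId, sample)]);
  }

  // One query, one store: the hot store if the whole range is still retained there,
  // otherwise the cold store, which holds the full data set. No merging of results.
  async query(sensorId: string, fromMs: number, toMs: number, nowMs: number): Promise<Sample[]> {
    const oldestInHot = nowMs - this.retentionMs;
    const source = fromMs >= oldestInHot ? this.hot : this.cold;
    return source.range(sensorId, fromMs, toMs);
  }
}
```

The trade-off is one extra write per sample in exchange for never having to stitch results from two stores together.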

JUNHEE PARK

Feb 2, 2018, 9:59:37 PM
to Redis DB
Okay, I have just now seen your reply.

Now I understand.

I appreciate it.

On Monday, January 8, 2018 at 5:51:06 PM UTC-8, CharSyam wrote:

JUNHEE PARK

Feb 2, 2018, 10:01:28 PM
to Redis DB
Thank you for this reply.

Yes, but that poses a challenge for us: our goal is to reduce the I/O count.

On Monday, January 8, 2018 at 6:32:16 PM UTC-8, hva...@gmail.com wrote: