Machine learning processing on Redis - Perfomance
From: Vinicius Melo <vinicius...@gmail.com>
Date: Sat, 28 Jul 2012 16:39:58 -0700 (PDT)
Local: Sat, Jul 28 2012 7:39 pm
Subject: Machine learning processing on Redis - Perfomance

Hello Developers and Team from Redis,

We are building a product that will be a free social platform intended for
knowledge exchange.

We have used the following databases together to deal with our problems:

- Users - DynamoDB
- Content and Search - ElasticSearch (lucene)
- Complicated machine learning processing, and custom algorithms - Redis

What do you think about it? Which problems could we have with performance,
scalability and availability?

We will contribute to Redis by creating a lot of free tutorials and wiki
pages on our platform when we launch. If you are interested, please join our
community: http://www.guchex.com

Thanks,

Vinicius Melo


 
From: "M. Edward (Ed) Borasky" <zn...@znmeb.net>
Date: Sun, 29 Jul 2012 13:58:19 -0700
Local: Sun, Jul 29 2012 4:58 pm
Subject: Re: Machine learning processing on Redis - Perfomance

On Sat, Jul 28, 2012 at 4:39 PM, Vinicius Melo <vinicius...@gmail.com> wrote:
> Hello Developers and Team from Redis,

> We are building a product that will be a free social platform intended for
> knowledge exchange.

> We have used the following databases together to deal with our problems:

> - Users - DynamoDB
> - Content and Search - ElasticSearch (lucene)
> - Complicated machine learning processing, and custom algorithms - Redis

> What do you think about it? Which problems could we have with performance,
> scalability and availability?

If your product / service is free as in zero cost to use, you will
have all sorts of problems. I'd rethink the business model before
worrying about the technical aspects. How will you earn revenue to
support the efforts?

--
Twitter: http://twitter.com/znmeb Computational Journalism Studio
http://j.mp/CompJournStudio

Data is the new coal - abundant, dirty and difficult to mine.


 
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Sun, 29 Jul 2012 15:57:09 -0700
Local: Sun, Jul 29 2012 6:57 pm
Subject: Re: Machine learning processing on Redis - Perfomance

On Sat, Jul 28, 2012 at 4:39 PM, Vinicius Melo <vinicius...@gmail.com> wrote:
> Hello Developers and Team from Redis,

> We are building a product that will be a free social platform intended for
> knowledge exchange.

Like Quora? Stack Exchange? Facebook questions? Yahoo answers? Reddit AMA? ...?

> We have used the following databases together to deal with our problems:

So it's already implemented, and you're asking our advice after it's done?

> - Users - DynamoDB

It doesn't matter where you store your user database, as long as it's
in a database. You'd also be fine with PostgreSQL, MySQL, or any other
database that can store data on a disk somewhere for subsequent
reading, atomic writes/updates, etc.

> - Content and Search - ElasticSearch (lucene)

Unless Amazon messed this up severely, this will probably work fine.

> - Complicated machine learning processing, and custom algorithms - Redis

> What do you think about it? Which problems could we have with performance,
> scalability and availability?

Redis won't offer you much for machine learning. If you're looking to
gather statistics, calculate co-visitation, ..., Redis would work
fine.

But if you're looking to perform "complicated machine learning
processing", then Redis is not the tool for you. Most machine learning
techniques rely on large matrix multiplication and/or linear
optimization, neither of which can be done efficiently with Redis.
With Redis, you are reading/writing data with a round-trip to a remote
server, which means reading/writing 100k-1M items/second (or 25k-250k
from a single client) against a single server. You won't get any
optimized algorithms for free, which means that you will not be doing
anything "complicated" with any volume of real data.

You are better off using one of the available libraries in your
language of choice, or implementing them yourself, which will let you
read/write 1B+ items/second (main memory is so much faster than a
network roundtrip), use optimized algorithms (improving the big-O
runtime), and could let you use pre-existing known-good implementations
of these algorithms.
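
To put those rates side by side, here is a back-of-the-envelope sketch. The throughput figures are the ones quoted above; the 1000 x 1000 matrix size and the naive multiplication are assumptions picked only for illustration, not a benchmark.

```python
# Rough cost of touching every operand of a naive n x n matrix multiply.
# Rates are the figures quoted above; n is a hypothetical size.

def seconds_to_touch(items: int, items_per_second: float) -> float:
    """Time needed to read `items` values at a sustained rate."""
    return items / items_per_second

n = 1000                             # assumed matrix dimension
operand_reads = 2 * n ** 3           # two operand reads per multiply-add

REDIS_SINGLE_CLIENT = 250_000        # upper end of "25k-250k from a single client"
IN_PROCESS_MEMORY = 1_000_000_000    # "1B+ items/second"

over_network = seconds_to_touch(operand_reads, REDIS_SINGLE_CLIENT)
in_memory = seconds_to_touch(operand_reads, IN_PROCESS_MEMORY)

print(f"via Redis round trips: {over_network:,.0f} s")   # 8,000 s
print(f"in process memory:     {in_memory:,.0f} s")      # 2 s
```

Under these assumptions the gap is a factor of 4000 before any smarter algorithm is even considered.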

Regards,
 - Josiah


 
From: "M. Edward (Ed) Borasky" <zn...@znmeb.net>
Date: Sun, 29 Jul 2012 19:28:28 -0700
Local: Sun, Jul 29 2012 10:28 pm
Subject: Re: Machine learning processing on Redis - Perfomance

"Out-of-core" linear algebra is expensive, which is why I brought up
the revenue issue. Pretty much the only game in town for packaged
large-scale number-crunching that doesn't involve writing a lot of
code and doesn't cost an arm and a leg is Mahout. Just about
everything else is either proprietary or falls on its hiney once it
goes beyond the capacity of a single machine's RAM and CPU.



 
From: Vinicius Melo <vinicius...@gmail.com>
Date: Sun, 29 Jul 2012 23:40:21 -0300
Local: Sun, Jul 29 2012 10:40 pm
Subject: Re: Machine learning processing on Redis - Perfomance
We decided to start by reducing the complexity of the process for
now. Our recommendation system and n-clustering algorithms, which use
only Redis set commands, seem to be enough for the moment. We expect
a lot of scalability problems with this approach, though, so we
already have some engineers working with Mahout.
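
The post does not say which set commands are involved, so the following is only a hypothetical illustration of a set-based recommender. Python sets stand in for Redis sets (the intersection, union and difference correspond to what SINTER, SUNION and SDIFF compute), and the user names and items are made up.

```python
# Python sets stand in for Redis sets; all names and items are invented.
likes = {
    "alice": {"redis", "python", "hadoop"},
    "bob": {"redis", "python", "lua"},
    "carol": {"cooking", "travel"},
}

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user: str, k: int = 3) -> list:
    """Suggest items liked by users whose sets overlap with `user`'s."""
    mine = likes[user]
    scores = {}
    for other, theirs in likes.items():
        if other == user:
            continue
        weight = jaccard(mine, theirs)
        for item in theirs - mine:          # items the user does not have yet
            scores[item] = scores.get(item, 0.0) + weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked if score > 0][:k]

print(recommend("alice"))   # ['lua']
```

Every pair of users is compared here, which is the part that would stop scaling as the user base grows.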

Our service is exactly a mix of the features of the social networks
you mentioned (plus Tumblr), but we will only accept content that
meets our guidelines, which cover anything related to knowledge. Of
course, programming will be the most popular topic =)

Thanks,
Vinicius Melo



 
From: Krishna Gade <kris...@twitter.com>
Date: Sun, 29 Jul 2012 21:07:24 -0700
Local: Mon, Jul 30 2012 12:07 am
Subject: Re: Machine learning processing on Redis - Perfomance

If you're looking for realtime data processing, take a look at
Storm <http://engineering.twitter.com/2011/08/storm-is-coming-more-details-a...>
(open-sourced by Twitter). It lets you run online map-reduce and other
data processing algorithms. It also works well with Redis, since your
computations can use Redis to store the maps on each of the Storm nodes.

For batch-processing-style apps, Mahout on top of Hadoop is the way to
go at the moment.


--
*krishna*

 
From: Dvir Volk <dvir...@gmail.com>
Date: Mon, 30 Jul 2012 10:50:01 +0300
Local: Mon, Jul 30 2012 3:50 am
Subject: Re: Machine learning processing on Redis - Perfomance

> Redis won't offer you much for machine learning. If you're looking to
> gather statistics, calculate co-visitation, ..., Redis would work
> fine.

> But if you're looking to perform "complicated machine learning
> processing", then Redis is not the tool for you. Most machine learning

For probabilistic models Redis actually works rather well. Sorted sets and
hashes can model frequency tables and sparse feature vectors pretty
efficiently.
Even for more complex stuff, storing the end result in Redis for quick
querying should work fine.
Real-time counting of events is also rather fast in Redis, provided
you have the right strategy to scale Redis beyond your RAM limitations.
A few examples of stuff I've done with Redis in the past 2 years:

1. Wikipedia-based Bayesian classification of texts. Redis was used both to
reduce Wikipedia to (word -> {count(class1), count(class2), ...})
vectors, and to query them in real time.

2. Query-log-based "adult filtering" for queries, both training and
querying.

3. Trending-topic detection from RSS news feeds and other sources.

4. Adaptive A/B testing.

and more...
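
As a rough illustration of item 1, the word -> {class: count} layout maps naturally onto one hash per word. The sketch below is an assumption about how such a classifier could look, not the code that was actually used. `HashStore` is a hypothetical in-memory stand-in that only mimics the shape of redis-py's `hincrby`, `hgetall` and `hlen`, so it runs without a server; the key names and training sentences are invented.

```python
import math

class HashStore:
    """Hypothetical in-memory stand-in for a Redis client (hashes only)."""

    def __init__(self):
        self._hashes = {}

    def hincrby(self, name, key, amount=1):
        bucket = self._hashes.setdefault(name, {})
        bucket[key] = bucket.get(key, 0) + amount
        return bucket[key]

    def hgetall(self, name):
        return dict(self._hashes.get(name, {}))

    def hlen(self, name):
        return len(self._hashes.get(name, {}))

def train(store, label, text):
    """Fold one labelled document into the per-word class counts."""
    for word in text.lower().split():
        store.hincrby("word:" + word, label, 1)   # word -> {class: count}
        store.hincrby("class_words", label, 1)    # total words seen per class
        store.hincrby("vocab", word, 1)           # tracks vocabulary size
    store.hincrby("class_docs", label, 1)         # documents seen per class

def classify(store, text):
    """Return the class with the highest Laplace-smoothed log-probability."""
    docs = store.hgetall("class_docs")
    words_per_class = store.hgetall("class_words")
    vocab_size = store.hlen("vocab")
    total_docs = sum(docs.values())
    scores = {label: math.log(n / total_docs) for label, n in docs.items()}
    for word in text.lower().split():
        counts = store.hgetall("word:" + word)    # one lookup per word
        for label in scores:
            seen = counts.get(label, 0) + 1
            total = words_per_class.get(label, 0) + vocab_size
            scores[label] += math.log(seen / total)
    return max(scores, key=scores.get)

store = HashStore()
train(store, "databases", "redis stores keys and values in memory")
train(store, "cooking", "simmer the sauce and add salt")
print(classify(store, "redis keys in memory"))   # databases
```

Against a real server the values returned by `hgetall` would come back as strings or bytes and need converting to integers first.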

I agree that it has its limitations, but it's pretty damn powerful for a
lot of common uses.

Have you seen the new Redis-based map-reduce framework I linked here a
couple of days ago?


 
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Mon, 30 Jul 2012 01:27:11 -0700
Local: Mon, Jul 30 2012 4:27 am
Subject: Re: Machine learning processing on Redis - Perfomance

I've done all of those in the past except your #4, though I
don't consider any of them to be "complicated machine learning". I
suppose it's all a matter of opinion.

> I agree that it has its limitations, but it's pretty damn powerful for a lot
> of common uses.

I agree completely, though from what the OP was saying, they want it
for more than just the basics. Redis will work great for the basics,
and if you are okay with it running slow (in comparison to an in-core
library), you may even be happy about using it for more complicated
scenarios (beyond caching a result matrix from some
clustering/decomposition). But that they are already looking into
Mahout means that they're already looking towards solving bigger
problems.

> have you seen the new redis based map reduce framework I linked here a
> couple of days ago?

I did, though I've not had a reason to use it.

Regards,
 - Josiah


 
From: Dvir Volk <dvir...@gmail.com>
Date: Mon, 30 Jul 2012 11:50:33 +0300
Local: Mon, Jul 30 2012 4:50 am
Subject: Re: Machine learning processing on Redis - Perfomance

> clustering/decomposition). But that they are already looking into
> Mahout means that they're already looking towards solving bigger
> problems.

They are probably looking for unsupervised clustering algorithms. Again,
Redis can perfectly well store the results of map-reduce jobs for real-time
querying.

About it being slow compared to in-core stuff: scaling out is always a
trade-off. The advantage of being able to distribute my crunching jobs
while still using a simple single Redis (even if sharded) in the middle
wins back a lot of speed.

The way I usually work is to process small batches that reside in memory (for
example, breaking down N Wikipedia documents) and dump the aggregate to Redis
in a single pipelined query. This works rather well while keeping things simple
and distributed.
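
A minimal sketch of that batch-then-flush pattern, under stated assumptions: `FakeRedis` and `RecordingPipeline` are hypothetical stand-ins that copy the shape of redis-py's `pipeline()`, `hincrby()` and `execute()` so the example runs without a server, and the `word_counts` key and sample documents are invented.

```python
from collections import Counter

class RecordingPipeline:
    """Hypothetical stand-in for a redis-py pipeline: queue, then apply."""

    def __init__(self, client):
        self._client, self._queued = client, []

    def hincrby(self, name, key, amount=1):
        self._queued.append((name, key, amount))
        return self

    def execute(self):
        results = []
        for name, key, amount in self._queued:
            bucket = self._client.data.setdefault(name, {})
            bucket[key] = bucket.get(key, 0) + amount
            results.append(bucket[key])
        self._queued = []
        self._client.round_trips += 1      # the whole batch costs one trip
        return results

class FakeRedis:
    """Hypothetical in-memory client exposing only pipeline()."""

    def __init__(self):
        self.data, self.round_trips = {}, 0

    def pipeline(self, transaction=True):
        return RecordingPipeline(self)

def crunch_batch(documents):
    """Aggregate word counts for a small batch entirely in local memory."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts

def flush(client, counts, key="word_counts"):
    """Send the whole aggregate in one pipelined request."""
    pipe = client.pipeline(transaction=False)
    for word, n in counts.items():
        pipe.hincrby(key, word, n)
    pipe.execute()

client = FakeRedis()
flush(client, crunch_batch(["Redis is fast", "Redis is simple"]))
print(client.data["word_counts"], client.round_trips)
```

Because the increments are commutative, several workers can flush their own batches into the same key without coordinating.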

BTW, I'm wondering what can be done to exploit the new bitmap features for
learning algorithms. For example, IDF counts can be easily modeled on them
if the document set is either small or not very sparse.
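
One way the IDF idea could be sketched, assuming one bitmap per term with the document id as the bit offset: setting a bit marks "this document contains the term", and counting set bits gives the document frequency. `Bitmaps` is a hypothetical in-memory stand-in for the Redis SETBIT and BITCOUNT commands; the key names and documents are invented.

```python
import math

class Bitmaps:
    """Hypothetical in-memory stand-in for Redis SETBIT / BITCOUNT."""

    def __init__(self):
        self._bits = {}

    def setbit(self, key, offset, value):
        current = self._bits.get(key, 0)
        old = (current >> offset) & 1
        if value:
            self._bits[key] = current | (1 << offset)
        else:
            self._bits[key] = current & ~(1 << offset)
        return old                          # previous bit value

    def bitcount(self, key):
        return bin(self._bits.get(key, 0)).count("1")

def index_document(bitmaps, doc_id, text):
    """Mark doc_id in the bitmap of every distinct term in the text."""
    for term in set(text.lower().split()):
        bitmaps.setbit("term:" + term, doc_id, 1)

def idf(bitmaps, term, total_docs):
    """log(N / df), with df read as the number of set bits for the term."""
    df = bitmaps.bitcount("term:" + term)
    return math.log(total_docs / df) if df else 0.0

docs = ["redis is fast", "redis is simple", "hadoop is big"]
bitmaps = Bitmaps()
for doc_id, text in enumerate(docs):
    index_document(bitmaps, doc_id, text)

for term in ("redis", "is", "hadoop"):
    print(term, round(idf(bitmaps, term, len(docs)), 3))
```

Each term's bitmap has to reach as far as the highest document id that contains it, so a rare term in a huge collection still costs a long, mostly empty bitmap, which matches the caveat about small or dense document sets.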

> > have you seen the new redis based map reduce framework I linked here a
> > couple of days ago?

> I did, though I've not had a reason to use it.

Me neither, but it's on my radar as I'm exploring ways to provide a more
rigid framework for log crunching and learning, and I'm really not a Java
fan, so I have a natural bias against Hadoop and friends. Have you tried
Disco, BTW?

 