Hi,
(I looked in various places for this information but could not find anything that answers the question. It's not completely ruled out that I missed some places, though :))
I use the PB Erlang interface to access the database. Given a bucket name and a key, the value can easily be extracted using:
{ok, Object} = riakc_pb_socket:get(Conn, Bucket, Key),
Value = riakc_obj:get_value(Object)
Alternatively, a mapred (actually, just map) request could be issued:
{ok, [{_, Value}]} = riakc_pb_socket:mapred(Conn, [
    {Bucket, Key}
], [
    {map, {modfun, riak_kv, map_object_value}, none, true}
])
I would expect the result to be the same, while in the second case the amount of data transferred to the client is smaller (which might be good in certain situations).
So the [open] question is: are there any reasons for using the first approach over the second?
--
Misha
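For readers who want to compare the two calls side by side, here is a minimal sketch that wraps both in one module. It assumes the riak-erlang-client (`riakc`) is on the code path and that `Conn` is an already opened `riakc_pb_socket` connection. The module name `kv_fetch` and its function names are made up for illustration. Two points are assumptions rather than anything stated in the post: that the stock map function is exported by `riak_kv_mapreduce` (the post writes `riak_kv`), and that the PB client returns map results as a list of `{PhaseIndex, Results}` tuples, so the value arrives wrapped in a list.

```erlang
-module(kv_fetch).
-export([via_get/3, via_map/3, unwrap/1]).

%% Approach 1: plain get. The whole object (value plus metadata) is sent
%% to the client, and the value is extracted locally.
via_get(Conn, Bucket, Key) ->
    case riakc_pb_socket:get(Conn, Bucket, Key) of
        {ok, Object} -> {ok, riakc_obj:get_value(Object)};
        {error, _} = Error -> Error
    end.

%% Approach 2: a single map phase over one explicit {Bucket, Key} input.
%% Only the value travels back to the client.
%% ASSUMPTION: map_object_value is exported by riak_kv_mapreduce.
via_map(Conn, Bucket, Key) ->
    Query = [{map, {modfun, riak_kv_mapreduce, map_object_value}, none, true}],
    unwrap(riakc_pb_socket:mapred(Conn, [{Bucket, Key}], Query)).

%% ASSUMPTION: a one-phase result looks like {ok, [{PhaseIndex, [Value]}]}.
%% Anything else is reported rather than guessed at.
unwrap({ok, [{_Phase, [Value]}]}) -> {ok, Value};
unwrap({ok, Other}) -> {error, {unexpected_result, Other}};
unwrap({error, _} = Error) -> Error.
```

With a connection in hand, `kv_fetch:via_get(Conn, <<"bucket">>, <<"key">>)` and `kv_fetch:via_map(Conn, <<"bucket">>, <<"key">>)` should both yield `{ok, Value}` for an object with a single value, if the assumptions above hold.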
MapReduce (or simply a Map) gets really slow when the database holds a significant amount of data (or is distributed over several servers). A Get, by contrast, is always faster, as Riak doesn't have to search for the key (you tell Riak exactly where to GET the data in your URL).
Rohman
On Thu, 28 Jul 2011 23:43:06 +0400, m...@mawhrin.net wrote:
> [...]
> Alternatively, a mapred (actually, just map) request could be issued:
> {ok, [{_, Value}]} = riakc_pb_socket:mapred(Conn, [{Bucket, Key}], [{map, {modfun, riak_kv, map_object_value}, none, true}])
> [...]
--
Antonio Rohman Fernandez
CEO, Founder & Lead Engineer
roh...@mahalostudio.com
Projects: MaruBatsu.es, PupCloud.com, Wedding Album
I would have suspected that an MR job where you supply a Bucket, Key pair would be just as fast as a Get request. Shows what I know.
---
Jeremiah Peschka
Founder, Brent Ozar PLF, LLC
On Jul 29, 2011, at 1:37 AM, Antonio Rohman Fernandez wrote:
> [...]
> _______________________________________________
> riak-users mailing list
> riak-us...@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
And it's a bit ironic that having data spread over more servers results in slower performance. Usually more servers = greater performance. ...black is white, up is down...
Jeremiah,
You were essentially correct. A "targeted" MR does not have to search for the data, and does not slow down with database size. It is a bucket-sweeping MR that currently has that behavior.
-Justin
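To make the distinction concrete, here is a rough sketch of the two kinds of input. It assumes the riak-erlang-client (`riakc`) is available and `Conn` is an open `riakc_pb_socket` connection. The module name `mr_scope` and its function names are invented for this example, and the use of `riak_kv_mapreduce` as the home of `map_object_value` is an assumption. The targeted job goes through `riakc_pb_socket:mapred/3` with explicit pairs, while the sweep goes through `riakc_pb_socket:mapred_bucket/3` with only a bucket name.

```erlang
-module(mr_scope).
-export([targeted_inputs/2, value_phase/0, targeted/3, sweep/2]).

%% A targeted job names every object up front as {Bucket, Key} pairs,
%% so there is nothing to search for.
targeted_inputs(Bucket, Keys) ->
    [{Bucket, Key} || Key <- Keys].

%% One map phase that returns each object's value and keeps the result.
%% ASSUMPTION: map_object_value is exported by riak_kv_mapreduce.
value_phase() ->
    [{map, {modfun, riak_kv_mapreduce, map_object_value}, none, true}].

%% Targeted MR: the inputs are an explicit list of pairs.
targeted(Conn, Bucket, Keys) ->
    riakc_pb_socket:mapred(Conn, targeted_inputs(Bucket, Keys), value_phase()).

%% Bucket-sweeping MR: only the bucket is given, so the keys have to be
%% discovered first. This is the form described as slowing down with size.
sweep(Conn, Bucket) ->
    riakc_pb_socket:mapred_bucket(Conn, Bucket, value_phase()).
```

The only difference between the two calls is the shape of the input, which is why a job over a known key can stay cheap while a job over a whole bucket does not.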
On Fri, Jul 29, 2011 at 10:27 AM, Jeremiah Peschka <jeremiah.pesc...@gmail.com> wrote:
> [...]
1) MapReduce amounts to N=1, or reading only one replica. If you have divergent replicas (siblings, e.g.) on different nodes, they might not appear in your MapReduce results.
2) MapReduce does not invoke read-repair, so divergent replicas will not converge.
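For contrast, a get can ask for more than one replica and can show siblings to the caller. The following is only a sketch under assumptions: the riak-erlang-client (`riakc`) is available, `Conn` is an open `riakc_pb_socket` connection, and the read quorum is passed as an `{r, R}` option to `riakc_pb_socket:get/4`. The module name `quorum_read` and the return tags are hypothetical.

```erlang
-module(quorum_read).
-export([read/4, classify/1]).

%% Read with an explicit R value, so more than one replica is consulted,
%% and report siblings instead of silently picking one value.
read(Conn, Bucket, Key, R) ->
    case riakc_pb_socket:get(Conn, Bucket, Key, [{r, R}]) of
        {ok, Object} -> classify(riakc_obj:get_values(Object));
        {error, _} = Error -> Error
    end.

%% Turn the list of values held by an object into a tagged result.
classify([]) -> {error, no_value};
classify([Value]) -> {ok, Value};
classify(Values) when is_list(Values) -> {siblings, Values}.
```

A caller that receives `{siblings, Values}` can then resolve the conflict and write the merged value back, which is not possible when only a map result from a single replica is seen.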
On Fri, Jul 29, 2011 at 1:30 PM, Justin Sheehy <jus...@basho.com> wrote:
> [...]