Database get returning a slice instead of a string

Ciprian Dorin Craciun

Nov 21, 2011, 7:16:40 AM
to Sanjay Ghemawat, Jeff Dean, lev...@googlegroups.com
Hello all!

First, congratulations on this nice piece of software.

I've studied the source a little and I have a question about
the public interface:
http://code.google.com/p/leveldb/source/browse/include/leveldb/db.h
~~~~
virtual Status Get(const ReadOptions& options,
                   const Slice& key, std::string* value) = 0;

virtual Status Put(const WriteOptions& options,
                   const Slice& key,
                   const Slice& value) = 0;
~~~~

Why does the `Get` method use `std::string*` instead of `Slice&`
as the value argument? (Iterators, for example, use slices and not
strings.)

Is this only a backward-compatibility artefact? Does it have
performance implications? (I guess that slices have a lower
performance cost than standard strings?)

And the second question: what is the impact of implementing `Get`
only by opening a new `Iterator`, seeking it, and destroying it? (From
your source code it seems that internally this is how `Get` is
implemented in the first place...)
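
For reference, a minimal sketch of the seek-based lookup asked about here. It uses a `std::map` as a stand-in for the database (an assumption, so that the sketch runs without LevelDB), and `lower_bound` plays the role of `Iterator::Seek`, which positions the iterator at the first entry whose key is at or past the target. The names `FakeDb` and `GetViaSeek` are hypothetical. The point it illustrates is that a seek alone is not a lookup: the key found still has to be compared with the key requested.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the database: an ordered key/value map (assumption, not LevelDB).
using FakeDb = std::map<std::string, std::string>;

// Hypothetical helper emulating "open iterator, seek, check key, read value".
// Returns true and fills *value only when the key matches exactly.
bool GetViaSeek(const FakeDb& db, const std::string& key, std::string* value) {
  // lower_bound: first entry with a key >= the requested key (like a seek).
  auto it = db.lower_bound(key);
  // The seek may land past the end or on a later key; both mean "not found".
  if (it == db.end() || it->first != key) return false;
  *value = it->second;
  return true;
}
```

With real iterators the same two checks would apply after the seek: that the iterator is still valid, and that its current key equals the requested one.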

Thanks,
Ciprian.

Jeff Dean

Nov 21, 2011, 12:35:47 PM
to Ciprian Dorin Craciun, Sanjay Ghemawat, lev...@googlegroups.com

Get is meant as a convenience operation, and in some circumstances
(especially those involving very large values), it may be faster to
use an Iterator directly.

The reason the Get operation returns its data in a std::string, rather
than as a Slice, is that the guarantees that could be provided about
the lifetime of the Slice would be very weak. By copying the actual
value into its own backing store in a std::string object, we don't
require that the value inside the database representation remain live
after the call to Get returns. When you create an iterator, it
ensures that the configuration of the database at iterator
construction time remains live (through doing various kinds of
reference counting), and so the iterator interface can be in terms of
Slices when returning values, while the simpler Get interface cannot.
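
A small sketch of the lifetime issue described above, under stated assumptions: `View` is a hypothetical stand-in for a slice (a pointer plus a length that owns nothing), and a `std::vector<char>` stands in for the database's internal storage. It only shows the difference between borrowing bytes and copying them; it does not reproduce how LevelDB manages its storage.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a slice: pointer + length, owns no memory.
struct View {
  const char* data;
  std::size_t size;
  std::string ToString() const { return std::string(data, size); }
};

// What each holder observes after the underlying storage is reused.
struct Demo {
  std::string seen_by_view;
  std::string seen_by_copy;
};

Demo RunDemo() {
  std::vector<char> block = {'o', 'l', 'd'};  // stand-in for internal storage
  View view{block.data(), block.size()};      // borrows the bytes
  std::string copy = view.ToString();         // owns a private copy

  // Reuse the storage in place (same size, so the pointer stays valid).
  block[0] = 'n';
  block[1] = 'e';
  block[2] = 'w';

  return Demo{view.ToString(), copy};
}
```

The view reflects whatever the storage holds now, while the copy keeps the value as it was at read time. If the storage were freed rather than overwritten, the view would dangle, which is the weak guarantee referred to above.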

-Jeff

Ciprian Dorin Craciun

Nov 21, 2011, 1:26:38 PM
to Jeff Dean, Sanjay Ghemawat, lev...@googlegroups.com

So if I were to write a LevelDB binding for another programming
language, and I wanted to avoid the double memory copy and allocation,
could I use iterators for every get operation without overhead?

(I say double memory allocation because internally you enlarge
the passed `std::string` to the slice size (which is one memory
allocation), then you copy the data there. But when I get the string
back I can't pass it on; I need to allocate another buffer (by
using the programming language's interoperability API) and copy from
the string into it. Thus the intermediate string is just overhead.)


> The reason the Get operation returns its data in a std::string, rather
> than as a Slice, is that the guarantees that could be provided about
> the lifetime of the Slice would be very weak.  By copying the actual
> value into its own backing store in a std::string object, we don't
> require that the value inside the database representation remain live
> after the call to Get returns.

I agree with your design choice. But from an efficiency point of
view, I would have liked one of the following solutions to be
exported as well:
a) I give as input a slice for the value, whose size is set to the
backing buffer's size; if the data read fits in the buffer, you copy it
there and set the size to the number of bytes copied; if not, you
return an error;
b) I give as input an allocator that is called with the required
size and returns a slice for the value.
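
To make the two proposals concrete, here is a minimal sketch of what such entry points could look like. None of this is LevelDB API: `FakeStore` (a `std::map` standing in for the database), `GetResult`, `GetIntoBuffer`, `Allocator`, and `GetWithAllocator` are all hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <map>
#include <string>

// Stand-in for the database (assumption, not LevelDB).
using FakeStore = std::map<std::string, std::string>;

enum class GetResult { kOk, kNotFound, kBufferTooSmall };

// Proposal (a): the caller supplies a buffer; *size is its capacity on entry
// and the number of bytes written on success.
GetResult GetIntoBuffer(const FakeStore& db, const std::string& key,
                        char* buf, std::size_t* size) {
  auto it = db.find(key);
  if (it == db.end()) return GetResult::kNotFound;
  const std::string& v = it->second;
  if (v.size() > *size) return GetResult::kBufferTooSmall;
  if (!v.empty()) std::memcpy(buf, v.data(), v.size());
  *size = v.size();
  return GetResult::kOk;
}

// Proposal (b): the caller supplies an allocator that is asked for exactly
// the number of bytes needed; the value is copied once, into that memory.
using Allocator = std::function<char*(std::size_t)>;

GetResult GetWithAllocator(const FakeStore& db, const std::string& key,
                           const Allocator& alloc,
                           char** out, std::size_t* out_size) {
  auto it = db.find(key);
  if (it == db.end()) return GetResult::kNotFound;
  const std::string& v = it->second;
  char* dst = alloc(v.size());
  if (!v.empty()) std::memcpy(dst, v.data(), v.size());
  *out = dst;
  *out_size = v.size();
  return GetResult::kOk;
}
```

In both variants the value is copied exactly once, directly into memory the binding controls, which is the saving over going through an intermediate `std::string`.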


> When you create an iterator, it
> ensures that the configuration of the database at iterator
> construction time remains live (through doing various kinds of
> reference counting), and so the iterator interface can be in terms of
> Slices when returning values, while the simpler Get interface cannot.
>
>  -Jeff

Thanks for the input. For now I'll use the iterator method.

One observation: I really like the BerkeleyDB approach: it allows
you to specify an allocator / deallocator per database. You could do
something similar but place it in the options. Then, as in my solution
above, you would allocate a new slice using the allocator and just
return it.

Ciprian.

tao

Nov 26, 2014, 6:21:08 AM
to lev...@googlegroups.com, san...@google.com, je...@google.com, ciprian...@gmail.com
In the case below:

key = "123"
value = "abc \0 cdef"

how do I call Get() to get the value of key "123"?
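
A hedged note on this case: since `Get` fills a `std::string`, and a `std::string` carries its own length, an embedded `\0` should not by itself truncate the value; the usual pitfall is on the caller's side, when a value is built from or read back as a C string. The sketch below shows only that standard C++ behaviour, with hypothetical helper names `MakeValue` and `MakeTruncated`, and does not call LevelDB.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds the full 10-byte value by passing the length explicitly.
std::string MakeValue() {
  static const char kRaw[] = "abc \0 cdef";
  // sizeof includes the trailing terminator, so subtract one.
  return std::string(kRaw, sizeof(kRaw) - 1);
}

// Builds from a C string: construction stops at the first NUL byte.
std::string MakeTruncated() {
  return std::string("abc \0 cdef");
}
```

On the reading side, using `value.data()` together with `value.size()` (rather than treating `value.c_str()` as a NUL-terminated string) keeps the bytes after the `\0`.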