Sorting B+tree by value using Tokyo Cabinet in C

80 views
Skip to first unread message

petd...@yahoo.com

unread,
Apr 7, 2014, 1:09:04 PM4/7/14
to tokyocabi...@googlegroups.com

I'm implementing a b+tree using Tokyo Cabinet, but I'd like to know if it's possible to keep the values sorted. I know I can use a tcbdbsetcmpfunc to set the custom comparison function for the keys, but not sure about the values?

I ask this because most of the time I only need the first 1000 records assuming my values are sorted. Otherwise I will have to loop over millions of records sort them and get the first 1000, which can be slow. Feedback is welcome.

For instance:

#include <tcutil.h>
#include <tcbdb.h>
#include <stdbool.h>
#include <stdint.h>

struct foo {
    int one;
    double two;
    char *three;
};

// sort ascending by three field?
static int value_cmp(const char *aptr, int asiz, const char *bptr, int bsiz, void *op) {
    return 1;
}

int main() {
    TCBDB *db;
    db = tcbdbnew();
    struct foo *f;

    tcbdbsetcmpfunc(db, value_cmp, f); // sort by struct->three?

    // open the database
    if(!tcbdbopen(db, "struct.tcb", BDBOWRITER | BDBOCREAT)){
        fprintf(stderr, "open error");
    }

    f = malloc(sizeof(struct foo));
    f->one = 100;
    f->two = 1.1111;
    f->three = "Hello World";
    printf("put: %d\n", tcbdbput(db, "foo", 3, f, sizeof(struct foo)));

    f = malloc(sizeof(struct foo));
    f->one = 100;
    f->two = 1.1111;
    f->three = "Hello Planet";
    printf("put: %d\n", tcbdbput(db, "bar", 3, f, sizeof(struct foo)));

    char *key;
    BDBCUR *cursor;
    cursor = tcbdbcurnew(db);
    tcbdbcurfirst(cursor);
    while((key = tcbdbcurkey2(cursor))!=NULL) {
        struct foo *val;
        int size;
        val = tcbdbcurval(cursor, &size);
        printf("%s: one=%d\n", key, val->one);
        printf("%s: three=%f\n", key, val->three);
        tcbdbcurnext(cursor);
    }
    tcbdbdel(db);
    return 0;
}

Sven Hartrumpf

unread,
Apr 9, 2014, 6:47:23 AM4/9/14
to tokyocabi...@googlegroups.com
Hi.

Mon, 7 Apr 2014 10:09:04 -0700 (PDT), petdarrow wrote:
> I'm implementing a b+tree using Tokyo Cabinet, but I'd like to know if it's
> possible to keep the values sorted.

I doubt it.

> I ask this because most of the time I only need the first 1000 records
> assuming my values are sorted. Otherwise I will have to loop over millions
> of records sort them and get the first 1000, which can be slow. Feedback is
> welcome.

I would recommend to have a second tokyocabinet db, mapping
from values to keys. I expect that with this one indirection
you still get nice performance.

Sven

petd...@yahoo.com

unread,
Apr 10, 2014, 8:30:41 AM4/10/14
to tokyocabi...@googlegroups.com
Hi Sven,

Thanks for your response. So for each database have another one that
uses the value as a key and the key as a value, then, would it be possible
instead to use a struct as a key where one field is the key and another field
has the value?

- P

Isaac Tewolde

unread,
Apr 10, 2014, 8:37:44 AM4/10/14
to tokyocabi...@googlegroups.com
Another option is just creating a compound key like "key::val".  Since b-trees allow partial key matching, the whole thing will work without issue. You probably would not even need the whole value if your values are different enough.

I use a similar hack for a document store system I put together a few years back.

Isaac

--
You received this message because you are subscribed to the Google Groups "Tokyo Cabinet Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tokyocabinet-us...@googlegroups.com.
To post to this group, send email to tokyocabi...@googlegroups.com.
Visit this group at http://groups.google.com/group/tokyocabinet-users.
For more options, visit https://groups.google.com/d/optout.

Reply all
Reply to author
Forward
0 new messages