The real world

Watcher

Jul 15, 2010, 9:01:20 AM
to kumofs
Hi,
I've just discovered this project and I'm really interested
because it seems to fit my needs pretty closely.

Before I start testing, I'd like to ask a couple of questions first :
1) does every GET or CAS result in a disk seek+read or is there a
cache layer (either in kumofs or TC) ?
2) what's the biggest production site in existence : how many nodes,
how many replicas, how much data ?
3) would kumofs be a good match to store millions of small chunks of
data (<1.5 kB, for example map tiles), how well does TC work with
files that range in the tens of Gigabytes ?
4) what's the best way to have multiple TC files : have multiple
kumofs running in parallel or can you add that as a feature ? Sort of
a 'keyspace' kind of thing : in my case, I have multiple maps and I'd
rather store the tiles in different TC files just to be on the safe
side
5) is there a way to set the number of replicas ? what is the default
value ? Is there a guarantee that each replica will be on a different
server (unlike Riak) ?
6) what file system do you recommend for running kumofs & TC ? Would
SSD improve performance ?
7) would it be possible for you to implement a key-range query ?

Thanks!

FURUHASHI Sadayuki

Jul 16, 2010, 10:53:19 AM
to kum...@googlegroups.com, kumo...@googlegroups.com
Hi Watcher,

> 1) does every GET or CAS result in a disk seek+read or is there a
> cache layer (either in kumofs or TC) ?

No, TC relies on the kernel's buffer cache to reduce disk seeks.
No seeks occur when the database file is smaller than the available amount of memory.
Otherwise, many seeks will occur. It's said that TC is optimized for small/active data rather than big/archive data.
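
For reference, here is a minimal sketch of how a client reads and writes through kumo-gateway. kumofs speaks the memcached protocol, but the host/port (127.0.0.1:11211) and the python-memcached library below are my own assumptions, so adjust them to your setup:

import memcache

# Connect to kumo-gateway; it speaks the memcached text protocol.
mc = memcache.Client(['127.0.0.1:11211'])

# SET writes through kumo-server into TC; GET reads from TC, so it is
# served from the OS page cache when the database file fits in RAM,
# and causes disk seeks otherwise.
mc.set('tile:12/3624/1613', 'binary tile data here')
print(mc.get('tile:12/3624/1613'))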

> 2) what's the biggest production site in existence : how many nodes,
> how many replicas, how much data ?

The biggest site whose details I know is Ficia, a photo sharing service in Japan.
It uses 5 nodes and stores over 2 million keys. The number of replicas is 3, so there are over 2*3 = 6 million items on the servers.

While I don't know the details, Nico Nico Douga also uses kumofs. It's a movie sharing service with a much larger number of users.
Wikipedia says: as of March 31, 2010, Nico Nico Douga has over 16,700,000 registered users, 5,060,000 mobile users and 772,000 premium users.

> 3) would kumofs be a good match to store millions of small chunks of
> data (<1.5 kB, for example map tiles), how well does TC work with
> files that range in the tens of Gigabytes ?

Kumofs (especially TC; all disk I/O is handled by TC) is optimized for small keys.
TC itself works fine with tens of gigabytes, but tuning and a sufficient amount of memory are indispensable; a rough sizing sketch follows below.
Because kumofs and TC don't have rich mechanisms (like an append-only data structure) to reduce seeks, their performance will be worse than other DBs unless the items are held in cache or stored on SSD.
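
As an illustration of the "enough memory" point, here is a back-of-the-envelope sizing sketch; every number in it is an assumption, not a measurement:

# Back-of-the-envelope sizing; all numbers here are assumptions.
tiles = 10_000_000            # assumed number of map tiles
avg_value_bytes = 1.5 * 1024  # ~1.5 kB per tile, as in the question
replicas = 3                  # kumofs stores 3 copies in total
nodes = 5                     # assumed cluster size

total_bytes = tiles * avg_value_bytes * replicas
per_node_gib = total_bytes / nodes / 2**30
print(f"~{per_node_gib:.1f} GiB of TC data per node")
# If this is well below the RAM per node, GETs are mostly served from
# the page cache; otherwise expect many disk seeks (or use SSDs).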

> 4) what's the best way to have multiple TC files : have multiple
> kumofs running in parallel or can you add that as a feature ? Sort of
> a 'keyspace' kind of thing : in my case, I have multiple maps and I'd
> rather store the tiles in different TC files just to be on the safe
> side

Hmm... multiple kumofs instances are needed.
I don't have plans to add such a feature to kumofs itself, because it would have a big impact on the design of kumofs.
Key prefixes are used on Ficia to separate namespaces, but that doesn't separate TC files...
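
To illustrate the key-prefix approach, here is a sketch against kumo-gateway; the gateway address, key layout, and payloads are illustrative assumptions, not what Ficia actually uses:

import memcache

mc = memcache.Client(['127.0.0.1:11211'])  # kumo-gateway address assumed

def tile_key(map_name, z, x, y):
    # Encode the logical "keyspace" in the key, e.g. "map:osaka:12/3624/1613".
    return f"map:{map_name}:{z}/{x}/{y}"

osaka_tile = b'osaka tile bytes'  # placeholder payloads
tokyo_tile = b'tokyo tile bytes'
mc.set(tile_key('osaka', 12, 3624, 1613), osaka_tile)
mc.set(tile_key('tokyo', 12, 3624, 1613), tokyo_tile)

This gives you logical separation on one cluster, but as noted above the data still lives in the same TC files.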

> 5) is there a way to set the number of replicas ? what is the default
> value ? Is there a guarantee that each replica will be on a different
> server (unlike Riak) ?

No, the number of replicas is fixed at 3 (one original and two replicas).
It's guaranteed that each replica is stored on a different server. At least 3 servers are required, of course.
(I didn't know until now that Riak doesn't guarantee that. It seems awkward...)

> 6) what file system do you recommend for running kumofs & TC ? Would
> SSD improve performance ?

I don't know which filesystem is best (ext4?), but an SSD would probably improve performance drastically.

> 7) would it be possible for you to implement a key-range query ?

I know it's useful, but it's very difficult, because the design of the storage module assumes a hash database as the backend storage.
I'm afraid the module is low-level, and other modules like replication and dynamic rebalancing are bound to the current API set.
# I'd like to make it possible in a next-generation system.
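
A client-side workaround (not a kumofs feature; the index scheme, key names, and gateway address below are entirely my own assumptions) is to maintain a small index of keys under a well-known key and filter it in the client:

import json
import memcache

mc = memcache.Client(['127.0.0.1:11211'])  # kumo-gateway address assumed

def keys_in_range(index_key, lo, hi):
    # The writer keeps this index up to date as a sorted JSON list of keys
    # (ideally updated with CAS to cope with concurrent writers).
    index = json.loads(mc.get(index_key) or '[]')
    return [k for k in index if lo <= k <= hi]

for key in keys_in_range('map:osaka:index', 'map:osaka:12/3600', 'map:osaka:12/3700'):
    print(key, mc.get(key))

Of course, this only works when the set of keys in a namespace is small enough to fit in one value.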

Thanks,
Sadayuki

--
FURUHASHI Sadayuki
