Neither does the filesystem. You can similarly implement them for boltdb/sqlite.
Here's how I solve such problems:
First lay out the facts:
* ~1KB per entry, so 1GB of data means about 1e6 entries.
* Reads have to be fast and Writes can be slow (if I had exact stats, I would use them as well)
* Each key has to be unique
I need Reads to be fast, hence the Index is the most important part. The keys need 20 bytes * 1e6 ≈ 20MB, which I can easily and quickly load into RAM. Since Writes can be slow, adding/removing from the index is the best time to ensure uniqueness and to resort/reorganize so that reads stay fast.
For the data it wouldn't be great to have 1e6 separate files, so I should try to fit several entries into a single file. Which means I also need some kind of "file pointer" (filename + offset + length) for each entry.
Coming back to Keys, that's all I need to know how to structure them:
type Key [20]byte
type FileID byte

type Entry struct {
	Key    Key
	File   FileID // ???
	Offset uint64
	Length uint64
}
Now for the index I have several choices
1. type Index []Entry // and use binary/linear search
2. type Index map[hash of the key][]Entry // use map lookup
1. can be read/written to disk very fast, and I could memory-map the index file into main memory.
2. provides (probably) faster lookups, but writing/reading the index file becomes more complicated and slower.
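A minimal sketch of choice 1, a sorted slice with binary search via sort.Search (the Find/Insert names are my own, not fixed by the design):

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

type Key [20]byte
type FileID byte

type Entry struct {
	Key    Key
	File   FileID
	Offset uint64
	Length uint64
}

// Index keeps entries sorted by Key so lookups are a binary search.
type Index []Entry

// Find returns the entry for key and whether it exists.
func (ix Index) Find(key Key) (Entry, bool) {
	i := sort.Search(len(ix), func(i int) bool {
		return bytes.Compare(ix[i].Key[:], key[:]) >= 0
	})
	if i < len(ix) && ix[i].Key == key {
		return ix[i], true
	}
	return Entry{}, false
}

// Insert adds an entry, keeping the slice sorted and keys unique.
func (ix *Index) Insert(e Entry) error {
	i := sort.Search(len(*ix), func(i int) bool {
		return bytes.Compare((*ix)[i].Key[:], e.Key[:]) >= 0
	})
	if i < len(*ix) && (*ix)[i].Key == e.Key {
		return fmt.Errorf("duplicate key")
	}
	*ix = append(*ix, Entry{})
	copy((*ix)[i+1:], (*ix)[i:])
	(*ix)[i] = e
	return nil
}

func main() {
	var ix Index
	a := Key{1}
	if err := ix.Insert(Entry{Key: a, Offset: 0, Length: 100}); err != nil {
		fmt.Println(err)
	}
	if e, ok := ix.Find(a); ok {
		fmt.Println(e.Offset, e.Length)
	}
}
```

Since the slice stores fixed-size entries, dumping it to disk or memory-mapping it back is trivial.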
Regarding Entry.File, I'm not sure whether a single-file or multiple-file approach is better, but I'll leave it in just in case... 256 files is probably sufficient, and if I don't need them they can be removed later.
Now, putting data into the database means first appending the value to a data file, then writing its location along with the Key to the Index.
With pseudo code:
AddEntry(key Key, value []byte)
1. find the appropriate file
2. acquire lock for that file
3. append key header to the file (key and the length of value)
4. append value to the file
5. flush data to disk
6. acquire lock for the index
7. add key to index
8. resort/organize index
9. release index lock
10. release file lock
For determining the file, I could use sha1(key) and use its first byte as the filename. Alternatively, I could append to one file and, when it reaches some limit, e.g. 32MB, pick a new file to write to. Or it could always be a single file. The sha1 approach works better in a concurrent situation because multiple threads/processes aren't all trying to lock the same file. I'm not sure which is best, but it's something that can be tested and changed later.
Now how to delete data:
1. acquire index lock
2. delete entry from index
3. release index lock
Each deletion leaks around 1KB of disk space, so deleting 1e5 entries leaks about 100MB. You need to figure out when to garbage collect all that wasted disk space. To compact the data, you essentially create a new database:
1. create new DB
2. for each key/value in the old DB
3. add key/value to the new DB
4. delete the old DB
This can of course be optimized by reusing the old files, doing it in place, etc.
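The compaction loop might look like this; DB, Keys, Get, and Add are hypothetical stand-ins (backed here by a map) for whatever API the store ends up with, the point is only the copy-everything-live loop:

```go
package main

import "fmt"

// DB is a toy stand-in for the store; a real one would hold the
// index plus the data files.
type DB struct{ m map[string][]byte }

func NewDB() *DB { return &DB{m: map[string][]byte{}} }

func (db *DB) Add(k string, v []byte) { db.m[k] = v }
func (db *DB) Get(k string) []byte    { return db.m[k] }
func (db *DB) Keys() []string {
	keys := make([]string, 0, len(db.m))
	for k := range db.m {
		keys = append(keys, k)
	}
	return keys
}

// Compact copies every live entry into a fresh DB; afterwards the
// old files can be deleted, reclaiming the leaked space.
func Compact(old *DB) *DB {
	fresh := NewDB()
	for _, k := range old.Keys() {
		fresh.Add(k, old.Get(k))
	}
	return fresh
}

func main() {
	db := NewDB()
	db.Add("a", []byte("1"))
	fmt.Println(len(Compact(db).m))
}
```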
And there you have it... a design for a brain-dead key-value store. No B+ or radix trees needed. And it actually should perform quite well.
+ Egon