Hi
I often get asked how to get a list of keys written into elliptics. I do not really understand
why this is needed, especially in storage setups where billions of keys have been uploaded,
but it is nevertheless one of the most frequently asked questions.
Elliptics has secondary indexes for that purpose:
http://doc.reverbrain.com/elliptics:secondary-indexes
Indexes are automatically sharded and evenly distributed across the nodes in the group.
One can tag uploaded keys with special indexes and then either intersect those indexes on the servers or
read a whole index key by key. That is essentially what RIFT, the HTTP elliptics frontend, does
when you upload a file through its HTTP interface:
http://doc.reverbrain.com/rift:rift
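To make tagging and intersection concrete, here is a minimal in-memory model in Python. It does not use the elliptics API: the `SecondaryIndexes` class and its `tag`, `read_index` and `intersect` methods are hypothetical names, and each index is modelled simply as a set of keys. On a real cluster the index shards live on the storage nodes and the intersection is done on the servers.

```python
from collections import defaultdict


class SecondaryIndexes:
    """Hypothetical in-memory stand-in for elliptics secondary indexes."""

    def __init__(self):
        # index name -> set of object keys tagged with that index
        self._indexes = defaultdict(set)

    def tag(self, key, *indexes):
        """Attach an object key to one or more indexes."""
        for index in indexes:
            self._indexes[index].add(key)

    def read_index(self, index):
        """Return every key in one index, in sorted order."""
        return sorted(self._indexes.get(index, ()))

    def intersect(self, *indexes):
        """Return the keys present in all of the given indexes."""
        if not indexes:
            return []
        sets = [self._indexes.get(index, set()) for index in indexes]
        return sorted(set.intersection(*sets))


if __name__ == "__main__":
    idx = SecondaryIndexes()
    idx.tag("test1", "photos", "public")
    idx.tag("test7", "photos")
    print(idx.intersect("photos", "public"))  # ['test1']
    print(idx.read_index("photos"))           # ['test1', 'test7']
```

Reading a whole index is the degenerate case of intersecting a single index, which is what a listing needs.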
I have also added listing support to the RIFT proxy via the /list URI: it reads an index from the server,
iterates over its keys and produces nicely formatted JSON output. For every key it also prints the timestamp of its last update in the index,
both as seconds since the Epoch and as a date in the current timezone.
The URI accepts a namespace, which is the bucket name to read the index from, and a name, which is a placeholder
for future index names (in case we support multiple indexes).
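A client only has to build this query string and decode the JSON reply. Below is a small standard-library sketch; `list_url` and `fetch_listing` are hypothetical helper names, and example.com stands for wherever your RIFT proxy listens. `urlencode` escapes namespace values that contain characters such as `&`.

```python
import json
from urllib.parse import urlencode, urlunsplit
from urllib.request import urlopen


def list_url(host, namespace, name="", scheme="http"):
    """Build the RIFT /list URI for a namespace (bucket).

    `name` is currently only a placeholder, so it is sent empty by default.
    """
    query = urlencode({"namespace": namespace, "name": name})
    return urlunsplit((scheme, host, "/list", query, ""))


def fetch_listing(host, namespace, timeout=10.0):
    """Request the listing and decode the JSON body (performs network I/O)."""
    with urlopen(list_url(host, namespace), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(list_url("example.com", "testns"))
    # http://example.com/list?namespace=testns&name=
```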
$ curl "http://example.com/list?namespace=testns&name="
{
  "indexes": [
    {
      "id": "4e040aa8a798d04d56548d4917460f5759434fdf3ed948fd1cf35fd314cad3290e69b80deb0fc9b87a6bfbcbd08583919eb5b966658b3ed65e127236e1632525",
      "key": "test1",
      "timestamp": "1970-01-01 03:00:00.0",
      "time_seconds": "0"
    },
    {
      "id": "e5b7143155f46c9e9023cbf5e04be7276ae2e9a7583fee655c32aaff39755fa213468217291f0e08428a787bf282b416be1d26a5211f244fc66d1ce8ce545382",
      "key": "test7",
      "timestamp": "2014-02-18 03:29:44.835283",
      "time_seconds": "1392679784"
    }
  ]
}
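Since the reply is plain JSON, consuming it takes only a few lines. The sketch below assumes nothing beyond the layout shown above (a top-level "indexes" array whose entries carry "id", "key", "timestamp" and "time_seconds" as strings); `keys_from_listing` and `print_keys` are hypothetical helper names.

```python
import json


def keys_from_listing(body):
    """Return (key, id) pairs from a /list response body (str or bytes)."""
    listing = json.loads(body)
    # Entries live under the top-level "indexes" array.
    return [(entry["key"], entry["id"]) for entry in listing.get("indexes", [])]


def print_keys(body):
    """Print one object key per line, followed by its hex elliptics ID."""
    for key, object_id in keys_from_listing(body):
        print(f"{key}\t{object_id}")
```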
A zero timestamp means the key was indexed before timestamps were supported. key is the object name given at upload time,
id is the object's elliptics ID printed in hex (one can read the object directly from elliptics by this ID, without the namespace name),
time_seconds is a coarse-grained timestamp in seconds since the Epoch, and timestamp is the fully parsed timestamp with microsecond resolution.
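The two time fields can be cross-checked. The sketch below converts time_seconds to UTC and parses the timestamp string, which carries no timezone information and is therefore kept as a naive datetime. `entry_times` is a hypothetical helper, and treating time_seconds == 0 as "no timestamp recorded" follows the rule described above.

```python
from datetime import datetime, timezone

# Layout of the "timestamp" field; %f accepts 1 to 6 fractional digits,
# so both "...00.0" and "...44.835283" parse.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def entry_times(entry):
    """Return (utc_datetime, local_datetime) for a listing entry.

    Returns None for keys indexed before timestamps were supported,
    which are reported with time_seconds == 0.
    """
    seconds = int(entry["time_seconds"])
    if seconds == 0:
        return None
    utc = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Naive datetime in whatever timezone the proxy formatted it with.
    local = datetime.strptime(entry["timestamp"], TIMESTAMP_FORMAT)
    return utc, local
```

For the test7 entry above this gives 2014-02-17 23:29:44 UTC and a local time whose whole seconds are exactly four hours later, so the proxy was formatting times in a UTC+4 zone at that date.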
There is also an example Python script which does basically the same thing: it reads an index, unpacks it and prints it to the console:
https://github.com/reverbrain/rift/blob/elliptics-2.25/example/listing.py