From a theoretical perspective (I'm a theorist myself), I like
interval sets. I've even needed them in Redis before. What I ended up
doing was implementing them using a pair of zsets. More
specifically, you have one zset that defines left endpoints, and one
that defines right endpoints. To perform the query, you first copy
both zsets, then you remove any entries in the left zset with scores >
your stabbing point, and any entries in the right zset with
scores < your stabbing point. You can then intersect the two sets to
find the items that match.
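A rough sketch of that trick, with the two zsets simulated as plain Python dicts (member -> score); in real Redis you would copy the zsets and use ZREMRANGEBYSCORE on each copy, then ZINTERSTORE:

```python
# Sketch of the two-zset stabbing trick, simulated with plain dicts.
# In Redis, `left` and `right` would be zsets keyed by interval id,
# trimmed with ZREMRANGEBYSCORE and intersected with ZINTERSTORE.

def stab(left, right, point):
    """Return ids of intervals [lo, hi] that contain `point`.

    `left` maps id -> left endpoint (the "left" zset),
    `right` maps id -> right endpoint (the "right" zset).
    """
    # Drop entries with score > point from the copy of the left zset.
    left_ok = {k for k, score in left.items() if score <= point}
    # Drop entries with score < point from the copy of the right zset.
    right_ok = {k for k, score in right.items() if score >= point}
    # The intersection is exactly the set of stabbed intervals.
    return left_ok & right_ok

left = {"a": 1, "b": 5, "c": 8}    # intervals a=[1,10], b=[5,6], c=[8,9]
right = {"a": 10, "b": 6, "c": 9}
print(sorted(stab(left, right, 9)))  # → ['a', 'c']
```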
For relatively small sets (on the order of tens of thousands of
entries), this works really well. When the zsets get much bigger, it
doesn't work so well, but assuming that your intervals are relatively
sparse over your entire field, you can actually pre-shard your
start/end values (I can explain how if people are curious).
> We are personally going to be using Interval Sets to store advertised CIDR
> network blocks with the ability to find all advertised CIDRs that contain a
> specific IP address. However, we feel that Interval Sets have a number of
> additional uses. For example, one could store sessions as intervals with the
> start/end times as the start/end points of the interval, and be able to
> quickly find all sessions at any given point in history. Stabbing queries on
> intervals are also heavily used in geometric inclusion algorithms, and as
> such, this data structure could have use in geospatial and image analysis
> applications.
I've done similar things for the geometry and IP address stuff before
(even advised people how to do it).
> We have a basic command set implemented and running against the "unstable"
> branch at https://github.com/hoxworth/redis/tree/intervals. The full command
> set that we would like to support can be found
> at https://github.com/hoxworth/redis/wiki/Interval-Commands; currently we
> only support the IADD, IREM, and ISTAB commands.
>
> The backend implementation is an AVL tree augmented to support interval
> queries, alongside a redis dict to support quick removal operations. We
> still have a lot of work to do to optimize the code, knock down the memory
> overhead per record, and integrate better into the Redis ecosystem, but at
> this point we are willing to open up to the greater Redis world.
On the one hand, I've personally been hoping for a while that someone
would implement a plain binary tree in Redis (instead of the existing
skiplist), just for the sake of memory efficiency. But I suspect that
you may have better luck with *some* sort of inclusion if you managed
to replace the skiplist entirely (fewer structures and better memory
use are always a plus). Aside from the score part of your tree, it
does look pretty flexible, and you may only be able to trim it
slightly: ignoring balance information, you still need
left, right, parent, next, and prev pointers, along with key/value
pointers.
With a little work, you could probably do away with the parent
pointer, leaving 6 attributes on a node at minimum. I've got a
structure in mind that self-balances and does 6 attributes per node...
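A sketch of what such a 6-attribute node might look like (names are illustrative, not actual Redis internals). Without a parent pointer, operations that need ancestors, like deletion or rebalancing, walk down from the root and remember the path on an explicit stack:

```python
# Hypothetical 6-attribute node: left/right for the tree, next/prev
# for ordered iteration, plus key and value. Not Redis internals.

class Node:
    __slots__ = ("left", "right", "next", "prev", "key", "value")

    def __init__(self, key, value):
        self.left = self.right = None   # tree children
        self.next = self.prev = None    # doubly linked in-order list
        self.key = key                  # e.g. the member
        self.value = value              # e.g. the score / interval

def find_with_path(root, key):
    """Descend from the root, recording ancestors on an explicit
    stack; this is what replaces the parent pointer."""
    path, node = [], root
    while node is not None and node.key != key:
        path.append(node)
        node = node.left if key < node.key else node.right
    return node, path
```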
> We will be maintaining this branch as well as creating a 2.6-specific
> branch, as we will be integrating this internally, and would be more than
> happy to hear thoughts on how to make this data structure more useful to a
> wider audience. We of course also want to know if there is any interest in
> including this in core, eventually.
You'll have to wait for Salvatore to weigh in on that. Personally I'm
neutral: while I can see the use for it, I have worked around it not
being there.
Regards,
- Josiah
Linux is very friendly it is just picky who its friends are
To view this discussion on the web visit https://groups.google.com/d/msg/redis-db/-/ywA-r_6umHwJ.
--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
Hi Dvir,
>> and again I'm returning to my long time dream of an easy framework to enhance redis with new data structures and commands in a plugin fashion.
I would also favor the plugin approach, but it is not so trivial to define a generic plugin interface. On top of registering new commands to handle the new datatypes, the plugin should also support:
....
... and I am probably forgetting a couple more problems.
>> how about a plain vector first?
Regarding the plain vector datatype, why not use strings? Strings can store and randomly access fixed-size items, and they support appending with geometric growth, slicing, etc...
It looks like the definition of a vector a la C++ to me...
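Didier's point can be sketched like this: a "vector" of fixed-size items in a flat byte string, the way one could use a Redis string with APPEND, GETRANGE, and SETRANGE at offset `i * item_size`. Here the string is simulated with a bytearray and items are 32-bit integers:

```python
import struct

# Fixed-size items packed into a flat byte buffer, vector-style.
# In Redis the buffer would be a string value and the three helpers
# would map to APPEND, GETRANGE, and SETRANGE respectively.

ITEM = struct.Struct("<i")   # one 4-byte little-endian int per item

buf = bytearray()

def push(value):                       # append (Redis: APPEND)
    buf.extend(ITEM.pack(value))

def get(i):                            # random access (Redis: GETRANGE)
    return ITEM.unpack_from(buf, i * ITEM.size)[0]

def set_item(i, value):                # in-place update (Redis: SETRANGE)
    ITEM.pack_into(buf, i * ITEM.size, value)

for v in (10, 20, 30):
    push(v)
set_item(1, 99)
print([get(i) for i in range(3)])  # → [10, 99, 30]
```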
Regards,
Didier.
> What I ended up doing was implementing them using a pair of zsets.
> More specifically, you have one zset that defines left endpoints, and
> one that defines right endpoints. To perform the query, you first copy
> both zsets, then you remove any entries in the left zset with scores >
> your stabbing point, and any entries in the right zset with scores <
> your stabbing point. You can then intersect the two sets to find the
> items that match.
> On the one hand, I've personally been hoping someone would implement a
> plain binary tree in Redis (instead of the existing skiplist), just for
> the sake of memory efficiency for a while, but I suspect that you may
> have better luck with *some* sort of inclusion if you managed to
> replace the skiplist entirely (fewer structures and better memory use
> are always a plus). Aside from the score part of your tree, it does
> look pretty flexible, and you may only be able to trim it slightly
> (ignoring balance information, you still need left, right, parent,
> next, and prev pointers, along with key/value pointers).
> You'll have to wait for Salvatore to weigh in on that. Personally I'm
> neutral: while I can see the use for it, I have worked around it not
> being there.
It is possible to implement stabbing queries with this trick using standard Redis data structures. The structure is not so easy to generate and maintain, though.
I think interval sets are worthy of serious consideration.
I would like to see support for multiple dimensions. Given how
common geolocation data is, it seems that supporting two, if not
three, dimensions natively would be highly desirable.
-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >
"Intelligence is like underwear: it is important that you have it,
but showing it to the wrong people has the tendency to make them
feel uncomfortable." -- Angela Johnson
> I think interval sets are worthy of serious consideration.
> I would like to see support for multiple dimensions. Given how
> common geolocation data is, it seems that supporting two, if not
> three, dimensions natively would be highly desirable.
SQLite has a pretty nice R-Tree module that provides up to 5 (!)
dimensions. http://www.sqlite.org/rtree.html
In their case, both records and queries are all range based. To
store/query points, you just set the min/max values to the same.
This keeps the API structure fairly simple, since you're always
providing pairs of numbers.
In their case, I believe they hard-code the five dimensions and just
zero-out any that are not used. I'm not sure that would be
appropriate for Redis, however, given the more memory & storage
conscious environment.
The proper k-dimensional structure for varying 'k' is a kd-tree. ;)
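For the curious, a minimal kd-tree works for any number of dimensions; here is an illustrative (not production-grade) sketch with an axis-aligned range query, which is the kd-tree analogue of the R-Tree range lookups mentioned above:

```python
# Minimal kd-tree: nodes are (point, split_axis, left, right) tuples.
# Works for any k; the split axis cycles with depth.

def build(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2           # median split keeps the tree balanced
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def range_query(node, lo, hi, out):
    """Collect points p with lo[d] <= p[d] <= hi[d] in every dimension d."""
    if node is None:
        return out
    point, axis, left, right = node
    if all(l <= c <= h for l, c, h in zip(lo, point, hi)):
        out.append(point)
    if lo[axis] <= point[axis]:      # the box may extend left of the split
        range_query(left, lo, hi, out)
    if hi[axis] >= point[axis]:      # ... and/or right of it
        range_query(right, lo, hi, out)
    return out

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(sorted(range_query(tree, (3, 1), (8, 5), [])))
# → [(5, 4), (7, 2), (8, 1)]
```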
Regards,
- Josiah
Hello Bill,
sorry for the delay in the reply... sometimes I forget to star a
thread or put it in my TODO list, and then I don't see it again unless
it is reopened, so thanks for writing this message.
I just read the whole thread and it's very interesting what you are
doing with Redis. The operation you are proposing is also useful in
many contexts, but in my opinion it is not something we should
currently include into Redis, for purely design concerns, not because
your work is not valid (I think it is).
So far Redis only includes pretty straightforward data structures that
don't really reflect a specific problem. Finding all the ranges that
cross a given point is in my opinion a bit too specific for what we
have in the rest of the API. Also as exposed here it is possible, even
if with some limitation, to use the current data structures to model
this problem, and maybe using 2.6 scripting this is going to be
simpler and more viable?
So at the moment my feeling is that the additional complexity is not
justified by the gain.
But this opens another chapter, that is the one about people that want
to extend Redis with in-house features.
Because the fact that adding this feature does not make sense for
Redis (in my opinion at least) does not mean that it does not make
sense in a context where this operation is a core operation for that
kind of service...
How can people extend Redis with their own features currently? Just by forking.
The alternative would be a plug-in system, but I'm a bit skeptical
about it... because we care a lot about stability. A plugin system
would allow the proliferation of a lot of code that is not stable
enough, or not useful enough... but that maybe many people would start
to use. And we have scripting that can solve a lot of problems without
these issues...
But still, with the current Redis code base it may be non-trivial to
keep a new data structure in a private branch in sync with new
developments.
So IMHO what we should do in the long term, as a post Redis 3.0
project, is to perform an internal refactoring so that every type will
be implemented in a way that provides methods to perform AOF / RDB
serialization, introspection, and so forth.
Something like:
obj = lookupKey("bar");
obj->writeRDB(rio_stream);
obj->writeAOF(rio_stream);
... and so forth.
Once this is done in this way (but this will require time and a good
design) we may still not expose it as an API, but as a side effect
taking a fork of Redis with this internal structure will be much
simpler, as it is already simple to take a fork that just implements
new commands on top of current data structures.
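As a toy illustration of the per-type method table Salvatore describes (all names here are made up, not actual Redis internals), each value type would carry its own serialization hooks, so a fork adding a new type only has to implement the interface:

```python
# Hypothetical per-type method table: each type object knows how to
# serialize its values, like obj->type->writeAOF(obj, ...) in C.
# Command shapes (SET, IADD) are illustrative only.

class RedisType:
    def write_aof(self, key, obj, out): ...

class StringType(RedisType):
    def write_aof(self, key, obj, out):
        out.append(f"SET {key} {obj}")

class IntervalSetType(RedisType):
    def write_aof(self, key, obj, out):
        # obj is a set of (lo, hi, member) intervals
        for lo, hi, member in sorted(obj):
            out.append(f"IADD {key} {lo} {hi} {member}")

db = {
    "greeting": (StringType(), "hello"),
    "blocks": (IntervalSetType(), {(1, 10, "a"), (5, 6, "b")}),
}

aof = []
for key, (typ, obj) in db.items():
    typ.write_aof(key, obj, aof)   # dispatch through the type object
```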
Cheers,
Salvatore
--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele
p.s. that would be more like:
obj->type->writeRDB(obj, rio_stream, ...);
> How people can extend Redis implementing their stuff currently? Just forking.
> The alternative would be a plug-in system, but I'm a bit skeptic about
> it... because we care a lot about stability. A plugin system would
> allow the proliferation of a lot of code that is not stable enough, or
> not useful enough... but that maybe many people would start to use.
I feel a plug-in system would provide great opportunity to expand
Redis's role as a "data structure server." It provides a powerful
way to balance keeping the core product thin and simple, while
providing the flexibility to expand Redis to address more specific,
higher-level tasks.
While I understand your concerns about stability and poor code, I
feel the rewards are much greater than the risks.
A few thoughts on this, as I've seen projects go through this before...
As Salvatore knows, I've been heavily involved** in the SQLite
community for several years. I've watched the product evolve and
grow, with many of the same concerns. "Does this belong in the
core?" is a constant question being asked. SQLite is supposed to be
"lite", after all.
** http://shop.oreilly.com/product/9780596521196.do
Most people agree that SQLite is the most popular database in the
world, in terms of database instances. Given its heavy use in desktop
OSes, all major brands of smart phone, and popular applications,
there are hundreds of millions of SQLite databases out there.
As such, the SQLite team takes testing and reliability extremely
seriously. They need to. SQLite is the core data management
system for many popular applications and platforms. It needs to
be rock solid. SQLite's main test suite takes about a day and a
half to run on modern compute hardware, and provides full branch
coverage of the shipped source.
SQLite also contains a "plugin" extension system. Several, in fact.
Most extensions are somewhat "shallow", such as user-defined SQL
functions. They're very straightforward, highly localized systems.
Generally, SQLite takes a pretty pessimistic view of this type of
extension code, and-- short of an all-out crash-- will do its best
to recover from most programming errors. In many ways, this is similar
to the current Lua extensions for Redis. Limited, localized
flexibility without serious exposure.
The most complex type of SQLite plugin is the "virtual table"
extension. These APIs allow a developer to define a set of some two
dozen functions, such as getColumn(), nextRow(), etc., that define
all the lookup and I/O operations on a table. The code can then
do whatever it wants to implement those functions.
Many virtual tables use a series of normal SQLite tables as "shadow
tables" to implement specific types of indexes or fuzzy lookups.
Other virtual tables link SQLite to external data sources. In fact,
I've written a SQLite virtual table extension that allows one to
access any value in a Redis instance via SQL commands. Virtual
tables are somewhat analogous to a Redis plugin system that might
support new fundamental data types, such as the Interval Set.
Unlike the other APIs, the virtual table interface is extremely
low-level. There is almost no error checking on the part of the
SQLite library. In short, it defines a clear API and expects the
programmer to get it right. If the extension programmer screws up,
the database crashes. Data loss is highly unlikely, but by writing
a virtual table extension, the programmer is taking full responsibility
for the stability of the whole system. If that sounds a bit scary,
that's because it is.
All I can really say is: it works.
The SQLite community hasn't really suffered from stability issues.
First, the system has brought enormous flexibility and adaptability to
the SQLite product. SQLite's Full Text Search engine was developed
as a virtual table. The multi-dimension R-Tree system, mentioned
previously in this thread, is also a virtual table extension.
Both of these extensions use a set of "normal" database tables to
implement highly specialized indexes. Yes, this could be done by the
application, but packaging it up into a cleaner interface (at the SQL
level) makes everything much, much more accessible, and much
cleaner.
Both of these extensions also ship as part of the core source code,
but are disabled by default. The extensions are only a #define away if
you need them, but if not, you can save a few thousand lines of
source and avoid bloat on those systems that need to run lean.
Both of these extensions also started as third-party extensions, but
were eventually merged into the core product and are now maintained
by the SQLite team. The FTS module, in particular, has opened many
doors for SQLite. The higher-level functionality is there if you
need it, but can be compiled out if you don't.
As for stability, the community really hasn't seen any issues. I
can't say exactly why this is, but I have some suspicions.
First, virtual tables are somewhat complex. Virtual table extensions
require a number of functions that must all work together. They're
very low-level, and require some knowledge of dynamic object files
just to compile. In short, most of the people writing virtual
tables have at least some clue. No, they're not all perfect, but
they're usually written by people that at least understand the scope
of the problem, and do their best to write appropriate code.
Second, there isn't a lot of sharing of virtual tables. Outside of
the project-provided extensions, very few people share code. In most
cases, virtual tables are just too specific to the problem at hand.
It isn't so much that people keep their virtual tables secret.
Rather, the extensions are usually so environment/problem-specific
that there just isn't much value in sharing.
Lastly, in the cases when extensions are shared, the people deploying
them typically have a clue. Integrating an extension into your own
environment takes some understanding of how they work. Very often
a shared extension will be modified to more closely match the specific
needs of the new environment. This brings the developers closer to
the code, and allows them to make a good decision about trusting the
code (or not).
If Redis were to employ a plugin/extension system for new data types,
I think we would see a similar pattern. A few core data types, such
as interval sets, would be implemented and/or maintained by the core
team. These would typically be specialized types that are hugely
useful, but only to a smaller group of users.
Most other custom data types would be implemented by the site
deploying them. Since the whole point of code like this is to
produce highly specialized data types, I would anticipate only
moderate code sharing. Even with sharing, I would expect most sites
to modify the code to suit their own specific needs.
Even if stability in some shared modules was lacking, I don't think
it would be a major issue for the community. Those that are smart
enough to build and deploy such systems, are also usually smart
enough to know where to point fingers if something goes wrong.
> So IMHO what we should do in the long term, as a post Redis 3.0
> project, is to perform an internal refactoring so that every type will
> be implemented in a way that provides methods to perform AOF / RDB
> serialization, introspection, and so forth.
>
> Something like:
>
> obj = lookupKey("bar");
> obj->writeRDB(rio_stream);
> obj->writeAOF(rio_stream);
>
> ... and so forth.
In my other message I mentioned why I thought the theory and concept
of Redis plugins was very similar to SQLite virtual tables. Here are
some more nitty-gritty details. Obviously the specific functions and
such aren't the same, and don't map. The overall idea is similar
enough that there might be some lessons in looking at the actual
extension APIs, however.
In the case of SQLite extensions, the extension code is typically
compiled into dynamic objects and linked at runtime, much like a web
server module. Extensions can also be statically built directly
into the server.
C API call to register a virtual table extension:
http://www.sqlite.org/c3ref/create_module.html
The data structure used to define a virtual table extension:
http://www.sqlite.org/c3ref/module.html
A longer doc on what all those functions need to do:
http://www.sqlite.org/vtab.html
> So far Redis only includes pretty straightforward data structures that
> don't really reflect a specific problem. Finding all the ranges that
> cross a given point is in my opinion a bit too specific for what we
> have in the rest of the API. Also as exposed here it is possible, even
> if with some limitation, to use the current data structures to model
> this problem, and maybe using 2.6 scripting this is going to be
> simpler and more viable?
I had one other thought. Last one, I promise (for tonight, anyways).
This might bring some of the flexibility of an extension system
without the need for any significant refactoring.
Many SQLite virtual table extensions are what I call "internal"
extensions. These extensions typically use a set of "normal"
database tables to provide a specialized index, or similar
higher-level functionality over the normal database operation.
These systems often use "shadow tables" that hide behind the
presented table. For example, if you create an FTS virtual table
"documents", the extension might create a "documents_fts_index" table,
along with several others. When you query the "documents" table,
the extension code gathers data from the shadow tables in order
to answer the query quickly and efficiently. Some very useful
things can be done this way.
In the Redis world, this might be an extension that produces a
virtual data structure (i.e. a new key type) which is implemented
on top of the existing data structures in "shadow keys". By
combining several standard data structures, one might be able to
mimic the functionality of more complex data structures (such as
the interval set) using one or more of the existing structures.
To do this in Redis, it might be possible to allow the server
configuration to load a set of Lua files on startup. These files
could register a set of functions that would then map to Redis
command names. To keep the namespace clean, perhaps these commands
might require a prefix, such as "_<function>" or "EXT.<function>",
or even "EXT.<extension_name>.<function>".
At the most basic level, we're really just offering a set of stored
procedures, defined server-side, that are exposed as new Redis
commands. A group of commands could be combined to offer larger
functionality, however.
If we were to use such a system to implement interval sets, we might
have a Lua script that defines and exports iadd(), irem(), and
istab(). These functions would then be exposed to the Redis clients
as the commands EXT.IADD (or _IADD or EXT.IS.IADD or whatever),
EXT.IREM, and EXT.ISTAB. Assuming the Lua script was capable of
mimicking the interval set data needs (or any other structure you
might want) using core Redis data structures, you'd be good to go.
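The registration-and-dispatch half of that idea can be sketched like so (in Python rather than Lua, and with entirely hypothetical names; the "interval set" here is just a dict per key standing in for shadow keys):

```python
# Toy dispatcher for the "EXT.<extension>.<function>" naming idea:
# extensions register plain functions, and the server maps prefixed
# command names onto them. All names are hypothetical.

COMMANDS = {}

def register_extension(ext_name, functions):
    for fn_name, fn in functions.items():
        COMMANDS[f"EXT.{ext_name}.{fn_name}".upper()] = fn

# A tiny "interval set" extension built on an ordinary structure
# (a dict of member -> (lo, hi) per key, standing in for shadow keys).
ISETS = {}

def iadd(key, lo, hi, member):
    ISETS.setdefault(key, {})[member] = (float(lo), float(hi))

def irem(key, member):
    ISETS.get(key, {}).pop(member, None)

def istab(key, point):
    p = float(point)
    return sorted(m for m, (lo, hi) in ISETS.get(key, {}).items()
                  if lo <= p <= hi)

register_extension("is", {"iadd": iadd, "irem": irem, "istab": istab})

def dispatch(command, *args):
    return COMMANDS[command.upper()](*args)

dispatch("EXT.IS.IADD", "cidrs", 1, 10, "a")
dispatch("EXT.IS.IADD", "cidrs", 5, 6, "b")
print(dispatch("EXT.IS.ISTAB", "cidrs", 7))  # → ['a']
```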
Of course, there are limits to what one can do without adding brand
new data types. Using Lua also means this might not be an extremely
high performance way of extending Redis-- although the use of Lua
limits stability concerns. It is, however, moderately flexible,
and should be reasonably easy to implement without major refactoring
of the existing structures.