Lua does not support int64 - how to proceed?


tw-bert

unread,
Aug 4, 2014, 6:09:57 PM
to redi...@googlegroups.com
Hi group, I just found out that Lua does not support int64.

Some (actually all) keys I use are int64 values. My Lua scripts will crash as soon as the sequence values / primary keys hit the maxint32 mark.

Redis 2.8.13 on debian (LAN) and ubuntu (Cloud/Azure).
We use ZRANGEBYLEX (and the other *LEX commands) extensively. We format an int64 as 18 digits (wrapping from 999999999999999999 back to 000000000000000000) as part of the prefix of a sorted set member (score 0).

Sidenote: after the prefix (which contains some more data) comes a robust delimiter sequence (\0\128\0 decimal, \0\x80\0 hex), which is invalid UTF-8 and therefore unambiguous in our scenario, followed by a msgpack-serialized payload (a dataset with only values, no tokens, for efficiency and memory footprint).
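
(For illustration, a rough sketch of how such a member could be assembled inside a Redis Lua script; the key names, arguments and payload fields are invented, and it only works while the sequence value still fits exactly in a double:)

    -- Sketch only: build "<18-digit sequence prefix><delimiter><msgpack payload>".
    -- Assumes the sequence value is still below 2^53.
    local DELIM   = "\0\128\0"                        -- the \0\x80\0 delimiter described above
    local seq     = tonumber(ARGV[1])                 -- sequence / primary key value
    local prefix  = string.format("%018.0f", seq)     -- zero-padded to 18 digits
    local payload = cmsgpack.pack(ARGV[2], ARGV[3])   -- values-only payload
    redis.call("ZADD", KEYS[1], 0, prefix .. DELIM .. payload)
    return redis.status_reply("OK")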

Should I add some pure-Lua int64 handling to all my scripts? Or kindly ask for int64 support in Redis Lua? Other ideas?

I also noticed that the built-in struct lib seems to support ints of any size and endianness:
  -> "i/In - signed/unsigned integer with size `n' (default is size of int)" , at http://redis.io/commands/eval
Could this help me in any way?

I need to compare and/or increment.
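
(For what it's worth, a small sketch of what that struct format could look like, assuming the bundled library accepts the explicit size suffix as the EVAL page describes; it doesn't escape the precision problem, since the value still passes through a Lua double on the way in and out:)

    -- Sketch: 8-byte big-endian encoding keeps numeric and lexicographic order
    -- identical, which is handy for *LEX prefixes, but the Lua value in between
    -- is still a double, so round-tripping is only exact below 2^53.
    local n      = tonumber(ARGV[1])
    local packed = struct.pack(">I8", n)         -- unsigned, 8 bytes, big-endian
    local back   = struct.unpack(">I8", packed)  -- exact only while n < 2^53
    return tostring(back)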

Kind regards, help appreciated, TW

Javier Guerra Giraldez

unread,
Aug 4, 2014, 6:22:25 PM
to redi...@googlegroups.com
On Mon, Aug 4, 2014 at 5:09 PM, tw-bert <tw.be...@gmail.com> wrote:
> My Lua scripts will crash as soon as the sequence values / primary keys hit
> the maxint32 mark.


more like the 2^53 mark

--
Javier

tw-bert

unread,
Aug 5, 2014, 2:37:44 AM
to redi...@googlegroups.com
Nope, the maxint32 mark: 2^32 - 1, due to Lua functions like tonumber() which can't handle int64s.

Javier Guerra Giraldez

unread,
Aug 5, 2014, 7:11:56 AM
to redi...@googlegroups.com
On Tue, Aug 5, 2014 at 1:37 AM, tw-bert <tw.be...@gmail.com> wrote:
>
> Nope, the maxint32 mark; 2^32 -1 . Due to Lua functions like tonumber() which can't handle int64's.


Lua numbers are doubles by default, which are integer-accurate up to 2^53:

$ lua

> =2^45
35184372088832
> =tonumber('35184372088832')
35184372088832
> = tonumber('35184372088832') + tonumber('35184372088832')
70368744177664
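
(As a quick check of where that accuracy stops, in the same kind of session:)

> = 2^52 + 1 == 2^52
false
> = 2^53 + 1 == 2^53
true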


--
Javier

tw-bert

unread,
Aug 5, 2014, 9:03:06 AM
to redi...@googlegroups.com
Hi Javier, now I get what you mean, thx.

This would get me a precision of 15 serialized digits in *lex commands (999999999999999 rolls over to 0, which stays safely below 2^53), in my specific implementation of course. Not too bad for my scenario.

Cheers, TW

Josiah Carlson

unread,
Aug 6, 2014, 12:05:45 AM
to redi...@googlegroups.com
This is unrelated to your question, but my understanding of your description suggests that your members are of the form...

<numeric prefix><text prefix>\0\x80\0<msgpack payload>

If any of your msgpack payloads are duplicated (with different numeric and/or text prefixes), you might be able to cut your memory use by storing your packed data outside of the ZSET, referenced by id, perhaps as sharded hashes. I do essentially that (minus the sharded hashes) with rom [1] for prefix/suffix indexes and queries.


 - Josiah




tw-bert

unread,
Aug 6, 2014, 12:30:12 AM
to redi...@googlegroups.com
Hi Josiah, thanks for thinking along.

You're right about the basic format. It's a bit more advanced, like this:
<prefix>\0\x80\0<msgpack payload>
where prefix is:
<key1>\0<key2>\0<keyN>\0<BI-sequence>\0<param1>\0<param2>\0<param3>\0<num-keyfields>

The keys are of course serialized in a lexicographically unambiguous format. String parts are always UTF-8.

Alongside the payload ZSETs, I have indexes which also use *LEX.
<otherkey1>\0<otherkey2...>\0\x80\0<payload-redis-key>\0\x80\0<key1>\0<key2...>
Those indexes don't have a payload, but point to the payload members above.

A Lua script 'joins' them.
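
(To make the 'join' concrete, a rough sketch of what such a script could look like for the index layout above; the parsing, bounds and result handling are simplified and invented, and note that it reads a key name out of the member, which is only safe outside cluster mode:)

    -- Sketch: range over an index ZSET, pull <payload-redis-key> and the key
    -- fields out of each hit, then fetch the matching payload member.
    local DELIM   = "\0\128\0"
    local hits    = redis.call("ZRANGEBYLEX", KEYS[1], ARGV[1], ARGV[2])
    local results = {}
    for _, member in ipairs(hits) do
      local _, b = string.find(member, DELIM, 1, true)      -- end of <otherkeys> part
      local c, d = string.find(member, DELIM, b + 1, true)  -- end of <payload-redis-key>
      local payload_key = string.sub(member, b + 1, c - 1)
      local key_prefix  = string.sub(member, d + 1)
      local rows = redis.call("ZRANGEBYLEX", payload_key,
                              "[" .. key_prefix, "[" .. key_prefix .. "\255")
      if rows[1] then results[#results + 1] = rows[1] end
    end
    return results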

The BI-sequence is for before-imaging, to validate RDBMS transactions.
The data originates from an RDBMS and first enters Redis instance A (a BI-only instance); after validation it is transferred to Redis instance B (the endpoint instance). Instance B is replicated to the web. The extra indexes live primarily in instance B.

Now about your suggestion: does a ZSET member use more memory than a HASH field? I thought I had my memory usage optimized (well, optimized enough); your input is very welcome.

Cheers, TW

Josiah Carlson

unread,
Aug 6, 2014, 1:18:26 AM
to redi...@googlegroups.com
An entry in a HASH does take less memory than a similar item in a ZSET (when both are bigger than the ziplist encoding limits), but that is unrelated.


Given your example where you have 2 types of ZSETs:

Payload ZSETs:
<key1>\0<key2>\0<keyN>\0<BI-sequence>\0<param1>\0<param2>\0<param3>\0<num-keyfields>\0\x80\0<msgpack payload>

Index ZSETs:
<otherkey1>\0<otherkey2...>\0\x80\0<payload-redis-key>\0\x80\0<key1>\0<key2...>

My suggested change, roughly:

<actual payload key> -> <msgpack payload>

Payload ZSETs:
<key1>\0<key2>\0<keyN>\0<BI-sequence>\0<param1>\0<param2>\0<param3>\0<num-keyfields>\0\x80\0<actual payload key>

Index ZSETs:
<otherkey1>\0<otherkey2...>\0\x80\0<actual payload key>

Basically, take your payload and give it its own unique key (which can be sharded or otherwise). Then, instead of including the payload in your payload ZSETs, reference that unique key. And instead of referencing a series of keys in your index, also just reference the unique key. Now, if the only way to make a unique key to be used as <actual payload key> is through the "<key1>\0<key2>\0<keyN>" sequence, then this doesn't gain you anything. But if you can generate concise unique keys (you can get these from your RDBMS via a 64-bit serial counter), then you've just cut your memory use by average_length_of(<key1>\0<key2>\0<keyN>) * number_of_index_entries, more or less.

Deletes might be a bit funky, and depending on the size of your <msgpack payload> you may do better with...

<sharded payload key> -> {<actual payload key>: <msgpack payload>, ...}

... which is to shard your msgpack payloads into hashes based on the <actual payload key>. If you've got a copy of my book on hand, check out section 9.2.1 . If not, or if you just want to look at code: https://github.com/josiahcarlson/redis-in-action/blob/master/python/ch09_listing_source.py#L201
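
(A minimal sketch of that sharded-hash lookup, with an invented shard size, key naming scheme and numeric id:)

    -- Sketch: payloads live in HASHes, sharded by integer division of the id.
    local SHARD_SIZE = 1024
    local id    = tonumber(ARGV[1])                        -- <actual payload key>, numeric
    local shard = "payload:shard:" .. math.floor(id / SHARD_SIZE)
    local blob  = redis.call("HGET", shard, tostring(id))  -- msgpack blob, or false if absent
    return blob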

 - Josiah


tw-bert

unread,
Aug 6, 2014, 1:33:21 AM
to redi...@googlegroups.com

Ah, thanks.

I've got a copy of your book, and will investigate a bit more.

However, our databases have many multi-field primary keys.

And even when single-field primary keys are involved, some have constraints that limit them to use only 4 bytes (for example). A generic redis id will cost more resources, not less, in those cases.

My gut feeling tells me that translating all that to redis-internal id's will cause unnecessary obfuscation and redis lookups, hurting performance and implementation time.

For now, I keep my structure as is. The fact that in our setup each row has a 'native' lex index (since the row payload is prefixed in a sorted set) is a big bonus. The extra indexes can always be added, even with the same keys as the native index.

Thx, TW

Josiah Carlson

unread,
Aug 6, 2014, 1:30:40 PM
to redi...@googlegroups.com
On Tue, Aug 5, 2014 at 10:33 PM, tw-bert <tw.be...@gmail.com> wrote:

> Ah, thanks.

> I've got a copy of your book, and will investigate a bit more.

> However, our databases have many multi-field primary keys.

> And even when single-field primary keys are involved, some have constraints that limit them to use only 4 bytes (for example). A generic redis id will cost more resources, not less, in those cases.

I'm going to start with assumptions from my experience and knowledge; please feel free to hop in and correct me where my assumptions don't match reality (generally, or specifically in your case).

In my experience, composite primary keys are used to collect data/identifiers together in a way that offers what amounts to a multi-column uniqueness constraint for a row, sometimes encompassing all of the data in the row, sometimes not. Sometimes the primary key is conveniently constructed such that queries can use the implied primary-key index to help minimize secondary indexes.

But what is definitely the case is that in order to make a query against one of your indexes (stored in a ZSET and queried via ZRANGEBYLEX, according to my understanding of what you wrote earlier), you first perform a query against the index, followed by a second query against your payload. That is two O(log n) queries, but each query has relatively high constants by virtue of the long string comparisons that need to be performed during each step in the search process, never mind requiring O(log n) random memory reads per query. This is not just bad for latency on the random-memory-read side of things; it also fills your processor cache with memory regions that are only important as sub-steps in the query.

On the other hand, *if* you had a single 32-, 64-, or even 128-bit non-composite identifier for each of your data rows, doing what I described would drop the payload queries (the 2nd step after an index query) from O(log n) string comparisons and random memory reads to O(1) string comparisons and random memory reads once you have the identifier. It would reduce actual Redis computation/comparison times for a query involving one of your indexes by roughly a factor of 2. Perhaps not in overall query times (which are likely dominated mostly by network latency), but in reduced CPU and memory bandwidth (which can increase network throughput, reduce latency, ...).

Note that if you never use your payload ZSET for index-like queries, only for data lookup by full primary key, you could switch to a HASH for your payload instead of a ZSET, mapping the primary key to the payload, and get the performance benefit with minor modifications to the commands used, though memory use would be comparable to your existing solution.
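
(A minimal sketch of that HASH variant for the lookup-by-full-primary-key case; the key name and argument layout are invented:)

    -- Sketch: exact lookup by composed primary key, no range scan involved.
    local pkey    = table.concat(ARGV, "\0")            -- key1, key2, ... keyN joined by NUL
    local payload = redis.call("HGET", KEYS[1], pkey)   -- msgpack blob, or false if absent
    return payload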

Assuming you perform queries against your payload ZSET that aren't just "get this data by this primary key", those queries would have reduced performance under the scheme I describe. But I don't know how often those queries are performed outside of an index + pkey combination query, so I can't make that determination for you. From a resource-use perspective, theoretically you would minimize resource utilization (CPU and memory on queries) with your existing method when (# of index+pkey composite queries) * log(n) < (# of pkey-only queries). When that relationship does not hold, it would be minimized using the change I described.


Now... whether or not the change I proposed is worthwhile from an engineering perspective, especially if 64- or 128-bit sequentially generated identifiers are not trivially available (passing data through a Lua script might offer the functionality and performance necessary), is a question I cannot answer. Hell, you've already got a system that works using the methods you've outlined, and that's a solid reason to ignore my ramblings about changes to your system. But the next person who streams data updates from an RDBMS to explicit index rows in Redis should pay attention to this thread ;)

 
> My gut feeling tells me that translating all that to redis-internal id's will cause unnecessary obfuscation and redis lookups, hurting performance and implementation time.

It will definitely hurt implementation time. Whether it affects performance overall will depend on your query mix, but it has a pretty good chance of reducing memory use for your indexes. Probably not a factor of 2 there, but that's a function of how big your pkeys are relative to a (shorter) unique identifier.

> For now, I keep my structure as is. The fact that in our setup each row has a 'native' lex index (since the row payload is prefixed in a sorted set) is a big bonus. The extra indexes can always be added, even with the same keys as the native index.

You're building it, your choice. You've always got fun problems :)

 - Josiah

tw-bert

unread,
Aug 6, 2014, 2:26:31 PM
to redi...@googlegroups.com
Hi Josiah, thanks again.

Your suggestion is definitely a consideration. But since we don't know yet how it will compare to the original design, and really don't know if it will be an improvement, I'll postpone any decision about this alternate design to a later stage.

We do a *lot* of 'true' range queries, where for example key1+key2 runs from AA\x001234 to AA\x009999. In my current design that's just too easy: one ZRANGEBYLEX call using the prefix, and the payload is returned for all records. In Lua it's trivial to filter some data dynamically and/or return only the payload part.
Then again, you have a point about a HASH (or simple string) lookup being quicker and neater if no ranges are involved.
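
(A sketch of that kind of call, using the example bounds above; the key name and the trailing-byte trick are assumptions on my part:)

    -- Sketch: inclusive range over the key1+key2 prefix AA\x001234 .. AA\x009999.
    local lo = "[AA\0001234"        -- '[' = inclusive bound, \000 = NUL field separator
    local hi = "[AA\0009999\255"    -- trailing \255 keeps members that continue past the prefix
    return redis.call("ZRANGEBYLEX", KEYS[1], lo, hi)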

We'll just have to see. Possibly, in a couple of months, I'll implement both designs - power to the user.

Loved your remark about the processor cache btw, very on-topic.

> You've always got fun problems :)

Well, thanks :)

Cheers, TW

Itamar Haber

unread,
Aug 6, 2014, 5:42:04 PM
to redi...@googlegroups.com
>> You've always got fun problems :)

> Well, thanks :)

Go get a room somewhere... but keep this discussion going online :)

Itamar Haber | Chief Developers Advocate
Redis Labs - Enterprise-Class Redis for Developers

Mobile: +1 (415) 688 2443
Mobile (IL): +972 (54) 567 9692
Email: ita...@redislabs.com
Skype: itamar.haber

Blog  |  Twitter  |  LinkedIn


Juarez Bochi

unread,
Aug 7, 2014, 7:41:00 AM
to redi...@googlegroups.com
Hi,

Wouldn't it be interesting to add a bigint library like bc?

It could be used like this:

    test {Bigint support with bc} {
        r eval {
            redis.call("set", "key", "12039611435714932082")
            local number = bc.number(redis.call("get", "key"))
            number = number * 2
            return number:tostring()
        } 0
    } {24079222871429864164}

I pushed the changes needed to https://github.com/jbochi/redis/compare/lbc. Unfortunately the library is not properly freeing all memory on exit, so the tests are failing.