Multiple ways of retrieving a hash

34 views
Skip to first unread message

Steve Mustafa

unread,
Jul 18, 2015, 1:14:08 PM7/18/15
to redi...@googlegroups.com
I'm new to Redis and I'm going through (so far excellent "Redis in Action") and I'm trying to implement a small fun project to really understand Redis.

Whilst attempting to create a URL shortner, I'm implementing a hash as follows:

    shortenedURL:{SHA-256 hash of the shorturl}   |  Hash
    ================================================
    Original_URL            :   [proto:]www.google.com
    date_added              :   {TIME}



My problem here is that the look up for the generated short url is relatively easy, but reversing this to grab a list of all the original urls becomes a bit more interesting.
I searched on the forum and I found this (not sure if links to posts work ) but I cannot (yet) determine if this is the answer I am looking for from a Redis perspective.

Can someone lend a hand please?

Itamar Haber

unread,
Jul 18, 2015, 1:49:09 PM7/18/15
to redi...@googlegroups.com
Hi Steve - welcome to the mailing list and to Redis :)

Josiah covers the topic of searching quite extensively in his (excellent until the very end) book in Chapter 7: Search-based applications. I don't want to spoil your learning pleasure, but the post you linked to (yes, links still work) describes some of the ideas in it. 

What you probably want to avoid is scanning the keyspace to obtain all the URLs. This sort of full scan can be done with the `SCAN` command (or the **evil** `KEYS` command <- do not use it in production)

Grabbing a list of all the original URL can definitely be solved by maintaining a index that maps values to keys like in that post. But given the nature of the application (low index cardinality), you could just store all of the original URLs in a Set and `SSCAN` it to fetch everything.

Even better - instead of just a Set, you could use a Sorted Set to store the Original URLs as members and the score to keep the epoch value of the date_added. Then use `ZSCAN` for all, fetch URLs by ranges of dates and so forth. Note that the Sorted Set will contain the same data as your shortened Hash, so unless you have additional properties you can use a String for the short->original mapping. That said, every shortURL will require a key of its own (as you current design also does), so you may want to group them in Hashes to optimize memory consumption (see Using hashes to abstract a very memory efficient plain key-value store on top of Redis). Although this will require the use of two calls to hashing functions, this is generally an acceptable tradeoff as hashing functions are relatively inexpensive. Essentially you'll end up with something like the following "schema":

shortURLs:<crc32(original_URL)> | Hash
  <sha256(original_URL)> -> Original_URL

originalURLs | Sorted Set
  <date_added as epoch> Original_URL

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.



--

Itamar Haber | Chief Developers Advocate
Redis Watch Newsletter - Curator and Janitor
Redis Labs - Enterprise-Class Redis for Developers

Mobile: +1 (415) 688 2443
Mobile (IL): +972 (54) 567 9692
Email: ita...@redislabs.com
Twitter: @itamarhaber
Skype: itamar.haber

Blog  |  Twitter  |  LinkedIn


Steve Mustafa

unread,
Jul 18, 2015, 7:10:13 PM7/18/15
to redi...@googlegroups.com
Toda Itamar :)

The only reason I said so far was because I'm still not done with the book :-)  But it bodes quiteI well...

OK, the original reason why I chose a hash rather than a set is because I was planning on testing this by creating a few million URLs (yes, I'd like to test this properly :-) ).  I have now read that the score acts in a similar fashion to a key in a KV datastructure (like a hash).  Is it safe to assume that it follows O(1)?  For that matter, what makes it a better a datastructure to use in this case as opposed to a hash?  (I'm not being purposely difficult, I am trying to understand this)

I'll be trying the structure you suggested, thank you for that.

Steve Mustafa

unread,
Jul 18, 2015, 7:24:29 PM7/18/15
to redi...@googlegroups.com
I read through that one more time, I think I misunderstood it the first time around.


I understand this to mean we have *two* datastructures, the first for a forward lookup and the other for a backwards lookup (I call [tm] on the names).
Reply all
Reply to author
Forward
0 new messages