Redis: Relations in a NoSQL world: Using Hashes

Posted: March 24th, 2010 | Filed under: Database, NoSQL, programming, Python, Redis

Just yesterday we posted a tutorial on how to use redis to store relational data, despite relations not being supported. Soon after we published, the documentation on the new redis hash type went online. Hashes by themselves aren’t exactly relations; they are more of an object field store. By extending the namespace concepts from our first article and using hashes, we can accomplish the same thing in a more formal fashion.

We will repeat the same exercise from the first article, creating a username/password store, this time using hashes.

Basic hash overview

Without going into the technical details, we can think of a redis hash as a way to store fields inside a single redis key. In Pythonic terms, a hash turns a redis key into a basic Python dictionary.

[key] : {‘field’ -> ‘value’, ‘field’ -> ‘value’, ‘field’ -> ‘value’}
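To make the dictionary analogy concrete, here is a minimal plain-Python sketch of that mental model. The names `store`, `hset`, `hget`, and `hgetall` are local stand-ins written for illustration, not redis-py calls; they only mimic the behaviour described here, including the fact that setting a field reports 1 for a new field and 0 for an overwrite.

```python
# A plain-Python model of a redis hash: one key maps to a dict of fields.
# Everything here is an illustrative stand-in, not the redis-py API.
store = {}

def hset(key, field, value):
    """Set one field; return 1 if the field is new, 0 if it was overwritten."""
    fields = store.setdefault(key, {})
    is_new = field not in fields
    fields[field] = value
    return 1 if is_new else 0

def hget(key, field):
    """Return the field's value, or None if the key or field is missing."""
    return store.get(key, {}).get(field)

def hgetall(key):
    """Return a copy of every field/value pair stored under the key."""
    return dict(store.get(key, {}))

print(hset("user:adam", "fullname", "Adam Smith"))  # 1: new field
print(hset("user:adam", "fullname", "Adam Smith"))  # 0: field already existed
print(hget("user:adam", "password"))                # None: field not set yet
print(hgetall("user:adam"))
```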

Basic hash usage

>>> r.hset("user:adam", "fullname", "Adam Smith")
1
>>> r.hset("user:adam", "password", "thisisapassword")
1
>>> r.hkeys("user:adam")
['fullname', 'password']
>>> r.hvals("user:adam")
['Adam Smith', 'thisisapassword']
>>> r.hgetall("user:adam")
{'fullname': 'Adam Smith', 'password': 'thisisapassword'}

We will break it down line by line here:

  1. First we make the redis key “user:adam” a hash and set the field “fullname” in that hash to the value “Adam Smith” with the redis command hset (Hash Set).
  2. Then we do the same thing for the field “password”, setting that field in the hash to its appropriate value.
  3. With the hkeys command we can see all of the field names in the hash stored at that redis key.
  4. hvals returns all of the values in the hash.
  5. More useful is the hgetall command, which returns a Python dictionary of all of the fields and values set in the hash.

As you can see, this is an excellent way to store information about an object without “faking” a relation like in our previous tutorial.

As I mentioned in the conclusion of the last article, if we want to change how we store information in redis, all we should have to do is change the inner workings of the functions add_user, authenticate_user, and delete_user; the rest of our fictitious application should operate without any changes.

Creating a new user

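import redis
from hashlib import md5

# decode_responses=True makes redis-py return str instead of bytes on Python 3
r = redis.Redis("localhost", decode_responses=True)

def add_user(username, fullname, password):
    # sadd returns 1 only if the username was not already in the set
    if r.sadd("users", username):
        #r.set("user:%s:fullname" % username, fullname)
        #r.set("user:%s:password" % username, md5(password).hexdigest() )
        # encode first: md5 needs bytes on Python 3
        passhash = md5(password.encode("utf-8")).hexdigest()
        r.hset("user:%s" % username, "fullname", fullname)
        r.hset("user:%s" % username, "password", passhash)
        return True
    else:
        return False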

I left the original code in the function commented out so we can see the differences between the two methods here. In our original method we used the namespace of redis itself to store the reference. In the updated version we use redis’s hash data type to store the related fields.
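The difference between the two layouts can be sketched without a server. In the snippet below, plain dictionaries stand in for the redis keyspace, and `flat_layout` and `hash_layout` are hypothetical helper names introduced only for this illustration. The point is simply how many top-level keys each approach creates for one user.

```python
# Illustrative only: plain dicts stand in for the redis keyspace.

def flat_layout(username, fullname, passhash):
    """Related-key layout from the first article: one redis key per field."""
    return {
        "user:%s:fullname" % username: fullname,
        "user:%s:password" % username: passhash,
    }

def hash_layout(username, fullname, passhash):
    """Hash layout from this article: one redis key holding every field."""
    return {
        "user:%s" % username: {"fullname": fullname, "password": passhash},
    }

flat = flat_layout("adam", "Adam Smith", "<md5 hex digest>")
hashed = hash_layout("adam", "Adam Smith", "<md5 hex digest>")

print(sorted(flat))    # two top-level keys for one user
print(sorted(hashed))  # a single top-level key for the same user
```

With the hash layout, adding another field to a user means adding a field to the existing key rather than inventing a new key name.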

>>> add_user("adam", "Adam Smith", "wealthofnations")
True
>>> add_user("adam", "Adam Smith", "wealthofnations")
False

This function operates in the same fashion as our old version did, but uses a different data structure in the backend.

Logging a user in

Now we’ll refactor our old authentication code to work with the new backend.

def authenticate_user(username, password):
    #if username in r.smembers("users"):
    if r.sismember("users", username):
        # encode first: md5 needs bytes on Python 3
        passhash = md5(password.encode("utf-8")).hexdigest()
        #if passhash == r.get("user:%s:password" % username):
        # assumes the client returns str (e.g. created with decode_responses=True)
        if passhash == r.hget("user:%s" % username, "password"):
            return True
        else:
            return False
    else:
        return False

Just as in the add_user function, we only need to make a small change. Instead of fetching the related redis key, we simply fetch the field from the hash at “user:*username*”.

>>> authenticate_user("adam", "wealthofnations")
True
>>> authenticate_user("adam", "bad_password")
False

We can see the function again performs just like the old function, but it pulls the password from the hash instead of the related key.

Deleting a user

Deleting a user using the hash store is much easier than using the related key store, since everything is just stored in one key instead of being spread out across multiple keys.

def delete_user(username):
    #if username in r.smembers("users"):
    if r.sismember("users", username):
        r.srem("users", username)
        r.delete("user:%s" % username)
        #r.delete("user:%s:fullname" % username)
        #r.delete("user:%s:password" % username)
        return True
    else:
        return False

Now instead of having to delete the two related keys, we just need to delete the one key storing the hash of all of the user data.
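For readers who want to try the full add/authenticate/delete cycle without a running server, here is a rough sketch. `FakeRedis` is a hypothetical in-memory stand-in written for this example; it implements only the handful of calls the three functions use and is not part of redis-py. The `digest` helper is also an assumption, added so the hashing works on Python 3. Against a real server you would replace `FakeRedis()` with a `redis.Redis(...)` client.

```python
from hashlib import md5

class FakeRedis:
    """Hypothetical in-memory stand-in; implements only the calls used below."""

    def __init__(self):
        self.sets = {}    # key -> set of members
        self.hashes = {}  # key -> dict of fields

    def sadd(self, key, member):
        members = self.sets.setdefault(key, set())
        added = member not in members
        members.add(member)
        return 1 if added else 0

    def sismember(self, key, member):
        return member in self.sets.get(key, set())

    def srem(self, key, member):
        members = self.sets.get(key, set())
        removed = member in members
        members.discard(member)
        return 1 if removed else 0

    def hset(self, key, field, value):
        fields = self.hashes.setdefault(key, {})
        added = field not in fields
        fields[field] = value
        return 1 if added else 0

    def hget(self, key, field):
        return self.hashes.get(key, {}).get(field)

    def delete(self, key):
        return 1 if self.hashes.pop(key, None) is not None else 0

r = FakeRedis()  # swap for a real redis client when a server is available

def digest(password):
    # md5 needs bytes on Python 3, so encode the password first
    return md5(password.encode("utf-8")).hexdigest()

def add_user(username, fullname, password):
    if r.sadd("users", username):  # 1 only when the name was not taken
        r.hset("user:%s" % username, "fullname", fullname)
        r.hset("user:%s" % username, "password", digest(password))
        return True
    return False

def authenticate_user(username, password):
    if not r.sismember("users", username):
        return False
    return digest(password) == r.hget("user:%s" % username, "password")

def delete_user(username):
    if r.sismember("users", username):
        r.srem("users", username)
        r.delete("user:%s" % username)  # one key holds all of the user's data
        return True
    return False

print(add_user("adam", "Adam Smith", "wealthofnations"))  # True
print(add_user("adam", "Adam Smith", "wealthofnations"))  # False: name taken
print(authenticate_user("adam", "wealthofnations"))       # True
print(authenticate_user("adam", "bad_password"))          # False
print(delete_user("adam"))                                # True
print(authenticate_user("adam", "wealthofnations"))       # False: user is gone
```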


9 Comments on “Redis: Relations in a NoSQL world: Using Hashes”

  1. 1 Brian Nesbitt said at 7:13 pm on March 24th, 2010:

    First off, great post – I enjoy seeing how others are experimenting with Redis and modeling data in NoSQL stores.

    1) The users set isn’t “really” necessary here, but I assume you are just using it to prevent duplicate usernames from being created in the rare race condition. Maybe you want to add a quick explanation for your readers of why it is being used and why you’re not just checking for the existence of the hash key to determine if the username is in use.

    2) I know you are just providing this as a quick proof of concept with a small dataset, but as your dataset grows doing a full “r.smembers” I believe will actually download all the values in the “users” Set. I think a better approach would be to use the SISMEMBER command.
    http://code.google.com/p/redis/wiki/SismemberCommand

    Something like: if r.sismember(“users”, username):

    3) I don’t know Python syntax, but in your add_user code snippet line 8 I think you’re missing your string replacement for the hash key “% username”.

  2. 2 Adam said at 7:31 pm on March 24th, 2010:

    Hey Brian,

    Thanks for the comments.

    In using the ‘set’ as opposed to the HEXISTS command, that’s mostly just a carry over from the previous example.

    I was using the sets mostly to maintain a consistency from the previous example to compare the difference in implementation in the storage of passwords between related keys and the newer hash functionality

    As for 2 and 3… Yep, good catches, those are bugs. I’ve corrected both this post and the previous post to reflect them. Thanks for the catch.

  3. 3 Brian Nesbitt said at 8:18 pm on March 24th, 2010:

    Just using HEXISTS creates a race condition when creating users. Using the Set will actually prevent that since the SADD is atomic and returns a success result!

    There is another “smembers” call when deleting a user that should be “sismember” :-)

  4. 4 Adam said at 6:12 am on March 25th, 2010:

    Got it… I’ve been trying the getting up at 5am thing recently and my night time brain isn’t as good as my morning brain now :-)

    This was written at night O:-)

  5. 5 Sam said at 7:29 pm on March 31st, 2010:

    Without the user set, you would have to do a keys() to find out who all the users are and perform an operation on each. Once you have a lot of users, your keys() query is going to take ages (relatively speaking).

  6. 6 nick said at 3:24 pm on August 27th, 2011:

    I agree about keys(). You should avoid it at all costs. I have noticed cases where keys() took more than 10 sec to return to the prompt…

  7. 7 Adrian Nye said at 6:02 pm on January 2nd, 2012:

    Is there anywhere that explains the tradeoffs between using a redis hash vs a plain key/val? There must be advantages and tradeoffs in speed and memory. This example shows that you can do a task either way, but not why to choose one or the other.

  8. 8 Julissa said at 11:22 am on January 26th, 2012:

    At last some rationality in our little debate.

  9. 9 Denisel said at 10:27 am on March 26th, 2012:

    Enter comment here.

