Redis: Relations in a NoSQL world: Using Hashes
Posted: March 24th, 2010 | Author: Adam | Filed under: Database, NoSQL, programming, Python, Redis | 9 Comments

Just yesterday we posted a tutorial on how to use redis to store relational data despite relations not being supported. Soon after we published, the documentation on the new redis hash type went online. Hashes by themselves aren’t exactly relations; they are more of an object field store. By extending the namespace concepts from our first article and using hashes, we can accomplish the same thing in a more formal fashion.
We will repeat the same exercise from the first article, creating a username password store, using hashes.
Basic hash overview
Without going into the technical details, we can think of redis’s hashes as a way to store fields in a redis key. In Pythonic terms, a hash turns a redis key into a basic Python dictionary.
[key] : {‘field’ -> ‘value’, ‘field’ -> ‘value’, ‘field’ -> ‘value’}
Basic hash usage
>>> r.hset("user:adam", "fullname", "Adam Smith")
1
>>> r.hset("user:adam", "password", "thisisapassword")
1
>>> r.hkeys("user:adam")
['fullname', 'password']
>>> r.hvals("user:adam")
['Adam Smith', 'thisisapassword']
>>> r.hgetall("user:adam")
{'fullname': 'Adam Smith', 'password': 'thisisapassword'}
We will break it down line by line here:
- First we make the redis key “user:adam” a hash and set the field “fullname” in that hash to the value “Adam Smith” with the redis command hset (Hash Set). The return value of 1 means a new field was created.
- Then we do the same thing for the field “password” and set that field in the hash to its appropriate value.
- With the hkeys command we can see all of the fields set in the hash stored at that redis key.
- hvals returns all of the values in the hash.
- More useful is the hgetall command. This returns a Python dictionary of all of the fields set in the key.
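To make the dictionary analogy concrete, here is a minimal pure-Python sketch of what these commands do. `ToyHashStore` is a hypothetical stand-in written for this illustration, not part of redis-py; it only mimics the behavior shown above so that it runs without a redis server.

```python
class ToyHashStore(object):
    """Hypothetical in-memory model of redis hashes (not a real client)."""

    def __init__(self):
        self.data = {}  # redis key -> {field: value}

    def hset(self, key, field, value):
        fields = self.data.setdefault(key, {})
        is_new = field not in fields
        fields[field] = value
        # Like HSET: 1 when a new field is created, 0 when one is overwritten
        return 1 if is_new else 0

    def hkeys(self, key):
        return list(self.data.get(key, {}).keys())

    def hvals(self, key):
        return list(self.data.get(key, {}).values())

    def hgetall(self, key):
        # Return a copy so callers cannot mutate the stored hash
        return dict(self.data.get(key, {}))


toy = ToyHashStore()
first = toy.hset("user:adam", "fullname", "Adam Smith")
second = toy.hset("user:adam", "fullname", "Adam Smith")
snapshot = toy.hgetall("user:adam")
print(first, second, snapshot)
```

Setting the same field twice returns 1 and then 0, which is why the real session above prints 1 for each newly created field.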
As you can see, this is an excellent way to store information about an object without “faking” a relation like in our previous tutorial.
As I mentioned in the conclusion of the last article, if we want to change how we store information in redis, all we should have to do is change the inner workings of the functions add_user, authenticate_user and delete_user; the rest of our fictitious application should operate without any changes.
Creating a new user
import redis
from hashlib import md5

r = redis.Redis("localhost")

def add_user(username, fullname, password):
    # sadd only succeeds if the username is not already in the "users" set
    if r.sadd("users", username):
        #r.set("user:%s:fullname" % username, fullname)
        #r.set("user:%s:password" % username, md5(password).hexdigest() )
        r.hset("user:%s" % username, "fullname", fullname)
        r.hset("user:%s" % username, "password", md5(password).hexdigest())
        return True
    else:
        return False
I left the original code in the function commented out so we can see the differences in the two methods here. In our original method we used the name space of redis itself to store the reference. In the updated fashion we are using redis’s hash data type to store the related fields.
>>> add_user("adam", "Adam Smith", "wealthofnations")
True
>>> add_user("adam", "Adam Smith", "wealthofnations")
False
This function operates in the same fashion as our old version did but uses a different data structure in the backend.
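The difference between the two storage layouts can be sketched without a redis server. The helper names `related_keys_layout` and `hash_layout` are hypothetical and exist only for this illustration; each returns a plain dictionary describing which redis keys the corresponding version of add_user would write.

```python
def related_keys_layout(username, fullname, passhash):
    """Keys written by the first article's approach: one redis key per field."""
    return {
        "user:%s:fullname" % username: fullname,
        "user:%s:password" % username: passhash,
    }


def hash_layout(username, fullname, passhash):
    """Keys written by the hash approach: one redis key holding every field."""
    return {
        "user:%s" % username: {"fullname": fullname, "password": passhash},
    }


# The password hash is a placeholder string, not a real digest
old = related_keys_layout("adam", "Adam Smith", "<md5 hex digest>")
new = hash_layout("adam", "Adam Smith", "<md5 hex digest>")
print(sorted(old))
print(sorted(new))
```

With related keys, every new attribute adds another top-level key per user; with a hash, the user stays a single key no matter how many fields it has.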
Logging a user in
Now we’ll refactor our old authentication code to work with the new backend.
def authenticate_user(username, password):
    #if username in r.smembers("users"):
    if r.sismember("users", username):
        passhash = md5(password).hexdigest()
        #if passhash == r.get("user:%s:password" % username):
        # compare against the "password" field of the user's hash
        if passhash == r.hget("user:%s" % username, "password"):
            return True
        else:
            return False
    else:
        return False
As in the add_user function, we only need to make a small change. Instead of fetching the related redis key, we simply fetch the field from the hash of “user:*username*”.
>>> authenticate_user("adam", "wealthofnations")
True
>>> authenticate_user("adam", "bad_password")
False
We can see the function again performs just like the old function, but it pulls the password from the hash object instead of the related key.
Deleting a user
Deleting a user using the hash store is much easier than using the related key store, since everything is just stored in one key instead of being spread out across multiple keys.
def delete_user(username):
    #if username in r.smembers("users"):
    if r.sismember("users", username):
        r.srem("users", username)
        # a single delete removes every field stored in the hash
        r.delete("user:%s" % username)
        #r.delete("user:%s:fullname" % username)
        #r.delete("user:%s:password" % username)
        return True
    else:
        return False
Now instead of having to delete the 2 related keys, we just need to delete the one key storing the hash of all of the user data.
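To try all three functions together without a running redis server, here is a hedged sketch. `ToyRedis` is a hypothetical in-memory stand-in that implements only the handful of commands used in this post, and the functions take the client as an explicit `r` argument (an assumption made for this sketch; the article's versions use a module-level `r`). The password is encoded to bytes before hashing so the sketch also runs on Python 3.

```python
from hashlib import md5


class ToyRedis(object):
    """Hypothetical in-memory stand-in for the redis commands used in this post."""

    def __init__(self):
        self.keys = {}  # redis key -> set (for sets) or dict (for hashes)

    def sadd(self, key, member):
        members = self.keys.setdefault(key, set())
        added = member not in members
        members.add(member)
        return 1 if added else 0

    def sismember(self, key, member):
        return member in self.keys.get(key, set())

    def srem(self, key, member):
        members = self.keys.get(key, set())
        if member in members:
            members.remove(member)
            return 1
        return 0

    def hset(self, key, field, value):
        fields = self.keys.setdefault(key, {})
        is_new = field not in fields
        fields[field] = value
        return 1 if is_new else 0

    def hget(self, key, field):
        return self.keys.get(key, {}).get(field)

    def delete(self, key):
        return 1 if self.keys.pop(key, None) is not None else 0


def passhash(password):
    # md5 mirrors the article; encode so hashing works on Python 3 strings
    return md5(password.encode("utf-8")).hexdigest()


def add_user(r, username, fullname, password):
    if r.sadd("users", username):
        r.hset("user:%s" % username, "fullname", fullname)
        r.hset("user:%s" % username, "password", passhash(password))
        return True
    return False


def authenticate_user(r, username, password):
    if r.sismember("users", username):
        return passhash(password) == r.hget("user:%s" % username, "password")
    return False


def delete_user(r, username):
    if r.sismember("users", username):
        r.srem("users", username)
        r.delete("user:%s" % username)  # one key holds all of the user's fields
        return True
    return False


r = ToyRedis()
results = [
    add_user(r, "adam", "Adam Smith", "wealthofnations"),
    add_user(r, "adam", "Adam Smith", "wealthofnations"),
    authenticate_user(r, "adam", "wealthofnations"),
    authenticate_user(r, "adam", "bad_password"),
    delete_user(r, "adam"),
    authenticate_user(r, "adam", "wealthofnations"),
]
print(results)
```

After delete_user runs, the single “user:adam” key is gone and authentication fails, which matches the behavior described above.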
First off, great post – I enjoy seeing how others are experimenting with Redis and modeling data in nosql stores.
1) The users set isn’t “really” necessary here, but I assume you are just using it to prevent duplicate usernames from being created in the rare race condition. Maybe you want to add a quick explanation for your readers of why it is being used and why you’re not just checking for the existence of the hash key to determine if the username is in use.
2) I know you are just providing this as a quick proof of concept with a small dataset, but as your dataset grows doing a full “r.smembers” I believe will actually download all the values in the “users” Set. I think a better approach would be to use the SISMEMBER command.
http://code.google.com/p/redis/wiki/SismemberCommand
Something like: if r.sismember(“users”, username):
3) I don’t know python syntax but in your add_user code snippet line 8 I think you’re missing your string replacement for the hash key “% username”.
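The difference raised in point 2 can be sketched in plain Python. This is a rough model, not a benchmark: the function names are hypothetical, the set of 10,000 usernames is made up, and "cost" simply counts how many items would come back in the reply.

```python
# Made-up data set standing in for the "users" set on the server
users = set("user%d" % i for i in range(10000))


def check_with_smembers(members, username):
    """Models `username in r.smembers("users")`: the whole set is fetched first."""
    transferred = set(members)  # stands in for the full reply sent to the client
    return username in transferred, len(transferred)


def check_with_sismember(members, username):
    """Models `r.sismember("users", username)`: only a yes/no answer comes back."""
    return username in members, 1


found_a, cost_a = check_with_smembers(users, "user42")
found_b, cost_b = check_with_sismember(users, "user42")
print(found_a, cost_a)
print(found_b, cost_b)
```

Both checks give the same answer, but the first one moves the entire set to the client to get it.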
Hey Brian,
Thanks for the comments.
In using the ‘set’ as opposed to the HEXISTS command, that’s mostly just a carry over from the previous example.
I was using the sets mostly to stay consistent with the previous example, so the two posts can be compared on how passwords are stored: related keys versus the newer hash functionality.
As for 2 and 3… Yep, good catches, those are bugs. I’ve corrected both this post and the previous post to reflect them. Thanks for the catch.
Just using HEXISTS creates a race condition when creating users. Using the Set will actually prevent that since the SADD is atomic and returns a success result!
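The race described here can be illustrated with a deterministic sketch. `ToyRedis` is a hypothetical in-memory stand-in (not redis-py), and the two "clients" are simulated by interleaving their steps by hand; the name “Adam Impostor” is made up for the example.

```python
class ToyRedis(object):
    """Hypothetical in-memory stand-in for the few commands used here."""

    def __init__(self):
        self.sets = {}
        self.hashes = {}

    def sadd(self, key, member):
        members = self.sets.setdefault(key, set())
        if member in members:
            return 0
        members.add(member)
        return 1

    def hexists(self, key, field):
        return field in self.hashes.get(key, {})

    def hset(self, key, field, value):
        fields = self.hashes.setdefault(key, {})
        is_new = field not in fields
        fields[field] = value
        return 1 if is_new else 0

    def hget(self, key, field):
        return self.hashes.get(key, {}).get(field)


r = ToyRedis()

# Check-then-act with HEXISTS: both clients check before either one writes
a_sees_free = not r.hexists("user:adam", "fullname")
b_sees_free = not r.hexists("user:adam", "fullname")
if a_sees_free:
    r.hset("user:adam", "fullname", "Adam Smith")
if b_sees_free:
    r.hset("user:adam", "fullname", "Adam Impostor")  # silently overwrites client A
racy_winner = r.hget("user:adam", "fullname")

# SADD: the check and the claim are a single step, so only one client gets 1
a_claimed = r.sadd("users", "adam")
b_claimed = r.sadd("users", "adam")
print(racy_winner, a_claimed, b_claimed)
```

In the HEXISTS version both clients believe the username is free; with SADD the second client is told no, whatever the interleaving.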
There is another “smembers” call when deleting a user that should be “sismember”
Got it… I’ve been trying the getting up at 5am thing recently and my night time brain isn’t as good as my morning brain now
This was written at night O:-)
Without the user set, you would have to do a keys() to find out who all the users are and perform an operation on each. Once you have a lot of users, your keys() query is going to take ages (relatively speaking).
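As a rough sketch of that point, the keyspace below is a plain Python dictionary with made-up entries, and `fnmatchcase` stands in for the glob matching that a keys("user:*") call would do on the server. The function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Made-up keyspace: the "users" set, two user hashes and some unrelated keys
keyspace = {
    "users": set(["adam", "brian"]),
    "user:adam": {"fullname": "Adam Smith"},
    "user:brian": {"fullname": "Brian"},
    "session:1": "...",
    "session:2": "...",
}


def users_via_keys(keyspace):
    """Models keys("user:*"): every key in the database has to be examined."""
    examined = 0
    names = []
    for key in keyspace:
        examined += 1
        if fnmatchcase(key, "user:*"):
            names.append(key[len("user:"):])
    return sorted(names), examined


def users_via_set(keyspace):
    """Models smembers("users"): only the one set is read."""
    return sorted(keyspace["users"])


names_from_keys, examined = users_via_keys(keyspace)
names_from_set = users_via_set(keyspace)
print(names_from_keys, examined)
print(names_from_set)
```

Both approaches list the same users, but the keys() version has to walk past every unrelated key to do it.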
I agree about keys(). You should avoid it at all costs. I have noticed cases where keys() took more than 10 sec to return…
Is there anywhere that explains the tradeoffs between using a redis hash vs a plain key/val? There must be advantages and tradeoffs in speed and memory. This example shows that you can do a task either way, but not why to choose one or the other.
At last some rationality in our little debate.