domenica 15 maggio 2011

Redis: how to support "startswith" queries

While working at django_db_indexer i wanted to implement an indexing system that could take full advantage of Redis db types.
In this post i'll explain how django_db_indexer offers support for startswith queries using redis zsets, feel free to write me if you think of a better way.

Suppose you have some objects you want to store in a redis database, for example :

class Musician(object):
def __init__(self,name,group):
self.name = name
self.group = group




We want to be able to retrieve all musician whose names starts with a certain string.

To do so, i took inspiration from this post in Salvatore Sanfilippo's blog:

http://antirez.com/post/autocomplete-with-redis.html

We'll be using redis' ordered sets. Zsets in redis are sets of values (strings) ordered by a certain score.
When two or more values share the same score, they are ordered alphabetically.
We'll store some musicians:



> hmset musicians:1 name "Julian Casablancas" group Strokes
> hmset musicians:2 name "Thom Yorke" group Radiohead
> hmset musicians:2 name "James Murphy" group "LCD Soundsystem"
...

For every record, we'll save the musician's name in a redis zset like this:


> zadd musician_name_startswith 0 Julian Casablancas_1
> zadd musician_name_startswith 0 Thom Yorke_2
> zadd musician_name_startswith 0 James Murphy_3
> zrange zset 0 -1
1. "James Murphy_3"
2. "Julian Casablancas_1"
3. "Thom Yorke_2"



Now, if we want to find all musicians whose name starts with "J" we simply follow these steps:

1) add the string "J" to our zset with score 0
2) find it's rank with "zrank" command
3) retrieve all elements in zset which rank is bigger than "J"'s rank using zrange
4) remove "J" from the set

We want to use a transaction with "WATCH" used over our zset in order to make this method work, because if another record is inserted in the zset the lookup string rank may vary.
We can now simply iterate over the list, breaking when we find a string that doesn't start with "J", and find musician's id by splitting the string by "_".



While zadd and zrank have logarithmic complexity, zrange has O(log(N)+M), where M is the number of elements returned (N in worst case scenario). To reduce this M you could add another string with only the last char changed in the following one (e.g. for our "J" i whould be "K") and retrieve the records with zrank between the two strings.

Here's the code:


SEPARATOR = '_'
def startswith(lookup,connection):
key = "musician_name_startswith"
conn.zadd(key,lookup,0)
conn.watch(key)
while True:
try:
pipeline = conn.pipeline()
up = conn.zrank(key,lookup)
pipeline.zrange(key,up+1,-1)
pipeline.zrem(key,lookup)
res = pipeline.execute()
return result_generator(lookup,res[0])
except WatchError:
pass

def result_generator(lookup,l):
for i in l:
if not i.startswit(lookup):
break
yield i



lunedì 2 maggio 2011

Playing with Redis scripting branch

The new Redis' scripting feature is what i was looking for to reduce network latency bottleneck.

- Store something in a key based on the result of an incr call:
./redis-cli eval "t = redis('incr','x')
>redis('set','key_' .. t,'value')
> return 'key_' .. t" 0


This is expecially useful when you're storing a complex object with a unique id using redis hash type:

./redis-cli eval "t = redis('incr','user_id')
> redis('hmset','user:' .. t,'username','user','password','pass')
> return t" 0
(integer) 2

./redis-cli hgetall user:2
1) "username"
2) "user"
3) "password"
4) "pass"



- Startswith index (i'll talk about it in a next post):


 
./redis-cli zadd startswith_index 0 anthony
./redis-cli zadd startswith_index 0 bert
./redis-cli zadd startswith_index 0 carl
./redis-cli zadd startswith_index 0 carlton
./redis-cli zadd startswith_index 0 peter

./redis-cli eval "redis('zadd','startswith_index','0','car')
> r = redis('zrank','startswith_index','anth')
> ret = redis('zrange','startswith_index',tonumber(r)+1,'-1')
> redis('zrem','startswith_index','anth')
> return ret" 0
1) "carl"
2) "carlton"
3) "peter"

Mysql and commas in decimal numbers

Some time ago i was working on a mess of a site: i had to fix some issues in order to make it work as long as needed to rewrite it from scratch. One of these issues involved a query that had to be ordered by a text field containing decimal numbers with commas instead of points.
In my first attempt i simply used CAST to convert text to decimal, something like:

ORDER BY CAST( price as decimal(10,2))

Unfortunately, mysql doesn’t like commas at all: it just “cuts” the number whenever it finds one. To get my queryset ordered properly i had to get rid of the commas, so i used replace:

ORDER BY CAST( replace(price,’,’,’.’) as decimal(10,2))

This tip is also useful when you have decimals with commas you want to display with points and vice versa.