Per-user caching in Django

Django comes with an easy-to-use caching framework. With a few simple decorators you can cache an application's views, and decorators can even control upstream caches, such as those maintained by ISPs. However, if a rendered view is customized with information specific to a user, these caching options cease to be useful. Django offers several solutions for this scenario:

1. The CACHE_MIDDLEWARE_ANONYMOUS_ONLY setting
2. The vary_on_cookie decorator
3. Template fragment caching
4. The low-level caching API

The CACHE_MIDDLEWARE_ANONYMOUS_ONLY setting

The CACHE_MIDDLEWARE_ANONYMOUS_ONLY setting tells Django's cache middleware to cache pages only for anonymous users, bypassing the cache entirely for anyone who is logged in. This is less helpful than it sounds. At the Dayton Daily News, we require a trivial registration to access many areas of the site. Using this setting means that the entire page cannot be cached because of a simple "Welcome, username" line in the rendered view.

Another obstacle to using the site-level cache is that many demographic-tracking packages require setting client-specific JavaScript variables in the rendered view and then calling a script on another server. The per-site cache will cache these variables as well, distorting your analytics.
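For reference, the setting sits in your settings module alongside the cache middleware. The backend, timeout, and prefix values below are illustrative, not prescriptive; the one hard requirement is the middleware ordering, with the update middleware first and the fetch middleware last:

```python
# settings.py (illustrative values)
CACHE_BACKEND = 'memcached://127.0.0.1:11211/'  # any backend works
CACHE_MIDDLEWARE_SECONDS = 600
CACHE_MIDDLEWARE_KEY_PREFIX = 'mysite'
CACHE_MIDDLEWARE_ANONYMOUS_ONLY = True  # skip the cache for logged-in users

MIDDLEWARE_CLASSES = (
    'django.middleware.cache.UpdateCacheMiddleware',      # must come first
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.cache.FetchFromCacheMiddleware',   # must come last
)
```

Note that the anonymous-only check needs the session and authentication middleware to run before the fetch middleware, which the ordering above provides.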

The vary_on_cookie decorator

The vary_on_cookie decorator (found in django.views.decorators.vary) is a simple way to tell upstream caches to cache a view based on the content of the user’s cookie. This means that each user will get their own page cached.

This is a useful decorator and a part of any caching setup for user-based sites. On its own, however, it still means that if a user visits a page only once, your server must perform all the work of rendering a page. The server gains no benefit when another user visits, since the page must be generated anew and then cached for this user as well.
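Conceptually, a cache that honors a Vary: Cookie header keys each stored response by URL plus cookie value. This toy, dictionary-backed sketch (not Django's implementation; all names are made up) shows why every user ends up with a private copy, and why the first visit by each user always costs a full render:

```python
# Toy model of an upstream cache honoring "Vary: Cookie".
upstream_cache = {}
renders = []  # records every expensive render that actually happens

def render(url, cookie):
    """Stand-in for the server doing the full work of building a page."""
    renders.append((url, cookie))
    return "Welcome, %s" % cookie

def cached_fetch(url, cookie, render_func):
    """Return a response for (url, cookie), rendering only on a miss."""
    key = (url, cookie)
    if key not in upstream_cache:
        upstream_cache[key] = render_func(url, cookie)
    return upstream_cache[key]

# Two users hitting the same URL each trigger their own render...
cached_fetch("/front/", "sessionid=alice", render)
cached_fetch("/front/", "sessionid=bob", render)
# ...but a repeat visit by the same user is served from cache.
cached_fetch("/front/", "sessionid=alice", render)
```

After the three requests above, only two renders have occurred: one per distinct cookie.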

Template fragment caching

This is a new feature in the development version of Django. It consists of a simple template tag that signals the framework to cache a portion of the rendered template. For example:

{% load cache %}
...stuff you don't want to cache
{% cache 300 "some" "section" user.id %}
...stuff you do want to cache
{% endcache %}
...more stuff you don't want to cache

The cache tag accepts the number of seconds for which the cache should remain valid, followed by one or more keys used to uniquely identify the cached fragment. You may use any number of keys. Adding user.id to the mix caches this portion of the template on a per-user basis; using only static keys produces a single cache shared by all users.
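Under the hood, the tag collapses the fragment name and the vary-on keys into a single cache key. The exact format is an internal detail of Django that has changed between versions, but the idea is roughly a fixed prefix plus the fragment name plus a hash of the vary-on arguments. A sketch of that idea, with an assumed key format:

```python
import hashlib

def fragment_cache_key(fragment_name, *vary_on):
    """Sketch only: Django's real fragment-key format is an internal detail."""
    # Hash the vary-on arguments so any number of keys of any type
    # collapse into one fixed-length component.
    args_hash = hashlib.md5(
        ':'.join(str(v) for v in vary_on).encode('utf-8')
    ).hexdigest()
    return 'template.cache.%s.%s' % (fragment_name, args_hash)
```

With user.id among the vary-on arguments, each user hashes to a different key and therefore a different cached fragment.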

I experimented with template fragment caching while developing an application for which we expected extremely high traffic. In the end, we decided that the overhead did not justify the savings.

This is a very new feature and naturally not ready for production use. In the next stable release of Django I imagine that it will be considerably more efficient.

The low-level caching API

The low-level caching API is the solution for serious fine-tuning of your cache. It is located in django.core.cache. It is laughably simple to use:

from django.core.cache import cache

CACHE_EXPIRES = 5 * 60  # 5 minutes

def some_view(request, object_id):
    cache_key = "someobjectcache%s" % object_id
    object_list = cache.get(cache_key)
    # Compare against None rather than using "if not object_list":
    # an empty list is a valid cached value and would otherwise
    # evaluate to False and trigger a needless lookup (thanks, AdamG).
    if object_list is None:
        object_list = expensive_lookup()
        cache.set(cache_key, object_list, CACHE_EXPIRES)
    ...

The cache is accessed via a unique key. You can cache anything that can be safely pickled in Python, including query sets from Django's ORM. If the key is missing or has expired, cache.get(key) returns None. Setting a key requires the unique key, the object to cache, and the time in seconds for which the cached value remains valid.
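The get/set contract is easy to mimic. This dictionary-backed sketch (a stand-in for a real backend such as memcached; the class name is made up) shows the semantics the view code above relies on, including the detail that an empty list is a perfectly good cached value:

```python
import time

class SimpleCache(object):
    """Toy in-process cache mimicking cache.get/cache.set semantics."""
    def __init__(self):
        self._store = {}

    def set(self, key, value, timeout):
        # Store the value along with its absolute expiry time.
        self._store[key] = (value, time.time() + timeout)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None  # never cached
        value, expires = entry
        if time.time() >= expires:
            del self._store[key]
            return None  # expired entries behave exactly like misses
        return value

cache = SimpleCache()
cache.set("someobjectcache42", ["a", "b"], 300)
```

A miss or an expired entry returns None, which is why the view checks `object_list is None` rather than truthiness.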

Since Django’s template engine is quite fast, we use the low-level API to cache the most expensive portions of each page: large database lookups, search results, the result of filtering large sets of data, ad infinitum.

This has given us the biggest savings in terms of memory, database hits, and CPU usage.

One final trick

A couple of our applications are real database hogs. They have a wide range of queries that get pulled over and over. Clearly, dropping into raw SQL and pulling lists rather than objects is the best way to streamline this type of demand, but then we lose the benefit of our custom model manager and model methods. Another neat trick is to add caching to your model manager itself:

from django.contrib.sites.models import Site
from django.core.cache import cache
from django.db import models

CACHE_EXPIRES = 10 * 60  # 10 minutes

class ObjectManager(models.Manager):
    def get_query_set(self, *args, **kwargs):
        cache_key = 'objectlist%d%s%s' % (
            Site.objects.get_current().id,  # unique per site
            ''.join(str(a) for a in args),  # unique per positional argument
            # sort kwargs so equivalent calls always build the same key
            ''.join('%s%s' % (k, v) for k, v in sorted(kwargs.items())),
        )

        object_list = cache.get(cache_key)
        # Compare against None rather than "if not object_list": an empty
        # result is a valid cached value (thanks, AdamG).
        if object_list is None:
            object_list = super(ObjectManager, self).get_query_set(*args, **kwargs)
            cache.set(cache_key, object_list, CACHE_EXPIRES)
        return object_list

This custom model manager caches query sets keyed on the arguments passed to get_query_set(). If a matching entry is less than 10 minutes old, it is returned from the cache; otherwise a fresh query set is built, added to the cache, and returned. On a busy database, this technique caches every query your application performs through the manager.
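The key-building step is worth isolating: sorting the keyword arguments matters, because dictionary ordering is not guaranteed, and two equivalent calls must map to the same cache entry. Here is that helper pulled out on its own as a sketch (the site id is passed in rather than fetched from Django, so it can run standalone):

```python
def query_cache_key(site_id, *args, **kwargs):
    """Sketch of the manager's key construction, standalone for clarity."""
    return 'objectlist%d%s%s' % (
        site_id,
        ''.join(str(a) for a in args),
        # Sorting keeps equivalent calls pointing at the same entry.
        ''.join('%s%s' % (k, v) for k, v in sorted(kwargs.items())),
    )
```

The same filter expressed with keyword arguments in a different order produces an identical key, while a different site or a different filter produces a new one.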

Feb 12th, 2008 | Posted in Programming, Tips
  1. Feb 5th, 2009 at 22:55 | #1

     The challenge with the last solution is saves, deletes, and edits. Saves and deletes can be detected using signals, but what about updates? And even when you do detect a change, it is difficult to know which keys to invalidate when only one row of the database was updated. Any workarounds for that?