Django Caching with Backups
Django’s cache is useful for speeding up access to data from slow sources, such as a remote RSS feed. But if the remote source becomes inaccessible, the data is gone as soon as the cache expires it. One solution is to keep the data cached longer, but that leads to stale data. Another is to keep two cached copies with different expiry times, but that can quickly eat up a lot of memory.
One answer is to keep a copy in the cache, and a second copy on disk (or in the database) as a backup. The datastore module works a lot like the cache module, except that there is no explicit set() operation. Instead, get() is called with the key, a function that will generate a new object for the cache, and the object’s time to live:
    import datastore

    data = datastore.get('foo', get_more_foo, 5*60)
What happens here? First, the cache is checked for an object stored under the key 'foo'. If one is found, it is returned. If not, get_more_foo is called to create a new object, which is stored in the cache for 5*60 seconds and then pickled and backed up to the file system.
If get_more_foo raises an exception, the result of the last successful call to get_more_foo (the backup copy) is returned instead. The backup value is also put back into Django’s cache to keep access quick until the next attempt at updating 'foo', but for only half the regular time ((5*60)/2 seconds).
The only issue is when there is nothing in the cache and there is no backup. This can happen on the first call to datastore.get. In this case, the exception raised by get_more_foo is propagated to the caller.
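Putting those pieces together, here is a minimal sketch of what such a get() might look like. It assumes Django's cache framework and the standard pickle module; the hashed file names and helper names are illustrative, not the actual implementation.

    # Illustrative sketch only; not the actual datastore module.
    import hashlib
    import os
    import pickle

    from django.conf import settings
    from django.core.cache import cache


    def _backup_path(key):
        # One pickle file per key, hashed so any key is filesystem-safe.
        digest = hashlib.md5(key.encode('utf-8')).hexdigest()
        return os.path.join(settings.DATASTORE_DIR, digest + '.pickle')


    def get(key, generate, ttl):
        value = cache.get(key)
        if value is not None:
            return value
        try:
            # Cache miss: ask the caller-supplied function for fresh data.
            value = generate()
        except Exception:
            backup = _backup_path(key)
            if not os.path.exists(backup):
                # No backup either, so the exception propagates to the caller.
                raise
            # Fall back to the last good value, re-cached for half the time.
            with open(backup, 'rb') as f:
                value = pickle.load(f)
            cache.set(key, value, ttl // 2)
            return value
        # Success: cache the fresh value and rewrite the on-disk backup.
        cache.set(key, value, ttl)
        with open(_backup_path(key), 'wb') as f:
            pickle.dump(value, f)
        return value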
Therefore, it is useful to wrap the call to datastore.get in a try/except block:
    import datastore

    try:
        data = datastore.get('foo', get_more_foo, 5*60)
    except FooNotAvailable:
        ...
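Here, get_more_foo and FooNotAvailable are supplied by the caller, not by the datastore module. A hypothetical get_more_foo for the remote RSS feed mentioned earlier, using the third-party feedparser library purely for illustration, might look like this:

    import feedparser  # third-party; used only for illustration


    class FooNotAvailable(Exception):
        pass


    def get_more_foo():
        # Any failure must surface as an exception so datastore.get
        # falls back to the backup copy.
        feed = feedparser.parse('http://example.com/feed.rss')
        if feed.bozo:
            raise FooNotAvailable('could not fetch the feed')
        return [(entry.title, entry.link) for entry in feed.entries]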
The datastore is controlled by three required settings (in settings.py). DATASTORE_DIR is the directory in which to place the backup files; /tmp is a good place for them. DATASTORE_CULL_AFTER is the number of calls to datastore.get before DATASTORE_DIR is cleaned of old backups that have not been accessed recently. DATASTORE_CULL_TIME is the number of seconds since the last access (checked against the file's atime) after which a backup becomes eligible for deletion. It should scale with how long the data source might be unavailable: each successful call to get_more_foo regenerates the backup and updates the atime, so if the source is likely to be unavailable for hours at a time, a DATASTORE_CULL_TIME in the neighborhood of twelve hours may be in order.
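For example, a settings.py fragment might look like the following (the values are only examples):

    DATASTORE_DIR = '/tmp'               # where the pickled backups are written
    DATASTORE_CULL_AFTER = 100           # cull old backups every 100 datastore.get calls
    DATASTORE_CULL_TIME = 12 * 60 * 60   # delete backups untouched for twelve hours

And the culling pass those settings describe could be as simple as this sketch (again illustrative, not the module's actual code):

    import os
    import time

    from django.conf import settings


    def cull_backups():
        # Remove backup files whose last access is older than DATASTORE_CULL_TIME.
        cutoff = time.time() - settings.DATASTORE_CULL_TIME
        for name in os.listdir(settings.DATASTORE_DIR):
            if not name.endswith('.pickle'):
                continue  # skip unrelated files, e.g. other things in /tmp
            path = os.path.join(settings.DATASTORE_DIR, name)
            if os.path.getatime(path) < cutoff:
                os.remove(path)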
Note: this module uses the with statement, so Python 2.5 or later is required (under 2.5 it must be enabled with from __future__ import with_statement).