Threading in Django

Profitable use of threading in web development is rare, particularly when contending with Python’s global interpreter lock. There are a few notable exceptions to this.

The global interpreter lock (GIL) is the mechanism Python uses to keep its internals thread-safe, since not all Python objects are thread-safe on their own. The GIL ensures that only one thread executes Python bytecode at a time, so the interpreter’s internal structures cannot be corrupted by concurrent modification. This is transparent to the Python programmer, but it protects only those internals, not your program’s logic: a list that is modified by multiple threads without explicit locking can still end up in an unexpected state.
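As a minimal sketch of the explicit locking mentioned above (the names `shared`, `shared_lock`, and `append_pairs` are made up for this illustration), each worker below appends two related items that must stay next to each other. The GIL keeps each individual append safe, but it is the lock that stops another thread from slipping in between the two.

```python
import threading

shared = []                      # list shared by all worker threads
shared_lock = threading.Lock()   # guards every compound update to `shared`

def append_pairs(worker_id, count):
    """Append `count` (start, end) marker pairs for one worker."""
    for i in range(count):
        # The two appends form one logical update; hold the lock for both
        # so that no other thread can interleave between them.
        with shared_lock:
            shared.append(("start", worker_id, i))
            shared.append(("end", worker_id, i))

threads = [threading.Thread(target=append_pairs, args=(n, 1000)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the `with shared_lock:` line the list would still hold every item, but a start marker could be followed by another worker’s marker instead of its own end marker.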

This makes threading less useful in web development. Because Python cannot directly maintain state between requests (indirectly, via the client, state may be maintained using session cookies or other similar strategies), each request is handled in a new thread that works through the same steps:

 1. Determine what data is being requested
 2. Get the data
 3. Format the data
 4. Send the data

Each of these steps depends on the state information from the previous step. There is not a lot of room for asynchronicity. The usefulness of threading is pushed back to the web server. That is where integrated server/framework solutions, like Ocsigen, have an advantage.

Many web-based applications, however, are not simple database front ends. There are a few common situations where threading becomes useful; specifically, when there are side effects of a request.

On a side note, experts will tell you that only POST requests should result in side effects. This is misleading. POST should certainly be used for user-intended side effects. However, what happens when there is an error loading a page?

Error messages

In the event of a coding or environment error, Django sends me an email. This is a wonderful (and occasionally annoying) feature. But what if the problem is caused by the data we input, rather than the code? I can detect this in my code and present the user with a nice error page apologizing for the inconvenience, but now I want to notify the content team that an entry in the database is invalid in some way.

Sending a message, especially if Django is connecting to a remote mail server, can take a while. This is a good use for threading. The thread should deal with as little data processing as possible, concentrating instead on the side effect.

import threading
from django.core.mail import send_mail
from django.http import HttpResponseServerError

def foo(request, identifier):
    # get_some_data, error_check, InvalidData and some_error_page are
    # application-specific and defined elsewhere
    data = get_some_data(identifier)
    try:
        error_check(data)  # raises InvalidData if the data is invalid
    except InvalidData:
        # Format our information here, in the main thread
        subject = "Invalid data in entry %d" % identifier
        message = """
            There is an error in entry %d.  Please check this
            data at http://path/to/django/admin/app/table/%d.
        """ % (identifier, identifier)
        recipients = ['', '']
        from_ = ''  # "from" is a reserved word, so it cannot be a variable name

        # Create a new thread in Daemon mode to send message
        t = threading.Thread(target=send_mail,
                             args=[subject, message, from_, recipients],
                             kwargs={'fail_silently': True})
        t.setDaemon(True)
        t.start()
        return HttpResponseServerError(some_error_page)
    # (rendering the normal response for valid data is omitted here)

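Note particularly that we call t.setDaemon(True) on the thread before starting it (newer Python versions spell this t.daemon = True). A daemon thread does not keep the Python process alive: the interpreter will not wait for it to finish before exiting. The response goes back to the client as soon as the view returns, without waiting for the mail to be sent.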
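A small sketch of this behavior, with a made-up `slow_side_effect` function standing in for `send_mail`: starting the thread returns almost immediately, and the slow work finishes later in the background. The half-second sleep is only an assumption to make the delay visible.

```python
import threading
import time

done = threading.Event()

def slow_side_effect():
    """Stand-in for a slow call such as sending mail."""
    time.sleep(0.5)
    done.set()

t = threading.Thread(target=slow_side_effect)
t.daemon = True                       # same effect as t.setDaemon(True)

started = time.monotonic()
t.start()                             # returns without waiting for the work
elapsed = time.monotonic() - started

finished_immediately = done.is_set()  # the side effect is still in progress
t.join()                              # joined only for this demonstration
```

In a view you would not call `join()`; it is here only so the example can show that the work does complete.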

Writing files

Many of our web-based programs write data to static files. Some applications maintain logs. Others publish static HTML files to the enterprise server. These are great uses for Python threads, since the GIL is released during file I/O operations.
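A minimal sketch of handing a file write to a thread might look like this. The helper names `write_file` and `write_in_background` and the temporary target path are assumptions for the example, not part of any real application.

```python
import os
import tempfile
import threading

def write_file(path, text):
    """Write `text` to `path`; the GIL is released while the OS does the I/O."""
    with open(path, "w") as handle:
        handle.write(text)

def write_in_background(path, text):
    """Start the write in its own thread and return the thread immediately."""
    worker = threading.Thread(target=write_file, args=(path, text))
    worker.start()
    return worker

# Usage: publish a rendered page without blocking the caller.
target = os.path.join(tempfile.mkdtemp(), "page.html")
worker = write_in_background(target, "<html><body>Hello</body></html>")
worker.join()  # joined here only so the example can read the file back
```

The thread is deliberately left as a normal (non-daemon) thread in this sketch, so a half-written file is not abandoned if the interpreter shuts down.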

Caching remote data

We are a large business and applications are developed by various (often isolated) units. I often find myself in a situation where my application depends on mutable data extracted from another system but does not have direct access to the data’s database.

Rather than depend on low internal network latencies and download the data extract on each page load, I cache the data and keep a local copy, using a script to synchronize the data. For simple data, this can be stored using Django’s low-level cache. But if some state needs to be maintained for that data, a database table is a better method.

When there are a large number of such objects, a scheduled update process becomes burdensome, especially if the number of objects increases regularly. Therefore, the solution is to trigger the update when the data is accessed (via a page load).

Each cached database entry gets a timestamp field to note its last update. The model defines an update method and the model manager’s get() method checks the entry’s timestamp to see if the data needs to be freshened. The update routine is called in a separate thread.
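The idea can be sketched in plain Python, outside of Django’s ORM. Everything named here (`CachedEntry`, `MAX_AGE_SECONDS`, `get_entry`, and the `fetch` callable) is hypothetical; in a real project the check would live in the manager’s get() method and the entry would be a model instance.

```python
import threading
import time
from dataclasses import dataclass

MAX_AGE_SECONDS = 300  # assumed freshness window

@dataclass
class CachedEntry:
    """Stand-in for a model instance that caches remote data."""
    key: str
    value: str = ""
    last_updated: float = 0.0   # Unix timestamp of the last refresh

    def needs_refresh(self, now=None):
        now = time.time() if now is None else now
        return now - self.last_updated > MAX_AGE_SECONDS

    def update(self, fetch):
        """Pull fresh data via `fetch`, any callable that takes the key."""
        self.value = fetch(self.key)
        self.last_updated = time.time()

def get_entry(store, key, fetch):
    """Return the cached entry, starting a background refresh if it is stale."""
    entry = store[key]
    refresher = None
    if entry.needs_refresh():   # cheap check done in the calling thread
        refresher = threading.Thread(target=entry.update, args=(fetch,))
        refresher.daemon = True
        refresher.start()
    return entry, refresher

store = {"a": CachedEntry("a", "old")}
entry, refresher = get_entry(store, "a", lambda key: "fresh-" + key)
```

The thread is returned only so the example can be inspected; the caller gets the entry back right away and may still see the old value while the refresh runs.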

Warning: in Django, when a model instance is found via a relationship, the manager’s get() method is not called! This is because these objects are accessed via a django.db.models.fields.related.ManyRelatedManager, rather than the model’s own manager. If you wish to solve this without duplicating code, look into Django signals.

The problem is that this is not a simple side effect and the GIL will get in the way. The solution is to do all of the necessary logic to determine if the object needs to be updated in the main thread. Just before returning the response to the user, the update thread is started in Daemon mode. That minimizes competition for the GIL. The user will not get the updated version of the object on this page load, but the next user will.

The other alternative is to use os.fork or the subprocess module to launch the update routine in a separate process (with its own interpreter instance and its own GIL). This sort of solution gets used a lot in PHP because of its lack of threads. In Python, threads tend to be more useful and much less resource hungry.
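For comparison, a rough sketch of the separate-process route using the subprocess module. The `launch_update` helper and the inline child command are placeholders; a real update routine would be its own script.

```python
import subprocess
import sys

def launch_update(entry_id):
    """Start a separate Python process for one entry without waiting for it."""
    # The child command is a placeholder for a real update script.
    code = "import sys; print('updating entry', sys.argv[1])"
    return subprocess.Popen(
        [sys.executable, "-c", code, str(entry_id)],
        stdout=subprocess.PIPE,
    )

process = launch_update(42)
output, _ = process.communicate()   # waited on here only to show the result
```

The child has its own interpreter and its own GIL, at the cost of starting a whole new process for each update.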

Mar 7th, 2008 | Posted in Software
  1. Robert
    May 25th, 2009 at 10:07 | #1

    Hi, I found your article useful, but I also found some errors ;) (from -> from_, kwargs={'fail_silently'=True} -> kwargs={'fail_silently':True}). I made a function similar to django.core.mail.send_mail:

    import threading

    def async_send_mail(subject, message, from_, recipients):
        """
        based on:
        https://artfulcode.net/articles/threading-django/
        always returns True
        """
        from django.core.mail import send_mail

        # Create a new thread in Daemon mode to send message
        t = threading.Thread(target=send_mail,
                             args=[subject, message, from_, recipients],
                             kwargs={'fail_silently': True})
        t.setDaemon(True)
        t.start()
        return True

  2. t2y
    Sep 27th, 2009 at 18:28 | #2

    Hi, your article is just what I have been looking for.
    I translated into Japanese in my site since it’s great.
    Thank you for good article!

  3. RawNawpairm
    Jan 1st, 2010 at 15:01 | #3

    In my program I need to select multiple bytes, I have the first and last byte being 0x18 and 0x28. I’m editing a txt file. I have it so whatever I put in is replaced. So I have myfile.seekg ( hexvalues = (0x18, 0x19, 0x20, etc.) You get the point. I want to just put in 0x18 and 0x28 and it gets everything in the middle. Im using fstream


  4. Andrej Primc
    Jun 30th, 2010 at 14:56 | #4

    You shouldn’t set thread.daemon = True. If your main thread exits, the daemon thread will die. If it works for you, it’s because Apache doesn’t kill its processes often. :)

    You should either use a normal thread or a queue.


    import threading
    import queue  # this module is named Queue in Python 2
    import atexit

    def _worker():
        while True:
            func, args, kwargs = _queue.get()
            try:
                func(*args, **kwargs)
            except Exception:
                pass  # bork or ignore here; ignore for now
            finally:
                _queue.task_done()  # so we can join at exit

    def postpone(func):
        def decorator(*args, **kwargs):
            _queue.put((func, args, kwargs))
        return decorator

    _queue = queue.Queue()
    _thread = threading.Thread(target=_worker)  # one is enough; it's postponed after all
    _thread.daemon = True  # so we can exit
    _thread.start()

    def _cleanup():
        _queue.join()  # so we don't exit too soon

    atexit.register(_cleanup)

    Use it like this

    @postpone
    def foo():
        pass  # do your stuff here

    • walty
      Sep 28th, 2010 at 19:48 | #5

      Andrej Primc :
      You shouldn’t set thread.daemon = True. If your main thread exits, the daemon thread will die. If it works for you, it’s because Apache doesn’t kill its processes often. :)
      You should either use a normal thread or a queue.

      andre primc,

      your code works great!

      it’s too bad that the post here does not support code indentation.

      may be u try to put it somewhere in google code?