Threading in Django
Profitable use of threading in web development is rare, particularly when contending with Python’s global interpreter lock. There are a few notable exceptions to this.
The global interpreter lock (GIL) is the method that Python uses to maintain internal thread-safety when not all Python objects are thread-safe. The GIL ensures that no Python object may be modified by multiple threads at the same time. This is transparent to the Python programmer; the GIL is locking internal structures. A list that is modified by multiple threads without explicit locking will still behave erratically.
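As a rough illustration of that last point, here is a minimal sketch (not from the original code; `append_next` and the thread counts are made up for the example). Several threads perform a read-then-append on a shared list. The GIL protects each individual list operation, but not the pair of them, so the sketch holds an explicit `threading.Lock` around the compound step.

```python
import threading

shared = []                 # list shared by all threads
lock = threading.Lock()     # explicit lock guarding the compound update

def append_next(count):
    """Append `count` values, each one larger than the current last item."""
    for _ in range(count):
        # Reading the last item and appending are two separate steps; the
        # GIL does not make the pair atomic, so we hold our own lock.
        with lock:
            last = shared[-1] if shared else 0
            shared.append(last + 1)

threads = [threading.Thread(target=append_next, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock in place the list ends up as a clean run of consecutive integers. Without it, two threads could read the same last item and append duplicate values.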
This makes threading less useful in web development. Because Python cannot directly maintain state between requests (indirectly, via the client, state may be maintained using session cookies or other similar strategies), each request is a new thread that runs through the same steps:
1. Determine what data is being requested
2. Get the data
3. Format the data
4. Send the data
Each of these steps depends on the state information from the previous step. There is not a lot of room for asynchronicity. The usefulness of threading is pushed back to the web server. That is where integrated server/framework solutions, like Ocsigen, have an advantage.
Many web-based applications, however, are not simple database front ends. There are a few common situations where threading becomes useful; specifically, when there are side effects of a request.
On a side note, experts will tell you that only POST requests should result in side effects. This is misleading. POST should certainly be used for user-intended side effects. However, what happens when there is an error loading a page?
Error messages
In the event of a coding or environment error, Django sends me an email. This is a wonderful (and occasionally annoying) feature. But what if the problem is caused by the data we input, rather than the code? I can detect this in my code and present the user with a nice error page apologizing for the inconvenience, but now I want to notify the content team that an entry in the database is invalid in some way.
Sending a message, especially if Django is connecting to a remote mail server, can take a while. This is a good use for threading. The thread should deal with as little data processing as possible, concentrating instead on the side effect.
import threading

from django.core.mail import send_mail
from django.http import HttpResponseServerError

def foo(request, identifier):
    data = get_some_data(identifier)
    try:
        error_check(data)  # raise exception if data invalid
    except InvalidData:
        # Format our information here, in the main thread
        subject = "Invalid data in entry %d" % identifier
        message = """
        There is an error in entry %d. Please check this data at
        http://path/to/django/admin/app/table/%d.
        """ % (identifier, identifier)
        recipients = ['', '']
        from_ = ''  # "from" is a reserved word, so it cannot be a variable name
        # Create a new thread in Daemon mode to send message
        t = threading.Thread(target=send_mail,
                             args=[subject, message, from_, recipients],
                             kwargs={'fail_silently': True})
        t.setDaemon(True)
        t.start()
        return HttpResponseServerError(some_error_page)
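Note particularly that we use t.setDaemon(True) on the thread before starting it. This marks the thread as a daemon, which tells Python not to wait for the thread to finish before the process exits. The response goes back to the client as soon as the thread has been started, without waiting for the message to be sent.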
Writing files
Many of our web-based programs write data to static files. Some applications maintain logs. Others publish static HTML files to the enterprise server. These are great uses for Python threads, since the GIL is released during file IO operations.
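A minimal sketch of the idea, assuming a simple append-only log file. The names `write_log` and `log_async` and the temporary path are invented for the example and are not taken from any of our applications.

```python
import os
import tempfile
import threading

_log_lock = threading.Lock()   # keeps lines from different threads intact

def write_log(path, line):
    """Append one line to the log file; runs in the worker thread."""
    with _log_lock:
        with open(path, "a") as handle:
            handle.write(line + "\n")

def log_async(path, line):
    """Start a thread to write the line and return it without waiting."""
    t = threading.Thread(target=write_log, args=(path, line))
    t.start()
    return t

# Usage: the temporary path stands in for a real log location.
log_path = os.path.join(tempfile.mkdtemp(), "app.log")
worker = log_async(log_path, "entry 42 published")
worker.join()  # a view would not join; done here only to read the result
with open(log_path) as handle:
    contents = handle.read()
```

This sketch uses a regular (non-daemon) thread so that a write in progress is not cut off if the interpreter exits. Whether that trade-off is right depends on how long the writes take.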
Caching remote data
We are a large business and applications are developed by various (often isolated) units. I often find myself in a situation where my application depends on mutable data extracted from another system but does not have direct access to the data’s database.
Rather than depend on low internal network latencies and download the data extract on each page load, I cache the data and keep a local copy, using a script to synchronize the data. For simple data, this can be stored using Django’s low level cache. But if some state needs to be maintained for that data, a database table is a better method.
When there are a large number of such objects, a scheduled update process becomes burdensome, especially if the number of objects increases regularly. Therefore, the solution is to trigger the event when the data is accessed (via a page load).
Each cached database entry gets a timestamp field to note its last update. The model defines an update method and the model manager’s get() method checks the entry’s timestamp to see if the data needs to be freshened. The update routine is called in a separate thread.
Warning: in Django, when a model instance is found via a relationship, the manager’s get() method is not called! This is because these objects are accessed via a django.db.models.fields.related.ManyRelatedManager, rather than the model’s own manager. If you wish to solve this without duplicating code, look into Django signals.
The problem is that this is not a simple side effect and the GIL will get in the way. The solution is to do all of the necessary logic to determine if the object needs to be updated in the main thread. Just before returning the response to the user, the update thread is started in Daemon mode. That minimizes competition for the GIL. The user will not get the updated version of the object on this page load, but the next user will.
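The flow described above might look roughly like the following. This is a plain-Python stand-in rather than a real Django model and manager. `CachedEntry`, `MAX_AGE`, `get_entry` and the `fetch` callable are all hypothetical names chosen for the sketch.

```python
import threading
import time

MAX_AGE = 300  # seconds before an entry counts as stale (assumed value)

class CachedEntry(object):
    """Plain-Python stand-in for a cached model instance."""

    def __init__(self, value, updated_at):
        self.value = value
        self.updated_at = updated_at  # timestamp of the last update

    def is_stale(self, now=None):
        now = time.time() if now is None else now
        return now - self.updated_at > MAX_AGE

    def update(self, fetch):
        # Runs in the worker thread: fetch the remote data, then store it.
        self.value = fetch()
        self.updated_at = time.time()

def get_entry(entry, fetch):
    """Return the entry as-is and, if stale, the started update thread."""
    worker = None
    if entry.is_stale():  # the decision is made in the main thread
        worker = threading.Thread(target=entry.update, args=(fetch,))
        worker.daemon = True
        worker.start()
    return entry, worker
```

The caller gets the entry back immediately, stale or not. Only the fetch and the write-back happen in the worker, which keeps the thread's work close to a pure side effect.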
The other alternative is to use os.fork or the subprocess module to launch the update routine in a separate process (with its own interpreter instance and its own GIL). This sort of solution gets used a lot in PHP because of its lack of threads. In Python, threads tend to be more useful and much less resource hungry.
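For comparison, a separate-process version could be sketched like this with the subprocess module. The inline script is only a placeholder for a real update command, and `launch_update` is a name made up for the example.

```python
import subprocess
import sys

def launch_update(entry_id):
    """Start the update in a separate Python process and return at once."""
    # The inline script is a placeholder for a real update command.
    script = "import sys; print('updated entry ' + sys.argv[1])"
    return subprocess.Popen(
        [sys.executable, "-c", script, str(entry_id)],
        stdout=subprocess.PIPE,
    )

proc = launch_update(42)
# A view would not wait for the child; we do so here only to show the result.
output, _ = proc.communicate()
```

Each call starts a whole new interpreter, which is the resource cost mentioned above.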
Hi, I found your article useful, but I also found some errors ;) (from -> from_, kwargs={‘fail_silently’=True} -> kwargs={‘fail_silently’:True}). I made a function similar to django.core.mail.send_mail:
import threading

def async_send_mail(subject, message, from_, recipients):
    """
    based on:
    https://artfulcode.net/articles/threading-django/
    always returns True
    """
    from django.core.mail import send_mail
    # Create a new thread in Daemon mode to send message
    t = threading.Thread(target=send_mail,
                         args=[subject, message, from_, recipients],
                         kwargs={'fail_silently': True})
    t.setDaemon(True)
    t.start()
    return True
Hi, your article is just what I have been looking for.
I translated into Japanese in my site since it’s great.
Thank you for good article!
You shouldn’t set thread.daemon = True. If your main thread exits, the daemon thread will die. If it works for you, it’s because Apache doesn’t kill its processes often. :)
You should either use a normal thread or a queue.
import threading
import queue  # named Queue on Python 2
import atexit

def _worker():
    while True:
        func, args, kwargs = _queue.get()
        try:
            func(*args, **kwargs)
        except:
            pass  # bork or ignore here; ignore for now
        finally:
            _queue.task_done()  # so we can join at exit

def postpone(func):
    def decorator(*args, **kwargs):
        _queue.put((func, args, kwargs))
    return decorator

_queue = queue.Queue()
_thread = threading.Thread(target=_worker)  # one is enough; it's postponed after all
_thread.daemon = True  # so we can exit
_thread.start()

def _cleanup():
    _queue.join()  # so we don't exit too soon

atexit.register(_cleanup)
Use it like this:

@postpone
def foo():
    pass  # do your stuff here
andre primc,
your code works great!
it’s too bad that the post here does not support code indentation.
may be u try to put it somewhere in google code?