Running Python With Apache on Windows
Category: Code
Posted:
My day job has a Windows environment and avoids installing other OSes unless absolutely necessary. Windows is great on the desktop, but it has many limitations when tasked with running a web site that relies heavily on open source languages and products. Those limitations are front and center when trying to host a Python web application. Below are some of the gotchas that I’ve encountered and how I addressed them.
The Python GIL
Every Python developer should be familiar with the GIL (Global Interpreter Lock). If you’re not, follow that link and start reading. TL;DR? Only one thread in a process can execute Python code at a time, so threads of the same process block each other. By itself, the GIL is an easy beast to tame by spawning more processes instead of threads. On *nix OSes, processes are relatively lightweight and you can fork a process and continue on your merry way. Processes on Windows are heavier, and Windows does not support fork without installing more software. Cygwin and Windows Services for Unix are two ways of making Windows behave more like *nix for specially compiled programs.
Apache MPM (Multi-Processing Modules)
Apache is a great HTTP server application that provides more functionality than most sites will ever need or use. It is also one of the few options that is designed to run as a service on Windows. The lighter-weight options, nginx and lighttpd, are usually better for my needs, but both have issues with running as a Windows service.
Apache supports a few different Multi-Processing Modules; prefork has traditionally been the default for *nix. On Windows, there is only one MPM, mpm_winnt. This MPM works great and follows the Windows idea of spawning threads when you want work done: a single child process handles every request in its own thread. Because all of those threads share one GIL, almost any Python website becomes unusably slow when run on Windows with Apache. Most web applications are a bunch of IO (network, database, disk, etc.) with a small amount of CPU, but every thread still has to hold the GIL whenever it runs Python code. In my experience, this basically turns Apache into a single-threaded web server. The prefork MPM does not experience this problem because it uses processes instead of threads.
I observed that a page that took about 1 second to generate by itself could take tens of seconds to finish if there were more than 2-3 overlapping requests. Each new request (in a thread) would get roughly equal time and slow down all previous threads. It was possible to block the site for minutes with as few as 10 requests.
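The effect can be illustrated outside of Apache with a minimal sketch. It assumes a standard CPython build with the GIL, and the names busy, run_in_threads, and run_in_processes are made up for this example. It runs the same pure-Python work in threads of one process and then in separate processes; on a multi-core machine the threaded run should take roughly as long as running the jobs one after another, while the process run can overlap them.

```python
import threading
import time
from multiprocessing import Pool


def busy(n):
    """Pure-Python CPU work; the running thread holds the GIL throughout."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_in_threads(jobs, n):
    """Run `jobs` copies of busy(n) as threads inside this one process."""
    results = [None] * jobs

    def worker(slot):
        results[slot] = busy(n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(jobs)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, time.perf_counter() - start


def run_in_processes(jobs, n):
    """Run `jobs` copies of busy(n) in separate processes, one GIL each."""
    with Pool(jobs) as pool:
        start = time.perf_counter()
        results = pool.map(busy, [n] * jobs)
        elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    # The guard is required on Windows, where new processes re-import this file.
    _, thread_seconds = run_in_threads(4, 300_000)
    _, process_seconds = run_in_processes(4, 300_000)
    print(f"threads:   {thread_seconds:.2f}s")
    print(f"processes: {process_seconds:.2f}s")
```

The exact numbers depend on the machine and the Python version, so treat the output as a rough comparison rather than a benchmark.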
Faking a Python MPM
There is a way of configuring a Windows server so that it can serve a Python web application and avoid the GIL. A multi-process MPM is conceptually the same as a load balancer sitting in front of several web servers. An incoming request is routed to an individual web server to handle.
Apache as a Balancer
Apache can function as a load balancer if another option is not available. Here’s a configuration snippet that will equally balance requests among three Apache instances running on the same machine as the instance acting as the load balancer (reverse proxy). See the mod_proxy documentation for the rest of the configuration directives that you will need to fully configure the balancer.
<Proxy balancer://cluster>
BalancerMember http://192.168.0.10:9001 smax=3 max=10 ttl=120 route=www_1
BalancerMember http://192.168.0.10:9002 smax=3 max=10 ttl=120 route=www_2
BalancerMember http://192.168.0.10:9003 smax=3 max=10 ttl=120 route=www_3
</Proxy>
ProxyPass / balancer://cluster/
ProxyPassReverse / balancer://cluster/
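If the number of worker instances changes often, the block above can be generated instead of edited by hand. The following is a minimal sketch, not part of the original setup: the function name balancer_config is made up, and it simply reuses the smax, max, and ttl values and the www_N route naming from the snippet above.

```python
def balancer_config(host, ports, name="cluster"):
    """Build the mod_proxy balancer block for one worker per port."""
    lines = [f"<Proxy balancer://{name}>"]
    for index, port in enumerate(ports, start=1):
        # Each worker gets a unique route name so it can be logged later.
        lines.append(
            f"BalancerMember http://{host}:{port} "
            f"smax=3 max=10 ttl=120 route=www_{index}"
        )
    lines.append("</Proxy>")
    lines.append(f"ProxyPass / balancer://{name}/")
    lines.append(f"ProxyPassReverse / balancer://{name}/")
    return "\n".join(lines)


if __name__ == "__main__":
    print(balancer_config("192.168.0.10", [9001, 9002, 9003]))
```

Called with the host and ports used in this post, it prints the same directives shown above, which can then be pasted or included into the load balancer's configuration.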
Gotchas
Faking a server farm on a single machine introduces some new problems that need to be resolved. Thankfully, these are not that difficult once you are aware of them.
Lots-O-Logs
Every request will be logged by the load balancing Apache instance and the proxied instance that does the work. This is not necessarily a bad thing by itself. It is useful to know which Apache instance handled a specific request and also get the aggregate view (load balancer logs). Unless steps are taken, this will double the disk IO and space requirements.
The simple resolution is to disable all logging on the worker instances. This can be accomplished by using CustomLog and a conditional environment variable that is never set.
LogFormat " " empty
# Below will never output anything, but it will create an empty file
CustomLog "D:/logs/carme/apache/access-1.log" empty env=NOTHING_IS_LOGGED
Logging has now been reduced to a normal volume, but you will not know which instance handled the request. To regain that bit of information, you can add %{BALANCER_WORKER_ROUTE}e to the LogFormat of the load balancer. This will include whatever value is set for route= in the above BalancerMember configuration, for example www_1, www_2, or www_3.
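Once the route is in the load balancer's log, it is easy to check that requests really are being spread across the instances. Here is a minimal sketch that assumes %{BALANCER_WORKER_ROUTE}e was added as the last field of the LogFormat; the function name requests_per_route and the sample log lines are made up for illustration.

```python
from collections import Counter


def requests_per_route(log_lines):
    """Count requests per route, assuming the route is the last field."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[-1]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log lines; a real log would be read from the access log file.
    sample = [
        '192.168.0.50 - - [10/Oct/2012:13:55:36 -0700] "GET / HTTP/1.1" 200 2326 www_1',
        '192.168.0.51 - - [10/Oct/2012:13:55:37 -0700] "GET /a HTTP/1.1" 200 512 www_2',
        '192.168.0.50 - - [10/Oct/2012:13:55:38 -0700] "GET /b HTTP/1.1" 200 128 www_1',
    ]
    for route, count in sorted(requests_per_route(sample).items()):
        print(route, count)
```

If the route is placed somewhere else in the format, the field index needs to change to match.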
Fixing IPs
The instances behind the load balancer, and the application code they run, will see every request as if it is coming from the load balancing instance. This can be resolved with the Apache module mod_rpaf.
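If installing another Apache module is not an option, the same fix can be approximated inside a WSGI application. The sketch below is an alternative to mod_rpaf, not how mod_rpaf itself works. It relies on mod_proxy adding an X-Forwarded-For header to proxied requests; the class name ForwardedForMiddleware and the trusted_proxies parameter are made up for this example, and it assumes exactly one proxy (the load balancer) sits in front of the application.

```python
class ForwardedForMiddleware:
    """WSGI middleware that restores the client IP behind a trusted proxy."""

    def __init__(self, app, trusted_proxies):
        self.app = app
        self.trusted_proxies = set(trusted_proxies)

    def __call__(self, environ, start_response):
        forwarded = environ.get("HTTP_X_FORWARDED_FOR", "")
        # Only believe the header when the request came from our own balancer;
        # otherwise any client could fake its address.
        if forwarded and environ.get("REMOTE_ADDR") in self.trusted_proxies:
            # With a single proxy, the last entry is the address it saw.
            client = forwarded.split(",")[-1].strip()
            if client:
                environ["REMOTE_ADDR"] = client
        return self.app(environ, start_response)
```

A WSGI application would be wrapped once at startup, for example application = ForwardedForMiddleware(application, trusted_proxies=["192.168.0.10"]), using the address the load balancer connects from.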
