Google App Engine Task Queues, Push vs. Pull Paradigm, and Web Hooks
Despite my post last week on the Shortcomings of Google App Engine and my decision to move away from it as a viable platform for upcoming projects, I have been impressed with the overall architecture and design of their experimental Task Queue API.
Throughout its history, Google has been a leader in interface design, and that is reflected not only in the UI of the products it builds but in the countless APIs it has published. Google has made available some of the easiest-to-use yet most powerful APIs, and a clear focus on leveraging open standards where possible has helped along the way. Google App Engine is probably the strongest testament to this, allowing developers to quickly build web applications that scale to millions of users on an easy-to-use Python or Java runtime. The experimental Task Queue API is no exception.
Definition
Before I discuss its advantages, I should provide a definition of a task queue:
Task Queue is defined as a mechanism to synchronously distribute a sequence of tasks among parallel threads of execution. The global problem is broken down into tasks and the tasks are enqueued onto the queue. Parallel threads of execution pull tasks from the queue and perform computations on the tasks. The runtime system is responsible for managing thread accesses to the tasks in the queue as well as ensuring proper queue usage (i.e. dequeuing from an empty queue should not be allowed).
Source: "Task Queue Implementation Pattern," Ekaterina Gonina (author), Jike Chong (shepherd), UC Berkeley ParLab.
Task queues have all sorts of uses for offline processing, including periodically pulling data from third-party sources, computing aggregate statistics, delivering email to users, and so on.
Simple Interface
One of the most straightforward advantages of Google's Task Queue API is its very simple interface. While you can define a set of configuration options, they are all optional. Enqueuing a task for execution is as simple as the following:
from google.appengine.api.labs import taskqueue

# Add the task to the default queue. Here 'key' is assumed to identify
# whatever entity the /worker handler should process.
taskqueue.add(url='/worker', params={'key': key})
A default queue is provided, though you can easily define additional queues with their own execution options. After being enqueued, the task is run as soon as possible (according to the queue's scheduling options). Optional configuration options are specified in a queue.yaml file, including queue names, rates of processing, and bucket sizes.
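For example, a queue.yaml along the following lines defines a named queue with its own processing rate (the queue name and values here are purely illustrative):

queue:
- name: email-queue
  rate: 10/s
  bucket_size: 20

The rate caps how quickly tasks are executed, while the bucket size bounds how large a burst of tasks the queue will release at once.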
Push vs. Pull
While a simple interface is nice, the push vs. pull model of the GAE Task Queue is what makes it really shine. To understand this advantage, let's compare it to another popular cloud-based queue solution, Amazon Simple Queue Service (SQS). With SQS, you define a queue and it becomes a central repository for unprocessed tasks. You then create a set of worker processes (on, say, Amazon EC2 servers) that regularly poll the queue to see if there are tasks available for processing. If a worker process finds an available task, the task becomes locked, allowing that worker to process it without other workers having access to it. Once the work is complete, the task is removed from the SQS queue.
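To make the pull model concrete, a typical SQS worker loop looks something like this sketch, written against the boto library (the queue name and the process function are placeholders for your own work):

import time
import boto

def process(body):
    pass  # placeholder for the actual task logic

conn = boto.connect_sqs()  # credentials come from the environment
queue = conn.create_queue('work-queue')  # illustrative queue name

while True:
    msg = queue.read(visibility_timeout=60)  # lock a task for up to 60s
    if msg is not None:
        process(msg.get_body())
        queue.delete_message(msg)  # work complete; remove it from the queue
    else:
        time.sleep(1)  # nothing available yet, so keep polling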
While this approach provides a lot of flexibility, it requires constantly running worker processes polling for available work. In addition, if there is a spike in queued tasks, you must manage scaling the worker processes up, and eventually back down, yourself.
In contrast to this mechanism, GAE Task Queue provides a push model. Instead of an arbitrary number of worker processes constantly polling for available tasks, GAE Task Queue pushes work to workers when tasks are available. The work is then processed by the existing auto-scaling GAE infrastructure, so you never have to worry about scaling workers up and down. You simply define the maximum rates of processing, and GAE takes care of farming tasks out to workers appropriately.
Web Hooks
What is also compelling about GAE Task Queue is its use of web hooks as the description of a task unit. When you break it down, an individual task consists of the code to execute and the data input for that specific task.
The web already provides a great mechanism for this through HTTP requests, their GET and POST inputs, and the resulting status response. Since in GAE you already define code to execute in response to an HTTP request, you can leverage the same mechanism to define the execution code for tasks. As for the data input, GET query-string parameters or the HTTP POST body can carry any kind of input. In this way, a task description is simply a URL that handles the request plus a set of input parameters to that request.
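To make that concrete, here is a minimal sketch of what the /worker handler from the earlier taskqueue.add() call might look like using the webapp framework (the actual offline work is elided):

from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app

class Worker(webapp.RequestHandler):
    def post(self):
        # Task parameters arrive as ordinary POST data.
        key = self.request.get('key')
        # ... perform the offline work for this key here ...
        # A 2xx response marks the task as done; an error response
        # tells the queue to retry the task later.

application = webapp.WSGIApplication([('/worker', Worker)])

def main():
    run_wsgi_app(application)

if __name__ == '__main__':
    main()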
This allows you to leverage everything you have already learned building web request handlers in GAE for user-initiated requests. More importantly, it leverages the fact that GAE has already invested heavily in auto-scaling web request handling: it can simply reuse this infrastructure for task queues without having to invent a separate scaling architecture.
Shortcomings
While the overall design of GAE Task Queues is compelling, it suffers from the same shortcomings I mentioned in my previous post. Namely, a given task has a 30-second deadline, which means an individual task cannot perform more than 30 seconds of computation, including fetching data from the datastore, calling third-party APIs, computing aggregations, and so on. In many cases this is fine, since you can simply enqueue many small tasks, making each granular enough to always complete within 30 seconds. However, this often introduces needless complexity in dividing up the work, and some tasks simply cannot be broken into units of less than 30 seconds of processing.
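The usual workaround is to have each task process a bounded batch and then enqueue a continuation of itself. Here is a rough sketch of that chaining pattern, where MyModel and update_statistics are hypothetical stand-ins for your own datastore model and per-entity work:

from google.appengine.api.labs import taskqueue
from google.appengine.ext import webapp

BATCH_SIZE = 100  # sized so one batch comfortably fits the 30s deadline

class AggregateWorker(webapp.RequestHandler):
    def post(self):
        offset = int(self.request.get('offset', '0'))
        # MyModel is a hypothetical datastore model.
        entities = MyModel.all().fetch(BATCH_SIZE, offset)
        for entity in entities:
            update_statistics(entity)  # hypothetical per-entity work
        if len(entities) == BATCH_SIZE:
            # A full batch means more work may remain: chain a continuation.
            taskqueue.add(url='/aggregate',
                          params={'offset': offset + BATCH_SIZE})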
Overall, I find the design of GAE Task Queues compelling and think it's a great pattern for modeling queue infrastructure, whether it's on or off Google App Engine.
Aug 24, 2009