💡 ASGI (or “Asynchronous Server Gateway Interface”) is an interface/protocol designed to support the development of modern, asynchronous client and server applications for a wide variety of use cases including web APIs, micro-services, and client libraries.


In the 90s, when the World Wide Web was just starting out, we quickly faced a pretty big problem: as websites became more complicated, we started to require more complicated business logic on the server side. There weren’t really “APIs” (not in any modern sense of the term at least) and in fact most web servers were either written entirely in C/C++ or were simple static file servers. However, around the same time, we started to embrace a new wave of programming languages and models (e.g. Perl, Python, etc.), that would increase our agility in developing more complicated business software. Very quickly this developed into a pretty big problem to solve: how can we use a web server written in C/C++ to interface with higher-level business and application logic written in another language like Perl or Python?


Thus was born CGI (or "Common Gateway Interface"). The way CGI worked was pretty straightforward:

  1. A web browser (like Netscape Navigator 😆) connects to a web server and makes a request for a web page when a user clicks on a link or types in an address.
  2. A CGI web server accepts the incoming TCP connection, reads and parses an HTTP request, then executes a shell script with the HTTP request data (oftentimes passed in via environment variables 💥).
  3. The CGI script then uses the HTTP request data to perform some application logic, and writes some data (at the time this was mostly just plain HTML) to standard output.
  4. The CGI web server then takes the standard output written by the shell script, translates it into a well-formed HTTP response, and sends it along the TCP connection back to the client.
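
To make steps 2 and 3 more concrete, here is a minimal sketch of what a CGI script could look like if written in Python. REQUEST_METHOD and QUERY_STRING are standard CGI environment variables; the handle_request function and the name query parameter are made up for illustration and are not part of any spec.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def handle_request(environ):
    """Build the text a CGI script would write to standard output."""
    # The web server hands over request data via environment variables.
    method = environ.get("REQUEST_METHOD", "GET")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical query parameter; escape it before embedding in HTML.
    name = html.escape(query.get("name", ["World"])[0])

    # CGI output is header lines, a blank line, then the body.
    body = f"<p>Hello, {name}! (via {method})</p>\n"
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # A CGI server would start this script as a new process for every request.
    sys.stdout.write(handle_request(os.environ))
```

The web server is responsible for everything else: it adds the status line, wraps this output into a full HTTP response, and sends it back over the TCP connection.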

For almost a decade, CGI continued to be the prevailing choice for complicated web server software, but alternatives began to emerge as more people came online and server-side applications required more.


Fast-forwarding to the late 90s/early 2000s, CGI was starting to show its age, and it became clear that there was a pretty big problem with how it was originally designed: for every request that comes in, a CGI server would need to start a program in a new process. This design wouldn’t scale very well as web servers started to deal with more and more users, since RAM and CPU wouldn’t scale at the same pace.

FastCGI was one of the solutions developed largely in response to this scaling problem. FastCGI works broadly in a similar fashion to CGI, with the main difference being that FastCGI applications are run once and handle many requests over the course of the process lifetime. This also necessitated a new kind of component, the application server, to host application-level functionality in a long-running process.

  1. A web browser (like Internet Explorer 💀) connects to a web server and makes a request for a web page when a user clicks on a link or types in an address.
  2. A FastCGI-enabled web server accepts the incoming TCP connection, reads and parses an HTTP request, then translates the request into a binary message protocol that gets sent to a long-running application server.
  3. A FastCGI application server then translates the incoming binary message into an application-specific function call, passing in the request data.
  4. After the application returns some data from the function call, the FastCGI application server then translates the return value into a binary response message.
  5. The web server then takes the binary response message, translates it into a well-formed HTTP response, and sends it along the TCP connection back to the client.

By using this design, FastCGI web servers could scale by taking advantage of multi-threading in the application server itself, as threads are often faster to spin up than processes and require fewer resources. Also, by leveraging the application server model, application code itself could be even more focused on the business logic needed by the application.
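
The key idea, one long-running process handling many requests, can be sketched in a few lines. Note that the framing below (a 4-byte length prefix followed by the payload) is a toy format invented for this example and is not the real FastCGI record format; the function names are likewise hypothetical.

```python
import io
import struct


def write_frame(stream, payload: bytes) -> None:
    # Toy framing: 4-byte big-endian length, then the payload.
    stream.write(struct.pack(">I", len(payload)) + payload)


def read_frame(stream):
    header = stream.read(4)
    if len(header) < 4:
        return None  # End of stream: the web server closed the connection.
    (length,) = struct.unpack(">I", header)
    return stream.read(length)


def application(path: str) -> str:
    # Application code only sees a function call, not sockets or frames.
    return f"<p>You asked for {path}</p>"


def serve(incoming, outgoing) -> int:
    """Handle every request on the stream in one process; return the count."""
    handled = 0
    while (frame := read_frame(incoming)) is not None:
        response = application(frame.decode("utf-8"))
        write_frame(outgoing, response.encode("utf-8"))
        handled += 1
    return handled


# Simulate a web server sending three requests to the same process.
requests = io.BytesIO()
for path in ("/a", "/b", "/c"):
    write_frame(requests, path.encode("utf-8"))
requests.seek(0)
responses = io.BytesIO()
handled_count = serve(requests, responses)
```

Unlike CGI, nothing here is started per request: the loop in serve keeps running for as long as the web server keeps sending messages.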

2000s to 2010s

Eventually, many different FastCGI-style servers began popping up, each with its own idea for how the interface should look between the application server and the application. But unlike CGI or FastCGI, these ideas were often language-specific due to the nature of that interaction itself (e.g. Java’s [in]famous “servlet” API).


In the Python world, this interface was called WSGI (or “Python Web Server Gateway Interface”) and was first proposed via PEP-333 in late 2003. WSGI went on to underpin many of Python’s most popular web frameworks, among them Django and Flask, each of which exposes your application as a WSGI-compliant callable. This meant that by writing code using a library like Flask, you could deploy your application code onto any WSGI-compatible server and, through adapters, onto FastCGI or CGI web servers as well.

Let’s take the following example of a “minimal application” from Flask’s own docs:

from flask import Flask

app = Flask(__name__)

# Register hello_world as the handler for the root URL.
@app.route("/")
def hello_world():
    return "<p>Hello, World!</p>"

Notice anything missing about this code? There is no server! Where did it go? How come running this file with python doesn’t do anything?

Scrolling down further through their docs, you can see that there is a second step needed:

export FLASK_APP=hello
flask run
#  * Running on

This works now because the Flask code we wrote only contains the application itself. In order to run, we need to host that code in a WSGI application server, which flask run happens to do for us.
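
As a rough illustration of what flask run does on our behalf, the sketch below hosts a WSGI application using wsgiref from Python’s standard library. Here hello_app is a hypothetical stand-in for the Flask app object (a Flask instance is itself a WSGI callable), and build_server is a helper name invented for this example. Flask’s own development server is built on Werkzeug and is not implemented this way.

```python
from wsgiref.simple_server import make_server


def hello_app(environ, start_response):
    """A plain WSGI callable standing in for the Flask app object."""
    body = b"<p>Hello, World!</p>"
    headers = [
        ("Content-Type", "text/html"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def build_server(host="127.0.0.1", port=0):
    # Port 0 asks the operating system for any free port.
    return make_server(host, port, hello_app)
```

Calling build_server(port=8000).serve_forever() would then block and serve requests until interrupted. The application and the server stay separate: the same hello_app could be handed to any other WSGI server without changes.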

2010s to Today

While the FastCGI model continued to prevail through most of the early 2000s, some more popular websites were already starting to face further scaling challenges, even with FastCGI’s multi-threaded application servers. This was primarily because FastCGI suffered from a design problem similar in nature to its predecessors: each request required a new thread.

C10k and asynchronous I/O

As early as 1999, this issue in scaling web servers was dubbed “the C10k problem”, referring to the challenge of using a single server to handle 10,000 client connections concurrently. Inspired by this challenge, many solutions were born, including gevent, Twisted, and Tornado in the Python world (also notably Ryan Dahl’s early work on Node.js). These solutions generally centered on modelling concurrency around a couple of key distinguishing ideas:

  • So-called “green threads” or “tasklets” replace processes and threads: each request instead spawns a tasklet and you can run tens of thousands of them on a single OS thread because they are so lightweight. To get a feel for how lightweight they are, Go’s “goroutines” can be spawned with an initial stack size of only 2KB, compared to a normal OS thread whose default stack is often closer to 8MB (the usual default on Linux)!
  • Non-blocking or asynchronous I/O is used in order to schedule multiple tasklets on a single thread. For example, much application code spends most of each request just waiting for a database to respond with some data before resuming actual application logic. With blocking I/O, this would require the entire thread to wait for the database to come back with data. However, by leveraging non-blocking/async I/O, a scheduler can put tasklets to “sleep” when they are awaiting some I/O operation and then later “wake them up” again when data is ready (thus we have async and await in so many languages now).
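
Both ideas can be seen in a small asyncio sketch. The 0.1 second sleep is a stand-in for a database round trip, and the function names are invented for illustration; the point is only that 10,000 concurrent “requests” are all handled by one OS thread.

```python
import asyncio
import threading
import time


async def handle_request(request_id: int):
    # Stand-in for waiting on a database: await hands control back to the
    # event loop so other tasks can run while this one "sleeps".
    await asyncio.sleep(0.1)
    return request_id, threading.get_ident()


async def serve_many(count: int):
    # One lightweight task per request, all scheduled on the same thread.
    tasks = [asyncio.create_task(handle_request(i)) for i in range(count)]
    return await asyncio.gather(*tasks)


def run_demo(count: int = 10_000):
    started = time.perf_counter()
    results = asyncio.run(serve_many(count))
    elapsed = time.perf_counter() - started
    thread_ids = {thread_id for _, thread_id in results}
    return len(results), thread_ids, elapsed
```

Run one after another with blocking I/O, 10,000 waits of 0.1 seconds would take over 16 minutes; here the waits overlap, so the total time stays far below that.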

However, with these newer models for concurrency, some of the original WSGI folks struggled to adapt their spec to the reality of asynchronous server technologies becoming more widespread. The main design flaw in WSGI is evident when we look at the code sample provided in PEP-333 itself:

def simple_app(environ, start_response):
    """Simplest possible application object"""
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return ['Hello world!\n']

In the above code, we can see that the “simplest possible WSGI application object” is represented by a Python function that takes in two parameters:

  • environ which contains HTTP request data (the name referring to the old days of CGI passing this data via environment variables)
  • start_response which is a synchronous callback function to be called when a response is ready to be written back to the client
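
To see how a server uses these two parameters, the sketch below plays the role of the server for a single request. call_wsgi_app is a hypothetical helper written for this example, and the app body is returned as bytes, as Python 3 (PEP 3333) requires, rather than the str used in the original PEP-333 sample.

```python
from wsgiref.util import setup_testing_defaults


def simple_app(environ, start_response):
    """The PEP-333 sample, with a bytes body for Python 3."""
    status = "200 OK"
    response_headers = [("Content-type", "text/plain")]
    start_response(status, response_headers)
    return [b"Hello world!\n"]


def call_wsgi_app(app, path="/"):
    """Play the role of the server for one request (hypothetical helper)."""
    environ = {"PATH_INFO": path}
    # Fill in REQUEST_METHOD, SERVER_NAME, the wsgi.* keys and so on.
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers, exc_info=None):
        # The app calls this synchronously, before returning its body.
        captured["status"] = status
        captured["headers"] = headers

    # The calling thread is blocked here until the app has produced its body.
    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Everything happens in one ordinary, blocking function call: there is no point at which the app can hand control back to the server while it waits for I/O.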

How could this interface be adapted for use in an asynchronous server? The problem is even further complicated when we consider common use cases, such as streaming data from/to the client (e.g. a large file upload or download) or more complicated use cases like WebSockets.


After several iterations, a new interface called ASGI (or “Asynchronous Server Gateway Interface”) was developed to solve these problems in making WSGI work well with asynchronous I/O (thus the “A” 😃). Below is the WSGI code example from above translated into ASGI:

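async def simple_app(scope, receive, send):
    # First message: status line and headers (ASGI header names are lowercase bytes).
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    # Second message: the response body.
    await send({"type": "http.response.body", "body": b"Hello world!\n"})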

We can see here that there are a couple of notable differences from plain WSGI:

  • scope largely replaced the original environ parameter as a dictionary containing HTTP request/connection information, but also includes additional metadata to allow further extension and future-proofing, e.g. scope["type"] contains information useful in protocol negotiation, and there’s even a scope["asgi"] value which can be used to support older/newer versions of the ASGI spec itself!
  • receive and send are used as asynchronous callbacks that allow request and response data (respectively) to be streamed without blocking the current thread. Unlike WSGI’s start_response callback function, these two functions act like I/O channels that communicate messages via plain Python dictionaries.
  • There’s no return! This is largely because the interface is driven by sending specific messages via the receive and send callbacks. Furthermore, some protocols such as WebSockets are commonly constructed in ASGI by using an infinite loop; in this case, the function never actually returns!
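
To show receive and send working as message channels, here is a small sketch of an app that reads a streamed request body and echoes it back, plus a stand-in for the server. Both echo_app and drive are hypothetical names written for this example; the message types and the more_body flag come from the ASGI HTTP spec, while the scope shown is trimmed down to a few keys.

```python
import asyncio


async def echo_app(scope, receive, send):
    # scope["type"] tells the app which protocol this connection speaks.
    if scope["type"] != "http":
        return

    # Read the request body in chunks until the server says there is no more.
    body = b""
    while True:
        message = await receive()
        if message["type"] != "http.request":
            return  # e.g. "http.disconnect"
        body += message.get("body", b"")
        if not message.get("more_body", False):
            break

    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": body})


async def drive(app, chunks):
    """Play the role of an ASGI server for one request (hypothetical helper)."""
    scope = {"type": "http", "asgi": {"version": "3.0"}, "method": "POST", "path": "/"}
    incoming = [
        {"type": "http.request", "body": chunk, "more_body": i < len(chunks) - 1}
        for i, chunk in enumerate(chunks)
    ]
    sent = []

    async def receive():
        return incoming.pop(0) if incoming else {"type": "http.disconnect"}

    async def send(message):
        sent.append(message)

    await app(scope, receive, send)
    return sent
```

Calling asyncio.run(drive(echo_app, [b"Hello ", b"world!\n"])) returns the two response messages the app sent. Every await is a point where a real server could switch to another connection while this one waits for data.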