Reading the Express Source Code
The GitHub repository of one of the most widely used back-end frameworks, with over 30 million weekly downloads, describes itself as:
Fast, unopinionated, minimalist web framework for node.
Currently sitting at 64,000 stars and 5,000 commits, Express has had an undeniable impact on server-side development for the web, having been used by companies such as PayPal, Uber and IBM.
If you are reading this article, chances are you already know what Express is: a powerful back-end framework used to build REST APIs. You can read more about it on its website. The main purpose of this article is to help people who genuinely want to start contributing to open source projects but falter when it comes to reading and understanding large codebases, by providing a much-needed reference apart from the docs. Today, we're looking at Express.
The codebase follows an older coding style, discussed here, and its pre-ES2015 design patterns have even led many to rewrite it in TypeScript. Even so, we'll try our best to analyse and understand the source code.
Getting Started
Before diving into any open source project, we need to familiarise ourselves with the folder structure of the project we are going to be dealing with. You can do this on GitHub itself or clone the repository locally to view it in your preferred code editor (recommended).
git clone https://github.com/expressjs/express.git
cd express
ls
Charter.md
Code-Of-Conduct.md
Collaborator-Guide.md
Contributing.md
History.md
LICENSE
Readme-Guide.md
Readme.md
Release-Process.md
Security.md
Triager-Guide.md
appveyor.yml
/benchmarks
/examples
index.js
/lib
package.json
/test
Let's focus on the core components and see how they are related.
The key element of the source code is /lib. This is where the core of Express lives and where we will be spending most of our time.
Other than that, we have directories such as /benchmarks, /examples and /test, which can also be contributed to but lie beyond the scope of this article.
Headfirst into /lib
Before we do this, we need to locate index.js
, the entry point of Express. You should see the following code:
'use strict';
module.exports = require('./lib/express');
Notice the 'use strict' directive, which appears throughout the codebase. It is a way to opt into a restricted variant of JavaScript that helps catch common coding errors and improves the quality and security of the code. When you install Express into a Node application and import it, this file loads and executes the ./lib/express file. This is a common pattern in Node for structuring code into different files and directories and then composing them together.
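To get a feel for what strict mode changes, here is a minimal sketch that is not taken from the Express source. The function and variable names are made up for illustration. It assigns to a variable that was never declared: sloppy mode would quietly create a global, while strict mode throws.

```javascript
// Hypothetical helper: the function-level directive puts its body in strict mode.
function assignInStrictMode() {
  'use strict';
  try {
    undeclaredCounter = 1; // never declared with var/let/const
    return 'no error';
  } catch (err) {
    return err.name;
  }
}

const strictOutcome = assignInStrictMode();
console.log(strictOutcome); // ReferenceError
```

Placing the directive at the top of a file, as Express does, applies the same rules to the whole file instead of a single function.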
/lib/express.js
The express.js
file contains a factory function called createApplication
that creates and returns a new instance of the app each time it's called.
function createApplication() {
  var app = function(req, res, next) {
    app.handle(req, res, next);
  };
  // here we basically copy properties of event emitter and proto to app
  mixin(app, EventEmitter.prototype, false);
  mixin(app, proto, false);
  // expose the prototype that will get set on requests
  app.request = Object.create(req, {
    app: { configurable: true, enumerable: true, writable: true, value: app }
  })
  // expose the prototype that will get set on responses
  app.response = Object.create(res, {
    app: { configurable: true, enumerable: true, writable: true, value: app }
  })
  app.init();
  return app;
}
Observations we can make here are:
The app.handle method runs the middleware functions that are registered on the app instance, where app itself is defined as a function taking the request, response and next parameters.
The next two lines of code use the merge-descriptors library to copy the properties of the event emitter and of proto (defined in application.js) onto the app instance.
Finally, we initialise and return the app instance.
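The idea of "a function that also carries methods" can feel odd at first, so here is a stripped-down sketch of the same pattern using only Node's standard library. The mixin helper below is a simplified stand-in for merge-descriptors, and proto is a hypothetical two-method object, not the real contents of application.js.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical stand-in for the methods defined in lib/application.js.
const proto = {
  init() {
    this.settings = {};
  },
  handle(req, res, next) {
    res.handledBy = "app.handle";
    if (next) next();
  },
};

// Simplified mixin: copy property descriptors without overwriting existing ones.
function mixin(dest, src) {
  for (const name of Object.getOwnPropertyNames(src)) {
    if (Object.prototype.hasOwnProperty.call(dest, name)) continue;
    Object.defineProperty(dest, name, Object.getOwnPropertyDescriptor(src, name));
  }
  return dest;
}

function createApplication() {
  // The app is itself a request handler function...
  const app = function (req, res, next) {
    app.handle(req, res, next);
  };
  // ...that also carries event-emitter and application methods.
  mixin(app, EventEmitter.prototype);
  mixin(app, proto);
  app.init();
  return app;
}

const demoApp = createApplication();
const fakeRes = {};
demoApp({}, fakeRes); // calling the app delegates to app.handle

let mounted = false;
demoApp.on("mount", () => { mounted = true; });
demoApp.emit("mount");

console.log(typeof demoApp, fakeRes.handledBy, mounted);
```

Because the app is a plain function with the (req, res) signature, it can later be handed straight to Node's HTTP server as a request listener.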
After this, the comments are pretty self-explanatory: the file simply exposes the prototypes, routing constructors and built-in middleware on the public interface, followed by a function that reports an error if we try to use any middleware that Express no longer bundles.
Some of these middleware functions (express.raw(), express.json(), express.text()) rely on a library called body-parser, which parses the request body in the corresponding format. We will see more about them in later sections.
/lib/request.js
Here we extend the prototype of the existing IncomingMessage class from the Node.js http module and add multiple internal and external functions/methods to the request object. Frequently used helpers such as req.get(), req.is() and req.ip are defined on this extended object. In the 4.x codebase examined here, req.body and req.query are populated by middleware rather than defined in this file.
var req = Object.create(http.IncomingMessage.prototype)
Here too, we find a small helper function, this time one that defines a "getter" on the request object. For those who aren't aware, a getter is a function that runs whenever a property is read and returns its value; think of it as a "read" operation on an object. This is used extensively throughout this file.
function defineGetter(obj, name, getter) {
  Object.defineProperty(obj, name, {
    configurable: true,
    enumerable: true,
    get: getter
  });
}
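Here is a rough sketch of how such a helper can be used. The fakeReq object is a made-up stand-in for a real request, and the getter is only modelled on the kind of AJAX check that request.js performs; it is not copied from the source.

```javascript
function defineGetter(obj, name, getter) {
  Object.defineProperty(obj, name, {
    configurable: true,
    enumerable: true,
    get: getter,
  });
}

// Hypothetical request-like object; only the headers field matters here.
const fakeReq = { headers: { "x-requested-with": "XMLHttpRequest" } };

// The getter runs on every read of fakeReq.xhr, so it always reflects the current headers.
defineGetter(fakeReq, "xhr", function xhr() {
  const value = this.headers["x-requested-with"] || "";
  return value.toLowerCase() === "xmlhttprequest";
});

const before = fakeReq.xhr;
fakeReq.headers = {};
const after = fakeReq.xhr;
console.log(before, after); // true false
```

The value is computed lazily on access, so nothing is calculated for requests that never read the property.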
Taken as a whole, the file provides utilities for header access, content negotiation, parameter retrieval, MIME type checking, protocol/security handling, IP address handling, hostname and subdomain parsing, URL/path parsing, request freshness checks, and AJAX request detection. This results in a convenient API for developers using Express to build web applications.
/lib/response.js
Very much like request.js
, this file also extends the prototype of a class from the http
module, in this case: ServerResponse
.
Not only does this file contain double the lines of code of request.js
, it also defines some of the most used functions that we use in APIs when sending a response, such as res.send()
, res.json()
, res.cookie()
, res.status()
, res.render()
, res.redirect()
and more. Since many of these methods return the response object itself, we can chain them. For example, this is a valid block of Express code:
app.get("/", (req, res) => {
  //send a message along with an OK status code (200)
  return res.status(200).send("Express is cool!");
});
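Chaining works because each method hands back the object it was called on. A minimal sketch with a made-up response-like object (not the real ServerResponse) could look like this:

```javascript
// Hypothetical response-like object: each method returns `this` so calls can be chained.
const fakeRes = {
  statusCode: 200,
  body: undefined,
  status(code) {
    this.statusCode = code;
    return this;
  },
  send(body) {
    this.body = body;
    return this;
  },
};

const chained = fakeRes.status(201).send("created");
console.log(chained === fakeRes, fakeRes.statusCode, fakeRes.body); // true 201 created
```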
I highly recommend reading the code from this file, as it is very interesting to see how these functions we use so often are actually implemented behind the scenes.
/lib/application.js
The core application object is defined here along with several methods for configuration, each with high level JSDoc comments explaining what they do.
The first method we encounter is app.init
, which was used in the factory function from /lib/express.js
and initialises the app instance.
Another important private method is app.lazyrouter(), which is called whenever the application needs to access the router for the first time, rather than when the app is initialised. This is because the router configuration depends on app settings that may be set after initialisation has run. This is also mentioned in the code:
/**
 * lazily adds the base router if it has not yet been added.
 * We cannot add the base router in the defaultConfiguration because
 * it reads app settings which might be set after that has run.
 */
app.lazyrouter = function lazyrouter() {
  if (!this._router) {
    // ...create this._router from the current app settings (elided)
    this._router.use(query(this.get('query parser fn')));
    this._router.use(middleware.init(this));
  }
};
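The lazy initialisation idea can be sketched on its own, without any of Express's internals. Everything here (createLazyApp, the shape of the router object) is hypothetical and only illustrates why building the router on first use lets it see settings that were applied later.

```javascript
// Hypothetical mini app: the "router" is only built on first access.
function createLazyApp() {
  return {
    settings: {},
    _router: undefined,
    set(name, value) {
      this.settings[name] = value;
      return this;
    },
    lazyrouter() {
      if (!this._router) {
        // Settings are read here, at first use, not at construction time.
        this._router = { strict: Boolean(this.settings["strict routing"]) };
      }
      return this._router;
    },
  };
}

const lazyApp = createLazyApp();
lazyApp.set("strict routing", true); // set after the app was created
const firstRouter = lazyApp.lazyrouter();
const secondRouter = lazyApp.lazyrouter();
console.log(firstRouter.strict, firstRouter === secondRouter); // true true
```

The second call returns the same router object, so the setup cost is paid only once.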
We will talk about _router
in the next section.
The app.handle method, which is also called in createApplication, is used to process incoming requests and send an appropriate response, taking into account any middleware it finds.
Another frequently used function is app.use, which adds middleware to an Express app. You may have used it like this:
app.use(express.json());
This particular middleware runs during the request-response cycle and parses JSON request bodies. The implementation of app.use has this line:
var fns = flatten(slice.call(arguments, offset));
This essentially "flattens" the incoming middleware arguments, which may or may not be arrays. For example, we can pass middleware to Express in the following manner:
app.use([m1, [m2, m3]]);
In this case, flattening turns [m1, [m2, m3]] into [m1, m2, m3].
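As a quick illustration of that flattening step, here is a small sketch. It uses the built-in Array.prototype.flat as a stand-in for the flatten helper Express relies on, and collectMiddleware, m1, m2 and m3 are made-up names.

```javascript
// Stand-in for Express's flatten helper, using the built-in Array.prototype.flat.
function collectMiddleware(...args) {
  return args.flat(Infinity);
}

const m1 = function m1() {};
const m2 = function m2() {};
const m3 = function m3() {};

const flattened = collectMiddleware([m1, [m2, m3]]);
console.log(flattened.map((fn) => fn.name)); // [ 'm1', 'm2', 'm3' ]
```

Passing the functions as separate arguments, as in collectMiddleware(m1, m2, m3), gives the same flat list.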
Another very fascinating piece of code is this:
methods.forEach(function(method){
  app[method] = function(path){
    if (method === 'get' && arguments.length === 1) {
      // app.get(setting)
      return this.set(path);
    }
    this.lazyrouter();
    var route = this._router.route(path);
    route[method].apply(route, slice.call(arguments, 1));
    return this;
  };
});
This means that the authors did not have to manually define a method for each HTTP verb, such as app.get, app.post and app.delete. Instead, this loop dynamically defines one method per verb on the application object, and each of them registers its handlers on a route for the given path.
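The same trick is easy to try in isolation. In this sketch, the verb list is a hand-picked subset and miniApp simply records what was registered; both are assumptions made for illustration rather than anything from the Express source.

```javascript
// A hand-picked subset of HTTP verbs (Express gets its full list from a separate package).
const verbs = ["get", "post", "delete"];

// Hypothetical app object that just records registered routes.
const miniApp = { routes: [] };

verbs.forEach((verb) => {
  miniApp[verb] = function (path, ...handlers) {
    this.routes.push({ verb, path, handlerCount: handlers.length });
    return this; // allow chaining, like app.get(...).post(...)
  };
});

miniApp.get("/", () => {}).post("/users", () => {}, () => {});
console.log(miniApp.routes);
```

Adding support for another verb only requires adding a string to the list.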
Lastly, we have the app.listen()
method:
app.listen = function listen() {
  var server = http.createServer(this);
  return server.listen.apply(server, arguments);
};
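As you can see, it is merely a wrapper around Node's http.createServer() function: it creates a server with the app as the request handler and forwards its arguments to server.listen(). It is typically given a port and a callback, and is used like this: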
app.listen(8080, () => console.log("server alive"));
Router
The entire routing logic for Express is contained within the /lib/router folder. Each Express app has a _router object, which is an instance of the Router object. The routing system has three main components: Router, Route and Layer.
Layer
A Layer is a structure that pairs a path with a handler function (a middleware or route handler). It is defined in router/layer.js.
function Layer(path, options, fn) {
  var opts = options || {};
  this.handle = fn;
  this.name = fn.name || '<anonymous>';
  this.params = undefined;
  this.path = undefined;
  this.regexp = pathRegexp(path, this.keys = [], opts);
  // set fast path flags
  this.regexp.fast_star = path === '*'
  this.regexp.fast_slash = path === '/' && opts.end === false
}
Layer also has a match method attached to its prototype that checks whether the layer's path matches the path of an incoming request.
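To make the matching idea concrete, here is a heavily simplified sketch. SimpleLayer is a hypothetical name, and the hand-rolled regular expression only understands :param segments; the real Layer delegates this work to a path-to-regexp library and handles many more cases.

```javascript
// Hypothetical, simplified layer: supports literal segments and :param segments only.
function SimpleLayer(path, fn) {
  this.handle = fn;
  this.keys = [];
  this.params = undefined;
  this.path = undefined;
  // Turn "/users/:id" into /^\/users\/([^/]+)\/?$/ and remember the param names.
  const pattern = path.replace(/:(\w+)/g, (_, key) => {
    this.keys.push(key);
    return "([^/]+)";
  });
  this.regexp = new RegExp("^" + pattern + "/?$");
}

SimpleLayer.prototype.match = function match(path) {
  const found = this.regexp.exec(path);
  if (!found) {
    this.params = undefined;
    this.path = undefined;
    return false;
  }
  this.params = {};
  this.keys.forEach((key, i) => {
    this.params[key] = decodeURIComponent(found[i + 1]);
  });
  this.path = found[0];
  return true;
};

const userLayer = new SimpleLayer("/users/:id", () => {});
const matched = userLayer.match("/users/42");
const matchedParams = userLayer.params;
const missed = userLayer.match("/posts/1");
console.log(matched, matchedParams, missed); // true { id: '42' } false
```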
Router
The Router is a mini Express application used to modularise routes: it can handle its own set of middleware and routes, and can then be mounted on the main app instance. It is used to define routes for just one part of the app.
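const express = require("express");
const router = express.Router();
// any handler with the (req, res, next) signature works here
const middleware = (req, res) => res.send("Hello from the router");
router.get("/", middleware);
module.exports = router;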
The main difference between app.get() and router.get() is that the former is attached directly to the main application instance and has a global scope across the entire app. router.get(), on the other hand, is defined on the router instance and is limited to its scope.
The Router maintains a stack of Layers, implemented as an array. When app.get is used, a new Route and Layer object are created and the Layer is pushed onto the stack. An incoming request causes _router to go through the layers in its stack until a layer's path matches the request path, using the match function we discussed earlier.
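The stack-walking behaviour can be sketched in a few lines. This is a toy model under simplifying assumptions: createMiniRouter is a made-up name, layers match by simple path prefix instead of regular expressions, and error-handling middleware is left out.

```javascript
function createMiniRouter() {
  const stack = []; // ordered list of { path, handle } layers
  return {
    use(path, handle) {
      stack.push({ path, handle });
      return this;
    },
    handle(req, res, done) {
      let idx = 0; // position in the stack, shared by every call to next()
      function next(err) {
        if (err) return done(err);
        while (idx < stack.length) {
          const layer = stack[idx++];
          if (req.url.startsWith(layer.path)) {
            return layer.handle(req, res, next);
          }
        }
        done(); // no layer left to try
      }
      next();
    },
  };
}

const calls = [];
const miniRouter = createMiniRouter()
  .use("/", (req, res, next) => { calls.push("logger"); next(); })
  .use("/admin", (req, res, next) => { calls.push("admin"); next(); })
  .use("/users", (req, res, next) => { calls.push("users"); next(); });

miniRouter.handle({ url: "/users/42" }, {}, () => calls.push("done"));
console.log(calls); // [ 'logger', 'users', 'done' ]
```

Note how the "/admin" layer is skipped because its path does not match, while the "/" layer runs for every request.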
Route
Unlike the router, which matches the path, the route matches the HTTP method and executes the associated layer's handler function. Each route has its own stack of layers, each tagged with the HTTP method it handles.
function Route(path) {
  this.path = path;
  this.stack = [];
  debug('new %o', path)
  // route handlers for various http methods
  this.methods = {};
}
Since a route can have multiple handlers, they are flattened and then executed one after another, with an idx variable keeping track of the current position in the stack.
Route.prototype.dispatch = function dispatch(req, res, done) {
  var idx = 0;
  var stack = this.stack;
  if (stack.length === 0) {
    return done();
  }
  //...
};
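Since the body of dispatch is elided above, here is a hedged sketch of what method-based dispatching could look like. MiniRoute and its on helper are hypothetical names, and this is an approximation of the idea rather than the real implementation.

```javascript
function MiniRoute(path) {
  this.path = path;
  this.stack = []; // entries look like { method, handle }
}

// Register a handler for one HTTP method (hypothetical helper name).
MiniRoute.prototype.on = function on(method, handle) {
  this.stack.push({ method: method.toLowerCase(), handle });
  return this;
};

MiniRoute.prototype.dispatch = function dispatch(req, res, done) {
  let idx = 0;
  const stack = this.stack;
  const method = req.method.toLowerCase();
  function next(err) {
    if (err) return done(err);
    const layer = stack[idx++];
    if (!layer) return done(); // ran out of handlers
    if (layer.method !== method) return next(); // skip handlers for other verbs
    layer.handle(req, res, next);
  }
  next();
};

const seen = [];
const itemsRoute = new MiniRoute("/items")
  .on("get", (req, res, next) => { seen.push("get-1"); next(); })
  .on("post", (req, res, next) => { seen.push("post"); next(); })
  .on("get", (req, res, next) => { seen.push("get-2"); next(); });

itemsRoute.dispatch({ method: "GET" }, {}, () => seen.push("done"));
console.log(seen); // [ 'get-1', 'get-2', 'done' ]
```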
Here is a diagram to understand how the Router, Route and Layer are related:
Putting It All Together
Now that we know how the code is structured, let's understand the workflow of an express app through an example
//step 1
const express = require("express");
const app = express();
//step 2
app.use(express.json());
//other middleware;
//eg. app.use(cors());
app.get("/", (req, res) => {
  res.status(200).send("Hello");
});
app.listen(8181, () => {
  console.log("server running on http://localhost:8181");
});
Importing and creating the application instance: During this step, the createApplication factory function is called, which creates and returns a fresh Express app instance.
Middleware: When app.use() is called, _router creates a new Layer with its path and handler function and pushes it onto the Router stack. Since this example does not specify a path, it defaults to '/', meaning this middleware (in this case, parsing the body as JSON) will be executed on every incoming request. In a real-world scenario, you would have several middleware functions that need to be executed in order. For this, Express uses next(), which is essentially a signal that the next Layer in the stack is ready to be executed. next() is a closure created inside the router's handle function, and it uses an idx variable to keep track of its position in the stack.
proto.handle = function handle(req, res, out) {
  var self = this;
  debug('dispatching %s %s', req.method, req.url);
  var idx = 0;
  //....
  function next(err) {
    //...
  }
  // find next matching layer
  var layer;
  var match;
  var route;
  //....
}
Creating the route(s): On reaching app.get, _router creates a new Route and Layer object, setting the path and handler accordingly.
proto.route = function route(path) {
  var route = new Route(path);
  var layer = new Layer(path, {
    sensitive: this.caseSensitive,
    strict: this.strict,
    end: true
  }, route.dispatch.bind(route));
  layer.route = route;
  this.stack.push(layer);
  return route;
};
Listening for incoming requests: As mentioned earlier, app.listen is simply a wrapper around http.createServer from Node and uses server.listen to expose the application. But the magic of Express starts here. The moment a request arrives, it is passed to app.handle by the function created in createApplication:
var app = function(req, res, next) {
  app.handle(req, res, next);
};
app.handle itself is defined in /lib/application.js:
app.handle = function handle(req, res, callback) {
  var router = this._router;
  // final handler
  var done = callback || finalhandler(req, res, {
    env: this.get('env'),
    onerror: logerror.bind(this)
  });
  router.handle(req, res, done);
};
This in turn calls router.handle, which essentially loops over the Layers until one matches the request path.
while (match !== true && idx < stack.length) {
  layer = stack[idx++]
  match = matchLayer(layer, path)
  route = layer.route
  if (match !== true) {
    continue
  }
  //...
}
We then send a response using the methods defined on the response object in /lib/response.js, i.e. res.status and res.send, which wrap around the Node.js ServerResponse object.
The application continues to listen for incoming requests on the port, in this case 8181, ready to handle and respond to each request according to the defined routes and middleware.
A Primer On Everything So Far
If you've made it this far, great! Let's go over everything we've talked about until now, so you don't have to scroll too much :)
First, we had a look at the folder structure of the source code, in particular the /lib directory, which is at the core of Express. Within this folder, we've got express.js, which defines a factory function that runs when an instance of an app is made.
We took a look at other files such as application.js, request.js and response.js, which define the application object, request object and response object along with some very important methods that are often used in Express applications.
Then, we understood how we can create an entire mini-application in Express using the Router object, and how Layers, a custom structure, are used internally to store the paths and handlers of routes and middleware.
Finally, I used an example of a minimal Express application to demonstrate the entire request-response flow and how it is carried out internally.
What's Next?
Now that you have understood how the codebase is structured and how it works internally, head over to the open issues tab on GitHub and see if you can take up any of them!
Or if you're still having second thoughts, have another read, or go through the code and tinker with it locally. Here is how you can use and play around with the cloned repository locally:
const express = require("../express");
Another useful tip would be to read the tests, which I skimmed over since they would make this article too long. Reading tests is a great way to understand how the code is supposed to run, making it easier to find loopholes and logical errors in the code.
Mess around and find out. Google is your best friend too.
Trivia
Express was originally maintained by TJ Holowaychuk, before being placed under the stewardship of the Node.js Foundation incubator.
Express.js was inspired by Sinatra, a minimalist web framework for Ruby. This influence can be seen in Express's design philosophy of being minimalist.
The first commit was made on June 27, 2009
Thanks for staying till the end.
Drop a like if you enjoyed reading or if this provided some value to you! I would love it if you followed me on my socials.
This is something new I'm trying, so that people don't get intimidated by large codebases and also as a learning experience for myself. Let me know what codebase I should explain next and also any feedback on this one.
See y'all in the next one.
Tabish.
Written by
Tabish Naqvi
Hey, I'm a sophomore computer science student and a self-taught full stack developer who's into tech, startups and building projects that really matter. When I'm not coding, I'm either brainstorming my next project, reading or hanging out.