Scaling Node.js Applications

Scaling Node.js applications can be a challenge. JavaScript’s single-threaded nature prevents Node from taking advantage of modern multi-core machines. For example, the following code implements a bare-bones HTTP server that listens on the port number passed in from the command line. This code executes in a single thread, whether it runs on a single-core machine or a 1,000-core machine.

var http = require("http");
// Read the port number from the command line.
var port = parseInt(process.argv[2], 10);

http.createServer(function(request, response) {
  console.log("Request for:  " + request.url);
  response.writeHead(200);
  response.end("hello world\n");
}).listen(port);

Taking Advantage of Multiple Cores

With a little work, the previous code can be modified to utilize all of the available cores on a machine. In the following example, the HTTP server is refactored using the cluster module. The cluster module makes it easy to create a network of processes that can share server ports. In this example, a separate process is spawned for each system core, as defined by the numCPUs variable. Each of the child processes then implements the HTTP server by listening on the shared port.

var cluster = require("cluster");
var http = require("http");
var numCPUs = require("os").cpus().length;
var port = parseInt(process.argv[2], 10);

if (cluster.isMaster) {
  // The master process forks one worker per CPU core.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  // If a worker dies, replace it with a fresh one.
  cluster.on("exit", function(worker, code, signal) {
    cluster.fork();
  });
} else {
  // Each worker runs its own copy of the HTTP server.
  // All workers listen on the same shared port.
  http.createServer(function(request, response) {
    console.log("Request for:  " + request.url);
    response.writeHead(200);
    response.end("hello world\n");
  }).listen(port);
}

Scaling Across Machines

Using the cluster module, you can more effectively take advantage of modern hardware. However, you are still limited by the resources of a single machine. If your application receives significant traffic, eventually you will need to scale out to multiple machines. This can be done using a reverse proxy server to load balance the incoming requests among multiple servers.

Nodejitsu has developed node-http-proxy, an open source proxy server for Node applications. The module can be installed using the following command.

npm install http-proxy

The actual reverse proxy server is shown below. In this example, the load is balanced between two servers running on the local machine. Before testing the reverse proxy, ensure that the original HTTP server application is running on ports 8080 and 8081. Next, launch the reverse proxy and connect to it using a browser. If everything is working properly, you should notice that requests are alternated between the two HTTP servers.

var http = require("http");
var httpProxy = require("http-proxy");
var port = parseInt(process.argv[2], 10);

var servers = [
  {
    host: "localhost",
    port: 8081
  },
  {
    host: "localhost",
    port: 8080
  }
];

// A single proxy instance is reused for every request.
var proxy = httpProxy.createProxyServer({});

http.createServer(function(req, res) {
  // Rotate through the server list so requests alternate between targets.
  var target = servers.shift();
  servers.push(target);

  proxy.web(req, res, {
    target: "http://" + target.host + ":" + target.port
  });
}).listen(port);
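The shift/push rotation in the request handler is a plain round-robin. Isolated from the proxying, the same idea looks like this (makeRoundRobin is a hypothetical helper for illustration, not part of http-proxy):

```javascript
// Minimal round-robin selector, mirroring the shift/push rotation above.
function makeRoundRobin(targets) {
  var queue = targets.slice(); // copy so the caller's array is untouched
  return function next() {
    var target = queue.shift(); // take the server at the front...
    queue.push(target);         // ...and move it to the back of the line
    return target;
  };
}

var next = makeRoundRobin(["127.0.0.1:8080", "127.0.0.1:8081"]);
console.log(next()); // 127.0.0.1:8080
console.log(next()); // 127.0.0.1:8081
console.log(next()); // 127.0.0.1:8080 again
```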

Of course, this example only uses one machine. However, if you have access to multiple machines, you can run the reverse proxy server on one machine, while one or more machines run the HTTP server.

Scaling Using nginx

Using a Node reverse proxy is nice because it keeps your entire software stack in the same technology. However, in production systems, it is more common to use nginx to handle load balancing. nginx is an open source HTTP server and reverse proxy that is extremely good at serving static files such as CSS and HTML. Therefore, nginx can be used to handle tasks such as caching and serving the static content on your site, while forwarding requests for dynamic content to the Node server(s).

To implement nginx load balancing, you simply need to install nginx, then add the Node servers as upstream resources in the server configuration file. The configuration file is located at {nginx-root}/conf/nginx.conf, where {nginx-root} is the nginx root installation directory. The entire configuration file is shown below; however, we are only interested in a few pieces of it. There is also a good chance that your file will look different if you have performed any customization.


#user  nobody;
worker_processes  1;

#error_log  logs/error.log;
#error_log  logs/error.log  notice;
#error_log  logs/error.log  info;

#pid        logs/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include       mime.types;
    default_type  application/octet-stream;

    #log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
    #                  '$status $body_bytes_sent "$http_referer" '
    #                  '"$http_user_agent" "$http_x_forwarded_for"';

    #access_log  logs/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;

    #gzip  on;

    upstream node_app {
      server 127.0.0.1:8080;
      server 127.0.0.1:8081;
    }

    server {
        listen       80;
        server_name  localhost;

        #charset koi8-r;

        #access_log  logs/host.access.log  main;

        location / {
            root   html;
            index  index.html index.htm;
        }

        location /foo {
          proxy_redirect off;
          proxy_set_header   X-Real-IP          $remote_addr;
          proxy_set_header   X-Forwarded-For    $proxy_add_x_forwarded_for;
          proxy_set_header   X-Forwarded-Proto  $scheme;
          proxy_set_header   Host               $http_host;
          proxy_set_header   X-NginX-Proxy      true;
          proxy_set_header   Connection         "";
          proxy_http_version 1.1;
          proxy_pass         http://node_app;
        }

        #error_page  404              /404.html;

        # redirect server error pages to the static page /50x.html
        #
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }

        # proxy the PHP scripts to Apache listening on 127.0.0.1:80
        #
        #location ~ \.php$ {
        #    proxy_pass   http://127.0.0.1;
        #}

        # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
        #
        #location ~ \.php$ {
        #    root           html;
        #    fastcgi_pass   127.0.0.1:9000;
        #    fastcgi_index  index.php;
        #    fastcgi_param  SCRIPT_FILENAME  /scripts$fastcgi_script_name;
        #    include        fastcgi_params;
        #}

        # deny access to .htaccess files, if Apache's document root
        # concurs with nginx's one
        #
        #location ~ /\.ht {
        #    deny  all;
        #}
    }


    # another virtual host using mix of IP-, name-, and port-based configuration
    #
    #server {
    #    listen       8000;
    #    listen       somename:8080;
    #    server_name  somename  alias  another.alias;

    #    location / {
    #        root   html;
    #        index  index.html index.htm;
    #    }
    #}


    # HTTPS server
    #
    #server {
    #    listen       443;
    #    server_name  localhost;

    #    ssl                  on;
    #    ssl_certificate      cert.pem;
    #    ssl_certificate_key  cert.key;

    #    ssl_session_timeout  5m;

    #    ssl_protocols  SSLv2 SSLv3 TLSv1;
    #    ssl_ciphers  HIGH:!aNULL:!MD5;
    #    ssl_prefer_server_ciphers   on;

    #    location / {
    #        root   html;
    #        index  index.html index.htm;
    #    }
    #}

}

As I mentioned, for the purposes of this article, we are only interested in a few pieces of the file. The first interesting piece is shown below. This part of the configuration defines an upstream server named node_app, which is balanced between two IP addresses.

upstream node_app {
  server 127.0.0.1:8080;
  server 127.0.0.1:8081;
}
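By default, nginx distributes requests to the upstream servers in round-robin fashion. Other balancing strategies can be layered onto the same block; for example (an illustrative variant, not part of the configuration shown above):

```nginx
upstream node_app {
  least_conn;                      # prefer the server with the fewest active connections
  server 127.0.0.1:8080 weight=2;  # receives roughly twice the share of requests
  server 127.0.0.1:8081;
}
```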

Simply defining the upstream server does not tell nginx how to use it. Therefore, we must define a route using the following directives. Using this route, any requests to /foo are proxied upstream to one of the Node servers.

location /foo {
  proxy_redirect off;
  proxy_set_header   X-Real-IP          $remote_addr;
  proxy_set_header   X-Forwarded-For    $proxy_add_x_forwarded_for;
  proxy_set_header   X-Forwarded-Proto  $scheme;
  proxy_set_header   Host               $http_host;
  proxy_set_header   X-NginX-Proxy      true;
  proxy_set_header   Connection         "";
  proxy_http_version 1.1;
  proxy_pass         http://node_app;
}

Conclusion

This article has shown you how to scale Node.js applications from a single thread to multiple processes executing on multiple machines. You have also learned how to set up a load balancer using both Node and nginx. Please note that this article is not intended to be a comprehensive guide to running Node applications in production. If you are using nginx, there are additional tweaks that can increase performance, such as caching. You will also want to use a tool such as forever to restart your Node processes after a crash.
