15 December 2014

Wordpress on Hiphop / Nginx / Varnish

I was recently asked to investigate speeding up one of the Wordpress sites of a fairly large government organization in Britain.  A large part of my investigation focused on the server stack because I felt that we could get more out of the hardware that was provisioned for us.

I decided to set up a stack on my development machine to see how it would work and whether it was feasible.  I settled on nginx with HHVM (HipHop) and a Varnish frontend cache.  I realize that nginx would be just fine as both the cache and the server, but in this particular case it would not be possible to replace Apache with nginx on the live server.  I also wanted to experiment with ESI (Edge Side Includes), and it looked better documented in Varnish than in nginx.

Installing HHVM is very easy:

 wget -O - http://dl.hhvm.com/conf/hhvm.gpg.key | sudo apt-key add -  
 echo deb http://dl.hhvm.com/ubuntu saucy main | sudo tee /etc/apt/sources.list.d/hhvm.list  
 sudo apt-get update  
 sudo apt-get install hhvm  

Installing nginx is also very easy:

 sudo apt-get update  
 sudo apt-get install nginx  

Instead of manually configuring nginx to use HHVM I used a tool which ships with it (found at /usr/share/hhvm/install_fastcgi.sh).  The Github page has documentation (here) in case you don't want to use the packaged install script.  Note that the install script sets up the FastCGI configuration for both Apache and nginx.

There is a tool (here) that will migrate your Apache config to nginx.  I used it to get a demonstration config file which I then edited after RTFM on nginx config.

My test nginx config file (/etc/nginx/sites-enabled/default) looks like the snippet below.

 # Read http://codex.wordpress.org/Nginx  
 #    http://wiki.nginx.org/Pitfalls  
 #    http://wiki.nginx.org/QuickStart  
 #    http://www.queryadmin.com/854/secure-wordpress-nginx/  
 #    http://tautt.com/best-nginx-configuration-for-security/  
 #  
 #    Generate your key with: openssl dhparam -out /etc/nginx/ssl/dhparam.pem 2048  
 #    Generate certificate: sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/nginx/ssl/nginx.key -out /etc/nginx/ssl/nginx.crt  
 server_tokens off;  
 add_header X-Frame-Options SAMEORIGIN;  
 add_header X-Content-Type-Options nosniff;  
 add_header X-XSS-Protection "1; mode=block";  
 add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval' https://ssl.google-analytics.com https://assets.zendesk.com https://connect.facebook.net; img-src 'self' https://ssl.google-analytics.com https://s-static.ak.facebook.com https://assets.zendesk.com; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com https://assets.zendesk.com; font-src 'self' https://themes.googleusercontent.com; frame-src https://assets.zendesk.com https://www.facebook.com https://s-static.ak.facebook.com https://tautt.zendesk.com; object-src 'none'";  
 server {  
   server_name www.example.com;  
   listen 8080;  
   root /home/web/sites/default/html/;  
   index index.php;  
   access_log /home/web/sites/default/logs/access.log combined;  
   error_log /home/web/sites/default/logs/error.log warn;  
   include /home/web/sites/default/html/nginx.conf;  
   location / {  
     # include the "?$args" part so non-default permalinks don't break when using a query string  
     try_files /wp-content/w3tc/pgcache/$cache_uri/_index.html $uri $uri/ /index.php?$args ;  
   }  
   location /wp-admin/ {  
     return 301 https://$server_name$request_uri;  
   }  
   location /mystery-login {  
     return 301 https://$server_name$request_uri;  
   }  
   # Prevent any potentially-executable files in the uploads directory from being executed  
   location ~* /uploads/ {  
     location ~ \.php {return 403;}  
   }  
   # Do not log favicon.ico requests  
   location = /favicon.ico {  
     log_not_found off;  
     access_log off;  
   }  
   # Do not log robots.txt requests  
   location = /robots.txt {  
     allow all;  
     log_not_found off;  
     access_log off;  
   }  
   location ~* \.(js|css|png|jpg|jpeg|gif|ico)$ {  
     expires max;  
     log_not_found off;  
   }  
   include global/w3tc.conf;  
   # Common deny or internal locations, to help prevent access to not-public areas  
   location ~* wp-admin/includes { deny all; }  
   location ~* wp-includes/theme-compat/ { deny all; }  
   location ~* wp-includes/js/tinymce/langs/.*\.php { deny all; }  
   location /wp-content/ { internal; }  
   location /wp-includes/ { internal; }  
   location ~* wp-config.php { deny all; }  
   # Rewrite rules for Wordpress SEO by Yoast  
   rewrite ^/sitemap_index\.xml$ /index.php?sitemap=1 last;  
   rewrite ^/([^/]+?)-sitemap([0-9]+)?\.xml$ /index.php?sitemap=$1&sitemap_n=$2 last;  
   # Add trailing slash to */wp-admin requests  
   rewrite /wp-admin$ $scheme://$host$uri/ permanent;  
   # Redirect 403 errors to 404 error to fool attackers  
   error_page 403 = 404;  
   # Deny all attempts to access hidden files such as .htaccess, .htpasswd, .DS_Store (Mac).  
   # Keep logging the requests to parse later (or to pass to firewall utilities such as fail2ban)  
   location ~ /\. {  
     deny all;  
   }  
   location ~ \.php$ {  
     fastcgi_split_path_info ^(.+?\.php)(/.*)$;  
     if (!-f $document_root$fastcgi_script_name) {  
       return 404;  
     }  
     fastcgi_keep_conn on;  
     fastcgi_pass  127.0.0.1:9000;  
     fastcgi_index index.php;  
     fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;  
     include fastcgi_params;  
   }  
 }  
 server {  
   server_name example.com;  
   listen 8080;  
   return 301 $scheme://www.example.com$request_uri;  
 }  
 server {  
   server_name www.example.com;  
   listen 443 ssl;  
   root /home/web/sites/default/html/;  
   index index.php;  
   access_log /home/web/sites/default/logs/access_ssl.log combined;  
   error_log /home/web/sites/default/logs/error_ssl.log warn;  
   # enable session resumption to improve https performance  
   # http://vincent.bernat.im/en/blog/2011-ssl-session-reuse-rfc5077.html  
   ssl_session_cache shared:SSL:50m;  
   ssl_session_timeout 5m;  
   # Diffie-Hellman parameter for DHE ciphersuites, recommended 2048 bits  
   ssl_dhparam /etc/nginx/ssl/dhparam.pem;  
   # enables server-side protection from BEAST attacks  
   # http://blog.ivanristic.com/2013/09/is-beast-still-a-threat.html  
   ssl_prefer_server_ciphers on;  
   # disable SSLv3 (enabled by default since nginx 0.8.19) since it's less secure than TLS http://en.wikipedia.org/wiki/Secure_Sockets_Layer#SSL_3.0  
   ssl_protocols TLSv1 TLSv1.1 TLSv1.2;  
   # ciphers chosen for forward secrecy and compatibility  
   # http://blog.ivanristic.com/2013/08/configuring-apache-nginx-and-openssl-for-forward-secrecy.html  
   ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:ECDHE-RSA-RC4-SHA:ECDHE-ECDSA-RC4-SHA:RC4-SHA:HIGH:!aNULL:!eNULL:!EXPORT:!DES:!3DES:!MD5:!PSK';  
   # enable ocsp stapling (mechanism by which a site can convey certificate revocation information to visitors in a privacy-preserving, scalable manner)  
   # http://blog.mozilla.org/security/2013/07/29/ocsp-stapling-in-firefox/  
   resolver 8.8.8.8;  
   ssl_stapling off;    # can turn on if cert allows  
   ssl_trusted_certificate /etc/nginx/ssl/nginx.crt;  
   # config to enable HSTS(HTTP Strict Transport Security) https://developer.mozilla.org/en-US/docs/Security/HTTP_Strict_Transport_Security  
   # to avoid ssl stripping https://en.wikipedia.org/wiki/SSL_stripping#SSL_stripping  
   add_header Strict-Transport-Security "max-age=31536000; includeSubdomains;";  
   ssl_certificate /etc/nginx/ssl/nginx.crt;  
   ssl_certificate_key /etc/nginx/ssl/nginx.key;  
   location / {  
     # include the "?$args" part so non-default permalinks don't break when using a query string  
     try_files /wp-content/w3tc/pgcache/$cache_uri/_index.html $uri $uri/ /index.php?$args ;  
   }  
   # Add trailing slash to */wp-admin requests  
   rewrite /wp-admin$ $scheme://$host$uri/ permanent;  
   include /home/web/sites/default/html/nginx.conf;  
   rewrite ^(/)?mystery-login/?$ /wp-login.php?$query_string break;  
   location ~ \.php$ {  
     fastcgi_split_path_info ^(.+?\.php)(/.*)$;  
     if (!-f $document_root$fastcgi_script_name) {  
       return 404;  
     }  
     fastcgi_keep_conn on;  
     fastcgi_pass  127.0.0.1:9000;  
     fastcgi_index index.php;  
     fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;  
     include fastcgi_params;  
   }  
 }  
Take note of nginx's port_in_redirect directive (it is not part of the snippet above): setting it to off can help with issues around Varnish or nginx including 8080 in the URL when doing a redirect (this can happen if you access a URL without a trailing slash).  If you're still getting port 8080 after trying this, double check your Wordpress site config to make sure the site URL does not include 8080.
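
A quick way to spot the problem is to look at the Location header of a redirect.  The snippet below is only a rough sketch: the function name redirect_leaks_port is made up for this post, and the default port of 8080 is simply the nginx listen port used in my test config.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of nginx or Varnish.
# Reads HTTP response headers on stdin and succeeds (exit 0) when the
# Location header contains the backend port, e.g. ":8080/".
redirect_leaks_port() {
  local port="${1:-8080}"
  # Strip carriage returns, keep only the Location header,
  # then look for ":<port>" followed by "/" or end of line.
  tr -d '\r' | grep -i '^location:' | grep -Eq ":${port}(/|\$)"
}
```

Assuming curl is installed, it could be used like this against a URL without a trailing slash:

 curl -sI http://www.example.com/wp-admin | redirect_leaks_port && echo "port leaked into redirect"  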

Varnish is included in the Ubuntu packages, but the Varnish site recommends using the packages supplied by varnish-cache.org instead.  They list the steps required to set it up and I'm not going to reproduce them here because they might change; rather go to their site.

To configure Varnish on Ubuntu you need to edit /etc/default/varnish (for example with nano).  On RHEL this file is /etc/sysconfig/varnish.

There are a number of options provided.  The easiest way to get running is to pick alternative 2 by commenting out the other options.

Just make sure to change the port to 80 as below:

 DAEMON_OPTS="-a :80 \  
        -T localhost:6082 \  
        -f /etc/varnish/default.vcl \  
        -S /etc/varnish/secret \  
        -s malloc,256m"  
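
Because it is easy to leave the wrong alternative uncommented, it can be worth checking which listen port the file really configures.  This is a minimal sketch rather than anything official: the function name varnish_listen_port is made up, and it assumes the option is written as -a [host]:port on one line, as in the snippet above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with Varnish.
# Prints the port of the first uncommented "-a [host]:port" option
# found in the given defaults file.
varnish_listen_port() {
  grep -v '^[[:space:]]*#' "$1" \
    | grep -Eo -- '-a [^ ]*:[0-9]+' \
    | head -n 1 \
    | sed -E 's/.*:([0-9]+)$/\1/'
}
```

Running it against the Ubuntu file should print 80 once the change has been made:

 varnish_listen_port /etc/default/varnish  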

At this point Varnish will listen for incoming web requests on port 80 and all we need to do is wire it up to nginx.  To do so, edit /etc/varnish/default.vcl.

I stitched my Varnish configuration together from a number of sources and will run through it piece by piece here.

Firstly we tell Varnish where to find nginx and set up an access control list (ACL) that identifies the local machine (more on this later).

 backend default {  
  .host = "localhost";  
  .port = "8080";  
 }  
 acl purge {  
  "127.0.0.1";  
  "localhost";  
 }  

After that we add to the various hooks that Varnish provides.  The code below will likely need to be modified for your site.  I got it from a variety of sources and there might even be some unnecessary duplication.

 sub vcl_recv {  
   # only using one backend  
   set req.backend = default;  
   # only cache example.com and optionally the www subdomain  
   if (req.http.host !~ "^(www\.)?example\.com$") {  
      return(pass);  
   }  
   # remove cookie from static content and always return cached version  
   if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {  
     unset req.http.cookie;  
     return(lookup);  
   }  
   # allow for purge option but only from the site we allow  
   if (req.request == "PURGE") {  
    if (!client.ip ~ purge) {  
     error 405 "Not allowed.";  
    }  
    ban("req.url ~ "+req.url+" && req.http.host == "+req.http.host);  
    error 200 "OK";  
   }  
   # set standard proxied ip header for getting original remote address  
   set req.http.X-Forwarded-For = client.ip;  
   # logged in users must always pass  
   if( req.url ~ "^/wp-(login|admin)" || req.http.Cookie ~ "wordpress_logged_in_" ){  
     return (pass);  
   }  
   # don't cache search results (the pass below is commented out, so searches are currently cached; uncomment it to enable)  
   if( req.url ~ "\?s=" ){  
   #  return (pass);  
   }  
   # always pass through posted requests and those with basic auth  
   if ( req.request == "POST" || req.http.Authorization ) {  
      return (pass);  
   }  
   # remove cookies from everything other than admin areas so we can cache content  
   if (!(req.url ~ "wp-(login|admin)")) {  
     unset req.http.cookie;  
   }  
   # else ok to fetch a cached page  
   return (lookup);  
 }  
 sub vcl_fetch {  
   # remove some headers we never want to see  
   unset beresp.http.Server;  
   unset beresp.http.X-Powered-By;  
   unset beresp.http.X-Pingback;  
   set beresp.do_esi = true; /* Do ESI processing */  
   set beresp.ttl = 24h;  
   # don't cache response to posted requests or those with basic auth  
   if ( req.request == "POST" || req.http.Authorization ) {  
      return (hit_for_pass);  
   }  
   # only cache status ok  
   if ( beresp.status != 200 ) {  
     return (hit_for_pass);  
   }  
   # remove cookies from static content  
   if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {  
    unset beresp.http.set-cookie;  
   }  
   # Drop any cookies Wordpress tries to send back to the client.  
   if (!(req.url ~ "wp-(login|admin)")) {  
     unset beresp.http.set-cookie;  
   }  
   # else ok to cache the response  
   return (deliver);  
 }  
 sub vcl_deliver {  
   if (obj.hits > 0) {  
     set resp.http.X-Cache = "HIT";  
   }  
   else {  
     set resp.http.X-Cache = "MISS";  
   }  
   unset resp.http.Via;  
   unset resp.http.X-Varnish;  
 }  
 sub vcl_hit {  
  if (req.request == "PURGE") {  
   purge;  
   error 200 "OK";  
  }  
 }  
 sub vcl_miss {  
  if (req.request == "PURGE") {  
   purge;  
   error 404 "Not cached";  
  }  
 }  
 sub vcl_hash {  
   hash_data( req.url );  
   if ( req.http.host ) {  
     hash_data( regsub( req.http.host, "^([^\.]+\.)+([a-z]+)$", "\1\2" ) );  
   } else {  
     hash_data( server.ip );  
   }  
   # ensure separate cache for mobile clients (WPTouch workaround)  
   if( req.http.User-Agent ~ "(iPod|iPhone|incognito|webmate|dream|CUPCAKE|WebOS|blackberry9\d\d\d)" ){  
     hash_data("touch");  
   }  
   return (hash);  
 }  
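
To check the result from the shell, two small helpers can be handy.  Treat them as a sketch: the names cache_status and hash_host are made up for this post.  The first reads the X-Cache header that vcl_deliver adds, and the second applies the same substitution as the regsub() in vcl_hash so you can see which host string ends up in the cache key.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not part of Varnish.

# Read HTTP response headers on stdin and print the X-Cache value
# (HIT or MISS) set in vcl_deliver. Prints nothing if the header is absent.
cache_status() {
  tr -d '\r' | awk 'tolower($1) == "x-cache:" { print $2; exit }'
}

# Apply the vcl_hash host substitution to a hostname.
# [^.] here is equivalent to the [^\.] used in the VCL regex.
hash_host() {
  printf '%s\n' "$1" | sed -E 's/^([^.]+\.)+([a-z]+)$/\1\2/'
}
```

Assuming curl is installed, requesting the same page twice should normally show a MISS followed by a HIT, and both example.com and www.example.com should hash to the same host string:

 curl -sI http://www.example.com/ | cache_status  
 hash_host www.example.com  

A purge has to come from an address in the purge ACL, so it needs to be run on the server itself (the path here is only a placeholder):

 curl -s -o /dev/null -X PURGE -H "Host: www.example.com" http://127.0.0.1/some/page/  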

For further reading I recommend:
