13 July 2015

Setting up Nginx as a reverse proxy for Apache with SSL termination

Reverse Proxy diagram from Wiki Commons
We're currently hosting client sites on a Rackspace server and using their Load Balancer feature to terminate SSL, so that we don't have to manage multi-site certificates on the web server itself.

We only attach one node to each Load Balancer, so we're paying for more than we're using.  My proof of concept is to use Nginx to terminate SSL and proxy requests to the Apache server.  This will save us £225 per load balancer, and since we're using ten of them that's quite a significant saving.

My first step was to spin up a free tier EC2 instance running Ubuntu 14.04 LTS.  I guess you can replace this with your favourite cloud or on-the-metal server.

Then I installed my packages. These are the ones I remember so YMMV.

 sudo apt-get install nginx apache2 fail2ban php5-fpm mcrypt php5-mcrypt openssl php5-cli php5 libapache2-mod-php5  

My network diagram is slightly different from the picture for this post in that the web server is hosted on the same machine as the proxy.

I decided to run Apache on port 8000 and listen only to connections from localhost. Nginx would listen on port 80 and forward requests to Apache. I decided to let Nginx serve static content because it's pretty quick at doing so and this saves Apache from being overwhelmed by requests.

Configuring Apache

My first port of call was to edit /etc/apache2/ports.conf and make sure that my Listen line looks like this: Listen 127.0.0.1:8000

Then I created two virtual hosts to test with.  Here's a sample:

 <VirtualHost *:8000>  
      ServerName dummy1.mydomain.co.uk  
      ServerAdmin webmaster@localhost  
      DocumentRoot /var/www/dummy1/  
      ErrorLog ${APACHE_LOG_DIR}/dummy1_error.log  
      CustomLog ${APACHE_LOG_DIR}/dummy1_access.log combined  
 </VirtualHost>  

I made a simple index.php file in two new directories /var/www/dummy1 and /var/www/dummy3 which just output two server variables for me to test with. I also copied an image file into those directories so that I could check how static assets would be served.

 <?php  
 echo $_SERVER['SCRIPT_FILENAME'] . '<br>';  
 echo $_SERVER['SERVER_SOFTWARE'] . '<br>';  

Configuring Nginx

I decided to use self-signed certificates for testing and to reserve dummy2 for a trial run of a free SSL certificate.  There are quite a few certificate authorities who will give you a 30-day trial certificate.

I created an /etc/nginx/ssl directory, because for some reason I don't like to contaminate my conf.d directory, and made subdirectories under it for my sites.

I created self-signed certificates (commands are at the top of my host file) and set up the vhosts like this:
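A minimal sketch of one of those vhosts looks something like this (the certificate paths, filenames and the openssl command at the top are illustrative rather than my exact config):

 # Create a self-signed certificate and key for this vhost, e.g.:
 # sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
 #      -keyout /etc/nginx/ssl/dummy1/dummy1.key -out /etc/nginx/ssl/dummy1/dummy1.crt

 server {
     listen 80;
     listen 443 ssl;
     server_name dummy1.mydomain.co.uk;

     ssl_certificate     /etc/nginx/ssl/dummy1/dummy1.crt;
     ssl_certificate_key /etc/nginx/ssl/dummy1/dummy1.key;

     # Let Nginx serve static assets directly
     location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
         root /var/www/dummy1;
     }

     # Everything else is proxied through to Apache on localhost:8000
     location / {
         proxy_pass http://127.0.0.1:8000;
         proxy_set_header Host $host;
         proxy_set_header X-Real-IP $remote_addr;
         proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
         proxy_set_header X-Forwarded-Proto $scheme;
     }
 }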

Now when I hit a static file over HTTP or HTTPS, Nginx serves it up directly. Inspecting the response headers with your favourite browser's debug tools will confirm that images are served by Nginx. Visiting the index file shows that it's loading the correct one and that the request is being handled by Apache. Lastly, checking the certificate will show you that each site is using its own certificate.
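You can do the same check from the command line with curl (-k accepts the self-signed certificate; the image filename is whatever you copied into the docroot):

 # Static asset: the response headers come straight from Nginx
 curl -skI https://dummy1.mydomain.co.uk/test.jpg | grep -i '^server'

 # PHP page: the body prints SERVER_SOFTWARE, which should report Apache
 curl -sk https://dummy1.mydomain.co.uk/index.php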

That has the potential of saving my company £2,250, which is a happy thing to be able to do in your first week while your boss is watching :)

10 July 2015

Securing Jenkins with OAuth

Jenkins is pretty easy to secure with the help of some useful plugins.

The first plugin that I suggest using is an OAuth provider.  Our repositories are hosted on Bitbucket so I'm using their OAuth plugin, but there is also a GitHub OAuth plugin.  The instructions to set up the plugin are very clear (see the plugin page).

When you're configuring your Jenkins to use OAuth security, remember to leave the Authorization setting at "logged in users can do anything" for now.  We'll change this later, but we don't want to get locked out of Jenkins when we apply the security settings.

Now install the Role-based Authorization Strategy plugin (see the plugin page).

Add a new group called "Anonymous" and uncheck everything.

When a user logs in via OAuth they'll be given a message by Jenkins saying that they don't have any permissions.  This means that not everybody with a Bitbucket account can access your site, so that's a good thing.

You just need to add them in the role plugin settings.  Click Manage Jenkins, then Manage and Assign Roles.  Click Assign Roles and add the user, then tick the boxes of the roles you want to assign to them.




Checking the SSL certificates for a list of domains

We have a number of domains that are secured by SSL and need to be able to automate checks for certificate validity and expiry.

Luckily there is a script to do exactly this.  On Ubuntu you can apt-get install ssl-cert-check, but there are copies of the script online in case your distro doesn't have it as a package.

Create a file with a list of the domains you want to check and the port to check on.  It should look something like this:
 yourdomain.com 443  
 www.anotherdomain.com 443  
 www.yetanotherclientdomain.com 443  

Let's assume you called your file domainlist.txt.

You can then run ssl-cert-check -f domainlist.txt to get the tool to run through the list and print the status and expiry date of each domain to the console.

The options shown in the script's help page let you have it email you if a certificate is going to expire soon.  Here -a enables expiry alerts, -q suppresses the console output, -x 30 flags certificates expiring within the next 30 days, and -e sets the recipient address:

ssl-cert-check -a -f domainlist.txt -q -x 30 -e yourmail@foo.com

If you get a message about a missing mail binary, you'll spot that the script (line 243) looks in a variety of locations for a binary called mail or mailx.  On Ubuntu an appropriate binary is provided by the heirloom-mailx package, so installing that will solve the problem.
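To automate the check you can drop the command into cron; here's a sketch (the script and file paths are assumptions - adjust them to wherever the script and domain list live):

 # /etc/cron.d/ssl-cert-check - check certificates every morning at 07:00
 0 7 * * * root /usr/bin/ssl-cert-check -a -f /etc/ssl-cert-check/domainlist.txt -q -x 30 -e yourmail@foo.com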

27 May 2015

Allowing the whole wide world to read your S3 bucket

This is a bucket policy that you can use to grant the whole world read access to the files in your S3 bucket.

You might want to do this if you're serving static web content from S3 and don't need the fine grained control that the Amazon documentation details.
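A minimal version of that policy looks like this - just swap in your own bucket name:

 {
   "Version": "2012-10-17",
   "Statement": [
     {
       "Sid": "PublicReadGetObject",
       "Effect": "Allow",
       "Principal": "*",
       "Action": "s3:GetObject",
       "Resource": "arn:aws:s3:::your-bucket-name/*"
     }
   ]
 }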


You will still need to set up permissions on the bucket but this policy will let people read the files you're storing on S3.

26 May 2015

Storing large values in Memcached with PHP

Memcached saved my users a minute per query
I'm working on a business intelligence tool that requires, as an intermediate calculation, a list of the UK postcodes that fall within a radius of a user-supplied postcode.

It currently takes about 7 seconds to query my Postgres database to get this list out.  Unfortunately I need to do this several times as part of a goal seeking function so I need to greatly improve this lookup speed.

I'm already using the Postgres earthdistance module and have properly indexed my table so I realized that I needed to look for a caching solution.

Memcached places limits on the size of the value you can store.  The default limit is 1 MB and I'm reluctant to change this because it adds to the deployment burden.  My result sets were sometimes up to 4 MB - searching on a 20 mile radius in London yields a lot of postcodes!

My idea was to split the large piece of data into several smaller pieces and to place an index referencing the pieces as the value for the key we're trying to store.

I decided to make use of PHP's gzcompress() function to reduce the size of the element because I felt that the time I spend compressing the data is still going to be drastically less than running the query and I want to try my best to avoid cache evictions.

I'm currently using Laravel so the code snippets below use the facades made available by Laravel.  I think the code is readable enough to adapt to other PHP environments, and the approach could also be ported to other languages.
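Here's a minimal sketch of the idea using the Cache facade (the class name, key scheme and chunk size are illustrative rather than the exact production code): compress and serialise the payload, split it into chunks that fit comfortably under the 1 MB item limit, store each chunk under its own key, and store an index of the chunk keys as the value of the main key.

 <?php

 use Illuminate\Support\Facades\Cache;

 class ChunkedCache
 {
     // Keep each chunk comfortably under memcached's 1 MB item limit.
     const CHUNK_SIZE = 900000;

     public static function put($key, $value, $minutes)
     {
         $compressed = gzcompress(serialize($value));
         $chunkKeys = [];

         // Store each chunk under its own key...
         foreach (str_split($compressed, self::CHUNK_SIZE) as $i => $chunk) {
             $chunkKey = "{$key}.chunk.{$i}";
             Cache::put($chunkKey, $chunk, $minutes);
             $chunkKeys[] = $chunkKey;
         }

         // ...and store an index of the chunk keys under the main key.
         Cache::put($key, $chunkKeys, $minutes);
     }

     public static function get($key)
     {
         $chunkKeys = Cache::get($key);

         if (!is_array($chunkKeys)) {
             return null;
         }

         $compressed = '';

         foreach ($chunkKeys as $chunkKey) {
             $chunk = Cache::get($chunkKey);

             if ($chunk === null) {
                 return null; // a chunk was evicted, so treat this as a cache miss
             }

             $compressed .= $chunk;
         }

         return unserialize(gzuncompress($compressed));
     }
 }

Using it is then just ChunkedCache::put('postcodes.radius.20.SW1A', $postcodes, 60) to store and ChunkedCache::get('postcodes.radius.20.SW1A') to read back, with a database query on a cache miss.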




20 May 2015

Using Fail2Ban to protect a Varnished site from scrapers

I'm using Varnish to cache a fairly busy property site.  Varnish works like a bomb for normal users and has greatly improved our page load speed.

For bots that are scraping the site, though, presumably to add the property listings to their own sites, the cache is next to useless since they trawl sequentially through the whole site.

I decided to use fail2ban to block IPs that hit the site too often.

The first step was to enable a disk-based access log for Varnish so that fail2ban has something to work with.

This means setting up varnishncsa.  Add this to your /etc/rc.local file:

 varnishncsa -a -w /var/log/varnish/access.log -D -P /var/run/varnishncsa.pid  

This starts up varnishncsa in daemon mode and appends Varnish access attempts to /var/log/varnish/access.log

Now edit or create /etc/logrotate.d/varnish and make an entry to rotate this access log:

  /var/log/varnish/*log {   
     create 640 http log   
     compress   
     postrotate   
       /bin/kill -USR1 `cat /var/run/varnishncsa.pid 2>/dev/null` 2> /dev/null || true   
     endscript   
   }   

Install fail2ban. On Ubuntu you can apt-get install fail2ban

Edit /etc/fail2ban/jail.conf and add a block like this:

 [http-get-dos]  
 enabled = true  
 port = http,https  
 filter = http-get-dos  
 logpath = /var/log/varnish/access.log  
 maxretry = 300  
 findtime = 300  
 #ban for 10 minutes  
 bantime = 600  
 action = iptables[name=HTTP, port=http, protocol=tcp]  

This means that if a person has 300 (maxretry) requests in 300 (findtime) seconds then a ban of 600 (bantime) seconds is applied.

 We need to create the filter in /etc/fail2ban/filter.d/http-get-dos.conf to define the pattern that the jail will match against:

 # Fail2Ban configuration file  
 #  
 # Author: http://www.go2linux.org  
 #  
 [Definition]  
 # Option: failregex  
 # Note: This regex will match any GET entry in your logs, so basically all valid and not valid entries are a match.  
 # You should set up in the jail.conf file, the maxretry and findtime carefully in order to avoid false positives.  
 failregex = ^<HOST>.*"GET  
 # Option: ignoreregex  
 # Notes.: regex to ignore. If this regex matches, the line is ignored.  
 # Values: TEXT  
 #  
 ignoreregex =  

Now let's test the regex against the log file so that we can see if it is correctly picking up the IP addresses of the visitors:

fail2ban-regex /var/log/varnish/access.log /etc/fail2ban/filter.d/http-get-dos.conf 

You should see a list of IP addresses and times followed by summary statistics.

When you restart fail2ban your scraper protection should be up and running.
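On Ubuntu that restart is just a service command, and fail2ban-client will confirm the jail is active and show any currently banned IPs:

 sudo service fail2ban restart
 sudo fail2ban-client status http-get-dos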

14 May 2015

Solving Doctrine - A new entity was found through the relationship

There are so many different problems that people have with the Doctrine error message:

 exception 'Doctrine\ORM\ORMInvalidArgumentException' with message 'A new entity was found through the relationship 'App\Lib\Domain\Datalayer\UnicodeLookups#lookupStatus' that was not configured to cascade persist operations for entity:  

Searching through the various online sources was a bit of a nightmare.  The best documentation I found was at http://www.krueckeberg.org/ where there were a number of clearly explained examples of various associations.

More useful information about association ownership was in the Doctrine manual, but I found a more succinct explanation in the answer to this question on StackOverflow.

Now I understood associations and ownership better and was able to identify exactly what sort I was using and the syntax that was required. I was implementing a unidirectional many-to-one relationship, which is supposedly one of the simplest to map.

I had used the Doctrine reverse engineering tool to generate the stubs of my model.  I was expecting it to be able to handle this particular relationship - the tool warns that it does not properly map all relationships but this particular one actually works out of the box.

 {project root}/vendor/doctrine/orm/bin/doctrine orm:convert-mapping --from-database yml --namespace App\\Lib\\Domain\\Datalayer\\ .  
 {project root}/vendor/doctrine/orm/bin/doctrine orm:generate-entities --generate-methods=true --generate-annotations=true --regenerate-entities=true ../../../  
 {project root}/vendor/doctrine/orm/bin/doctrine orm:generate-proxies ../Proxies  

Just as an aside to explain the paths and namespaces: I'm using Laravel 5 and put my domain model into app/Lib/Domain.  I'm implementing a form of the repository design pattern, so I have the following directory structure:

 app/Lib/Domain  
 +--Entities  
 +--Proxies  
 +--Mappings  
 +--Repositories  
 +--Services  

The mappings are not used at runtime but are used to generate the entities.

So my generated class looked like this:

 namespace App\Lib\Domain\Datalayer;  

 use Doctrine\ORM\Mapping as ORM;  

 /**  
  * Lookups  
  *  
  * @ORM\Table(name="lookups", indexes={@ORM\Index(name="IDX_4CEC819D037A087", columns={"lookup_status_id"}), @ORM\Index(name="IDX_4CEC8194D39DE23", columns={"camera_event_id"})})  
  * @ORM\Entity  
  */  
 class Lookups  
 {  
   /**  
    * @var \App\Lib\Domain\Datalayer\LookupStatuses  
    *  
    * @ORM\ManyToOne(targetEntity="App\Lib\Domain\Datalayer\LookupStatuses", inversedBy="lookups", cascade={"persist"})  
    * @ORM\JoinColumns({  
    *  @ORM\JoinColumn(name="lookup_status_id", referencedColumnName="id")  
    * })  
    */  
   private $lookupStatus;  
 }  

My use case was:

  1. Find a lookup status from the table
  2. Call setLookupStatus on my lookup class
  3. Persist and flush the lookup class
  4. Error
After carefully reviewing all the documentation I linked above, and a great deal more, I realized that step 1 was the issue.  Because I was getting the status object out of a cache rather than through the EntityManager, Doctrine thought it was a new entity.

Of course I couldn't persist the object as the help message suggested (it already existed, so I got a key violation error).  The mappings were actually correct.

So my advice for resolving this error message is to read through the documentation I linked above carefully and then make sure that Doctrine is actually aware of the entity you're using.  You may need to persist it (if it's new) or otherwise make sure it's not cached.
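To make that concrete, here's a rough sketch of the working version of my use case (assuming $entityManager is the Doctrine EntityManager; the 'name' field and status value are illustrative):

 <?php

 use App\Lib\Domain\Datalayer\Lookups;
 use App\Lib\Domain\Datalayer\LookupStatuses;

 // Fetch the status through the EntityManager so Doctrine manages it,
 // rather than pulling a detached copy out of a cache.
 $status = $entityManager
     ->getRepository(LookupStatuses::class)
     ->findOneBy(['name' => 'PENDING']);

 // If the status object did come from a cache, re-attach it first instead:
 // $status = $entityManager->merge($cachedStatus);

 $lookup = new Lookups();
 $lookup->setLookupStatus($status);

 $entityManager->persist($lookup);
 $entityManager->flush();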