01 December 2015

Lowering your AWS bill by moving to reserved instances

The cost saving from reserving an EC2 instance is quite dramatic.  This morning I moved two web servers to reserved instances and am dropping my hosting cost by 64% for those servers.

There isn't actually any effort required in moving your on-demand EC2 instance to a reserved instance.  The change is purely a billing one; you don't need to touch your instance configuration.

The only important thing to remember is that you're reserving an instance in a particular availability zone.  The billing effect will only apply to instances launched in the same availability zone.

Amazon will apply the reserved instance discount to your invoice automatically, and they provide quite extensive documentation on reserved instances on their site.
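
If you prefer the command line to the console, the AWS CLI can handle the purchase too.  This is only a sketch; the instance type, availability zone, and offering id below are placeholders:

 # list matching offerings in the zone your instance runs in
 aws ec2 describe-reserved-instances-offerings \
     --instance-type t2.small --availability-zone eu-west-1a
 # purchase one of the offerings returned above
 aws ec2 purchase-reserved-instances-offering \
     --reserved-instances-offering-id <offering-id> --instance-count 1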

23 November 2015

Fixing the php5 maxlifetime cronjob mailing the root user about module mcrypt already being loaded

I'm running an nginx server with php5-fpm and kept getting mail in /var/mail/root telling me that the cronjob running /usr/lib/php5/maxlifetime was throwing warnings.

The warning was:
PHP Warning:  Module 'mcrypt' already loaded in Unknown on line 0

To fix this I had a look at the file and noticed that it loops through the various SAPIs and runs a command for each.  The relevant lines in the shell script look like this:
for sapi in apache2 apache2filter cgi fpm; do
    if [ -e /etc/php5/${sapi}/php.ini ]; then
So I removed the mcrypt extension from my apache2 php.ini (/etc/php5/apache2/php.ini) and now the maxlifetime shell script runs without throwing warnings.
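
If you want to see every ini file that loads the extension before deleting anything, a quick grep (assuming the standard Debian/Ubuntu php5 layout) will show you where the duplicates are:

 grep -Rl "mcrypt" /etc/php5/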

12 November 2015

Updating the database when migrating a WordPress site between domains

If you're using a staging server to test your WordPress changes then you'll be deploying WordPress to a new domain once your test team gives the go-ahead.

Unfortunately this can break WordPress quite badly.  All the links in your content are essentially hard-coded into the database content table, and there are settings in the options table that help WordPress decide on redirects.

Here are three useful SQL statements that will make your life a little easier when migrating.  You can include them as part of your scripted deploy or just run them manually if you don't deploy WordPress often.

Edit them to suit your domain configuration, but they'll help you to change the links and settings in your database to point to the new domain.
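
The statements themselves were shared as an embedded gist and aren't reproduced here, but the usual suspects look something like this (assuming the default wp_ table prefix, with placeholder staging and live domains):

 UPDATE wp_options SET option_value = replace(option_value, 'http://staging.example.com', 'http://www.example.com')
     WHERE option_name IN ('home', 'siteurl');
 UPDATE wp_posts SET guid = replace(guid, 'http://staging.example.com', 'http://www.example.com');
 UPDATE wp_posts SET post_content = replace(post_content, 'http://staging.example.com', 'http://www.example.com');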

26 August 2015

Setting up a new user in Ubuntu from scratch

Adding new users to Ubuntu is easy because of the convenience tools that exist.

Start with the command

sudo useradd -d /home/testuser -m testuser

This creates the user and sets up a default home directory.  The user doesn't have a password, but you could add one with passwd if you wanted to.

Then create a .ssh directory in their home directory.  Create a file called authorized_keys in that directory and copy the contents of the user's public key into it.

Chown the .ssh directory (and file) to the user and chmod the file to 600.  The directory should be mode 700.
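
In shell terms, those steps look roughly like this (assuming the user's public key has been copied to /tmp/testuser.pub):

 sudo mkdir /home/testuser/.ssh
 sudo cp /tmp/testuser.pub /home/testuser/.ssh/authorized_keys
 sudo chown -R testuser:testuser /home/testuser/.ssh
 sudo chmod 700 /home/testuser/.ssh
 sudo chmod 600 /home/testuser/.ssh/authorized_keys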

Make sure that /etc/ssh/sshd_config is set up to deny password logins.

The user should be able to log in using their public key by setting up their .ssh/config on their home machine.
Host foo
HostName server.ip.address
User testuser
IdentityFile ~/.ssh/id_rsa
If you want to set them up with your custom bash configuration then remember to copy the files into their home directory and chown them.  You will need to change their shell to bash by editing /etc/passwd or by running chsh -s /bin/bash testuser.

If you want them to have administrative privileges on the machine you need to add them to the "sudo" group with this command: usermod -aG sudo testuser.  Because you've disabled password access to the machine you'll need to add a new config file in /etc/sudoers.d that allows them to sudo without a password.  It will need to have a line like this: testuser ALL=(ALL) NOPASSWD:ALL

If you're feeling lazy you might want to use a script.  I've created one and shared it below, but I strongly suggest that you rather write your own and don't use mine for anything other than an idea.  I'm not guaranteeing that it works or even that it's particularly useful.


18 August 2015

Fixing missing msvcp110.dll in xampp on Windows

I need to use a Windows server to deploy a program I'm busy writing.

I kept getting a problem with Apache starting up that prevented it from loading the MS SQL Server PDO drivers.

The message was:
the program can't start because msvcp110.dll is missing from your computer. Try reinstalling the program to fix the problem
and this usually relates to the Visual C++ Redistributable for Visual Studio 2012 package not being installed.

I had previously installed this (it's available on the Microsoft site) but was still getting the error.

Eventually I stumbled on this topic on the Apache Friends forum, which advised copying the msvcp110.dll file to both the \xampp\apache\bin and the \xampp\php directories.

Apparently Apache wasn't able to follow the OS path to find the file.  In my case it was already in the php directory but not with the other binaries for Apache.
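
Assuming a default install on the C: drive and that the redistributable has put the DLL in System32 (the source path may differ on your machine), the copy is simply:

 copy C:\Windows\System32\msvcp110.dll C:\xampp\apache\bin\
 copy C:\Windows\System32\msvcp110.dll C:\xampp\php\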

After copying it there Apache restarted without errors and the PDO driver for MS-SQL was enabled.

11 August 2015

Fixing broken thumbnails in Drupal


If your autoscaled images are not loading on Drupal then here are some steps to take to troubleshoot the problem.

Firstly, if you have logging enabled then visit http://yoursite.com/admin/reports/dblog to see a list of the events.

If you see events with the message "Unable to generate the derived image located at...." then check the permissions on your files directory.  This is usually /sites/default/files.
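
A quick way to fix the permissions, assuming your web server runs as www-data and your files directory is in the default location, is something like:

 sudo chown -R www-data:www-data sites/default/files
 sudo chmod -R 755 sites/default/files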

Then check that the GD library is installed for PHP.  On Ubuntu you can install it with apt-get install php5-gd

If you don't see any events then try opening the image url in a new tab on your browser to confirm that your web server is properly rewriting the request to index.php instead of trying to serve a file out of the directory.

On Nginx you should consider using location blocks like this:

   # These next locations help with thumbnails - https://www.drupal.org/node/2374961  
   location @rewrite {  
     rewrite ^/(.*)$ /index.php?q=$1 last;  
   }  
   location ~ ^/sites/.*/files/styles/ {  
     try_files $uri @rewrite;  
   }  

This makes sure that requests to locations in the styles directory are routed to index.php if the requested file does not exist in the filesystem.

13 July 2015

Setting up Nginx as a reverse proxy for Apache with SSL termination

Reverse Proxy diagram from Wiki Commons
We're currently hosting client sites on a Rackspace server and using their Load Balancer feature to terminate SSL so that we don't have to install multi-site certificates.

We only attach one node to each Load Balancer so we're paying for more than we're using.  My proof of concept is to use Nginx to terminate SSL and proxy requests to the Apache server.  This will save us £225 per load balancer, and since we're using ten of them that's quite a significant saving.

My first step was to spin up a free tier EC2 instance running Ubuntu 14.04 LTS.  I guess you can replace this with your favourite cloud or on-the-metal server.

Then I installed my packages. These are the ones I remember so YMMV.

 sudo apt-get install nginx apache2 fail2ban php5-fpm mcrypt php5-mcrypt openssl php5-cli php5 libapache2-mod-php  

My network diagram is slightly different from the picture for this post in that the web server is hosted on the same machine as the proxy.

I decided to run Apache on port 8000 and listen only to connections from localhost. Nginx would listen on port 80 and forward requests to Apache. I decided to let Nginx serve static content because it's pretty quick at doing so and this saves Apache from being overwhelmed by requests.

Configuring Apache

My first port of call was to edit /etc/apache2/ports.conf and make sure that my Listen line looks like this: Listen 127.0.0.1:8000

Then I created two virtual hosts to test with.  Here's a sample:

 <VirtualHost *:8000>  
      ServerName dummy1.mydomain.co.uk  
      ServerAdmin webmaster@localhost  
      DocumentRoot /var/www/dummy3/  
      ErrorLog ${APACHE_LOG_DIR}/dummy3_error.log  
      CustomLog ${APACHE_LOG_DIR}/dummy3_access.log combined  
 </VirtualHost>  

I made a simple index.php file in two new directories /var/www/dummy1 and /var/www/dummy3 which just output two server variables for me to test with. I also copied an image file into those directories so that I could check how static assets would be served.

 <?php  
 echo $_SERVER['SCRIPT_FILENAME'] . '<br>';  
 echo $_SERVER['SERVER_SOFTWARE'] . '<br>';  

Configuring Nginx

I decided to use self-signed certificates for testing and reserve dummy2 for a trial run of a free SSL certificate.  There are quite a few certificate signers who will give you a 30-day trial certificate.

I created an /etc/nginx/ssl directory because for some reason I don't like to contaminate my conf.d directory and made subdirectories for my sites under that.

I created self-signed certificates (commands are at the top of my host file) and set up the vhosts like this:
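
The vhost file itself was embedded as a gist and isn't reproduced here, but a cut-down sketch of an SSL-terminating proxy vhost, using placeholder names and paths, might look something like this:

 # generate a self-signed certificate for testing
 openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
     -keyout /etc/nginx/ssl/dummy1/dummy1.key \
     -out /etc/nginx/ssl/dummy1/dummy1.crt

 server {
     listen 80;
     listen 443 ssl;
     server_name dummy1.mydomain.co.uk;

     ssl_certificate     /etc/nginx/ssl/dummy1/dummy1.crt;
     ssl_certificate_key /etc/nginx/ssl/dummy1/dummy1.key;

     # let Nginx serve static assets directly
     location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
         root /var/www/dummy1;
     }

     # everything else is proxied to Apache on the loopback interface
     location / {
         proxy_pass http://127.0.0.1:8000;
         proxy_set_header Host $host;
         proxy_set_header X-Real-IP $remote_addr;
         proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
         proxy_set_header X-Forwarded-Proto $scheme;
     }
 }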

Now when I hit a static file on HTTP or HTTPS, Nginx serves it up directly. Inspecting the response headers with your favourite browser's debug tool will confirm that images are served by Nginx. Visiting the index file shows that it's loading the correct one and is being handled by Apache. Lastly, checking the certificate will show you that each site is using its own certificate.

That has the potential of saving my company £2,200, which is a happy thing to be able to do in your first week while your boss is watching :)

10 July 2015

Securing Jenkins with OAuth

Jenkins is pretty easy to secure with the help of some useful plugins.

The first plugin that I suggest using is an OAuth provider.  Our repositories are hosted on Bitbucket so I'm using their OAuth, but there is also a GitHub OAuth plugin.  The instructions to set up the plugin are very clear (see the plugin page).

When you're configuring Jenkins to use OAuth security, remember to leave the Authorization setting at "logged in users can do anything" for now.  We'll change this later, but we don't want to get locked out of Jenkins when we apply the security settings.

Now install the Role-based Authorization Strategy plugin (see the plugin page).

Add a new group called "Anonymous" and uncheck everything.

When a user logs in via OAuth they'll be shown a message by Jenkins saying that they don't have any permissions.  This means that not everybody with a Bitbucket account can access your site, so that's a good thing.

You just need to add them to the roles plugin settings.  Click Manage Jenkins then Manage and Assign Roles.  Click on assign roles and add the user.  Then tick the boxes of the roles you want to assign them.





Checking the SSL certificates for a list of domains

We have a number of domains that are secured by SSL and need to be able to automate checks for certificate validity and expiry.

Luckily there is a script to do exactly this.  On Ubuntu you can apt-get install ssl-cert-check, but there are copies of the script online in case your distro doesn't have it as a package.

Create a file with a list of the domains you want to check and the port to check on.  It should look something like this:
 yourdomain.com 443  
 www.anotherdomain.com 443  
 www.yetanotherclientdomain.com 443  

Let's assume you called your file domainlist.txt.

You can then run ssl-cert-check -f domainlist.txt to get the tool to run through the list and print the status and expiry date of each domain to the console.

The options shown in the script's help output let you have it email you if a certificate is going to expire soon.

ssl-cert-check -a -f domainlist.txt -q -x 30 -e yourmail@foo.com

If you get a message about a missing mail binary you'll spot that in the script (line 243) it looks in a variety of locations for a file called mail or mailx.  An appropriate binary in Ubuntu is contained in the heirloom-mailx package so installing that will solve your problem.
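
To automate the check, a cron entry along these lines (the schedule, file path, and address are placeholders) will run it weekly and mail you about anything expiring within 30 days:

 # /etc/cron.d/ssl-cert-check (sketch)
 0 7 * * 1 root ssl-cert-check -a -f /etc/ssl-cert-check/domainlist.txt -q -x 30 -e yourmail@foo.com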

27 May 2015

Allowing the whole wide world to read your S3 bucket

This is a bucket policy that you can use to give the whole world read access to the files in your S3 bucket.

You might want to do this if you're serving static web content from S3 and don't need the fine-grained control that the Amazon documentation details.
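
The policy itself was attached as an embedded gist; a typical public-read bucket policy, with a placeholder bucket name, looks like this:

 {
   "Version": "2012-10-17",
   "Statement": [
     {
       "Sid": "AllowPublicRead",
       "Effect": "Allow",
       "Principal": "*",
       "Action": "s3:GetObject",
       "Resource": "arn:aws:s3:::your-bucket-name/*"
     }
   ]
 }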


You will still need to set up permissions on the bucket but this policy will let people read the files you're storing on S3.

26 May 2015

Storing large values in Memcached with PHP

Memcached saved my users a minute per query
I'm working on a business intelligence tool that requires, as an intermediate calculation, a list of UK postcodes that are within a radius of a user-supplied postcode.

It currently takes about 7 seconds to query my Postgres database to get this list out.  Unfortunately I need to do this several times as part of a goal seeking function so I need to greatly improve this lookup speed.

I'm already using the Postgres earthdistance module and have properly indexed my table so I realized that I needed to look for a caching solution.

Memcached places limits on the size of the value you can store.  The default limit is 1 MB and I'm reluctant to change this because it adds to the deployment burden.  My result sets were sometimes up to 4 MB - searching on a 20-mile radius in London yields a lot of postcodes!

My idea was to split the large piece of data into several smaller pieces and to place an index referencing the pieces as the value for the key we're trying to store.

I decided to make use of PHP's gzcompress() function to reduce the size of the element, because the time spent compressing the data is still going to be drastically less than running the query and I want to do my best to avoid cache evictions.

I'm currently using Laravel so the code snippets below use the facades made available by Laravel.  I think the code is readable enough to extend to other PHP environments and I think the approach could also be ported to other languages.
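
The original snippets were embedded as gists and aren't reproduced here, but a bare-bones sketch of the chunk-and-index idea, using Laravel's Cache facade and made-up key names, might look like this:

 <?php
 // Sketch only: split a large compressed payload into sub-1MB chunks, store each
 // chunk under its own key, and store the list of chunk keys as an index under
 // the original key.  Key names and the 60 minute lifetime are arbitrary.
 function cachePutLarge($key, $data, $minutes = 60)
 {
     $compressed = gzcompress(serialize($data));
     $chunks = str_split($compressed, 900000); // stay safely under the 1 MB item limit
     $chunkKeys = [];
     foreach ($chunks as $i => $chunk) {
         $chunkKey = $key . '.chunk.' . $i;
         Cache::put($chunkKey, $chunk, $minutes);
         $chunkKeys[] = $chunkKey;
     }
     Cache::put($key, $chunkKeys, $minutes);
 }

 function cacheGetLarge($key)
 {
     $chunkKeys = Cache::get($key);
     if (empty($chunkKeys)) {
         return null;
     }
     $compressed = '';
     foreach ($chunkKeys as $chunkKey) {
         $compressed .= Cache::get($chunkKey);
     }
     return unserialize(gzuncompress($compressed));
 }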





20 May 2015

Using Fail2Ban to protect a Varnished site from scrapers

I'm using Varnish to cache a fairly busy property site.  Varnish works like a bomb for normal users and has greatly improved our page load speed.

For bots that are scraping the site, though, presumably to add the property listings to their own sites, the cache is next to useless, since the bots trawl sequentially through the whole site.

I decided to use fail2ban to block IPs that hit the site too often.

The first step in doing so was to enable a disk-based access log for Varnish so that fail2ban has something to work with.

This means setting up varnishncsa.  Add this to your /etc/rc.local file:

 varnishncsa -a -w /var/log/varnish/access.log -D -P /var/run/varnishncsa.pid  

This starts up varnishncsa in daemon mode and appends Varnish access attempts to /var/log/varnish/access.log

Now edit or create /etc/logrotate.d/varnish and make an entry to rotate this access log:

  /var/log/varnish/*log {   
     create 640 http log   
     compress   
     postrotate   
       /bin/kill -USR1 `cat /var/run/varnishncsa.pid 2>/dev/null` 2> /dev/null || true   
     endscript   
   }   

Install fail2ban. On Ubuntu you can apt-get install fail2ban

Edit /etc/fail2ban/jail.conf and add a block like this:

 [http-get-dos]  
 enabled = true  
 port = http,https  
 filter = http-get-dos  
 logpath = /var/log/varnish/access.log  
 maxretry = 300  
 findtime = 300  
 #ban for 5 minutes  
 bantime = 600  
 action = iptables[name=HTTP, port=http, protocol=tcp]  

This means that if a client makes 300 (maxretry) requests within 300 (findtime) seconds then a ban of 600 (bantime) seconds is applied.

We now need to create the filter in /etc/fail2ban/filter.d/http-get-dos.conf with the pattern that the jail will match:

 # Fail2Ban configuration file  
 #  
 # Author: http://www.go2linux.org  
 #  
 [Definition]  
 # Option: failregex  
 # Note: This regex will match any GET entry in your logs, so basically all valid and not valid entries are a match.  
 # You should set up in the jail.conf file, the maxretry and findtime carefully in order to avoid false positives.  
 failregex = ^<HOST>.*"GET  
 # Option: ignoreregex  
 # Notes.: regex to ignore. If this regex matches, the line is ignored.  
 # Values: TEXT  
 #  
 ignoreregex =  

Now let's test the regex against the log file so that we can see if it is correctly picking up the IP addresses of the visitors:

fail2ban-regex /var/log/varnish/access.log /etc/fail2ban/filter.d/http-get-dos.conf 

You should see a list of IP addresses and times followed by summary statistics.

When you restart fail2ban your scraper protection should be up and running.

14 May 2015

Solving Doctrine - A new entity was found through the relationship

There are so many different problems that people have with the Doctrine error message:

 exception 'Doctrine\ORM\ORMInvalidArgumentException' with message 'A new entity was found through the relationship 'App\Lib\Domain\Datalayer\UnicodeLookups#lookupStatus' that was not configured to cascade persist operations for entity:  

Searching through the various online sources was a bit of a nightmare.  The best documentation I found was at http://www.krueckeberg.org/ where there were a number of clearly explained examples of various associations.

More useful information about association ownership was in the Doctrine manual, but I found a more succinct explanation in the answer to this question on StackOverflow.

Now I understood associations and ownership better and was able to identify exactly what sort I was using and the syntax that was required. I was implementing a unidirectional many-to-one relationship, which is supposedly one of the simplest to map.

I had used the Doctrine reverse engineering tool to generate the stubs of my model.  I was expecting it to be able to handle this particular relationship - the tool warns that it does not properly map all relationships but this particular one actually works out of the box.

 {project root}/vendor/doctrine/orm/bin/doctrine orm:convert-mapping --from-database yml --namespace App\\Lib\\Domain\\Datalayer\\ .  
 {project root}/vendor/doctrine/orm/bin/doctrine orm:generate-entities --generate-methods=true --generate-annotations=true --regenerate-entities=true ../../../  
 {project root}/vendor/doctrine/orm/bin/doctrine orm:generate-proxies ../Proxies  

Just as an aside to explain the paths and namespaces: I'm using Laravel 5 and put my domain model into app/Lib/Domain.  I'm implementing a form of the repository design pattern so I have the following directory structure:

 app/Lib/Domain  
 +--Entities  
 +--Proxies  
 +--Mappings  
 +--Repositories  
 +--Services  

The mappings are not used at runtime but are used to generate the entities.

So my generated class looked like this:

 namespace App\Lib\Domain\Datalayer;  
 /**  
  * Lookups  
  *  
  * @ORM\Table(name="lookups", indexes={@ORM\Index(name="IDX_4CEC819D037A087", columns={"lookup_status_id"}), @ORM\Index(name="IDX_4CEC8194D39DE23", columns={"camera_event_id"})})  
  * @ORM\Entity  
  */  
 class Lookups  
 {  
   /**  
    * @var \App\Lib\Domain\Datalayer\LookupStatuses  
    *  
    * @ORM\ManyToOne(targetEntity="App\Lib\Domain\Datalayer\LookupStatuses", inversedBy="lookups", cascade={"persist"})  
    * @ORM\JoinColumns({  
    *  @ORM\JoinColumn(name="lookup_status_id", referencedColumnName="id")  
    * })  
    */  
   private $lookupStatus;  
 }  

My use case was:

  1. Find a lookup status from the table
  2. Call setLookupStatus on my lookup class
  3. Persist and flush the lookup class
  4. Error

After carefully reviewing all the documentation I linked above, and a great deal more, I realized that step 1 was the issue.  Because I was getting the object out of a cache, Doctrine thought it was new.

Of course I couldn't persist the object as the help message suggested (it already existed so I got a key violation error).  The mappings were actually correct.

So my advice for resolving this error message is to read through the documentation I linked above carefully and then make sure that Doctrine is actually aware of the entity you're using.  You may need to persist it (if it's new) or otherwise make sure it's not cached.
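
In code terms the difference is roughly this (a sketch rather than my actual repository code, assuming $entityManager, $lookup, and $statusId are already available):

 <?php
 // Fetch the status through the EntityManager so Doctrine manages it,
 // rather than hydrating it from an external cache.
 $status = $entityManager->find('App\Lib\Domain\Datalayer\LookupStatuses', $statusId);
 $lookup->setLookupStatus($status);
 $entityManager->persist($lookup);
 $entityManager->flush();

 // If the status object came out of a cache and is detached, merge() will re-attach it:
 // $status = $entityManager->merge($cachedStatus);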

29 April 2015

Monitoring Varnish for random crashes

I'm using Varnish to cache the frontend of a site that a client is busy promoting.  It does a great job of reducing requests to my backend but is prone to random crashes.  I normally get about two weeks of uptime on this particular server, which is significantly lower than other places that I've deployed Varnish.

I just don't have enough information to work with to figure out why the random crash is occurring.  The system log shows that a child process doesn't respond to CLI and so is killed.  The child never seems to come back up again.

My /var/log/messages file looks like this:

 08:31:45 varnishd[7888]: Child (16669) not responding to CLI, killing it.  
 08:31:45 varnishd[7888]: Child (16669) died signal=3  
 08:31:45 varnishd[7888]: child (25675) Started  
 08:31:45 varnishd[7888]: Child (25675) said Child starts  
 08:31:45 varnishd[7888]: Child (25675) said SMF.s0 mmap'ed 1073741824 bytes of 1073741824  
 08:32:19 varnishd[7888]: Child (25675) not responding to CLI, killing it.  
 08:32:21 varnishd[7888]: Child (25675) not responding to CLI, killing it.  
 08:32:21 varnishd[7888]: Child (25675) died signal=3  

That doesn't give me a lot to work with.  I couldn't find anything in the documentation about this sort of problem.  I don't want to uninstall Varnish so I decided to look for a way to monitor the process instead.

I first tried Monit, but after about two weeks my site was down.  After SSHing onto the box and restarting Varnish I checked the Monit logs.  Although it was able to recognize that Varnish had crashed, it was not able to successfully bring it back up.

My Monit log looked like this:

 [BST Apr 23 09:07:24] error  : 'varnish' process is not running  
 [BST Apr 23 09:07:24] info   : 'varnish' trying to restart  
 [BST Apr 23 09:07:24] info   : 'varnish' start: /etc/init.d/varnish  
 [BST Apr 23 09:07:54] error  : 'varnish' failed to start  
 [BST Apr 23 09:08:54] error  : 'varnish' process is not running  
 [BST Apr 23 09:08:54] info   : 'varnish' trying to restart  
 [BST Apr 23 09:08:54] info   : 'varnish' start: /etc/init.d/varnish  
 [BST Apr 23 09:09:24] error  : 'varnish' failed to start  
 [BST Apr 23 09:10:24] error  : 'varnish' process is not running  
 [BST Apr 23 09:10:24] info   : 'varnish' trying to restart  
 [BST Apr 23 09:10:24] info   : 'varnish' start: /etc/init.d/varnish  
 [BST Apr 23 09:10:54] error  : 'varnish' failed to start  
 [BST Apr 23 09:11:54] error  : 'varnish' service restarted 3 times within 3 cycles(s) - unmonitor  

My problem sounded a lot like this one on ServerFault so I looked for another way to monitor the process other than using Monit.

Instead of using daemonize, supervisord, or another similar program I'm trying out a simple shell script that I found at http://blog.unixy.net/2010/05/dirty-varnish-monitoring-script/.  The author says it's dirty, and I suppose it is, but it has the advantage of being dead simple and easy to control.  I've set it up as a cron job to run every five minutes.  Hopefully this will be a more effective way to make sure that Varnish doesn't stay dead for very long.

In case the source file goes down I saved a copy as a Gist:
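
The gist isn't reproduced here, but the general shape of such a watchdog is simple enough.  This is a sketch only (the URL, log path, and init script location are assumptions), run from cron every five minutes:

 #!/bin/bash
 # If Varnish doesn't answer on port 80, restart it and note the fact in a log.
 if ! curl -s -o /dev/null --max-time 10 http://127.0.0.1/; then
     echo "$(date): varnish not responding, restarting" >> /var/log/varnish-watchdog.log
     /etc/init.d/varnish restart
 fi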


21 April 2015

Fixing when queue workers keep popping the same job off a queue

From Amazon Queue documentation
My (Laravel) project uses a queue system for importing because these jobs can take a fair amount of time (up to an hour) and I want to have them run asynchronously to prevent my users from having to sit and watch a spinning ball.

I created a cron job which would run my Laravel queue work command every 5 minutes.  PHP is not really the best language for long-running processes which is why I elected to rather run a task periodically instead of listening all the time.  This introduced some latency (which I could cut down) but this is acceptable in my use case (imports happen once a month and are only run manually if an automated import fails).

The problem that I faced was that my queue listener kept popping the same job off the queue.  I didn't try running multiple listeners but I'm pretty confident weirdness would have resulted in that case as well.

Fixing the problem turned out to be a matter of configuring the visibility timeout of my queue.  I was using the SQS default of 30 seconds.  Amazon defines the visibility timeout as:
The period of time that a message is invisible to the rest of your application after an application component gets it from the queue. During the visibility timeout, the component that received the message usually processes it, and then deletes it from the queue. This prevents multiple components from processing the same message.  Source: Amazon documentation
This concept is common to queues and exists under various names: in Beanstalk the setting is called time-to-run and IronMQ refers to it as timeout.  If your job's run time exceeds the queue's visibility timeout then workers will pop off a job that is still being run in another process.
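
On SQS you can raise the timeout from the console or with the AWS CLI; the queue URL and value below are placeholders, and an import that can take an hour needs a timeout of at least an hour:

 aws sqs set-queue-attributes \
     --queue-url https://sqs.eu-west-1.amazonaws.com/123456789012/imports \
     --attributes VisibilityTimeout=3600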

17 April 2015

Setting up the admin server in HHVM 3.6.0 and Nginx

Hiphop has a built-in admin server that has a lot of useful functions.  I found out about it on an old post on the Hiphop blog.

Since those times Hiphop has moved towards using an ini file instead of a config.hdf file.

On a standard prebuilt HHVM on Ubuntu you should find the ini file in /etc/hhvm/php.ini

Facebook maintains a list of all the ini settings on Github.

It is into this file that we add two lines to enable the admin server:

 hhvm.admin_server.password = SecretPassword  
 hhvm.admin_server.port = 8888  

I then added a server to Nginx by creating this file: /etc/nginx/conf.d/admin.conf (by default Nginx includes all conf files in that directory):

 server {  
   # hhvm admin  
   listen 8889;  
   location ~ {  
     fastcgi_pass  127.0.0.1:8888;  
     include    fastcgi_params;  
   }  
 }  

Now I can run curl 'http://localhost:8889' from my shell on the box to get a list of commands. Because I host this project with Amazon and have not set up a security rule the port/server are not available to the outside world.  You may want to check your firewall rules on your server.

To run a command add the password as a get variable:

 curl 'http://localhost:8889/check-health?auth=SecretPassword'  

Searching in a radius with Postgres

Postgres has two very useful extensions - earthdistance and postgis.  PostGIS is much more accurate but I found earthdistance to be very easy to use and accurate enough for my purpose (finding UK postcodes within a radius of a point).

To install it first find your Postgres version and then install the appropriate package.  On my Debian Mint dev box it looks like the below snippet. My production machine is an Amazon RDS and you can skip this step in that environment.

 psql -V  
 sudo apt-get install postgresql-contrib postgresql-contrib-9.3  
 sudo service postgresql restart  

Having done that you should launch psql and run these two commands.  Make sure that you install cube first because it is a requirement of earthdistance.

 CREATE EXTENSION cube; CREATE EXTENSION earthdistance;  

Now that the extensions are installed you have access to all of the functions they provide.

If you want to check that they're working you can run SELECT earth(); as a quick way to test a function. It should return the earth's radius.

Earthdistance treats the earth as a perfect sphere while PostGIS is more accurate. Since I'm working in a fairly local region (less than 20 statute miles) I felt that approximating the surface as a sphere was sufficiently accurate.

I had a table of postcodes and an example query looked like this:

 SELECT * FROM postcodes  
 WHERE earth_box(ll_to_earth(-0.20728725565285000, 51.48782225571700000), 100) @> ll_to_earth(lat, lng)  

This took quite a long time (*cough* a minute *cough*) to run through the 1.7 million postcodes in my table. After a little searching I realized that I should index the calculation, with the result that queries now take about 100ms, which is plenty fast enough on my dev box.

 For reference - using the Haversine formula in a query was taking around 30 seconds to process - even after constraining my query to a box before calculating radii.

 CREATE INDEX postcode_gis_index on postcodes USING gist(ll_to_earth(lat, lng));  

So now I can search for UK postcodes that are close to a point and I can do so sufficiently quickly that my user won't get bored of watching a loading spinner gif.

30 March 2015

Ignoring duplicate inserts with Postgres when processing a batch

I'm busy on a project which involves importing fairly large datasets of around 3.3 GB at a time.  I have to read a CSV file, process each line, and generate a number of database records from the results of that process.

Users are expected to be able to rerun batches and there is overlap between different datasets.  For example: the dataset of "last year" overlaps with the dataset of "all time".  This means that we need an elegant way to handle duplicate updates.

Checking whether a record already exists (by primary key) is fine until the row count in the table gets significant.  At just over 2 million records it was taking my development machine 30 seconds to process 10,000 rows, and this number steadily increased as the row count grew.

I had to find a better way to do this and happened across the option of using a database rule to ignore duplicates.  With the rule in place there is a marked improvement in performance, as I no longer need to search the database for an existing record before inserting.
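
The rule itself was shared as an embedded gist; a minimal sketch, assuming a hypothetical table called records with a primary key column id, looks like this:

 CREATE RULE ignore_duplicate_inserts AS
     ON INSERT TO records
     WHERE EXISTS (SELECT 1 FROM records WHERE id = NEW.id)
     DO INSTEAD NOTHING;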

17 March 2015

Adding info to Laravel logs

I am coding a queue worker that is handling some pretty large (2 GB+) datasets and so wanted some details in my logs that vanilla Laravel didn't offer.

Reading the documentation at http://laravel.com/docs/4.2/errors wasn't much help until I twigged that I could manipulate the log object returned by Log::getMonolog().

Here is an example of adding memory usage to Laravel logs.

In app/start/global.php make the following changes:
 Log::useFiles(storage_path().'/logs/laravel.log');  
 $log = Log::getMonolog();  
 $log->pushProcessor(new Monolog\Processor\MemoryUsageProcessor);  

You'll find the Monolog documentation on the project's repo.

12 March 2015

Support for Postgres broken in HHVM 3.6.0

On my desktop machine I run my package upgrades every day.  The other day my Hiphop version got updated to 3.6.0 and suddenly my Postgres support died.

Running Hiphop gave a symbol not found error in the postgres.so file (undefined symbol: _ZTIN4HPHP11PDOResourceE) exactly like the issue reported on the driver repository.

I tried to recompile the postgres driver against Hiphop 3.6.0 but hit a number of problems, mostly to do with hhvm-pgsql-master/pdo_pgsql_statement.cpp it seems.

The fix for the incompatibility was unfortunately rolling back to my previous version of Hiphop.  To do this on Mint/Ubuntu just do this:

  1. Run cat /etc/*-release to get your release information
  2. Download the appropriate package for your distro from http://dl.hhvm.com/ubuntu/pool/main/h/hhvm/
  3. Remove your 3.6.0 installation of hhvm: sudo apt-get remove hhvm
  4. Install the package you downloaded : sudo dpkg -i <deb package>
After that everything should be installed properly and you can start up hhvm without a problem.





22 February 2015

Fixing puppet "Exiting; no certificate found and waitforcert is disabled" error

While debugging and setting up Puppet I am still running the agent and master from the CLI in --no-daemonize mode.  I kept getting an error on my agent - "Exiting; no certificate found and waitforcert is disabled".

The fix was quite simple and a little embarrassing.  I had forgotten to run my puppet master with root privileges, which meant that it was unable to write incoming certificate requests to disk.  That's the embarrassing part; once I looked at my shell prompt and noticed this, fixing it was quite simple.

Firstly I got the puppet ssl path by running the command puppet agent --configprint ssldir

Then I removed that directory so that my agent no longer had any certificates or requests.

On my master side I cleaned the old certificate by running puppet cert clean --all (this would remove all my agent certificates but for now I have just the one so it's quicker than tagging it).

I started my agent up with the command puppet agent --test which regenerated the certificate and sent the request to my puppet master.  Because my puppet master was now running with root privileges (*cough*) it was able to write to its ssl directory and store the request.

I could then sign the request on my puppet master by running puppet cert sign --all

When running normally the puppet master will run as the puppet user so I'm not overly worried about running it as root in CLI while I debug it.

19 February 2015

Installing the Postgres extension for Hiphop

Postgres is rapidly gaining traction as a solid relational database manager.  It provides transactional reliability (ACID) and standards compliance, and has a better reputation for handling large datasets than MariaDB/MySQL.

Luckily installing it for Hiphop is painless and there is an externally provided extension for it.

There was not a prebuilt package available for the PocketRent/hhvm-pgsql extension for my version of Hiphop so I built it following the advice in the project readme.  Their instructions worked first time.

I did not have success with using the "ini" method of including the extension so had to create a Hiphop configuration file.

 DynamicExtensionPath = /path/to/hhvm/extensions  
 DynamicExtensions {  
   * = pgsql.so  
 }  

I placed the snippet they provide (above) into a file called config.hdf in my Hiphop location.  I'm using Mint on my dev box so this was /etc/hhvm/config.hdf for me.

Then I edited /etc/default/hhvm and set Hiphop up to use the config file.  This snippet shows the change:

 ## Add additional arguments to the hhvm service start up that you can't put in CONFIG_FILE for some reason.  
 ## Default: ""  
 ## Examples:  
 ##  "-vLog.Level=Debug"        Enable debug log level  
 ##  "-vServer.DefaultDocument=app.php" Change the default document  
 ADDITIONAL_ARGS="-c /etc/hhvm/config.hdf"  

After that I restarted my hhvm service and was able to use Postgres in my Laravel project.  Hurray :)

New site in nginx is downloading PHP instead of executing it

I've just set up a new nginx host and was having problems with visiting it.

Whenever I loaded the page it would download a copy of my index file.

In my case the problem was twofold.  Firstly, I had mistyped my server name so requests were falling back to an alternative catch-all.

Secondly Chromium has a "won't fix" bug whereby redirects are cached.  Interestingly the cache also persisted into Incognito mode.

Clearing my cache with the "Clear Browsing History" menu function didn't clear out the redirect cache.  Even after stopping nginx, stopping Varnish, and even stopping hhvm, Chromium would still download the file when I tried to visit the site.

What did work was to append a GET variable to the link.  So instead of visiting http://mysite.local I tried http://mysite.local?foo=1 which invalidated the cache.

This wasn't an entirely satisfactory solution, so the next thing I tried was to open the developer tools (Ctrl+Shift+I) and right-click on the file in the network log.  Using the "Clear Browser Cache" option from that popup also worked.

So the TL;DR is:
1) Make sure that your config file is correct
2) Clear out your redirect cache or invalidate it

07 January 2015

Compressing Apache output with mod_deflate on Centos

Apache on Centos ships with mod_deflate installed and enabled by default.  To check this you can grep your config file and make sure the line which loads it is not commented out.

 grep "LoadModule deflate_module" /etc/httpd/conf/httpd.conf  

When Apache loads it reads all the config files (ending in .conf) in /etc/httpd/conf.d, so we'll add configuration options for mod_deflate into this directory. Let's use a file called deflate.conf to specify the config:

 <IfModule mod_deflate.c>  
  AddOutputFilterByType DEFLATE text/plain  
  AddOutputFilterByType DEFLATE text/html  
  AddOutputFilterByType DEFLATE text/xml  
  AddOutputFilterByType DEFLATE text/css  
  AddOutputFilterByType DEFLATE text/javascript  
  AddOutputFilterByType DEFLATE image/svg+xml  
  AddOutputFilterByType DEFLATE image/x-icon  
  AddOutputFilterByType DEFLATE application/xml  
  AddOutputFilterByType DEFLATE application/xhtml+xml  
  AddOutputFilterByType DEFLATE application/rss+xml  
  AddOutputFilterByType DEFLATE application/javascript  
  AddOutputFilterByType DEFLATE application/x-javascript  
  DeflateCompressionLevel 9  
 # Browser specific settings  
  BrowserMatch ^Mozilla/4 gzip-only-text/html  
  BrowserMatch ^Mozilla/4\.0[678] no-gzip  
  BrowserMatch \bMSIE !no-gzip !gzip-only-text/html  
  BrowserMatch \bOpera !no-gzip   
 </IfModule>  

You can check it is working by looking at your YSlow report, by using an online gzip test tool, or by checking the response headers with Chrome or Firefox's developer tools.
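
A quick check from the command line is to request a page with an Accept-Encoding header and look for Content-Encoding in the response (substitute your own site):

 curl -s -o /dev/null -D - -H "Accept-Encoding: gzip" http://yoursite.com/ | grep -i "content-encoding"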

If you're using Varnish and Apache does not have mod_deflate then you can enable gzip in your vcl as per the Varnish manual.  The page linked at the bottom of the Varnish manual (How GZIP, and GZIP+ESI works in Varnish) explains how the response from the backend is stored in a compressed state.

  sub vcl_fetch {  
    if (beresp.http.content-type ~ "text") {  
        set beresp.do_gzip = true;  
    }  
 }  

06 January 2015

Separating business logic from persistence layer in Laravel

There are several reasons to separate business logic from your persistence layer.  Perhaps the biggest advantage is that the parts of your application which are unique are not coupled to how data are persisted.  This makes the code easier to port and maintain.

I'm going to use Doctrine to replace the Eloquent ORM in Laravel.  A thorough comparison of the patterns is available here.

By using Doctrine I am also hoping to mitigate the risk of a major version upgrade of the underlying framework.  The ORM can be expected to change between major versions of a framework, and upgrading to a new release can be quite costly.

Another advantage to this approach is to limit the access that objects have to the database.  Unless a developer is aware of the business rules in place on an Eloquent model there is a chance they will mistakenly ignore them by calling the ActiveRecord save method directly.

I'm not implementing the repository pattern in all its glory in this demo.  For a more purist approach to the pattern you can read this.  The reason that I'm choosing this approach is to cut down on the number of classes and on the composer autoload, and to ensure my code is portable.  I also do not have to make any changes to my app aliases in config or create new Laravel services.

I'm going to use three objects to refer to my user table:
  • The UserEntity is a Doctrine entity to be accessed via the Doctrine entity manager by the Repository.  
  • The UserRepository is an intermediary layer akin to the data access object of other languages.  It uses the Entity to gather information that the Service layer needs.  
  • The UserService implements business logic and exposes methods to the controller.  It is the "fat model" in the "fat model / skinny controller" paradigm.
In order to be able to use Doctrine within Laravel I'm using the mitchellvanw/laravel-doctrine package.  I also use "raveren/kint" for access to the debugging "dd" shortcut.

The decision to use constructor injection when instantiating the user service is to make it easier to use a mock object when testing.  

I've deviated from the common practice of keeping a Doctrine entity's properties inaccessible from outside and have rather exposed getters and setters.  This is primarily so that in my user service layer I can conveniently reference properties of the entity in a manner that is not likely to change if I swap to another ORM.

I decided against marking the entity private and using reflection to retrieve the private properties in my repository.  I felt it was an unnecessary complication and not worth the processing cycles to ensure compatibility between methods of accessing a model property between ORMs.

The Doctrine entity class is incapable of persisting itself, so if a developer instantiates it and modifies properties in the controller they won't be able to persist it unless they call the EntityManager class.  Hopefully this is more effort than calling save() on an ActiveRecord object, and our design philosophy of avoiding persistence in controllers should also help to discourage mistakes here.

The files I created are listed below.  The drawback of avoiding any Laravel specific code is the rather ugly way of instantiating the service in the controller.  I believe, however, that my code will be easier to port to another framework than if I were to declare a Laravel service provider to make a static call to user.

You must include "app/models/user" in your composer autoload section and run the composer dump-autoload command from your shell once you've set up the directory.
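
In composer.json that means something along these lines (shown in isolation; add the path to whatever classmap entries you already have):

 "autoload": {
     "classmap": [
         "app/models/user"
     ]
 }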

app/models/user/UserEntity.php

 <?php namespace User;  
 use Doctrine\ORM\Mapping AS ORM;  
 /**  
  * @ORM\Entity  
  * @ORM\Table(name="users")  
  */  
 class UserEntity  
 {  
      // see http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/basic-mapping.html for info on mapping  
      /**  
       * @ORM\Id  
       * @ORM\GeneratedValue  
       * @ORM\Column(type="integer")  
       */  
      private $id;  
      /**  
       * @ORM\Column(type="string")  
       */  
      private $name;  
      /**  
       * @ORM\Column(type="string")  
       */  
      private $password;  
      /**  
       * @ORM\Column(type="datetime")  
       */  
      private $created;  
      /**  
       * @ORM\Column(type="datetime")  
       */  
      private $modified;  
      public function getId()  
      {  
           return $this->id;  
      }  
      public function getName()  
      {  
           return $this->name;  
      }  
      public function setName($name)  
      {  
           $this->name = $name;  
      }  
      public function setPassword($password)  
      {  
           $this->password = $password;  
      }  
 }  

app/models/user/UserRepository.php

 <?php namespace User;  
 class UserRepository  
 {  
   protected $entity;  
   public function __construct( UserEntity $userEntity )  
   {  
     $this->entity = $userEntity;  
   }  
   public function getUserById($userId)  
   {  
     $user = \EntityManager::find('User\UserEntity', $userId);  
     return $user;  
   }  
   public function getUserByName($userName)  
   {  
     $user = \EntityManager::getRepository( 'User\UserEntity' )->findBy( [ 'name' => $userName ] );  
     if( !is_array( $user ) || empty( $user ) )  
     {  
       return false;  
     }  
     return $user[0];  
   }  
   public function setPassword( $userDetails, $password )  
   {  
     // If user variable is numeric, assume ID  
     if ( is_numeric( $userDetails ) )  
     {  
       // Get user based on ID  
       $user = $this->getuserById( $userDetails );  
     }  
     else  
     {  
       // Since not numeric, lets try get the user based on Name  
       $user = $this->getuserByName( $userDetails );  
     }  
     $user->setPassword( $password );  
     $this->persist( $user );  
     return true;  
   }  
   public function persist( UserEntity $user )  
   {  
     // do any last moment validations here  
     \EntityManager::persist( $user );  
     \EntityManager::flush();  
   }  
 }  

app/models/user/UserService.php

 <?php namespace User;  
 /**  
  * Our UserService, containing all useful methods for business logic around Users  
  * Do not reference the entity in here.  
  */  
 class UserService  
 {  
   // Containing our user repository to make all our database calls to  
   protected $userRepo;  
   /**  
    * Loads our $userRepo with the supplied userRepository  
    *  
    * We use constructor injection to make it easier to unit test.  
    *  
    * @param userInterface $userRepo  
    * @return userService  
    */  
   public function __construct( $userRepository )  
   {  
     $this->userRepo = $userRepository;  
   }  
   /**  
    * Method to get user based either on name or ID  
    *  
    * @param mixed $user  
    * @return string  
    */  
   public function getUserName($user)  
   {  
     // If user variable is numeric, assume ID  
     if (is_numeric($user))  
     {  
       // Get user based on ID  
       $user = $this->userRepo->getuserById($user);  
     }  
     else  
     {  
       // Since not numeric, lets try get the user based on Name  
       $user = $this->userRepo->getuserByName($user);  
     }  
     // If user entity returned (rather than null) return the name of the user  
     if ($user != null)  
     {  
       return $user->getName();  
     }  
     // If nothing found, return this simple string  
     return 'user Not Found';  
   }  
   public function setPassword( $user, $password )  
   {  
     // perform any validations  
     if( strlen( $password ) < 6 )  
     {  
       throw new \ValidationException( 'Password may not be shorter than 6 characters' );  
     }  
     // perform any hashing on the password  
     $password = \Hash::make($password);  
     return $this->userRepo->setPassword( $user, $password );  
   }  
 }  

app/controllers/HomeController.php

 <?php  
 class HomeController extends BaseController {  
      public function showWelcome()  
      {  
           return View::make('hello');  
      }  
      public function setPassword( $userDetails, $password )  
      {  
           $userService = new User\UserService( new User\UserRepository( new User\UserEntity ) );  
           try  
           {  
                $response = $userService->setPassword( $userDetails, $password );  
           }  
           catch ( ValidationException $e )  
           {  
                // set error message for frontend  
                echo 'An exception was thrown ('.$e->getMessage().') - this will result in a frontend message';  
           }  
           dd( $response );  
      }  
 }  

app/routes.php

 Route::put('/password/{user}/{password}', 'HomeController@setPassword');  