27 May 2015

Allowing the whole wide world to read your S3 bucket

This is a bucket policy that you can use to allow the whole world the ability to read files in your s3 bucket.

You might want to do this if you're serving static web content from S3 and don't need the fine grained control that the Amazon documentation details.


You will still need to set up permissions on the bucket but this policy will let people read the files you're storing on S3.

26 May 2015

Storing large values in Memcached with PHP

Memcached saved my users a minute per query
I'm working on a business intelligence tool that requires as an intermediary calculation a list of UK postcodes that are within a radius of a user supplied postcode.

It currently takes about 7 seconds to query my Postgres database to get this list out.  Unfortunately I need to do this several times as part of a goal seeking function so I need to greatly improve this lookup speed.

I'm already using the Postgres earthdistance module and have properly indexed my table so I realized that I needed to look for a caching solution.

Memcached places limits on the size of the value you can store.  The default setup is 1meg and I'm reluctant to change this because it adds to the deployment burden.  My result sets were sometimes up to 4 megs large - searching on a 20 mile radius in London yields a lot of postcodes!

My idea was to split the large piece of data into several smaller pieces and to place an index referencing the pieces as the value for the key we're trying to store.

I decided to make use of PHP's gzcompress() function to reduce the size of the element because I felt that the time I spend compressing the data is still going to be drastically less than running the query and I want to try my best to avoid cache evictions.

I'm currently using Laravel so the code snippets below use the facades made available by Laravel.  I think the code is readable enough to extend to other PHP environments and I think the approach could also be ported to other languages.




20 May 2015

Using Fail2Ban to protect a Varnished site from scrapers

I'm using Varnish to cache a fairly busy property site.  Varnish works like a bomb for normal users and has greatly improved our page load speed.

For bots that are scraping the site, presumably to add the property listings to their own site, though the cache is next to useless since the bots are sequentially trawling through the whole site.

I decided to use fail2ban to block IP's who hit the site too often.

The first step in doing so was to enable a disk based access log for Varnish so that fail2ban will have something to work with.

This means setting up varnishncsa.  Add this to your /etc/rc.local file:

 varnishncsa -a -w /var/log/varnish/access.log -D -P /var/run/varnishncsa.pid  

This starts up varnishncsa in daemon mode and appends Varnish access attempts to /var/log/varnish/access.log

Now edit or create /etc/logrotate.d/varnish and make an entry to rotate this access log:

  /var/log/varnish/*log {   
     create 640 http log   
     compress   
     postrotate   
       /bin/kill -USR1 `cat /var/run/varnishncsa.pid 2>/dev/null` 2> /dev/null || true   
     endscript   
   }   

Install fail2ban. On Ubuntu you can apt-get install fail2ban

Edit /etc/fail2ban/jail.conf and add a block like this:

 [http-get-dos]  
 enabled = true  
 port = http,https  
 filter = http-get-dos  
 logpath = /var/log/varnish/access.log  
 maxretry = 300  
 findtime = 300  
 #ban for 5 minutes  
 bantime = 600  
 action = iptables[name=HTTP, port=http, protocol=tcp]  

This means that if a person has 300 (maxretry) requests in 300 (findtime) seconds then a ban of 600 (bantime) seconds is applied.

 We need to create the filter in /etc/fail2ban/filter.d/http-get-dos.conf to create the pattern to match the jail:

 # Fail2Ban configuration file  
 #  
 # Author: http://www.go2linux.org  
 #  
 [Definition]  
 # Option: failregex  
 # Note: This regex will match any GET entry in your logs, so basically all valid and not valid entries are a match.  
 # You should set up in the jail.conf file, the maxretry and findtime carefully in order to avoid false positives.  
 failregex = ^<HOST>.*"GET  
 # Option: ignoreregex  
 # Notes.: regex to ignore. If this regex matches, the line is ignored.  
 # Values: TEXT  
 #  
 ignoreregex =  

Now lets test the regex against the log file so that we can see if it is correctly picking up the IP addresses of the visitors:

fail2ban-regex /var/log/varnish/access.log /etc/fail2ban/filter.d/http-get-dos.conf 

You should see a list of IP addresses and times followed by summary statistics.

When you restart fail2ban your scraper protection should be up and running.

14 May 2015

Solving Doctrine - A new entity was found through the relationship

There are so many different problems that people have with the Doctrine error message:

 exception 'Doctrine\ORM\ORMInvalidArgumentException' with message 'A new entity was found through the relationship 'App\Lib\Domain\Datalayer\UnicodeLookups#lookupStatus' that was not configured to cascade persist operations for entity:  

Searching through the various online sources was a bit of a nightmare.  The best documentation I found was at http://www.krueckeberg.org/ where there were a number of clearly explained examples of various associations.

More useful information about association ownership was in the Doctrine manual, but I found a more succinct explanation in the answer to this question on StackOverflow.

Now I understood better about associations and ownership and was able to identify exactly what sort I was using and the syntax that was required. I was implementing a uni-directional many to one relationship, which is supposedly one of the most simple to map.

I had used the Doctrine reverse engineering tool to generate the stubs of my model.  I was expecting it to be able to handle this particular relationship - the tool warns that it does not properly map all relationships but this particular one actually works out of the box.

 {project root}/vendor/doctrine/orm/bin/doctrine orm:convert-mapping --from-database yml --namespace App\\Lib\\Domain\\Datalayer\\ .  
 {project root}/vendor/doctrine/orm/bin/doctrine orm:generate-entities --generate-methods=true --generate-annotations=true --regenerate-entities=true ../../../  
 {project root}/vendor/doctrine/orm/bin/doctrine orm:generate-proxies ../Proxies  

Just as an aside to explain the paths and namespaces : I'm using Laravel 5 and put my domain model into app/Lib/Domain.  I'm implementing a form of the repository design pattern so I have the following directory structure:

 app/Lib/Domain  
 +--Entities  
 +--Proxies  
 +--Mappings  
 +--Repositories  
 +--Services  

The mappings are not used at runtime but are used to generate the entities.

So my generated class looked like this:

 namespace App\Lib\Domain\Datalayer;  
 /**  
  * Lookups  
  *  
  * @ORM\Table(name="lookups", indexes={@ORM\Index(name="IDX_4CEC819D037A087", columns={"lookup_status_id"}), @ORM\Index(name="IDX_4CEC8194D39DE23", columns={"camera_event_id"})})  
  * @ORM\Entity  
  */  
 class Lookups  
 {  
   /**  
    * @var \App\Lib\Domain\Datalayer\LookupStatuses  
    *  
    * @ORM\ManyToOne(targetEntity="App\Lib\Domain\Datalayer\LookupStatuses", inversedBy="lookups", cascade={"persist"})  
    * @ORM\JoinColumns({  
    *  @ORM\JoinColumn(name="lookup_status_id", referencedColumnName="id")  
    * })  
    */  
   private $lookupStatus;  
 }  

My use case was:

  1. Find a lookup status from the table
  2. Call setLookupStatus on my lookup class
  3. Persist and flush the lookup class
  4. Error
After carefully reviewing all the documentation I linked above, and a great deal more, I realized that step 1 was the issue.  Because I was getting the object out of cache Doctrine thought it was new.  

Of course I couldn't persist the object as the help message suggested (it already existed so I got a key violation error).  The mappings were actually correct.

So my advice for resolving this error message is to read through the documentation I linked above carefully and then make sure that Doctrine is actually aware of the entity you're using.  You may need to persist it (if its new) or otherwise make sure its not cached.

29 April 2015

Monitoring Varnish for random crashes

I'm using Varnish to cache the frontend of a site that a client is busy promoting.  It does a great job of reducing requests to my backend but is prone to random crashes.  I normally get about two weeks of uptime on this particular server, which is significantly lower than other places that I've deployed Varnish.

I just don't have enough information to work with to try and solve why the random crash is occurring.  The system log shows that a child process doesn't respond to CLI and so is killed.  The child never seems to be able to be brought up again.

My /var/log/messages file looks like this:

 08:31:45 varnishd[7888]: Child (16669) not responding to CLI, killing it.  
 08:31:45 varnishd[7888]: Child (16669) died signal=3  
 08:31:45 varnishd[7888]: child (25675) Started  
 08:31:45 varnishd[7888]: Child (25675) said Child starts  
 08:31:45 varnishd[7888]: Child (25675) said SMF.s0 mmap'ed 1073741824 bytes of 1073741824  
 08:32:19 varnishd[7888]: Child (25675) not responding to CLI, killing it.  
 08:32:21 varnishd[7888]: Child (25675) not responding to CLI, killing it.  
 08:32:21 varnishd[7888]: Child (25675) died signal=3  

Which doesn't give me a lot to work with.  I couldn't find anything in the documentation about this sort of problem.  I don't want to uninstall Varnish so I decided to rather look for a way to monitor the process.

I first tried Monit but after about two weeks my site was down.  After sshing onto the box and restarting Varnish I checked the monit logs.  Although it was able to recognize that Varnish had crashed, it was not able to successfully bring it back up.

My Monit log looked like this:

 [BST Apr 23 09:07:24] error  : 'varnish' process is not running  
 [BST Apr 23 09:07:24] info   : 'varnish' trying to restart  
 [BST Apr 23 09:07:24] info   : 'varnish' start: /etc/init.d/varnish  
 [BST Apr 23 09:07:54] error  : 'varnish' failed to start  
 [BST Apr 23 09:08:54] error  : 'varnish' process is not running  
 [BST Apr 23 09:08:54] info   : 'varnish' trying to restart  
 [BST Apr 23 09:08:54] info   : 'varnish' start: /etc/init.d/varnish  
 [BST Apr 23 09:09:24] error  : 'varnish' failed to start  
 [BST Apr 23 09:10:24] error  : 'varnish' process is not running  
 [BST Apr 23 09:10:24] info   : 'varnish' trying to restart  
 [BST Apr 23 09:10:24] info   : 'varnish' start: /etc/init.d/varnish  
 [BST Apr 23 09:10:54] error  : 'varnish' failed to start  
 [BST Apr 23 09:11:54] error  : 'varnish' service restarted 3 times within 3 cycles(s) - unmonitor  

My problem sounded a lot like this one on ServerFault so I looked for another way to monitor the process other than using Monit.

Instead of using daemonize, supervisord, or another similar program I'm trying out a simple shell script that I found at http://blog.unixy.net/2010/05/dirty-varnish-monitoring-script/.  The author says it's dirty, and I suppose it is, but it has the advantage of being dead simple and easy to control.   I've set it up as a cron job to run every five minutes.  Hopefully this will be a more effective way to make sure that Varnish doesn't stay dead for very long.

In case the source file goes down I saved a copy as a Gist:

21 April 2015

Fixing when queue workers keep popping the same job off a queue

From Amazon Queue documentation
My (Laravel) project uses a queue system for importing because these jobs can take a fair amount of time (up to an hour) and I want to have them run asynchronously to prevent my users from having to sit and watch a spinning ball.

I created a cron job which would run my Laravel queue work command every 5 minutes.  PHP is not really the best language for long-running processes which is why I elected to rather run a task periodically instead of listening all the time.  This introduced some latency (which I could cut down) but this is acceptable in my use case (imports happen once a month and are only run manually if an automated import fails).

The problem that I faced was that my queue listener kept popping the same job off the queue.  I didn't try running multiple listeners but I'm pretty confident weirdness would have resulted in that case as well.

Fixing the problem turned out to be a matter of configuring the visibility time of my queue.  I was using the SQS provided default of 30 seconds.  Amazon defines the visibility timeout as:
The period of time that a message is invisible to the rest of your application after an application component gets it from the queue. During the visibility timeout, the component that received the message usually processes it, and then deletes it from the queue. This prevents multiple components from processing the same message.  Source: Amazon documentation
This concept is common to queues and exists in various names in Beanstalk and others.  In Beanstalk the setting is called time-to-run and IronMQ refers to it as timeout.  So if your run time exceeds your queue availability timeout then workers will pop off a job that is currently being run in another process.

17 April 2015

Setting up the admin server in HHVM 3.6.0 and Nginx

Hiphop has a built-in admin server that has a lot of useful functions.  I found out about it on an old post on the Hiphop blog.

Since those times Hiphop has moved towards using an ini file instead of a config.hdf file.

On a standard prebuilt HHVM on Ubuntu you should  find the ini file in /etc/hhvm/php.ini

Facebook maintains a list of all the ini settings on Github.

It is into this file that we add two lines to enable the admin server:

 hhvm.admin_server.password = SecretPassword  
 hhvm.admin_server.port = 8888  

I then added a server to Nginx by creating this file: /etc/nginx/conf.d/admin.conf (by default Nginx includes all conf files in that directory):

 server {  
   # hhvm admin  
   listen 8889;  
   location ~ {  
     fastcgi_pass  127.0.0.1:8888;  
     include    fastcgi_params;  
   }  
 }  

Now I can run curl 'http://localhost:8889' from my shell on the box to get a list of commands. Because I host this project with Amazon and have not set up a security rule the port/server are not available to the outside world.  You may want to check your firewall rules on your server.

To run a command add the password as a get variable:

 curl 'http://localhost:8889/check-health?auth=SecretPassword'