27 May 2016

Redirecting non-www urls to www and http to https in Nginx web server

Image: Pixabay
Although I'm currently playing with Elixir and its HTTP servers like Cowboy at the moment Nginx is still my go-to server for production PHP.

If you haven't already swapped your web-server from Apache then you really should consider installing Nginx on a test server and running some stress tests on it.  I wrote about stress testing in my book on scaling PHP.

Redirecting non-www traffic to www in nginx is best accomplished by using the "return" verb.  You could use a rewrite but the Nginx manual suggests that a return is better in the section on "Taxing Rewrites".

Server blocks are cheap in Nginx and I find it's simplest to have two redirects for the person who arrives on the non-secure non-canonical form of my link.  I wouldn't expect many people to reach this link because obviously every link that I create will be properly formatted so being redirected twice will only affect a small minority of people.

Anyway, here's the config:

11 May 2016

Logging as a debugging tool

Image: https://www.pexels.com
Logging is such an important part of my approach to debugging that I sometimes struggle to understand how programmers avoid including logging in their applications.

Having sufficiently detailed logs enables me to avoid having to make assumptions about variable values and program logic flow.

For example, when a customer wants to know why their credit card was charged twice I want to be able to answer with certainty that we processed the transaction only once and be able to produce the data that I sent to the payment provider.

I have three very simple rules for logging that I follow whenever I'm feeling like being nice to future me.  If I hate future me and want him to spend more time answering queries than is needed then I forget these rules:

  1. The first command in any function I write is a debug statement confirming entry into the function
  2. Any time that the script terminates with an error then the error condition is logged, along with the exception message and variable values if applicable.
  3. When I catch an exception then I log that as a debug message in the place that I catch it rather than letting it bubble up the stack.  If I'm the one throwing the exception then I log in the place where I throw it. 
These rules have grown on me from the experience of debugging code and having to deal with an assortment of customer queries that have been escalated to the development team.

By logging the entry into functions I can go back on my logs and see the path that a request took through the code.  Instead of wondering how a particular state of execution came about I have a good trail of functions that led me to that point.  

To me errors that fail silently are the worst possible errors.  I don't expect that the user needs to be alerted to every error or its details, but if I am unable to send a transactional email then I expect that there should be a log of that fact. That might sound self-evident but I've recently worked on a project where we send a mail and don't check if it was successful or not.  This would only occasionally happen and we only noticed something was amiss when a customer complained.  I was not able to determine when the problem arose, how often it happened, or to whom I should resend transactional mails along with an apology.

Logging an exception in the place I catch it has consistently proven to be helpful.  Having a log as close as possible to the source of the error condition helps to narrow down the stack that it occurred in.  This is especially valuable when I rethrow the exception with a user friendly message because I don't lose the technical details of the program state.

Because my logs can get very spammy in production I use the "Fingers Crossed" feature of Monolog.  I prefer this to the alternative of increasing the bar for logging to "info" and above because when an error occurs then I have a verbose track of my program state.  I've created a gist showing the setup in Laravel 5.1 and 5.2 but the approach will work anywhere that you use Monolog.

Another useful trick I've learned is to integrate my application errors into my log aggregating platform.  I use Loggly to aggregate my logs and push application error messages to it.  This lets me easily view my application errors in the context of my various server logs so spotting an nginx error or something in syslog that could contribute to the application problem is a lot easier.  The gist that I linked above shows my Loggly setup, but you can also read their documentation.

Useful and appropriate logging is an indispensable tool for debugging and if you're not working on developing your own logging style to support your approach to debugging then hop on it!

08 April 2016

Are tokens enough to prevent CSRF?

Image: Pixabay
CSRF attacks exploit the trust that a website has in a client like a web browser.  These attacks rely on the website trusting that a request from a client is actually the intention of the person using that client.

An attacker will try to trick the web browser into issuing a request to the server.  The server will assume that the request is valid because it trusts the client.

At its most simple a CSRF attack could involve making a malicious form on a webpage that causes the client to send a POST request to a url.

As an example, imagine that a user called Alice is logged into Facebook in one tab and is browsing the internet on another tab.  A filthy pirate Bob creates a malicious form in a webpage that submits a POST request to Facebook that sends a person to a link of Rick Astley dancing.  Alice arrives on the page we made and Javascript submits the form to Facebook.  Facebook trusts Alice's web browser and there is a valid session for her so it processes the request.  Before she knows it her Facebook status is a link to Rick Astley (who, by the way, will never give you up).

Of course Facebook is not vulnerable to this, and neither should your code be.

The best way to mitigate CSRF attacks is to generate a very random token which you store in Alice's session.  You then make sure that whenever your output a form on your site that you include this token in the form.  Alice will send the token whenever she submits the form and you can compare it to the one stored in her session to make sure that the request is originating from your site.

Bob has no way of knowing what the token in Alice's session is and so he can't trick her browser into submitting it to our site.  Our site will get a request from Alice's client but because it doesn't have the token we can reject it.

In other words the effect of the token is to stop relying on implicit trust for the client and rather set up a challenge response system whereby the client proves it is trustworthy.  If Bob wants to send a request that will be accepted he must find a way to read a token off a form that your site has rendered for Alice.  This is not a trivial task but can possibly be done - there are very creative ways (like this attack) to abuse requests.

Another way to prevent CSRF is to rely on multi-factor authentication.  We can group ways to authenticate into knowledge (where you know something like a password), possession (where you have something like a USB dongle), or inherent (where you are something).

Instead of just relying on one of these mechanisms we can use two (or more) in order to authenticate.  For example we can ask a person for a password and also require that they enter a code sent to the mobile phone which proves they have the mobile phone linked to their account.

CSRF will become much harder for Bob to accomplish if our form is protected with multi-factor authentication (MFA).  Of course this comes with a user experience cost so only critical forms need to be protected with MFA.  For less critical forms the single authentication method of a CSRF token will suffice.

There is debate around whether it is useful to check whether the referrer header matches your site is helpful in deterring CSRF.  It is true that it is trivial to spoof this header in a connection that you control.  However it is more difficult to get this level of control in a typical CSRF attack where browsers will rewrite the referrer header in an ajax call (see the specification).  By itself it is not sufficient to deter CSRF, but it can raise the difficulty level for attackers.

Cookies should obviously not be used to mitigate CSRF.  They are sent along with any request to the domain whether the user intended to make the request or not.

Setting a session timeout window can help a little bit as it will narrow the window that requests will be trusted by your application.  This will also improve your session security by making it harder for fixation attacks to be effective.

Tokens are the most convenient way to make CSRF harder to accomplish on your site.  When used in conjunction with referrer checks and a narrow session window you can make it significantly harder for an opponent to accomplish a successful attack.

For critically important forms multi-factor authentication are the way to go.  They interrupt the user experience and enforce explicit authentication.  This has a negative affect on your UX but makes it impossible (I think!) for an automated CSRF attack to be effective.

11 March 2016

Exploring Russian Doll Caching

This technique was developed in the Ruby community and is a great way to approach caching partial views. In Ruby rendering views is more expensive than PHP, but this technique is worth understanding as it could be applied to data models and not just views.

In the Ruby world Russian Doll caching is synonymous with key-based expiration caching.  I think it's useful to rather view the approach as being the blend of two ideas.  That's why I introduce key-based expiration separately.

Personally I think Russian Dolls are a bit of a counter-intuitive analogy.  Real life Russian Dolls each contain one additional doll, but the power of this technique rests on the fact that "dolls" can contain many other "dolls".  I find the easiest way to think about it is to say that if a child node is invalidated then its siblings and their children are not affected.  When the parent is regenerated those sibling nodes do not need to be rendered again.

Cache Invalidation

I use Laravel which luckily allows the use of tagging cache entries as a way of grouping them.  I started the habit of tagging my cache keys with the name of the model, then whenever I update the model I invalidate the tag, which clears out all the related keys for that model.

In the absence of the ability to tag keys the next best approach to managing cache invalidation is to use key-based expiration.

The idea behind key-based expiration is to change your key name by adding a timestamp.  You store the timestamp separately and fetch it whenever you want to fetch the key.

If you change the value in the key then the timestamp changes.  This means the key name changes and so does the stored timestamp.  You'll always be able to get the most recent value, and Memcached or Redis will handle expiring the old key names.

The practical effect of this strategy is that you must change your model to update the stored timestamp whenever you change the cache.  You also have to retrieve the current timestamp whenever you want to get something out of the cache.

Nested view fragments, nested cache structure

Typically a page is rendered as a template which is filled out with a view.  Blocks are inserted into the view as partial views, and these can be nested.

The idea behind Russian Doll caching is to cache each nested part of the page.  We use cache keys that mimic the frontend nesting.

If a view fragment cache key is invalidated then all of the wrapping items keys are also invalidated.  We'll look at how to implement this in a moment.

The wrapping items are invalidated, so will need to be rendered, but the *other* nested fragments that have not changed still remain in the cache and can be reused.  This means that only part of the page will need to be rendered from scratch.

I find the easiest way to think about it is to say that if a child node is invalidated then its siblings and their children are not affected.  When the parent is regenerated those sibling nodes do not need to be rendered again.

Implementing automatically busting containing layers

We can see that the magic of Russian Doll caching lies in the ability to bust the caches of the wrapping layers.  We'll use key-based expiration together with another refinement to implement this.

The actual implementation is non-trivial and you'll be needing to write your own helper class.  There are Github projects like Corollarium which implement Russian Doll caching for you.

In any case lets outline the requirements.

Lets have a two level cache, for simplicity, that looks like this:

Parent (version 1)
- Child (version 1)
- Child (version 1)
- Child (version 1)

I've created a basic two tier cache where every item is at version 1, freshly generated.  Expanding this to multiple tiers requires being able to let children nodes act as parents, but while I'm busy talking through this example lets constrain ourselves to just having one parent and multiple children.

Additional cache storage needs

First lets define our storage requirements.

We want keys to be automatically invalidated when they are updated and key-based expiration is the most convenient way to accomplish this.

This means that we'll have a value stored for each of them that holds the most recent value.  Currently all of these values are "version 1".

In addition to storing the current version of each key we will also need to store and maintain a list of dependencies for the key.  These are cache items which the key is built from.

We need to be certain that the dependencies have not changed since our current item was cached.  This means that our dependency list must store the version that each dependency was at when the current key was generated.

The parent node will need to store its list of dependencies and the version that they were when it was cached.  When we retrieve the parent key we need to check its list of dependencies and make sure that none of them have changed.

Putting it together

Now that we've stored all the information we need to manage our structure, lets see how it works.

Lets say that one of the children changes and is now version 2.  We update the key storing its most current value as part of the update to the value, using our key based expiration implementation.

On the next page render our class will try to pull the parent node from cache.  It first inspects the dependency list and it realises that one of the children is currently on version 2 and not the same version it was when the parent was cached.

We invalidate the parent cache object when we discover a dependency has changed.  This means we need to regenerate the parent.  We may want to implement a dogpile lock for this, if you're expecting concurrency on the page.

Only the child that has changed needs to be regenerated, and not the other two.  So the parent node can be rebuilt by generating one child node and reading the other two from cache.  This obviously results in a much less expensive operation.

03 February 2016

Working with classic ASP years after it died

I searched for "dead clown" but all the pictures were too
disturbing.  I suppose that's kind of like the experience
of trying to get classic ASP up and running with todays libraries
I'm having to work on a legacy site that runs on classic ASP.  The real challenge is trying to get the old code to run on my Ubuntu virtual machine.

There is a lot of old advice on the web and most of it was based on much older software versions, but I persevered and have finally managed to get classic ASP running on Apache 2.4 in Ubuntu.

The process will allow you to have a shot at getting your code running, but my best advice is to use a small Windows VM.  There's no guarantee that your code will actually compile and run using this solution, and the effort required is hardly worthwhile.

The Apache module you're looking for is Apache::ASP.  You will need to build it manually and be prepared to copy pieces of it to your perl include directories.  You will also need to manually edit one of the module files.

The best instructions I found for getting Apache::ASP installed were on the cspan site.  You'll find the source tarball on the cpan modules download page.

I'm assuming that you're able to install the pre-requisites and build the package by following those instructions.  I was able to use standard Ubuntu packages and didn't have to build everything from source:

 sudo apt-get install libapreq2-3 libapache2-request-perl  

Once you've built and installed Apache::ASP you need to edit your apache.conf file to make sure it's loaded:

 PerlModule Apache2::ASP  
  # All *.asp files are handled by Apache2::ASP  
  <Files ~ (\.asp$)>  
   SetHandler perl-script  
   PerlHandler Apache::ASP  

If you try to start Apache at this point you will get an error something like Can't locate Apache2/ASP.pm in @INC (you may need to install the Apache2::ASP module)

Unfortunately the automated installs don't place the modules correctly.  I'm not a perl developer and didn't find an easy standard way to add an external path to the include path, so I just copied the modules into my existing perl include path.  You'll find the requested files in the directory where you build Apache2::ASP

The next problem that I encountered was that Apache 2.4 has a different function name to retrieve the ip of the connecting request.  You'll spot an error in your log like this : Can't locate object method "remote_ip" via package "Apache2::Connection".

The bug fix is pretty simple and is documented at cpan.  You'll need to change line 85 of StateManager.pm.  You'll find the file in the directory where you copied the modules into the perl include directory, and its location is in your error log.

 # See https://rt.cpan.org/Public/Bug/Display.html?id=107118  
 Change line 85:  
     $self->{remote_ip}     = $r->connection()->remote_ip();  
     if (defined $r->useragent_ip()) {  
         $self->{remote_ip} = $r->useragent_ip();  
     } else {  
         $self->{remote_ip} = $r->connection->remote_ip();  

Finally after all that my code doesn't run because of compile issues - but known good test code does work.

This is in no way satisfactory for production purposes, but does help in getting a development environment up and running.

25 January 2016

Laravel - Using route parameters in middleware

I'm busy writing an application which is heavily dependent on personalized URLs.  Each visitor to the site will have a PURL which I need to communicate to the frontend so that my analytics tags can be associated with the user.

Before I go any further I should note that I'm using Piwik as my analytics package, and it respects "Do Not Track" requests.  We're not using this to track people, but we are tying it to our clients existing database of their user interests.

I want the process of identifying the user to be as magical as possible so that my controllers can stay nice and skinny.  Nobody likes a fat controller right?

I decided to use middleware to trap all my web requests to assign a "responder" to the request.  Then I'll use a view composer to make sure that all of the output views have this information readily available.

The only snag in this plan was that the Laravel documentation was a little sketchy on how to get the value of the request parameter in middleware.  It turns out that the syntax I was looking for was $request->route()->parameters()which neatly returns the route parameters in my middleware.

The result is that every web request to my application is associated with a visitor in my database and this unique id is sent magically to my frontend analytics.

So, here are enough of the working pieces to explain what my approach was:

19 January 2016

Using OpenSSH to setup an SFTP server on Ubuntu 14.04

I'm busy migrating an existing server to the cloud and need to replicate the SFTP setup.  They're using a password to authenticate a user and then uploading data files for a web service to consume.

YMMV - My use case is pretty specific to this legacy application so you'll need to give consideration to the directories you use.

It took a surprising amount of reading to find a consistent set of instructions so I thought I should document the setup from start to finish.

Firstly, I set up the group and user that I will be needing:

 groupadd sftponly  
 useradd -G sftponly username  
 passwd username  

Then I made a backup copy of and then edited /etc/ssh/sshd_config

Right at the end of the file add the following:

  Match group sftponly   
    ChrootDirectory /usr/share/nginx/html/website_directory/chroot   
    X11Forwarding no   
    AllowTcpForwarding no   
    ForceCommand internal-sftp -d /uploads   

For some reason if this block appears before the UsePAM setting then your sshd_config is borked and you won't be able to connect to port 22.

We force the user into the /uploads directory by default when they login using the ForceCommand setting.

Now change the Subsystem setting.  I've left the original as a comment in here.  The parameter "-u 0002" sets the default umask for the user.

 #Subsystem sftp /usr/lib/openssh/sftp-server  
 Subsystem sftp internal-sftp -u 0002  

I elected to place the base chroot folder inside the website directory for a few reasons.  Firstly, this is the only website or service running on this VM so it doesn't need to play nicely with other use cases.  Secondly I want the next sysadmin who is trying to work out how this all works to be able to immediately spot what is happening when she looks in the directory.

Then because my use case demanded it I enabled password logins for the sftp user by finding and changing the line in /etc/ssh/sshd_config like this:

 # Change to no to disable tunnelled clear text passwords  
 PasswordAuthentication yes  

The base chroot directory must be owned by root and not be writeable by any other groups.

cd /usr/share/nginx/html/website_directory
mkdir chroot
chown root:root chroot/  
chmod 755 chroot/  

If you skip this step then your connection will be dropped with a "broken pipe" message as soon as you connect.  Looking in your /var/log/auth.log file will reveal errors like this: fatal: bad ownership or modes for chroot directory

The next step is to make a directory that the user has write privileges to.  The base chroot folder is not writeable by your sftp user, so make an uploads directory and give them "writes" (ha!) to it:

 mkdir uploads  
 chown username:username uploads  
 chmod 755 uploads  

If you skip that step then when you connect you won't have any write privileges.  This is why we had to create a chroot base directory and then place the uploads folder off it.  I chose to stick the base in the web directory to make it obvious to spot, but obviously in more general cases you would place this in more sensible locations.

Finally I link the uploads directory in the chroot jail to the uploads directory where the web service expects to find files.

 cd /usr/share/nginx/html/website_directory  
 ln -s chroot/uploads uploads  

I feel a bit uneasy about a password login being used to write files to a directory being used by a webservice, but in my particular use case my firewall whitelists our office IP address on port 22.  So nobody outside of our office can connect.  I'm also using fail2ban just in case somebody manages to get access to our VPN.