28 June 2016

Why am I so late on the Bitcoin train?

Image: Pixabay
I've been somewhat of a Bitcoin sceptic for quite some time.  When it first became a thing I was worried that governments would legislate it out of existence.

It has had a pretty bad rap from being associated with the dark web and it is definitely the currency of choice for malware authors.

In its normal usage Bitcoin is more transparent than cash.  If I give you a cash note there is no permanent record of the transaction and the tax man can't get a sniff into our business.

Governments hate transactions they can't tax or police and so in the beginning there was a concern that Bitcoin would be outlawed.

In contrast to cash, if I transfer you Bitcoin then there is a record of the transaction that anybody in the world can inspect.  It's possible to trace the coins in your Bitcoin wallet back through the various people who owned them.  Anybody in the world can watch the contents of your wallet and see where you spend your money.

This is exactly the sort of thing that governments love.

Of course not everybody wants to share their transactions with the world and so there are Bitcoin laundering services that attempt to anonymise the coins in your wallet.  This puts us back to square one with Bitcoin being very convenient for criminals to use in order to evade financial intelligence controls.

I suspect that the process of banning Bitcoin transactions would impinge too much on the freedom of citizens.  Some governments are talking about banning cryptography in order to maintain surveillance on their citizens so it's not a stretch to imagine them being displeased with Bitcoin laundering services.  *sigh*.

Anyway, back to my killer app for Bitcoin... I've recently emigrated and am still paying debt in South Africa.  Sending money back to South Africa costs about £30 and takes 2-4 days if I use the banking system.  If I use Bitcoin the process costs around £3 and I can have the money in my South African bank account on the very same day.

Bitcoin costs one tenth the price of using banks and is at least twice as fast.

My transaction is not at all anonymous and a government can trace the funds to me on either end of the transaction where copies of my passport are stored with the exchanges.  If I wanted to hide this from thieves, the government, spear-phishers, and other people who want to take my money without giving anything in return then I would use a Bitcoin laundry service.

10 June 2016

Restarting BOINC automatically

Image: https://boinc.berkeley.edu, fair use
BOINC is a program developed at the University of California, Berkeley that allows people around the world to contribute to science projects.

It works by using spare cycles from your computer to perform calculations that help do things like folding proteins to find candidates for cancer treatment, mapping the Milky Way galaxy, searching for pulsars, and improving our understanding of climate change and its effects.

It runs as a background process and is easily configured to only run in certain conditions, for example when you haven't used your computer for 10 minutes.

It comes with a nifty GUI manager and for most people using it on their desktop this post is not going to be at all relevant.  This post deals with the case where a person is running it on a server without the GUI manager.

Anyway, the easiest solution I found to restarting BOINC on a headless server was to use supervisord.  It's pretty much the "go to" tool for simple process management and adding the BOINC program was as easy as would be expected.

Here's the program definition from my /etc/supervisord.conf file:

 command=sh /root/boinc/startup.sh  

I use a script to restart BOINC because I want to make sure that I get reconnected to my account manager in case something goes wrong.

Here's what the /root/boinc/startup.sh script looks like:

 /etc/init.d/boinc-client start  
 sleep 10  
 boinccmd --join_acct_mgr http://bam.boincstats.com <user> <pass>  

If BOINC crashes it will automatically get restarted and reconnected to my account manager.  This means I don't need to monitor that process on all the servers I install it on.

01 June 2016

Associating Vagrant 1.7.2 with an existing VM

My Vagrant 1.7.2 machine bugged out and when I tried to `vagrant up` it spawned a new box instead of bringing up my existing machine.

Naturally this was a problem because I had made some manual changes to the config that I hadn't had a chance to persist to my Puppet config files yet.

To fix the problem I ran the command `VBoxManage list vms` in the directory where my Vagrantfile is.  This gave me a list of the machines VirtualBox could find, along with their UUIDs.

I then went and edited the file at .vagrant/machines/default/virtualbox/id and replaced the UUID that was in there with the one that the VBoxManage command had output.
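The two steps above can also be scripted.  This is a minimal sketch in Python, assuming the `"name" {uuid}` line format that `VBoxManage list vms` prints; the function names and the sample VM name in the usage note are hypothetical and not part of Vagrant or VirtualBox.

```python
import re
from pathlib import Path

# Each line printed by `VBoxManage list vms` looks like: "vm name" {uuid}
VM_LINE = re.compile(r'^"(?P<name>.*)" \{(?P<uuid>[0-9a-fA-F-]{36})\}$')


def parse_vm_list(output):
    """Map VM name -> UUID from the text printed by `VBoxManage list vms`."""
    vms = {}
    for line in output.splitlines():
        match = VM_LINE.match(line.strip())
        if match:
            vms[match.group("name")] = match.group("uuid")
    return vms


def point_vagrant_at(uuid, project_dir="."):
    """Overwrite the machine id stored for the default VirtualBox machine."""
    id_file = Path(project_dir) / ".vagrant/machines/default/virtualbox/id"
    id_file.parent.mkdir(parents=True, exist_ok=True)
    id_file.write_text(uuid)  # write only the UUID, without a trailing newline
    return id_file
```

In practice you would feed `parse_vm_list` the output of `subprocess.check_output(["VBoxManage", "list", "vms"], text=True)`, pick the UUID of the machine you want to keep, and pass it to `point_vagrant_at`.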

Now when I run `vagrant up` it spins up the correct VM.  Happy days.

27 May 2016

Redirecting non-www urls to www and http to https in Nginx web server

Image: Pixabay
Although I'm currently playing with Elixir and HTTP servers like Cowboy, Nginx is still my go-to server for production PHP.

If you haven't already swapped your web-server from Apache then you really should consider installing Nginx on a test server and running some stress tests on it.  I wrote about stress testing in my book on scaling PHP.

Redirecting non-www traffic to www in Nginx is best accomplished by using the "return" directive.  You could use a rewrite but the Nginx manual suggests that a return is better in the section on "Taxing Rewrites".

Server blocks are cheap in Nginx and I find it's simplest to have two redirects for the person who arrives on the non-secure, non-canonical form of my link.  I wouldn't expect many people to reach this link, because every link that I create will be properly formatted, so being redirected twice will only affect a small minority of people.

Anyway, here's the config:
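This is a minimal sketch of a config along these lines rather than a definitive setup.  The domain `example.com` and the certificate paths are hypothetical placeholders, and the last server block stands in for whatever your real site configuration is.

```nginx
# Plain HTTP on either hostname: first hop, upgrade to HTTPS on the same host.
server {
    listen 80;
    server_name example.com www.example.com;
    return 301 https://$host$request_uri;
}

# HTTPS on the bare domain: second hop, redirect to the canonical www host.
server {
    listen 443 ssl;
    server_name example.com;
    ssl_certificate     /etc/nginx/ssl/example.com.crt;   # hypothetical path
    ssl_certificate_key /etc/nginx/ssl/example.com.key;   # hypothetical path
    return 301 https://www.example.com$request_uri;
}

# The canonical site itself.
server {
    listen 443 ssl;
    server_name www.example.com;
    ssl_certificate     /etc/nginx/ssl/example.com.crt;   # hypothetical path
    ssl_certificate_key /etc/nginx/ssl/example.com.key;   # hypothetical path
    # ... the usual root, index and PHP handling goes here ...
}
```

Somebody arriving on `http://example.com` is redirected twice, somebody on `http://www.example.com` or `https://example.com` once, and everybody else not at all.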


11 May 2016

Logging as a debugging tool

Image: https://www.pexels.com
Logging is such an important part of my approach to debugging that I sometimes struggle to understand how programmers avoid including logging in their applications.

Having sufficiently detailed logs enables me to avoid having to make assumptions about variable values and program logic flow.

For example, when a customer wants to know why their credit card was charged twice I want to be able to answer with certainty that we processed the transaction only once and be able to produce the data that I sent to the payment provider.

I have three very simple rules for logging that I follow whenever I'm feeling like being nice to future me.  If I hate future me and want him to spend more time answering queries than is needed then I forget these rules:

  1. The first command in any function I write is a debug statement confirming entry into the function.
  2. Any time the script terminates with an error, the error condition is logged along with the exception message and variable values if applicable.
  3. When I catch an exception I log it as a debug message in the place that I catch it rather than letting it bubble up the stack.  If I'm the one throwing the exception then I log in the place where I throw it.

These rules have grown on me from the experience of debugging code and having to deal with an assortment of customer queries that have been escalated to the development team.
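To make the three rules concrete, here is a small sketch using Python's standard `logging` module rather than a PHP logger.  The function names, the `PaymentError` class and the log messages are all hypothetical and only there to show where each rule applies.

```python
import logging

log = logging.getLogger("payments")


class PaymentError(Exception):
    """Hypothetical error raised when a charge cannot be sent."""


def send_to_provider(order_id, amount):
    # Rule 1: the first statement confirms entry into the function.
    log.debug("send_to_provider: entered order_id=%s amount=%s", order_id, amount)
    if amount <= 0:
        # Rule 3 (throwing side): log at the point where the exception is raised.
        log.debug("send_to_provider: rejecting non-positive amount=%s", amount)
        raise PaymentError(f"invalid amount {amount}")
    return {"order_id": order_id, "amount": amount, "status": "sent"}


def charge(order_id, amount):
    # Rule 1 again: every function announces itself.
    log.debug("charge: entered order_id=%s amount=%s", order_id, amount)
    try:
        return send_to_provider(order_id, amount)
    except PaymentError as exc:
        # Rule 3 (catching side): log where the exception is caught, keeping the
        # technical details before rethrowing something user friendly.
        log.debug("charge: caught %r for order_id=%s", exc, order_id)
        # Rule 2: the request ends in an error, so record the condition and values.
        log.error("charge: failed order_id=%s amount=%s: %s", order_id, amount, exc)
        raise RuntimeError("We could not process your payment.") from exc
```

Reading the debug lines back in order gives the path the request took through the code, and the error line records the values that were in play when it failed.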

By logging the entry into functions I can go back on my logs and see the path that a request took through the code.  Instead of wondering how a particular state of execution came about I have a good trail of functions that led me to that point.  

To me errors that fail silently are the worst possible errors.  I don't expect that the user needs to be alerted to every error or its details, but if I am unable to send a transactional email then I expect that there should be a log of that fact.  That might sound self-evident but I've recently worked on a project where we sent mail without checking whether it was successful or not.  Failures only happened occasionally and we only noticed something was amiss when a customer complained.  I was not able to determine when the problem arose, how often it happened, or to whom I should resend transactional mails along with an apology.

Logging an exception in the place I catch it has consistently proven to be helpful.  Having a log as close as possible to the source of the error condition helps to narrow down the stack that it occurred in.  This is especially valuable when I rethrow the exception with a user friendly message because I don't lose the technical details of the program state.

Because my logs can get very spammy in production I use the "Fingers Crossed" feature of Monolog.  I prefer this to the alternative of increasing the bar for logging to "info" and above because when an error occurs then I have a verbose track of my program state.  I've created a gist showing the setup in Laravel 5.1 and 5.2 but the approach will work anywhere that you use Monolog.
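The idea behind "Fingers Crossed" is easy to sketch outside of Monolog.  The handler below is a hypothetical Python illustration of the behaviour, not Monolog's implementation: it holds records in a buffer and only forwards them to the real handler once something at the trigger level arrives.  The class name, the default buffer size and the choice to stop buffering after the first trigger are assumptions made for this sketch.

```python
import logging
from collections import deque


class FingersCrossedHandler(logging.Handler):
    """Buffer records and forward them only once an error-level record is seen."""

    def __init__(self, target, trigger_level=logging.ERROR, buffer_size=100):
        super().__init__(level=logging.DEBUG)
        self.target = target                      # the handler that really writes
        self.trigger_level = trigger_level
        self.buffer = deque(maxlen=buffer_size)   # oldest records fall off the end
        self.triggered = False

    def emit(self, record):
        if self.triggered:
            # Already triggered: pass everything straight through.
            self.target.handle(record)
            return
        self.buffer.append(record)
        if record.levelno >= self.trigger_level:
            # Something went wrong: flush the whole verbose trail in order.
            self.triggered = True
            while self.buffer:
                self.target.handle(self.buffer.popleft())
```

Wrapping a file or stream handler in this keeps a quiet request out of the logs entirely, while a request that hits an error is written out with all of its debug lines.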

Another useful trick I've learned is to integrate my application errors into my log aggregating platform.  I use Loggly to aggregate my logs and push application error messages to it.  This lets me easily view my application errors in the context of my various server logs so spotting an nginx error or something in syslog that could contribute to the application problem is a lot easier.  The gist that I linked above shows my Loggly setup, but you can also read their documentation.

Useful and appropriate logging is an indispensable tool for debugging and if you're not working on developing your own logging style to support your approach to debugging then hop on it!


08 April 2016

Are tokens enough to prevent CSRF?

Image: Pixabay
CSRF attacks exploit the trust that a website has in a client like a web browser.  These attacks rely on the website trusting that a request from a client is actually the intention of the person using that client.

An attacker will try to trick the web browser into issuing a request to the server.  The server will assume that the request is valid because it trusts the client.

At its simplest a CSRF attack could involve making a malicious form on a webpage that causes the client to send a POST request to a URL.

As an example, imagine that a user called Alice is logged into Facebook in one tab and is browsing the internet in another tab.  A filthy pirate called Bob creates a malicious form in a webpage that submits a POST request to Facebook, setting the victim's status to a link to Rick Astley dancing.  Alice arrives on Bob's page and JavaScript submits the form to Facebook.  Facebook trusts Alice's web browser and there is a valid session for her so it processes the request.  Before she knows it her Facebook status is a link to Rick Astley (who, by the way, will never give you up).

Of course Facebook is not vulnerable to this, and neither should your code be.

The best way to mitigate CSRF attacks is to generate an unpredictable random token which you store in Alice's session.  You then make sure that whenever you output a form on your site you include this token in the form.  Alice will send the token whenever she submits the form and you can compare it to the one stored in her session to make sure that the request is originating from your site.

Bob has no way of knowing what the token in Alice's session is and so he can't trick her browser into submitting it to our site.  Our site will get a request from Alice's client but because it doesn't have the token we can reject it.
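The token flow described above fits in a few lines.  This is a minimal sketch in Python, assuming the session is a plain dictionary; the function names and the `csrf_token` key are hypothetical, and in a real application your framework's session and form helpers would take their place.

```python
import hmac
import secrets


def issue_csrf_token(session):
    """Create a random token, remember it in the session, and return it for the form."""
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token


def is_valid_csrf_token(session, submitted):
    """Check the token posted with the form against the one held in the session."""
    expected = session.get("csrf_token")
    if not expected or not submitted:
        return False
    # compare_digest is designed to resist timing attacks when comparing secrets.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

You would call `issue_csrf_token` when rendering the form, put the result in a hidden field, and reject any POST for which `is_valid_csrf_token` returns false.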

In other words the effect of the token is to stop relying on implicit trust for the client and rather set up a challenge response system whereby the client proves it is trustworthy.  If Bob wants to send a request that will be accepted he must find a way to read a token off a form that your site has rendered for Alice.  This is not a trivial task but can possibly be done - there are very creative ways (like this attack) to abuse requests.

Another way to prevent CSRF is to rely on multi-factor authentication.  We can group ways to authenticate into knowledge (where you know something like a password), possession (where you have something like a USB dongle), or inherence (where you are something, like a fingerprint).

Instead of just relying on one of these mechanisms we can use two (or more) in order to authenticate.  For example we can ask a person for a password and also require that they enter a code sent to their mobile phone, which proves they have the phone linked to their account.

CSRF will become much harder for Bob to accomplish if our form is protected with multi-factor authentication (MFA).  Of course this comes with a user experience cost so only critical forms need to be protected with MFA.  For less critical forms the single authentication method of a CSRF token will suffice.

There is debate about whether checking that the referrer header matches your site helps to deter CSRF.  It is true that it is trivial to spoof this header in a connection that you control.  However it is more difficult to get this level of control in a typical CSRF attack, where the victim's browser sets the referrer header itself and does not let a script override it in an Ajax call (see the specification).  By itself the check is not sufficient to deter CSRF, but it can raise the difficulty level for attackers.
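A referrer check of this kind could look something like the sketch below, written in Python with only the standard library.  The function name is hypothetical, and treating a missing header as a failure is an assumption of this sketch; some clients strip the header, so you may prefer to fall back on the token alone in that case.

```python
from urllib.parse import urlsplit


def referrer_matches(referrer, allowed_host):
    """Return True only if the Referer header points at our own host."""
    if not referrer:
        return False  # header missing: not proven to come from our site
    parts = urlsplit(referrer)
    # Compare the parsed hostname exactly; a substring check would accept
    # hosts like www.example.com.evil.test.
    return parts.scheme in ("http", "https") and parts.hostname == allowed_host.lower()
```

Comparing the parsed hostname rather than searching the raw header for your domain matters, because an attacker controls the path and query string of their own page.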

Cookies should obviously not be used to mitigate CSRF.  They are sent along with any request to the domain whether the user intended to make the request or not.

Setting a session timeout window can help a little bit as it will narrow the window in which requests will be trusted by your application.  This will also improve your session security by making it harder for fixation attacks to be effective.

Tokens are the most convenient way to make CSRF harder to accomplish on your site.  When used in conjunction with referrer checks and a narrow session window you can make it significantly harder for an opponent to accomplish a successful attack.

For critically important forms multi-factor authentication is the way to go.  It interrupts the user experience and enforces explicit authentication.  This has a negative effect on your UX but makes it impossible (I think!) for an automated CSRF attack to be effective.


11 March 2016

Exploring Russian Doll Caching

This technique was developed in the Ruby community and is a great way to approach caching partial views.  In Ruby rendering views is more expensive than in PHP, but this technique is worth understanding as it could be applied to data models and not just views.

In the Ruby world Russian Doll caching is synonymous with key-based expiration caching.  I think it's more useful to view the approach as a blend of two ideas, which is why I introduce key-based expiration separately.

Personally I think Russian Dolls are a bit of a counter-intuitive analogy.  Real life Russian Dolls each contain one additional doll, but the power of this technique rests on the fact that "dolls" can contain many other "dolls".  I find the easiest way to think about it is to say that if a child node is invalidated then its siblings and their children are not affected.  When the parent is regenerated those sibling nodes do not need to be rendered again.

Cache Invalidation

I use Laravel, which luckily allows cache entries to be tagged as a way of grouping them.  I got into the habit of tagging my cache keys with the name of the model; whenever I update the model I invalidate the tag, which clears out all the related keys for that model.

In the absence of the ability to tag keys the next best approach to managing cache invalidation is to use key-based expiration.

The idea behind key-based expiration is to change your key name by adding a timestamp.  You store the timestamp separately and fetch it whenever you want to fetch the key.

If you change the value in the key then you also store a new timestamp.  This means the key name changes, and anybody who looks up the stored timestamp first is pointed at the new key.  You'll always be able to get the most recent value, and Memcached or Redis will handle expiring the old key names.

The practical effect of this strategy is that you must change your model to update the stored timestamp whenever you change the cache.  You also have to retrieve the current timestamp whenever you want to get something out of the cache.
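Here is a minimal sketch of key-based expiration in Python.  A dictionary stands in for Memcached or Redis and a simple counter stands in for the timestamp so that two writes in the same clock tick cannot collide; the helper names and the key layout are hypothetical.

```python
import itertools

cache = {}                    # stand-in for Memcached or Redis
_clock = itertools.count(1)   # stand-in for a timestamp source


def _stamp_key(name):
    """Key under which the current stamp for `name` is stored."""
    return f"{name}:stamp"


def put(name, value):
    """Store the value under a new stamped key and record the new stamp."""
    stamp = next(_clock)
    cache[_stamp_key(name)] = stamp
    cache[f"{name}:{stamp}"] = value


def get(name):
    """Look up the current stamp first, then the value stored under that stamp."""
    stamp = cache.get(_stamp_key(name))
    if stamp is None:
        return None
    return cache.get(f"{name}:{stamp}")
```

Note that `put` never deletes the old stamped key.  It is simply never read again, and a real cache server would evict it in its own time.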

Nested view fragments, nested cache structure

Typically a page is rendered as a template which is filled out with a view.  Blocks are inserted into the view as partial views, and these can be nested.

The idea behind Russian Doll caching is to cache each nested part of the page.  We use cache keys that mimic the frontend nesting.

If a view fragment's cache key is invalidated then all of the wrapping items' keys are also invalidated.  We'll look at how to implement this in a moment.

The wrapping items are invalidated, so will need to be rendered, but the *other* nested fragments that have not changed still remain in the cache and can be reused.  This means that only part of the page will need to be rendered from scratch.

Automatically busting the containing layers

We can see that the magic of Russian Doll caching lies in the ability to bust the caches of the wrapping layers.  We'll use key-based expiration together with another refinement to implement this.

The actual implementation is non-trivial and you'll need to write your own helper class.  There are GitHub projects like Corollarium which implement Russian Doll caching for you.

In any case let's outline the requirements.

Let's have a two level cache, for simplicity, that looks like this:

Parent (version 1)
- Child (version 1)
- Child (version 1)
- Child (version 1)

I've created a basic two tier cache where every item is at version 1, freshly generated.  Expanding this to multiple tiers requires letting child nodes act as parents, but while I'm talking through this example let's constrain ourselves to just having one parent and multiple children.

Additional cache storage needs

First let's define our storage requirements.

We want keys to be automatically invalidated when they are updated and key-based expiration is the most convenient way to accomplish this.

This means that we'll have a value stored for each key that holds its most recent version.  Currently all of these values are "version 1".

In addition to storing the current version of each key we will also need to store and maintain a list of dependencies for the key.  These are cache items which the key is built from.

We need to be certain that the dependencies have not changed since our current item was cached.  This means that our dependency list must store the version that each dependency was at when the current key was generated.

The parent node will need to store its list of dependencies and the version that they were when it was cached.  When we retrieve the parent key we need to check its list of dependencies and make sure that none of them have changed.
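The storage described above could be laid out as in this sketch.  It is written in Python with plain dictionaries, and the key names, the placeholder HTML and the `is_fresh` helper are hypothetical choices for illustration only.

```python
# The current version of every key, maintained by key-based expiration.
versions = {"parent": 1, "child_a": 1, "child_b": 1, "child_c": 1}

# What the parent entry stores alongside its rendered output: the version each
# dependency was at when the parent was cached.
parent_entry = {
    "html": "<ul>...</ul>",
    "dependencies": {"child_a": 1, "child_b": 1, "child_c": 1},
}


def is_fresh(entry, current_versions):
    """True if every dependency is still at the version recorded in the entry."""
    return all(
        current_versions.get(name) == seen
        for name, seen in entry["dependencies"].items()
    )
```

As soon as any child moves to a new version, `is_fresh(parent_entry, versions)` becomes false and the parent has to be rebuilt.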

Putting it together

Now that we've stored all the information we need to manage our structure, let's see how it works.

Let's say that one of the children changes and is now version 2.  We update the key storing its most current value as part of the update to the value, using our key-based expiration implementation.

On the next page render our class will try to pull the parent node from cache.  It first inspects the dependency list and it realises that one of the children is currently on version 2 and not the same version it was when the parent was cached.

We invalidate the parent cache object when we discover a dependency has changed.  This means we need to regenerate the parent.  You may want to implement a dogpile lock for this if you're expecting concurrency on the page.

Only the child that has changed needs to be regenerated, and not the other two.  So the parent node can be rebuilt by generating one child node and reading the other two from cache.  This obviously results in a much less expensive operation.
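The whole flow can be sketched end to end.  This Python example is only an illustration under simplifying assumptions: a dictionary stands in for the cache server, there is one parent with flat children, there is no dogpile lock, and all of the function names are hypothetical.  The `render_count` dictionary exists purely to show which fragments get rendered from scratch.

```python
cache = {}          # stand-in for Memcached or Redis
versions = {}       # current version of each child, bumped on every update
render_count = {}   # how many times each fragment was rendered from scratch


def render_child(name, data):
    render_count[name] = render_count.get(name, 0) + 1
    return f"<li>{data[name]}</li>"


def get_child(name, data):
    key = f"{name}:v{versions[name]}"   # key-based expiration
    if key not in cache:
        cache[key] = render_child(name, data)
    return cache[key]


def get_parent(children, data):
    entry = cache.get("parent")
    current = {name: versions[name] for name in children}
    if entry is None or entry["dependencies"] != current:
        # A dependency changed: rebuild the parent, reusing cached children.
        render_count["parent"] = render_count.get("parent", 0) + 1
        html = "<ul>" + "".join(get_child(n, data) for n in children) + "</ul>"
        entry = {"html": html, "dependencies": current}
        cache["parent"] = entry
    return entry["html"]


def update_child(name, value, data):
    """Change a child's data and bump its version so its old cache key goes stale."""
    data[name] = value
    versions[name] += 1
```

After rendering a parent with three children and then calling `update_child` on one of them, the next `get_parent` call renders the parent and that one child again, while the other two children come straight out of the cache.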