28 December 2017

Component cohesion

Image: Pixabay
Breaking your application down into components is a useful way to apply a "divide and conquer" methodology.  Assigning specific behaviour to a component and then defining interfaces through which other components can access it allows you to develop a service-driven architecture.

I'm in the process of decomposing a monolithic application into services that will eventually become standalone micro-services.  Part of the task ahead lies in determining the service boundaries, which are analogous to software components for my micro-service application. 

I want components to be modular to allow them to be developed and deployed as independently as possible.  I'm using the approach suggested by Eric Evans in his book on domain-driven design, where he describes the concept of "bounded contexts".  I like to think of a bounded context as being to domain models what a namespace is to classes.  These contexts are spaces where a domain model defined in the Ubiquitous Domain Language will have a precise and consistent meaning.  Keeping components modular helps to define and maintain these boundaries.

I want my components to be cohesive because I want my architecture to be so simple that people wonder why we need an architect at all.  It should be intuitively obvious why a group of classes belong together in a component and what part of my domain logic they're implementing.  Cohesion is a good thing and we're all familiar with writing cohesive classes, but what principles are important to consider when grouping classes into cohesive components?

Robert C Martin discusses three important principles that govern component cohesion on his website:

  • Release-Reuse equivalence principle (REP) - the granule of release is the granule of reuse
  • Common Closure principle (CCP) - classes that change together are packaged together
  • Common Reuse principle (CRP) - classes that are used together are packaged together

The Release-Reuse equivalence principle (REP) is very simple.  It states that classes that are packaged together into a component need to be able to be released together.  In practice this boils down to properly versioning your releases and having all of the classes in your component versioned and released together.

The Common Closure principle (CCP) states that you should gather together classes that change for the same reasons and at the same times.  Conversely, you should separate out classes that change for different reasons and at different times.

Remember that the S of SOLID stands for "single responsibility principle" (SRP) where a class should have only one reason to change?  The CCP is for components what the SRP is for classes.

We can say that generally stuff that changes together should be kept together.

The Common Reuse principle (CRP) states that you should not force users of a component to depend on things they don't need. 

The CRP more strongly suggests that we do not include classes in a component that are not tightly bound to the function of that component.  Every time we touch one of those classes we will be forced to retest all of the client applications that are using the component.  Our deployment process will be heavier than it needs to be, and crucially we'll be deploying more than we have to.

The CRP is a more general form of the interface segregation principle and suggests that a component should be built from classes that are commonly used together.

Generally speaking, we should avoid depending on things that we don't need.

We've seen three principles that govern how we group classes into components.  The REP and CCP are inclusive: they suggest which classes belong together.  The CRP is exclusive: it pushes classes out of a component.  There is therefore a balance to be struck between these principles.

Tim Ottinger suggested a diagram that helps to see the cost of abandoning a principle.  The label on an edge is the cost of weakening adherence to the principle on the opposite vertex.  So, for example the cost of abandoning CCP is that we have too many components changing at one time.

Diagram suggested by Tim Ottinger illustrating tension between component cohesion principles
Your application will fall somewhere within this triangle as you balance your focus between the principles. 

This balance is dynamic and changes over time.  Robert C Martin notes that “A good architect finds a position in that tension triangle that meets the _current_ concerns of the development team, but is also aware that those concerns will change over time.”

These principles will govern how I examine my monolith and identify classes that I can group together to form components.


27 December 2017

Writing SOLID Laravel code

Image: Pixabay
SOLID is a mnemonic acronym for five object-oriented design principles that are intended to make software designs more understandable (see Wikipedia). They were promoted by a chap called Robert C Martin who has been programming since before I was born and is an authority on writing clean code.

Laravel is a PHP framework that implements the model-view-controller (MVC) pattern. A lot of people think that their responsibility for OOP design ends with adopting a framework, but Laravel is actually relatively unopinionated about your OOP design and you still need to think about writing code that is testable and maintainable.

The reason that SOLID principles matter becomes apparent when you work on a single project for a long time. If you're writing throwaway applications for clients that you never expect to work on again (presumably because the client won't hire you again) then the quality of your code doesn't matter. But if you're the guy stuck with managing and implementing change in an application that is actively growing and improving then you're going to want code that is easy to change, easy to test, and easy to deploy.

The most common problem I've seen in Laravel is "fat controllers" that use the ORM to get some data and then push it through to a view.  Let's take a look at an example I've made.  Imagine that we're writing a payroll program.  We might write something like the following controller methods:
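A sketch of what that might look like (the HoursWorked model is referred to later in this post; the view names, field names, and rates are illustrative):

    namespace App\Http\Controllers;

    use App\HoursWorked;

    class PayrollController extends Controller
    {
        // Report the hours an employee has worked.
        public function hours($employeeId)
        {
            $hours = HoursWorked::where('employee_id', $employeeId)->get();

            return view('payroll.hours', ['hours' => $hours]);
        }

        // Calculate the employee's wages inline: data access, business rules
        // and presentation all live in one controller method.
        public function pay($employeeId)
        {
            $hours = HoursWorked::where('employee_id', $employeeId)->get();

            $wages = 0;
            foreach ($hours as $hour) {
                $wages += $hour->is_overtime ? 37.50 : 25.00;
            }

            return view('payroll.pay', ['wages' => $wages]);
        }
    }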

This is an unfortunately common Laravel pattern that is taught in countless tutorials. We call the model from the controller, format the data, and then pass it on to the view. This is the easiest way to teach the MVC pattern to beginners but unfortunately it violates the SOLID principles. Let's see why, and how we can improve this code.

The "S" in SOLID stands for single responsibility principal which requires that each module or class should have a single responsibility for the functionality of the application.  A more subtle understanding is put forward by Robert C Martin who says that "A class should have only one reason to change".

The thinking behind limiting the reasons for changing a class comes from the observation that software is often developed by teams, and that a team is usually implementing a feature for a particular actor.  In our example the CEO of the company will have different requirements from the CFO, and when either of them requests a change we want to limit the impact of that change.  The actor is the reason for software to change - they request a feature and a team goes ahead and implements it.

In our controller above, if the CEO requested a change then that change would also affect the code that serves the CFO.  The teams working on the code would need to merge in each other's changes.  If our code was properly designed then the controller class would be responsible to just one actor.
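A sketch of what that separation might look like (the class names and rate are illustrative): the wage calculation moves into its own class and the controller simply delegates to it.

    use App\Http\Controllers\Controller;
    use App\HoursWorked;

    class PermanentEmployeePayCalculator
    {
        // Only the rules about how wages are calculated live here.
        public function calculatePay(int $hoursWorked): float
        {
            return $hoursWorked * 25.0;   // illustrative flat hourly rate
        }
    }

    class PayController extends Controller
    {
        // Laravel injects the calculator; route parameters follow the dependencies.
        public function pay(PermanentEmployeePayCalculator $calculator, $employeeId)
        {
            $hoursWorked = HoursWorked::where('employee_id', $employeeId)->count();

            return view('payroll.pay', ['wages' => $calculator->calculatePay($hoursWorked)]);
        }
    }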

In this example I've moved the responsibility for calculating the employee pay to its own object.  This object will only change if the CFO requests a change to the way that wages are calculated, and so it adheres to the single responsibility principle.  We would similarly have an object that is responsible for counting the hours.  I've chosen this way of solving the problem because the Facade pattern is very loaded in Laravel and I think it would just muddy the waters to use it here.

Let's move on to "O", which is the open-closed principle.  It requires that "A software artifact should be open for extension but closed for modification".  It was developed by Bertrand Meyer and holds that you should be able to extend a module's functionality without having to change that module.

The aim of the OCP is to protect important code that implements high level policies from changes made to code that implements low-level details.  We want the part of our code that is central to our application to be insulated from changes in other parts of the application.

There is some level of separation in our Laravel application.  We can make a change to the View without there being any impact on the Controller, but within the controller above we have no such insulation.  If we make a change to the way we read the database then we will be affecting exactly the same function that is responsible for calculating wages!

The open-closed principle seeks to prevent you from changing core functionality as a side-effect of adding new functionality to your application.  It works by separating the application into a hierarchy of dependencies.  You can extend functionality at the lower levels of the hierarchy without changing the code in the higher levels.
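As a rough sketch of that idea (the interface and class names are illustrative): the high-level code depends only on an abstraction, and new behaviour arrives as new classes rather than as edits to existing ones.

    interface PayCalculator
    {
        public function calculatePay(int $hoursWorked): float;
    }

    // High-level policy: closed for modification.  It never has to change
    // when a new kind of pay calculation is introduced.
    class WageStatement
    {
        private $calculator;

        public function __construct(PayCalculator $calculator)
        {
            $this->calculator = $calculator;
        }

        public function totalFor(int $hoursWorked): float
        {
            return $this->calculator->calculatePay($hoursWorked);
        }
    }

    // Open for extension: support for contractors arrives as a new class,
    // not as an edit to WageStatement.
    class ContractorPayCalculator implements PayCalculator
    {
        public function calculatePay(int $hoursWorked): float
        {
            return $hoursWorked * 40.0;   // illustrative rate
        }
    }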

The "L" in SOLID is named for Barbara Liskov who put forward what is now known as the Liskov substitution principal.  The principal holds that "if S is a subtype of T, then objects of type T in a program may be replaced with objects of type S without altering any of the desirable properties of that program".

In the example above I've amended the pay calculator so that it implements an interface.  Both the PermanentEmployeePayCalculator class and the TemporaryEmployeePayCalculator class implement this interface and can be substituted for each other.  This makes a lot more sense if you consider an LSP violation, such as this one:
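(A sketch: the class names match those above, and the method bodies are illustrative.)

    abstract class PayCalculator
    {
        // The base type doesn't pin down a contract for calculating pay.
    }

    class PermanentEmployeePayCalculator extends PayCalculator
    {
        public function calculatePay(int $hoursWorked): float
        {
            return $hoursWorked * 25.0;
        }
    }

    class TemporaryEmployeePayCalculator extends PayCalculator
    {
        // A different signature: callers now have to know which subtype they hold.
        public function calculatePay(int $hoursWorked, float $hourlyRate): float
        {
            return $hoursWorked * $hourlyRate;
        }
    }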

This violates the Liskov substitution principle because the methods have got different signatures. You cannot substitute the subtypes of PayCalculator for each other because they're incompatible.  The object that depended on them would need to implement some logic to know how many parameters to pass to the method.  Adhering to the Liskov substitution principle removes this need and removes special cases from your code.

Adhering to the Liskov substitution principle
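A sketch of the substitutable version (method bodies are illustrative): both calculators implement the same interface with the same signature, so client code can be handed either one.

    interface PayCalculator
    {
        public function calculatePay(int $hoursWorked): float;
    }

    class PermanentEmployeePayCalculator implements PayCalculator
    {
        public function calculatePay(int $hoursWorked): float
        {
            return $hoursWorked * 25.0;   // illustrative flat rate
        }
    }

    class TemporaryEmployeePayCalculator implements PayCalculator
    {
        private $hourlyRate;

        public function __construct(float $hourlyRate)
        {
            $this->hourlyRate = $hourlyRate;
        }

        // Same signature as the permanent calculator, so either can be used
        // wherever a PayCalculator is expected.
        public function calculatePay(int $hoursWorked): float
        {
            return $hoursWorked * $this->hourlyRate;
        }
    }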
The "I" in SOLID was proposed by Robert C Martin and stands for interface segregation.  The idea is that code should not be made to depend on methods that it does not use.  By reducing the dependencies between classes you help to decouple your code making it easier to make changes in one section without impacting on others.

Let's imagine that we separated our controller out into classes like the diagram below.  We have an Employee data object that is responsible for interacting with the persistence layer and returning results.  It has a method that the PayCalculator object uses to determine whether the employee should earn their overtime rate, and a method that both objects use to fetch the list of hours that an employee has worked (which may or may not violate the single responsibility principle).

Violation of the ISP

The problem here is that the HoursReporter is forced to depend on the isHourOvertime() function.  This introduces an additional coupling between the classes that we need to avoid if we want to adhere to the interface segregation principle.
Adhering to the ISP

We can easily solve this problem by declaring an interface for the classes to depend on.  The interface for the HoursReporter class excludes the function that we do not want to depend on.
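A sketch of that separation (the interface names are mine; Employee and isHourOvertime() are as described above, and the method bodies are illustrative):

    // Two narrow interfaces instead of one wide Employee dependency.
    interface ProvidesHoursWorked
    {
        public function getHoursWorked(): array;
    }

    interface ProvidesOvertimeRules
    {
        public function isHourOvertime(int $hourOfWeek): bool;
    }

    class Employee implements ProvidesHoursWorked, ProvidesOvertimeRules
    {
        public function getHoursWorked(): array
        {
            return [];   // would come from the persistence layer in the real class
        }

        public function isHourOvertime(int $hourOfWeek): bool
        {
            return $hourOfWeek > 40;
        }
    }

    // The HoursReporter type-hints only the interface it actually uses and so
    // never depends on isHourOvertime().
    class HoursReporter
    {
        public function countHours(ProvidesHoursWorked $employee): int
        {
            return count($employee->getHoursWorked());
        }
    }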

The last letter in SOLID is "D", which stands for the dependency inversion principle.  It holds that the most flexible modules are those in which source code dependencies refer only to abstractions rather than concretions.

To understand dependency inversion consider two things: flow of control and source code dependency.  We want our source code dependencies to be independent of how control flows through our application.

Some classes and files in our application are more prone to change than others.  They are "volatile" classes.  We want to minimise the effect of changes in these volatile classes on the more stable classes.  Ideally we want our business logic to be very stable and highly insulated from changes elsewhere in our system.

In the diagram below I'm illustrating a source code dependency hierarchy.  High level classes call functions in lower level classes, but in order to do so they need to depend on that class.  This means that your source code dependencies are unavoidably tied to how your flow of control works.

Source code dependency hierarchy
The problem that arises from this dependency is that it becomes difficult to swap functionality.  A change to the source code of a low-level object means that we need to rebuild all of the files that depend on it.  Admittedly rebuilding files in PHP is less of an issue than for statically typed languages that are built in advance, but we are still directly impacting files other than the one we are touching.

Let's say, for example, we had a class that outputs the Employee wages to the screen.  In the diagram above we would see the Employee object as the high-level object and perhaps a "ScreenOutput" object as the low-level object.  Our Employee object calls the ScreenOutput class directly, and so we have to reference its source code in Employee, like this:
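(A sketch, assuming an illustrative App\Output namespace and a write() method on ScreenOutput.)

    use App\Output\ScreenOutput;   // a source code dependency on a volatile concretion

    class Employee
    {
        public function displayWages(float $wages): void
        {
            (new ScreenOutput())->write($wages);
        }
    }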

Now our CFO asks us to be able to print out the wages using the black and white printer in her office.  Uh-oh, now we need to rewrite our source-code dependency because the "use" statement refers specifically to a concretion.

What happens if we want to make a change to the way that wages are displayed on the screen?  We can easily tweak the ScreenOutput object, but can we deploy it separately?  What impact is it going to have on all the places that depend on it?

How could we fix this problem and allow ourselves to swap functionality in and out without affecting our source code dependencies?  How do we actually decouple these objects?

The answer is to always depend on abstractions rather than concretions.  This insulates you from changes in the underlying files and lets you change and deploy parts of your application separately.

Using an interface to implement dependency inversion
In the diagram above the Employee object is calling the ScreenOutput class method through an interface.  The class has a source code dependency on the interface file (which shouldn't change often) and any code change in the ScreenOutput class will not affect the Employee object.
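In code that might look roughly like this (the WageOutput interface name is illustrative):

    interface WageOutput
    {
        public function write(float $wages): void;
    }

    class ScreenOutput implements WageOutput
    {
        public function write(float $wages): void
        {
            echo number_format($wages, 2) . PHP_EOL;
        }
    }

    // Employee depends only on the abstraction; a PrinterOutput could be
    // swapped in without touching this class.
    class Employee
    {
        private $output;

        public function __construct(WageOutput $output)
        {
            $this->output = $output;
        }

        public function displayWages(float $wages): void
        {
            $this->output->write($wages);
        }
    }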

The rules to follow for the dependency inversion principle are:

  1. Do not reference volatile concrete classes 
  2. Do not derive from volatile concrete classes
  3. Do not override concrete functions
  4. Never mention the name of anything concrete and volatile

One way that you can accomplish this is through using a Factory to instantiate volatile concrete classes.  This removes the requirement to have a source code dependency on the class that you're instantiating in the object where you need it.
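A sketch of that, reusing the illustrative WageOutput interface and ScreenOutput class from above; the factory becomes the only place that names the concrete class.

    interface WageOutputFactory
    {
        public function create(): WageOutput;
    }

    class ScreenOutputFactory implements WageOutputFactory
    {
        public function create(): WageOutput
        {
            // The only place in the application that names the concretion.
            return new ScreenOutput();
        }
    }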

Laravel approaches dependency inversion by using a "service container".  Your code no longer depends on a concrete implementation of a class, but rather requests an instance of an object from the IoC container.
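In Laravel that binding typically lives in a service provider, roughly like this (again using the illustrative WageOutput and ScreenOutput from above):

    use App\Output\ScreenOutput;
    use App\Output\WageOutput;
    use Illuminate\Support\ServiceProvider;

    class AppServiceProvider extends ServiceProvider
    {
        public function register()
        {
            // Anywhere the container builds an object, a type-hint of WageOutput
            // is resolved to a ScreenOutput instance.
            $this->app->bind(WageOutput::class, ScreenOutput::class);
        }
    }

A class can then type-hint WageOutput in its constructor or a controller method and let the container decide which concrete class to hand it.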

In our controller code above the IoC container returns an instance of the HoursWorked model through the Facade pattern.  The controller is not directly dependent on the source code file of the HoursWorked model.  So in this particular case we're just lucky to be adhering to a SOLID principle!

23 June 2017

How to get Virtualbox VMs to talk to each other

I'm busy writing an Ansible script and want to test it locally before trying to deploy it anywhere.  The easiest way to make my local environment as close to my deployment environment as possible was to set up a network of Virtualbox VMs.

The problem was that I've always configured my VMs to use NAT networking.  I SSH onto them by setting up port forwarding and have never really needed them to have their own address.

The solution to this problem is pretty simple.  Just stop the machines and add a new network adapter of type "Host Only".  This adapter will handle communication between the guest and host machines.

The trick is that you need to configure the guest OS network interface too.

To do this SSH onto your VM and run "ip add" to list your network adapters.  If you're like me and started with NAT before adding "Host Only" as your second adapter the output should look something like this:

You need to identify the adapter that is your "Host Only" network.  You can do this by running "ip add" on your host machine and looking for the vboxnet0 network address (assuming you're using the defaults given to you by Virtualbox).

Now you need to edit /etc/network/interfaces and tell Linux (I'm using Ubuntu 16.04) to set up that interface.  Add lines like this snippet to your file:
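(Assuming the host-only adapter showed up as enp0s8 in the "ip add" output; use whatever name you saw.  This uses DHCP from the VirtualBox host-only network; a static address stanza works just as well if that DHCP server is disabled.)

    # Host-only adapter - the interface name comes from the "ip add" output
    auto enp0s8
    iface enp0s8 inet dhcp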

Now your virtual machines will have an IP address (you can grab it with ifconfig) that you can set up in your Ansible inventory.

13 April 2017

Is PHP a good fit for an API server?

Image: Pixabay
Calling PHP a double-claw hammer is a bit of an in-joke in the PHP community.  A lot of people bemoan PHP as a language – it's fashionable to do so and it seems to be a way to look clever.  The joke came about from a blog post where somebody pointed out all of the problems with PHP (here's a rebuttal - https://blog.codinghorror.com/the-php-singularity/ )

Anyway, PHP is a warty language that sucks in academic circles but it doesn't matter because it's really good at web stuff, there are lots of people who know it (so it's cheap to hire), there are lots of libraries and frameworks (so it's cheap and fast to develop in).  The commercial world is willing to overlook the academic warts.

I'm busy helping to improve the performance of an API server.  As part of my effort I'm profiling calls to the endpoints.  I'm using Blackfire to generate call graphs and also logging the SQL queries that the ORM is producing so that I can check the indexes and joins.

Here's a callgraph for a call to the endpoint where we are looking to run a paginated SQL query.  We're not applying any business logic or having any side-effects - all we're trying to do is query the database and return a JSON string to the frontend.

Blackfire call graph
That's a pretty substantial call graph for what sounds like a simple task right?  All I want to do is route the request to a controller, query the database, and send the results back.

Blackfire tells me that 172 different MySQL queries are being run.  The PHP code responsible is using the ORM to build up the joins and so on.  I suspect that the problem is that there is pagination being applied and the ORM is not able to optimize the queries it needs to do in order to paginate efficiently.

Okay, so what questions do I have?

Why are we not querying the database more directly?  I appreciate that developer productivity is a good reason to use ORM but is it a good reason in this case?  172 queries is an awful lot, especially when a lot of them are related to querying the schema so our ORM can run.

Why on earth does PHP have to spend so much time in disk I/O reading all of those source files when really what we need is request routing, a database query, and a response handler?  

Blackfire reports that 304kb of data was transmitted across the wire for this.  That seems like a lot of data for the five or six records that I'm returning to the frontend.

The call graph is frustrating – I'm lumbered with a whole lot of black box code and I have no control over the SQL that is being run.  How can I improve the performance of this transaction?

So is PHP the best tool for this job?

I have previously had intractable problems with PHP when it comes to memory management.  It's pretty complicated and it differs depending on the way that PHP is run but I do not have 100% confidence in PHP's garbage collection.  

Circular object references (which I encountered while using an ORM where a model referenced itself as a parent to form a hierarchy) cannot be completely collected by PHP.  PHP actually relies on the container the machine runs in to collect this memory.  

PHP is not built for being a long-running program.  It was never designed for this and it should never be used for this.  It was built to handle a request for a page and then terminate. 

The application is bootstrapped for every request.  How much overhead does this add?  Well, there's a question that Blackfire raises for me.  Take a look at the timeline for the transaction from before:
Blackfire call timeline
The timeline shows when a PHP function was called in relation to the time taken to generate the response. 

My controller function starts at around 750ms into the transaction.  The actual time is irrelevant as a benchmark, but the fact that the first time *my* code runs is half-way into the transaction is what is relevant.   

Until halfway into the transaction I've been waiting for PHP to bootstrap my application.  You could argue that this is because of the PHP framework I'm using, but really it is the limitation of PHP not being able to maintain state between requests that requires us to continuously bootstrap the application.

Bootstrapping our application might involve disk I/O (depending on OpCache).  It definitely involves network I/O because we have to connect to MySQL and wait for it to authenticate us.  I know that there are ways to improve this, like by not using a framework and by tuning OpCache to improve compile time.

I'm concerned about what will happen when the application has 50,000 concurrent users.  How much of a strain will it place on my database server to be constantly connecting (and authenticating)?

I think PHP is brilliant at web pages and not so good at being a long-running application that is capable of reusing resources.  I'm a huge PHP fan but as an architect I do not want it to be my only tool.  

I'm busy learning Elixir and the Phoenix framework (again with the frameworks!), which responds in microseconds (not milliseconds).  I don't think we should be using PHP like the hammer we use for everything.