
Is PHP a good fit for an API server?

Image: Pixabay
Calling PHP a double-claw hammer is a bit of an in-joke in the PHP community.  A lot of people bemoan PHP as a language – it's fashionable to do so and it seems to be a way to look clever.  The joke came about from a blog post where somebody pointed out all of the problems with PHP (here's a rebuttal - https://blog.codinghorror.com/the-php-singularity/ )

Anyway, PHP is a warty language that gets sneered at in academic circles, but it doesn't matter because it's really good at web stuff: there are lots of people who know it (so it's cheap to hire) and there are lots of libraries and frameworks (so it's cheap and fast to develop in).  The commercial world is willing to overlook the academic warts.

I'm busy helping to improve the performance of an API server.  As part of that effort I'm profiling calls to the endpoints.  I'm using Blackfire to generate call graphs, and I'm also logging the SQL queries that the ORM is producing so that I can check the indexes and joins.
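
The particular ORM doesn't matter to the argument, but as an illustration of the query-logging part: if this were Laravel's Eloquent, every SQL query the ORM issues could be captured with a listener like this (a minimal sketch, assuming the standard DB facade, registered in a service provider's boot method):

    use Illuminate\Support\Facades\DB;
    use Illuminate\Support\Facades\Log;

    // Log every query the ORM runs, with its bindings and timing,
    // so the generated SQL can be checked against indexes and joins.
    DB::listen(function ($query) {
        Log::debug($query->sql, [
            'bindings' => $query->bindings,
            'time_ms'  => $query->time,
        ]);
    });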

Here's a call graph for a call to an endpoint that runs a paginated SQL query.  We're not applying any business logic or producing any side-effects – all we're trying to do is query the database and return a JSON string to the frontend.


Blackfire call graph
That's a pretty substantial call graph for what sounds like a simple task, right?  All I want to do is route the request to a controller, query the database, and send the results back.

Blackfire tells me that 172 different MySQL queries are being run.  The PHP code responsible uses the ORM to build up the joins and so on.  I suspect the problem is that pagination is being applied and the ORM is unable to optimize the queries it needs to run in order to paginate efficiently.

Okay, so what questions do I have?

Why are we not querying the database more directly?  I appreciate that developer productivity is a good reason to use an ORM, but is it a good reason in this case?  172 queries is an awful lot, especially when many of them are schema queries issued just so the ORM can run.
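
For contrast, here's a minimal sketch of what this endpoint actually needs from the database, assuming a plain PDO connection and a hypothetical users table (the names are made up for illustration) – one query instead of 172:

    <?php
    // Hypothetical direct query for one page of results, bypassing the ORM.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'api_user', 'secret');

    $page    = 1;
    $perPage = 25;
    $offset  = ($page - 1) * $perPage;

    $stmt = $pdo->prepare(
        'SELECT id, name, email FROM users ORDER BY id LIMIT :limit OFFSET :offset'
    );
    $stmt->bindValue(':limit',  $perPage, PDO::PARAM_INT);
    $stmt->bindValue(':offset', $offset,  PDO::PARAM_INT);
    $stmt->execute();

    echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));

Whether giving up the ORM is worth it here is a judgement call; the point is that the query itself is trivial.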

Why on earth does PHP have to spend so much time in disk I/O reading all of those source files, when all we really need is request routing, a database query, and a response handler?

Blackfire reports that 304 KB of data was transmitted across the wire for this.  That seems like a lot for the five or six records that I'm returning to the frontend.

The call graph is frustrating – I'm lumbered with a whole lot of black box code and I have no control over the SQL that is being run.  How can I improve the performance of this transaction?

So is PHP the best tool for this job?

I have previously had intractable problems with PHP when it comes to memory management.  It's pretty complicated, it differs depending on how PHP is run, and I do not have 100% confidence in PHP's garbage collection.

Circular object references (which I encountered while using an ORM where a model referenced itself as a parent to form a hierarchy) cannot be freed by PHP's reference counting alone; they have to wait for the cycle collector, and in practice PHP leans on the process terminating for that memory to be reliably released.
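
A contrived illustration (nothing to do with the real ORM models): reference counting alone can never free a cycle, so the memory stays allocated until the cycle collector runs or the process exits:

    <?php
    // Two objects that reference each other form a cycle, so their
    // refcounts can never drop to zero.
    class Node
    {
        public $parent;
    }

    $a = new Node();
    $b = new Node();
    $a->parent = $b;
    $b->parent = $a;   // cycle: $a <-> $b

    unset($a, $b);     // the objects are now unreachable, but not freed

    // The cycle collector (PHP 5.3+) can reclaim them, but it only runs
    // when the root buffer fills up or when triggered explicitly:
    $collected = gc_collect_cycles();
    echo "Collected {$collected} cycle(s)\n";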


PHP is not built to be a long-running program.  It was never designed for that and it should not be used for that.  It was built to handle a request for a page and then terminate.

The application is bootstrapped for every request.  How much overhead does this add?  Well, that's a question Blackfire can answer for me.  Take a look at the timeline for the transaction from before:
Blackfire call timeline
The timeline shows when each PHP function was called relative to the total time taken to generate the response.

My controller function starts at around 750ms into the transaction.  The actual time is irrelevant as a benchmark; what matters is that the first time *my* code runs is halfway into the transaction.

Until that halfway point I've been waiting for PHP to bootstrap my application.  You could argue that this is down to the PHP framework I'm using, but really it's PHP's inability to maintain state between requests that forces us to bootstrap the application over and over.

Bootstrapping our application might involve disk I/O (depending on OpCache).  It definitely involves network I/O, because we have to connect to MySQL and wait for it to authenticate us.  I know there are ways to improve this, like not using a framework and tuning OpCache to cut compile time.
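
For example, these OpCache directives (illustrative values, not a recommendation for any particular app) cut the per-request disk I/O and recompilation cost, at the price of having to reset the cache on deploy:

    ; php.ini – illustrative OpCache tuning
    opcache.enable=1
    opcache.validate_timestamps=0    ; don't stat() source files on every request
    opcache.memory_consumption=256   ; MB of shared memory for compiled scripts
    opcache.max_accelerated_files=20000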

I'm concerned about what will happen when the application has 50,000 concurrent users.  How much of a strain will it place on my database server to be constantly connecting (and authenticating)?
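
One partial mitigation I know of is persistent connections, so that each PHP worker reuses its MySQL connection across requests instead of reconnecting and re-authenticating every time.  A sketch using PDO (connection details made up):

    <?php
    // PDO::ATTR_PERSISTENT keeps the underlying MySQL connection alive
    // across requests (one per PHP worker), skipping the TCP handshake
    // and authentication on subsequent requests.
    $pdo = new PDO(
        'mysql:host=db.example.com;dbname=app',
        'api_user',
        'secret',
        [PDO::ATTR_PERSISTENT => true]
    );

Persistent connections have their own caveats (state leaking between requests, one connection per worker), so they ease the symptom rather than fix the architecture.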

I think PHP is brilliant at web pages and not so good at being a long-running application that can reuse resources.  I'm a huge PHP fan, but as an architect I don't want it to be my only tool.

I'm busy learning Elixir and the Phoenix framework (again with the frameworks!), which can respond in microseconds (not milliseconds).  I don't think we should keep using PHP as the hammer we reach for on every job.
