20 October 2018

Streaming MPEG-TS to your Raspberry Pi


[Image: Raspberry Penguin, CC-BY, from openclipart.org]
Usually people focus on getting camera streams off their Pi, but I wanted to set up a test environment that would let me play with some encoding and distribution software.

For the client in this scenario I'm running a Raspberry Pi 3 Model B, plugged into my home network over Ethernet. I'm using my Ubuntu 18.04 laptop to serve up the multicast stream.

If you're new to multicast, the first bit of reading you should do is the Wikipedia article on the reserved IP ranges. You can't multicast to just any old address, but you do have a very wide selection to choose from. I decided to stream to udp://239.255.0.1 on port 5000, because why not?

I downloaded a video from https://www.stockio.com/free-videos/ and saved it into a folder. From there all you need to do is run ffmpeg on the server, thusly:

ffmpeg -re -i costa.mp4 -bsf:v h264_mp4toannexb -acodec copy -vcodec copy -f mpegts udp://239.255.0.1:5000
 
Some of those parameters actually came from ffmpeg's own error output when I first tried to play the file; the h264_mp4toannexb bitstream filter is the one it asked for. If you use the same file as me then you need to include them, otherwise you might not.

Once ffmpeg had started streaming the file I quickly opened up VLC on the Pi and tuned in to the network stream coming from my laptop. Hurray! Everything is working, my UDP multicast is up and running, and now I can play with some more advanced features.
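If you haven't done this before: in VLC it's Media → Open Network Stream, and VLC wants a leading @ to tell it to listen on the multicast group, so the address to enter should look something like:

udp://@239.255.0.1:5000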

Here is a screenshot from my Pi:

[Screenshot captured from the Raspberry Pi; video from https://www.stockio.com/free-videos/]

In case you don't want to dash over to your other window to catch the stream before the clip ends, you should know that ffmpeg is able to generate test patterns for you. If you're like me and don't want to spend too much time reading the ffmpeg manual, I found another site with a good little snippet that you can just copy and paste.
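Something along these lines should do it, using ffmpeg's built-in testsrc generator (the size, rate and preset here are just my guesses at sensible values):

ffmpeg -re -f lavfi -i testsrc=size=1280x720:rate=30 -c:v libx264 -preset ultrafast -f mpegts udp://239.255.0.1:5000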

29 September 2018

Using Azure Active Directory as an OAuth2 provider for Django

Azure Active Directory is a great product and is invaluable in the enterprise space. In this article we'll be setting it up to provide tokens for the OAuth2 client credentials grant. This authorization flow is useful when you want to authorize server-to-server communication that might not be on behalf of a user.

This diagram, by Microsoft, shows the client credentials grant flow:

[Diagram from Microsoft documentation]

The flow goes like this:
  1. The client sends a request to Azure AD for a token
  2. Azure AD verifies the attached authentication information and issues an access token
  3. The client calls the API with the access token. The API server is able to verify the validity of the token and therefore the identity of the client.
  4. The API responds to the client

Setting up Azure AD as an OAuth2 identity provider

The first step is to create applications in your AD for both your API server and the client. You can find step-by-step instructions on how to register the applications on the Microsoft page "Integrating applications with Azure Active Directory".

We need to generate a key for the client. From your AD blade in the Azure portal click on "App registrations" and then show all of them. Select the client application and open its settings. Create a new key and make sure you copy its value after saving, because it won't be shown again.

Now let's add the permissions that our client will ask for. For this article we're just going to use a basic "access" scope. We're not going to add custom scopes to the server application, but this is easily possible by editing the manifest of the server application.

From the settings page of the client application click on "Required permissions". Click "Add" and in the search box type in the name of the server application that you registered in AD. Select it from the list that appears and choose "access" under the delegated permissions.

Now we have a fully fledged OAuth2 service set up and running, which is awesome. Let's call the Azure AD endpoint to get a token:
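Here's a minimal sketch of that call in Python with the requests library, assuming the (2018-era) v1 token endpoint; every credential value below is a placeholder you need to fill in from your own tenant:

import requests

# All of these values are placeholders for your own tenant's details.
TENANT = "your-tenant.onmicrosoft.com"
CLIENT_ID = "<application id of the client app>"
CLIENT_SECRET = "<the key you copied when you created it>"
RESOURCE = "<app ID URI of the server app>"

response = requests.post(
    "https://login.microsoftonline.com/%s/oauth2/token" % TENANT,
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "resource": RESOURCE,
    },
)
response.raise_for_status()
token = response.json()["access_token"]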

You can get the application id by opening up the registered application from your Azure AD portal blade.

Azure should respond with a JSON body that looks something like this:
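{
    "token_type": "Bearer",
    "expires_in": "3599",
    "ext_expires_in": "3599",
    "expires_on": "1540000000",
    "resource": "<app ID URI of the server app>",
    "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIs..."
}

(The values here are illustrative; the real access_token runs to well over a thousand characters.)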


The access_token value will be a JWT. If you haven't used JWTs before, go check out jwt.io. This is the access token that you will present to your Django backend.

If you copy the access token into the decoder on jwt.io then you'll see that the payload looks like this:
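{
    "aud": "<app ID URI of the server app>",
    "iss": "https://sts.windows.net/<tenant id>/",
    "iat": 1540000000,
    "nbf": 1540000000,
    "exp": 1540003600,
    "appid": "<application id of the client app>",
    "appidacr": "1",
    "idp": "https://sts.windows.net/<tenant id>/",
    "oid": "<object id of the service principal>",
    "sub": "<object id of the service principal>",
    "tid": "<tenant id>",
    "ver": "1.0"
}

(Again illustrative: the ids are placeholders and your token may carry extra claims.)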

Interlude

Let's take a breather and watch a music video. Because, really, 8 bits should be enough for anybody.

Okay, we're back and we're ready to get Django to authenticate the token.

Validating the token in Django

I'm not going to go into the details of setting up your Django project. There is already an excellent tutorial on their site. I'm going to assume that you have an app in your project to handle the API.

What we need to do is write some custom middleware for Django that will get the access token out of the "Authorization" HTTP header, validate the JWT, and verify its signature against Azure's published signing keys. You can find out more about this on Microsoft's website, but this is actually a standard OAuth2 approach, so just about any standards-compliant documentation will do.

We'll be using standard libraries to decode the JWT, but in order to verify it properly we need the signing key. We don't want to hardcode the key into our API because Azure rotates its keys on a regular basis. Luckily the OpenID Connect discovery standard gives providers a defined way to publish this sort of information.

Azure publishes its configuration document at https://login.microsoftonline.com/common/.well-known/openid-configuration. There is also a configuration file that is specific to your tenant at https://login.microsoftonline.com/{tenant}/.well-known/openid-configuration, but the general one is fine for us.

In that document there is a key called "jwks_uri" whose value is a URL; at that URL you will find the keys that Azure is currently using. We need to retrieve that document and find the key that was used to sign our access token.

Right, so that's the plan: decode the JWT header to find which key was used to sign it, then go and fetch that key from Microsoft to verify the token. If anybody is able to trick our DNS and serve us content that we believe is from Microsoft then they'll be able to pwn us, but otherwise, as long as we trust the response from Microsoft, we can trust the JWT the client gave us.

I placed this into settings.py because swapping Azure AD for any other standard implementation of OAuth2 should then just be a matter of swapping the configuration endpoint. I also set up Memcached while I was busy in the settings, so I ended up with this extra settings information:
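Roughly like this; AUTH_CONFIG_URL is a name I invented rather than anything Django or Azure requires:

# The OpenID Connect discovery document for the provider.
AUTH_CONFIG_URL = "https://login.microsoftonline.com/common/.well-known/openid-configuration"

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
    }
}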


Then I made a new file called checkaccesstoken.py and put it into the "middleware" directory of my API app, making sure that I registered it in settings.py:
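The registration is just a line in the MIDDLEWARE list; the dotted path below assumes my app is called "api" and the class is called CheckAccessToken:

MIDDLEWARE = [
    # ... Django's default middleware entries ...
    "api.middleware.checkaccesstoken.CheckAccessToken",
]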

In the __call__ method of the middleware I call out to the functions that are going to check the token:
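In outline it looks something like this; get_token and token_is_valid are names I made up for the helpers sketched below, not anything from the original Gist:

from django.http import HttpResponse

class CheckAccessToken:
    AUTH_TYPE_PREFIX = "Bearer"

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Reject the request with a 401 unless a valid token is attached.
        token = self.get_token(request)
        if token is None or not self.token_is_valid(token):
            return HttpResponse(status=401)
        return self.get_response(request)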

To get the token we use the Django request object:
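Something like this, give or take error handling:

def get_token(self, request):
    # The header arrives as, for example, "Bearer eyJ0eXAiOiJKV1Qi..."
    auth_header = request.META.get("HTTP_AUTHORIZATION", "")
    auth_type, _, token = auth_header.partition(" ")
    if auth_type != self.AUTH_TYPE_PREFIX or not token:
        return None
    return token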

We make sure that the token is a Bearer token by comparing the auth scheme against a class constant called AUTH_TYPE_PREFIX with the value "Bearer".

Now that we have the access token we need to check it. You could use any JWT library, but I chose "pip install pyjwt cryptography". This library throws a DecodeError if the token is invalid, which we catch so that we can respond to the client with a 401.
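A sketch of that check; AUDIENCE stands in for your server app's App ID URI, and get_public_key is the cached key lookup described next:

import jwt  # pip install pyjwt cryptography

def token_is_valid(self, token):
    try:
        jwt.decode(
            token,
            self.get_public_key(token),
            algorithms=["RS256"],
            audience=AUDIENCE,  # placeholder: your app ID URI
        )
    except jwt.DecodeError:
        return False
    return True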

The public key comes to us over the network, so we want to cache it rather than fetch it on every request. Microsoft suggests that it is safe to cache for about a day.
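A minimal sketch of the lookup, using the AUTH_CONFIG_URL setting from earlier (the method and cache-key names are mine):

import json
import jwt
import requests
from django.conf import settings
from django.core.cache import cache

def get_public_key(self, token):
    """Find the Azure signing key for this token, caching the key set for ~a day."""
    keys = cache.get("azure_ad_jwks")
    if keys is None:
        config = requests.get(settings.AUTH_CONFIG_URL).json()
        keys = requests.get(config["jwks_uri"]).json()["keys"]
        cache.set("azure_ad_jwks", keys, 60 * 60 * 24)
    # The token's header says which key signed it.
    kid = jwt.get_unverified_header(token)["kid"]
    jwk = next(key for key in keys if key["kid"] == kid)
    return jwt.algorithms.RSAAlgorithm.from_jwk(json.dumps(jwk))

Looking the key up by the token's kid header is what lets Azure rotate its keys underneath us without breaking anything.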

Now we're done! We can send a bearer access token to our API and be authenticated. Note that we're not doing any authorization, all we're doing here is authenticating. We haven't tied this client application to any form of role in our application. That's for another day :)

References:
  1. I placed the whole middleware file in a Gist
  2. Microsoft article on client credentials flow
  3. Microsoft advice on validating tokens

06 September 2018

Standardizing infra - LXD is not Docker

[Image: LXD logo, from the Ubuntu blog]
Selling a software package that is deployed to a customer data-centre can be challenging due to the diversity found in the physical infrastructure.

It is expensive to make (and test) adjustments to the software that allow it to run in a non-standard manner.

Maintaining a swarm of snowflakes is not a scalable business practice. More deployment variations means more documentation and more pairs of hands needed to manage them. Uniformity and automation are what keep our prices competitive.

Opposing the engineering team's desire for uniformity is the practical need to fit our solution into the customer's data-centre and budget. We can't define a uniform physical infrastructure that all of our customers must adhere to. Well, I suppose we could, but we would only have the very small number of customers who are willing and able to fit their data-centre around us.

We are therefore on the horns of a dilemma. Specifying standard physical hardware is impractical and limits the sales team. Allowing the sales team to adjust the infrastructure means that we have to tailor our platform to each deployment.

Virtualization is the rational response to this dilemma. We can have a standard virtualized infrastructure and work with the sales team and the customer to make sure that the hardware is able to support this. This way we'll know that our software is always deployed on the same platform.

LXD is a hypervisor for containers and is an expansion of LXC, the Linux container technology behind Docker. It's described by Stéphane Graber, an Ubuntu project engineer, as a "daemon exporting an authenticated representational state transfer application programming interface (REST API) both locally over a unix socket and over the network using https. There are then two clients for this daemon, one is an OpenStack plugin, the other a standalone command line tool."

It's not a replacement for Docker, even though Linux kernel containers are at the root of both technologies. So what is LXD for? An LXD container is an alternative to a virtual machine running in a traditional hypervisor.

With virtualization (or LXD containerization), if your customer has limited rack-space or a limited budget you can still sell your software based on a standard platform. You can take whatever metal is available and use LXD containers to partition the resources into separated "machines".

If you like, you can use Docker to manage the deployment of your software into these LXD containers. Docker and LXD are complementary!

In practical terms you can use tools like Ansible to automate the provisioning of your LXD containers on the customer's metal, as sketched below. You are able to define in code the infrastructure and platform that your software runs on. And that means your engineering team wins at automation and your sales team wins at fitting the software into the customer data-centre.
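As a taste of what that provisioning layer automates, here is a hand-run sketch with the LXD command line (the container name and resource limits are made up; Ansible drives the same operations):

# Initialize LXD on the host, then carve out a standard "machine".
lxd init --auto
lxc launch ubuntu:18.04 app01
lxc config set app01 limits.cpu 2
lxc config set app01 limits.memory 4GB
# Needed if you plan to run Docker inside the container:
lxc config set app01 security.nesting true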

15 May 2018

Is blockchain "web scale"?

For something to be truly awesome it must be "web scale", right? The rather excellent video below shows how hype can blind us to the real value of a technology. It's quite a famous trope in some circles and I think it's a useful parallel to the excitement about blockchain.

In it, an enthusiastic user of a new technology repeats marketing hype despite having no understanding of the technical concerns involved. Watch the video and you'll hear every piece of marketing spin that surrounds NoSQL databases.



I usually get very excited about new technologies, but I'm quite underwhelmed by blockchain. In the commercial context I see it as a great solution that is still looking for a problem to solve.

Lots of companies are looking for the problem that blockchain will solve. Facebook has a group dedicated to studying blockchain and how it can be used at Facebook.

They don't seem to have found a particularly novel use case for it and appear to be set on using it to launch a Facebook crypto-currency. This would let Facebook track your financial transactions (https://cheddar.com/videos/facebook-plans-to-create-its-own-cryptocurrency) which to me sounds like juicy information to sell to advertisers.

I'm convinced that when we finally find that problem blockchain will be a godsend, but right now, outside of cryptocurrency, I'm struggling to see how the cost and risk of blockchain are worthwhile in a commercial context.

A publicly shared database

Blockchain offers a distributed ledger system, which sounds like something we want, right? Everybody can look at the list of transactions at any time and verify that a particular transaction is genuine.

Using boring traditional data services you'd be forced to have a database and expose an API that lets authenticated users access it in ways that you directly control. How dull is that!

Like communism, decentralizing control of data is a great idea because it promotes equality and encourages distributed effort focused on the greater good. Capitalists and other realists will point out that it's not machine learning algorithms that will be the oil of the future, it's data. Facebook isn't worth billions because it has clever algorithms, but rather because it controls data about people. Data has value, are you sure you want to give it away?

Immutable

Broadly speaking, instead of your company having a private database that it controls access to, you get a shared database whose contents you can only control if you spend more on hardware than the rest of the world.

Wait, what? Your blockchain consultant didn't explain that anybody with a botnet is going to be able to rewrite your public ledger?

Well, it's true. The way that blockchain works means that the record is only immutable as long as no single actor controls a majority of the processing power in the network (the famous 51% attack).

If you have a private blockchain you're going to need to be certain that you can throw enough resources at it to prevent malicious actors from rewriting history. Or you could only allow trusted actors to use your blockchain, which sounds very much like a traditional access method.

Who will mine your blocks?

The most popular use for blockchain (by far) is to provide a ledger for currency, like Bitcoin and the other coins out there. When users mine the blockchain they are rewarded with coins, which they hope ultimately to convert into fiat currency or material goods and services. There is a direct incentive to spend money on electricity to crunch the numbers that verify Bitcoin transactions.

If you build your own private blockchain who is going to mine the blocks? What is the value in them doing so and are you going to end up mining the blocks yourself just to get the transactions processed? How is this cheaper than a traditional database?

Given the problems with immutability it should be clear that a private blockchain is a risky way to avoid boring traditional data-sharing approaches like an API. And of course that's assuming there will be people wanting to mine your blocks at all.

Digital smart contracts will replace lawyers

Blockchain lets you dispense with traditional signed contracts with suppliers in favour of digitally signed smart contracts. They're touted to replace lawyers (https://blockgeeks.com/guides/smart-contracts/).

Smart contracts cut out the middle-man when it comes to transferring goods and services. They're an alternative to conveyancing costs, credit-card processing fees, escrow costs, and so on. Essentially you place the asset into the trust of the computer program which will then decide where to transfer it depending on whatever conditions you program into it.

Instead of placing your house into escrow with a lawyer who is bound by professional conduct rules and has an insured lawyer's trust fund you can use a smart contract written by anybody. That's the democratizing power of blockchain!



Who can forget the Ethereum fork, which happened because of faulty code in a smart contract? I'm not nearly arrogant enough to assume that I can write tighter code than the guys and girls who created the DAO - are you willing to bet your house that you are?

I am horribly unfair

Maybe it's not fair to use a high-value asset as an example for considering a smart contract. What about the other use cases touted by blockchain evangelists?

Blockchain supposedly streamlines business decisions by eliminating back-and-forth decision making. A smart contract can simply have the business rules coded into it so instead of seeking approval from humans you can just rely on the contract to grant your request.

For example, if you need to make a business trip it can happen automatically just so long as the coded requirements are met. As long as the contract is coded to be aware of every factor that goes into the decision it can automatically approve your travel request.

That contract won't work for requesting a new monitor for your workstation though. You'll need a different contract for that. Or maybe you can extend the old contract and just add more rules to it?

Given the rate cards for blockchain developers, very simple administrative decisions can end up taking a lot more coding effort (and money) than they deserve. And what happens when a management decision needs judgment that hasn't been coded?

Banks are already using smart contracts

Indeed they are (https://www.barclayscorporate.com/insight-and-research/technology-and-digital-innovation/what-does-blockchain-do.html).

Barclays sees a future where identity theft is impossible because digital identity is immutable and publicly accessible for them to read and share with law enforcement. My financial records would be available to Barclays (and whoever else can read the blockchain) and they'd be able to make a decision about opening an account quicker than the time it currently takes (about an hour the last time I did it at the branch).

I wouldn't need to take my photo identity and proof of address documents to the bank, I would just need to show that I own the private key associated with the digital identity profile. This will prevent identity theft, according to Barclays, presumably because consumer computers are secure and digital keys can't be stolen.

Trade documents are given as a good example of how digital signing and identity can be accomplished with blockchain.

In blockchain how do you establish identity? The digital identity is established by ownership of a private key, but how do you link that to a physical entity? Surely you need some way to link the digital identity with the physical entity before you ship the goods that your smart contract says have been paid for.

How do I know that wallet address 0xaa8e0d3ded810704c4d8bc879a650aad50f36bc5 is actually Acme Inc trading out of London and not Pirates Ltd trading out of Antigua? Who is responsible for authenticating the digital identities in blockchain?

You can trust the blockchain (as long as the hash power is evenly distributed) but can you trust the digital identities on it?

Digital signing and identity can also be managed through public key cryptography, where a recognized and trusted central authority signs keys after verifying the owner's identity. This isn't a new arena, and blockchain doesn't solve the problems that public key cryptography has.

There are already established digital signing solutions that don't rely on snake-oil. I signed my rent agreement in the UK digitally with my landlords, who live in continental Europe. I hardly think this space is a raison d'être for blockchain.

Public, not private blockchains

It seems that my beef is with the impractical nature of private blockchains, where existing solutions are more secure and cheaper. So what about public blockchains, like Ethereum?

In a traditional blockchain each node on the network needs a full copy of the entire chain in order to be able to verify transactions. Without this public scrutiny the blockchain is no longer secure.

The problem with this is that Ethereum can only process a very limited number of transactions per second. Currently it runs at about 45 transactions per second, which isn't an awful lot once you share it out amongst your company and all the people speculatively trading Ethereum.

Ethereum is considering a sharding approach that will centralize the chain slightly in order to improve transaction speed. A few nodes on the network will have more authority than others. These nodes will need to trust each other explicitly, and obviously the rest of the network will need to follow suit.

As a company, do you want to commit to that level of trust? Who are the actors controlling these nodes? Who will they be in three years' time? Which country's laws are they bound by?

Data you put into the blockchain will be shared with everybody, forever. Governments don't like competition when it comes to spying on people and are passing increasingly strict privacy laws - how will you plan for compliance when you don't control your data?

Blockchain is web-scale!