Realist, not conformist analysis of the latest financial, business and political news

That Fastly Outage

From our Swindon Correspondent:

From CapX

Outages like that caused this week by Fastly are becoming an increasingly common online phenomenon – one notorious incident was caused when a Pakistani internet provider tried to block one YouTube video within the country, but instead blocked the site for hours to almost half the world.

No, they aren’t becoming increasingly common. What’s happening is that people notice them because the internet is so much bigger than it was 20 years ago.

Last year almost every site operated by Google was taken offline after a Fastly-type error. Each serves as a warning that the internet has become critical infrastructure – but has none of the protections given to most other infrastructure.

None of this is critical infrastructure. Not Google, YouTube, BBC, New York Times or the FT. The emergency services, and various comms about very serious government stuff, probably running on dedicated networks and who knows what, are critical infrastructure.

A large reason why you get all these services so cheaply and with so much functionality constantly added is that they aren’t critical infrastructure. Being 99.9% reliable is fine. No-one dies from lack of comical cat videos or tweets from Beyonce. So, risks can be taken. If Twitter had to be 99.9999% reliable, the cost would rocket up. You wouldn’t get rapid changes.
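To put rough numbers on that gap, here is a minimal sketch of the downtime each target allows. It assumes a plain 365-day year and treats availability as a simple percentage of wall-clock time; the function name is just for illustration.

```python
# Downtime budget implied by an availability target, over a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Seconds per year a service may be down and still meet the target."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.9999):
    seconds = downtime_seconds_per_year(target)
    print(f"{target}% allows about {seconds / 3600:.2f} hours "
          f"({seconds:.0f} seconds) of downtime per year")
```

On those assumptions, 99.9% leaves nearly nine hours a year of slack, while 99.9999% leaves about half a minute. Engineering for the second number is a very different, and much more expensive, job.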

The reality is the internet is held together with little more than spit, glue and hope. The protocols on which it runs were designed decades ago for a small network largely used by academic institutions and the occasional hobbyist. Efforts to rebuild or refashion those protocols for the modern era tend to move painfully slowly, a task we could roughly liken to trying to rebuild a spaceship while it’s actually travelling through the galaxy, with the added complication of needing every person onboard to approve every individual change.

This is where the author is really talking out of his arse, because none of these outages were caused by the protocols of the internet. They’re about code. The main protocols for the web, which are TCP/IP v4, HTTP, HTTPS and DNS, are all pretty solid.

And no, upgrading protocols doesn’t require universal approval. The internet is just a pair of computers talking a shared language. Our modern world is billions of those one-to-one conversations. And a computer can talk one language with one machine, and a different language with another machine. So some users can move onto the new hotness, and others remain on the old and busted. And that’s fine. At some point the others might catch up, and then you switch off the old one.
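As a toy illustration of that point, here is a hypothetical sketch in which each pair of machines simply settles on the newest language they both speak. The `negotiate` function and the preference list are made up for this example; real negotiation mechanisms are more involved, but the principle is the same.

```python
from typing import Optional, Set

# Illustrative preference order, newest first. The names are real HTTP
# versions, but this is a toy model, not how any real stack negotiates.
PREFERENCE = ["HTTP/3", "HTTP/2", "HTTP/1.1"]


def negotiate(client_supports: Set[str], server_supports: Set[str]) -> Optional[str]:
    """Return the newest protocol both sides support, or None if none is shared."""
    common = client_supports & server_supports
    for protocol in PREFERENCE:
        if protocol in common:
            return protocol
    return None


server = {"HTTP/3", "HTTP/2", "HTTP/1.1"}
new_client = {"HTTP/3", "HTTP/2", "HTTP/1.1"}
old_client = {"HTTP/1.1"}

print(negotiate(new_client, server))  # the new hotness
print(negotiate(old_client, server))  # the old and busted, still served
```

Nobody had to approve anything. The server keeps the old entry in its set until hardly anyone asks for it, then drops it.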
It is no surprise then, that progress is slow – and so people looking to offer good, reliable web services look to services like Fastly instead, re-centralising the internet and creating a few major points of failure.

Yes, it creates a few points of failure, but it’s also the case that you gain from someone being a specialist who can dedicate themselves to doing this one thing really well. Not only is it going to be cheaper, because software scales really well, but it’s generally going to be more robust and reliable. Running a load of mirror servers around the world, keeping them all patched and protected and monitoring them takes a lot of effort.

And the stakes are far higher than the occasional hour or so with minimal internet access. There are so many vulnerabilities in the architecture of the internet that a malicious actor looking to exploit them would have no shortage of targets.

A country wanting to launch a military operation against a neighbour could, for example, launch massive cyberattacks to take out much of the internet and cause a large-scale distraction. Others might take down the internet for fun, or for profit.

No, they can’t, because “the internet” doesn’t work like that. You can’t “take down the internet” because it’s a bunch of connected computers. It’s decentralised, by design.

The web going offline for an hour here or there is not, on its own, the biggest problem we have in the world this year. But we should be taking it as a warning that there are major problems that need fixing before they lead to disaster. The risk is that we’ll keep ignoring these, too.

There aren’t major problems with the internet itself. There might be risks at Amazon or eBay, but who are you going to get to address those? The sort of clowns in government ministries that fucked up the track and trace app and have inexcusable security holes, or the people at Amazon or eBay that have near 100% uptime already?
Boganboy
13 days ago

When I look at my white hair in the mirror, I can see an excuse (other than laziness) for my lack of knowledge about the internet. But I haven’t noticed any problems.

So since I mainly use it for silly comments like this one, I’d agree that we might as well leave well-enough alone. Of course the people trying to contact me through my mobile phone might disagree. But the problem there is that I usually leave it turned off.

Snarkus
13 days ago

I don’t think change is even that slow. Moving to UDP-based protocols for some services is underway, quietly, and has been tested for some years. Microsoft and FB, for instance. IIRC, the QUIC protocol. Much of the change is invisible to the end user. What has changed is that the failure of the manglement classes to understand the implications of technology is now much more visible. The foolishness of putting SCADA-based systems onto a public network and allowing email to inject code into internal systems is now obvious and, mostly, ignored. This is something the technical types pointed out after the Morris Worm.… Read more »

Michael van der Riet
13 days ago
Reply to  Snarkus

At last someone who sounds like he knows what he’s talking about. Do you have ten minutes to explain to the average lay moron like me why Apple is largely immune to malware, while MS and Android only have to look at it and they fall over?

Bloke on M4
12 days ago

Because Apple have great big levels of gatekeeping about putting things on the store, while Google don’t. Google are more like a liberal society, where we assume innocence until guilty. Both of these have their advantages and disadvantages. The safety of the app store comes at a price. Not only in $$$s for all that security, but also that Apple is more restrictive on what you can legitimately do with your iPhone. You want to run a browser engine on your phone that isn’t Apple’s? You can’t. Chrome, Edge, Firefox all run Safari’s engine underneath. You want to sell people… Read more »

Quentin Vole
12 days ago
Reply to  Snarkus

Quite so, Snarkus. I haven’t noticed any significant reduction in the rate that new RFCs are published during the 30-odd years I’ve been wrangling the Internets.

Bruce Schneier (pbuh) used to say that cyber-attacks were the equivalent of an invading army landing on the beach, fighting their way ashore, and then pushing into the front of the queue at the post office. He doesn’t say that now, but that’s because of the stupidity of putting critical stuff where it doesn’t belong.

Michael van der Riet
13 days ago

<blockquote>A country wanting to launch a military operation against a neighbour could, for example, launch massive cyberattacks to take out much of the internet and cause a large-scale distraction. Others might take down the internet for fun, or for profit.</blockquote>

I can see what he’s getting at. An hour without cat memes might very well destroy the civilised world.

The Mole
13 days ago

The structure of the internet is fine (not perfect, but fine); even if it was run as critical infrastructure, I doubt that it would be in any way significantly different. Fastly is a well-run business, and they have strong incentives to produce reliable software. From all accounts it was a simple human mistake generating a bug that slipped through the test process. Software is complex: you can’t test or reason over all combinations of events, and mistakes happen. The only problem is when bureaucrats and managers don’t understand that fact and try to deploy vital services purely on the web… Read more »

Arthur the Cat
13 days ago

“Yes, it creates a few points of failure, but it’s also the case that you gain from someone being a specialist who can dedicate themselves to doing this one thing really well.”

You’ve just made me realise that the Internet is Adam Smith’s pin factory on a global scale.

Spike
12 days ago
Reply to  Arthur the Cat

Not to mention that we learn from failures!

It’s not specific to the Internet or to technology that someone regards a collection of diverse individual transactions as a “system” and reacts to a failure with a systemic solution, the final word, to be adopted by all. Such an approach gets us an Internet much less robust than the current one; and one where Boganboy’s surfing will result in sales pitches on his cellphone.

Bloke on M4
12 days ago
Reply to  Arthur the Cat

“You’ve just made me realise that the Internet is Adam Smith’s pin factory on a global scale.” More generally, software is a lot about people making specific things and then someone turns that into a generic version. Like developers hand-crafted the 3d graphics stuff in games like Doom and Unreal. Then the people who made Unreal realised this would be generically useful and created the Unreal Engine that they sell to other game developers to use. There’s things I once coded a decade ago where I wouldn’t today because someone has a service and I just send off a request… Read more »

John B
11 days ago

‘A country wanting to launch a military operation against a neighbour could, for example, launch massive cyberattacks to take out much of the internet and cause a large-scale distraction.’

Much simpler and more effective to drop tinfoil onto HT lines and short out the electric grid.
