Compare commits
No commits in common. "d480ecb5cd0506e086da6409912eb37dad3d7964" and "4aab9f1242cb39296e9d157b91b142bee76a262a" have entirely different histories.
d480ecb5cd
...
4aab9f1242
7 changed files with 5 additions and 151 deletions
1
.gitignore
vendored
|
@ -1,3 +1,2 @@
|
||||||
/public
|
/public
|
||||||
/resources
|
/resources
|
||||||
/.compress_state
|
|
||||||
|
|
2
TODO.md
|
@ -1,2 +1,4 @@
|
||||||
- Add permalinks to sections
|
- Add permalinks to sections
|
||||||
|
- Add summary to articles
|
||||||
|
- Add open graph metadata
|
||||||
- Add json+ld metadata
|
- Add json+ld metadata
|
||||||
|
|
27
build.sh
|
@ -1,27 +1,4 @@
|
||||||
#!/usr/bin/env bash
|
#!/usr/bin/env bash
|
||||||
|
|
||||||
set -euo pipefail
|
rm -rf public
|
||||||
|
hugo build --minify
|
||||||
hugo build --minify
|
|
||||||
incremental-compress \
|
|
||||||
-dir public \
|
|
||||||
-statedir .compress_state \
|
|
||||||
-types html,css,js,json,xml,ico,svg,md,otf,woff,ttf,woff2,webmanifest \
|
|
||||||
-zstd=false \
|
|
||||||
-verbose
|
|
||||||
rsync \
|
|
||||||
--perms \
|
|
||||||
--times \
|
|
||||||
--update \
|
|
||||||
--partial \
|
|
||||||
--progress \
|
|
||||||
--recursive \
|
|
||||||
--checksum \
|
|
||||||
--compress \
|
|
||||||
--links \
|
|
||||||
--delete-after \
|
|
||||||
--owner \
|
|
||||||
--usermap "*:bcarlin_net" \
|
|
||||||
--group \
|
|
||||||
--groupmap "*:bcarlin_net" \
|
|
||||||
public/ root@192.168.1.25:/home/bcarlin_net/www
|
|
||||||
|
|
|
@ -1,118 +0,0 @@
|
||||||
---
|
|
||||||
title: 'Prepare for the Next Internet Outage'
|
|
||||||
slug: 'prepare-for-the-next-internet-outage'
|
|
||||||
date: '2025-06-14T04:05:48+02:00'
|
|
||||||
tags: [architecture, cloud]
|
|
||||||
summary: >
|
|
||||||
A reflection on recent internet outages and my takeaways to build more
|
|
||||||
resilient web services.
|
|
||||||
---
|
|
||||||
|
|
||||||
Last Thursday, [the Internet broke](https://mashable.com/article/google-down-cloudflare-twitch-character-ai-internet-outage).
|
|
||||||
Again. Yes, the media turned a two-hour outage into a clickbait-friendly global
|
|
||||||
crisis.
|
|
||||||
|
|
||||||
What made this incident significant was not just the disruption of Google Cloud
|
|
||||||
but the hundreds of websites and applications that went down at the same time.
|
|
||||||
This included some major ones like Cloudflare, which uses GCP for some
|
|
||||||
of its services. Because Cloudflare is a widely used CDN, cache and proxy, this created
|
|
||||||
a domino effect and broke, in turn, countless websites.
|
|
||||||
|
|
||||||
It reminds us of the fragile interconnectedness of our digital world. I don’t
|
|
||||||
want to point fingers, but rather learn lessons from this incident. This wasn't
|
|
||||||
just a random hiccup; it highlighted fundamental principles that, in the age of
|
|
||||||
"everything as a service," we might have inadvertently overlooked.
|
|
||||||
|
|
||||||
Here are my key takeaways.
|
|
||||||
|
|
||||||
## Do Not Put All Your Eggs in the Same Vendor Basket
|
|
||||||
|
|
||||||
The cloud means infinite scaling, infinite storage, infinite compute power,
|
|
||||||
infinite flexibility. It is built on the promise of reducing costs (which can be
|
|
||||||
true when used correctly). However, this hides an overlooked truth and its
|
|
||||||
biggest risk: single vendor dependency. The recent outage showed how a single
|
|
||||||
vendor outage, or even a component within their infrastructure, can have a
|
|
||||||
cascading effect on most services.
|
|
||||||
|
|
||||||
Now, let’s add to the mix that
|
|
||||||
[AWS, Azure and Google Cloud Platform have a combined market share of 63% in value](https://www.crn.com/news/cloud/2025/cloud-market-share-q1-2025-aws-dips-microsoft-and-google-show-growth?page=1&itc=refresh).
|
|
||||||
Even if your business does not use these infrastructure providers directly,
|
|
||||||
chances are that you use vendors who rely on them, or on vendors who might
|
|
||||||
rely on them. Yes, chances are that your SaaS application is dependent on at
|
|
||||||
least one of these vendors.
|
|
||||||
|
|
||||||
**What you can do**:
|
|
||||||
|
|
||||||
* *Map Your Dependencies*: Do you truly know all the services your core product
|
|
||||||
relies on, directly and indirectly? Which IaaS, PaaS, APIs, CDN, and so on are
|
|
||||||
you using? What are they, in turn, using? Do you rely on npm to build your
|
|
||||||
product? Is your app deployed with a GitHub Action? The more you know, the
|
|
||||||
more you’re prepared.
|
|
||||||
* *Vendor Due Diligence*: Uptime guarantees (3? 4? 5 nines?) are just marketing.
|
|
||||||
Take them as such. What is your vendor’s architecture? Its continuity plan? Its
|
|
||||||
transparency on incidents? Those are far more important criteria.
|
|
||||||
* *Consider Multi-Cloud Strategies*: You would not put all your servers in the
|
|
||||||
same datacenter? Then do not put all your infrastructure in the same IaaS
|
|
||||||
provider! (If you would, you should do something about it!)
|
|
||||||
|
|
||||||
## Own Your Data, Own Your Business
|
|
||||||
|
|
||||||
The cloud and API world we live in is great. It allows us to build fast, iterate
|
|
||||||
quickly, test things and improve our solutions. You need authentication? Use
|
|
||||||
Supabase or Auth0. Online payment? There is Stripe or PayPal. Transactional
|
|
||||||
emails? SendGrid and Mailchimp. Search? Algolia. The list can be long, but now,
|
|
||||||
you can work on creating value.
|
|
||||||
|
|
||||||
Yet, as the outage showed, if these services become unavailable, your users
|
|
||||||
might be locked out, or your application might cease to function, regardless of
|
|
||||||
your own infrastructure's health. This can lead to a significant loss of control
|
|
||||||
over core business operations and data access. Third-party services ARE single
|
|
||||||
points of failure!
|
|
||||||
|
|
||||||
**What you can do**:
|
|
||||||
|
|
||||||
* *Fallback Mechanisms for Core Services*: If a service becomes unavailable, how
|
|
||||||
do you replace it? Can you develop an alternative to fall back on?
|
|
||||||
* *Robust Data Mirroring*: Ensure you have regular, accessible backups of your
|
|
||||||
critical data, even if it primarily resides with a third-party. Can you
|
|
||||||
restore it quickly to a different environment if needed?
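
One possible shape for such a fallback, as a minimal sketch: try each vendor in order and move on to the next when one fails. The names `send_with_fallback`, `primary_provider`, and `secondary_provider` are hypothetical stand-ins rather than any real vendor's API, and a production version would catch narrower exceptions and add timeouts.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed."""


def send_with_fallback(
    providers: Sequence[Callable[[str, str], None]],
    recipient: str,
    body: str,
) -> int:
    """Try each provider in order; return the index of the one that worked."""
    errors = []
    for index, send in enumerate(providers):
        try:
            send(recipient, body)
            return index
        except Exception as exc:  # assumption: a real client catches narrower errors
            errors.append(exc)
    raise AllProvidersFailed(errors)


# Hypothetical stand-ins for two independent email vendors.
def primary_provider(recipient: str, body: str) -> None:
    raise ConnectionError("primary vendor is down")


sent = []


def secondary_provider(recipient: str, body: str) -> None:
    sent.append((recipient, body))


used = send_with_fallback(
    [primary_provider, secondary_provider], "user@example.com", "Welcome!"
)
```

Here the primary vendor is simulated as down, so the message goes out through the second one. The fallback only helps if the two vendors do not share the same underlying infrastructure.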
|
|
||||||
|
|
||||||
## Build for Resilience
|
|
||||||
|
|
||||||
Resilience has always been a consequence of redundancy. You should always have a
|
|
||||||
backup system that can keep the service running while your main system is down.
|
|
||||||
|
|
||||||
But it is not enough to just have redundancy. Your application should also be
|
|
||||||
designed to be fault tolerant and use all or part of the backup system when
|
|
||||||
needed. At the very least, it should keep the impact on your users as small as
|
|
||||||
possible: being unable to send an email should never block your whole
|
|
||||||
application.
|
|
||||||
|
|
||||||
**What you can do**:
|
|
||||||
|
|
||||||
* *Distributed Architectures*: Design your systems with principles like
|
|
||||||
microservices. Deploy your services on several IaaS providers. Replicate
|
|
||||||
critical data across several providers. The goal is to limit the impact of any
|
|
||||||
single component failure.
|
|
||||||
* *Self-Healing Systems*: Implement mechanisms that can automatically detect
|
|
||||||
failures, reroute traffic, or restart services without human intervention. The
|
|
||||||
quicker your system can react, the less impact an outage will have.
|
|
||||||
* *Design for failure*: Don't wait for an external event to expose your
|
|
||||||
weaknesses. It is too late. Add some automated failure tests to your CI
|
|
||||||
pipeline: what if the client has a 5 second latency with your server? What if
|
|
||||||
the database is unavailable? What if a payment cannot be processed right away?
|
|
||||||
What is the user *experience* like when something goes wrong? Those issues
|
|
||||||
WILL happen.
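
A failure test of this kind could look like the sketch below: a fake database simulates an outage, and the check verifies that the page degrades instead of crashing. `FlakyDatabase`, `render_home`, and the response fields are hypothetical names chosen for illustration; they assume an application that can serve an empty article list when its data store is unreachable.

```python
class DatabaseUnavailable(Exception):
    """Raised by the fake database while it simulates an outage."""


class FlakyDatabase:
    """Hypothetical test double that can be switched into an outage state."""

    def __init__(self, available: bool) -> None:
        self.available = available

    def fetch_articles(self) -> list:
        if not self.available:
            raise DatabaseUnavailable("database is unreachable")
        return ["Prepare for the Next Internet Outage"]


def render_home(db: FlakyDatabase) -> dict:
    """Return a degraded page instead of failing when the database is down."""
    try:
        return {"status": 200, "articles": db.fetch_articles(), "degraded": False}
    except DatabaseUnavailable:
        # Serve the page without articles and flag it, rather than erroring out.
        return {"status": 200, "articles": [], "degraded": True}


def test_home_survives_database_outage() -> None:
    page = render_home(FlakyDatabase(available=False))
    assert page["status"] == 200
    assert page["degraded"] is True
    assert page["articles"] == []


test_home_survives_database_outage()
```

Running a check like this in CI makes the degraded path something that is exercised on every change, not only during a real incident.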
|
|
||||||
|
|
||||||
## Conclusion
|
|
||||||
|
|
||||||
The next outage will come. That’s for sure. Maybe not as big, but there will be
|
|
||||||
some that will affect your business.
|
|
||||||
|
|
||||||
Be prepared:
|
|
||||||
|
|
||||||
* Know your infrastructure, your vendors, their vendors, etc.
|
|
||||||
* Assess risks on a regular basis. Your app evolves, your vendors too. What is
|
|
||||||
true at one moment is not at the next.
|
|
||||||
* Plan for the worst case. Incidents will happen. Your job is to make it so that
|
|
||||||
the user experience is not impacted.
|
|
|
@ -1,5 +1,5 @@
|
||||||
baseURL: 'https://bcarlin.net/'
|
baseURL: 'https://bcarlin.net/'
|
||||||
languageCode: 'en_US'
|
languageCode: 'fr-FR'
|
||||||
title: 'Bruno Carlin'
|
title: 'Bruno Carlin'
|
||||||
theme: ['bcarlin']
|
theme: ['bcarlin']
|
||||||
uglyUrls: true
|
uglyUrls: true
|
||||||
|
|
|
@ -20,10 +20,5 @@
|
||||||
<link rel="manifest" href="/site.webmanifest" />
|
<link rel="manifest" href="/site.webmanifest" />
|
||||||
|
|
||||||
<title>{{ if .IsHome }}{{ site.Title }}{{ else }}{{ printf "%s | %s" .Title site.Title }}{{ end }}</title>
|
<title>{{ if .IsHome }}{{ site.Title }}{{ else }}{{ printf "%s | %s" .Title site.Title }}{{ end }}</title>
|
||||||
<meta name="description" content="{{ .Description }}">
|
|
||||||
|
|
||||||
{{ template "_internal/opengraph.html" . }}
|
|
||||||
{{ template "_internal/schema.html" . }}
|
|
||||||
|
|
||||||
{{ partialCached "head/css.html" . }}
|
{{ partialCached "head/css.html" . }}
|
||||||
{{ partialCached "head/js.html" . }}
|
{{ partialCached "head/js.html" . }}
|
||||||
|
|
|
@ -14,4 +14,3 @@
|
||||||
{{- end }}
|
{{- end }}
|
||||||
{{- end }}
|
{{- end }}
|
||||||
{{- end }}
|
{{- end }}
|
||||||
<script defer data-domain="bcarlin.net" src="//stats.bcarlin.net/js/script.js"></script>
|
|
||||||
|
|