add new post

2025-06-12 01:45:15 +02:00 · 2025-06-12 01:45:15 +02:00 · ae6163c100
commit ae6163c100
parent 4aab9f1242
3 changed files with 144 additions and 2 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,3 @@
 /public
 /resources
 /.compress_state
--- a/build.sh
+++ b/build.sh
@@ -1,4 +1,27 @@
 #!/usr/bin/env bash
-rm -rf public
+set -euo pipefail
-hugo  build --minify
+
 hugo build --minify
 incremental-compress \
  -dir public \
  -statedir .compress_state \
  -types html,css,js,json,xml,ico,svg,md,otf,woff,ttf,woff2,webmanifest \
  -zstd=false \
  -verbose
 rsync \
  --perms \
  --times \
  --update \
  --partial \
  --progress \
  --recursive \
  --checksum \
  --compress \
  --links \
  --delete-after \
  --owner \
  --usermap "*:bcarlin_net" \
  --group \
  --groupmap "*:bcarlin_net" \
  public/ root@192.168.1.25:/home/bcarlin_net/www
--- a/content/blog/007-prepare-for-the-next-internet-outage.md
+++ b/content/blog/007-prepare-for-the-next-internet-outage.md
@@ -0,0 +1,118 @@
 ---
 title: 'Prepare for the Next Internet Outage'
 slug: 'prepare-for-the-next-internet-outage'
 date: '2025-06-14T04:05:48+02:00'
 tags: [architecture, cloud]
 summary: >
  A reflection on recent internet outages and my takeaways to build more
  resilient web services.
 ---
 Last Thursday, [the Internet broke](https://mashable.com/article/google-down-cloudflare-twitch-character-ai-internet-outage).
 Again. Yes, the media turned a two-hour outage into a clickbait-friendly global
 crisis.
 What made this incident significant was not just the disruption of Google Cloud
 but the hundreds of websites and applications that went down at the same time.
 This included some major ones like Cloudflare, which uses GCP for some
 of its services. Because Cloudflare is a widespread CDN, cache and proxy, its
 failure created a domino effect and broke, in turn, countless websites.
 It reminds us of the fragile interconnectedness of our digital world. I don’t
 want to point fingers, but rather learn lessons from this incident. This wasn't
 just a random hiccup; it highlighted fundamental principles that, in the age of
 "everything as a service," we might have inadvertently overlooked.
 Here are my key takeaways.
 ## Do Not Put All Your Eggs in the Same Vendor Basket
 The cloud means infinite scaling, infinite storage, infinite compute power,
 infinite flexibility. It is built on the promise of reducing costs (which can be
 true when used correctly). However, this hides an often overlooked truth and
 the cloud's biggest risk: single-vendor dependency. The recent outage showed
 how the failure of a single vendor, or even of one component within its
 infrastructure, can have a cascading effect on a large number of services.
 Now, let’s add to the mix that
 [AWS, Azure and Google Cloud Platform have a combined market share of 63% in value](https://www.crn.com/news/cloud/2025/cloud-market-share-q1-2025-aws-dips-microsoft-and-google-show-growth?page=1&itc=refresh).
 Even if your business does not use these infrastructure providers directly,
 chances are that you use vendors who rely on them, or on vendors who might
 rely on them. Yes, chances are that your SaaS application is dependent on at
 least one of these vendors.
 **What you can do**:
 * *Map Your Dependencies*: Do you truly know all the services your core product
  relies on, directly and indirectly? Which IaaS, PaaS, APIs, CDN, and so on are
  you using? What are they, in turn, using? Do you rely on the npm registry to
  build your product? Is your app deployed with a GitHub Action? The more you
  know, the more you’re prepared.
 * *Vendor Due Diligence*: Uptime guarantees (3? 4? 5 nines?) are just marketing.
  Take them as such. What is your vendor’s architecture? Its continuity plan? Its
  transparency on incidents? Those are far more important criteria.
 * *Consider Multi-Cloud Strategies*: You would not put all your servers in the
  same datacenter? Then do not put all your infrastructure in the same IaaS
  provider! (If you would, you should do something about it!)
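
As an illustration of the dependency mapping suggested above, here is a minimal Python sketch. The service names in `DEPENDENCIES` are entirely hypothetical, and a real inventory would be more detailed, but the idea is the same: write down direct dependencies, then walk the graph to see who is exposed to a given vendor.

```python
from collections import deque

# Hypothetical dependency map: each key depends directly on the listed services.
# Every name here is illustrative and does not describe a real system.
DEPENDENCIES = {
    "my-app": ["auth-provider", "payment-api", "cdn"],
    "auth-provider": ["cloud-a"],
    "payment-api": ["cloud-b"],
    "cdn": ["cloud-a"],
    "cloud-a": [],
    "cloud-b": [],
}


def transitive_dependencies(service, graph):
    """Return every service that `service` relies on, directly or indirectly."""
    seen = set()
    queue = deque(graph.get(service, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited, also protects against cycles
        seen.add(current)
        queue.extend(graph.get(current, []))
    return seen


def impacted_by(vendor, graph):
    """Return the sorted names of services affected if `vendor` goes down."""
    return sorted(
        name for name in graph
        if vendor in transitive_dependencies(name, graph)
    )


if __name__ == "__main__":
    print(impacted_by("cloud-a", DEPENDENCIES))
```

With this sample map, an outage of `cloud-a` reaches `my-app` through both the authentication provider and the CDN, even though `my-app` never references `cloud-a` directly.
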
 ## Own Your Data, Own Your Business
 The cloud and API world we live in is great. It allows us to build fast, iterate
 quickly, test things and improve our solutions. You need authentication? Use
 Supabase or Auth0. Online payment? There is Stripe or PayPal. Transactional
 emails? SendGrid and Mailchimp. Search? Algolia. The list goes on, and it
 frees you to work on creating value.
 Yet, as the outage showed, if these services become unavailable, your users
 might be locked out, or your application might cease to function, regardless of
 your own infrastructure's health. This can lead to a significant loss of control
 over core business operations and data access. Third-party services ARE single
 points of failure!
 **What you can do**:
 * *Fallback Mechanisms for Core Services*: If a service becomes unavailable, how
  do you replace it? Can you develop an alternative to fall back on?
 * *Robust Data Mirroring*: Ensure you have regular, accessible backups of your
  critical data, even if it primarily resides with a third party. Can you
  restore it quickly to a different environment if needed?
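
A fallback mechanism can be as simple as an ordered list of providers. The sketch below is one possible shape, not a reference implementation: `primary_email_api` and `backup_smtp_relay` are hypothetical stand-ins for real API clients, and `ProviderError` is an exception type invented for the example.

```python
class ProviderError(Exception):
    """Raised by a provider when it cannot handle the request."""


def send_with_fallback(message, providers):
    """Try each (name, send) pair in order; return the name that succeeded."""
    errors = []
    for name, send in providers:
        try:
            send(message)
            return name
        except ProviderError as exc:
            # Remember why this provider failed, then try the next one.
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Hypothetical providers standing in for real API clients.
def primary_email_api(message):
    raise ProviderError("service unavailable")  # simulate an outage


def backup_smtp_relay(message):
    pass  # pretend the message was accepted


if __name__ == "__main__":
    used = send_with_fallback(
        "Your invoice is ready",
        [("primary", primary_email_api), ("backup", backup_smtp_relay)],
    )
    print(f"sent via {used}")
```

The important design choice is that the calling code never names a vendor: it only knows about an ordered list, so adding or swapping a provider does not touch the business logic.
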
 ## Build for Resilience
 Resilience has always been a consequence of redundancy. You should always have a
 backup system that can keep the service running while your main system is down.
 But it is not enough to just have redundancy. Your application should also be
 designed to be fault tolerant and to use all or part of the backup system when
 needed. At the very least, it should keep the impact on your users as small as
 possible: being unable to send an email should never block your whole
 application.
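
To make the email example concrete, here is a small Python sketch of graceful degradation. It assumes a hypothetical `send_email` callable that raises `EmailUnavailable` during an outage; both names are invented for the illustration. Registration succeeds either way, and the undelivered message waits in an outbox.

```python
from collections import deque


class EmailUnavailable(Exception):
    """Raised by the (hypothetical) email client when the provider is down."""


class Signup:
    def __init__(self, send_email):
        self.send_email = send_email  # callable(address, body)
        self.users = []
        self.outbox = deque()  # emails waiting for the provider to recover

    def register(self, address):
        """Create the account even if the welcome email cannot be sent."""
        self.users.append(address)
        try:
            self.send_email(address, "Welcome!")
        except EmailUnavailable:
            # Degrade gracefully: keep the email for later instead of failing.
            self.outbox.append((address, "Welcome!"))
        return True

    def flush_outbox(self):
        """Retry pending emails in order; stop at the first failure."""
        sent = 0
        while self.outbox:
            address, body = self.outbox[0]
            try:
                self.send_email(address, body)
            except EmailUnavailable:
                break
            self.outbox.popleft()
            sent += 1
        return sent
```

In a real application the outbox would live in durable storage and `flush_outbox` would run from a scheduled job, but the principle holds: the non-critical dependency fails quietly while the core flow keeps working.
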
 **What you can do**:
 * *Distributed Architectures*: Design your systems with principles like
  microservices. Deploy your services on several IaaS providers. Replicate
  critical data across several providers. The goal is to limit the impact of any
  single component failure.
 * *Self-Healing Systems*: Implement mechanisms that can automatically detect
  failures, reroute traffic, or restart services without human intervention. The
  quicker your system can react, the less impact an outage will have.
 * *Design for Failure*: Don't wait for an external event to expose your
  weaknesses. By then, it is too late. Add some automated failure tests to your CI
  pipeline: what if the client has a 5-second latency with your server? What if
  the database is unavailable? What if a payment cannot be processed right away?
  What is the user *experience* like when something goes wrong? Those issues
  WILL happen.
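
One common pattern for the self-healing point above is a circuit breaker: after a few consecutive failures, traffic goes straight to a fallback, and the primary is tried again after a cooldown. The Python sketch below is a simplified version under stated assumptions: the thresholds are arbitrary, and the `clock` parameter is added only so the behaviour can be tested without waiting.

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cooldown in seconds
        self.clock = clock                # injectable for tests
        self.failures = 0
        self.opened_at = None

    def is_open(self):
        """Return True while calls should bypass the primary."""
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Cooldown elapsed: close the circuit and try the primary again.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def call(self, primary, fallback):
        """Run `primary`, rerouting to `fallback` on failure or open circuit."""
        if self.is_open():
            return fallback()
        try:
            result = primary()
        except Exception:  # broad on purpose: any failure counts in this sketch
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result
```

The same injectable clock makes this easy to cover with the automated failure tests mentioned above: a test can simulate an outage, advance time, and check that traffic returns to the primary without any human intervention.
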
 ## Conclusion
 The next outage will come. That’s for sure. Maybe not as big, but there will be
 some that will affect your business.
 Be prepared:
 * Know your infrastructure, your vendors, their vendors, etc.
 * Assess risks on a regular basis. Your app evolves, your vendors too. What is
  true at one moment may not be at the next.
 * Plan for the worst case. Incidents will happen. Your job is to make it so that
  the user experience is not impacted.