
What is Resilience?


Introduction

Resilience is one of the most important aspects of any system, and even more so in the cloud, where we have a true distributed system! And when we talk about a distributed system, resilience is shaped by the evergreen Fallacies of Distributed Computing!

This blog is a two-part series:

  1. Fallacies of Distributed Computing

  2. How are these addressed in a micro-service architecture?

Before we get into building resilient systems, we need to understand the word itself - Resilience!

What is Resilience?

The following are the definitions we come across when we look up the word Resilience:

The English Language

The capacity to recover quickly from difficulties; toughness

Computer Architecture

It is the ability to provide and maintain an acceptable level of service in the face of faults and challenges to normal operation

Micro-services

The ability of a microservice to keep functioning in spite of errors in the services it depends on

In other words, failure of one service should not result in the failure of the system as a whole.
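As a minimal sketch of this idea, the snippet below shows a caller that falls back to a default value when a dependency fails, instead of failing itself. The names `with_fallback`, `render_product_page` and `failing_recommendations` are hypothetical and exist only for illustration; a real service would likely catch narrower exceptions and log the failure.

```python
from typing import Callable, List


def with_fallback(call: Callable[[], List[str]], fallback: List[str]) -> List[str]:
    """Return the dependency's result, or the fallback if the call fails."""
    try:
        return call()
    except Exception:
        # The dependency is down: degrade gracefully instead of failing the caller.
        return fallback


def failing_recommendations() -> List[str]:
    # Hypothetical dependency that is currently unavailable.
    raise ConnectionError("recommendation service unreachable")


def render_product_page(fetch_recommendations: Callable[[], List[str]]) -> dict:
    # The page still renders, with an empty recommendation list,
    # even when the recommendation service cannot be reached.
    recommendations = with_fallback(fetch_recommendations, fallback=[])
    return {"product": "widget", "recommendations": recommendations}


page = render_product_page(failing_recommendations)
```

Here the product page loses one feature (recommendations) but the system as a whole keeps serving requests.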

Why is Resilience Important?

Well, this can be summarised by this beautiful line -

The oak fought the wind and was broken, the willow bent when it must and survived.
Jordan, R. (1994). The Fires of Heaven. 2nd ed. U.S: Tor Books

What this effectively means is that any system we build should not simply be hardened against failure; rather, it should be able to cope with the problems that arise and keep running. If we are only hardened against failure, we are like the oak tree! There could always be a wind strong enough to break us!

Most importantly, we can always try to reduce failures, but we will never be able to eliminate them. In the simplest case, a failure could be a burnt-out drive, and in the worst case it could be a burnt-down data center. We don't have control over either of these! Consequently, we should be able to handle the failures that can never be eliminated. This is why we have backup disks and Disaster Recovery (DR) centers (costing us the big bucks!)

When it comes to micro-services, the architecture by itself brings in a level of resilience! We have a highly decoupled system of cohesive services!

Enough has been said about micro-services that I don't need to elaborate on their usefulness (and, at times, pain!)

The most important aspect when it comes to Resilience, especially in Cloud and Distributed Architectures, is to ensure that the false assumptions listed in The Fallacies of Distributed Computing by Peter Deutsch and others (at the erstwhile Sun Microsystems) are addressed!

Fallacies of Distributed Computing

These fallacies mostly arise from the introduction of a network into the architecture or application. To build resilient applications, these fallacies must be addressed.

The fallacies are:

  1. Network is reliable - A network is never fully reliable! Packets get dropped, connections time out and hardware fails.

  2. Latency is zero - Latency can be reduced by optimising the network but can never be zero. A network packet takes time to move from one point to another, and this adds latency!

  3. Infinite bandwidth - There are physical and logical limitations on the bandwidth provided by the network.

  4. Network is secure - A chain is only as strong as its weakest link, or so they say! If networks were secure, we wouldn't have so many data leaks!

  5. Network topology is constant - We always need to assume that the network topology is going to change. A router on the way can go kaput, or a new firewall rule can be introduced! Only change is constant!

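  6. There is one administrator - Different parts of a network are usually run by different people, teams and organisations, each with their own rules!

  7. Transport cost is zero - Every network call has an associated cost!

  8. Network is homogeneous - Real networks are made up of different hardware, operating systems and protocols. In the real world and the digital world, we have our differences, and we need to ensure that we handle them!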
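To make the first two fallacies concrete, here is a minimal sketch of a caller that assumes the network can fail and that calls take time: it retries a failed call a few times, waiting longer between attempts (exponential backoff). `call_with_retry` and `FlakyService` are hypothetical names used only for illustration, and the attempt count and delays are arbitrary assumptions rather than recommended values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on connection/timeout errors with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the failure to the caller.
            # Double the wait after every failed attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


class FlakyService:
    """Simulated remote call that fails a fixed number of times, then succeeds."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def __call__(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated network failure")
        return "ok"


service = FlakyService(failures=2)
result = call_with_retry(service, attempts=3, base_delay=0.01)
```

Retries only make sense for operations that are safe to repeat. The `sleep` parameter is injectable so the waiting can be replaced in tests.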

A micro-service architecture brings in complexities in this area as we now have a highly distributed system!
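One way to see this added complexity is simple arithmetic. As a rough sketch, assume a request has to pass through several services in sequence, every one of them is required, and they fail independently of each other. Under those assumptions the individual availabilities multiply. The function name and the numbers below are illustrative only.

```python
import math
from typing import Iterable


def chain_availability(availabilities: Iterable[float]) -> float:
    """Availability of a request path that needs every service to be up.

    Assumes the services fail independently, so the probabilities multiply.
    """
    return math.prod(availabilities)


# Five services in a chain, each available 99.9% of the time.
combined = chain_availability([0.999] * 5)
```

With these numbers the combined availability comes out at roughly 99.5%, which is noticeably lower than that of any single service in the chain.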

We'll look at how these fallacies are (or can be) addressed in a micro-service architecture in the next blog!