# Reliability

• Ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues

• Design Principles

* Test recovery procedures - Use automation to simulate different failures or to recreate scenarios that led to failures before
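* Automatically recover from failure - Monitor key performance indicators (KPIs) and trigger automation when a threshold is breached, so that failures can be anticipated and remediated before they occur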
* Scale horizontally to increase aggregate system availability - Distribute requests across multiple, smaller resources to ensure that they don't share a common point of failure
* Stop guessing capacity - Monitor demand and utilization, and add or remove resources automatically to maintain the optimal level to satisfy demand without over- or under-provisioning (for example, with Auto Scaling)
* Manage change in automation - Use automation to make changes to infrastructure, and track and review changes to the automation itself
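
Two of the principles above, scaling horizontally and not guessing capacity, can be illustrated with a little arithmetic. The Python sketch below is only an illustration under stated assumptions: `aggregate_availability` assumes resources fail independently and that any one healthy resource can serve requests, and `desired_capacity` uses a simple proportional rule that is not the actual algorithm used by AWS Auto Scaling. Both function names and all parameters are hypothetical.

```python
import math


def aggregate_availability(per_resource: float, count: int) -> float:
    """Availability of `count` redundant resources.

    Assumes failures are independent and that the system is up
    as long as at least one resource is healthy.
    """
    if not 0.0 <= per_resource <= 1.0:
        raise ValueError("per_resource must be between 0 and 1")
    if count < 1:
        raise ValueError("count must be at least 1")
    # The system is down only if every resource is down at the same time.
    return 1.0 - (1.0 - per_resource) ** count


def desired_capacity(current: int, metric: float, target: float,
                     minimum: int = 1, maximum: int = 10) -> int:
    """Resize a fleet so the average metric moves toward `target`.

    Hypothetical proportional rule: if the fleet averages 80% CPU and
    the target is 50%, grow the fleet by a factor of 80/50, rounded up.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    wanted = math.ceil(current * metric / target)
    # Clamp to the configured fleet bounds.
    return max(minimum, min(maximum, wanted))


if __name__ == "__main__":
    # Two resources at 99% each are far more available than one.
    print(aggregate_availability(0.99, 1))
    print(aggregate_availability(0.99, 2))
    # 4 instances at 80% average CPU, aiming for 50%.
    print(desired_capacity(4, 80.0, 50.0))
```

Under these assumptions, two resources at 99% availability each give roughly 99.99% in aggregate, and a fleet of 4 instances averaging 80% CPU against a 50% target grows to 7. Real failures are often correlated (for example, within one Availability Zone), so the independence assumption gives an upper bound rather than a guarantee.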

<figure><img src="/files/FL0VmpsvpvlGalNeCnAx" alt=""><figcaption></figcaption></figure>
