Availability refers to the readiness of a software system to carry out its operations when the need arises.
Availability of a system is closely related to its reliability. The more reliable a system is, the more available it is.
Another factor which modifies availability is the ability of a system to recover from faults. A system may be very reliable, but if it is unable to recover from complete or partial failures of its subsystems, it may not be able to guarantee availability. This aspect is called recovery.
The availability of a system can be defined as follows:
"Availability of a system is the degree to which the system is in a fully operable state to carry out its functionality when it is called or invoked at random."
Mathematically, this can be expressed as follows:
Availability = MTBF/(MTBF + MTTR)
Take a look at the following terms used in the preceding formula:
- MTBF: Mean time between failures
- MTTR: Mean time to repair
This is often called the mission capable rate of a system.
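As a quick illustration, the formula translates directly into code. Here is a minimal sketch in Python; the MTBF and MTTR figures used are made-up values for illustration:

```python
def availability(mtbf_hours, mttr_hours):
    """Return availability as a fraction, given MTBF and MTTR."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical figures: a system that fails once every 1,000 hours
# on average and takes 2 hours to repair.
print("Availability: {:.4%}".format(availability(1000, 2)))
# -> Availability: 99.8004%
```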
Techniques for Availability are closely tied to recovery techniques, since a system can never be 100% available. Instead, one needs to plan for faults and for strategies to recover from them, which directly determines the availability. These techniques can be classified as follows:
- Fault detection: The ability to detect faults and take action helps to avert situations where a system, or parts of it, becomes completely unavailable. Fault detection typically involves steps such as monitoring, heartbeat, and ping/echo messages, which are sent to the nodes in a system; the responses are measured to determine whether the nodes are alive, dead, or in the process of failing (see the sketch after this list).
- Fault recovery: Once a fault is detected, the next step is to prepare the system to recover from the fault and bring it to a state where the system can be considered available. Typical tactics used here include Hot/Warm Spares (Active/Passive redundancy), Rollback, Graceful Degradation, and Retry.
- Fault prevention: This approach uses active methods to anticipate and prevent faults from occurring so that the system does not have a chance to go to recovery.
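The following is a minimal sketch of the ping/echo tactic combined with the Retry tactic, using only the Python standard library. The node addresses, port, and thresholds are hypothetical:

```python
import socket

# Hypothetical node addresses; a real system would load these
# from service discovery or configuration.
NODES = [("10.0.0.1", 9000), ("10.0.0.2", 9000)]

def is_alive(host, port, timeout=2.0, retries=3):
    """Ping/echo check: a node is considered alive if it accepts a
    TCP connection within the timeout. Retry before declaring it
    dead (the Retry tactic), to avoid flagging transient failures."""
    for _ in range(retries):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            continue
    return False

for host, port in NODES:
    status = "alive" if is_alive(host, port) else "failing/dead"
    print("{}:{} is {}".format(host, port, status))
```

In production, such a polling loop would typically be replaced by a dedicated monitoring system, but the alive/dead decision logic remains the same.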
Availability of a system is closely tied to the consistency of its data via the CAP theorem, which places a theoretical limit on the trade-offs a system can make between consistency and availability in the event of a network partition. The CAP theorem states that, when a partition occurs, a distributed system must choose between remaining consistent and remaining available. This typically leads to two broad types of systems, namely, CP (consistent and tolerant to network partitions) and AP (available and tolerant to network partitions).
Availability is also tied to the system's scalability tactics, performance metrics, and its security. For example, a system that is highly horizontally scalable tends to have very high availability, since horizontal scaling allows the load balancer to detect inactive nodes and take them out of the configuration quickly.
A system which, instead, tries to scale up (vertically) may have to monitor its performance metrics carefully. The system may have availability issues even when the node on which it runs is fully available, if its software processes are starved of system resources such as CPU time or memory. This is where performance measurements become critical, and the system's load factor needs to be monitored and optimized.
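As a sketch of such monitoring, the third-party psutil package (an assumption here; installable via pip install psutil) can sample CPU and memory pressure. The 80% thresholds below are arbitrary examples:

```python
import psutil  # third-party: pip install psutil

# Arbitrary example thresholds; tune these for your workload.
CPU_LIMIT = 80.0   # percent
MEM_LIMIT = 80.0   # percent

cpu = psutil.cpu_percent(interval=1)   # sample CPU usage over 1 second
mem = psutil.virtual_memory().percent  # current memory utilization

if cpu > CPU_LIMIT or mem > MEM_LIMIT:
    print("Resource pressure: CPU {:.1f}%, memory {:.1f}%".format(cpu, mem))
else:
    print("Load normal: CPU {:.1f}%, memory {:.1f}%".format(cpu, mem))
```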
With the increasing popularity of web applications and distributed computing, security is also an aspect that affects availability. It is possible for a malicious hacker to launch remote denial-of-service attacks on your servers, and if the system is not hardened against such attacks, it can become unavailable or only partially available.
Security, in the software domain, can be defined as the degree to which a system can avoid damage to its data and logic from unauthenticated access, while continuing to provide services to other systems and roles that are properly authenticated.
A security crisis or attack occurs when a system is intentionally compromised in order to gain illegal access to it, with the aim of compromising its services, copying or modifying its data, or denying access to its legitimate users.
In modern software systems, the users are tied to specific roles which have exclusive rights to different parts of the system. For example, a typical web application with a database may define the following roles:
- user: End user of the system with login and access to his/her private data
- dbadmin: Database administrator, who can view, modify, or delete all database data
- reports: Report admin, who has admin rights only to those parts of the database and code that deal with report generation
- admin: Superuser, who has edit rights to the complete system
This way of allocating system control via user roles is called access control. Access control works by associating a user role with certain system privileges, thereby decoupling the actual user login from the rights granted by these privileges.
This principle is the Authorization technique of security.
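The following is a minimal sketch of this decoupling in Python, using the roles listed above; the privilege names are hypothetical, and a real system would back this with a database and a proper authentication layer:

```python
# Hypothetical role -> privileges mapping, mirroring the roles above.
ROLE_PRIVILEGES = {
    "user":    {"read_own_data"},
    "reports": {"read_reports", "edit_reports"},
    "dbadmin": {"read_db", "edit_db", "delete_db"},
    "admin":   {"read_own_data", "read_reports", "edit_reports",
                "read_db", "edit_db", "delete_db", "edit_system"},
}

def has_privilege(role, privilege):
    """Access control check: the user's login never appears here;
    only the role it maps to matters."""
    return privilege in ROLE_PRIVILEGES.get(role, set())

# A login would be resolved to a role elsewhere; here we check the role.
assert has_privilege("dbadmin", "delete_db")
assert not has_privilege("user", "edit_system")
```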
Another aspect of security concerns transactions, where each party must validate the actual identity of the other. Public key cryptography, message signing, and so on are common techniques used here. For example, when you sign an e-mail with your GPG or PGP key, you are validating yourself ("the sender of this message is really me, Mr. A") to your friend Mr. B on the other side of the e-mail. This principle is the Authentication technique of security.
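The following sketch illustrates the same principle with HMAC from the Python standard library. Note that HMAC is a symmetric, shared-secret stand-in for the public-key signing (GPG/PGP) described above, not the same mechanism; the shared secret used here is hypothetical:

```python
import hmac
import hashlib

# Shared secret between the two parties. With public-key signing,
# this would instead be the sender's private key.
SECRET = b"hypothetical-shared-secret"

def sign(message: bytes) -> str:
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def verify(message: bytes, signature: str) -> bool:
    # compare_digest avoids timing attacks on the comparison.
    return hmac.compare_digest(sign(message), signature)

msg = b"The sender of this message is really me, Mr. A"
tag = sign(msg)
assert verify(msg, tag)                      # authentic
assert not verify(b"tampered message", tag)  # rejected
```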
The other aspects of security are as follows:
- Integrity: These techniques are used to ensure that data or information is not tampered with in any way on its way to the end user. Examples are message hashing, CRC checksums, and others.
- Origin: These techniques are used to assure the end receiver that the origin of the data is exactly what it purports to be. Examples of this are SPF and Sender-ID (for e-mail), and Public Key Certificates and Chains (for websites using SSL), among others.
- Authenticity: These are the techniques which combine both the Integrity and the Origin of a message. They ensure that the author of a message cannot deny its contents or its origin. This typically uses Digital Certificate Mechanisms.
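As a minimal sketch of the message-hashing technique named under Integrity, the Python standard library's hashlib suffices; the payload here is a made-up example:

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 digest, sent alongside the data so the receiver can
    detect accidental or malicious modification in transit. This is
    integrity only: an attacker who can alter the data could also
    recompute the digest, which is why Origin and Authenticity
    techniques exist on top of plain hashing."""
    return hashlib.sha256(data).hexdigest()

payload = b"quarterly report contents"
checksum = digest(payload)

# On the receiving end:
assert digest(payload) == checksum               # intact
assert digest(b"altered contents") != checksum   # tampering detected
```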
Deployability is one of those quality attributes which are not fundamental to the software. However, in this book, we are interested in this aspect because it plays a critical role in the ecosystem of the Python programming language and its usefulness to the programmer.
Deployability is the degree of ease with which software can be taken from the development to the production environment. It is more of a function of the technical environment, module structures, and programming runtime/languages used in building a system, and has nothing to do with the actual logic or code of the system.
The following are some factors that determine deployability:
- Module structures: If your system has its code organized into well-defined modules/projects which compartmentalize the system into easily deployable subunits, deployment is much easier (a minimal packaging sketch follows this list). On the other hand, if the code is organized into a monolithic project with a single setup step, it is hard to deploy the code onto a multiple-node cluster.
- Production versus development environment: Having a production environment which is very similar to the structure of the development environment makes deployment an easy task. When the environments are similar, the same set of scripts and toolchains that are used by the developers/DevOps team can be used to deploy the system to a development server as well as a production server with minor changes—mostly in the configuration.
- Development ecosystem support: Having mature tool-chain support for your system runtime, which allows configurations such as dependencies to be automatically established and satisfied, increases deployability. Programming languages such as Python are rich in this kind of support, with a wide array of tools available for the DevOps professional to take advantage of.
- Standardized configuration: It is a good idea to keep your configuration structures (files, database tables, and others) the same for both developer and production environments. The actual objects or filenames can be different, but if the configuration structures vary widely across both the environments, deployability decreases, as extra work is required to map the configuration of the environment to its structures.
- Standardized infrastructure: It is a well-known fact that keeping your deployments to a homogeneous or standardized set of infrastructure greatly aids deployability. For example, if you standardize your frontend application to run on a 4 GB RAM, Debian-based, 64-bit Linux VPS, then it is easy to automate deployment of such nodes—either using a script, or by using elastic compute approaches of providers such as Amazon—and to keep a standard set of scripts across both development and production environments. On the other hand, if your production deployment consists of heterogeneous infrastructure, say, a mix of Windows and Linux servers with varying capacities and resource specifications, the work typically doubles for each type of infrastructure, decreasing deployability.
- Use of containers: The use of container software, popularized by the advent of Docker (built on top of Linux containers) and complemented by tools such as Vagrant for reproducible environments, has become a recent trend in deploying software on servers. Containers allow you to standardize your software and make deployment easier by reducing the overhead required to start and stop nodes, since a container does not carry the overhead of a full virtual machine. This is an interesting trend to watch.
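As a small Python illustration of the module-structure point above, a project packaged with setuptools can be built and installed as a single versioned unit across environments. The project name, version, and dependency below are hypothetical:

```python
# setup.py -- hypothetical project metadata. Packaging code into
# well-defined, versioned units like this makes it easy to build,
# ship, and install the same artifact across environments.
from setuptools import setup, find_packages

setup(
    name="myservice",              # hypothetical project name
    version="1.0.0",
    packages=find_packages(),      # pick up all subpackages automatically
    install_requires=[
        "requests>=2.0",           # dependencies resolved at install time
    ],
)
```

With such a setup.py in place, the same pip install step works on a developer laptop and a production node alike, which is precisely the environment parity the earlier points describe.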