What Is High Availability | Part 3

Shadow Operations

When a redundant component has failed and been repaired, one may wish to bring it back into active service and check that it works properly, but without its results being used. In this case, the inputs are processed by one or more components that are known to be reliable, and these produce the result used by the rest of the system. The same inputs are also processed by the reintroduced component, which is said to be in “shadow” mode. Its proper functioning can be verified by comparing its results with those produced by the reliable components. This method is often used in voting-based systems, because it is enough to exclude the component in “shadow” mode from the final vote.
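
As a rough illustration of shadow mode, the following Python sketch runs a repaired component alongside the trusted ones. The function names (majority_vote, run_with_shadow) and the modelling of components as plain functions are assumptions made for this example, not part of any particular product.

```python
from typing import Callable, Iterable, List, Tuple


def majority_vote(results: List[int]) -> int:
    """Return the most common result among the trusted components."""
    return max(set(results), key=results.count)


def run_with_shadow(
    inputs: Iterable[int],
    trusted: List[Callable[[int], int]],
    shadow: Callable[[int], int],
) -> Tuple[List[int], List[int]]:
    """Process inputs with the trusted components; run the shadow alongside.

    Only the trusted vote is returned as output. The shadow result is
    compared against it and mismatching inputs are recorded, never used.
    """
    outputs: List[int] = []
    mismatches: List[int] = []
    for x in inputs:
        voted = majority_vote([component(x) for component in trusted])
        outputs.append(voted)
        if shadow(x) != voted:
            mismatches.append(x)
    return outputs, mismatches
```

An empty mismatch list after a representative run suggests that the repaired component could be allowed back into the vote; how long to observe it before doing so is a policy decision this sketch does not cover.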

High Availability Cluster

A high availability system (or High Availability Cluster) is a computer system that is resilient to software and power failures, and whose purpose is to keep services available for as long as possible. A high availability cluster is a set of two or more machines that share a series of services and constantly monitor each other. Cloud hosting can be a good example of a high availability solution. High availability can be divided into two classes:

High availability of infrastructure: If hardware fails on one of the machines in the cluster, the high availability software can automatically start the services on any of the other machines in the cluster (failover). When the failed machine recovers, the services are migrated back to the original machine (failback). This automated recovery guarantees the high availability of the services offered by the cluster, minimizing the perception of failure on the part of users.

High availability of applications: If hardware or an application fails on any of the machines in the cluster, the high availability software can automatically start the failed services on any of the other machines in the cluster. When the failed machine recovers, the services are migrated back to the original machine. This automated recovery guarantees the integrity of the information, since no data is lost, and spares users any inconvenience, since they need not notice that there has been a problem.
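
The failover and failback behaviour described in both classes can be sketched as a toy model in Python. The Cluster class, its field names and the report method are hypothetical and only illustrate the idea; real cluster software also has to deal with shared storage, fencing and split-brain situations, which are left out here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    """Toy model of a cluster (hypothetical; not a real cluster manager)."""
    nodes: List[str]
    home: Dict[str, str]  # service name -> preferred (original) node
    healthy: Dict[str, bool] = field(default_factory=dict)
    placement: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every node starts healthy and every service runs on its home node.
        self.healthy = {node: True for node in self.nodes}
        self.placement = dict(self.home)

    def report(self, node: str, is_up: bool) -> None:
        """Record a monitoring result, then fail over or fail back."""
        self.healthy[node] = is_up
        for service, preferred in self.home.items():
            current = self.placement[service]
            if self.healthy[preferred]:
                # Failback: the original node is up, so run the service there.
                self.placement[service] = preferred
            elif not self.healthy[current]:
                # Failover: move to the first healthy node, if there is one.
                survivors = [n for n in self.nodes if self.healthy[n]]
                if survivors:
                    self.placement[service] = survivors[0]
```

For example, with nodes "a" and "b" and a service "web" whose home is "a", reporting "a" as down moves "web" to "b", and reporting "a" as up again moves it back.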

Do not confuse a high availability cluster with a high performance cluster. The latter is a configuration of machines designed to provide far greater computing capacity than the individual machines could provide alone (see, e.g., Beowulf-type cluster systems), while the former is designed to ensure the continued operation of certain applications.

Calculating Availability

In a real system, a component that fails is repaired or replaced by a new component. If the new component fails, it is replaced by another, and so on. A repaired component is considered to be in the same state as a new one. Over its lifetime, a component can be considered to be in one of two states: Running or Repair. The Running state indicates that the component is operational; the Repair state means that it has failed and has not yet been repaired or replaced by a new component.

Ensuring the availability of a service is increasingly necessary, but because many components of current information systems contain mechanical parts, their reliability is relatively poor for a critical service. To avoid interruptions of service, redundant hardware is often provided that is put into operation automatically when the components in use fail.

The more redundancy there is, the fewer single points of failure (SPOF) remain, and the lower the probability of disruptions in service. Until recently these systems were very expensive, which increased the demand for alternative solutions. This led to systems built from affordable hardware (clusters) that are highly scalable and low cost.

Fault tolerance is basically about having redundant hardware that goes into operation automatically once a major hardware failure is detected. Whichever solution is adopted, there are always two parameters that measure the degree of fault tolerance: MTBF (Mean Time Between Failures) and MTTR (Mean Time To Repair), the average time that elapses between the occurrence of a failure and the full recovery of the system to its operational state. The availability of a system can be calculated by the formula:

Availability = MTBF / (MTBF + MTTR)

When a defect occurs, the system goes from the running state to the repair state, and once repaired it returns to the running state. The system can therefore be said to have, over its lifetime, a mean time to failure (MTTF) and a mean time to repair (MTTR); the formula above uses MTBF for this average operating time between failures. The lifetime is a succession of MTTF and MTTR periods as the system fails and is repaired: the lifetime of the system is the sum of the MTTF + MTTR cycles it has already lived through.
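
To make the formula concrete, here is a small Python sketch. The function names and the choice of hours as the unit are assumptions for this example, and redundant_availability additionally assumes that redundant copies fail independently and that one working copy is enough, which the text does not state.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), as a fraction between 0 and 1."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float, hours_per_year: float = 8760.0) -> float:
    """Expected downtime in a 365-day year for a given availability."""
    return (1.0 - avail) * hours_per_year


def redundant_availability(avail: float, copies: int) -> float:
    """Availability of `copies` identical components in parallel.

    Assumes independent failures and that a single working copy
    keeps the service up (an idealisation).
    """
    return 1.0 - (1.0 - avail) ** copies
```

For instance, a component with an MTBF of 999 hours and an MTTR of 1 hour has an availability of 0.999, which corresponds to about 8.76 hours of downtime per year. Under the independence assumption, two redundant components that are each 99% available give a combined availability of 99.99%.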

Load Balancing

All hardware has its limits, and often the same service has to be spread over several machines to keep it from becoming congested. These solutions can specialize in small groups, each balancing a particular kind of load: CPU usage, storage, or network. Any of them brings in the concept of a cluster, or server farm, since the balancing will probably be done across multiple servers. In computer networking, load balancing is a technique for distributing the workload evenly between two or more computers, network links, CPUs, hard disks or other resources in order to optimize resource utilization, maximize performance, minimize response time and prevent overloading. Using multiple components with load balancing, instead of a single component, may also increase reliability through redundancy.
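
One simple way to distribute a workload evenly is round-robin assignment, sketched below in Python. This is only one possible policy, chosen here for illustration; the function name and the use of strings for requests and servers are assumptions.

```python
from itertools import cycle
from typing import Dict, Iterable, List


def round_robin(requests: Iterable[str], servers: List[str]) -> Dict[str, List[str]]:
    """Assign each request to the next server in rotation."""
    assignment: Dict[str, List[str]] = {server: [] for server in servers}
    rotation = cycle(servers)
    for request in requests:
        assignment[next(rotation)].append(request)
    return assignment
```

Round-robin ignores how expensive each request is and how busy each server already is, so it spreads requests evenly by count rather than by actual load.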

Network Balancing

Network balancing mainly consists of forwarding traffic over alternate routes to relieve congestion on the access to the servers. This balancing can occur at any layer of the OSI model.

Storage Balancing

Balancing the storage media enables access to file systems distributed across multiple disks (software or hardware RAID), with obvious gains in access times. These solutions can be dedicated or exist in each of the servers in the cluster.

CPU Balancing

This type of balancing is performed by distributed processing systems and basically consists of dividing the total processing load among the multiple processors in the system (whether local or remote).
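
Dividing a processing load among processors can be sketched as a greedy "least loaded first" assignment. The Python function below is a hypothetical illustration that assumes the cost of each job is known in advance; it only plans the split and does not execute anything.

```python
import heapq
from typing import List, Tuple


def assign_least_loaded(job_costs: List[float], processors: int) -> List[List[float]]:
    """Give each job to whichever processor currently has the least work."""
    # Heap of (current load, processor index); the smallest load pops first.
    heap: List[Tuple[float, int]] = [(0.0, p) for p in range(processors)]
    heapq.heapify(heap)
    plan: List[List[float]] = [[] for _ in range(processors)]
    for cost in sorted(job_costs, reverse=True):  # place the big jobs first
        load, p = heapq.heappop(heap)
        plan[p].append(cost)
        heapq.heappush(heap, (load + cost, p))
    return plan
```

Placing the largest jobs first tends to keep the processors' totals close to each other, although this greedy approach is a heuristic and does not always find the best possible split.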

Study: From Wikipedia, the free encyclopedia. The text is available under the Creative Commons.
