Which disk arrangement is fault tolerant and uses half of your disk space for fault tolerance?

Analyzing your Security Information with FortiAnalyzer

Kenneth Tam, ... Josh More, in UTM Security with Fortinet, 2013

File System Configuration

All FortiAnalyzers larger than the 100 series support multiple disk drives, which may be configured for various levels of redundancy via RAID. In keeping with the constraints of RAID itself, devices with fewer disks are limited in the RAID types they can support. For example, a dual-disk device such as the 400B is limited to simple striping (RAID-0) or mirroring (RAID-1). These two RAID levels are implemented in software; for reasons of speed, more complex RAID levels are implemented in hardware. So long as a sufficient number of drives are installed, all RAID options are supported.
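As a rough illustration of how disk count constrains the available RAID levels, the hypothetical sketch below maps a drive count to the standard levels that are mechanically possible; the function and the exact level list are assumptions, not documented FortiAnalyzer behavior.

# Hypothetical sketch: which standard RAID levels are mechanically possible
# for a given number of drives. Real devices may restrict this further
# (e.g., software vs. hardware RAID support).
def possible_raid_levels(num_disks: int) -> list[str]:
    levels = []
    if num_disks >= 2:
        levels.append("RAID-0")          # striping (no redundancy)
        levels.append("RAID-1")          # mirroring
    if num_disks >= 3:
        levels.append("RAID-5")          # striping with single parity
    if num_disks >= 4:
        levels += ["RAID-6", "RAID-10"]  # dual parity / mirrored stripes
    return levels

print(possible_raid_levels(2))  # a dual-disk unit: ['RAID-0', 'RAID-1']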

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9781597497473000089

State of the Art on Technology and Practices for Improving the Energy Efficiency of Data Storage

Marcos Dias de Assunção, Laurent Lefèvre, in Advances in Computers, 2012

4.1 Disk Arrays and MAIDs

A disk array is a storage system that contains multiple disk drives. It can be Just a Bunch of Disks (JBOD), in which case the controller is an external module that interfaces with the array. Several current storage arrays use Switched Bunch of Disks (SBOD) or Extended Bunch of Disks (EBOD), which give better response times. An array solution generally comprises controllers, which distinguish arrays from plain disk enclosures by providing cache memory and advanced features such as RAID. Common components of a disk array include:

Array controllers: devices that manage the physical disk drives and present them to the servers as logical units. Usually a controller contains additional disk cache and implements hardware level RAID.

Cache memories: as described above, an array can contain additional cache memories for improving the performance of read and write operations.

Disk enclosures: an array contains a number of disk drives, such as HDDs and SSDs, and can contain a mix of different drive types. The size of the disk enclosures depends on the form factor used (e.g., 2.5-in. or 3.5-in. hard disk drives).

Power supplies: a disk array can contain multiple power supplies in order to increase its reliability in case one of the supplies fails.

Although disk arrays can be directly attached to servers through a variety of interfaces, they are often part of a more sophisticated storage system such as network-attached storage or a storage area network, described later.
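As a minimal illustration, the components described above can be captured in a small data structure; the field names and values are illustrative assumptions, not a vendor schema.

from dataclasses import dataclass, field

@dataclass
class DiskArray:
    # Components described above: controllers, cache, drive enclosure, power supplies.
    controllers: int = 2          # array controllers presenting logical units
    cache_gb: int = 16            # controller cache for read/write acceleration
    drives: list = field(default_factory=list)   # a mix of drive types is possible
    power_supplies: int = 2       # redundant supplies for reliability

array = DiskArray(drives=["HDD-3.5in"] * 12 + ["SSD-2.5in"] * 4)
print(len(array.drives), "drives,", array.power_supplies, "power supplies")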

As mentioned earlier, in order to improve their reliability and fault tolerance, disk arrays are commonly equipped with multiple power supplies. It is important that these supplies be power efficient and meet a minimum power factor. Furthermore, the disk drives are the most power-consuming elements in the array, so it is crucial to choose drives that are efficient and provide features that can minimize power consumption under the expected workload. For example, data archives can be made more energy efficient by using disks with large storage capacity, while this is often not the case for high-I/O applications. The RAID level also affects the energy efficiency of a storage system, since drives used for protection are not used to retrieve data but consume energy like the other drives. As an example, Table VII shows different RAID levels and their storage efficiency [25].

Table VII. RAID types and efficiency [25]

RAID level        Storage efficiency*
RAID 1            50%
RAID 5 (3 + 1)    75%
RAID 6 (6 + 2)    75%
RAID 5 (7 + 1)    87.5%
RAID 6 (14 + 2)   87.5%

*Storage efficiency here means the percentage of the disk capacity that is made available for actual data storage.
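The efficiencies in Table VII follow directly from the ratio of data disks to total disks; a short sketch of that arithmetic:

# Storage efficiency = data disks / total disks (parity or mirror disks are overhead).
def storage_efficiency(data_disks: int, redundancy_disks: int) -> float:
    return data_disks / (data_disks + redundancy_disks)

print(storage_efficiency(1, 1))    # RAID 1 (mirror)   -> 0.5
print(storage_efficiency(3, 1))    # RAID 5 (3 + 1)    -> 0.75
print(storage_efficiency(6, 2))    # RAID 6 (6 + 2)    -> 0.75
print(storage_efficiency(7, 1))    # RAID 5 (7 + 1)    -> 0.875
print(storage_efficiency(14, 2))   # RAID 6 (14 + 2)   -> 0.875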

As discussed earlier, it is important that the power supplies of storage arrays be power efficient. Properly sized power supplies benefit systems in both idle and active modes. Furthermore, it is worth working closely with the provider of storage equipment to choose solutions that suit the expected workload and have been designed with energy efficiency in mind. Disk arrays that use (i) disks with variable speeds, (ii) disks with spin-down features, and (iii) mixed storage can help minimize the energy consumed by the storage subsystem and reduce costs.

The efficiency of many power-saving features often depends on the workload; hence the importance of working closely with providers of data storage solutions. For example, as described in the next section, current MAID technology can lead to savings of up to 70% [26]. The energy savings are generally most substantial when MAID technology is applied to near-line storage, where the storage resources can remain idle for long periods of time.

4.1.1 Options for Improvement of Energy and Cost Efficiency

MAID is a technology that uses a combination of cache memory and idle disks to service requests, only spinning up disks as required [12]. Stopping spindle rotation on less frequently accessed disk drives can reduce power consumption (see Fig. 3). Manufacturers such as Fujitsu allow customers to specify schedules with periods during which the drives should be spun down (or powered off) according to the workload or backup policies. Fujitsu also employs a technique in which drives are not all spun up at the same time, to avoid peaks in power usage. These techniques come in handy for solutions targeted at backup and archival, since the drives can be spun down when backup operations are not taking place.


Fig. 3. Pictorial view of MAID [27].

How much power MAID features can save depends on the application that uses the disks and how often it accesses them. As discussed earlier, EMC reports savings of up to 30% in power usage in a fully loaded CLARiiON CX4-960 environment if more than 50% of the data is infrequently accessed [28]. The criteria used to decide when drives are spun down (or put into standby mode) or spun up also have an impact on energy savings as well as on performance. As an example of standby criteria, in EMC's FLARE system [28], hard disk drives of a RAID group enter standby mode when both the storage system and its processors report that the drives have not been used for 30 min. A similar threshold is used by Fujitsu's ECO mode, which by default starts after 30 min of no disk access. ECO mode also allows the administrator to specify operation periods during which the motors of hard disk drives should not stop.
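A minimal sketch of such an idle-threshold standby policy (the 30-minute default mirrors the EMC and Fujitsu examples above; the function and its inputs are illustrative assumptions):

import time

IDLE_THRESHOLD_S = 30 * 60   # e.g., 30 minutes without access before standby

def should_enter_standby(last_access_ts, now=None, threshold_s=IDLE_THRESHOLD_S):
    # Return True if the RAID group has been idle long enough to spin down.
    now = time.time() if now is None else now
    return (now - last_access_ts) >= threshold_s

# Example: a drive last touched 45 minutes ago would be put into standby.
print(should_enter_standby(last_access_ts=time.time() - 45 * 60))  # True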

When initially conceived, MAID techniques enabled HDDs to be either on or off, which could incur considerable application performance penalties if data on a spun-down drive was required and the disk had to be spun back up. MAID techniques are said to have reached their second generation, implementing Intelligent Power Management (IPM) with different power-saving modes and performance levels [29]. Examples of MAID 2.0 are Nexsan's Assureon, SATABoy, and SATABeast solutions, which implement intelligent power management with Nexsan's AutoMAID technology. AutoMAID has multiple power-saving modes that align power consumption with different quality-of-service needs, and the user can configure the trade-off between response times and power savings. Nexsan claims that by enforcing appropriate policies to determine the required level of access speed and MAID levels, a reduction of up to 70% in power requirements can be achieved [30]. The typical MAID-level configuration settings of AutoMAID are as follows (a policy sketch follows the list):

Level 0: Normal operation; drives at 7,200 rpm, heads loaded.

Level 1: Hard disk drive heads are unloaded. Sub-second recovery time.

Level 2: Heads are unloaded and platters slow to 4,000 rpm. 15-s recovery time.

Level 3: Hard disk drives stop spinning (sleep mode; still powered on). 30–45 s recovery time.
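The trade-off between recovery time and power savings can be expressed as a simple policy table. The sketch below is an illustrative assumption about how accumulated idle time might be mapped onto a MAID-style power level, not Nexsan's actual algorithm; the idle thresholds are invented, while the recovery times follow the levels listed above.

# Illustrative mapping of accumulated idle time to a MAID-style power level.
LEVELS = [
    # (min idle seconds, level, description,                        approx. recovery)
    (0,        0, "heads loaded, full speed",                        "none"),
    (15 * 60,  1, "heads unloaded",                                  "sub-second"),
    (30 * 60,  2, "heads unloaded, platters slowed",                 "~15 s"),
    (60 * 60,  3, "spindle stopped (sleep mode, still powered on)",  "30-45 s"),
]

def maid_level(idle_seconds: float) -> int:
    level = 0
    for min_idle, lvl, _desc, _recovery in LEVELS:
        if idle_seconds >= min_idle:
            level = lvl
    return level

print(maid_level(40 * 60))   # after 40 idle minutes -> level 2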

Other power conservation techniques for disk arrays have been proposed, such as Popular Data Concentration (PDC) [31] and file allocation mechanisms [9]. The rationale is to consolidate by migrating frequently accessed data to a subset of the disks. By skewing the load toward fewer disks, the others can be transitioned to low-power modes. It has been found that a substantial amount of energy can be conserved during periods of light server load as long as two-speed (or variable-speed) disks are used.
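A minimal sketch of the consolidation idea behind PDC, assuming per-block access counters and a fixed hot/cold split (both are illustrative assumptions rather than the published algorithm):

# Popular Data Concentration (sketch): keep the most frequently accessed
# blocks on a small "active" subset of disks so the remaining disks see
# little traffic and can be transitioned to low-power modes.
def concentrate(block_access_counts, num_disks, active_disks, hot_fraction=0.5):
    hot_first = sorted(block_access_counts, key=block_access_counts.get, reverse=True)
    hot_cutoff = int(len(hot_first) * hot_fraction)   # assumed hot/cold split
    placement = {}
    for i, block in enumerate(hot_first):
        if i < hot_cutoff:
            placement[block] = i % active_disks                               # hot data on busy disks
        else:
            placement[block] = active_disks + i % (num_disks - active_disks)  # cold data elsewhere
    return placement

counts = {"a": 90, "b": 75, "c": 3, "d": 1}
print(concentrate(counts, num_disks=4, active_disks=2))
# {'a': 0, 'b': 1, 'c': 2, 'd': 3} -- disks 2 and 3 stay mostly idle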

Another important issue is scalability. When choosing storage solutions, a recommended practice is to employ systems that allow further storage bays to be added as storage demand grows [25]. Hence, it is important to size the system for the intended workload and then scale out using small storage bays to reduce potential inefficiencies.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780123965288000043

Storage Networks

Gary Lee, in Cloud Networking, 2014

Storage area networks

SANs are specialized networks used in the data center that are dedicated to storage traffic. The most popular SAN standard today is Fibre Channel (FC), which we will describe in more detail later in this chapter. FC requires the use of host bus adapters (HBAs), which are similar to NICs and reside on the server, along with storage arrays that have FC interfaces. One or more FC switches are used to interconnect multiple servers to multiple storage arrays, as shown in Figure 8.3.


Figure 8.3. Simple Fibre Channel storage area network.

Storage arrays are common in data centers and consist of multiple disk drives within a single chassis. These storage arrays can be built using either SATA or SAS drives connected to an FC interface through a switch on a storage controller board. We will provide more information on this later in the chapter. SAN storage is sometimes called block storage because the server applications deal with storage at the block level, where each block is part of a larger file or database.
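To illustrate what "block level" means here, a toy sketch that addresses storage by logical block number rather than by file name (purely illustrative; a real SAN presents blocks over SCSI/FC rather than a Python object):

BLOCK_SIZE = 4096   # bytes per block (assumed)

class BlockDevice:
    # Toy block device: the application addresses data by block number,
    # not by file name; a file system or database maps files onto blocks.
    def __init__(self, num_blocks: int):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks

    def read_block(self, lba: int) -> bytes:
        return self.blocks[lba]

    def write_block(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data.ljust(BLOCK_SIZE, b"\x00")[:BLOCK_SIZE]

dev = BlockDevice(num_blocks=1024)
dev.write_block(7, b"part of a larger file or database")
print(dev.read_block(7).rstrip(b"\x00"))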

Storage network administrators are extremely concerned about reliability and security. By having a dedicated SAN, they can manage it independently from the rest of the data center, and data is physically isolated from other networks for higher levels of security. They also require reliable equipment that has gone through extensive certification testing. Because certification testing is a complex and expensive process, FC networking gear is supplied by only a few leading vendors, and the reduced competition leads to more costly solutions. But storage network administrators are willing to accept these trade-offs in order to have a reliable, dedicated network with very high and consistent performance, in both bandwidth and latency.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780128007280000084

Hardware Testbeds, Instrumentation, Measurement, Data Extraction, and Analysis

Paul J. Fortier, Howard E. Michel, in Computer Systems Performance Evaluation and Prediction, 2003

10.4 Testbed and model workloads

The term workload defines the load placed on a real system (typically measured or observed on a computer system while it runs normal operations), while the term test or model workload denotes a load constructed and applied to a system for performance studies (typically synthesized using characteristics of a real workload). For most modeling projects the use of a synthetic workload makes more sense, since we can control the load applied in the experiments. By controlling the load applied to a computer system under analysis, we can more readily predict the outcome of the experiment or force the experiment to test specific components of the system. In addition, synthetic workloads do not contain real information, which may be sensitive or valuable to the organization under study and whose compromise or loss would be significant. Once a valid synthetic workload has been developed, it can be reused to study additional systems. An example is the Transaction Processing Performance Council (TPC) workloads developed to study database systems. These TPC workloads have been used by vendors and customers to study various database systems and to determine which is better for different applications. Some of these workloads have been specialized for data mining, distributed databases, and other specialized applications.

To study computer architectures, a variety of instruction workloads have been developed. These are focused on low-level operations and consist of mixes of loads, stores, comparisons, branches, additions, subtractions, floating-point operations, multiplications, divisions, shift operations, logical operations, and register operations. These instruction mix workloads have become standardized for specific architectures such as PCs.

Other workloads do not focus on low-level operations but wish to examine more coarse-grained architectural concepts. These would be developed using high-order languages and would be designed to test things such as file transfer, task switching, memory management policies, and other operating systems components.

Some popular benchmarks include the TPC benchmarks described previously for examining database systems, the Sieve benchmark used to examine PCs and microprocessors, Ackermann's function for testing procedure call mechanisms in computer systems, the Whetstone kernel developed to test low-level operations, the Linpack package to test floating-point operations, the Dhrystone benchmark for testing low-level integer operations, and the SPEC benchmark suite for measuring engineering-type applications (e.g., compilation, electronic design, VLSI circuit simulation, and complex mathematical manipulations such as matrix multiplications) on a computer system.

Given that all of these and other workloads exist, modelers must still determine which to use, or which method to use in constructing their own workload for a given modeling project. There are four main considerations when selecting a workload: the computer system services exercised by the workload, the level of detail to be applied, closeness to the real load, and timeliness.

The most important part of workload selection is determining the services one wishes to examine. Making this list of services can be daunting and time consuming, but it is time well spent. First, one must determine the system under test; this represents the complete set of components making up the system being studied. Often we focus on a single component or a small set of components for comparison, called the components under study. For example, an operating system design team may be interested in the effect of different process-scheduling algorithms on overall operating system performance. Determining the system and its components is a very important step in workload development and should not be trivialized.

An example will illustrate the services concept. In this example, we are interested in comparing off-line backup paging storage systems built from disk drive arrays (such as one would find in a large database log subsystem). The system consists of several disk data systems, each containing multiple disk drives. The disk drives have separate read and write subsystems, and each subsystem uses fixed magnetic heads for these operations. If we specify the architecture from the highest level and work down to lower levels, the services, factors, metrics, and workloads are defined as follows (a data-structure sketch follows the list):

1. Backup system

Services: backup pages, backup changed pages, restore pages, list backed-up pages

Factors: page size, batch or background process, incremental or full backup

Metrics: backup time, restoration time

Workload: a database system with log pages to be backed up—vary frequency of logging

2. Disk data system

Services: read/write to a disk

Factors: type of disk drive

Metrics: speed, reliability, time between failures

Workload: synthetic program generating transaction-like disk I/O requests

3. Disk drives

Services: read record, write record, find record

Factors: disk drive capacity, number of tracks, number of cylinders, number of read/write heads

Metrics: time to find record, time to read record, time to write record, data lost rate, requests per unit time

Workload: synthetic program generating realistic operations requests to the disk drives

4. Read/write subsystem

Services: read data, write data

Factors: data encoding technique, implementation technology

Metrics: I/O bandwidth, density of media

Workload: read/write data streams with varying patterns

5. Read/write heads

Services: read signal and write signal

Factors: composition, head spacing, record gap size

Metrics: magnetic field strength, hysteresis

Workload: reads and writes of varying power strengths, disks moving at various rotational speeds
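A minimal data-structure sketch of the services/factors/metrics/workload breakdown above, populated with the backup-system level; the class and field names are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ComponentSpec:
    # One level of the specification: services exercised, factors varied,
    # metrics collected, and the workload used to drive the component.
    component: str
    services: list
    factors: list
    metrics: list
    workload: str

backup_system = ComponentSpec(
    component="Backup system",
    services=["backup pages", "backup changed pages", "restore pages", "list backed-up pages"],
    factors=["page size", "batch or background process", "incremental or full backup"],
    metrics=["backup time", "restoration time"],
    workload="database log pages backed up at varying logging frequencies",
)
print(backup_system.component, "->", backup_system.metrics)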

After we have completed the specification of the system and the components of interest, we need to determine the level of detail required in producing and recording requests for the defined services. A workload description can be as detailed as providing definitions for all events in the system or can simply be an aggregate or generalization of this load. Some possibilities for the detail may be average resource demand, most frequent request, frequency of request types (e.g., 25 percent reads and 75 percent writes), a timestamped sequence of specific requests, or some distribution of resource demands.

Typical modeling projects begin by using a variant of the concept of the most frequently requested service. For example, in a transaction processing system we may use a simple debit-credit benchmark from the TPC benchmarks. Such a selection is valid if a particular service is requested much more than others. A second alternative is to be more specific and construct a workload by selecting specific services, their characteristics, and their frequency. The Linpack package is such a workload: it selects very specific computer operations in prescribed patterns to test specific components of the system. The next alternative is to construct a timestamped record, where each record represents a specific request for a specific service along with details of the actual access (such a description could be constructed by tracing all activities of an existing system). In most cases this type of workload is too difficult to construct and validate for use in all but the most complex modeling projects. The aggregate resource demand approach is similar to what we would expect to see in an analytical model: we characterize each request for services using averages or distributions. For example, each request may be characterized as requiring 50 units of one particular resource and 25 units of some other, and making these requests every 1,000 units of time.
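As a concrete example of the "frequency of request types" style of description mentioned above (25 percent reads and 75 percent writes), here is a small generator that draws request types by those frequencies; everything beyond the proportions is an assumption.

import random

def synthetic_requests(n, read_fraction=0.25, seed=42):
    # Draw request types by frequency, e.g., 25% reads and 75% writes.
    rng = random.Random(seed)
    for _ in range(n):
        yield "read" if rng.random() < read_fraction else "write"

sample = list(synthetic_requests(1000))
print(sample.count("read"), "reads,", sample.count("write"), "writes")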

No matter which of these approaches we use, the modeler must determine if the selected load is representative of the real system load. Typically we will be interested in determining if the service request's load has similar arrival characteristics, resource demands, and resource utilization demands as the real load.

Finally, a developed workload should faithfully model the changes in use patterns in a timely manner. For example, the TPC benchmarks have continued to evolve to meet the needs of changing database systems design and use. The original TPC workloads were designed for the “bankers” database problem. That is, they simply were looking to provide transaction loads to well-structured, simple, flat relational database specifications. They were record oriented and had no dimensions beyond the simple relational model of the day. These have evolved now to include benchmarks for the new object relational databases and for data warehouses and data mining systems. Other important considerations in developing a workload include repeatability, external components impact, and load leveling. Repeatability looks at a workload's ability to be reproduced faithfully with little added overhead. External components impact looks to capture and characterize impacts on the system under study by nonessential components. Finally, load leveling may be of interest if our study wishes to examine a system under best-case or worst-case scenarios.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9781555582609500102

Installing and configuring Windows Server 2008 R2

Dustin Hannifin, ... Joey Alpern, in Microsoft Windows Server 2008 R2, 2010

Dynamic disk volumes

Once disks are converted to dynamic, they can be configured to support the following types of volumes:

Simple Volume: A simple volume is the same as a single partition when using basic disks. A simple volume does not provide redundancy.

Spanned Volume: A spanned volume is one that can span multiple physical disk drives that logically appear to the OS as a single drive.

Striped Volume: A striped volume provides software RAID level 0 functionality. RAID level 0 does not provide redundancy in the event of disk failure but does enhance the performance of multiple disks via striping data across two or more disk drives.

Notes from the field

Disk striping

Disk striping is a technology that has been around for years now. It allows data to be “striped” across multiple disks to enhance disk performance. Instead of one disk read/write head being used to write data, multiple heads from multiple disk drives can be used to write data, thus increasing the performance. Typically, the more disks added to the stripe set, the faster the performance.

Mirrored Volume: A mirrored volume provides software RAID level 1 functionality. Two disk drives are set up as a mirror set, and data that is written to the primary drive is also written to the secondary drive. In the event that the primary disk drive fails, the second disk drive contains the "second copy" of the data and can become the new primary disk drive in the RAID configuration. This technology ensures data fault tolerance and redundancy, but you lose the performance enhancements gained by disk striping.

RAID-5 Volume: A RAID-5 volume provides software-based disk striping with fault tolerance. A RAID-5 volume contains three or more physical disks that form one logical disk drive, as seen in Figure 2.14. A RAID-5 volume gives you the performance benefits of a striped set, as seen in striped volumes, while providing disk fault tolerance, as seen in mirrored volumes. In a RAID-5 volume, any single disk in the array can fail without any loss of data.
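The fault tolerance comes from parity: in RAID-5, the parity block of each stripe is the XOR of that stripe's data blocks, so any single missing block can be rebuilt from the remaining blocks. A minimal sketch of that reconstruction (illustrative only, not Windows' actual implementation):

from functools import reduce

def xor_blocks(blocks):
    # Byte-wise XOR of equal-length blocks.
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

# One stripe across a three-disk RAID-5 volume: two data blocks plus one parity block.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks([d0, d1])

# If the disk holding d1 fails, its block is recovered from the survivors.
recovered = xor_blocks([d0, parity])
print(recovered == d1)   # True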


Figure 2.14. RAID-5 Volume.

Notes from the field

Disk hot spares

Some servers provide the ability to add a “hot spare” disk drive. Hot spares provide additional redundancy by providing a standby drive dedicated to replacing a failed drive in a disk array. Traditionally, if a disk drive failed in a disk mirror or RAID-5 array, the administrator would need to immediately replace the failed drive, as failure of a second drive would result in loss of data on the disk array. By using a hot spare, the server will automatically add the standby “spare” drive to the mirror or RAID array and start rebuilding that array.

Best practices

Backups and disk fault tolerance

Disk drive fault tolerance technologies, such as mirroring and RAID-5, should never be used to replace traditional backups. These technologies are great to ensure that you do not always have to restore data in the event of a single disk failure; however, they do not protect you from multiple disk failures or total server failure. Good backups are always a must whether disk fault tolerance technologies are used or not.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9781597495783000025

Columnar Databases

Joe Celko, in Joe Celko’s Complete Guide to NoSQL, 2014

2.4 Multiple Users and Hardware

One of the advantages of a columnar model is that if two or more users want to use a different subset of columns, they do not have to lock out each other. This design is made easier because of a disk storage method known as RAID (redundant array of independent disks, originally redundant array of inexpensive disks), which combines multiple disk drives into a logical unit. Data is stored in several patterns called levels that have different amounts of redundancy. The idea of the redundancy is that when one drive fails, the other drives can take over. When a replacement disk drive is put in the array, the data is replicated from the other disks in the array and the system is restored. The following are the various levels of RAID:

RAID 0 (block-level striping without parity or mirroring) has no (or zero) redundancy. It provides improved performance and additional storage but no fault tolerance. It is a starting point for discussion.

In RAID 1 (mirroring without parity or striping) data is written identically to two drives, thereby producing a mirrored set; a read request is serviced by either of the two drives containing the requested data, whichever one involves the least seek time plus rotational latency. This is also the pattern for Tandem's NonStop computing model. Stopping the machine required a special command—"Ambush"—that had to catch both data flows at the same critical point so they would not automatically restart.

In RAID 10 (mirroring and striping) data is written in stripes across primary disks that have been mirrored to the secondary disks. A typical RAID 10 configuration consists of four drives: two for striping and two for mirroring. A RAID 10 configuration takes the best concepts of RAID 0 and RAID 1 and combines them.
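A toy sketch of how a four-drive RAID 10 layout like the one described above might place blocks (two mirrored pairs with stripes alternating between them; the layout function is an illustrative assumption):

# Four drives arranged as two mirrored pairs; stripes alternate between pairs.
# Pair A = drives 0 and 1, pair B = drives 2 and 3.
def raid10_targets(block_number: int) -> tuple:
    pair = block_number % 2            # stripe across the two pairs
    primary = pair * 2                 # first drive of the pair
    mirror = primary + 1               # its mirror
    return primary, mirror

for blk in range(4):
    print(f"block {blk} -> drives {raid10_targets(blk)}")
# block 0 -> drives (0, 1); block 1 -> drives (2, 3); block 2 -> drives (0, 1); ...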

In RAID 2 (bit-level striping with dedicated Hamming-code parity) all disk spindle rotation is synchronized, and data is striped such that each sequential bit is on a different drive. Hamming-code parity is calculated across corresponding bits and stored on at least one parity drive. This theoretical RAID level is not used in practice.

In RAID 3 (byte-level striping with dedicated parity) all disk spindle rotation is synchronized, and data is striped so each sequential byte is on a different drive. Parity is calculated across corresponding bytes and stored on a dedicated parity drive.

RAID 4 (block-level striping with dedicated parity) is equivalent to RAID 5 except that all parity data is stored on a single drive. In this arrangement, files may be distributed between multiple drives. Each drive operates independently, allowing input/output (I/O) requests to be performed in parallel. Parallelism is a huge advantage for a database. Each session can access one copy of a heavily referenced table without locking or read head contention.

RAID 5, RAID 6, and other patterns exist; many of them are marketing terms more than technology. The goal is to provide fault tolerance of drive failures, up to n disk drive failures or removals from the array. This makes larger RAID arrays practical, especially for high-availability systems. While this is nice for database people, we get more benefit from parallelism for queries.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780124071926000029

Domain 7: Security Operations (e.g., Foundational Concepts, Investigations, Incident Management, Disaster Recovery)

Eric Conrad, ... Joshua Feldman, in CISSP Study Guide (Third Edition), 2016

Redundant Array of Inexpensive Disks (RAID)

Even if only one full backup tape is needed to recover a system after a hard disk failure, the time to recover a large amount of data can easily exceed the recovery time dictated by the organization. The goal of a Redundant Array of Inexpensive Disks (RAID) is to help mitigate the risk associated with hard disk failures. The various RAID levels represent different approaches to disk array configuration. These configurations vary in cost, in terms of the number of disks required to achieve each configuration's goals, and in their reliability and performance capabilities. Table 8.1 provides a brief description of the most commonly used RAID levels.

Table 8.1. RAID Levels

RAID Level   Description
RAID 0 Striped Set
RAID 1 Mirrored Set
RAID 3 Byte Level Striping with Dedicated Parity
RAID 4 Block Level Striping with Dedicated Parity
RAID 5 Block Level Striping with Distributed Parity
RAID 6 Block Level Striping with Double Distributed Parity

Three critical RAID terms are mirroring, striping, and parity.

Mirroring is the most obvious and basic of the fundamental RAID concepts, and is simply used to achieve full data redundancy by writing the same data to multiple hard disks. Since mirrored data must be written to multiple disks the write times are slower (though caching by the RAID controller may mitigate this). However, there can be performance gains when reading mirrored data by simultaneously pulling data from multiple hard disks. Other than read and write performance considerations, a major cost associated with mirroring is disk usage; at least half of the drives are used for redundancy when mirroring is used.

Striping is a RAID concept that is focused on increasing the read and write performance by spreading data across multiple hard disks. With data being spread amongst multiple disk drives, reads and writes can be performed in parallel across multiple disks rather than serially on one disk. This parallelization provides a performance increase, but does not aid in data redundancy.

Parity is a means to achieve data redundancy without incurring the same degree of cost as that of mirroring in terms of disk usage and write performance.

Exam Warning

While the ability to quickly recover from a disk failure is the goal of RAID, there are configurations that do not offer reliability as a capability. For the exam, be sure to understand that not all RAID configurations provide additional reliability.

RAID 0 – Striped Set

As is suggested by the title, RAID 0 employs striping to increase the performance of reads and writes. By itself, striping offers no data redundancy, so RAID 0 is a poor choice if recovery of data is the reason for leveraging RAID. Figure 8.8 shows visually what RAID 0 entails.


Figure 8.8. RAID 0 – Striped Set

RAID 1 – Mirrored Set

This level of RAID is perhaps the simplest of all RAID levels to understand. RAID 1 creates/writes an exact duplicate of all data to an additional disk. The write performance is decreased, though the read performance can see an increase. Disk cost is one of the most troubling aspects of this level of RAID, as at least half of all disks are dedicated to redundancy. Figure 8.9 shows RAID 1 visually.


Figure 8.9. RAID 1 – Mirrored Set

RAID 2 – Hamming Code

RAID 2 is not considered commercially viable for hard disks and is not used. This level of RAID would require either 14 or 39 hard disks and a specially designed hardware controller, which makes RAID 2 incredibly cost prohibitive. RAID 2 is not likely to be tested.

RAID 3 – Striped Set with Dedicated Parity (Byte Level)

Striping is desirable due to the performance gains associated with spreading data across multiple disks. However, striping alone is not as desirable due to the lack of redundancy. With RAID 3, data, at the byte level, is striped across multiple disks, but an additional disk is leveraged for storage of parity information, which is used for recovery in the event of a failure.

RAID 4 – Striped Set with Dedicated Parity (Block Level)

RAID 4 provides the exact same configuration and functionality as that of RAID 3, but stripes data at the block, rather than byte, level. Like RAID 3, RAID 4 employs a dedicated parity drive.

RAID 5 – Striped Set with Distributed Parity

One of the most popular RAID configurations is that of RAID 5, Striped Set with Distributed Parity. Again with RAID 5 there is a focus on striping for the performance increase it offers, and RAID 5 leverages block level striping. Like RAIDs 3 and 4, RAID 5 writes parity information that is used for recovery purposes. However, unlike RAIDs 3 and 4, which require a dedicated disk for parity information, RAID 5 distributes the parity information across multiple disks. One of the reasons for RAID 5’s popularity is that the disk cost for redundancy is lower than that of a Mirrored set. Another important reason for this level’s popularity is the support for both hardware and software based implementations, which significantly reduces the barrier to entry for RAID configurations. RAID 5 allows for data recovery in the event that any one disk fails. Figure 8.10 provides a visual representation of RAID 5.


Figure 8.10. RAID 5 – Striped Set with Distributed Parity

RAID 6 – Striped Set with Dual Distributed Parity

While RAID 5 accommodates the loss of any one drive in the array, RAID 6 can allow for the failure of two drives and still function. This redundancy is achieved by writing two independent sets of parity information, distributed across the disks.

Note

There are many and varied RAID configurations that are simply combinations of the standard RAID levels. Nested RAID solutions are becoming increasingly common with larger arrays of disks that require a high degree of both reliability and speed. Some common nested RAID levels include RAID 0 + 1, 1 + 0, 5 + 0, 6 + 0, and (1 + 0) + 0, which are also commonly written as RAID 01, 10, 50, 60, and 100, respectively.

RAID 1 + 0 or RAID 10

RAID 1 + 0 or RAID 10 is an example of what is known as nested RAID or multi-RAID, which simply means that one standard RAID level is encapsulated within another. With RAID 10, which is also commonly written as RAID 1 + 0 to explicitly indicate the nesting, the configuration is that of a striped set of mirrors.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780128024379000084

Architectures and optimization methods of flash memory based storage systems

Yuhui Deng, Jipeng Zhou, in Journal of Systems Architecture, 2011

4.2.3 Parallelism

Parallel processing such as Redundant Arrays of Inexpensive Disks (RAID) [75] is a milestone that has affected the evolution of storage systems. RAID uses multiple disk drives to store and distribute data. The basic idea of RAID is dividing data into strips and spreading the strips across multiple disk drives, thus aggregating storage capacity, performance, and reliability. Different distribution policies endow the RAID subsystem with different features [23]. RAID0 is not widely used because it does not guarantee fault tolerance, but it provides the best performance. RAID1 offers the best fault tolerance but wastes storage capacity. RAID5 guarantees fault tolerance against a single disk drive failure by sacrificing some performance. RAID6 is normally employed to tolerate the failure of two disk drives.

SSD employs multiple flash memory chips to aggregate storage capacity. This also leaves room for improving performance by leveraging parallelism. A high performance NAND flash memory based storage system is proposed in [50]. The system consists of multiple independent channels, where each channel has multiple NAND flash memory chips. It is similar to Fig. 6. By leveraging the multi-channel architecture, the authors exploited an intra-request parallelism and an inter-request parallelism to improve performance. The intra-request parallelism is achieved by using a striping technique like RAID. This approach divides one request into multiple sub-requests and distributes the sub-requests across multiple channels, thus obtaining parallel execution of a single request. The inter-request parallelism indicates the parallel execution of many different requests to improve throughput. Interleaving and pipelining are used to exploit the inter-request parallelism. In the interleaving method, several requests are handled in parallel by using several channel managers. The pipelining solution overlaps the processing of two requests within one single channel. Striping takes advantage of I/O parallelism to boost the performance of flash memory systems. However, it cannot succeed without considering the characteristics of flash memory.
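A minimal sketch of the intra-request parallelism described above, assuming a multi-channel device: one request is split into chunk-sized sub-requests that are distributed round-robin over the channels (the channel count and chunk size are assumptions, not values from [50]).

def split_request(offset, length, num_channels=4, chunk=4096):
    # Divide one request into chunk-sized sub-requests and spread them
    # across channels so they can be serviced in parallel.
    subrequests = []
    for pos in range(offset, offset + length, chunk):
        channel = (pos // chunk) % num_channels
        subrequests.append((channel, pos, min(chunk, offset + length - pos)))
    return subrequests

for channel, pos, size in split_request(offset=0, length=16384):
    print(f"channel {channel}: access {size} bytes at offset {pos}")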

Due to their limited endurance cycles, a RAID solution could wear out the redundant flash memory chips at similar rates. For instance, RAID5 causes all SSDs to wear out at approximately the same rate by balancing write load across chips. This also applies to RAID1: two mirrored chips could reach their endurance limit at almost the same time. Therefore, using RAID technology to organize NAND flash memory chips incurs correlated failures. Diff-RAID [48] is proposed to alleviate this challenge in two ways. First, the parity of data blocks is distributed unequally across flash devices. Because each write has to update the parity block, this unequal distribution forces some flash devices to wear out faster than others. Second, Diff-RAID redistributes parity during device replacements to ensure that the oldest device always holds most of the parity and ages at the highest rate. When this oldest device is replaced at some threshold age, its parity blocks are assigned across all the devices in the array and not just to its replacement.
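A simplified sketch of the unequal parity placement idea in Diff-RAID; the weights and the selection function are illustrative assumptions, and the real scheme also redistributes parity when a device is replaced.

import random

def assign_parity_device(stripe_number, parity_weights):
    # Assign each stripe's parity block to a device with probability
    # proportional to the device's weight, so heavily weighted devices
    # absorb more parity updates and therefore age faster than others.
    rng = random.Random(stripe_number)        # deterministic per stripe
    total = sum(parity_weights)
    r = rng.uniform(0, total)
    cumulative = 0.0
    for device, weight in enumerate(parity_weights):
        cumulative += weight
        if r <= cumulative:
            return device
    return len(parity_weights) - 1

# Device 0 (the oldest in this example) holds most of the parity.
weights = [70, 10, 10, 10]
counts = [0, 0, 0, 0]
for stripe in range(10000):
    counts[assign_parity_device(stripe, weights)] += 1
print(counts)   # roughly proportional to the weights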

In contrast to the above work, Chang and Kuo [11] proposed a striping approach that leverages the physical features of flash memory to achieve parallelism. As introduced in Section 3.1, accessing the flash memory media normally consists of a setup phase and a busy phase. Suppose a NAND flash memory device is composed of multiple NAND banks, where each bank is a flash unit that can operate independently. When a bank is in its busy phase, the system can switch immediately to another bank, so that multiple banks can operate simultaneously. Based on this parallelism, a joint striping and garbage-collection mechanism is designed to boost performance while reducing the performance degradation incurred by garbage collection.

Agrawal et al. [1] discussed the potential issues that could significantly impact SSD performance. They reported that the bandwidth and operation rate of any given flash chip are not sufficient to achieve optimal performance; hence, memory components must be coordinated so as to operate in parallel. They also suggested carefully placing data across the multiple flash chips of an SSD to achieve load balance and effective wear-leveling, and proposed write ordering to handle random writes. Gray and Fitzgerald [33] also discussed opportunities for enhancing SSDs, such as using a non-volatile address map, employing a block buffer, adding logic for copy-on-write snapshots, writing data strips across the chip array, and so on.

Read full article

URL: https://www.sciencedirect.com/science/article/pii/S1383762110001657

What is a counter on a Windows system and how is one used? Provide an example.

Counters are used to provide information as to how well the operating system or an application, service, or driver is performing. The counter data can help determine system bottlenecks and fine-tune system and application performance.

When a disk is formatted it is impossible to recover the data that was on the disk?

When a disk is formatted, it is impossible to recover the data that was on the disk. In Windows, you can start the Task Manager by simply right-clicking the taskbar and choosing Start Task Manager. One critical difference between a client OS and a server OS is the addition of directory services on some server OSs.

Why would you want to deploy a WAN over the Internet?

Better Security: Network traffic is encrypted and the network is segmented to improve security when sending sensitive files. Improved Performance: Applications (like voice or video) can be prioritized, ensuring a better experience for end users.

What storage solution involves a third party company that provides off site hosting of data?

Cloud storage allows you to save data and files in an off-site location that you access either through the public internet or a dedicated private network connection. Data that you transfer off-site for storage becomes the responsibility of a third-party cloud provider.