| ruu/19970922: Disk |
|---|
The graph shows the speed of processing swap-in and swap-out requests during the day (in requests per second).
The number of reads is higher than the number of writes even though the hit ratio is below 50%. This is because many non-cacheable misses are never written to disk. Thus, even with a 50% hit ratio, the majority of disk requests are reads.
The curves follow the number of requests processed by the proxy.
The graph shows the number of concurrent swap requests present in the proxy server. We count the number of requests in the system using 2 msec intervals and calculate the median over 20-minute groups. Small 2 msec intervals ensure that we count the number of concurrent requests rather than the total number of requests per [large] interval. Note that this graph is not a "disk requests per second" graph.
We plot the 50th and 75th percentiles. The 50th percentile is the same as the median.
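The counting procedure can be sketched as follows. This is a minimal illustration, not the patch's code: the `(start_ms, end_ms)` input format, the function names, and the nearest-rank percentile method are all assumptions made for the example.

```python
import math

SAMPLE_MS = 2                  # sampling interval named in the text
GROUP_MS = 20 * 60 * 1000      # 20-minute grouping named in the text

def percentile(values, pct):
    """Nearest-rank percentile (an assumed method, not taken from the patch)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def concurrency_percentiles(requests, sample_ms=SAMPLE_MS, group_ms=GROUP_MS):
    """requests: list of (start_ms, end_ms) integer pairs (hypothetical format).

    Samples the number of active requests every sample_ms and returns
    {group_index: (50th percentile, 75th percentile)} per time group.
    """
    if not requests:
        return {}
    horizon = max(end for _, end in requests)
    samples = {}
    for t in range(0, horizon + 1, sample_ms):
        # A request is active at time t if it has started and not yet finished.
        active = sum(1 for start, end in requests if start <= t < end)
        samples.setdefault(t // group_ms, []).append(active)
    return {g: (percentile(c, 50), percentile(c, 75)) for g, c in samples.items()}
```

Because each sample is a snapshot of the requests in progress at that instant, the result describes queue length, not throughput.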
This graph is useful in determining the increase in the length of disk queues during peak hours (if any).
For this server, in 50% of cases a swap request does not compete with others for the disk. In 75% of cases there is at most one request on disk, so again there is no competition. The disk subsystem is clearly under-utilized.
Disk utilization is measured as the percentage of time during which there was at least one active swap request. The measurements are done using 2 msec intervals. Note that only swap requests are taken into consideration. Disk utilization is represented by the "all" curve. Curves for swap-in and swap-out requests are given to compare the contribution of each class towards disk utilization.
The patch does not measure per disk utilization. The graph represents the utilization of the disk I/O subsystem as a whole. In other words, if there is always one disk I/O in the system, then utilization is 100% regardless of the number of physical disks installed.
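As a rough sketch of this definition, utilization can be computed as the share of 2 msec samples at which at least one swap request is active. The input format and the function name below are hypothetical and chosen only for illustration.

```python
def utilization_percent(requests, start_ms, end_ms, sample_ms=2):
    """Percent of samples in [start_ms, end_ms) with at least one active request.

    requests: list of (start_ms, end_ms) pairs (hypothetical input format).
    Overlapping requests count once, so the result never exceeds 100%
    no matter how many physical disks serve them.
    """
    samples = range(start_ms, end_ms, sample_ms)
    if len(samples) == 0:
        raise ValueError("empty measurement window")
    busy = sum(1 for t in samples if any(s <= t < e for s, e in requests))
    return 100.0 * busy / len(samples)

swap_in = [(0, 4)]       # illustrative values, not measured data
swap_out = [(2, 6)]
all_util = utilization_percent(swap_in + swap_out, 0, 10)
in_util = utilization_percent(swap_in, 0, 10)
out_util = utilization_percent(swap_out, 0, 10)
```

In this toy window the per-class values add up to more than the "all" value, because the two requests overlap in time.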
Note that there is not enough load to fully utilize the disk I/O subsystem on this server. With two physical disks, the utilization is always less than 45%.
Note that this graph may be affected by a performance bug in Squid described elsewhere. Without the bug, the utilization would be even lower.
Disk response time is the total time it takes to load/store (swap in/out) a document from/into the disk cache. The graph shows median disk response time in milliseconds during the day.
Median swap-in response time is noticeably higher than swap-out time. There are several factors affecting disk response time. See other disk related experiments for their quantification.
The patch allows for quantifying various I/O delays. Let's consider a swap
request. Swapping a document is done in several steps. First, the
corresponding file must be opened for reading or created for
writing. This requires an open(2) system call, which may incur
significant OS overhead: an open(2) call may result in
extra I/Os if the OS has to write/read i-nodes to/from disk. Then the content is
swapped to/from disk using blocks of fixed size. The disk cache and various delays
in between these I/Os affect the total response time.
To estimate the OS overhead on swapping a file we plot the median disk response time of a request versus file size. Response times for files smaller than 16 KB were grouped using 1 KB granularity. Larger files used 1 KB granularity to get enough entries per group. The graph is based on the 24 hour data.
Squid attempts to swap files using blocks of fixed length (e.g. 8 KB). For each I/O direction, we plot the total request response time and the time it takes to swap the first block. The "total" and "1st delay" curves for files smaller than 8 KB are the same.
Since various per-I/O delays dominate disk transfer time, the size of an I/O is not very important (the number of I/Os is). This explains the step-like shape of the "total" curves: if the I/O block size is 8 KB, the times to read 5 KB and 8 KB are the same!
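The step shape follows from counting I/Os rather than bytes. A minimal sketch, assuming a file of a given size is swapped in whole blocks of 8 KB (the function name is hypothetical):

```python
import math

BLOCK_KB = 8  # Squid I/O block size reported for this server

def io_count(size_kb, block_kb=BLOCK_KB):
    """Number of fixed-size I/Os needed to swap a file of size_kb kilobytes."""
    return max(1, math.ceil(size_kb / block_kb))

# File sizes within the same multiple of the block size need the same number of I/Os.
steps = {size: io_count(size) for size in (5, 8, 9, 16, 17, 24)}
```

Under this model 5 KB and 8 KB files both take one I/O, while a 9 KB file already takes two.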
The first disk delay always includes OS overhead on opening a file. Consecutive I/Os for the same file (if any) do not have this overhead. Thus, for file sizes equal to two I/O blocks (16 KB), the difference between the first delay and second I/O approximates OS overhead on opening a file. The patch does not measure the duration of the second I/O. However, we can compute it, assuming that overhead does not depend on the file size for small files:
1st_Delay = Overhead + I/O
Total( 8KB) = 1st_Delay
Total(16KB) = 1st_Delay + I/O
=>
I/O = Total(16KB) - Total( 8KB)
Overhead = Total( 8KB) - I/O
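The same computation as a short sketch. The function name and the input values are illustrative only; the inputs are not measurements quoted from the graph.

```python
def split_first_delay(total_8kb_ms, total_16kb_ms):
    """Apply the formulas above to two median total response times.

    Returns (io_ms, overhead_ms), assuming the overhead does not depend
    on file size for small files.
    """
    io_ms = total_16kb_ms - total_8kb_ms      # I/O = Total(16KB) - Total(8KB)
    overhead_ms = total_8kb_ms - io_ms        # Overhead = Total(8KB) - I/O
    return io_ms, overhead_ms

# Illustrative inputs: a one-block total of 54 ms and a two-block total of 92 ms.
io_ms, overhead_ms = split_first_delay(54, 92)
```

With these inputs the split is 38 ms per I/O and 16 ms of overhead.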
Note that the first "step" on each "total" curve corresponds to the size of the Squid I/O block for this server (8 KB). "Steps" for requests larger than 3 blocks (24 KB) are not distinct: There are not enough files of that size to get a "stable" median; also disk and network delays in-between disk I/Os may spoil the picture.
Clearly, the duration of the first delay should be the same for any file size. However, on our graph swap-in requests do not follow this rule (first delay is smaller for files bigger than 8 KB) due to a performance bug in Squid described elsewhere. We silently adjust for this bug in calculations for swap-in requests. Swap-out requests need no adjustment.
We summarized our calculations for 24 hours and peak load in a table (all
times are medians in milliseconds).
| | 24 Hours: I/O | 24 Hours: Overhead | Peak Load: I/O | Peak Load: Overhead |
|---|---|---|---|---|
| Swap In | 38 | 16 | 44 | 28 |
| Swap Out | 12 | 58 | 16 | 58 |
The large OS overhead for swap-out requests could be caused by the several I/Os needed to create a new file (reading and writing i-nodes). Swap-ins do not create a new file and have much smaller overhead.
Fast swap-out I/Os may be explained by the presence of a low-level disk cache (buffer): Reads have to go all the way to the disk surface while writes can be buffered in the disk cache and written at a later time (the writing process is released after data is buffered).
Interestingly, the fast 12 msec swap-out I/Os make large swap-outs faster than large swap-ins despite the large OS overhead for writes. Compared to swap-ins, swap-out requests pay a big one-time "subscription fee" but small "monthly payments". This saves them "money" in the long run.
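To see where the "subscription" starts to pay off, one can plug the 24-hour medians from the table into a simple linear model, total = overhead + number of I/Os × I/O time. The model and the names below are a simplification assumed for illustration; it ignores delays between I/Os.

```python
# 24-hour medians from the table above, in milliseconds.
MEDIANS_24H = {
    "swap_in": {"io": 38, "overhead": 16},
    "swap_out": {"io": 12, "overhead": 58},
}

def modeled_total_ms(kind, n_ios):
    """Modeled response time: one-time overhead plus a fixed cost per I/O."""
    m = MEDIANS_24H[kind]
    return m["overhead"] + n_ios * m["io"]

def break_even_ios(max_ios=10):
    """Smallest number of I/Os at which a swap-out is modeled as faster."""
    for n in range(1, max_ios + 1):
        if modeled_total_ms("swap_out", n) < modeled_total_ms("swap_in", n):
            return n
    return None

first_win = break_even_ios()
```

Under this model a one-block swap-in (54 ms) beats a one-block swap-out (70 ms), but from two blocks on the swap-out is faster (82 ms versus 92 ms).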
The increase in I/O duration with load may be caused by longer queuing
delays of individual disk requests. Note that these are per I/O delays
and our computations account for them in the I/O cost.
Currently, we do not have enough data to quantify these delays. However, we
can show how I/O duration changes with load.
The graph is based on the duration of the second I/O for 2-page files, i.e. sizes in (8 KB, 16 KB]. The second I/O duration is calculated as the difference between the total response time and the first delay (another method to approximate I/O duration, which gives close results). Again, we adjust for the bug with swap-in requests.
It is tempting to improve the overall response time by pipelining disk and network transfers. The graph studies the percentage of atomic swap requests. An atomic request is served using a single disk I/O. Thus, there is no opportunity for a cache server to pipeline the processing of an atomic request.
The graph shows that about 85% of swap-in requests are atomic. Thus, savings in disk response time will be directly reflected in the overall response time of a request.
Interestingly, the percentage of atomic reads is higher than that of atomic writes. This may be attributed to the skew in document popularity towards smaller objects. Squid stores (swaps out) small and large objects, but smaller ones are requested back (swapped in) more often. The smaller the object, the more likely it can be read in one (atomic) I/O.
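The share of atomic requests can be estimated from object sizes alone. The sketch below assumes that a request is atomic exactly when the object fits into one 8 KB I/O block; the function name and the sample sizes are made up for illustration.

```python
def atomic_percent(sizes_kb, block_kb=8):
    """Percent of requests whose object fits into a single I/O block."""
    if not sizes_kb:
        raise ValueError("no requests")
    atomic = sum(1 for size in sizes_kb if size <= block_kb)
    return 100.0 * atomic / len(sizes_kb)

# Made-up object sizes in KB: reads skewed towards small objects.
reads = [1, 2, 3, 8, 20]
writes = [1, 8, 12, 20, 40]
read_share = atomic_percent(reads)
write_share = atomic_percent(writes)
```

With these made-up sizes the reads are 80% atomic and the writes 40%, which mirrors the direction of the skew described above but not its measured magnitude.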