| Swap-In Bug |
|---|
Update: The bug was fixed as of Squid v1.1.17.
The profiling experiment for Squid caching proxy of the Utrecht University in Netherlands revealed a strange pattern in swap-in requests. It turned out to be a performance bug in Squid. This page discusses the phenomenon and suggests a cure for the bug. Note that there are other interesting observations based on this portion of the experiment, but this page is devoted exclusively to one particular problem described below.
The proxy was running Squid v1.1.15. At the time of writing, the Change Log for v1.1.16 did not mention any relevant modifications.
To illustrate the problem, we plot the median disk response time of a request versus file size. Response times for files smaller than 16 KB were grouped using 1 KB granularity. Larger files used 2 KB granularity to get enough entries per group. The graph is based on 24 hour data.
Squid attempts to swap files using blocks of fixed length (e.g. 8 KB). For each I/O direction, we plot the total request response time and the time it takes to swap the first block. The "total" and "1st delay" curves for files smaller than 8 KB are the same (explained below).
Since various per I/O delays dominate disk transfer time, the size of an I/O is not very important (the number of I/Os is). This explains step-like shape of the "total" curves: If I/O block size is 8 KB, the times to read 5 KB and 8 KB are the same! Note that the first "step" on each "total" curve corresponds to the size of the Squid I/O block for this server (8 KB). "Steps" for requests larger than 3 blocks (24 KB) are not distinct: There are not enough files of that size to get a "stable" median; also disk and network delays in-between disk I/Os may spoil the picture.
The first disk delay always includes OS overhead on opening a file. Consecutive I/Os for the same file (if any) do not have this overhead. Clearly, the first delay should be the same for any file size. This holds for swap-out requests: The first delay for large files is the same as for small ones. However, this is not the case for swap-in requests(!): The first delay drops sharply for files larger than one I/O block (8 KB).
We have spent quite some time looking for a bug in the profiling code... :)
Here is an explanation. When reading a block, Squid does not specify the
actual file length. In other words, if the file size is 5 KB, then
SwapIn handler will request 8 KB from
diskHandleRead() via file_read() function. The
diskHandleRead() function calls read(2) to get an
8 KB block but gets only 5 KB. Note that the EOF is
not detected at this point since read(2) returns a positive
number! Thus, diskHandleRead() assumes that there are 3 KB
left in the file and schedules the second read to fetch them. The
second call to read(2) returns 0 indicating an
EOF condition. However, we did two read(2)'s instead
of one, and a read(2) is a very expensive system call.
The SwapOut handler does not have this problem. It requests
the exact amount of data from file_write().
We get one extra read(2) call for files of all
sizes because their last I/O block is almost never exactly 8 KB. The
largest performance loss (more than 80% according to the graph) is for
files smaller than 8 KB, of course.
The following patch was supplied by Duane Wessels. It simply prevents
diskHandleWrite() from immediately requesting more data if we
read less than was requested. The code to be removed is in bold.
Index: src/disk.c
===================================================================
RCS file: /surf1/CVS/squid/src/disk.c,v
retrieving revision 1.56.2.7
diff -w -u -r1.56.2.7 disk.c
--- disk.c 1997/07/21 22:59:20 1.56.2.7
+++ disk.c 1997/10/01 16:31:55
@@ -401,6 +401,7 @@
ctrl_dat->cur_len += len;
ctrl_dat->offset = lseek(fd, 0L, SEEK_CUR);
+#if BAD_IDEA
/* reschedule if need more data. */
if (ctrl_dat->cur_len < ctrl_dat->req_len) {
commSetSelect(fd,
@@ -410,6 +411,7 @@
0);
return DISK_OK;
} else {
+#endif
/* all data we need is here. */
/* call handler */
ctrl_dat->handler(fd,
@@ -419,7 +421,9 @@
ctrl_dat->client_data);
safe_free(ctrl_dat);
return DISK_OK;
+#if BAD_IDEA
}
+#endif
}
| http://www.cs.ndsu.nodak.edu/~rousskov/research/cache/squid/profiling/stats/ruu/swapin.bug.html |