| Profiling Requests in Squid |
|---|
This page describes a patch written for Squid caching proxy. The patch code profiles major states of individual client requests. Raw measurements are recorded in log files for further processing.
The patch enables Squid to record per-request measurements of network and disk I/O activity. These measurements are essential in identifying and quantifying bottleneck(s) in Squid. The patch allows to compare squid-induced delays with those of network. You can compare Squid performance on hits with misses, measure the effectiveness of IMS requests, analyze the performance of memory buffer for "hot" objects, etc.
We have some performance studies done with the patch and a nice example of how the patch helped to detect a performance problem.
The patch code consists of two files profile.h and
profile.c and a few additions to the Squid original code.
Profiling data is stored in state objects:
icpStateData, request_t, and MemObject.
(Squid does not have a single state object accessible on all stages of request
processing. Thus, profiling data have to be stored in several places.) When
the current state of a request changes, the profiling code records the
duration of the state. This allows to time disk and network activities on
per-request basis and with fine (usec) granularity. The
measurements are then logged for further processing.
The following activities are being profiled.
| Activity | Measurements | Code Location |
|---|---|---|
| Client In Accepting request from a client |
Start: The socket accept timestamp |
icp.c: asciiHandleConn() |
| First delay: The time it takes to receive the first piece of data from a client. | icp.c: clientReadRequest() |
|
| Total duration: The time it takes to receive all data from a client. | icp.c: clientReadRequest(), icpStateFree() |
Client Out Sending data to a client |
Start: The timestamp before the first network write is scheduled | icp.c: icpSendMoreData(), icpGetHeadersForIMS(), icpSendERROR() |
| First delay: The time it takes to send the first piece of data to a client. | icp.c: clientWriteComplete(), icpHandleIMSComplete(), icpSendERRORComplete() |
|
| Total duration: The time it takes to send all data to a client. | icp.c: icpStateFree() |
Server Out Sending request to a primary or proxy server |
Start: The timestamp before the first network write is scheduled. DNS lookup is not counted as Server Out activity. | http.c: httpConnect() |
| First delay: The time it takes to connect to a server. No data is send yet. | http.c: httpConnectDone() |
|
| Total duration: The total time it takes to send a request to a server. | http.c: httpSendComplete(), httpStateFree() |
Server In Receiving response from a server |
Start: The timestamp when request write is complete. We are ready to receive data from the server. | http.c: httpSendComplete() |
| First delay: The time it takes to receive the first piece of data from a server. | http.c: httpReadReply() |
|
| Total duration: The time it takes to receive all expecteddata from a server. We may not wait till the socket is closed because the client side does not wait for that to close its socket. | http.c: httpReadReply(), httpStateFree() | Disk In Swapping data into the memory buffer. |
Start: The timestamp after file_open call, before
starting swapping daemon. Note: Due to the timing schema in Squid, the
squid "current" time should not change while the file is being
opened with file_open? |
store.c: storeSwapInStart() |
| First delay: The time it takes to read the first piece of data from disk. | store.c: storeSwapInHandle() |
|
| Total duration: The time it takes to read all data from disk and close the file. | store.c: storeSwapInHandle() |
Disk Out Swapping data out from memory buffer to disk. |
Start: The timestamp after file_open call, before
starting swapping daemon. Note: Due to the timing schema in Squid, the
squid "current" time should not change while the file is being
opened with file_open? |
store.c: storeSwapOutStart() |
| First delay: The time it takes to write the first piece of data to disk. | store.c: storeSwapOutHandle() |
|
| Total duration: The time it takes to write all data to disk and close the file. | store.c: storeSwapOutHandle() |
access.log and store.logMeasurements for all six activities are recorded
for each request in the access.log file. Each activity has three
measurements resulting in 18(!) entries. Here is the format:
<standard_entries> <client_in> <client_out> <server_out> <server_in> <disk_in> <disk_out>Measurements for one activity are separated by slashes:
<start>/<first_delay>/<total_duration>
All measurements are in usec except for negative values of
starts for disk_in and disk_out which
are recorded in msec. The latter exception is done to measure,
possibly long, time interval when an object resides in memory buffer prior to
being requested again. All start times are relative to the
start time of the request which can be calculated using timestamp
and elapsed standard entries. "Not applicable" and
"yet unknown" values are recorded as -2.
Here is an sample line from access.log:
873487336.614 111 123 TCP_CLIENT_REFRESH/200 7517 GET http://www.com/ - DIRECT/www.com text/html 0/55320/55320 72028/1395/38993 57173/3795/4674 61847/10181/26720 -2/-2/-2 88567/-2/-2
873500579.028 2 134.129.91.206 TCP_HIT/200 438 GET http://www.com/ - NONE/- image/gif 0/1453/1453 1453/1205/1205 -2/-2/-2 -2/-2/-2 -2/-2/-2 -38024/16356/16356
The patch also appends the first_delay/total_duration entries
to SWAPIN and SWAPOUT standard
store.log entries. RELEASE entries are appended with
-2/-2. Here is a sample from store.log:
873497945.884 SWAPOUT 200 873497906 829637606 -2 image/gif 2186/2186 GET http://www.com/ 13240/13240
873497960.924 RELEASE 404 -2 -2 -2 text/html 0/53 GET http://www.com/ -2/-2
873497970.671 SWAPIN 200 873217707 872109850 -2 text/html 1718/1718 GET http://www.com/ 17709/17709
The patch currently does not measure server side activity for non-HTTP
connections. Also, only client side of POST requests is
profiled.
All time measurements rely on the current_time variable and do
not call gettimeofday(3C). Thus, CPU cycles are saved but timing
may not be precise if current_time is not synchronized with real
time by Squid often enough.
There is no side effects whatsoever if profiling is disabled: No internal structures are changed, no calls to profiling routines are made, and nothing extra is logged.
If enabled, profiling has two major side
effects. First, both log files become somewhat larger due to extra data recorded for
each request. Second, several state structures become larger. However,
StoreEntry structure is not modified. Thus, total memory growth
is limited to memory resident (in-transit + hot) objects only. Squid does not
seem to care much about the size of those structures anyway. :)
We expect no increase in CPU load due to profiling.
Note that some log analyzing scripts might be confused by extra entries.
The diff -Nur patch code is here.
The following 13+2 files were altered+created:
lib/
util.c util.h
src/
Makefile.in
http.c http.h
icp.c icp.h
profile.c profile.h
squid.h
stat.c stat.h
store.c store.h
url.h
Warning: You probably want to fix this performance bug in Squid before running our patch.
Apply the
patch using the patch -p
command (the -p option tells the patch program to
create new files using full paths specified in the patch text). Then you will
need to "realclean" and compile your code with the
PROFILE_OBJECTS define set via src/Makefile.
Note that PROFILE_OBJECTS is undefined by default.
The src/Makefile.in file was patched to add new files and
the PROFILE_OBJECTS option (undefined by default). Changes in
src/Makefile.in should be automatically applied to your current
Makefile when you compile the patched code the first time.
Analysis tools and their results are described elsewhere.