Profiling Requests in Squid

This page describes a patch written for Squid caching proxy. The patch code profiles major states of individual client requests. Raw measurements are recorded in log files for further processing.

What do I get from the patch?

The patch enables Squid to record per-request measurements of network and disk I/O activity. These measurements are essential in identifying and quantifying bottleneck(s) in Squid. The patch allows to compare squid-induced delays with those of network. You can compare Squid performance on hits with misses, measure the effectiveness of IMS requests, analyze the performance of memory buffer for "hot" objects, etc.

We have some performance studies done with the patch and a nice example of how the patch helped to detect a performance problem.

Overall patch organization

The patch code consists of two files profile.h and profile.c and a few additions to the Squid original code. Profiling data is stored in state objects: icpStateData, request_t, and MemObject. (Squid does not have a single state object accessible on all stages of request processing. Thus, profiling data have to be stored in several places.) When the current state of a request changes, the profiling code records the duration of the state. This allows to time disk and network activities on per-request basis and with fine (usec) granularity. The measurements are then logged for further processing.

What is being profiled?

The following activities are being profiled.
ActivityMeasurementsCode Location
Client In
Accepting request from a client
Start: The socket accept timestamp icp.c: asciiHandleConn()
First delay: The time it takes to receive the first piece of data from a client. icp.c: clientReadRequest()
Total duration: The time it takes to receive all data from a client. icp.c: clientReadRequest(), icpStateFree()
Client Out
Sending data to a client
Start: The timestamp before the first network write is scheduled icp.c: icpSendMoreData(), icpGetHeadersForIMS(), icpSendERROR()
First delay: The time it takes to send the first piece of data to a client. icp.c: clientWriteComplete(), icpHandleIMSComplete(), icpSendERRORComplete()
Total duration: The time it takes to send all data to a client. icp.c: icpStateFree()
Server Out
Sending request to a primary or proxy server
Start: The timestamp before the first network write is scheduled. DNS lookup is not counted as Server Out activity. http.c: httpConnect()
First delay: The time it takes to connect to a server. No data is send yet. http.c: httpConnectDone()
Total duration: The total time it takes to send a request to a server. http.c: httpSendComplete(), httpStateFree()
Server In
Receiving response from a server
Start: The timestamp when request write is complete. We are ready to receive data from the server. http.c: httpSendComplete()
First delay: The time it takes to receive the first piece of data from a server. http.c: httpReadReply()
Total duration: The time it takes to receive all expecteddata from a server. We may not wait till the socket is closed because the client side does not wait for that to close its socket. http.c: httpReadReply(), httpStateFree()
Disk In
Swapping data into the memory buffer.
Start: The timestamp after file_open call, before starting swapping daemon. Note: Due to the timing schema in Squid, the squid "current" time should not change while the file is being opened with file_open? store.c: storeSwapInStart()
First delay: The time it takes to read the first piece of data from disk. store.c: storeSwapInHandle()
Total duration: The time it takes to read all data from disk and close the file. store.c: storeSwapInHandle()
Disk Out
Swapping data out from memory buffer to disk.
Start: The timestamp after file_open call, before starting swapping daemon. Note: Due to the timing schema in Squid, the squid "current" time should not change while the file is being opened with file_open? store.c: storeSwapOutStart()
First delay: The time it takes to write the first piece of data to disk. store.c: storeSwapOutHandle()
Total duration: The time it takes to write all data to disk and close the file. store.c: storeSwapOutHandle()

Changes in access.log and store.log

Measurements for all six activities are recorded for each request in the access.log file. Each activity has three measurements resulting in 18(!) entries. Here is the format:

<standard_entries> <client_in> <client_out>  <server_out> <server_in>  <disk_in> <disk_out>
Measurements for one activity are separated by slashes:
<start>/<first_delay>/<total_duration>

All measurements are in usec except for negative values of starts for disk_in and disk_out which are recorded in msec. The latter exception is done to measure, possibly long, time interval when an object resides in memory buffer prior to being requested again. All start times are relative to the start time of the request which can be calculated using timestamp and elapsed standard entries. "Not applicable" and "yet unknown" values are recorded as -2.

Here is an sample line from access.log:

873487336.614    111 123 TCP_CLIENT_REFRESH/200 7517 GET http://www.com/ - DIRECT/www.com text/html 0/55320/55320 72028/1395/38993  57173/3795/4674 61847/10181/26720  -2/-2/-2 88567/-2/-2
873500579.028      2 134.129.91.206 TCP_HIT/200 438 GET http://www.com/ - NONE/- image/gif 0/1453/1453 1453/1205/1205  -2/-2/-2 -2/-2/-2  -2/-2/-2 -38024/16356/16356

The patch also appends the first_delay/total_duration entries to SWAPIN and SWAPOUT standard store.log entries. RELEASE entries are appended with -2/-2. Here is a sample from store.log:

873497945.884 SWAPOUT  200 873497906 829637606        -2 image/gif 2186/2186 GET http://www.com/ 13240/13240
873497960.924 RELEASE  404        -2        -2        -2 text/html 0/53 GET http://www.com/ -2/-2
873497970.671 SWAPIN   200 873217707 872109850        -2 text/html 1718/1718 GET http://www.com/ 17709/17709

Limitations of the patch.

The patch currently does not measure server side activity for non-HTTP connections. Also, only client side of POST requests is profiled.

All time measurements rely on the current_time variable and do not call gettimeofday(3C). Thus, CPU cycles are saved but timing may not be precise if current_time is not synchronized with real time by Squid often enough.

Side effects of profiling.

There is no side effects whatsoever if profiling is disabled: No internal structures are changed, no calls to profiling routines are made, and nothing extra is logged.

If enabled, profiling has two major side effects. First, both log files become somewhat larger due to extra data recorded for each request. Second, several state structures become larger. However, StoreEntry structure is not modified. Thus, total memory growth is limited to memory resident (in-transit + hot) objects only. Squid does not seem to care much about the size of those structures anyway. :) We expect no increase in CPU load due to profiling.

Note that some log analyzing scripts might be confused by extra entries.

Source code and installation.

The diff -Nur patch code is here.

The following 13+2 files were altered+created:

lib/
    util.c      util.h
src/
Makefile.in
    http.c      http.h
     icp.c       icp.h
 profile.c   profile.h
               squid.h
    stat.c      stat.h
   store.c     store.h
                 url.h

Warning: You probably want to fix this performance bug in Squid before running our patch.

Apply the patch using the patch -p command (the -p option tells the patch program to create new files using full paths specified in the patch text). Then you will need to "realclean" and compile your code with the PROFILE_OBJECTS define set via src/Makefile. Note that PROFILE_OBJECTS is undefined by default.

The src/Makefile.in file was patched to add new files and the PROFILE_OBJECTS option (undefined by default). Changes in src/Makefile.in should be automatically applied to your current Makefile when you compile the patched code the first time.

Analysis of collected data.

Analysis tools and their results are described elsewhere.