41.4.3 PagePool/ProxyTrace

The two page pools above synthesize a request stream to a single web page using two random variables: one for the request interval and another for the requested page ID. Sometimes users may want a more complicated request stream, one that consists of multiple pages and exhibits spatial and temporal locality. One proposal, SURGE [3], generates such request streams; we choose to provide an alternative solution: using real web proxy cache traces (or server traces).

The class PagePool/ProxyTrace uses real traces to drive the simulation. Because web traces come in many different formats, they must be converted into an intermediate format before being fed into this page pool. The converter is available at http://mash.cs.berkeley.edu/dist/vint/webcache-trace-conv.tar.gz. It accepts four trace formats: DEC proxy trace (1996), UCB Home-IP trace, NLANR proxy trace, and EPA web server trace. It converts a given trace into two files: pglog and reqlog. Each line in pglog has the following format:

[<serverID> <URL_ID> <PageSize> <AccessCount>]

Each line, except the last line, in reqlog has the following format:

[<time> <clientID> <serverID> <URL_ID>]

The last line in reqlog records the duration of the entire trace and the total number of unique URLs:

i <Duration> <Number_of_URL>
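To make the two file layouts concrete, here is a minimal Python sketch of a reader for them. It is not part of ns or of the converter. It assumes that the square brackets in the format descriptions are notational, that fields are separated by whitespace, that IDs, sizes, and counts are integers, and that times and the duration are numeric. The names `PageRecord`, `Request`, `parse_pglog`, and `parse_reqlog` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class PageRecord:
    """One pglog line: <serverID> <URL_ID> <PageSize> <AccessCount>."""
    server_id: int
    url_id: int
    page_size: int
    access_count: int


@dataclass
class Request:
    """One reqlog line: <time> <clientID> <serverID> <URL_ID>."""
    time: float
    client_id: int
    server_id: int
    url_id: int


def parse_pglog(lines: Iterable[str]) -> List[PageRecord]:
    """Parse pglog lines, skipping blank ones (assumed whitespace-separated)."""
    pages = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        server_id, url_id, size, count = (int(f) for f in fields[:4])
        pages.append(PageRecord(server_id, url_id, size, count))
    return pages


def parse_reqlog(
    lines: Iterable[str],
) -> Tuple[List[Request], Optional[Tuple[float, int]]]:
    """Parse reqlog lines.

    Returns the requests plus the (duration, number_of_urls) summary taken
    from the final line that starts with "i", or None if no such line exists.
    """
    requests = []
    summary = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "i":
            # Summary line: i <Duration> <Number_of_URL>
            summary = (float(fields[1]), int(fields[2]))
            continue
        requests.append(
            Request(float(fields[0]), int(fields[1]), int(fields[2]), int(fields[3]))
        )
    return requests, summary
```

With real files, the functions could be called as `parse_pglog(open("pglog"))` and `parse_reqlog(open("reqlog"))`; the exact field types should be checked against the converter's actual output.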

PagePool/ProxyTrace takes these two files as input and uses them to drive the simulation. Because most existing web proxy traces do not contain complete page modification information, we use a bimodal page modification model [7]. The user can select $x\%$ of the pages to use one random page modification interval generator, and the rest of the pages to use another. In this way, $x\%$ of the pages can be made dynamic, i.e., modified frequently, and the rest static. Hot pages are evenly distributed among all pages. For example, if 10% of the pages are dynamic and the pages are sorted into a list by popularity, then pages 0, 10, 20, $\ldots$ are dynamic and the rest are static. Because of this selection mechanism, the bimodal ratio can only be changed in units of 10%.
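The selection rule can be illustrated with a short Python sketch. It reproduces the 10% example from the text (pages 0, 10, 20, ... are dynamic). Its treatment of larger ratios, where a page is dynamic when its popularity rank modulo 10 is below the ratio, is an assumption and is not taken from the ns source. The sketch also assumes that the next modification time is the last modification time plus one draw from the matching interval generator. The names `is_dynamic` and `next_mod_time` are hypothetical.

```python
from typing import Callable


def is_dynamic(rank: int, ratio: int) -> bool:
    """Decide whether the page at popularity rank `rank` is dynamic.

    `rank` is the page's index in the popularity-sorted list (0 = most popular).
    `ratio` is the dynamic share in units of 10%, so 1 means 10% of pages.
    Assumption: for ratio > 1, ranks with rank % 10 < ratio are dynamic.
    """
    if not 0 <= ratio <= 10:
        raise ValueError("ratio must be between 0 and 10 (units of 10%)")
    return rank % 10 < ratio


def next_mod_time(
    rank: int,
    last_mod_time: float,
    ratio: int,
    dynamic_interval: Callable[[], float],
    static_interval: Callable[[], float],
) -> float:
    """Assumed model: last modification time plus one drawn interval."""
    draw = dynamic_interval if is_dynamic(rank, ratio) else static_interval
    return last_mod_time + draw()


# With 10% dynamic pages, every tenth page in popularity order is dynamic.
print([r for r in range(30) if is_dynamic(r, 1)])  # [0, 10, 20]
```

Spreading the dynamic pages across the popularity ranking in this way means both popular and unpopular pages can be dynamic, which is also why the ratio has a granularity of 10%.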

In order to distribute requests to different requestors in the simulator, PagePool/ProxyTrace maps the client ID in the traces to requestors in the simulator using a modulo operation.
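As a rough illustration of the modulo mapping, the following Python sketch assigns a trace client ID to one of the simulated requestors. It assumes that client IDs are non-negative integers and that the divisor is the number of requestors configured for the simulation. The function name `requestor_for` is hypothetical.

```python
def requestor_for(client_id: int, num_requestors: int) -> int:
    """Map a trace client ID to a requestor index in [0, num_requestors)."""
    if num_requestors <= 0:
        raise ValueError("num_requestors must be positive")
    return client_id % num_requestors


# Example: with 3 requestors, trace clients 0..6 are spread round-robin.
print([requestor_for(c, 3) for c in range(7)])  # [0, 1, 2, 0, 1, 2, 0]
```

A consequence of this mapping is that several trace clients share one simulated requestor whenever the trace contains more clients than the simulation has requestors.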

PagePool/ProxyTrace has the following major OTcl methods:

get-poolsize: Returns the total number of pages.

get-duration: Returns the duration of the trace.

bimodal-ratio: Returns the bimodal ratio.

set-client-num num: Sets the number of requestors in the simulation.

gen-request ClientID: Generates the next request for the given requestor.

gen-size PageID: Returns the size of the given page.

bimodal-ratio ratio: Sets the share of dynamic pages to ratio*10 percent. Note that this ratio changes in units of 10%.

ranvar-dp ranvar: Sets the page modification interval generator for dynamic pages. Similarly, ranvar-sp ranvar sets the generator for static pages.

set-reqfile file: Sets the request stream file, as discussed above.

set-pgfile file: Sets the page information file, as discussed above.

gen-modtime PageID LastModTime: Generates the next modification time for the given page.

An example of using PagePool/ProxyTrace is available at ns/tcl/ex/simple-webcache-trace.tcl.

Tom Henderson 2011-11-05