[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: CalendarScheduler and "backwards in time"



Brad,

I've heard about this problem from other people, but never experienced
it myself.  I tried to make up a small script to reproduce it, but
without much success, even though it schedules a large number of
events and some of those are very closely spaced.  I didn't get the
error message no matter what, even when I set MIN_WIDTH to 1.0e-10.
Could it be that it's hardware specific?  I've seen before that Sun's
FP is sometimes different from Intel's.  I'm using x86/87.  Which
platform did you use?

  -Yuri

Here is the script:

set NUM 100000 ;# events to schedule, actually four times that many
set ns [new Simulator]
set rng [new RNG]
set TIMEMIN 300 ;# random events are taken from U[TIMEMIN, TIMEMAX]
set TIMEMAX 303

set TIMEMAXMAX [expr $TIMEMAX * 3] ;#also add some more distant events
set sched {
for { set i 0} { $i<$NUM } { incr i } {
	set time [$rng uniform $TIMEMIN $TIMEMAX]
	$ns after $time {set x 1}
	set time [$rng uniform $TIMEMAX $TIMEMAXMAX]
	$ns after $time {set x 1}
}
}
eval $sched
$ns at $TIMEMAX "puts second_round...; $sched"
$ns run




Brad Karp <[email protected]> writes:

> I, too, have encountered the "backwards in time" problem with ns's
> CalendarScheduler. The CalendarScheduler is indeed not reliable for
> long-running simulations with a great many events that are scheduled very
> closely.
> 
> First: my code never schedules events with negative delays; this sanity
> check was the obvious first move.
> 
> I believe I've identified the underlying cause and have a crude fix. I'm
> working on a much better fix, that I'll be able to post code for in the next
> day or so.
> 
> The problem, as one might expect, is with numerical stability. The Calendar
> Queue dequeue algorithm repeatedly adds a bucket's width (width_ in ns) to
> the bucket upper boundary (buckettop_ in ns), as it scans across the "days"
> of the calendar. Because buckettop_ and width_ are floating-point values,
> there is opportunity for rounding error in buckettop_. Specifically, when the
> width_ is very small, and buckettop_ is large relative to width_, there can
> be insufficient precision to capture the small increment to buckettop_. This
> loss of precision is magnified over the millions of repetitions of this FP
> addition.
> 
> The CalendarScheduler code's author attempts to avoid the problem in two ways:
> 
> 	- When it computes width_ (bucket width), it won't choose a width_
> 	  that's less than the current clock_ value * some small constant
> 	  (MIN_WIDTH, 1.0e-6 in the ns implementation).  This doesn't let the
> 	  width_ be too small relative to the current clock_, and thus should
> 	  avoid loss of precision in the repeated additions mentioned above.
> 
> 	- It uses a buckettop_ that is actually (.5 * width_) larger than the
> 	  true value, in case there *is* rounding error in the repeated
> 	  additions to buckettop_.
> 
> Note that the former will worsen the insert performance of the queue--when
> invoked, it effectively forces more events into a bucket than the sampling
> suggested.
> 
> The latter, however, is not sufficient alone for all simulation run lengths
> and event patterns--the underflow can accumulate more than .5 * width_'s
> worth without control of the choice of width_.
> 
> I can observe the underflow in gdb. After 300 seconds of a simulation with
> very many densely spaced events, I observe that the minimum width_ is always
> used, and that there are events in the current bucket with timestamps
> *less* than the bottom of the current bucket (this occurs after "wrapping
> around the year" when there is sufficient underflow--you'll leave a bucket
> "too early").
> 
> If I had time, I might look at the IEEE FP representation and see how these
> values look with regard to the representation's precision. But this isn't
> a 1st-year grad course in numerical computation. :-)
> 
> If I change MIN_WIDTH to 1.0e-5, the error vanishes. But this worsens enqueue
> performance significantly.
> 
> I'm preparing a better fix now, and will post code for it shortly. The
> better fix is to keep an unsigned long int count of "bucket hops", and
> periodically set buckettop_ using direct multiplication of width_ * hops_.
> Done occasionally (i.e., every hundred thousand or more bucket hops), this
> will pull buckettop_ back to its true value, and keep the underflow of the
> repeated FP additions from growing without bound.
> 
> -Brad, [email protected]