sched: fix a scheduling issue
The original code assumes that the last 4 bits of the CPU cycle count are
uniformly distributed, but that is not true: at least on Intel Ice Lake
(Intel(R) Xeon(R) Platinum 8369B CPU @ 2.70GHz), the CPU cycle count is
always an odd number, so start_cycles % 16 == 0 never holds and the cost
estimate of a kernel currently treated as inexpensive is never updated.
As a result, expensive ops are frequently scheduled onto a single thread,
which greatly increases the response time (RT) (in a custom scenario, from
~30ms to ~45ms).
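
The effect described above can be reproduced in isolation. The following is a minimal sketch, not code from executor.cc: `CountSampled` is a hypothetical helper that counts how many values of an arithmetic sequence pass a `value % skip == 0` check. An odd-only sequence stands in for the observed cycle counts, and a sequence starting at 0 with step 1 stands in for an invocation counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of TensorFlow): counts how many of the n
// values start, start + step, start + 2 * step, ... satisfy
// value % skip == 0.
int CountSampled(std::uint64_t start, std::uint64_t step, int n,
                 std::uint64_t skip) {
  assert(skip > 0);
  int hits = 0;
  std::uint64_t value = start;
  for (int i = 0; i < n; ++i, value += step) {
    if (value % skip == 0) ++hits;
  }
  return hits;
}
```

With these assumptions, `CountSampled(1, 2, 1600, 16)` (odd values only) returns 0, because an odd number is never a multiple of 16, while `CountSampled(0, 1, 1600, 16)` (a plain counter) returns 100, which is the intended 1/16 rate.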

Signed-off-by: Xiaoguang Wu <zhongjian.wxg@alibaba-inc.com>
Xiaoguang Wu committed Jan 31, 2024
1 parent 5eabe5f commit 22b9d59
Showing 1 changed file with 5 additions and 4 deletions.
9 changes: 5 additions & 4 deletions tensorflow/core/common_runtime/executor.cc
@@ -730,15 +730,16 @@ Status ExecutorState<PropagatorStateType>::ProcessSync(

   } else if (kernel_stats_->HasExpensiveMarker(item)) {
     KernelTimer timer;
+    static uint64 update_counter = 0;
     device->Compute(op_kernel, &ctx);
     // For expensive kernels, always update the cost estimate. For inexpensive
     // kernels, update the cost estimate with ~1/16 probability. This assumes
     // that the last 4 bits of the CPU cycle count is uniformly distributed.

     constexpr int kKernelExecutionTrackingInvocationSkipCount = 16;
     if (is_expensive ||
-        timer.start_cycles % kKernelExecutionTrackingInvocationSkipCount == 0) {
+        update_counter % kKernelExecutionTrackingInvocationSkipCount == 0) {
       kernel_stats_->UpdateCostEstimate(item, timer.ElapsedCycles());
     }

+    update_counter++;
   } else {
     device->Compute(op_kernel, &ctx);
   }
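
The counter-based sampling in this hunk can also be sketched as a standalone function. This is an illustrative variant under stated assumptions, not the committed code: `ShouldSampleInvocation` and `kSkipCount` are hypothetical names, and the counter is a `std::atomic` on the assumption that the function may be called from several threads at once (the diff itself uses a plain `static uint64`).

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical sketch, not the code in executor.cc: returns true on every
// kSkipCount-th call, counting calls across all threads of the process.
bool ShouldSampleInvocation() {
  constexpr std::uint64_t kSkipCount = 16;
  static std::atomic<std::uint64_t> counter{0};
  // fetch_add returns the value held before the increment, so the calls
  // numbered 0, 16, 32, ... are the ones that get sampled.
  return counter.fetch_add(1, std::memory_order_relaxed) % kSkipCount == 0;
}
```

Relaxed ordering should be enough here, since the counter only decides how often to sample and does not guard any other data. A caller would then write something like `if (is_expensive || ShouldSampleInvocation()) { ... }`.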