Releases: eXascaleInfolab/PyExPool
Chained Termination on Constraints Violation Fixed and Optimized
Features & Optimizations
- Threshold added on exceeding the group vmem limit to reduce the number of reschedulings before the workers reduction and to speed up heavily loaded use cases
- Execution on pure Python 2 / PyPy without any dependencies is allowed; warnings are shown about the disabled functionality (vmem limit without `psutil`)
Fixes
- Termination of the chained jobs from the violating origin is fixed so that such jobs are neither restarted nor postponed (even when they are already terminating with the restart flag)
- Logging to implicit base directories fixed (file names specified without "./" or a full path)
Known Bugs
- Jobs rescheduling with `_CHAINED_CONSTRAINTS` kills related jobs that have the `ontimeout` flag and are assumed to be restarted (they should not be terminated at all)
Scheduling of the Spawning Processes
Features
- The kind of virtual memory evaluation for a job is parameterized (origin process, heaviest spawned [sub]process, or the whole process tree of the origin)
By default, the virtual memory is evaluated for the heaviest process in the process tree of the executing job.
This allows intermediate apps in the execution chain while keeping valid memory constraints for the target app[s], which are assumed to be the heaviest. An example of a job with an intermediate time-measuring process (`time`) that is not considered in the vmem constraints for the job:
find_job = Job(args=('time', 'find', '/etc', '-name', 'sh'))
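The three evaluation kinds can be sketched as follows (the names `VmemKind` and `job_vmem` are illustrative, not PyExPool's API):

```python
from enum import Enum

class VmemKind(Enum):
    """Which processes are accounted when evaluating a job's virtual memory."""
    ORIGIN = 1    # the origin process only
    HEAVIEST = 2  # the heaviest process in the tree (the default)
    TREE = 3      # the whole process tree of the origin

def job_vmem(origin_vmem, children_vmem, kind=VmemKind.HEAVIEST):
    """Evaluate the accounted vmem of a job under the given kind."""
    if kind is VmemKind.ORIGIN:
        return origin_vmem
    if kind is VmemKind.HEAVIEST:
        return max([origin_vmem] + list(children_vmem))
    return origin_vmem + sum(children_vmem)  # VmemKind.TREE
```

With the default `HEAVIEST` kind, a lightweight wrapper like `time` contributes nothing: only the heaviest process in the tree (here, `find`) is accounted against the limit.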
Known Bugs
- `_LIMIT_WORKERS_RAM` causes a huge degradation of the rescheduling performance when the worker processes meet the specified constraint
- Jobs rescheduling with `_CHAINED_CONSTRAINTS` does not kill jobs related to the terminated origin if they are in the terminating state with a requested restart, or are rescheduled because of a group violation of the memory constraints
Load Balancing of Jobs with Chained Dependencies
Features
- Parameterized virtual memory constraints for each Job, with an optional guarantee of in-RAM computation for all Jobs
- Chained rescheduling of the heavier Jobs of the same category to meet the RAM limitation / timeout constraints
- Load balancing of the worker processes combined with job queue rescheduling; the number of workers is automatically reduced to compute heavier jobs within the specified memory limit / in RAM if job rescheduling does not help
- Unit tests integrated
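The chained rescheduling of heavier same-category jobs can be sketched as follows (a simplified illustration where jobs are plain dicts, not PyExPool's `Job` objects):

```python
def jobs_to_postpone(jobs, violating, limit):
    """When the running jobs exceed `limit`, postpone jobs of the violating
    job's category that are at least as heavy, heaviest first, until the
    remaining group fits (a simplified sketch of chained rescheduling)."""
    candidates = [j for j in jobs
                  if j['category'] == violating['category']
                  and j['vmem'] >= violating['vmem']]
    candidates.sort(key=lambda j: j['vmem'], reverse=True)
    total = sum(j['vmem'] for j in jobs)
    postponed, freed = [], 0
    for j in candidates:
        if total - freed <= limit:
            break
        postponed.append(j)
        freed += j['vmem']
    return postponed
```

Rescheduling heavier jobs of the same category first frees the most memory with the fewest postponements; only if that does not suffice is the number of workers reduced.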
Fixes & Optimizations
- Forced termination of a job works even when SIGTERM is ignored
- Lots of fixes and optimizations related to the scheduling
Known Bugs
- `_LIMIT_WORKERS_RAM` causes a huge degradation of the rescheduling performance when the worker processes meet the specified constraint
- Jobs rescheduling with `_CHAINED_CONSTRAINTS` does not kill jobs related to the terminated origin if they are in the terminating state with a requested restart, or are rescheduled because of a group violation of the memory constraints
Adjustments for NUMA, CPU cache and termination optimizations
Features
- Automatic CPU affinity management (warm cache for single-threaded processes)
- CPU cache adjustment (parallelization vs cache size)
- NUMA architecture considered (nodes of CPUs, CPU cores, HW threads)
- Execution Pool latency parameterized
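The warm-cache idea amounts to pinning a single-threaded worker to one logical CPU; a minimal stdlib sketch (PyExPool's actual affinity management is more elaborate, accounting for NUMA nodes, CPU cores, and HW threads):

```python
import os

def pin_to_cpu(pid, cpu):
    """Pin the given process to a single logical CPU so its working set
    stays warm in that core's cache (Linux only; no-op elsewhere)."""
    if hasattr(os, 'sched_setaffinity'):
        os.sched_setaffinity(pid, {cpu})

# Example: pin the current process (pid 0) to logical CPU 0
pin_to_cpu(0, 0)
```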
Fixes & Optimizations
- Processing of terminating jobs sped up
- Workers deletion fixed (zombie workers eliminated on job restart)
Known Bugs
- Issues when the executing process can't be terminated gracefully; fixed in the next release
- Issues in the logical CPU enumeration prevent cache maximization; fixed since v2.1-MultiprocAfn