Directord 0.10.0
The 0.10.0 release is the most significant Directord release since the project began. Over this development cycle, we focused on use cases and feedback from operators who are deploying complex applications. Our ongoing goal has been pseudo-real-time execution that scales horizontally. While more improvements will come in future releases, Directord is now close to that original goal in both practice and test.
Highlights From This Development Cycle
- Directord is now faster than ever, approaching pseudo-real-time execution with a minimal memory footprint.
- The client and server codebase has been massively simplified.
- New in this release is the ability to do client-side coordination, allowing operators to craft complex components and build out job assurances that have intra-client dependencies.
- An example of coordination can be seen in the JOB_WAIT component.
- New data integrity checks have been added for file transfer operations.
- The ADD and COPY component has been re-written.
- Directord no longer requires the backend socket to remain open while the client is running.
- The client will connect back to the server over the backend socket only when needed.
- The heartbeat socket and thread have been removed. While Directord still uses heartbeats, the messages travel over the one job socket.
- This clean-up removed two PIDs and vast chunks of code.
- The client and server will now fork when needing to ingest or run jobs.
- This change improves application efficiency and minimizes resource consumption. While resource consumption was already low, it is now even lower.
- The client will now use dynamic command-based locking, which only resides in memory for as long as there are jobs to process.
- Before, Directord employed a global lock when required. Now each component uses its own named lock object, which further improves the speed of component execution. The speed improvements from the component locking changes are even more pronounced when leveraging async orchestrations.
- The management function now provides an analysis tool, which will allow operators to analyze jobs and parents.
- This is useful for determining node outliers, runtime issues, and other fun facts.
- The command line orchestrate and exec functions now have a --stream option, which streams STDOUT/STDERR/INFO as it becomes available during execution.
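The dynamic command-based locking described above can be pictured with a small sketch. This is not Directord's implementation; the `NamedLocks` class and its `hold` and `active` methods are hypothetical names chosen for illustration. It assumes only the behavior stated in the highlights: each component takes a lock under its own name rather than a global lock, and a lock exists in memory only while jobs are using it.

```python
import threading
from contextlib import contextmanager


class NamedLocks:
    """Hypothetical registry of per-command locks that exist only while in use."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the registry itself
        self._locks = {}  # command name -> [lock, number of current users]

    @contextmanager
    def hold(self, name):
        # Create the named lock on first use and count this user.
        with self._guard:
            entry = self._locks.get(name)
            if entry is None:
                entry = self._locks[name] = [threading.Lock(), 0]
            entry[1] += 1
        try:
            # Only jobs sharing the same name wait on each other here.
            with entry[0]:
                yield
        finally:
            # Drop the lock from memory once the last user is done.
            with self._guard:
                entry[1] -= 1
                if entry[1] == 0:
                    del self._locks[name]

    def active(self):
        """Return the names of locks currently held in memory."""
        with self._guard:
            return sorted(self._locks)


locks = NamedLocks()
with locks.hold("COPY"):
    with locks.hold("JOB_WAIT"):
        print(locks.active())
print(locks.active())
```

Because jobs with different names never contend for the same lock, unrelated components can run concurrently, which is consistent with the note that the gains are larger for async orchestrations.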
While these highlights are excellent, many other improvements in Directord are not mentioned here, and there are more yet to come.
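For readers unfamiliar with transfer integrity checks, the general idea behind the file transfer highlight can be sketched as follows. The release notes do not say which mechanism Directord uses, so the choice of SHA-256 and the helper names `sha256_of` and `verify_transfer` are assumptions made purely for illustration.

```python
import hashlib
import io


def sha256_of(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(sent_digest, received_stream):
    """Return True only when the received bytes hash to the sender's digest."""
    return sha256_of(received_stream) == sent_digest


# The sender computes a digest before the transfer (assumed workflow).
payload = b"example file contents"
sent = sha256_of(io.BytesIO(payload))

# The receiver recomputes the digest and compares it after the transfer.
print(verify_transfer(sent, io.BytesIO(payload)))
print(verify_transfer(sent, io.BytesIO(payload + b"corrupted")))
```

A digest comparison like this detects truncated or altered content, though it says nothing about who sent the file.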
2e38bc9 add analysis function
1ff14ab remove heartbeat methods that no longer serve any purpose
26896ad cleanup management function
f8ce379 ensure efficient cleanup of dynamic locks
abb4ac5 Add {posargs} to tox coverage command
832acac add dynamic command based locking
386fce5 rollback dynamic locking
0466570 move callback processing to ensure multi-return for specific nodes is right
1730909 add debug to lock creation
6e87a2f ensure that the processing state is set correctly
5fbbfc9 allow commands to run with the global lock when force-lock is true
4d469ce add command type locking
ac9cabe allow async workers to run with the current cpu count
d763fc3 remove additional counts in favor of timing
fe18346 use multiple returns when running a callback
41dc5f1 use timing instead of loop counts
1bd5f17 move return notice to the end of the execution
5d10054 slow down the query_wait log warning
409cb64 fix minor issues with documentation
1cf2df4 Improve job wait and target coordination
85f1c16 when waiting on callbacks, just block on the last one
fcb7a3e use 1 second delays where possible
98b3cd8 update JOB_WAIT to use new relay
8ca362b add coordination relay
aa83856 add identity list to QUERY callback
3fff576 re-update the queue processor
fcb9610 finish moving transfer to backend
698e270 add job-wait coordination
a05074b Revert "Improve client processor"
89651f3 Fix coordination issues
009f7c4 move the transfer bind to a backend bind
8ba9804 Remove the use of the server side heartbeat socket
62bb591 add delay as an Event property
3f5da26 Improve client processor
a995e78 remove the healthcheck thread
8606f5b Add identity checks for query wait
ae6e3cf Add functional testing and improve process management
043ff6b Enhance our usage of dynamic threads and high watermark monitoring
4a37d12 Save a reference to zmq Driver, and restore it for each unit test
5a420e3 add bypass manager set
58cc064 Stream and callback improvements
202cad0 Server return and async tracing
67132a4 add timestamps to parent pruning