Releases: oshied/directord
Directord 0.12.0
Hot on the heels of 0.11.3, a new major release has dropped! The default driver is now gRPC. While there's nothing wrong with the ZMQ implementation and we'll be supporting it for the foreseeable future, gRPC allows Directord to be used in FIPS certified environments; this is not possible with ZMQ due to its use of libsodium. The 0.12.0 release provides the means to run in secure clouds by default, while maintaining the performance and speed we've grown to expect.
What's Changed
- Rev 0120 by @cloudnull in #378
- Add exclude option to DNF component by @sshnaidm in #379
- Spec file and systemd packaging updates by @slagle in #381
- Allow drivers to run in isolation by @cloudnull in #380
- allow the driver to run in dummy mode by @cloudnull in #382
Full Changelog: 0.11.3...0.12.0
Directord 0.11.3
A new era for Directord: new capabilities, new components, new functions, new classes; just a better tool.
The changes included in this release of Directord are staggering. It really should be a major version; however, we're keeping that for a little bit later, largely because I forgot to rev things. Internally, just about everything has been improved, from a more robust process/thread model and better isolation to a whole new driver capability. All of these changes come without cost to performance and stability; in fact, we've improved performance by about 5% over the last release.
This release also comes with some assurances to our claimed scale expectations. While we documented our systems and processes on https://directord.com and covered the internals, setup, and expected performance on YouTube, we've now scale tested Directord at 150 nodes and the results were incredible. The Basic Task-Core POC applied to 150 nodes took ~11 minutes to complete with either the ZMQ or the gRPC driver. The Messaging driver accomplished the same task in 18 minutes. In contrast, our legacy deployment tooling took 45 minutes to do the same work.
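To put those timings side by side, here is a small arithmetic sketch. The minute values are the ones reported above (the ~11 minute figure is approximate, so the derived ratios are approximate too); the dictionary keys and the function name are only illustrative.

```python
# Wall-clock minutes reported above for the 150-node Basic Task-Core POC.
# The ZMQ and gRPC figure is approximate ("~11 minutes").
timings_minutes = {
    "zmq": 11,
    "grpc": 11,
    "messaging": 18,
    "legacy": 45,
}


def speedup_over_legacy(timings):
    """Return how many times faster each driver was than the legacy tooling."""
    legacy = timings["legacy"]
    return {
        name: round(legacy / minutes, 2)
        for name, minutes in timings.items()
        if name != "legacy"
    }


speedups = speedup_over_legacy(timings_minutes)
for name, factor in speedups.items():
    print(f"{name}: roughly {factor}x faster than the legacy tooling")
```

By these numbers the ZMQ and gRPC drivers finished in roughly a quarter of the legacy time, and the Messaging driver in well under half.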
So with all that said, check out the release notes; there's SO much going on. The team is growing, we're adding contributors, and the project is making some incredible moves.
What's Changed
- Add exposed message ID to heartbeats by @cloudnull in #262
- Prepare dev-setup for CentOS 9 by @sshnaidm in #263
- cleanup default dev catalog by @cloudnull in #264
- Add option to cache STDERR to RUN component by @sshnaidm in #265
- Add hostname to Containerfile for tests as it's in Dockerfile by @sshnaidm in #266
- Add identity override to config by @mwhahaha in #268
- Update docs for RUN component by @sshnaidm in #267
- Add option to name orchestrations and jobs by @cloudnull in #270
- Allow to set debug from environment variable by @sshnaidm in #272
- Use --best in DNF component for install or update by @sshnaidm in #273
- Fix issues in components by @sshnaidm in #274
- Add option to allow orchestrations to override targets by @cloudnull in #271
- Fix messaging bootstrap for multiple nodes by @slagle in #259
- Migrate to directord organization by @kajinamit in #277
- Remove unnecessary characters by @kajinamit in #276
- Change --server-address to --zmq-server-address for container and docs by @sshnaidm in #278
- Remove the diskcache dep by @cloudnull in #279
- Add option to allow operators to set the machine id by @cloudnull in #280
- Add several small changes to tune scale testing by @cloudnull in #281
- remove extra print by @cloudnull in #282
- Add CONTAINER_IMAGE component to work with podman images by @sshnaidm in #275
- Fix TLS verify for all podman code by @sshnaidm in #283
- Run full functional tests for CONTAINER_IMAGE component by @sshnaidm in #284
- Make stdout and stderr args available for any component by @sshnaidm in #285
- Connect to client to get hostname by @slagle in #287
- ensure cacheargs is used in all components by @cloudnull in #286
- fix node pruning by @cloudnull in #288
- Job interaction improvements by @cloudnull in #289
- updating timings by @cloudnull in #290
- add poller to client job results by @cloudnull in #291
- Update disc store to be POSIX compliant by @cloudnull in #292
- add functional test for posix datastore by @cloudnull in #294
- add bootstrap to the Directord library implementation by @cloudnull in #293
- additional updates for POSIX cache types by @cloudnull in #295
- Updated docs by @cloudnull in #296
- fix bootstrap server targets by @cloudnull in #298
- Add orch file for provisioning clients only by @sshnaidm in #297
- Fix issue when no jobs for target by @sshnaidm in #299
- update file store by @cloudnull in #300
- add exception handling for bootstrap by @cloudnull in #301
- add more exception handling by @cloudnull in #302
- Re-work query to use coordination instead of client side callbacks by @cloudnull in #303
- Rev0113 by @cloudnull in #304
- add additional error handling for query call backs by @cloudnull in #305
- add prod-bootstrap and blueprint to query wait by @cloudnull in #306
- update readme by @cloudnull in #307
- add cache read lock by @cloudnull in #308
- Fix wait option handling by @mwhahaha in #309
- Fix status code check by @mwhahaha in #310
- Increase default wait retries by @mwhahaha in #311
- add retry decorator to components by @cloudnull in #312
- update machine checking and messaging workers by @cloudnull in #313
- gRPC driver by @mwhahaha in #314
- Fixes for grpcd backend by @mwhahaha in #317
- Add request id to grpc requests and responses by @mwhahaha in #318
- use threading instead of multiprocessing by @cloudnull in #316
- add grpc gate test by @cloudnull in #315
- Add coroutine timeout decorator by @cloudnull in #319
- bootstrap requires the use of multiprocessing by @cloudnull in #320
- ensure that drivers use process based locks by @cloudnull in #321
- Ensure components have unique locks by @cloudnull in #322
- remove coroutine timeout by @cloudnull in #323
- Reduce the debug logging for grpcd by @mwhahaha in #324
- Ensure events are driver specific by @cloudnull in #325
- Grpc increase wait and enable compression by @mwhahaha in #326
- reimplement timeout coroutine by @cloudnull in #327
- Fix disable compression default by @mwhahaha in #328
- Remove messaging drivers entrypoint by @slagle in #329
- Wire up ssl support for grpc by @mwhahaha in #330
- Create thread exception class and terminate events by @cloudnull in #331
- Increase file limits for the server by @mwhahaha in #333
- Only create a single client instance by @mwhahaha in #334
- Add durable queue type option for clients by @cloudnull in #335
- Add exception handling to client execution by @cloudnull in #336
- Add C++ compiler for grpcio deps build by @sshnaidm in #340
- Skip client close on job close by @mwhahaha in #339
- Revert "Add durable queue type option for clients" by @cloudnull in #341
- Add grpc scripts to packaging by @mwhahaha in #342
- Packaging updates by @slagle in #344
- Cover grpc driver with tests by @mwhahaha in #34...
Directord 0.11.0
Release 11: feature packed, cleaner, with a new driver, and lighter than ever before.
Highlights
This release introduces the new oslo-messaging driver, allowing Directord to operate in a traditional AMQP environment. This change is crucial to our success as we want to empower operators to leverage Directord in their existing environments, without needing to augment or change platforms. If operators have a messaging backend supported by oslo-messaging, Directord can make use of it today.
This release also cleans up a lot of the legacy encoding in Directord. Previously, encoding was done throughout the code-base; now all encoding is done within the driver. This means the functional code within Directord is far simpler, better documented, and easier to understand.
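The idea of keeping encoding inside the driver can be sketched as follows. This is a minimal illustration, not Directord's actual driver API: the class name, method names, and the choice of JSON as a wire format are all assumptions made for the example.

```python
import json


class SketchDriver:
    """Hypothetical driver: the only layer that ever touches bytes."""

    def encode(self, message: dict) -> bytes:
        # JSON is an assumption; the real wire format is not described here.
        return json.dumps(message, sort_keys=True).encode("utf-8")

    def decode(self, payload: bytes) -> dict:
        return json.loads(payload.decode("utf-8"))

    def send(self, message: dict) -> bytes:
        # A real driver would write to a socket or queue; this sketch
        # simply returns the bytes that would go on the wire.
        return self.encode(message)

    def recv(self, payload: bytes) -> dict:
        return self.decode(payload)


def build_job(command: str, targets: list) -> dict:
    """Functional code works with plain Python objects only."""
    return {"command": command, "targets": targets}


driver = SketchDriver()
wire = driver.send(build_job("RUN", ["node1"]))
job = driver.recv(wire)
print(type(wire).__name__, job)
```

With this split, swapping one transport for another only changes the driver class; code such as build_job never needs to know how messages are serialized.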
TripleO PTG
Directord is being discussed as part of the TripleO Yoga PTG. Check out the PTG notes and sessions for more.
Slides from the Directord Overview PTG session can be seen within the PDF attached to this release.
e7d8015 Add generic wait component
8d775ab Fix typo in README
ae023be Dyn drivers 2
cd59fcd add dynamic driver parsing to the help output
e0838d1 Update the dynamic driver parser
95f43ae Add SSL support for messaging driver
4855b87 add easy local doc generation and browsing
a24701f add job definitions to the bootstrap process
1d7fde5 update data-store options and documentation
5cd1f33 Update push.yml
99cb2fe Add credit loop to pollers
402ea40 Add message driver analysis
06a1d93 Create CNAME
0c12d8d Delete CNAME
9e3a069 Change the job processor to prioritize messages
cd7a267 add link
73df12d reformat
1ddda11 more doc updates
4abe45e add setup section
737bece add updated overview
5318e70 Driver docs
2354af5 Added driver-messaging.drawio.png
9ea6934 add bootstrap catalog for the messaging driver
5c8f8d7 fix bug 235
3b6c47e add missing abstract methods from messaging
ce71c9e add flake8 docstring tests
b144793 Messaging thread cleanup and job support
4a23912 Fix messaging heartbeat
8016e6d UX imporovements
09d696a updated diagram and docs
acf6985 Added highlevel-messaging.png
a9b68be Add driver_run in it's own process
6c3e2af add hostname fencing
9910714 add starting documentation for messaging and tweak the driver
3f3d053 more updates to support our simplified encoding process
49a886c Make CLI args override config file
e529792 Driver api
45ed933 Degrated -> Degraded
d8edb42 Update the messaging abstractions
e7fbc04 rev 0.10.1
190372f Add support for oslo-messaging as a driver
What's Changed
- Add support for oslo-messaging as a driver by @slagle in #213
- rev 0.10.1 by @cloudnull in #220
- Update the messaging abstractions by @cloudnull in #221
- Degrated -> Degraded by @slagle in #224
- Driver api by @cloudnull in #222
- Make CLI args override config file by @slagle in #223
- more updates to support our simplified encoding process by @cloudnull in #225
- add starting documentation for messaging and tweak the driver by @cloudnull in #226
- add hostname fencing by @cloudnull in #228
- Add driver_run in it's own process by @slagle in #227
- updated diagram and docs by @cloudnull in #229
- UX imporovements by @cloudnull in #230
- Fix messaging heartbeat by @slagle in #231
- Messaging thread cleanup and job support by @slagle in #232
- D102 updates by @cloudnull in #233
- add missing abstract methods from messaging by @cloudnull in #234
- fix bug 235 by @cloudnull in #236
- add bootstrap catalog for the messaging driver by @cloudnull in #237
- Driver docs by @cloudnull in #238
- analysis documentation updates by @cloudnull in #239
- add setup section by @cloudnull in #240
- more doc updates by @cloudnull in #241
- reformat by @cloudnull in #242
- add link by @cloudnull in #243
- Change the job processor to prioritize messages by @cloudnull in #244
- Add message driver analysis by @cloudnull in #245
- Add credit loop to pollers by @cloudnull in #246
- update data-store options and documentation by @cloudnull in #248
- add job definitions to the bootstrap process by @cloudnull in #249
- add easy local doc generation and browsing by @cloudnull in #250
- Add SSL support for messaging driver by @slagle in #247
- Update the dynamic driver parser by @cloudnull in #251
- add dynamic driver parsing to the help output by @cloudnull in #252
- Dyn drivers 2 by @cloudnull in #253
- Fix typo in README by @slagle in #254
- Add generic wait component by @mwhahaha in #255
Full Changelog: 0.10.0...0.11.0
Directord 0.10.0
The 0.10.0 release is the most significant Directord release since starting the project. Over this last development cycle, we've focused on use-cases and feedback from operators who are deploying complex applications. We've had an ongoing goal of pseudo-real-time execution, which scales horizontally. While more improvements are to be made in future releases, Directord is now close to the original goal of pseudo-real-time performance in both practice and test.
Highlights From This Development Cycle
- Directord is now faster than ever, approaching pseudo-real-time execution with a minimal memory footprint.
- The client and server codebase has been massively simplified.
- New in this release is the ability to do client-side coordination, allowing operators to craft complex components and build out job assurances that have intra-client dependencies.
- An example of coordination can be seen in the JOB_WAIT component.
- New data integrity checks have been added for file transfer operations.
- The ADD and COPY component has been re-written.
- Directord no longer requires the backend socket to remain open while the client is running.
- The client will connect back to the server over the backend socket only when needed.
- The heartbeat socket and thread have been removed. While Directord still uses heartbeats, the messages travel over the one job socket.
- This clean-up removed two PIDs and vast chunks of code.
- The client and server will now fork when needing to ingest or run jobs.
- This change better ensures application efficiency and minimizes resource consumption. While resource consumption was already low, it is now even lower.
- The client will now use dynamic command-based locking, which only resides in memory for as long as there are jobs to process.
- Before, Directord employed a global lock when required; now components make use of their own named lock object, which further improves the speed of component execution. The speed improvements from the component locking changes are even more pronounced when leveraging async orchestrations.
- The management function now provides an analysis tool, which will allow operators to analyze jobs and parents.
- This is useful for determining node outliers, runtime issues, and other fun facts.
- The command line orchestrate and exec functions now have a --stream option, which will stream STDOUT/STDERR/INFO as it becomes available during execution.
While these highlights are excellent, there's a lot improved in Directord that was not mentioned, and more yet to come.
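The dynamic, command-based locking described in the highlights can be sketched roughly as below. This is an illustration under stated assumptions, not Directord's implementation: the NamedLocks class and its methods are hypothetical, and it uses thread locks for brevity even though the client may rely on a different lock type.

```python
import threading
from contextlib import contextmanager


class NamedLocks:
    """Hypothetical registry of per-command locks that exist only while in use."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the registry itself
        self._locks = {}  # name -> [lock, number of active users]

    @contextmanager
    def hold(self, name):
        # Create the named lock on demand and count this caller as a user.
        with self._guard:
            entry = self._locks.setdefault(name, [threading.Lock(), 0])
            entry[1] += 1
        try:
            with entry[0]:
                yield
        finally:
            # Drop the lock from memory once nobody is using it.
            with self._guard:
                entry[1] -= 1
                if entry[1] == 0:
                    del self._locks[name]

    def active(self):
        with self._guard:
            return sorted(self._locks)


locks = NamedLocks()
with locks.hold("RUN"):
    with locks.hold("COPY"):
        inside = locks.active()
after = locks.active()
print(inside, after)
```

Because each command name gets its own lock, a RUN job does not block a COPY job the way a single global lock would, and the registry is empty again once all jobs have finished.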
2e38bc9 add analysis function
1ff14ab remove heartbeat methods that no longer serve any purpose
26896ad cleanup management function
f8ce379 ensure efficient cleanup of dynamic locks
abb4ac5 Add {posargs} to tox coverage command
832acac add dynamic command based locking
386fce5 rollback dynamic locking
0466570 move callback processing to ensure multi-return for specific nodes is right
1730909 add debug to lock creation
6e87a2f ensure that the processing state is set correctly
5fbbfc9 allow commands to run with the global lock when force-lock is true
4d469ce add command type locking
ac9cabe allow async workers to run with the current cpu count
d763fc3 remove additional counts in favor of timing
fe18346 use multiple returns when running a callback
41dc5f1 use timing instead of loop counts
1bd5f17 move return notice to the end of the execution
5d10054 slow down the query_wait log warning
409cb64 fix minor issues with documentation
1cf2df4 Improve job wait and target coordination
85f1c16 when waiting on callbacks, just block on the last one
fcb7a3e use 1 second delays where possible
98b3cd8 update JOB_WAIT to use new relay
8ca362b add coordination relay
aa83856 add identity list to QUERY callback
3fff576 re-update the queue processor
fcb9610 finish moving transfer to backend
698e270 add job-wait coordination
a05074b Revert "Improve client processor"
89651f3 Fix coordination issues
009f7c4 move the transfer bind to a backend bind
8ba9804 Remove the use of the server side heartbeat socket
62bb591 add delay as an Event property
3f5da26 Improve client processor
a995e78 remove the healthcheck thread
8606f5b Add identity checks for query wait
ae6e3cf Add functional testing and improve process management
043ff6b Enhance our usage of dynamic threads and high watermark monitoring
4a37d12 Save a reference to zmq Driver, and restore it for each unit test
5a420e3 add bypass manager set
58cc064 Stream and callback improvements
202cad0 Server return and async tracing
67132a4 add timestamps to parent pruning
Directord 0.9.4
Directord 0.9.2
Faster client and server interactions and a promoted datastore option allowing Directord to run faster with even fewer resources.
Overview
The new disc datastore, now default in Directord, allows Directord to resume operations faster from a stopped state. It also allows Directord to store jobs persistently without consuming system memory or requiring a remote datastore. Another incredible feature of the new datastore is speed. Directord is now able to store, reference, and recall information faster than ever before; this is especially true when dealing with tens of thousands of jobs and orchestrations.
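As a rough illustration of why a disc-backed store lets a server resume from a stopped state, here is a standard-library-only sketch. It is a stand-in, not the datastore Directord ships: the DiskStore class, the one-JSON-file-per-key layout, and the key names are assumptions, and keys are assumed to be safe file names.

```python
import json
import os
import tempfile


class DiskStore:
    """Hypothetical key/value store that keeps one JSON file per key."""

    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def _file(self, key):
        return os.path.join(self.path, f"{key}.json")

    def set(self, key, value):
        # Write to a temporary file, then rename it into place, so a crash
        # cannot leave a half-written record behind.
        fd, tmp = tempfile.mkstemp(dir=self.path)
        with os.fdopen(fd, "w") as handle:
            json.dump(value, handle)
        os.replace(tmp, self._file(key))

    def get(self, key, default=None):
        try:
            with open(self._file(key)) as handle:
                return json.load(handle)
        except FileNotFoundError:
            return default


root = tempfile.mkdtemp()
DiskStore(root).set("job-1", {"status": "success"})

# A brand new instance, as after a restart, still sees the stored job.
restored = DiskStore(root).get("job-1")
print(restored)
```

Nothing is held in memory between calls, which is the property the overview highlights: job state survives a restart without a remote datastore.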
CHANGES
63ffc6d remove artificial time blocks
9cc55b5 Add disc backed data caching for the server
Directord 0.9.1
Bug fixes, enablement of promoted components, and performance enhancements.
1f58c2b use service component when restarting services (#180)
a773663 Add a queue sentinel component (#179)
10396ec add force-lock option (#178)
f8101de Update query wait to account for null args (#177)
51a3c2c add catalog to assist in patching development clients (#176)
f7eb207 ensure we're only blocking for tasks that have ended or failed (#175)
71f3ed3 add timestamps to job and component returns (#174)
4affefb move component lock to a global (#173)
c6ca1ea add info when processing workdir (#172)
8ca5ab1 faster server processing (#169)
b8d2274 add more accurate time data (#168)
a249494 add syntax highlighting to readme (#166)
9d394c3 ensure that the stdout-arg target is defined (#165)
Directord 0.9.0
Directord is now more capable, efficient, and faster than ever before.
Directord is able to connect and operate in FIPS enabled environments.
- libssh2 in our bootstrap process was removed in favor of libssh.
The user experience has also been overhauled to provide more useful information at a glance.
- Improved manage commands
- Easier to understand logging
- Faster information access, even when sorting hundreds of thousands of tasks
In terms of operations, client side logging has been made more robust.
- You can now enable debug to understand nearly all interactions and follow a process linearly.
Client side execution has also been made massively more efficient.
- No more managers
- Simplified threading
- Intelligent class based locking.
A lot more testing. With the implementation of ASYNC, we've added tests to ensure systems work as expected:
- Async query tests
- Async race condition tests
- Async caching tests
The comparative analysis tests have also been updated to now show orchestrations with async enabled.
Directord 0.8.1
Directord now supports asynchronous orchestrations, more components, component threading, updated documentation, and more.
Lots of fantastic changes have gone into Directord, which have all been precipitated by user feedback and increased usage. The project is better tested than ever before and making huge moves toward a rock-solid 1.0 release. While the primary reason for this release is the restructured client code, which enabled asynchronous orchestrations, the fact is this release is jam packed full of exciting features and boring stability improvements.
Directord 0.7.3
Many significant improvements have been implemented within Directord.
- Components, drivers, and data-stores are all modular
- Tests have been significantly improved
- Directord now builds and ships RPM packages
- UX improvements have been made
- Fingerprinting now uses SHA256
- Paramiko has been removed in favor of ssh2-python; this change alone ensures we're meeting and exceeding our security commitment, even if SSH based bootstrapping is optional.
- Cache file semantics have been improved and documented
- Prod scripts will now install using RPM when running on an RPM based system
- Dev scripts more easily install Directord in debug mode for simpler development.
and more...
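For readers unfamiliar with SHA256 fingerprinting, the following sketch shows one common way to compute such a fingerprint with Python's standard library. What exactly Directord fingerprints is not stated above, so hashing a file's contents is an assumption, as are the function name and chunk size.

```python
import hashlib
import os
import tempfile


def sha256_fingerprint(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat for large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Write a small sample file, fingerprint it, then clean up.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as handle:
    handle.write(b"directord")
fingerprint = sha256_fingerprint(sample_path)
os.remove(sample_path)
print(fingerprint)
```

Two files with the same fingerprint can be treated as identical for practical purposes, which is what makes a digest like this useful for detecting changed or corrupted content.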