Directord 0.11.3
A new era for Directord: new capabilities, new components, new functions, new classes; just a better tool.
The changes included in this release of Directord are staggering. It really should be a major version; however, we're keeping that for a little bit later, largely because I forgot to rev things. Internally, just about everything has been improved, from a more robust process/thread model and better isolation to a whole new driver capability. All of these changes come without cost to performance or stability; in fact, we've improved performance by about 5% over the last release.
This release also backs up our claimed scale expectations. While we documented our systems and processes on https://directord.com and covered the internals, setup, and expected performance on YouTube, we've now scale tested Directord at 150 nodes, and the results were incredible. The Basic Task-Core POC applied to 150 nodes took ~11 minutes to complete with both the ZMQ and gRPC drivers. The Messaging driver accomplished the same task in 18 minutes. In contrast, our legacy deployment tooling took 45 minutes to do the same work.
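For a rough sense of what those timings mean, here is a minimal sketch that turns the reported numbers into speedup factors over the legacy tooling. The variable and function names are made up for illustration and are not part of Directord, and the 11 minute figure is approximate ("~11" in the results), so treat the output as a ballpark.

```python
# Reported wall-clock times, in minutes, for the 150 node Basic Task-Core POC.
# The ZMQ/gRPC figure is approximate ("~11 minutes" in the release notes).
# All names here are hypothetical and exist only for this illustration.
timings_min = {
    "zmq_grpc": 11,
    "messaging": 18,
    "legacy": 45,
}


def speedup(baseline_min: float, new_min: float) -> float:
    """Return how many times faster new_min is than baseline_min."""
    return baseline_min / new_min


zmq_grpc_speedup = speedup(timings_min["legacy"], timings_min["zmq_grpc"])
messaging_speedup = speedup(timings_min["legacy"], timings_min["messaging"])

print(f"ZMQ/gRPC vs legacy:  {zmq_grpc_speedup:.1f}x")   # 4.1x
print(f"Messaging vs legacy: {messaging_speedup:.1f}x")  # 2.5x
```

In other words, the ZMQ and gRPC runs came in at roughly four times faster than the legacy tooling, and the Messaging driver at two and a half times faster.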
So with all that said, check out the release notes; there's SO much going on. The team is growing, we're adding contributors, and the project is making some incredible moves.
What's Changed
- Add exposed message ID to heartbeats by @cloudnull in #262
- Prepare dev-setup for CentOS 9 by @sshnaidm in #263
- cleanup default dev catalog by @cloudnull in #264
- Add option to cache STDERR to RUN component by @sshnaidm in #265
- Add hostname to Containerfile for tests as it's in Dockerfile by @sshnaidm in #266
- Add identity override to config by @mwhahaha in #268
- Update docs for RUN component by @sshnaidm in #267
- Add option to name orchestrations and jobs by @cloudnull in #270
- Allow to set debug from environment variable by @sshnaidm in #272
- Use --best in DNF component for install or update by @sshnaidm in #273
- Fix issues in components by @sshnaidm in #274
- Add option to allow orchestrations to override targets by @cloudnull in #271
- Fix messaging bootstrap for multiple nodes by @slagle in #259
- Migrate to directord organization by @kajinamit in #277
- Remove unnecessary characters by @kajinamit in #276
- Change --server-address to --zmq-server-address for container and docs by @sshnaidm in #278
- Remove the diskcache dep by @cloudnull in #279
- Add option to allow operators to set the machine id by @cloudnull in #280
- Add several small changes to tune scale testing by @cloudnull in #281
- remove extra print by @cloudnull in #282
- Add CONTAINER_IMAGE component to work with podman images by @sshnaidm in #275
- Fix TLS verify for all podman code by @sshnaidm in #283
- Run full functional tests for CONTAINER_IMAGE component by @sshnaidm in #284
- Make stdout and stderr args available for any component by @sshnaidm in #285
- Connect to client to get hostname by @slagle in #287
- ensure cacheargs is used in all components by @cloudnull in #286
- fix node pruning by @cloudnull in #288
- Job interaction improvements by @cloudnull in #289
- updating timings by @cloudnull in #290
- add poller to client job results by @cloudnull in #291
- Update disc store to be POSIX compliant by @cloudnull in #292
- add functional test for posix datastore by @cloudnull in #294
- add bootstrap to the Directord library implementation by @cloudnull in #293
- additional updates for POSIX cache types by @cloudnull in #295
- Updated docs by @cloudnull in #296
- fix bootstrap server targets by @cloudnull in #298
- Add orch file for provisioning clients only by @sshnaidm in #297
- Fix issue when no jobs for target by @sshnaidm in #299
- update file store by @cloudnull in #300
- add exception handling for bootstrap by @cloudnull in #301
- add more exception handling by @cloudnull in #302
- Re-work query to use coordination instead of client side callbacks by @cloudnull in #303
- Rev0113 by @cloudnull in #304
- add additional error handling for query call backs by @cloudnull in #305
- add prod-bootstrap and blueprint to query wait by @cloudnull in #306
- update readme by @cloudnull in #307
- add cache read lock by @cloudnull in #308
- Fix wait option handling by @mwhahaha in #309
- Fix status code check by @mwhahaha in #310
- Increase default wait retries by @mwhahaha in #311
- add retry decorator to components by @cloudnull in #312
- update machine checking and messaging workers by @cloudnull in #313
- gRPC driver by @mwhahaha in #314
- Fixes for grpcd backend by @mwhahaha in #317
- Add request id to grpc requests and responses by @mwhahaha in #318
- use threading instead of multiprocessing by @cloudnull in #316
- add grpc gate test by @cloudnull in #315
- Add coroutine timeout decorator by @cloudnull in #319
- bootstrap requires the use of multiprocessing by @cloudnull in #320
- ensure that drivers use process based locks by @cloudnull in #321
- Ensure components have unique locks by @cloudnull in #322
- remove coroutine timeout by @cloudnull in #323
- Reduce the debug logging for grpcd by @mwhahaha in #324
- Ensure events are driver specific by @cloudnull in #325
- Grpc increase wait and enable compression by @mwhahaha in #326
- reimplement timeout coroutine by @cloudnull in #327
- Fix disable compression default by @mwhahaha in #328
- Remove messaging drivers entrypoint by @slagle in #329
- Wire up ssl support for grpc by @mwhahaha in #330
- Create thread exception class and terminate events by @cloudnull in #331
- Increase file limits for the server by @mwhahaha in #333
- Only create a single client instance by @mwhahaha in #334
- Add durable queue type option for clients by @cloudnull in #335
- Add exception handling to client execution by @cloudnull in #336
- Add C++ compiler for grpcio deps build by @sshnaidm in #340
- Skip client close on job close by @mwhahaha in #339
- Revert "Add durable queue type option for clients" by @cloudnull in #341
- Add grpc scripts to packaging by @mwhahaha in #342
- Packaging updates by @slagle in #344
- Cover grpc driver with tests by @mwhahaha in #345
- Fix query information part by @sshnaidm in #346
- Add facter component for collections of facts on the node by @sshnaidm in #332
- Add --reloaded to service component by @mwhahaha in #349
- fix queue purge by @cloudnull in #350
- use pre-fork signals to allow exit by @cloudnull in #351
- Cleanup tests to remove the crazy output by @cloudnull in #353
- Packaging updates by @slagle in #352
- add trap for driver load errors by @cloudnull in #355
- move key generation to the zmq driver by @cloudnull in #356
- Add support for multiple words for echo by @sshnaidm in #357
- Replace the cache class with iodict by @cloudnull in #347
- DurableQueue by @cloudnull in #348
- move flushqueue to library by @cloudnull in #358
- Move the worker items into a object by @cloudnull in #359
- Add server side logic to mark nodes active by @cloudnull in #360
- Tuneup by @cloudnull in #361
- reintegrate iodict by @cloudnull in #362
- move signals to the process interface by @cloudnull in #364
- Drop server side query timeout by @mwhahaha in #366
- remove extra verbose log lines from components by @cloudnull in #367
- Use individual args for bootstrap interface by @slagle in #368
- Use sudo in rpm and config bootstrap by @slagle in #369
- Add bootstraps to manage/unmanage the cluster by @slagle in #370
- Update drivers doc for grpcd by @mwhahaha in #371
- Minor updates to grpc ssl docs by @mwhahaha in #372
- Fix help message for service by @sshnaidm in #374
- Add mask/unmask option to service component by @sshnaidm in #375
- add updated grpcd diagram and driver status info by @cloudnull in #376
New Contributors
- @kajinamit made their first contribution in #277
Full Changelog: 0.11.0...0.11.3