[Feature] Configuration for runtime "priority" among models with satisfied dependencies #10632

owenprough-sift · 2024-08-29T13:04:48Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Problem Statement

dbt runs models in DAG order, which is functionally correct. But there are situations¹ where it would be helpful to have more control over the relative execution order of models within a run. For example, in a run that includes a long-running model with no upstream dependencies but many downstream dependents, it would be helpful to start the long-running model first to minimize total run time.

Proposed Solution

A new execution_order configuration that lets you specify the relative execution order of selected resources.
At runtime, dbt would:

1. Determine the set of resources whose dependencies are satisfied (aka "run in DAG order")
2. Within that set, run the resources ordered by execution_order (nulls last), falling back to whatever the current ordering logic is
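
To make the two steps concrete, here is a minimal single-threaded sketch in Python. It is not dbt's actual scheduler: the function name run_order, the deps mapping, and the alphabetical fallback are all assumptions made for illustration (the alphabetical fallback mirrors the undocumented behavior described under the workarounds).

```python
from typing import Optional


def run_order(
    deps: dict[str, list[str]],
    execution_order: dict[str, Optional[int]],
) -> list[str]:
    """Return a serial run order for a DAG.

    deps maps each node to its upstream parents (hypothetical input shape).
    execution_order maps a node to its configured value; missing or None
    means "not set" and sorts last.
    """
    remaining = {node: set(parents) for node, parents in deps.items()}
    done: list[str] = []

    def sort_key(node: str):
        value = execution_order.get(node)
        # (is-null flag, value, name): nulls last, then fall back to name order
        return (value is None, value if value is not None else 0, node)

    while remaining:
        # Step 1: the set of nodes whose dependencies are satisfied
        ready = [n for n, parents in remaining.items() if parents <= set(done)]
        if not ready:
            raise ValueError("dependency cycle or unknown parent")
        # Step 2: within that set, pick by execution_order
        chosen = min(ready, key=sort_key)
        done.append(chosen)
        del remaining[chosen]
    return done


deps = {
    "a_small": [],
    "b_small": [],
    "long_running_model": [],
    "report": ["long_running_model", "a_small"],
}
print(run_order(deps, {}))
print(run_order(deps, {"long_running_model": 1}))
```

With no configuration the sketch starts long_running_model third; with execution_order set to 1 it starts first, while report still waits for both of its parents.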

Describe alternatives you've considered

Workarounds with which I am familiar:

dbt seems to run models in alphabetical order, so you could rename the long-running model to have an alphabetically-earlier name
- ...but it feels fragile to rely on this undocumented behavior
Add a -- depends_on: {{ ref('long_running_model') }} comment to all other models in the project to force the long-running model to run first
- ...but the other models don't necessarily depend on this model, so it makes the DAG visualization misleading

Who will this benefit?

Folks with long-running models in the middle of their DAGs

Are you interested in contributing this feature?

No

Anything else?

I realize that giving developers some control over execution order is likely controversial and potentially complicated to implement, but I see this as a useful Advanced Feature™ (a la incremental predicates) for those situations where complex DAG runtime is sub-optimal.

¹ https://getdbt.slack.com/archives/CBSQTAPLG/p1724930433868199


owenprough-sift · 2024-09-06T19:22:53Z

Another data point: https://getdbt.slack.com/archives/CBSQTAPLG/p1725649648110359?thread_ts=1725643109.036959&cid=CBSQTAPLG

pempey · 2024-12-26T15:41:11Z

Additionally, this would be useful when a run is 'thread starved' at the beginning: models are waiting only for an available thread, not for another model, and the critical path (the longest single lineage of model executions) is delayed because of the limited threads. It is similar to the original problem statement in that it affects performance on the critical path of the invocation, but it also covers cases where the individual models may not be long-running but their total lineage is quite long.

My first thought for a config was an invocation_priority that would be included in the current sorting of the list of models to execute. This would be a numeric value with a default that could be overridden with a model config. Conceptually, this better handles the case of 'these models need to run before others, but the order between them is not important'. Setting multiple models to the same priority would make more sense than giving them the same execution order.
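
A rough Python simulation of the thread-starved case, to show why a shared priority on the critical path can shorten a run. Everything here is an assumption for illustration: the simulate function, the model names and durations, the "higher number runs first" convention, and the alphabetical tie-break. It does not reflect how dbt schedules nodes internally.

```python
import heapq


def simulate(deps, duration, threads, priority=None, default_priority=0):
    """Return total run time for a simple list scheduler.

    deps: node -> list of upstream parents (hypothetical input shape).
    priority: node -> number; higher runs first (assumed convention).
    """
    priority = priority or {}
    pending = {node: set(parents) for node, parents in deps.items()}
    running = []  # min-heap of (finish_time, node)
    finished = set()
    clock = 0
    while pending or running:
        # Ready nodes, highest priority first, then by name
        ready = sorted(
            (n for n, parents in pending.items() if parents <= finished),
            key=lambda n: (-priority.get(n, default_priority), n),
        )
        for node in ready:
            if len(running) >= threads:
                break  # thread starved: ready nodes must wait
            heapq.heappush(running, (clock + duration[node], node))
            del pending[node]
        if not running:
            raise ValueError("dependency cycle or unknown parent")
        # Advance time to the next node that finishes
        clock, node = heapq.heappop(running)
        finished.add(node)
    return clock


# Four short independent models plus a three-step lineage (critical path = 9)
deps = {"a1": [], "a2": [], "a3": [], "a4": [],
        "stg": [], "int": ["stg"], "mart": ["int"]}
duration = {"a1": 1, "a2": 1, "a3": 1, "a4": 1, "stg": 3, "int": 3, "mart": 3}

print(simulate(deps, duration, threads=2))
print(simulate(deps, duration, threads=2,
               priority={"stg": 10, "int": 10, "mart": 10}))
```

In this toy run with two threads, the default ordering finishes at time 11 because stg waits behind the short models, while giving the whole lineage the same priority finishes at time 9, which equals the length of the critical path.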

owenprough-sift added the enhancement (New feature or request) and triage labels Aug 29, 2024

dbeatty10 added the performance label Sep 4, 2024
