I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion
Describe the feature
Problem Statement
dbt runs models in DAG order, which is functionally correct. But there are situations [1] where it would be helpful to have more control over the relative execution order of models within a run. For example, in a run that includes a long-running model with no upstream dependencies but many downstream dependencies, it would be helpful to start the long-running model first to minimize total run time.
Proposed Solution
A new execution_order configuration which allows you to specify the relative execution order of selected resources.
At runtime, dbt would:
1. Determine the set of resources whose dependencies are satisfied (aka "run in DAG order")
2. Within that set, run the resources ordered by execution_order (nulls last), falling back to the current ordering logic
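To make the two steps concrete, here is a minimal single-threaded sketch in Python. It is not how dbt's scheduler is implemented: `run_order`, the `deps` mapping, and the alphabetical tie-break are assumptions made for illustration, and `execution_order` is the proposed (not yet existing) config.

```python
from typing import Dict, List, Optional, Set


def run_order(
    deps: Dict[str, Set[str]],
    execution_order: Dict[str, Optional[int]],
) -> List[str]:
    """Return a run order that respects the DAG first, then execution_order."""
    finished: Set[str] = set()
    remaining: Set[str] = set(deps)
    order: List[str] = []

    def sort_key(name: str):
        value = execution_order.get(name)
        # Nulls last: unset values sort after every explicit value.
        # The final tie-break on name stands in for "current ordering logic".
        return (value is None, value if value is not None else 0, name)

    while remaining:
        # Step 1: resources whose upstream dependencies have all finished.
        ready = [n for n in remaining if deps[n] <= finished]
        if not ready:
            raise ValueError("cycle detected in dependencies")
        # Step 2: within the ready set, pick by execution_order.
        nxt = min(ready, key=sort_key)
        order.append(nxt)
        finished.add(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical project: one long-running root model plus two small ones.
deps = {
    "a_small": set(),
    "b_small": set(),
    "z_long_running": set(),
    "report": {"a_small", "z_long_running"},
}

with_config = run_order(deps, {"z_long_running": 1})
without_config = run_order(deps, {})
print(with_config)     # z_long_running is started first
print(without_config)  # alphabetical fallback leaves it for last among roots
```

With the config set, `z_long_running` moves to the front of the ready set without any change to the DAG itself.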
Describe alternatives you've considered
Workarounds with which I am familiar:
- dbt seems to run models in alphabetical order, so you could rename the long-running model to have an alphabetically earlier name
  ...but it feels fragile to rely on this undocumented behavior
- Add a -- depends_on: {{ ref('long_running_model') }} comment to all other models in the project to force the long-running model to run first
  ...but the other models don't necessarily depend on this model, so it makes the DAG visualization misleading
Who will this benefit?
Folks with long-running models in the middle of their DAGs
Are you interested in contributing this feature?
No
Anything else?
I realize that giving developers some control over execution order is likely controversial and potentially complicated to implement, but I see this as a useful Advanced Feature™ (a la incremental predicates) for those situations where complex DAG runtime is sub-optimal.
Additionally, this would be useful when a run is 'thread starved' at the beginning: models are waiting only for an available thread, not for another model, and the critical path (the longest single lineage of model executions) is delayed because of the limited threads. This is similar to the original problem statement in that it affects performance on the critical path of the invocation, but it also covers cases where the individual models may not be long-running yet the total lineage of models is quite long.
My first thought for a config was an invocation_priority that would be included in the current sorting of the list of models to execute. This would be a numeric value with a default that could be overridden with a model config. Conceptually, this better handles the case of 'these models need to run before others, but the order between them is not important': setting multiple models to the same priority makes more sense than giving them the same execution order.
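The thread-starvation case can be illustrated with a small simulation. This is a rough sketch, not dbt's scheduler: `simulate`, `DEFAULT_PRIORITY`, the model names, and the durations are all hypothetical, and it assumes that a higher `invocation_priority` runs first, which the comment above does not specify.

```python
import heapq
from typing import Dict, List, Set, Tuple

DEFAULT_PRIORITY = 0  # hypothetical default for models with no config


def simulate(
    deps: Dict[str, Set[str]],
    durations: Dict[str, int],
    priority: Dict[str, int],
    threads: int,
) -> int:
    """Return the total run time of a greedy scheduler with `threads` workers."""
    finished: Set[str] = set()
    started: Set[str] = set()
    running: List[Tuple[int, str]] = []  # min-heap of (finish_time, name)
    now = 0
    while len(finished) < len(deps):
        # Ready models, highest priority first (assumption), then by name.
        ready = sorted(
            (n for n in deps if n not in started and deps[n] <= finished),
            key=lambda n: (-priority.get(n, DEFAULT_PRIORITY), n),
        )
        # Fill whatever threads are currently free.
        for name in ready[: threads - len(running)]:
            heapq.heappush(running, (now + durations[name], name))
            started.add(name)
        if not running:
            raise ValueError("cycle detected in dependencies")
        # Advance time to the next model that finishes.
        now, done = heapq.heappop(running)
        finished.add(done)
    return now


# Four independent models plus a three-model chain; every model takes 1 unit.
deps = {
    "a1": set(), "a2": set(), "a3": set(), "a4": set(),
    "x_stg": set(), "y_int": {"x_stg"}, "z_mart": {"y_int"},
}
durations = {name: 1 for name in deps}

baseline = simulate(deps, durations, {}, threads=2)
prioritized = simulate(
    deps, durations, {"x_stg": 1, "y_int": 1, "z_mart": 1}, threads=2
)
print(baseline, prioritized)
```

No model here is long-running, yet with two threads the chain is started late under name ordering. Giving all three chain models the same priority shortens the run without saying anything about their order relative to each other, which the DAG already fixes.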
Is this your first time submitting a feature request?
Footnotes
[1] https://getdbt.slack.com/archives/CBSQTAPLG/p1724930433868199