[Feature] Configuration for runtime "priority" among models with satisfied dependencies #10632

Open
3 tasks done
owenprough-sift opened this issue Aug 29, 2024 · 2 comments

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Problem Statement

dbt runs models in DAG order, which is functionally correct. But there are situations [1] where it would be helpful to have more control over the relative execution order of models within a run. For example, in a run that includes a long-running model with no upstream dependencies but many downstream dependents, it would be helpful to start the long-running model first to minimize total run time.

Proposed Solution

A new execution_order configuration which allows you to specify the relative execution order of selected resources.
At runtime, dbt would:

  1. Determine the set of resources whose dependencies are satisfied (aka "run in DAG order")
  2. Within that set, run the resources ordered by execution_order (nulls last), falling back to whatever is the current ordering logic
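To make the two steps concrete, here is a minimal single-threaded sketch in Python. It is not dbt's scheduler: `run_order`, the `deps` mapping, and the `execution_order` mapping are hypothetical names used only for illustration, and alphabetical order stands in for "whatever is the current ordering logic".

```python
from typing import Dict, List, Optional, Set


def run_order(
    deps: Dict[str, Set[str]],
    execution_order: Dict[str, Optional[int]],
) -> List[str]:
    """Return a run order that respects the DAG and the proposed config.

    deps maps each node to the set of nodes it depends on.
    execution_order maps a node to its (hypothetical) execution_order value.
    """
    done: Set[str] = set()
    remaining = set(deps)
    order: List[str] = []

    def sort_key(name: str):
        value = execution_order.get(name)
        # Nulls last, then the configured value, then the fallback
        # ordering (assumed here to be alphabetical).
        return (value is None, value if value is not None else 0, name)

    while remaining:
        # Step 1: the nodes whose dependencies are all satisfied.
        ready = [n for n in remaining if deps[n] <= done]
        if not ready:
            raise ValueError("cycle detected in deps")
        # Step 2: within that set, pick by execution_order.
        nxt = min(ready, key=sort_key)
        order.append(nxt)
        done.add(nxt)
        remaining.remove(nxt)
    return order


example_deps = {
    "a_stage": set(),
    "b_stage": set(),
    "long_running_model": set(),
    "report": {"a_stage", "long_running_model"},
}
print(run_order(example_deps, {}))
print(run_order(example_deps, {"long_running_model": 1}))
```

With no configuration the long-running model starts third, after the two alphabetically earlier models. With `execution_order` set to 1 on it, it starts first, and everything else keeps its fallback order.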

Describe alternatives you've considered

Workarounds with which I am familiar:

  • dbt seems to run models in alphabetical order, so you could rename the long-running model to have an alphabetically-earlier name
    • ...but it feels fragile to rely on this undocumented behavior
  • Add a -- depends_on: {{ ref('long_running_model') }} comment to all other models in the project to force the long-running model to run first
    • ...but the other models don't necessarily depend on this model, so it makes the DAG visualization misleading

Who will this benefit?

Folks with long-running models in the middle of their DAGs

Are you interested in contributing this feature?

No

Anything else?

I realize that giving developers some control over execution order is likely controversial and potentially complicated to implement, but I see this as a useful Advanced Feature™ (a la incremental predicates) for those situations where complex DAG runtime is sub-optimal.

Footnotes

  1. https://getdbt.slack.com/archives/CBSQTAPLG/p1724930433868199

owenprough-sift added the enhancement (New feature or request) and triage labels Aug 29, 2024

pempey commented Dec 26, 2024

Additionally, this would be useful when a run is 'thread starved' at the beginning: models are waiting only for an available thread, not for another model, and the critical path (the longest single lineage of model executions) is delayed because of the limited threads. This is similar to the original problem statement in that it affects performance on the critical path of the invocation, but it also covers cases where the individual models may not be long-running while the total lineage of models is quite long.
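The thread-starvation effect can be shown with a small simulation. This is only a sketch under assumptions: `makespan` is a hypothetical helper, not dbt code. Durations are made-up integers. Ready models are started in alphabetical order unless a hypothetical per-model priority says otherwise (higher starts first).

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple


def makespan(
    deps: Dict[str, Set[str]],
    durations: Dict[str, int],
    threads: int,
    priority: Optional[Dict[str, int]] = None,
) -> int:
    """Simulate a run and return the total elapsed time."""
    priority = priority or {}
    done: Set[str] = set()
    started: Set[str] = set()
    running: List[Tuple[int, str]] = []  # heap of (finish_time, name)
    now = 0
    while len(done) < len(deps):
        # Models that have not started and are not waiting on another model.
        ready = sorted(
            (n for n in deps if n not in started and deps[n] <= done),
            key=lambda n: (-priority.get(n, 0), n),
        )
        # Fill whatever threads are free right now.
        for name in ready[: threads - len(running)]:
            heapq.heappush(running, (now + durations[name], name))
            started.add(name)
        if not running:
            raise ValueError("cycle detected in deps")
        # Advance time to the next model that finishes.
        now, finished = heapq.heappop(running)
        done.add(finished)
    return now


# Four independent models plus a three-model chain (the critical path).
chain_deps = {
    "a1": set(), "a2": set(), "a3": set(), "a4": set(),
    "z1": set(), "z2": {"z1"}, "z3": {"z2"},
}
unit = {name: 1 for name in chain_deps}

print(makespan(chain_deps, unit, threads=2))
print(makespan(chain_deps, unit, threads=2,
               priority={"z1": 1, "z2": 1, "z3": 1}))
```

In this toy project no model is long-running, but with two threads the alphabetical start order delays the `z1` → `z2` → `z3` chain and the run takes 5 time units. Starting the chain first brings it down to 4.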

My first thought for a config was an invocation_priority that would be included in the current sorting of the list of models to execute. This would be a numeric value with a default that could be overridden with a model config. This conceptually better handles the case of 'these models need to run before others, but the order between them is not important'. Setting multiple models to the same priority makes more sense than giving them the same execution order.
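A rough sketch of how such a value could slot into the sort of ready models follows. The names and choices here are assumptions for illustration: `sort_ready`, the default of 0, the "higher runs first" direction, and the alphabetical tie-break are not taken from dbt.

```python
from typing import Dict, List

DEFAULT_PRIORITY = 0  # assumed default; the real default is undecided


def sort_ready(ready: List[str], model_config: Dict[str, dict]) -> List[str]:
    """Order ready models by invocation_priority, then by name."""

    def sort_key(name: str):
        prio = model_config.get(name, {}).get(
            "invocation_priority", DEFAULT_PRIORITY
        )
        # Higher priority first (assumption); ties keep the fallback order.
        return (-prio, name)

    return sorted(ready, key=sort_key)


config = {
    "fct_big": {"invocation_priority": 10},
    "dim_big": {"invocation_priority": 10},
}
print(sort_ready(["stg_orders", "fct_big", "dim_big", "stg_users"], config))
```

Both high-priority models share a tier and move ahead of the unconfigured ones, without anyone having to decide which of the two comes first.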
