Remove execution_date and logical_date from DAG Run APIs and Functions, transition to run_id as sole identifier for Airflow 3.0 #42404

Open: wants to merge 1 commit into main

sunank200 (Collaborator) commented Sep 23, 2024

This PR removes the execution_date and logical_date arguments from functions and APIs used to look up DAG runs, aligning with the broader changes introduced in Airflow 2.2 and preparing for Airflow 3.0. These functions now use run_id as the sole identifier for a DAG run, which simplifies lookups and removes deprecated behaviour.

Motivation:

In Airflow, execution_date has historically been used to distinguish DAG run instances. However, since Airflow 2.2 the project has been moving away from execution_date as an identifier in favor of run_id. Continuing to rely on execution_date imposes limitations, most notably that two runs of the same DAG cannot share a logical time, which is a problem when runs are generated dynamically, for example by TriggerDagRunOperator.

By removing execution_date in favor of run_id, this PR eliminates these limitations. This also removes the unique constraint on execution_date at the database level, paving the way for a cleaner and more flexible scheduling system in Airflow 3.0.
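The ambiguity described above can be illustrated with a small in-memory model. This is a minimal sketch, assuming a hypothetical `DagRunRecord` class in place of Airflow's real `DagRun` model and illustrative run_id values: once two runs may share a logical date, a date-based lookup can return several runs, while a lookup by `(dag_id, run_id)` still identifies at most one.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DagRunRecord:
    """Hypothetical stand-in for a DAG run row (not Airflow's DagRun model)."""

    dag_id: str
    run_id: str
    logical_date: datetime


def find_by_logical_date(runs: list[DagRunRecord], dag_id: str, when: datetime) -> list[DagRunRecord]:
    """Date-based lookup: may match several runs once dates are no longer unique."""
    return [r for r in runs if r.dag_id == dag_id and r.logical_date == when]


def find_by_run_id(runs: list[DagRunRecord], dag_id: str, run_id: str) -> DagRunRecord | None:
    """run_id-based lookup: (dag_id, run_id) is assumed unique, so at most one match."""
    return next((r for r in runs if r.dag_id == dag_id and r.run_id == run_id), None)


when = datetime(2024, 9, 23, tzinfo=timezone.utc)
runs = [
    # Illustrative run_ids: a scheduled run and a triggered run at the same logical time.
    DagRunRecord("example_dag", "scheduled__2024-09-23T00:00:00+00:00", when),
    DagRunRecord("example_dag", "manual__2024-09-23T00:00:00+00:00", when),
]

print(len(find_by_logical_date(runs, "example_dag", when)))  # 2 -> ambiguous
print(find_by_run_id(runs, "example_dag", "manual__2024-09-23T00:00:00+00:00"))
```

The date lookup has to return a list because it cannot promise a single answer; the run_id lookup can return one record or `None`, which is what makes run_id usable as the sole identifier.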

Key Changes:

  1. API and Function Changes:

    • The execution_date and logical_date arguments have been removed from all public APIs and Python functions related to DAG run lookups.
    • run_id is now the exclusive identifier for DAG runs in these contexts.
    • Deprecation warnings for execution_date and logical_date are no longer necessary and have been removed.
  2. Database Migration:

    • The unique constraint on execution_date in the database has been dropped, since run_id now ensures the uniqueness of DAG runs (see #41818).
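The database change in this list, together with the column rename to logical_date that the commit log describes as part of the same migration, can be sketched in plain SQL on an in-memory SQLite database. This is an illustration under stated assumptions, not the Alembic migration Airflow itself would use: the simplified `dag_run` schema and the index name `dag_run_dag_id_execution_date_key` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified "before" schema: runs are unique by run_id AND by date.
conn.executescript(
    """
    CREATE TABLE dag_run (
        id INTEGER PRIMARY KEY,
        dag_id TEXT NOT NULL,
        run_id TEXT NOT NULL,
        execution_date TEXT NOT NULL,
        UNIQUE (dag_id, run_id)
    );
    CREATE UNIQUE INDEX dag_run_dag_id_execution_date_key
        ON dag_run (dag_id, execution_date);
    """
)

# Migration: drop the date-based uniqueness, then rename the column (needs SQLite >= 3.25).
conn.executescript(
    """
    DROP INDEX dag_run_dag_id_execution_date_key;
    ALTER TABLE dag_run RENAME COLUMN execution_date TO logical_date;
    """
)

# Two runs of the same DAG can now share a logical date; run_id still tells them apart.
rows = [
    ("example_dag", "scheduled__2024-09-23", "2024-09-23T00:00:00+00:00"),
    ("example_dag", "manual__2024-09-23", "2024-09-23T00:00:00+00:00"),
]
conn.executemany("INSERT INTO dag_run (dag_id, run_id, logical_date) VALUES (?, ?, ?)", rows)
count = conn.execute(
    "SELECT COUNT(*) FROM dag_run WHERE logical_date = ?", (rows[0][2],)
).fetchone()[0]
print(count)  # 2
```

On the Python side, SQLAlchemy lets a mapped attribute keep its old name while pointing at a renamed column (for example `execution_date = Column("logical_date", ...)`), which is one way the model could stay unchanged while the schema moves ahead.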

Rationale:

Removing execution_date is necessary to enable more flexible DAG run management. For example, dynamic runs created by TriggerDagRunOperator can now be correctly identified and managed without the awkward workarounds discussed in this doc. This change also makes subsequent DAG run lookups simpler and more robust, and it simplifies the database schema by removing the unique constraint on execution_date.

Users will still be able to see this date for reference in the web UI, where it is shown as logical_date (the renamed execution_date) alongside run_id, making it easier to distinguish between DAG runs.


@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:CLI area:db-migrations PRs with DB migration area:providers area:Scheduler including HA (high availability) scheduler area:Triggerer area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues provider:cncf-kubernetes Kubernetes provider related issues provider:databricks labels Sep 23, 2024
Review thread on airflow/utils/context.py (outdated, resolved)
@sunank200 sunank200 force-pushed the rename-execution-date branch 15 times, most recently from 3e729bc to efa47f6 Compare September 30, 2024 13:30
@sunank200 sunank200 marked this pull request as draft November 6, 2024 12:57
@sunank200 sunank200 force-pushed the rename-execution-date branch 11 times, most recently from 744d5ed to 91873be Compare November 6, 2024 20:39
@sunank200 sunank200 changed the title Remove execution_date and logical_date from arguments where function/API is used to look up a DAG run Remove execution_date and logical_date from DAG Run APIs and Functions, Transition to run_id as Sole Identifier for Airflow 3.0 Nov 6, 2024
@sunank200 sunank200 changed the title Remove execution_date and logical_date from DAG Run APIs and Functions, Transition to run_id as Sole Identifier for Airflow 3.0 Remove execution_date and logical_date from DAG Run APIs and Functions, Transition to run_id as sole identifier for Airflow 3.0 Nov 6, 2024
@sunank200 sunank200 changed the title Remove execution_date and logical_date from DAG Run APIs and Functions, Transition to run_id as sole identifier for Airflow 3.0 Remove execution_date and logical_date from DAG Run APIs and Functions, transition to run_id as sole identifier for Airflow 3.0 Nov 6, 2024
@sunank200 sunank200 marked this pull request as ready for review November 7, 2024 06:07
:param run_id: the DAG run_id to start looking from
:param commit: commit DAG and tasks to be altered to the database
:param session: database session
:return: If commit is true, list of tasks that have been updated,
otherwise list of tasks that will be updated
- :raises: AssertionError if dag or execution_date is invalid
+ :raises: AssertionError if dag or logical_date is invalid

Review comment (Member):

Do we still need to raise something for logical_date?

ret_data = {}
data["execution_date"] = data["logical_date"]
data["logical_date"] = data["logical_date"]

Review comment (Member):

Do we still need this line?

def _fetch_dag_run_from_run_id_or_logical_date_string(
*,
dag_id: str,
value: str,

Review comment (Member):

value is quite vague. Maybe we can add a line describing what it could be?

session=session,
triggered_by=DagRunTriggeredByType.CLI,
)
- return dag_run, True
+ return dag_run, True # type: ignore[return-value]

Review comment (Member):

May I know why we need it here?

Comment on lines +1341 to +1343
end_date = get_logical_date() if not future else None
start_date = get_logical_date() if not past else None

Review comment (Member):

Suggested change:
- end_date = get_logical_date() if not future else None
- start_date = get_logical_date() if not past else None
+ end_date = None if future else get_logical_date()
+ start_date = None if past else get_logical_date()
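The suggestion is a readability change only. A quick way to convince yourself is to compare both forms over every combination of the two flags; in this sketch a plain string stands in for the result of `get_logical_date()`, and the function names `original` and `suggested` are hypothetical. Because both forms call the date getter under exactly the same conditions, the rewrite would also preserve any side effects of that call.

```python
from __future__ import annotations

from itertools import product


def original(logical_date: str, past: bool, future: bool) -> tuple[str | None, str | None]:
    end_date = logical_date if not future else None
    start_date = logical_date if not past else None
    return start_date, end_date


def suggested(logical_date: str, past: bool, future: bool) -> tuple[str | None, str | None]:
    end_date = None if future else logical_date
    start_date = None if past else logical_date
    return start_date, end_date


# Exhaustively compare both forms for all four (past, future) combinations.
for past, future in product([False, True], repeat=2):
    assert original("2024-09-23", past, future) == suggested("2024-09-23", past, future)
```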

) -> int | Iterable[TaskInstance]:
"""

Review comment (Member):

Why remove the docstring?

self.validate()
- self.log.debug("Clearing existing task instances for execution date %s", execution_date)
+ self.log.debug("Clearing existing task instances for execution date %s", logical_date)

Review comment (Member):

Should we use the name "logical date" in the docs and the log message, or should we keep it as "execution date"?

@@ -14,25 +14,3 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

Review comment (Member):

What is this change?

@@ -24,23 +24,23 @@
class ExecDateAfterStartDateDep(BaseTIDep):

Review comment (Member):

Do we need to rename this class?

run_id = www_request.args.get("run_id")
- # First check run id, then check execution date, if not fall back on the latest dagrun
+ # First check run id, then check logical_date date, if not fall back on the latest dagrun

Review comment (Member):

Suggested change:
- # First check run id, then check logical_date date, if not fall back on the latest dagrun
+ # First check run id, then check logical_date, if not fall back on the latest dagrun

except ValueError:
error_message = (
- f"Given execution date {execution_date_str!r} could not be identified as a date. "
+ f"Given execution date {logical_date_str!r} could not be identified as a date. "

Review comment (Member):

Suggested change:
- f"Given execution date {logical_date_str!r} could not be identified as a date. "
+ f"Given logical date {logical_date_str!r} could not be identified as a date. "

Comment on lines +41 to +46
try:
from airflow.cli.cli_config import ARG_LOGICAL_DATE
except ImportError: # 2.x compatibility.
from airflow.cli.cli_config import ( # type: ignore[attr-defined, no-redef]
ARG_EXECUTION_DATE as ARG_LOGICAL_DATE,
)

Review comment (Member):

Could we move this to the common.compat provider?
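One way such fallbacks could be centralized is a small helper that returns the first name a module actually defines, trying the Airflow 3 name before the Airflow 2 one. This is a sketch only: `import_first_available` is a hypothetical helper, not part of the common.compat provider's API, and a fake module registered in `sys.modules` stands in for `airflow.cli.cli_config` so the example runs without Airflow installed.

```python
from __future__ import annotations

import importlib
import sys
import types
from typing import Any


def import_first_available(module_name: str, *attr_names: str) -> Any:
    """Return the first attribute in ``attr_names`` that ``module_name`` defines.

    Hypothetical helper: names are tried in order, so the newer name goes
    first and older names act as fallbacks.
    """
    module = importlib.import_module(module_name)
    for name in attr_names:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"None of {attr_names!r} found in {module_name!r}")


# Simulate an Airflow 2 style module that only defines the old name.
fake = types.ModuleType("fake_cli_config")
fake.ARG_EXECUTION_DATE = "old-arg"
sys.modules["fake_cli_config"] = fake

ARG_LOGICAL_DATE = import_first_available("fake_cli_config", "ARG_LOGICAL_DATE", "ARG_EXECUTION_DATE")
print(ARG_LOGICAL_DATE)  # old-arg
```

Compared with repeating the `try`/`except ImportError` block in every provider, a shared helper keeps the list of old and new names in one place, which is the main argument for moving it into a compat package.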

@sunank200 sunank200 force-pushed the rename-execution-date branch 3 times, most recently from 50d31cf to 252e242 Compare November 8, 2024 11:41
…API is used to look up a DAG run

- Resolve compatibility issues and refactor `execution_date` to `logical_date`
- Resolve compatibility tests
- Correct import paths after rebase
- Address static checks
- Refactor `execution_date` to `logical_date`
- Add missing DAG files for tests
- Enhance GCP ML and CloudBuild tests using helpers for compatibility
- Fix mypy errors
- Miscellaneous fixes and removals of `execution_date`

More name changes and argument removal

Mass replace execution_date in arguments

Not finished, still many to go.

Drop execution_date unique constraint on DagRun

The column has also been renamed to logical_date, although the Python
model is not changed. This allows us to not need to fix all the Python
code at once (we'll do that later), but still do the two changes in one
migration instead of two.

Fix compat tests

Fix compat test

Fix the tests

Fix static checks

Fix the pytest_plugin by refactor renaming execution_date to logical_date

Fix the pre-commit errors

Fix import paths after rebase

fix more compat tests

Fix the compat tests

Use test helpers in GCP MLEngine tests for compat

Using Airflow internals directly presents a problem when dealing with
compatibility in tests (since the same tests must run against Airflow 2
and 3). The helpers already handle this well, so we should use them.

Use test helpers in GCP CloudBuild tests for compat

Using Airflow internals directly presents a problem when dealing with
compatibility in tests (since the same tests must run against Airflow 2
and 3). The helpers already handle this well, so we should use them.

Mark db tests

Some compat code to make DAG.clear() still work

Remove unneeded test cases for compat code

Use test helpers in GCS-BQ tests for compat

Using Airflow internals directly presents a problem when dealing with
compatibility in tests (since the same tests must run against Airflow 2
and 3). The helpers already handle this well, so we should use them.

Fix tests and add missing dag_file for tests

Add missing dag_file for tests

Fix some provider/cncf compat on logical date

We should continue to use execution_date if the provider is run against 2.
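The rule in this commit message (keep execution_date on Airflow 2, use logical_date on Airflow 3) can be expressed as a small version check. This is a hypothetical sketch: `logical_date_key` is an illustrative name, and splitting the version string is a simplification; a real provider would more likely compare parsed versions (for example with `packaging.version`) once at import time.

```python
from __future__ import annotations


def logical_date_key(airflow_version: str) -> str:
    """Pick the context key holding the run's date for a given Airflow version.

    Hypothetical helper: keep ``execution_date`` on Airflow 2,
    use ``logical_date`` on Airflow 3 and later.
    """
    # Only the major component matters here, so "3.0.0.dev0" is treated as 3.
    major = int(airflow_version.split(".", 1)[0])
    return "execution_date" if major < 3 else "logical_date"


context = {"execution_date": "2024-09-23T00:00:00+00:00"}  # Airflow 2 style context
print(context[logical_date_key("2.10.3")])
```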

Fix the cache import path

refactor rename execution_date, executionDate to logical_date, logicalDate respectively.

fix more tests

More fixes on removal of execution date

Fix tests

fix the static checks

remove the migration

fix the tests

Fix migration file

Remove renaming execution date for SlaMiss

Remove execution_date and add logical date

Fix mypy errors

try-except for 2.x for DagRun

Remove _DEPRECATION_REPLACEMENTS

Labels
area:API Airflow's REST/HTTP API area:CLI area:db-migrations PRs with DB migration area:providers area:Scheduler including HA (high availability) scheduler area:Triggerer area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues legacy api Whether legacy API changes should be allowed in PR legacy ui Whether legacy UI change should be allowed in PR provider:cncf-kubernetes Kubernetes provider related issues provider:databricks
Projects: None yet

4 participants