Add retry action for tasks in job admin #354

Closed
6 tasks done
jpmckinney opened this issue Apr 27, 2024 · 2 comments

jpmckinney commented Apr 27, 2024

  • Add abstract retryable property on TaskManager
  • Add abstract retry method on TaskManager
  • exporter: Delete the lockfile and republish the message.
  • flattener: Delete the lockfile and republish the message.
  • collect: Not retryable. Instead, cancel the job ("Add cancel action to job admin", #352) and start a new job ("Allow scheduling new job manually", #350).
  • process: Would need Collect to be finished, and would need a new API endpoint (e.g. /reload/) that deletes all collection files and derived collections (and most notes), and then adds new files from disk. Otherwise, would need to do the same as for Collect (cancel and start new).
  • pelican: Can skip implementation, as its current usage for coverage will be replaced with Cardinal. If we reintroduce Pelican later (e.g. via "Allow JOB_TASKS_PLAN to be configurable per publication", #304, to run more checks on priority datasets), then we can look into its retry logic. See the old docs below.

We can start with the easily retryable tasks (exporter, flattener) and create a follow-up issue for Process.

This issue is mostly relevant for flattener, which seems to fail sometimes, and we don't want to have to re-run the entire job when it does.

  • Add admin action
  • Update the docs that refer to this issue (#354) and surrounding text.

The old docs were:

Pelican
  Delete the dataset, using Pelican backend's ``remove`` `command <https://pelican-backend.readthedocs.io/en/latest/tasks/datasets.html#remove>`__.

  Change the status of the Pelican task and subsequent tasks to ``PLANNED``, then change the status of the job to ``RUNNING``.
Exporter
  Publish a message from the :ref:`Django shell<django-shell>`, using the compiled collection in Kingfisher Process:

  .. code-block:: python

     from exporter.util import publish

     publish({"job_id": 123, "collection_id": 456}, "exporter_init")

Flattener
  Delete the ``.csv.tar.gz.lock`` files in the job's directory within the ``EXPORTER_DIR`` :ref:`directory<env-exporter-flattener>`.

  Publish a message from the :ref:`Django shell<django-shell>`:

  .. code-block:: python

     from exporter.util import publish

     publish({"job_id": 123}, "flattener_init")

jpmckinney commented

On second thought, Process isn't retryable until it's finished (to avoid enqueued work conflicting across runs).


jpmckinney commented May 8, 2024

Adding inline actions takes some work. I only found unmaintained code: escaped/django-inline-actions#62

Maybe another way to do it is to set the task back to PLANNED and to have PLANNED always perform retry logic (assuming such logic is safe).
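
A minimal sketch of that alternative, to make the idea concrete: resetting a task to PLANNED is the only manual step, and the code that starts a PLANNED task always runs the retry cleanup first. The `Task` dataclass, the `Status` enum (including its `COMPLETED` member), and the `cleanup` and `start` callables are hypothetical stand-ins, not the project's models. It also assumes the cleanup is a no-op on a first run, which is the "safe" condition mentioned above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    PLANNED = "PLANNED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"  # assumed name, for illustration only


@dataclass
class Task:
    name: str
    status: Status
    cleanup: Callable[[], None]  # hypothetical: e.g. delete stale lockfiles
    start: Callable[[], None]  # hypothetical: e.g. publish the init message


def run_if_planned(task: Task) -> bool:
    """Start a PLANNED task, always performing the retry cleanup first."""
    if task.status is not Status.PLANNED:
        return False
    task.cleanup()  # must do nothing harmful when there is nothing to clean up
    task.start()
    task.status = Status.RUNNING
    return True
```

With this shape, a retry would amount to setting `task.status = Status.PLANNED` and letting the usual scheduling pick the task up, so no inline admin action would be needed.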

Comment I had written about Process:

    # For this task to be retryable, it needs to finish (in case any messages are queued), and then call a new API
    # endpoint in Kingfisher Process (e.g. /collections/<pk>/reload/) that deletes the collection's collection files,
    # derived collections and (most) collection notes and re-adds the files from the filesystem.

And code for TaskManager:

    @property
    @abstractmethod
    def retryable(self) -> bool:
        """
        Whether the task can be retried.
        """

    @abstractmethod
    def retry(self) -> None:
        """
        Restart the task.
        """

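As a rough illustration of how concrete tasks might implement this interface, here is a self-contained sketch. It is not the project's implementation: the constructor arguments, the injected `publish` callable, and the `Flattener` and `Collect` class bodies are assumptions for illustration. Only the `.csv.tar.gz.lock` suffix and the `flattener_init` routing key are taken from the old docs quoted above.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Callable

# Hypothetical signature, mirroring the publish(message, routing_key) calls
# shown in the old docs.
Publish = Callable[[dict, str], None]


class TaskManager(ABC):
    @property
    @abstractmethod
    def retryable(self) -> bool:
        """Whether the task can be retried."""

    @abstractmethod
    def retry(self) -> None:
        """Restart the task."""


class Flattener(TaskManager):
    """Sketch: delete the lockfiles, then republish the message."""

    def __init__(self, job_id: int, job_dir: Path, publish: Publish) -> None:
        self.job_id = job_id
        self.job_dir = job_dir  # assumed: the job's directory within EXPORTER_DIR
        self.publish = publish

    @property
    def retryable(self) -> bool:
        return True

    def retry(self) -> None:
        # Remove only the lockfiles; any finished archives are left in place.
        for lockfile in self.job_dir.glob("*.csv.tar.gz.lock"):
            lockfile.unlink()
        self.publish({"job_id": self.job_id}, "flattener_init")


class Collect(TaskManager):
    """Sketch: not retryable; cancel the job and start a new one instead."""

    @property
    def retryable(self) -> bool:
        return False

    def retry(self) -> None:
        raise NotImplementedError("Collect cannot be retried")
```

An admin action could then check `retryable` before calling `retry()`, so that non-retryable tasks are rejected with a message rather than an exception.
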
jpmckinney added a commit that referenced this issue May 8, 2024