explore the use of a task manager for long queries #46

Open
havok2063 opened this issue Aug 14, 2024 · 1 comment
Labels: enhancement (New feature or request), question (Further information is requested)

Comments

@havok2063
Contributor

We have long-running queries that currently block a user from doing anything else. For example, searches on SDSS programs without any other constraints can result in queries that take up to ~15 minutes. We should explore implementing a task manager (e.g. https://taskiq-python.github.io/) as a way to offload long-running jobs.

I think we'd want something that manages the background task or query, and does not block Valis or Zora. Ideally we'd have a way to track the status of the job: e.g. running, completed, error, and potentially have a dashboard of jobs. Another consideration is anonymous versus user tasks and if we need one or the other or both.
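To make the "running, completed, error" idea concrete, here is a minimal sketch of a job record, assuming each job is identified by an id string and that an anonymous job is simply one with no owner. `JobStatus`, `JobRecord`, and the transition table are hypothetical names, not part of Valis or Zora.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class JobStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    ERROR = "error"


# Hypothetical allowed transitions; COMPLETED and ERROR are terminal states.
TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.RUNNING, JobStatus.ERROR},
    JobStatus.RUNNING: {JobStatus.COMPLETED, JobStatus.ERROR},
    JobStatus.COMPLETED: set(),
    JobStatus.ERROR: set(),
}


@dataclass
class JobRecord:
    job_id: str
    query: str
    owner: Optional[str] = None  # None marks an anonymous job (assumed convention)
    status: JobStatus = JobStatus.PENDING
    message: Optional[str] = None
    updated: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @property
    def anonymous(self) -> bool:
        return self.owner is None

    def advance(self, new: JobStatus, message: Optional[str] = None) -> None:
        """Move to a new status, rejecting transitions the table does not allow."""
        if new not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new.value}")
        self.status = new
        self.message = message
        self.updated = datetime.now(timezone.utc)
```

Keeping the owner optional on a single record type means anonymous and user jobs can share one table, and a dashboard only needs to filter on `status` and `owner`.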

FastAPI has some support for Background Tasks. I'm not sure if this is enough for our purposes. Other related packages:

I do like Taskiq as a simpler, more modern approach, but we may want to consider alternatives.

havok2063 added the enhancement and question labels on Aug 14, 2024
@albireox
Member

albireox commented Sep 3, 2024

I looked a bit into this for lvmapi and ended up settling on TaskIQ, and I've been liking it quite a lot. It's really easy to set up and plays well with FastAPI. It may have fewer bells and whistles than Celery, but I actually consider that an advantage.

I looked a bit into the other options (except Dramatiq, I think). Background Tasks sounded promising, but I don't think there's a way to track task completion (you just launch something in the background and then lose track of it). It doesn't seem different from just having a pool of asyncio tasks, and in the end one would need to build something to track them. I'm also not sure how well it works with Uvicorn workers running under Gunicorn.
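A minimal sketch of the "build something to track them" part using only asyncio: keep a reference to each task under an id and derive a status from the task object. `TaskTracker` is a hypothetical helper, not a FastAPI or TaskIQ API.

```python
import asyncio
import uuid
from typing import Any, Awaitable, Callable, Dict


class TaskTracker:
    """Hypothetical helper: launches asyncio tasks and remembers them by id."""

    def __init__(self) -> None:
        self._tasks: Dict[str, asyncio.Task] = {}

    def launch(self, coro_fn: Callable[[], Awaitable[Any]]) -> str:
        task_id = uuid.uuid4().hex
        # Keep a strong reference; an unreferenced create_task() result can be
        # garbage collected and its outcome is lost.
        self._tasks[task_id] = asyncio.create_task(coro_fn())
        return task_id

    def status(self, task_id: str) -> str:
        task = self._tasks[task_id]
        if not task.done():
            return "running"
        if task.cancelled():  # check first: exception() raises on a cancelled task
            return "cancelled"
        return "error" if task.exception() is not None else "completed"

    def result(self, task_id: str) -> Any:
        return self._tasks[task_id].result()
```

Because this registry lives in one process's memory, it would not be visible across several Gunicorn workers and would not survive a restart, which is one argument for a shared results backend.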

Celery has everything one would need, but it simply does not support async, and all the hacks I tried failed dramatically. Maybe that's less of a blocker for Valis since currently all the DB connections are, in practice, sync. But it would limit us in the future, and it may not work well with FastAPI.

Huey's implementation of asyncio is a bit hacky and I could not make it work properly.

For what you describe, I think TaskIQ would work out of the box, with the exception of tracking all running tasks. But I think that's easy to implement with a middleware that records when a task starts (for example to a table in sdss5db, or to a Redis DB if we use that as the results backend) and when it finishes. Then it's easy to query that to get a complete picture of what tasks are running and where.
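A rough, framework-free sketch of the middleware idea: hooks called before and after each task write a row to a table, so "what is running, and where" becomes a single query. SQLite stands in for sdss5db or Redis, and `TaskLogMiddleware`, `run_logged`, `running_tasks`, and the `task_log` schema are all hypothetical. A real TaskIQ middleware would subclass TaskIQ's own middleware base class and use its hook signatures, which this sketch does not reproduce.

```python
import asyncio
import sqlite3
import time
import uuid
from typing import Any, Awaitable, Callable, List, Optional, Tuple


class TaskLogMiddleware:
    """Hypothetical stand-in for a task middleware that logs start/finish to a table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS task_log ("
            " task_id TEXT PRIMARY KEY, name TEXT, worker TEXT,"
            " status TEXT, started REAL, finished REAL, error TEXT)"
        )

    def pre_execute(self, task_id: str, name: str, worker: str) -> None:
        self.conn.execute(
            "INSERT INTO task_log (task_id, name, worker, status, started)"
            " VALUES (?, ?, ?, 'running', ?)",
            (task_id, name, worker, time.time()),
        )
        self.conn.commit()

    def post_execute(self, task_id: str, error: Optional[BaseException] = None) -> None:
        status = "completed" if error is None else "error"
        self.conn.execute(
            "UPDATE task_log SET status = ?, finished = ?, error = ? WHERE task_id = ?",
            (status, time.time(), None if error is None else repr(error), task_id),
        )
        self.conn.commit()


async def run_logged(mw: TaskLogMiddleware, worker: str, name: str,
                     coro_fn: Callable[[], Awaitable[Any]]) -> Any:
    """Run one task, recording it in task_log before and after execution."""
    task_id = uuid.uuid4().hex
    mw.pre_execute(task_id, name, worker)
    try:
        result = await coro_fn()
    except Exception as exc:
        mw.post_execute(task_id, exc)
        raise
    mw.post_execute(task_id)
    return result


def running_tasks(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """The dashboard query: which tasks are running, and on which worker."""
    return conn.execute(
        "SELECT name, worker FROM task_log WHERE status = 'running' ORDER BY started"
    ).fetchall()
```

The same table could also back per-user job listings and the throttling mentioned below, since it already knows how many queries are in flight.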

Another thing we should consider is how this relates to asynchronous DB connections. If a worker uses peewee and the query takes 15 minutes, that worker won't be able to do anything else for 15 minutes, even though all the work at that point is happening in the PostgreSQL server. As long as the workers are lightweight, one can imagine spinning up 100 workers (and maybe using the knowledge of how many queries are running to throttle the start of new queries).
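A sketch of the throttling idea, assuming the query is an ordinary blocking function (as a peewee call would be). `QueryThrottle` is hypothetical: it caps concurrent queries with an `asyncio.Semaphore`, keeps a count of how many are running, and hands each sync call to a thread with `asyncio.to_thread` (Python 3.9+) so the event loop stays responsive. Offloading to threads is a technique swapped in for illustration, not something proposed in this thread.

```python
import asyncio
import time
from typing import Any, Callable


class QueryThrottle:
    """Hypothetical limiter: at most `max_running` blocking queries at once."""

    def __init__(self, max_running: int) -> None:
        # Create inside a running event loop (see usage) to stay compatible
        # with older Python versions that bind primitives to a loop.
        self._sem = asyncio.Semaphore(max_running)
        self.running = 0  # queries currently executing
        self.peak = 0     # highest concurrency observed

    async def run(self, blocking_query: Callable[[], Any]) -> Any:
        async with self._sem:  # waits here once the limit is reached
            self.running += 1
            self.peak = max(self.peak, self.running)
            try:
                # The sync call runs in a worker thread, not on the event loop.
                return await asyncio.to_thread(blocking_query)
            finally:
                self.running -= 1
```

Here the cap is a fixed number; the running count could instead come from a shared task table so the limit applies across all workers.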

But if the database connection is async (with SQLAlchemy 2 or with some hack that would allow peewee to use psycopg 3 async support) then a few workers could handle all the tasks. At that point TaskIQ is not really doing any work other than tracking query completion. I'm actually not sure if TaskIQ allows workers to run concurrent tasks or if a single worker can only run one task at a time.
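A small experiment illustrating why async connections change the picture: `async_query` uses `asyncio.sleep` as a stand-in for awaiting a query on an async driver, and `blocking_query` uses `time.sleep` as a stand-in for a sync driver call made directly inside a coroutine. Both are hypothetical stand-ins, not real driver APIs.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def async_query(seconds: float) -> float:
    # While awaiting, the event loop is free to run other tasks.
    await asyncio.sleep(seconds)
    return seconds


async def blocking_query(seconds: float) -> float:
    # A sync call holds the event loop; nothing else runs until it returns.
    time.sleep(seconds)
    return seconds


async def run_batch(query: Callable[[float], Awaitable[float]],
                    n_queries: int, seconds: float) -> float:
    """Launch n_queries concurrently in one event loop and return the wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(query(seconds) for _ in range(n_queries)))
    return time.perf_counter() - start
```

Running `asyncio.run(run_batch(async_query, 20, 0.05))` should take roughly the time of a single query, while `asyncio.run(run_batch(blocking_query, 5, 0.05))` takes at least five times as long, because each blocking call serializes the loop.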
