
fix: reindex_studio was crashing if instance had too many courses #34905

Merged: 1 commit merged on Jun 6, 2024


@bradenmacdonald (Contributor) commented Jun 3, 2024

Description

This fixes openedx/modular-learning#223 "Cannot create initial search index on instances with many courses".

The problem was that calling store.get_courses() would load too much data into memory at once.

To fix this, I changed the code to use CourseOverview to get the total course count, and to do a paginated query that loads only 1,000 course IDs (and names) at a time.
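To illustrate the batching idea described above, here is a minimal Python sketch. It is not the code from this PR: a plain list stands in for the database query, and the function name iter_course_pages and the (course ID, display name) row shape are hypothetical. The assumption is that in Django, rows would be an ordered QuerySet (for example over CourseOverview), where each slice runs as its own LIMIT/OFFSET query and the total comes from .count(), so only one page of 1,000 courses is held in memory at a time.

```python
from typing import Iterator, List, Sequence, Tuple

# Batch size taken from the PR description (1,000 course IDs per query).
PAGE_SIZE = 1000

# Hypothetical row shape: (course_id, display_name).
CourseRow = Tuple[str, str]


def iter_course_pages(
    rows: Sequence[CourseRow], total: int, page_size: int = PAGE_SIZE
) -> Iterator[List[CourseRow]]:
    """Yield `rows` in consecutive pages of at most `page_size` items.

    `rows` only needs to support slicing. Assumption: with Django this could
    be an ordered QuerySet, where each slice is executed as a separate
    LIMIT/OFFSET query. `total` is passed in separately so a QuerySet can
    supply it via .count() instead of loading every row to measure its length.
    """
    for offset in range(0, total, page_size):
        page = list(rows[offset:offset + page_size])
        if not page:
            # Fewer rows than `total` (e.g. courses deleted mid-run): stop early.
            break
        yield page


if __name__ == "__main__":
    # Stand-in data: 2,500 fake courses instead of a real database table.
    courses = [(f"course-v1:Demo+C{i}+2024", f"Demo Course {i}") for i in range(2500)]
    for page in iter_course_pages(courses, total=len(courses)):
        print(len(page))  # prints 1000, 1000, then 500
```

Offset-based paging like this depends on a stable sort order between queries; on very large tables, keyset pagination (filtering on IDs greater than the last one seen) is a common alternative that avoids the growing cost of large OFFSETs.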

Supporting information

openedx/modular-learning#223

Testing instructions

Follow the instructions for enabling Studio Content Search at https://openedx.atlassian.net/wiki/spaces/COMM/pages/3890380898/Next+Release+Redwood+-+Operator+Dev+Notes.

Deadline

None, but we'd like to backport this fix to Redwood.

Private ref: MNG-4278

@openedx-webhooks added the open-source-contribution label (PR author is not from Axim or 2U) on Jun 3, 2024
@openedx-webhooks

Thanks for the pull request, @bradenmacdonald! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@bradenmacdonald (Contributor, Author)

@MoisesGSalas Does this fix the issue you were seeing?

@bradenmacdonald changed the title from "fix: reindex_sutdio was crashing if instance had too many courses" to "fix: reindex_studio was crashing if instance had too many courses" on Jun 3, 2024
@MoisesGSalas

I will try to test it out between today and tomorrow.

@MoisesGSalas

I tried running it again and it does index successfully. The only issue I'm seeing is that it is going to take a while: it has indexed 2,500 courses in around 3 hours, so it will probably take two days to finish.

I'm assuming that kind of optimization is outside the scope of this PR?

@bradenmacdonald (Contributor, Author)

Yeah, for this PR I just want to get it working. Optimizations can be looked at separately. Thanks for testing it!

@MoisesGSalas left a comment

Overall I think the changes are good.

I tested it on an instance with 40k courses and it didn't crash (previously it did); that's as extreme a case as I can think of.

Side note: that search is fast; I'm excited about more support for Meilisearch.

@pomegranited (Contributor) left a comment

👍 Works perfectly, thank you @bradenmacdonald !

Will get this merged today.

  • I tested this on my Tutor devstack using the management command (reindex_studio --experimental), and by altering course content in Studio and confirming the changes were reflected in search.
  • I read through the code
  • I checked for accessibility issues: N/A (backend only)
  • Includes documentation: N/A (bugfix)
  • User-facing strings are extracted for translation: N/A

@pomegranited merged commit 6bfe08c into openedx:master on Jun 6, 2024 (89 checks passed)
@pomegranited deleted the index-large-instances branch on June 6, 2024 at 23:59
@openedx-webhooks

@bradenmacdonald 🎉 Your pull request was merged! Please take a moment to answer a two-question survey so we can improve your experience in the future.

@edx-pipeline-bot (Contributor)

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

@edx-pipeline-bot (Contributor)

2U Release Notice: This PR has been deployed to the edX production environment.

Labels: open-source-contribution (PR author is not from Axim or 2U)

Projects: Status: Done

Development: successfully merging this pull request may close these issues:

  • [Course Search] Cannot create initial search index on instances with many courses

5 participants