Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sitemap generation includes CMS StructrualPages, which it should not #15134

Closed
stevejalim opened this issue Sep 13, 2024 · 1 comment · Fixed by #15139
Closed

Sitemap generation includes CMS StructrualPages, which it should not #15134

stevejalim opened this issue Sep 13, 2024 · 1 comment · Fixed by #15139
Assignees
Labels
Bug 🐛 Something's not working the way it should

Comments

@stevejalim
Copy link
Collaborator

stevejalim commented Sep 13, 2024

Description

In the CMS we use StructuralPage create branches/folders in the page tree. These StructuralPages do not have content but allow us to structure URL paths as we need, including matching some routes that already exist in the page tree as static/Django-only paths

However, we've spotted a bug: if there's a static path for foo/ and a StructuralPage for foo/, the static path is dropped from the page tree, as can be seen here: mozmeao/www-sitemap-generator@bdd1cc0

Expected result

Paths for StructuralPages should not steamroll over static Django-view paths - only add to them

Actual result

Paths for StructuralPages did steamroll over static Django-view paths - leaving only /en-US/about/ in the sitemap, rather than /<all the other locales>/about/

Note that the /about/ page is still available in all the locales it was originally -- this is just about sitemap generation

@stevejalim stevejalim added the Bug 🐛 Something's not working the way it should label Sep 13, 2024
@stevejalim stevejalim self-assigned this Sep 13, 2024
@stevejalim
Copy link
Collaborator Author

stevejalim commented Sep 13, 2024

The clearest solution, as discussed with @alexgibson, is simply to ensure that StructuralPages are not included in the lookup of pages used for the sitemap - we never want to send anyone to them directly, after all, and visiting a StructuralPage just 302s to the parent path.

Current code:

def get_wagtail_urls():
urls = defaultdict(list)
# Get all live, non-private Wagtail pages
for cms_page in Page.objects.live().public().order_by("path"):
# We don't want the Wagtail structural Root page, nor the
# site root page, because that isn't surfaced from the CMS (yet)
if cms_page.is_root() or cms_page.is_site_root():
continue
_url = cms_page.get_url()
if _url:
lang_code = cms_page.locale.language_code
_path = _url.lstrip("/").replace(lang_code, "")
urls[_path].append(lang_code)
return urls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug 🐛 Something's not working the way it should
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant