Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XWIKI-12987: Relative links are made absolute or even broken after moving a page #3553

Closed
wants to merge 12 commits into from

Conversation

surli
Copy link
Member

@surli surli commented Oct 10, 2024

Jira URL

https://jira.xwiki.org/browse/XWIKI-12987

TODO

  • [] Document properly abstract methods in AbstractCopyOrMoveJob
  • Improve SubWikiIT#movePageToSubwiki
  • Why not checking if the doc exists in updateResourceReferenceAbsolute conditions?
  • Verify all new FIXME in the code

Changes

Description

The idea of this work is to:

  1. Change the way AbstractCopyOrMoveJob works to perform computation of couple source/target documents before processing them
  2. Provide a way to access that map source/target documents
  3. Use that information when performing a call to ReferenceRenamer to define if a relative untyped link should be handled or not

The PR provides mainly:

  • new APIs in ReferenceRenamer and MacroRefactoring to integrate the map of references that have been moved as part of same job
  • refactorings of AbstractCopyOrMoveJob:
    • specific computation of getEntities to actually visit the hierarchy and populate the entities with the couple of source/target documents
    • new abstract methods to avoid duplications (not strictly needed for this work)
    • new method to retrieve the map of source/target documents
  • new conditions in ResourceReferenceRenamer to decide if a link should be renamed or not: most of the logic of the fix is encoded there (see also clarifications)
  • new calls in XWiki#updateLinksForRename and BackLinkUpdaterListener#updateBackLinks to give the map of source/target when calling the rename of references
  • new integration test simulating the scenario indicated in the ticket and also performing a supplementary check related to a regression found afterwards
  • same integration test also performed on a subwiki in SubWikiIT

Clarifications

The refactoring of references is currently called at two places:

  1. by the BackLinkUpdaterListener for all backlinks after a document has been renamed (triggered by a document event)
  2. by XWiki#updateLinksForRename to rename the internal links of the current document (which is always a call to updateResourceReferenceRelative, see below)

The problem of XWIKI-12987 is that XWiki#updateLinksForRename is called first and does perform an absolute rename of the relative links.
Now ResourceReferenceRenamer APIs names might be misleading: updateResourceReferenceRelative and updateResourceReferenceAbsolute are not about the references being absolute or relative: it's about the renamed references being absolute or relative respectively to the current document. It took me a while to integrate this, and I'm still struggling a bit with it.

So the problem was to find a proper condition to decide when to not refactor links, for this I'm performing a check for assessing if a link is absolute or not, by trying to resolve the ResourceReference without any parameter: if the result equals the reference with parameter then it was absolute.

Then for the updateResourceReferenceAbsolute the idea is to only perform update of the links, if the provided link is absolute, or if it's relative but the current document hasn't been moved as part of same job: in such case we do need to update the relative link, because there won't be a call to XWiki#updateLinksForRename on that document to update the link, we only get the call from BackLinkUpdaterListener.

For the updateResourceReferenceRelative the check is a bit more complex.
We only update links that are relative here, we don't want to update absolute references (is that correct? Can't find a counter example right now).
Then since we only perform refactoring of links relative to current document, we also check that the link about to be refactored is not related to pages that are part of the moved document in the same job: if those are also moved in the same job, then they're moved using same "direction", they're part of same hierarchy and we don't want to change the relative links wrt to them. This check is the main part of avoiding to update the relative links.

And finally we perform the update of the link only if the doc actually exists: we would create absolute links for those not existing doc, which doesn't make sense, we should keep the relative link we don't really know what the user wanted to do with those. Note that we could do the same check in updateResourceReferenceAbsolute but we don't really have the need since this is only called from the BackLinkUpdaterListener and if I'm correct we'll never have registered backlinks for a not existing doc.

Note that initially we discussed about using untyped link as a condition to perform or not the refactoring: I dropped the idea because we currently always create image resource references as untyped references from the WYSIWYG editor.

Screenshots & Video

Tested and supported UCs

In RenamePageIT:

  • Rename of links outside moved hierarchy: My.Page contains link to [[1.2.WebHome]] and 1.2.WebHome full hierarchy is renamed to A.B.
  • Rename of absolute links in content, macros and images inside a hierarchy: X.WebHome contains content with absolute links to sub pages in X space. X is renamed, we check that all absolute links / images are properly updated.
  • Rename of relative links inside hierarchy: case of a hierarchy of pages containing Alice and Bob, WebHome contains simplest relative links (e.g. [[Alice]]) we check that those links are not updated if the whole hierarchy is moved. Test also performed when moving on a subwiki.

TODO:

  • Check refactoring of relative / absolute links in translations

Executed Tests

Run of tests on following modules / integration test:

  • xwiki-platform-refactoring
  • xwiki-platform-flamingo-skin-test-docker
  • xwiki-platform-attachment-test-docker and specifically AttachmentMoveIT
  • xwiki-platform-index-test-docker and specifically AllDocsIT
  • xwiki-platform-wiki-test-docker and specifically SubWikiIT

Expected merging strategy

  • Prefers squash: Yes
  • Backport on branches:
    *

@surli surli self-assigned this Oct 10, 2024
@surli surli force-pushed the XWIKI-12987 branch 2 times, most recently from bd6d6fe to 12a6e52 Compare October 23, 2024 16:24
@surli surli marked this pull request as ready for review October 29, 2024 14:32
@surli surli requested a review from tmortagne October 30, 2024 08:49

// Wait for an empty queue here to ensure that the deleted page has been removed from the index and links
// won't be updated just because the page is still in the index.
new SolrTestUtils(testUtils, testConfiguration.getServletEngine()).waitEmptyQueue();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be needed anymore with XWIKI-22323.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually call it to avoid getting a question when performing the rename operation.

Copy link
Member

@tmortagne tmortagne Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is supposed to automatically vanish (it's not a blocker question) as soon as the job is done AFAIU.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well what's sure is that I'm getting a timeout when I remove this and the screenshot shows the question...

Copy link
Member

@tmortagne tmortagne Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to update the PO to support non-blocking questions (maybe it does not find what it expects to see to check if the job is still running when a question is displayed).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is more that indexing takes quite some time, in particular with the frequent committing of the index with the default parameters, and the page object most likely just doesn't wait long enough. I have an integration test that explicitly triggers the question and there it works to just continue waiting without answering the question.

if (((CancelableEvent) beginEvent).isCanceled()) {
return;
}
this.progressManager.endStep(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to call endStep if you have a startStep with the same source.

this.progressManager.startStep(this);
EndFoldEvent endEvent = getEndEvent();
this.observationManager.notify(endEvent, this, this.getRequest());
this.progressManager.endStep(this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this endStep, it will be closed by popLevelProgress.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't need it but it's not a mistake to have it right? Note that here it's some old code that I just moved, and personally I feel more comfortable to keep the symmetry start/end Step

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not going to cause problems, it's just useless.

I feel more comfortable to keep the symmetry start/end Step

On my side, I usually prefer to limit plumbing to not pollute too much the "real" code.

Map.of(sourceDoc.getDocumentReference(), newDocumentReference);
if (currentJob instanceof AbstractCopyOrMoveJob) {
updatedReferences = ((AbstractCopyOrMoveJob) currentJob).getSelectedEntities();
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmortagne for that one you suggested:

But I'm not sure it's a good thing to do, might be cleaner to introduce something more genereric in the ExecutionContext that would be populated by jobs like the move job, etc.

wdyt, the PR is already quite complex, isn't it something we can address later?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's not a blocker, just a cleaner thing to do eventually for the long run. As long as we don't introduce public API for the current version, it's not adding too much debt.

@@ -182,8 +246,10 @@ private <T extends EntityReference> boolean updateRelativeResourceReference(Reso
return result;
}

// FIXME: Check if we shouldn't use the updatedEntities here.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be double checked: we probably never cover this in our tests.

}
}

// FIXME: this should be factorized with the code from getTargetReference
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I'll drop that FIXME for after the PR is merged. I'd like to improve the code in AbstractCopyOrMoveJob but it could be done later probably (e.g. between RC and final release).


newPage = setup.rest().get(new LocalDocumentReference(newSpace, newName, Locale.FRENCH));
assertEquals("fr [[" + parent + ".OtherPage.WebHome]]", newPage.getContent());
assertEquals("fr [[OtherPage]]", newPage.getContent());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is not good anymore: the OtherPage doesn't exist so we just check that the unexisting relative link is not updated. But we don't check anymore that link are refactored in a translation page. We need to improve it.

@michitux
Copy link
Contributor

michitux commented Nov 7, 2024

Then since we only perform refactoring of links relative to current document, we also check that the link about to be refactored is not related to pages that are part of the moved document in the same job: if those are also moved in the same job, then they're moved using same "direction", they're part of same hierarchy and we don't want to change the relative links wrt to them. This check is the main part of avoiding to update the relative links.

What if the relative link contains a part that has been renamed? I haven't checked the code yet, it just sounds like this high-level description doesn't take this into account.

Copy link
Contributor

@michitux michitux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that the check to only update absolute references is correct. I haven't checked, though, if my example is already covered by integration tests that would show that it still works for some reason.

// We also update the link if it's not an absolute link but the current document is not part of the move job,
// as in this case there won't be any other call to perform the link refactoring.
boolean shouldBeUpdated = linkTargetDocumentReference.equals(oldReference)
&& (absoluteResolvedReference.equals(linkEntityReference)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this check for absolute references is okay. From what I understand, this means that links that are relative to the wiki won't be updated. Imagine in a subwiki, we rename Source.WebHome to Target.WebHome and Source.WebHome contains a link to Source.Subspace.WebHome, which is moved to Target.Subspace.WebHome. Then we should adapt that link, shouldn't we? And we need to do this here, as we don't have any chance to update it later (in the relative step, only links that don't resolve to the same page anymore are updated, but not links that need to point to a different page).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I did tested your scenario and the link is properly renamed, the condition here is respected.
When I debug I have those values:

  • resourceReference is a typed doc reference of value Source.Subspace.WebHome
  • currentDocumentReference is a reference to subwiki:Source.WebHome
  • linkTargetDocumentReference is resolved to subwiki:Source.Subspace.WebHome
  • oldReference is resolved to subwiki:Source.Subspace.WebHome
  • absoluteResolvedReference is resolved to subwiki:Source.Subspace.WebHome
  • linkEntityReference is resolved to subwiki:Source.Subspace.WebHome

From what I understand, this means that links that are relative to the wiki won't be updated.

it works because the link is resolved using the current document so the proper subwiki is used.

Copy link
Contributor

@michitux michitux Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the answers below to a question about what seems to be the exactly same "is absolute" condition in the other method, you write:

the condition will never be met if I have a link to xwiki:MySpace.MyPage, but it will be met if I have MySpace.MyPage

However, here you argue that the condition is met for a link Source.Subspace.WebHome. So either I missed something or one of these answers is wrong. [Edit] Maybe the difference is between having one or two dots? But then I really don't see that it is a good condition to have as when moving pages between subwikis, it will break things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still drinking my coffee so maybe it's me who is missing something, but the condition I was talking about in the other answer was actually the contrary of this one, there's a not: !absoluteResolvedReference.equals(linkEntityReference).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is a negation, I was talking about the actual check for absolute references without negation.


// We should update the reference if:
// - it's relative: the resolution of the reference without any parameter, and the resolution of the reference
// with the new reference should give different results
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this check? Isn't the explanation the same as what is already checked by the third condition?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No the conditions are very different. This one is only checking that the current link is relative: i.e. the condition will never be met if I have a link to xwiki:MySpace.MyPage, but it will be met if I have MySpace.MyPage.
The third condition is about checking that the link points to a different target because of the rename, so if I have a link MySpace.MyPage in page xwiki:Bla.MySpace and I renamed Bla to Foo the resolution was xwiki:Bla.MySpace.MyPage and it's now xwiki:Foo.MySpace.MyPage: in such case the condition is met.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No the conditions are very different. This one is only checking that the current link is relative: i.e. the condition will never be met if I have a link to xwiki:MySpace.MyPage, but it will be met if I have MySpace.MyPage.

That's true, but what's relevant here is that the resolution relative to the old location is different and that's already checked by the third condition. So unless there is a different explanation why this check is needed, I think it could also just be removed. A possible reason could be that the absolute rename step already adapted the link, and we want to prevent it from being adapted again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible reason could be that the absolute rename step already adapted the link, and we want to prevent it from being adapted again?

It's possible I needed it for that, to be honest I don't recall exactly.

I'll try to run again my test without that specific condition just to see if one of them is failing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to run again my test without that specific condition just to see if one of them is failing.

Ok apparently I can indeed discard that condition safely, at least for my tests to pass.

// as in this case there won't be any other call to perform the link refactoring.
boolean shouldBeUpdated = linkTargetDocumentReference.equals(oldReference)
&& (absoluteResolvedReference.equals(linkEntityReference)
|| !updatedEntities.containsKey(currentDocumentReference));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check doesn't work if the document with the link has already been moved before the current document I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're might be right on that one and I think it's the reason why part of my integration test isn't working see https://github.com/xwiki/xwiki-platform/pull/3553/files/7bb0a615e65405019dd416b286de29b5c7aa26c6#diff-223fd724fe6b367e09a02cb41234144641bea0082b58946fb457281668b7b092R716-R718

Now I felt it was better than nothing: it's not a regression for sure.

@michitux
Copy link
Contributor

michitux commented Nov 7, 2024

I wonder if a better logic for "absolute" updates could be the following:

Check if the page to update is a source or target page of the current move. If yes, resolve the link relative to the old and the new location of the page to update (the current location is either the new or the old one). If the link resolved relative to the old location is in the pages to be moved, check if the link resolved relative to the new location is the move target. If yes, skip it, otherwise, update it.

As the "relative" update skips all links to pages in the move, if the page to be updated has already been moved, resolving the link relative to it might not yield the actual target of it. Instead, the code explicitly needs to resolve it relative to the previous location to find the true target before the move.

…ving a page

WIP
The idea of this work is to:
  1. Provide a way to access all documents that are moved as part of a
     move job
  2. Use that information when performing a call to ReferenceRenamer to
     define if a relative untyped link should be handled or not

On top of it, the idea is also to check if the doc exists in case of
refactoring of a link to avoid refactoring unexisting relative links.
One problem is remaining about relative link pointing to sibling pages
(e.g. the link to Alice in Bob page in the ticket): we rely apparently
to an old mechanism for backward compatibility reason for this to work
in the UI, we might need same thing in the check, or to decide to
ignore that UC.

I started to add an integration tests but for some reason it's not
passing, though it seemed to be working locally for the scenario
described in the ticket (except for the link in Bob page).
…ving a page

  * Fix integration test setup
  * Fix some signatures
  * Work on the conditions for performing link update: WIP
…ving a page

  * Fix conditions to make all RenamePageIT passing
  * WIP: need to double check that some conditions are not redundant and
    double check side effects
…ving a page

  * Simplify a bit the conditions in ResourceReferenceRenamer and ensure
    all unit tests are passing in refactoring module
…ving a page

  * Fix checkstyle
  * WIP: try to find proper oracle for renaming absolute references,
    without success so far.
…ving a page

  * Find proper conditions to perform or not link renames
  * Fix unit tests to add missing conditions
  * WIP: need to fix coverage and check on subwikis / with more
    conditions (e.g. with holes in hierarchy)
…ving a page

  * Fix a regression and provide a test to cover it
…ving a page

  * Provide subwiki integration tests
  * Minor improvment in RenamePageIT
…ving a page

  * Improve SubWikiIT to add more checks
…ving a page

  * Few improvments following review
…ving a page

  * Change APIs to use a Map<EntityReference, EntityReference>
    corresponding to the source and target of refactorings in renamers
  * Change some logic of AbstractCopyOrMoveJob to compute the actual
    couple source/destination before performing any operation and store
the info in EntitySelection
  * Add a log in RenameJob if it's not executed because of the number of
    entities (not needed for this issue, but felt better to understand
what's happening)
…ving a page

  * Fix remaining coverage problems
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants