[PRO] Improve the way we remove duplicates during the import #1006
Comments
@vytisbulkevicius This is not the best approach either because if we compare by the item title and the user is using two different feed URLs, it will remove unique items if they share the same title. This is an edge case and we shouldn't be bothered about it apart from the documentation. The reason here being that whatever change we make, it will create more new edge cases. The current implementation that we have is quite reasonable. |
@HardeepAsrani can you then share what the Remove Duplicates toggle does and why it isn't enabled by default? Duplicate items are a very common issue for our customers/users, so I think we should consider improving the logic even if my suggestion is not the best (I understand that the same title doesn't always mean the items are duplicates). |
Such a toggle can be used so that, if it's enabled, we also remove items by checking for a duplicated title, which I still think is good; we can mention what it does. I think it's better to skip some items during import than to see duplicates on a website. |
In the future we can offer per-job customization where people define the duplicate criteria, i.e. they can select title, URL, etc. as the elements that make up the unique key. |
@vytisbulkevicius @selul @HardeepAsrani Currently, we identify duplicate items based on the
Using the If you have any other suggestions in mind, please let me know. Thank you! |
The idea was to give that flexibility to the end user: by default we keep it working as it does now, but next to the Remove Duplicates toggle we add an option to select fields such as Title, Content, and URL, so users who think it's better to remove duplicates by those criteria can choose them. Do you see this as complicated or risky? |
It can be challenging to identify duplicate items in imported posts when users have used our translation or rewrite services for the title and content fields. For example, if a user translates a title, it becomes hard to find duplicates in previous imports because the original title is no longer available. I hope this makes sense. |
Understood. I thought we don't compare against the new/rewritten title but save the original one. And the way we currently compare by link is that we store the original link and compare against that? If we want to compare by title, should we also store the original title the same way? |
@girishpanchal30 I suggest the following:
|
@selul, I understand your suggestion, and it seems possible to implement. I need to add a new text box field next to the "Remove Duplicates" field, and this will be for PRO users only, correct? And users can enter any magic tag in this field, right? |
That's correct. The computed value of each item will be saved with the import as a meta; we can use sanitize_key to create the value based on the item values so we can identify them later. We can use a fixed length for the key, like 256 chars. When checking future imports, we should check whether there is a post with a matching key. Please keep in mind that if the field is not saved/set, the previous mechanism you used for duplication should still apply. |
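The mechanism described in the comment above can be sketched roughly as follows. This is only an illustration in Python, not the plugin's code: `build_duplicate_key`, `is_duplicate`, the `item_url` field, and the in-memory sets are hypothetical names, the local `sanitize_key` is a rough stand-in for the WordPress function of the same name, and the link-based fallback is an assumption based on the earlier comment about comparing by the stored original link.

```python
import re

MAX_KEY_LENGTH = 256  # fixed key length suggested in the thread


def sanitize_key(value: str) -> str:
    """Rough stand-in for WordPress's sanitize_key():
    lowercase, then keep only a-z, 0-9, underscore and dash."""
    return re.sub(r"[^a-z0-9_\-]", "", value.lower())


def build_duplicate_key(template: str, item: dict) -> str:
    """Expand magic tags such as [#item_title] and return a sanitized, truncated key."""
    expanded = re.sub(
        r"\[#(\w+)\]",
        lambda match: str(item.get(match.group(1), "")),
        template,
    )
    return sanitize_key(expanded)[:MAX_KEY_LENGTH]


def is_duplicate(template: str, item: dict, stored_keys: set, stored_links: set) -> bool:
    """Use the custom key when the field is set; otherwise fall back to the link check."""
    if template.strip():
        return build_duplicate_key(template, item) in stored_keys
    # Assumed fallback: compare by the original item link (hypothetical field name).
    return item.get("item_url") in stored_links


item = {"item_title": "Hello, World!", "item_url": "https://a.example/1"}
key = build_duplicate_key("[#item_title]", item)
```

In this sketch the key is stored per imported post and looked up on later imports; because the key is built from the original feed values, a translated or rewritten title on the post itself would not affect the comparison.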
I have implemented the suggestion in this PR. Please review it and let me know if you encounter any issues. |
Tested, and the feature mostly works. Potential issue: Also, I'm not sure whether you can add multiple tags or only a single tag. Does it mean that items will be treated as duplicates ONLY IF both title and content are identical, or is it enough for ONE OF THEM to be identical? [My expectation is an AND condition, not OR, if we allow multiple tags.] Or should it not work at all, so that I can only add a single tag? Because if I add both tags, [#item_title] and [#item_content], duplicates are imported. You can test with this feed: |
@vytisbulkevicius I’ve disabled the input field when the toggle is not enabled. Would you prefer to hide or show the entire section in this case? Thanks! |
It only works with a single tag, not multiple.
Using multiple tags doesn't work properly; it falls back to the old method for checking duplicate items. Adding multiple tags might cause issues. For example: Using multiple tags like In this case, if |
@girishpanchal30 I think this is OK; if this is the only drawback, we can allow multiple tags, keeping in mind that the key is limited to a maximum of 256 chars. Please also make the length limit customizable via a filter. |
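To illustrate the AND semantics asked about earlier and the effect of the length limit, here is a small sketch under stated assumptions: `compound_key` and the `limit_filter` callable are hypothetical stand-ins (the thread does not name the actual filter), and the sanitizing step only approximates sanitize_key.

```python
import re
from typing import Callable, Optional

DEFAULT_LIMIT = 256  # default key length mentioned in the thread


def compound_key(template: str, item: dict,
                 limit_filter: Optional[Callable[[int], int]] = None) -> str:
    """Build one key from all tags in the template, then truncate it.

    limit_filter is a hypothetical hook that may return a different limit.
    """
    limit = limit_filter(DEFAULT_LIMIT) if limit_filter else DEFAULT_LIMIT
    expanded = re.sub(r"\[#(\w+)\]", lambda m: str(item.get(m.group(1), "")), template)
    # Approximation of sanitize_key(): lowercase, keep a-z, 0-9, "_" and "-".
    return re.sub(r"[^a-z0-9_\-]", "", expanded.lower())[:limit]


template = "[#item_title][#item_content]"
a = {"item_title": "Release notes", "item_content": "Version 1 is out."}
b = {"item_title": "Release notes", "item_content": "Version 2 is out."}
c = {"item_title": "Release notes", "item_content": "Version 1 is out"}

# Same title but different content: keys differ, so not a duplicate (AND, not OR).
same_title_only = compound_key(template, a) == compound_key(template, b)
# Only a trailing dot differs: punctuation is stripped, so the keys match.
punctuation_only = compound_key(template, a) == compound_key(template, c)

# Long content that differs only after the limit collides unless the limit is raised.
x = {"item_title": "t", "item_content": "x" * 300 + "one"}
y = {"item_title": "t", "item_content": "x" * 300 + "two"}
collides_at_default = compound_key(template, x) == compound_key(template, y)
differs_when_raised = (compound_key(template, x, lambda n: 1024)
                       != compound_key(template, y, lambda n: 1024))
```

Because all tags are concatenated into a single key, two items match only when every selected field matches after sanitizing. The truncation case suggests why a larger limit may be needed when the content is long.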
Good, I think it makes sense to have the input disabled. I tested and looks good to me. |
@vytisbulkevicius @selul I've added support for multiple tags and introduced a new filter that allows users to modify the character limit.
Thanks |
Unfortunately, after the recent changes it doesn't work at all for me. You can check it here:
Import: https://boringpaste.s3-tastewp.com/wp-admin/post.php?post=9&action=edit |
@vytisbulkevicius I've checked with the local setup, and everything seems to be working fine. It seems there may be an issue with the long meta key. I'll check it and let you know. Thanks |
@vytisbulkevicius I have reset the import job in your test instance, re-run it, and it appears to be working fine. |
I don't know what I'm doing wrong, but it still doesn't work. I tried again on the test instance I shared above, and it still doesn't work. I also created a new fresh one and same thing happens:
Screencast: https://vertis.d.pr/v/xfOTSF |
@vytisbulkevicius I’ve updated the meta key flag, and it seems to be working now. Please test it with the latest build zip. |
I’ve already used the
Fixed, please check with the latest build zip. |
@girishpanchal30, I will check again by modifying some words. |
@vytisbulkevicius Also please increase the limit char if content is long. |
Yes, it worked fine after I changed the content, not just added that additional dot at the end. However, I'm not sure why this happens, but with the latest build I'm getting weird behavior: there is an error in the input field and I can't overwrite it (it happens on any fresh installation, so it shouldn't be cache). Error is: If the error is related to some plugin, you can disable all plugins in TasteWP.com dashboard.
|
@vytisbulkevicius Resolved! The issue occurred after I rebased the branch code. |
works great now, thanks! |
What problem does this address?
The way we look for duplicates is not ideal: we compare by the title of the feed AND the link of an item. This works if a single feed is used, but if you are using multiple feeds, the URLs will be different even though it can be the same item.
We have this option:
To be honest, I don't know what it does. I think it should be enabled by default. The documentation it points to is a code snippet I added a few months ago; before that it didn't make sense at all, as it pointed to a code snippet used for shortcode/widget usage:
I think what we can do is rename it to "Remove duplicates by title" and make it work like this code snippet:
https://github.com/Codeinwp/feedzy-rss-feeds-pro/issues/754#issuecomment-2382295616
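The trade-off described above can be shown with a minimal sketch. It is not the plugin's logic: `link_key`, `title_key`, and the sample feed data are hypothetical and only model comparing by feed title plus item link versus comparing by item title.

```python
def link_key(feed_title: str, item: dict) -> tuple:
    """Current approach as described: feed title AND item link."""
    return (feed_title, item["link"])


def title_key(item: dict) -> str:
    """Proposed approach: item title only, normalized for comparison."""
    return item["title"].strip().lower()


# The same article syndicated through two feeds, under different URLs.
from_feed_a = {"title": "Big Launch Today", "link": "https://a.example/big-launch"}
from_feed_b = {"title": "Big Launch Today", "link": "https://b.example/news/42"}

# Link-based keys differ, so both copies would be imported.
missed_by_link = link_key("Feed A", from_feed_a) != link_key("Feed B", from_feed_b)
# Title-based keys match, so the second copy would be skipped.
caught_by_title = title_key(from_feed_a) == title_key(from_feed_b)

# Edge case raised in the comments: two unrelated items sharing a title.
weekly_1 = {"title": "Weekly Update", "link": "https://a.example/week-1"}
weekly_2 = {"title": "Weekly Update", "link": "https://a.example/week-2"}
false_positive = title_key(weekly_1) == title_key(weekly_2)
```

The last comparison is the case where a title-only check would drop a unique item, which is why the thread moves toward a user-defined key.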
What is your proposed solution?
No response
Will this feature require documentation? (Optional)
None