
Re-downloaded copies of same (IMAP) messages are found by removedupes to have different bodies #191

Open
5TjpWBU2wkwHpFDb opened this issue Sep 21, 2023 · 14 comments

Comments

@5TjpWBU2wkwHpFDb

5TjpWBU2wkwHpFDb commented Sep 21, 2023

I double-downloaded messages from an IMAP account into a local folder, so there are many guaranteed dupes. I even manually verified a few messages/files. No joy here: "No duplicates found."
Troubleshooting_Information_2023-09-21.txt
Additional info (from Error Console): "Error: utils-message.js:2:1"

Thanks.

@eyalroz
Owner

eyalroz commented Sep 22, 2023

Please check what happens if you remove "Number of lines" from the comparison criteria. Thunderbird has this problem where it sometimes appends 1-2 empty lines to the end of a message when storing it.

Once you've removed this (and maybe other) comparison criteria and start seeing dupes, add criteria back one at a time until you get the exact kinds of dupes you're interested in.
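To illustrate the trailing-empty-line effect described above, here is a minimal JavaScript sketch. It is not removedupes code, and the helper names are hypothetical. It assumes bodies are plain strings with LF or CRLF line endings, and shows how two copies of the same body can get different naive line counts, while stripping trailing empty lines before comparing makes them match.

```javascript
// Hypothetical helper, not removedupes code: split a body into lines and
// drop any empty (or whitespace-only) lines at the end.
function stripTrailingEmptyLines(body) {
  // Split on CRLF or LF so both line-ending styles are handled.
  const lines = body.split(/\r?\n/);
  while (lines.length > 0 && lines[lines.length - 1].trim() === "") {
    lines.pop();
  }
  return lines;
}

// Compare two bodies line by line, ignoring empty lines appended at the end.
function bodiesMatchIgnoringTrailingEmptyLines(a, b) {
  const la = stripTrailingEmptyLines(a);
  const lb = stripTrailingEmptyLines(b);
  return la.length === lb.length && la.every((line, i) => line === lb[i]);
}

const original = "Hello,\r\nSee you soon.\r\n";
const stored = "Hello,\r\nSee you soon.\r\n\r\n\r\n"; // two extra empty lines

// Naive line counts differ by 2...
console.log(original.split(/\r?\n/).length, stored.split(/\r?\n/).length); // 3 5
// ...but the bodies match once trailing empty lines are ignored.
console.log(bodiesMatchIgnoringTrailingEmptyLines(original, stored)); // true
```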

@5TjpWBU2wkwHpFDb
Author

5TjpWBU2wkwHpFDb commented Sep 22, 2023 via email

@eyalroz
Owner

eyalroz commented Sep 22, 2023

Well, the first thing to note is that the "extra empty line" issue can also sneak in through the body comparison.

But regardless - please gradually remove comparison criteria, one by one, and rerun the dupe check - until either you get dupes or you've removed all criteria.
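The narrowing procedure can be sketched as a loop. This is a hypothetical illustration: `countDupeSets` stands in for "run removedupes with these criteria" and is not a real removedupes API, and removing the last-listed criterion first is an arbitrary choice. The simulated search below simply mirrors the counts reported later in this thread.

```javascript
// Hypothetical sketch: drop one criterion at a time and rerun the dupe search
// until dupes appear or no criteria remain.
function removeCriteriaUntilDupesAppear(criteria, countDupeSets) {
  const remaining = [...criteria];
  const removed = [];
  while (remaining.length > 0) {
    const found = countDupeSets(remaining);
    if (found > 0) {
      return { remaining, removed, found };
    }
    // No dupes yet: drop the last-listed criterion and rerun.
    removed.push(remaining.pop());
  }
  // Every criterion was removed without any dupes showing up.
  return { remaining, removed, found: 0 };
}

// Simulated search: any run that includes "Body" finds nothing, while the
// header-only criteria find 7334 sets.
const simulate = (criteria) => (criteria.includes("Body") ? 0 : 7334);

const result = removeCriteriaUntilDupesAppear(
  ["MessageID", "Folder", "From", "To", "CC", "Subject", "SendTime", "Body"],
  simulate
);
console.log(JSON.stringify(result.removed), result.found); // ["Body"] 7334
```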

Also, please check whether the Error Console has any warning or error messages mentioning "removedupes". The console is in the menus, under Tools | Developer Tools | Error Console.

@5TjpWBU2wkwHpFDb
Author

5TjpWBU2wkwHpFDb commented Sep 22, 2023

Got it.
I ran a few tests. It seems that the only issue is Body. I kept Status-Flags and Number-of-lines-in-message turned OFF.
If I specified MessageID, Folder, From, To, CC, Subject, SendTime, and Size (w/o Body) then I got 7334 sets of dups.
If I specified MessageID, Folder, From, To, CC, Subject, SendTime, and Body (w/o Size) then I got "No duplicates found."
Error Console seems innocuous.
Sorry to repeat, but I double-downloaded from an IMAP account to a local folder. So, many guaranteed dups.
Thanks.
PS. Sorry if the screen grabs are big.

Error Console:
```
10:43:17.594 No chrome package registered for chrome://communicator/skin/communicator.css
10:45:31.154 Error in parsing value for ‘width’. Declaration dropped. messenger.xhtml
10:47:39.391 No chrome package registered for chrome://communicator/skin/communicator.css
10:48:37.691 Error in parsing value for ‘width’. Declaration dropped. 2 messenger.xhtml
10:52:08.109 No chrome package registered for chrome://communicator/skin/communicator.css
10:53:30.416 Error in parsing value for ‘width’. Declaration dropped. 2 messenger.xhtml
```

[Screenshots: Capture_2023-09-22, Capture_2023-09-22_Review_MessageID_and_Folder1, Capture_2023-09-22_Review_MessageID_Folder_From_To_CC_Subject_Body1, Capture_2023-09-22_Review_MessageID_Folder_From_To_CC_Subject_SendTime_Size1, Capture_2023-09-22_Review_MessageID_Folder_From_To_CC_Subject1, Capture_2023-09-22_Review_MessageID_Folder_From_To_CC1]

@5TjpWBU2wkwHpFDb
Author

Is there any other info or testing that I can provide? BTW, could this be related to issue #179 (Comparison of subjects with Unicode symbols fails)?

@JDrewes

JDrewes commented Oct 4, 2023

Hi,
I have the same issue, starting when I upgraded both Thunderbird (to 115.2) and removedupes.
Does this mean that removedupes is now doing something different, or has Thunderbird started to add more of those spurious newlines?

I have two IMAP accounts, which receive both distinct and identical emails. Because of the different delivery paths, the headers of duplicate emails can differ quite a lot, and apparently the line count can also differ by 1 or 2, as @eyalroz indicated above.

Could this be solved by adding an "ignore whitespace" option to the body comparison?
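A minimal sketch of what such an "ignore whitespace" body comparison might do, assuming bodies are available as plain strings. This is not an existing removedupes option; the normalization rules (unify line endings, collapse runs of spaces/tabs, trim lines, drop empty lines) are assumptions chosen for illustration.

```javascript
// Hypothetical normalization for a whitespace-insensitive body comparison.
function normalizeBody(body) {
  return body
    .replace(/\r\n?/g, "\n") // CRLF or lone CR -> LF
    .split("\n")
    .map((line) => line.replace(/[ \t]+/g, " ").trim()) // collapse and trim
    .filter((line) => line !== "") // drop empty lines anywhere in the body
    .join("\n");
}

function bodiesEqualIgnoringWhitespace(a, b) {
  return normalizeBody(a) === normalizeBody(b);
}

console.log(bodiesEqualIgnoringWhitespace("Hi  there,\r\n\r\nBye\r\n", "Hi there,\nBye\n\n")); // true
console.log(bodiesEqualIgnoringWhitespace("Hi there", "Hi where")); // false
```

Dropping all empty lines is deliberately lenient; a stricter variant would drop only trailing ones, at the cost of still failing on other whitespace differences.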

Also, as a feature idea, it would be nice to be able to select two messages for comparison to see exactly where they are considered to be the same and where they differ. This would help greatly with criteria adjustment...
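Such a comparison view could be driven by a per-criterion report like the following hypothetical sketch. Messages are modeled as plain objects, and the field names (`messageId`, `subject`, `lineCount`) are assumptions, not removedupes' internal representation.

```javascript
// Hypothetical sketch: report, criterion by criterion, whether two messages agree.
function compareMessages(a, b, criteria) {
  return criteria.map((field) => ({
    field,
    same: a[field] === b[field],
    left: a[field],
    right: b[field],
  }));
}

const m1 = { messageId: "<abc@example.com>", subject: "Report", lineCount: 42 };
const m2 = { messageId: "<abc@example.com>", subject: "Report", lineCount: 44 };

for (const { field, same, left, right } of compareMessages(m1, m2, ["messageId", "subject", "lineCount"])) {
  console.log(same ? `${field}: same` : `${field}: differs (${left} vs ${right})`);
}
```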

Thank you for providing this very essential functionality (I mean duperemove)! @eyalroz

@5TjpWBU2wkwHpFDb
Author

Hi JDrewes,
Could you do me a favor? Double-download some messages from one account/pathway; run removedupes without Body, without deleting anything (how many dupes?); then run it with Body (how many dupes?). Do the counts match?
Thank you!

@JDrewes

JDrewes commented Oct 5, 2023

I downloaded 5 messages twice from the same account. When comparing without Body, I get 5 pairs of 2 duplicates. When comparing with Body, I get "No duplicates found".

@5TjpWBU2wkwHpFDb
Author

Thanks. That confirms it for me. There is either a (big?) change in TB 115 or an error in the Body comparison code. @eyalroz, please give us an update.

@5TjpWBU2wkwHpFDb
Author

I found the culprit: getting MsgService in RemoveDupes.MessengerOverlay.messageBodyFromURI() fails.
I added some diagnostic/status messages. I get the "Get MsgService . . ." message in the Error Console, but no "Got MsgService?" message.

```javascript
RemoveDupes.MessengerOverlay.messageBodyFromURI = function (msgURI) {
  console.log("RemoveDupes.MessengerOverlay.messageBodyFromURI(): main entry . . .");
  console.log(`RemoveDupes.MessengerOverlay.messageBodyFromURI(): msgURI = ${msgURI}`);
  // The following lines don't work because of asynchronicity
  // let msgHdr = RemoveDupes.GetMsgFolderFromUri(msgURI);
  // let msgContent = await getRawMessage(msgHdr);
  let msgContent = "";
  let MsgService;
  console.log("Get MsgService . . .");
  try {
    MsgService = messenger.messageServiceFromURI(msgURI);
  } catch (ex) {
    // The exception is swallowed here, so nothing is logged on failure.
    return null;
  }
  console.log("Got MsgService?");
  // Set up a synchronous stream listener and a scriptable stream to read from it.
  let MsgStream = Cc["@mozilla.org/network/sync-stream-listener;1"].createInstance();
  let consumer = MsgStream.QueryInterface(Ci.nsIInputStream);
  let ScriptInput = Cc["@mozilla.org/scriptableinputstream;1"].createInstance();
  let ScriptInputStream = ScriptInput.QueryInterface(Ci.nsIScriptableInputStream);
  ScriptInputStream.init(consumer);
  console.log("Try MsgService.streamMessage . . .");
  try {
    MsgService.streamMessage(msgURI, MsgStream, msgWindow, null, false, null);
  } catch (ex) {
    return null;
  }
  console.log("Get msgContent . . .");
  ScriptInputStream.available();
  // Read the streamed message in 512-byte chunks.
  while (ScriptInputStream.available()) {
    msgContent += ScriptInputStream.read(512);
  }

  console.log("Got msgContent");
  // ... (rest of the function not shown)
};
```
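In messageBodyFromURI() as posted, both `catch` blocks return `null` without logging anything, which is why the failure left no trace in the Error Console. A minimal sketch of surfacing the exception instead; the `messenger` object here is a hypothetical stand-in that, like the one reported in this thread for TB 115, lacks messageServiceFromURI, and the helper name is also hypothetical.

```javascript
// Hypothetical stand-in for Thunderbird's `messenger` object; per this thread,
// in TB 115 it has no messageServiceFromURI method.
const messenger = {};

let lastError = null;

function getMessageServiceOrNull(msgURI) {
  try {
    return messenger.messageServiceFromURI(msgURI);
  } catch (ex) {
    // Record and log the exception instead of swallowing it, so the cause
    // (here a TypeError: "... is not a function") shows up in the console.
    lastError = ex;
    console.error(`messageServiceFromURI failed for ${msgURI}:`, ex.message);
    return null;
  }
}

// Example local-folder message URI; illustrative only.
const svc = getMessageServiceOrNull("mailbox-message://nobody@Local%20Folders/Inbox#1");
console.log(svc, lastError instanceof TypeError); // null true
```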

@5TjpWBU2wkwHpFDb
Author

5TjpWBU2wkwHpFDb commented Oct 5, 2023

I have a fix: messenger.messageServiceFromURI is not a function; use the MailServices.messageServiceFromURI function instead.
https://forums.mozillazine.org/viewtopic.php?p=14960035&sid=b9c05d0ec6b2e4640955fa7c7429df84#p14960035
https://forums.mozillazine.org/viewtopic.php?p=14960023&sid=b9c05d0ec6b2e4640955fa7c7429df84#p14960023
I have tested it and it works now. I will try to upload a patch ASAP.
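A hedged sketch of the fix described above, written so it runs outside Thunderbird: the hypothetical helper tries each provider object in turn and uses the first one that actually has a messageServiceFromURI function. Inside Thunderbird 115 the first provider would be MailServices; how MailServices is brought into the script's scope (for example via `ChromeUtils.import` of `MailServices.jsm`) depends on the Thunderbird version and is an assumption here. The mocks below are illustrative only.

```javascript
// Hypothetical helper: use the first provider that really exposes
// messageServiceFromURI, instead of assuming `messenger` has it.
function messageServiceFromURI(msgURI, ...providers) {
  for (const provider of providers) {
    if (provider && typeof provider.messageServiceFromURI === "function") {
      return provider.messageServiceFromURI(msgURI);
    }
  }
  throw new TypeError("No provider exposes messageServiceFromURI()");
}

// Mocks for illustration; in Thunderbird one would pass the real MailServices
// (and optionally messenger) objects instead.
const mockMessenger = {}; // no messageServiceFromURI, as reported for TB 115
const mockMailServices = {
  messageServiceFromURI: (uri) => ({ uriUsed: uri, name: "mock message service" }),
};

const service = messageServiceFromURI(
  "mailbox-message://nobody@Local%20Folders/Inbox#1",
  mockMailServices,
  mockMessenger
);
console.log(service.name); // mock message service
```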
removedupes_0.5.4b5_tbird.xpi.zip

@dbirchbauer

I can also confirm something isn't working correctly. I use basic Gmail forwarding and am trying to clean up the email between the accounts. I used to look for matching Message-ID values, but now nothing is found (I have manually compared several messages, and their IDs do match). Even searching for the same Author/Subject/Send Time (using seconds), I get nothing.

@eyalroz
Owner

eyalroz commented Nov 9, 2023

Hello everyone,

as you may be aware - there is a war going on here; I don't live in Gaza, I live in the Israeli-controlled part of Palestine - but here too there is a bunch of government repression, plus logistical trouble for a lot of people who have been either evacuated or told not to go to work etc., which volunteers in charitable organizations try to deal with, each in their own context. Also, I always have other obligations that come before removedupes maintenance, so - my apologies for not replying earlier.


@5TjpWBU2wkwHpFDb wrote:

It seems that only issue is Body. ... If I specified MessageID, Folder, From, To, CC, Subject, SendTime, and Size (w/o Body) then I got 7334 sets of dups. If I specified MessageID, Folder, From, To, CC, Subject, SendTime, and Body (w/o Size) then I got "No duplicates found."

Ok, so - let's make this bug page about just this specific issue, and nothing else. All commenters - if you have a similar/related problem, but not identical to this one - please open a separate issue.

@5TjpWBU2wkwHpFDb - if you move two of the duplicate-except-for-body messages into a local folder, does the problem persist? If it does, can you zip that folder and send it to me or attach it here? I would prefer messages which are as small and simple as possible.

@eyalroz changed the title from "Just not finding dups. TB 115.2.2" to "Re-downloaded copies of same (IMAP) messages are found by removedupes to have different bodies" on Nov 9, 2023
@5TjpWBU2wkwHpFDb
Author

Eyal, I understand.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants