chore: fix outlook quotes issue and release v0.1.2 #18

Merged 1 commit on Apr 8, 2024
7 changes: 5 additions & 2 deletions packages/mailtools/README.md
@@ -116,8 +116,11 @@ We picked up on `tempo-email-parser` which was not being maintained any more and

## Limitations

It seems like we are unable to extract outlook signatures correctly. We need more source emails to add to the parsing tests and functions.
If you can help out with this, please open an issue with some html emails we can use
It's nearly impossible to parse every kind of Outlook email. We have implemented some measures to parse them, but certain kinds of Outlook signatures still cannot be extracted. We cannot parse those without using some kind of LLM, and even that might not be accurate.

We have covered major providers like Gmail, newer Outlook clients, Proton Mail and a few others.

You can help us improve this package by testing your email clients and signatures at <https://tools.unin.sh> and reporting the results through the built-in feedback system.

## License

2 changes: 1 addition & 1 deletion packages/mailtools/jsr.json
@@ -1,5 +1,5 @@
{
"name": "@u22n/mailtools",
"version": "0.1.1",
"version": "0.1.2",
"exports": "./src/index.ts"
}
2 changes: 1 addition & 1 deletion packages/mailtools/package.json
@@ -1,6 +1,6 @@
{
"name": "@u22n/mailtools",
"version": "0.1.1",
"version": "0.1.2",
"type": "module",
"description": "Processes HTML email for display. Extracts quotations and more. Successor to tempo-email-parser.",
"main": "./dist/index.js",
20 changes: 18 additions & 2 deletions packages/mailtools/src/removeQuotations.ts
@@ -28,7 +28,7 @@ function removeQuotations($: CheerioAPI): { didFindQuotation: boolean } {
* Returns a selection of all quote elements that should be removed
*/
function findAllQuotes($: CheerioAPI) {
const quoteElements = $(
let quoteElements = $(
[
'.gmail_quote',
'blockquote',
@@ -38,7 +38,11 @@ function findAllQuotes($: CheerioAPI) {
// ENHANCEMENT: Add findQuotesAfter__OriginalMessage__
].join(', ')
);
// console.log(quoteElements.html());

if (quoteElements.length === 0) {
quoteElements = findAllQuotesOutlook($);
}

// Ignore inline quotes. Quotes that are followed by non-quote blocks.
const quoteElementsSet = new Set(toArray(quoteElements));
const withoutInlineQuotes = quoteElements.filter(
@@ -48,6 +52,18 @@ function findAllQuotes($: CheerioAPI) {
return withoutInlineQuotes;
}

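// It's always Outlook that has everything built differently.
// Fallback: treat the first div with a border-top style, plus every sibling after it, as the quotation.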
function findAllQuotesOutlook($: CheerioAPI) {
const quoteStart = $("div[style*='border-top']").first();
const quotation = quoteStart.add(quoteStart.nextAll());
if (quotation.length === 0) {
return $();
}
const newHolder = $('<div></div>');
quotation.each((_, el) => void newHolder.append($(el)));
return newHolder;
}

/**
* Returns true if the element looks like an inline quote:
* it is followed by unquoted elements
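
Reviewer note: the Outlook fallback added in `findAllQuotesOutlook` can be illustrated without cheerio. The sketch below is a simplified, dependency-free model, not the package's implementation: it assumes the separator sits among a flat list of siblings, and `SiblingNode`, `isOutlookQuoteSeparator` and `sliceOutlookQuote` are hypothetical names introduced only for this illustration.

```typescript
// Hypothetical model: each sibling element is reduced to its tag name and inline style.
interface SiblingNode {
  tag: string;
  style: string;
}

// Mirrors the selector div[style*='border-top']: a <div> whose style
// attribute contains the substring "border-top".
function isOutlookQuoteSeparator(node: SiblingNode): boolean {
  return node.tag === "div" && node.style.includes("border-top");
}

// Returns the first separator plus every sibling after it,
// or an empty array when no separator exists.
function sliceOutlookQuote(siblings: SiblingNode[]): SiblingNode[] {
  const start = siblings.findIndex(isOutlookQuoteSeparator);
  return start === -1 ? [] : siblings.slice(start);
}

// Styles loosely based on the Outlook fixture added in this PR.
const siblings: SiblingNode[] = [
  { tag: "p", style: "" },
  { tag: "table", style: "margin-left:-.4pt;border-collapse:collapse" },
  { tag: "div", style: "border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm" },
  { tag: "p", style: "" },
  { tag: "p", style: "" },
];

const quoted = sliceOutlookQuote(siblings);
console.log(quoted.length); // 3
```

Note that the table's `border-collapse` style does not match, because the check looks for the exact substring `border-top` on a `div`. Any email whose own content uses a `div` with a top border would be cut at that point, which is one reason this path only runs when the regular quote selectors find nothing.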
7 changes: 6 additions & 1 deletion packages/mailtools/src/removeSignatures.ts
@@ -106,10 +106,15 @@ function findAllSignatures($: CheerioAPI) {
}

function findAllSignaturesOutlook($: CheerioAPI) {
// This works in most cases, but fails for fixtures like outlook-client-5,
// where there is nothing we can do to detect the full signature.
// That test case still contains part of the signature, so the test is effectively invalid;
// it is kept for future reference.
const start = $(
':has(>[style*="mso-ligatures"], >[style*="mso-fareast"])'
).first();
const signatureTags = start.add(start.nextAll());
// Outlook native signatures usually end at a div with a border-top style
const signatureTags = start.add(start.nextUntil("div[style*='border-top']"));
const newHolder = $('<div></div>');
signatureTags.each((_, el) => void newHolder.append($(el)));
return newHolder;
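
Reviewer note: the switch from `nextAll()` to `nextUntil("div[style*='border-top']")` means the signature now stops before the quoted-reply separator instead of swallowing everything after it. The following sketch models that behavior on plain objects rather than cheerio; `Block`, `startsSignature`, `isQuoteSeparator` and `sliceOutlookSignature` are hypothetical names, and the flat sibling list is a simplifying assumption.

```typescript
// Hypothetical model of a top-level block: its tag, its own inline style,
// and the inline styles of its direct children.
interface Block {
  tag: string;
  style: string;
  childStyles: string[];
}

// Mirrors :has(>[style*="mso-ligatures"], >[style*="mso-fareast"]):
// a block with a direct child whose style mentions either marker.
function startsSignature(block: Block): boolean {
  return block.childStyles.some(
    (s) => s.includes("mso-ligatures") || s.includes("mso-fareast")
  );
}

// Mirrors div[style*='border-top'], the usual start of an Outlook quoted reply.
function isQuoteSeparator(block: Block): boolean {
  return block.tag === "div" && block.style.includes("border-top");
}

// Start block plus following siblings, stopping before the separator
// (the same shape as start.add(start.nextUntil(separator))).
function sliceOutlookSignature(blocks: Block[]): Block[] {
  const start = blocks.findIndex(startsSignature);
  if (start === -1) return [];
  const end = blocks.findIndex((b, i) => i > start && isQuoteSeparator(b));
  return blocks.slice(start, end === -1 ? blocks.length : end);
}

const blocks: Block[] = [
  { tag: "p", style: "", childStyles: ["font-size:10.0pt;font-family:Raleway"] },
  { tag: "p", style: "", childStyles: ["font-size:10.0pt;font-family:Raleway;mso-fareast-language:EN-US"] },
  { tag: "table", style: "margin-left:-.4pt;border-collapse:collapse", childStyles: [] },
  { tag: "div", style: "border:none;border-top:solid #E1E1E1 1.0pt", childStyles: [] },
  { tag: "p", style: "", childStyles: [] },
];

const signature = sliceOutlookSignature(blocks);
console.log(signature.map((b) => b.tag)); // [ 'p', 'table' ]
```

When no separator follows the start block, the slice runs to the end of the list, which matches how `nextUntil` behaves when its selector matches nothing.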
@@ -0,0 +1,107 @@
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Aptos;}
@font-face
{font-family:"Segoe UI Emoji";
panose-1:2 11 5 2 4 2 4 2 2 3;}
@font-face
{font-family:Raleway;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 70.85pt 70.85pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="NL-BE" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway">Received but what about signature &amp; attachments?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway">Test attachment.txt included!<br>
<br>
What about a screenshot?<o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway"><img width="604" height="347" style="width:6.2916in;height:3.6145in" id="Afbeelding_x0020_2" src="cid:12345678"></span><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:&quot;Segoe UI Emoji&quot;,sans-serif">&#129315;</span><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway"> screenshot end
</span><span lang="EN-GB" style="font-size:10.0pt;font-family:&quot;Segoe UI Emoji&quot;,sans-serif">&#128515;</span><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway"><o:p></o:p></span></p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size:10.0pt;font-family:Raleway"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Raleway;color:#0E2432">Met vriendelijke groet<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Raleway;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Raleway"><o:p>&nbsp;</o:p></span></p>
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" style="margin-left:-.4pt;border-collapse:collapse">
<tbody>
<tr>
<td width="85" valign="top" style="width:63.8pt;padding:0cm 5.4pt 0cm 5.4pt">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#C4014B;mso-fareast-language:NL"><img width="95" height="95" style="width:.9895in;height:.9895in" id="Afbeelding_x0020_12" src="cid:12345678"></span></b><b><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#C4014B;mso-fareast-language:NL"><o:p></o:p></span></b></p>
</td>
<td width="541" valign="top" style="width:406.0pt;padding:0cm 5.4pt 0cm 5.4pt">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#C4014B;mso-fareast-language:NL"><br>
</span></b><b><span style="font-size:10.0pt;font-family:Raleway;color:#C4014B;mso-fareast-language:NL">Your Name</span></b><b><span style="font-size:10.0pt;font-family:&quot;Something&quot;,sans-serif;color:#C4014B;mso-fareast-language:NL">
<br>
</span></b><b><span style="font-size:10.0pt;font-family:Raleway;color:#0E2432;mso-fareast-language:NL">Your Position<br>
Your Company<o:p></o:p></span></b></p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:Raleway;color:#0E2432;mso-fareast-language:NL">Tel. 123 456 789</span></b><span style="font-size:10.0pt;font-family:Raleway;color:#0E2432;mso-fareast-language:NL"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:black;mso-fareast-language:NL"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Raleway;color:gray;mso-fareast-language:NL">Company -&nbsp;</span><span style="font-size:10.0pt;font-family:&quot;Calibri&quot;,sans-serif"><a href="http://www.example.com/"><span style="font-family:Raleway;color:gray;mso-fareast-language:NL">example.com</span></a></span><span style="font-size:10.0pt;font-family:Raleway;color:black;mso-fareast-language:NL"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Raleway;color:gray;mso-fareast-language:NL">Address<br>
</span><span style="font-size:8.0pt;font-family:Raleway;color:gray;mso-fareast-language:NL"><br>
</span><span style="font-size:7.0pt;font-family:Raleway;color:gray;mso-fareast-language:NL">Deze e-mail en eventuele bijlagen zijn vertrouwelijk en kunnen onder het wettelijk zwijgrecht vallen.<br>
Indien u niet de geadresseerde bent, is het ten strengste verboden deze e-mail publiek te maken, te reproduceren, te verdelen, of op een andere manier te verspreiden of te gebruiken.<br>
Indien u dit bericht per vergissing hebt ontvangen, gelieve dan de verzender onmiddellijk op de hoogte te stellen en deze e-mail te verwijderen.</span><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Raleway"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Raleway;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:Raleway;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="NL" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">Van:</span></b><span lang="NL" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">
<a href="mailto:random@example.com">random@example.com</a> &lt;<a href="mailto:random@example.com">random@example.com</a>&gt;
<br>
<b>Verzonden:</b> zaterdag 6 april 2024 20:48<br>
<b>Aan:</b> Jelle Revyn &lt;<a href="mailto:random2@example.com">random2@example.com</a>&gt;<br>
<b>Onderwerp:</b> Test from unin.me<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p>Signature for sure isn't filtered... Do I still get in spam box?<o:p></o:p></p>
</div>
</body>
</html>

@@ -0,0 +1,128 @@
<html
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns:m="http://schemas.microsoft.com/office/2004/12/omml"
xmlns="http://www.w3.org/TR/REC-html40"
>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="Generator" content="Microsoft Word 15 (filtered medium)" />

<meta name="viewport" content="width=device-width" />
<style>
.customStyle {
background: red;
}
</style>
</head>
<body lang="NL-BE" link="#0563C1" vlink="#954F72" style="word-wrap: break-word">
<div class="WordSection1">
<p class="MsoNormal">
<span lang="EN-GB" style="font-size: 10pt; font-family: Raleway">Received but what about signature &amp; attachments?</span>
</p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size: 10pt; font-family: Raleway">&nbsp;</span></p>
<p class="MsoNormal">
<span lang="EN-GB" style="font-size: 10pt; font-family: Raleway"
>Test attachment.txt included!<br />
<br />
What about a screenshot?</span
>
</p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size: 10pt; font-family: Raleway">&nbsp;</span></p>
<p class="MsoNormal">
<span lang="EN-GB" style="font-size: 10pt; font-family: Raleway"
><img width="604" height="347" style="width: 6.2916in; height: 3.6145in" id="Afbeelding_x0020_2" src="cid:12345678" /></span
><span lang="EN-GB" style="font-size: 10pt; font-family: Raleway"></span>
</p>
<p class="MsoNormal">
<span lang="EN-GB" style="font-size: 10pt; font-family: &quot;Segoe UI Emoji&quot;, sans-serif">🤣</span
><span lang="EN-GB" style="font-size: 10pt; font-family: Raleway"> screenshot end </span
><span lang="EN-GB" style="font-size: 10pt; font-family: &quot;Segoe UI Emoji&quot;, sans-serif">😃</span
><span lang="EN-GB" style="font-size: 10pt; font-family: Raleway"></span>
</p>
<p class="MsoNormal"><span lang="EN-GB" style="font-size: 10pt; font-family: Raleway">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size: 10pt; font-family: Raleway; color: #0e2432">Met vriendelijke groet</span></p>
<p class="MsoNormal"><span style="font-size: 10pt; font-family: Raleway; mso-fareast-language: EN-US">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size: 10pt; font-family: Raleway">&nbsp;</span></p>
<table class="MsoNormalTable" border="0" cellspacing="0" cellpadding="0" style="margin-left: -0.4pt; border-collapse: collapse">
<tbody>
<tr>
<td width="85" valign="top" style="width: 63.8pt; padding: 0cm 5.4pt 0cm 5.4pt">
<p class="MsoNormal" style="margin-bottom: 12pt">
<b
><span style="font-size: 10pt; font-family: &quot;Verdana&quot;, sans-serif; color: #c4014b; mso-fareast-language: NL"
><img width="95" height="95" style="width: 0.9895in; height: 0.9895in" id="Afbeelding_x0020_12" src="cid:12345678" /></span></b
><b><span style="font-size: 10pt; font-family: &quot;Verdana&quot;, sans-serif; color: #c4014b; mso-fareast-language: NL"></span></b>
</p>
</td>
<td width="541" valign="top" style="width: 406pt; padding: 0cm 5.4pt 0cm 5.4pt">
<p class="MsoNormal" style="margin-bottom: 12pt">
<b
><span style="font-size: 10pt; font-family: &quot;Verdana&quot;, sans-serif; color: #c4014b; mso-fareast-language: NL"
><br /> </span></b
><b><span style="font-size: 10pt; font-family: Raleway; color: #c4014b; mso-fareast-language: NL">Your Name</span></b
><b
><span style="font-size: 10pt; font-family: &quot;Something&quot;, sans-serif; color: #c4014b; mso-fareast-language: NL">
<br /> </span></b
><b
><span style="font-size: 10pt; font-family: Raleway; color: #0e2432; mso-fareast-language: NL"
>Your Position<br />
Your Company</span
></b
>
</p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal">
<b><span style="font-size: 10pt; font-family: Raleway; color: #0e2432; mso-fareast-language: NL">Tel. 123 456 789</span></b
><span style="font-size: 10pt; font-family: Raleway; color: #0e2432; mso-fareast-language: NL"></span>
</p>
<p class="MsoNormal">
<span style="font-size: 10pt; font-family: &quot;Calibri&quot;, sans-serif; color: black; mso-fareast-language: NL">&nbsp;</span>
</p>
<p class="MsoNormal">
<span style="font-size: 10pt; font-family: Raleway; color: gray; mso-fareast-language: NL">Company -&nbsp;</span
><span style="font-size: 10pt; font-family: &quot;Calibri&quot;, sans-serif"
><a href="http://www.example.com/" title="http://www.example.com/"
><span style="font-family: Raleway; color: gray; mso-fareast-language: NL">example.com</span></a
></span
><span style="font-size: 10pt; font-family: Raleway; color: black; mso-fareast-language: NL"></span>
</p>
<p class="MsoNormal">
<span style="font-size: 10pt; font-family: Raleway; color: gray; mso-fareast-language: NL">Address<br /> </span
><span style="font-size: 8pt; font-family: Raleway; color: gray; mso-fareast-language: NL"><br /> </span
><span style="font-size: 7pt; font-family: Raleway; color: gray; mso-fareast-language: NL"
>Deze e-mail en eventuele bijlagen zijn vertrouwelijk en kunnen onder het wettelijk zwijgrecht vallen.<br />
Indien u niet de geadresseerde bent, is het ten strengste verboden deze e-mail publiek te maken, te reproduceren, te verdelen, of op een
andere manier te verspreiden of te gebruiken.<br />
Indien u dit bericht per vergissing hebt ontvangen, gelieve dan de verzender onmiddellijk op de hoogte te stellen en deze e-mail te
verwijderen.</span
><span style="font-size: 11pt; font-family: &quot;Calibri&quot;, sans-serif"></span>
</p>
<p class="MsoNormal"><span style="font-size: 10pt; font-family: Raleway">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size: 10pt; font-family: Raleway; mso-fareast-language: EN-US">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size: 10pt; font-family: Raleway; mso-fareast-language: EN-US">&nbsp;</span></p>
<div style="border: none; border-top: solid #e1e1e1 1pt; padding: 3pt 0cm 0cm 0cm">
<p class="MsoNormal">
<b><span lang="NL" style="font-size: 11pt; font-family: &quot;Calibri&quot;, sans-serif">Van:</span></b
><span lang="NL" style="font-size: 11pt; font-family: &quot;Calibri&quot;, sans-serif">
<a href="mailto:random@example.com" title="mailto:random@example.com">random@example.com</a> &lt;<a
href="mailto:random@example.com"
title="mailto:random@example.com"
>random@example.com</a
>&gt;
<br />
<b>Verzonden:</b> zaterdag 6 april 2024 20:48<br />
<b>Aan:</b> Jelle Revyn &lt;<a href="mailto:random2@example.com" title="mailto:random2@example.com">random2@example.com</a>&gt;<br />
<b>Onderwerp:</b> Test from <a href="http://unin.me" target="_blank" rel="noopener noreferrer" title="http://unin.me">unin.me</a></span
>
</p>
</div>
<p class="MsoNormal">&nbsp;</p>
<p>Signature for sure isn't filtered... Do I still get in spam box?</p>
</div>
</body>
</html>