Merge pull request #3268 from omnivore-app/fix/substack
remove hidden labels from substack post by readability
sywhb authored Dec 20, 2023
2 parents f47c83d + 9922e0b commit 78307b7
Showing 7 changed files with 3,172 additions and 337 deletions.
2 changes: 1 addition & 1 deletion packages/readabilityjs/Readability.js
@@ -202,7 +202,7 @@ Readability.prototype = {
// Readability-readerable.js. Please keep both copies in sync.
articleNegativeLookBehindCandidates: /breadcrumbs|breadcrumb|utils|trilist|_header/i,
articleNegativeLookAheadCandidates: /outstream(.?)_|sub(.?)_|m_|omeda-promo-|in-article-advert|block-ad-.*|tl_/i,
- unlikelyCandidates: /\bad\b|ai2html|banner|breadcrumbs|breadcrumb|combx|comment|community|cover-wrap|disqus|extra|footer|gdpr|header|legends|menu|related|remark|replies|rss|shoutbox|sidebar|skyscraper|social|sponsor|supplemental|ad-break|agegate|pagination|pager(?!ow)|popup|yom-remote|copyright|keywords|outline|infinite-list|beta|recirculation|site-index|hide-for-print|post-end-share-cta|post-end-cta-full|post-footer|post-head|post-tag|li-date|main-navigation|programtic-ads|outstream_article|hfeed|comment-holder|back-to-top|show-up-next|onward-journey|topic-tracker|list-nav|block-ad-entity|adSpecs|gift-article-button|modal-title|in-story-masthead|share-tools|standard-dock|expanded-dock|margins-h|subscribe-dialog|icon|bumped|dvz-social-media-buttons|post-toc|mobile-menu|mobile-navbar|tl_article_header|mvp(-post)*-(add-story|soc(-mob)*-wrap)|w-condition-invisible|rich-text-block main w-richtext|rich-text-block_ataglance at-a-glance test w-richtext|PostsPage-commentsSection/i,
+ unlikelyCandidates: /\bad\b|ai2html|banner|breadcrumbs|breadcrumb|combx|comment|community|cover-wrap|disqus|extra|footer|gdpr|header|legends|menu|related|remark|replies|rss|shoutbox|sidebar|skyscraper|social|sponsor|supplemental|ad-break|agegate|pagination|pager(?!ow)|popup|yom-remote|copyright|keywords|outline|infinite-list|beta|recirculation|site-index|hide-for-print|post-end-share-cta|post-end-cta-full|post-footer|post-head|post-tag|li-date|main-navigation|programtic-ads|outstream_article|hfeed|comment-holder|back-to-top|show-up-next|onward-journey|topic-tracker|list-nav|block-ad-entity|adSpecs|gift-article-button|modal-title|in-story-masthead|share-tools|standard-dock|expanded-dock|margins-h|subscribe-dialog|icon|bumped|dvz-social-media-buttons|post-toc|mobile-menu|mobile-navbar|tl_article_header|mvp(-post)*-(add-story|soc(-mob)*-wrap)|w-condition-invisible|rich-text-block main w-richtext|rich-text-block_ataglance at-a-glance test w-richtext|PostsPage-commentsSection|hide-text/i,
// okMaybeItsACandidate: /and|article(?!-breadcrumb)|body|column|content|main|shadow|post-header/i,
get okMaybeItsACandidate() {
return new RegExp(`and|(?<!${this.articleNegativeLookAheadCandidates.source})article(?!-(${this.articleNegativeLookBehindCandidates.source}))|body|column|content|^(?!main-navigation|main-header)main|shadow|post-header|hfeed site|blog-posts hfeed|container-banners|menu-opacity|header-with-anchor-widget|commentOnSelection`, 'i')
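The only functional change in this hunk is the extra `hide-text` alternative at the end of `unlikelyCandidates`. A minimal sketch of its likely effect follows, assuming the fork keeps the usual Readability-style check in which a node is discarded when its combined class and id string matches `unlikelyCandidates` and is not rescued by `okMaybeItsACandidate`. The shortened patterns and the `isUnlikelyCandidate` helper are illustrative stand-ins, not code from the repository.

```javascript
// Shortened stand-ins for the patterns in the diff; the real lists are much longer.
const unlikelyBefore = /comment|sidebar|PostsPage-commentsSection/i;
const unlikelyAfter = /comment|sidebar|PostsPage-commentsSection|hide-text/i;
const okMaybeItsACandidate = /and|article|body|column|content|main|shadow/i;

// Hypothetical helper: a node is treated as an unlikely candidate when its
// "className id" string matches the unlikely pattern and nothing rescues it.
function isUnlikelyCandidate(className, id, unlikely) {
  const matchString = `${className} ${id}`;
  return unlikely.test(matchString) && !okMaybeItsACandidate.test(matchString);
}

// Before the change, an element with class "hide-text" is kept.
console.log(isUnlikelyCandidate("hide-text", "", unlikelyBefore)); // false
// After the change, the same element is flagged for removal.
console.log(isUnlikelyCandidate("hide-text", "", unlikelyAfter)); // true
// A class that also matches the rescue pattern would still be kept.
console.log(isUnlikelyCandidate("hide-text content", "", unlikelyAfter)); // false
```

Because the pattern is unanchored, any class or id string containing `hide-text` would match under this assumption, not only an exact `hide-text` class.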
678 changes: 342 additions & 336 deletions packages/readabilityjs/test/index.html

Large diffs are not rendered by default.

113 changes: 113 additions & 0 deletions packages/readabilityjs/test/test-pages/substack-new/distiller.html

Large diffs are not rendered by default.

@@ -0,0 +1,12 @@
{
"title": "Highlights from James Bennet's 17,000 word attack on the New York Times",
"byline": "Ann Coulter",
"dir": null,
"excerpt": "I've just saved you 6 hours.",
"siteName": "Unsafe",
"siteIcon": "https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F90209576-d3de-4442-b39b-aabda8670064%2Ffavicon.ico",
"previewImage": "https://substackcdn.com/image/fetch/f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Fanncoulter.substack.com%2Fapi%2Fv1%2Fpost_preview%2F139824544%2Ftwitter.jpg%3Fversion%3D3",
"publishedDate": "2023-12-16T14:22:08.000Z",
"language": "English",
"readerable": true
}
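A fixture like this is typically compared field by field against what the parser extracts from the test page. The sketch below shows one way such a comparison could look; `metadataMismatches` is a hypothetical helper, not the repository's test harness, and the "extracted" object is made up to illustrate a failing field.

```javascript
// Hypothetical helper, not part of the repository: returns the keys whose
// values differ between an expected-metadata fixture and an extracted result.
function metadataMismatches(expected, actual) {
  return Object.keys(expected).filter(
    (key) => JSON.stringify(expected[key]) !== JSON.stringify(actual[key])
  );
}

// A trimmed copy of the fixture above.
const expectedMetadata = {
  byline: "Ann Coulter",
  dir: null,
  siteName: "Unsafe",
  language: "English",
  readerable: true,
};

// Made-up extraction result with one wrong field, for illustration only.
const extractedMetadata = { ...expectedMetadata, siteName: "Unsafe Subscribe" };

console.log(metadataMismatches(expectedMetadata, expectedMetadata)); // []
console.log(metadataMismatches(expectedMetadata, extractedMetadata)); // [ 'siteName' ]
```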
377 changes: 377 additions & 0 deletions packages/readabilityjs/test/test-pages/substack-new/expected.html

Large diffs are not rendered by default.
