Links that are not working correctly. [post them here] #3

Open
bndr opened this issue Apr 27, 2014 · 10 comments

@bndr
Owner

bndr commented Apr 27, 2014

If you find any links that node-read cannot correctly parse, please post them here.

@bndr bndr added the bug label Apr 27, 2014
@scheeser
Contributor

http://bits.blogs.nytimes.com/2014/04/26/writing-in-a-nonstop-world

(node) warning: possible EventEmitter memory leak detected. 11 listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at Request.EventEmitter.addListener (events.js:160:15)
    at Request.self._buildRequest (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:366:10)
    at Request.init (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:503:10)
    at Request.onResponse (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:899:10)
    at ClientRequest.g (events.js:180:16)
    at ClientRequest.EventEmitter.emit (events.js:95:17)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (http.js:1688:21)
    at HTTPParser.parserOnHeadersComplete [as onHeadersComplete] (http.js:121:23)
    at Socket.socketOnData [as ondata] (http.js:1583:20)
    at TCP.onread (net.js:527:27)
(node) warning: possible EventEmitter memory leak detected. 11 listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at Request.EventEmitter.addListener (events.js:160:15)
    at Request.start (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:700:8)
    at Request.end (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:1319:28)
    at ~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:418:14
    at process._tickCallback (node.js:415:13)

~/Desktop/temp2/node_modules/node-read/index.js:78
      parseDOM(buffer.toString("utf8"), res);
                      ^
TypeError: Cannot call method 'toString' of undefined
    at Request._callback (~/Desktop/temp2/node_modules/node-read/index.js:78:23)
    at self.callback (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:121:22)
    at Request.EventEmitter.emit (events.js:95:17)
    at Request.onResponse (~/Desktop/temp2/node_modules/node-read/node_modules/request/request.js:857:12)
    at ClientRequest.g (events.js:180:16)
    at ClientRequest.EventEmitter.emit (events.js:95:17)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (http.js:1688:21)
    at HTTPParser.parserOnHeadersComplete [as onHeadersComplete] (http.js:121:23)
    at Socket.socketOnData [as ondata] (http.js:1583:20)
    at TCP.onread (net.js:527:27)

@bndr
Owner Author

bndr commented Apr 28, 2014

There's a bug between nytimes and the request library:
request/request#311 (comment)
request/request#673
request/request#865

Not much I can do here.
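
For anyone hitting the same TypeError: the trace shows `buffer` was undefined when the callback at index.js:78 ran, presumably because the request itself failed. A minimal sketch of a guard, assuming a request-style callback signature `(err, res, buffer)`. The name `safeParse` and the way `parseDOM` is passed in are hypothetical, not node-read's actual code:

```javascript
// Hypothetical guard for a request-style callback (err, res, buffer).
// parseDOM is injected as a parameter; its (html, res) call shape is
// assumed from the stack trace above, not taken from node-read's source.
function safeParse(err, res, buffer, parseDOM, callback) {
  if (err) {
    // Surface transport errors instead of touching the body.
    return callback(err);
  }
  if (buffer === undefined || buffer === null) {
    // This is the case that crashed with "Cannot call method 'toString' of undefined".
    return callback(new Error("Empty response body"));
  }
  // Works for both Buffer and string bodies.
  return callback(null, parseDOM(buffer.toString("utf8"), res));
}
```

This only turns the crash into an error passed to the caller; it does not fix the underlying request problem.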

@simonccarter

Trailing whitespace in the title from:
http://www.howiechong.com/journal/2014/2/bike-helmets

@bndr
Owner Author

bndr commented May 15, 2014

Thanks! Fixed that.
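
The actual change isn't shown in this thread. As a rough sketch of one way to normalize a title, assuming the title arrives as a plain string (`cleanTitle` is a hypothetical helper name):

```javascript
// Hypothetical helper: collapse runs of whitespace (including newlines)
// into single spaces, then strip leading and trailing whitespace.
function cleanTitle(title) {
  return String(title).replace(/\s+/g, " ").trim();
}
```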

@midknight41

First of all, thanks for doing this! The jsdom one was way too slow for my purposes and I've become increasingly frustrated by it. The performance improvements are off the charts!

A couple minor issues popped up when I migrated to your version:

The h1 tag is coming back inside a p tag.
http://www.bbc.co.uk/sport/0/football/26060048

Also, in node-readability I believe these entity codes (& #8217; & #039; & quot; & #8220; & #8221;) were translated automatically. Up to you whether you want to mimic that behaviour or not.
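
To illustrate what I mean by translating the codes, here is a minimal sketch of an entity decoder. It is not how node-readability does it; `decodeEntities` and the small `NAMED` table are hypothetical, and a real decoder would need the full HTML named-entity list:

```javascript
// Hypothetical lookup covering only a handful of named entities.
const NAMED = { quot: '"', amp: "&", apos: "'", lt: "<", gt: ">" };

// Decode numeric (decimal and hex) entities plus the few named ones above.
// Anything unrecognized is left untouched.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // String.fromCodePoint throws above U+10FFFF, so keep the original text there.
      if (Number.isNaN(code) || code > 0x10ffff) return match;
      return String.fromCodePoint(code);
    }
    return Object.prototype.hasOwnProperty.call(NAMED, body) ? NAMED[body] : match;
  });
}
```

Leaving unknown entities as-is keeps the output lossless for anything the table does not cover.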

@bndr
Owner Author

bndr commented Aug 31, 2014

Thanks! I'll look into it.

@midknight41

Another issue, I'm afraid...

http://www.bbc.co.uk/sport/0/football/29053651

The first sentence in the article is not included in the result.

@kannanth

kannanth commented Jun 5, 2015

Hello,
When I try the code with the following article,
http://www.nbcnews.com/tech/mobile/facebook-lite-app-launched-tap-emerging-markets-n370481

I get this

has launched a stripped-down version of its app aimed at the huge potential user base in emerging markets.

The social media giant that it was testing the Android app in January, and was now rolling it out across countries in Asia, followed by parts of Latin America, Africa and Europe.

(and more...)

Basically, it's cleaning out anchor tags within the main article, along with the text inside them.
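
What I expected is that the link text stays even if the tag is dropped. A rough sketch of that behaviour, assuming the content is handled as an HTML string; `unwrapAnchors` is a hypothetical name, and a regex like this is fragile compared to doing it on the parsed DOM:

```javascript
// Hypothetical: replace each <a ...>text</a> with its inner content so the
// link wording survives when the anchor tag itself is removed.
function unwrapAnchors(html) {
  return html.replace(/<a\b[^>]*>([\s\S]*?)<\/a>/gi, "$1");
}
```

For example, `unwrapAnchors('<p><a href="https://example.com">Example Corp</a> has launched</p>')` gives `<p>Example Corp has launched</p>` (the sample markup is made up for illustration).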

@rodrigocprates

A simple anchor tag is not showing its link when read through article.content:

Ministério das Comunicações

@neurosurgeonX

http://tech.sina.com.cn/t/2016-04-20/doc-ifxrpvcy4243221.shtml

Error when handling this link.


7 participants