In March 2013, Google announced that Google Reader was to be shut down. I used Google Reader every day, so I set out to find a replacement. I started with other online offerings, but then I thought "I could build one". So I created BirdReader, which I have released to the world in its unpolished "alpha" state.
BirdReader is designed to be installed on your own web server or laptop running Node.js, e.g.:
- on an old PC
- on a cloud server, e.g. an AWS Micro instance (free!)
- on a Raspberry Pi
BirdReader's features include:
- import your old Google Reader subscriptions
- fetches RSS every 5 minutes
- web-based aggregated newsfeed
- mark articles as read
- delete articles without reading
- 'star' articles
- add a new feed
- sorted in newest-first order
- Bootstrap-based, responsive layout
- tagging/untagging of feeds
- Twitter/Facebook sharing
- basic HTTP authentication (optional)
- filter read/unread/starred streams by tag
- filter read/unread/starred streams by feed
- full-text search (only works when using Cloudant as the CouchDB storage engine)
- icons for feeds and articles
- expand all
- browse mode - go through unread articles one-by-one, full screen
- live stats via WebSockets (NEW!)
As of July 2013, the web client also makes a WebSockets connection back to the server so that when new articles are added to the database, the numbers of read, unread and starred articles can be 'pushed' to the client, without it having to poll. This also offers other advantages:
- when fetching an article or list of articles, we no longer have to also fetch the article counts, making fetches faster
- article counts arrive at the client asynchronously
- article counts are always up to date
- the URL scheme has changed to a 'hash-bang' scheme, so that all page updates are made via Ajax, preventing frequent disconnection of the WebSocket and reducing network traffic
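As an illustration, here is a minimal sketch of how such a push might look with socket.io's classic API; the 'stats' event name and the shape of the counts object are assumptions, not necessarily what BirdReader uses. On the server:

var io = require('socket.io').listen(server);

// push the latest counts to every connected browser
function broadcastCounts(counts) {
  io.sockets.emit('stats', counts); // e.g. { read: 900, unread: 12, starred: 3 }
}

and on the web client:

var socket = io.connect();
socket.on('stats', function (counts) {
  // update the read/unread/starred badges without polling
});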
N.B. if you have a previous installation of BirdReader, you will have to run 'npm install' to pick up the socket.io package.
BirdReader doesn't store anything locally other than its source code and your configuration. The data is stored in a Cloudant (CouchDB) database in the cloud. You will need to sign up for a free Cloudant account (disclaimer: other hosted CouchDB services are available, and this code should work with any CouchDB server e.g. your own).
Two databases are used:
The 'feeds' database stores a document per RSS feed you are subscribed to, e.g.:
{
"_id": "f1cf38b2f6ffbbb69e75df476310b3a6",
"_rev": "8-6ad06e42183368bd696aec8d25eb03a1",
"text": "The GitHub Blog",
"title": "The GitHub Blog",
"type": "rss",
"xmlUrl": "http://feeds.feedburner.com/github",
"htmlUrl": "http://pipes.yahoo.com/pipes/pipe.info?_id=13d93fdc3d1fb71d8baa50a1e8b50381",
"tags": ["OpenSource"],
"lastModified": "2013-03-14 15:06:03 +00:00",
"icon": "http://www.bbc.co.uk/favicon.ico"
}
This data is directly imported from the Google Takeout OPML file and crucially stores:
- the URL which contains the feed data (xmlUrl)
- the last modification date of the newest article on that feed (lastModified)
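For reference, each subscription in the Google Takeout OPML export is an outline element whose attributes map directly onto the fields above; the values here are illustrative only:

<outline text="Example Blog" title="Example Blog" type="rss"
         xmlUrl="http://example.com/feed.xml" htmlUrl="http://example.com/" />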
The 'articles' database stores a document per article, e.g.:
{
"_id": "3c582426df29863513500a736111fa4e",
"_rev": "1-b49944fd0edf8f50fc17c6562d75169e",
"feedName": "BBC Entertainment",
"tags": ["BBC"],
"title": "VIDEO: Iran planning to sue over Argo",
"description": "Best Picture winner Argo has been criticised by the Iranian authorities over its portrayal of the 1979 Iran hostage crisis.",
"pubDate": "2013-03-15T15:20:31.000Z",
"link": "http://www.bbc.co.uk/news/entertainment-arts-21805140#sa-ns_mchannel=rss&ns_source=PublicRSS20-sa",
"pubDateTS": 1363360831,
"read": false,
"starred": false,
"icon": "http://www.bbc.co.uk/favicon.ico"
}
The _id and _rev fields are generated by CouchDB. The feedName and tags come from the feed where the article originated. The rest of the fields come from the RSS article itself, apart from 'read' and 'starred', which we add to record whether an article has been consumed or favourited.
Cloudant/CouchDB only allows data to be retrieved by its "_id" unless you define a "view". We have one view on the articles database called "byts" that allows us to query our data:
- unread articles, sorted by timestamp
- read articles, sorted by timestamp
- starred articles, sorted by timestamp
- counts of the number of read/unread/starred articles
The view defines a "map" function which emits keys like
["string", 12345]
where "string" can be "unread", "read" or "starred", and "12345" is the timestamp of the article.
Another view "bytag" has a different key:
["string", "tag", 12345]
where "string" can be "unread", "read" or "starred", the "tag" is the user supplied tag and "12345" is the timestamp of the article. This allows us to get unread articles tagged by "BBC" in newest first order, for example.
BirdReader supports full-text search of articles by utilising Cloudant's full-text capability. A Lucene index is created to allow the articles' titles and descriptions to be searchable. A simple form on the top bar allows the user to search the collected articles with ease. N.B. if you are using a non-Cloudant backend (e.g. plain CouchDB), then the search facility will not work.
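Cloudant search indexes are defined as a JavaScript indexing function inside a design document; a hedged sketch (the design document and index names here are assumptions) might look like:

{
  "_id": "_design/search",
  "indexes": {
    "articles": {
      "index": "function (doc) { index('title', doc.title); index('description', doc.description); }"
    }
  }
}

A query then goes to Cloudant's _search endpoint, e.g. /articles/_design/search/_search/articles?q=argo.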
Every so often, BirdReader fetches all the feeds using the feedparser module. Any articles newer than the feed's previous newest article are saved to the articles database.
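A rough sketch of that fetch-and-filter step, using feedparser's stream interface (the exact API depends on the feedparser version, and saveArticle is a hypothetical helper):

var http = require('http');
var FeedParser = require('feedparser');

http.get(feed.xmlUrl, function (res) {
  var parser = new FeedParser();
  res.pipe(parser);
  parser.on('error', function (err) { console.error(err); });
  parser.on('readable', function () {
    var article;
    while ((article = this.read())) {
      // only keep articles newer than the feed's previous newest article
      if (new Date(article.pubDate) > new Date(feed.lastModified)) {
        saveArticle(feed, article); // hypothetical helper that writes to the 'articles' database
      }
    }
  });
});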
New feeds can be added by filling in a web form with the URL of a page that has an RSS link tag. We use the extractor library to pull back the page, find the title, meta description and link tags, and add the data to our feeds database.
The site is built with Bootstrap so that it provides a decent interface on desktop and mobile browsers. BirdReader is built on the following technologies:
- Node.js - Server side Javascript
- Express - Application framework for node
- feedparser - RSS feed parser
- Cloudant - Hosted CouchDB
- async - Control your parallelism in Node.js
- Bootstrap - Twitter responsive HTML framework
- sax - XML parser for Node.js
- extractor - HTML scraper, to find RSS links in HTML pages
- socket.io - WebSockets library
You will need Node.js and npm installed on your computer (version 0.8.x or version 0.10.x). Clone the BirdReader repository and install its dependencies, e.g.:
git clone git@github.com:glynnbird/birdreader.git
cd birdreader
npm install
N.B. on a Mac, you're likely to need the [development tools](https://developer.apple.com/xcode/) installed.
Copy the sample configuration into place:
cd includes
cp _config.json config.json
Edit the sample configuration to point to your CouchDB server.
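The exact field names are defined by the sample _config.json, but conceptually the "cloudant" section just tells BirdReader where your CouchDB/Cloudant server lives; something along these lines, with illustrative (hypothetical) key and value:

{
  "cloudant": {
    "url": "https://myusername:mypassword@myusername.cloudant.com"
  }
}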
Run BirdReader with:
node birdreader.js
See the website by pointing your browser to port 3000:
http://localhost:3000/
You can export your Google Reader subscriptions using Google Takeout. Download the file, unpack it and locate the subscriptions.xml file.
You can import this into BirdReader with:
node import_opml.js subscriptions.xml
BirdReader allows you to protect your web server with a username and password by adding an "authentication" section to your includes/config.json:
"cloudant": {
.
.
},
"authentication": {
"on": true,
"username": "myusername",
"password": "mypassword"
}
}
Authentication will only be enforced if "authentication.on" is set to true. A restart of BirdReader is required to pick up the config change.
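A minimal sketch of how such a check might be wired up with Express 3's basicAuth middleware (illustrative only; BirdReader's own wiring may differ):

// enable HTTP basic auth only when the config asks for it
if (config.authentication && config.authentication.on) {
  app.use(express.basicAuth(config.authentication.username,
                            config.authentication.password));
}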
If you don't want to keep articles older than x days, then you can add the following to your config:
"purgeArticles": {
"on": true,
"purgeBefore": 15
}
The above will instruct BirdReader to purge articles older than 15 days every 24 hours.
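A hedged sketch of that purge loop (purgeArticlesOlderThan is a hypothetical helper that deletes the matching article documents):

if (config.purgeArticles && config.purgeArticles.on) {
  setInterval(function () {
    // articles whose pubDateTS (in seconds) is older than the cutoff get purged
    var cutoff = Math.floor(Date.now() / 1000) -
                 config.purgeArticles.purgeBefore * 24 * 60 * 60;
    purgeArticlesOlderThan(cutoff); // hypothetical helper
  }, 24 * 60 * 60 * 1000);
}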
BirdReader has been tested on a Mac, Amazon EC2 and Raspberry Pi. Benchmarks here.
When a feed is added to BirdReader, we attempt to get an "icon" for the feed based on the favicon of the blog. This is stored in the feeds database and every article fetched subsequently inherits the feed's icon.
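One simple way to make that guess, roughly what you might expect to happen under the hood (the real code may also look for a link rel="icon" tag in the page):

var url = require('url');

// derive a favicon URL from the blog's home page URL
function guessIcon(htmlUrl) {
  var parts = url.parse(htmlUrl);
  return parts.protocol + '//' + parts.host + '/favicon.ico';
}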
This feature was added after launch. To retro-fit icons to your existing feeds, run:
node retrofit_favicons.js
There are some tests in the 'test' directory. To run them you'll need Mocha:
sudo npm install -g mocha
and then run the tests:
mocha