dom-collector transforms the web page at a given URL into key-value organized JSON, according to a rule specification.
Install it with npm:
npm install --save dom-collector
Under the hood, it:
- Validates the rule specification you pass.
- Loads the web page with the well-known request library.
- Parses the page and selects elements with cheerio, a proven DOM selector library that may be a better fit than jsdom.
- Filters the values and fills in any configured default values.
- Places the collected values into a JSON object; repeated elements become a JSON array.
- Returns a thenable Promise that resolves asynchronously.
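As a rough illustration of the first step, the following JavaScript sketch checks a rule against the fields this README documents (url, plus key, value, and type in each selector entry). The function name validateRule and its error messages are hypothetical; the library's own validation may check more, or different, things.

```javascript
// Hypothetical validator for the rule fields documented in this README.
// Not dom-collector's real validation code; its checks may differ.
const KNOWN_TYPES = ['string', 'number', 'boolean', 'array'];

function validateRule(rule) {
  if (!rule || typeof rule !== 'object') {
    return ['rule must be an object'];
  }
  const errors = [];
  if (typeof rule.url !== 'string' || rule.url === '') {
    errors.push('url must be a non-empty string');
  }
  if (!Array.isArray(rule.selector)) {
    errors.push('selector must be an array');
    return errors;
  }
  // Each selector entry needs a key and a DOM selector value;
  // type is optional here but must be a known one if present.
  rule.selector.forEach((entry, i) => {
    if (typeof entry.key !== 'string') errors.push(`selector[${i}].key must be a string`);
    if (typeof entry.value !== 'string') errors.push(`selector[${i}].value must be a string`);
    if (entry.type !== undefined && !KNOWN_TYPES.includes(entry.type)) {
      errors.push(`selector[${i}].type is not one of ${KNOWN_TYPES.join(', ')}`);
    }
  });
  return errors;
}
```

An empty array means the rule passed these checks; otherwise each message names the offending field.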
For this HTML body:
<ul id="content-list">
  <li data-id="1">
    <a href="#"> aaa </a>
  </li>
  <li data-id="2">
    <a href="#"> bbb </a>
  </li>
  <li data-id="3">
    <a href="#"></a>
  </li>
</ul>
add a rule like this:
collector = require 'dom-collector'

rule =
  url: 'https://gist.githubusercontent.com/eces/f8d377992a12f64dc353/raw/75fd1607925e12bb82fdc7890514a3899781531d/test-01.html'
  timeout: 15000
  encoding: 'utf8'
  params: []
  headers:
    'User-Agent': 'Mozilla/5.0(iPad; U; CPU iPhone OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B314 Safari/531.21.10'
  selector: [
    # Parent key: one array entry per element matching '#content-list li'
    {
      key: 'items[]'
      value: '#content-list li'
      type: 'array'
      default: []
    }
    # Children of items[]: resolved within each matched <li>
    {
      key: 'items[].label'
      value: 'a'
      type: 'string'
      filter: 'trim'
      default: 'default'
    }
    {
      key: 'items[].src'
      value: '[data-id]'
      type: 'number'
    }
  ]

# fetch_json returns a thenable Promise
task = collector.fetch_json rule
task.then (result) ->
  console.log result
It then produces this result:
{
  "items": [
    { "label": "aaa", "src": 1 },
    { "label": "bbb", "src": 2 },
    { "label": "default", "src": 3 }
  ]
}
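To trace how that result comes out of the page, here is a standalone JavaScript sketch; it does not call dom-collector. It assumes the raw strings a DOM parser would extract from each <li> (the link text and the data-id attribute) and applies the rule's trim filter, string default, and number cast by hand. buildExampleResult is a hypothetical name.

```javascript
// Assumed raw input: strings extracted from each <li> of the example page.
const rawItems = [
  { label: ' aaa ', src: '1' },
  { label: ' bbb ', src: '2' },
  { label: '', src: '3' }, // the third <a> is empty
];

function buildExampleResult(items) {
  return {
    items: items.map(({ label, src }) => {
      // items[].label: filter 'trim', type string, default 'default'
      const trimmed = String(label).trim();
      const finalLabel = trimmed.length === 0 ? 'default' : trimmed;
      // items[].src: type number, no default configured
      return { label: finalLabel, src: Number(src) };
    }),
  };
}

console.log(JSON.stringify(buildExampleResult(rawItems)));
// {"items":[{"label":"aaa","src":1},{"label":"bbb","src":2},{"label":"default","src":3}]}
```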
From JavaScript, the call is:
require('dom-collector').fetch_json(rule);
Each entry of rule.selector accepts the following fields.
The value field is the DOM selector used to find the values for its key. It accepts querySelector and jQuery-style selectors: where you would write $('#content'), set value to #content.
The key field names the property created in the result JSON. If a key ends with the [] array notation, it becomes a parent key, and every key that starts with parent[] (such as items[].label in the example) becomes a child of that parent. If a parent key matches no elements, its children cannot be resolved, because the parent array is empty.
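The parent/child key notation can be sketched with two hypothetical JavaScript helpers: groupKeys splits selector keys into parents and their children, and buildEntries zips per-child values into one object per parent element. Neither function is part of dom-collector's API.

```javascript
// Hypothetical: "items[]" is a parent; "items[].label" is child "label" of "items".
function groupKeys(keys) {
  const parents = {};
  // First pass: every key ending in "[]" is a parent
  for (const key of keys) {
    if (key.endsWith('[]')) parents[key.slice(0, -2)] = [];
  }
  // Second pass: "parent[].child" keys attach to an existing parent
  for (const key of keys) {
    const m = /^(.+)\[\]\.(.+)$/.exec(key);
    if (m && Object.prototype.hasOwnProperty.call(parents, m[1])) {
      parents[m[1]].push(m[2]);
    }
  }
  return parents;
}

// Hypothetical: one object per parent element; childValues maps each
// child name to its values, indexed by parent element position.
function buildEntries(parentCount, childValues) {
  const entries = [];
  for (let i = 0; i < parentCount; i++) {
    const entry = {};
    for (const [child, values] of Object.entries(childValues)) {
      entry[child] = values[i];
    }
    entries.push(entry);
  }
  return entries;
}
```

With zero parent elements, buildEntries returns an empty array regardless of the child values, which mirrors why children cannot be resolved when the parent matches nothing.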
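The type field sets the type the collected value is cast to: string, number, or boolean. A parent key such as items[] uses array, as in the example above.
Please note that the default value is used if type casting fails.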
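A minimal sketch of casting with a fallback, assuming a cast "fails" when a number comes out non-finite; the library's exact failure rules live in its source. castValue is a hypothetical name.

```javascript
// Hypothetical cast-with-fallback; only numbers can "fail" in this sketch.
function castValue(raw, type, fallback) {
  switch (type) {
    case 'string':
      return String(raw);
    case 'number': {
      const n = Number(raw);
      // NaN or +/-Infinity counts as a failed cast here
      return Number.isFinite(n) ? n : fallback;
    }
    case 'boolean':
      // Plain JavaScript truthiness: 'false' becomes true
      return Boolean(raw);
    default:
      throw new Error(`unsupported type: ${type}`);
  }
}
```

Note that Number('') is 0, which is finite, so under this sketch an empty string casts to 0 rather than to the fallback.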
The default field supplies the value used when no element is found, and also:
- when type is string and the string length is zero;
- when type is number and isFinite returns false for the value, as for NaN, Infinity, or undefined.
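The default conditions above translate into a small check. applyDefault and its found flag are hypothetical; the sketch uses the global isFinite that the text mentions, which returns false for NaN, Infinity, and undefined.

```javascript
// Hypothetical: found says whether the selector matched any element;
// value is the already-cast value.
function applyDefault(found, value, type, defaultValue) {
  if (!found) return defaultValue;
  if (type === 'string' && String(value).length === 0) return defaultValue;
  if (type === 'number' && !isFinite(value)) return defaultValue;
  return value;
}
```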
The match field holds a regular expression; it is evaluated and the first matched value is returned. For example, 100 can be extracted from <li onclick="contentView(100, 3);"></li> with this matcher:
match: "contentView\\(([0-9]+)\\,"
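Assuming "the first matched value" means the first capture group, the match step can be sketched in JavaScript as below; matchFirst is a hypothetical name. The doubled backslashes are needed because the pattern is written as a string literal, just as in the rule above.

```javascript
// Hypothetical: return capture group 1 if the pattern has one, else the whole match.
function matchFirst(text, pattern) {
  const m = new RegExp(pattern).exec(text);
  if (m === null) return undefined;
  return m.length > 1 ? m[1] : m[0];
}

const onclick = 'contentView(100, 3);';
console.log(matchFirst(onclick, 'contentView\\(([0-9]+)\\,')); // prints 100
```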
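The filter field names a built-in filter (such as 'trim' in the example above) or supplies a function. The built-in filters are defined in eces/dom-collector/src/filter.coffee and perform conversions such as:
- 70.5M to 70500
- 1,000,000 to 1000000
- "\r\n hello. " to "hello."
- value to String(value)
- value to Number(value)
- value to Boolean(value)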
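The conversions in that list can be approximated in plain JavaScript as below. This is a sketch only: the property names are invented here, since the real filter names live in src/filter.coffee, and "M" is read as a factor of 1000 because that is what 70.5M to 70500 implies.

```javascript
// Hypothetical re-implementations; names are NOT dom-collector's filter names.
const sketchFilters = {
  // "70.5M" -> 70500 (M read as thousands, per the list above)
  thousandsSuffix: (v) => {
    const m = /^\s*([0-9]+(?:\.[0-9]+)?)\s*M\s*$/.exec(String(v));
    return m ? Number(m[1]) * 1000 : NaN;
  },
  // "1,000,000" -> 1000000
  stripCommas: (v) => Number(String(v).replace(/,/g, '')),
  // "\r\n hello. " -> "hello."
  trim: (v) => String(v).trim(),
  asString: (v) => String(v),
  asNumber: (v) => Number(v),
  asBoolean: (v) => Boolean(v),
};
```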
When filter is a function, the value is transformed directly by that function, which must handle any value, including null and undefined:
filter: (v) -> '(' + String(v).trim() + ')'
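In JavaScript the same filter function looks like this. Because String(v) turns null and undefined into the strings "null" and "undefined", the function never throws, but it does produce text such as (null). The wrapOrEmpty variant is one hypothetical way to treat missing values as empty instead.

```javascript
// JavaScript equivalent of the CoffeeScript filter above.
const wrapInParens = (v) => '(' + String(v).trim() + ')';

console.log(wrapInParens('  aaa '));  // (aaa)
console.log(wrapInParens(null));      // (null)
console.log(wrapInParens(undefined)); // (undefined)

// Hypothetical variant: null and undefined become an empty string.
const wrapOrEmpty = (v) => (v == null ? '' : '(' + String(v).trim() + ')');
```

Returning an empty string from a function filter matters when type is string, since an empty string is one of the conditions that triggers the default value.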
Be aware of unintended boolean conversions. From MDN's Boolean reference:
The value passed as the first parameter is converted to a boolean value, if necessary. If value is omitted or is 0, -0, null, false, NaN, undefined, or the empty string (""), the object has an initial value of false. All other values, including any object or the string "false", create an object with an initial value of true.
Do not confuse the primitive Boolean values true and false with the true and false values of the Boolean object.
Any object whose value is not undefined or null, including a Boolean object whose value is false, evaluates to true when passed to a conditional statement.
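The pitfall is easy to reproduce in JavaScript: strings scraped from a page, such as "false" or "0", are non-empty and therefore convert to true. If a page encodes booleans as text, a function filter with explicit parsing is one option; parseBooleanText below is a hypothetical helper, not a dom-collector built-in.

```javascript
// Boolean() follows JavaScript truthiness; only '' is false among strings.
console.log(Boolean('false'));            // true
console.log(Boolean('0'));                // true
console.log(Boolean(''));                 // false
console.log(Boolean(0));                  // false
console.log(Boolean(new Boolean(false))); // true (an object is always truthy)

// Hypothetical stricter parser for scraped text.
function parseBooleanText(v) {
  const s = String(v).trim().toLowerCase();
  if (s === 'true' || s === '1' || s === 'yes') return true;
  if (s === 'false' || s === '0' || s === 'no' || s === '') return false;
  return undefined; // unrecognized text: let the caller decide
}
```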
Build and run the tests with Grunt:
grunt build
grunt test
Contributions are welcome.
Released under the MIT License.