Merge branch 'master' of https://github.com/cdimascio/unfluffer-j
Carmine DiMascio committed Dec 5, 2018
2 parents 0c42186 + fe1e7d2 commit a1362a4
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions README.md
@@ -6,13 +6,12 @@ An automatic web page content extractor for _Kotlin_ and _Java_.

Given an HTML document, **essence** automatically extracts the main text content (and much more).

-[Try out the demo](https://essence.mybluemix.net/index.html) - _a simple webapp to demonstrate essence_
+[Try out the demo](https://essence.mybluemix.net) - _a simple webapp to demonstrate essence_

<p align="center">
-<img src="https://raw.githubusercontent.com/cdimascio/essence/master/assets/essence.png" width="450px"/>
+<img src="https://raw.githubusercontent.com/cdimascio/essence/master/assets/essence.png" width="400px"/>
</p>


_This library is heavily influenced by [node-unfluff](https://github.com/ageitgey/node-unfluff) and its [lineage](#credits)_

## Usage
@@ -31,7 +30,6 @@ System.out.println(data.getText());
```Kotlin
val data = Essence.extract(html)
println(data.text)
-// ...
```

See [Extracted data elements](#extracted-data-elements) for additional extracted metadata.
@@ -44,23 +42,25 @@ See [Extracted data elements](#extracted-data-elements) for additional extracted
<dependency>
<groupId>io.github.cdimascio</groupId>
<artifactId>essence</artifactId>
-<version>0.10.11</version>
+<version>0.12.6</version>
<type>pom</type>
</dependency>
```

**Gradle**

```groovy
-compile 'io.github.cdimascio:essence:0.10.11'
+compile 'io.github.cdimascio:essence:0.12.6'
```

## Try the Essence web demo

-[Essence web](https://essence.mybluemix.net/index.html) is a simple web page that fetches content at a given url and passes the HTML to this essence library.
+[Essence web](https://essence.mybluemix.net) is a simple web page that fetches content at a given url and passes the HTML to this essence library.

![](https://raw.githubusercontent.com/cdimascio/essence/master/assets/example.png)

+The essence web project lives [here](https://github.com/cdimascio/essence-web)
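
For context, the flow the README describes for the demo (fetch the content at a URL, then hand the HTML to essence) might look roughly like the Java sketch below. This is not the essence-web source: the class `FetchAndExtract` and its methods `httpGet` and `run` are hypothetical names, and the extractor is passed in as a function so the snippet compiles without the essence dependency on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.function.Function;

// Hypothetical helper, not part of essence or essence-web.
class FetchAndExtract {

    // Downloads the page at `url` and returns the response body as a String.
    // Requires Java 11+ (java.net.http) and network access.
    static String httpGet(String url) {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while fetching " + url, e);
        }
    }

    // Fetches the HTML for `url` with `fetcher`, then applies `extractor` to it.
    // Both steps are injected so either can be swapped out (or stubbed in tests).
    static String run(String url,
                      Function<String, String> fetcher,
                      Function<String, String> extractor) {
        String html = fetcher.apply(url);
        return extractor.apply(html);
    }
}
```

With essence on the classpath, and assuming the Java API mirrors the usage shown earlier in the README (`Essence.extract(html)` returning an object with `getText()`), a call could look like `FetchAndExtract.run(url, FetchAndExtract::httpGet, html -> Essence.extract(html).getText())`.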

## Extracted data elements

**essence** attempts to extract the following content: