Added 5 PORTULAN corpora.
Showing 5 changed files with 75 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "askIT Dataset", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000D-F8BD-7", | ||
"Family": "Computer-mediated communication corpora", | ||
"Description": "This is a corpus of dialogues automatically extracted from subreddits related to the Information Technology domain.\nThe dialogues were extracted with the <a href=\"https://hdl.handle.net/21.11129/0000-000D-F898-0\">Reddit Dataset Extraction Tool</a>.\nThe corpus is available from PORTULAN.", | ||
"Language": ["eng"], | ||
"Licence": "CC BY", | ||
"Size": ["180,000 texts", "61.9 million tokens"], | ||
"Annotation": [""], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000D-F8BD-7" | ||
}, | ||
"Publication":"" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "Brands.Br – a Portuguese Reviews Corpus", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000D-FE57-4", | ||
"Family": "Computer-mediated communication corpora", | ||
"Description": "This is a corpus of product reviews.\nThe subjects of the reviews were semi-automatically classified.\nThe corpus is available from PORTULAN.", | ||
"Language": ["por"], | ||
"Licence": "CC BY-NC-ND", | ||
"Size": ["252 entries"], | ||
"Annotation": [""], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000D-FE57-4" | ||
}, | ||
"Publication":"" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "FEUP Tweets", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000D-F8C1-1", | ||
"Family": "Computer-mediated communication corpora", | ||
"Description": "This is a corpus of tweets.\nThe corpus is available from PORTULAN.", | ||
"Language": ["eng"], | ||
"Licence": "MS NC-NoReD-ND", | ||
"Size": ["338 million texts"], | ||
"Annotation": [""], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000D-F8C1-1" | ||
}, | ||
"Publication":"" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "Georeferenced Tweets", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000D-F8C4-E", | ||
"Family": "Computer-mediated communication corpora", | ||
"Description": "This is a corpus of tweets annotated with geographic coordinates.\nThe corpus is available from PORTULAN.", | ||
"Language": ["eng"], | ||
"Licence": "MS NC-NoReD-ND", | ||
"Size": ["26 million texts"], | ||
"Annotation": [""], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000D-F8C4-E" | ||
}, | ||
"Publication":"" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"Name": "RedditPT Dataset", | ||
"URL": "https://hdl.handle.net/21.11129/0000-000D-F8BC-8", | ||
"Family": "Computer-mediated communication corpora", | ||
"Description": "This corpus collects dialogues extracted from the <a href=\"https://www.reddit.com/r/portugal/\">Portugal subreddit</a>.\nThe extraction was done with the <a href=\"https://hdl.handle.net/21.11129/0000-000D-F898-0\">Reddit Dataset Extraction Tool</a>.\nThe corpus is available from PORTULAN.", | ||
"Language": ["por"], | ||
"Licence": "CC BY", | ||
"Size": ["218,500 dialogues", "58.9 million tokens"], | ||
"Annotation": [""], | ||
"Infrastructure": "CLARIN", | ||
"Access": { | ||
"Download": "https://hdl.handle.net/21.11129/0000-000D-F8BC-8" | ||
}, | ||
"Publication":"" | ||
} |
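
Records of this shape can be sanity-checked before they are committed. The following is a minimal sketch, not part of this commit: the function name `check_record` is hypothetical, and treating the eleven keys seen in these five files as required is an assumption rather than a documented schema. It parses one record, reports any missing keys, and compares the number of well-formed opening `<a href="...">` tags in `Description` against the number of closing `</a>` tags.

```python
import json
import re

# Keys that appear in every record added by this commit.
# Treating them as mandatory is an assumption, not a documented schema.
EXPECTED_KEYS = {
    "Name", "URL", "Family", "Description", "Language", "Licence",
    "Size", "Annotation", "Infrastructure", "Access", "Publication",
}

# A well-formed opening anchor: <a href="..."> with the closing '>' present.
OPEN_ANCHOR = re.compile(r'<a\s+href="[^"]*">')
CLOSE_ANCHOR = re.compile(r"</a>")


def check_record(text):
    """Return a list of human-readable problems found in one JSON record."""
    record = json.loads(text)
    problems = []

    missing = sorted(EXPECTED_KEYS - set(record))
    if missing:
        problems.append("missing keys: " + ", ".join(missing))

    description = record.get("Description", "")
    opens = len(OPEN_ANCHOR.findall(description))
    closes = len(CLOSE_ANCHOR.findall(description))
    if opens != closes:
        problems.append(
            f"unbalanced <a> tags: {opens} opening, {closes} closing"
        )

    return problems
```

Calling `check_record` on the text of each file and failing the commit when the returned list is non-empty would catch a dropped `>` in an anchor tag or a forgotten field. The check is deliberately shallow; it does not validate licence names, language codes, or whether the handle URLs resolve.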