+metadata
ruoxining committed Jun 11, 2024
1 parent 67fe162 commit 12ce906
Showing 2 changed files with 59 additions and 42 deletions.
51 changes: 30 additions & 21 deletions README.md
@@ -29,33 +29,42 @@
# 📝 Dataset
## Data Description

Each data point in our dataset is represented as a dictionary with the following keys:
The dataset comprises two parts, the txt book and the json QA tuples. Each json file has a corresponding txt file with the same filename (as the novel title).
Each json file comprises a `list` of `dict`s, where a `dict` has a basic structure as follows.
```
[
{
"Question": The input question,
"Options": [
Option A,
Option B,
Option C,
Option D
],
"Complex": "mh",
"Aspect": "times"
}
"QID": a question ID that stays unchanged so updates can be tracked (updates happen only if necessary),
"Aspect": the question classification in 'aspect', e.g., "times",
"Complexity": the question classification in complexity, e.g., "mh",
"Question": the input question,
"Options": {
"A": Option A,
"B": Option B,
"C": Option C (not applicable to some yes/no questions),
"D": Option D (not applicable to some yes/no questions)
},
},
...
]
```
Here is an example of a data point:
Here is an example of a real data point, selected from the demonstration file `Frankenstein`.
```json
[
{
"Question": "How many times has Robert written letters to his sister?",
"Options": [
"11",
"9",
"12",
"10"
],
"QID": "Q0148",
"Aspect": "times",
"Complex": "mh",
"Aspect": "times"
}
"Question": "How many times has Robert written letters to his sister?",
"Options": {
"A": "11",
"B": "9",
"C": "12",
"D": "10"
},
},
...
]
```
Currently we are open-sourcing only the fields above; the `Evidences` field is withheld to avoid answer leakage. Anyone who needs the `Evidences` field for analysis can contact us (see [📮 Contact](#-contact)) to obtain it.
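
As a rough illustration of the format described above, the following Python sketch loads one novel together with its questions and renders a data point as a multiple-choice prompt. The directory layout and the function names (`load_book_and_questions`, `get_complexity`, `format_prompt`) are assumptions made for this example and are not part of the released code. The sketch also assumes that a missing option is either absent from `Options` or set to `null`, and it accepts both `Complexity` and `Complex` as the complexity key because the structure and the example above use different names.

```python
import json
from pathlib import Path


def load_book_and_questions(data_dir, title):
    """Load one novel's text and its QA list.

    Assumes (hypothetically) that <title>.txt and <title>.json sit side by
    side in data_dir, following the "same filename" rule described above.
    """
    data_dir = Path(data_dir)
    book_text = (data_dir / f"{title}.txt").read_text(encoding="utf-8")
    with open(data_dir / f"{title}.json", encoding="utf-8") as f:
        questions = json.load(f)  # a list of dicts
    return book_text, questions


def get_complexity(item):
    """Return the complexity label, accepting either key spelling."""
    return item.get("Complexity", item.get("Complex"))


def format_prompt(item):
    """Render the question and its available options as plain text."""
    lines = [item["Question"]]
    for label, text in sorted(item["Options"].items()):
        # Assumption: yes/no questions may omit C and D or set them to null.
        if text is not None:
            lines.append(f"{label}. {text}")
    return "\n".join(lines)
```

For the example data point above, `format_prompt` would produce the question followed by the lines `A. 11`, `B. 9`, `C. 12`, and `D. 10`.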

50 changes: 29 additions & 21 deletions index.html
@@ -129,38 +129,46 @@ <h4>Data Description</h4>
Each data point in our dataset is represented as a dictionary with the following keys:
</p>
<pre id="eachcase">
[
{
"Question": The input question,
"Options": [
Option A,
Option B,
Option C,
Option D
],
"Complex": "mh",
"Aspect": "times"
}
"QID": a question ID that stays unchanged so updates can be tracked (updates happen only if necessary),
"Aspect": the question classification in 'aspect', e.g., "times",
"Complexity": the question classification in complexity, e.g., "mh",
"Question": the input question,
"Options": {
"A": Option A,
"B": Option B,
"C": Option C (not applicable to some yes/no questions),
"D": Option D (not applicable to some yes/no questions)
},
},
...
]
</pre>
<p>
Here is an example of a data point:
</p>
<pre id="caseexample">
[
{
"Question": "How many times has Robert written letters to his sister?",
"Options": [
"11",
"9",
"12",
"10"
],
"Complex": "mh",
"Aspect": "times"
}
"QID": "Q0148",
"Aspect": "times",
"Complex": "mh",
"Question": "How many times has Robert written letters to his sister?",
"Options": {
"A": "11",
"B": "9",
"C": "12",
"D": "10"
},
},
...
]
</pre>
</div>
<div class="list-group-item" id="Contributers">
<h4>Contributors</h4>
<p>Cunxiang Wang, Ruoxi Ning, Boqi Pan, Tonghui Wu, Qipeng Guo, Cheng Deng, Guangsheng Bao, Qian Wang, and Yue Zhang</p>
<p>Cunxiang Wang*, Ruoxi Ning*, Boqi Pan, Tonghui Wu, Qipeng Guo, Cheng Deng, Guangsheng Bao, Qian Wang, and Yue Zhang</p>
</div>
<div class="list-group-item" id="Lisence">
<h4>License</h4>