Skip to content

Latest commit

 

History

History
139 lines (99 loc) · 6.36 KB

File metadata and controls

139 lines (99 loc) · 6.36 KB

Personal Entity, Concept, and Named Entity Linking in Conversations

Open In Colab

This repository provides (1) conversational entity linking dataset (ConEL-2) and (2) conversational entity linking tool (CREL), as resources for the following research:

Figure: The example of entity linking in conversations.

This repository is structured in the following way:

  • tool/ : EL tool for conversation (CREL), with the example script.
  • dataset/ : Conversational entity linking datasets (ConEL-2), with the documentation of the statistics and format.
  • eval/ : Tool to calculate the performance of the entity linking method, with the run files of baseline and our method.

CREL: Conversational Entity Linking Tool

CREL is the conversational entity linking tool trained on the ConEL-2 dataset. Unlike existing EL methods, CREL is developed to identify both named entities and concepts. It also utilizes coreference resolution techniques to identify personal entities and references to the explicit entity mentions in the conversations.

Quickstart with Google Colab

The easiest way to get started with this project is to use our Google Colab code. By just running the notebook, you can try our entity linking approach.

The usage of the tool is as follows:

from conv_el import ConvEL
cel = ConvEL()

conversation_example = [
    {"speaker": "USER", 
    "utterance": "I am allergic to tomatoes but we have a lot of famous Italian restaurants here in London.",}, 

    # System turns are not annotated
    {"speaker": "SYSTEM", 
    "utterance": "Some people are allergic to histamine in tomatoes.",},

    {"speaker": "USER", 
    "utterance": "Talking of food, can you recommend me a restaurant in my city for our anniversary?",},
]

annotation_result = cel.annotate(conversation_example)
print_results(annotation_result) # This function is defined in the notebook.

# Output:
# 
# USER: I am allergic to tomatoes but we have a lot of famous Italian restaurants here in London.
# 	 [17, 8, 'tomatoes', 'Tomato']
# 	 [54, 19, 'Italian restaurants', 'Italian_cuisine']
# 	 [82, 6, 'London', 'London']
# SYST: Some people are allergic to histamine in tomatoes.
# USER: Talking of food, can you recommend me a restaurant in my city for our anniversary?
# 	 [11, 4, 'food', 'Food']
# 	 [40, 10, 'restaurant', 'Restaurant']
# 	 [54, 7, 'my city', 'London']

where, input for our tool is a conversation which has two keys for each turn: speaker and utterance. The speaker is the speaker of the utterance (either USER or SYSTEM), and the utterance is the utterance itself.

Note
  • Use CPU to run this notebook.
    • The code also run on GPU, however, because of the storage limitation, you cannot try GPU on Google Colab if you use free version.
  • It takes approx 30 mins to download the models. Please wait for a while.

Start on your local machine

You can also use our method locally. The documentation is available at ./tool/README.md.

ConEL-2: Conversational Entity Linking Dataset

Dataset

Our ConEL-2 dataset contains concepts, named entities (NEs), and personal entity annotations for conversations. This annotations is collected on Wizard of Wikipedia dataset. The format and detailed statistics of the dataset are described here ./dataset/README.md.

Table: Statistics of conversational entity linking dataset

Train Val Test
Conversations 174 58 58
User utterance 800 267 260
NE and concept annotations 1428 523 452
Personal entity annotations 268 89 73

The format of the dataset is as follows:

{
    "dialogue_id": "9161",
    "turns": [
        {
            "speaker": "USER", # or "SYSTEM"
            "utterance": "Alpacas are definitely my favorite animal.  I have 10 on my Alpaca farm in Friday harbor island in Washington state.",
            "turn_number": 0,
            "el_annotations": [ # Ground truth annotations
                {
                    "mention": "Alpacas",
                    "entity": "Alpaca",
                    "span": [0, 7],
                }, ...]
            "personal_entity_annotations": [ # Personal entity annotations
                {
                    "personal_entity_mention": "my favorite animal",
                    "explicit_entity_mention": "Alpacas",
                    "entity": "Alpaca"
                }
            ],
            "personal_entity_annotations_without_eems": [ # Personal entity annotations where EEM annotated as not found
                {
                    "personal_entity_mention": "my Alpaca farm"
                }
            ]
        },

You can find more details about the format of the dataset in the ./dataset/README.md

Additionally, we also provide personal entity linking mention detection dataset, which contains 985 conversations with 1369 personal entity mention annotations.

Evaluation Tool

The tool to evaluate your entity linking method is provided in the eval/ directory. The detail explanations are available here ./eval/README.md.

Contact

If you have any questions, please contact Hideaki Joko at hideaki.joko@ru.nl