FAQT

FAQT gives you a suite of approaches for matching text to existing content (like FAQs) or detecting urgency in a text in low and high resource settings.

How it works

FAQ matching

FAQT's FAQ Scorers score text (e.g. user message) against a list of predetermined contents (e.g. FAQs) based on their relevance to the text.

The Scorer class needs the contents and some pre-trained NLP model to do the match scoring. It outputs the match score for each of the content.

For Scorers that use word embeddings, scoring is based on the distances between words or sets of words from the message and the contents.
For the BERT-based Scorer, it accepts a BERT model with a final binary classification layer, which classifies (scores) the pair, e.g. user message and each of the FAQs, as relevant (1.0) or not (0.0).

Urgency detection

Currently, FAQT provides two models for urgency detection: rule-based and machine-learning-based (ML-based). The Urgency Detector accepts a raw text as input and outputs the urgency score (between 0.0 and 1.0) or an urgency label (0.0 for not urgent, 1.0 for urgent.)

The rule-based Urgency Detector expects a list of keyword rules, and evaluates urgency based on the presence or absence of the keywords.
The ML-based Urgency Detector expects a scikit-learn-like pipeline that accepts raw text and outputs an urgency score between 0.0 and 1.0. Hence, this pipeline should include all necessary steps like tokenization and vectorization.

Installation

Eventually FAQT will be open-source and available from PyPI. In the meantime, you can use pip to install it from github:

pip install git+https://github.com/IDinsight/faqt.git@main

It will prompt you for your username and password since the repository is private.

Extended installation

To use a BERT-based model such as faqt.model.faq_matching.bert.QuestionAnswerBERTScorer or the faqt.preprocessing.tokens.CustomHunspell class, install faqt with its extended optional dependencies:

pip install "faqt[extended] @ git+https://github.com/IDinsight/faqt.git@main"

Basic Usage

FAQ matching

We show an example that uses the WMDScorer class, which requires a word embedding and computes scores based on word mover's distance between the word vectors of the input and word vectors of the content.

Load a word embedding in a gensim.models.KeyedVectors class.

from gensim.models import Word2Vec

word_embedding = Word2Vec.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", # some pre-trained embedding
    binary=True, 
    norm_only=True
)

Load FAQ contents.

faqs = [
    "**How do I edit a page?** Editing most Wikipedia pages is easy, just click the \"Edit\" tab at the top of a Wikipedia page.",
    "**How do I create a new page?** You are required to have a Wikipedia account to create a new article.",
    "**How do I change the name of an article?** You can move the article using the \"move\" button."
]

Create a FAQ-matching model.

from faqt.models import WMDScorer

faq_scorer = WMDScorer(word_embedding)
faq_scorer.set_contents(faqs)

Score FAQs against a message.

scores = faq_scorer.score_contents(
    "I want to edit a page on Wikipedia"
)

scores contains a list of scores for each FAQ content:

[0.924, 0.345, 0.456]

Urgency Detection

We show an example that uses the RuleBasedUD.

Define a preprocessor. Since the rule-based model simply looks for the presence and absence of the tokens, you should apply the same preprocessing to the keywords as you would to the messages. In this example, we will use nltk's tokenizer and stemmer.

from nltk.tokenize import word_tokenize
from nltk.stem.porter import PorterStemmer

def preprocess(text):
    stemmer = PorterStemmer()

    tokens = word_tokenize(text)
    stemmed_tokens = [stemmer.stem(token) for token in tokens]

    return stemmed_tokens

Define KeywordRule's, each describing an urgent case.

from faqt.models import KeywordRule

rules = [
    KeywordRule(
        include=[preprocess(word) for word in ["blurry", "vision", "headache"]]
    ),
    KeywordRule(
        include=[preprocess(word) for word in ["vomit", "headache"]]
    )
]

Create a rule-based urgency detector

from faqt.model import RuleBasedUD

ud_model = RuleBasedUD(model=rules, preprocessor=preprocess)

"Predict" or evaluate!

ud_model.predict(
    "I keep getting headaches and my vision is blurry. Is this normal?"
)

This should return the value 1.0.

Development

To develop FAQT, clone this repository and run the following at the repository root:

pip install -e '.[dev]'

This installs all the additional dependencies required for development.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.github/workflows		.github/workflows
docs		docs
faqt		faqt
notebooks		notebooks
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
coverage.lcov		coverage.lcov
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
requirements_dev.txt		requirements_dev.txt
requirements_extended.txt		requirements_extended.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FAQT

How it works

FAQ matching

Urgency detection

Installation

Extended installation

Basic Usage

FAQ matching

Urgency Detection

Development

About

Releases 11

Packages

Contributors 6

Languages

License

IDinsight/faqt

Folders and files

Latest commit

History

Repository files navigation

FAQT

How it works

FAQ matching

Urgency detection

Installation

Extended installation

Basic Usage

FAQ matching

Urgency Detection

Development

About

Resources

License

Stars

Watchers

Forks

Releases 11

Packages 0

Contributors 6

Languages

Packages