Support Ticket NLP

Support Ticket Classification and Key Phrases Extraction

Identify the main issues in the ticket description
Extract the key phrases in the ticket description

Data

Support Ticket Classification

Tasks

Topic Modeling with LDA model
1. Preprocessing
  1. Split the text into tokens
  2. Remove stopwords and punctuation
  3. Lemmatize the remaining tokens
2. Compute coherence values to find the optimal number of topics
3. Build the LDA model
4. Utilize pyLDAvis to visualize the topics
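
The preprocessing step above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual pipeline: the stopword set, the `LEMMAS` lookup table, and the sample ticket text are all hypothetical stand-ins for what a real tokenizer and lemmatizer (for example spaCy or NLTK) would provide.

```python
import re

# Tiny illustrative stopword list (assumption); a real pipeline would use
# the list shipped with an NLP library.
STOPWORDS = {"a", "an", "the", "to", "is", "are", "i", "my", "for", "and", "of", "in", "on"}

# Hypothetical lemma lookup standing in for a real lemmatizer.
LEMMAS = {
    "recruiters": "recruiter",
    "permissions": "permission",
    "requisitions": "requisition",
}


def preprocess(text):
    """Tokenize, drop stopwords and punctuation, then lemmatize."""
    # 1. Split lowercased text into alphabetic tokens; punctuation and
    #    digits never match the pattern, so they are dropped here.
    tokens = re.findall(r"[a-z]+", text.lower())
    # 2. Remove stopwords.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # 3. Map each token to its lemma, leaving unknown tokens unchanged.
    return [LEMMAS.get(t, t) for t in tokens]


# Made-up ticket description, for illustration only.
ticket = "The recruiters need permissions to open the requisitions."
tokens = preprocess(ticket)
print(tokens)
```

The resulting token lists are what a topic model would consume as its bag-of-words input.
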
Key Phrases Extraction with pytextrank (combining spaCy and networkx)
1. Construct a graph, sentence by sentence, based on the spaCy part-of-speech tags
2. Use matplotlib to visualize the lemma graph
3. Use PageRank – which is approximately eigenvector centrality – to calculate a rank for each node in the lemma graph
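  1. The centrality score $x_v$ of vertex $v$ is $x_v = \frac{1}{\lambda} \sum_{t \in M(v)} x_t = \frac{1}{\lambda} \sum_{t \in G} a_{v,t} x_t$
  2. $a_{v,t}=1$ if vertex $v$ is linked to vertex $t$, and $a_{v,t}=0$ otherwise
  3. $M(v)$ is the set of neighbors of $v$ and $\lambda$ is a constant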
4. Collect the top-ranked phrases from the lemma graph based on the noun chunks
5. Find a minimum span for each phrase based on combinations of lemmas
```
permission 1 0.17555037929471423
requisitions 1 0.1742458175386728
recruiter 1 0.1416381454134179
```
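
To make the ranking step concrete, here is a minimal power-iteration sketch of PageRank on a small undirected graph. It is a stand-in for the networkx implementation that pytextrank relies on, not a reproduction of it. The `pagerank` helper, the damping factor of 0.85, and the toy `lemma_graph` are assumptions chosen for illustration, and the sketch assumes every node has at least one neighbor.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Rank the nodes of an undirected graph given as {node: set_of_neighbors}.

    Assumes every node has at least one neighbor (no dangling nodes).
    """
    nodes = list(graph)
    n = len(nodes)
    # Start from a uniform distribution over the nodes.
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_ranks = {}
        for node in nodes:
            # Each neighbor shares its current rank equally among its links.
            incoming = sum(ranks[nb] / len(graph[nb]) for nb in graph[node])
            new_ranks[node] = (1.0 - damping) / n + damping * incoming
        ranks = new_ranks
    return ranks


# Hypothetical lemma graph: an edge joins lemmas that appear close together.
lemma_graph = {
    "recruiter": {"permission", "requisition"},
    "permission": {"recruiter", "requisition", "access"},
    "requisition": {"recruiter", "permission"},
    "access": {"permission"},
}

ranks = pagerank(lemma_graph)
top = sorted(ranks, key=ranks.get, reverse=True)
print(top[0])
```

In this toy graph the best-connected lemma ends up with the highest rank, which is the intuition behind collecting top-ranked phrases from the lemma graph.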

Terminologies

Topic Coherence

Scores a single topic by measuring the degree of semantic similarity between the high-scoring words in that topic
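
As one concrete example, the sketch below computes the UMass coherence measure, which scores a topic from how often its top words co-occur in the same documents. UMass is only one of several coherence measures and is an assumption here; the source does not say which measure the project uses. The function name and the toy documents are hypothetical.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of one topic.

    top_words must be ordered by descending weight, and each word must
    occur in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    # combinations() yields (higher-ranked, lower-ranked) pairs in list order.
    for earlier, later in combinations(top_words, 2):
        score += math.log((doc_freq(earlier, later) + 1) / doc_freq(earlier))
    return score


# Toy tokenized tickets, for illustration only.
docs = [
    ["recruiter", "permission", "requisition"],
    ["recruiter", "permission"],
    ["password", "reset", "login"],
    ["login", "password"],
]

coherent = umass_coherence(["recruiter", "permission"], docs)
mixed = umass_coherence(["recruiter", "password"], docs)
print(coherent > mixed)
```

Words that appear together score higher than words that never share a document. To choose the number of topics, a common approach is to average such scores over all topics for each candidate value and keep the value with the best average.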

Latent Dirichlet Allocation (LDA)

Given the documents, the words they contain, and the number of topics $K$, LDA outputs:

the distribution of words for each topic $k$
the distribution of topics for each document $i$
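
The shape of these two outputs can be illustrated without fitting a model. The sketch below only draws random distributions from symmetric Dirichlet priors, as in LDA's generative story, so the numbers are meaningless; the sizes, the concentration value of 0.5, and the `sample_dirichlet` helper are assumptions for illustration.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw one probability vector from a symmetric Dirichlet(alpha)."""
    # Normalizing independent Gamma(alpha, 1) draws gives a Dirichlet sample.
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)
num_docs, vocab_size, num_topics = 4, 6, 2  # illustrative sizes only

# One word distribution per topic k: num_topics rows of vocab_size entries.
topic_word = [sample_dirichlet(0.5, vocab_size, rng) for _ in range(num_topics)]

# One topic distribution per document i: num_docs rows of num_topics entries.
doc_topic = [sample_dirichlet(0.5, num_topics, rng) for _ in range(num_docs)]

print(len(topic_word), len(topic_word[0]))
print(len(doc_topic), len(doc_topic[0]))
```

Each row is a probability distribution that sums to 1. A fitted LDA model returns tables of the same shape, estimated from the ticket text instead of sampled at random.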