vanessaaleung / support-ticket-nlp Public


Support Ticket Classification and Key Phrases Extraction


Support Ticket NLP

Support Ticket Classification and Key Phrases Extraction

Identify the main issues in the ticket description
Extract the key phrases in the ticket description

Data

Support Ticket Classification

Tasks

Topic Modeling with LDA model
1. Preprocessing
  1. Split the text into tokens
  2. Remove stopwords and punctuation
  3. Lemmatization
2. Compute coherence values to find the optimal number of topics
3. Build the LDA model
4. Utilize pyLDAvis to visualize the topics
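
The preprocessing steps above can be sketched in plain Python. This is a minimal illustration rather than the notebook's code: the `STOPWORDS` set, the `LEMMAS` lookup table, and the `preprocess` function are hypothetical stand-ins for a real stopword list and lemmatizer such as those shipped with spaCy or NLTK.

```python
import string

# Hypothetical, deliberately tiny resources (assumptions for illustration only).
STOPWORDS = {"the", "a", "an", "to", "is", "my", "i", "for", "and", "of", "in", "on"}
LEMMAS = {
    "needs": "need",
    "permissions": "permission",
    "requisitions": "requisition",
    "recruiters": "recruiter",
}


def preprocess(text):
    """Tokenize, drop stopwords and punctuation, then map tokens to lemmas."""
    # 1. Split the lowercased text into whitespace-delimited tokens.
    tokens = text.lower().split()
    # 2. Strip surrounding punctuation, then drop empty tokens and stopwords.
    tokens = [t.strip(string.punctuation) for t in tokens]
    tokens = [t for t in tokens if t and t not in STOPWORDS]
    # 3. Lemmatize with the toy lookup table; unknown tokens pass through unchanged.
    return [LEMMAS.get(t, t) for t in tokens]


if __name__ == "__main__":
    print(preprocess("The recruiter needs permissions to view requisitions."))
```

The resulting token lists are what a topic model would consume as its bag-of-words input. A lookup table only handles the words it lists, which is why a proper lemmatizer is preferable on real tickets.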
Key Phrases Extraction with pytextrank (combining spaCy and networkx)
1. Construct a graph, sentence by sentence, based on the spaCy part-of-speech tags
2. Use matplotlib to visualize the lemma graph
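3. Use PageRank, which is approximately eigenvector centrality, to calculate ranks for each of the nodes in the lemma graph. The eigenvector centrality score $x_v$ of vertex $v$ satisfies $x_v = \frac{1}{\lambda} \sum_{t \in M(v)} x_t = \frac{1}{\lambda} \sum_{t} a_{v,t} x_t$, where:
  1. $a_{v,t}=1$ if vertex $v$ is linked to vertex $t$, and $a_{v,t}=0$ otherwise
  2. $M(v)$ is the set of neighbors of $v$ and $\lambda$ is a constant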
4. Collect the top-ranked phrases from the lemma graph based on the noun chunks
5. Find a minimum span for each phrase based on combinations of lemmas
```
permission 1 0.17555037929471423
requisitions 1 0.1742458175386728
recruiter 1 0.1416381454134179
```
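
The graph construction and ranking steps above can be illustrated without spaCy or networkx. The sketch below is not pytextrank's implementation. It assumes the sentences are already tagged as `(lemma, pos)` pairs, and the `KEPT_POS` set, the `window` size, and the damping factor of 0.85 are illustrative choices.

```python
# Part-of-speech tags whose lemmas become graph nodes (an assumed selection).
KEPT_POS = {"ADJ", "NOUN", "PROPN", "VERB"}


def build_lemma_graph(sentences, window=2):
    """Link each kept lemma to the kept lemmas that follow it within `window`."""
    graph = {}
    for sentence in sentences:
        kept = [lemma for lemma, pos in sentence if pos in KEPT_POS]
        for i, lemma in enumerate(kept):
            graph.setdefault(lemma, set())
            for other in kept[i + 1 : i + 1 + window]:
                if other != lemma:
                    graph[lemma].add(other)
                    graph.setdefault(other, set()).add(lemma)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration for PageRank on an undirected graph of neighbor sets."""
    n = len(graph)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        # Rank held by isolated nodes is spread evenly over all nodes.
        dangling = sum(rank[v] for v in graph if not graph[v])
        new_rank = {}
        for v in graph:
            incoming = sum(rank[u] / len(graph[u]) for u in graph[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


# Two hand-tagged example sentences (hypothetical data).
SENTENCES = [
    [("recruiter", "NOUN"), ("need", "VERB"), ("permission", "NOUN"),
     ("to", "PART"), ("view", "VERB"), ("requisition", "NOUN")],
    [("permission", "NOUN"), ("for", "ADP"), ("requisition", "NOUN"),
     ("be", "AUX"), ("miss", "VERB")],
]

if __name__ == "__main__":
    ranks = pagerank(build_lemma_graph(SENTENCES))
    for lemma, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(lemma, round(score, 4))
```

Lemmas that co-occur with many other lemmas collect more rank, which is why well-connected terms tend to surface as key phrases.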

Terminologies

Topic Coherence

Scores a single topic by measuring the degree of semantic similarity between the high-scoring words in that topic.
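
As one concrete way to score a topic, the sketch below computes the UMass coherence measure, which uses document co-occurrence counts as a proxy for semantic similarity. It is an illustration of the idea only. The text here does not state which coherence measure the notebook uses, and the `umass_coherence` function and the example documents are hypothetical.

```python
import math


def umass_coherence(topic_words, documents):
    """UMass coherence: sum of log((D(w_m, w_l) + 1) / D(w_l)) over pairs l < m.

    `topic_words` should be ordered from most to least probable in the topic.
    `documents` is a list of token lists.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(topic_words)):
        for l in range(m):
            denom = doc_freq(topic_words[l])
            if denom == 0:
                continue  # skip words that never occur in the corpus
            joint = doc_freq(topic_words[m], topic_words[l])
            score += math.log((joint + 1) / denom)
    return score


# Hypothetical tokenized tickets.
DOCS = [
    ["password", "reset", "login"],
    ["password", "login", "error"],
    ["invoice", "payment", "refund"],
    ["payment", "invoice", "error"],
]

if __name__ == "__main__":
    print(umass_coherence(["password", "login"], DOCS))
    print(umass_coherence(["password", "invoice"], DOCS))
```

Words that appear together in the same tickets yield a higher score than words that never co-occur. Comparing an average of such scores across candidate topic counts is how a coherence curve can guide the choice of the number of topics.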

Latent Dirichlet Allocation (LDA)

Given a collection of documents, the words they contain, and a chosen number of topics $K$, LDA outputs:

a distribution over words for each topic $k$
a distribution over topics for each document $i$
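
To make these two outputs concrete, here is a small collapsed Gibbs sampler for LDA in plain Python. It is a sketch under stated assumptions, not the notebook's method: a library implementation may use a different inference scheme such as variational Bayes, and the `lda_gibbs` function, its hyperparameters `alpha` and `beta`, and the iteration count are illustrative choices.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Return (vocab, phi, theta) estimated by collapsed Gibbs sampling.

    phi[k][w]   : probability of vocabulary word w under topic k
    theta[d][k] : probability of topic k in document d
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]        # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]   # times word w is assigned to topic k
    topic_total = [0] * K                      # tokens assigned to topic k
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][wid] + beta)
                    / (topic_total[j] + V * beta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    phi = [
        [(topic_word[k][w] + beta) / (topic_total[k] + V * beta) for w in range(V)]
        for k in range(K)
    ]
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    return vocab, phi, theta


if __name__ == "__main__":
    tickets = [
        ["password", "reset", "login"],
        ["login", "password", "error"],
        ["invoice", "payment", "refund"],
        ["payment", "invoice", "error"],
    ]
    vocab, phi, theta = lda_gibbs(tickets, num_topics=2)
    print(len(phi), "topics over", len(vocab), "words")
    print(len(theta), "documents over", len(theta[0]), "topics")
```

Each row of `phi` and each row of `theta` sums to 1, matching the two distributions listed above.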
