Support Ticket NLP

Support Ticket Classification and Key Phrases Extraction

  • Identify the main issues in the ticket description
  • Extract the key phrases in the ticket description


Data

Support Ticket Classification


Tasks

  1. Topic Modeling with LDA model
    1. Preprocessing
      1. Split the text into tokens
      2. Remove stopwords and punctuation
      3. Lemmatization
    2. Compute coherence values to find the optimal number of topics
    3. Build the LDA model
    4. Utilize pyLDAvis to visualize the topics
  2. Key Phrases Extraction with pytextrank (combining spaCy and networkx)
    1. Construct a graph, sentence by sentence, based on the spaCy part-of-speech tags
    2. Use matplotlib to visualize the lemma graph
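    3. Use PageRank (which is approximately eigenvector centrality) to calculate ranks for each of the nodes in the lemma graph. The eigenvector centrality $x_v$ of vertex $v$ satisfies $x_v = \frac{1}{\lambda} \sum_{t \in M(v)} x_t = \frac{1}{\lambda} \sum_{t} a_{v,t} x_t$, where:
      1. $a_{v,t}=1$ if vertex $v$ is linked to vertex $t$, and $a_{v,t}=0$ otherwise
      2. $M(v)$ is the set of neighbors of $v$ and $\lambda$ is a constant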
    4. Collect the top-ranked phrases from the lemma graph based on the noun chunks
    5. Find a minimum span for each phrase based on combinations of lemmas
      
              permission 1 0.17555037929471423
              requisitions 1 0.1742458175386728
              recruiter 1 0.1416381454134179
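
The graph-and-rank steps of the key phrase task can be sketched in plain Python. This is only an illustration, not the pytextrank implementation: the sentences are hand-tagged `(lemma, pos)` pairs standing in for spaCy output, and `KEPT_POS`, `build_lemma_graph`, `pagerank`, and the `window` size are hypothetical names and settings chosen for the example.

```python
from collections import defaultdict

# Assumption: only these part-of-speech tags contribute nodes to the graph.
KEPT_POS = {"NOUN", "PROPN", "ADJ", "VERB"}


def build_lemma_graph(sentences, window=3):
    """Link each kept lemma to the kept lemmas that follow it within `window`."""
    graph = defaultdict(set)
    for sentence in sentences:
        lemmas = [lemma for lemma, pos in sentence if pos in KEPT_POS]
        for i, source in enumerate(lemmas):
            for target in lemmas[i + 1 : i + 1 + window]:
                if source != target:
                    # Undirected edge: store it in both directions.
                    graph[source].add(target)
                    graph[target].add(source)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph without isolated nodes."""
    nodes = list(graph)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbor shares its rank equally among its own neighbors.
            incoming = sum(rank[nb] / len(graph[nb]) for nb in graph[node])
            new_rank[node] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


# Hand-tagged toy sentences (made up for this sketch).
sentences = [
    [("recruiter", "NOUN"), ("need", "VERB"), ("permission", "NOUN"),
     ("to", "PART"), ("edit", "VERB"), ("requisition", "NOUN")],
    [("permission", "NOUN"), ("for", "ADP"), ("requisition", "NOUN"),
     ("be", "AUX"), ("miss", "VERB")],
    [("recruiter", "NOUN"), ("request", "VERB"), ("permission", "NOUN")],
]

graph = build_lemma_graph(sentences)
ranks = pagerank(graph)
for lemma, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(lemma, round(score, 4))
```

In the real pipeline the ranked lemmas would then be matched against spaCy noun chunks to produce phrases; that step is left out here.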
              

Terminologies

Topic Coherence

Scores a single topic by measuring the degree of semantic similarity between the high-scoring words in that topic.
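
As a rough illustration of what such a score looks like, the sketch below computes a UMass-style coherence from document co-occurrence counts. The README does not say which coherence measure the project uses, so UMass is an assumption here, and `umass_coherence` and the toy documents are hypothetical. It also assumes every top word appears in at least one document and that there are at least two top words.

```python
import math


def umass_coherence(top_words, documents):
    """Average of log((D(w_m, w_l) + 1) / D(w_l)) over ordered pairs of top words.

    `top_words` is ordered from most to least probable within the topic;
    D(...) counts the documents that contain all of the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    pair_scores = []
    for m in range(1, len(top_words)):
        for l in range(m):
            w_m, w_l = top_words[m], top_words[l]
            pair_scores.append(
                math.log((doc_freq(w_m, w_l) + 1) / doc_freq(w_l))
            )
    return sum(pair_scores) / len(pair_scores)


# Toy, already-preprocessed ticket descriptions (made up for this sketch).
docs = [
    ["password", "reset", "login"],
    ["password", "login", "account"],
    ["invoice", "payment", "refund"],
    ["invoice", "refund", "billing"],
]

coherent = umass_coherence(["password", "login"], docs)
mixed = umass_coherence(["password", "invoice"], docs)
print(coherent, mixed)
```

Words that tend to appear in the same tickets score higher than words that never co-occur, which is the property used when comparing models with different numbers of topics.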

Latent Dirichlet Allocation (LDA)

Given a collection of documents, the words in them, and a chosen number of topics, LDA outputs:

  1. distribution of words for each topic k
  2. distribution of topics for each document i
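
To make the two outputs concrete, here is a minimal collapsed Gibbs sampler in plain Python. It is a stand-in for illustration only, not the library this project uses to build its model; `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are all assumptions.

```python
import random


def lda_gibbs(docs, num_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Return (vocab, phi, theta) where phi[k] is the word distribution of
    topic k and theta[d] is the topic distribution of document d."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    doc_topic = [[0] * K for _ in docs]        # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]   # times word v is assigned to topic k
    topic_total = [0] * K                      # tokens assigned to topic k
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    phi = [[(topic_word[k][v] + beta) / (topic_total[k] + V * beta)
            for v in range(V)] for k in range(K)]
    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + K * alpha)
              for k in range(K)] for d, doc in enumerate(docs)]
    return vocab, phi, theta


# Toy, already-preprocessed ticket descriptions (made up for this sketch).
docs = [
    ["password", "reset", "login"],
    ["password", "login", "account"],
    ["invoice", "payment", "refund"],
    ["invoice", "refund", "billing"],
]
vocab, phi, theta = lda_gibbs(docs, num_topics=2)
print(len(phi), "topics x", len(vocab), "words")
print(len(theta), "documents x", len(theta[0]), "topics")
```

Each row of `phi` and each row of `theta` sums to 1, matching the two distributions listed above.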