Author: Álvaro Rivera Arcelus
- This model is specifically developed for processing Spanish conversations, including precise Mexican jargon. Nevertheless, the same workflow can be applied to any language with only subtle changes to the code.
- Data Pipeline
- Storage Architecture
- Models
- Libraries and packages
- Functions
- Relevant variables
- Limitations
Note: take into consideration that this pipeline is developed specifically for processing files formatted according to Five9's specifications.
To differentiate which speaker spoke at each point in the conversation, we used a diarization model known as pyannote.audio
found at Hugging Face - Pyannote
from pyannote.audio import Pipeline

# Load the pretrained speaker diarization pipeline (requires a Hugging Face access token)
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2.1",
                                    use_auth_token="YOUR_HF_AUTH_TOKEN")
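As a minimal sketch of how the loaded pipeline can be applied (the file name below is an assumption), each call recording is diarized and the resulting speaker turns can be iterated:

# Sketch: run diarization on one call recording (file name is an assumption)
diarization = pipeline("call_recording.wav")

# Iterate over the detected speaker turns with their labels
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")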
How does it work?
To transform all audio files to text, we used OpenAI's Python library whisper.
import whisper

# Function that transcribes an audio file to Spanish text
model = whisper.load_model('medium')

def transcribe_me(audio):
    result = model.transcribe(audio, language='es', fp16=False)
    return result['text']
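For example, a single recording can then be transcribed as follows (the file name is an assumption):

# Transcribe one audio file and print the resulting Spanish text
text = transcribe_me("call_recording.wav")
print(text)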
This pipeline uses a set of pre-trained neural network models known as Transformer models.
In this project we used the transformers
library available at Hugging Face - Transformers
Transformer models used:
- Text classification pipeline
$\rightarrow$ sentiment = pipeline('sentiment-analysis', model='pysentimiento/robertuito-sentiment-analysis', truncation=True)
- Zero-shot classification
$\rightarrow$ classifier = pipeline("zero-shot-classification", model='facebook/bart-large-mnli')
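The sketch below shows how these two pipelines can be called; the example sentences and candidate labels are assumptions for illustration only:

from transformers import pipeline

# Spanish sentiment analysis (returns POS / NEU / NEG with a score)
sentiment = pipeline('sentiment-analysis', model='pysentimiento/robertuito-sentiment-analysis', truncation=True)
print(sentiment("Muchas gracias por su ayuda"))  # e.g. [{'label': 'POS', 'score': ...}]

# Zero-shot classification with illustrative candidate labels
classifier = pipeline("zero-shot-classification", model='facebook/bart-large-mnli')
print(classifier("Quiero hablar sobre mi pago pendiente",
                 candidate_labels=["cobranza", "soporte técnico", "ventas"]))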
This model was developed manually by following standard text-analysis procedures such as tokenization, stop-word removal, and so forth. Then we simply determine whether either speaker said a word contained in one of the two following lists (a sketch of this check is shown after the lists):
- lenguaje_inapropiado.txt
- molestias.txt
Both lists include common Spanish expressions as well as Mexican slang commonly used to express distress or any sort of nuisance.
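A minimal sketch of this check could look as follows; the helper name and example text are assumptions, and the actual implementation also applies the tokenization and stop-word removal mentioned above:

# Minimal sketch (helper name and example text are assumptions)
def contains_listed_expression(text, wordlist_path):
    # Load the expressions (one per line) and do a case-insensitive containment check
    with open(wordlist_path, encoding='utf-8') as f:
        expressions = [line.strip().lower() for line in f if line.strip()]
    text = text.lower()
    return any(expr in text for expr in expressions)

client_text = "estoy muy molesto con el servicio"   # example transcript
client_swear = contains_listed_expression(client_text, 'lenguaje_inapropiado.txt')
client_disturb = contains_listed_expression(client_text, 'molestias.txt')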
Among the multiple open-source libraries used, some of the most relevant are the following:
- NLTK
- PyAnnote
- TensorFlow
- Whisper
- Transformers
For the entire list of libraries & packages used, read the file libraries.py
found inside this same repository.
All local functions are saved and organized based on their purpose.
- Chat conversations preprocessing
chatPreprocessing.py
- Audio data cleansing
cleansing.py
- Data preprocessing
preprocessing.py
- All NLP and transformer models
myTransformers.py
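Assuming the repository root is on the Python path, the files above can be imported as regular modules (shown for illustration only; the specific function names live inside each file):

# Import the helper modules by purpose
import chatPreprocessing   # chat conversations preprocessing
import cleansing           # audio data cleansing
import preprocessing       # data preprocessing
import myTransformers      # NLP and transformer models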
Here is the list of all output variables calculated throughout the pipeline:
- init_res $\rightarrow$ initial response time of the first person to start the conversation
- avg_res_client $\rightarrow$ average response time from the client (minutes)
- avg_res_empl $\rightarrow$ average response time from the employee (minutes)
- first_res $\rightarrow$ whether the employee, agent or bot initialized the conversation
- webhook $\rightarrow$ whether the first message was an image. Useful to filter out conversations in which the first message was an automatic screenshot sent to the debt collection agent only as proof of payment.
- client_sent $\rightarrow$ client sentiment (POS, NEU, NEG)
- client_scr $\rightarrow$ client sentiment classification probability
- employee_sent $\rightarrow$ employee sentiment (POS, NEU, NEG)
- employee_scr $\rightarrow$ employee sentiment classification probability
- client_em $\rightarrow$ client's main emotion in the conversation (emotion classification)
- employee_em $\rightarrow$ employee's main emotion in the conversation
- client_swear $\rightarrow$ whether the client spoke inappropriately or not
- employee_swear $\rightarrow$ whether the employee spoke inappropriately or not
- depto $\rightarrow$ department to which the agent belongs
- client_disturb $\rightarrow$ whether the client explicitly said a word that expresses discomfort with the company's services
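As a purely illustrative sketch, the variables above can be collected into one record per conversation and stacked into a results table; the values and the use of pandas here are assumptions, not actual outputs:

import pandas as pd

# One output record per conversation (all values are invented placeholders)
record = {
    'init_res': 0.5, 'avg_res_client': 2.3, 'avg_res_empl': 1.1,
    'first_res': 'bot', 'webhook': False,
    'client_sent': 'NEG', 'client_scr': 0.91,
    'employee_sent': 'NEU', 'employee_scr': 0.87,
    'client_em': 'anger', 'employee_em': 'neutral',
    'client_swear': False, 'employee_swear': False,
    'depto': 'cobranza', 'client_disturb': True,
}

# Stack one row per conversation into a table
results = pd.DataFrame([record])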
- The diarization model performs well but fails to segment speakers correctly in some cases.
- When using OpenAI's whisper library to transcribe every conversation, there are many cases in which the model does not perform correctly given the low quality of the call.
- Some calls exported from CRMs tend to record voicemail calls. They could be easily filtered out by adding an additional step to the preprocessing job.
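One possible preprocessing filter is sketched below; the marker phrases and the heuristic itself are assumptions rather than part of the current pipeline:

# Hypothetical filter: drop transcripts containing typical voicemail phrases
VOICEMAIL_MARKERS = ["buzón de voz", "deje su mensaje", "después del tono"]

def is_voicemail(transcript):
    transcript = transcript.lower()
    return any(marker in transcript for marker in VOICEMAIL_MARKERS)

print(is_voicemail("Ha llamado al buzón de voz, deje su mensaje después del tono"))   # True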