Sep 2021

On Writing for Product Managers

The Product Podcast - Writing for Product Managers by Uber Sr PM https://bit.ly/2VXZSZ5

Science as we know it can't explain consciousness – but a revolution is coming https://bit.ly/39diNBS

Perceptrons, multi-layer perceptrons, NNs, DNNs, CNNs, RNNs, LSTMs, BiDAF, Attention Nets, GANs, Transformers, ...

I was learning NLP from the spaCy documentation and starting out with simple code experiments. It occurred to me that spaCy might have code for term extraction. I searched for it and found PyATE.
PyATE points to a GitHub repo, so I went there and found references to a few papers. One of them turned out to be pretty engaging: "Term extraction: A Review" (Draft Version 091221) by Lars Ahrenberg, Linköping University, Department of Computer and Information Science (e-mail: lah@ida.liu.se).
Spent several hours yesterday just reading the first page!!!
Once I got to C-value/NC-value, I hit an equation which I did not understand.
Since these kinds of roadblocks in reading papers are a common occurrence, I decided to learn to read and write mathematical notation.
A bit of Googling got me to some good resources including Robert Kaplan and his books.
That is the journey I am on now, three layers deep in the stack: TE (term extraction) code, paper, notation. That is the stack of enquiry. Actually, these are quite independent tasks, so I should pursue each of them independently as well.
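
To make the C-value equation less opaque, here is a minimal sketch of it in code. It assumes the usual formulation: a candidate's score is log2 of its length in words times its frequency, and if the candidate is nested inside longer candidates, the average frequency of those longer candidates is subtracted first. The function names and the toy frequencies are my own illustration, not taken from the paper or from PyATE, and the full method has extra steps (linguistic filtering, thresholds, processing order) that are left out.

```python
import math

def contains(longer, shorter):
    """True if the word tuple `shorter` occurs contiguously inside the strictly longer tuple `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))

def c_values(freq):
    """freq maps each candidate term to its total corpus frequency (hypothetical input format)."""
    tokens = {term: tuple(term.split()) for term in freq}
    scores = {}
    for a, a_tok in tokens.items():
        # T_a: the longer candidates that contain `a` as a nested term
        longer = [b for b, b_tok in tokens.items() if contains(b_tok, a_tok)]
        if longer:
            adjusted = freq[a] - sum(freq[b] for b in longer) / len(longer)
        else:
            adjusted = freq[a]
        # log2(1) == 0, so single-word candidates score 0 in this plain form
        scores[a] = math.log2(len(a_tok)) * adjusted
    return scores

# Toy counts, invented for illustration only
freq = {"soft contact lens": 5, "hard contact lens": 1, "contact lens": 8}
scores = c_values(freq)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.2f}")
```

Here "contact lens" appears 8 times, but 6 of those are inside longer candidates, so its frequency is discounted by their average (3) before weighting by length.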

Here are some end goals

Build a term extractor (TE) better than the Microsoft Azure Keywords one (or close enough), based on all the theory I am learning.
Explore the relationships between terms, topics, the vocabulary of a domain, ontology, taxonomy, and knowledge graph. These are the artifacts we may end up generating.

Some learning

Questions to Self

If terms are nouns and noun groups, shouldn't a POS tagging library give us a first approximation of terms? A possible #codingexperiment
Understand TF-IDF at a deeper level than just using a library
Reread the LDA paper to figure out how terms are extracted there (look at the code of pyLDA). Also learn how terms are clustered (can I reuse that algorithm during TE?)
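
For the TF-IDF question, a from-scratch sketch may be a useful first #codingexperiment. It assumes the plain textbook form: term frequency is the count divided by the document length, and inverse document frequency is ln(N / df). Libraries often use smoothed or normalized variants, so the numbers will not necessarily match theirs. The whitespace tokenizer, the function name, and the toy documents are placeholders of mine.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf = count/len and idf = ln(N/df)."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer (assumption)
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy documents, invented for illustration only
docs = ["the term extraction", "the topic model", "the term frequency"]
weights = tf_idf(docs)
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {weight:.3f}")
```

A word that appears in every document ("the") gets weight zero, while a word unique to one document ("extraction") outranks a word shared by two ("term"). That is the intuition worth holding on to before moving to a library.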