Sep2021
The Product Podcast - Writing for Product Managers by Uber Sr PM https://bit.ly/2VXZSZ5
Science as we know it can't explain consciousness – but a revolution is coming https://bit.ly/39diNBS
Perceptrons, multi-layer perceptrons, NNs, DNNs, CNNs, RNNs, LSTMs, BiDAF, Attention Nets, GANs, Transformers, ...
- I was learning NLP from the spaCy documentation and starting out with simple code experiments. It occurred to me that spaCy might have code for term extraction. I searched for it and found PyATE.
- PyATE points to a GitHub repo, so I went there and found references to a few papers. One of them turned out to be pretty engaging: "Term extraction: A Review" (draft version 091221) by Lars Ahrenberg, Linköping University, Department of Computer and Information Science (e-mail: lah@ida.liu.se).
- Spent several hours yesterday just reading the first page!!!
- Once I got to C-value/NC-value, I hit an equation which I did not understand.
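For reference, these notes do not record which equation caused the trouble. The block below is the C-value as it is commonly stated in the term extraction literature (Frantzi, Ananiadou and Mima), written out with every symbol named. Treat it as a reading aid under that assumption, not as a quotation from the review.

```latex
% a       : a candidate term (a sequence of words)
% |a|     : the length of a, counted in words
% f(a)    : the frequency of a in the corpus
% T_a     : the set of longer candidate terms that contain a
% P(T_a)  : the number of candidate terms in T_a
\[
\text{C-value}(a) =
\begin{cases}
\log_2 |a| \cdot f(a)
  & \text{if } a \text{ is not nested in a longer candidate,} \\[6pt]
\log_2 |a| \cdot \left( f(a) - \dfrac{1}{P(T_a)} \displaystyle\sum_{b \in T_a} f(b) \right)
  & \text{otherwise.}
\end{cases}
\]
```

Read aloud: take the frequency of the candidate, subtract the average frequency of the longer candidates it appears inside, and scale the result by the logarithm of its length. Since log2(1) = 0, a single-word candidate scores zero in this form, which reflects the method's focus on multi-word terms.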
- Since roadblocks like this are a common occurrence when reading papers, I decided to learn to read and write mathematical notation.
- A bit of Googling got me to some good resources, including Robert Kaplan and his books.
- That is the journey I am on now, three layers deep in a stack of enquiry: term extraction (TE) code, paper, notation. The three are actually quite independent tasks, so I should pursue each of them independently as well.
Here are some end goals
- Build a term extractor (TE) that is better than the Microsoft Azure Keywords one (or close enough), based on all the theory I am learning
- Explore the relationship between terms, topics, vocabulary (of a domain), ontology, taxonomy, and knowledge graph. These are the artifacts we may end up generating.
Some learning
- Terms are basically nouns and noun groups
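A minimal sketch of what "nouns and noun groups" could mean in code, assuming the tokens have already been tagged with Universal POS tags (the kind a tagger such as spaCy produces). The function name `noun_groups` and the hand-tagged sample sentence are made up for illustration, and the pattern (a run of adjectives and nouns that ends on a noun) is only one possible definition of a noun group.

```python
from typing import List, Tuple

# Tags that may appear inside a candidate noun group (Universal POS tag names).
GROUP_TAGS = {"ADJ", "NOUN", "PROPN"}


def noun_groups(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return maximal runs of adjectives/nouns that end on a noun.

    `tagged` is a list of (word, pos_tag) pairs. Hypothetical helper,
    not part of any library.
    """
    groups: List[str] = []
    run: List[Tuple[str, str]] = []
    # The sentinel pair at the end flushes the last open run.
    for word, tag in tagged + [("", "END")]:
        if tag in GROUP_TAGS:
            run.append((word, tag))
            continue
        # Drop trailing adjectives so the group ends on a noun.
        while run and run[-1][1] == "ADJ":
            run.pop()
        if run:
            groups.append(" ".join(w for w, _ in run))
        run = []
    return groups


# Hand-tagged sample sentence (an assumption; a real run would use a POS tagger).
tagged_sentence = [
    ("Automatic", "ADJ"), ("term", "NOUN"), ("extraction", "NOUN"),
    ("finds", "VERB"), ("domain", "NOUN"), ("vocabulary", "NOUN"),
    ("in", "ADP"), ("large", "ADJ"), ("corpora", "NOUN"), (".", "PUNCT"),
]

print(noun_groups(tagged_sentence))
# ['Automatic term extraction', 'domain vocabulary', 'large corpora']
```

The output is a list of candidates only. Deciding which candidates are real domain terms still needs a scoring step on top.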
Questions to Self
- If terms are nouns and noun groups, should a part-of-speech (POS) library give us a first approximation of terms? A possible #codingexperiment
- Understand TF-IDF at a deeper level than just using a library
- Reread the LDA paper to figure out how terms are extracted there (look at the code of pyLDA). Also learn how terms are clustered (can I reuse that algorithm during TE?)
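As a starting point for the TF-IDF question above, here is a small from-scratch sketch using only the standard library. It assumes the plainest textbook variant: term frequency is the count divided by document length, and inverse document frequency is ln(N / df). Libraries often use smoothed or normalised variants, so their numbers will differ. The function name `tf_idf` and the toy documents are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Score every term of every tokenised document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = ln(number of documents / number of documents containing t)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    scores: List[Dict[str, float]] = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Toy corpus (made up): three already-tokenised "documents".
docs = [
    ["term", "extraction", "finds", "terms"],
    ["topic", "models", "cluster", "terms"],
    ["term", "extraction", "uses", "statistics"],
]

for doc_scores in tf_idf(docs):
    # Show the terms of each document from most to least distinctive.
    print(sorted(doc_scores.items(), key=lambda kv: -kv[1]))
```

A term that occurs in every document gets idf = ln(1) = 0, so its score is zero however often it appears. That is the intuition worth holding on to: TF-IDF rewards terms that are frequent in one document but rare across the collection.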