
Sep 2021


On Writing for Product Managers

The Product Podcast - Writing for Product Managers by Uber Sr PM https://bit.ly/2VXZSZ5

Science and Consciousness

Science as we know it can't explain consciousness – but a revolution is coming https://bit.ly/39diNBS

The Evolution of Deep Learning Architectures

Perceptrons, multi-layer perceptrons, NNs, DNNs, CNNs, RNNs, LSTMs, BiDAF, Attention Nets, GANs, Transformers, ...

Term Extraction Rabbit Hole(s)

  • I was learning NLP from the spaCy documentation and starting out with simple code experiments. It occurred to me that spaCy might have code for term extraction. I searched and found PyATE.

  • PyATE has a pointer to its GitHub repo, so I went there and found references to a few papers. One of them turned out to be pretty engaging: "Term Extraction: A Review" (draft version 091221) by Lars Ahrenberg, Linköping University, Department of Computer and Information Science (lah@ida.liu.se).

  • Spent several hours yesterday just reading the first page!!!

  • Once I got to C-value/NC-value, I hit an equation I did not understand (see the C-value sketch after this list).

  • Since these kinds of roadblocks in reading papers are a common occurrence, I decided to learn to read and write mathematical notation.

  • A bit of Googling got me to some good resources, including Robert Kaplan and his books.

  • That is the journey I am on now, three layers deep in the stack (TE code, paper, notation). That is the stack of enquiry. They are actually quite independent tasks, so I should pursue each of them independently as well.
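
For context, this is the equation I got stuck on, as I later reconstructed it from the C-value/NC-value literature. This is a sketch of the usual definition, not a quote from the Ahrenberg review:

```latex
% C-value of a candidate term string a (my reconstruction of the usual definition):
%   |a|    = length of a in words
%   f(a)   = frequency of a in the corpus
%   T_a    = set of longer candidate terms that contain a
%   P(T_a) = number of such longer candidates
\[
\text{C-value}(a) =
\begin{cases}
  \log_2 |a| \cdot f(a) & \text{if } a \text{ is not nested,}\\[4pt]
  \log_2 |a| \cdot \Bigl( f(a) - \tfrac{1}{P(T_a)} \sum_{b \in T_a} f(b) \Bigr) & \text{otherwise.}
\end{cases}
\]
```

Read this way, longer candidates get a boost from log2|a|, and a candidate that mostly occurs nested inside longer candidate terms has its frequency discounted by the average frequency of those longer candidates.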

Here are some end goals

  • Build a TE better than the Microsoft Azure keyword extraction (or close enough), based on all the theory I am learning
  • Explore the relationships between terms, topics, the vocabulary of a domain, ontologies, taxonomies, and knowledge graphs. These are the artifacts we may end up generating.

Some learning

  • Terms are basically nouns and noun groups (a first-approximation sketch using noun chunks follows)
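
Here is a minimal sketch of that idea as a #codingexperiment: use spaCy's built-in noun chunker to produce a first approximation of candidate terms. The sample text and frequency ranking are just illustrative, and it assumes the en_core_web_sm model is installed.

```python
# First approximation of term candidates: spaCy noun chunks.
# Assumes: python -m spacy download en_core_web_sm
from collections import Counter

import spacy

nlp = spacy.load("en_core_web_sm")

text = (
    "Term extraction identifies domain-specific terms in a corpus. "
    "Automatic term extraction often starts from noun phrases and "
    "ranks the candidate terms by frequency or termhood measures."
)

doc = nlp(text)

# Each noun chunk (a noun plus its left-side modifiers) becomes a candidate term.
candidates = Counter(chunk.text.lower() for chunk in doc.noun_chunks)

for term, freq in candidates.most_common():
    print(freq, term)
```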

Questions to Self

  • If terms are nouns and noun groups, a POS/chunking library should give us the first approximation of terms (roughly what the noun-chunk sketch above does). A possible #codingexperiment
  • Understand TF-IDF at a deeper level than just using a library (a from-scratch sketch follows this list)
  • Reread the LDA paper to figure out how terms are extracted there (look at the code of pyLDA). Also learn how terms are clustered (can I reuse that algorithm during TE?)
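
Here is a minimal from-scratch TF-IDF sketch, with toy documents and the smoothed idf of the kind scikit-learn uses by default, just to see what the library call is doing:

```python
# A minimal from-scratch TF-IDF sketch (toy documents, smoothed idf).
import math
from collections import Counter

docs = [
    "term extraction finds candidate terms",
    "tf idf weighs terms by document frequency",
    "noun chunks are candidate terms",
]

tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents does each word appear?
df = Counter()
for tokens in tokenized:
    df.update(set(tokens))

def tf_idf(tokens):
    """Return {word: tf-idf score} for one document."""
    counts = Counter(tokens)
    total = len(tokens)
    scores = {}
    for word, count in counts.items():
        tf = count / total
        # Smoothed idf, similar to scikit-learn's default formulation.
        idf = math.log((1 + n_docs) / (1 + df[word])) + 1
        scores[word] = tf * idf
    return scores

for i, tokens in enumerate(tokenized):
    top = sorted(tf_idf(tokens).items(), key=lambda kv: -kv[1])[:3]
    print(f"doc {i}: {top}")
```

Words that appear in every document get a low idf and fall to the bottom of the ranking, while words concentrated in one document rise to the top, which is the intuition the library hides behind a single fit_transform call.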