Updated next topics.
jeffheaton committed Dec 29, 2020
1 parent 3a243f8 commit dc204c0
Showing 1 changed file with 34 additions and 167 deletions.
201 changes: 34 additions & 167 deletions t81_558_class_14_05_new_tech.ipynb
@@ -35,194 +35,61 @@
"* [GitHub](https://github.com/jeffheaton) - I post all changes to GitHub.\n",
"* [Jeff Heaton's YouTube Channel](https://www.youtube.com/user/HeatonResearch) - I add new videos for this class at my channel.\n",
"\n",
"Currently, four technologies are mainly on my radar for possible future inclusion in this course:\n",
"## New Technology Radar\n",
"\n",
"* Neural Structured Learning (NSL)\n",
"* Bert, AlBert, and Other NLP Technologies\n",
"* Explainability Frameworks\n",
"Currently, these new technologies are on my radar for possible future inclusion in this course:\n",
"\n",
"This section seeks only to provide a high-level overview of these emerging technologies. I provide links to supplemental material and code in each subsection. I describe these technologies in the following sections.\n",
"\n",
"# Neural Structured Learning (NSL)\n",
"\n",
"[Neural Structured Learning (NSL)](https://www.tensorflow.org/neural_structured_learning) provides additional training information to the neural network. [[Cite:bui2018neural]](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46568.pdf) This training information is in the form of a graph that relates individual training cases (rows in your training set) among each other. This technique allows the neural networks to train with greater accuracy with a smaller number of labeled data. When you eventually use the neural network for scoring and prediction, once training completes, the neural network no longer uses the graph.\n",
"\n",
"There are two primary sources that this graph data comes from:\n",
"\n",
"* Existing Graph Relationships in Data\n",
"* Automatic Adversarial Modification of Images\n",
"\n",
"Often existing graph relationships may exist in data beyond just the labels that describe what individual data items are. Consider many photo collections. There may be collections of images placed into specific albums. This album placement can form additional training information beyond the actual image data and labels.\n",
"\n",
"Sometimes graph data cannot be directly obtained for a data set. Just because you do not have a graph NSL still might be an option. In such cases, you can make adversarial-like modifications to the data. You can introduce additional examples and linked them to the original images in the training set. This technique might make the final trained neural network more resilient to adversarial example attacks.\n",
"\n",
"Built into TF 2.0, supports any type of ANN.\n",
"\n",
"```\n",
"pip install neural_structured_learning\n",
"```\n",
"\n",
"Figure 14.NSL is from the original NSL paper. [[Cite:bui2018neural]](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46568.pdf) \n",
"\n",
"**Figure 14.NSL: Neural Structured Learning (NSL)**\n",
"![Neural Structured Learning (NSL)](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/nn_graph_input.png)\n",
"\n",
"A: An example of a graph and feature inputs. In this case, there are two labeled nodes ($x_i$, $x_j$) and one unlabeled\n",
"node ($x_k$), and two edges. The feature vectors, one for each node, are used as neural network inputs. \n",
"\n",
"B, C, and D: Illustration of Neural Graph Machine for feed-forward, convolution, and recurrent networks respectively: the training flow ensures the neural net to make accurate node-level predictions and biases the hidden representations/embeddings of neighboring nodes to be similar. In this example, we force $h_i$ and $h_j$ to be similar as there is an edge connecting $x_i$ and $x_j$ nodes. \n",
"\n",
"E: Illustration of how we can construct inputs to the neural network using the adjacency matrix. In this example, we have three nodes and two edges. The feature vector created for each node (shown on the right) has 1's at its index and indices of nodes adjacent.\n",
"\n",
"Figure 14.NSL shows that NSL can help when there are fewer training elements, it is from a TensorFlow [presentation by Google](https://www.youtube.com/watch?v=2Ucq7a8CY94). This video is a great starting point if you wish to do more with NSL.\n",
"\n",
"**Figure 14.NSL: Neural Structured Learning (NSL) Results**\n",
"![Neural Structured Learning (NSL) Results](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/neural-graph-results.png \"Neural Structured Learning (NSL) Results\")\n",
"\n",
"The x-axis shows the amount of training data available, and the y-axis shows the accuracy attained by the model. The two graphs show NSL applied to different neural network architectures. You can use NSL with most supervised neural network architectures. As the number of training elements decreases, there is a period where NSL helps keep the accuracy higher.\n",
"\n",
"\n",
"# Bert, AlBert, and Other NLP Technologies\n",
"\n",
"Natural Language Processing (NLP) has seen a tremendous number of advances in the past few years. One recent technology is Bidirectional Encoder Representations from\n",
"Transformers (BERT). [[Cite:devlin2018bert]](https://arxiv.org/pdf/1810.04805.pdf) BERT achieved \"state of the art\" results in the following key NLP benchmarks:\n",
"\n",
"* [GLUE](https://gluebenchmark.com/) [[Cite:wang2018glue]](https://arxiv.org/abs/1804.07461)\n",
"* [MultiNLI](https://www.nyu.edu/projects/bowman/multinli/) [[Cite:williams2017broad]](https://www.nyu.edu/projects/bowman/multinli/paper.pdf)\n",
"* [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) [[Cite:rajpurkar2016squad]](https://nlp.stanford.edu/pubs/rajpurkar2018squad.pdf)\n",
"\n",
"When a framework, such as BERT claims \"state of the art\" results, it is important to understand what is meant by that. The GLUE benchmark is composed of the following parts:\n",
"\n",
"* Corpus of Linguistic Acceptability (CoLA)\n",
"* Stanford Sentiment Treebank (SST-2)\n",
"* Microsoft Research Paraphrase Corpus (MRPC) \n",
"* Semantic Textual Similarity Benchmark (STS-B)\n",
"* Quora Question Pairs (QQP) \n",
"* Multi-Genre Natural Language Inference Corpus matched/mismatched (MNLI-m/MNLI-mm)\n",
"* Stanford Question Answering Dataset (QNLI)\n",
"* Recognizing Textual Entailment (RTE)\n",
"* Winograd Schema Challenge (WNLI)\n",
"\n",
"**Single Sentence Tasks**\n",
"\n",
"CoLA and SST-2 are both single sentence tasks. CoLA is made up of sample sentences from English grammar textbooks where the authors demonstrated acceptable and unacceptable English grammar usage. The task in CoLA is to classify a sentence as acceptable or unacceptable. Examples include:\n",
"\n",
"* Acceptable: The angrier Sue gets, the more Fred admires her.\n",
"* Unacceptable: The most you want, the least you eat.\n",
"\n",
"The task SST-2 is used to analyze sentiment. Sentences are classified by their degree of positivity vs negativity. For example:\n",
"\n",
"* Positive: the greatest musicians\n",
"* Negative: lend some dignity to a dumb story\n",
"\n",
"**Multi-Sentence Similarity Tasks**\n",
"* Transformers\n",
"* More Advanced Transfer Learning\n",
"* Augmentation\n",
"* Reinforcement Learning beyond TF-Agents\n",
"\n",
"The MRPC, QQP, and STS-B tasks look at similarity and paraphrase tasks. The MRPC tests the AIs ability to paraphrase. Each row contains two sentences and a target that indicates whether each pair captures a paraphrase/semantic equivalence relationship. For example, the following two sentences are considered to be equivalent:\n",
"\n",
"* He told The Sun newspaper that Mr. Hussein's daughters had British schools and hospitals in mind when they decided to ask for asylum . \n",
"* \"Saddam's daughters had British schools and hospitals in mind when they decided to ask for asylum -- especially the schools,\" he told The Sun.\n",
"\n",
"Conversely, though the following two sentences look similar, they are not considered equivalent:\n",
"\n",
"* Gyorgy Heizler, head of the local disaster unit, said the coach was carrying 38 passengers. \n",
"* The head of the local disaster unit, Gyorgy Heizler, said the coach driver had failed to heed red stoplights.\n",
"\n",
"The QQP tasks look at if two questions are asking the same thing. The Quora website provided this data. Examples of sentences that are considered to ask the same question:\n",
"\n",
"* What are the coolest Android hacks and tricks you know? \n",
"* What are some cool hacks for Android phones?\n",
"\n",
"Similarly, the following two questions are considered to be different in the QQP dataset.\n",
"\n",
"* If you received a check from Donald Knuth, what did you do, and why did you get it?\n",
"* How can I contact Donald Knuth?\n",
"\n",
"The STS-B dataset evaluates how similar two sentences are. If the target/label it 0, then the two sentences are completely dissimilar. A target value of 5 indicates that the two sentences are completely equivalent, as they\n",
"mean the same thing. For example:\n",
"\n",
"Two sentences with a label of 0:\n",
"\n",
"* A woman is dancing. \n",
"* A man is talking.\n",
"\n",
"Two sentences with a label of 5:\n",
"\n",
"* A plane is taking off. \n",
"* An airplane is taking off.\n",
"\n",
"**Inference Tasks**\n",
"\n",
"The tasks MNLI, QNLI, RTE, and WNLI are all inference tasks. The MNLI task provides two sentences that must be labeled neutral, contradiction, or entailment. For example, the following two sentences are a contradiction:\n",
"\n",
"* At the end of Rue des Francs-Bourgeois is what many consider to be the city's most handsome residential square, the Place des Vosges, with its stone and red brick facades.\n",
"* Place des Vosges is constructed entirely of gray marble.\n",
"\n",
"These two sentences are an entailment:\n",
"\n",
"* I burst through a set of cabin doors, and fell to the ground- \n",
"* I burst through the doors and fell down. \n",
"\n",
"These two sentences are neutral:\n",
"\n",
"* It's not that the questions they asked weren't exciting or legitimate (though most did fall under the already asked and answered). \n",
"* All of the questions were interesting according to a focus group consulted on the subject. \n",
"\n",
"The QNLI task poses a question and supporting sentence. The label states if the supporting information can answer the question. For example, the following two are labeled as \"not_entailment\":\n",
"\n",
"* Question: Which missile batteries often have individual launchers several kilometers from one another? \n",
"* Answer: When MANPADS is operated by specialists, batteries may have several dozen teams deploying separately in small sections; self-propelled air defense guns may deploy in pairs. \n",
"\n",
"Similarly, the following two sentences are labeled as \"entailment\":\n",
"\n",
"* Question: What two things does Popper argue Tarski's theory involves in an evaluation of truth? \n",
"* Answer: He bases this interpretation on the fact that examples such as the one described above refer to two things: assertions and the facts to which they refer. \n",
"\n",
"The RTE task is similar and looks at whether one sentence entails another. For example, the following two sentences are labeled as \"entailment\":\n",
"\n",
"* Lin Piao, after all, was the creator of Mao's \"Little Red Book\" of quotations. \n",
"* Lin Piao wrote the \"Little Red Book\".\n",
"\n",
"Similarly, the following two sentences are labeled as \"not_entailment\".\n",
"This section seeks only to provide a high-level overview of these emerging technologies. I provide links to supplemental material and code in each subsection. I describe these technologies in the following sections.\n",
"\n",
"* Oil prices fall back as Yukos oil threat lifted \n",
"* Oil prices rise.\n",
"Transformers are a relatively new technology that I will soon add to this course. They have resulted in many NLP applications. Projects such as the Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT-1,2,3) received much attention from practitioners. Transformers allow the sequence to sequence machine learning, allowing the model to utilize variable length, potentially textual, input. The output from the transformer is also a variable-length sequence. This feature enables the transformer to learn to perform such tasks as translation between human languages or even complicated NLP-based classification. Considerable compute power is needed to take advantage of transformers; thus, you should be taking advantage of transfer learning to train and fine-tune your transformers.\n",
"\n",
"The WNLI task checks to see if two sentences agree to a third. For example, the following two agree:\n",
"Complex models can require considerable training time. It is not unusual to see GPU clusters trained for days to achieve state of the art results. This complexity requires a substantial monetary cost to train a state of the art model. Because of this cost, you must consider transfer learning. Services, such as Hugging Face and NVIDIA GPU Cloud (NGC), contains many advanced pretrained neural networks for you to implement.\n",
"\n",
"* The foxes are getting in at night and attacking the chickens. They have gotten very bold. \n",
"* The foxes have gotten very bold.\n",
"Augmentation is a technique where algorithms generate additional training data augmenting the training data with new items that are modified versions of the original training data. This technique has seen many applications to computer vision. In this most basic example, the algorithm can flip images vertically and horizontally to quadruple the training set's size. Projects, such as NVIDIA StyleGAN2 ADA have implemented augmentation to substantially decrease the amount of training data that the algorithm needs.\n",
"\n",
"Similarly, the following two do not agree:\n",
"Currently, this course makes use of TF-Agents to implement reinforcement learning. TF-Agents is convenient because it is based on TensorFlow. However, TF-Agents has been slow to update compared to other frameworks. Additionally, when TF-Agents is updated, internal errors are often introduced that can take months for the TF-Agents team to fix. When I compare simple \"Hello World\" type examples for Atari games on platforms like Stable Baselines, to their TF-Agents equivilants, I am left wanting more from TF-Agents.\n",
"\n",
"* Sam pulled up a chair to the piano, but it was broken, so he had to stand instead. \n",
"* The piano was broken, so he had to stand instead.\n",
"## Programming Language Radar\n",
"\n",
"As a machine learning programming language, Python has an absolute lock on the industry. Python is not going anywhere, any time soon. My main issue with Python is end-to-end deployment. Unless you are dealing with Jupyter notebooks or training/pipeline scripts, Python will be your go-to language. However, to create edge applications, such as web pages and mobile apps, you will certainly need to utilize other languages. I do not suggest replacing Python with any of the following languages; however, these are some alternative languages and domains that you might choose to use them.\n",
"\n",
"** BERT High-Level Overview **\n",
"* **IOS Application Development** - Swift\n",
"* **Android Development** - Kotlin and Java\n",
"* **Web Development** - NodeJS and JavaScript\n",
"* **Mac Application Development** - Swift or JavaScript with Electron or React Native\n",
"* **Windows Application Development** - C# or JavaScript with Electron or React Native\n",
"* **Linux Application Development** - C/C++ w with Tcl/Tk or JavaScript with Electron or React Native\n",
"\n",
"BERT makes use of both pretraining and fine-tuning before it is ready to be used to evaluate data. It is important to understand the different roles of these two functions.\n",
"\n",
"* **Pretraining** - Ideally, this is done once per language. Pretraining is the portion of BERT that most will simply obtain from the original BERT model. These can be [downloaded here](https://github.com/google-research/bert).\n",
"* **Fine-Tuning** - This is where additional layers are added to the base BERT models to adapt it to the intended task.\n",
"## What About PyTorch?\n",
"\n",
"Figure 14.BERT-1 summarizes the pretraining and fine-tuning phases of BERT.\n",
"Technical folks love debates that can reach levels of fervor generally reserved for religion or politics. Python and TensorFlow are approaching this level of spirited competition. There is no clear winner, at least at this point. Why did I base this class on Keras/TensorFlow, as opposed to PyTorch? There are two primary reasons. The first reason is a fact; the second is my opinion.\n",
"\n",
"**Figure 14.BERT-1: Pretraining and Fine-Tuning Phases of BERT**\n",
"![NSL Results](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/bert-1.png)\n",
"PyTorch was not available in early 2016 when I introduced/developed this course.\n",
"PyTorch exposes lower-level details that would be distracting for an applications of deep learning course.\n",
"I recommend being familiar with core deep learning techniques and being adaptable to switch between these two frameworks.\n",
"\n",
"You present sentences to BERT in the method demonstrated in Figure 14.BERT-2.\n",
"## Where to From Here?\n",
"\n",
"**Figure 14.BERT-2: Sentences Presented to BERT**\n",
"![NSL Results](https://raw.githubusercontent.com/jeffheaton/t81_558_deep_learning/master/images/bert-2.png)\n",
"\n",
"So whats next? Here are some some ideas.\n",
"\n",
"* [Google CoLab Pro](https://colab.research.google.com/signup) - If you need more GPU power; but are not yet ready to buy a GPU of your own.\n",
"* [TensorFlow Certification](https://www.tensorflow.org/certificate)\n",
"* [Coursera](https://www.coursera.org/)\n",
"\n",
"# Explainability Frameworks\n",
"I really hope that you have enjoyed this course. If you have any suggestions for improvement or technology suggestions, please contact me. This course is always evolving, and I invite you to subscribe to my [YouTube channel](https://www.youtube.com/user/HeatonResearch) for my latest updates. I also frequently post videos beyond the scope of this course, so the channel itself is a good next step. Thank you very much for your interest and focus on this course. Other social media links for me include:\n",
"\n",
"Neural networks are notorious as black-box models. Such a model may make accurate predictions; however, explanations of why the black box model chose what it did can be elusive. There are two explainability libraries that I occasionally use:\n",
"* [Jeff Heaton GitHub](https://github.com/jeffheaton)\n",
"* [Jeff Heaton Twitter](https://twitter.com/jeffheaton)\n",
"* [Jeff Heaton Medium](https://medium.com/@heatonresearch)\n",
"\n",
"* [Lime](https://github.com/marcotcr/lime)\n",
"* [Explain it to Me Like I'm 5 (ELI5)](https://eli5.readthedocs.io/en/latest/)\n"
"\n"
]
},
{