With the incredible growth of Language AI in recent years, capturing everything in a single book (even with 400 pages!) is near impossible. That does not mean one should not try and come up with a creative solution to this problem of size and growth.
We decided to cover the fundamentals of LLMs in the book which left us with an interesting opportunity. Using the book as a starting point, we could continue creating visual/illustrative content that explore certain topics in-depth that could not be covered that way in the book.
All bonus materials enhance the book through the same visual and illustrative style of the book
After reading the book, you are ready to go through these more complex topics through highly visual, detailed, and in-depth guides. Because you are already familiar with our illustrative styles, reading through these advanced concepts should be a breeze!
A Visual Guide to Quantization (Extends Chapters 7 and 12)
In the book, we cover the concept of quantization and showcase how it can be used to reduce computational requirements for both training and fine-tuning.
A Visual Guide to Quantization dives deeper into the technical intricacies of the technique, building upon the foundational concepts we introduced in the book.
The guide explore various quantization methods, such as as post-training quantization and quantization-aware training. It even showcases BitNet, a method by which we can get ternary parameters (3 values!).
By extending the book's coverage, readers can quickly get up to speed with the advanced topics covered in the guide. It serves as an excellent next step in the reading material.
A Visual Guide to Mamba (An "Alternative" Chapter 3)
Instead of extending Chapter 3, let's "replace" it! Or more accurately, let's provide an alternative.
A Visual Guide to Mamba and State Space Models provides an exciting alternative to the transformer architecture discussed in Chapter 3. While the chapter focuses on decoder-based transformer models, this blog introduces readers to a fundamentally different approach to sequence modeling.
By exploring this alternative architecture, the blog complements Chapter 3 by showing you there is more to this field than transformers. Exciting hybrid architectures even start combining Transformer- and Mamba blocks.
A Visual Guide to Mixture of Experts (Extends Chapter 3)
Chapter 3 gives a solid foundation for traditional transformer decoders and how they work. We also cover more advanced topics like efficient attention and advancements in positional embeddings.
One topic that we did not discuss in detail that has been gaining traction is called Mixture of Experts. It is a technique that enhances these transformer decoders by incorporating multiple specialized sub-networks or "experts."
A Visual Guide to Mixture of Experts (MoE) illustrates how MoE models dynamically route different inputs to the most appropriate experts, allowing for more efficient processing of diverse language tasks.
As another bonus, the blog continues into the field of vision language models:
By connecting this advanced concept to the decoder fundamentals covered in Chapter 3, readers can quickly dive into the seemingly complex but in practice straightforward method of Mixture of Experts.
The Illustrated Stable Diffusion (Extends Chapters 9)
In Chapter 9, we cover the Vision Transformer (ViT) and explore how text and images can be modeled together through CLIP. We showcase how CLIP can be used to bring images to textual models, allowing them to "reason" about images as well as text.
But CLIP can also be used for the opposite, bringing text to vision models, allowing to use the input text as guidance for the images they create.
The Illustrated Stable Diffusion explores the role of CLIP in stable diffusion which it does in the highly visual way that you can expect having read the book.
It further illustrates stable diffusion's inner working, showing how it combines CLIP's capabilities with advanced diffusion models to create high-quality images from text prompts.
The book focuses mostly on generating text but the process of generating images has significant overlap. The illustrated guide complements Chapter 9 well as more architectures, technologies and methods enter different domains and modalities.