{"text": " Hi everyone. So today we are once again continuing our implementation of MakeMore. Now so far we've come up to here, multilayer perceptrons, and our neural net looked like this, and we were implementing this over the last few lectures. Now I'm sure everyone is very excited to go into recurrent neural networks and all of their variants and how they work, and the diagrams look cool and it's very exciting and interesting, and we're going to get a better result. But unfortunately I think we have to remain here for one more lecture. And the reason for that is we've already trained this multilayer perceptron, right, and we are getting pretty good loss, and I think we have a pretty decent understanding of the architecture and how it works. But the line of code here that I take an issue with is here, loss.backward. That is, we are taking PyTorch autograd and using it to calculate all of our gradients along the way. And I would like to remove the use of loss.backward, and I would like us to write our backward pass manually on the level of tensors. And I think that this is a very useful exercise for the following reasons. I actually have an entire blog post on this topic, but I'd like to call backpropagation a leaky abstraction. And what I mean by that is backpropagation doesn't just make your neural networks just work magically. It's not the case that you can just stack up arbitrary Lego blocks of differentiable functions and just cross your fingers and backpropagate and everything is great. Things don't just work automatically. It is a leaky abstraction in the sense that you can shoot yourself in the foot if you do not understand its internals. It will magically not work or not work optimally. And you will need to understand how it works under the hood if you're hoping to debug it and if you are hoping to address it in your neural net. So this blog post here from a while ago goes into some of those examples. So for example, we've already covered them, some of them already. For example, the flat tails of these functions and how you do not want to saturate them too much because your gradients will die. The case of dead neurons, which I've already covered as well. The case of exploding or exploding gradients in the case of recurring neural networks, which we are about to cover. And then also you will often come across some examples in the wild. This is a snippet that I found in a random code base on the internet where they actually have like a very subtle but pretty major bug in their implementation. And the bug points at the fact that the author of this code does not actually understand backpropagation. So what they're trying to do here is they're trying to clip the loss at a certain maximum value. But actually what they're trying to do is they're trying to clip the gradients to have a maximum value instead of trying to clip the loss at a maximum value. And indirectly, they're basically causing some of the outliers to be actually ignored. Because when you clip the loss of an outlier, you are setting its gradient to 0. And so have a look through this and read through it. But there's basically a bunch of subtle issues that you're going to avoid if you actually know what you're doing. And that's why I don't think it's the case that because PyTorch or other frameworks offer autograd, it is okay for us to do. ignore how it works. Now, we've actually already covered autograd and we wrote micrograd, but micrograd was an autograd engine only on the level of individual scalars. 
And that's why I don't think that, just because PyTorch or other frameworks offer autograd, it is okay for us to ignore how it works. Now, we've actually already covered autograd and we wrote micrograd, but micrograd was an autograd engine only on the level of individual scalars — the atoms were single individual numbers — and, you know, I don't think that's enough. I'd like us to basically think about backpropagation on the level of tensors as well. So in summary, I think it's a good exercise, and I think it is very, very valuable: you're going to become better at debugging neural networks and making sure that you understand what you're doing; it is going to make everything fully explicit, so you're not going to be nervous about what is hidden away from you; and basically, in general, we're going to emerge stronger. And so let's get into it. A bit of a fun historical note here is that today, writing your backward pass by hand is not recommended, and no one does it except for the purposes of exercise. But about 10 years ago in deep learning, this was fairly standard and in fact pervasive: at the time, everyone used to write their backward pass by hand, including myself. It's just what you would do. So we used to write the backward pass by hand, and now everyone just calls loss.backward — we've lost something. I want to give you a few examples of this. Here's a 2006 paper from Geoff Hinton and Ruslan Salakhutdinov in Science that was influential at the time, training architectures called restricted Boltzmann machines — basically, an autoencoder is trained here. And this code is from roughly 2010: I had a library for training restricted Boltzmann machines, written at the time in Matlab. Python was not used pervasively for deep learning; it was all Matlab, the scientific computing package that everyone would use. Matlab is barely a programming language, frankly, but it had a very convenient tensor class, and it was a computing environment where everything ran on the CPU, but you had very nice plots to go with it and a built-in debugger, and it was pretty nice. Now, the code in this package from 2010 that I wrote for fitting restricted Boltzmann machines is, to a large extent, recognizable. I'm creating the data in the x, y batches; I'm initializing the neural net, so it's got weights and biases just like we're used to; and then this is the training loop, where we actually do the forward pass. At this time, people didn't even necessarily use backpropagation to train neural networks: this in particular implements contrastive divergence, which estimates a gradient, and then here we take that gradient and use it for a parameter update along the lines that we're used to. But you can see that people were meddling with these gradients directly, inline, themselves — it wasn't that common to use an autograd engine. Here's one more example, from a paper of mine from 2014 called Deep Fragment Embeddings, where I was aligning images and text. And it was standard to implement not just the cost but also the backward pass manually. So here I'm calculating the image embeddings and the sentence embeddings, I calculate the scores, this is the loss function, and then once I have the loss function, I do the backward pass right here: I backward through the loss function and through the neural net, and I append regularization.
So everything was done by hand, manually — you would just write out the backward pass, and then you would use a gradient checker to make sure that your numerical estimate of the gradient agrees with the one you calculated during backpropagation. So this was very standard for a long time. Today, of course, it is standard to use an autograd engine. But it was definitely useful, and I think people understood how these neural networks work on a very intuitive level. And so I think it's a good exercise again, and this is where we want to be. Okay, so just as a reminder from our previous lecture, this is the Jupyter notebook that we implemented at the time. We're going to keep everything the same: we're still going to have a two-layer multilayer perceptron with a batch normalization layer, so the forward pass will be basically identical to that lecture. But here, we're going to get rid of loss.backward, and instead we're going to write the backward pass manually. Now, here's the starter code for this lecture: we are becoming a backprop ninja in this notebook. The first few cells here are identical to what we are used to — we're doing some imports, loading in the data set, and processing the data set; none of this changed. Now, here I'm introducing a utility function that we're going to use later to compare the gradients. In particular, we are going to have the gradients that we estimate manually ourselves, and we're going to have the gradients that PyTorch calculates, and we're going to be checking for correctness — assuming, of course, that PyTorch is correct. I'll show a sketch of this utility below. Then here we have the initialization that we are quite used to: our embedding table for the characters, the first layer, the second layer, and a batch normalization in between, and here's where we create all the parameters. Now, you will note that I changed the initialization a little bit, to use small numbers. Normally, you would set the biases to be all zero; here, I'm setting them to be small random numbers. I'm doing this because if your variables are initialized to exactly zero, sometimes that can mask an incorrect implementation of a gradient: when everything is zero, things simplify, and you get a much simpler expression for the gradient than you would otherwise. So by making everything small numbers, I'm trying to unmask those potential errors in these calculations.
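Roughly, that comparison utility looks something like this — a sketch based on the description above (the exact print formatting may differ):

```python
import torch

def cmp(s, dt, t):
    # compare our manually computed gradient dt against PyTorch's t.grad
    ex = torch.all(dt == t.grad).item()          # exact equality, reduced to one boolean
    app = torch.allclose(dt, t.grad)             # approximate equality (floating point wiggle)
    maxdiff = (dt - t.grad).abs().max().item()   # largest absolute difference
    print(f'{s:15s} | exact: {str(ex):5s} | approximate: {str(app):5s} | maxdiff: {maxdiff}')
```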
You'll also notice that I'm using b1 in the first layer — a bias, despite the batch normalization right afterwards. This would typically not be what you'd do, because we talked about the fact that you don't need a bias there, but I'm doing it here just for fun, because we're going to have a gradient with respect to it, and we can check that we are still calculating it correctly even though this bias is spurious. So here I'm sampling a single batch, and then here I am doing a forward pass. Now, you'll notice that the forward pass is significantly expanded from what we are used to — here, the forward pass was just this. The reason the forward pass is longer is twofold. Number one, here we just had an F.cross_entropy, but here I am bringing back an explicit implementation of the loss function. And number two, I've broken up the implementation into manageable chunks, so we have many more intermediate tensors along the way in the forward pass (I'll put a sketch of this blown-up cross-entropy below). And that's because we are about to go backwards, from the bottom to the top, and calculate the gradients in this backpropagation. Just like we have, for example, the logprobs tensor in the forward pass, in the backward pass we're going to have a dlogprobs, which is going to store the derivative of the loss with respect to the logprobs tensor. So we're going to be prepending d to every one of these tensors and calculating it along the way of this backpropagation. As an example: we have a bnraw here, so we're going to be calculating a dbnraw. Here I'm telling PyTorch that we want to retain the grad of all these intermediate values, because in exercise one we're going to calculate the backward pass: we're going to calculate all these d variables and use the cmp function introduced above to check our correctness against what PyTorch is telling us. This is going to be exercise one, where we backpropagate through this entire graph. Now, just to give you a very quick preview of what's going to happen in exercise two and below: here, we have fully broken up the loss and backpropagated through it manually in all the little atomic pieces that make it up. But in exercise two, we're going to collapse the loss into a single cross-entropy call, and instead we're going to analytically derive, using math and pencil and paper, the gradient of the loss with respect to the logits. Instead of backpropagating through all of its little chunks one at a time, we're just going to analytically derive what that gradient is and implement it, which is much more efficient, as we'll see in a bit. Then we're going to do the exact same thing for batch normalization: instead of breaking batchnorm up into all its little tiny components, we're going to use pen and paper and calculus to derive the gradient through the batchnorm layer, so we calculate the backward pass through the batchnorm layer as a much more efficient expression, instead of backpropagating through all of its little pieces independently. That's going to be exercise three. And then in exercise four, we're going to put it all together — this is the full code of training this two-layer MLP — and we're going to basically insert our manual backprop and take out loss.backward. And you will basically see that you can get all the same results using fully your own code; the only thing we're using from PyTorch is torch.Tensor, to make the calculations efficient. But otherwise, you will understand fully what it means to forward and backward a neural net and train it, and I think that'll be awesome. So let's get to it.
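For reference, here is a sketch of that blown-up cross-entropy portion of the forward pass, following the notebook's variable names (logits comes out of the second linear layer, n is the batch size, Yb are the target indices):

```python
# forward pass, broken into small, backprop-friendly chunks
logit_maxes = logits.max(1, keepdim=True).values  # row-wise max, for numerical stability
norm_logits = logits - logit_maxes                # subtract max (broadcast over columns)
counts = norm_logits.exp()
counts_sum = counts.sum(1, keepdim=True)
counts_sum_inv = counts_sum**-1                   # 1/counts_sum, written as ** -1
probs = counts * counts_sum_inv
logprobs = probs.log()
loss = -logprobs[range(n), Yb].mean()
```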
Okay, so I ran all the cells of this notebook all the way up to here, and I'm going to erase this and start implementing the backward pass, starting with dlogprobs. So we go here to calculate the gradient of the loss with respect to all the elements of the logprobs tensor. Now, I'm going to give away the answer here, but I wanted to put in a quick note: what I think would be most pedagogically useful for you is to actually go into the description of this video and find the link to this Jupyter notebook. You can find it both on GitHub and as a Google Colab, so you don't have to install anything — you just go to a website, and you can try to implement these derivatives or gradients yourself. And then, if you get stuck, come back to my video and see me do it. So work in tandem: try it first yourself, and then watch me give away the answer. I think that'll be most valuable to you, and that's how I recommend you go through this lecture. So we are starting here with dlogprobs, which will hold the derivative of the loss with respect to all the elements of logprobs. What is inside logprobs? Its shape is 32 by 27. So it's not going to surprise you that dlogprobs should also be an array of size 32 by 27, because we want the derivative of the loss with respect to all of its elements — the sizes of those are always going to be equal. Now, how does logprobs influence the loss? The loss is negative logprobs indexed with [range(n), Yb], and then the mean of that. Just as a reminder, Yb is basically an array of all the correct indices — the labels — one per row. So what we're doing here is taking the logprobs array, of size 32 by 27, going into every single row, and plucking out the entry at the index given by Yb: 8, then 14, then 15, and so on. We go down the rows — that's the iterator range(n) — and we always pluck out the column specified by this tensor Yb. So in the zeroth row we take the eighth column, in the first row we take the 14th column, etc. And so logprobs indexed like this plucks out all the log probabilities of the correct next character in the sequence. That's what this does, and its shape — or size — is of course 32, because our batch size is 32. So these elements get plucked out, and then their mean, negated, becomes the loss. Now, I always like small examples to understand the numerical form of the derivative. What's going on here is: once we've plucked out these examples, we take the mean and then the negative. So the loss, if I can write it this way, is the negative of, say, a plus b plus c, divided by three — that would be the mean of three numbers a, b, c — although we actually have 32 numbers here. So what is dloss by, say, da? Well, if we simplify this expression mathematically, it is (-1/3)·a plus (-1/3)·b plus (-1/3)·c, and so dloss/da is just -1/3. And you can see that if we don't just have a, b, and c, but 32 numbers, then dloss with respect to every one of those numbers is going to be -1/n, where n is more generally the size of the batch — 32 in this case. So dloss/dlogprobs is -1/n in all these plucked-out places. Now, what about the other elements inside logprobs?
Because logprobs is a large array: logprobs.shape is 32 by 27, but only 32 of the entries participate in the loss calculation. So what's the derivative for all the other elements, the ones that do not get plucked out here? Well, their gradient, intuitively, is zero — and that's because they did not participate in the loss. Most of the numbers inside this tensor do not feed into the loss, so if we were to change those numbers, the loss doesn't change, which is the equivalent of saying that the derivative of the loss with respect to them is zero. They don't impact it. So here's a way to implement this derivative. We start out with torch.zeros of shape 32 by 27 — or, instead of hard-coding numbers, let's do torch.zeros_like(logprobs), which creates an array of zeros in exactly the shape of logprobs. Then we need to set the derivative of -1/n inside exactly those plucked-out locations. So here's what we can do: dlogprobs, indexed in the identical way, will just be set to -1.0/n, just like we derived here. So now let me erase all of this reasoning — and this is our candidate derivative for dlogprobs. Let's uncomment the first line and check that it is correct. Okay, so cmp ran, and let's go back to cmp. You see what it's doing: it's checking whether the value we calculated, dt, is exactly equal to t.grad as calculated by PyTorch — making sure all of the elements are exactly equal, and then converting that to a single Boolean value, because we don't want a Boolean tensor, just a Boolean value. And then it checks: if they're not exactly equal, maybe they are approximately equal, up to floating point issues. Here we are using torch.allclose, which allows a little bit of wiggle room, because if you use a slightly different calculation, floating point arithmetic can give you a slightly different result — so this checks for an approximately close result. And then we check the maximum difference: basically the value with the highest difference, and the absolute difference between those two. So we print whether we have exact equality, approximate equality, and the largest difference. And here we see that we actually have exact equality — so of course we also have approximate equality, and the maximum difference is exactly zero. So basically, our dlogprobs is exactly equal to what PyTorch calculated logprobs.grad to be in its backpropagation. So far, we're doing pretty well.
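Putting that first step in one place, the implementation just described looks like this (notebook variable names assumed):

```python
# -1/n at the plucked-out (row, Yb) positions, 0 everywhere else
dlogprobs = torch.zeros_like(logprobs)
dlogprobs[range(n), Yb] = -1.0 / n
cmp('logprobs', dlogprobs, logprobs)  # exact: True
```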
Okay, so let's now continue our backpropagation. We have that logprobs depends on probs through a log: log is applied element-wise to all the elements of probs. Now, if we want dprobs, then remember your micrograd training: we have a log node, it takes in probs and creates logprobs, and dprobs will be the local derivative of that individual operation — log — times the derivative of the loss with respect to its output, which in this case is dlogprobs. So what is the local derivative of this operation? Well, we are taking log element-wise, and we can come here and see — Wolfram Alpha is your friend — that d/dx of log(x) is simply 1/x. In this case x is probs, so the local derivative is 1/probs, and then times, chain rule, dlogprobs. Let me uncomment this and run the cell in place, and we see that the derivative of probs, as we calculated it here, is exactly correct. And notice how this works: probs gets inverted and then element-wise multiplied. So if your probs is very, very close to 1 — meaning your network is currently predicting the character correctly — then this becomes 1 over 1, and dlogprobs just gets passed through. But if your probabilities are incorrectly assigned — if the correct character here is getting a very low probability — then 1.0 divided by it will boost this, and then multiply by dlogprobs. So basically, what this line is doing, intuitively, is taking the examples that currently have a very low probability assigned and boosting their gradient. You can look at it that way.
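In code, that chain-rule step is just (again assuming the notebook's variables):

```python
# d(log x)/dx = 1/x, element-wise, chained with the upstream gradient
dprobs = (1.0 / probs) * dlogprobs
cmp('probs', dprobs, probs)
```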
Next up is counts_sum_inv — we want the derivative of this. Now, let me just pause here and introduce what's happening in general, because I know it's a little bit confusing. We have the logits that come out of the neural net. Here, I'm finding the maximum in each row and subtracting it, for the purpose of numerical stability: we talked about how, if you do not do this, you run into numerical issues if some of the logits take on too-large values, because we end up exponentiating them. So this is done just for safety, numerically. Then here's the exponentiation of all the normalized logits to create our counts, and then we take the sum of these counts and normalize, so that all of the probs sum to 1. Now, here, instead of using 1 / counts_sum, I raise it to the power of -1. Mathematically they are identical; I just found that there was something wrong with the PyTorch implementation of the backward pass of division — it gives a slightly weird result — and that doesn't happen for ** -1, so I'm using this formula instead. But basically, all that's happening here is: we take the logits, exponentiate them, and normalize the counts to create our probabilities — it's just happening across multiple lines. So now we want to backpropagate into counts_sum_inv, and then into counts as well. What should dcounts_sum_inv be? Now, we actually have to be careful here, because we have to scrutinize the shapes: counts.shape and counts_sum_inv.shape are different. In particular, counts is 32 by 27, but counts_sum_inv is 32 by 1. And so in this multiplication here, there is also an implicit broadcasting that PyTorch will do, because it needs to take this column tensor of 32 numbers and replicate it horizontally 27 times, to align these two tensors for an element-wise multiply. So what this really looks like, using a toy example again, is the following: we have probs = counts * counts_sum_inv, so c = a * b, but a is 3 by 3 and b is just 3 by 1, a column tensor. And PyTorch internally replicated the elements of b: for example b1, the first element of b, would be replicated across all the columns of its row in this multiplication. And now we're trying to backpropagate through this operation into counts_sum_inv. When we calculate this derivative, it's important to realize that this looks like a single operation, but it is actually two operations applied sequentially. The first operation PyTorch did is take this column tensor and replicate it across all the columns, basically 27 times — so that's a replication — and the second operation is the multiplication. So let's first backprop through the multiplication. If these two arrays were of the same size — if we just had a and b, both 3 by 3 — how would we backpropagate through a multiplication? If we just have scalars, c = a * b, then the derivative of c with respect to b is just a. So here in our case, backpropagating through just the multiplication itself, which is element-wise, the local derivative is simply counts, because counts is the a — and then times, because of the chain rule, dprobs. So counts * dprobs is the gradient, but with respect to a replicated b. We don't have a replicated b, though; we just have a single b column. So how do we backpropagate through the replication? Intuitively, this b1 is the same variable, just reused multiple times, and so you can look at it as equivalent to a case we've encountered in micrograd. Here I'm just pulling out a random graph we used in micrograd: we had an example where a single node's output feeds into two branches of the graph until the loss function, and we talked about how the correct thing to do in the backward pass is to sum all the gradients that arrive at any one node — across those different branches, the gradients sum. So if a node is used multiple times, the gradients for all of its uses sum during backpropagation. Here, b1 is used multiple times, in all these columns, and therefore the right thing to do is to sum horizontally: we sum in dimension 1, but we want keepdim=True, so we don't lose this dimension, and counts_sum_inv and its gradient have exactly the same shape, 32 by 1. Revealing this comparison as well and running it, we see that we get an exact match — so this derivative is exactly correct. And let me erase this. Now let's also backpropagate into counts, the other variable here that creates probs. From probs into counts_sum_inv we just did; let's go into counts as well. counts is our a, and dc/da is just b — so dcounts is counts_sum_inv times, chain rule, dprobs. Now, counts_sum_inv is 32 by 1 and dprobs is 32 by 27, so those will broadcast fine and give us dcounts; there's no additional summation required here. There will be a broadcast happening in this multiply — counts_sum_inv needs to be replicated again to correctly multiply dprobs — but that's going to give the correct result.
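And in code, the two pieces from this multiplication (dcounts here is only the first of two branches, so we can't check it against PyTorch yet):

```python
# probs = counts * counts_sum_inv, where counts_sum_inv (32x1) was broadcast across 27 columns
dcounts_sum_inv = (counts * dprobs).sum(1, keepdim=True)  # replication forward -> sum backward
dcounts = counts_sum_inv * dprobs  # first branch only; a second branch arrives via counts_sum
cmp('counts_sum_inv', dcounts_sum_inv, counts_sum_inv)
```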
So, as far as this single operation is concerned, we've backpropagated from probs into counts — but we can't actually check the derivative of counts yet; I have that much later on. And the reason for that is that counts_sum_inv depends on counts, so there's a second branch here that we have to finish: counts_sum_inv backpropagates into counts_sum, and counts_sum will backpropagate into counts. And so counts is a node that is being used twice: it's used right here in probs, and it goes through this other branch, through counts_sum_inv. So even though we've calculated the first contribution of it, we still have to calculate the second contribution later. Okay, so we're continuing with this branch. We have the derivative for counts_sum_inv; now we want the derivative for counts_sum. So dcounts_sum equals — what is the local derivative of this operation? This is basically an element-wise 1 over counts_sum: counts_sum raised to the power of -1 is the same as 1 / counts_sum. If we go to Wolfram Alpha, we see that d/dx of x^-1 is -x^-2 (negative 1 over x squared is the same as -x^-2). So dcounts_sum will be the local derivative, -counts_sum^-2, times, chain rule, dcounts_sum_inv. That's dcounts_sum. Let's uncomment this and check that I am correct. Okay, so we have perfect equality, and there's no sketchiness going on here with any shapes, because these are of the same shape.
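That step, written out:

```python
# d/dx (x**-1) = -x**-2, element-wise; shapes match (both 32x1), so no broadcasting care needed
dcounts_sum = (-counts_sum**-2) * dcounts_sum_inv
cmp('counts_sum', dcounts_sum, counts_sum)
```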
Okay, next up we want to backpropagate through this line: counts_sum is counts.sum along the rows. So I wrote out some help here. We have to keep in mind that counts is, of course, 32 by 27, and counts_sum is 32 by 1, so in this backpropagation we need to take this column of derivatives and transform it into a two-dimensional array of derivatives. So what is this operation doing? We're taking some kind of input — say, a 3 by 3 matrix a — and we are summing up the rows into a column tensor b: b1, b2, b3. We have the derivatives of the loss with respect to the elements of b, and now we want the derivative of the loss with respect to all the little a's. So how do the b's depend on the a's is basically what we're after: what is the local derivative of this operation? Well, we can see here that b1 only depends on the elements in its row: the derivative of b1 with respect to all the elements down here is 0, but for the elements in its row — a11, a12, etc. — the local derivative is 1. db1/da11, for example, is 1; so it's 1, 1, and 1. So when we have the derivative of the loss with respect to b1, in the chain rule we multiply the local derivative by the derivative of b1 — and because the local derivative is 1 on these three elements, that product is just the derivative of b1. And so you can look at it as a router: basically, an addition is a router of gradient. Whatever gradient comes from above just gets routed equally to all the elements that participate in that addition. So in this case, the derivative of b1 will flow equally to the derivatives of a11, a12, and a13. So if we have the derivatives of all the elements of b in this column tensor, which we calculated just now, what that amounts to is that they all flow to the elements of a, and they do so horizontally. So basically, we want to take dcounts_sum, of size 32 by 1, and replicate it 27 times horizontally, to create a 32 by 27 array. There are many ways to implement this operation. You could, of course, just replicate the tensor, but I think maybe one clean one is: dcounts is simply torch.ones_like(counts) — a two-dimensional array of ones in the shape of counts, 32 by 27 — times dcounts_sum. This way, we're letting broadcasting basically implement the replication; you can look at it that way. But then we also have to be careful, because dcounts was already partially calculated — we calculated it earlier here, and that was just the first branch; we're now finishing the second branch. So we need to make sure that these gradients add: plus equals. And then let's uncomment the comparison and make sure, crossing fingers, that we have the correct result. So PyTorch agrees with us on this gradient as well. Okay, hopefully we're getting the hang of this now. counts is an element-wise exp of norm_logits, so now we want dnorm_logits — and because it's an element-wise operation, everything is very simple. The local derivative of e^x is famously just e^x, and we already calculated that: it's inside counts. So we may as well just reuse counts — that is the local derivative — times dcounts. Funny as it looks, counts * dcounts is dnorm_logits. Now let's erase this, and let's verify, and... there we go. So that's dnorm_logits.
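Collecting those two steps in code:

```python
# a sum in the forward pass routes the gradient back equally to every element that fed it,
# so the (32x1) dcounts_sum gets replicated across the 27 columns; += accumulates the second branch
dcounts += torch.ones_like(counts) * dcounts_sum
# counts = norm_logits.exp(), and d/dx e**x = e**x, which we already have as counts
dnorm_logits = counts * dcounts
cmp('counts', dcounts, counts)
cmp('norm_logits', dnorm_logits, norm_logits)
```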
Okay, so we are here on this line now: we have dnorm_logits, and we're trying to calculate dlogits and dlogit_maxes, backpropagating through this line. Now, we have to be careful here, because the shapes, again, are not the same, and so there's an implicit broadcasting happening: dnorm_logits has shape 32 by 27, dlogits does as well, but dlogit_maxes is only 32 by 1 — so there's a broadcast here in the minus. Here I tried to write out a toy example again: we basically have c = a - b, and we see that because of the shapes, these are 3 by 3, but b is just a column. And if we look at how every element of c came to be, every element of c is just the corresponding element of a minus the associated b. So it's very clear that the derivative of each c with respect to its inputs is 1 for the corresponding a, and -1 for the corresponding b. And so therefore, the derivatives on c will flow equally to the corresponding a's, and also to the corresponding b's — but the b's are broadcast, so we'll have to do the additional sum, just like we did before, and of course the derivatives for the b's undergo a minus, because the local derivative there is -1: dc32/db3 is -1. So let's just implement that. dlogits will exactly copy the derivative on norm_logits: dlogits equals dnorm_logits, and I'll do a .clone() for safety, so we're just making a copy. And then dlogit_maxes will be the negative of dnorm_logits, because of the negative sign. And then we have to be careful, because dlogit_maxes is a column, and just like we saw before, because we keep replicating the same elements across all the columns, in the backward pass these are all just separate branches of use of that one variable. And so therefore we have to do a sum along dimension 1, with keepdim=True, so that we don't destroy this dimension — and then dlogit_maxes will be the correct shape. Now, we also have to be careful, because this dlogits is not the final dlogits: not only do we get gradient signal into logits through here, but logit_maxes is itself a function of logits, and that's a second branch into logits. So this is not yet our final derivative for logits; we will come back later for the second branch. For now, dlogit_maxes is the final derivative, so let me uncomment this cmp here and run it. And dlogit_maxes — PyTorch agrees with us. So that was the derivative through this line. Now, before we move on, I want to pause here briefly and look at these logit_maxes, and especially their gradients. We talked in the previous lecture about how the only reason we're doing this is for the numerical stability of the softmax that we are implementing here. And we talked about how, if you take the logits for any one of these examples — so, one row of this logits tensor — and you add or subtract any value equally to all the elements, then the values of the probs are unchanged: you're not changing the softmax. The only thing this is doing is making sure that exp doesn't overflow, and the reason we're using the max is that we are then guaranteed that the highest number in each row of logits is zero, and so this is safe. And so basically that has repercussions: if it is the case that changing logit_maxes does not change the probs, and therefore does not change the loss, then the gradient on logit_maxes should be zero — saying those two things is the same. So indeed, we hope these are very, very small numbers; indeed, we hope this is zero. Now, because of floating point wonkiness, this doesn't come out exactly zero — only in some of the rows it does — but we get extremely small values, like 1e-9 or 1e-10. And so this is telling us that the values of logit_maxes are not impacting the loss, as they shouldn't. It feels kind of weird to backpropagate through this branch, honestly: if you have an implementation of F.cross_entropy in PyTorch, and you lump together all of these elements rather than doing the backpropagation piece by piece, then you would probably assume that the derivative through here is exactly zero, and you would skip this branch, because it's only done for numerical stability. But it's interesting to see that even if you break everything up into its full atoms, and you still do the computation with the numerical-stability trick in place, the correct thing happens: you still get very, very small gradients here, basically reflecting the fact that these values do not matter with respect to the final loss.
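For reference, the step we just implemented and checked:

```python
# norm_logits = logits - logit_maxes: the gradient passes through to logits unchanged,
# and flows negated into logit_maxes, summed over the broadcast dimension
dlogits = dnorm_logits.clone()  # first branch into logits; max() will add a second one below
dlogit_maxes = (-dnorm_logits).sum(1, keepdim=True)
cmp('logit_maxes', dlogit_maxes, logit_maxes)
```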
Okay, so let's now continue backpropagation through this line here: we've just calculated dlogit_maxes, and now we want to backprop into logits through this second branch. Here, of course, we took logits and we took the max along all the rows. Now, the way this works in PyTorch is that max returns both the values and the indices at which those maximum values occurred. In the forward pass, we only used the values, because that's all we needed — but in the backward pass, it's extremely useful to know where those maximum values occurred, and we have the indices at which they occurred. This will of course help us do the backpropagation, because what should the backward pass be here? We have the logits tensor, which is 32 by 27; in each row we find the maximum value, and then that value gets plucked out into logit_maxes. And so intuitively, the derivative flowing through here should be 1 — the local derivative is 1 for the appropriate entry that was plucked out — times the global derivative of logit_maxes. So really, what we're doing here, if you think it through, is that we need to take dlogit_maxes and scatter it to the correct positions in logits: the positions the maximum values came from. And so I came up with one line of code that does that. Let me just erase a bunch of stuff here. You could do it very similarly to what we've done before, where we create zeros and then populate the correct elements — using the indices and setting them to 1. But you can also use one-hot: F.one_hot, taking logits.max over the first dimension, .indices, and telling PyTorch that the dimension of each of these one-hot vectors should be 27. And so what this is going to do is — okay, I apologize, this is a little crazy: plt.imshow of this — it's really just an array of where the maxes came from in each row: that element is 1, and all the other elements are 0. So it's a one-hot vector in each row, and these indices populate a single 1 in the proper place. And then what I'm doing here is multiplying by dlogit_maxes — and keep in mind that this is a column of 32 by 1 — so in this multiplication dlogit_maxes will broadcast, that column gets replicated, and then the element-wise multiply ensures that each value just gets routed to whichever one of these bits is turned on. And so that's another way to implement this kind of operation; both of these can be used, I just thought I would show an equivalent way to do it. And I'm using plus equals, because we already calculated the first part of dlogits here, and this is now the second branch. So let's look at logits and make sure that this is correct. And we see that we have exactly the correct answer.
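That one-liner, written out:

```python
import torch.nn.functional as F

# scatter dlogit_maxes back to the argmax position in each row (second branch into logits)
dlogits += F.one_hot(logits.max(1).indices, num_classes=logits.shape[1]) * dlogit_maxes
cmp('logits', dlogits, logits)
```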
Next up, we want to continue backpropagating into the inputs of this line: logits is the outcome of a matrix multiplication and a bias offset in this linear layer. So I've printed out the shapes of all these intermediate tensors. We see that logits is of course 32 by 27, as we've just seen; then h here is 32 by 64, so these are 64-dimensional hidden states; the W2 matrix projects those 64-dimensional vectors into 27 dimensions; and then there's a 27-dimensional offset, b2, which is a one-dimensional vector. Now, we should note that this plus here actually broadcasts, because h multiplied by W2 gives us a 32 by 27, and then we add b2, a 27-dimensional vector. In the rules of broadcasting, what's going to happen with this bias vector is that the one-dimensional vector of 27 gets left-aligned with a padded dimension of 1 on the left, so it basically becomes a row vector, then gets replicated vertically 32 times to make it 32 by 27, and then there's an element-wise add. Now the question is: how do we backpropagate from logits into the hidden states h, the weight matrix W2, and the bias b2? And you might think that we need to go to some matrix calculus and look up the derivative of a matrix multiplication — but actually, you don't have to do any of that. You can go back to first principles and derive this yourself on a piece of paper. Specifically, what I like to do, and what I find works well for me, is to find a specific small example that you then fully write out; in the process of analyzing how that individual small example works, you will understand the broader pattern, and you'll be able to generalize and write out the full general formula for how these derivatives flow in an expression like this. So let's try that out. Pardon the low-budget production here, but what I've done is write it out on a piece of paper. Really, what we are interested in is: we have a multiplied by b, plus c, and that creates d; we have the derivative of the loss with respect to d, and we'd like to know what the derivatives of the loss are with respect to a, b, and c. Now, these here are little two-dimensional examples of matrix multiplication: a 2 by 2 times a 2 by 2, plus a vector of just two elements, c1 and c2, gives me a 2 by 2. Now, notice that I have a bias vector here called c — c1 and c2 — and as I described over here, that bias vector will become a row vector in the broadcasting and replicate vertically. That's what's happening here as well: c1, c2 is replicated vertically, and we see how we have two rows of c1, c2 as a result. Now, when I say write it out, I just mean like this: basically, break up this matrix multiplication into the actual thing that's going on under the hood. As a result of how matrix multiplication works, d11 is the result of a dot product between the first row of a and the first column of b: a11·b11 + a12·b21 + c1 — and so on and so forth for all the other elements of d. And once you actually write it out, it becomes obvious that it's just a bunch of multiplies and adds, and we know from micrograd how to differentiate multiplies and adds. So this is not scary anymore — it's not "matrix multiplication", it's just tedious, unfortunately. But it is completely tractable: we have dL/dd for all of these, and we want dL with respect to all these other little variables. So how do we achieve that; how do we actually get the gradients? Okay, so the low-budget production continues here. Let's, for example, derive the derivative of the loss with respect to a11. We see here that a11 occurs twice in our simple expression — right here, and right here — and influences d11 and d12. So what is dL/da11? Well, it's dL/dd11 times the local derivative of d11 with respect to a11, which in this case is just b11, because that's what's multiplying a11 here. And likewise, the local derivative of d12 with respect to a11 is just b12, and so b12 will, in the chain rule, multiply dL/dd12. And then, because a11 is used to produce both d11 and d12, we need to add up the contributions of both of those chains running in parallel — that's why we get a plus, adding up those two contributions — and that gives us dL/da11.
We can do the exact same analysis for all the other elements of a, and when you simply write it out, it's just super-simple differentiation of expressions like this. You find that for this matrix dL/da that we're after — if we just arrange all the derivatives in the same 2 by 2 shape that a takes — we can actually express what we've written out here as a matrix multiply. It just so happens that all of these formulas that we've derived by taking gradients can be expressed as a matrix multiplication, and in particular, it is the matrix multiplication of dL/dd with b — but b transpose, actually. You see that b21 and b12 have changed places, whereas before we had, of course, b11, b12, b21, b22 — so this other matrix, b, is transposed. And so basically, long story short, just by doing very simple reasoning — breaking up the expression in the case of a very simple example — we find that dL/da is simply dL/dd matrix-multiplied with b transpose. So that is what we have so far. Now, we also want the derivatives with respect to b and c. Now, for b, I'm not actually doing the full derivation, because honestly, it's not deep — it's just annoying, it's exhausting; you can do this analysis yourself. You'll find that if you take these expressions and differentiate with respect to b instead of a, dL/db is also a matrix multiplication: in this case, you have to take the matrix a, transpose it, and matrix-multiply that with dL/dd. And that's what gives you dL/db. And then here, for the offsets c1 and c2: if you again just differentiate with respect to c1, you will find an expression like this, and for c2 an expression like this. And basically you'll find that, because they're just offsetting these expressions, dL/dc is simply the matrix of derivatives dL/dd summed vertically, down each column — over the examples. And that gives you the derivatives for c. So, long story short: the backward pass of a matrix multiply is a matrix multiply. Just like we had d = a·b + c in the scalar case, we arrive at something very, very similar, but now with matrix multiplications instead of scalar multiplications: the derivative with respect to a is dL/dd matrix-multiply b transpose, and here it's a transpose matrix-multiply dL/dd. In both cases, it's a matrix multiplication of the output derivative with the other term in the multiplication — and for c, it is a sum.
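Written compactly, the result of this derivation, for D = AB + c with the bias c broadcast across the rows:

```latex
\frac{\partial L}{\partial A} = \frac{\partial L}{\partial D}\, B^{\top},
\qquad
\frac{\partial L}{\partial B} = A^{\top}\, \frac{\partial L}{\partial D},
\qquad
\frac{\partial L}{\partial c} = \sum_{\text{rows}} \frac{\partial L}{\partial D}
```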
Now, I'll tell you a secret: I can never remember the formulas we just derived for backpropagating through a matrix multiplication, and yet I can backpropagate through these expressions just fine. And the reason this works is that the dimensions have to work out. So let me give you an example. Say I want to create dh. What should dh be? Number one, I know that the shape of dh must be the same as the shape of h, which is 32 by 64. And the other piece of information I know is that dh must be some kind of matrix multiplication of dlogits with W2: dlogits is 32 by 27 and W2 is 64 by 27. There is only a single way to make the shapes work out in this case, and it is indeed the correct result. In particular, dh needs to be 32 by 64, and the only way to achieve that is to take dlogits and matrix-multiply it with — you see how I have to take W2, but I have to transpose it to make the dimensions work out: W2 transpose. It is the only way to matrix-multiply those two pieces to make the shapes work out, and that turns out to be the correct formula. So if we come back here, we want dh, which is our da, and we saw that da is dL/dd matrix-multiply b transpose — so that's dlogits multiplied by... and b is W2, so W2 transpose — which is exactly what we have here. So there is no need to remember these formulas. Similarly, now if I want dW2: well, I know that it must be some matrix multiplication of dlogits and h, and maybe there's a transpose in there as well, and I don't remember which way it goes. So I come to W2, and I see that its shape is 64 by 27, and that this has to come from some matrix multiplication of those two. And so to get a 64 by 27, I need to take h and transpose it — that becomes 64 by 32 — and then matrix-multiply it with the 32 by 27 dlogits, and that gives me a 64 by 27. So dW2 is h transpose matrix-multiplied with dlogits: that's the only way to make the dimensions work out, using just matrix multiplication. And if we come back to the paper derivation, we see that's exactly what's there: a transpose — for us, h transpose — multiplied with dlogits. And then db2 is just the vertical sum — and actually, in the same way, there's only one way to make the shapes work out. I don't have to remember that it's a vertical sum along the 0th axis, because that's the only way this makes sense: b2's shape is 27, and dlogits is 32 by 27, so knowing that db2 is just a sum over dlogits in some direction, that direction must be 0, because I need to eliminate this dimension. So it's this. So this is kind of like the hacky way. Let me copy-paste and delete that, and let me swing over here — and this is our backward pass for the linear layer, hopefully. So now let's uncomment these three lines, and we're checking that we got all three derivatives correct... and run... and we see that h, W2, and b2 are all exactly correct. So we backpropagated through a linear layer.
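In code, with the shapes forcing each formula:

```python
# shapes: dlogits (32x27), W2 (64x27), h (32x64), b2 (27,)
dh = dlogits @ W2.T    # (32x27) @ (27x64) -> (32x64), matches h
dW2 = h.T @ dlogits    # (64x32) @ (32x27) -> (64x27), matches W2
db2 = dlogits.sum(0)   # sum over the batch dimension -> (27,), matches b2
cmp('h', dh, h)
cmp('W2', dW2, W2)
cmp('b2', db2, b2)
```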
Now, next up we have the derivative for h already, and we need to backpropagate through the tanh, into hpreact. So we want to derive dhpreact, and here we have to backpropagate through a tanh. We've already done this in micrograd, and we remember that tanh has a very simple backward formula. Now, unfortunately, if I just put d/dx of tanh(x) into Wolfram Alpha, it lets us down: it tells us that it's the hyperbolic secant function of x, squared. Not exactly helpful. But luckily, Google image search does not let us down, and it gives us the simpler formula: in particular, if a = tanh(z), then da/dz, backpropagating through the tanh, is just 1 - a². And take note that in 1 - a², the a here is the output of the tanh, not the input z — da/dz is formulated in terms of the output of that tanh. And also on Google image search we have the full derivation, if you want to take the actual definition of tanh and work through the math to figure out why it's 1 - tanh²(z). So 1 - a² is the local derivative. In our case, that is 1 minus the output of the tanh, squared — which here is h, so it's 1 - h². That is the local derivative, and then times, chain rule, dh. So that is going to be our candidate implementation. If we come here, and then uncomment this, and hope for the best... we have the right answer.
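That one-liner:

```python
# tanh backward, in terms of its *output*: h = tanh(hpreact) => dtanh/dhpreact = 1 - h**2
dhpreact = (1.0 - h**2) * dh
cmp('hpreact', dhpreact, hpreact)
```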
Okay, next up we have dhpreact, and we want to backpropagate into the gain, the bnraw, and the bias. These are the batchnorm parameters, bngain and bnbias, inside the batchnorm, which take bnraw — which is exactly unit Gaussian — and scale and shift it. Now, here we have a multiplication, but it's worth noting that this multiply is very, very different from the matrix multiply earlier: a matrix multiply is dot products between rows and columns of the matrices involved, whereas this is an element-wise multiply, so things are quite a bit simpler. Now, we do have to be careful with some of the broadcasting happening in this line of code, though. You see how bngain and bnbias are 1 by 64, but hpreact and bnraw are 32 by 64 — so we have to be careful with that and make sure that all the shapes work out and that the broadcasting is correctly backpropagated. So in particular, let's start with dbngain. This is again an element-wise multiply, and whenever we have a * b = c, we saw that the local derivative with respect to a is just b, the other one. So this local derivative is just bnraw, and then times, chain rule, dhpreact. So this is the candidate gradient. Now, again, we have to be careful, because bngain is of size 1 by 64, but this product would be 32 by 64. The correct thing to do here, of course, is to note that bngain is a row vector of 64 numbers that gets replicated vertically in this operation — and therefore, because it's being replicated, all the gradients in each of the rows that are now flowing backwards need to sum up into that same tensor dbngain. So we have to sum across dimension 0 — across all the examples, basically — which is the direction in which this gets replicated. And we also have to be careful, because bngain is of shape 1 by 64, so in fact I need keepdim=True, otherwise I would just get 64. Now, I don't actually really remember why I made bngain and bnbias 1 by 64, while the biases b1 and b2 I made one-dimensional vectors, not two-dimensional tensors — I can't recall exactly why I left the gain and the bias as two-dimensional — but it doesn't really matter, as long as you are consistent and keep it the same. So in this case, we want to keep the dimension so that the tensor shapes work. Next up we have bnraw: dbnraw will be bngain multiplying dhpreact — that's our chain rule. Now, what about the dimensions of this? We have to be careful, right? dhpreact is 32 by 64, bngain is 1 by 64, so it will just get replicated to create this multiplication — which is the correct thing, because in the forward pass it also gets replicated in just the same way. So in fact we don't need any extra work here; we're done, and the shapes are already correct. And finally, for the bias: this bias here is very, very similar to the bias we saw in the linear layer, and we see that the gradients from dhpreact will simply flow into the biases and add up, because these are just offsets. So basically, we want dbnbias to be dhpreact, but it needs to sum along the right dimension — and in this case, similar to the gain, we need to sum across dimension 0, the examples, because of the way the bias gets replicated vertically, and we also want keepdim=True. So this will basically take dhpreact, sum it up, and give us a 1 by 64. So this is the candidate implementation; it makes all the shapes work. Let me bring it down here, and then let me uncomment these three lines to check that we are getting the correct result for all three tensors — and indeed, we see that all of them got backpropagated correctly. So now we get to the batchnorm layer itself. We see how here bngain and bnbias are the parameters, so the backpropagation ends there, but bnraw now is the output of the standardization. So here, what I'm doing, of course, is breaking up the batchnorm into manageable pieces, so we can backpropagate through each line individually. Basically, what's happening is: bnmeani is the mean (I apologize for the variable naming); bndiff is x minus mu; bndiff2 is (x minus mu) squared, here inside the variance; and bnvar is the variance, so sigma squared — basically the sum of the squares, (x - mu)², and then normalized. Now, you'll notice one departure here: in the paper, it is normalized as 1/m, where m is the number of examples, but here I'm normalizing as 1/(n-1) instead. This is deliberate, and I'll come back to it in a bit, when we are at that line — it is something called Bessel's correction — but this is how I want it in our case. bnvar_inv then becomes basically bnvar plus epsilon — epsilon is 1e-5 — and then 1 over the square root, which is the same as raising to the power of -0.5, because 0.5 is square root and the negative makes it one-over. So bnvar_inv is 1 over this denominator here. And then we can see that bnraw — which is the x-hat here — is equal to bndiff, the numerator, multiplied by bnvar_inv. And this line here that creates hpreact was the last piece; we've already backpropagated through it. So now what we want to do is: we are here, we have dbnraw, and we have to backpropagate through this line, into bndiff and bnvar_inv. Now, I've written out the shapes here, and indeed bnvar_inv is of shape 1 by 64, so there is a little bit of broadcasting happening here that we have to be careful with — but it is just an element-wise simple multiplication, and by now we should be pretty comfortable with that. To get dbndiff, we know that it's just bnvar_inv multiplied with dbnraw, and conversely, to get dbnvar_inv, we need to take bndiff and multiply that by dbnraw. So these are the candidates, but of course we need to make sure that broadcasting is obeyed. In particular, bnvar_inv multiplying with dbnraw will be okay, and give us 32 by 64, as we expect; but for dbnvar_inv, we would be taking a 32 by 64 and multiplying it by a 32 by 64, giving a 32 by 64 — and of course this bnvar_inv is only 1 by 64. So this second line here needs a sum across the examples, and because of this 1-dimension, we need to make sure that keepdim is True. So this is the candidate. Let's erase this, let's swing down here and implement it, and then let's uncomment dbnvar_inv and dbndiff. Now, you'll actually notice that dbndiff, by the way, is going to be incorrect: when I run this, bnvar_inv is correct, but bndiff is not — and this is actually expected, because we're not done with bndiff.
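The steps so far through the batchnorm's scale-and-shift and the first multiplication:

```python
# hpreact = bngain * bnraw + bnbias; bngain and bnbias (1x64) broadcast over the 32 rows,
# so their gradients sum over dimension 0
dbngain = (bnraw * dhpreact).sum(0, keepdim=True)
dbnbias = dhpreact.sum(0, keepdim=True)
dbnraw = bngain * dhpreact  # broadcasting here mirrors the forward pass
# bnraw = bndiff * bnvar_inv, with bnvar_inv (1x64) broadcast over the rows
dbnvar_inv = (bndiff * dbnraw).sum(0, keepdim=True)
dbndiff = bnvar_inv * dbnraw  # incomplete: a second branch into bndiff arrives via bnvar
```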
In particular, when we slide up here, we see that bnraw is a function of bndiff, but bnvar_inv is a function of bnvar, which is a function of bndiff2, which is a function of bndiff too (these variable names are crazy, I'm sorry). So bndiff branches out into two branches, and we've only done one of them; we have to continue our backpropagation and eventually come back to bndiff, and then we'll be able to do a += and get the actual correct gradient. For now, it is good to verify that cmp also works: it doesn't just lie to us and tell us that everything is always correct; it can in fact detect when your gradient is not correct. So that's good to see as well.

Okay, so now we have the derivative here, and we're trying to backpropagate through this line. Because we're raising to the power of -0.5, I brought up the power rule, and we see that for dbnvar we bring down the exponent: -0.5 times the base (bnvar plus epsilon), now raised to the power of -0.5 minus 1, which is -1.5. We would also have to apply a small chain rule in our heads for the derivative of the expression inside the bracket with respect to bnvar, but because adding epsilon is an element-wise operation, that derivative is simply 1, and there's nothing more to do there. So this is the local derivative, and then times the global derivative dbnvar_inv to complete the chain rule. Let me bring this down, uncomment the check, and we see that we have the correct result.
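The power-rule line in code (a sketch; 1e-5 is the epsilon from the forward pass):

    # backward through bnvar_inv = (bnvar + 1e-5)**-0.5
    dbnvar = (-0.5 * (bnvar + 1e-5)**-1.5) * dbnvar_inv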
Before we backpropagate through the next line, I want to briefly talk about this node here, where I use Bessel's correction, dividing by n-1 instead of n when I normalize the sum of squares. You'll notice that this is a departure from the paper, which uses 1/m (their m is our n). It turns out there are two ways of estimating the variance of an array: the biased estimate, which divides by n, and the unbiased estimate, which divides by n-1. Now, confusingly, the paper does not describe this very clearly, and it's a detail that kind of matters, I think: they use the biased version at training time, but later, when they talk about inference, they mention that they use the unbiased estimate, the n-1 version, to calibrate the running mean and the running variance. So they actually introduce a train-test mismatch, where training uses the biased version and test time uses the unbiased version. I find this extremely confusing. You can read more about Bessel's correction and why dividing by n-1 gives a better estimate of the variance when your samples from a population are very small, and that is indeed the case for us: we are dealing with mini-batches, and these mini-batches are a small sample of a larger population, the entire training set. It turns out that if you estimate the variance using 1/n, you almost always underestimate it; it is a biased estimator, and it is advised that you use the unbiased version and divide by n-1. There's an article I liked that describes the full line of reasoning, and I'll link it in the video description.

Now, when you calculate variance in torch, you'll notice that torch.var takes an unbiased flag for whether you want to divide by n or by n-1. I believe unbiased defaults to true, and I'm not sure why the docs don't state that clearly. And in the BatchNorm1d documentation, things are again kind of wrong and confusing: it says the standard deviation is calculated via the biased estimator, but that's not exactly right, and people have pointed this out in a number of issues since, because the rabbit hole is deeper. They actually follow the paper exactly: they use the biased version for training, but when they estimate the running standard deviation they use the unbiased version, so again there's the train-test mismatch. Long story short, I'm not a fan of train-test discrepancies, and I basically consider the fact that we use the biased version at training time and the unbiased version at test time to be a bug; I don't think there's a good reason for it, and the paper doesn't really go into the detail of the reasoning behind it. I prefer to use Bessel's correction in my own work. Unfortunately, BatchNorm does not take a keyword argument that lets you choose the unbiased or the biased version consistently for both train and test, and so, in my view, anyone using batch normalization basically has a bit of a bug in their code. It turns out to be much less of a problem when your batch sizes are a bit larger, but I still find it unpalatable. Maybe someone can explain why this is okay; for now, I prefer to use the unbiased version consistently, both during training and at test time, and that's why I'm using 1/(n-1) here.

Okay, so let's now actually backpropagate through this line. The first thing I always like to do is scrutinize the shapes. In particular, bnvar.shape is 1 by 64, so it's a row vector, and bndiff2.shape is 32 by 64, so clearly we're doing a sum over the 0th axis to squash the first dimension. That right away hints that there will be some kind of replication or broadcasting in the backward pass. Maybe you're noticing the pattern here: any time you have a sum in the forward pass, it turns into a replication or broadcasting in the backward pass along the same dimension; and conversely, a replication or broadcasting in the forward pass indicates a variable reuse, which in the backward pass turns into a sum over the exact same dimension. Hopefully you're noticing that duality: the two are opposites of each other in the forward and backward pass.

Once we understand the shapes, the next thing I like to do is work through a toy example in my head, just to roughly understand how the variable dependencies go in the mathematical formula. Here we have a two-dimensional array, bndiff2, which we scale by a constant and then sum vertically over the columns. So if we have a 2 by 2 matrix a, and we sum over the columns and scale, we get a row vector b1, b2, where b1 is the scaled sum of the first column of a and b2 is the scaled sum of the second. Looking at this, we have the derivatives on b1 and b2, and we want to backpropagate them into the a's.
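Here is that toy example written out as code (my own illustration; the names are made up):

    import torch
    a = torch.randn(2, 2)
    b = a.sum(0, keepdim=True) / (a.shape[0] - 1)    # forward: sum down the columns, then scale
    db = torch.randn(1, 2)                           # pretend upstream gradient on b
    da = torch.ones_like(a) / (a.shape[0] - 1) * db  # backward: the sum turns into a broadcast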
And so it's clear, just differentiating in your head, that the local derivative is 1/(n-1) for each one of the a's, and the derivative of b1 has to flow back through its column of a, scaled by 1/(n-1). That's roughly what's happening here. So intuitively, the derivative flow tells us that dbndiff2 will be the local derivative of this operation times the chain rule. There are many ways to write this, but I like to do something like torch.ones_like(bndiff2): I create a two-dimensional array of ones and scale it by 1.0/(n-1), so it's an array of 1/(n-1), which is the local derivative, and then for the chain rule I simply multiply by dbnvar. Notice what happens here: the ones array is 32 by 64 and dbnvar is just 1 by 64, so I'm letting broadcasting do the replication, because internally in PyTorch dbnvar, a 1 by 64 row vector, will get copied vertically in this multiplication until the two are the same shape, and then there's an element-wise multiply. So the broadcasting is doing the replication, and I end up with the derivatives dbndiff2. This is the candidate solution; let's bring it down, uncomment the line where we check it, hope for the best, and indeed we see that this is the correct formula.

Next up, let's differentiate into bndiff. Here, bndiff is element-wise squared to create bndiff2, so this is a relatively simple derivative, because it's a simple element-wise operation, kind of like the scalar case: if this is x squared, then the derivative is 2x, so the local derivative is simply 2 times bndiff, and then times the chain rule, and these are all the same shape. That's the backward pass for this variable. Let me bring it down here: I had already calculated a partial dbndiff, so this is just the other branch coming back to bndiff, because bndiff was already backpropagated into way over here from bnraw. We've now completed the second branch, and that's why I have to do +=. If you recall, we had an incorrect derivative for bndiff before, and I'm hoping that once we add this last missing piece, we get exact correctness. Let's run... and dbndiff now shows the exact correct derivative. That's comforting.

Okay, so let's now backpropagate through this line here, bndiff = hprebn - bnmeani. The first thing we do, of course, is check the shapes, and I wrote them out: dbndiff is 32 by 64, hprebn is the same shape, but bnmeani is a row vector, 1 by 64. So this minus will actually do broadcasting, and we have to be careful with that. Again, because of the duality, a broadcast in the forward pass means a variable reuse, and therefore there will be a sum in the backward pass. Backpropagating into hprebn first: these are the same shape, and the local derivative for each element is just 1, so the gradient simply copies over; it's basically a variable assignment. I'm just going to clone the tensor, for safety, to create an exact copy of dbndiff.
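Collecting those steps (a sketch; shapes as in the notebook):

    # backward through bnvar = 1/(n-1) * bndiff2.sum(0, keepdim=True)
    dbndiff2 = (1.0 / (n - 1)) * torch.ones_like(bndiff2) * dbnvar  # broadcasting replicates dbnvar down the rows
    # backward through bndiff2 = bndiff**2; this completes the second branch into bndiff
    dbndiff += (2 * bndiff) * dbndiff2
    # backward through bndiff = hprebn - bnmeani, first branch: the gradient simply copies
    dhprebn = dbndiff.clone()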
Then, to backpropagate into bnmeani, what I'm inclined to write is: dbnmeani is the local derivative, which is negative torch.ones_like of bndiff, times the upstream derivative dbndiff; and then, because bnmeani was replicated by the broadcasting, I still have to backpropagate through the replication, and I do that with a sum over the 0th dimension. But if you scrutinize this, you'll notice that the ones tensor has the same shape as dbndiff, so multiplying an array of ones by dbndiff doesn't actually do anything; I can just sum negative dbndiff directly, and that is equivalent. So this is the candidate backward pass. Let me copy it here, comment out the checks, run... and it's wrong. Damn. Actually, sorry, this is supposed to be wrong: we are backpropagating from bndiff into hprebn, but we're not done, because bnmeani also depends on hprebn, and there will be a second portion of the derivative coming from that second branch. So we expect dhprebn to be incorrect for now, and there you go.

So let's now backpropagate from bnmeani into hprebn. Here again we have to be careful, because there's a sum along the 0th dimension in the forward pass (bnmeani is the mean over the examples), so this turns into a broadcast in the backward pass. I'm going to go a little faster on this line, because it's very similar to a line we had before, multiple lines in the past, in fact. The gradient on dbnmeani gets scaled by 1/n and then flows across all the rows and deposits itself into dhprebn. So what we want is dbnmeani scaled by 1/n; let me put the constant up front to scale down the gradient, and then we need to replicate it across all the rows. I like to do that with torch.ones_like of hprebn and let broadcasting do the work of replication, so we can += this second contribution into dhprebn. The ones_like multiply is the broadcasting and the 1/n is the scaling, and this should be correct.

Okay, so that completes the backpropagation through the batch norm. Let's now backpropagate through linear layer 1. Because everything is getting a little vertically crazy, I copy-pasted the line here, and let's just backpropagate through this one line. First, of course, we inspect the shapes: dhprebn is 32 by 64, embcat is 32 by 30, W1 is 30 by 64, and b1 is just 64. As I mentioned, backpropagating through linear layers is fairly easy, just by matching the shapes. dembcat should be some matrix multiplication of dhprebn with W1, with a transpose thrown in there: to make dembcat come out 32 by 30, I take dhprebn, which is 32 by 64, and multiply it by W1 transposed. And to get dW1 I need to end up with 30 by 64, so I take embcat transposed and multiply it by dhprebn.
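In code (a sketch; I keep dbnmeani as 1 by 64 with keepdim=True to match bnmeani's shape, though broadcasting would also forgive a plain 64):

    # backward through bnmeani = 1/n * hprebn.sum(0, keepdim=True), second branch into hprebn
    dbnmeani = (-dbndiff).sum(0, keepdim=True)
    dhprebn += (1.0 / n) * torch.ones_like(hprebn) * dbnmeani  # the forward sum becomes a broadcast
    # backward through hprebn = embcat @ W1 + b1; the shapes make the transposes unambiguous
    dembcat = dhprebn @ W1.T     # (32,64) @ (64,30) -> (32,30)
    dW1 = embcat.T @ dhprebn     # (30,32) @ (32,64) -> (30,64)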
And finally, to get db1: this is an addition, and we saw that I basically just need to sum the elements of dhprebn along some dimension. To make the dimensions work out, I sum along the 0th axis to eliminate that dimension, and here we do not keep dims, because we want a plain one-dimensional vector of 64. So these are the claimed derivatives; let me put them here, uncomment the three check lines, and cross our fingers... everything is great.

Okay, so we continue; we're almost there. We have the derivative of embcat, and we want to backpropagate into emb. I again copied the forward line over here, and here are the shapes: embcat is 32 by 30, and the original shape of emb was 32 by 3 by 10. This layer in the forward pass, as you recall, did the concatenation of the three 10-dimensional character embeddings, and now we just want to undo that. This is actually a relatively simple operation, because the backward pass of a view is easy: a view is just a re-representation of the array, a logical form of how you interpret the underlying storage. So let's just re-interpret the gradient as what it was before. In other words, demb is not 32 by 30; it is dembcat viewed in the original shape, and you can just pass the tuple emb.shape into view. We re-represent the view, uncomment the check line here, and hopefully... yes, the derivative of emb is correct. So in this case we just had to re-represent the shape of the derivatives back into the original view.

So now we are at the final line, and the only thing left to backpropagate through is this indexing operation, emb = C[Xb]. I copy-pasted the line here; let's look at the shapes of everything involved and remind ourselves how this worked. emb.shape was 32 by 3 by 10: 32 examples, 3 characters each, and a 10-dimensional embedding for each character. This was achieved by taking the lookup table C, which has 27 possible characters, each with a 10-dimensional embedding, and looking up the rows specified by the tensor Xb. Xb is 32 by 3, and for each example it gives the identity, the index, of each character in that example. Here I'm showing the first five rows of Xb, and we can see, for example, that for the first example in this batch, the characters 1, 1, and 4 come into the neural net, and we want to predict the next character in the sequence that follows 1, 1, 4. So basically, there are integers inside Xb, and each one of these integers specifies which row of C we want to pluck out; we then arrange those plucked-out rows into this 32 by 3 by 10 tensor, just packaging them in. And now we have demb: for every one of those plucked-out rows, we have its gradient, arranged inside the 32 by 3 by 10 tensor. All we have to do is route this gradient backwards through the indexing: find which row of C each of these 10-dimensional embeddings came from, and deposit its gradient into dC. We just need to undo the indexing. And of course, if any of the rows of C were used multiple times, which is almost certainly the case (the row for character 1 appears multiple times, for instance), then the gradients that arrive there have to add up.
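Those two easy pieces in code (a sketch):

    # backward through the + b1: sum over the batch; b1 is one-dimensional, so no keepdim
    db1 = dhprebn.sum(0)
    # backward through embcat = emb.view(32, -1): a view is free, so just view the gradient back
    demb = dembcat.view(emb.shape)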
So for each occurrence, we do an addition. Let's now write this out, and I don't actually know of a much better way to do this than a for loop in Python, unfortunately; maybe someone can come up with a vectorized, efficient operation, but for now let's just use for loops. Let me create torch.zeros_like(C), a 27 by 10 tensor of all zeros, and then iterate: for k in range(Xb.shape[0]) and for j in range(Xb.shape[1]), which runs over all the elements of Xb, all those integers. Then we get the index at this position, ix, which is just Xb[k, j], so for example 1 or 4 and so on. In the forward pass we took the row of C at ix and deposited it into emb[k, j]; that's where it got packaged. So now we go backwards: we take demb at position [k, j], the 10-dimensional derivative for that position, and route it into the correct row of C, so dC at ix gets +='d with it, with += because the same row could have been used many, many times, and all those derivatives go backwards through the indexing and add up. This is my candidate solution. Let's copy it here, uncomment the check, and cross our fingers... yay. So that's it: we've backpropagated through this entire beast.
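Here is that loop, plus a vectorized alternative that I believe is equivalent (the index_add_ line is my own addition, not from the lecture, so treat it as a sketch):

    dC = torch.zeros_like(C)  # 27 x 10
    for k in range(Xb.shape[0]):
        for j in range(Xb.shape[1]):
            ix = Xb[k, j]         # which row of C was plucked out in the forward pass
            dC[ix] += demb[k, j]  # rows can repeat, so gradients must accumulate
    # vectorized sketch: dC.index_add_(0, Xb.view(-1), demb.view(-1, C.shape[1]))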
So now we come to exercise 2. It basically turns out that in this first exercise we were doing way too much work; we were backpropagating way too much. It was all good practice and so on, but it's not what you would do in practice, and the reason is, for example, that here I separated out the loss calculation over multiple lines and broke it up into its smallest atomic pieces, and we backpropagated through all of those individually. But if you just look at the mathematical expression for the loss, you can do the differentiation on pen and paper, a lot of terms cancel and simplify, and the expression you end up with is significantly shorter and easier to implement than backpropagating through all the little pieces of everything you've done. Before, we had this complicated forward pass going from the logits to the loss, but in PyTorch everything can be glued together into a single call, F.cross_entropy: you just pass in the logits and the labels, and you get the exact same loss, as I verify here; our previous loss and the fast loss agree. The glued-together version, as a single mathematical expression, is much faster in the forward pass, and also much, much faster in the backward pass, because if you differentiate its mathematical form, you end up with a very small and short expression. So that's what we want to do here: in a single operation, in a single go, get directly to dlogits. We need to implement dlogits as a function of the logits and the labels Yb, and it will be significantly shorter than what we did before, where to get to dlogits we had to go through all of this work; all of that can be skipped with a much, much simpler mathematical expression that you can implement here. So give it a shot yourself: look at exactly what the mathematical expression of the loss is, and differentiate it with respect to the logits.

Let me show you a hint; you can of course try it fully yourself, but if not, here is how to get started mathematically. What's happening here is: we have the logits; then there's the softmax, which takes the logits and gives you probabilities; then we use the identity of the correct next character to pluck out a row of those probabilities; we take the negative log of it to get our negative log probability; and then we average up all the negative log probabilities to get our loss. So for a single individual example, loss = -log(p_y), where p is thought of as the vector of all the probabilities and y is the label, and p is of course the softmax: the i-th component is p_i = e^{l_i} / sum_j e^{l_j}, raising all the logits to the power of e and normalizing so everything sums to one. If you write this out, what we're interested in is the derivative of the loss with respect to the i-th logit: d/dl_i of -log( e^{l_y} / sum_j e^{l_j} ). So potentially give it a shot with pen and paper, and see if you can derive the expression for dloss/dl_i and implement it here.

Okay, so I'm going to give away the result. This is some of the math I did to derive the gradients analytically, and I'm just applying the rules of calculus from your first or second year of a bachelor's degree, if you took it. The expressions actually simplify quite a bit. You have to separate out the analysis into the case where the i-th index inside the logits is equal to the label and the case where it isn't, which work out slightly differently, and what we end up with is something very, very simple: we either end up with p_i, where p is again this vector of probabilities after the softmax, or with p_i minus 1, where we simply subtract a one. In any case, we just need to calculate the softmax p, and then, at the correct positions, subtract a one. That's the form the gradient takes analytically.

So let's implement this. Here we are working with batches of examples, so we have to be careful with that: the loss for a batch is the average loss over all the examples, that is, the losses of the individual examples summed up and divided by n, and we have to backpropagate through that as well. So dlogits is going to be F.softmax (PyTorch has a softmax function you can call) applied to the logits along dimension 1, so along the rows. Then, at the correct positions, we subtract a one: indexing dlogits over all the rows, and into the columns given by the correct labels inside Yb, we do -= 1. And finally, because the loss is the average (there's a 1/n over all the losses added up), we have to backpropagate through that division too, so the gradient also gets scaled down by n, because of the mean. Otherwise, this should be the result.
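The whole cross-entropy backward pass then fits in three lines (a sketch, as derived; n is the batch size and Yb holds the labels):

    import torch.nn.functional as F
    dlogits = F.softmax(logits, 1)  # p, computed row-wise over the batch
    dlogits[range(n), Yb] -= 1      # p - 1 at the correct class of each example
    dlogits /= n                    # backprop through the mean over the batch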
Now, if we verify this, we see that we don't get an exact match, but the maximum difference between PyTorch's dlogits and ours is on the order of 5e-9, a tiny, tiny number. Because of floating point wonkiness we don't get the exact bitwise result, but we basically get the correct answer, approximately.

Now, I'd like to pause here briefly before we move on to the next exercise, because I'd like us to get an intuitive sense of what dlogits is; it has a beautiful and, honestly, very simple interpretation. Here I'm taking dlogits and visualizing it, and we have a batch of 32 examples over 27 characters. What is dlogits, intuitively? It is the probabilities matrix from the forward pass, except that these black squares, the positions of the correct indices, have had a 1 subtracted. These are the derivatives on the logits, so let's look at just the first row. That's what I'm doing here: I'm calculating the probabilities and taking just the first row, and then dlogits of the first row, multiplied by n just so we don't have the 1/n scaling in there and everything is more interpretable. We see that it's exactly equal to the probabilities, except that the position of the correct index has had 1 subtracted from it. And notice that if you take dlogits[0] and sum it, it actually sums to 0.
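A quick way to see all of this for yourself (a sanity check of my own, not from the notebook):

    probs = F.softmax(logits, 1)
    print(probs[0])          # forward-pass probabilities for the first example
    print(dlogits[0] * n)    # the same row, with 1 subtracted at the correct index
    print(dlogits[0].sum())  # ~0: the pushes and pulls balance out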
So you should think of these gradients at each cell as a force: we are pulling down on the probabilities of the incorrect characters and pulling up on the probability at the correct index, in each row, and the amount of push and pull is exactly equalized, because the sum is 0. The amount by which we pull down on the incorrect probabilities equals the amount by which we push up on the probability of the correct character: the repulsion and the attraction are equal. And think of the neural net as a massive pulley system, or something like that: we're up here at dlogits, pulling down on the probabilities of the incorrect characters and pulling up on the probability of the correct one, and that tension translates through this complicated pulley mechanism until eventually we get a tug on the weights and the biases. In each update we just kind of tug in the direction we'd like for each of these elements, and the parameters slowly give in to the tug; that's roughly what training a neural net looks like at a high level.

And the forces of push and pull in these gradients are very intuitive: the amount of force we apply is proportional to the probabilities that came out in the forward pass. For example, if our probabilities came out exactly correct, so zero everywhere except for a one at the correct position, then dlogits would be a row of zeros for that example; there would be no push and no pull. So the amount by which your prediction is incorrect is exactly the amount by which you get a pull or a push in that dimension. If you have, say, a very confidently mispredicted element, then that element is going to be pulled down very heavily, the correct answer is going to be pulled up by the same amount, and the other characters are not going to be influenced too much. The amount by which you mispredict is proportional to the strength of the pull, and this happens independently in all the dimensions of the tensor. It's very intuitive and very easy to think through, and that's basically the magic of the cross-entropy loss and what it's doing dynamically in the backward pass of the neural net.

So now we get to exercise number 3, which is a very fun exercise, depending on your definition of fun. We are going to do for batch normalization exactly what we did for the cross-entropy loss in exercise number 2: we are going to consider it as a single glued mathematical expression and backpropagate through it in a very efficient manner, because we are going to derive a much simpler formula for the backward pass of batch normalization, and we're going to do it with pen and paper. Previously, we broke batch normalization up into all its little intermediate pieces and all the atomic operations inside it, and backpropagated through them one by one. Now we just have a single forward pass of the batch norm, all glued together, and we see that we get the exact same result as before. For the backward pass, we'd like a single formula for backpropagating through this entire operation. In the forward pass, we previously took hprebn, the hidden states before the batch normalization, and created hpreact, the hidden states just before the activation; in the batch normalization paper, hprebn is x and hpreact is y. So in the backward pass, what we'd like to do is: given dhpreact, produce dhprebn, and do it very efficiently. That's the name of the game: calculate dhprebn given dhpreact. For the purposes of this exercise, we're going to ignore gamma and beta and their derivatives, because they take a very simple form, very similar to what we did up above.

To help you a little, like I did before, I started off the implementation on pen and paper; it took me two sheets of paper to derive the mathematical formulas for the backward pass. To set up the problem, write out mu, sigma squared, x hat i, and y i, exactly as in the paper, except for the Bessel correction.
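For reference, those forward-pass equations are (paper notation, with our m-1 in the variance where the paper divides by m):

    \mu = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
    \sigma^2 = \frac{1}{m-1}\sum_{i=1}^{m}(x_i-\mu)^2, \qquad
    \hat{x}_i = \frac{x_i-\mu}{\sqrt{\sigma^2+\epsilon}}, \qquad
    y_i = \gamma\,\hat{x}_i + \beta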
Then, in the backward pass, we are given the derivatives of the loss with respect to all the elements of y; remember that y is a vector, so there are multiple numbers there. There's also a gamma and a beta, and this is the compute graph: gamma and beta, the x hat, and then mu, sigma squared, and x. We have dL/dy_i, and we want dL/dx_i for all the i's in these vectors. You have to be careful here, and I'm trying to note it in the diagram: these are vectors, so there are many nodes inside x, x hat, and y, but mu and sigma squared are each a single scalar. You have to imagine there are multiple nodes there, or you're going to get your math wrong. As an order, I would suggest you backpropagate in the sequence one, two, three, four: into x hat, then into sigma squared, then into mu, and then into x, just like a topological sort in micrograd, where we would go from right to left. You're doing the exact same thing, except with symbols, on a piece of paper.

For number one, I'm not giving away too much: if you want dL/d x hat i, you just take dL/dy_i and multiply it by gamma, because of the expression y_i = gamma times x hat i plus beta. So that doesn't help you too much, but it gives you the derivatives for all the x hats. Now try to go through this computational graph and derive dL/d sigma squared, then dL/d mu, and eventually dL/dx. Give it a go; I'm going to reveal the answer one piece at a time.

Okay, so for dL/d sigma squared: remember, as I mentioned, that there are many x hats here, while sigma squared is a single individual number. When we look at the expression for dL/d sigma squared, we have to consider all the possible paths: all the x hats depend on sigma squared, so sigma squared has a large fan-out, with lots of arrows going from it into all the x hats, and a gradient signal flows back from each x hat into sigma squared. That's why we sum over all the i's, from 1 to m, of dL/d x hat i, the global gradient, times d x hat i / d sigma squared, the local gradient of this operation. Mathematically, I just work it out here and simplify, and you get a certain expression for dL/d sigma squared, which we will use when we backpropagate into mu and then eventually into x.

So now let's continue our backpropagation into mu, that is, dL/d mu. Again, be careful: mu influences all the x hats. If our mini-batch size is 32, as in the example we're working with, then that's 32 numbers and 32 arrows going back to mu; and mu also goes into sigma squared, which is just a single extra arrow, since sigma squared is a scalar. So in total there are 33 arrows emanating from mu, all of them carrying gradients that come back into mu and need to be summed up. That's why the expression for dL/d mu sums over all the dL/d x hat i times d x hat i / d mu (those are the 32 arrows), plus the one term dL/d sigma squared times d sigma squared / d mu. Now we have to work out those expressions, and let me just reveal the rest. Simplifying the first term is not complicated, and you just get a clean expression. In the second term, though, something really interesting happens: when we look at d sigma squared / d mu and simplify, then in the special case where mu is actually the average of the x_i's, as it is in our case, that gradient vanishes and becomes exactly zero, which makes the entire second term cancel. For a general mu, d sigma squared / d mu would give you some formula for how mu impacts sigma squared; but in the special case of batch normalization, where mu is exactly the batch mean, that gradient is zero, the whole term cancels, and we're left with a fairly straightforward expression for dL/d mu.
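Written out, the pieces so far are (these are the standard expressions matching the batch norm paper's appendix; I'm quoting them from memory, so verify them against your own derivation):

    \frac{\partial L}{\partial \hat{x}_i} = \frac{\partial L}{\partial y_i}\,\gamma
    \frac{\partial L}{\partial \sigma^2} = \sum_{i=1}^{m} \frac{\partial L}{\partial \hat{x}_i}\,(x_i-\mu)\cdot\left(-\tfrac{1}{2}\right)(\sigma^2+\epsilon)^{-3/2}
    \frac{\partial L}{\partial \mu} = \sum_{i=1}^{m} \frac{\partial L}{\partial \hat{x}_i}\cdot\frac{-1}{\sqrt{\sigma^2+\epsilon}}

(the paper's extra term in dL/d mu, the one going through d sigma squared / d mu, is exactly the term that vanishes because mu is the batch mean).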
Okay, and now we get to the craziest part, which is deriving dL/dx_i, ultimately what we're after. First of all, let's count how many numbers there are inside x: as I mentioned, there are 32 little x_i's. And let's count the number of arrows emanating from each x_i: there's an arrow going to mu, an arrow going to sigma squared, and then there's an arrow going to x hat. That last arrow deserves a little scrutiny: each x hat i is a function of x_i and of all the other scalars, but along this direct path, x hat i depends only on its own x_i. So within that single arrow there are really 32 arrows, but those 32 arrows run exactly parallel between x and x hat; they don't interfere, and you can look at it that way. So from each x_i there are three arrows: to mu, to sigma squared, and to the associated x hat i. In backpropagation we now apply the chain rule and add up those three contributions: we chain through mu, through sigma squared, and through x hat, and those are the three terms written here. We already have three of the pieces: dL/d x hat i, dL/d mu, which we derived here, and dL/d sigma squared, which we derived here. But we still need the three local terms: this one, this one, and this one. I invite you to try to derive them; if it seems complicated, you're just looking at these expressions and differentiating with respect to x_i. So give it a shot, but here's the result, or at least what I got: I'm just differentiating with respect to x_i in all of these expressions, and honestly I don't think there's anything too tricky there; it's basic calculus.

What gets a little more tricky is plugging everything together: all of these terms multiplied with all of these terms and added up according to this formula, and that gets a bit hairy. What ends up happening is that you get a large expression, and the thing to be very careful with is that we are working with dL/dx_i for a specific i, while some of the terms we plug in, like dL/d sigma squared, contain their own sums over an index. When I plug that expression in, I can't use i as the summation variable, because that i is a different i from this i: it's just a placeholder, a local variable for a for loop. So you'll notice that when I plug it in, I rename that i to a j; j is a little local iterator over the 32 terms. You have to be careful when plugging in the expressions: you may have to rename i's into j's, and you have to track what is actually an i with respect to dL/dx_i. Some of these are j's and some of these are i's.

Then we simplify this expression, and I guess the big thing to notice is that a bunch of terms come out to the front and can be refactored. There is a (sigma squared + epsilon) raised to the power of negative 3/2, and that can be separated into three factors of (sigma squared + epsilon) raised to the power of negative 1/2: the three of them multiplied together equal it. Those three factors can then go to different places through the multiplication: one of them comes out to the front and ends up outside, one joins up with this term, and one joins up with this other term.
And when you simplify the expression further, you'll notice that some of the terms that come out are just the x hat i's, so you can simplify by rewriting in terms of them, and what we end up with at the end is a fairly simple mathematical expression that I cannot simplify further. But notice that it only uses things we already have, and it derives the thing we need: dL/dy_i, for all the i's, is used plenty of times, and in addition we use the x hat i's and x hat j's, which just come from the forward pass. Otherwise this is a simple expression, and it gives us dL/dx_i for all the i's, which is ultimately what we're interested in. So that's the end of the batch norm backward pass, analytically. Let's now implement this final result.

Okay, so I implemented the expression into a single line of code here, and you can see that the max diff is tiny, so this is a correct implementation of the formula. Now, I'll just tell you that getting from that mathematical expression to this one line of code was not trivial; there's a lot going on packed into this formula, and that is a whole exercise by itself. You have to consider that the formula we derived is for a single neuron and a batch of 32 examples, but we actually have 64 neurons, so this expression has to evaluate the batch norm backward pass for all 64 neurons in parallel and independently; it has to happen in every single column of the inputs. In addition, there are a bunch of sums in there, and I had to make sure that those sums broadcast correctly onto everything else. So getting this expression right is highly non-trivial, and I invite you to look through it and step through it; it's a whole exercise in itself to make sure it checks out. But once all the shapes agree and you've convinced yourself that it's correct, you can also verify that PyTorch gets the exact same answer, which gives you a lot of peace of mind that the mathematical formula is correctly implemented, broadcast correctly, and replicated in parallel for all of the 64 neurons inside this batch norm layer.
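For reference, the fused one-liner I ended up with looks like this (a sketch in the notebook's variable names; the bngain factor is there because dhpreact sits above the scale-and-shift, and you should double-check this against cmp before trusting it):

    dhprebn = bngain * bnvar_inv / n * (n * dhpreact - dhpreact.sum(0) - n / (n - 1) * bnraw * (dhpreact * bnraw).sum(0))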
Okay, and finally, exercise number 4 asks you to put it all together. Here we have a redefinition of the entire problem: you can see that we re-initialize the neural net from scratch and everything, and then, instead of calling loss.backward, we want the manual backpropagation here, as we derived it up above. So go up, copy-paste all the chunks of code we've already derived, put them here, derive your own gradients, and then optimize this model using your own gradients, all the way to the calibration of the batch norm and the evaluation of the loss. I was able to achieve quite a good loss, basically the same loss you would achieve before, and that shouldn't be surprising: all we've done is gut loss.backward, pull out all the code, and insert it here. The gradients are identical, everything is identical, and the results are identical; it's just that we now have full visibility in this specific case.

Okay, and this is all of our code: the full backward pass, using the simplified backward passes for the cross-entropy loss and the batch normalization. We backpropagate through the cross entropy, the second layer, the tanh nonlinearity, the batch normalization, the first layer, and the embedding. You see that this is only maybe 20 lines of code or something like that, and that's what gives us our gradients in place of loss.backward. The way I have the code set up, you should be able to run this entire cell once you fill this in; it will run for only 100 iterations and then break, and it breaks to give you an opportunity to check your gradients against PyTorch. Here we see that our gradients are not exactly equal, but they are approximately equal, and the differences are tiny, on the order of 1e-9 or so; I don't exactly know where they're coming from, to be honest. But if we're basically correct, we can take out the gradient checking, disable the break statement, and then disable loss.backward: we don't need it anymore, and it feels amazing to say that.

Then, when we do the update, we're not going to use p.grad, the old PyTorch way; we don't have it anymore, because we're not calling backward. Instead, I've arranged the grads to be in the same order as the parameters, and I zip up the gradients and the parameters into (p, grad) pairs, and then step with the grad that we derived manually. The last piece is that none of this now requires gradients from PyTorch, so one thing you can do is wrap this whole code block in torch.no_grad and offset it. What you're really saying is: hey PyTorch, I'm not going to call backward on any of this, and that allows PyTorch to be a bit more efficient with all of it. And then we should be able to just run this... and it's running, and you see that loss.backward is commented out and we're optimizing. So we'll leave this running and hopefully get a good result.

Okay, so I let the neural net optimization run. Then, here, I calibrate the batch norm parameters, because I did not keep track of the running mean and variance in the training loop; then I evaluate the loss, and you see that we obtain a pretty good loss, very similar to what we achieved before. And then I sample from the model, and we see some of the name-like gibberish that we're used to; so basically the model worked and samples pretty decent results compared to what we're used to. Everything is the same, but of course the big deal is that we did not use loss.backward and we did not use PyTorch autograd; we estimated our gradients ourselves, by hand. So hopefully you're looking at the backward pass of this neural net and thinking to yourself: actually, that's not too complicated; each one of these layers is about three lines of code or so, and most of it is fairly straightforward, with the notable exception, perhaps, of the batch normalization backward pass.

Okay, and that's everything I wanted to cover. Hopefully you found this interesting, and what I liked about it, honestly, is that it gave us a very nice diversity of layers to backpropagate through, and I think it gives a pretty nice and comprehensive sense of how these backward passes are implemented and how they work. You'd be able to derive them yourself, though of course in practice you probably don't want to, and you'll want to use PyTorch autograd; but hopefully you now have some intuition about how gradients flow backwards through the neural net, starting at the loss and flowing through all the variables.
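The manual update loop, roughly as set up here (a sketch; lr and the exact parameter list are whatever the notebook defines):

    with torch.no_grad():
        for p, grad in zip(parameters, grads):
            p.data += -lr * grad  # step with our hand-derived gradient instead of p.grad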
And if you understood a good chunk of it, and you have a sense of how it all works, then you can count yourself as one of these buff doges on the left, instead of the doges on the right. Now, in the next lecture we're actually going to go to recurrent neural networks, LSTMs, and all the other variants of RNNs, and we're going to start to complexify the architecture and start to achieve better log likelihoods. I'm really looking forward to that, and I'll see you then.
And the reason for that is", "tokens": [51503, 583, 7015, 286, 519, 321, 362, 281, 6222, 510, 337, 472, 544, 7991, 13, 400, 264, 1778, 337, 300, 307, 51785], "temperature": 0.0, "avg_logprob": -0.13868363459307448, "compression_ratio": 1.728395061728395, "no_speech_prob": 0.02171696163713932}, {"id": 6, "seek": 2840, "start": 28.4, "end": 32.3, "text": " we've already trained this multilayer perceptron, right, and we are getting pretty good loss,", "tokens": [50365, 321, 600, 1217, 8895, 341, 2120, 388, 11167, 43276, 2044, 11, 558, 11, 293, 321, 366, 1242, 1238, 665, 4470, 11, 50560], "temperature": 0.0, "avg_logprob": -0.05792852599045326, "compression_ratio": 1.758513931888545, "no_speech_prob": 3.56494274456054e-05}, {"id": 7, "seek": 2840, "start": 32.519999999999996, "end": 35.879999999999995, "text": " and I think we have a pretty decent understanding of the architecture and how it works.", "tokens": [50571, 293, 286, 519, 321, 362, 257, 1238, 8681, 3701, 295, 264, 9482, 293, 577, 309, 1985, 13, 50739], "temperature": 0.0, "avg_logprob": -0.05792852599045326, "compression_ratio": 1.758513931888545, "no_speech_prob": 3.56494274456054e-05}, {"id": 8, "seek": 2840, "start": 36.36, "end": 42.239999999999995, "text": " But the line of code here that I take an issue with is here, loss.backward. That is, we are", "tokens": [50763, 583, 264, 1622, 295, 3089, 510, 300, 286, 747, 364, 2734, 365, 307, 510, 11, 4470, 13, 3207, 1007, 13, 663, 307, 11, 321, 366, 51057], "temperature": 0.0, "avg_logprob": -0.05792852599045326, "compression_ratio": 1.758513931888545, "no_speech_prob": 3.56494274456054e-05}, {"id": 9, "seek": 2840, "start": 42.239999999999995, "end": 48.08, "text": " taking PyTorch autograd and using it to calculate all of our gradients along the way. And I would", "tokens": [51057, 1940, 9953, 51, 284, 339, 1476, 664, 6206, 293, 1228, 309, 281, 8873, 439, 295, 527, 2771, 2448, 2051, 264, 636, 13, 400, 286, 576, 51349], "temperature": 0.0, "avg_logprob": -0.05792852599045326, "compression_ratio": 1.758513931888545, "no_speech_prob": 3.56494274456054e-05}, {"id": 10, "seek": 2840, "start": 48.08, "end": 52.58, "text": " like to remove the use of loss.backward, and I would like us to write our backward pass manually", "tokens": [51349, 411, 281, 4159, 264, 764, 295, 4470, 13, 3207, 1007, 11, 293, 286, 576, 411, 505, 281, 2464, 527, 23897, 1320, 16945, 51574], "temperature": 0.0, "avg_logprob": -0.05792852599045326, "compression_ratio": 1.758513931888545, "no_speech_prob": 3.56494274456054e-05}, {"id": 11, "seek": 2840, "start": 52.58, "end": 57.94, "text": " on the level of tensors. And I think that this is a very useful exercise for the following reasons.", "tokens": [51574, 322, 264, 1496, 295, 10688, 830, 13, 400, 286, 519, 300, 341, 307, 257, 588, 4420, 5380, 337, 264, 3480, 4112, 13, 51842], "temperature": 0.0, "avg_logprob": -0.05792852599045326, "compression_ratio": 1.758513931888545, "no_speech_prob": 3.56494274456054e-05}, {"id": 12, "seek": 5840, "start": 58.4, "end": 64.03999999999999, "text": " I actually have an entire blog post on this topic, but I'd like to call backpropagation a leaky", "tokens": [50365, 286, 767, 362, 364, 2302, 6968, 2183, 322, 341, 4829, 11, 457, 286, 1116, 411, 281, 818, 646, 79, 1513, 559, 399, 257, 476, 15681, 50647], "temperature": 0.2, "avg_logprob": -0.1156613506487946, "compression_ratio": 1.8471760797342194, "no_speech_prob": 0.006890374235808849}, {"id": 13, "seek": 5840, "start": 64.03999999999999, "end": 69.24, "text": " abstraction. 
And what I mean by that is backpropagation doesn't just make your neural", "tokens": [50647, 37765, 13, 400, 437, 286, 914, 538, 300, 307, 646, 79, 1513, 559, 399, 1177, 380, 445, 652, 428, 18161, 50907], "temperature": 0.2, "avg_logprob": -0.1156613506487946, "compression_ratio": 1.8471760797342194, "no_speech_prob": 0.006890374235808849}, {"id": 14, "seek": 5840, "start": 69.24, "end": 73.56, "text": " networks just work magically. It's not the case that you can just stack up arbitrary Lego blocks", "tokens": [50907, 9590, 445, 589, 39763, 13, 467, 311, 406, 264, 1389, 300, 291, 393, 445, 8630, 493, 23211, 28761, 8474, 51123], "temperature": 0.2, "avg_logprob": -0.1156613506487946, "compression_ratio": 1.8471760797342194, "no_speech_prob": 0.006890374235808849}, {"id": 15, "seek": 5840, "start": 73.56, "end": 77.82, "text": " of differentiable functions and just cross your fingers and backpropagate and everything is great.", "tokens": [51123, 295, 819, 9364, 6828, 293, 445, 3278, 428, 7350, 293, 646, 79, 1513, 559, 473, 293, 1203, 307, 869, 13, 51336], "temperature": 0.2, "avg_logprob": -0.1156613506487946, "compression_ratio": 1.8471760797342194, "no_speech_prob": 0.006890374235808849}, {"id": 16, "seek": 5840, "start": 78.75999999999999, "end": 82.66, "text": " Things don't just work automatically. It is a leaky abstraction in the sense that", "tokens": [51383, 9514, 500, 380, 445, 589, 6772, 13, 467, 307, 257, 476, 15681, 37765, 294, 264, 2020, 300, 51578], "temperature": 0.2, "avg_logprob": -0.1156613506487946, "compression_ratio": 1.8471760797342194, "no_speech_prob": 0.006890374235808849}, {"id": 17, "seek": 5840, "start": 82.66, "end": 88.14, "text": " you can shoot yourself in the foot if you do not understand its internals. It will magically not", "tokens": [51578, 291, 393, 3076, 1803, 294, 264, 2671, 498, 291, 360, 406, 1223, 1080, 2154, 1124, 13, 467, 486, 39763, 406, 51852], "temperature": 0.2, "avg_logprob": -0.1156613506487946, "compression_ratio": 1.8471760797342194, "no_speech_prob": 0.006890374235808849}, {"id": 18, "seek": 8840, "start": 88.4, "end": 93.38000000000001, "text": " work or not work optimally. And you will need to understand how it works under the hood if you're", "tokens": [50365, 589, 420, 406, 589, 5028, 379, 13, 400, 291, 486, 643, 281, 1223, 577, 309, 1985, 833, 264, 13376, 498, 291, 434, 50614], "temperature": 0.0, "avg_logprob": -0.10632562637329102, "compression_ratio": 1.8368055555555556, "no_speech_prob": 0.0004740937438327819}, {"id": 19, "seek": 8840, "start": 93.38000000000001, "end": 99.2, "text": " hoping to debug it and if you are hoping to address it in your neural net. So this blog post", "tokens": [50614, 7159, 281, 24083, 309, 293, 498, 291, 366, 7159, 281, 2985, 309, 294, 428, 18161, 2533, 13, 407, 341, 6968, 2183, 50905], "temperature": 0.0, "avg_logprob": -0.10632562637329102, "compression_ratio": 1.8368055555555556, "no_speech_prob": 0.0004740937438327819}, {"id": 20, "seek": 8840, "start": 99.2, "end": 103.58000000000001, "text": " here from a while ago goes into some of those examples. So for example, we've already covered", "tokens": [50905, 510, 490, 257, 1339, 2057, 1709, 666, 512, 295, 729, 5110, 13, 407, 337, 1365, 11, 321, 600, 1217, 5343, 51124], "temperature": 0.0, "avg_logprob": -0.10632562637329102, "compression_ratio": 1.8368055555555556, "no_speech_prob": 0.0004740937438327819}, {"id": 21, "seek": 8840, "start": 103.58000000000001, "end": 109.52000000000001, "text": " them, some of them already. 
For example, the flat tails of these functions and how you do not want", "tokens": [51124, 552, 11, 512, 295, 552, 1217, 13, 1171, 1365, 11, 264, 4962, 28537, 295, 613, 6828, 293, 577, 291, 360, 406, 528, 51421], "temperature": 0.0, "avg_logprob": -0.10632562637329102, "compression_ratio": 1.8368055555555556, "no_speech_prob": 0.0004740937438327819}, {"id": 22, "seek": 8840, "start": 109.52000000000001, "end": 114.98, "text": " to saturate them too much because your gradients will die. The case of dead neurons, which I've", "tokens": [51421, 281, 21160, 473, 552, 886, 709, 570, 428, 2771, 2448, 486, 978, 13, 440, 1389, 295, 3116, 22027, 11, 597, 286, 600, 51694], "temperature": 0.0, "avg_logprob": -0.10632562637329102, "compression_ratio": 1.8368055555555556, "no_speech_prob": 0.0004740937438327819}, {"id": 23, "seek": 8840, "start": 114.98, "end": 118.16000000000001, "text": " already covered as well. The case of exploding or", "tokens": [51694, 1217, 5343, 382, 731, 13, 440, 1389, 295, 35175, 420, 51853], "temperature": 0.0, "avg_logprob": -0.10632562637329102, "compression_ratio": 1.8368055555555556, "no_speech_prob": 0.0004740937438327819}, {"id": 24, "seek": 11840, "start": 118.4, "end": 121.88000000000001, "text": " exploding gradients in the case of recurring neural networks, which we are about to cover.", "tokens": [50365, 35175, 2771, 2448, 294, 264, 1389, 295, 32279, 18161, 9590, 11, 597, 321, 366, 466, 281, 2060, 13, 50539], "temperature": 0.0, "avg_logprob": -0.13447314097469015, "compression_ratio": 1.7981072555205047, "no_speech_prob": 0.0005857757641933858}, {"id": 25, "seek": 11840, "start": 122.84, "end": 128.78, "text": " And then also you will often come across some examples in the wild. This is a snippet that I", "tokens": [50587, 400, 550, 611, 291, 486, 2049, 808, 2108, 512, 5110, 294, 264, 4868, 13, 639, 307, 257, 35623, 302, 300, 286, 50884], "temperature": 0.0, "avg_logprob": -0.13447314097469015, "compression_ratio": 1.7981072555205047, "no_speech_prob": 0.0005857757641933858}, {"id": 26, "seek": 11840, "start": 128.78, "end": 134.24, "text": " found in a random code base on the internet where they actually have like a very subtle but pretty", "tokens": [50884, 1352, 294, 257, 4974, 3089, 3096, 322, 264, 4705, 689, 436, 767, 362, 411, 257, 588, 13743, 457, 1238, 51157], "temperature": 0.0, "avg_logprob": -0.13447314097469015, "compression_ratio": 1.7981072555205047, "no_speech_prob": 0.0005857757641933858}, {"id": 27, "seek": 11840, "start": 134.24, "end": 139.88, "text": " major bug in their implementation. And the bug points at the fact that the author of this code", "tokens": [51157, 2563, 7426, 294, 641, 11420, 13, 400, 264, 7426, 2793, 412, 264, 1186, 300, 264, 3793, 295, 341, 3089, 51439], "temperature": 0.0, "avg_logprob": -0.13447314097469015, "compression_ratio": 1.7981072555205047, "no_speech_prob": 0.0005857757641933858}, {"id": 28, "seek": 11840, "start": 139.88, "end": 143.42000000000002, "text": " does not actually understand backpropagation. So what they're trying to do here is they're trying", "tokens": [51439, 775, 406, 767, 1223, 646, 79, 1513, 559, 399, 13, 407, 437, 436, 434, 1382, 281, 360, 510, 307, 436, 434, 1382, 51616], "temperature": 0.0, "avg_logprob": -0.13447314097469015, "compression_ratio": 1.7981072555205047, "no_speech_prob": 0.0005857757641933858}, {"id": 29, "seek": 11840, "start": 143.42000000000002, "end": 148.04000000000002, "text": " to clip the loss at a certain maximum value. 
But actually what they're trying to do is they're", "tokens": [51616, 281, 7353, 264, 4470, 412, 257, 1629, 6674, 2158, 13, 583, 767, 437, 436, 434, 1382, 281, 360, 307, 436, 434, 51847], "temperature": 0.0, "avg_logprob": -0.13447314097469015, "compression_ratio": 1.7981072555205047, "no_speech_prob": 0.0005857757641933858}, {"id": 30, "seek": 14840, "start": 148.4, "end": 152.36, "text": " trying to clip the gradients to have a maximum value instead of trying to clip the loss at a", "tokens": [50365, 1382, 281, 7353, 264, 2771, 2448, 281, 362, 257, 6674, 2158, 2602, 295, 1382, 281, 7353, 264, 4470, 412, 257, 50563], "temperature": 0.8, "avg_logprob": -0.20969542651109294, "compression_ratio": 1.8409090909090908, "no_speech_prob": 0.00045174689148552716}, {"id": 31, "seek": 14840, "start": 152.36, "end": 158.24, "text": " maximum value. And indirectly, they're basically causing some of the outliers to be actually", "tokens": [50563, 6674, 2158, 13, 400, 37779, 11, 436, 434, 1936, 9853, 512, 295, 264, 484, 23646, 281, 312, 767, 50857], "temperature": 0.8, "avg_logprob": -0.20969542651109294, "compression_ratio": 1.8409090909090908, "no_speech_prob": 0.00045174689148552716}, {"id": 32, "seek": 14840, "start": 158.24, "end": 164.3, "text": " ignored. Because when you clip the loss of an outlier, you are setting its gradient to 0.", "tokens": [50857, 19735, 13, 1436, 562, 291, 7353, 264, 4470, 295, 364, 484, 2753, 11, 291, 366, 3287, 1080, 16235, 281, 1958, 13, 51160], "temperature": 0.8, "avg_logprob": -0.20969542651109294, "compression_ratio": 1.8409090909090908, "no_speech_prob": 0.00045174689148552716}, {"id": 33, "seek": 14840, "start": 164.84, "end": 169.76, "text": " And so have a look through this and read through it. But there's basically a bunch of subtle", "tokens": [51187, 400, 370, 362, 257, 574, 807, 341, 293, 1401, 807, 309, 13, 583, 456, 311, 1936, 257, 3840, 295, 13743, 51433], "temperature": 0.8, "avg_logprob": -0.20969542651109294, "compression_ratio": 1.8409090909090908, "no_speech_prob": 0.00045174689148552716}, {"id": 34, "seek": 14840, "start": 169.76, "end": 173.6, "text": " issues that you're going to avoid if you actually know what you're doing. And that's why I don't", "tokens": [51433, 2663, 300, 291, 434, 516, 281, 5042, 498, 291, 767, 458, 437, 291, 434, 884, 13, 400, 300, 311, 983, 286, 500, 380, 51625], "temperature": 0.8, "avg_logprob": -0.20969542651109294, "compression_ratio": 1.8409090909090908, "no_speech_prob": 0.00045174689148552716}, {"id": 35, "seek": 14840, "start": 173.6, "end": 178.38, "text": " think it's the case that because PyTorch or other frameworks offer autograd, it is okay for us to do.", "tokens": [51625, 519, 309, 311, 264, 1389, 300, 570, 9953, 51, 284, 339, 420, 661, 29834, 2626, 1476, 664, 6206, 11, 309, 307, 1392, 337, 505, 281, 360, 13, 51864], "temperature": 0.8, "avg_logprob": -0.20969542651109294, "compression_ratio": 1.8409090909090908, "no_speech_prob": 0.00045174689148552716}, {"id": 36, "seek": 17840, "start": 178.4, "end": 185.48000000000002, "text": " ignore how it works. 
Now, we've actually already covered autograd and we wrote micrograd, but", "tokens": [50365, 11200, 577, 309, 1985, 13, 823, 11, 321, 600, 767, 1217, 5343, 1476, 664, 6206, 293, 321, 4114, 4532, 7165, 11, 457, 50719], "temperature": 0.0, "avg_logprob": -0.054147979405921275, "compression_ratio": 1.708185053380783, "no_speech_prob": 0.02971663512289524}, {"id": 37, "seek": 17840, "start": 185.48000000000002, "end": 190.5, "text": " micrograd was an autograd engine only on the level of individual scalars. So the atoms were single", "tokens": [50719, 4532, 7165, 390, 364, 1476, 664, 6206, 2848, 787, 322, 264, 1496, 295, 2609, 15664, 685, 13, 407, 264, 16871, 645, 2167, 50970], "temperature": 0.0, "avg_logprob": -0.054147979405921275, "compression_ratio": 1.708185053380783, "no_speech_prob": 0.02971663512289524}, {"id": 38, "seek": 17840, "start": 190.5, "end": 195.0, "text": " individual numbers. And, you know, I don't think it's enough. And I'd like us to basically think", "tokens": [50970, 2609, 3547, 13, 400, 11, 291, 458, 11, 286, 500, 380, 519, 309, 311, 1547, 13, 400, 286, 1116, 411, 505, 281, 1936, 519, 51195], "temperature": 0.0, "avg_logprob": -0.054147979405921275, "compression_ratio": 1.708185053380783, "no_speech_prob": 0.02971663512289524}, {"id": 39, "seek": 17840, "start": 195.0, "end": 199.42000000000002, "text": " about backpropagation on the level of tensors as well. And so in a summary, I think it's a good", "tokens": [51195, 466, 646, 79, 1513, 559, 399, 322, 264, 1496, 295, 10688, 830, 382, 731, 13, 400, 370, 294, 257, 12691, 11, 286, 519, 309, 311, 257, 665, 51416], "temperature": 0.0, "avg_logprob": -0.054147979405921275, "compression_ratio": 1.708185053380783, "no_speech_prob": 0.02971663512289524}, {"id": 40, "seek": 17840, "start": 199.42000000000002, "end": 204.76, "text": " exercise. I think it is very, very valuable. You're going to become better at debugging neural", "tokens": [51416, 5380, 13, 286, 519, 309, 307, 588, 11, 588, 8263, 13, 509, 434, 516, 281, 1813, 1101, 412, 45592, 18161, 51683], "temperature": 0.0, "avg_logprob": -0.054147979405921275, "compression_ratio": 1.708185053380783, "no_speech_prob": 0.02971663512289524}, {"id": 41, "seek": 20476, "start": 204.76, "end": 209.17999999999998, "text": " networks and making sure that you understand what you're doing. It is going to make everything", "tokens": [50365, 9590, 293, 1455, 988, 300, 291, 1223, 437, 291, 434, 884, 13, 467, 307, 516, 281, 652, 1203, 50586], "temperature": 0.0, "avg_logprob": -0.039265250483303205, "compression_ratio": 1.7580174927113703, "no_speech_prob": 5.996247273287736e-05}, {"id": 42, "seek": 20476, "start": 209.17999999999998, "end": 213.45999999999998, "text": " fully explicit. So you're not going to be nervous about what is hidden away from you. And basically", "tokens": [50586, 4498, 13691, 13, 407, 291, 434, 406, 516, 281, 312, 6296, 466, 437, 307, 7633, 1314, 490, 291, 13, 400, 1936, 50800], "temperature": 0.0, "avg_logprob": -0.039265250483303205, "compression_ratio": 1.7580174927113703, "no_speech_prob": 5.996247273287736e-05}, {"id": 43, "seek": 20476, "start": 213.45999999999998, "end": 219.1, "text": " in general, we're going to emerge stronger. And so let's get into it. 
A bit of a fun historical note", "tokens": [50800, 294, 2674, 11, 321, 434, 516, 281, 21511, 7249, 13, 400, 370, 718, 311, 483, 666, 309, 13, 316, 857, 295, 257, 1019, 8584, 3637, 51082], "temperature": 0.0, "avg_logprob": -0.039265250483303205, "compression_ratio": 1.7580174927113703, "no_speech_prob": 5.996247273287736e-05}, {"id": 44, "seek": 20476, "start": 219.1, "end": 224.01999999999998, "text": " here is that today writing your backward pass by hand and manually is not recommended and no one", "tokens": [51082, 510, 307, 300, 965, 3579, 428, 23897, 1320, 538, 1011, 293, 16945, 307, 406, 9628, 293, 572, 472, 51328], "temperature": 0.0, "avg_logprob": -0.039265250483303205, "compression_ratio": 1.7580174927113703, "no_speech_prob": 5.996247273287736e-05}, {"id": 45, "seek": 20476, "start": 224.01999999999998, "end": 229.0, "text": " does it except for the purposes of exercise. But about 10 years ago in deep learning, this was", "tokens": [51328, 775, 309, 3993, 337, 264, 9932, 295, 5380, 13, 583, 466, 1266, 924, 2057, 294, 2452, 2539, 11, 341, 390, 51577], "temperature": 0.0, "avg_logprob": -0.039265250483303205, "compression_ratio": 1.7580174927113703, "no_speech_prob": 5.996247273287736e-05}, {"id": 46, "seek": 20476, "start": 229.0, "end": 233.73999999999998, "text": " fairly standard and in fact pervasive. So at the time, everyone used to write their backward pass", "tokens": [51577, 6457, 3832, 293, 294, 1186, 680, 39211, 13, 407, 412, 264, 565, 11, 1518, 1143, 281, 2464, 641, 23897, 1320, 51814], "temperature": 0.0, "avg_logprob": -0.039265250483303205, "compression_ratio": 1.7580174927113703, "no_speech_prob": 5.996247273287736e-05}, {"id": 47, "seek": 20476, "start": 233.73999999999998, "end": 234.6, "text": " by hand manually.", "tokens": [51814, 538, 1011, 16945, 13, 51857], "temperature": 0.0, "avg_logprob": -0.039265250483303205, "compression_ratio": 1.7580174927113703, "no_speech_prob": 5.996247273287736e-05}, {"id": 48, "seek": 23476, "start": 234.76, "end": 239.94, "text": " Including myself. And it's just what you would do. So we used to write backward pass by hand. And now", "tokens": [50365, 27137, 2059, 13, 400, 309, 311, 445, 437, 291, 576, 360, 13, 407, 321, 1143, 281, 2464, 23897, 1320, 538, 1011, 13, 400, 586, 50624], "temperature": 0.0, "avg_logprob": -0.2254780133565267, "compression_ratio": 1.5838926174496644, "no_speech_prob": 0.0003019081486854702}, {"id": 49, "seek": 23476, "start": 239.94, "end": 246.1, "text": " everyone just calls lost that backward. We've lost something. I want to give you a few examples of", "tokens": [50624, 1518, 445, 5498, 2731, 300, 23897, 13, 492, 600, 2731, 746, 13, 286, 528, 281, 976, 291, 257, 1326, 5110, 295, 50932], "temperature": 0.0, "avg_logprob": -0.2254780133565267, "compression_ratio": 1.5838926174496644, "no_speech_prob": 0.0003019081486854702}, {"id": 50, "seek": 23476, "start": 246.1, "end": 253.48, "text": " this. So here's a 2006 paper from Jeff Hinton and Ruslan Slakhtinov in science that was", "tokens": [50932, 341, 13, 407, 510, 311, 257, 14062, 3035, 490, 7506, 389, 12442, 293, 13155, 8658, 6187, 514, 357, 2982, 85, 294, 3497, 300, 390, 51301], "temperature": 0.0, "avg_logprob": -0.2254780133565267, "compression_ratio": 1.5838926174496644, "no_speech_prob": 0.0003019081486854702}, {"id": 51, "seek": 23476, "start": 253.48, "end": 258.8, "text": " influential at the time. 
And this was training some architectures called restricted Boltzmann", "tokens": [51301, 22215, 412, 264, 565, 13, 400, 341, 390, 3097, 512, 6331, 1303, 1219, 20608, 37884, 89, 14912, 51567], "temperature": 0.0, "avg_logprob": -0.2254780133565267, "compression_ratio": 1.5838926174496644, "no_speech_prob": 0.0003019081486854702}, {"id": 52, "seek": 23476, "start": 258.8, "end": 264.58, "text": " machines. And basically, it's an autoencoder trained here. And this is from roughly", "tokens": [51567, 8379, 13, 400, 1936, 11, 309, 311, 364, 8399, 22660, 19866, 8895, 510, 13, 400, 341, 307, 490, 9810, 51856], "temperature": 0.0, "avg_logprob": -0.2254780133565267, "compression_ratio": 1.5838926174496644, "no_speech_prob": 0.0003019081486854702}, {"id": 53, "seek": 23476, "start": 264.58, "end": 264.74, "text": " 2000.", "tokens": [51856, 8132, 13, 51864], "temperature": 0.0, "avg_logprob": -0.2254780133565267, "compression_ratio": 1.5838926174496644, "no_speech_prob": 0.0003019081486854702}, {"id": 54, "seek": 26476, "start": 264.76, "end": 270.09999999999997, "text": " In 2010, I had a library for training restricted Boltzmann machines. And this was at the time", "tokens": [50365, 682, 9657, 11, 286, 632, 257, 6405, 337, 3097, 20608, 37884, 89, 14912, 8379, 13, 400, 341, 390, 412, 264, 565, 50632], "temperature": 0.0, "avg_logprob": -0.13492572053949883, "compression_ratio": 1.7217125382262997, "no_speech_prob": 0.0014530746266245842}, {"id": 55, "seek": 26476, "start": 270.09999999999997, "end": 274.98, "text": " written in Matlab. So Python was not used for deep learning pervasively. It was all Matlab. And", "tokens": [50632, 3720, 294, 6789, 44990, 13, 407, 15329, 390, 406, 1143, 337, 2452, 2539, 680, 7967, 3413, 13, 467, 390, 439, 6789, 44990, 13, 400, 50876], "temperature": 0.0, "avg_logprob": -0.13492572053949883, "compression_ratio": 1.7217125382262997, "no_speech_prob": 0.0014530746266245842}, {"id": 56, "seek": 26476, "start": 274.98, "end": 281.06, "text": " Matlab was this scientific computing package that everyone would use. So we would write Matlab,", "tokens": [50876, 6789, 44990, 390, 341, 8134, 15866, 7372, 300, 1518, 576, 764, 13, 407, 321, 576, 2464, 6789, 44990, 11, 51180], "temperature": 0.0, "avg_logprob": -0.13492572053949883, "compression_ratio": 1.7217125382262997, "no_speech_prob": 0.0014530746266245842}, {"id": 57, "seek": 26476, "start": 281.06, "end": 286.74, "text": " which is barely a programming language as well. But it had a very convenient tensor class.", "tokens": [51180, 597, 307, 10268, 257, 9410, 2856, 382, 731, 13, 583, 309, 632, 257, 588, 10851, 40863, 1508, 13, 51464], "temperature": 0.0, "avg_logprob": -0.13492572053949883, "compression_ratio": 1.7217125382262997, "no_speech_prob": 0.0014530746266245842}, {"id": 58, "seek": 26476, "start": 286.74, "end": 290.42, "text": " And it was this computing environment and you would run here. It would all run on the CPU,", "tokens": [51464, 400, 309, 390, 341, 15866, 2823, 293, 291, 576, 1190, 510, 13, 467, 576, 439, 1190, 322, 264, 13199, 11, 51648], "temperature": 0.0, "avg_logprob": -0.13492572053949883, "compression_ratio": 1.7217125382262997, "no_speech_prob": 0.0014530746266245842}, {"id": 59, "seek": 26476, "start": 290.42, "end": 294.58, "text": " of course. But you would have very nice plots to go with it and a built-in debugger. 
And it was", "tokens": [51648, 295, 1164, 13, 583, 291, 576, 362, 588, 1481, 28609, 281, 352, 365, 309, 293, 257, 3094, 12, 259, 24083, 1321, 13, 400, 309, 390, 51856], "temperature": 0.0, "avg_logprob": -0.13492572053949883, "compression_ratio": 1.7217125382262997, "no_speech_prob": 0.0014530746266245842}, {"id": 60, "seek": 29458, "start": 294.58, "end": 301.08, "text": " pretty nice. Now, the code in this package in 2010 that I wrote for fitting restricted Boltzmann", "tokens": [50365, 1238, 1481, 13, 823, 11, 264, 3089, 294, 341, 7372, 294, 9657, 300, 286, 4114, 337, 15669, 20608, 37884, 89, 14912, 50690], "temperature": 0.2, "avg_logprob": -0.2608677770050479, "compression_ratio": 1.6579710144927535, "no_speech_prob": 0.0006326159345917404}, {"id": 61, "seek": 29458, "start": 301.08, "end": 306.28, "text": " machines to a large extent is recognizable. But I wanted to show you how you would... Well,", "tokens": [50690, 8379, 281, 257, 2416, 8396, 307, 40757, 13, 583, 286, 1415, 281, 855, 291, 577, 291, 576, 485, 1042, 11, 50950], "temperature": 0.2, "avg_logprob": -0.2608677770050479, "compression_ratio": 1.6579710144927535, "no_speech_prob": 0.0006326159345917404}, {"id": 62, "seek": 29458, "start": 306.28, "end": 311.88, "text": " I'm creating the data in the XY batches. I'm initializing the neural net. So it's got weights", "tokens": [50950, 286, 478, 4084, 264, 1412, 294, 264, 48826, 15245, 279, 13, 286, 478, 5883, 3319, 264, 18161, 2533, 13, 407, 309, 311, 658, 17443, 51230], "temperature": 0.2, "avg_logprob": -0.2608677770050479, "compression_ratio": 1.6579710144927535, "no_speech_prob": 0.0006326159345917404}, {"id": 63, "seek": 29458, "start": 311.88, "end": 316.44, "text": " and biases just like we're used to. And then this is the training loop where we actually do the", "tokens": [51230, 293, 32152, 445, 411, 321, 434, 1143, 281, 13, 400, 550, 341, 307, 264, 3097, 6367, 689, 321, 767, 360, 264, 51458], "temperature": 0.2, "avg_logprob": -0.2608677770050479, "compression_ratio": 1.6579710144927535, "no_speech_prob": 0.0006326159345917404}, {"id": 64, "seek": 29458, "start": 316.44, "end": 321.2, "text": " forward pass. And then here, at this time, they didn't even necessarily use back propagation to", "tokens": [51458, 2128, 1320, 13, 400, 550, 510, 11, 412, 341, 565, 11, 436, 994, 380, 754, 4725, 764, 646, 38377, 281, 51696], "temperature": 0.2, "avg_logprob": -0.2608677770050479, "compression_ratio": 1.6579710144927535, "no_speech_prob": 0.0006326159345917404}, {"id": 65, "seek": 29458, "start": 321.2, "end": 324.4, "text": " train neural networks. So this, in particular, implements a lot of the training that we're doing.", "tokens": [51696, 3847, 18161, 9590, 13, 407, 341, 11, 294, 1729, 11, 704, 17988, 257, 688, 295, 264, 3097, 300, 321, 434, 884, 13, 51856], "temperature": 0.2, "avg_logprob": -0.2608677770050479, "compression_ratio": 1.6579710144927535, "no_speech_prob": 0.0006326159345917404}, {"id": 66, "seek": 32440, "start": 324.4, "end": 329.91999999999996, "text": " It implements contrastive divergence, which estimates a gradient. 
And then here, we take", "tokens": [50365, 467, 704, 17988, 8712, 488, 47387, 11, 597, 20561, 257, 16235, 13, 400, 550, 510, 11, 321, 747, 50641], "temperature": 0.0, "avg_logprob": -0.23745745997275075, "compression_ratio": 1.6366666666666667, "no_speech_prob": 0.0007432288839481771}, {"id": 67, "seek": 32440, "start": 329.91999999999996, "end": 335.85999999999996, "text": " that gradient and use it for a parameter update along the lines that we're used to. Yeah, here.", "tokens": [50641, 300, 16235, 293, 764, 309, 337, 257, 13075, 5623, 2051, 264, 3876, 300, 321, 434, 1143, 281, 13, 865, 11, 510, 13, 50938], "temperature": 0.0, "avg_logprob": -0.23745745997275075, "compression_ratio": 1.6366666666666667, "no_speech_prob": 0.0007432288839481771}, {"id": 68, "seek": 32440, "start": 335.85999999999996, "end": 341.2, "text": " But you can see that basically people are meddling with these gradients directly and inline and", "tokens": [50938, 583, 291, 393, 536, 300, 1936, 561, 366, 1205, 35543, 365, 613, 2771, 2448, 3838, 293, 294, 1889, 293, 51205], "temperature": 0.0, "avg_logprob": -0.23745745997275075, "compression_ratio": 1.6366666666666667, "no_speech_prob": 0.0007432288839481771}, {"id": 69, "seek": 32440, "start": 341.2, "end": 345.64, "text": " themselves. It wasn't that common to use an autograd engine. Here's one more example from a", "tokens": [51205, 2969, 13, 467, 2067, 380, 300, 2689, 281, 764, 364, 1476, 664, 6206, 2848, 13, 1692, 311, 472, 544, 1365, 490, 257, 51427], "temperature": 0.0, "avg_logprob": -0.23745745997275075, "compression_ratio": 1.6366666666666667, "no_speech_prob": 0.0007432288839481771}, {"id": 70, "seek": 32440, "start": 345.64, "end": 351.64, "text": " paper of mine from 2014 called Deep Fragment Embeddings. And here, what I was doing is I was", "tokens": [51427, 3035, 295, 3892, 490, 8227, 1219, 14895, 479, 3731, 518, 24234, 292, 29432, 13, 400, 510, 11, 437, 286, 390, 884, 307, 286, 390, 51727], "temperature": 0.0, "avg_logprob": -0.23745745997275075, "compression_ratio": 1.6366666666666667, "no_speech_prob": 0.0007432288839481771}, {"id": 71, "seek": 32440, "start": 351.64, "end": 353.02, "text": " aligning images and text.", "tokens": [51727, 419, 9676, 5267, 293, 2487, 13, 51796], "temperature": 0.0, "avg_logprob": -0.23745745997275075, "compression_ratio": 1.6366666666666667, "no_speech_prob": 0.0007432288839481771}, {"id": 72, "seek": 35440, "start": 354.4, "end": 379.52, "text": " And here, I'm implementing the cost function. And it was standard to implement not just the cost,", "tokens": [50365, 400, 510, 11, 286, 478, 18114, 264, 2063, 2445, 13, 400, 309, 390, 3832, 281, 4445, 406, 445, 264, 2063, 11, 51621], "temperature": 0.0, "avg_logprob": -0.3556677011343149, "compression_ratio": 1.9574468085106382, "no_speech_prob": 0.0009897610871121287}, {"id": 73, "seek": 35440, "start": 379.52, "end": 384.2, "text": " but also the backward pass manually. So here, I'm calculating the image embeddings,", "tokens": [51621, 457, 611, 264, 23897, 1320, 16945, 13, 407, 510, 11, 286, 478, 28258, 264, 3256, 12240, 29432, 11, 51855], "temperature": 0.0, "avg_logprob": -0.3556677011343149, "compression_ratio": 1.9574468085106382, "no_speech_prob": 0.0009897610871121287}, {"id": 74, "seek": 35440, "start": 384.2, "end": 384.38, "text": " and I'm implementing the cost function. 
So everything was done by hand manually, and you would just write out the backward pass. And then you would use a gradient checker to make sure that your numerical estimate of the gradient agrees with the one you calculated during backpropagation. So this was very standard for a long time. But today, of course, it is standard to use an autograd engine. But it was definitely useful, and I think people sort of understood how these neural networks work on a very intuitive level.
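To make that gradient-checking idea concrete, here is a minimal sketch of such a numerical checker for a scalar-valued function f of a tensor x; the helper name numgrad, the step size, and the usage example are mine, not the lecture's:

    import torch

    def numgrad(f, x, h=1e-5):
        # centered finite differences: df/dx_i ~ (f(x + h*e_i) - f(x - h*e_i)) / (2h)
        g = torch.zeros_like(x)
        xf, gf = x.view(-1), g.view(-1)
        for i in range(xf.numel()):
            old = xf[i].item()
            xf[i] = old + h; fp = f(x).item()
            xf[i] = old - h; fm = f(x).item()
            xf[i] = old                     # restore the original value
            gf[i] = (fp - fm) / (2 * h)
        return g

    x = torch.randn(3, 3, dtype=torch.double)
    analytic = 2 * x                        # known gradient of (x**2).sum()
    print((numgrad(lambda t: (t**2).sum(), x) - analytic).abs().max())

A tiny maximum difference (around 1e-7 or less in double precision) suggests the analytic gradient is correct.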
And so I think it's a good exercise again, and this is where we want to be. Okay, so just as a reminder from our previous lecture, this is the Jupyter notebook that we implemented at the time. And we're going to keep everything the same. So we're still going to have a two-layer multi-layer perceptron with a batch normalization layer. So the forward pass will be basically identical to this lecture. But here, we're going to get rid of loss.backward, and instead we're going to
write the backward pass manually. Now, here's the starter code for this lecture. We are becoming a backprop ninja in this notebook. And the first few cells here are identical to what we are used to. So we are doing some imports, loading in the data set, and processing the data set. None of this changed. Now, here, I'm introducing a utility function that we're going to use later to compare the gradients. So in particular, we are going to have the gradients that we estimate manually ourselves, and we're going to have gradients that PyTorch calculates. And we're going to be checking for correctness, assuming, of course, that PyTorch is correct.
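The transcript later calls this the cmp function. A sketch of what such a comparison utility plausibly looks like; the exact signature and printout here are my guess at its shape, not a quote of the notebook:

    import torch

    def cmp(s, dt, t):
        # compare a manually computed gradient `dt` against the gradient
        # PyTorch stored in `t.grad` for the same tensor `t`
        ex = torch.all(dt == t.grad).item()          # bitwise exact match?
        app = torch.allclose(dt, t.grad)             # match up to float tolerance?
        maxdiff = (dt - t.grad).abs().max().item()   # worst-case discrepancy
        print(f'{s:15s} | exact: {str(ex):5s} | approximate: {str(app):5s} | maxdiff: {maxdiff}')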
Then here, we have the initialization that we are quite used to. So we have our embedding table for the characters, the first layer, second layer, and a batch normalization in between. And here's where we create all the parameters. Now, you will note that I changed the initialization a little bit, to be small numbers. So normally, you would set the biases to be all zero. Here, I'm setting them to be small random numbers. And I'm doing this because if your variables are all zero, or initialized to exactly zero, sometimes what can happen is that that can mask an incorrect implementation of a gradient. Because when everything is zero, it sort of simplifies, and gives you a much simpler expression for the gradient than you would otherwise get. And so by making them small numbers, I'm trying to unmask those potential errors in these calculations.
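As a sketch of that trick, with placeholder sizes that are assumptions rather than the notebook's exact values:

    import torch

    g = torch.Generator().manual_seed(2147483647)
    n_hidden, vocab_size = 64, 27               # example sizes, not necessarily the notebook's
    # rather than b1 = torch.zeros(n_hidden), use small random values, so a buggy
    # gradient formula can't hide behind gradients that happen to be exactly zero:
    b1 = torch.randn(n_hidden, generator=g) * 0.1
    b2 = torch.randn(vocab_size, generator=g) * 0.1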
You will also notice that I'm using b1 in the first layer: I'm using a bias, despite batch normalization right afterwards. So this would typically not be what you'd do, because we talked about the fact that you don't need a bias there. But I'm doing this here just for fun, because we're going to have a gradient with respect to it, and we can check that we are still calculating it correctly, even though this bias is spurious.
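To see why the bias is spurious, a small self-contained sketch (all names are stand-ins): batch normalization subtracts the per-feature batch mean, so whatever constant b1 adds is removed immediately.

    import torch

    embcat = torch.randn(32, 30)              # toy stand-ins, not the notebook's tensors
    W1, b1 = torch.randn(30, 64), torch.randn(64) * 0.1
    hprebn = embcat @ W1 + b1                 # linear layer with the (spurious) bias
    bnmean = hprebn.mean(0, keepdim=True)     # batchnorm subtracts the batch mean...
    bnraw = (hprebn - bnmean) / torch.sqrt(hprebn.var(0, keepdim=True) + 1e-5)
    # ...and since adding b1 shifts hprebn and bnmean by the same constant,
    # b1 cancels out of (hprebn - bnmean): it has no effect on bnraw at all.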
"text": " longer is for two reasons number one here we just had an f dot cross entropy but here i am bringing", "tokens": [50365, 2854, 307, 337, 732, 4112, 1230, 472, 510, 321, 445, 632, 364, 283, 5893, 3278, 30867, 457, 510, 741, 669, 5062, 50645], "temperature": 0.0, "avg_logprob": -0.07490217789359714, "compression_ratio": 1.802120141342756, "no_speech_prob": 0.00027216190937906504}, {"id": 113, "seek": 55942, "start": 565.02, "end": 570.9399999999999, "text": " back a explicit implementation of the loss function and number two i've broken up the", "tokens": [50645, 646, 257, 13691, 11420, 295, 264, 4470, 2445, 293, 1230, 732, 741, 600, 5463, 493, 264, 50941], "temperature": 0.0, "avg_logprob": -0.07490217789359714, "compression_ratio": 1.802120141342756, "no_speech_prob": 0.00027216190937906504}, {"id": 114, "seek": 55942, "start": 570.9399999999999, "end": 577.02, "text": " implementation into manageable chunks so we have a lot a lot more intermediate tensors along the way", "tokens": [50941, 11420, 666, 38798, 24004, 370, 321, 362, 257, 688, 257, 688, 544, 19376, 10688, 830, 2051, 264, 636, 51245], "temperature": 0.0, "avg_logprob": -0.07490217789359714, "compression_ratio": 1.802120141342756, "no_speech_prob": 0.00027216190937906504}, {"id": 115, "seek": 55942, "start": 577.02, "end": 581.42, "text": " in the forward pass and that's because we are about to go backwards and calculate the gradients", "tokens": [51245, 294, 264, 2128, 1320, 293, 300, 311, 570, 321, 366, 466, 281, 352, 12204, 293, 8873, 264, 2771, 2448, 51465], "temperature": 0.0, "avg_logprob": -0.07490217789359714, "compression_ratio": 1.802120141342756, "no_speech_prob": 0.00027216190937906504}, {"id": 116, "seek": 55942, "start": 582.2199999999999, "end": 588.38, "text": " in this back propagation from the bottom to the top so we're going to go upwards and just like we", "tokens": [51505, 294, 341, 646, 38377, 490, 264, 2767, 281, 264, 1192, 370, 321, 434, 516, 281, 352, 22167, 293, 445, 411, 321, 51813], "temperature": 0.0, "avg_logprob": -0.07490217789359714, "compression_ratio": 1.802120141342756, "no_speech_prob": 0.00027216190937906504}, {"id": 117, "seek": 55942, "start": 588.38, "end": 589.3399999999999, "text": " have for example the lockpick", "tokens": [51813, 362, 337, 1365, 264, 4017, 79, 618, 51861], "temperature": 0.0, "avg_logprob": -0.07490217789359714, "compression_ratio": 1.802120141342756, "no_speech_prob": 0.00027216190937906504}, {"id": 118, "seek": 58942, "start": 589.42, "end": 593.9799999999999, "text": " props tensor in a forward pass in a backward pass we're going to have a d lock props which is going", "tokens": [50365, 26173, 40863, 294, 257, 2128, 1320, 294, 257, 23897, 1320, 321, 434, 516, 281, 362, 257, 274, 4017, 26173, 597, 307, 516, 50593], "temperature": 0.0, "avg_logprob": -0.08430802731113579, "compression_ratio": 2.079847908745247, "no_speech_prob": 9.200189379043877e-05}, {"id": 119, "seek": 58942, "start": 593.9799999999999, "end": 598.3, "text": " to store the derivative of the loss with respect to the lock props tensor and so we're going to", "tokens": [50593, 281, 3531, 264, 13760, 295, 264, 4470, 365, 3104, 281, 264, 4017, 26173, 40863, 293, 370, 321, 434, 516, 281, 50809], "temperature": 0.0, "avg_logprob": -0.08430802731113579, "compression_ratio": 2.079847908745247, "no_speech_prob": 9.200189379043877e-05}, {"id": 120, "seek": 58942, "start": 598.3, "end": 603.42, "text": " be prepending d to every one of these tensors and calculating it along the way of this back", 
"tokens": [50809, 312, 2666, 2029, 274, 281, 633, 472, 295, 613, 10688, 830, 293, 28258, 309, 2051, 264, 636, 295, 341, 646, 51065], "temperature": 0.0, "avg_logprob": -0.08430802731113579, "compression_ratio": 2.079847908745247, "no_speech_prob": 9.200189379043877e-05}, {"id": 121, "seek": 58942, "start": 603.42, "end": 609.42, "text": " propagation so as an example we have a b in raw here we're going to be calculating a db in raw", "tokens": [51065, 38377, 370, 382, 364, 1365, 321, 362, 257, 272, 294, 8936, 510, 321, 434, 516, 281, 312, 28258, 257, 274, 65, 294, 8936, 51365], "temperature": 0.0, "avg_logprob": -0.08430802731113579, "compression_ratio": 2.079847908745247, "no_speech_prob": 9.200189379043877e-05}, {"id": 122, "seek": 58942, "start": 610.14, "end": 616.2199999999999, "text": " so here i'm telling pytorch that we want to retain the grad of all these intermediate values because", "tokens": [51401, 370, 510, 741, 478, 3585, 25878, 284, 339, 300, 321, 528, 281, 18340, 264, 2771, 295, 439, 613, 19376, 4190, 570, 51705], "temperature": 0.0, "avg_logprob": -0.08430802731113579, "compression_ratio": 2.079847908745247, "no_speech_prob": 9.200189379043877e-05}, {"id": 123, "seek": 58942, "start": 616.2199999999999, "end": 619.26, "text": " here in exercise one we're going to calculate the backward pass", "tokens": [51705, 510, 294, 5380, 472, 321, 434, 516, 281, 8873, 264, 23897, 1320, 51857], "temperature": 0.0, "avg_logprob": -0.08430802731113579, "compression_ratio": 2.079847908745247, "no_speech_prob": 9.200189379043877e-05}, {"id": 124, "seek": 61942, "start": 619.42, "end": 624.86, "text": " so we're going to calculate all these d variable d variables and use the cmp function i've introduced", "tokens": [50365, 370, 321, 434, 516, 281, 8873, 439, 613, 274, 7006, 274, 9102, 293, 764, 264, 269, 2455, 2445, 741, 600, 7268, 50637], "temperature": 0.0, "avg_logprob": -0.07180276464243404, "compression_ratio": 1.8506944444444444, "no_speech_prob": 0.0003620932693593204}, {"id": 125, "seek": 61942, "start": 624.86, "end": 629.9, "text": " above to check our correctness with respect to what pytorch is telling us this is going to be", "tokens": [50637, 3673, 281, 1520, 527, 3006, 1287, 365, 3104, 281, 437, 25878, 284, 339, 307, 3585, 505, 341, 307, 516, 281, 312, 50889], "temperature": 0.0, "avg_logprob": -0.07180276464243404, "compression_ratio": 1.8506944444444444, "no_speech_prob": 0.0003620932693593204}, {"id": 126, "seek": 61942, "start": 629.9, "end": 635.5, "text": " exercise one where we sort of back propagate through this entire graph now just to give you", "tokens": [50889, 5380, 472, 689, 321, 1333, 295, 646, 48256, 807, 341, 2302, 4295, 586, 445, 281, 976, 291, 51169], "temperature": 0.0, "avg_logprob": -0.07180276464243404, "compression_ratio": 1.8506944444444444, "no_speech_prob": 0.0003620932693593204}, {"id": 127, "seek": 61942, "start": 635.5, "end": 638.38, "text": " a very quick preview of what's going to happen in exercise two and below", "tokens": [51169, 257, 588, 1702, 14281, 295, 437, 311, 516, 281, 1051, 294, 5380, 732, 293, 2507, 51313], "temperature": 0.0, "avg_logprob": -0.07180276464243404, "compression_ratio": 1.8506944444444444, "no_speech_prob": 0.0003620932693593204}, {"id": 128, "seek": 61942, "start": 639.5799999999999, "end": 645.3399999999999, "text": " here we have fully broken up the loss and back propagated through it manually in all", "tokens": [51373, 510, 321, 362, 4498, 5463, 493, 264, 4470, 293, 646, 12425, 770, 807, 309, 16945, 294, 439, 
51661], "temperature": 0.0, "avg_logprob": -0.07180276464243404, "compression_ratio": 1.8506944444444444, "no_speech_prob": 0.0003620932693593204}, {"id": 129, "seek": 61942, "start": 645.3399999999999, "end": 649.3399999999999, "text": " the little atomic pieces that make it up but here we're going to collapse the loss into", "tokens": [51661, 264, 707, 22275, 3755, 300, 652, 309, 493, 457, 510, 321, 434, 516, 281, 15584, 264, 4470, 666, 51861], "temperature": 0.0, "avg_logprob": -0.07180276464243404, "compression_ratio": 1.8506944444444444, "no_speech_prob": 0.0003620932693593204}, {"id": 130, "seek": 64942, "start": 649.42, "end": 654.3, "text": " a single cross entropy call and instead we're going to analytically derive using", "tokens": [50365, 257, 2167, 3278, 30867, 818, 293, 2602, 321, 434, 516, 281, 10783, 984, 28446, 1228, 50609], "temperature": 0.0, "avg_logprob": -0.0674903052193778, "compression_ratio": 2.003623188405797, "no_speech_prob": 0.00017478330119047314}, {"id": 131, "seek": 64942, "start": 654.9399999999999, "end": 660.78, "text": " math and paper and pencil the gradient of the loss with respect to the logits and instead of", "tokens": [50641, 5221, 293, 3035, 293, 10985, 264, 16235, 295, 264, 4470, 365, 3104, 281, 264, 3565, 1208, 293, 2602, 295, 50933], "temperature": 0.0, "avg_logprob": -0.0674903052193778, "compression_ratio": 2.003623188405797, "no_speech_prob": 0.00017478330119047314}, {"id": 132, "seek": 64942, "start": 660.78, "end": 664.78, "text": " back propagating through all of its little chunks one at a time we're just going to analytically", "tokens": [50933, 646, 12425, 990, 807, 439, 295, 1080, 707, 24004, 472, 412, 257, 565, 321, 434, 445, 516, 281, 10783, 984, 51133], "temperature": 0.0, "avg_logprob": -0.0674903052193778, "compression_ratio": 2.003623188405797, "no_speech_prob": 0.00017478330119047314}, {"id": 133, "seek": 64942, "start": 664.78, "end": 668.78, "text": " derive what that gradient is and we're going to implement that which is much more efficient as", "tokens": [51133, 28446, 437, 300, 16235, 307, 293, 321, 434, 516, 281, 4445, 300, 597, 307, 709, 544, 7148, 382, 51333], "temperature": 0.0, "avg_logprob": -0.0674903052193778, "compression_ratio": 2.003623188405797, "no_speech_prob": 0.00017478330119047314}, {"id": 134, "seek": 64942, "start": 668.78, "end": 674.06, "text": " we'll see in a bit then we're going to do the exact same thing for batch normalization so", "tokens": [51333, 321, 603, 536, 294, 257, 857, 550, 321, 434, 516, 281, 360, 264, 1900, 912, 551, 337, 15245, 2710, 2144, 370, 51597], "temperature": 0.0, "avg_logprob": -0.0674903052193778, "compression_ratio": 2.003623188405797, "no_speech_prob": 0.00017478330119047314}, {"id": 135, "seek": 64942, "start": 674.06, "end": 679.18, "text": " instead of breaking up bastion arm into all the little tiny components we're going to use pen and", "tokens": [51597, 2602, 295, 7697, 493, 8414, 313, 3726, 666, 439, 264, 707, 5870, 6677, 321, 434, 516, 281, 764, 3435, 293, 51853], "temperature": 0.0, "avg_logprob": -0.0674903052193778, "compression_ratio": 2.003623188405797, "no_speech_prob": 0.00017478330119047314}, {"id": 136, "seek": 67942, "start": 679.42, "end": 685.0999999999999, "text": " paper and mathematics and calculus to derive the gradient through the bachelor bathroom layer so", "tokens": [50365, 3035, 293, 18666, 293, 33400, 281, 28446, 264, 16235, 807, 264, 25947, 8687, 4583, 370, 50649], "temperature": 0.0, "avg_logprob": -0.0982896952793516, "compression_ratio": 
2.0290909090909093, "no_speech_prob": 0.00014641371672041714}, {"id": 137, "seek": 67942, "start": 685.0999999999999, "end": 689.66, "text": " we're going to calculate the backward pass through bathroom layer in a much more efficient expression", "tokens": [50649, 321, 434, 516, 281, 8873, 264, 23897, 1320, 807, 8687, 4583, 294, 257, 709, 544, 7148, 6114, 50877], "temperature": 0.0, "avg_logprob": -0.0982896952793516, "compression_ratio": 2.0290909090909093, "no_speech_prob": 0.00014641371672041714}, {"id": 138, "seek": 67942, "start": 689.66, "end": 692.78, "text": " instead of backward propagating through all of its little pieces independently", "tokens": [50877, 2602, 295, 23897, 12425, 990, 807, 439, 295, 1080, 707, 3755, 21761, 51033], "temperature": 0.0, "avg_logprob": -0.0982896952793516, "compression_ratio": 2.0290909090909093, "no_speech_prob": 0.00014641371672041714}, {"id": 139, "seek": 67942, "start": 693.5, "end": 698.78, "text": " so it's going to be exercise three and then in exercise four we're going to put it all together", "tokens": [51069, 370, 309, 311, 516, 281, 312, 5380, 1045, 293, 550, 294, 5380, 1451, 321, 434, 516, 281, 829, 309, 439, 1214, 51333], "temperature": 0.0, "avg_logprob": -0.0982896952793516, "compression_ratio": 2.0290909090909093, "no_speech_prob": 0.00014641371672041714}, {"id": 140, "seek": 67942, "start": 698.78, "end": 703.98, "text": " and this is the full code of training this two layer mlp and we're going to basically insert", "tokens": [51333, 293, 341, 307, 264, 1577, 3089, 295, 3097, 341, 732, 4583, 23271, 79, 293, 321, 434, 516, 281, 1936, 8969, 51593], "temperature": 0.0, "avg_logprob": -0.0982896952793516, "compression_ratio": 2.0290909090909093, "no_speech_prob": 0.00014641371672041714}, {"id": 141, "seek": 67942, "start": 703.98, "end": 709.26, "text": " our manual backdrop and we're going to take out lost up backward and you will basically see", "tokens": [51593, 527, 9688, 32697, 293, 321, 434, 516, 281, 747, 484, 2731, 493, 23897, 293, 291, 486, 1936, 536, 51857], "temperature": 0.0, "avg_logprob": -0.0982896952793516, "compression_ratio": 2.0290909090909093, "no_speech_prob": 0.00014641371672041714}, {"id": 142, "seek": 70942, "start": 709.42, "end": 715.9799999999999, "text": " that you can get all the same results using fully your own code and the only thing we're using from", "tokens": [50365, 300, 291, 393, 483, 439, 264, 912, 3542, 1228, 4498, 428, 1065, 3089, 293, 264, 787, 551, 321, 434, 1228, 490, 50693], "temperature": 0.2, "avg_logprob": -0.11250668865139202, "compression_ratio": 1.8241758241758241, "no_speech_prob": 0.0007710991776548326}, {"id": 143, "seek": 70942, "start": 715.9799999999999, "end": 722.4599999999999, "text": " pytorch is the torch.tensor to make the calculations efficient but otherwise you will understand fully", "tokens": [50693, 25878, 284, 339, 307, 264, 27822, 13, 83, 23153, 281, 652, 264, 20448, 7148, 457, 5911, 291, 486, 1223, 4498, 51017], "temperature": 0.2, "avg_logprob": -0.11250668865139202, "compression_ratio": 1.8241758241758241, "no_speech_prob": 0.0007710991776548326}, {"id": 144, "seek": 70942, "start": 722.4599999999999, "end": 726.78, "text": " what it means to forward and backward the neural net and train it and i think that'll be awesome so", "tokens": [51017, 437, 309, 1355, 281, 2128, 293, 23897, 264, 18161, 2533, 293, 3847, 309, 293, 741, 519, 300, 603, 312, 3476, 370, 51233], "temperature": 0.2, "avg_logprob": -0.11250668865139202, "compression_ratio": 1.8241758241758241, 
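The shape of that final swap, sketched with toy stand-ins; the structure here (a grads list ordered like parameters) is my assumption about how the pieces fit together, not the notebook verbatim:

    import torch

    W = torch.randn(10, 5)
    dW = torch.ones_like(W)          # pretend this came from our manual backward pass
    parameters, grads = [W], [dW]

    lr = 0.1
    for p, grad in zip(parameters, grads):
        # instead of `loss.backward()` filling p.grad, we supply grad ourselves
        p.data += -lr * grad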
"no_speech_prob": 0.0007710991776548326}, {"id": 145, "seek": 70942, "start": 726.78, "end": 733.0999999999999, "text": " let's get to it okay so i ran all the cells of this notebook all the way up to here and i'm going", "tokens": [51233, 718, 311, 483, 281, 309, 1392, 370, 741, 5872, 439, 264, 5438, 295, 341, 21060, 439, 264, 636, 493, 281, 510, 293, 741, 478, 516, 51549], "temperature": 0.2, "avg_logprob": -0.11250668865139202, "compression_ratio": 1.8241758241758241, "no_speech_prob": 0.0007710991776548326}, {"id": 146, "seek": 70942, "start": 733.0999999999999, "end": 738.38, "text": " to erase this and i'm going to start implementing backward pass starting with d lock probes so we", "tokens": [51549, 281, 23525, 341, 293, 741, 478, 516, 281, 722, 18114, 23897, 1320, 2891, 365, 274, 4017, 1239, 279, 370, 321, 51813], "temperature": 0.2, "avg_logprob": -0.11250668865139202, "compression_ratio": 1.8241758241758241, "no_speech_prob": 0.0007710991776548326}, {"id": 147, "seek": 73942, "start": 739.42, "end": 743.9799999999999, "text": " go here to calculate the gradient of the loss with respect to all the elements of the lock props", "tokens": [50365, 352, 510, 281, 8873, 264, 16235, 295, 264, 4470, 365, 3104, 281, 439, 264, 4959, 295, 264, 4017, 26173, 50593], "temperature": 0.0, "avg_logprob": -0.07443541505911054, "compression_ratio": 1.838095238095238, "no_speech_prob": 0.0006420881254598498}, {"id": 148, "seek": 73942, "start": 743.9799999999999, "end": 749.26, "text": " tensor now i'm going to give away the answer here but i wanted to put a quick note here that", "tokens": [50593, 40863, 586, 741, 478, 516, 281, 976, 1314, 264, 1867, 510, 457, 741, 1415, 281, 829, 257, 1702, 3637, 510, 300, 50857], "temperature": 0.0, "avg_logprob": -0.07443541505911054, "compression_ratio": 1.838095238095238, "no_speech_prob": 0.0006420881254598498}, {"id": 149, "seek": 73942, "start": 749.26, "end": 754.38, "text": " i think would be most pedagogically useful for you is to actually go into the description of this", "tokens": [50857, 741, 519, 576, 312, 881, 5670, 31599, 984, 4420, 337, 291, 307, 281, 767, 352, 666, 264, 3855, 295, 341, 51113], "temperature": 0.0, "avg_logprob": -0.07443541505911054, "compression_ratio": 1.838095238095238, "no_speech_prob": 0.0006420881254598498}, {"id": 150, "seek": 73942, "start": 754.38, "end": 758.9399999999999, "text": " video and find the link to this jupyter notebook you can find it both on github but you can also", "tokens": [51113, 960, 293, 915, 264, 2113, 281, 341, 361, 1010, 88, 391, 21060, 291, 393, 915, 309, 1293, 322, 290, 355, 836, 457, 291, 393, 611, 51341], "temperature": 0.0, "avg_logprob": -0.07443541505911054, "compression_ratio": 1.838095238095238, "no_speech_prob": 0.0006420881254598498}, {"id": 151, "seek": 73942, "start": 758.9399999999999, "end": 762.78, "text": " find google collab with it so you don't have to install anything you'll just go to a website on", "tokens": [51341, 915, 20742, 44228, 365, 309, 370, 291, 500, 380, 362, 281, 3625, 1340, 291, 603, 445, 352, 281, 257, 3144, 322, 51533], "temperature": 0.0, "avg_logprob": -0.07443541505911054, "compression_ratio": 1.838095238095238, "no_speech_prob": 0.0006420881254598498}, {"id": 152, "seek": 73942, "start": 762.78, "end": 769.26, "text": " google collab and you can try to implement these derivatives or gradients yourself and then if you", "tokens": [51533, 20742, 44228, 293, 291, 393, 853, 281, 4445, 613, 33733, 420, 2771, 2448, 1803, 293, 550, 498, 291, 51857], "temperature": 
0.0, "avg_logprob": -0.07443541505911054, "compression_ratio": 1.838095238095238, "no_speech_prob": 0.0006420881254598498}, {"id": 153, "seek": 76942, "start": 769.42, "end": 775.18, "text": " are not able to come to my video and see me do it and so work in tandem and try it first yourself", "tokens": [50365, 366, 406, 1075, 281, 808, 281, 452, 960, 293, 536, 385, 360, 309, 293, 370, 589, 294, 48120, 293, 853, 309, 700, 1803, 50653], "temperature": 0.0, "avg_logprob": -0.08058811028798421, "compression_ratio": 1.8171641791044777, "no_speech_prob": 0.000432652304880321}, {"id": 154, "seek": 76942, "start": 775.18, "end": 779.66, "text": " and then see me give away the answer and i think that'll be most valuable to you and that's how i", "tokens": [50653, 293, 550, 536, 385, 976, 1314, 264, 1867, 293, 741, 519, 300, 603, 312, 881, 8263, 281, 291, 293, 300, 311, 577, 741, 50877], "temperature": 0.0, "avg_logprob": -0.08058811028798421, "compression_ratio": 1.8171641791044777, "no_speech_prob": 0.000432652304880321}, {"id": 155, "seek": 76942, "start": 779.66, "end": 785.9, "text": " recommend you go through this lecture so we are starting here with d log props now d log props", "tokens": [50877, 2748, 291, 352, 807, 341, 7991, 370, 321, 366, 2891, 510, 365, 274, 3565, 26173, 586, 274, 3565, 26173, 51189], "temperature": 0.0, "avg_logprob": -0.08058811028798421, "compression_ratio": 1.8171641791044777, "no_speech_prob": 0.000432652304880321}, {"id": 156, "seek": 76942, "start": 785.9, "end": 792.2199999999999, "text": " will hold the derivative of the loss with respect to all the elements of log props what is inside", "tokens": [51189, 486, 1797, 264, 13760, 295, 264, 4470, 365, 3104, 281, 439, 264, 4959, 295, 3565, 26173, 437, 307, 1854, 51505], "temperature": 0.0, "avg_logprob": -0.08058811028798421, "compression_ratio": 1.8171641791044777, "no_speech_prob": 0.000432652304880321}, {"id": 157, "seek": 76942, "start": 792.2199999999999, "end": 799.42, "text": " log blobs the shape of this is 32 by 27. 
So it's not going to surprise you that dlogprobs should also be an array of size 32 by 27, because we want the derivative of the loss with respect to all of its elements, so the sizes of those are always going to be equal. Now, how does logprobs influence the loss? Okay, loss is negative logprobs indexed with range of n and Yb, and then the mean of that. Now, just as a reminder, Yb is just basically an array of all the correct indexes.
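In code form, the line being described is essentially the following; the surrounding stand-ins are mine:

    import torch

    n = 32                                        # batch size
    logprobs = torch.randn(n, 27).log_softmax(1)  # stand-in for the real logprobs
    Yb = torch.randint(27, (n,))                  # correct next-character index per row
    loss = -logprobs[range(n), Yb].mean()         # pluck one entry per row, average, negate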
index 8 and then 14 and 15 and so on so we're going down the rows", "tokens": [51105, 264, 8186, 1649, 293, 550, 3499, 293, 2119, 293, 370, 322, 370, 321, 434, 516, 760, 264, 13241, 51325], "temperature": 0.0, "avg_logprob": -0.05944683315517666, "compression_ratio": 1.8982300884955752, "no_speech_prob": 0.00016896170563995838}, {"id": 166, "seek": 82942, "start": 848.62, "end": 853.3399999999999, "text": " that's the iterator range of n and then we are always plucking out the index and the", "tokens": [51325, 300, 311, 264, 17138, 1639, 3613, 295, 297, 293, 550, 321, 366, 1009, 499, 33260, 484, 264, 8186, 293, 264, 51561], "temperature": 0.0, "avg_logprob": -0.05944683315517666, "compression_ratio": 1.8982300884955752, "no_speech_prob": 0.00016896170563995838}, {"id": 167, "seek": 82942, "start": 853.3399999999999, "end": 859.02, "text": " column specified by this tensor yb so in the zeroth row we are taking the eighth column", "tokens": [51561, 7738, 22206, 538, 341, 40863, 288, 65, 370, 294, 264, 44746, 900, 5386, 321, 366, 1940, 264, 19495, 7738, 51845], "temperature": 0.0, "avg_logprob": -0.05944683315517666, "compression_ratio": 1.8982300884955752, "no_speech_prob": 0.00016896170563995838}, {"id": 168, "seek": 85942, "start": 859.5799999999999, "end": 867.0999999999999, "text": " in the first row we're taking the 14th column etc and so log props at this plucks out all those", "tokens": [50373, 294, 264, 700, 5386, 321, 434, 1940, 264, 3499, 392, 7738, 5183, 293, 370, 3565, 26173, 412, 341, 499, 15493, 484, 439, 729, 50749], "temperature": 0.0, "avg_logprob": -0.07324900316155475, "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.00038211524952203035}, {"id": 169, "seek": 85942, "start": 869.02, "end": 874.3, "text": " log probabilities of the correct next character in a sequence so that's what that does and the", "tokens": [50845, 3565, 33783, 295, 264, 3006, 958, 2517, 294, 257, 8310, 370, 300, 311, 437, 300, 775, 293, 264, 51109], "temperature": 0.0, "avg_logprob": -0.07324900316155475, "compression_ratio": 1.6666666666666667, "no_speech_prob": 0.00038211524952203035}, {"id": 170, "seek": 85942, "start": 874.3, "end": 881.5, "text": " shape of this or the size of it is of course 32 because our batch size is 32. 
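As a minimal sketch of that indexing pattern (toy numbers here, not the actual training batch; the names mirror the lecture's notebook):

```python
import torch

n = 4                                # toy batch size; the lecture uses 32
logprobs = torch.randn(n, 27)        # one row per example, one column per character
Yb = torch.tensor([8, 14, 15, 22])   # hypothetical correct next-character indexes

# Advanced indexing: for row i, pluck out column Yb[i]; result has shape (n,)
picked = logprobs[range(n), Yb]
loss = -picked.mean()
```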
So these elements get plucked out, and then their mean and the negative of that becomes the loss. So I always like to use simple examples to understand the numerical form of the derivative. What's going on here is that once we've plucked out these examples, we're taking the mean and then the negative. So the loss, if I can write it this way, is the negative of, say, a plus b plus c, all divided by three. That would be how we achieve the mean of three numbers a, b, c, although we actually have 32 numbers here. And so what is dloss by, say, like, da? Well, if we simplify this expression mathematically, this is negative one third of a, plus negative one third of b, plus negative one third of c. And so what is dloss by da? It's just negative one third. And so you can see that if we don't just have a, b, and c but we have 32 numbers, then dloss by d... you know, every one of those numbers is going to be negative one over n, more generally the size of the batch, 32 in this case. So dloss by dlogprobs is negative one over n in all these places.
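Written out, the little derivation is just the following (with y_i standing for the correct column index in row i):

```latex
\text{loss} = -\frac{a + b + c}{3}
  = -\tfrac{1}{3}a - \tfrac{1}{3}b - \tfrac{1}{3}c
  \;\Rightarrow\;
  \frac{\partial \text{loss}}{\partial a} = -\frac{1}{3},
\qquad\text{and in general}\qquad
\frac{\partial \text{loss}}{\partial\, \text{logprobs}[i,\, y_i]} = -\frac{1}{n}.
```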
"no_speech_prob": 0.1173354834318161}, {"id": 188, "seek": 94942, "start": 969.7199999999999, "end": 973.64, "text": " but only 32 of them participate in the loss calculation.", "tokens": [51380, 457, 787, 8858, 295, 552, 8197, 294, 264, 4470, 17108, 13, 51576], "temperature": 0.0, "avg_logprob": -0.20565375635179423, "compression_ratio": 1.6131687242798354, "no_speech_prob": 0.1173354834318161}, {"id": 189, "seek": 94942, "start": 974.24, "end": 976.16, "text": " So what's the derivative of all the other,", "tokens": [51606, 407, 437, 311, 264, 13760, 295, 439, 264, 661, 11, 51702], "temperature": 0.0, "avg_logprob": -0.20565375635179423, "compression_ratio": 1.6131687242798354, "no_speech_prob": 0.1173354834318161}, {"id": 190, "seek": 94942, "start": 976.4599999999999, "end": 979.38, "text": " most of the elements that do not get blocked out here?", "tokens": [51717, 881, 295, 264, 4959, 300, 360, 406, 483, 15470, 484, 510, 30, 51863], "temperature": 0.0, "avg_logprob": -0.20565375635179423, "compression_ratio": 1.6131687242798354, "no_speech_prob": 0.1173354834318161}, {"id": 191, "seek": 97942, "start": 980.38, "end": 981.88, "text": " Well, their loss intuitively is zero.", "tokens": [50413, 1042, 11, 641, 4470, 46506, 307, 4018, 13, 50488], "temperature": 0.0, "avg_logprob": -0.17123902362325918, "compression_ratio": 1.8436363636363637, "no_speech_prob": 2.698513344512321e-05}, {"id": 192, "seek": 97942, "start": 982.1999999999999, "end": 984.4, "text": " Sorry, their gradient intuitively is zero.", "tokens": [50504, 4919, 11, 641, 16235, 46506, 307, 4018, 13, 50614], "temperature": 0.0, "avg_logprob": -0.17123902362325918, "compression_ratio": 1.8436363636363637, "no_speech_prob": 2.698513344512321e-05}, {"id": 193, "seek": 97942, "start": 984.78, "end": 986.74, "text": " And that's because they did not participate in the loss.", "tokens": [50633, 400, 300, 311, 570, 436, 630, 406, 8197, 294, 264, 4470, 13, 50731], "temperature": 0.0, "avg_logprob": -0.17123902362325918, "compression_ratio": 1.8436363636363637, "no_speech_prob": 2.698513344512321e-05}, {"id": 194, "seek": 97942, "start": 987.26, "end": 989.76, "text": " So most of these numbers inside this tensor", "tokens": [50757, 407, 881, 295, 613, 3547, 1854, 341, 40863, 50882], "temperature": 0.0, "avg_logprob": -0.17123902362325918, "compression_ratio": 1.8436363636363637, "no_speech_prob": 2.698513344512321e-05}, {"id": 195, "seek": 97942, "start": 989.76, "end": 991.56, "text": " does not feed into the loss.", "tokens": [50882, 775, 406, 3154, 666, 264, 4470, 13, 50972], "temperature": 0.0, "avg_logprob": -0.17123902362325918, "compression_ratio": 1.8436363636363637, "no_speech_prob": 2.698513344512321e-05}, {"id": 196, "seek": 97942, "start": 991.9399999999999, "end": 993.56, "text": " And so if we were to change these numbers,", "tokens": [50991, 400, 370, 498, 321, 645, 281, 1319, 613, 3547, 11, 51072], "temperature": 0.0, "avg_logprob": -0.17123902362325918, "compression_ratio": 1.8436363636363637, "no_speech_prob": 2.698513344512321e-05}, {"id": 197, "seek": 97942, "start": 994.02, "end": 995.14, "text": " then the loss doesn't change,", "tokens": [51095, 550, 264, 4470, 1177, 380, 1319, 11, 51151], "temperature": 0.0, "avg_logprob": -0.17123902362325918, "compression_ratio": 1.8436363636363637, "no_speech_prob": 2.698513344512321e-05}, {"id": 198, "seek": 97942, "start": 995.1999999999999, "end": 997.3199999999999, "text": " which is the equivalent of what I was saying,", "tokens": [51154, 597, 307, 264, 10344, 295, 
437, 286, 390, 1566, 11, 51260], "temperature": 0.0, "avg_logprob": -0.17123902362325918, "compression_ratio": 1.8436363636363637, "no_speech_prob": 2.698513344512321e-05}, {"id": 199, "seek": 97942, "start": 997.8199999999999, "end": 1000.4399999999999, "text": " that the derivative of the loss with respect to them is zero.", "tokens": [51285, 300, 264, 13760, 295, 264, 4470, 365, 3104, 281, 552, 307, 4018, 13, 51416], "temperature": 0.0, "avg_logprob": -0.17123902362325918, "compression_ratio": 1.8436363636363637, "no_speech_prob": 2.698513344512321e-05}, {"id": 200, "seek": 97942, "start": 1000.74, "end": 1001.64, "text": " They don't impact it.", "tokens": [51431, 814, 500, 380, 2712, 309, 13, 51476], "temperature": 0.0, "avg_logprob": -0.17123902362325918, "compression_ratio": 1.8436363636363637, "no_speech_prob": 2.698513344512321e-05}, {"id": 201, "seek": 97942, "start": 1003.0799999999999, "end": 1006.0999999999999, "text": " So here's a way to implement this derivative then.", "tokens": [51548, 407, 510, 311, 257, 636, 281, 4445, 341, 13760, 550, 13, 51699], "temperature": 0.0, "avg_logprob": -0.17123902362325918, "compression_ratio": 1.8436363636363637, "no_speech_prob": 2.698513344512321e-05}, {"id": 202, "seek": 97942, "start": 1006.4399999999999, "end": 1009.38, "text": " We start out with Torch.zeros of shape 32.", "tokens": [51716, 492, 722, 484, 365, 7160, 339, 13, 4527, 329, 295, 3909, 8858, 13, 51863], "temperature": 0.0, "avg_logprob": -0.17123902362325918, "compression_ratio": 1.8436363636363637, "no_speech_prob": 2.698513344512321e-05}, {"id": 203, "seek": 100942, "start": 1009.5799999999999, "end": 1011.42, "text": " So we're going to set it to 32 by 27,", "tokens": [50373, 407, 321, 434, 516, 281, 992, 309, 281, 8858, 538, 7634, 11, 50465], "temperature": 0.0, "avg_logprob": -0.4640083312988281, "compression_ratio": 1.7058823529411764, "no_speech_prob": 0.0006360951228998601}, {"id": 204, "seek": 100942, "start": 1011.42, "end": 1015.18, "text": " or let's just say instead of doing this because we don't want to hard-code numbers,", "tokens": [50465, 420, 718, 311, 445, 584, 2602, 295, 884, 341, 570, 321, 500, 380, 528, 281, 1152, 12, 22332, 3547, 11, 50653], "temperature": 0.0, "avg_logprob": -0.4640083312988281, "compression_ratio": 1.7058823529411764, "no_speech_prob": 0.0006360951228998601}, {"id": 205, "seek": 100942, "start": 1015.18, "end": 1019.18, "text": " let's do Torch.zeros like LockProbs.", "tokens": [50653, 718, 311, 360, 7160, 339, 13, 4527, 329, 411, 16736, 12681, 929, 13, 50853], "temperature": 0.0, "avg_logprob": -0.4640083312988281, "compression_ratio": 1.7058823529411764, "no_speech_prob": 0.0006360951228998601}, {"id": 206, "seek": 100942, "start": 1019.18, "end": 1023.18, "text": " So basically this is going to create an array of zeros exactly in the shape of LockProbs.", "tokens": [50853, 407, 1936, 341, 307, 516, 281, 1884, 364, 10225, 295, 35193, 2293, 294, 264, 3909, 295, 16736, 12681, 929, 13, 51053], "temperature": 0.0, "avg_logprob": -0.4640083312988281, "compression_ratio": 1.7058823529411764, "no_speech_prob": 0.0006360951228998601}, {"id": 207, "seek": 100942, "start": 1024.18, "end": 1029.18, "text": " And then we need to set the derivative of negative one over n inside exactly these locations.", "tokens": [51103, 400, 550, 321, 643, 281, 992, 264, 13760, 295, 3671, 472, 670, 297, 1854, 2293, 613, 9253, 13, 51353], "temperature": 0.0, "avg_logprob": -0.4640083312988281, "compression_ratio": 1.7058823529411764, "no_speech_prob": 
0.0006360951228998601}, {"id": 208, "seek": 100942, "start": 1029.18, "end": 1031.18, "text": " So here's what we can do.", "tokens": [51353, 407, 510, 311, 437, 321, 393, 360, 13, 51453], "temperature": 0.0, "avg_logprob": -0.4640083312988281, "compression_ratio": 1.7058823529411764, "no_speech_prob": 0.0006360951228998601}, {"id": 209, "seek": 100942, "start": 1031.18, "end": 1034.18, "text": " The LockProbs indexed in the identical way", "tokens": [51453, 440, 16736, 12681, 929, 8186, 292, 294, 264, 14800, 636, 51603], "temperature": 0.0, "avg_logprob": -0.4640083312988281, "compression_ratio": 1.7058823529411764, "no_speech_prob": 0.0006360951228998601}, {"id": 210, "seek": 100942, "start": 1035.18, "end": 1039.18, "text": " will be just set to negative one over zero divide n.", "tokens": [51653, 486, 312, 445, 992, 281, 3671, 472, 670, 4018, 9845, 297, 13, 51853], "temperature": 0.0, "avg_logprob": -0.4640083312988281, "compression_ratio": 1.7058823529411764, "no_speech_prob": 0.0006360951228998601}, {"id": 211, "seek": 103942, "start": 1039.92, "end": 1041.42, "text": " Right, just like we derived here.", "tokens": [50390, 1779, 11, 445, 411, 321, 18949, 510, 13, 50465], "temperature": 0.0, "avg_logprob": -0.2497989458915515, "compression_ratio": 1.5870445344129556, "no_speech_prob": 0.0011261019390076399}, {"id": 212, "seek": 103942, "start": 1042.66, "end": 1045.42, "text": " So now let me erase all of these reasoning.", "tokens": [50527, 407, 586, 718, 385, 23525, 439, 295, 613, 21577, 13, 50665], "temperature": 0.0, "avg_logprob": -0.2497989458915515, "compression_ratio": 1.5870445344129556, "no_speech_prob": 0.0011261019390076399}, {"id": 213, "seek": 103942, "start": 1045.92, "end": 1049.42, "text": " And then this is the candidate derivative for DLockProbs.", "tokens": [50690, 400, 550, 341, 307, 264, 11532, 13760, 337, 413, 43, 1560, 12681, 929, 13, 50865], "temperature": 0.0, "avg_logprob": -0.2497989458915515, "compression_ratio": 1.5870445344129556, "no_speech_prob": 0.0011261019390076399}, {"id": 214, "seek": 103942, "start": 1049.66, "end": 1052.42, "text": " Let's uncomment the first line and check that this is correct.", "tokens": [50877, 961, 311, 8585, 518, 264, 700, 1622, 293, 1520, 300, 341, 307, 3006, 13, 51015], "temperature": 0.0, "avg_logprob": -0.2497989458915515, "compression_ratio": 1.5870445344129556, "no_speech_prob": 0.0011261019390076399}, {"id": 215, "seek": 103942, "start": 1054.18, "end": 1059.18, "text": " Okay, so CMP ran, and let's go back to CMP.", "tokens": [51103, 1033, 11, 370, 383, 12224, 5872, 11, 293, 718, 311, 352, 646, 281, 383, 12224, 13, 51353], "temperature": 0.0, "avg_logprob": -0.2497989458915515, "compression_ratio": 1.5870445344129556, "no_speech_prob": 0.0011261019390076399}, {"id": 216, "seek": 103942, "start": 1059.92, "end": 1062.18, "text": " And you see that what it's doing is it's calculating if", "tokens": [51390, 400, 291, 536, 300, 437, 309, 311, 884, 307, 309, 311, 28258, 498, 51503], "temperature": 0.0, "avg_logprob": -0.2497989458915515, "compression_ratio": 1.5870445344129556, "no_speech_prob": 0.0011261019390076399}, {"id": 217, "seek": 103942, "start": 1062.92, "end": 1066.18, "text": " the calculated value by us, which is dt,", "tokens": [51540, 264, 15598, 2158, 538, 505, 11, 597, 307, 36423, 11, 51703], "temperature": 0.0, "avg_logprob": -0.2497989458915515, "compression_ratio": 1.5870445344129556, "no_speech_prob": 0.0011261019390076399}, {"id": 218, "seek": 103942, "start": 1066.42, "end": 1069.18, "text": " is 
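As code, the two lines just described look like this (a sketch in the notebook's notation):

```python
# All other elements do not feed into the loss, so their gradient stays zero
dlogprobs = torch.zeros_like(logprobs)
# -1/n exactly at the 32 plucked-out locations
dlogprobs[range(n), Yb] = -1.0 / n
```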
So now let me erase all of this reasoning, and then this is the candidate derivative for dlogprobs. Let's uncomment the first line and check that this is correct. Okay, so cmp ran, and let's go back to cmp. And you see that what it's doing is it's calculating if the value calculated by us, which is dt, is exactly equal to t.grad as calculated by PyTorch. And then this is making sure that all of the elements are exactly equal, and then converting this to a single Boolean value, because we don't want a Boolean tensor, we just want a Boolean value. And then here we are making sure that, okay, if they're not exactly equal, maybe they are approximately equal because of some floating point issues, but they're very, very close. So here we are using torch.allclose, which has a little bit of a wiggle available, because sometimes, if you use a slightly different calculation, because of floating point arithmetic you can get a slightly different result. So this is checking if you get an approximately close result. And then here we are checking the maximum, basically the value that has the highest difference, and what is the difference: the absolute value of the difference between those two. And so we are printing whether we have an exact equality, an approximate equality, and what is the largest difference.
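Putting that together, the comparison helper is roughly the following (a sketch; the exact print formatting is incidental):

```python
def cmp(s, dt, t):
    # exact: every element of our manual gradient matches PyTorch's bit for bit
    ex = torch.all(dt == t.grad).item()
    # approximate: equal up to a little floating point wiggle
    app = torch.allclose(dt, t.grad)
    # the single largest absolute discrepancy between the two
    maxdiff = (dt - t.grad).abs().max().item()
    print(f'{s:15s} | exact: {str(ex):5s} | approximate: {str(app):5s} | maxdiff: {maxdiff}')
```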
And so here we see that we actually have exact equality, and so therefore, of course, we also have an approximate equality, and the maximum difference is exactly zero. So basically, our dlogprobs is exactly equal to what PyTorch calculated logprobs.grad to be in its backpropagation. So, so far, we're doing pretty well.

Okay, so let's now continue our backpropagation. We have that logprobs depends on probs through a log; so all the elements of probs are having log applied to them element-wise. Now, if we want dprobs, then remember your micrograd training: we have, like, a log node. It takes in probs and creates logprobs. And dprobs will be the local derivative of that individual operation, log, times the derivative of the loss with respect to its output, which in this case is dlogprobs. So what is the local derivative of this operation? Well, we are taking log element-wise, and we can come here and we can see (Wolfram Alpha is your friend) that d by dx of log of x is just simply 1 over x. So therefore, in this case, x is probs. So we have d by dx is 1 over x, which is 1 over probs, and then this is the local derivative. And then times — we want to chain it, so this is the chain rule — times dlogprobs.
{"id": 266, "seek": 121874, "start": 1223.54, "end": 1226.74, "text": " that means your network is currently predicting the character correctly,", "tokens": [50605, 300, 1355, 428, 3209, 307, 4362, 32884, 264, 2517, 8944, 11, 50765], "temperature": 0.0, "avg_logprob": -0.26024935795710635, "compression_ratio": 1.710801393728223, "no_speech_prob": 9.510305972071365e-05}, {"id": 267, "seek": 121874, "start": 1227.24, "end": 1230.84, "text": " then this will become 1 over 1, and DLOGPROPS just gets passed through.", "tokens": [50790, 550, 341, 486, 1813, 502, 670, 502, 11, 293, 413, 20184, 38, 47, 7142, 6273, 445, 2170, 4678, 807, 13, 50970], "temperature": 0.0, "avg_logprob": -0.26024935795710635, "compression_ratio": 1.710801393728223, "no_speech_prob": 9.510305972071365e-05}, {"id": 268, "seek": 121874, "start": 1231.74, "end": 1233.94, "text": " But if your probabilities are incorrectly assigned,", "tokens": [51015, 583, 498, 428, 33783, 366, 42892, 13279, 11, 51125], "temperature": 0.0, "avg_logprob": -0.26024935795710635, "compression_ratio": 1.710801393728223, "no_speech_prob": 9.510305972071365e-05}, {"id": 269, "seek": 121874, "start": 1234.04, "end": 1238.14, "text": " so if the correct character here is getting a very low probability,", "tokens": [51130, 370, 498, 264, 3006, 2517, 510, 307, 1242, 257, 588, 2295, 8482, 11, 51335], "temperature": 0.0, "avg_logprob": -0.26024935795710635, "compression_ratio": 1.710801393728223, "no_speech_prob": 9.510305972071365e-05}, {"id": 270, "seek": 121874, "start": 1238.64, "end": 1245.04, "text": " then 1.0 dividing by it will boost this and then multiply by DLOGPROPS.", "tokens": [51360, 550, 502, 13, 15, 26764, 538, 309, 486, 9194, 341, 293, 550, 12972, 538, 413, 20184, 38, 47, 7142, 6273, 13, 51680], "temperature": 0.0, "avg_logprob": -0.26024935795710635, "compression_ratio": 1.710801393728223, "no_speech_prob": 9.510305972071365e-05}, {"id": 271, "seek": 121874, "start": 1245.34, "end": 1248.14, "text": " So basically, what this line is doing intuitively is it's taking", "tokens": [51695, 407, 1936, 11, 437, 341, 1622, 307, 884, 46506, 307, 309, 311, 1940, 51835], "temperature": 0.0, "avg_logprob": -0.26024935795710635, "compression_ratio": 1.710801393728223, "no_speech_prob": 9.510305972071365e-05}, {"id": 272, "seek": 121874, "start": 1248.14, "end": 1248.44, "text": " the...", "tokens": [51835, 264, 485, 51850], "temperature": 0.0, "avg_logprob": -0.26024935795710635, "compression_ratio": 1.710801393728223, "no_speech_prob": 9.510305972071365e-05}, {"id": 273, "seek": 124844, "start": 1248.44, "end": 1251.74, "text": " the examples that have a very low probability currently assigned,", "tokens": [50365, 264, 5110, 300, 362, 257, 588, 2295, 8482, 4362, 13279, 11, 50530], "temperature": 0.0, "avg_logprob": -0.1382857235995206, "compression_ratio": 1.5696969696969696, "no_speech_prob": 0.0008013966726139188}, {"id": 274, "seek": 124844, "start": 1251.94, "end": 1253.3400000000001, "text": " and it's boosting their gradient.", "tokens": [50540, 293, 309, 311, 43117, 641, 16235, 13, 50610], "temperature": 0.0, "avg_logprob": -0.1382857235995206, "compression_ratio": 1.5696969696969696, "no_speech_prob": 0.0008013966726139188}, {"id": 275, "seek": 124844, "start": 1254.14, "end": 1255.44, "text": " You can look at it that way.", "tokens": [50650, 509, 393, 574, 412, 309, 300, 636, 13, 50715], "temperature": 0.0, "avg_logprob": -0.1382857235995206, "compression_ratio": 1.5696969696969696, "no_speech_prob": 0.0008013966726139188}, {"id": 276, 
"seek": 124844, "start": 1256.14, "end": 1258.24, "text": " Next up is COUNTSUMINV.", "tokens": [50750, 3087, 493, 307, 3002, 3979, 7327, 14340, 1464, 53, 13, 50855], "temperature": 0.0, "avg_logprob": -0.1382857235995206, "compression_ratio": 1.5696969696969696, "no_speech_prob": 0.0008013966726139188}, {"id": 277, "seek": 124844, "start": 1259.3400000000001, "end": 1261.74, "text": " So we want the derivative of this.", "tokens": [50910, 407, 321, 528, 264, 13760, 295, 341, 13, 51030], "temperature": 0.0, "avg_logprob": -0.1382857235995206, "compression_ratio": 1.5696969696969696, "no_speech_prob": 0.0008013966726139188}, {"id": 278, "seek": 124844, "start": 1262.24, "end": 1266.54, "text": " Now, let me just pause here and kind of introduce what's happening here in general,", "tokens": [51055, 823, 11, 718, 385, 445, 10465, 510, 293, 733, 295, 5366, 437, 311, 2737, 510, 294, 2674, 11, 51270], "temperature": 0.0, "avg_logprob": -0.1382857235995206, "compression_ratio": 1.5696969696969696, "no_speech_prob": 0.0008013966726139188}, {"id": 279, "seek": 124844, "start": 1266.64, "end": 1267.94, "text": " because I know it's a little bit confusing.", "tokens": [51275, 570, 286, 458, 309, 311, 257, 707, 857, 13181, 13, 51340], "temperature": 0.0, "avg_logprob": -0.1382857235995206, "compression_ratio": 1.5696969696969696, "no_speech_prob": 0.0008013966726139188}, {"id": 280, "seek": 124844, "start": 1268.44, "end": 1270.24, "text": " We have the logits that come out of the neural net.", "tokens": [51365, 492, 362, 264, 3565, 1208, 300, 808, 484, 295, 264, 18161, 2533, 13, 51455], "temperature": 0.0, "avg_logprob": -0.1382857235995206, "compression_ratio": 1.5696969696969696, "no_speech_prob": 0.0008013966726139188}, {"id": 281, "seek": 124844, "start": 1270.8400000000001, "end": 1274.04, "text": " Here, what I'm doing is I'm finding the maximum in each row,", "tokens": [51485, 1692, 11, 437, 286, 478, 884, 307, 286, 478, 5006, 264, 6674, 294, 1184, 5386, 11, 51645], "temperature": 0.0, "avg_logprob": -0.1382857235995206, "compression_ratio": 1.5696969696969696, "no_speech_prob": 0.0008013966726139188}, {"id": 282, "seek": 124844, "start": 1274.54, "end": 1277.3400000000001, "text": " and I'm subtracting it for the purpose of numerical stability.", "tokens": [51670, 293, 286, 478, 16390, 278, 309, 337, 264, 4334, 295, 29054, 11826, 13, 51810], "temperature": 0.0, "avg_logprob": -0.1382857235995206, "compression_ratio": 1.5696969696969696, "no_speech_prob": 0.0008013966726139188}, {"id": 283, "seek": 124844, "start": 1277.54, "end": 1278.24, "text": " And we talked about how...", "tokens": [51820, 400, 321, 2825, 466, 577, 485, 51855], "temperature": 0.0, "avg_logprob": -0.1382857235995206, "compression_ratio": 1.5696969696969696, "no_speech_prob": 0.0008013966726139188}, {"id": 284, "seek": 127844, "start": 1278.44, "end": 1279.64, "text": " if you do not do this,", "tokens": [50365, 498, 291, 360, 406, 360, 341, 11, 50425], "temperature": 0.0, "avg_logprob": -0.15942522408305734, "compression_ratio": 1.654109589041096, "no_speech_prob": 0.00034499922185204923}, {"id": 285, "seek": 127844, "start": 1280.04, "end": 1283.54, "text": " you run into numerical issues if some of the logits take on two large values", "tokens": [50445, 291, 1190, 666, 29054, 2663, 498, 512, 295, 264, 3565, 1208, 747, 322, 732, 2416, 4190, 50620], "temperature": 0.0, "avg_logprob": -0.15942522408305734, "compression_ratio": 1.654109589041096, "no_speech_prob": 0.00034499922185204923}, {"id": 286, "seek": 127844, 
"start": 1283.8400000000001, "end": 1285.64, "text": " because we end up exponentiating them.", "tokens": [50635, 570, 321, 917, 493, 12680, 23012, 990, 552, 13, 50725], "temperature": 0.0, "avg_logprob": -0.15942522408305734, "compression_ratio": 1.654109589041096, "no_speech_prob": 0.00034499922185204923}, {"id": 287, "seek": 127844, "start": 1286.44, "end": 1289.24, "text": " So this is done just for safety, numerically.", "tokens": [50765, 407, 341, 307, 1096, 445, 337, 4514, 11, 7866, 984, 13, 50905], "temperature": 0.0, "avg_logprob": -0.15942522408305734, "compression_ratio": 1.654109589041096, "no_speech_prob": 0.00034499922185204923}, {"id": 288, "seek": 127844, "start": 1289.8400000000001, "end": 1294.64, "text": " Then here's the exponentiation of all the sort of logits to create our counts.", "tokens": [50935, 1396, 510, 311, 264, 37871, 6642, 295, 439, 264, 1333, 295, 3565, 1208, 281, 1884, 527, 14893, 13, 51175], "temperature": 0.0, "avg_logprob": -0.15942522408305734, "compression_ratio": 1.654109589041096, "no_speech_prob": 0.00034499922185204923}, {"id": 289, "seek": 127844, "start": 1295.14, "end": 1298.74, "text": " And then we want to take the sum of these counts and normalize", "tokens": [51200, 400, 550, 321, 528, 281, 747, 264, 2408, 295, 613, 14893, 293, 2710, 1125, 51380], "temperature": 0.0, "avg_logprob": -0.15942522408305734, "compression_ratio": 1.654109589041096, "no_speech_prob": 0.00034499922185204923}, {"id": 290, "seek": 127844, "start": 1298.8400000000001, "end": 1301.04, "text": " so that all of the probes sum to 1.", "tokens": [51385, 370, 300, 439, 295, 264, 1239, 279, 2408, 281, 502, 13, 51495], "temperature": 0.0, "avg_logprob": -0.15942522408305734, "compression_ratio": 1.654109589041096, "no_speech_prob": 0.00034499922185204923}, {"id": 291, "seek": 127844, "start": 1301.74, "end": 1303.8400000000001, "text": " Now here, instead of using 1 over COUNTSUM,", "tokens": [51530, 823, 510, 11, 2602, 295, 1228, 502, 670, 3002, 3979, 7327, 14340, 11, 51635], "temperature": 0.0, "avg_logprob": -0.15942522408305734, "compression_ratio": 1.654109589041096, "no_speech_prob": 0.00034499922185204923}, {"id": 292, "seek": 127844, "start": 1303.94, "end": 1306.44, "text": " I use raised to the power of negative 1.", "tokens": [51640, 286, 764, 6005, 281, 264, 1347, 295, 3671, 502, 13, 51765], "temperature": 0.0, "avg_logprob": -0.15942522408305734, "compression_ratio": 1.654109589041096, "no_speech_prob": 0.00034499922185204923}, {"id": 293, "seek": 127844, "start": 1306.64, "end": 1308.04, "text": " Mathematically, they are identical.", "tokens": [51775, 15776, 40197, 11, 436, 366, 14800, 13, 51845], "temperature": 0.0, "avg_logprob": -0.15942522408305734, "compression_ratio": 1.654109589041096, "no_speech_prob": 0.00034499922185204923}, {"id": 294, "seek": 130804, "start": 1308.04, "end": 1310.84, "text": " I just found that there's something wrong with the PyTorch implementation", "tokens": [50365, 286, 445, 1352, 300, 456, 311, 746, 2085, 365, 264, 9953, 51, 284, 339, 11420, 50505], "temperature": 0.0, "avg_logprob": -0.21030567310474538, "compression_ratio": 1.8339222614840989, "no_speech_prob": 0.0005558416596613824}, {"id": 295, "seek": 130804, "start": 1310.94, "end": 1315.54, "text": " of the backward pass of division, and it gives like a weird result,", "tokens": [50510, 295, 264, 23897, 1320, 295, 10044, 11, 293, 309, 2709, 411, 257, 3657, 1874, 11, 50740], "temperature": 0.0, "avg_logprob": -0.21030567310474538, "compression_ratio": 1.8339222614840989, 
"no_speech_prob": 0.0005558416596613824}, {"id": 296, "seek": 130804, "start": 1315.6399999999999, "end": 1318.24, "text": " but that doesn't happen for star star negative 1,", "tokens": [50745, 457, 300, 1177, 380, 1051, 337, 3543, 3543, 3671, 502, 11, 50875], "temperature": 0.0, "avg_logprob": -0.21030567310474538, "compression_ratio": 1.8339222614840989, "no_speech_prob": 0.0005558416596613824}, {"id": 297, "seek": 130804, "start": 1318.34, "end": 1319.94, "text": " so I'm using this formula instead.", "tokens": [50880, 370, 286, 478, 1228, 341, 8513, 2602, 13, 50960], "temperature": 0.0, "avg_logprob": -0.21030567310474538, "compression_ratio": 1.8339222614840989, "no_speech_prob": 0.0005558416596613824}, {"id": 298, "seek": 130804, "start": 1320.04, "end": 1323.94, "text": " But basically, all that's happening here is we got the logits,", "tokens": [50965, 583, 1936, 11, 439, 300, 311, 2737, 510, 307, 321, 658, 264, 3565, 1208, 11, 51160], "temperature": 0.0, "avg_logprob": -0.21030567310474538, "compression_ratio": 1.8339222614840989, "no_speech_prob": 0.0005558416596613824}, {"id": 299, "seek": 130804, "start": 1324.04, "end": 1325.34, "text": " we want to exponentiate all of them,", "tokens": [51165, 321, 528, 281, 37871, 13024, 439, 295, 552, 11, 51230], "temperature": 0.0, "avg_logprob": -0.21030567310474538, "compression_ratio": 1.8339222614840989, "no_speech_prob": 0.0005558416596613824}, {"id": 300, "seek": 130804, "start": 1325.44, "end": 1328.54, "text": " and we want to normalize the counts to create our probabilities.", "tokens": [51235, 293, 321, 528, 281, 2710, 1125, 264, 14893, 281, 1884, 527, 33783, 13, 51390], "temperature": 0.0, "avg_logprob": -0.21030567310474538, "compression_ratio": 1.8339222614840989, "no_speech_prob": 0.0005558416596613824}, {"id": 301, "seek": 130804, "start": 1328.6399999999999, "end": 1331.24, "text": " It's just that it's happening across multiple lines.", "tokens": [51395, 467, 311, 445, 300, 309, 311, 2737, 2108, 3866, 3876, 13, 51525], "temperature": 0.0, "avg_logprob": -0.21030567310474538, "compression_ratio": 1.8339222614840989, "no_speech_prob": 0.0005558416596613824}, {"id": 302, "seek": 130804, "start": 1331.34, "end": 1335.54, "text": " So now, here,", "tokens": [51530, 407, 586, 11, 510, 11, 51740], "temperature": 0.0, "avg_logprob": -0.21030567310474538, "compression_ratio": 1.8339222614840989, "no_speech_prob": 0.0005558416596613824}, {"id": 303, "seek": 130804, "start": 1335.6399999999999, "end": 1337.44, "text": " we want to normalize the counts to create our probabilities.", "tokens": [51745, 321, 528, 281, 2710, 1125, 264, 14893, 281, 1884, 527, 33783, 13, 51835], "temperature": 0.0, "avg_logprob": -0.21030567310474538, "compression_ratio": 1.8339222614840989, "no_speech_prob": 0.0005558416596613824}, {"id": 304, "seek": 133804, "start": 1338.04, "end": 1340.44, "text": " We want to first take the derivative,", "tokens": [50365, 492, 528, 281, 700, 747, 264, 13760, 11, 50485], "temperature": 0.0, "avg_logprob": -0.19037896873307053, "compression_ratio": 1.7361702127659575, "no_speech_prob": 0.0006236304179765284}, {"id": 305, "seek": 133804, "start": 1340.54, "end": 1344.54, "text": " we want to backpropagate into COUNTSUM and then into COUNTS as well.", "tokens": [50490, 321, 528, 281, 646, 79, 1513, 559, 473, 666, 3002, 3979, 7327, 14340, 293, 550, 666, 3002, 3979, 7327, 382, 731, 13, 50690], "temperature": 0.0, "avg_logprob": -0.19037896873307053, "compression_ratio": 1.7361702127659575, "no_speech_prob": 
So what should be dcounts_sum_inv? Now, we actually have to be careful here, because we have to scrutinize and be careful with the shapes. So counts.shape and counts_sum_inv.shape are different. In particular, counts is 32 by 27, but counts_sum_inv is 32 by 1. And so in this multiplication here, we also have an implicit broadcasting that PyTorch will do, because it needs to take this column tensor of 32 numbers and replicate it horizontally 27 times to align these two tensors so it can do an element-wise multiply. So really what this looks like is the following, using a toy example again. What we really have here is just probs is counts times counts_sum_inv, so it's C equals A times B, but A is 3 by 3 and B is just 3 by 1, a column tensor. And so PyTorch internally replicated these elements of B, and it did that across all the columns. So for example B1, which is the first element of B, would be replicated here across all the columns in this multiplication. And now we're trying to backpropagate through this operation to counts_sum_inv. So when we are calculating this derivative, it's important to realize that this looks like a single operation, but actually is two operations applied sequentially. The first operation that PyTorch did is it took this column tensor and replicated it across all the columns, basically 27 times. So that's the first operation, it's a replication. And then the second operation is the multiplication.
So let's first backprop through the multiplication. If these two arrays were of the same size, and we just have A and B, both of them 3 by 3, then how do we backpropagate through a multiplication? So if we just have scalars and not tensors, then if you have C equals A times B, what is the derivative of C with respect to B? Well, it's just A. And so that's the local derivative. So here in our case, undoing the multiplication and backpropagating through just the multiplication itself, which is element-wise, is going to be the local derivative, which in this case is simply counts, because counts is the A. So this is the local derivative, and then times dprobs, because of the chain rule. So this here is the local derivative, or the gradient, but with respect to the replicated B. But we don't have a replicated B, we just have a single B column. So how do we now backpropagate through the replication? And intuitively, this B1 is the same variable, and it's just reused multiple times. And so you can look at it as being equivalent to a case we've encountered in micrograd. And so here I'm just pulling out a random graph we used in micrograd. We had an example where a single node has its output feeding into two branches of basically the graph until the loss function. And we were talking about how the correct thing to do in the backward pass is that we need to sum all the gradients that arrive at any one node. So across these different branches, the gradients would sum. So if a node is used multiple times, the gradients for all of its uses sum during backpropagation.
So here, B1 is used multiple times in all these columns, and therefore the right thing to do here is to sum horizontally across the columns. So we want to sum in dimension 1, but we want to retain this dimension, so that counts_sum_inv and its gradient are going to be exactly the same shape. So we want to make sure that we pass keepdim as true, so we don't lose this dimension. And this will make dcounts_sum_inv be exactly shaped 32 by 1. So, revealing this comparison as well and running this, we see that we get an exact match. So this derivative is exactly correct. And let me erase this.
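The whole step, written out as a standalone sketch with toy data (the notebook checks against PyTorch's stored gradients; here retain_grad stands in for that comparison), looks roughly like this:

    import torch

    counts = (torch.rand(32, 27) + 0.1).requires_grad_()      # toy stand-in, kept positive
    counts_sum_inv = counts.sum(1, keepdim=True) ** -1
    counts_sum_inv.retain_grad()                              # keep the intermediate gradient around
    probs = counts * counts_sum_inv
    probs.sum().backward()                                    # stand-in upstream gradient: dprobs = all ones

    dprobs = torch.ones_like(probs)
    # local derivative (counts) times dprobs, then sum horizontally with keepdim=True
    dcounts_sum_inv = (counts * dprobs).sum(1, keepdim=True)
    print(torch.allclose(dcounts_sum_inv, counts_sum_inv.grad))  # True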
Now let's also backpropagate into counts, which is the other variable here used to create probs. So from probs into counts_sum_inv, we just did that. Let's go into counts as well. So for dcounts: counts is our A, so dC by dA is just B, so therefore it's counts_sum_inv, and then times, chain rule, dprobs. Now counts_sum_inv is 32 by 1, and dprobs is 32 by 27. So those will broadcast fine and will give us dcounts. There's no additional summation required here. There will be a broadcasting that happens in this multiply here, because counts_sum_inv needs to be replicated again to correctly multiply dprobs. But that's going to give the correct result, as far as this single operation is concerned.
So we've backpropagated from probs into counts, but we can't actually check the derivative of counts yet. I have it much later on. And the reason for that is because counts_sum_inv depends on counts, and so there's a second branch here that we have to finish. Because counts_sum_inv backpropagates into counts_sum, and counts_sum will backpropagate into counts. And so counts is a node that is being used twice. It's used right here going into probs, and it goes through this other branch through counts_sum_inv. So even though we've calculated the first contribution of it, we still have to calculate the second contribution of it later. Okay, so we're continuing with this branch. We have the derivative for counts_sum_inv, and now we want the derivative for counts_sum. So dcounts_sum equals... what is the local derivative of this operation? So this is basically an element-wise 1 over counts_sum. So counts_sum raised to the power of negative 1 is the same as 1 over counts_sum. If we go to Wolfram Alpha, we see that d by dx of x to the negative 1 is basically negative x to the negative 2. Negative 1 over x squared is the same as negative x to the negative 2. So dcounts_sum here: the local derivative is going to be negative counts_sum to the negative 2, that's the local derivative, times, chain rule, dcounts_sum_inv. So that's dcounts_sum. Let's uncomment this and check that I am correct. Okay, so we have perfect equality. And there's no sketchiness going on here with any shapes, because these are of the same shape.
Okay, next up we want to backpropagate through this line. We have that counts_sum is counts.sum along the rows. So I wrote out some help here. We have to keep in mind that counts, of course, is 32 by 27, and counts_sum is 32 by 1. So in this backpropagation, we need to take this column of derivatives and transform it into an array of derivatives, a two-dimensional array. So what is this operation doing? We're taking some kind of an input, like, say, a 3 by 3 matrix A, and we are summing up the rows into a column tensor B: B1, B2, B3. That is basically this. So now we have the derivatives of the loss with respect to B, all the elements of B, and now we want the derivative of the loss with respect to all these little a's. So how do the b's depend on the a's is basically what we're after. What is the local derivative of this operation? Well, we can see here that B1 only depends on these elements here. The derivative of B1 with respect to all of these elements down here is 0. But for these elements here, like A11, A12, etc., the local derivative is 1, right? So dB1 by dA11, for example, is 1. So it's 1, 1, and 1. So when we have the derivative of the loss with respect to B1, the local derivative of B1 with respect to these inputs is 0 here, but it's 1 on these guys. So in the chain rule, we have the local derivative times the derivative of B1. And so, because the local derivative is 1 on these three elements, the local derivative times the derivative of B1 will just be the derivative of B1. And so you can look at it as a router. Basically, an addition is a router of gradient: whatever gradient comes from above, it just gets routed equally to all the elements that participate in that addition. So in this case, the derivative of B1 will just flow equally to the derivative of A11, A12, and A13. So if we have a derivative of all the elements of B in this column tensor, which we calculated just now, we basically see that what that amounts to is that all of these are now flowing to all these elements of A, and they're doing that horizontally. So basically what we want is to take dcounts_sum, of size 32 by 1, and we just want to replicate it 27 times horizontally to create a 32 by 27 array. So there's many ways to implement this operation. You could, of course, just replicate the tensor, but I think maybe one clean one is that dcounts is simply torch.ones_like, so just a two-dimensional array of ones in the shape of counts, so 32 by 27, times dcounts_sum.
"temperature": 0.0, "avg_logprob": -0.09572660322669599, "compression_ratio": 1.7672727272727273, "no_speech_prob": 0.0008267864468507469}, {"id": 524, "seek": 187044, "start": 1890.44, "end": 1892.3400000000001, "text": " to create 32 by 27 array.", "tokens": [51365, 281, 1884, 8858, 538, 7634, 10225, 13, 51460], "temperature": 0.0, "avg_logprob": -0.09572660322669599, "compression_ratio": 1.7672727272727273, "no_speech_prob": 0.0008267864468507469}, {"id": 525, "seek": 187044, "start": 1892.44, "end": 1894.3400000000001, "text": " So there's many ways to implement this operation.", "tokens": [51465, 407, 456, 311, 867, 2098, 281, 4445, 341, 6916, 13, 51560], "temperature": 0.0, "avg_logprob": -0.09572660322669599, "compression_ratio": 1.7672727272727273, "no_speech_prob": 0.0008267864468507469}, {"id": 526, "seek": 187044, "start": 1894.44, "end": 1896.3400000000001, "text": " You could, of course, just replicate the tensor,", "tokens": [51565, 509, 727, 11, 295, 1164, 11, 445, 25356, 264, 40863, 11, 51660], "temperature": 0.0, "avg_logprob": -0.09572660322669599, "compression_ratio": 1.7672727272727273, "no_speech_prob": 0.0008267864468507469}, {"id": 527, "seek": 187044, "start": 1896.44, "end": 1898.3400000000001, "text": " but I think maybe one clean one", "tokens": [51665, 457, 286, 519, 1310, 472, 2541, 472, 51760], "temperature": 0.0, "avg_logprob": -0.09572660322669599, "compression_ratio": 1.7672727272727273, "no_speech_prob": 0.0008267864468507469}, {"id": 528, "seek": 187044, "start": 1898.44, "end": 1900.3400000000001, "text": " is that decounts is simply", "tokens": [51765, 307, 300, 979, 792, 82, 307, 2935, 51860], "temperature": 0.0, "avg_logprob": -0.09572660322669599, "compression_ratio": 1.7672727272727273, "no_speech_prob": 0.0008267864468507469}, {"id": 529, "seek": 190034, "start": 1900.34, "end": 1902.24, "text": " torch.once-like,", "tokens": [50365, 27822, 13, 26015, 12, 4092, 11, 50460], "temperature": 0.0, "avg_logprob": -0.12371620055167906, "compression_ratio": 1.6299212598425197, "no_speech_prob": 0.00040495701250620186}, {"id": 530, "seek": 190034, "start": 1902.34, "end": 1904.24, "text": " so just two-dimensional arrays", "tokens": [50465, 370, 445, 732, 12, 18759, 41011, 50560], "temperature": 0.0, "avg_logprob": -0.12371620055167906, "compression_ratio": 1.6299212598425197, "no_speech_prob": 0.00040495701250620186}, {"id": 531, "seek": 190034, "start": 1904.34, "end": 1906.24, "text": " of once in the shape of counts,", "tokens": [50565, 295, 1564, 294, 264, 3909, 295, 14893, 11, 50660], "temperature": 0.0, "avg_logprob": -0.12371620055167906, "compression_ratio": 1.6299212598425197, "no_speech_prob": 0.00040495701250620186}, {"id": 532, "seek": 190034, "start": 1906.34, "end": 1908.24, "text": " so 32 by 27,", "tokens": [50665, 370, 8858, 538, 7634, 11, 50760], "temperature": 0.0, "avg_logprob": -0.12371620055167906, "compression_ratio": 1.6299212598425197, "no_speech_prob": 0.00040495701250620186}, {"id": 533, "seek": 190034, "start": 1908.34, "end": 1910.24, "text": " times decounts sum.", "tokens": [50765, 1413, 979, 792, 82, 2408, 13, 50860], "temperature": 0.0, "avg_logprob": -0.12371620055167906, "compression_ratio": 1.6299212598425197, "no_speech_prob": 0.00040495701250620186}, {"id": 534, "seek": 190034, "start": 1910.34, "end": 1912.24, "text": " So this way we're letting", "tokens": [50865, 407, 341, 636, 321, 434, 8295, 50960], "temperature": 0.0, "avg_logprob": -0.12371620055167906, "compression_ratio": 1.6299212598425197, "no_speech_prob": 
0.00040495701250620186}, {"id": 535, "seek": 190034, "start": 1912.34, "end": 1914.24, "text": " the broadcasting here", "tokens": [50965, 264, 30024, 510, 51060], "temperature": 0.0, "avg_logprob": -0.12371620055167906, "compression_ratio": 1.6299212598425197, "no_speech_prob": 0.00040495701250620186}, {"id": 536, "seek": 190034, "start": 1914.34, "end": 1916.24, "text": " basically implement the replication.", "tokens": [51065, 1936, 4445, 264, 39911, 13, 51160], "temperature": 0.0, "avg_logprob": -0.12371620055167906, "compression_ratio": 1.6299212598425197, "no_speech_prob": 0.00040495701250620186}, {"id": 537, "seek": 190034, "start": 1916.34, "end": 1918.24, "text": " You can look at it that way.", "tokens": [51165, 509, 393, 574, 412, 309, 300, 636, 13, 51260], "temperature": 0.0, "avg_logprob": -0.12371620055167906, "compression_ratio": 1.6299212598425197, "no_speech_prob": 0.00040495701250620186}, {"id": 538, "seek": 190034, "start": 1918.34, "end": 1920.24, "text": " But then we have to also be careful", "tokens": [51265, 583, 550, 321, 362, 281, 611, 312, 5026, 51360], "temperature": 0.0, "avg_logprob": -0.12371620055167906, "compression_ratio": 1.6299212598425197, "no_speech_prob": 0.00040495701250620186}, {"id": 539, "seek": 190034, "start": 1920.34, "end": 1922.24, "text": " because decounts", "tokens": [51365, 570, 979, 792, 82, 51460], "temperature": 0.0, "avg_logprob": -0.12371620055167906, "compression_ratio": 1.6299212598425197, "no_speech_prob": 0.00040495701250620186}, {"id": 540, "seek": 190034, "start": 1922.34, "end": 1924.24, "text": " was all already calculated.", "tokens": [51465, 390, 439, 1217, 15598, 13, 51560], "temperature": 0.0, "avg_logprob": -0.12371620055167906, "compression_ratio": 1.6299212598425197, "no_speech_prob": 0.00040495701250620186}, {"id": 541, "seek": 190034, "start": 1924.34, "end": 1926.24, "text": " We calculated earlier here,", "tokens": [51565, 492, 15598, 3071, 510, 11, 51660], "temperature": 0.0, "avg_logprob": -0.12371620055167906, "compression_ratio": 1.6299212598425197, "no_speech_prob": 0.00040495701250620186}, {"id": 542, "seek": 190034, "start": 1926.34, "end": 1928.24, "text": " and that was just the first branch,", "tokens": [51665, 293, 300, 390, 445, 264, 700, 9819, 11, 51760], "temperature": 0.0, "avg_logprob": -0.12371620055167906, "compression_ratio": 1.6299212598425197, "no_speech_prob": 0.00040495701250620186}, {"id": 543, "seek": 190034, "start": 1928.34, "end": 1930.24, "text": " and we're now finishing the second branch.", "tokens": [51765, 293, 321, 434, 586, 12693, 264, 1150, 9819, 13, 51860], "temperature": 0.0, "avg_logprob": -0.12371620055167906, "compression_ratio": 1.6299212598425197, "no_speech_prob": 0.00040495701250620186}, {"id": 544, "seek": 193024, "start": 1930.24, "end": 1932.14, "text": " So we need to make sure that these gradients add,", "tokens": [50365, 407, 321, 643, 281, 652, 988, 300, 613, 2771, 2448, 909, 11, 50460], "temperature": 0.0, "avg_logprob": -0.07754051344735281, "compression_ratio": 1.6278195488721805, "no_speech_prob": 0.0006079990998841822}, {"id": 545, "seek": 193024, "start": 1932.24, "end": 1934.14, "text": " so plus equals.", "tokens": [50465, 370, 1804, 6915, 13, 50560], "temperature": 0.0, "avg_logprob": -0.07754051344735281, "compression_ratio": 1.6278195488721805, "no_speech_prob": 0.0006079990998841822}, {"id": 546, "seek": 193024, "start": 1934.24, "end": 1936.14, "text": " And then here,", "tokens": [50565, 400, 550, 510, 11, 50660], "temperature": 0.0, "avg_logprob": 
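Roughly, that line looks like this (a sketch with random stand-in tensors in place of the notebook's real values):

    import torch

    counts = torch.rand(32, 27)
    dcounts = torch.randn(32, 27)      # first branch, computed earlier
    dcounts_sum = torch.randn(32, 1)   # gradient flowing in from counts_sum

    # Replicate the 32-by-1 gradient across all 27 columns via broadcasting,
    # and accumulate into the existing dcounts rather than overwriting it.
    dcounts += torch.ones_like(counts) * dcounts_sum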
And then here, let's comment out the comparison, and let's make sure, crossing fingers, that we have the correct result. So PyTorch agrees with us on this gradient as well. Okay, hopefully we're getting the hang of this now. Counts is an element-wise exp of normLogits. So now we want dNormLogits, and because it's an element-wise operation, everything is very simple. The local derivative of e to the x is famously just e to the x. So this is the local derivative. Now, we already calculated it, and it's inside counts, so we may as well potentially just reuse counts. That is the local derivative, times dCounts. Funny as that looks, counts times dCounts is dNormLogits.
dNormLogits.", "tokens": [51565, 5247, 82, 1413, 274, 34, 792, 82, 307, 274, 45, 687, 43, 664, 1208, 13, 51660], "temperature": 0.0, "avg_logprob": -0.11971433143916092, "compression_ratio": 1.9104477611940298, "no_speech_prob": 0.0010081740329042077}, {"id": 570, "seek": 196014, "start": 1986.14, "end": 1988.0400000000002, "text": " And now let's erase this,", "tokens": [51665, 400, 586, 718, 311, 23525, 341, 11, 51760], "temperature": 0.0, "avg_logprob": -0.11971433143916092, "compression_ratio": 1.9104477611940298, "no_speech_prob": 0.0010081740329042077}, {"id": 571, "seek": 196014, "start": 1988.14, "end": 1990.0400000000002, "text": " and let's verify,", "tokens": [51765, 293, 718, 311, 16888, 11, 51860], "temperature": 0.0, "avg_logprob": -0.11971433143916092, "compression_ratio": 1.9104477611940298, "no_speech_prob": 0.0010081740329042077}, {"id": 572, "seek": 199004, "start": 1990.04, "end": 1991.94, "text": " and let's go.", "tokens": [50365, 293, 718, 311, 352, 13, 50460], "temperature": 0.0, "avg_logprob": -0.1138478581776876, "compression_ratio": 1.746031746031746, "no_speech_prob": 0.0008522075368091464}, {"id": 573, "seek": 199004, "start": 1992.04, "end": 1993.94, "text": " So that's dNormLogits.", "tokens": [50465, 407, 300, 311, 274, 45, 687, 43, 664, 1208, 13, 50560], "temperature": 0.0, "avg_logprob": -0.1138478581776876, "compression_ratio": 1.746031746031746, "no_speech_prob": 0.0008522075368091464}, {"id": 574, "seek": 199004, "start": 1994.04, "end": 1995.94, "text": " Okay, so we are here", "tokens": [50565, 1033, 11, 370, 321, 366, 510, 50660], "temperature": 0.0, "avg_logprob": -0.1138478581776876, "compression_ratio": 1.746031746031746, "no_speech_prob": 0.0008522075368091464}, {"id": 575, "seek": 199004, "start": 1996.04, "end": 1997.94, "text": " on this line now, dNormLogits.", "tokens": [50665, 322, 341, 1622, 586, 11, 274, 45, 687, 43, 664, 1208, 13, 50760], "temperature": 0.0, "avg_logprob": -0.1138478581776876, "compression_ratio": 1.746031746031746, "no_speech_prob": 0.0008522075368091464}, {"id": 576, "seek": 199004, "start": 1998.04, "end": 1999.94, "text": " We have that,", "tokens": [50765, 492, 362, 300, 11, 50860], "temperature": 0.0, "avg_logprob": -0.1138478581776876, "compression_ratio": 1.746031746031746, "no_speech_prob": 0.0008522075368091464}, {"id": 577, "seek": 199004, "start": 2000.04, "end": 2001.94, "text": " and we're trying to calculate dLogits", "tokens": [50865, 293, 321, 434, 1382, 281, 8873, 274, 43, 664, 1208, 50960], "temperature": 0.0, "avg_logprob": -0.1138478581776876, "compression_ratio": 1.746031746031746, "no_speech_prob": 0.0008522075368091464}, {"id": 578, "seek": 199004, "start": 2002.04, "end": 2003.94, "text": " and dLogitMaxes.", "tokens": [50965, 293, 274, 43, 664, 270, 36025, 279, 13, 51060], "temperature": 0.0, "avg_logprob": -0.1138478581776876, "compression_ratio": 1.746031746031746, "no_speech_prob": 0.0008522075368091464}, {"id": 579, "seek": 199004, "start": 2004.04, "end": 2005.94, "text": " So back-propagating through this line.", "tokens": [51065, 407, 646, 12, 79, 1513, 559, 990, 807, 341, 1622, 13, 51160], "temperature": 0.0, "avg_logprob": -0.1138478581776876, "compression_ratio": 1.746031746031746, "no_speech_prob": 0.0008522075368091464}, {"id": 580, "seek": 199004, "start": 2006.04, "end": 2007.94, "text": " Now we have to be careful here,", "tokens": [51165, 823, 321, 362, 281, 312, 5026, 510, 11, 51260], "temperature": 0.0, "avg_logprob": -0.1138478581776876, "compression_ratio": 1.746031746031746, 
"no_speech_prob": 0.0008522075368091464}, {"id": 581, "seek": 199004, "start": 2008.04, "end": 2009.94, "text": " because the shapes, again, are not the same,", "tokens": [51265, 570, 264, 10854, 11, 797, 11, 366, 406, 264, 912, 11, 51360], "temperature": 0.0, "avg_logprob": -0.1138478581776876, "compression_ratio": 1.746031746031746, "no_speech_prob": 0.0008522075368091464}, {"id": 582, "seek": 199004, "start": 2010.04, "end": 2011.94, "text": " and so there's an implicit broadcasting happening here.", "tokens": [51365, 293, 370, 456, 311, 364, 26947, 30024, 2737, 510, 13, 51460], "temperature": 0.0, "avg_logprob": -0.1138478581776876, "compression_ratio": 1.746031746031746, "no_speech_prob": 0.0008522075368091464}, {"id": 583, "seek": 199004, "start": 2012.04, "end": 2013.94, "text": " So dNormLogits has the shape 32x27.", "tokens": [51465, 407, 274, 45, 687, 43, 664, 1208, 575, 264, 3909, 8858, 87, 10076, 13, 51560], "temperature": 0.0, "avg_logprob": -0.1138478581776876, "compression_ratio": 1.746031746031746, "no_speech_prob": 0.0008522075368091464}, {"id": 584, "seek": 199004, "start": 2014.04, "end": 2015.94, "text": " dLogits does as well,", "tokens": [51565, 274, 43, 664, 1208, 775, 382, 731, 11, 51660], "temperature": 0.0, "avg_logprob": -0.1138478581776876, "compression_ratio": 1.746031746031746, "no_speech_prob": 0.0008522075368091464}, {"id": 585, "seek": 199004, "start": 2016.04, "end": 2017.94, "text": " but dLogitMaxes is only 32x1.", "tokens": [51665, 457, 274, 43, 664, 270, 36025, 279, 307, 787, 8858, 87, 16, 13, 51760], "temperature": 0.0, "avg_logprob": -0.1138478581776876, "compression_ratio": 1.746031746031746, "no_speech_prob": 0.0008522075368091464}, {"id": 586, "seek": 199004, "start": 2018.04, "end": 2019.94, "text": " So there's a broadcast", "tokens": [51765, 407, 456, 311, 257, 9975, 51860], "temperature": 0.0, "avg_logprob": -0.1138478581776876, "compression_ratio": 1.746031746031746, "no_speech_prob": 0.0008522075368091464}, {"id": 587, "seek": 201994, "start": 2019.94, "end": 2021.8400000000001, "text": " here in the minus.", "tokens": [50365, 510, 294, 264, 3175, 13, 50460], "temperature": 0.0, "avg_logprob": -0.1276777854332557, "compression_ratio": 1.689795918367347, "no_speech_prob": 0.0004894372541457415}, {"id": 588, "seek": 201994, "start": 2021.94, "end": 2023.8400000000001, "text": " Now here I tried to", "tokens": [50465, 823, 510, 286, 3031, 281, 50560], "temperature": 0.0, "avg_logprob": -0.1276777854332557, "compression_ratio": 1.689795918367347, "no_speech_prob": 0.0004894372541457415}, {"id": 589, "seek": 201994, "start": 2023.94, "end": 2025.8400000000001, "text": " sort of write out a toy example again.", "tokens": [50565, 1333, 295, 2464, 484, 257, 12058, 1365, 797, 13, 50660], "temperature": 0.0, "avg_logprob": -0.1276777854332557, "compression_ratio": 1.689795918367347, "no_speech_prob": 0.0004894372541457415}, {"id": 590, "seek": 201994, "start": 2025.94, "end": 2027.8400000000001, "text": " We basically have that", "tokens": [50665, 492, 1936, 362, 300, 50760], "temperature": 0.0, "avg_logprob": -0.1276777854332557, "compression_ratio": 1.689795918367347, "no_speech_prob": 0.0004894372541457415}, {"id": 591, "seek": 201994, "start": 2027.94, "end": 2029.8400000000001, "text": " this is our c equals a minus b,", "tokens": [50765, 341, 307, 527, 269, 6915, 257, 3175, 272, 11, 50860], "temperature": 0.0, "avg_logprob": -0.1276777854332557, "compression_ratio": 1.689795918367347, "no_speech_prob": 0.0004894372541457415}, {"id": 592, "seek": 
201994, "start": 2029.94, "end": 2031.8400000000001, "text": " and we see that because of the shape,", "tokens": [50865, 293, 321, 536, 300, 570, 295, 264, 3909, 11, 50960], "temperature": 0.0, "avg_logprob": -0.1276777854332557, "compression_ratio": 1.689795918367347, "no_speech_prob": 0.0004894372541457415}, {"id": 593, "seek": 201994, "start": 2031.94, "end": 2033.8400000000001, "text": " these are 3x3, but this one is just a column.", "tokens": [50965, 613, 366, 805, 87, 18, 11, 457, 341, 472, 307, 445, 257, 7738, 13, 51060], "temperature": 0.0, "avg_logprob": -0.1276777854332557, "compression_ratio": 1.689795918367347, "no_speech_prob": 0.0004894372541457415}, {"id": 594, "seek": 201994, "start": 2033.94, "end": 2035.8400000000001, "text": " And so for example,", "tokens": [51065, 400, 370, 337, 1365, 11, 51160], "temperature": 0.0, "avg_logprob": -0.1276777854332557, "compression_ratio": 1.689795918367347, "no_speech_prob": 0.0004894372541457415}, {"id": 595, "seek": 201994, "start": 2035.94, "end": 2037.8400000000001, "text": " every element of c,", "tokens": [51165, 633, 4478, 295, 269, 11, 51260], "temperature": 0.0, "avg_logprob": -0.1276777854332557, "compression_ratio": 1.689795918367347, "no_speech_prob": 0.0004894372541457415}, {"id": 596, "seek": 201994, "start": 2037.94, "end": 2039.8400000000001, "text": " we have to look at how it came to be.", "tokens": [51265, 321, 362, 281, 574, 412, 577, 309, 1361, 281, 312, 13, 51360], "temperature": 0.0, "avg_logprob": -0.1276777854332557, "compression_ratio": 1.689795918367347, "no_speech_prob": 0.0004894372541457415}, {"id": 597, "seek": 201994, "start": 2039.94, "end": 2041.8400000000001, "text": " And every element of c is just", "tokens": [51365, 400, 633, 4478, 295, 269, 307, 445, 51460], "temperature": 0.0, "avg_logprob": -0.1276777854332557, "compression_ratio": 1.689795918367347, "no_speech_prob": 0.0004894372541457415}, {"id": 598, "seek": 201994, "start": 2041.94, "end": 2043.8400000000001, "text": " the corresponding element of a minus", "tokens": [51465, 264, 11760, 4478, 295, 257, 3175, 51560], "temperature": 0.0, "avg_logprob": -0.1276777854332557, "compression_ratio": 1.689795918367347, "no_speech_prob": 0.0004894372541457415}, {"id": 599, "seek": 201994, "start": 2043.94, "end": 2045.8400000000001, "text": " basically that associated b.", "tokens": [51565, 1936, 300, 6615, 272, 13, 51660], "temperature": 0.0, "avg_logprob": -0.1276777854332557, "compression_ratio": 1.689795918367347, "no_speech_prob": 0.0004894372541457415}, {"id": 600, "seek": 201994, "start": 2047.94, "end": 2049.84, "text": " So it's very clear now", "tokens": [51765, 407, 309, 311, 588, 1850, 586, 51860], "temperature": 0.0, "avg_logprob": -0.1276777854332557, "compression_ratio": 1.689795918367347, "no_speech_prob": 0.0004894372541457415}, {"id": 601, "seek": 204984, "start": 2049.84, "end": 2051.7400000000002, "text": " that the derivatives of", "tokens": [50365, 300, 264, 33733, 295, 50460], "temperature": 0.0, "avg_logprob": -0.051539285977681475, "compression_ratio": 1.8904761904761904, "no_speech_prob": 0.0008087818277999759}, {"id": 602, "seek": 204984, "start": 2051.84, "end": 2053.7400000000002, "text": " every one of these c's with respect to their inputs", "tokens": [50465, 633, 472, 295, 613, 269, 311, 365, 3104, 281, 641, 15743, 50560], "temperature": 0.0, "avg_logprob": -0.051539285977681475, "compression_ratio": 1.8904761904761904, "no_speech_prob": 0.0008087818277999759}, {"id": 603, "seek": 204984, "start": 2053.84, "end": 
2055.7400000000002, "text": " are 1", "tokens": [50565, 366, 502, 50660], "temperature": 0.0, "avg_logprob": -0.051539285977681475, "compression_ratio": 1.8904761904761904, "no_speech_prob": 0.0008087818277999759}, {"id": 604, "seek": 204984, "start": 2055.84, "end": 2057.7400000000002, "text": " for the corresponding a,", "tokens": [50665, 337, 264, 11760, 257, 11, 50760], "temperature": 0.0, "avg_logprob": -0.051539285977681475, "compression_ratio": 1.8904761904761904, "no_speech_prob": 0.0008087818277999759}, {"id": 605, "seek": 204984, "start": 2057.84, "end": 2059.7400000000002, "text": " and it's a negative 1", "tokens": [50765, 293, 309, 311, 257, 3671, 502, 50860], "temperature": 0.0, "avg_logprob": -0.051539285977681475, "compression_ratio": 1.8904761904761904, "no_speech_prob": 0.0008087818277999759}, {"id": 606, "seek": 204984, "start": 2059.84, "end": 2061.7400000000002, "text": " for the corresponding b.", "tokens": [50865, 337, 264, 11760, 272, 13, 50960], "temperature": 0.0, "avg_logprob": -0.051539285977681475, "compression_ratio": 1.8904761904761904, "no_speech_prob": 0.0008087818277999759}, {"id": 607, "seek": 204984, "start": 2061.84, "end": 2063.7400000000002, "text": " And so therefore,", "tokens": [50965, 400, 370, 4412, 11, 51060], "temperature": 0.0, "avg_logprob": -0.051539285977681475, "compression_ratio": 1.8904761904761904, "no_speech_prob": 0.0008087818277999759}, {"id": 608, "seek": 204984, "start": 2063.84, "end": 2065.7400000000002, "text": " the derivatives", "tokens": [51065, 264, 33733, 51160], "temperature": 0.0, "avg_logprob": -0.051539285977681475, "compression_ratio": 1.8904761904761904, "no_speech_prob": 0.0008087818277999759}, {"id": 609, "seek": 204984, "start": 2065.84, "end": 2067.7400000000002, "text": " on the c will flow", "tokens": [51165, 322, 264, 269, 486, 3095, 51260], "temperature": 0.0, "avg_logprob": -0.051539285977681475, "compression_ratio": 1.8904761904761904, "no_speech_prob": 0.0008087818277999759}, {"id": 610, "seek": 204984, "start": 2067.84, "end": 2069.7400000000002, "text": " equally to the corresponding a's,", "tokens": [51265, 12309, 281, 264, 11760, 257, 311, 11, 51360], "temperature": 0.0, "avg_logprob": -0.051539285977681475, "compression_ratio": 1.8904761904761904, "no_speech_prob": 0.0008087818277999759}, {"id": 611, "seek": 204984, "start": 2069.84, "end": 2071.7400000000002, "text": " and then also to the corresponding b's.", "tokens": [51365, 293, 550, 611, 281, 264, 11760, 272, 311, 13, 51460], "temperature": 0.0, "avg_logprob": -0.051539285977681475, "compression_ratio": 1.8904761904761904, "no_speech_prob": 0.0008087818277999759}, {"id": 612, "seek": 204984, "start": 2071.84, "end": 2073.7400000000002, "text": " But then in addition to that,", "tokens": [51465, 583, 550, 294, 4500, 281, 300, 11, 51560], "temperature": 0.0, "avg_logprob": -0.051539285977681475, "compression_ratio": 1.8904761904761904, "no_speech_prob": 0.0008087818277999759}, {"id": 613, "seek": 204984, "start": 2073.84, "end": 2075.7400000000002, "text": " the b's are broadcast,", "tokens": [51565, 264, 272, 311, 366, 9975, 11, 51660], "temperature": 0.0, "avg_logprob": -0.051539285977681475, "compression_ratio": 1.8904761904761904, "no_speech_prob": 0.0008087818277999759}, {"id": 614, "seek": 204984, "start": 2075.84, "end": 2077.7400000000002, "text": " so we'll have to do the additional sum", "tokens": [51665, 370, 321, 603, 362, 281, 360, 264, 4497, 2408, 51760], "temperature": 0.0, "avg_logprob": -0.051539285977681475, "compression_ratio": 
And of course, the derivatives for the b's will undergo a minus, because the local derivative here is negative 1. So dC32 by dB3 is negative 1.
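You can check this toy example against PyTorch's autograd directly; a minimal sketch:

    import torch

    a = torch.randn(3, 3, requires_grad=True)
    b = torch.randn(3, 1, requires_grad=True)   # a column, broadcast across a
    c = a - b
    c.backward(gradient=torch.ones_like(c))     # pretend dc is all ones
    print(a.grad)   # all ones: dc flows straight through to the a's
    print(b.grad)   # all -3: a minus sign, then a sum across the 3 columns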
"no_speech_prob": 0.0003923821495845914}, {"id": 627, "seek": 207974, "start": 2101.74, "end": 2103.64, "text": " dnormlogits, and I'll do a .clone", "tokens": [51465, 274, 13403, 4987, 1208, 11, 293, 286, 603, 360, 257, 2411, 3474, 546, 51560], "temperature": 0.0, "avg_logprob": -0.1289726110605093, "compression_ratio": 1.6517857142857142, "no_speech_prob": 0.0003923821495845914}, {"id": 628, "seek": 207974, "start": 2103.74, "end": 2105.64, "text": " for safety, so we're just making a copy.", "tokens": [51565, 337, 4514, 11, 370, 321, 434, 445, 1455, 257, 5055, 13, 51660], "temperature": 0.0, "avg_logprob": -0.1289726110605093, "compression_ratio": 1.6517857142857142, "no_speech_prob": 0.0003923821495845914}, {"id": 629, "seek": 207974, "start": 2105.74, "end": 2107.64, "text": " And then we have that", "tokens": [51665, 400, 550, 321, 362, 300, 51760], "temperature": 0.0, "avg_logprob": -0.1289726110605093, "compression_ratio": 1.6517857142857142, "no_speech_prob": 0.0003923821495845914}, {"id": 630, "seek": 207974, "start": 2107.74, "end": 2109.64, "text": " dlogitmaxis,", "tokens": [51765, 274, 4987, 270, 1696, 39637, 11, 51860], "temperature": 0.0, "avg_logprob": -0.1289726110605093, "compression_ratio": 1.6517857142857142, "no_speech_prob": 0.0003923821495845914}, {"id": 631, "seek": 210964, "start": 2109.64, "end": 2111.54, "text": " will be the negative", "tokens": [50365, 486, 312, 264, 3671, 50460], "temperature": 0.0, "avg_logprob": -0.10674502498419709, "compression_ratio": 1.8289473684210527, "no_speech_prob": 0.000346119690220803}, {"id": 632, "seek": 210964, "start": 2111.64, "end": 2113.54, "text": " of dnormlogits,", "tokens": [50465, 295, 274, 13403, 4987, 1208, 11, 50560], "temperature": 0.0, "avg_logprob": -0.10674502498419709, "compression_ratio": 1.8289473684210527, "no_speech_prob": 0.000346119690220803}, {"id": 633, "seek": 210964, "start": 2113.64, "end": 2115.54, "text": " because of the negative sign.", "tokens": [50565, 570, 295, 264, 3671, 1465, 13, 50660], "temperature": 0.0, "avg_logprob": -0.10674502498419709, "compression_ratio": 1.8289473684210527, "no_speech_prob": 0.000346119690220803}, {"id": 634, "seek": 210964, "start": 2115.64, "end": 2117.54, "text": " And then we have to be careful because", "tokens": [50665, 400, 550, 321, 362, 281, 312, 5026, 570, 50760], "temperature": 0.0, "avg_logprob": -0.10674502498419709, "compression_ratio": 1.8289473684210527, "no_speech_prob": 0.000346119690220803}, {"id": 635, "seek": 210964, "start": 2117.64, "end": 2119.54, "text": " dlogitmaxis is", "tokens": [50765, 274, 4987, 270, 1696, 39637, 307, 50860], "temperature": 0.0, "avg_logprob": -0.10674502498419709, "compression_ratio": 1.8289473684210527, "no_speech_prob": 0.000346119690220803}, {"id": 636, "seek": 210964, "start": 2119.64, "end": 2121.54, "text": " a column, and so", "tokens": [50865, 257, 7738, 11, 293, 370, 50960], "temperature": 0.0, "avg_logprob": -0.10674502498419709, "compression_ratio": 1.8289473684210527, "no_speech_prob": 0.000346119690220803}, {"id": 637, "seek": 210964, "start": 2121.64, "end": 2123.54, "text": " just like we saw before,", "tokens": [50965, 445, 411, 321, 1866, 949, 11, 51060], "temperature": 0.0, "avg_logprob": -0.10674502498419709, "compression_ratio": 1.8289473684210527, "no_speech_prob": 0.000346119690220803}, {"id": 638, "seek": 210964, "start": 2123.64, "end": 2125.54, "text": " because we keep replicating the same", "tokens": [51065, 570, 321, 1066, 3248, 30541, 264, 912, 51160], "temperature": 0.0, "avg_logprob": 
-0.10674502498419709, "compression_ratio": 1.8289473684210527, "no_speech_prob": 0.000346119690220803}, {"id": 639, "seek": 210964, "start": 2125.64, "end": 2127.54, "text": " elements across all the", "tokens": [51165, 4959, 2108, 439, 264, 51260], "temperature": 0.0, "avg_logprob": -0.10674502498419709, "compression_ratio": 1.8289473684210527, "no_speech_prob": 0.000346119690220803}, {"id": 640, "seek": 210964, "start": 2127.64, "end": 2129.54, "text": " columns, then in the", "tokens": [51265, 13766, 11, 550, 294, 264, 51360], "temperature": 0.0, "avg_logprob": -0.10674502498419709, "compression_ratio": 1.8289473684210527, "no_speech_prob": 0.000346119690220803}, {"id": 641, "seek": 210964, "start": 2129.64, "end": 2131.54, "text": " backward pass, because we keep reusing", "tokens": [51365, 23897, 1320, 11, 570, 321, 1066, 319, 7981, 51460], "temperature": 0.0, "avg_logprob": -0.10674502498419709, "compression_ratio": 1.8289473684210527, "no_speech_prob": 0.000346119690220803}, {"id": 642, "seek": 210964, "start": 2131.64, "end": 2133.54, "text": " this, these are all just like separate", "tokens": [51465, 341, 11, 613, 366, 439, 445, 411, 4994, 51560], "temperature": 0.0, "avg_logprob": -0.10674502498419709, "compression_ratio": 1.8289473684210527, "no_speech_prob": 0.000346119690220803}, {"id": 643, "seek": 210964, "start": 2133.64, "end": 2135.54, "text": " branches of use of that one variable.", "tokens": [51565, 14770, 295, 764, 295, 300, 472, 7006, 13, 51660], "temperature": 0.0, "avg_logprob": -0.10674502498419709, "compression_ratio": 1.8289473684210527, "no_speech_prob": 0.000346119690220803}, {"id": 644, "seek": 210964, "start": 2135.64, "end": 2137.54, "text": " And so therefore, we have to do a", "tokens": [51665, 400, 370, 4412, 11, 321, 362, 281, 360, 257, 51760], "temperature": 0.0, "avg_logprob": -0.10674502498419709, "compression_ratio": 1.8289473684210527, "no_speech_prob": 0.000346119690220803}, {"id": 645, "seek": 210964, "start": 2137.64, "end": 2139.54, "text": " sum along 1, we'd keep", "tokens": [51765, 2408, 2051, 502, 11, 321, 1116, 1066, 51860], "temperature": 0.0, "avg_logprob": -0.10674502498419709, "compression_ratio": 1.8289473684210527, "no_speech_prob": 0.000346119690220803}, {"id": 646, "seek": 213954, "start": 2139.54, "end": 2141.44, "text": " them equals true, so that we don't", "tokens": [50365, 552, 6915, 2074, 11, 370, 300, 321, 500, 380, 50460], "temperature": 0.0, "avg_logprob": -0.07661532748277021, "compression_ratio": 1.874074074074074, "no_speech_prob": 0.0006864761235192418}, {"id": 647, "seek": 213954, "start": 2141.54, "end": 2143.44, "text": " destroy this dimension.", "tokens": [50465, 5293, 341, 10139, 13, 50560], "temperature": 0.0, "avg_logprob": -0.07661532748277021, "compression_ratio": 1.874074074074074, "no_speech_prob": 0.0006864761235192418}, {"id": 648, "seek": 213954, "start": 2143.54, "end": 2145.44, "text": " And then dlogitmaxis will be the same", "tokens": [50565, 400, 550, 274, 4987, 270, 1696, 39637, 486, 312, 264, 912, 50660], "temperature": 0.0, "avg_logprob": -0.07661532748277021, "compression_ratio": 1.874074074074074, "no_speech_prob": 0.0006864761235192418}, {"id": 649, "seek": 213954, "start": 2145.54, "end": 2147.44, "text": " shape. 
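Together, those two lines look roughly like this (a sketch with a stand-in upstream gradient):

    import torch

    dnorm_logits = torch.randn(32, 27)   # stand-in upstream gradient

    # Backward through norm_logits = logits - logit_maxes (logit_maxes is 32 by 1):
    dlogits = dnorm_logits.clone()                       # local derivative +1
    dlogit_maxes = (-dnorm_logits).sum(1, keepdim=True)  # -1, then undo the broadcast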
Now we have to be careful, because this dLogits is not the final dLogits, and that's because not only do we get gradient signal into logits through here, but logitMaxes is a function of logits, and that's a second branch into logits. So this is not yet our final derivative for logits; we will come back later for the second branch. For now, dLogitMaxes is the final derivative, so let me uncomment this cmp here, and let's just run this. And for dLogitMaxes, PyTorch agrees with us. So that was the derivative through this line.
Now before", "tokens": [50865, 341, 1622, 13, 823, 949, 50960], "temperature": 0.0, "avg_logprob": -0.08371694023544723, "compression_ratio": 1.7054794520547945, "no_speech_prob": 0.0008369149290956557}, {"id": 667, "seek": 216944, "start": 2181.44, "end": 2183.34, "text": " we move on, I want to pause here briefly,", "tokens": [50965, 321, 1286, 322, 11, 286, 528, 281, 10465, 510, 10515, 11, 51060], "temperature": 0.0, "avg_logprob": -0.08371694023544723, "compression_ratio": 1.7054794520547945, "no_speech_prob": 0.0008369149290956557}, {"id": 668, "seek": 216944, "start": 2183.44, "end": 2185.34, "text": " and I want to look at these logitmaxis, and", "tokens": [51065, 293, 286, 528, 281, 574, 412, 613, 3565, 270, 1696, 39637, 11, 293, 51160], "temperature": 0.0, "avg_logprob": -0.08371694023544723, "compression_ratio": 1.7054794520547945, "no_speech_prob": 0.0008369149290956557}, {"id": 669, "seek": 216944, "start": 2185.44, "end": 2187.34, "text": " especially their gradients. We've", "tokens": [51165, 2318, 641, 2771, 2448, 13, 492, 600, 51260], "temperature": 0.0, "avg_logprob": -0.08371694023544723, "compression_ratio": 1.7054794520547945, "no_speech_prob": 0.0008369149290956557}, {"id": 670, "seek": 216944, "start": 2187.44, "end": 2189.34, "text": " talked previously in the previous lecture", "tokens": [51265, 2825, 8046, 294, 264, 3894, 7991, 51360], "temperature": 0.0, "avg_logprob": -0.08371694023544723, "compression_ratio": 1.7054794520547945, "no_speech_prob": 0.0008369149290956557}, {"id": 671, "seek": 216944, "start": 2189.44, "end": 2191.34, "text": " that the only reason we're doing this is", "tokens": [51365, 300, 264, 787, 1778, 321, 434, 884, 341, 307, 51460], "temperature": 0.0, "avg_logprob": -0.08371694023544723, "compression_ratio": 1.7054794520547945, "no_speech_prob": 0.0008369149290956557}, {"id": 672, "seek": 216944, "start": 2191.44, "end": 2193.34, "text": " for the numerical stability of the softmax", "tokens": [51465, 337, 264, 29054, 11826, 295, 264, 2787, 41167, 51560], "temperature": 0.0, "avg_logprob": -0.08371694023544723, "compression_ratio": 1.7054794520547945, "no_speech_prob": 0.0008369149290956557}, {"id": 673, "seek": 216944, "start": 2193.44, "end": 2195.34, "text": " that we are implementing here.", "tokens": [51565, 300, 321, 366, 18114, 510, 13, 51660], "temperature": 0.0, "avg_logprob": -0.08371694023544723, "compression_ratio": 1.7054794520547945, "no_speech_prob": 0.0008369149290956557}, {"id": 674, "seek": 216944, "start": 2195.44, "end": 2197.34, "text": " And we talked about how if you take", "tokens": [51665, 400, 321, 2825, 466, 577, 498, 291, 747, 51760], "temperature": 0.0, "avg_logprob": -0.08371694023544723, "compression_ratio": 1.7054794520547945, "no_speech_prob": 0.0008369149290956557}, {"id": 675, "seek": 216944, "start": 2197.44, "end": 2199.34, "text": " these logits for any one of these examples,", "tokens": [51765, 613, 3565, 1208, 337, 604, 472, 295, 613, 5110, 11, 51860], "temperature": 0.0, "avg_logprob": -0.08371694023544723, "compression_ratio": 1.7054794520547945, "no_speech_prob": 0.0008369149290956557}, {"id": 676, "seek": 219944, "start": 2199.44, "end": 2201.34, "text": " so one row of this logits tensor,", "tokens": [50365, 370, 472, 5386, 295, 341, 3565, 1208, 40863, 11, 50460], "temperature": 0.0, "avg_logprob": -0.07040895734514509, "compression_ratio": 1.7049808429118773, "no_speech_prob": 0.00021489836217369884}, {"id": 677, "seek": 219944, "start": 2201.44, "end": 2203.34, "text": " if you add or 
subtract", "tokens": [50465, 498, 291, 909, 420, 16390, 50560], "temperature": 0.0, "avg_logprob": -0.07040895734514509, "compression_ratio": 1.7049808429118773, "no_speech_prob": 0.00021489836217369884}, {"id": 678, "seek": 219944, "start": 2203.44, "end": 2205.34, "text": " any value equally to all the elements,", "tokens": [50565, 604, 2158, 12309, 281, 439, 264, 4959, 11, 50660], "temperature": 0.0, "avg_logprob": -0.07040895734514509, "compression_ratio": 1.7049808429118773, "no_speech_prob": 0.00021489836217369884}, {"id": 679, "seek": 219944, "start": 2205.44, "end": 2207.34, "text": " then the value", "tokens": [50665, 550, 264, 2158, 50760], "temperature": 0.0, "avg_logprob": -0.07040895734514509, "compression_ratio": 1.7049808429118773, "no_speech_prob": 0.00021489836217369884}, {"id": 680, "seek": 219944, "start": 2207.44, "end": 2209.34, "text": " of the probes will be unchanged.", "tokens": [50765, 295, 264, 1239, 279, 486, 312, 44553, 13, 50860], "temperature": 0.0, "avg_logprob": -0.07040895734514509, "compression_ratio": 1.7049808429118773, "no_speech_prob": 0.00021489836217369884}, {"id": 681, "seek": 219944, "start": 2209.44, "end": 2211.34, "text": " You're not changing the softmax. The only thing", "tokens": [50865, 509, 434, 406, 4473, 264, 2787, 41167, 13, 440, 787, 551, 50960], "temperature": 0.0, "avg_logprob": -0.07040895734514509, "compression_ratio": 1.7049808429118773, "no_speech_prob": 0.00021489836217369884}, {"id": 682, "seek": 219944, "start": 2211.44, "end": 2213.34, "text": " that this is doing is it's making sure that", "tokens": [50965, 300, 341, 307, 884, 307, 309, 311, 1455, 988, 300, 51060], "temperature": 0.0, "avg_logprob": -0.07040895734514509, "compression_ratio": 1.7049808429118773, "no_speech_prob": 0.00021489836217369884}, {"id": 683, "seek": 219944, "start": 2213.44, "end": 2215.34, "text": " exp doesn't overflow. 
And the reason we're using a max is because then we are guaranteed that each row of logits, the highest number, is zero. And so this will be safe. And so basically that has repercussions. If it is the case that changing logit_maxes does not change the probs, and therefore does not change the loss, then the gradient on logit_maxes should be zero, because saying those two things is the same. So indeed we hope that these are very, very small numbers. Indeed, we hope this is zero. Now because of floating point sort of wonkiness, this doesn't come out exactly zero. Only in some of the rows it does. But we get extremely small values, like 1e-9 or 1e-10. And so this is telling us that the values of logit_maxes are not impacting the loss, as they shouldn't.

It feels kind of weird to backpropagate through this branch, honestly, because if you have any implementation of F.cross_entropy in PyTorch, and you block together all of these elements, and you're not doing backpropagation piece by piece, then you would probably assume that the derivative through here is exactly zero. So you would be sort of skipping this branch, because it's only done for numerical stability. But it's interesting to see that even if you break up everything into the full atoms, and you still do the computation as you'd like with respect to numerical stability, the correct thing happens. And you still get very, very small gradients here, basically reflecting the fact that the values of these do not matter with respect to the final loss.
Okay, so let's now continue backpropagation through this line here. We've just calculated the logit_maxes, and now we want to backprop into logits through this second branch. Now here, of course, we took logits, and we took the max along all the rows, and then we looked at its values here. Now the way this works is that in PyTorch, this thing here, the max, returns both the values, and it returns the indices at which those values took on the maximum value. Now in the forward pass, we only used values, because that's all we needed. But in the backward pass, it's extremely useful to know where those maximum values occurred, and we have the indices at which they occurred. And this will of course help us do the backpropagation. Because what should the backward pass be here in this case? We have the logits tensor, which is 32 by 27, and in each row we find the maximum value, and then that value gets plucked out into logit_maxes.
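As a quick illustration of what max along a dimension returns (again on random stand-in data):

```python
import torch

logits = torch.randn(32, 27)         # stand-in logits
out = logits.max(1, keepdim=True)    # max along each row
print(out.values.shape)              # torch.Size([32, 1]) -> used in the forward pass
print(out.indices.shape)             # torch.Size([32, 1]) -> where each max occurred
```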
And so intuitively, basically, the derivative flowing through here then should be 1 times the local derivative, which is 1 for the appropriate entry that was plucked out, and then times the global derivative of the logit_maxes. So really what we're doing here, if you think through it, is we need to take the dlogit_maxes, and we need to scatter it to the correct positions in these logits, from where the maximum values came. And so I came up with one line of code that does that. Let me just erase a bunch of stuff here. You could do it kind of very similar to what we've done here, where we create a zeros and then we populate the correct elements. So we use the indices here, and we would set them to be 1.
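A sketch of that zeros-and-populate version, assuming a made-up `dlogit_maxes` in place of the one computed earlier in the backward pass:

```python
import torch

logits = torch.randn(32, 27)                     # stand-in logits
dlogit_maxes = torch.randn(32, 1)                # stand-in upstream gradient

# allocate zeros and write each row's dlogit_maxes into the argmax position
dlogits_branch = torch.zeros_like(logits)
idx = logits.max(1).indices                      # where each row's max came from
dlogits_branch[range(32), idx] = dlogit_maxes.squeeze(1)
```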
"no_speech_prob": 0.0013608551817014813}, {"id": 780, "seek": 240914, "start": 2409.14, "end": 2411.04, "text": " So we use the indices here, and we would", "tokens": [50365, 407, 321, 764, 264, 43840, 510, 11, 293, 321, 576, 50460], "temperature": 0.0, "avg_logprob": -0.14822857081890106, "compression_ratio": 1.5938864628820961, "no_speech_prob": 0.0002340041974093765}, {"id": 781, "seek": 240914, "start": 2411.14, "end": 2413.04, "text": " set them to be 1. But you can", "tokens": [50465, 992, 552, 281, 312, 502, 13, 583, 291, 393, 50560], "temperature": 0.0, "avg_logprob": -0.14822857081890106, "compression_ratio": 1.5938864628820961, "no_speech_prob": 0.0002340041974093765}, {"id": 782, "seek": 240914, "start": 2413.14, "end": 2415.04, "text": " also use one hot.", "tokens": [50565, 611, 764, 472, 2368, 13, 50660], "temperature": 0.0, "avg_logprob": -0.14822857081890106, "compression_ratio": 1.5938864628820961, "no_speech_prob": 0.0002340041974093765}, {"id": 783, "seek": 240914, "start": 2415.14, "end": 2417.04, "text": " So f dot one hot,", "tokens": [50665, 407, 283, 5893, 472, 2368, 11, 50760], "temperature": 0.0, "avg_logprob": -0.14822857081890106, "compression_ratio": 1.5938864628820961, "no_speech_prob": 0.0002340041974093765}, {"id": 784, "seek": 240914, "start": 2417.14, "end": 2419.04, "text": " and then I'm taking the logits of max", "tokens": [50765, 293, 550, 286, 478, 1940, 264, 3565, 1208, 295, 11469, 50860], "temperature": 0.0, "avg_logprob": -0.14822857081890106, "compression_ratio": 1.5938864628820961, "no_speech_prob": 0.0002340041974093765}, {"id": 785, "seek": 240914, "start": 2419.14, "end": 2421.04, "text": " over the first dimension", "tokens": [50865, 670, 264, 700, 10139, 50960], "temperature": 0.0, "avg_logprob": -0.14822857081890106, "compression_ratio": 1.5938864628820961, "no_speech_prob": 0.0002340041974093765}, {"id": 786, "seek": 240914, "start": 2421.14, "end": 2423.04, "text": " dot indices, and I'm telling", "tokens": [50965, 5893, 43840, 11, 293, 286, 478, 3585, 51060], "temperature": 0.0, "avg_logprob": -0.14822857081890106, "compression_ratio": 1.5938864628820961, "no_speech_prob": 0.0002340041974093765}, {"id": 787, "seek": 240914, "start": 2423.14, "end": 2425.04, "text": " PyTorch that", "tokens": [51065, 9953, 51, 284, 339, 300, 51160], "temperature": 0.0, "avg_logprob": -0.14822857081890106, "compression_ratio": 1.5938864628820961, "no_speech_prob": 0.0002340041974093765}, {"id": 788, "seek": 240914, "start": 2425.14, "end": 2427.04, "text": " the dimension of", "tokens": [51165, 264, 10139, 295, 51260], "temperature": 0.0, "avg_logprob": -0.14822857081890106, "compression_ratio": 1.5938864628820961, "no_speech_prob": 0.0002340041974093765}, {"id": 789, "seek": 240914, "start": 2427.14, "end": 2429.04, "text": " every one of these tensors should be", "tokens": [51265, 633, 472, 295, 613, 10688, 830, 820, 312, 51360], "temperature": 0.0, "avg_logprob": -0.14822857081890106, "compression_ratio": 1.5938864628820961, "no_speech_prob": 0.0002340041974093765}, {"id": 790, "seek": 240914, "start": 2429.14, "end": 2431.04, "text": " 27.", "tokens": [51365, 7634, 13, 51460], "temperature": 0.0, "avg_logprob": -0.14822857081890106, "compression_ratio": 1.5938864628820961, "no_speech_prob": 0.0002340041974093765}, {"id": 791, "seek": 240914, "start": 2431.14, "end": 2433.04, "text": " And so what this is going to do", "tokens": [51465, 400, 370, 437, 341, 307, 516, 281, 360, 51560], "temperature": 0.0, "avg_logprob": -0.14822857081890106, 
"compression_ratio": 1.5938864628820961, "no_speech_prob": 0.0002340041974093765}, {"id": 792, "seek": 240914, "start": 2433.14, "end": 2435.04, "text": " is...", "tokens": [51565, 307, 485, 51660], "temperature": 0.0, "avg_logprob": -0.14822857081890106, "compression_ratio": 1.5938864628820961, "no_speech_prob": 0.0002340041974093765}, {"id": 793, "seek": 240914, "start": 2435.14, "end": 2437.04, "text": " Okay, I apologize, this is crazy.", "tokens": [51665, 1033, 11, 286, 12328, 11, 341, 307, 3219, 13, 51760], "temperature": 0.0, "avg_logprob": -0.14822857081890106, "compression_ratio": 1.5938864628820961, "no_speech_prob": 0.0002340041974093765}, {"id": 794, "seek": 240914, "start": 2437.14, "end": 2439.04, "text": " PLT dot imchev of this.", "tokens": [51765, 6999, 51, 5893, 566, 1876, 85, 295, 341, 13, 51860], "temperature": 0.0, "avg_logprob": -0.14822857081890106, "compression_ratio": 1.5938864628820961, "no_speech_prob": 0.0002340041974093765}, {"id": 795, "seek": 243914, "start": 2439.14, "end": 2441.04, "text": " It's really just an array", "tokens": [50365, 467, 311, 534, 445, 364, 10225, 50460], "temperature": 0.0, "avg_logprob": -0.0653659333574011, "compression_ratio": 1.7727272727272727, "no_speech_prob": 0.00041338743176311255}, {"id": 796, "seek": 243914, "start": 2441.14, "end": 2443.04, "text": " of where the maxes came from", "tokens": [50465, 295, 689, 264, 11469, 279, 1361, 490, 50560], "temperature": 0.0, "avg_logprob": -0.0653659333574011, "compression_ratio": 1.7727272727272727, "no_speech_prob": 0.00041338743176311255}, {"id": 797, "seek": 243914, "start": 2443.14, "end": 2445.04, "text": " in each row, and that element is 1,", "tokens": [50565, 294, 1184, 5386, 11, 293, 300, 4478, 307, 502, 11, 50660], "temperature": 0.0, "avg_logprob": -0.0653659333574011, "compression_ratio": 1.7727272727272727, "no_speech_prob": 0.00041338743176311255}, {"id": 798, "seek": 243914, "start": 2445.14, "end": 2447.04, "text": " and all the other elements are 0.", "tokens": [50665, 293, 439, 264, 661, 4959, 366, 1958, 13, 50760], "temperature": 0.0, "avg_logprob": -0.0653659333574011, "compression_ratio": 1.7727272727272727, "no_speech_prob": 0.00041338743176311255}, {"id": 799, "seek": 243914, "start": 2447.14, "end": 2449.04, "text": " So it's one hot vector in each row,", "tokens": [50765, 407, 309, 311, 472, 2368, 8062, 294, 1184, 5386, 11, 50860], "temperature": 0.0, "avg_logprob": -0.0653659333574011, "compression_ratio": 1.7727272727272727, "no_speech_prob": 0.00041338743176311255}, {"id": 800, "seek": 243914, "start": 2449.14, "end": 2451.04, "text": " and these indices are now populating", "tokens": [50865, 293, 613, 43840, 366, 586, 1665, 12162, 50960], "temperature": 0.0, "avg_logprob": -0.0653659333574011, "compression_ratio": 1.7727272727272727, "no_speech_prob": 0.00041338743176311255}, {"id": 801, "seek": 243914, "start": 2451.14, "end": 2453.04, "text": " a single 1 in the proper", "tokens": [50965, 257, 2167, 502, 294, 264, 2296, 51060], "temperature": 0.0, "avg_logprob": -0.0653659333574011, "compression_ratio": 1.7727272727272727, "no_speech_prob": 0.00041338743176311255}, {"id": 802, "seek": 243914, "start": 2453.14, "end": 2455.04, "text": " place. 
And then what I'm doing here is I'm multiplying by the dlogit_maxes. And keep in mind that this is a column of 32 by 1. And so when I'm doing this times the dlogit_maxes, the dlogit_maxes will broadcast, and that column will get replicated, and then the element-wise multiply will ensure that each of these just gets routed to whichever one of these bits is turned on. And so that's another way to implement this kind of operation, and both of these can be used; I just thought I would show an equivalent way to do it. And I'm using plus equals because we already calculated dlogits here, and this is now the second branch. So let's look at logits and make sure that this is correct.
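Putting it together, the one-liner being described is, as a sketch with stand-in tensors for `dlogits` and `dlogit_maxes`:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(32, 27)          # stand-in logits
dlogits = torch.randn(32, 27)         # stand-in: gradient from the first branch
dlogit_maxes = torch.randn(32, 1)     # stand-in upstream gradient

# the (32,1) column broadcasts against the (32,27) one-hot mask, routing each
# row's gradient to wherever that row's max came from; += accumulates onto
# the first branch
dlogits += F.one_hot(logits.max(1).indices, num_classes=27) * dlogit_maxes
```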
"temperature": 0.0, "avg_logprob": -0.07203583350548377, "compression_ratio": 1.5916030534351144, "no_speech_prob": 0.0001650814083404839}, {"id": 826, "seek": 249894, "start": 2500.94, "end": 2502.84, "text": " And we see that we have exactly the correct answer.", "tokens": [50465, 400, 321, 536, 300, 321, 362, 2293, 264, 3006, 1867, 13, 50560], "temperature": 0.0, "avg_logprob": -0.07203583350548377, "compression_ratio": 1.5916030534351144, "no_speech_prob": 0.0001650814083404839}, {"id": 827, "seek": 249894, "start": 2502.94, "end": 2504.84, "text": " Next up,", "tokens": [50565, 3087, 493, 11, 50660], "temperature": 0.0, "avg_logprob": -0.07203583350548377, "compression_ratio": 1.5916030534351144, "no_speech_prob": 0.0001650814083404839}, {"id": 828, "seek": 249894, "start": 2504.94, "end": 2506.84, "text": " we want to continue with logits here.", "tokens": [50665, 321, 528, 281, 2354, 365, 3565, 1208, 510, 13, 50760], "temperature": 0.0, "avg_logprob": -0.07203583350548377, "compression_ratio": 1.5916030534351144, "no_speech_prob": 0.0001650814083404839}, {"id": 829, "seek": 249894, "start": 2506.94, "end": 2508.84, "text": " That is an outcome of a matrix", "tokens": [50765, 663, 307, 364, 9700, 295, 257, 8141, 50860], "temperature": 0.0, "avg_logprob": -0.07203583350548377, "compression_ratio": 1.5916030534351144, "no_speech_prob": 0.0001650814083404839}, {"id": 830, "seek": 249894, "start": 2508.94, "end": 2510.84, "text": " multiplication and a bias offset", "tokens": [50865, 27290, 293, 257, 12577, 18687, 50960], "temperature": 0.0, "avg_logprob": -0.07203583350548377, "compression_ratio": 1.5916030534351144, "no_speech_prob": 0.0001650814083404839}, {"id": 831, "seek": 249894, "start": 2510.94, "end": 2512.84, "text": " in this linear layer.", "tokens": [50965, 294, 341, 8213, 4583, 13, 51060], "temperature": 0.0, "avg_logprob": -0.07203583350548377, "compression_ratio": 1.5916030534351144, "no_speech_prob": 0.0001650814083404839}, {"id": 832, "seek": 249894, "start": 2512.94, "end": 2514.84, "text": " So I've", "tokens": [51065, 407, 286, 600, 51160], "temperature": 0.0, "avg_logprob": -0.07203583350548377, "compression_ratio": 1.5916030534351144, "no_speech_prob": 0.0001650814083404839}, {"id": 833, "seek": 249894, "start": 2514.94, "end": 2516.84, "text": " printed out the shapes of all these intermediate", "tokens": [51165, 13567, 484, 264, 10854, 295, 439, 613, 19376, 51260], "temperature": 0.0, "avg_logprob": -0.07203583350548377, "compression_ratio": 1.5916030534351144, "no_speech_prob": 0.0001650814083404839}, {"id": 834, "seek": 249894, "start": 2516.94, "end": 2518.84, "text": " tensors. We see that logits", "tokens": [51265, 10688, 830, 13, 492, 536, 300, 3565, 1208, 51360], "temperature": 0.0, "avg_logprob": -0.07203583350548377, "compression_ratio": 1.5916030534351144, "no_speech_prob": 0.0001650814083404839}, {"id": 835, "seek": 249894, "start": 2518.94, "end": 2520.84, "text": " is of course 32 by 27, as we've just", "tokens": [51365, 307, 295, 1164, 8858, 538, 7634, 11, 382, 321, 600, 445, 51460], "temperature": 0.0, "avg_logprob": -0.07203583350548377, "compression_ratio": 1.5916030534351144, "no_speech_prob": 0.0001650814083404839}, {"id": 836, "seek": 249894, "start": 2520.94, "end": 2522.84, "text": " seen. 
Then the", "tokens": [51465, 1612, 13, 1396, 264, 51560], "temperature": 0.0, "avg_logprob": -0.07203583350548377, "compression_ratio": 1.5916030534351144, "no_speech_prob": 0.0001650814083404839}, {"id": 837, "seek": 249894, "start": 2522.94, "end": 2524.84, "text": " h here is 32 by 64.", "tokens": [51565, 276, 510, 307, 8858, 538, 12145, 13, 51660], "temperature": 0.0, "avg_logprob": -0.07203583350548377, "compression_ratio": 1.5916030534351144, "no_speech_prob": 0.0001650814083404839}, {"id": 838, "seek": 249894, "start": 2524.94, "end": 2526.84, "text": " So these are 64-dimensional hidden states.", "tokens": [51665, 407, 613, 366, 12145, 12, 18759, 7633, 4368, 13, 51760], "temperature": 0.0, "avg_logprob": -0.07203583350548377, "compression_ratio": 1.5916030534351144, "no_speech_prob": 0.0001650814083404839}, {"id": 839, "seek": 249894, "start": 2526.94, "end": 2528.84, "text": " And then this w", "tokens": [51765, 400, 550, 341, 261, 51860], "temperature": 0.0, "avg_logprob": -0.07203583350548377, "compression_ratio": 1.5916030534351144, "no_speech_prob": 0.0001650814083404839}, {"id": 840, "seek": 252884, "start": 2528.84, "end": 2530.7400000000002, "text": " matrix projects those 64-dimensional", "tokens": [50365, 8141, 4455, 729, 12145, 12, 18759, 50460], "temperature": 0.0, "avg_logprob": -0.07214680451613206, "compression_ratio": 1.8471074380165289, "no_speech_prob": 0.00016789280925877392}, {"id": 841, "seek": 252884, "start": 2530.84, "end": 2532.7400000000002, "text": " vectors into 27 dimensions.", "tokens": [50465, 18875, 666, 7634, 12819, 13, 50560], "temperature": 0.0, "avg_logprob": -0.07214680451613206, "compression_ratio": 1.8471074380165289, "no_speech_prob": 0.00016789280925877392}, {"id": 842, "seek": 252884, "start": 2532.84, "end": 2534.7400000000002, "text": " And then there's a 27-dimensional", "tokens": [50565, 400, 550, 456, 311, 257, 7634, 12, 18759, 50660], "temperature": 0.0, "avg_logprob": -0.07214680451613206, "compression_ratio": 1.8471074380165289, "no_speech_prob": 0.00016789280925877392}, {"id": 843, "seek": 252884, "start": 2534.84, "end": 2536.7400000000002, "text": " offset, which is a", "tokens": [50665, 18687, 11, 597, 307, 257, 50760], "temperature": 0.0, "avg_logprob": -0.07214680451613206, "compression_ratio": 1.8471074380165289, "no_speech_prob": 0.00016789280925877392}, {"id": 844, "seek": 252884, "start": 2536.84, "end": 2538.7400000000002, "text": " one-dimensional vector. 
Now we", "tokens": [50765, 472, 12, 18759, 8062, 13, 823, 321, 50860], "temperature": 0.0, "avg_logprob": -0.07214680451613206, "compression_ratio": 1.8471074380165289, "no_speech_prob": 0.00016789280925877392}, {"id": 845, "seek": 252884, "start": 2538.84, "end": 2540.7400000000002, "text": " should note that this plus here actually", "tokens": [50865, 820, 3637, 300, 341, 1804, 510, 767, 50960], "temperature": 0.0, "avg_logprob": -0.07214680451613206, "compression_ratio": 1.8471074380165289, "no_speech_prob": 0.00016789280925877392}, {"id": 846, "seek": 252884, "start": 2540.84, "end": 2542.7400000000002, "text": " broadcasts, because h multiplied", "tokens": [50965, 9975, 82, 11, 570, 276, 17207, 51060], "temperature": 0.0, "avg_logprob": -0.07214680451613206, "compression_ratio": 1.8471074380165289, "no_speech_prob": 0.00016789280925877392}, {"id": 847, "seek": 252884, "start": 2542.84, "end": 2544.7400000000002, "text": " by w2", "tokens": [51065, 538, 261, 17, 51160], "temperature": 0.0, "avg_logprob": -0.07214680451613206, "compression_ratio": 1.8471074380165289, "no_speech_prob": 0.00016789280925877392}, {"id": 848, "seek": 252884, "start": 2544.84, "end": 2546.7400000000002, "text": " will give us a 32 by 27.", "tokens": [51165, 486, 976, 505, 257, 8858, 538, 7634, 13, 51260], "temperature": 0.0, "avg_logprob": -0.07214680451613206, "compression_ratio": 1.8471074380165289, "no_speech_prob": 0.00016789280925877392}, {"id": 849, "seek": 252884, "start": 2546.84, "end": 2548.7400000000002, "text": " And so then this plus", "tokens": [51265, 400, 370, 550, 341, 1804, 51360], "temperature": 0.0, "avg_logprob": -0.07214680451613206, "compression_ratio": 1.8471074380165289, "no_speech_prob": 0.00016789280925877392}, {"id": 850, "seek": 252884, "start": 2548.84, "end": 2550.7400000000002, "text": " b2 is a 27-dimensional", "tokens": [51365, 272, 17, 307, 257, 7634, 12, 18759, 51460], "temperature": 0.0, "avg_logprob": -0.07214680451613206, "compression_ratio": 1.8471074380165289, "no_speech_prob": 0.00016789280925877392}, {"id": 851, "seek": 252884, "start": 2550.84, "end": 2552.7400000000002, "text": " vector here. 
Now in the", "tokens": [51465, 8062, 510, 13, 823, 294, 264, 51560], "temperature": 0.0, "avg_logprob": -0.07214680451613206, "compression_ratio": 1.8471074380165289, "no_speech_prob": 0.00016789280925877392}, {"id": 852, "seek": 252884, "start": 2552.84, "end": 2554.7400000000002, "text": " rules of broadcasting, what's going to happen with this bias", "tokens": [51565, 4474, 295, 30024, 11, 437, 311, 516, 281, 1051, 365, 341, 12577, 51660], "temperature": 0.0, "avg_logprob": -0.07214680451613206, "compression_ratio": 1.8471074380165289, "no_speech_prob": 0.00016789280925877392}, {"id": 853, "seek": 252884, "start": 2554.84, "end": 2556.7400000000002, "text": " vector is that this one-dimensional", "tokens": [51665, 8062, 307, 300, 341, 472, 12, 18759, 51760], "temperature": 0.0, "avg_logprob": -0.07214680451613206, "compression_ratio": 1.8471074380165289, "no_speech_prob": 0.00016789280925877392}, {"id": 854, "seek": 252884, "start": 2556.84, "end": 2558.7400000000002, "text": " vector of 27 will get a lot", "tokens": [51765, 8062, 295, 7634, 486, 483, 257, 688, 51860], "temperature": 0.0, "avg_logprob": -0.07214680451613206, "compression_ratio": 1.8471074380165289, "no_speech_prob": 0.00016789280925877392}, {"id": 855, "seek": 255884, "start": 2558.84, "end": 2560.7400000000002, "text": " aligned with an padded dimension", "tokens": [50365, 17962, 365, 364, 6887, 9207, 10139, 50460], "temperature": 0.0, "avg_logprob": -0.10638045489303465, "compression_ratio": 1.5725806451612903, "no_speech_prob": 0.0005002659745514393}, {"id": 856, "seek": 255884, "start": 2560.84, "end": 2562.7400000000002, "text": " of 1 on the left.", "tokens": [50465, 295, 502, 322, 264, 1411, 13, 50560], "temperature": 0.0, "avg_logprob": -0.10638045489303465, "compression_ratio": 1.5725806451612903, "no_speech_prob": 0.0005002659745514393}, {"id": 857, "seek": 255884, "start": 2562.84, "end": 2564.7400000000002, "text": " And it will basically become a row vector,", "tokens": [50565, 400, 309, 486, 1936, 1813, 257, 5386, 8062, 11, 50660], "temperature": 0.0, "avg_logprob": -0.10638045489303465, "compression_ratio": 1.5725806451612903, "no_speech_prob": 0.0005002659745514393}, {"id": 858, "seek": 255884, "start": 2564.84, "end": 2566.7400000000002, "text": " and then it will get replicated", "tokens": [50665, 293, 550, 309, 486, 483, 46365, 50760], "temperature": 0.0, "avg_logprob": -0.10638045489303465, "compression_ratio": 1.5725806451612903, "no_speech_prob": 0.0005002659745514393}, {"id": 859, "seek": 255884, "start": 2566.84, "end": 2568.7400000000002, "text": " vertically 32 times to make it", "tokens": [50765, 28450, 8858, 1413, 281, 652, 309, 50860], "temperature": 0.0, "avg_logprob": -0.10638045489303465, "compression_ratio": 1.5725806451612903, "no_speech_prob": 0.0005002659745514393}, {"id": 860, "seek": 255884, "start": 2568.84, "end": 2570.7400000000002, "text": " 32 by 27, and then there's an element-wise", "tokens": [50865, 8858, 538, 7634, 11, 293, 550, 456, 311, 364, 4478, 12, 3711, 50960], "temperature": 0.0, "avg_logprob": -0.10638045489303465, "compression_ratio": 1.5725806451612903, "no_speech_prob": 0.0005002659745514393}, {"id": 861, "seek": 255884, "start": 2570.84, "end": 2572.7400000000002, "text": " multiply.", "tokens": [50965, 12972, 13, 51060], "temperature": 0.0, "avg_logprob": -0.10638045489303465, "compression_ratio": 1.5725806451612903, "no_speech_prob": 0.0005002659745514393}, {"id": 862, "seek": 255884, "start": 2572.84, "end": 2574.7400000000002, "text": " Now the question", 
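As a shape check, here is a small sketch with random stand-ins for h, W2, and b2:

```python
import torch

h  = torch.randn(32, 64)   # stand-in hidden states
W2 = torch.randn(64, 27)   # stand-in weight matrix
b2 = torch.randn(27)       # stand-in bias, a one-dimensional vector

# (32,64) @ (64,27) -> (32,27); b2 is treated as a (1,27) row and
# replicated vertically 32 times by broadcasting
logits = h @ W2 + b2
print(logits.shape)        # torch.Size([32, 27])
```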
"tokens": [51065, 823, 264, 1168, 51160], "temperature": 0.0, "avg_logprob": -0.10638045489303465, "compression_ratio": 1.5725806451612903, "no_speech_prob": 0.0005002659745514393}, {"id": 863, "seek": 255884, "start": 2574.84, "end": 2576.7400000000002, "text": " is how do we backpropagate from", "tokens": [51165, 307, 577, 360, 321, 646, 79, 1513, 559, 473, 490, 51260], "temperature": 0.0, "avg_logprob": -0.10638045489303465, "compression_ratio": 1.5725806451612903, "no_speech_prob": 0.0005002659745514393}, {"id": 864, "seek": 255884, "start": 2576.84, "end": 2578.7400000000002, "text": " logits to the hidden states,", "tokens": [51265, 3565, 1208, 281, 264, 7633, 4368, 11, 51360], "temperature": 0.0, "avg_logprob": -0.10638045489303465, "compression_ratio": 1.5725806451612903, "no_speech_prob": 0.0005002659745514393}, {"id": 865, "seek": 255884, "start": 2578.84, "end": 2580.7400000000002, "text": " the weight matrix w2, and the bias", "tokens": [51365, 264, 3364, 8141, 261, 17, 11, 293, 264, 12577, 51460], "temperature": 0.0, "avg_logprob": -0.10638045489303465, "compression_ratio": 1.5725806451612903, "no_speech_prob": 0.0005002659745514393}, {"id": 866, "seek": 255884, "start": 2580.84, "end": 2582.7400000000002, "text": " b2? And you might", "tokens": [51465, 272, 17, 30, 400, 291, 1062, 51560], "temperature": 0.0, "avg_logprob": -0.10638045489303465, "compression_ratio": 1.5725806451612903, "no_speech_prob": 0.0005002659745514393}, {"id": 867, "seek": 255884, "start": 2582.84, "end": 2584.7400000000002, "text": " think that we need to go to some", "tokens": [51565, 519, 300, 321, 643, 281, 352, 281, 512, 51660], "temperature": 0.0, "avg_logprob": -0.10638045489303465, "compression_ratio": 1.5725806451612903, "no_speech_prob": 0.0005002659745514393}, {"id": 868, "seek": 255884, "start": 2584.84, "end": 2586.7400000000002, "text": " matrix calculus,", "tokens": [51665, 8141, 33400, 11, 51760], "temperature": 0.0, "avg_logprob": -0.10638045489303465, "compression_ratio": 1.5725806451612903, "no_speech_prob": 0.0005002659745514393}, {"id": 869, "seek": 258674, "start": 2586.74, "end": 2588.64, "text": " and then we have to look up the derivative", "tokens": [50365, 293, 550, 321, 362, 281, 574, 493, 264, 13760, 50460], "temperature": 0.0, "avg_logprob": -0.06244788609497936, "compression_ratio": 1.8482758620689654, "no_speech_prob": 0.0004199478426016867}, {"id": 870, "seek": 258674, "start": 2588.74, "end": 2590.64, "text": " for matrix multiplication,", "tokens": [50465, 337, 8141, 27290, 11, 50560], "temperature": 0.0, "avg_logprob": -0.06244788609497936, "compression_ratio": 1.8482758620689654, "no_speech_prob": 0.0004199478426016867}, {"id": 871, "seek": 258674, "start": 2590.74, "end": 2592.64, "text": " but actually you don't have to do any of that, and you can go", "tokens": [50565, 457, 767, 291, 500, 380, 362, 281, 360, 604, 295, 300, 11, 293, 291, 393, 352, 50660], "temperature": 0.0, "avg_logprob": -0.06244788609497936, "compression_ratio": 1.8482758620689654, "no_speech_prob": 0.0004199478426016867}, {"id": 872, "seek": 258674, "start": 2592.74, "end": 2594.64, "text": " back to first principles and derive this yourself", "tokens": [50665, 646, 281, 700, 9156, 293, 28446, 341, 1803, 50760], "temperature": 0.0, "avg_logprob": -0.06244788609497936, "compression_ratio": 1.8482758620689654, "no_speech_prob": 0.0004199478426016867}, {"id": 873, "seek": 258674, "start": 2594.74, "end": 2596.64, "text": " on a piece of paper.", "tokens": [50765, 322, 257, 2522, 295, 3035, 13, 
50860], "temperature": 0.0, "avg_logprob": -0.06244788609497936, "compression_ratio": 1.8482758620689654, "no_speech_prob": 0.0004199478426016867}, {"id": 874, "seek": 258674, "start": 2596.74, "end": 2598.64, "text": " And specifically what I like to do, and what", "tokens": [50865, 400, 4682, 437, 286, 411, 281, 360, 11, 293, 437, 50960], "temperature": 0.0, "avg_logprob": -0.06244788609497936, "compression_ratio": 1.8482758620689654, "no_speech_prob": 0.0004199478426016867}, {"id": 875, "seek": 258674, "start": 2598.74, "end": 2600.64, "text": " I find works well for me, is you find", "tokens": [50965, 286, 915, 1985, 731, 337, 385, 11, 307, 291, 915, 51060], "temperature": 0.0, "avg_logprob": -0.06244788609497936, "compression_ratio": 1.8482758620689654, "no_speech_prob": 0.0004199478426016867}, {"id": 876, "seek": 258674, "start": 2600.74, "end": 2602.64, "text": " a specific small example", "tokens": [51065, 257, 2685, 1359, 1365, 51160], "temperature": 0.0, "avg_logprob": -0.06244788609497936, "compression_ratio": 1.8482758620689654, "no_speech_prob": 0.0004199478426016867}, {"id": 877, "seek": 258674, "start": 2602.74, "end": 2604.64, "text": " that you then fully write out, and then", "tokens": [51165, 300, 291, 550, 4498, 2464, 484, 11, 293, 550, 51260], "temperature": 0.0, "avg_logprob": -0.06244788609497936, "compression_ratio": 1.8482758620689654, "no_speech_prob": 0.0004199478426016867}, {"id": 878, "seek": 258674, "start": 2604.74, "end": 2606.64, "text": " in the process of analyzing how that individual", "tokens": [51265, 294, 264, 1399, 295, 23663, 577, 300, 2609, 51360], "temperature": 0.0, "avg_logprob": -0.06244788609497936, "compression_ratio": 1.8482758620689654, "no_speech_prob": 0.0004199478426016867}, {"id": 879, "seek": 258674, "start": 2606.74, "end": 2608.64, "text": " small example works, you will understand", "tokens": [51365, 1359, 1365, 1985, 11, 291, 486, 1223, 51460], "temperature": 0.0, "avg_logprob": -0.06244788609497936, "compression_ratio": 1.8482758620689654, "no_speech_prob": 0.0004199478426016867}, {"id": 880, "seek": 258674, "start": 2608.74, "end": 2610.64, "text": " the broader pattern, and you'll be able to generalize", "tokens": [51465, 264, 13227, 5102, 11, 293, 291, 603, 312, 1075, 281, 2674, 1125, 51560], "temperature": 0.0, "avg_logprob": -0.06244788609497936, "compression_ratio": 1.8482758620689654, "no_speech_prob": 0.0004199478426016867}, {"id": 881, "seek": 258674, "start": 2610.74, "end": 2612.64, "text": " and write out the full", "tokens": [51565, 293, 2464, 484, 264, 1577, 51660], "temperature": 0.0, "avg_logprob": -0.06244788609497936, "compression_ratio": 1.8482758620689654, "no_speech_prob": 0.0004199478426016867}, {"id": 882, "seek": 258674, "start": 2612.74, "end": 2614.64, "text": " general formula for", "tokens": [51665, 2674, 8513, 337, 51760], "temperature": 0.0, "avg_logprob": -0.06244788609497936, "compression_ratio": 1.8482758620689654, "no_speech_prob": 0.0004199478426016867}, {"id": 883, "seek": 261464, "start": 2614.64, "end": 2616.54, "text": " how these derivatives flow in an expression", "tokens": [50365, 577, 613, 33733, 3095, 294, 364, 6114, 50460], "temperature": 0.0, "avg_logprob": -0.07064182330400516, "compression_ratio": 1.7766323024054982, "no_speech_prob": 0.0006405398598872125}, {"id": 884, "seek": 261464, "start": 2616.64, "end": 2618.54, "text": " like this. 
So let's try that out.", "tokens": [50465, 411, 341, 13, 407, 718, 311, 853, 300, 484, 13, 50560], "temperature": 0.0, "avg_logprob": -0.07064182330400516, "compression_ratio": 1.7766323024054982, "no_speech_prob": 0.0006405398598872125}, {"id": 885, "seek": 261464, "start": 2618.64, "end": 2620.54, "text": " So pardon the low-budget production", "tokens": [50565, 407, 22440, 264, 2295, 12, 18281, 847, 4265, 50660], "temperature": 0.0, "avg_logprob": -0.07064182330400516, "compression_ratio": 1.7766323024054982, "no_speech_prob": 0.0006405398598872125}, {"id": 886, "seek": 261464, "start": 2620.64, "end": 2622.54, "text": " here, but what I've done here", "tokens": [50665, 510, 11, 457, 437, 286, 600, 1096, 510, 50760], "temperature": 0.0, "avg_logprob": -0.07064182330400516, "compression_ratio": 1.7766323024054982, "no_speech_prob": 0.0006405398598872125}, {"id": 887, "seek": 261464, "start": 2622.64, "end": 2624.54, "text": " is I'm writing it out on a piece of paper.", "tokens": [50765, 307, 286, 478, 3579, 309, 484, 322, 257, 2522, 295, 3035, 13, 50860], "temperature": 0.0, "avg_logprob": -0.07064182330400516, "compression_ratio": 1.7766323024054982, "no_speech_prob": 0.0006405398598872125}, {"id": 888, "seek": 261464, "start": 2624.64, "end": 2626.54, "text": " Really what we are interested in is we have", "tokens": [50865, 4083, 437, 321, 366, 3102, 294, 307, 321, 362, 50960], "temperature": 0.0, "avg_logprob": -0.07064182330400516, "compression_ratio": 1.7766323024054982, "no_speech_prob": 0.0006405398598872125}, {"id": 889, "seek": 261464, "start": 2626.64, "end": 2628.54, "text": " a multiply b plus c,", "tokens": [50965, 257, 12972, 272, 1804, 269, 11, 51060], "temperature": 0.0, "avg_logprob": -0.07064182330400516, "compression_ratio": 1.7766323024054982, "no_speech_prob": 0.0006405398598872125}, {"id": 890, "seek": 261464, "start": 2628.64, "end": 2630.54, "text": " and that creates a d.", "tokens": [51065, 293, 300, 7829, 257, 274, 13, 51160], "temperature": 0.0, "avg_logprob": -0.07064182330400516, "compression_ratio": 1.7766323024054982, "no_speech_prob": 0.0006405398598872125}, {"id": 891, "seek": 261464, "start": 2630.64, "end": 2632.54, "text": " And we have the derivative", "tokens": [51165, 400, 321, 362, 264, 13760, 51260], "temperature": 0.0, "avg_logprob": -0.07064182330400516, "compression_ratio": 1.7766323024054982, "no_speech_prob": 0.0006405398598872125}, {"id": 892, "seek": 261464, "start": 2632.64, "end": 2634.54, "text": " of the loss with respect to d, and we'd like to", "tokens": [51265, 295, 264, 4470, 365, 3104, 281, 274, 11, 293, 321, 1116, 411, 281, 51360], "temperature": 0.0, "avg_logprob": -0.07064182330400516, "compression_ratio": 1.7766323024054982, "no_speech_prob": 0.0006405398598872125}, {"id": 893, "seek": 261464, "start": 2634.64, "end": 2636.54, "text": " know what the derivative of the loss is with respect to a, b,", "tokens": [51365, 458, 437, 264, 13760, 295, 264, 4470, 307, 365, 3104, 281, 257, 11, 272, 11, 51460], "temperature": 0.0, "avg_logprob": -0.07064182330400516, "compression_ratio": 1.7766323024054982, "no_speech_prob": 0.0006405398598872125}, {"id": 894, "seek": 261464, "start": 2636.64, "end": 2638.54, "text": " and c. 
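Before writing it out on paper, note that you can check your eventual derivation numerically. Here is a small harness of my own (not from the lecture) that runs autograd on a 2-by-2 version of d = a @ b + c; the allclose lines encode the general formulas the paper derivation ends up at:

```python
import torch

a = torch.randn(2, 2, requires_grad=True)
b = torch.randn(2, 2, requires_grad=True)
c = torch.randn(2, requires_grad=True)

d = a @ b + c                # c broadcasts into both rows, as described below
dd = torch.randn(2, 2)       # some given dL/dd
d.backward(dd)

print(torch.allclose(a.grad, dd @ b.T))      # dL/da = dL/dd @ b^T
print(torch.allclose(b.grad, a.T @ dd))      # dL/db = a^T @ dL/dd
print(torch.allclose(c.grad, dd.sum(0)))     # dL/dc sums dL/dd over rows
```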
Now these", "tokens": [51465, 293, 269, 13, 823, 613, 51560], "temperature": 0.0, "avg_logprob": -0.07064182330400516, "compression_ratio": 1.7766323024054982, "no_speech_prob": 0.0006405398598872125}, {"id": 895, "seek": 261464, "start": 2638.64, "end": 2640.54, "text": " here are little two-dimensional examples", "tokens": [51565, 510, 366, 707, 732, 12, 18759, 5110, 51660], "temperature": 0.0, "avg_logprob": -0.07064182330400516, "compression_ratio": 1.7766323024054982, "no_speech_prob": 0.0006405398598872125}, {"id": 896, "seek": 261464, "start": 2640.64, "end": 2642.54, "text": " of matrix multiplication.", "tokens": [51665, 295, 8141, 27290, 13, 51760], "temperature": 0.0, "avg_logprob": -0.07064182330400516, "compression_ratio": 1.7766323024054982, "no_speech_prob": 0.0006405398598872125}, {"id": 897, "seek": 261464, "start": 2642.64, "end": 2644.54, "text": " 2 by 2 times a 2 by 2,", "tokens": [51765, 568, 538, 568, 1413, 257, 568, 538, 568, 11, 51860], "temperature": 0.0, "avg_logprob": -0.07064182330400516, "compression_ratio": 1.7766323024054982, "no_speech_prob": 0.0006405398598872125}, {"id": 898, "seek": 264454, "start": 2644.54, "end": 2646.44, "text": " plus a 2,", "tokens": [50365, 1804, 257, 568, 11, 50460], "temperature": 0.0, "avg_logprob": -0.05014582448357704, "compression_ratio": 1.784, "no_speech_prob": 0.0002897364611271769}, {"id": 899, "seek": 264454, "start": 2646.54, "end": 2648.44, "text": " a vector of just two elements, c1 and c2,", "tokens": [50465, 257, 8062, 295, 445, 732, 4959, 11, 269, 16, 293, 269, 17, 11, 50560], "temperature": 0.0, "avg_logprob": -0.05014582448357704, "compression_ratio": 1.784, "no_speech_prob": 0.0002897364611271769}, {"id": 900, "seek": 264454, "start": 2648.54, "end": 2650.44, "text": " gives me a 2 by 2.", "tokens": [50565, 2709, 385, 257, 568, 538, 568, 13, 50660], "temperature": 0.0, "avg_logprob": -0.05014582448357704, "compression_ratio": 1.784, "no_speech_prob": 0.0002897364611271769}, {"id": 901, "seek": 264454, "start": 2650.54, "end": 2652.44, "text": " Now notice here that", "tokens": [50665, 823, 3449, 510, 300, 50760], "temperature": 0.0, "avg_logprob": -0.05014582448357704, "compression_ratio": 1.784, "no_speech_prob": 0.0002897364611271769}, {"id": 902, "seek": 264454, "start": 2652.54, "end": 2654.44, "text": " I have a bias vector", "tokens": [50765, 286, 362, 257, 12577, 8062, 50860], "temperature": 0.0, "avg_logprob": -0.05014582448357704, "compression_ratio": 1.784, "no_speech_prob": 0.0002897364611271769}, {"id": 903, "seek": 264454, "start": 2654.54, "end": 2656.44, "text": " here called c, and the", "tokens": [50865, 510, 1219, 269, 11, 293, 264, 50960], "temperature": 0.0, "avg_logprob": -0.05014582448357704, "compression_ratio": 1.784, "no_speech_prob": 0.0002897364611271769}, {"id": 904, "seek": 264454, "start": 2656.54, "end": 2658.44, "text": " bias vector is c1 and c2, but", "tokens": [50965, 12577, 8062, 307, 269, 16, 293, 269, 17, 11, 457, 51060], "temperature": 0.0, "avg_logprob": -0.05014582448357704, "compression_ratio": 1.784, "no_speech_prob": 0.0002897364611271769}, {"id": 905, "seek": 264454, "start": 2658.54, "end": 2660.44, "text": " as I described over here, that bias", "tokens": [51065, 382, 286, 7619, 670, 510, 11, 300, 12577, 51160], "temperature": 0.0, "avg_logprob": -0.05014582448357704, "compression_ratio": 1.784, "no_speech_prob": 0.0002897364611271769}, {"id": 906, "seek": 264454, "start": 2660.54, "end": 2662.44, "text": " vector will become a row vector in the broadcasting,", 
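As a quick aside, here is a minimal PyTorch sketch of that exact setup; the variable names and random values are just made up for illustration:

    import torch

    a = torch.randn(2, 2, requires_grad=True)
    b = torch.randn(2, 2, requires_grad=True)
    c = torch.randn(2, requires_grad=True)   # the bias vector: c1, c2

    d = a @ b + c   # c broadcasts to a row vector and replicates vertically
    print(d.shape)  # torch.Size([2, 2])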
"tokens": [51165, 8062, 486, 1813, 257, 5386, 8062, 294, 264, 30024, 11, 51260], "temperature": 0.0, "avg_logprob": -0.05014582448357704, "compression_ratio": 1.784, "no_speech_prob": 0.0002897364611271769}, {"id": 907, "seek": 264454, "start": 2662.54, "end": 2664.44, "text": " and will replicate vertically.", "tokens": [51265, 293, 486, 25356, 28450, 13, 51360], "temperature": 0.0, "avg_logprob": -0.05014582448357704, "compression_ratio": 1.784, "no_speech_prob": 0.0002897364611271769}, {"id": 908, "seek": 264454, "start": 2664.54, "end": 2666.44, "text": " So that's what's happening here as well. c1, c2", "tokens": [51365, 407, 300, 311, 437, 311, 2737, 510, 382, 731, 13, 269, 16, 11, 269, 17, 51460], "temperature": 0.0, "avg_logprob": -0.05014582448357704, "compression_ratio": 1.784, "no_speech_prob": 0.0002897364611271769}, {"id": 909, "seek": 264454, "start": 2666.54, "end": 2668.44, "text": " is replicated vertically,", "tokens": [51465, 307, 46365, 28450, 11, 51560], "temperature": 0.0, "avg_logprob": -0.05014582448357704, "compression_ratio": 1.784, "no_speech_prob": 0.0002897364611271769}, {"id": 910, "seek": 264454, "start": 2668.54, "end": 2670.44, "text": " and we see how we have two rows of c1,", "tokens": [51565, 293, 321, 536, 577, 321, 362, 732, 13241, 295, 269, 16, 11, 51660], "temperature": 0.0, "avg_logprob": -0.05014582448357704, "compression_ratio": 1.784, "no_speech_prob": 0.0002897364611271769}, {"id": 911, "seek": 264454, "start": 2670.54, "end": 2672.44, "text": " c2 as a result.", "tokens": [51665, 269, 17, 382, 257, 1874, 13, 51760], "temperature": 0.0, "avg_logprob": -0.05014582448357704, "compression_ratio": 1.784, "no_speech_prob": 0.0002897364611271769}, {"id": 912, "seek": 264454, "start": 2672.54, "end": 2674.44, "text": " So now when I say write it out,", "tokens": [51765, 407, 586, 562, 286, 584, 2464, 309, 484, 11, 51860], "temperature": 0.0, "avg_logprob": -0.05014582448357704, "compression_ratio": 1.784, "no_speech_prob": 0.0002897364611271769}, {"id": 913, "seek": 267454, "start": 2674.54, "end": 2676.44, "text": " I just mean like this.", "tokens": [50365, 286, 445, 914, 411, 341, 13, 50460], "temperature": 0.0, "avg_logprob": -0.07364910670689175, "compression_ratio": 1.7170542635658914, "no_speech_prob": 0.0013897784519940615}, {"id": 914, "seek": 267454, "start": 2676.54, "end": 2678.44, "text": " Basically break up this matrix multiplication", "tokens": [50465, 8537, 1821, 493, 341, 8141, 27290, 50560], "temperature": 0.0, "avg_logprob": -0.07364910670689175, "compression_ratio": 1.7170542635658914, "no_speech_prob": 0.0013897784519940615}, {"id": 915, "seek": 267454, "start": 2678.54, "end": 2680.44, "text": " into the actual thing that's", "tokens": [50565, 666, 264, 3539, 551, 300, 311, 50660], "temperature": 0.0, "avg_logprob": -0.07364910670689175, "compression_ratio": 1.7170542635658914, "no_speech_prob": 0.0013897784519940615}, {"id": 916, "seek": 267454, "start": 2680.54, "end": 2682.44, "text": " going on under the hood.", "tokens": [50665, 516, 322, 833, 264, 13376, 13, 50760], "temperature": 0.0, "avg_logprob": -0.07364910670689175, "compression_ratio": 1.7170542635658914, "no_speech_prob": 0.0013897784519940615}, {"id": 917, "seek": 267454, "start": 2682.54, "end": 2684.44, "text": " So as a result of matrix multiplication", "tokens": [50765, 407, 382, 257, 1874, 295, 8141, 27290, 50860], "temperature": 0.0, "avg_logprob": -0.07364910670689175, "compression_ratio": 1.7170542635658914, "no_speech_prob": 0.0013897784519940615}, {"id": 
918, "seek": 267454, "start": 2684.54, "end": 2686.44, "text": " and how it works, d11", "tokens": [50865, 293, 577, 309, 1985, 11, 274, 5348, 50960], "temperature": 0.0, "avg_logprob": -0.07364910670689175, "compression_ratio": 1.7170542635658914, "no_speech_prob": 0.0013897784519940615}, {"id": 919, "seek": 267454, "start": 2686.54, "end": 2688.44, "text": " is the result of a dot product between the", "tokens": [50965, 307, 264, 1874, 295, 257, 5893, 1674, 1296, 264, 51060], "temperature": 0.0, "avg_logprob": -0.07364910670689175, "compression_ratio": 1.7170542635658914, "no_speech_prob": 0.0013897784519940615}, {"id": 920, "seek": 267454, "start": 2688.54, "end": 2690.44, "text": " first row of a and the first column", "tokens": [51065, 700, 5386, 295, 257, 293, 264, 700, 7738, 51160], "temperature": 0.0, "avg_logprob": -0.07364910670689175, "compression_ratio": 1.7170542635658914, "no_speech_prob": 0.0013897784519940615}, {"id": 921, "seek": 267454, "start": 2690.54, "end": 2692.44, "text": " of b. So a11, b11,", "tokens": [51165, 295, 272, 13, 407, 257, 5348, 11, 272, 5348, 11, 51260], "temperature": 0.0, "avg_logprob": -0.07364910670689175, "compression_ratio": 1.7170542635658914, "no_speech_prob": 0.0013897784519940615}, {"id": 922, "seek": 267454, "start": 2692.54, "end": 2694.44, "text": " plus a12, b21,", "tokens": [51265, 1804, 257, 4762, 11, 272, 4436, 11, 51360], "temperature": 0.0, "avg_logprob": -0.07364910670689175, "compression_ratio": 1.7170542635658914, "no_speech_prob": 0.0013897784519940615}, {"id": 923, "seek": 267454, "start": 2694.54, "end": 2696.44, "text": " plus c1.", "tokens": [51365, 1804, 269, 16, 13, 51460], "temperature": 0.0, "avg_logprob": -0.07364910670689175, "compression_ratio": 1.7170542635658914, "no_speech_prob": 0.0013897784519940615}, {"id": 924, "seek": 267454, "start": 2696.54, "end": 2698.44, "text": " And so on", "tokens": [51465, 400, 370, 322, 51560], "temperature": 0.0, "avg_logprob": -0.07364910670689175, "compression_ratio": 1.7170542635658914, "no_speech_prob": 0.0013897784519940615}, {"id": 925, "seek": 267454, "start": 2698.54, "end": 2700.44, "text": " and so forth for all the other elements of d.", "tokens": [51565, 293, 370, 5220, 337, 439, 264, 661, 4959, 295, 274, 13, 51660], "temperature": 0.0, "avg_logprob": -0.07364910670689175, "compression_ratio": 1.7170542635658914, "no_speech_prob": 0.0013897784519940615}, {"id": 926, "seek": 267454, "start": 2700.54, "end": 2702.44, "text": " And once you actually write", "tokens": [51665, 400, 1564, 291, 767, 2464, 51760], "temperature": 0.0, "avg_logprob": -0.07364910670689175, "compression_ratio": 1.7170542635658914, "no_speech_prob": 0.0013897784519940615}, {"id": 927, "seek": 267454, "start": 2702.54, "end": 2704.44, "text": " it out, it becomes obvious that it's just a bunch of", "tokens": [51765, 309, 484, 11, 309, 3643, 6322, 300, 309, 311, 445, 257, 3840, 295, 51860], "temperature": 0.0, "avg_logprob": -0.07364910670689175, "compression_ratio": 1.7170542635658914, "no_speech_prob": 0.0013897784519940615}, {"id": 928, "seek": 270444, "start": 2704.44, "end": 2706.34, "text": " multiplies and adds.", "tokens": [50365, 12788, 530, 293, 10860, 13, 50460], "temperature": 0.0, "avg_logprob": -0.07809753940530019, "compression_ratio": 1.7303754266211604, "no_speech_prob": 0.0008221903699450195}, {"id": 929, "seek": 270444, "start": 2706.44, "end": 2708.34, "text": " And we know from micrograd", "tokens": [50465, 400, 321, 458, 490, 4532, 7165, 50560], "temperature": 0.0, "avg_logprob": 
-0.07809753940530019, "compression_ratio": 1.7303754266211604, "no_speech_prob": 0.0008221903699450195}, {"id": 930, "seek": 270444, "start": 2708.44, "end": 2710.34, "text": " how to differentiate multiplies and adds.", "tokens": [50565, 577, 281, 23203, 12788, 530, 293, 10860, 13, 50660], "temperature": 0.0, "avg_logprob": -0.07809753940530019, "compression_ratio": 1.7303754266211604, "no_speech_prob": 0.0008221903699450195}, {"id": 931, "seek": 270444, "start": 2710.44, "end": 2712.34, "text": " And so this is not scary anymore.", "tokens": [50665, 400, 370, 341, 307, 406, 6958, 3602, 13, 50760], "temperature": 0.0, "avg_logprob": -0.07809753940530019, "compression_ratio": 1.7303754266211604, "no_speech_prob": 0.0008221903699450195}, {"id": 932, "seek": 270444, "start": 2712.44, "end": 2714.34, "text": " It's not just matrix multiplication.", "tokens": [50765, 467, 311, 406, 445, 8141, 27290, 13, 50860], "temperature": 0.0, "avg_logprob": -0.07809753940530019, "compression_ratio": 1.7303754266211604, "no_speech_prob": 0.0008221903699450195}, {"id": 933, "seek": 270444, "start": 2714.44, "end": 2716.34, "text": " It's just tedious, unfortunately.", "tokens": [50865, 467, 311, 445, 38284, 11, 7015, 13, 50960], "temperature": 0.0, "avg_logprob": -0.07809753940530019, "compression_ratio": 1.7303754266211604, "no_speech_prob": 0.0008221903699450195}, {"id": 934, "seek": 270444, "start": 2716.44, "end": 2718.34, "text": " But this is completely tractable.", "tokens": [50965, 583, 341, 307, 2584, 24207, 712, 13, 51060], "temperature": 0.0, "avg_logprob": -0.07809753940530019, "compression_ratio": 1.7303754266211604, "no_speech_prob": 0.0008221903699450195}, {"id": 935, "seek": 270444, "start": 2718.44, "end": 2720.34, "text": " We have dl by d for all of these,", "tokens": [51065, 492, 362, 37873, 538, 274, 337, 439, 295, 613, 11, 51160], "temperature": 0.0, "avg_logprob": -0.07809753940530019, "compression_ratio": 1.7303754266211604, "no_speech_prob": 0.0008221903699450195}, {"id": 936, "seek": 270444, "start": 2720.44, "end": 2722.34, "text": " and we want dl by", "tokens": [51165, 293, 321, 528, 37873, 538, 51260], "temperature": 0.0, "avg_logprob": -0.07809753940530019, "compression_ratio": 1.7303754266211604, "no_speech_prob": 0.0008221903699450195}, {"id": 937, "seek": 270444, "start": 2722.44, "end": 2724.34, "text": " all these little other variables.", "tokens": [51265, 439, 613, 707, 661, 9102, 13, 51360], "temperature": 0.0, "avg_logprob": -0.07809753940530019, "compression_ratio": 1.7303754266211604, "no_speech_prob": 0.0008221903699450195}, {"id": 938, "seek": 270444, "start": 2724.44, "end": 2726.34, "text": " So how do we achieve that, and how do we", "tokens": [51365, 407, 577, 360, 321, 4584, 300, 11, 293, 577, 360, 321, 51460], "temperature": 0.0, "avg_logprob": -0.07809753940530019, "compression_ratio": 1.7303754266211604, "no_speech_prob": 0.0008221903699450195}, {"id": 939, "seek": 270444, "start": 2726.44, "end": 2728.34, "text": " actually get the gradients?", "tokens": [51465, 767, 483, 264, 2771, 2448, 30, 51560], "temperature": 0.0, "avg_logprob": -0.07809753940530019, "compression_ratio": 1.7303754266211604, "no_speech_prob": 0.0008221903699450195}, {"id": 940, "seek": 270444, "start": 2728.44, "end": 2730.34, "text": " Okay, so the low-budget production continues here.", "tokens": [51565, 1033, 11, 370, 264, 2295, 12, 18281, 847, 4265, 6515, 510, 13, 51660], "temperature": 0.0, "avg_logprob": -0.07809753940530019, "compression_ratio": 1.7303754266211604, 
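To make "writing it out" concrete, here is the same sketch continued, with every element of d spelled out as explicit multiplies and adds (the names d11 through d22 are mine):

    # each element of d is a dot product plus the bias
    d11 = a[0, 0] * b[0, 0] + a[0, 1] * b[1, 0] + c[0]
    d12 = a[0, 0] * b[0, 1] + a[0, 1] * b[1, 1] + c[1]
    d21 = a[1, 0] * b[0, 0] + a[1, 1] * b[1, 0] + c[0]
    d22 = a[1, 0] * b[0, 1] + a[1, 1] * b[1, 1] + c[1]

    written_out = torch.stack([torch.stack([d11, d12]),
                               torch.stack([d21, d22])])
    print(torch.allclose(d, written_out))  # True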
"no_speech_prob": 0.0008221903699450195}, {"id": 941, "seek": 270444, "start": 2730.44, "end": 2732.34, "text": " So let's, for example, derive", "tokens": [51665, 407, 718, 311, 11, 337, 1365, 11, 28446, 51760], "temperature": 0.0, "avg_logprob": -0.07809753940530019, "compression_ratio": 1.7303754266211604, "no_speech_prob": 0.0008221903699450195}, {"id": 942, "seek": 270444, "start": 2732.44, "end": 2734.34, "text": " the derivative of the loss with respect to", "tokens": [51765, 264, 13760, 295, 264, 4470, 365, 3104, 281, 51860], "temperature": 0.0, "avg_logprob": -0.07809753940530019, "compression_ratio": 1.7303754266211604, "no_speech_prob": 0.0008221903699450195}, {"id": 943, "seek": 273434, "start": 2734.34, "end": 2736.2400000000002, "text": " a11.", "tokens": [50365, 257, 5348, 13, 50460], "temperature": 0.0, "avg_logprob": -0.07021088966956505, "compression_ratio": 1.6621004566210045, "no_speech_prob": 0.0006931216339580715}, {"id": 944, "seek": 273434, "start": 2736.34, "end": 2738.2400000000002, "text": " We see here that a11 occurs twice", "tokens": [50465, 492, 536, 510, 300, 257, 5348, 11843, 6091, 50560], "temperature": 0.0, "avg_logprob": -0.07021088966956505, "compression_ratio": 1.6621004566210045, "no_speech_prob": 0.0006931216339580715}, {"id": 945, "seek": 273434, "start": 2738.34, "end": 2740.2400000000002, "text": " in our simple expression, right here, right here,", "tokens": [50565, 294, 527, 2199, 6114, 11, 558, 510, 11, 558, 510, 11, 50660], "temperature": 0.0, "avg_logprob": -0.07021088966956505, "compression_ratio": 1.6621004566210045, "no_speech_prob": 0.0006931216339580715}, {"id": 946, "seek": 273434, "start": 2740.34, "end": 2742.2400000000002, "text": " and influences d11 and d12.", "tokens": [50665, 293, 21222, 274, 5348, 293, 274, 4762, 13, 50760], "temperature": 0.0, "avg_logprob": -0.07021088966956505, "compression_ratio": 1.6621004566210045, "no_speech_prob": 0.0006931216339580715}, {"id": 947, "seek": 273434, "start": 2742.34, "end": 2744.2400000000002, "text": " So this is, so what", "tokens": [50765, 407, 341, 307, 11, 370, 437, 50860], "temperature": 0.0, "avg_logprob": -0.07021088966956505, "compression_ratio": 1.6621004566210045, "no_speech_prob": 0.0006931216339580715}, {"id": 948, "seek": 273434, "start": 2744.34, "end": 2746.2400000000002, "text": " is dl by d a11?", "tokens": [50865, 307, 37873, 538, 274, 257, 5348, 30, 50960], "temperature": 0.0, "avg_logprob": -0.07021088966956505, "compression_ratio": 1.6621004566210045, "no_speech_prob": 0.0006931216339580715}, {"id": 949, "seek": 273434, "start": 2746.34, "end": 2748.2400000000002, "text": " Well, it's dl by d11", "tokens": [50965, 1042, 11, 309, 311, 37873, 538, 274, 5348, 51060], "temperature": 0.0, "avg_logprob": -0.07021088966956505, "compression_ratio": 1.6621004566210045, "no_speech_prob": 0.0006931216339580715}, {"id": 950, "seek": 273434, "start": 2748.34, "end": 2750.2400000000002, "text": " times", "tokens": [51065, 1413, 51160], "temperature": 0.0, "avg_logprob": -0.07021088966956505, "compression_ratio": 1.6621004566210045, "no_speech_prob": 0.0006931216339580715}, {"id": 951, "seek": 273434, "start": 2750.34, "end": 2752.2400000000002, "text": " the local derivative of d11,", "tokens": [51165, 264, 2654, 13760, 295, 274, 5348, 11, 51260], "temperature": 0.0, "avg_logprob": -0.07021088966956505, "compression_ratio": 1.6621004566210045, "no_speech_prob": 0.0006931216339580715}, {"id": 952, "seek": 273434, "start": 2752.34, "end": 2754.2400000000002, "text": " which in this 
case is just b11,", "tokens": [51265, 597, 294, 341, 1389, 307, 445, 272, 5348, 11, 51360], "temperature": 0.0, "avg_logprob": -0.07021088966956505, "compression_ratio": 1.6621004566210045, "no_speech_prob": 0.0006931216339580715}, {"id": 953, "seek": 273434, "start": 2754.34, "end": 2756.2400000000002, "text": " because that's what's multiplying", "tokens": [51365, 570, 300, 311, 437, 311, 30955, 51460], "temperature": 0.0, "avg_logprob": -0.07021088966956505, "compression_ratio": 1.6621004566210045, "no_speech_prob": 0.0006931216339580715}, {"id": 954, "seek": 273434, "start": 2756.34, "end": 2758.2400000000002, "text": " a11 here.", "tokens": [51465, 257, 5348, 510, 13, 51560], "temperature": 0.0, "avg_logprob": -0.07021088966956505, "compression_ratio": 1.6621004566210045, "no_speech_prob": 0.0006931216339580715}, {"id": 955, "seek": 273434, "start": 2758.34, "end": 2760.2400000000002, "text": " And likewise here, the local", "tokens": [51565, 400, 32407, 510, 11, 264, 2654, 51660], "temperature": 0.0, "avg_logprob": -0.07021088966956505, "compression_ratio": 1.6621004566210045, "no_speech_prob": 0.0006931216339580715}, {"id": 956, "seek": 273434, "start": 2760.34, "end": 2762.2400000000002, "text": " derivative of d12 with respect to a11", "tokens": [51665, 13760, 295, 274, 4762, 365, 3104, 281, 257, 5348, 51760], "temperature": 0.0, "avg_logprob": -0.07021088966956505, "compression_ratio": 1.6621004566210045, "no_speech_prob": 0.0006931216339580715}, {"id": 957, "seek": 273434, "start": 2762.34, "end": 2764.2400000000002, "text": " is just b12.", "tokens": [51765, 307, 445, 272, 4762, 13, 51860], "temperature": 0.0, "avg_logprob": -0.07021088966956505, "compression_ratio": 1.6621004566210045, "no_speech_prob": 0.0006931216339580715}, {"id": 958, "seek": 276424, "start": 2764.24, "end": 2766.14, "text": " And so b12 will, in the chain rule, therefore,", "tokens": [50365, 400, 370, 272, 4762, 486, 11, 294, 264, 5021, 4978, 11, 4412, 11, 50460], "temperature": 0.0, "avg_logprob": -0.07327790941510881, "compression_ratio": 1.7222222222222223, "no_speech_prob": 0.0003057649591937661}, {"id": 959, "seek": 276424, "start": 2766.24, "end": 2768.14, "text": " multiply dl by d12.", "tokens": [50465, 12972, 37873, 538, 274, 4762, 13, 50560], "temperature": 0.0, "avg_logprob": -0.07327790941510881, "compression_ratio": 1.7222222222222223, "no_speech_prob": 0.0003057649591937661}, {"id": 960, "seek": 276424, "start": 2768.24, "end": 2770.14, "text": " And then, because a11", "tokens": [50565, 400, 550, 11, 570, 257, 5348, 50660], "temperature": 0.0, "avg_logprob": -0.07327790941510881, "compression_ratio": 1.7222222222222223, "no_speech_prob": 0.0003057649591937661}, {"id": 961, "seek": 276424, "start": 2770.24, "end": 2772.14, "text": " is used both to produce", "tokens": [50665, 307, 1143, 1293, 281, 5258, 50760], "temperature": 0.0, "avg_logprob": -0.07327790941510881, "compression_ratio": 1.7222222222222223, "no_speech_prob": 0.0003057649591937661}, {"id": 962, "seek": 276424, "start": 2772.24, "end": 2774.14, "text": " d11 and d12, we need", "tokens": [50765, 274, 5348, 293, 274, 4762, 11, 321, 643, 50860], "temperature": 0.0, "avg_logprob": -0.07327790941510881, "compression_ratio": 1.7222222222222223, "no_speech_prob": 0.0003057649591937661}, {"id": 963, "seek": 276424, "start": 2774.24, "end": 2776.14, "text": " to add up the contributions", "tokens": [50865, 281, 909, 493, 264, 15725, 50960], "temperature": 0.0, "avg_logprob": -0.07327790941510881, "compression_ratio": 
1.7222222222222223, "no_speech_prob": 0.0003057649591937661}, {"id": 964, "seek": 276424, "start": 2776.24, "end": 2778.14, "text": " of both of those sort of", "tokens": [50965, 295, 1293, 295, 729, 1333, 295, 51060], "temperature": 0.0, "avg_logprob": -0.07327790941510881, "compression_ratio": 1.7222222222222223, "no_speech_prob": 0.0003057649591937661}, {"id": 965, "seek": 276424, "start": 2778.24, "end": 2780.14, "text": " chains that are running in parallel.", "tokens": [51065, 12626, 300, 366, 2614, 294, 8952, 13, 51160], "temperature": 0.0, "avg_logprob": -0.07327790941510881, "compression_ratio": 1.7222222222222223, "no_speech_prob": 0.0003057649591937661}, {"id": 966, "seek": 276424, "start": 2780.24, "end": 2782.14, "text": " And that's why we get a plus, just", "tokens": [51165, 400, 300, 311, 983, 321, 483, 257, 1804, 11, 445, 51260], "temperature": 0.0, "avg_logprob": -0.07327790941510881, "compression_ratio": 1.7222222222222223, "no_speech_prob": 0.0003057649591937661}, {"id": 967, "seek": 276424, "start": 2782.24, "end": 2784.14, "text": " adding up those two,", "tokens": [51265, 5127, 493, 729, 732, 11, 51360], "temperature": 0.0, "avg_logprob": -0.07327790941510881, "compression_ratio": 1.7222222222222223, "no_speech_prob": 0.0003057649591937661}, {"id": 968, "seek": 276424, "start": 2784.24, "end": 2786.14, "text": " those two contributions. And that gives", "tokens": [51365, 729, 732, 15725, 13, 400, 300, 2709, 51460], "temperature": 0.0, "avg_logprob": -0.07327790941510881, "compression_ratio": 1.7222222222222223, "no_speech_prob": 0.0003057649591937661}, {"id": 969, "seek": 276424, "start": 2786.24, "end": 2788.14, "text": " us dl by d a11.", "tokens": [51465, 505, 37873, 538, 274, 257, 5348, 13, 51560], "temperature": 0.0, "avg_logprob": -0.07327790941510881, "compression_ratio": 1.7222222222222223, "no_speech_prob": 0.0003057649591937661}, {"id": 970, "seek": 276424, "start": 2788.24, "end": 2790.14, "text": " We can do the exact same analysis for", "tokens": [51565, 492, 393, 360, 264, 1900, 912, 5215, 337, 51660], "temperature": 0.0, "avg_logprob": -0.07327790941510881, "compression_ratio": 1.7222222222222223, "no_speech_prob": 0.0003057649591937661}, {"id": 971, "seek": 276424, "start": 2790.24, "end": 2792.14, "text": " the other one, for all the other", "tokens": [51665, 264, 661, 472, 11, 337, 439, 264, 661, 51760], "temperature": 0.0, "avg_logprob": -0.07327790941510881, "compression_ratio": 1.7222222222222223, "no_speech_prob": 0.0003057649591937661}, {"id": 972, "seek": 276424, "start": 2792.24, "end": 2794.14, "text": " elements of a. 
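Continuing the sketch, we can check those two chain-rule contributions against autograd; dd here is a made-up stand-in for dL by dd:

    dd = torch.randn(2, 2)   # pretend upstream gradient dL/dd
    d.backward(dd)           # autograd pushes it back to a, b, c

    # the two contributions to dL/da11: through d11 and through d12
    da11 = dd[0, 0] * b[0, 0] + dd[0, 1] * b[0, 1]
    print(torch.allclose(da11, a.grad[0, 0]))  # True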
And when you simply write it out, it's just super simple taking of gradients on expressions like this. You find that for this matrix dL by da that we're after, if we just arrange all of the derivatives in the same shape that a takes (a is just a 2 by 2 matrix), then dL by da here will also just be a same-shape tensor, now holding the derivatives: dL by da11, etc. And we see that actually we can express what we've written out here as a matrix multiply. And in particular, we see that it is the matrix multiplication of these two matrices: it is dL by dd, and then matrix multiplying b. But b transpose, actually. So you see that b21 and b12 have changed place, whereas before we had, of course, b11, b12, b21, b22. So you see that this other matrix b is transposed. And so basically what we have, long story short, just by doing very simple reasoning here, by breaking up the expression in the case of a very simple example, is that dL by da is simply equal to dL by dd matrix multiplied with b transpose.
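In the running sketch, that general formula lines up with what autograd computed a moment ago:

    # dL/da = dL/dd @ b.T, compared against autograd's result
    print(torch.allclose(dd @ b.T, a.grad))  # True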
So that is what we have so far. Now, we also want the derivative with respect to b and c. Now, for b, I'm not actually doing the full derivation, because honestly it's not deep, it's just annoying; it's exhausting. You can actually do this analysis yourself. You'll also find that if you take these expressions and you differentiate with respect to b instead of a, you will find that dL by db is also a matrix multiplication. In this case, you have to take the matrix a, transpose it, and matrix multiply that with dL by dd. And that's what gives you dL by db. And then here, for the offsets c1 and c2, if you again just differentiate with respect to c1, you will find an expression like this, and for c2 an expression like this. And basically you'll find that dL by dc is simply, because they're just offsetting these expressions, you just have to take the dL by dd matrix of the derivatives of d, and you just have to sum across the columns. And that gives you the derivatives for c.
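Those two formulas also check out against autograd in the running sketch (b.grad and c.grad were populated by the backward call above):

    # dL/db = a.T @ dL/dd
    print(torch.allclose(a.T @ dd, b.grad))   # True
    # dL/dc: sum dL/dd down each column, undoing the bias broadcast
    print(torch.allclose(dd.sum(0), c.grad))  # True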
So, long story short, the backward pass of a matrix multiply is a matrix multiply. Just like we had d equals a times b plus c in the scalar case, we arrive at something very, very similar, but now with a matrix multiplication instead of a scalar multiplication. So, the derivative of d with respect to a is dL by dd matrix multiplied by b transpose, and here it's a transpose multiplied by dL by dd. But in both cases, it's a matrix multiplication with the derivative and the other term in the multiplication. And for c, it is a sum. Now, I'll tell you a secret: I can never remember the formulas that we just derived for backpropagating through a matrix multiplication, and I can backpropagate through these expressions just fine. And the reason this works is because the dimensions have to work out. So let me give you an example. Say I want to create dh. Then what should dh be?
"compression_ratio": 1.7186311787072244, "no_speech_prob": 0.000946650980040431}, {"id": 1084, "seek": 300144, "start": 3015.44, "end": 3017.34, "text": " Number one, I have to know that", "tokens": [51065, 5118, 472, 11, 286, 362, 281, 458, 300, 51160], "temperature": 0.0, "avg_logprob": -0.06183817075646442, "compression_ratio": 1.7186311787072244, "no_speech_prob": 0.000946650980040431}, {"id": 1085, "seek": 300144, "start": 3017.44, "end": 3019.34, "text": " the shape of dh must be the same", "tokens": [51165, 264, 3909, 295, 274, 71, 1633, 312, 264, 912, 51260], "temperature": 0.0, "avg_logprob": -0.06183817075646442, "compression_ratio": 1.7186311787072244, "no_speech_prob": 0.000946650980040431}, {"id": 1086, "seek": 300144, "start": 3019.44, "end": 3021.34, "text": " as the shape of h.", "tokens": [51265, 382, 264, 3909, 295, 276, 13, 51360], "temperature": 0.0, "avg_logprob": -0.06183817075646442, "compression_ratio": 1.7186311787072244, "no_speech_prob": 0.000946650980040431}, {"id": 1087, "seek": 300144, "start": 3021.44, "end": 3023.34, "text": " And the shape of h is 32 by 64.", "tokens": [51365, 400, 264, 3909, 295, 276, 307, 8858, 538, 12145, 13, 51460], "temperature": 0.0, "avg_logprob": -0.06183817075646442, "compression_ratio": 1.7186311787072244, "no_speech_prob": 0.000946650980040431}, {"id": 1088, "seek": 300144, "start": 3023.44, "end": 3025.34, "text": " And then the other piece of information I know", "tokens": [51465, 400, 550, 264, 661, 2522, 295, 1589, 286, 458, 51560], "temperature": 0.0, "avg_logprob": -0.06183817075646442, "compression_ratio": 1.7186311787072244, "no_speech_prob": 0.000946650980040431}, {"id": 1089, "seek": 300144, "start": 3025.44, "end": 3027.34, "text": " is that dh", "tokens": [51565, 307, 300, 274, 71, 51660], "temperature": 0.0, "avg_logprob": -0.06183817075646442, "compression_ratio": 1.7186311787072244, "no_speech_prob": 0.000946650980040431}, {"id": 1090, "seek": 300144, "start": 3027.44, "end": 3029.34, "text": " must be some kind of matrix multiplication", "tokens": [51665, 1633, 312, 512, 733, 295, 8141, 27290, 51760], "temperature": 0.0, "avg_logprob": -0.06183817075646442, "compression_ratio": 1.7186311787072244, "no_speech_prob": 0.000946650980040431}, {"id": 1091, "seek": 302934, "start": 3029.34, "end": 3031.2400000000002, "text": " of dlogits with w2.", "tokens": [50365, 295, 274, 4987, 1208, 365, 261, 17, 13, 50460], "temperature": 0.0, "avg_logprob": -0.06931232481963875, "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.0003950179379899055}, {"id": 1092, "seek": 302934, "start": 3031.34, "end": 3033.2400000000002, "text": " And dlogits", "tokens": [50465, 400, 274, 4987, 1208, 50560], "temperature": 0.0, "avg_logprob": -0.06931232481963875, "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.0003950179379899055}, {"id": 1093, "seek": 302934, "start": 3033.34, "end": 3035.2400000000002, "text": " is 32 by 27", "tokens": [50565, 307, 8858, 538, 7634, 50660], "temperature": 0.0, "avg_logprob": -0.06931232481963875, "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.0003950179379899055}, {"id": 1094, "seek": 302934, "start": 3035.34, "end": 3037.2400000000002, "text": " and w2 is", "tokens": [50665, 293, 261, 17, 307, 50760], "temperature": 0.0, "avg_logprob": -0.06931232481963875, "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.0003950179379899055}, {"id": 1095, "seek": 302934, "start": 3037.34, "end": 3039.2400000000002, "text": " 64 by 27. 
There is only", "tokens": [50765, 12145, 538, 7634, 13, 821, 307, 787, 50860], "temperature": 0.0, "avg_logprob": -0.06931232481963875, "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.0003950179379899055}, {"id": 1096, "seek": 302934, "start": 3039.34, "end": 3041.2400000000002, "text": " a single way to make the shape work out", "tokens": [50865, 257, 2167, 636, 281, 652, 264, 3909, 589, 484, 50960], "temperature": 0.0, "avg_logprob": -0.06931232481963875, "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.0003950179379899055}, {"id": 1097, "seek": 302934, "start": 3041.34, "end": 3043.2400000000002, "text": " in this case", "tokens": [50965, 294, 341, 1389, 51060], "temperature": 0.0, "avg_logprob": -0.06931232481963875, "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.0003950179379899055}, {"id": 1098, "seek": 302934, "start": 3043.34, "end": 3045.2400000000002, "text": " and it is indeed the correct result.", "tokens": [51065, 293, 309, 307, 6451, 264, 3006, 1874, 13, 51160], "temperature": 0.0, "avg_logprob": -0.06931232481963875, "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.0003950179379899055}, {"id": 1099, "seek": 302934, "start": 3045.34, "end": 3047.2400000000002, "text": " In particular here, h", "tokens": [51165, 682, 1729, 510, 11, 276, 51260], "temperature": 0.0, "avg_logprob": -0.06931232481963875, "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.0003950179379899055}, {"id": 1100, "seek": 302934, "start": 3047.34, "end": 3049.2400000000002, "text": " needs to be 32 by 64. The only", "tokens": [51265, 2203, 281, 312, 8858, 538, 12145, 13, 440, 787, 51360], "temperature": 0.0, "avg_logprob": -0.06931232481963875, "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.0003950179379899055}, {"id": 1101, "seek": 302934, "start": 3049.34, "end": 3051.2400000000002, "text": " way to achieve that is to take dlogits", "tokens": [51365, 636, 281, 4584, 300, 307, 281, 747, 274, 4987, 1208, 51460], "temperature": 0.0, "avg_logprob": -0.06931232481963875, "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.0003950179379899055}, {"id": 1102, "seek": 302934, "start": 3051.34, "end": 3053.2400000000002, "text": " and matrix", "tokens": [51465, 293, 8141, 51560], "temperature": 0.0, "avg_logprob": -0.06931232481963875, "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.0003950179379899055}, {"id": 1103, "seek": 302934, "start": 3053.34, "end": 3055.2400000000002, "text": " multiply it with", "tokens": [51565, 12972, 309, 365, 51660], "temperature": 0.0, "avg_logprob": -0.06931232481963875, "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.0003950179379899055}, {"id": 1104, "seek": 302934, "start": 3055.34, "end": 3057.2400000000002, "text": " you see how I have to take w2 but I have to", "tokens": [51665, 291, 536, 577, 286, 362, 281, 747, 261, 17, 457, 286, 362, 281, 51760], "temperature": 0.0, "avg_logprob": -0.06931232481963875, "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.0003950179379899055}, {"id": 1105, "seek": 302934, "start": 3057.34, "end": 3059.2400000000002, "text": " transpose it to make the dimensions work out.", "tokens": [51765, 25167, 309, 281, 652, 264, 12819, 589, 484, 13, 51860], "temperature": 0.0, "avg_logprob": -0.06931232481963875, "compression_ratio": 1.6830357142857142, "no_speech_prob": 0.0003950179379899055}, {"id": 1106, "seek": 305934, "start": 3059.34, "end": 3061.2400000000002, "text": " So w2 transpose.", "tokens": [50365, 407, 261, 17, 25167, 
13, 50460], "temperature": 0.0, "avg_logprob": -0.09187921120302521, "compression_ratio": 1.8448275862068966, "no_speech_prob": 0.0018182062776759267}, {"id": 1107, "seek": 305934, "start": 3061.34, "end": 3063.2400000000002, "text": " And it is the only way to make these", "tokens": [50465, 400, 309, 307, 264, 787, 636, 281, 652, 613, 50560], "temperature": 0.0, "avg_logprob": -0.09187921120302521, "compression_ratio": 1.8448275862068966, "no_speech_prob": 0.0018182062776759267}, {"id": 1108, "seek": 305934, "start": 3063.34, "end": 3065.2400000000002, "text": " to matrix multiply those two pieces", "tokens": [50565, 281, 8141, 12972, 729, 732, 3755, 50660], "temperature": 0.0, "avg_logprob": -0.09187921120302521, "compression_ratio": 1.8448275862068966, "no_speech_prob": 0.0018182062776759267}, {"id": 1109, "seek": 305934, "start": 3065.34, "end": 3067.2400000000002, "text": " to make the shapes work out. And that turns out", "tokens": [50665, 281, 652, 264, 10854, 589, 484, 13, 400, 300, 4523, 484, 50760], "temperature": 0.0, "avg_logprob": -0.09187921120302521, "compression_ratio": 1.8448275862068966, "no_speech_prob": 0.0018182062776759267}, {"id": 1110, "seek": 305934, "start": 3067.34, "end": 3069.2400000000002, "text": " to be the correct formula. So if we come", "tokens": [50765, 281, 312, 264, 3006, 8513, 13, 407, 498, 321, 808, 50860], "temperature": 0.0, "avg_logprob": -0.09187921120302521, "compression_ratio": 1.8448275862068966, "no_speech_prob": 0.0018182062776759267}, {"id": 1111, "seek": 305934, "start": 3069.34, "end": 3071.2400000000002, "text": " here, we want", "tokens": [50865, 510, 11, 321, 528, 50960], "temperature": 0.0, "avg_logprob": -0.09187921120302521, "compression_ratio": 1.8448275862068966, "no_speech_prob": 0.0018182062776759267}, {"id": 1112, "seek": 305934, "start": 3071.34, "end": 3073.2400000000002, "text": " dh which is da and we see", "tokens": [50965, 274, 71, 597, 307, 1120, 293, 321, 536, 51060], "temperature": 0.0, "avg_logprob": -0.09187921120302521, "compression_ratio": 1.8448275862068966, "no_speech_prob": 0.0018182062776759267}, {"id": 1113, "seek": 305934, "start": 3073.34, "end": 3075.2400000000002, "text": " that da is dl by", "tokens": [51065, 300, 1120, 307, 37873, 538, 51160], "temperature": 0.0, "avg_logprob": -0.09187921120302521, "compression_ratio": 1.8448275862068966, "no_speech_prob": 0.0018182062776759267}, {"id": 1114, "seek": 305934, "start": 3075.34, "end": 3077.2400000000002, "text": " dd matrix multiply b transpose.", "tokens": [51165, 274, 67, 8141, 12972, 272, 25167, 13, 51260], "temperature": 0.0, "avg_logprob": -0.09187921120302521, "compression_ratio": 1.8448275862068966, "no_speech_prob": 0.0018182062776759267}, {"id": 1115, "seek": 305934, "start": 3077.34, "end": 3079.2400000000002, "text": " So that is dlogits multiply", "tokens": [51265, 407, 300, 307, 274, 4987, 1208, 12972, 51360], "temperature": 0.0, "avg_logprob": -0.09187921120302521, "compression_ratio": 1.8448275862068966, "no_speech_prob": 0.0018182062776759267}, {"id": 1116, "seek": 305934, "start": 3079.34, "end": 3081.2400000000002, "text": " and b is w2.", "tokens": [51365, 293, 272, 307, 261, 17, 13, 51460], "temperature": 0.0, "avg_logprob": -0.09187921120302521, "compression_ratio": 1.8448275862068966, "no_speech_prob": 0.0018182062776759267}, {"id": 1117, "seek": 305934, "start": 3081.34, "end": 3083.2400000000002, "text": " So w2 transpose which is exactly", "tokens": [51465, 407, 261, 17, 25167, 597, 307, 2293, 51560], "temperature": 0.0, 
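In code, that shape-driven reasoning comes out to a single line. A sketch, assuming dlogits and w2 name the tensors from our forward pass with the shapes just mentioned:

```python
# dlogits is (32, 27) and w2 is (64, 27); dh must match h at (32, 64).
# The only matrix multiply of those two pieces that yields (32, 64):
dh = dlogits @ w2.T    # (32, 27) @ (27, 64) -> (32, 64)
```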
"avg_logprob": -0.09187921120302521, "compression_ratio": 1.8448275862068966, "no_speech_prob": 0.0018182062776759267}, {"id": 1118, "seek": 305934, "start": 3083.34, "end": 3085.2400000000002, "text": " what we have here. So", "tokens": [51565, 437, 321, 362, 510, 13, 407, 51660], "temperature": 0.0, "avg_logprob": -0.09187921120302521, "compression_ratio": 1.8448275862068966, "no_speech_prob": 0.0018182062776759267}, {"id": 1119, "seek": 305934, "start": 3085.34, "end": 3087.2400000000002, "text": " there is no need to remember these formulas.", "tokens": [51665, 456, 307, 572, 643, 281, 1604, 613, 30546, 13, 51760], "temperature": 0.0, "avg_logprob": -0.09187921120302521, "compression_ratio": 1.8448275862068966, "no_speech_prob": 0.0018182062776759267}, {"id": 1120, "seek": 305934, "start": 3087.34, "end": 3089.2400000000002, "text": " Similarly, now if I", "tokens": [51765, 13157, 11, 586, 498, 286, 51860], "temperature": 0.0, "avg_logprob": -0.09187921120302521, "compression_ratio": 1.8448275862068966, "no_speech_prob": 0.0018182062776759267}, {"id": 1121, "seek": 308924, "start": 3089.24, "end": 3091.14, "text": " want dw2", "tokens": [50365, 528, 27379, 17, 50460], "temperature": 0.0, "avg_logprob": -0.07275319671630859, "compression_ratio": 1.819905213270142, "no_speech_prob": 0.00035956056672148407}, {"id": 1122, "seek": 308924, "start": 3091.24, "end": 3093.14, "text": " well I know that it must be a matrix", "tokens": [50465, 731, 286, 458, 300, 309, 1633, 312, 257, 8141, 50560], "temperature": 0.0, "avg_logprob": -0.07275319671630859, "compression_ratio": 1.819905213270142, "no_speech_prob": 0.00035956056672148407}, {"id": 1123, "seek": 308924, "start": 3093.24, "end": 3095.14, "text": " multiplication of", "tokens": [50565, 27290, 295, 50660], "temperature": 0.0, "avg_logprob": -0.07275319671630859, "compression_ratio": 1.819905213270142, "no_speech_prob": 0.00035956056672148407}, {"id": 1124, "seek": 308924, "start": 3095.24, "end": 3097.14, "text": " dlogits and h", "tokens": [50665, 274, 4987, 1208, 293, 276, 50760], "temperature": 0.0, "avg_logprob": -0.07275319671630859, "compression_ratio": 1.819905213270142, "no_speech_prob": 0.00035956056672148407}, {"id": 1125, "seek": 308924, "start": 3097.24, "end": 3099.14, "text": " and maybe there is a few transpose", "tokens": [50765, 293, 1310, 456, 307, 257, 1326, 25167, 50860], "temperature": 0.0, "avg_logprob": -0.07275319671630859, "compression_ratio": 1.819905213270142, "no_speech_prob": 0.00035956056672148407}, {"id": 1126, "seek": 308924, "start": 3099.24, "end": 3101.14, "text": " like there is one transpose in there as well.", "tokens": [50865, 411, 456, 307, 472, 25167, 294, 456, 382, 731, 13, 50960], "temperature": 0.0, "avg_logprob": -0.07275319671630859, "compression_ratio": 1.819905213270142, "no_speech_prob": 0.00035956056672148407}, {"id": 1127, "seek": 308924, "start": 3101.24, "end": 3103.14, "text": " And I don't know which way it is so I have to come to w2", "tokens": [50965, 400, 286, 500, 380, 458, 597, 636, 309, 307, 370, 286, 362, 281, 808, 281, 261, 17, 51060], "temperature": 0.0, "avg_logprob": -0.07275319671630859, "compression_ratio": 1.819905213270142, "no_speech_prob": 0.00035956056672148407}, {"id": 1128, "seek": 308924, "start": 3103.24, "end": 3105.14, "text": " and I see that its shape is", "tokens": [51065, 293, 286, 536, 300, 1080, 3909, 307, 51160], "temperature": 0.0, "avg_logprob": -0.07275319671630859, "compression_ratio": 1.819905213270142, "no_speech_prob": 0.00035956056672148407}, 
{"id": 1129, "seek": 308924, "start": 3105.24, "end": 3107.14, "text": " 64 by 27", "tokens": [51165, 12145, 538, 7634, 51260], "temperature": 0.0, "avg_logprob": -0.07275319671630859, "compression_ratio": 1.819905213270142, "no_speech_prob": 0.00035956056672148407}, {"id": 1130, "seek": 308924, "start": 3107.24, "end": 3109.14, "text": " and that has to come from some matrix", "tokens": [51265, 293, 300, 575, 281, 808, 490, 512, 8141, 51360], "temperature": 0.0, "avg_logprob": -0.07275319671630859, "compression_ratio": 1.819905213270142, "no_speech_prob": 0.00035956056672148407}, {"id": 1131, "seek": 308924, "start": 3109.24, "end": 3111.14, "text": " multiplication of these two.", "tokens": [51365, 27290, 295, 613, 732, 13, 51460], "temperature": 0.0, "avg_logprob": -0.07275319671630859, "compression_ratio": 1.819905213270142, "no_speech_prob": 0.00035956056672148407}, {"id": 1132, "seek": 308924, "start": 3111.24, "end": 3113.14, "text": " And so to get a 64 by 27", "tokens": [51465, 400, 370, 281, 483, 257, 12145, 538, 7634, 51560], "temperature": 0.0, "avg_logprob": -0.07275319671630859, "compression_ratio": 1.819905213270142, "no_speech_prob": 0.00035956056672148407}, {"id": 1133, "seek": 308924, "start": 3113.24, "end": 3115.14, "text": " I need to take", "tokens": [51565, 286, 643, 281, 747, 51660], "temperature": 0.0, "avg_logprob": -0.07275319671630859, "compression_ratio": 1.819905213270142, "no_speech_prob": 0.00035956056672148407}, {"id": 1134, "seek": 308924, "start": 3115.24, "end": 3117.14, "text": " h", "tokens": [51665, 276, 51760], "temperature": 0.0, "avg_logprob": -0.07275319671630859, "compression_ratio": 1.819905213270142, "no_speech_prob": 0.00035956056672148407}, {"id": 1135, "seek": 308924, "start": 3117.24, "end": 3119.14, "text": " I need to transpose it", "tokens": [51765, 286, 643, 281, 25167, 309, 51860], "temperature": 0.0, "avg_logprob": -0.07275319671630859, "compression_ratio": 1.819905213270142, "no_speech_prob": 0.00035956056672148407}, {"id": 1136, "seek": 311924, "start": 3119.24, "end": 3121.14, "text": " and then I need to matrix multiply it", "tokens": [50365, 293, 550, 286, 643, 281, 8141, 12972, 309, 50460], "temperature": 0.0, "avg_logprob": -0.08232022638190283, "compression_ratio": 1.9487179487179487, "no_speech_prob": 0.00017550555639900267}, {"id": 1137, "seek": 311924, "start": 3121.24, "end": 3123.14, "text": " so that will become 64 by 32", "tokens": [50465, 370, 300, 486, 1813, 12145, 538, 8858, 50560], "temperature": 0.0, "avg_logprob": -0.08232022638190283, "compression_ratio": 1.9487179487179487, "no_speech_prob": 0.00017550555639900267}, {"id": 1138, "seek": 311924, "start": 3123.24, "end": 3125.14, "text": " and then I need to matrix multiply it with", "tokens": [50565, 293, 550, 286, 643, 281, 8141, 12972, 309, 365, 50660], "temperature": 0.0, "avg_logprob": -0.08232022638190283, "compression_ratio": 1.9487179487179487, "no_speech_prob": 0.00017550555639900267}, {"id": 1139, "seek": 311924, "start": 3125.24, "end": 3127.14, "text": " 32 by 27 and that's going to give me", "tokens": [50665, 8858, 538, 7634, 293, 300, 311, 516, 281, 976, 385, 50760], "temperature": 0.0, "avg_logprob": -0.08232022638190283, "compression_ratio": 1.9487179487179487, "no_speech_prob": 0.00017550555639900267}, {"id": 1140, "seek": 311924, "start": 3127.24, "end": 3129.14, "text": " a 64 by 27. 
So I need", "tokens": [50765, 257, 12145, 538, 7634, 13, 407, 286, 643, 50860], "temperature": 0.0, "avg_logprob": -0.08232022638190283, "compression_ratio": 1.9487179487179487, "no_speech_prob": 0.00017550555639900267}, {"id": 1141, "seek": 311924, "start": 3129.24, "end": 3131.14, "text": " to matrix multiply this with dlogits.shape", "tokens": [50865, 281, 8141, 12972, 341, 365, 274, 4987, 1208, 13, 82, 42406, 50960], "temperature": 0.0, "avg_logprob": -0.08232022638190283, "compression_ratio": 1.9487179487179487, "no_speech_prob": 0.00017550555639900267}, {"id": 1142, "seek": 311924, "start": 3131.24, "end": 3133.14, "text": " just like that. That's the only way", "tokens": [50965, 445, 411, 300, 13, 663, 311, 264, 787, 636, 51060], "temperature": 0.0, "avg_logprob": -0.08232022638190283, "compression_ratio": 1.9487179487179487, "no_speech_prob": 0.00017550555639900267}, {"id": 1143, "seek": 311924, "start": 3133.24, "end": 3135.14, "text": " to make the dimensions work out and", "tokens": [51065, 281, 652, 264, 12819, 589, 484, 293, 51160], "temperature": 0.0, "avg_logprob": -0.08232022638190283, "compression_ratio": 1.9487179487179487, "no_speech_prob": 0.00017550555639900267}, {"id": 1144, "seek": 311924, "start": 3135.24, "end": 3137.14, "text": " just use matrix multiplication.", "tokens": [51165, 445, 764, 8141, 27290, 13, 51260], "temperature": 0.0, "avg_logprob": -0.08232022638190283, "compression_ratio": 1.9487179487179487, "no_speech_prob": 0.00017550555639900267}, {"id": 1145, "seek": 311924, "start": 3137.24, "end": 3139.14, "text": " And if we come here, we see that", "tokens": [51265, 400, 498, 321, 808, 510, 11, 321, 536, 300, 51360], "temperature": 0.0, "avg_logprob": -0.08232022638190283, "compression_ratio": 1.9487179487179487, "no_speech_prob": 0.00017550555639900267}, {"id": 1146, "seek": 311924, "start": 3139.24, "end": 3141.14, "text": " that's exactly what's here. 
So a transpose", "tokens": [51365, 300, 311, 2293, 437, 311, 510, 13, 407, 257, 25167, 51460], "temperature": 0.0, "avg_logprob": -0.08232022638190283, "compression_ratio": 1.9487179487179487, "no_speech_prob": 0.00017550555639900267}, {"id": 1147, "seek": 311924, "start": 3141.24, "end": 3143.14, "text": " a for us is h", "tokens": [51465, 257, 337, 505, 307, 276, 51560], "temperature": 0.0, "avg_logprob": -0.08232022638190283, "compression_ratio": 1.9487179487179487, "no_speech_prob": 0.00017550555639900267}, {"id": 1148, "seek": 311924, "start": 3143.24, "end": 3145.14, "text": " multiplied with dlogits.", "tokens": [51565, 17207, 365, 274, 4987, 1208, 13, 51660], "temperature": 0.0, "avg_logprob": -0.08232022638190283, "compression_ratio": 1.9487179487179487, "no_speech_prob": 0.00017550555639900267}, {"id": 1149, "seek": 311924, "start": 3145.24, "end": 3147.14, "text": " So that's w2", "tokens": [51665, 407, 300, 311, 261, 17, 51760], "temperature": 0.0, "avg_logprob": -0.08232022638190283, "compression_ratio": 1.9487179487179487, "no_speech_prob": 0.00017550555639900267}, {"id": 1150, "seek": 311924, "start": 3147.24, "end": 3149.14, "text": " and then db2", "tokens": [51765, 293, 550, 274, 65, 17, 51860], "temperature": 0.0, "avg_logprob": -0.08232022638190283, "compression_ratio": 1.9487179487179487, "no_speech_prob": 0.00017550555639900267}, {"id": 1151, "seek": 314914, "start": 3149.14, "end": 3151.04, "text": " is just", "tokens": [50365, 307, 445, 50460], "temperature": 0.0, "avg_logprob": -0.09660089622109623, "compression_ratio": 1.6651162790697673, "no_speech_prob": 0.0007162443362176418}, {"id": 1152, "seek": 314914, "start": 3151.14, "end": 3153.04, "text": " the", "tokens": [50465, 264, 50560], "temperature": 0.0, "avg_logprob": -0.09660089622109623, "compression_ratio": 1.6651162790697673, "no_speech_prob": 0.0007162443362176418}, {"id": 1153, "seek": 314914, "start": 3153.14, "end": 3155.04, "text": " vertical sum and actually", "tokens": [50565, 9429, 2408, 293, 767, 50660], "temperature": 0.0, "avg_logprob": -0.09660089622109623, "compression_ratio": 1.6651162790697673, "no_speech_prob": 0.0007162443362176418}, {"id": 1154, "seek": 314914, "start": 3155.14, "end": 3157.04, "text": " in the same way, there's only one way to make", "tokens": [50665, 294, 264, 912, 636, 11, 456, 311, 787, 472, 636, 281, 652, 50760], "temperature": 0.0, "avg_logprob": -0.09660089622109623, "compression_ratio": 1.6651162790697673, "no_speech_prob": 0.0007162443362176418}, {"id": 1155, "seek": 314914, "start": 3157.14, "end": 3159.04, "text": " the shapes work out. 
I don't have to remember that", "tokens": [50765, 264, 10854, 589, 484, 13, 286, 500, 380, 362, 281, 1604, 300, 50860], "temperature": 0.0, "avg_logprob": -0.09660089622109623, "compression_ratio": 1.6651162790697673, "no_speech_prob": 0.0007162443362176418}, {"id": 1156, "seek": 314914, "start": 3159.14, "end": 3161.04, "text": " it's a vertical sum along the 0th axis", "tokens": [50865, 309, 311, 257, 9429, 2408, 2051, 264, 1958, 392, 10298, 50960], "temperature": 0.0, "avg_logprob": -0.09660089622109623, "compression_ratio": 1.6651162790697673, "no_speech_prob": 0.0007162443362176418}, {"id": 1157, "seek": 314914, "start": 3161.14, "end": 3163.04, "text": " because that's the only way that this makes sense.", "tokens": [50965, 570, 300, 311, 264, 787, 636, 300, 341, 1669, 2020, 13, 51060], "temperature": 0.0, "avg_logprob": -0.09660089622109623, "compression_ratio": 1.6651162790697673, "no_speech_prob": 0.0007162443362176418}, {"id": 1158, "seek": 314914, "start": 3163.14, "end": 3165.04, "text": " Because b2's shape is 27", "tokens": [51065, 1436, 272, 17, 311, 3909, 307, 7634, 51160], "temperature": 0.0, "avg_logprob": -0.09660089622109623, "compression_ratio": 1.6651162790697673, "no_speech_prob": 0.0007162443362176418}, {"id": 1159, "seek": 314914, "start": 3165.14, "end": 3167.04, "text": " so in order to get", "tokens": [51165, 370, 294, 1668, 281, 483, 51260], "temperature": 0.0, "avg_logprob": -0.09660089622109623, "compression_ratio": 1.6651162790697673, "no_speech_prob": 0.0007162443362176418}, {"id": 1160, "seek": 314914, "start": 3167.14, "end": 3169.04, "text": " a dlogits", "tokens": [51265, 257, 274, 4987, 1208, 51360], "temperature": 0.0, "avg_logprob": -0.09660089622109623, "compression_ratio": 1.6651162790697673, "no_speech_prob": 0.0007162443362176418}, {"id": 1161, "seek": 314914, "start": 3169.14, "end": 3171.04, "text": " here", "tokens": [51365, 510, 51460], "temperature": 0.0, "avg_logprob": -0.09660089622109623, "compression_ratio": 1.6651162790697673, "no_speech_prob": 0.0007162443362176418}, {"id": 1162, "seek": 314914, "start": 3171.14, "end": 3173.04, "text": " it's 32 by 27 so", "tokens": [51465, 309, 311, 8858, 538, 7634, 370, 51560], "temperature": 0.0, "avg_logprob": -0.09660089622109623, "compression_ratio": 1.6651162790697673, "no_speech_prob": 0.0007162443362176418}, {"id": 1163, "seek": 314914, "start": 3173.14, "end": 3175.04, "text": " knowing that it's just sum over dlogits", "tokens": [51565, 5276, 300, 309, 311, 445, 2408, 670, 274, 4987, 1208, 51660], "temperature": 0.0, "avg_logprob": -0.09660089622109623, "compression_ratio": 1.6651162790697673, "no_speech_prob": 0.0007162443362176418}, {"id": 1164, "seek": 314914, "start": 3175.14, "end": 3177.04, "text": " in some direction", "tokens": [51665, 294, 512, 3513, 51760], "temperature": 0.0, "avg_logprob": -0.09660089622109623, "compression_ratio": 1.6651162790697673, "no_speech_prob": 0.0007162443362176418}, {"id": 1165, "seek": 317704, "start": 3177.04, "end": 3180.94, "text": " that direction must be 0", "tokens": [50365, 300, 3513, 1633, 312, 1958, 50560], "temperature": 0.0, "avg_logprob": -0.08813016659745546, "compression_ratio": 1.5972850678733033, "no_speech_prob": 0.00036479945993050933}, {"id": 1166, "seek": 317704, "start": 3181.04, "end": 3182.94, "text": " because I need to eliminate this dimension.", "tokens": [50565, 570, 286, 643, 281, 13819, 341, 10139, 13, 50660], "temperature": 0.0, "avg_logprob": -0.08813016659745546, "compression_ratio": 1.5972850678733033, 
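And the same shape game for the other two gradients, again as a sketch under the same naming assumptions:

```python
# dw2 must match w2 at (64, 27); from h (32, 64) and dlogits (32, 27),
# the only way to get there is to transpose h first:
dw2 = h.T @ dlogits    # (64, 32) @ (32, 27) -> (64, 27)

# db2 must match b2 at (27,); dlogits is (32, 27), so the batch
# dimension, the 0th axis, is the one that has to be summed out:
db2 = dlogits.sum(0)   # (32, 27) -> (27,)
```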
"no_speech_prob": 0.00036479945993050933}, {"id": 1167, "seek": 317704, "start": 3183.04, "end": 3184.94, "text": " So it's this.", "tokens": [50665, 407, 309, 311, 341, 13, 50760], "temperature": 0.0, "avg_logprob": -0.08813016659745546, "compression_ratio": 1.5972850678733033, "no_speech_prob": 0.00036479945993050933}, {"id": 1168, "seek": 317704, "start": 3185.04, "end": 3186.94, "text": " So this is", "tokens": [50765, 407, 341, 307, 50860], "temperature": 0.0, "avg_logprob": -0.08813016659745546, "compression_ratio": 1.5972850678733033, "no_speech_prob": 0.00036479945993050933}, {"id": 1169, "seek": 317704, "start": 3187.04, "end": 3188.94, "text": " kind of like the hacky way.", "tokens": [50865, 733, 295, 411, 264, 10339, 88, 636, 13, 50960], "temperature": 0.0, "avg_logprob": -0.08813016659745546, "compression_ratio": 1.5972850678733033, "no_speech_prob": 0.00036479945993050933}, {"id": 1170, "seek": 317704, "start": 3189.04, "end": 3190.94, "text": " Let me copy paste and delete that", "tokens": [50965, 961, 385, 5055, 9163, 293, 12097, 300, 51060], "temperature": 0.0, "avg_logprob": -0.08813016659745546, "compression_ratio": 1.5972850678733033, "no_speech_prob": 0.00036479945993050933}, {"id": 1171, "seek": 317704, "start": 3191.04, "end": 3192.94, "text": " and let me swing over here", "tokens": [51065, 293, 718, 385, 11173, 670, 510, 51160], "temperature": 0.0, "avg_logprob": -0.08813016659745546, "compression_ratio": 1.5972850678733033, "no_speech_prob": 0.00036479945993050933}, {"id": 1172, "seek": 317704, "start": 3193.04, "end": 3194.94, "text": " and this is our backward pass for the linear layer.", "tokens": [51165, 293, 341, 307, 527, 23897, 1320, 337, 264, 8213, 4583, 13, 51260], "temperature": 0.0, "avg_logprob": -0.08813016659745546, "compression_ratio": 1.5972850678733033, "no_speech_prob": 0.00036479945993050933}, {"id": 1173, "seek": 317704, "start": 3195.04, "end": 3196.94, "text": " Hopefully.", "tokens": [51265, 10429, 13, 51360], "temperature": 0.0, "avg_logprob": -0.08813016659745546, "compression_ratio": 1.5972850678733033, "no_speech_prob": 0.00036479945993050933}, {"id": 1174, "seek": 317704, "start": 3197.04, "end": 3198.94, "text": " So now let's uncomment", "tokens": [51365, 407, 586, 718, 311, 8585, 518, 51460], "temperature": 0.0, "avg_logprob": -0.08813016659745546, "compression_ratio": 1.5972850678733033, "no_speech_prob": 0.00036479945993050933}, {"id": 1175, "seek": 317704, "start": 3199.04, "end": 3200.94, "text": " these three and we're checking that", "tokens": [51465, 613, 1045, 293, 321, 434, 8568, 300, 51560], "temperature": 0.0, "avg_logprob": -0.08813016659745546, "compression_ratio": 1.5972850678733033, "no_speech_prob": 0.00036479945993050933}, {"id": 1176, "seek": 317704, "start": 3201.04, "end": 3202.94, "text": " we got all the three", "tokens": [51565, 321, 658, 439, 264, 1045, 51660], "temperature": 0.0, "avg_logprob": -0.08813016659745546, "compression_ratio": 1.5972850678733033, "no_speech_prob": 0.00036479945993050933}, {"id": 1177, "seek": 317704, "start": 3203.04, "end": 3204.94, "text": " derivatives correct and", "tokens": [51665, 33733, 3006, 293, 51760], "temperature": 0.0, "avg_logprob": -0.08813016659745546, "compression_ratio": 1.5972850678733033, "no_speech_prob": 0.00036479945993050933}, {"id": 1178, "seek": 317704, "start": 3205.04, "end": 3206.94, "text": " run", "tokens": [51765, 1190, 51860], "temperature": 0.0, "avg_logprob": -0.08813016659745546, "compression_ratio": 1.5972850678733033, "no_speech_prob": 
And run, and we see that h, w2, and b2 are all exactly correct. So we backpropagated through a linear layer. Now, next up, we have the derivative for h already, and we need to backpropagate through tanh into hpreact. So we want to derive dhpreact, and here we have to backpropagate through a tanh. And we've already done this in micrograd,
"temperature": 0.0, "avg_logprob": -0.13375156455569798, "compression_ratio": 1.779591836734694, "no_speech_prob": 0.00025052455021068454}, {"id": 1191, "seek": 320694, "start": 3230.94, "end": 3232.84, "text": " and we remember that tanh is a very simple", "tokens": [51565, 293, 321, 1604, 300, 7603, 71, 307, 257, 588, 2199, 51660], "temperature": 0.0, "avg_logprob": -0.13375156455569798, "compression_ratio": 1.779591836734694, "no_speech_prob": 0.00025052455021068454}, {"id": 1192, "seek": 320694, "start": 3232.94, "end": 3234.84, "text": " backward formula. Now unfortunately", "tokens": [51665, 23897, 8513, 13, 823, 7015, 51760], "temperature": 0.0, "avg_logprob": -0.13375156455569798, "compression_ratio": 1.779591836734694, "no_speech_prob": 0.00025052455021068454}, {"id": 1193, "seek": 320694, "start": 3234.94, "end": 3236.84, "text": " if I just put in d by dx of f", "tokens": [51765, 498, 286, 445, 829, 294, 274, 538, 30017, 295, 283, 51860], "temperature": 0.0, "avg_logprob": -0.13375156455569798, "compression_ratio": 1.779591836734694, "no_speech_prob": 0.00025052455021068454}, {"id": 1194, "seek": 323694, "start": 3236.94, "end": 3238.84, "text": " tanh of x into volt from alpha", "tokens": [50365, 7603, 71, 295, 2031, 666, 5962, 490, 8961, 50460], "temperature": 0.0, "avg_logprob": -0.12093232737647162, "compression_ratio": 1.7083333333333333, "no_speech_prob": 0.0006599322659894824}, {"id": 1195, "seek": 323694, "start": 3238.94, "end": 3240.84, "text": " it lets us down. It tells us that it's a", "tokens": [50465, 309, 6653, 505, 760, 13, 467, 5112, 505, 300, 309, 311, 257, 50560], "temperature": 0.0, "avg_logprob": -0.12093232737647162, "compression_ratio": 1.7083333333333333, "no_speech_prob": 0.0006599322659894824}, {"id": 1196, "seek": 323694, "start": 3240.94, "end": 3242.84, "text": " hyperbolic secant function squared", "tokens": [50565, 9848, 65, 7940, 907, 394, 2445, 8889, 50660], "temperature": 0.0, "avg_logprob": -0.12093232737647162, "compression_ratio": 1.7083333333333333, "no_speech_prob": 0.0006599322659894824}, {"id": 1197, "seek": 323694, "start": 3242.94, "end": 3244.84, "text": " of x. It's not exactly helpful", "tokens": [50665, 295, 2031, 13, 467, 311, 406, 2293, 4961, 50760], "temperature": 0.0, "avg_logprob": -0.12093232737647162, "compression_ratio": 1.7083333333333333, "no_speech_prob": 0.0006599322659894824}, {"id": 1198, "seek": 323694, "start": 3244.94, "end": 3246.84, "text": " but luckily google image", "tokens": [50765, 457, 22880, 20742, 3256, 50860], "temperature": 0.0, "avg_logprob": -0.12093232737647162, "compression_ratio": 1.7083333333333333, "no_speech_prob": 0.0006599322659894824}, {"id": 1199, "seek": 323694, "start": 3246.94, "end": 3248.84, "text": " search does not let us down and it gives", "tokens": [50865, 3164, 775, 406, 718, 505, 760, 293, 309, 2709, 50960], "temperature": 0.0, "avg_logprob": -0.12093232737647162, "compression_ratio": 1.7083333333333333, "no_speech_prob": 0.0006599322659894824}, {"id": 1200, "seek": 323694, "start": 3248.94, "end": 3250.84, "text": " us the simpler formula. 
In particular", "tokens": [50965, 505, 264, 18587, 8513, 13, 682, 1729, 51060], "temperature": 0.0, "avg_logprob": -0.12093232737647162, "compression_ratio": 1.7083333333333333, "no_speech_prob": 0.0006599322659894824}, {"id": 1201, "seek": 323694, "start": 3250.94, "end": 3252.84, "text": " if you have that a is equal to tanh", "tokens": [51065, 498, 291, 362, 300, 257, 307, 2681, 281, 7603, 71, 51160], "temperature": 0.0, "avg_logprob": -0.12093232737647162, "compression_ratio": 1.7083333333333333, "no_speech_prob": 0.0006599322659894824}, {"id": 1202, "seek": 323694, "start": 3252.94, "end": 3254.84, "text": " of z then da by", "tokens": [51165, 295, 710, 550, 1120, 538, 51260], "temperature": 0.0, "avg_logprob": -0.12093232737647162, "compression_ratio": 1.7083333333333333, "no_speech_prob": 0.0006599322659894824}, {"id": 1203, "seek": 323694, "start": 3254.94, "end": 3256.84, "text": " dz backpropagating through tanh", "tokens": [51265, 9758, 646, 79, 1513, 559, 990, 807, 7603, 71, 51360], "temperature": 0.0, "avg_logprob": -0.12093232737647162, "compression_ratio": 1.7083333333333333, "no_speech_prob": 0.0006599322659894824}, {"id": 1204, "seek": 323694, "start": 3256.94, "end": 3258.84, "text": " is just 1 minus a square", "tokens": [51365, 307, 445, 502, 3175, 257, 3732, 51460], "temperature": 0.0, "avg_logprob": -0.12093232737647162, "compression_ratio": 1.7083333333333333, "no_speech_prob": 0.0006599322659894824}, {"id": 1205, "seek": 323694, "start": 3258.94, "end": 3260.84, "text": " and take note that 1", "tokens": [51465, 293, 747, 3637, 300, 502, 51560], "temperature": 0.0, "avg_logprob": -0.12093232737647162, "compression_ratio": 1.7083333333333333, "no_speech_prob": 0.0006599322659894824}, {"id": 1206, "seek": 323694, "start": 3260.94, "end": 3262.84, "text": " minus a square a here is the", "tokens": [51565, 3175, 257, 3732, 257, 510, 307, 264, 51660], "temperature": 0.0, "avg_logprob": -0.12093232737647162, "compression_ratio": 1.7083333333333333, "no_speech_prob": 0.0006599322659894824}, {"id": 1207, "seek": 323694, "start": 3262.94, "end": 3264.84, "text": " output of the tanh not the input to", "tokens": [51665, 5598, 295, 264, 7603, 71, 406, 264, 4846, 281, 51760], "temperature": 0.0, "avg_logprob": -0.12093232737647162, "compression_ratio": 1.7083333333333333, "no_speech_prob": 0.0006599322659894824}, {"id": 1208, "seek": 323694, "start": 3264.94, "end": 3266.84, "text": " the tanh z. 
So", "tokens": [51765, 264, 7603, 71, 710, 13, 407, 51860], "temperature": 0.0, "avg_logprob": -0.12093232737647162, "compression_ratio": 1.7083333333333333, "no_speech_prob": 0.0006599322659894824}, {"id": 1209, "seek": 326684, "start": 3266.84, "end": 3268.7400000000002, "text": " the da by dz is here", "tokens": [50365, 264, 1120, 538, 9758, 307, 510, 50460], "temperature": 0.0, "avg_logprob": -0.05285712650844029, "compression_ratio": 1.8672566371681416, "no_speech_prob": 0.00015957790310494602}, {"id": 1210, "seek": 326684, "start": 3268.84, "end": 3270.7400000000002, "text": " formulated in terms of the output of that tanh", "tokens": [50465, 48936, 294, 2115, 295, 264, 5598, 295, 300, 7603, 71, 50560], "temperature": 0.0, "avg_logprob": -0.05285712650844029, "compression_ratio": 1.8672566371681416, "no_speech_prob": 0.00015957790310494602}, {"id": 1211, "seek": 326684, "start": 3270.84, "end": 3272.7400000000002, "text": " and here also", "tokens": [50565, 293, 510, 611, 50660], "temperature": 0.0, "avg_logprob": -0.05285712650844029, "compression_ratio": 1.8672566371681416, "no_speech_prob": 0.00015957790310494602}, {"id": 1212, "seek": 326684, "start": 3272.84, "end": 3274.7400000000002, "text": " in google image search we have the full derivation", "tokens": [50665, 294, 20742, 3256, 3164, 321, 362, 264, 1577, 10151, 399, 50760], "temperature": 0.0, "avg_logprob": -0.05285712650844029, "compression_ratio": 1.8672566371681416, "no_speech_prob": 0.00015957790310494602}, {"id": 1213, "seek": 326684, "start": 3274.84, "end": 3276.7400000000002, "text": " if you want to actually take the", "tokens": [50765, 498, 291, 528, 281, 767, 747, 264, 50860], "temperature": 0.0, "avg_logprob": -0.05285712650844029, "compression_ratio": 1.8672566371681416, "no_speech_prob": 0.00015957790310494602}, {"id": 1214, "seek": 326684, "start": 3276.84, "end": 3278.7400000000002, "text": " actual definition of tanh and work", "tokens": [50865, 3539, 7123, 295, 7603, 71, 293, 589, 50960], "temperature": 0.0, "avg_logprob": -0.05285712650844029, "compression_ratio": 1.8672566371681416, "no_speech_prob": 0.00015957790310494602}, {"id": 1215, "seek": 326684, "start": 3278.84, "end": 3280.7400000000002, "text": " through the math to figure out 1 minus", "tokens": [50965, 807, 264, 5221, 281, 2573, 484, 502, 3175, 51060], "temperature": 0.0, "avg_logprob": -0.05285712650844029, "compression_ratio": 1.8672566371681416, "no_speech_prob": 0.00015957790310494602}, {"id": 1216, "seek": 326684, "start": 3280.84, "end": 3282.7400000000002, "text": " tanh square of z. So", "tokens": [51065, 7603, 71, 3732, 295, 710, 13, 407, 51160], "temperature": 0.0, "avg_logprob": -0.05285712650844029, "compression_ratio": 1.8672566371681416, "no_speech_prob": 0.00015957790310494602}, {"id": 1217, "seek": 326684, "start": 3282.84, "end": 3284.7400000000002, "text": " 1 minus a square is", "tokens": [51165, 502, 3175, 257, 3732, 307, 51260], "temperature": 0.0, "avg_logprob": -0.05285712650844029, "compression_ratio": 1.8672566371681416, "no_speech_prob": 0.00015957790310494602}, {"id": 1218, "seek": 326684, "start": 3284.84, "end": 3286.7400000000002, "text": " the local derivative. 
In our case", "tokens": [51265, 264, 2654, 13760, 13, 682, 527, 1389, 51360], "temperature": 0.0, "avg_logprob": -0.05285712650844029, "compression_ratio": 1.8672566371681416, "no_speech_prob": 0.00015957790310494602}, {"id": 1219, "seek": 326684, "start": 3286.84, "end": 3288.7400000000002, "text": " that is 1 minus", "tokens": [51365, 300, 307, 502, 3175, 51460], "temperature": 0.0, "avg_logprob": -0.05285712650844029, "compression_ratio": 1.8672566371681416, "no_speech_prob": 0.00015957790310494602}, {"id": 1220, "seek": 326684, "start": 3288.84, "end": 3290.7400000000002, "text": " the output of tanh", "tokens": [51465, 264, 5598, 295, 7603, 71, 51560], "temperature": 0.0, "avg_logprob": -0.05285712650844029, "compression_ratio": 1.8672566371681416, "no_speech_prob": 0.00015957790310494602}, {"id": 1221, "seek": 326684, "start": 3290.84, "end": 3292.7400000000002, "text": " square which here is h", "tokens": [51565, 3732, 597, 510, 307, 276, 51660], "temperature": 0.0, "avg_logprob": -0.05285712650844029, "compression_ratio": 1.8672566371681416, "no_speech_prob": 0.00015957790310494602}, {"id": 1222, "seek": 326684, "start": 3292.84, "end": 3294.7400000000002, "text": " so it's h square", "tokens": [51665, 370, 309, 311, 276, 3732, 51760], "temperature": 0.0, "avg_logprob": -0.05285712650844029, "compression_ratio": 1.8672566371681416, "no_speech_prob": 0.00015957790310494602}, {"id": 1223, "seek": 326684, "start": 3294.84, "end": 3296.7400000000002, "text": " and that is the local derivative", "tokens": [51765, 293, 300, 307, 264, 2654, 13760, 51860], "temperature": 0.0, "avg_logprob": -0.05285712650844029, "compression_ratio": 1.8672566371681416, "no_speech_prob": 0.00015957790310494602}, {"id": 1224, "seek": 329684, "start": 3296.84, "end": 3298.7400000000002, "text": " and then times the chain rule", "tokens": [50365, 293, 550, 1413, 264, 5021, 4978, 50460], "temperature": 0.0, "avg_logprob": -0.0976264187783906, "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.0006292664329521358}, {"id": 1225, "seek": 329684, "start": 3298.84, "end": 3300.7400000000002, "text": " dh. 
So", "tokens": [50465, 274, 71, 13, 407, 50560], "temperature": 0.0, "avg_logprob": -0.0976264187783906, "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.0006292664329521358}, {"id": 1226, "seek": 329684, "start": 3300.84, "end": 3302.7400000000002, "text": " that is going to be our candidate implementation", "tokens": [50565, 300, 307, 516, 281, 312, 527, 11532, 11420, 50660], "temperature": 0.0, "avg_logprob": -0.0976264187783906, "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.0006292664329521358}, {"id": 1227, "seek": 329684, "start": 3302.84, "end": 3304.7400000000002, "text": " so if we come here", "tokens": [50665, 370, 498, 321, 808, 510, 50760], "temperature": 0.0, "avg_logprob": -0.0976264187783906, "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.0006292664329521358}, {"id": 1228, "seek": 329684, "start": 3304.84, "end": 3306.7400000000002, "text": " and then uncomment", "tokens": [50765, 293, 550, 8585, 518, 50860], "temperature": 0.0, "avg_logprob": -0.0976264187783906, "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.0006292664329521358}, {"id": 1229, "seek": 329684, "start": 3306.84, "end": 3308.7400000000002, "text": " this let's hope for the best", "tokens": [50865, 341, 718, 311, 1454, 337, 264, 1151, 50960], "temperature": 0.0, "avg_logprob": -0.0976264187783906, "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.0006292664329521358}, {"id": 1230, "seek": 329684, "start": 3308.84, "end": 3310.7400000000002, "text": " and we have", "tokens": [50965, 293, 321, 362, 51060], "temperature": 0.0, "avg_logprob": -0.0976264187783906, "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.0006292664329521358}, {"id": 1231, "seek": 329684, "start": 3310.84, "end": 3312.7400000000002, "text": " the right answer. 
Okay next", "tokens": [51065, 264, 558, 1867, 13, 1033, 958, 51160], "temperature": 0.0, "avg_logprob": -0.0976264187783906, "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.0006292664329521358}, {"id": 1232, "seek": 329684, "start": 3312.84, "end": 3314.7400000000002, "text": " up we have dh preact and", "tokens": [51165, 493, 321, 362, 274, 71, 659, 578, 293, 51260], "temperature": 0.0, "avg_logprob": -0.0976264187783906, "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.0006292664329521358}, {"id": 1233, "seek": 329684, "start": 3314.84, "end": 3316.7400000000002, "text": " we want to backpropagate into the gain", "tokens": [51265, 321, 528, 281, 646, 79, 1513, 559, 473, 666, 264, 6052, 51360], "temperature": 0.0, "avg_logprob": -0.0976264187783906, "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.0006292664329521358}, {"id": 1234, "seek": 329684, "start": 3316.84, "end": 3318.7400000000002, "text": " the b in raw and the b in bias.", "tokens": [51365, 264, 272, 294, 8936, 293, 264, 272, 294, 12577, 13, 51460], "temperature": 0.0, "avg_logprob": -0.0976264187783906, "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.0006292664329521358}, {"id": 1235, "seek": 329684, "start": 3318.84, "end": 3320.7400000000002, "text": " So here this is the bash norm", "tokens": [51465, 407, 510, 341, 307, 264, 46183, 2026, 51560], "temperature": 0.0, "avg_logprob": -0.0976264187783906, "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.0006292664329521358}, {"id": 1236, "seek": 329684, "start": 3320.84, "end": 3322.7400000000002, "text": " parameters b in gain and bias inside", "tokens": [51565, 9834, 272, 294, 6052, 293, 12577, 1854, 51660], "temperature": 0.0, "avg_logprob": -0.0976264187783906, "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.0006292664329521358}, {"id": 1237, "seek": 329684, "start": 3322.84, "end": 3324.7400000000002, "text": " the bash norm that take the b in raw", "tokens": [51665, 264, 46183, 2026, 300, 747, 264, 272, 294, 8936, 51760], "temperature": 0.0, "avg_logprob": -0.0976264187783906, "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.0006292664329521358}, {"id": 1238, "seek": 329684, "start": 3324.84, "end": 3326.7400000000002, "text": " that is exact unit Gaussian", "tokens": [51765, 300, 307, 1900, 4985, 39148, 51860], "temperature": 0.0, "avg_logprob": -0.0976264187783906, "compression_ratio": 1.8181818181818181, "no_speech_prob": 0.0006292664329521358}, {"id": 1239, "seek": 332674, "start": 3326.74, "end": 3328.64, "text": " and they scale it and shift it", "tokens": [50365, 293, 436, 4373, 309, 293, 5513, 309, 50460], "temperature": 0.0, "avg_logprob": -0.07111853272167605, "compression_ratio": 1.7797202797202798, "no_speech_prob": 0.0005368402926251292}, {"id": 1240, "seek": 332674, "start": 3328.74, "end": 3330.64, "text": " and these are the parameters of the", "tokens": [50465, 293, 613, 366, 264, 9834, 295, 264, 50560], "temperature": 0.0, "avg_logprob": -0.07111853272167605, "compression_ratio": 1.7797202797202798, "no_speech_prob": 0.0005368402926251292}, {"id": 1241, "seek": 332674, "start": 3330.74, "end": 3332.64, "text": " bash norm. 
Now here", "tokens": [50565, 46183, 2026, 13, 823, 510, 50660], "temperature": 0.0, "avg_logprob": -0.07111853272167605, "compression_ratio": 1.7797202797202798, "no_speech_prob": 0.0005368402926251292}, {"id": 1242, "seek": 332674, "start": 3332.74, "end": 3334.64, "text": " we have a multiplication but", "tokens": [50665, 321, 362, 257, 27290, 457, 50760], "temperature": 0.0, "avg_logprob": -0.07111853272167605, "compression_ratio": 1.7797202797202798, "no_speech_prob": 0.0005368402926251292}, {"id": 1243, "seek": 332674, "start": 3334.74, "end": 3336.64, "text": " it's worth noting that this multiply is very very", "tokens": [50765, 309, 311, 3163, 26801, 300, 341, 12972, 307, 588, 588, 50860], "temperature": 0.0, "avg_logprob": -0.07111853272167605, "compression_ratio": 1.7797202797202798, "no_speech_prob": 0.0005368402926251292}, {"id": 1244, "seek": 332674, "start": 3336.74, "end": 3338.64, "text": " different from this matrix multiply here", "tokens": [50865, 819, 490, 341, 8141, 12972, 510, 50960], "temperature": 0.0, "avg_logprob": -0.07111853272167605, "compression_ratio": 1.7797202797202798, "no_speech_prob": 0.0005368402926251292}, {"id": 1245, "seek": 332674, "start": 3338.74, "end": 3340.64, "text": " matrix multiply are dot products", "tokens": [50965, 8141, 12972, 366, 5893, 3383, 51060], "temperature": 0.0, "avg_logprob": -0.07111853272167605, "compression_ratio": 1.7797202797202798, "no_speech_prob": 0.0005368402926251292}, {"id": 1246, "seek": 332674, "start": 3340.74, "end": 3342.64, "text": " between rows and columns of these", "tokens": [51065, 1296, 13241, 293, 13766, 295, 613, 51160], "temperature": 0.0, "avg_logprob": -0.07111853272167605, "compression_ratio": 1.7797202797202798, "no_speech_prob": 0.0005368402926251292}, {"id": 1247, "seek": 332674, "start": 3342.74, "end": 3344.64, "text": " matrices involved. This is an", "tokens": [51165, 32284, 3288, 13, 639, 307, 364, 51260], "temperature": 0.0, "avg_logprob": -0.07111853272167605, "compression_ratio": 1.7797202797202798, "no_speech_prob": 0.0005368402926251292}, {"id": 1248, "seek": 332674, "start": 3344.74, "end": 3346.64, "text": " element wise multiply so things are quite a bit", "tokens": [51265, 4478, 10829, 12972, 370, 721, 366, 1596, 257, 857, 51360], "temperature": 0.0, "avg_logprob": -0.07111853272167605, "compression_ratio": 1.7797202797202798, "no_speech_prob": 0.0005368402926251292}, {"id": 1249, "seek": 332674, "start": 3346.74, "end": 3348.64, "text": " simpler. Now we do have to", "tokens": [51365, 18587, 13, 823, 321, 360, 362, 281, 51460], "temperature": 0.0, "avg_logprob": -0.07111853272167605, "compression_ratio": 1.7797202797202798, "no_speech_prob": 0.0005368402926251292}, {"id": 1250, "seek": 332674, "start": 3348.74, "end": 3350.64, "text": " be careful with some of the broadcasting happening", "tokens": [51465, 312, 5026, 365, 512, 295, 264, 30024, 2737, 51560], "temperature": 0.0, "avg_logprob": -0.07111853272167605, "compression_ratio": 1.7797202797202798, "no_speech_prob": 0.0005368402926251292}, {"id": 1251, "seek": 332674, "start": 3350.74, "end": 3352.64, "text": " in this line of code though. 
So you", "tokens": [51565, 294, 341, 1622, 295, 3089, 1673, 13, 407, 291, 51660], "temperature": 0.0, "avg_logprob": -0.07111853272167605, "compression_ratio": 1.7797202797202798, "no_speech_prob": 0.0005368402926251292}, {"id": 1252, "seek": 332674, "start": 3352.74, "end": 3354.64, "text": " see how b in gain and b in bias", "tokens": [51665, 536, 577, 272, 294, 6052, 293, 272, 294, 12577, 51760], "temperature": 0.0, "avg_logprob": -0.07111853272167605, "compression_ratio": 1.7797202797202798, "no_speech_prob": 0.0005368402926251292}, {"id": 1253, "seek": 332674, "start": 3354.74, "end": 3356.64, "text": " are 1 by 64", "tokens": [51765, 366, 502, 538, 12145, 51860], "temperature": 0.0, "avg_logprob": -0.07111853272167605, "compression_ratio": 1.7797202797202798, "no_speech_prob": 0.0005368402926251292}, {"id": 1254, "seek": 335674, "start": 3356.74, "end": 3358.64, "text": " but h preact and", "tokens": [50365, 457, 276, 659, 578, 293, 50460], "temperature": 0.0, "avg_logprob": -0.08398477059823495, "compression_ratio": 1.792, "no_speech_prob": 0.0002061880222754553}, {"id": 1255, "seek": 335674, "start": 3358.74, "end": 3360.64, "text": " b in raw are 32 by 64.", "tokens": [50465, 272, 294, 8936, 366, 8858, 538, 12145, 13, 50560], "temperature": 0.0, "avg_logprob": -0.08398477059823495, "compression_ratio": 1.792, "no_speech_prob": 0.0002061880222754553}, {"id": 1256, "seek": 335674, "start": 3360.74, "end": 3362.64, "text": " So", "tokens": [50565, 407, 50660], "temperature": 0.0, "avg_logprob": -0.08398477059823495, "compression_ratio": 1.792, "no_speech_prob": 0.0002061880222754553}, {"id": 1257, "seek": 335674, "start": 3362.74, "end": 3364.64, "text": " we have to be careful with that and make sure that all the shapes", "tokens": [50665, 321, 362, 281, 312, 5026, 365, 300, 293, 652, 988, 300, 439, 264, 10854, 50760], "temperature": 0.0, "avg_logprob": -0.08398477059823495, "compression_ratio": 1.792, "no_speech_prob": 0.0002061880222754553}, {"id": 1258, "seek": 335674, "start": 3364.74, "end": 3366.64, "text": " work out fine and that the broadcasting is", "tokens": [50765, 589, 484, 2489, 293, 300, 264, 30024, 307, 50860], "temperature": 0.0, "avg_logprob": -0.08398477059823495, "compression_ratio": 1.792, "no_speech_prob": 0.0002061880222754553}, {"id": 1259, "seek": 335674, "start": 3366.74, "end": 3368.64, "text": " correctly backpropagated. 
So", "tokens": [50865, 8944, 646, 79, 1513, 559, 770, 13, 407, 50960], "temperature": 0.0, "avg_logprob": -0.08398477059823495, "compression_ratio": 1.792, "no_speech_prob": 0.0002061880222754553}, {"id": 1260, "seek": 335674, "start": 3368.74, "end": 3370.64, "text": " in particular let's start with db in gain", "tokens": [50965, 294, 1729, 718, 311, 722, 365, 274, 65, 294, 6052, 51060], "temperature": 0.0, "avg_logprob": -0.08398477059823495, "compression_ratio": 1.792, "no_speech_prob": 0.0002061880222754553}, {"id": 1261, "seek": 335674, "start": 3370.74, "end": 3372.64, "text": " so db in gain", "tokens": [51065, 370, 274, 65, 294, 6052, 51160], "temperature": 0.0, "avg_logprob": -0.08398477059823495, "compression_ratio": 1.792, "no_speech_prob": 0.0002061880222754553}, {"id": 1262, "seek": 335674, "start": 3372.74, "end": 3374.64, "text": " should be", "tokens": [51165, 820, 312, 51260], "temperature": 0.0, "avg_logprob": -0.08398477059823495, "compression_ratio": 1.792, "no_speech_prob": 0.0002061880222754553}, {"id": 1263, "seek": 335674, "start": 3374.74, "end": 3376.64, "text": " and here this is again element wise", "tokens": [51265, 293, 510, 341, 307, 797, 4478, 10829, 51360], "temperature": 0.0, "avg_logprob": -0.08398477059823495, "compression_ratio": 1.792, "no_speech_prob": 0.0002061880222754553}, {"id": 1264, "seek": 335674, "start": 3376.74, "end": 3378.64, "text": " multiply and whenever we have a times", "tokens": [51365, 12972, 293, 5699, 321, 362, 257, 1413, 51460], "temperature": 0.0, "avg_logprob": -0.08398477059823495, "compression_ratio": 1.792, "no_speech_prob": 0.0002061880222754553}, {"id": 1265, "seek": 335674, "start": 3378.74, "end": 3380.64, "text": " b equals c we saw that", "tokens": [51465, 272, 6915, 269, 321, 1866, 300, 51560], "temperature": 0.0, "avg_logprob": -0.08398477059823495, "compression_ratio": 1.792, "no_speech_prob": 0.0002061880222754553}, {"id": 1266, "seek": 335674, "start": 3380.74, "end": 3382.64, "text": " the local derivative here is just if this", "tokens": [51565, 264, 2654, 13760, 510, 307, 445, 498, 341, 51660], "temperature": 0.0, "avg_logprob": -0.08398477059823495, "compression_ratio": 1.792, "no_speech_prob": 0.0002061880222754553}, {"id": 1267, "seek": 335674, "start": 3382.74, "end": 3384.64, "text": " is a the local derivative is just the", "tokens": [51665, 307, 257, 264, 2654, 13760, 307, 445, 264, 51760], "temperature": 0.0, "avg_logprob": -0.08398477059823495, "compression_ratio": 1.792, "no_speech_prob": 0.0002061880222754553}, {"id": 1268, "seek": 335674, "start": 3384.74, "end": 3386.64, "text": " b the other one. 
So this", "tokens": [51765, 272, 264, 661, 472, 13, 407, 341, 51860], "temperature": 0.0, "avg_logprob": -0.08398477059823495, "compression_ratio": 1.792, "no_speech_prob": 0.0002061880222754553}, {"id": 1269, "seek": 338674, "start": 3386.74, "end": 3388.64, "text": " local derivative is just b in raw", "tokens": [50365, 2654, 13760, 307, 445, 272, 294, 8936, 50460], "temperature": 0.0, "avg_logprob": -0.07336045170689488, "compression_ratio": 1.6064814814814814, "no_speech_prob": 0.00028479666798375547}, {"id": 1270, "seek": 338674, "start": 3388.74, "end": 3390.64, "text": " and then times chain rule", "tokens": [50465, 293, 550, 1413, 5021, 4978, 50560], "temperature": 0.0, "avg_logprob": -0.07336045170689488, "compression_ratio": 1.6064814814814814, "no_speech_prob": 0.00028479666798375547}, {"id": 1271, "seek": 338674, "start": 3390.74, "end": 3392.64, "text": " so dh preact.", "tokens": [50565, 370, 274, 71, 659, 578, 13, 50660], "temperature": 0.0, "avg_logprob": -0.07336045170689488, "compression_ratio": 1.6064814814814814, "no_speech_prob": 0.00028479666798375547}, {"id": 1272, "seek": 338674, "start": 3392.74, "end": 3394.64, "text": " So", "tokens": [50665, 407, 50760], "temperature": 0.0, "avg_logprob": -0.07336045170689488, "compression_ratio": 1.6064814814814814, "no_speech_prob": 0.00028479666798375547}, {"id": 1273, "seek": 338674, "start": 3394.74, "end": 3396.64, "text": " this is the candidate", "tokens": [50765, 341, 307, 264, 11532, 50860], "temperature": 0.0, "avg_logprob": -0.07336045170689488, "compression_ratio": 1.6064814814814814, "no_speech_prob": 0.00028479666798375547}, {"id": 1274, "seek": 338674, "start": 3396.74, "end": 3398.64, "text": " gradient. Now again", "tokens": [50865, 16235, 13, 823, 797, 50960], "temperature": 0.0, "avg_logprob": -0.07336045170689488, "compression_ratio": 1.6064814814814814, "no_speech_prob": 0.00028479666798375547}, {"id": 1275, "seek": 338674, "start": 3398.74, "end": 3400.64, "text": " we have to be careful because b in gain", "tokens": [50965, 321, 362, 281, 312, 5026, 570, 272, 294, 6052, 51060], "temperature": 0.0, "avg_logprob": -0.07336045170689488, "compression_ratio": 1.6064814814814814, "no_speech_prob": 0.00028479666798375547}, {"id": 1276, "seek": 338674, "start": 3400.74, "end": 3402.64, "text": " is of size 1 by 64", "tokens": [51065, 307, 295, 2744, 502, 538, 12145, 51160], "temperature": 0.0, "avg_logprob": -0.07336045170689488, "compression_ratio": 1.6064814814814814, "no_speech_prob": 0.00028479666798375547}, {"id": 1277, "seek": 338674, "start": 3402.74, "end": 3404.64, "text": " but this here", "tokens": [51165, 457, 341, 510, 51260], "temperature": 0.0, "avg_logprob": -0.07336045170689488, "compression_ratio": 1.6064814814814814, "no_speech_prob": 0.00028479666798375547}, {"id": 1278, "seek": 338674, "start": 3404.74, "end": 3406.64, "text": " would be 32 by 64", "tokens": [51265, 576, 312, 8858, 538, 12145, 51360], "temperature": 0.0, "avg_logprob": -0.07336045170689488, "compression_ratio": 1.6064814814814814, "no_speech_prob": 0.00028479666798375547}, {"id": 1279, "seek": 338674, "start": 3406.74, "end": 3408.64, "text": " and so", "tokens": [51365, 293, 370, 51460], "temperature": 0.0, "avg_logprob": -0.07336045170689488, "compression_ratio": 1.6064814814814814, "no_speech_prob": 0.00028479666798375547}, {"id": 1280, "seek": 338674, "start": 3408.74, "end": 3410.64, "text": " the correct thing to do", "tokens": [51465, 264, 3006, 551, 281, 360, 51560], "temperature": 0.0, "avg_logprob": 
And so the correct thing to do in this case, of course, is this: bngain here is a row vector of 64 numbers, and it gets replicated vertically in this operation. And so therefore the correct thing to do is to sum, because it's being replicated, and therefore all the gradients in each of the rows that are now flowing backwards need to sum up to that same tensor, dbngain. So we have to sum across the zeroth dimension, all the examples basically, which is the direction in which this gets replicated. And now we have to be also careful, because bngain is of shape 1 by 64, so in fact I need to keep the dim as true, otherwise I would just get 64. Now, I don't actually really remember why I made the bngain and the bnbias be 1 by 64, but the biases b1 and b2 I just made be one-dimensional vectors; they're not two-dimensional tensors. So I can't recall exactly why I left the gain and the bias as two-dimensional, but it doesn't really matter, as long as you are consistent and you're keeping it the same. So in this case, we want to keep the dimension so that the tensor shapes work.
347664, "start": 3492.64, "end": 3494.54, "text": " that's our chain rule.", "tokens": [51165, 300, 311, 527, 5021, 4978, 13, 51260], "temperature": 0.0, "avg_logprob": -0.07924214634326619, "compression_ratio": 1.575268817204301, "no_speech_prob": 0.00018210656708106399}, {"id": 1323, "seek": 347664, "start": 3494.64, "end": 3496.54, "text": " Now what about the", "tokens": [51265, 823, 437, 466, 264, 51360], "temperature": 0.0, "avg_logprob": -0.07924214634326619, "compression_ratio": 1.575268817204301, "no_speech_prob": 0.00018210656708106399}, {"id": 1324, "seek": 347664, "start": 3496.64, "end": 3498.54, "text": " dimensions of this?", "tokens": [51365, 12819, 295, 341, 30, 51460], "temperature": 0.0, "avg_logprob": -0.07924214634326619, "compression_ratio": 1.575268817204301, "no_speech_prob": 0.00018210656708106399}, {"id": 1325, "seek": 347664, "start": 3498.64, "end": 3500.54, "text": " We have to be careful, right?", "tokens": [51465, 492, 362, 281, 312, 5026, 11, 558, 30, 51560], "temperature": 0.0, "avg_logprob": -0.07924214634326619, "compression_ratio": 1.575268817204301, "no_speech_prob": 0.00018210656708106399}, {"id": 1326, "seek": 347664, "start": 3500.64, "end": 3502.54, "text": " So dh preact is", "tokens": [51565, 407, 274, 71, 659, 578, 307, 51660], "temperature": 0.0, "avg_logprob": -0.07924214634326619, "compression_ratio": 1.575268817204301, "no_speech_prob": 0.00018210656708106399}, {"id": 1327, "seek": 347664, "start": 3502.64, "end": 3504.54, "text": " 32 by 64", "tokens": [51665, 8858, 538, 12145, 51760], "temperature": 0.0, "avg_logprob": -0.07924214634326619, "compression_ratio": 1.575268817204301, "no_speech_prob": 0.00018210656708106399}, {"id": 1328, "seek": 347664, "start": 3504.64, "end": 3506.54, "text": " b in gain is 1 by 64", "tokens": [51765, 272, 294, 6052, 307, 502, 538, 12145, 51860], "temperature": 0.0, "avg_logprob": -0.07924214634326619, "compression_ratio": 1.575268817204301, "no_speech_prob": 0.00018210656708106399}, {"id": 1329, "seek": 350654, "start": 3506.54, "end": 3508.44, "text": " so it will just get replicated", "tokens": [50365, 370, 309, 486, 445, 483, 46365, 50460], "temperature": 0.0, "avg_logprob": -0.07238885693084028, "compression_ratio": 1.8264150943396227, "no_speech_prob": 0.0005148784257471561}, {"id": 1330, "seek": 350654, "start": 3508.54, "end": 3510.44, "text": " to create this multiplication", "tokens": [50465, 281, 1884, 341, 27290, 50560], "temperature": 0.0, "avg_logprob": -0.07238885693084028, "compression_ratio": 1.8264150943396227, "no_speech_prob": 0.0005148784257471561}, {"id": 1331, "seek": 350654, "start": 3510.54, "end": 3512.44, "text": " which is the correct thing", "tokens": [50565, 597, 307, 264, 3006, 551, 50660], "temperature": 0.0, "avg_logprob": -0.07238885693084028, "compression_ratio": 1.8264150943396227, "no_speech_prob": 0.0005148784257471561}, {"id": 1332, "seek": 350654, "start": 3512.54, "end": 3514.44, "text": " because in a forward pass it also gets replicated", "tokens": [50665, 570, 294, 257, 2128, 1320, 309, 611, 2170, 46365, 50760], "temperature": 0.0, "avg_logprob": -0.07238885693084028, "compression_ratio": 1.8264150943396227, "no_speech_prob": 0.0005148784257471561}, {"id": 1333, "seek": 350654, "start": 3514.54, "end": 3516.44, "text": " in just the same way.", "tokens": [50765, 294, 445, 264, 912, 636, 13, 50860], "temperature": 0.0, "avg_logprob": -0.07238885693084028, "compression_ratio": 1.8264150943396227, "no_speech_prob": 0.0005148784257471561}, {"id": 1334, "seek": 350654, 
"start": 3516.54, "end": 3518.44, "text": " So in fact we don't need the brackets here, we're done.", "tokens": [50865, 407, 294, 1186, 321, 500, 380, 643, 264, 26179, 510, 11, 321, 434, 1096, 13, 50960], "temperature": 0.0, "avg_logprob": -0.07238885693084028, "compression_ratio": 1.8264150943396227, "no_speech_prob": 0.0005148784257471561}, {"id": 1335, "seek": 350654, "start": 3518.54, "end": 3520.44, "text": " And the shapes are already correct.", "tokens": [50965, 400, 264, 10854, 366, 1217, 3006, 13, 51060], "temperature": 0.0, "avg_logprob": -0.07238885693084028, "compression_ratio": 1.8264150943396227, "no_speech_prob": 0.0005148784257471561}, {"id": 1336, "seek": 350654, "start": 3520.54, "end": 3522.44, "text": " And finally for the bias", "tokens": [51065, 400, 2721, 337, 264, 12577, 51160], "temperature": 0.0, "avg_logprob": -0.07238885693084028, "compression_ratio": 1.8264150943396227, "no_speech_prob": 0.0005148784257471561}, {"id": 1337, "seek": 350654, "start": 3522.54, "end": 3524.44, "text": " very similar", "tokens": [51165, 588, 2531, 51260], "temperature": 0.0, "avg_logprob": -0.07238885693084028, "compression_ratio": 1.8264150943396227, "no_speech_prob": 0.0005148784257471561}, {"id": 1338, "seek": 350654, "start": 3524.54, "end": 3526.44, "text": " this bias here is very very similar", "tokens": [51265, 341, 12577, 510, 307, 588, 588, 2531, 51360], "temperature": 0.0, "avg_logprob": -0.07238885693084028, "compression_ratio": 1.8264150943396227, "no_speech_prob": 0.0005148784257471561}, {"id": 1339, "seek": 350654, "start": 3526.54, "end": 3528.44, "text": " to the bias we saw in the linear layer", "tokens": [51365, 281, 264, 12577, 321, 1866, 294, 264, 8213, 4583, 51460], "temperature": 0.0, "avg_logprob": -0.07238885693084028, "compression_ratio": 1.8264150943396227, "no_speech_prob": 0.0005148784257471561}, {"id": 1340, "seek": 350654, "start": 3528.54, "end": 3530.44, "text": " and we see that the gradients", "tokens": [51465, 293, 321, 536, 300, 264, 2771, 2448, 51560], "temperature": 0.0, "avg_logprob": -0.07238885693084028, "compression_ratio": 1.8264150943396227, "no_speech_prob": 0.0005148784257471561}, {"id": 1341, "seek": 350654, "start": 3530.54, "end": 3532.44, "text": " from h preact will simply flow", "tokens": [51565, 490, 276, 659, 578, 486, 2935, 3095, 51660], "temperature": 0.0, "avg_logprob": -0.07238885693084028, "compression_ratio": 1.8264150943396227, "no_speech_prob": 0.0005148784257471561}, {"id": 1342, "seek": 350654, "start": 3532.54, "end": 3534.44, "text": " into the biases and add up", "tokens": [51665, 666, 264, 32152, 293, 909, 493, 51760], "temperature": 0.0, "avg_logprob": -0.07238885693084028, "compression_ratio": 1.8264150943396227, "no_speech_prob": 0.0005148784257471561}, {"id": 1343, "seek": 350654, "start": 3534.54, "end": 3536.44, "text": " because these are just offsets.", "tokens": [51765, 570, 613, 366, 445, 39457, 1385, 13, 51860], "temperature": 0.0, "avg_logprob": -0.07238885693084028, "compression_ratio": 1.8264150943396227, "no_speech_prob": 0.0005148784257471561}, {"id": 1344, "seek": 353654, "start": 3536.54, "end": 3538.44, "text": " And so basically we want this to be", "tokens": [50365, 400, 370, 1936, 321, 528, 341, 281, 312, 50460], "temperature": 0.0, "avg_logprob": -0.10125953395192216, "compression_ratio": 1.7091633466135459, "no_speech_prob": 0.0004111918096896261}, {"id": 1345, "seek": 353654, "start": 3538.54, "end": 3540.44, "text": " dh preact but it needs", "tokens": [50465, 274, 71, 659, 578, 457, 309, 
2203, 50560], "temperature": 0.0, "avg_logprob": -0.10125953395192216, "compression_ratio": 1.7091633466135459, "no_speech_prob": 0.0004111918096896261}, {"id": 1346, "seek": 353654, "start": 3540.54, "end": 3542.44, "text": " to sum along the right dimension", "tokens": [50565, 281, 2408, 2051, 264, 558, 10139, 50660], "temperature": 0.0, "avg_logprob": -0.10125953395192216, "compression_ratio": 1.7091633466135459, "no_speech_prob": 0.0004111918096896261}, {"id": 1347, "seek": 353654, "start": 3542.54, "end": 3544.44, "text": " and in this case similar to the gain", "tokens": [50665, 293, 294, 341, 1389, 2531, 281, 264, 6052, 50760], "temperature": 0.0, "avg_logprob": -0.10125953395192216, "compression_ratio": 1.7091633466135459, "no_speech_prob": 0.0004111918096896261}, {"id": 1348, "seek": 353654, "start": 3544.54, "end": 3546.44, "text": " we need to sum across the 0th", "tokens": [50765, 321, 643, 281, 2408, 2108, 264, 1958, 392, 50860], "temperature": 0.0, "avg_logprob": -0.10125953395192216, "compression_ratio": 1.7091633466135459, "no_speech_prob": 0.0004111918096896261}, {"id": 1349, "seek": 353654, "start": 3546.54, "end": 3548.44, "text": " dimension, the examples", "tokens": [50865, 10139, 11, 264, 5110, 50960], "temperature": 0.0, "avg_logprob": -0.10125953395192216, "compression_ratio": 1.7091633466135459, "no_speech_prob": 0.0004111918096896261}, {"id": 1350, "seek": 353654, "start": 3548.54, "end": 3550.44, "text": " because of the way that the bias gets replicated", "tokens": [50965, 570, 295, 264, 636, 300, 264, 12577, 2170, 46365, 51060], "temperature": 0.0, "avg_logprob": -0.10125953395192216, "compression_ratio": 1.7091633466135459, "no_speech_prob": 0.0004111918096896261}, {"id": 1351, "seek": 353654, "start": 3550.54, "end": 3552.44, "text": " vertically and we also", "tokens": [51065, 28450, 293, 321, 611, 51160], "temperature": 0.0, "avg_logprob": -0.10125953395192216, "compression_ratio": 1.7091633466135459, "no_speech_prob": 0.0004111918096896261}, {"id": 1352, "seek": 353654, "start": 3552.54, "end": 3554.44, "text": " want to have keep them as true.", "tokens": [51165, 528, 281, 362, 1066, 552, 382, 2074, 13, 51260], "temperature": 0.0, "avg_logprob": -0.10125953395192216, "compression_ratio": 1.7091633466135459, "no_speech_prob": 0.0004111918096896261}, {"id": 1353, "seek": 353654, "start": 3554.54, "end": 3556.44, "text": " And so this will basically take", "tokens": [51265, 400, 370, 341, 486, 1936, 747, 51360], "temperature": 0.0, "avg_logprob": -0.10125953395192216, "compression_ratio": 1.7091633466135459, "no_speech_prob": 0.0004111918096896261}, {"id": 1354, "seek": 353654, "start": 3556.54, "end": 3558.44, "text": " this and sum it up and give us", "tokens": [51365, 341, 293, 2408, 309, 493, 293, 976, 505, 51460], "temperature": 0.0, "avg_logprob": -0.10125953395192216, "compression_ratio": 1.7091633466135459, "no_speech_prob": 0.0004111918096896261}, {"id": 1355, "seek": 353654, "start": 3558.54, "end": 3560.44, "text": " a 1 by 64.", "tokens": [51465, 257, 502, 538, 12145, 13, 51560], "temperature": 0.0, "avg_logprob": -0.10125953395192216, "compression_ratio": 1.7091633466135459, "no_speech_prob": 0.0004111918096896261}, {"id": 1356, "seek": 353654, "start": 3560.54, "end": 3562.44, "text": " So this is the candidate implementation", "tokens": [51565, 407, 341, 307, 264, 11532, 11420, 51660], "temperature": 0.0, "avg_logprob": -0.10125953395192216, "compression_ratio": 1.7091633466135459, "no_speech_prob": 0.0004111918096896261}, {"id": 1357, "seek": 
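Again as a sketch with the same stand-in shapes, the candidate gradients for bnraw and for the bias come out to:

```python
import torch

bngain   = torch.randn(1, 64)    # stand-in batchnorm gain
dhpreact = torch.randn(32, 64)   # stand-in upstream gradient

# dbnraw: the other factor of the product, times the chain rule;
# bngain broadcasts across the batch exactly as it did in the forward pass
dbnraw = bngain * dhpreact                # (32, 64)

# dbnbias: the bias is a (1, 64) offset replicated down the rows,
# so its gradient sums the upstream gradients over the batch dimension
dbnbias = dhpreact.sum(0, keepdim=True)   # (1, 64)
```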
353654, "start": 3562.54, "end": 3564.44, "text": " it makes all the shapes work", "tokens": [51665, 309, 1669, 439, 264, 10854, 589, 51760], "temperature": 0.0, "avg_logprob": -0.10125953395192216, "compression_ratio": 1.7091633466135459, "no_speech_prob": 0.0004111918096896261}, {"id": 1358, "seek": 356444, "start": 3564.44, "end": 3566.34, "text": " let me bring it up", "tokens": [50365, 718, 385, 1565, 309, 493, 50460], "temperature": 0.0, "avg_logprob": -0.09988653059486005, "compression_ratio": 1.7330677290836654, "no_speech_prob": 0.0007118039065971971}, {"id": 1359, "seek": 356444, "start": 3566.44, "end": 3568.34, "text": " down here", "tokens": [50465, 760, 510, 50560], "temperature": 0.0, "avg_logprob": -0.09988653059486005, "compression_ratio": 1.7330677290836654, "no_speech_prob": 0.0007118039065971971}, {"id": 1360, "seek": 356444, "start": 3568.44, "end": 3570.34, "text": " and then let me uncomment these 3 lines", "tokens": [50565, 293, 550, 718, 385, 8585, 518, 613, 805, 3876, 50660], "temperature": 0.0, "avg_logprob": -0.09988653059486005, "compression_ratio": 1.7330677290836654, "no_speech_prob": 0.0007118039065971971}, {"id": 1361, "seek": 356444, "start": 3570.44, "end": 3572.34, "text": " to check that", "tokens": [50665, 281, 1520, 300, 50760], "temperature": 0.0, "avg_logprob": -0.09988653059486005, "compression_ratio": 1.7330677290836654, "no_speech_prob": 0.0007118039065971971}, {"id": 1362, "seek": 356444, "start": 3572.44, "end": 3574.34, "text": " we are getting the correct result", "tokens": [50765, 321, 366, 1242, 264, 3006, 1874, 50860], "temperature": 0.0, "avg_logprob": -0.09988653059486005, "compression_ratio": 1.7330677290836654, "no_speech_prob": 0.0007118039065971971}, {"id": 1363, "seek": 356444, "start": 3574.44, "end": 3576.34, "text": " for all the 3 tensors", "tokens": [50865, 337, 439, 264, 805, 10688, 830, 50960], "temperature": 0.0, "avg_logprob": -0.09988653059486005, "compression_ratio": 1.7330677290836654, "no_speech_prob": 0.0007118039065971971}, {"id": 1364, "seek": 356444, "start": 3576.44, "end": 3578.34, "text": " and indeed we see that all of that", "tokens": [50965, 293, 6451, 321, 536, 300, 439, 295, 300, 51060], "temperature": 0.0, "avg_logprob": -0.09988653059486005, "compression_ratio": 1.7330677290836654, "no_speech_prob": 0.0007118039065971971}, {"id": 1365, "seek": 356444, "start": 3578.44, "end": 3580.34, "text": " got backpropagated correctly.", "tokens": [51065, 658, 646, 79, 1513, 559, 770, 8944, 13, 51160], "temperature": 0.0, "avg_logprob": -0.09988653059486005, "compression_ratio": 1.7330677290836654, "no_speech_prob": 0.0007118039065971971}, {"id": 1366, "seek": 356444, "start": 3580.44, "end": 3582.34, "text": " So now we get to the batchnorm layer", "tokens": [51165, 407, 586, 321, 483, 281, 264, 15245, 13403, 4583, 51260], "temperature": 0.0, "avg_logprob": -0.09988653059486005, "compression_ratio": 1.7330677290836654, "no_speech_prob": 0.0007118039065971971}, {"id": 1367, "seek": 356444, "start": 3582.44, "end": 3584.34, "text": " we see how here bngain and bmbias", "tokens": [51265, 321, 536, 577, 510, 272, 872, 491, 293, 272, 2504, 4609, 51360], "temperature": 0.0, "avg_logprob": -0.09988653059486005, "compression_ratio": 1.7330677290836654, "no_speech_prob": 0.0007118039065971971}, {"id": 1368, "seek": 356444, "start": 3584.44, "end": 3586.34, "text": " are the primers so the backpropagation ends", "tokens": [51365, 366, 264, 2886, 433, 370, 264, 646, 79, 1513, 559, 399, 5314, 51460], "temperature": 0.0, 
"avg_logprob": -0.09988653059486005, "compression_ratio": 1.7330677290836654, "no_speech_prob": 0.0007118039065971971}, {"id": 1369, "seek": 356444, "start": 3586.44, "end": 3588.34, "text": " but bnraw now", "tokens": [51465, 457, 272, 77, 5131, 586, 51560], "temperature": 0.0, "avg_logprob": -0.09988653059486005, "compression_ratio": 1.7330677290836654, "no_speech_prob": 0.0007118039065971971}, {"id": 1370, "seek": 356444, "start": 3588.44, "end": 3590.34, "text": " is the output of the standardization", "tokens": [51565, 307, 264, 5598, 295, 264, 3832, 2144, 51660], "temperature": 0.0, "avg_logprob": -0.09988653059486005, "compression_ratio": 1.7330677290836654, "no_speech_prob": 0.0007118039065971971}, {"id": 1371, "seek": 356444, "start": 3590.44, "end": 3592.34, "text": " so here what I'm doing", "tokens": [51665, 370, 510, 437, 286, 478, 884, 51760], "temperature": 0.0, "avg_logprob": -0.09988653059486005, "compression_ratio": 1.7330677290836654, "no_speech_prob": 0.0007118039065971971}, {"id": 1372, "seek": 356444, "start": 3592.44, "end": 3594.34, "text": " of course is I'm breaking up the batchnorm", "tokens": [51765, 295, 1164, 307, 286, 478, 7697, 493, 264, 15245, 13403, 51860], "temperature": 0.0, "avg_logprob": -0.09988653059486005, "compression_ratio": 1.7330677290836654, "no_speech_prob": 0.0007118039065971971}, {"id": 1373, "seek": 359434, "start": 3594.34, "end": 3596.2400000000002, "text": " into manageable pieces so we can backpropagate", "tokens": [50365, 666, 38798, 3755, 370, 321, 393, 646, 79, 1513, 559, 473, 50460], "temperature": 0.0, "avg_logprob": -0.0859359077785326, "compression_ratio": 1.8205128205128205, "no_speech_prob": 0.0005510260816663504}, {"id": 1374, "seek": 359434, "start": 3596.34, "end": 3598.2400000000002, "text": " through each line individually", "tokens": [50465, 807, 1184, 1622, 16652, 50560], "temperature": 0.0, "avg_logprob": -0.0859359077785326, "compression_ratio": 1.8205128205128205, "no_speech_prob": 0.0005510260816663504}, {"id": 1375, "seek": 359434, "start": 3598.34, "end": 3600.2400000000002, "text": " but basically what's happening is", "tokens": [50565, 457, 1936, 437, 311, 2737, 307, 50660], "temperature": 0.0, "avg_logprob": -0.0859359077785326, "compression_ratio": 1.8205128205128205, "no_speech_prob": 0.0005510260816663504}, {"id": 1376, "seek": 359434, "start": 3600.34, "end": 3602.2400000000002, "text": " bnmeani is the sum", "tokens": [50665, 272, 77, 1398, 3782, 307, 264, 2408, 50760], "temperature": 0.0, "avg_logprob": -0.0859359077785326, "compression_ratio": 1.8205128205128205, "no_speech_prob": 0.0005510260816663504}, {"id": 1377, "seek": 359434, "start": 3602.34, "end": 3604.2400000000002, "text": " so this is the", "tokens": [50765, 370, 341, 307, 264, 50860], "temperature": 0.0, "avg_logprob": -0.0859359077785326, "compression_ratio": 1.8205128205128205, "no_speech_prob": 0.0005510260816663504}, {"id": 1378, "seek": 359434, "start": 3604.34, "end": 3606.2400000000002, "text": " bnmeani", "tokens": [50865, 272, 77, 1398, 3782, 50960], "temperature": 0.0, "avg_logprob": -0.0859359077785326, "compression_ratio": 1.8205128205128205, "no_speech_prob": 0.0005510260816663504}, {"id": 1379, "seek": 359434, "start": 3606.34, "end": 3608.2400000000002, "text": " I apologize for the variable naming", "tokens": [50965, 286, 12328, 337, 264, 7006, 25290, 51060], "temperature": 0.0, "avg_logprob": -0.0859359077785326, "compression_ratio": 1.8205128205128205, "no_speech_prob": 0.0005510260816663504}, {"id": 1380, "seek": 359434, 
"start": 3608.34, "end": 3610.2400000000002, "text": " bndiff is x minus mu", "tokens": [51065, 272, 273, 3661, 307, 2031, 3175, 2992, 51160], "temperature": 0.0, "avg_logprob": -0.0859359077785326, "compression_ratio": 1.8205128205128205, "no_speech_prob": 0.0005510260816663504}, {"id": 1381, "seek": 359434, "start": 3610.34, "end": 3612.2400000000002, "text": " bndiff2", "tokens": [51165, 272, 273, 3661, 17, 51260], "temperature": 0.0, "avg_logprob": -0.0859359077785326, "compression_ratio": 1.8205128205128205, "no_speech_prob": 0.0005510260816663504}, {"id": 1382, "seek": 359434, "start": 3612.34, "end": 3614.2400000000002, "text": " is x minus mu squared", "tokens": [51265, 307, 2031, 3175, 2992, 8889, 51360], "temperature": 0.0, "avg_logprob": -0.0859359077785326, "compression_ratio": 1.8205128205128205, "no_speech_prob": 0.0005510260816663504}, {"id": 1383, "seek": 359434, "start": 3614.34, "end": 3616.2400000000002, "text": " here inside the variance", "tokens": [51365, 510, 1854, 264, 21977, 51460], "temperature": 0.0, "avg_logprob": -0.0859359077785326, "compression_ratio": 1.8205128205128205, "no_speech_prob": 0.0005510260816663504}, {"id": 1384, "seek": 359434, "start": 3616.34, "end": 3618.2400000000002, "text": " bnvar is the variance", "tokens": [51465, 272, 77, 8517, 307, 264, 21977, 51560], "temperature": 0.0, "avg_logprob": -0.0859359077785326, "compression_ratio": 1.8205128205128205, "no_speech_prob": 0.0005510260816663504}, {"id": 1385, "seek": 359434, "start": 3618.34, "end": 3620.2400000000002, "text": " so sigma square", "tokens": [51565, 370, 12771, 3732, 51660], "temperature": 0.0, "avg_logprob": -0.0859359077785326, "compression_ratio": 1.8205128205128205, "no_speech_prob": 0.0005510260816663504}, {"id": 1386, "seek": 359434, "start": 3620.34, "end": 3622.2400000000002, "text": " this is bnvar", "tokens": [51665, 341, 307, 272, 77, 8517, 51760], "temperature": 0.0, "avg_logprob": -0.0859359077785326, "compression_ratio": 1.8205128205128205, "no_speech_prob": 0.0005510260816663504}, {"id": 1387, "seek": 359434, "start": 3622.34, "end": 3624.2400000000002, "text": " and it's basically the sum of squares", "tokens": [51765, 293, 309, 311, 1936, 264, 2408, 295, 19368, 51860], "temperature": 0.0, "avg_logprob": -0.0859359077785326, "compression_ratio": 1.8205128205128205, "no_speech_prob": 0.0005510260816663504}, {"id": 1388, "seek": 362434, "start": 3624.34, "end": 3626.2400000000002, "text": " so this is the x minus mu", "tokens": [50365, 370, 341, 307, 264, 2031, 3175, 2992, 50460], "temperature": 0.0, "avg_logprob": -0.07464762560025913, "compression_ratio": 1.7364016736401673, "no_speech_prob": 0.0009592284914106131}, {"id": 1389, "seek": 362434, "start": 3626.34, "end": 3628.2400000000002, "text": " squared", "tokens": [50465, 8889, 50560], "temperature": 0.0, "avg_logprob": -0.07464762560025913, "compression_ratio": 1.7364016736401673, "no_speech_prob": 0.0009592284914106131}, {"id": 1390, "seek": 362434, "start": 3628.34, "end": 3630.2400000000002, "text": " and then the sum", "tokens": [50565, 293, 550, 264, 2408, 50660], "temperature": 0.0, "avg_logprob": -0.07464762560025913, "compression_ratio": 1.7364016736401673, "no_speech_prob": 0.0009592284914106131}, {"id": 1391, "seek": 362434, "start": 3630.34, "end": 3632.2400000000002, "text": " now you'll notice one departure here", "tokens": [50665, 586, 291, 603, 3449, 472, 25866, 510, 50760], "temperature": 0.0, "avg_logprob": -0.07464762560025913, "compression_ratio": 1.7364016736401673, "no_speech_prob": 
Now, you'll notice one departure here: here it is normalized as 1 over m, which is the number of examples, but here I'm normalizing as 1 over n minus 1 instead of m. And this is deliberate, and I'll come back to that in a bit, when we are at this line. It is something called Bessel's correction, but this is how I want it in our case.
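Pieced together from this description, the first half of the broken-up batchnorm forward pass looks roughly like this; hprebn (the pre-batchnorm activations) and n (the batch size) are names assumed from the lecture:

```python
import torch

n = 32                        # batch size
hprebn = torch.randn(n, 64)   # stand-in pre-batchnorm activations

bnmeani = 1/n * hprebn.sum(0, keepdim=True)       # the mean mu, shape (1, 64)
bndiff  = hprebn - bnmeani                        # x minus mu
bndiff2 = bndiff**2                               # (x minus mu) squared
bnvar   = 1/(n-1) * bndiff2.sum(0, keepdim=True)  # variance, with Bessel's correction
```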
365434, "start": 3654.34, "end": 3656.2400000000002, "text": " plus epsilon", "tokens": [50365, 1804, 17889, 50460], "temperature": 0.0, "avg_logprob": -0.10116046970173465, "compression_ratio": 1.8601036269430051, "no_speech_prob": 0.0011661205207929015}, {"id": 1404, "seek": 365434, "start": 3656.34, "end": 3658.2400000000002, "text": " epsilon is 1 negative 5", "tokens": [50465, 17889, 307, 502, 3671, 1025, 50560], "temperature": 0.0, "avg_logprob": -0.10116046970173465, "compression_ratio": 1.8601036269430051, "no_speech_prob": 0.0011661205207929015}, {"id": 1405, "seek": 365434, "start": 3658.34, "end": 3660.2400000000002, "text": " and then it's 1 over square root", "tokens": [50565, 293, 550, 309, 311, 502, 670, 3732, 5593, 50660], "temperature": 0.0, "avg_logprob": -0.10116046970173465, "compression_ratio": 1.8601036269430051, "no_speech_prob": 0.0011661205207929015}, {"id": 1406, "seek": 365434, "start": 3660.34, "end": 3662.2400000000002, "text": " is the same as raising to the power of", "tokens": [50665, 307, 264, 912, 382, 11225, 281, 264, 1347, 295, 50760], "temperature": 0.0, "avg_logprob": -0.10116046970173465, "compression_ratio": 1.8601036269430051, "no_speech_prob": 0.0011661205207929015}, {"id": 1407, "seek": 365434, "start": 3662.34, "end": 3664.2400000000002, "text": " negative 0.5", "tokens": [50765, 3671, 1958, 13, 20, 50860], "temperature": 0.0, "avg_logprob": -0.10116046970173465, "compression_ratio": 1.8601036269430051, "no_speech_prob": 0.0011661205207929015}, {"id": 1408, "seek": 365434, "start": 3664.34, "end": 3666.2400000000002, "text": " because 0.5 is square root", "tokens": [50865, 570, 1958, 13, 20, 307, 3732, 5593, 50960], "temperature": 0.0, "avg_logprob": -0.10116046970173465, "compression_ratio": 1.8601036269430051, "no_speech_prob": 0.0011661205207929015}, {"id": 1409, "seek": 365434, "start": 3666.34, "end": 3668.2400000000002, "text": " and then negative makes it 1 over square root", "tokens": [50965, 293, 550, 3671, 1669, 309, 502, 670, 3732, 5593, 51060], "temperature": 0.0, "avg_logprob": -0.10116046970173465, "compression_ratio": 1.8601036269430051, "no_speech_prob": 0.0011661205207929015}, {"id": 1410, "seek": 365434, "start": 3668.34, "end": 3670.2400000000002, "text": " so bnvar inv", "tokens": [51065, 370, 272, 77, 8517, 1048, 51160], "temperature": 0.0, "avg_logprob": -0.10116046970173465, "compression_ratio": 1.8601036269430051, "no_speech_prob": 0.0011661205207929015}, {"id": 1411, "seek": 365434, "start": 3670.34, "end": 3672.2400000000002, "text": " is 1 over this denominator here", "tokens": [51165, 307, 502, 670, 341, 20687, 510, 51260], "temperature": 0.0, "avg_logprob": -0.10116046970173465, "compression_ratio": 1.8601036269430051, "no_speech_prob": 0.0011661205207929015}, {"id": 1412, "seek": 365434, "start": 3672.34, "end": 3674.2400000000002, "text": " and then we can see that", "tokens": [51265, 293, 550, 321, 393, 536, 300, 51360], "temperature": 0.0, "avg_logprob": -0.10116046970173465, "compression_ratio": 1.8601036269430051, "no_speech_prob": 0.0011661205207929015}, {"id": 1413, "seek": 365434, "start": 3674.34, "end": 3676.2400000000002, "text": " bnraw which is the x hat here", "tokens": [51365, 272, 77, 5131, 597, 307, 264, 2031, 2385, 510, 51460], "temperature": 0.0, "avg_logprob": -0.10116046970173465, "compression_ratio": 1.8601036269430051, "no_speech_prob": 0.0011661205207929015}, {"id": 1414, "seek": 365434, "start": 3676.34, "end": 3678.2400000000002, "text": " is equal to the", "tokens": [51465, 307, 2681, 281, 
264, 51560], "temperature": 0.0, "avg_logprob": -0.10116046970173465, "compression_ratio": 1.8601036269430051, "no_speech_prob": 0.0011661205207929015}, {"id": 1415, "seek": 365434, "start": 3678.34, "end": 3680.2400000000002, "text": " bndiff the numerator", "tokens": [51565, 272, 273, 3661, 264, 30380, 51660], "temperature": 0.0, "avg_logprob": -0.10116046970173465, "compression_ratio": 1.8601036269430051, "no_speech_prob": 0.0011661205207929015}, {"id": 1416, "seek": 365434, "start": 3680.34, "end": 3682.2400000000002, "text": " multiplied by the", "tokens": [51665, 17207, 538, 264, 51760], "temperature": 0.0, "avg_logprob": -0.10116046970173465, "compression_ratio": 1.8601036269430051, "no_speech_prob": 0.0011661205207929015}, {"id": 1417, "seek": 365434, "start": 3682.34, "end": 3684.2400000000002, "text": " bnvar inv", "tokens": [51765, 272, 77, 8517, 1048, 51860], "temperature": 0.0, "avg_logprob": -0.10116046970173465, "compression_ratio": 1.8601036269430051, "no_speech_prob": 0.0011661205207929015}, {"id": 1418, "seek": 368424, "start": 3684.24, "end": 3686.14, "text": " and this line here", "tokens": [50365, 293, 341, 1622, 510, 50460], "temperature": 0.0, "avg_logprob": -0.08388897326352784, "compression_ratio": 1.9313725490196079, "no_speech_prob": 0.002034225035458803}, {"id": 1419, "seek": 368424, "start": 3686.24, "end": 3688.14, "text": " that creates hpreact was the last piece", "tokens": [50465, 300, 7829, 276, 3712, 578, 390, 264, 1036, 2522, 50560], "temperature": 0.0, "avg_logprob": -0.08388897326352784, "compression_ratio": 1.9313725490196079, "no_speech_prob": 0.002034225035458803}, {"id": 1420, "seek": 368424, "start": 3688.24, "end": 3690.14, "text": " we've already backpropagated through it", "tokens": [50565, 321, 600, 1217, 646, 79, 1513, 559, 770, 807, 309, 50660], "temperature": 0.0, "avg_logprob": -0.08388897326352784, "compression_ratio": 1.9313725490196079, "no_speech_prob": 0.002034225035458803}, {"id": 1421, "seek": 368424, "start": 3690.24, "end": 3692.14, "text": " so now what we want to do", "tokens": [50665, 370, 586, 437, 321, 528, 281, 360, 50760], "temperature": 0.0, "avg_logprob": -0.08388897326352784, "compression_ratio": 1.9313725490196079, "no_speech_prob": 0.002034225035458803}, {"id": 1422, "seek": 368424, "start": 3692.24, "end": 3694.14, "text": " is we are here", "tokens": [50765, 307, 321, 366, 510, 50860], "temperature": 0.0, "avg_logprob": -0.08388897326352784, "compression_ratio": 1.9313725490196079, "no_speech_prob": 0.002034225035458803}, {"id": 1423, "seek": 368424, "start": 3694.24, "end": 3696.14, "text": " and we have bnraw", "tokens": [50865, 293, 321, 362, 272, 77, 5131, 50960], "temperature": 0.0, "avg_logprob": -0.08388897326352784, "compression_ratio": 1.9313725490196079, "no_speech_prob": 0.002034225035458803}, {"id": 1424, "seek": 368424, "start": 3696.24, "end": 3698.14, "text": " and we have to first backpropagate", "tokens": [50965, 293, 321, 362, 281, 700, 646, 79, 1513, 559, 473, 51060], "temperature": 0.0, "avg_logprob": -0.08388897326352784, "compression_ratio": 1.9313725490196079, "no_speech_prob": 0.002034225035458803}, {"id": 1425, "seek": 368424, "start": 3698.24, "end": 3700.14, "text": " into bndiff and bnvar inv", "tokens": [51065, 666, 272, 273, 3661, 293, 272, 77, 8517, 1048, 51160], "temperature": 0.0, "avg_logprob": -0.08388897326352784, "compression_ratio": 1.9313725490196079, "no_speech_prob": 0.002034225035458803}, {"id": 1426, "seek": 368424, "start": 3700.24, "end": 3702.14, "text": " so now we are 
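Continuing that sketch, the second half of the decomposition, ending in the line that was already backpropagated above, is then:

```python
bngain = torch.randn(1, 64)   # stand-in gain
bnbias = torch.randn(1, 64)   # stand-in bias

bnvar_inv = (bnvar + 1e-5)**-0.5     # 1 over sqrt(variance plus epsilon)
bnraw     = bndiff * bnvar_inv       # x hat, the standardized activations
hpreact   = bngain * bnraw + bnbias  # scale and shift, handled earlier
```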
here", "tokens": [51165, 370, 586, 321, 366, 510, 51260], "temperature": 0.0, "avg_logprob": -0.08388897326352784, "compression_ratio": 1.9313725490196079, "no_speech_prob": 0.002034225035458803}, {"id": 1427, "seek": 368424, "start": 3702.24, "end": 3704.14, "text": " and we have dbnraw", "tokens": [51265, 293, 321, 362, 274, 65, 77, 5131, 51360], "temperature": 0.0, "avg_logprob": -0.08388897326352784, "compression_ratio": 1.9313725490196079, "no_speech_prob": 0.002034225035458803}, {"id": 1428, "seek": 368424, "start": 3704.24, "end": 3706.14, "text": " and we need to backpropagate through this line", "tokens": [51365, 293, 321, 643, 281, 646, 79, 1513, 559, 473, 807, 341, 1622, 51460], "temperature": 0.0, "avg_logprob": -0.08388897326352784, "compression_ratio": 1.9313725490196079, "no_speech_prob": 0.002034225035458803}, {"id": 1429, "seek": 368424, "start": 3706.24, "end": 3708.14, "text": " now I've written out the shapes here", "tokens": [51465, 586, 286, 600, 3720, 484, 264, 10854, 510, 51560], "temperature": 0.0, "avg_logprob": -0.08388897326352784, "compression_ratio": 1.9313725490196079, "no_speech_prob": 0.002034225035458803}, {"id": 1430, "seek": 368424, "start": 3708.24, "end": 3710.14, "text": " and indeed", "tokens": [51565, 293, 6451, 51660], "temperature": 0.0, "avg_logprob": -0.08388897326352784, "compression_ratio": 1.9313725490196079, "no_speech_prob": 0.002034225035458803}, {"id": 1431, "seek": 368424, "start": 3710.24, "end": 3712.14, "text": " bnvar inv is a shape 1 by 64", "tokens": [51665, 272, 77, 8517, 1048, 307, 257, 3909, 502, 538, 12145, 51760], "temperature": 0.0, "avg_logprob": -0.08388897326352784, "compression_ratio": 1.9313725490196079, "no_speech_prob": 0.002034225035458803}, {"id": 1432, "seek": 368424, "start": 3712.24, "end": 3714.14, "text": " so there is a", "tokens": [51765, 370, 456, 307, 257, 51860], "temperature": 0.0, "avg_logprob": -0.08388897326352784, "compression_ratio": 1.9313725490196079, "no_speech_prob": 0.002034225035458803}, {"id": 1433, "seek": 371414, "start": 3714.14, "end": 3716.04, "text": " little bit of broadcasting happening here", "tokens": [50365, 707, 857, 295, 30024, 2737, 510, 50460], "temperature": 0.0, "avg_logprob": -0.08366194024550176, "compression_ratio": 1.7761194029850746, "no_speech_prob": 0.0008978394325822592}, {"id": 1434, "seek": 371414, "start": 3716.14, "end": 3718.04, "text": " that we have to be careful with", "tokens": [50465, 300, 321, 362, 281, 312, 5026, 365, 50560], "temperature": 0.0, "avg_logprob": -0.08366194024550176, "compression_ratio": 1.7761194029850746, "no_speech_prob": 0.0008978394325822592}, {"id": 1435, "seek": 371414, "start": 3718.14, "end": 3720.04, "text": " but it is just an elementwise simple multiplication", "tokens": [50565, 457, 309, 307, 445, 364, 4478, 3711, 2199, 27290, 50660], "temperature": 0.0, "avg_logprob": -0.08366194024550176, "compression_ratio": 1.7761194029850746, "no_speech_prob": 0.0008978394325822592}, {"id": 1436, "seek": 371414, "start": 3720.14, "end": 3722.04, "text": " by now we should be pretty comfortable with that", "tokens": [50665, 538, 586, 321, 820, 312, 1238, 4619, 365, 300, 50760], "temperature": 0.0, "avg_logprob": -0.08366194024550176, "compression_ratio": 1.7761194029850746, "no_speech_prob": 0.0008978394325822592}, {"id": 1437, "seek": 371414, "start": 3722.14, "end": 3724.04, "text": " to get dbndiff", "tokens": [50765, 281, 483, 274, 65, 273, 3661, 50860], "temperature": 0.0, "avg_logprob": -0.08366194024550176, "compression_ratio": 
To get dbndiff, we know that this is just bnvar_inv multiplied with dbnraw, and conversely, to get dbnvar_inv, we need to take bndiff and multiply that by dbnraw. So this is the candidate, but of course we need to make sure that broadcasting is obeyed. So in particular,
"start": 3750.04, "end": 3751.94, "text": " bnvar inv multiplying with dbnraw", "tokens": [50665, 272, 77, 8517, 1048, 30955, 365, 274, 65, 77, 5131, 50760], "temperature": 0.0, "avg_logprob": -0.08137410147148266, "compression_ratio": 1.7445652173913044, "no_speech_prob": 0.0005707687814719975}, {"id": 1451, "seek": 374404, "start": 3752.04, "end": 3753.94, "text": " will be okay", "tokens": [50765, 486, 312, 1392, 50860], "temperature": 0.0, "avg_logprob": -0.08137410147148266, "compression_ratio": 1.7445652173913044, "no_speech_prob": 0.0005707687814719975}, {"id": 1452, "seek": 374404, "start": 3754.04, "end": 3755.94, "text": " and give us 32 by 64 as we expect", "tokens": [50865, 293, 976, 505, 8858, 538, 12145, 382, 321, 2066, 50960], "temperature": 0.0, "avg_logprob": -0.08137410147148266, "compression_ratio": 1.7445652173913044, "no_speech_prob": 0.0005707687814719975}, {"id": 1453, "seek": 374404, "start": 3756.04, "end": 3757.94, "text": " but dbnvar inv", "tokens": [50965, 457, 274, 65, 77, 8517, 1048, 51060], "temperature": 0.0, "avg_logprob": -0.08137410147148266, "compression_ratio": 1.7445652173913044, "no_speech_prob": 0.0005707687814719975}, {"id": 1454, "seek": 374404, "start": 3758.04, "end": 3759.94, "text": " would be taking", "tokens": [51065, 576, 312, 1940, 51160], "temperature": 0.0, "avg_logprob": -0.08137410147148266, "compression_ratio": 1.7445652173913044, "no_speech_prob": 0.0005707687814719975}, {"id": 1455, "seek": 374404, "start": 3760.04, "end": 3761.94, "text": " a 32 by 64", "tokens": [51165, 257, 8858, 538, 12145, 51260], "temperature": 0.0, "avg_logprob": -0.08137410147148266, "compression_ratio": 1.7445652173913044, "no_speech_prob": 0.0005707687814719975}, {"id": 1456, "seek": 374404, "start": 3762.04, "end": 3763.94, "text": " multiplying it by", "tokens": [51265, 30955, 309, 538, 51360], "temperature": 0.0, "avg_logprob": -0.08137410147148266, "compression_ratio": 1.7445652173913044, "no_speech_prob": 0.0005707687814719975}, {"id": 1457, "seek": 374404, "start": 3764.04, "end": 3765.94, "text": " 32 by 64", "tokens": [51365, 8858, 538, 12145, 51460], "temperature": 0.0, "avg_logprob": -0.08137410147148266, "compression_ratio": 1.7445652173913044, "no_speech_prob": 0.0005707687814719975}, {"id": 1458, "seek": 374404, "start": 3766.04, "end": 3767.94, "text": " so this is a 32 by 64", "tokens": [51465, 370, 341, 307, 257, 8858, 538, 12145, 51560], "temperature": 0.0, "avg_logprob": -0.08137410147148266, "compression_ratio": 1.7445652173913044, "no_speech_prob": 0.0005707687814719975}, {"id": 1459, "seek": 374404, "start": 3768.04, "end": 3769.94, "text": " but of course this bnvar inv", "tokens": [51565, 457, 295, 1164, 341, 272, 77, 8517, 1048, 51660], "temperature": 0.0, "avg_logprob": -0.08137410147148266, "compression_ratio": 1.7445652173913044, "no_speech_prob": 0.0005707687814719975}, {"id": 1460, "seek": 374404, "start": 3770.04, "end": 3771.94, "text": " is only 1 by 64", "tokens": [51665, 307, 787, 502, 538, 12145, 51760], "temperature": 0.0, "avg_logprob": -0.08137410147148266, "compression_ratio": 1.7445652173913044, "no_speech_prob": 0.0005707687814719975}, {"id": 1461, "seek": 374404, "start": 3772.04, "end": 3773.94, "text": " so this second line here", "tokens": [51765, 370, 341, 1150, 1622, 510, 51860], "temperature": 0.0, "avg_logprob": -0.08137410147148266, "compression_ratio": 1.7445652173913044, "no_speech_prob": 0.0005707687814719975}, {"id": 1462, "seek": 377394, "start": 3773.94, "end": 3775.84, "text": " needs a sum across 
the examples, and because of this dimension here, we need to make sure that keepdim is true. So this is the candidate. Let's erase this, and let's swing down here and implement it, and then let's comment out dbnvar_inv and dbndiff.
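Continuing the stand-in sketch, the two candidate gradients through bnraw = bndiff * bnvar_inv are:

```python
dbnraw = torch.randn(32, 64)   # stand-in for the gradient flowing into bnraw

# for c = a * b, the gradient of each factor is the other factor,
# times the chain rule
dbndiff    = bnvar_inv * dbnraw                      # (1, 64) broadcasts up to (32, 64)

# bnvar_inv was replicated across the 32 examples in the forward pass,
# so its gradient must sum back across dim 0, keeping the dimension
dbnvar_inv = (bndiff * dbnraw).sum(0, keepdim=True)  # (1, 64)
```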
377394, "start": 3799.94, "end": 3801.84, "text": " is going to be incorrect", "tokens": [51665, 307, 516, 281, 312, 18424, 51760], "temperature": 0.0, "avg_logprob": -0.07127812504768372, "compression_ratio": 1.72, "no_speech_prob": 0.0002474517095834017}, {"id": 1476, "seek": 377394, "start": 3801.94, "end": 3803.84, "text": " so when I run this", "tokens": [51765, 370, 562, 286, 1190, 341, 51860], "temperature": 0.0, "avg_logprob": -0.07127812504768372, "compression_ratio": 1.72, "no_speech_prob": 0.0002474517095834017}, {"id": 1477, "seek": 380384, "start": 3803.84, "end": 3805.7400000000002, "text": " bnvar inv is correct", "tokens": [50365, 272, 77, 8517, 1048, 307, 3006, 50460], "temperature": 0.0, "avg_logprob": -0.05908088250593706, "compression_ratio": 2.015, "no_speech_prob": 0.00043367300531826913}, {"id": 1478, "seek": 380384, "start": 3805.84, "end": 3807.7400000000002, "text": " bndiff is not correct", "tokens": [50465, 272, 273, 3661, 307, 406, 3006, 50560], "temperature": 0.0, "avg_logprob": -0.05908088250593706, "compression_ratio": 2.015, "no_speech_prob": 0.00043367300531826913}, {"id": 1479, "seek": 380384, "start": 3807.84, "end": 3809.7400000000002, "text": " and this is actually expected", "tokens": [50565, 293, 341, 307, 767, 5176, 50660], "temperature": 0.0, "avg_logprob": -0.05908088250593706, "compression_ratio": 2.015, "no_speech_prob": 0.00043367300531826913}, {"id": 1480, "seek": 380384, "start": 3809.84, "end": 3811.7400000000002, "text": " because we're not done", "tokens": [50665, 570, 321, 434, 406, 1096, 50760], "temperature": 0.0, "avg_logprob": -0.05908088250593706, "compression_ratio": 2.015, "no_speech_prob": 0.00043367300531826913}, {"id": 1481, "seek": 380384, "start": 3811.84, "end": 3813.7400000000002, "text": " with bndiff", "tokens": [50765, 365, 272, 273, 3661, 50860], "temperature": 0.0, "avg_logprob": -0.05908088250593706, "compression_ratio": 2.015, "no_speech_prob": 0.00043367300531826913}, {"id": 1482, "seek": 380384, "start": 3813.84, "end": 3815.7400000000002, "text": " so in particular when we slide here", "tokens": [50865, 370, 294, 1729, 562, 321, 4137, 510, 50960], "temperature": 0.0, "avg_logprob": -0.05908088250593706, "compression_ratio": 2.015, "no_speech_prob": 0.00043367300531826913}, {"id": 1483, "seek": 380384, "start": 3815.84, "end": 3817.7400000000002, "text": " we see here that bnraw is a function of bndiff", "tokens": [50965, 321, 536, 510, 300, 272, 77, 5131, 307, 257, 2445, 295, 272, 273, 3661, 51060], "temperature": 0.0, "avg_logprob": -0.05908088250593706, "compression_ratio": 2.015, "no_speech_prob": 0.00043367300531826913}, {"id": 1484, "seek": 380384, "start": 3817.84, "end": 3819.7400000000002, "text": " but actually bnvar inv", "tokens": [51065, 457, 767, 272, 77, 8517, 1048, 51160], "temperature": 0.0, "avg_logprob": -0.05908088250593706, "compression_ratio": 2.015, "no_speech_prob": 0.00043367300531826913}, {"id": 1485, "seek": 380384, "start": 3819.84, "end": 3821.7400000000002, "text": " is a function of bnvar", "tokens": [51165, 307, 257, 2445, 295, 272, 77, 8517, 51260], "temperature": 0.0, "avg_logprob": -0.05908088250593706, "compression_ratio": 2.015, "no_speech_prob": 0.00043367300531826913}, {"id": 1486, "seek": 380384, "start": 3821.84, "end": 3823.7400000000002, "text": " which is a function of bndiff too", "tokens": [51265, 597, 307, 257, 2445, 295, 272, 273, 3661, 886, 51360], "temperature": 0.0, "avg_logprob": -0.05908088250593706, "compression_ratio": 2.015, "no_speech_prob": 
For now, it is good to verify that cmp also works: it doesn't just lie to us and tell us that everything is always correct; it can in fact detect when your gradient is not correct, so that's good to see as well.
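For reference, a comparison utility along these lines (a sketch; the lecture's actual cmp may differ in formatting) checks a manually derived gradient dt against PyTorch's autograd result t.grad:

```python
import torch

def cmp(s, dt, t):
    # exact: bit-for-bit equality; approximate: equal up to floating-point tolerance
    ex = torch.all(dt == t.grad).item()
    app = torch.allclose(dt, t.grad)
    maxdiff = (dt - t.grad).abs().max().item()
    print(f'{s:15s} | exact: {str(ex):5s} | approximate: {str(app):5s} | maxdiff: {maxdiff}')
```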
"text": " it doesn't just lie to us", "tokens": [50965, 309, 1177, 380, 445, 4544, 281, 505, 51060], "temperature": 0.0, "avg_logprob": -0.0598001776274687, "compression_ratio": 1.802547770700637, "no_speech_prob": 0.0007826514192856848}, {"id": 1499, "seek": 383374, "start": 3847.74, "end": 3849.64, "text": " and tell us that everything is always correct", "tokens": [51065, 293, 980, 505, 300, 1203, 307, 1009, 3006, 51160], "temperature": 0.0, "avg_logprob": -0.0598001776274687, "compression_ratio": 1.802547770700637, "no_speech_prob": 0.0007826514192856848}, {"id": 1500, "seek": 383374, "start": 3849.74, "end": 3851.64, "text": " it can in fact detect when your", "tokens": [51165, 309, 393, 294, 1186, 5531, 562, 428, 51260], "temperature": 0.0, "avg_logprob": -0.0598001776274687, "compression_ratio": 1.802547770700637, "no_speech_prob": 0.0007826514192856848}, {"id": 1501, "seek": 383374, "start": 3851.74, "end": 3853.64, "text": " gradient is not correct", "tokens": [51265, 16235, 307, 406, 3006, 51360], "temperature": 0.0, "avg_logprob": -0.0598001776274687, "compression_ratio": 1.802547770700637, "no_speech_prob": 0.0007826514192856848}, {"id": 1502, "seek": 383374, "start": 3853.74, "end": 3855.64, "text": " so that's good to see as well", "tokens": [51365, 370, 300, 311, 665, 281, 536, 382, 731, 51460], "temperature": 0.0, "avg_logprob": -0.0598001776274687, "compression_ratio": 1.802547770700637, "no_speech_prob": 0.0007826514192856848}, {"id": 1503, "seek": 383374, "start": 3855.74, "end": 3857.64, "text": " okay so now we have the derivative here", "tokens": [51465, 1392, 370, 586, 321, 362, 264, 13760, 510, 51560], "temperature": 0.0, "avg_logprob": -0.0598001776274687, "compression_ratio": 1.802547770700637, "no_speech_prob": 0.0007826514192856848}, {"id": 1504, "seek": 383374, "start": 3857.74, "end": 3859.64, "text": " and we're trying to backpropagate through this line", "tokens": [51565, 293, 321, 434, 1382, 281, 646, 79, 1513, 559, 473, 807, 341, 1622, 51660], "temperature": 0.0, "avg_logprob": -0.0598001776274687, "compression_ratio": 1.802547770700637, "no_speech_prob": 0.0007826514192856848}, {"id": 1505, "seek": 383374, "start": 3859.74, "end": 3861.64, "text": " and because we're raising to a power of negative 0.5", "tokens": [51665, 293, 570, 321, 434, 11225, 281, 257, 1347, 295, 3671, 1958, 13, 20, 51760], "temperature": 0.0, "avg_logprob": -0.0598001776274687, "compression_ratio": 1.802547770700637, "no_speech_prob": 0.0007826514192856848}, {"id": 1506, "seek": 383374, "start": 3861.74, "end": 3863.64, "text": " I brought up the power rule", "tokens": [51765, 286, 3038, 493, 264, 1347, 4978, 51860], "temperature": 0.0, "avg_logprob": -0.0598001776274687, "compression_ratio": 1.802547770700637, "no_speech_prob": 0.0007826514192856848}, {"id": 1507, "seek": 386364, "start": 3863.64, "end": 3865.54, "text": " and we see that basically we have that", "tokens": [50365, 293, 321, 536, 300, 1936, 321, 362, 300, 50460], "temperature": 0.0, "avg_logprob": -0.07744662723844013, "compression_ratio": 1.7489711934156378, "no_speech_prob": 0.0008170650689862669}, {"id": 1508, "seek": 386364, "start": 3865.64, "end": 3867.54, "text": " the bnvar will now be", "tokens": [50465, 264, 272, 77, 8517, 486, 586, 312, 50560], "temperature": 0.0, "avg_logprob": -0.07744662723844013, "compression_ratio": 1.7489711934156378, "no_speech_prob": 0.0008170650689862669}, {"id": 1509, "seek": 386364, "start": 3867.64, "end": 3869.54, "text": " we bring down the exponent", "tokens": [50565, 321, 
Now, we would also have to apply a small chain rule here in our head, because we need to take the derivative of bnvar further, with respect to this expression here inside the bracket. But because this is an element-wise operation, everything is fairly simple: that local derivative is just one, and so there's nothing to do there. So this is the local derivative, and then times the global derivative to create the chain rule; this is just times dbnvar_inv.
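Putting the power rule and the trivial inner chain rule together (a sketch, assuming the lecture's names and the batch norm epsilon of 1e-5 in the forward pass bnvar_inv = (bnvar + 1e-5)**-0.5):

```python
import torch

bnvar = torch.rand(1, 64)        # stand-in for the batch variance
dbnvar_inv = torch.randn(1, 64)  # incoming gradient on bnvar_inv

# d/dx x^(-0.5) = -0.5 * x^(-1.5); the derivative of (bnvar + 1e-5) w.r.t. bnvar is just 1
dbnvar = (-0.5 * (bnvar + 1e-5) ** -1.5) * dbnvar_inv
print(dbnvar.shape)  # torch.Size([1, 64])
```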
So this is our candidate. Let me bring this down and uncomment the check, and we see that we have the correct result.
"text": " we have the correct result", "tokens": [51465, 321, 362, 264, 3006, 1874, 51560], "temperature": 0.0, "avg_logprob": -0.06547782177061547, "compression_ratio": 1.810483870967742, "no_speech_prob": 0.0008935903897508979}, {"id": 1534, "seek": 389354, "start": 3917.54, "end": 3919.44, "text": " now before we backpropagate through the next line", "tokens": [51565, 586, 949, 321, 646, 79, 1513, 559, 473, 807, 264, 958, 1622, 51660], "temperature": 0.0, "avg_logprob": -0.06547782177061547, "compression_ratio": 1.810483870967742, "no_speech_prob": 0.0008935903897508979}, {"id": 1535, "seek": 389354, "start": 3919.54, "end": 3921.44, "text": " I want to briefly talk about the node here", "tokens": [51665, 286, 528, 281, 10515, 751, 466, 264, 9984, 510, 51760], "temperature": 0.0, "avg_logprob": -0.06547782177061547, "compression_ratio": 1.810483870967742, "no_speech_prob": 0.0008935903897508979}, {"id": 1536, "seek": 389354, "start": 3921.54, "end": 3923.44, "text": " where I'm using the Bessel's correction", "tokens": [51765, 689, 286, 478, 1228, 264, 363, 47166, 311, 19984, 51860], "temperature": 0.0, "avg_logprob": -0.06547782177061547, "compression_ratio": 1.810483870967742, "no_speech_prob": 0.0008935903897508979}, {"id": 1537, "seek": 392344, "start": 3923.44, "end": 3925.34, "text": " which is 1 over n minus 1", "tokens": [50365, 597, 307, 502, 670, 297, 3175, 502, 50460], "temperature": 0.0, "avg_logprob": -0.09857664035476801, "compression_ratio": 1.9634703196347032, "no_speech_prob": 0.0014310242841020226}, {"id": 1538, "seek": 392344, "start": 3925.44, "end": 3927.34, "text": " instead of dividing by n", "tokens": [50465, 2602, 295, 26764, 538, 297, 50560], "temperature": 0.0, "avg_logprob": -0.09857664035476801, "compression_ratio": 1.9634703196347032, "no_speech_prob": 0.0014310242841020226}, {"id": 1539, "seek": 392344, "start": 3927.44, "end": 3929.34, "text": " when I normalize here", "tokens": [50565, 562, 286, 2710, 1125, 510, 50660], "temperature": 0.0, "avg_logprob": -0.09857664035476801, "compression_ratio": 1.9634703196347032, "no_speech_prob": 0.0014310242841020226}, {"id": 1540, "seek": 392344, "start": 3929.44, "end": 3931.34, "text": " the sum of squares", "tokens": [50665, 264, 2408, 295, 19368, 50760], "temperature": 0.0, "avg_logprob": -0.09857664035476801, "compression_ratio": 1.9634703196347032, "no_speech_prob": 0.0014310242841020226}, {"id": 1541, "seek": 392344, "start": 3931.44, "end": 3933.34, "text": " now you'll notice that this is a departure from the paper", "tokens": [50765, 586, 291, 603, 3449, 300, 341, 307, 257, 25866, 490, 264, 3035, 50860], "temperature": 0.0, "avg_logprob": -0.09857664035476801, "compression_ratio": 1.9634703196347032, "no_speech_prob": 0.0014310242841020226}, {"id": 1542, "seek": 392344, "start": 3933.44, "end": 3935.34, "text": " which uses 1 over n instead", "tokens": [50865, 597, 4960, 502, 670, 297, 2602, 50960], "temperature": 0.0, "avg_logprob": -0.09857664035476801, "compression_ratio": 1.9634703196347032, "no_speech_prob": 0.0014310242841020226}, {"id": 1543, "seek": 392344, "start": 3935.44, "end": 3937.34, "text": " not 1 over n minus 1", "tokens": [50965, 406, 502, 670, 297, 3175, 502, 51060], "temperature": 0.0, "avg_logprob": -0.09857664035476801, "compression_ratio": 1.9634703196347032, "no_speech_prob": 0.0014310242841020226}, {"id": 1544, "seek": 392344, "start": 3937.44, "end": 3939.34, "text": " there m is rn", "tokens": [51065, 456, 275, 307, 367, 77, 51160], "temperature": 0.0, "avg_logprob": 
So it turns out that there are two ways of estimating the variance of an array: one is the biased estimate, which is 1 over n, and the other one is the unbiased estimate, which is 1 over n minus 1.
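Concretely, for a 1D array the two estimates differ only in the denominator (a minimal sketch):

```python
import torch

x = torch.randn(32)
n = x.numel()
mu = x.mean()
var_biased = ((x - mu) ** 2).sum() / n          # 1/n: the biased estimate
var_unbiased = ((x - mu) ** 2).sum() / (n - 1)  # 1/(n-1): Bessel's correction
print(torch.allclose(var_unbiased, x.var(unbiased=True)))  # True
```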
{"id": 1556, "seek": 395334, "start": 3961.34, "end": 3963.2400000000002, "text": " but later when they are talking about the inference", "tokens": [50765, 457, 1780, 562, 436, 366, 1417, 466, 264, 38253, 50860], "temperature": 0.0, "avg_logprob": -0.09122179308508196, "compression_ratio": 1.944206008583691, "no_speech_prob": 0.0017433133907616138}, {"id": 1557, "seek": 395334, "start": 3963.34, "end": 3965.2400000000002, "text": " they are mentioning that", "tokens": [50865, 436, 366, 18315, 300, 50960], "temperature": 0.0, "avg_logprob": -0.09122179308508196, "compression_ratio": 1.944206008583691, "no_speech_prob": 0.0017433133907616138}, {"id": 1558, "seek": 395334, "start": 3965.34, "end": 3967.2400000000002, "text": " when they do the inference", "tokens": [50965, 562, 436, 360, 264, 38253, 51060], "temperature": 0.0, "avg_logprob": -0.09122179308508196, "compression_ratio": 1.944206008583691, "no_speech_prob": 0.0017433133907616138}, {"id": 1559, "seek": 395334, "start": 3967.34, "end": 3969.2400000000002, "text": " they are using the unbiased estimate", "tokens": [51065, 436, 366, 1228, 264, 517, 5614, 1937, 12539, 51160], "temperature": 0.0, "avg_logprob": -0.09122179308508196, "compression_ratio": 1.944206008583691, "no_speech_prob": 0.0017433133907616138}, {"id": 1560, "seek": 395334, "start": 3969.34, "end": 3971.2400000000002, "text": " which is the n minus 1 version", "tokens": [51165, 597, 307, 264, 297, 3175, 502, 3037, 51260], "temperature": 0.0, "avg_logprob": -0.09122179308508196, "compression_ratio": 1.944206008583691, "no_speech_prob": 0.0017433133907616138}, {"id": 1561, "seek": 395334, "start": 3971.34, "end": 3973.2400000000002, "text": " in basically", "tokens": [51265, 294, 1936, 51360], "temperature": 0.0, "avg_logprob": -0.09122179308508196, "compression_ratio": 1.944206008583691, "no_speech_prob": 0.0017433133907616138}, {"id": 1562, "seek": 395334, "start": 3973.34, "end": 3975.2400000000002, "text": " for inference", "tokens": [51365, 337, 38253, 51460], "temperature": 0.0, "avg_logprob": -0.09122179308508196, "compression_ratio": 1.944206008583691, "no_speech_prob": 0.0017433133907616138}, {"id": 1563, "seek": 395334, "start": 3975.34, "end": 3977.2400000000002, "text": " and to calibrate the running mean", "tokens": [51465, 293, 281, 21583, 4404, 264, 2614, 914, 51560], "temperature": 0.0, "avg_logprob": -0.09122179308508196, "compression_ratio": 1.944206008583691, "no_speech_prob": 0.0017433133907616138}, {"id": 1564, "seek": 395334, "start": 3977.34, "end": 3979.2400000000002, "text": " and the running variance basically", "tokens": [51565, 293, 264, 2614, 21977, 1936, 51660], "temperature": 0.0, "avg_logprob": -0.09122179308508196, "compression_ratio": 1.944206008583691, "no_speech_prob": 0.0017433133907616138}, {"id": 1565, "seek": 395334, "start": 3979.34, "end": 3981.2400000000002, "text": " and so they actually introduce", "tokens": [51665, 293, 370, 436, 767, 5366, 51760], "temperature": 0.0, "avg_logprob": -0.09122179308508196, "compression_ratio": 1.944206008583691, "no_speech_prob": 0.0017433133907616138}, {"id": 1566, "seek": 395334, "start": 3981.34, "end": 3983.2400000000002, "text": " a train test mismatch", "tokens": [51765, 257, 3847, 1500, 23220, 852, 51860], "temperature": 0.0, "avg_logprob": -0.09122179308508196, "compression_ratio": 1.944206008583691, "no_speech_prob": 0.0017433133907616138}, {"id": 1567, "seek": 398334, "start": 3983.34, "end": 3985.2400000000002, "text": " where in training they use the biased version", "tokens": 
You can read more about the Bessel's correction and why dividing by n minus 1 gives you a better estimate of the variance in the case where you have samples from a population that are very small. And that is indeed the case for us, because we are dealing with mini-batches, and these mini-batches are a small sample of a larger population, which is the entire training set.
And it turns out that if you just estimate the variance using 1 over n, that actually almost always underestimates it; it is a biased estimator, and it is advised that you use the unbiased version and divide by n minus 1 instead. You can go through this article here, which I liked and which actually describes the full reasoning, and I'll link it in the video description.
"compression_ratio": 1.9312977099236641, "no_speech_prob": 0.002041826955974102}, {"id": 1590, "seek": 401324, "start": 4029.24, "end": 4031.14, "text": " that I liked that actually describes", "tokens": [51165, 300, 286, 4501, 300, 767, 15626, 51260], "temperature": 0.0, "avg_logprob": -0.1048717703138079, "compression_ratio": 1.9312977099236641, "no_speech_prob": 0.002041826955974102}, {"id": 1591, "seek": 401324, "start": 4031.24, "end": 4033.14, "text": " the fall of reasoning", "tokens": [51265, 264, 2100, 295, 21577, 51360], "temperature": 0.0, "avg_logprob": -0.1048717703138079, "compression_ratio": 1.9312977099236641, "no_speech_prob": 0.002041826955974102}, {"id": 1592, "seek": 401324, "start": 4033.24, "end": 4035.14, "text": " and I'll link it in the video description", "tokens": [51365, 293, 286, 603, 2113, 309, 294, 264, 960, 3855, 51460], "temperature": 0.0, "avg_logprob": -0.1048717703138079, "compression_ratio": 1.9312977099236641, "no_speech_prob": 0.002041826955974102}, {"id": 1593, "seek": 401324, "start": 4035.24, "end": 4037.14, "text": " now when you calculate the torshta variance", "tokens": [51465, 586, 562, 291, 8873, 264, 3930, 82, 39903, 21977, 51560], "temperature": 0.0, "avg_logprob": -0.1048717703138079, "compression_ratio": 1.9312977099236641, "no_speech_prob": 0.002041826955974102}, {"id": 1594, "seek": 401324, "start": 4037.24, "end": 4039.14, "text": " you'll notice that they take the unbiased flag", "tokens": [51565, 291, 603, 3449, 300, 436, 747, 264, 517, 5614, 1937, 7166, 51660], "temperature": 0.0, "avg_logprob": -0.1048717703138079, "compression_ratio": 1.9312977099236641, "no_speech_prob": 0.002041826955974102}, {"id": 1595, "seek": 401324, "start": 4039.24, "end": 4041.14, "text": " whether or not you want to divide by n", "tokens": [51665, 1968, 420, 406, 291, 528, 281, 9845, 538, 297, 51760], "temperature": 0.0, "avg_logprob": -0.1048717703138079, "compression_ratio": 1.9312977099236641, "no_speech_prob": 0.002041826955974102}, {"id": 1596, "seek": 401324, "start": 4041.24, "end": 4043.14, "text": " or n minus 1", "tokens": [51765, 420, 297, 3175, 502, 51860], "temperature": 0.0, "avg_logprob": -0.1048717703138079, "compression_ratio": 1.9312977099236641, "no_speech_prob": 0.002041826955974102}, {"id": 1597, "seek": 404314, "start": 4043.14, "end": 4045.04, "text": " so the default is for unbiased", "tokens": [50365, 370, 264, 7576, 307, 337, 517, 5614, 1937, 50460], "temperature": 0.0, "avg_logprob": -0.0856648710437287, "compression_ratio": 1.8072727272727274, "no_speech_prob": 0.0005388744757510722}, {"id": 1598, "seek": 404314, "start": 4045.14, "end": 4047.04, "text": " but I believe unbiased by default", "tokens": [50465, 457, 286, 1697, 517, 5614, 1937, 538, 7576, 50560], "temperature": 0.0, "avg_logprob": -0.0856648710437287, "compression_ratio": 1.8072727272727274, "no_speech_prob": 0.0005388744757510722}, {"id": 1599, "seek": 404314, "start": 4047.14, "end": 4049.04, "text": " is true", "tokens": [50565, 307, 2074, 50660], "temperature": 0.0, "avg_logprob": -0.0856648710437287, "compression_ratio": 1.8072727272727274, "no_speech_prob": 0.0005388744757510722}, {"id": 1600, "seek": 404314, "start": 4049.14, "end": 4051.04, "text": " I'm not sure why the docs here don't cite that", "tokens": [50665, 286, 478, 406, 988, 983, 264, 45623, 510, 500, 380, 37771, 300, 50760], "temperature": 0.0, "avg_logprob": -0.0856648710437287, "compression_ratio": 1.8072727272727274, "no_speech_prob": 0.0005388744757510722}, {"id": 1601, "seek": 404314, 
"start": 4051.14, "end": 4053.04, "text": " now in the batch norm", "tokens": [50765, 586, 294, 264, 15245, 2026, 50860], "temperature": 0.0, "avg_logprob": -0.0856648710437287, "compression_ratio": 1.8072727272727274, "no_speech_prob": 0.0005388744757510722}, {"id": 1602, "seek": 404314, "start": 4053.14, "end": 4055.04, "text": " 1 , the documentation again", "tokens": [50865, 502, 3182, 264, 14333, 797, 50960], "temperature": 0.0, "avg_logprob": -0.0856648710437287, "compression_ratio": 1.8072727272727274, "no_speech_prob": 0.0005388744757510722}, {"id": 1603, "seek": 404314, "start": 4055.14, "end": 4057.04, "text": " is kind of wrong and confusing", "tokens": [50965, 307, 733, 295, 2085, 293, 13181, 51060], "temperature": 0.0, "avg_logprob": -0.0856648710437287, "compression_ratio": 1.8072727272727274, "no_speech_prob": 0.0005388744757510722}, {"id": 1604, "seek": 404314, "start": 4057.14, "end": 4059.04, "text": " it says that the standard deviation is calculated", "tokens": [51065, 309, 1619, 300, 264, 3832, 25163, 307, 15598, 51160], "temperature": 0.0, "avg_logprob": -0.0856648710437287, "compression_ratio": 1.8072727272727274, "no_speech_prob": 0.0005388744757510722}, {"id": 1605, "seek": 404314, "start": 4059.14, "end": 4061.04, "text": " via the biased estimator", "tokens": [51165, 5766, 264, 28035, 8017, 1639, 51260], "temperature": 0.0, "avg_logprob": -0.0856648710437287, "compression_ratio": 1.8072727272727274, "no_speech_prob": 0.0005388744757510722}, {"id": 1606, "seek": 404314, "start": 4061.14, "end": 4063.04, "text": " but this is actually not exactly right", "tokens": [51265, 457, 341, 307, 767, 406, 2293, 558, 51360], "temperature": 0.0, "avg_logprob": -0.0856648710437287, "compression_ratio": 1.8072727272727274, "no_speech_prob": 0.0005388744757510722}, {"id": 1607, "seek": 404314, "start": 4063.14, "end": 4065.04, "text": " and people have pointed out that it is not right", "tokens": [51365, 293, 561, 362, 10932, 484, 300, 309, 307, 406, 558, 51460], "temperature": 0.0, "avg_logprob": -0.0856648710437287, "compression_ratio": 1.8072727272727274, "no_speech_prob": 0.0005388744757510722}, {"id": 1608, "seek": 404314, "start": 4065.14, "end": 4067.04, "text": " in a number of issues since then", "tokens": [51465, 294, 257, 1230, 295, 2663, 1670, 550, 51560], "temperature": 0.0, "avg_logprob": -0.0856648710437287, "compression_ratio": 1.8072727272727274, "no_speech_prob": 0.0005388744757510722}, {"id": 1609, "seek": 404314, "start": 4067.14, "end": 4069.04, "text": " because actually the rabbit hole is deeper", "tokens": [51565, 570, 767, 264, 19509, 5458, 307, 7731, 51660], "temperature": 0.0, "avg_logprob": -0.0856648710437287, "compression_ratio": 1.8072727272727274, "no_speech_prob": 0.0005388744757510722}, {"id": 1610, "seek": 404314, "start": 4069.14, "end": 4071.04, "text": " and they follow the paper exactly", "tokens": [51665, 293, 436, 1524, 264, 3035, 2293, 51760], "temperature": 0.0, "avg_logprob": -0.0856648710437287, "compression_ratio": 1.8072727272727274, "no_speech_prob": 0.0005388744757510722}, {"id": 1611, "seek": 404314, "start": 4071.14, "end": 4073.04, "text": " and they use the biased", "tokens": [51765, 293, 436, 764, 264, 28035, 51860], "temperature": 0.0, "avg_logprob": -0.0856648710437287, "compression_ratio": 1.8072727272727274, "no_speech_prob": 0.0005388744757510722}, {"id": 1612, "seek": 407304, "start": 4073.04, "end": 4074.94, "text": " version for training", "tokens": [50365, 3037, 337, 3097, 50460], "temperature": 0.0, "avg_logprob": 
So, long story short, I'm not a fan of train-test discrepancies. I basically consider the fact that we use the biased version at training time and the unbiased version at test time to be a bug; I don't think there's a good reason for it, and they don't really go into the detail of the reasoning behind it in this paper.
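The mismatch is easy to see directly (a sketch; momentum=1.0 is used here only so that the running statistics become exactly the last batch's statistics):

```python
import torch

torch.manual_seed(0)
x = torch.randn(32, 4)
bn = torch.nn.BatchNorm1d(4, momentum=1.0)
bn.train()
y = bn(x)
# the forward pass normalized with the biased (1/n) variance, so y has ~unit 1/n variance...
print(torch.allclose(y.var(0, unbiased=False), torch.ones(4), atol=1e-3))  # True
# ...but running_var was updated with the unbiased (1/(n-1)) estimate
print(torch.allclose(bn.running_var, x.var(0, unbiased=True)))  # True
```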
"compression_ratio": 1.9770992366412214, "no_speech_prob": 0.0023899846710264683}, {"id": 1624, "seek": 407304, "start": 4097.04, "end": 4098.94, "text": " it's not really", "tokens": [51565, 309, 311, 406, 534, 51660], "temperature": 0.0, "avg_logprob": -0.06852551724048371, "compression_ratio": 1.9770992366412214, "no_speech_prob": 0.0023899846710264683}, {"id": 1625, "seek": 407304, "start": 4099.04, "end": 4100.94, "text": " they don't really go into the detail", "tokens": [51665, 436, 500, 380, 534, 352, 666, 264, 2607, 51760], "temperature": 0.0, "avg_logprob": -0.06852551724048371, "compression_ratio": 1.9770992366412214, "no_speech_prob": 0.0023899846710264683}, {"id": 1626, "seek": 407304, "start": 4101.04, "end": 4102.94, "text": " of the reasoning behind it in this paper", "tokens": [51765, 295, 264, 21577, 2261, 309, 294, 341, 3035, 51860], "temperature": 0.0, "avg_logprob": -0.06852551724048371, "compression_ratio": 1.9770992366412214, "no_speech_prob": 0.0023899846710264683}, {"id": 1627, "seek": 410294, "start": 4102.94, "end": 4104.839999999999, "text": " I basically prefer to use the Bessel's correction", "tokens": [50365, 286, 1936, 4382, 281, 764, 264, 363, 47166, 311, 19984, 50460], "temperature": 0.0, "avg_logprob": -0.08467632819866312, "compression_ratio": 1.7424749163879598, "no_speech_prob": 0.0012157950550317764}, {"id": 1628, "seek": 410294, "start": 4104.94, "end": 4106.839999999999, "text": " in my own work", "tokens": [50465, 294, 452, 1065, 589, 50560], "temperature": 0.0, "avg_logprob": -0.08467632819866312, "compression_ratio": 1.7424749163879598, "no_speech_prob": 0.0012157950550317764}, {"id": 1629, "seek": 410294, "start": 4106.94, "end": 4108.839999999999, "text": " unfortunately BatchNorm does not take", "tokens": [50565, 7015, 363, 852, 45, 687, 775, 406, 747, 50660], "temperature": 0.0, "avg_logprob": -0.08467632819866312, "compression_ratio": 1.7424749163879598, "no_speech_prob": 0.0012157950550317764}, {"id": 1630, "seek": 410294, "start": 4108.94, "end": 4110.839999999999, "text": " a keyword argument that tells you whether or not", "tokens": [50665, 257, 20428, 6770, 300, 5112, 291, 1968, 420, 406, 50760], "temperature": 0.0, "avg_logprob": -0.08467632819866312, "compression_ratio": 1.7424749163879598, "no_speech_prob": 0.0012157950550317764}, {"id": 1631, "seek": 410294, "start": 4110.94, "end": 4112.839999999999, "text": " you want to use the unbiased version", "tokens": [50765, 291, 528, 281, 764, 264, 517, 5614, 1937, 3037, 50860], "temperature": 0.0, "avg_logprob": -0.08467632819866312, "compression_ratio": 1.7424749163879598, "no_speech_prob": 0.0012157950550317764}, {"id": 1632, "seek": 410294, "start": 4112.94, "end": 4114.839999999999, "text": " or the biased version in both train and test", "tokens": [50865, 420, 264, 28035, 3037, 294, 1293, 3847, 293, 1500, 50960], "temperature": 0.0, "avg_logprob": -0.08467632819866312, "compression_ratio": 1.7424749163879598, "no_speech_prob": 0.0012157950550317764}, {"id": 1633, "seek": 410294, "start": 4114.94, "end": 4116.839999999999, "text": " and so therefore anyone using BatchNormalization", "tokens": [50965, 293, 370, 4412, 2878, 1228, 363, 852, 45, 24440, 2144, 51060], "temperature": 0.0, "avg_logprob": -0.08467632819866312, "compression_ratio": 1.7424749163879598, "no_speech_prob": 0.0012157950550317764}, {"id": 1634, "seek": 410294, "start": 4116.94, "end": 4118.839999999999, "text": " basically in my view", "tokens": [51065, 1936, 294, 452, 1910, 51160], "temperature": 0.0, "avg_logprob": 
This turns out to be much less of a problem if your mini-batch sizes are a bit larger, but still, I just find it kind of unpalatable, so maybe someone can explain why this is okay.
"compression_ratio": 1.695167286245353, "no_speech_prob": 0.001452437718398869}, {"id": 1646, "seek": 413284, "start": 4140.84, "end": 4142.74, "text": " okay so let's now actually backpropagate", "tokens": [50765, 1392, 370, 718, 311, 586, 767, 646, 79, 1513, 559, 473, 50860], "temperature": 0.0, "avg_logprob": -0.1399412684970432, "compression_ratio": 1.695167286245353, "no_speech_prob": 0.001452437718398869}, {"id": 1647, "seek": 413284, "start": 4142.84, "end": 4144.74, "text": " through this line", "tokens": [50865, 807, 341, 1622, 50960], "temperature": 0.0, "avg_logprob": -0.1399412684970432, "compression_ratio": 1.695167286245353, "no_speech_prob": 0.001452437718398869}, {"id": 1648, "seek": 413284, "start": 4144.84, "end": 4146.74, "text": " so", "tokens": [50965, 370, 51060], "temperature": 0.0, "avg_logprob": -0.1399412684970432, "compression_ratio": 1.695167286245353, "no_speech_prob": 0.001452437718398869}, {"id": 1649, "seek": 413284, "start": 4146.84, "end": 4148.74, "text": " the first thing that I always like to do", "tokens": [51065, 264, 700, 551, 300, 286, 1009, 411, 281, 360, 51160], "temperature": 0.0, "avg_logprob": -0.1399412684970432, "compression_ratio": 1.695167286245353, "no_speech_prob": 0.001452437718398869}, {"id": 1650, "seek": 413284, "start": 4148.84, "end": 4150.74, "text": " is I like to scrutinize the shapes first", "tokens": [51165, 307, 286, 411, 281, 28949, 259, 1125, 264, 10854, 700, 51260], "temperature": 0.0, "avg_logprob": -0.1399412684970432, "compression_ratio": 1.695167286245353, "no_speech_prob": 0.001452437718398869}, {"id": 1651, "seek": 413284, "start": 4150.84, "end": 4152.74, "text": " so in particular here looking at the shapes", "tokens": [51265, 370, 294, 1729, 510, 1237, 412, 264, 10854, 51360], "temperature": 0.0, "avg_logprob": -0.1399412684970432, "compression_ratio": 1.695167286245353, "no_speech_prob": 0.001452437718398869}, {"id": 1652, "seek": 413284, "start": 4152.84, "end": 4154.74, "text": " of what's involved", "tokens": [51365, 295, 437, 311, 3288, 51460], "temperature": 0.0, "avg_logprob": -0.1399412684970432, "compression_ratio": 1.695167286245353, "no_speech_prob": 0.001452437718398869}, {"id": 1653, "seek": 413284, "start": 4154.84, "end": 4156.74, "text": " I see that bnvar shape is 1 by 64", "tokens": [51465, 286, 536, 300, 272, 77, 8517, 3909, 307, 502, 538, 12145, 51560], "temperature": 0.0, "avg_logprob": -0.1399412684970432, "compression_ratio": 1.695167286245353, "no_speech_prob": 0.001452437718398869}, {"id": 1654, "seek": 413284, "start": 4156.84, "end": 4158.74, "text": " so it's a row vector", "tokens": [51565, 370, 309, 311, 257, 5386, 8062, 51660], "temperature": 0.0, "avg_logprob": -0.1399412684970432, "compression_ratio": 1.695167286245353, "no_speech_prob": 0.001452437718398869}, {"id": 1655, "seek": 413284, "start": 4158.84, "end": 4160.74, "text": " and bndiff2.shape is 32 by 64", "tokens": [51665, 293, 272, 273, 3661, 17, 13, 82, 42406, 307, 8858, 538, 12145, 51760], "temperature": 0.0, "avg_logprob": -0.1399412684970432, "compression_ratio": 1.695167286245353, "no_speech_prob": 0.001452437718398869}, {"id": 1656, "seek": 413284, "start": 4160.84, "end": 4162.74, "text": " so I can see that", "tokens": [51765, 370, 286, 393, 536, 300, 51860], "temperature": 0.0, "avg_logprob": -0.1399412684970432, "compression_ratio": 1.695167286245353, "no_speech_prob": 0.001452437718398869}, {"id": 1657, "seek": 416284, "start": 4162.84, "end": 4164.74, "text": " so clearly here we're doing a sum", "tokens": 
So I can see that clearly here we're doing a sum over the 0th axis, to squash the first dimension of the shapes, using a sum. That right away actually hints to me that there will be some kind of a replication, or broadcasting, in the backward pass. And maybe you're noticing the pattern here: basically, any time you have a sum in the forward pass, that turns into a replication or broadcasting in the backward pass, along the same dimension.
"seek": 416284, "start": 4186.84, "end": 4188.74, "text": " or broadcasting in the backward pass", "tokens": [51565, 420, 30024, 294, 264, 23897, 1320, 51660], "temperature": 0.0, "avg_logprob": -0.08457706420402217, "compression_ratio": 1.9915254237288136, "no_speech_prob": 0.0009505633497610688}, {"id": 1670, "seek": 416284, "start": 4188.84, "end": 4190.74, "text": " along the same dimension", "tokens": [51665, 2051, 264, 912, 10139, 51760], "temperature": 0.0, "avg_logprob": -0.08457706420402217, "compression_ratio": 1.9915254237288136, "no_speech_prob": 0.0009505633497610688}, {"id": 1671, "seek": 416284, "start": 4190.84, "end": 4192.74, "text": " and conversely when we have a replication", "tokens": [51765, 293, 2615, 736, 562, 321, 362, 257, 39911, 51860], "temperature": 0.0, "avg_logprob": -0.08457706420402217, "compression_ratio": 1.9915254237288136, "no_speech_prob": 0.0009505633497610688}, {"id": 1672, "seek": 419274, "start": 4192.74, "end": 4194.639999999999, "text": " or a broadcasting in the forward pass", "tokens": [50365, 420, 257, 30024, 294, 264, 2128, 1320, 50460], "temperature": 0.0, "avg_logprob": -0.06524354308398801, "compression_ratio": 1.870503597122302, "no_speech_prob": 0.00031975904130376875}, {"id": 1673, "seek": 419274, "start": 4194.74, "end": 4196.639999999999, "text": " that indicates a variable reuse", "tokens": [50465, 300, 16203, 257, 7006, 26225, 50560], "temperature": 0.0, "avg_logprob": -0.06524354308398801, "compression_ratio": 1.870503597122302, "no_speech_prob": 0.00031975904130376875}, {"id": 1674, "seek": 419274, "start": 4196.74, "end": 4198.639999999999, "text": " and so in the backward pass", "tokens": [50565, 293, 370, 294, 264, 23897, 1320, 50660], "temperature": 0.0, "avg_logprob": -0.06524354308398801, "compression_ratio": 1.870503597122302, "no_speech_prob": 0.00031975904130376875}, {"id": 1675, "seek": 419274, "start": 4198.74, "end": 4200.639999999999, "text": " that turns into a sum", "tokens": [50665, 300, 4523, 666, 257, 2408, 50760], "temperature": 0.0, "avg_logprob": -0.06524354308398801, "compression_ratio": 1.870503597122302, "no_speech_prob": 0.00031975904130376875}, {"id": 1676, "seek": 419274, "start": 4200.74, "end": 4202.639999999999, "text": " over the exact same dimension", "tokens": [50765, 670, 264, 1900, 912, 10139, 50860], "temperature": 0.0, "avg_logprob": -0.06524354308398801, "compression_ratio": 1.870503597122302, "no_speech_prob": 0.00031975904130376875}, {"id": 1677, "seek": 419274, "start": 4202.74, "end": 4204.639999999999, "text": " and so hopefully you're noticing that duality", "tokens": [50865, 293, 370, 4696, 291, 434, 21814, 300, 11848, 507, 50960], "temperature": 0.0, "avg_logprob": -0.06524354308398801, "compression_ratio": 1.870503597122302, "no_speech_prob": 0.00031975904130376875}, {"id": 1678, "seek": 419274, "start": 4204.74, "end": 4206.639999999999, "text": " that those two are kind of like the opposites", "tokens": [50965, 300, 729, 732, 366, 733, 295, 411, 264, 4665, 3324, 51060], "temperature": 0.0, "avg_logprob": -0.06524354308398801, "compression_ratio": 1.870503597122302, "no_speech_prob": 0.00031975904130376875}, {"id": 1679, "seek": 419274, "start": 4206.74, "end": 4208.639999999999, "text": " of each other in the forward and backward pass", "tokens": [51065, 295, 1184, 661, 294, 264, 2128, 293, 23897, 1320, 51160], "temperature": 0.0, "avg_logprob": -0.06524354308398801, "compression_ratio": 1.870503597122302, "no_speech_prob": 0.00031975904130376875}, {"id": 1680, "seek": 419274, 
"start": 4208.74, "end": 4210.639999999999, "text": " now once we understand the shapes", "tokens": [51165, 586, 1564, 321, 1223, 264, 10854, 51260], "temperature": 0.0, "avg_logprob": -0.06524354308398801, "compression_ratio": 1.870503597122302, "no_speech_prob": 0.00031975904130376875}, {"id": 1681, "seek": 419274, "start": 4210.74, "end": 4212.639999999999, "text": " the next thing I like to do always", "tokens": [51265, 264, 958, 551, 286, 411, 281, 360, 1009, 51360], "temperature": 0.0, "avg_logprob": -0.06524354308398801, "compression_ratio": 1.870503597122302, "no_speech_prob": 0.00031975904130376875}, {"id": 1682, "seek": 419274, "start": 4212.74, "end": 4214.639999999999, "text": " is I like to look at a toy example in my head", "tokens": [51365, 307, 286, 411, 281, 574, 412, 257, 12058, 1365, 294, 452, 1378, 51460], "temperature": 0.0, "avg_logprob": -0.06524354308398801, "compression_ratio": 1.870503597122302, "no_speech_prob": 0.00031975904130376875}, {"id": 1683, "seek": 419274, "start": 4214.74, "end": 4216.639999999999, "text": " to sort of just like understand roughly how", "tokens": [51465, 281, 1333, 295, 445, 411, 1223, 9810, 577, 51560], "temperature": 0.0, "avg_logprob": -0.06524354308398801, "compression_ratio": 1.870503597122302, "no_speech_prob": 0.00031975904130376875}, {"id": 1684, "seek": 419274, "start": 4216.74, "end": 4218.639999999999, "text": " the variable dependencies go", "tokens": [51565, 264, 7006, 36606, 352, 51660], "temperature": 0.0, "avg_logprob": -0.06524354308398801, "compression_ratio": 1.870503597122302, "no_speech_prob": 0.00031975904130376875}, {"id": 1685, "seek": 419274, "start": 4218.74, "end": 4220.639999999999, "text": " in the mathematical formula", "tokens": [51665, 294, 264, 18894, 8513, 51760], "temperature": 0.0, "avg_logprob": -0.06524354308398801, "compression_ratio": 1.870503597122302, "no_speech_prob": 0.00031975904130376875}, {"id": 1686, "seek": 419274, "start": 4220.74, "end": 4222.639999999999, "text": " so here we have", "tokens": [51765, 370, 510, 321, 362, 51860], "temperature": 0.0, "avg_logprob": -0.06524354308398801, "compression_ratio": 1.870503597122302, "no_speech_prob": 0.00031975904130376875}, {"id": 1687, "seek": 422264, "start": 4222.64, "end": 4224.54, "text": " a two-dimensional array", "tokens": [50365, 257, 732, 12, 18759, 10225, 50460], "temperature": 0.0, "avg_logprob": -0.11310380964136835, "compression_ratio": 1.8622222222222222, "no_speech_prob": 0.0018837377429008484}, {"id": 1688, "seek": 422264, "start": 4224.64, "end": 4226.54, "text": " b and div 2 which we are scaling", "tokens": [50465, 272, 293, 3414, 568, 597, 321, 366, 21589, 50560], "temperature": 0.0, "avg_logprob": -0.11310380964136835, "compression_ratio": 1.8622222222222222, "no_speech_prob": 0.0018837377429008484}, {"id": 1689, "seek": 422264, "start": 4226.64, "end": 4228.54, "text": " by a constant and then we are summing", "tokens": [50565, 538, 257, 5754, 293, 550, 321, 366, 2408, 2810, 50660], "temperature": 0.0, "avg_logprob": -0.11310380964136835, "compression_ratio": 1.8622222222222222, "no_speech_prob": 0.0018837377429008484}, {"id": 1690, "seek": 422264, "start": 4228.64, "end": 4230.54, "text": " vertically over the columns", "tokens": [50665, 28450, 670, 264, 13766, 50760], "temperature": 0.0, "avg_logprob": -0.11310380964136835, "compression_ratio": 1.8622222222222222, "no_speech_prob": 0.0018837377429008484}, {"id": 1691, "seek": 422264, "start": 4230.64, "end": 4232.54, "text": " so if we have a 2x2 matrix a", "tokens": 
[50765, 370, 498, 321, 362, 257, 568, 87, 17, 8141, 257, 50860], "temperature": 0.0, "avg_logprob": -0.11310380964136835, "compression_ratio": 1.8622222222222222, "no_speech_prob": 0.0018837377429008484}, {"id": 1692, "seek": 422264, "start": 4232.64, "end": 4234.54, "text": " and then we sum over the columns", "tokens": [50865, 293, 550, 321, 2408, 670, 264, 13766, 50960], "temperature": 0.0, "avg_logprob": -0.11310380964136835, "compression_ratio": 1.8622222222222222, "no_speech_prob": 0.0018837377429008484}, {"id": 1693, "seek": 422264, "start": 4234.64, "end": 4236.54, "text": " and scale we would get a", "tokens": [50965, 293, 4373, 321, 576, 483, 257, 51060], "temperature": 0.0, "avg_logprob": -0.11310380964136835, "compression_ratio": 1.8622222222222222, "no_speech_prob": 0.0018837377429008484}, {"id": 1694, "seek": 422264, "start": 4236.64, "end": 4238.54, "text": " row vector b1 b2 and", "tokens": [51065, 5386, 8062, 272, 16, 272, 17, 293, 51160], "temperature": 0.0, "avg_logprob": -0.11310380964136835, "compression_ratio": 1.8622222222222222, "no_speech_prob": 0.0018837377429008484}, {"id": 1695, "seek": 422264, "start": 4238.64, "end": 4240.54, "text": " b1 depends on a in this way", "tokens": [51165, 272, 16, 5946, 322, 257, 294, 341, 636, 51260], "temperature": 0.0, "avg_logprob": -0.11310380964136835, "compression_ratio": 1.8622222222222222, "no_speech_prob": 0.0018837377429008484}, {"id": 1696, "seek": 422264, "start": 4240.64, "end": 4242.54, "text": " where it's just sum that is scaled", "tokens": [51265, 689, 309, 311, 445, 2408, 300, 307, 36039, 51360], "temperature": 0.0, "avg_logprob": -0.11310380964136835, "compression_ratio": 1.8622222222222222, "no_speech_prob": 0.0018837377429008484}, {"id": 1697, "seek": 422264, "start": 4242.64, "end": 4244.54, "text": " of a and b2", "tokens": [51365, 295, 257, 293, 272, 17, 51460], "temperature": 0.0, "avg_logprob": -0.11310380964136835, "compression_ratio": 1.8622222222222222, "no_speech_prob": 0.0018837377429008484}, {"id": 1698, "seek": 422264, "start": 4244.64, "end": 4246.54, "text": " in this way where it's the second column", "tokens": [51465, 294, 341, 636, 689, 309, 311, 264, 1150, 7738, 51560], "temperature": 0.0, "avg_logprob": -0.11310380964136835, "compression_ratio": 1.8622222222222222, "no_speech_prob": 0.0018837377429008484}, {"id": 1699, "seek": 422264, "start": 4246.64, "end": 4248.54, "text": " summed and scaled", "tokens": [51565, 2408, 1912, 293, 36039, 51660], "temperature": 0.0, "avg_logprob": -0.11310380964136835, "compression_ratio": 1.8622222222222222, "no_speech_prob": 0.0018837377429008484}, {"id": 1700, "seek": 422264, "start": 4248.64, "end": 4250.54, "text": " and so looking at this basically", "tokens": [51665, 293, 370, 1237, 412, 341, 1936, 51760], "temperature": 0.0, "avg_logprob": -0.11310380964136835, "compression_ratio": 1.8622222222222222, "no_speech_prob": 0.0018837377429008484}, {"id": 1701, "seek": 422264, "start": 4250.64, "end": 4252.54, "text": " what we want to do is", "tokens": [51765, 437, 321, 528, 281, 360, 307, 51860], "temperature": 0.0, "avg_logprob": -0.11310380964136835, "compression_ratio": 1.8622222222222222, "no_speech_prob": 0.0018837377429008484}, {"id": 1702, "seek": 425254, "start": 4252.54, "end": 4254.44, "text": " we have the derivatives on b1 and b2", "tokens": [50365, 321, 362, 264, 33733, 322, 272, 16, 293, 272, 17, 50460], "temperature": 0.0, "avg_logprob": -0.08227977279789192, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.0007824136991985142}, 
{"id": 1703, "seek": 425254, "start": 4254.54, "end": 4256.44, "text": " and we want to back propagate them into a's", "tokens": [50465, 293, 321, 528, 281, 646, 48256, 552, 666, 257, 311, 50560], "temperature": 0.0, "avg_logprob": -0.08227977279789192, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.0007824136991985142}, {"id": 1704, "seek": 425254, "start": 4256.54, "end": 4258.44, "text": " and so it's clear that just", "tokens": [50565, 293, 370, 309, 311, 1850, 300, 445, 50660], "temperature": 0.0, "avg_logprob": -0.08227977279789192, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.0007824136991985142}, {"id": 1705, "seek": 425254, "start": 4258.54, "end": 4260.44, "text": " differentiating in your head", "tokens": [50665, 27372, 990, 294, 428, 1378, 50760], "temperature": 0.0, "avg_logprob": -0.08227977279789192, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.0007824136991985142}, {"id": 1706, "seek": 425254, "start": 4260.54, "end": 4262.44, "text": " the local derivative here is 1 over n-1", "tokens": [50765, 264, 2654, 13760, 510, 307, 502, 670, 297, 12, 16, 50860], "temperature": 0.0, "avg_logprob": -0.08227977279789192, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.0007824136991985142}, {"id": 1707, "seek": 425254, "start": 4262.54, "end": 4264.44, "text": " times 1", "tokens": [50865, 1413, 502, 50960], "temperature": 0.0, "avg_logprob": -0.08227977279789192, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.0007824136991985142}, {"id": 1708, "seek": 425254, "start": 4264.54, "end": 4266.44, "text": " for each one of these a's", "tokens": [50965, 337, 1184, 472, 295, 613, 257, 311, 51060], "temperature": 0.0, "avg_logprob": -0.08227977279789192, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.0007824136991985142}, {"id": 1709, "seek": 425254, "start": 4266.54, "end": 4268.44, "text": " and", "tokens": [51065, 293, 51160], "temperature": 0.0, "avg_logprob": -0.08227977279789192, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.0007824136991985142}, {"id": 1710, "seek": 425254, "start": 4268.54, "end": 4270.44, "text": " basically the derivative of b1", "tokens": [51165, 1936, 264, 13760, 295, 272, 16, 51260], "temperature": 0.0, "avg_logprob": -0.08227977279789192, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.0007824136991985142}, {"id": 1711, "seek": 425254, "start": 4270.54, "end": 4272.44, "text": " has to flow through the columns of a", "tokens": [51265, 575, 281, 3095, 807, 264, 13766, 295, 257, 51360], "temperature": 0.0, "avg_logprob": -0.08227977279789192, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.0007824136991985142}, {"id": 1712, "seek": 425254, "start": 4272.54, "end": 4274.44, "text": " scaled by 1 over n-1", "tokens": [51365, 36039, 538, 502, 670, 297, 12, 16, 51460], "temperature": 0.0, "avg_logprob": -0.08227977279789192, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.0007824136991985142}, {"id": 1713, "seek": 425254, "start": 4274.54, "end": 4276.44, "text": " and that's roughly", "tokens": [51465, 293, 300, 311, 9810, 51560], "temperature": 0.0, "avg_logprob": -0.08227977279789192, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.0007824136991985142}, {"id": 1714, "seek": 425254, "start": 4276.54, "end": 4278.44, "text": " what's happening here", "tokens": [51565, 437, 311, 2737, 510, 51660], "temperature": 0.0, "avg_logprob": -0.08227977279789192, "compression_ratio": 1.8073394495412844, "no_speech_prob": 
So intuitively, the derivative flow tells us that dbndiff2 will be the local derivative of this operation, and there are many ways to do this by the way, but I like to do something like this: torch.ones_like of bndiff2. So I'll create a large two-dimensional array of ones, and then I will scale it by 1.0 divided by n minus 1. So this is an array of 1 over n minus 1, and that's sort of like the local derivative. And now, for the chain rule, I will simply just multiply it by dbnvar. And notice here what's going to happen: this is 32 by 64, and this is just 1 by 64, so I'm letting the broadcasting do the replication. Because internally in PyTorch, dbnvar, which is a 1 by 64 row vector, will in this multiplication get copied vertically until the two are of the same shape, and then there will be an elementwise multiply. So the broadcasting is basically doing the replication, and I will end up with the derivatives, dbndiff2, here.
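In code, that candidate line is something like this sketch (bndiff2, dbnvar, and the batch size n are the notebook's names as just described):

```python
# local derivative of each element of bndiff2 in the scaled sum is 1/(n-1);
# chain rule multiplies by dbnvar, whose (1, 64) row broadcasts (replicates)
# down the 32 rows of the (32, 64) array of ones
dbndiff2 = (1.0 / (n - 1)) * torch.ones_like(bndiff2) * dbnvar
```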
So this is the candidate solution. Let's bring it down here, let's uncomment this line where we check it, and let's hope for the best. And indeed, we see that this is the correct formula. Next up, let's differentiate into bndiff. So here we have that bndiff is elementwise squared to create bndiff2. So this is a relatively simple derivative, because it's a simple elementwise operation, so it's kind of like the scalar case. And we have that dbndiff should be: if this is x squared, then the derivative of this is 2x, so it's simply 2 times bndiff. That's the local derivative, and then times, chain rule, and the shape of these is the same, they are of the same shape, so times this. So that's the backward pass for this variable. Let me bring it down here. I've already calculated dbndiff, so this is just the end of the other branch coming back to bndiff, because bndiff was already backpropagated to, way over here, from bnraw. So we have now completed the second branch, and that's why I have to do plus equals. And if you recall, we had an incorrect derivative for bndiff before, and I'm hoping that once we append this last missing piece, we have exact correctness. So let's run, and bndiff now actually shows the exact correct derivative, so that's comforting.
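Concretely, the accumulation we just verified is something like this sketch (notebook names as above; the plus-equals is the key detail, accumulating this branch on top of the gradient already received from bnraw):

```python
# bndiff2 = bndiff**2 is elementwise, so the local derivative is 2*bndiff
# (the scalar rule d/dx x^2 = 2x applied elementwise), times the chain rule;
# += accumulates this branch on top of the earlier one from bnraw
dbndiff += (2 * bndiff) * dbndiff2
```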
"text": " the first thing we do of course", "tokens": [50865, 264, 700, 551, 321, 360, 295, 1164, 50960], "temperature": 0.0, "avg_logprob": -0.09627876743193596, "compression_ratio": 1.6991869918699187, "no_speech_prob": 0.0007952567539177835}, {"id": 1797, "seek": 443204, "start": 4444.04, "end": 4445.94, "text": " is we check the shapes", "tokens": [50965, 307, 321, 1520, 264, 10854, 51060], "temperature": 0.0, "avg_logprob": -0.09627876743193596, "compression_ratio": 1.6991869918699187, "no_speech_prob": 0.0007952567539177835}, {"id": 1798, "seek": 443204, "start": 4446.04, "end": 4447.94, "text": " and I wrote them out here", "tokens": [51065, 293, 286, 4114, 552, 484, 510, 51160], "temperature": 0.0, "avg_logprob": -0.09627876743193596, "compression_ratio": 1.6991869918699187, "no_speech_prob": 0.0007952567539177835}, {"id": 1799, "seek": 443204, "start": 4448.04, "end": 4449.94, "text": " and basically the shape of this", "tokens": [51165, 293, 1936, 264, 3909, 295, 341, 51260], "temperature": 0.0, "avg_logprob": -0.09627876743193596, "compression_ratio": 1.6991869918699187, "no_speech_prob": 0.0007952567539177835}, {"id": 1800, "seek": 443204, "start": 4450.04, "end": 4451.94, "text": " is 32 by 64", "tokens": [51265, 307, 8858, 538, 12145, 51360], "temperature": 0.0, "avg_logprob": -0.09627876743193596, "compression_ratio": 1.6991869918699187, "no_speech_prob": 0.0007952567539177835}, {"id": 1801, "seek": 443204, "start": 4452.04, "end": 4453.94, "text": " h pre bn is the same shape", "tokens": [51365, 276, 659, 272, 77, 307, 264, 912, 3909, 51460], "temperature": 0.0, "avg_logprob": -0.09627876743193596, "compression_ratio": 1.6991869918699187, "no_speech_prob": 0.0007952567539177835}, {"id": 1802, "seek": 443204, "start": 4454.04, "end": 4455.94, "text": " but b and mean i is a row vector", "tokens": [51465, 457, 272, 293, 914, 741, 307, 257, 5386, 8062, 51560], "temperature": 0.0, "avg_logprob": -0.09627876743193596, "compression_ratio": 1.6991869918699187, "no_speech_prob": 0.0007952567539177835}, {"id": 1803, "seek": 443204, "start": 4456.04, "end": 4457.94, "text": " 1 by 64", "tokens": [51565, 502, 538, 12145, 51660], "temperature": 0.0, "avg_logprob": -0.09627876743193596, "compression_ratio": 1.6991869918699187, "no_speech_prob": 0.0007952567539177835}, {"id": 1804, "seek": 443204, "start": 4458.04, "end": 4459.94, "text": " so this minus here will actually do broadcasting", "tokens": [51665, 370, 341, 3175, 510, 486, 767, 360, 30024, 51760], "temperature": 0.0, "avg_logprob": -0.09627876743193596, "compression_ratio": 1.6991869918699187, "no_speech_prob": 0.0007952567539177835}, {"id": 1805, "seek": 443204, "start": 4460.04, "end": 4461.94, "text": " and so we have to be careful with that", "tokens": [51765, 293, 370, 321, 362, 281, 312, 5026, 365, 300, 51860], "temperature": 0.0, "avg_logprob": -0.09627876743193596, "compression_ratio": 1.6991869918699187, "no_speech_prob": 0.0007952567539177835}, {"id": 1806, "seek": 446194, "start": 4461.94, "end": 4463.839999999999, "text": " again because of the duality", "tokens": [50365, 797, 570, 295, 264, 11848, 507, 50460], "temperature": 0.0, "avg_logprob": -0.06887297984982325, "compression_ratio": 1.8857142857142857, "no_speech_prob": 0.0012307451106607914}, {"id": 1807, "seek": 446194, "start": 4463.94, "end": 4465.839999999999, "text": " a broadcasting in the forward pass", "tokens": [50465, 257, 30024, 294, 264, 2128, 1320, 50560], "temperature": 0.0, "avg_logprob": -0.06887297984982325, "compression_ratio": 
1.8857142857142857, "no_speech_prob": 0.0012307451106607914}, {"id": 1808, "seek": 446194, "start": 4465.94, "end": 4467.839999999999, "text": " means a variable reuse", "tokens": [50565, 1355, 257, 7006, 26225, 50660], "temperature": 0.0, "avg_logprob": -0.06887297984982325, "compression_ratio": 1.8857142857142857, "no_speech_prob": 0.0012307451106607914}, {"id": 1809, "seek": 446194, "start": 4467.94, "end": 4469.839999999999, "text": " and therefore there will be a sum", "tokens": [50665, 293, 4412, 456, 486, 312, 257, 2408, 50760], "temperature": 0.0, "avg_logprob": -0.06887297984982325, "compression_ratio": 1.8857142857142857, "no_speech_prob": 0.0012307451106607914}, {"id": 1810, "seek": 446194, "start": 4469.94, "end": 4471.839999999999, "text": " in the backward pass", "tokens": [50765, 294, 264, 23897, 1320, 50860], "temperature": 0.0, "avg_logprob": -0.06887297984982325, "compression_ratio": 1.8857142857142857, "no_speech_prob": 0.0012307451106607914}, {"id": 1811, "seek": 446194, "start": 4471.94, "end": 4473.839999999999, "text": " so let's write out the backward pass here now", "tokens": [50865, 370, 718, 311, 2464, 484, 264, 23897, 1320, 510, 586, 50960], "temperature": 0.0, "avg_logprob": -0.06887297984982325, "compression_ratio": 1.8857142857142857, "no_speech_prob": 0.0012307451106607914}, {"id": 1812, "seek": 446194, "start": 4473.94, "end": 4475.839999999999, "text": " back propagate into the h pre bn", "tokens": [50965, 646, 48256, 666, 264, 276, 659, 272, 77, 51060], "temperature": 0.0, "avg_logprob": -0.06887297984982325, "compression_ratio": 1.8857142857142857, "no_speech_prob": 0.0012307451106607914}, {"id": 1813, "seek": 446194, "start": 4475.94, "end": 4477.839999999999, "text": " because these are the same shape", "tokens": [51065, 570, 613, 366, 264, 912, 3909, 51160], "temperature": 0.0, "avg_logprob": -0.06887297984982325, "compression_ratio": 1.8857142857142857, "no_speech_prob": 0.0012307451106607914}, {"id": 1814, "seek": 446194, "start": 4477.94, "end": 4479.839999999999, "text": " then the local derivative", "tokens": [51165, 550, 264, 2654, 13760, 51260], "temperature": 0.0, "avg_logprob": -0.06887297984982325, "compression_ratio": 1.8857142857142857, "no_speech_prob": 0.0012307451106607914}, {"id": 1815, "seek": 446194, "start": 4479.94, "end": 4481.839999999999, "text": " for each one of the elements here", "tokens": [51265, 337, 1184, 472, 295, 264, 4959, 510, 51360], "temperature": 0.0, "avg_logprob": -0.06887297984982325, "compression_ratio": 1.8857142857142857, "no_speech_prob": 0.0012307451106607914}, {"id": 1816, "seek": 446194, "start": 4481.94, "end": 4483.839999999999, "text": " is just 1 for the corresponding element", "tokens": [51365, 307, 445, 502, 337, 264, 11760, 4478, 51460], "temperature": 0.0, "avg_logprob": -0.06887297984982325, "compression_ratio": 1.8857142857142857, "no_speech_prob": 0.0012307451106607914}, {"id": 1817, "seek": 446194, "start": 4483.94, "end": 4485.839999999999, "text": " in here", "tokens": [51465, 294, 510, 51560], "temperature": 0.0, "avg_logprob": -0.06887297984982325, "compression_ratio": 1.8857142857142857, "no_speech_prob": 0.0012307451106607914}, {"id": 1818, "seek": 446194, "start": 4485.94, "end": 4487.839999999999, "text": " so basically what this means is that", "tokens": [51565, 370, 1936, 437, 341, 1355, 307, 300, 51660], "temperature": 0.0, "avg_logprob": -0.06887297984982325, "compression_ratio": 1.8857142857142857, "no_speech_prob": 0.0012307451106607914}, {"id": 1819, "seek": 446194, "start": 
So basically what this means is that the gradient just simply copies; it's just a variable assignment. So I'm just going to clone this tensor, just for safety, to create an exact copy of dbndiff. And then here, to backpropagate into this one, what I'm inclined to do is: dbnmeani will basically be, well, what is the local derivative? It's negative torch.ones_like of the shape of bndiff, right, and then times the derivative here, dbndiff. And this here is the backpropagation for the replicated bnmeani; I still have to backpropagate through the replication in the broadcasting, and I do that by doing a sum. So I'm going to take this whole thing and I'm going to do a sum over the dimension of the replication.
"temperature": 0.0, "avg_logprob": -0.1492382518032141, "compression_ratio": 1.6894977168949772, "no_speech_prob": 0.0004887064569629729}, {"id": 1856, "seek": 454974, "start": 4561.74, "end": 4563.639999999999, "text": " because it's just a", "tokens": [50965, 570, 309, 311, 445, 257, 51060], "temperature": 0.0, "avg_logprob": -0.1492382518032141, "compression_ratio": 1.6894977168949772, "no_speech_prob": 0.0004887064569629729}, {"id": 1857, "seek": 454974, "start": 4563.74, "end": 4565.639999999999, "text": " array of ones multiplying db and diff", "tokens": [51065, 10225, 295, 2306, 30955, 274, 65, 293, 7593, 51160], "temperature": 0.0, "avg_logprob": -0.1492382518032141, "compression_ratio": 1.6894977168949772, "no_speech_prob": 0.0004887064569629729}, {"id": 1858, "seek": 454974, "start": 4565.74, "end": 4567.639999999999, "text": " so in fact I can just do", "tokens": [51165, 370, 294, 1186, 286, 393, 445, 360, 51260], "temperature": 0.0, "avg_logprob": -0.1492382518032141, "compression_ratio": 1.6894977168949772, "no_speech_prob": 0.0004887064569629729}, {"id": 1859, "seek": 454974, "start": 4567.74, "end": 4569.639999999999, "text": " this", "tokens": [51265, 341, 51360], "temperature": 0.0, "avg_logprob": -0.1492382518032141, "compression_ratio": 1.6894977168949772, "no_speech_prob": 0.0004887064569629729}, {"id": 1860, "seek": 454974, "start": 4569.74, "end": 4571.639999999999, "text": " and that is equivalent", "tokens": [51365, 293, 300, 307, 10344, 51460], "temperature": 0.0, "avg_logprob": -0.1492382518032141, "compression_ratio": 1.6894977168949772, "no_speech_prob": 0.0004887064569629729}, {"id": 1861, "seek": 454974, "start": 4571.74, "end": 4573.639999999999, "text": " so this is the candidate", "tokens": [51465, 370, 341, 307, 264, 11532, 51560], "temperature": 0.0, "avg_logprob": -0.1492382518032141, "compression_ratio": 1.6894977168949772, "no_speech_prob": 0.0004887064569629729}, {"id": 1862, "seek": 454974, "start": 4573.74, "end": 4575.639999999999, "text": " backward pass", "tokens": [51565, 23897, 1320, 51660], "temperature": 0.0, "avg_logprob": -0.1492382518032141, "compression_ratio": 1.6894977168949772, "no_speech_prob": 0.0004887064569629729}, {"id": 1863, "seek": 454974, "start": 4575.74, "end": 4577.639999999999, "text": " let me copy it here", "tokens": [51665, 718, 385, 5055, 309, 510, 51760], "temperature": 0.0, "avg_logprob": -0.1492382518032141, "compression_ratio": 1.6894977168949772, "no_speech_prob": 0.0004887064569629729}, {"id": 1864, "seek": 457764, "start": 4577.64, "end": 4579.54, "text": " let me comment out this one", "tokens": [50365, 718, 385, 2871, 484, 341, 472, 50460], "temperature": 0.0, "avg_logprob": -0.09464005187705711, "compression_ratio": 1.8263157894736841, "no_speech_prob": 0.002021914115175605}, {"id": 1865, "seek": 457764, "start": 4579.64, "end": 4581.54, "text": " and this one", "tokens": [50465, 293, 341, 472, 50560], "temperature": 0.0, "avg_logprob": -0.09464005187705711, "compression_ratio": 1.8263157894736841, "no_speech_prob": 0.002021914115175605}, {"id": 1866, "seek": 457764, "start": 4581.64, "end": 4583.54, "text": " enter", "tokens": [50565, 3242, 50660], "temperature": 0.0, "avg_logprob": -0.09464005187705711, "compression_ratio": 1.8263157894736841, "no_speech_prob": 0.002021914115175605}, {"id": 1867, "seek": 457764, "start": 4583.64, "end": 4585.54, "text": " and it's wrong", "tokens": [50665, 293, 309, 311, 2085, 50760], "temperature": 0.0, "avg_logprob": -0.09464005187705711, "compression_ratio": 
1.8263157894736841, "no_speech_prob": 0.002021914115175605}, {"id": 1868, "seek": 457764, "start": 4585.64, "end": 4587.54, "text": " damn", "tokens": [50765, 8151, 50860], "temperature": 0.0, "avg_logprob": -0.09464005187705711, "compression_ratio": 1.8263157894736841, "no_speech_prob": 0.002021914115175605}, {"id": 1869, "seek": 457764, "start": 4587.64, "end": 4589.54, "text": " actually sorry", "tokens": [50865, 767, 2597, 50960], "temperature": 0.0, "avg_logprob": -0.09464005187705711, "compression_ratio": 1.8263157894736841, "no_speech_prob": 0.002021914115175605}, {"id": 1870, "seek": 457764, "start": 4589.64, "end": 4591.54, "text": " this is supposed to be wrong", "tokens": [50965, 341, 307, 3442, 281, 312, 2085, 51060], "temperature": 0.0, "avg_logprob": -0.09464005187705711, "compression_ratio": 1.8263157894736841, "no_speech_prob": 0.002021914115175605}, {"id": 1871, "seek": 457764, "start": 4591.64, "end": 4593.54, "text": " and it's supposed to be wrong because", "tokens": [51065, 293, 309, 311, 3442, 281, 312, 2085, 570, 51160], "temperature": 0.0, "avg_logprob": -0.09464005187705711, "compression_ratio": 1.8263157894736841, "no_speech_prob": 0.002021914115175605}, {"id": 1872, "seek": 457764, "start": 4593.64, "end": 4595.54, "text": " we are back propagating", "tokens": [51165, 321, 366, 646, 12425, 990, 51260], "temperature": 0.0, "avg_logprob": -0.09464005187705711, "compression_ratio": 1.8263157894736841, "no_speech_prob": 0.002021914115175605}, {"id": 1873, "seek": 457764, "start": 4595.64, "end": 4597.54, "text": " from b and diff into h pre bn", "tokens": [51265, 490, 272, 293, 7593, 666, 276, 659, 272, 77, 51360], "temperature": 0.0, "avg_logprob": -0.09464005187705711, "compression_ratio": 1.8263157894736841, "no_speech_prob": 0.002021914115175605}, {"id": 1874, "seek": 457764, "start": 4597.64, "end": 4599.54, "text": " but we're not done", "tokens": [51365, 457, 321, 434, 406, 1096, 51460], "temperature": 0.0, "avg_logprob": -0.09464005187705711, "compression_ratio": 1.8263157894736841, "no_speech_prob": 0.002021914115175605}, {"id": 1875, "seek": 457764, "start": 4599.64, "end": 4601.54, "text": " because b and mean i depends", "tokens": [51465, 570, 272, 293, 914, 741, 5946, 51560], "temperature": 0.0, "avg_logprob": -0.09464005187705711, "compression_ratio": 1.8263157894736841, "no_speech_prob": 0.002021914115175605}, {"id": 1876, "seek": 457764, "start": 4601.64, "end": 4603.54, "text": " on h pre bn and there will be", "tokens": [51565, 322, 276, 659, 272, 77, 293, 456, 486, 312, 51660], "temperature": 0.0, "avg_logprob": -0.09464005187705711, "compression_ratio": 1.8263157894736841, "no_speech_prob": 0.002021914115175605}, {"id": 1877, "seek": 457764, "start": 4603.64, "end": 4605.54, "text": " a second portion of that derivative coming from", "tokens": [51665, 257, 1150, 8044, 295, 300, 13760, 1348, 490, 51760], "temperature": 0.0, "avg_logprob": -0.09464005187705711, "compression_ratio": 1.8263157894736841, "no_speech_prob": 0.002021914115175605}, {"id": 1878, "seek": 457764, "start": 4605.64, "end": 4607.54, "text": " this second branch", "tokens": [51765, 341, 1150, 9819, 51860], "temperature": 0.0, "avg_logprob": -0.09464005187705711, "compression_ratio": 1.8263157894736841, "no_speech_prob": 0.002021914115175605}, {"id": 1879, "seek": 460754, "start": 4607.54, "end": 4609.44, "text": " but we're not done yet and we expect it to be incorrect", "tokens": [50365, 457, 321, 434, 406, 1096, 1939, 293, 321, 2066, 309, 281, 312, 18424, 50460], "temperature": 
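For reference, here is a minimal sketch of the candidate backward pass just described, assuming the exercise's variable names (dbndiff is already known from the earlier steps; keepdim=True is an assumption here so the shape matches bnmeani):

    # forward was: bndiff = hprebn - bnmeani   (bnmeani broadcasts across the rows)
    # the subtraction passes the gradient straight through to the first branch
    dhprebn = dbndiff.clone()
    # the broadcast of bnmeani in the forward pass becomes a sum in the backward pass
    dbnmeani = (-dbndiff).sum(0, keepdim=True)
    # dhprebn is deliberately incomplete at this point: bnmeani itself depends on
    # hprebn, so a second contribution gets added in the next step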
0.0, "avg_logprob": -0.10133685785181382, "compression_ratio": 1.8007662835249043, "no_speech_prob": 0.00041909125866368413}, {"id": 1880, "seek": 460754, "start": 4609.54, "end": 4611.44, "text": " so there you go", "tokens": [50465, 370, 456, 291, 352, 50560], "temperature": 0.0, "avg_logprob": -0.10133685785181382, "compression_ratio": 1.8007662835249043, "no_speech_prob": 0.00041909125866368413}, {"id": 1881, "seek": 460754, "start": 4611.54, "end": 4613.44, "text": " so let's now back propagate from b and mean i", "tokens": [50565, 370, 718, 311, 586, 646, 48256, 490, 272, 293, 914, 741, 50660], "temperature": 0.0, "avg_logprob": -0.10133685785181382, "compression_ratio": 1.8007662835249043, "no_speech_prob": 0.00041909125866368413}, {"id": 1882, "seek": 460754, "start": 4613.54, "end": 4615.44, "text": " into h pre bn", "tokens": [50665, 666, 276, 659, 272, 77, 50760], "temperature": 0.0, "avg_logprob": -0.10133685785181382, "compression_ratio": 1.8007662835249043, "no_speech_prob": 0.00041909125866368413}, {"id": 1883, "seek": 460754, "start": 4617.54, "end": 4619.44, "text": " and so here again we have to be careful", "tokens": [50865, 293, 370, 510, 797, 321, 362, 281, 312, 5026, 50960], "temperature": 0.0, "avg_logprob": -0.10133685785181382, "compression_ratio": 1.8007662835249043, "no_speech_prob": 0.00041909125866368413}, {"id": 1884, "seek": 460754, "start": 4619.54, "end": 4621.44, "text": " because there's a broadcasting along", "tokens": [50965, 570, 456, 311, 257, 30024, 2051, 51060], "temperature": 0.0, "avg_logprob": -0.10133685785181382, "compression_ratio": 1.8007662835249043, "no_speech_prob": 0.00041909125866368413}, {"id": 1885, "seek": 460754, "start": 4621.54, "end": 4623.44, "text": " or there's a sum along the 0th dimension", "tokens": [51065, 420, 456, 311, 257, 2408, 2051, 264, 1958, 392, 10139, 51160], "temperature": 0.0, "avg_logprob": -0.10133685785181382, "compression_ratio": 1.8007662835249043, "no_speech_prob": 0.00041909125866368413}, {"id": 1886, "seek": 460754, "start": 4623.54, "end": 4625.44, "text": " so this will turn into broadcasting", "tokens": [51165, 370, 341, 486, 1261, 666, 30024, 51260], "temperature": 0.0, "avg_logprob": -0.10133685785181382, "compression_ratio": 1.8007662835249043, "no_speech_prob": 0.00041909125866368413}, {"id": 1887, "seek": 460754, "start": 4625.54, "end": 4627.44, "text": " in the backward pass now", "tokens": [51265, 294, 264, 23897, 1320, 586, 51360], "temperature": 0.0, "avg_logprob": -0.10133685785181382, "compression_ratio": 1.8007662835249043, "no_speech_prob": 0.00041909125866368413}, {"id": 1888, "seek": 460754, "start": 4627.54, "end": 4629.44, "text": " and I'm going to go a little bit faster on this line", "tokens": [51365, 293, 286, 478, 516, 281, 352, 257, 707, 857, 4663, 322, 341, 1622, 51460], "temperature": 0.0, "avg_logprob": -0.10133685785181382, "compression_ratio": 1.8007662835249043, "no_speech_prob": 0.00041909125866368413}, {"id": 1889, "seek": 460754, "start": 4629.54, "end": 4631.44, "text": " because it is very similar to the line", "tokens": [51465, 570, 309, 307, 588, 2531, 281, 264, 1622, 51560], "temperature": 0.0, "avg_logprob": -0.10133685785181382, "compression_ratio": 1.8007662835249043, "no_speech_prob": 0.00041909125866368413}, {"id": 1890, "seek": 460754, "start": 4631.54, "end": 4633.44, "text": " that we had before", "tokens": [51565, 300, 321, 632, 949, 51660], "temperature": 0.0, "avg_logprob": -0.10133685785181382, "compression_ratio": 1.8007662835249043, "no_speech_prob": 
0.00041909125866368413}, {"id": 1891, "seek": 460754, "start": 4633.54, "end": 4635.44, "text": " multiple lines in the past in fact", "tokens": [51665, 3866, 3876, 294, 264, 1791, 294, 1186, 51760], "temperature": 0.0, "avg_logprob": -0.10133685785181382, "compression_ratio": 1.8007662835249043, "no_speech_prob": 0.00041909125866368413}, {"id": 1892, "seek": 460754, "start": 4635.54, "end": 4637.44, "text": " so d h pre bn", "tokens": [51765, 370, 274, 276, 659, 272, 77, 51860], "temperature": 0.0, "avg_logprob": -0.10133685785181382, "compression_ratio": 1.8007662835249043, "no_speech_prob": 0.00041909125866368413}, {"id": 1893, "seek": 463754, "start": 4637.54, "end": 4639.44, "text": " will be", "tokens": [50365, 486, 312, 50460], "temperature": 0.0, "avg_logprob": -0.07296692984444754, "compression_ratio": 1.9085714285714286, "no_speech_prob": 0.0025731713976711035}, {"id": 1894, "seek": 463754, "start": 4639.54, "end": 4641.44, "text": " the gradient will be scaled", "tokens": [50465, 264, 16235, 486, 312, 36039, 50560], "temperature": 0.0, "avg_logprob": -0.07296692984444754, "compression_ratio": 1.9085714285714286, "no_speech_prob": 0.0025731713976711035}, {"id": 1895, "seek": 463754, "start": 4641.54, "end": 4643.44, "text": " by 1 over n and then", "tokens": [50565, 538, 502, 670, 297, 293, 550, 50660], "temperature": 0.0, "avg_logprob": -0.07296692984444754, "compression_ratio": 1.9085714285714286, "no_speech_prob": 0.0025731713976711035}, {"id": 1896, "seek": 463754, "start": 4643.54, "end": 4645.44, "text": " basically this gradient here on d bn", "tokens": [50665, 1936, 341, 16235, 510, 322, 274, 272, 77, 50760], "temperature": 0.0, "avg_logprob": -0.07296692984444754, "compression_ratio": 1.9085714285714286, "no_speech_prob": 0.0025731713976711035}, {"id": 1897, "seek": 463754, "start": 4645.54, "end": 4647.44, "text": " mean i", "tokens": [50765, 914, 741, 50860], "temperature": 0.0, "avg_logprob": -0.07296692984444754, "compression_ratio": 1.9085714285714286, "no_speech_prob": 0.0025731713976711035}, {"id": 1898, "seek": 463754, "start": 4647.54, "end": 4649.44, "text": " is going to be scaled by 1 over n", "tokens": [50865, 307, 516, 281, 312, 36039, 538, 502, 670, 297, 50960], "temperature": 0.0, "avg_logprob": -0.07296692984444754, "compression_ratio": 1.9085714285714286, "no_speech_prob": 0.0025731713976711035}, {"id": 1899, "seek": 463754, "start": 4649.54, "end": 4651.44, "text": " and then it's going to flow across", "tokens": [50965, 293, 550, 309, 311, 516, 281, 3095, 2108, 51060], "temperature": 0.0, "avg_logprob": -0.07296692984444754, "compression_ratio": 1.9085714285714286, "no_speech_prob": 0.0025731713976711035}, {"id": 1900, "seek": 463754, "start": 4651.54, "end": 4653.44, "text": " all the columns and deposit itself", "tokens": [51065, 439, 264, 13766, 293, 19107, 2564, 51160], "temperature": 0.0, "avg_logprob": -0.07296692984444754, "compression_ratio": 1.9085714285714286, "no_speech_prob": 0.0025731713976711035}, {"id": 1901, "seek": 463754, "start": 4653.54, "end": 4655.44, "text": " into d h pre bn", "tokens": [51165, 666, 274, 276, 659, 272, 77, 51260], "temperature": 0.0, "avg_logprob": -0.07296692984444754, "compression_ratio": 1.9085714285714286, "no_speech_prob": 0.0025731713976711035}, {"id": 1902, "seek": 463754, "start": 4655.54, "end": 4657.44, "text": " so what we want is this thing", "tokens": [51265, 370, 437, 321, 528, 307, 341, 551, 51360], "temperature": 0.0, "avg_logprob": -0.07296692984444754, "compression_ratio": 1.9085714285714286, 
"no_speech_prob": 0.0025731713976711035}, {"id": 1903, "seek": 463754, "start": 4657.54, "end": 4659.44, "text": " scaled by 1 over n", "tokens": [51365, 36039, 538, 502, 670, 297, 51460], "temperature": 0.0, "avg_logprob": -0.07296692984444754, "compression_ratio": 1.9085714285714286, "no_speech_prob": 0.0025731713976711035}, {"id": 1904, "seek": 463754, "start": 4659.54, "end": 4661.44, "text": " let me put the constant up front here", "tokens": [51465, 718, 385, 829, 264, 5754, 493, 1868, 510, 51560], "temperature": 0.0, "avg_logprob": -0.07296692984444754, "compression_ratio": 1.9085714285714286, "no_speech_prob": 0.0025731713976711035}, {"id": 1905, "seek": 463754, "start": 4665.54, "end": 4667.44, "text": " so scale down the gradient", "tokens": [51765, 370, 4373, 760, 264, 16235, 51860], "temperature": 0.0, "avg_logprob": -0.07296692984444754, "compression_ratio": 1.9085714285714286, "no_speech_prob": 0.0025731713976711035}, {"id": 1906, "seek": 466744, "start": 4667.44, "end": 4669.339999999999, "text": " and we need to replicate it", "tokens": [50365, 293, 321, 643, 281, 25356, 309, 50460], "temperature": 0.0, "avg_logprob": -0.1698134740193685, "compression_ratio": 1.5403225806451613, "no_speech_prob": 0.0028622073587030172}, {"id": 1907, "seek": 466744, "start": 4669.44, "end": 4671.339999999999, "text": " across all the", "tokens": [50465, 2108, 439, 264, 50560], "temperature": 0.0, "avg_logprob": -0.1698134740193685, "compression_ratio": 1.5403225806451613, "no_speech_prob": 0.0028622073587030172}, {"id": 1908, "seek": 466744, "start": 4671.44, "end": 4673.339999999999, "text": " across all the rows here", "tokens": [50565, 2108, 439, 264, 13241, 510, 50660], "temperature": 0.0, "avg_logprob": -0.1698134740193685, "compression_ratio": 1.5403225806451613, "no_speech_prob": 0.0028622073587030172}, {"id": 1909, "seek": 466744, "start": 4673.44, "end": 4675.339999999999, "text": " so I like to do that", "tokens": [50665, 370, 286, 411, 281, 360, 300, 50760], "temperature": 0.0, "avg_logprob": -0.1698134740193685, "compression_ratio": 1.5403225806451613, "no_speech_prob": 0.0028622073587030172}, {"id": 1910, "seek": 466744, "start": 4675.44, "end": 4677.339999999999, "text": " by torch dot once like", "tokens": [50765, 538, 27822, 5893, 1564, 411, 50860], "temperature": 0.0, "avg_logprob": -0.1698134740193685, "compression_ratio": 1.5403225806451613, "no_speech_prob": 0.0028622073587030172}, {"id": 1911, "seek": 466744, "start": 4677.44, "end": 4679.339999999999, "text": " of basically", "tokens": [50865, 295, 1936, 50960], "temperature": 0.0, "avg_logprob": -0.1698134740193685, "compression_ratio": 1.5403225806451613, "no_speech_prob": 0.0028622073587030172}, {"id": 1912, "seek": 466744, "start": 4679.44, "end": 4681.339999999999, "text": " h pre bn", "tokens": [50965, 276, 659, 272, 77, 51060], "temperature": 0.0, "avg_logprob": -0.1698134740193685, "compression_ratio": 1.5403225806451613, "no_speech_prob": 0.0028622073587030172}, {"id": 1913, "seek": 466744, "start": 4683.44, "end": 4685.339999999999, "text": " and I will let broadcasting", "tokens": [51165, 293, 286, 486, 718, 30024, 51260], "temperature": 0.0, "avg_logprob": -0.1698134740193685, "compression_ratio": 1.5403225806451613, "no_speech_prob": 0.0028622073587030172}, {"id": 1914, "seek": 466744, "start": 4685.44, "end": 4687.339999999999, "text": " do the work of", "tokens": [51265, 360, 264, 589, 295, 51360], "temperature": 0.0, "avg_logprob": -0.1698134740193685, "compression_ratio": 1.5403225806451613, 
"no_speech_prob": 0.0028622073587030172}, {"id": 1915, "seek": 466744, "start": 4687.44, "end": 4689.339999999999, "text": " replication", "tokens": [51365, 39911, 51460], "temperature": 0.0, "avg_logprob": -0.1698134740193685, "compression_ratio": 1.5403225806451613, "no_speech_prob": 0.0028622073587030172}, {"id": 1916, "seek": 466744, "start": 4689.44, "end": 4691.339999999999, "text": " so", "tokens": [51465, 370, 51560], "temperature": 0.0, "avg_logprob": -0.1698134740193685, "compression_ratio": 1.5403225806451613, "no_speech_prob": 0.0028622073587030172}, {"id": 1917, "seek": 469134, "start": 4691.34, "end": 4695.34, "text": " A", "tokens": [50365, 316, 50565], "temperature": 0.0, "avg_logprob": -0.23305754100575166, "compression_ratio": 1.546875, "no_speech_prob": 0.0019504318479448557}, {"id": 1918, "seek": 469134, "start": 4695.34, "end": 4697.24, "text": " like that", "tokens": [50565, 411, 300, 50660], "temperature": 0.0, "avg_logprob": -0.23305754100575166, "compression_ratio": 1.546875, "no_speech_prob": 0.0019504318479448557}, {"id": 1919, "seek": 469134, "start": 4697.34, "end": 4699.24, "text": " so this is", "tokens": [50665, 370, 341, 307, 50760], "temperature": 0.0, "avg_logprob": -0.23305754100575166, "compression_ratio": 1.546875, "no_speech_prob": 0.0019504318479448557}, {"id": 1920, "seek": 469134, "start": 4699.34, "end": 4701.24, "text": " d h pre bn", "tokens": [50765, 274, 276, 659, 272, 77, 50860], "temperature": 0.0, "avg_logprob": -0.23305754100575166, "compression_ratio": 1.546875, "no_speech_prob": 0.0019504318479448557}, {"id": 1921, "seek": 469134, "start": 4701.34, "end": 4703.24, "text": " and hopefully", "tokens": [50865, 293, 4696, 50960], "temperature": 0.0, "avg_logprob": -0.23305754100575166, "compression_ratio": 1.546875, "no_speech_prob": 0.0019504318479448557}, {"id": 1922, "seek": 469134, "start": 4703.34, "end": 4705.24, "text": " we can plus equals that", "tokens": [50965, 321, 393, 1804, 6915, 300, 51060], "temperature": 0.0, "avg_logprob": -0.23305754100575166, "compression_ratio": 1.546875, "no_speech_prob": 0.0019504318479448557}, {"id": 1923, "seek": 469134, "start": 4709.34, "end": 4711.24, "text": " so this here is broadcasting", "tokens": [51265, 370, 341, 510, 307, 30024, 51360], "temperature": 0.0, "avg_logprob": -0.23305754100575166, "compression_ratio": 1.546875, "no_speech_prob": 0.0019504318479448557}, {"id": 1924, "seek": 469134, "start": 4711.34, "end": 4713.24, "text": " and then this is the scaling", "tokens": [51365, 293, 550, 341, 307, 264, 21589, 51460], "temperature": 0.0, "avg_logprob": -0.23305754100575166, "compression_ratio": 1.546875, "no_speech_prob": 0.0019504318479448557}, {"id": 1925, "seek": 469134, "start": 4713.34, "end": 4715.24, "text": " so this should be correct", "tokens": [51465, 370, 341, 820, 312, 3006, 51560], "temperature": 0.0, "avg_logprob": -0.23305754100575166, "compression_ratio": 1.546875, "no_speech_prob": 0.0019504318479448557}, {"id": 1926, "seek": 469134, "start": 4715.34, "end": 4717.24, "text": " okay", "tokens": [51565, 1392, 51660], "temperature": 0.0, "avg_logprob": -0.23305754100575166, "compression_ratio": 1.546875, "no_speech_prob": 0.0019504318479448557}, {"id": 1927, "seek": 469134, "start": 4717.34, "end": 4719.24, "text": " so that completes the backpropagation", "tokens": [51665, 370, 300, 36362, 264, 646, 79, 1513, 559, 399, 51760], "temperature": 0.0, "avg_logprob": -0.23305754100575166, "compression_ratio": 1.546875, "no_speech_prob": 0.0019504318479448557}, {"id": 1928, 
"seek": 471924, "start": 4719.24, "end": 4721.139999999999, "text": " let's backpropagate through the linear layer 1", "tokens": [50365, 718, 311, 646, 79, 1513, 559, 473, 807, 264, 8213, 4583, 502, 50460], "temperature": 0.0, "avg_logprob": -0.07008376717567444, "compression_ratio": 1.8114035087719298, "no_speech_prob": 0.0013151903403922915}, {"id": 1929, "seek": 471924, "start": 4721.24, "end": 4723.139999999999, "text": " here now because", "tokens": [50465, 510, 586, 570, 50560], "temperature": 0.0, "avg_logprob": -0.07008376717567444, "compression_ratio": 1.8114035087719298, "no_speech_prob": 0.0013151903403922915}, {"id": 1930, "seek": 471924, "start": 4723.24, "end": 4725.139999999999, "text": " everything is getting a little vertically crazy", "tokens": [50565, 1203, 307, 1242, 257, 707, 28450, 3219, 50660], "temperature": 0.0, "avg_logprob": -0.07008376717567444, "compression_ratio": 1.8114035087719298, "no_speech_prob": 0.0013151903403922915}, {"id": 1931, "seek": 471924, "start": 4725.24, "end": 4727.139999999999, "text": " I copy pasted the line here", "tokens": [50665, 286, 5055, 1791, 292, 264, 1622, 510, 50760], "temperature": 0.0, "avg_logprob": -0.07008376717567444, "compression_ratio": 1.8114035087719298, "no_speech_prob": 0.0013151903403922915}, {"id": 1932, "seek": 471924, "start": 4727.24, "end": 4729.139999999999, "text": " and let's just backpropagate through this one line", "tokens": [50765, 293, 718, 311, 445, 646, 79, 1513, 559, 473, 807, 341, 472, 1622, 50860], "temperature": 0.0, "avg_logprob": -0.07008376717567444, "compression_ratio": 1.8114035087719298, "no_speech_prob": 0.0013151903403922915}, {"id": 1933, "seek": 471924, "start": 4729.24, "end": 4731.139999999999, "text": " so first of course", "tokens": [50865, 370, 700, 295, 1164, 50960], "temperature": 0.0, "avg_logprob": -0.07008376717567444, "compression_ratio": 1.8114035087719298, "no_speech_prob": 0.0013151903403922915}, {"id": 1934, "seek": 471924, "start": 4731.24, "end": 4733.139999999999, "text": " we inspect the shapes and we see that", "tokens": [50965, 321, 15018, 264, 10854, 293, 321, 536, 300, 51060], "temperature": 0.0, "avg_logprob": -0.07008376717567444, "compression_ratio": 1.8114035087719298, "no_speech_prob": 0.0013151903403922915}, {"id": 1935, "seek": 471924, "start": 4733.24, "end": 4735.139999999999, "text": " this is 32 by 64", "tokens": [51065, 341, 307, 8858, 538, 12145, 51160], "temperature": 0.0, "avg_logprob": -0.07008376717567444, "compression_ratio": 1.8114035087719298, "no_speech_prob": 0.0013151903403922915}, {"id": 1936, "seek": 471924, "start": 4735.24, "end": 4737.139999999999, "text": " mcat is 32", "tokens": [51165, 275, 18035, 307, 8858, 51260], "temperature": 0.0, "avg_logprob": -0.07008376717567444, "compression_ratio": 1.8114035087719298, "no_speech_prob": 0.0013151903403922915}, {"id": 1937, "seek": 471924, "start": 4737.24, "end": 4739.139999999999, "text": " by 30", "tokens": [51265, 538, 2217, 51360], "temperature": 0.0, "avg_logprob": -0.07008376717567444, "compression_ratio": 1.8114035087719298, "no_speech_prob": 0.0013151903403922915}, {"id": 1938, "seek": 471924, "start": 4739.24, "end": 4741.139999999999, "text": " w1 is 30 by 64", "tokens": [51365, 261, 16, 307, 2217, 538, 12145, 51460], "temperature": 0.0, "avg_logprob": -0.07008376717567444, "compression_ratio": 1.8114035087719298, "no_speech_prob": 0.0013151903403922915}, {"id": 1939, "seek": 471924, "start": 4741.24, "end": 4743.139999999999, "text": " and b1 is just 64", "tokens": [51465, 293, 
272, 16, 307, 445, 12145, 51560], "temperature": 0.0, "avg_logprob": -0.07008376717567444, "compression_ratio": 1.8114035087719298, "no_speech_prob": 0.0013151903403922915}, {"id": 1940, "seek": 471924, "start": 4743.24, "end": 4745.139999999999, "text": " so as I mentioned", "tokens": [51565, 370, 382, 286, 2835, 51660], "temperature": 0.0, "avg_logprob": -0.07008376717567444, "compression_ratio": 1.8114035087719298, "no_speech_prob": 0.0013151903403922915}, {"id": 1941, "seek": 471924, "start": 4745.24, "end": 4747.139999999999, "text": " backpropagating through linear layers", "tokens": [51665, 646, 79, 1513, 559, 990, 807, 8213, 7914, 51760], "temperature": 0.0, "avg_logprob": -0.07008376717567444, "compression_ratio": 1.8114035087719298, "no_speech_prob": 0.0013151903403922915}, {"id": 1942, "seek": 471924, "start": 4747.24, "end": 4749.139999999999, "text": " is fairly easy just by matching the shapes", "tokens": [51765, 307, 6457, 1858, 445, 538, 14324, 264, 10854, 51860], "temperature": 0.0, "avg_logprob": -0.07008376717567444, "compression_ratio": 1.8114035087719298, "no_speech_prob": 0.0013151903403922915}, {"id": 1943, "seek": 474924, "start": 4749.24, "end": 4751.139999999999, "text": " so let's do that", "tokens": [50365, 370, 718, 311, 360, 300, 50460], "temperature": 0.0, "avg_logprob": -0.06517010149748428, "compression_ratio": 1.5231788079470199, "no_speech_prob": 0.0014333425788208842}, {"id": 1944, "seek": 474924, "start": 4751.24, "end": 4753.139999999999, "text": " we have that d mcat", "tokens": [50465, 321, 362, 300, 274, 275, 18035, 50560], "temperature": 0.0, "avg_logprob": -0.06517010149748428, "compression_ratio": 1.5231788079470199, "no_speech_prob": 0.0014333425788208842}, {"id": 1945, "seek": 474924, "start": 4753.24, "end": 4755.139999999999, "text": " should be", "tokens": [50565, 820, 312, 50660], "temperature": 0.0, "avg_logprob": -0.06517010149748428, "compression_ratio": 1.5231788079470199, "no_speech_prob": 0.0014333425788208842}, {"id": 1946, "seek": 474924, "start": 4755.24, "end": 4757.139999999999, "text": " some matrix multiplication", "tokens": [50665, 512, 8141, 27290, 50760], "temperature": 0.0, "avg_logprob": -0.06517010149748428, "compression_ratio": 1.5231788079470199, "no_speech_prob": 0.0014333425788208842}, {"id": 1947, "seek": 474924, "start": 4757.24, "end": 4759.139999999999, "text": " of d h pre bn with", "tokens": [50765, 295, 274, 276, 659, 272, 77, 365, 50860], "temperature": 0.0, "avg_logprob": -0.06517010149748428, "compression_ratio": 1.5231788079470199, "no_speech_prob": 0.0014333425788208842}, {"id": 1948, "seek": 474924, "start": 4759.24, "end": 4761.139999999999, "text": " w1 and 1 transpose", "tokens": [50865, 261, 16, 293, 502, 25167, 50960], "temperature": 0.0, "avg_logprob": -0.06517010149748428, "compression_ratio": 1.5231788079470199, "no_speech_prob": 0.0014333425788208842}, {"id": 1949, "seek": 474924, "start": 4761.24, "end": 4763.139999999999, "text": " thrown in there", "tokens": [50965, 11732, 294, 456, 51060], "temperature": 0.0, "avg_logprob": -0.06517010149748428, "compression_ratio": 1.5231788079470199, "no_speech_prob": 0.0014333425788208842}, {"id": 1950, "seek": 474924, "start": 4763.24, "end": 4765.139999999999, "text": " so to make mcat", "tokens": [51065, 370, 281, 652, 275, 18035, 51160], "temperature": 0.0, "avg_logprob": -0.06517010149748428, "compression_ratio": 1.5231788079470199, "no_speech_prob": 0.0014333425788208842}, {"id": 1951, "seek": 474924, "start": 4765.24, "end": 4767.139999999999, 
"text": " be 32 by 30", "tokens": [51165, 312, 8858, 538, 2217, 51260], "temperature": 0.0, "avg_logprob": -0.06517010149748428, "compression_ratio": 1.5231788079470199, "no_speech_prob": 0.0014333425788208842}, {"id": 1952, "seek": 474924, "start": 4767.24, "end": 4769.139999999999, "text": " I need to take", "tokens": [51265, 286, 643, 281, 747, 51360], "temperature": 0.0, "avg_logprob": -0.06517010149748428, "compression_ratio": 1.5231788079470199, "no_speech_prob": 0.0014333425788208842}, {"id": 1953, "seek": 474924, "start": 4769.24, "end": 4771.139999999999, "text": " d h pre bn", "tokens": [51365, 274, 276, 659, 272, 77, 51460], "temperature": 0.0, "avg_logprob": -0.06517010149748428, "compression_ratio": 1.5231788079470199, "no_speech_prob": 0.0014333425788208842}, {"id": 1954, "seek": 474924, "start": 4771.24, "end": 4773.139999999999, "text": " 32 by 64", "tokens": [51465, 8858, 538, 12145, 51560], "temperature": 0.0, "avg_logprob": -0.06517010149748428, "compression_ratio": 1.5231788079470199, "no_speech_prob": 0.0014333425788208842}, {"id": 1955, "seek": 474924, "start": 4773.24, "end": 4775.139999999999, "text": " and multiply it by", "tokens": [51565, 293, 12972, 309, 538, 51660], "temperature": 0.0, "avg_logprob": -0.06517010149748428, "compression_ratio": 1.5231788079470199, "no_speech_prob": 0.0014333425788208842}, {"id": 1956, "seek": 474924, "start": 4775.24, "end": 4777.139999999999, "text": " w1 dot transpose", "tokens": [51665, 261, 16, 5893, 25167, 51760], "temperature": 0.0, "avg_logprob": -0.06517010149748428, "compression_ratio": 1.5231788079470199, "no_speech_prob": 0.0014333425788208842}, {"id": 1957, "seek": 474924, "start": 4777.24, "end": 4779.139999999999, "text": " ...", "tokens": [51765, 1097, 51860], "temperature": 0.0, "avg_logprob": -0.06517010149748428, "compression_ratio": 1.5231788079470199, "no_speech_prob": 0.0014333425788208842}, {"id": 1958, "seek": 477924, "start": 4779.24, "end": 4781.139999999999, "text": " to get d w1", "tokens": [50365, 281, 483, 274, 261, 16, 50460], "temperature": 0.0, "avg_logprob": -0.09131018320719402, "compression_ratio": 1.6257668711656441, "no_speech_prob": 0.0010612071491777897}, {"id": 1959, "seek": 477924, "start": 4781.24, "end": 4783.139999999999, "text": " I need to end up with", "tokens": [50465, 286, 643, 281, 917, 493, 365, 50560], "temperature": 0.0, "avg_logprob": -0.09131018320719402, "compression_ratio": 1.6257668711656441, "no_speech_prob": 0.0010612071491777897}, {"id": 1960, "seek": 477924, "start": 4783.24, "end": 4785.139999999999, "text": " 30 by 64", "tokens": [50565, 2217, 538, 12145, 50660], "temperature": 0.0, "avg_logprob": -0.09131018320719402, "compression_ratio": 1.6257668711656441, "no_speech_prob": 0.0010612071491777897}, {"id": 1961, "seek": 477924, "start": 4785.24, "end": 4787.139999999999, "text": " so to get that I need to take", "tokens": [50665, 370, 281, 483, 300, 286, 643, 281, 747, 50760], "temperature": 0.0, "avg_logprob": -0.09131018320719402, "compression_ratio": 1.6257668711656441, "no_speech_prob": 0.0010612071491777897}, {"id": 1962, "seek": 477924, "start": 4787.24, "end": 4789.139999999999, "text": " mcat transpose", "tokens": [50765, 275, 18035, 25167, 50860], "temperature": 0.0, "avg_logprob": -0.09131018320719402, "compression_ratio": 1.6257668711656441, "no_speech_prob": 0.0010612071491777897}, {"id": 1963, "seek": 477924, "start": 4789.24, "end": 4791.139999999999, "text": " ...", "tokens": [50865, 1097, 50960], "temperature": 0.0, "avg_logprob": 
-0.09131018320719402, "compression_ratio": 1.6257668711656441, "no_speech_prob": 0.0010612071491777897}, {"id": 1964, "seek": 477924, "start": 4791.24, "end": 4793.139999999999, "text": " and multiply that by", "tokens": [50965, 293, 12972, 300, 538, 51060], "temperature": 0.0, "avg_logprob": -0.09131018320719402, "compression_ratio": 1.6257668711656441, "no_speech_prob": 0.0010612071491777897}, {"id": 1965, "seek": 477924, "start": 4793.24, "end": 4795.139999999999, "text": " d h pre bn", "tokens": [51065, 274, 276, 659, 272, 77, 51160], "temperature": 0.0, "avg_logprob": -0.09131018320719402, "compression_ratio": 1.6257668711656441, "no_speech_prob": 0.0010612071491777897}, {"id": 1966, "seek": 477924, "start": 4795.24, "end": 4797.139999999999, "text": " ...", "tokens": [51165, 1097, 51260], "temperature": 0.0, "avg_logprob": -0.09131018320719402, "compression_ratio": 1.6257668711656441, "no_speech_prob": 0.0010612071491777897}, {"id": 1967, "seek": 477924, "start": 4797.24, "end": 4799.139999999999, "text": " and finally to get", "tokens": [51265, 293, 2721, 281, 483, 51360], "temperature": 0.0, "avg_logprob": -0.09131018320719402, "compression_ratio": 1.6257668711656441, "no_speech_prob": 0.0010612071491777897}, {"id": 1968, "seek": 477924, "start": 4799.24, "end": 4801.139999999999, "text": " d b1", "tokens": [51365, 274, 272, 16, 51460], "temperature": 0.0, "avg_logprob": -0.09131018320719402, "compression_ratio": 1.6257668711656441, "no_speech_prob": 0.0010612071491777897}, {"id": 1969, "seek": 477924, "start": 4801.24, "end": 4803.139999999999, "text": " this is an addition", "tokens": [51465, 341, 307, 364, 4500, 51560], "temperature": 0.0, "avg_logprob": -0.09131018320719402, "compression_ratio": 1.6257668711656441, "no_speech_prob": 0.0010612071491777897}, {"id": 1970, "seek": 477924, "start": 4803.24, "end": 4805.139999999999, "text": " and we saw that basically", "tokens": [51565, 293, 321, 1866, 300, 1936, 51660], "temperature": 0.0, "avg_logprob": -0.09131018320719402, "compression_ratio": 1.6257668711656441, "no_speech_prob": 0.0010612071491777897}, {"id": 1971, "seek": 477924, "start": 4805.24, "end": 4807.139999999999, "text": " I need to just sum the elements", "tokens": [51665, 286, 643, 281, 445, 2408, 264, 4959, 51760], "temperature": 0.0, "avg_logprob": -0.09131018320719402, "compression_ratio": 1.6257668711656441, "no_speech_prob": 0.0010612071491777897}, {"id": 1972, "seek": 477924, "start": 4807.24, "end": 4809.139999999999, "text": " in d h pre bn along some dimensions", "tokens": [51765, 294, 274, 276, 659, 272, 77, 2051, 512, 12819, 51860], "temperature": 0.0, "avg_logprob": -0.09131018320719402, "compression_ratio": 1.6257668711656441, "no_speech_prob": 0.0010612071491777897}, {"id": 1973, "seek": 480924, "start": 4809.24, "end": 4811.139999999999, "text": " and to make the dimensions work out", "tokens": [50365, 293, 281, 652, 264, 12819, 589, 484, 50460], "temperature": 0.0, "avg_logprob": -0.12341978675440739, "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0004080058424733579}, {"id": 1974, "seek": 480924, "start": 4811.24, "end": 4813.139999999999, "text": " I need to sum along the 0th axis", "tokens": [50465, 286, 643, 281, 2408, 2051, 264, 1958, 392, 10298, 50560], "temperature": 0.0, "avg_logprob": -0.12341978675440739, "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0004080058424733579}, {"id": 1975, "seek": 480924, "start": 4813.24, "end": 4815.139999999999, "text": " here to eliminate", "tokens": [50565, 510, 281, 13819, 
50660], "temperature": 0.0, "avg_logprob": -0.12341978675440739, "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0004080058424733579}, {"id": 1976, "seek": 480924, "start": 4815.24, "end": 4817.139999999999, "text": " this dimension", "tokens": [50665, 341, 10139, 50760], "temperature": 0.0, "avg_logprob": -0.12341978675440739, "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0004080058424733579}, {"id": 1977, "seek": 480924, "start": 4817.24, "end": 4819.139999999999, "text": " and we do not keep dims", "tokens": [50765, 293, 321, 360, 406, 1066, 5013, 82, 50860], "temperature": 0.0, "avg_logprob": -0.12341978675440739, "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0004080058424733579}, {"id": 1978, "seek": 480924, "start": 4819.24, "end": 4821.139999999999, "text": " so that we want to just get a single", "tokens": [50865, 370, 300, 321, 528, 281, 445, 483, 257, 2167, 50960], "temperature": 0.0, "avg_logprob": -0.12341978675440739, "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0004080058424733579}, {"id": 1979, "seek": 480924, "start": 4821.24, "end": 4823.139999999999, "text": " one-dimensional vector of 64", "tokens": [50965, 472, 12, 18759, 8062, 295, 12145, 51060], "temperature": 0.0, "avg_logprob": -0.12341978675440739, "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0004080058424733579}, {"id": 1980, "seek": 480924, "start": 4823.24, "end": 4825.139999999999, "text": " so these are the claimed derivatives", "tokens": [51065, 370, 613, 366, 264, 12941, 33733, 51160], "temperature": 0.0, "avg_logprob": -0.12341978675440739, "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0004080058424733579}, {"id": 1981, "seek": 480924, "start": 4825.24, "end": 4827.139999999999, "text": " let me put that here", "tokens": [51165, 718, 385, 829, 300, 510, 51260], "temperature": 0.0, "avg_logprob": -0.12341978675440739, "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0004080058424733579}, {"id": 1982, "seek": 480924, "start": 4827.24, "end": 4829.139999999999, "text": " and let me", "tokens": [51265, 293, 718, 385, 51360], "temperature": 0.0, "avg_logprob": -0.12341978675440739, "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0004080058424733579}, {"id": 1983, "seek": 480924, "start": 4829.24, "end": 4831.139999999999, "text": " uncomment three lines", "tokens": [51365, 8585, 518, 1045, 3876, 51460], "temperature": 0.0, "avg_logprob": -0.12341978675440739, "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0004080058424733579}, {"id": 1984, "seek": 480924, "start": 4831.24, "end": 4833.139999999999, "text": " and cross our fingers", "tokens": [51465, 293, 3278, 527, 7350, 51560], "temperature": 0.0, "avg_logprob": -0.12341978675440739, "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0004080058424733579}, {"id": 1985, "seek": 480924, "start": 4833.24, "end": 4835.139999999999, "text": " everything is great", "tokens": [51565, 1203, 307, 869, 51660], "temperature": 0.0, "avg_logprob": -0.12341978675440739, "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0004080058424733579}, {"id": 1986, "seek": 480924, "start": 4835.24, "end": 4837.139999999999, "text": " okay so we now continue almost there", "tokens": [51665, 1392, 370, 321, 586, 2354, 1920, 456, 51760], "temperature": 0.0, "avg_logprob": -0.12341978675440739, "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0004080058424733579}, {"id": 1987, "seek": 480924, "start": 4837.24, "end": 4839.139999999999, "text": 
" we have the derivative of mcat", "tokens": [51765, 321, 362, 264, 13760, 295, 275, 18035, 51860], "temperature": 0.0, "avg_logprob": -0.12341978675440739, "compression_ratio": 1.696969696969697, "no_speech_prob": 0.0004080058424733579}, {"id": 1988, "seek": 483914, "start": 4839.14, "end": 4841.04, "text": " and we want to backpropagate", "tokens": [50365, 293, 321, 528, 281, 646, 79, 1513, 559, 473, 50460], "temperature": 0.0, "avg_logprob": -0.09625524139404297, "compression_ratio": 1.7826086956521738, "no_speech_prob": 0.0008322940557263792}, {"id": 1989, "seek": 483914, "start": 4841.14, "end": 4843.04, "text": " into mb", "tokens": [50465, 666, 275, 65, 50560], "temperature": 0.0, "avg_logprob": -0.09625524139404297, "compression_ratio": 1.7826086956521738, "no_speech_prob": 0.0008322940557263792}, {"id": 1990, "seek": 483914, "start": 4843.14, "end": 4845.04, "text": " so I again copied this line over here", "tokens": [50565, 370, 286, 797, 25365, 341, 1622, 670, 510, 50660], "temperature": 0.0, "avg_logprob": -0.09625524139404297, "compression_ratio": 1.7826086956521738, "no_speech_prob": 0.0008322940557263792}, {"id": 1991, "seek": 483914, "start": 4845.14, "end": 4847.04, "text": " so this is the forward pass", "tokens": [50665, 370, 341, 307, 264, 2128, 1320, 50760], "temperature": 0.0, "avg_logprob": -0.09625524139404297, "compression_ratio": 1.7826086956521738, "no_speech_prob": 0.0008322940557263792}, {"id": 1992, "seek": 483914, "start": 4847.14, "end": 4849.04, "text": " and then this is the shapes", "tokens": [50765, 293, 550, 341, 307, 264, 10854, 50860], "temperature": 0.0, "avg_logprob": -0.09625524139404297, "compression_ratio": 1.7826086956521738, "no_speech_prob": 0.0008322940557263792}, {"id": 1993, "seek": 483914, "start": 4849.14, "end": 4851.04, "text": " so remember that the shape here", "tokens": [50865, 370, 1604, 300, 264, 3909, 510, 50960], "temperature": 0.0, "avg_logprob": -0.09625524139404297, "compression_ratio": 1.7826086956521738, "no_speech_prob": 0.0008322940557263792}, {"id": 1994, "seek": 483914, "start": 4851.14, "end": 4853.04, "text": " was 32 by 30", "tokens": [50965, 390, 8858, 538, 2217, 51060], "temperature": 0.0, "avg_logprob": -0.09625524139404297, "compression_ratio": 1.7826086956521738, "no_speech_prob": 0.0008322940557263792}, {"id": 1995, "seek": 483914, "start": 4853.14, "end": 4855.04, "text": " and the original shape of mb", "tokens": [51065, 293, 264, 3380, 3909, 295, 275, 65, 51160], "temperature": 0.0, "avg_logprob": -0.09625524139404297, "compression_ratio": 1.7826086956521738, "no_speech_prob": 0.0008322940557263792}, {"id": 1996, "seek": 483914, "start": 4855.14, "end": 4857.04, "text": " was 32 by 3 by 10", "tokens": [51165, 390, 8858, 538, 805, 538, 1266, 51260], "temperature": 0.0, "avg_logprob": -0.09625524139404297, "compression_ratio": 1.7826086956521738, "no_speech_prob": 0.0008322940557263792}, {"id": 1997, "seek": 483914, "start": 4857.14, "end": 4859.04, "text": " so this layer in the forward pass", "tokens": [51265, 370, 341, 4583, 294, 264, 2128, 1320, 51360], "temperature": 0.0, "avg_logprob": -0.09625524139404297, "compression_ratio": 1.7826086956521738, "no_speech_prob": 0.0008322940557263792}, {"id": 1998, "seek": 483914, "start": 4859.14, "end": 4861.04, "text": " as you recall did the concatenation", "tokens": [51365, 382, 291, 9901, 630, 264, 1588, 7186, 399, 51460], "temperature": 0.0, "avg_logprob": -0.09625524139404297, "compression_ratio": 1.7826086956521738, "no_speech_prob": 0.0008322940557263792}, 
{"id": 1999, "seek": 483914, "start": 4861.14, "end": 4863.04, "text": " of these three 10-dimensional", "tokens": [51465, 295, 613, 1045, 1266, 12, 18759, 51560], "temperature": 0.0, "avg_logprob": -0.09625524139404297, "compression_ratio": 1.7826086956521738, "no_speech_prob": 0.0008322940557263792}, {"id": 2000, "seek": 483914, "start": 4863.14, "end": 4865.04, "text": " character vectors", "tokens": [51565, 2517, 18875, 51660], "temperature": 0.0, "avg_logprob": -0.09625524139404297, "compression_ratio": 1.7826086956521738, "no_speech_prob": 0.0008322940557263792}, {"id": 2001, "seek": 483914, "start": 4865.14, "end": 4867.04, "text": " and so now we just want to undo that", "tokens": [51665, 293, 370, 586, 321, 445, 528, 281, 23779, 300, 51760], "temperature": 0.0, "avg_logprob": -0.09625524139404297, "compression_ratio": 1.7826086956521738, "no_speech_prob": 0.0008322940557263792}, {"id": 2002, "seek": 483914, "start": 4867.14, "end": 4869.04, "text": " so this is actually a relatively", "tokens": [51765, 370, 341, 307, 767, 257, 7226, 51860], "temperature": 0.0, "avg_logprob": -0.09625524139404297, "compression_ratio": 1.7826086956521738, "no_speech_prob": 0.0008322940557263792}, {"id": 2003, "seek": 486904, "start": 4869.04, "end": 4870.94, "text": " simple iteration because", "tokens": [50365, 2199, 24784, 570, 50460], "temperature": 0.0, "avg_logprob": -0.15816052754720053, "compression_ratio": 1.7109004739336493, "no_speech_prob": 0.002074092859402299}, {"id": 2004, "seek": 486904, "start": 4871.04, "end": 4872.94, "text": " the backward pass of the", "tokens": [50465, 264, 23897, 1320, 295, 264, 50560], "temperature": 0.0, "avg_logprob": -0.15816052754720053, "compression_ratio": 1.7109004739336493, "no_speech_prob": 0.002074092859402299}, {"id": 2005, "seek": 486904, "start": 4873.04, "end": 4874.94, "text": " what is the view? 
view is just a", "tokens": [50565, 437, 307, 264, 1910, 30, 1910, 307, 445, 257, 50660], "temperature": 0.0, "avg_logprob": -0.15816052754720053, "compression_ratio": 1.7109004739336493, "no_speech_prob": 0.002074092859402299}, {"id": 2006, "seek": 486904, "start": 4875.04, "end": 4876.94, "text": " representation of the array", "tokens": [50665, 10290, 295, 264, 10225, 50760], "temperature": 0.0, "avg_logprob": -0.15816052754720053, "compression_ratio": 1.7109004739336493, "no_speech_prob": 0.002074092859402299}, {"id": 2007, "seek": 486904, "start": 4877.04, "end": 4878.94, "text": " it's just a logical form of how", "tokens": [50765, 309, 311, 445, 257, 14978, 1254, 295, 577, 50860], "temperature": 0.0, "avg_logprob": -0.15816052754720053, "compression_ratio": 1.7109004739336493, "no_speech_prob": 0.002074092859402299}, {"id": 2008, "seek": 486904, "start": 4879.04, "end": 4880.94, "text": " you interpret the array", "tokens": [50865, 291, 7302, 264, 10225, 50960], "temperature": 0.0, "avg_logprob": -0.15816052754720053, "compression_ratio": 1.7109004739336493, "no_speech_prob": 0.002074092859402299}, {"id": 2009, "seek": 486904, "start": 4881.04, "end": 4882.94, "text": " so let's just reinterpret it", "tokens": [50965, 370, 718, 311, 445, 319, 41935, 309, 51060], "temperature": 0.0, "avg_logprob": -0.15816052754720053, "compression_ratio": 1.7109004739336493, "no_speech_prob": 0.002074092859402299}, {"id": 2010, "seek": 486904, "start": 4883.04, "end": 4884.94, "text": " to be what it was before", "tokens": [51065, 281, 312, 437, 309, 390, 949, 51160], "temperature": 0.0, "avg_logprob": -0.15816052754720053, "compression_ratio": 1.7109004739336493, "no_speech_prob": 0.002074092859402299}, {"id": 2011, "seek": 486904, "start": 4885.04, "end": 4886.94, "text": " so in other words dmb is not 32 by 30", "tokens": [51165, 370, 294, 661, 2283, 274, 2504, 307, 406, 8858, 538, 2217, 51260], "temperature": 0.0, "avg_logprob": -0.15816052754720053, "compression_ratio": 1.7109004739336493, "no_speech_prob": 0.002074092859402299}, {"id": 2012, "seek": 486904, "start": 4887.04, "end": 4888.94, "text": " it is basically dmpcat", "tokens": [51265, 309, 307, 1936, 274, 2455, 18035, 51360], "temperature": 0.0, "avg_logprob": -0.15816052754720053, "compression_ratio": 1.7109004739336493, "no_speech_prob": 0.002074092859402299}, {"id": 2013, "seek": 486904, "start": 4889.04, "end": 4890.94, "text": " but if you view it", "tokens": [51365, 457, 498, 291, 1910, 309, 51460], "temperature": 0.0, "avg_logprob": -0.15816052754720053, "compression_ratio": 1.7109004739336493, "no_speech_prob": 0.002074092859402299}, {"id": 2014, "seek": 486904, "start": 4891.04, "end": 4892.94, "text": " as the original shape", "tokens": [51465, 382, 264, 3380, 3909, 51560], "temperature": 0.0, "avg_logprob": -0.15816052754720053, "compression_ratio": 1.7109004739336493, "no_speech_prob": 0.002074092859402299}, {"id": 2015, "seek": 486904, "start": 4893.04, "end": 4894.94, "text": " so just m.shape", "tokens": [51565, 370, 445, 275, 13, 82, 42406, 51660], "temperature": 0.0, "avg_logprob": -0.15816052754720053, "compression_ratio": 1.7109004739336493, "no_speech_prob": 0.002074092859402299}, {"id": 2016, "seek": 486904, "start": 4897.04, "end": 4898.94, "text": " you can pass and tuple", "tokens": [51765, 291, 393, 1320, 293, 2604, 781, 51860], "temperature": 0.0, "avg_logprob": -0.15816052754720053, "compression_ratio": 1.7109004739336493, "no_speech_prob": 0.002074092859402299}, {"id": 2017, "seek": 489894, "start": 4898.94, 
"end": 4900.839999999999, "text": " into view", "tokens": [50365, 666, 1910, 50460], "temperature": 0.0, "avg_logprob": -0.10147552180096386, "compression_ratio": 1.811926605504587, "no_speech_prob": 0.001311408937908709}, {"id": 2018, "seek": 489894, "start": 4900.94, "end": 4902.839999999999, "text": " and so this should just be", "tokens": [50465, 293, 370, 341, 820, 445, 312, 50560], "temperature": 0.0, "avg_logprob": -0.10147552180096386, "compression_ratio": 1.811926605504587, "no_speech_prob": 0.001311408937908709}, {"id": 2019, "seek": 489894, "start": 4902.94, "end": 4904.839999999999, "text": " okay", "tokens": [50565, 1392, 50660], "temperature": 0.0, "avg_logprob": -0.10147552180096386, "compression_ratio": 1.811926605504587, "no_speech_prob": 0.001311408937908709}, {"id": 2020, "seek": 489894, "start": 4904.94, "end": 4906.839999999999, "text": " we just re-represent that view", "tokens": [50665, 321, 445, 319, 12, 19919, 11662, 300, 1910, 50760], "temperature": 0.0, "avg_logprob": -0.10147552180096386, "compression_ratio": 1.811926605504587, "no_speech_prob": 0.001311408937908709}, {"id": 2021, "seek": 489894, "start": 4906.94, "end": 4908.839999999999, "text": " and then we uncomment this line here", "tokens": [50765, 293, 550, 321, 8585, 518, 341, 1622, 510, 50860], "temperature": 0.0, "avg_logprob": -0.10147552180096386, "compression_ratio": 1.811926605504587, "no_speech_prob": 0.001311408937908709}, {"id": 2022, "seek": 489894, "start": 4908.94, "end": 4910.839999999999, "text": " and hopefully", "tokens": [50865, 293, 4696, 50960], "temperature": 0.0, "avg_logprob": -0.10147552180096386, "compression_ratio": 1.811926605504587, "no_speech_prob": 0.001311408937908709}, {"id": 2023, "seek": 489894, "start": 4910.94, "end": 4912.839999999999, "text": " yeah, so the derivative of m", "tokens": [50965, 1338, 11, 370, 264, 13760, 295, 275, 51060], "temperature": 0.0, "avg_logprob": -0.10147552180096386, "compression_ratio": 1.811926605504587, "no_speech_prob": 0.001311408937908709}, {"id": 2024, "seek": 489894, "start": 4912.94, "end": 4914.839999999999, "text": " is correct", "tokens": [51065, 307, 3006, 51160], "temperature": 0.0, "avg_logprob": -0.10147552180096386, "compression_ratio": 1.811926605504587, "no_speech_prob": 0.001311408937908709}, {"id": 2025, "seek": 489894, "start": 4914.94, "end": 4916.839999999999, "text": " so in this case we just have to re-represent", "tokens": [51165, 370, 294, 341, 1389, 321, 445, 362, 281, 319, 12, 19919, 11662, 51260], "temperature": 0.0, "avg_logprob": -0.10147552180096386, "compression_ratio": 1.811926605504587, "no_speech_prob": 0.001311408937908709}, {"id": 2026, "seek": 489894, "start": 4916.94, "end": 4918.839999999999, "text": " the shape of those derivatives", "tokens": [51265, 264, 3909, 295, 729, 33733, 51360], "temperature": 0.0, "avg_logprob": -0.10147552180096386, "compression_ratio": 1.811926605504587, "no_speech_prob": 0.001311408937908709}, {"id": 2027, "seek": 489894, "start": 4918.94, "end": 4920.839999999999, "text": " into the original view", "tokens": [51365, 666, 264, 3380, 1910, 51460], "temperature": 0.0, "avg_logprob": -0.10147552180096386, "compression_ratio": 1.811926605504587, "no_speech_prob": 0.001311408937908709}, {"id": 2028, "seek": 489894, "start": 4920.94, "end": 4922.839999999999, "text": " so now we are at the final line", "tokens": [51465, 370, 586, 321, 366, 412, 264, 2572, 1622, 51560], "temperature": 0.0, "avg_logprob": -0.10147552180096386, "compression_ratio": 1.811926605504587, 
"no_speech_prob": 0.001311408937908709}, {"id": 2029, "seek": 489894, "start": 4922.94, "end": 4924.839999999999, "text": " and the only thing that's left to backpropagate through", "tokens": [51565, 293, 264, 787, 551, 300, 311, 1411, 281, 646, 79, 1513, 559, 473, 807, 51660], "temperature": 0.0, "avg_logprob": -0.10147552180096386, "compression_ratio": 1.811926605504587, "no_speech_prob": 0.001311408937908709}, {"id": 2030, "seek": 489894, "start": 4924.94, "end": 4926.839999999999, "text": " is this indexing operation here", "tokens": [51665, 307, 341, 8186, 278, 6916, 510, 51760], "temperature": 0.0, "avg_logprob": -0.10147552180096386, "compression_ratio": 1.811926605504587, "no_speech_prob": 0.001311408937908709}, {"id": 2031, "seek": 489894, "start": 4926.94, "end": 4928.839999999999, "text": " m is c at xb", "tokens": [51765, 275, 307, 269, 412, 2031, 65, 51860], "temperature": 0.0, "avg_logprob": -0.10147552180096386, "compression_ratio": 1.811926605504587, "no_speech_prob": 0.001311408937908709}, {"id": 2032, "seek": 492884, "start": 4928.84, "end": 4930.74, "text": " or I copy pasted this line here", "tokens": [50365, 420, 286, 5055, 1791, 292, 341, 1622, 510, 50460], "temperature": 0.0, "avg_logprob": -0.09759954184540047, "compression_ratio": 1.7119341563786008, "no_speech_prob": 0.0006466629565693438}, {"id": 2033, "seek": 492884, "start": 4930.84, "end": 4932.74, "text": " and let's look at the shapes of everything that's involved", "tokens": [50465, 293, 718, 311, 574, 412, 264, 10854, 295, 1203, 300, 311, 3288, 50560], "temperature": 0.0, "avg_logprob": -0.09759954184540047, "compression_ratio": 1.7119341563786008, "no_speech_prob": 0.0006466629565693438}, {"id": 2034, "seek": 492884, "start": 4932.84, "end": 4934.74, "text": " and remind ourselves how this worked", "tokens": [50565, 293, 4160, 4175, 577, 341, 2732, 50660], "temperature": 0.0, "avg_logprob": -0.09759954184540047, "compression_ratio": 1.7119341563786008, "no_speech_prob": 0.0006466629565693438}, {"id": 2035, "seek": 492884, "start": 4934.84, "end": 4936.74, "text": " so m.shape", "tokens": [50665, 370, 275, 13, 82, 42406, 50760], "temperature": 0.0, "avg_logprob": -0.09759954184540047, "compression_ratio": 1.7119341563786008, "no_speech_prob": 0.0006466629565693438}, {"id": 2036, "seek": 492884, "start": 4936.84, "end": 4938.74, "text": " was 32 by 3 by 10", "tokens": [50765, 390, 8858, 538, 805, 538, 1266, 50860], "temperature": 0.0, "avg_logprob": -0.09759954184540047, "compression_ratio": 1.7119341563786008, "no_speech_prob": 0.0006466629565693438}, {"id": 2037, "seek": 492884, "start": 4938.84, "end": 4940.74, "text": " so it's 32 examples", "tokens": [50865, 370, 309, 311, 8858, 5110, 50960], "temperature": 0.0, "avg_logprob": -0.09759954184540047, "compression_ratio": 1.7119341563786008, "no_speech_prob": 0.0006466629565693438}, {"id": 2038, "seek": 492884, "start": 4940.84, "end": 4942.74, "text": " and then we have 3 characters", "tokens": [50965, 293, 550, 321, 362, 805, 4342, 51060], "temperature": 0.0, "avg_logprob": -0.09759954184540047, "compression_ratio": 1.7119341563786008, "no_speech_prob": 0.0006466629565693438}, {"id": 2039, "seek": 492884, "start": 4942.84, "end": 4944.74, "text": " each one of them has a 10 dimensional", "tokens": [51065, 1184, 472, 295, 552, 575, 257, 1266, 18795, 51160], "temperature": 0.0, "avg_logprob": -0.09759954184540047, "compression_ratio": 1.7119341563786008, "no_speech_prob": 0.0006466629565693438}, {"id": 2040, "seek": 492884, "start": 4944.84, "end": 
4946.74, "text": " embedding", "tokens": [51165, 12240, 3584, 51260], "temperature": 0.0, "avg_logprob": -0.09759954184540047, "compression_ratio": 1.7119341563786008, "no_speech_prob": 0.0006466629565693438}, {"id": 2041, "seek": 492884, "start": 4946.84, "end": 4948.74, "text": " and this was achieved by taking the", "tokens": [51265, 293, 341, 390, 11042, 538, 1940, 264, 51360], "temperature": 0.0, "avg_logprob": -0.09759954184540047, "compression_ratio": 1.7119341563786008, "no_speech_prob": 0.0006466629565693438}, {"id": 2042, "seek": 492884, "start": 4948.84, "end": 4950.74, "text": " lookup table c which have 27", "tokens": [51365, 574, 1010, 3199, 269, 597, 362, 7634, 51460], "temperature": 0.0, "avg_logprob": -0.09759954184540047, "compression_ratio": 1.7119341563786008, "no_speech_prob": 0.0006466629565693438}, {"id": 2043, "seek": 492884, "start": 4950.84, "end": 4952.74, "text": " possible characters", "tokens": [51465, 1944, 4342, 51560], "temperature": 0.0, "avg_logprob": -0.09759954184540047, "compression_ratio": 1.7119341563786008, "no_speech_prob": 0.0006466629565693438}, {"id": 2044, "seek": 492884, "start": 4952.84, "end": 4954.74, "text": " each of them 10 dimensional", "tokens": [51565, 1184, 295, 552, 1266, 18795, 51660], "temperature": 0.0, "avg_logprob": -0.09759954184540047, "compression_ratio": 1.7119341563786008, "no_speech_prob": 0.0006466629565693438}, {"id": 2045, "seek": 492884, "start": 4954.84, "end": 4956.74, "text": " and we looked up at the rows", "tokens": [51665, 293, 321, 2956, 493, 412, 264, 13241, 51760], "temperature": 0.0, "avg_logprob": -0.09759954184540047, "compression_ratio": 1.7119341563786008, "no_speech_prob": 0.0006466629565693438}, {"id": 2046, "seek": 492884, "start": 4956.84, "end": 4958.74, "text": " that were specified", "tokens": [51765, 300, 645, 22206, 51860], "temperature": 0.0, "avg_logprob": -0.09759954184540047, "compression_ratio": 1.7119341563786008, "no_speech_prob": 0.0006466629565693438}, {"id": 2047, "seek": 495874, "start": 4958.74, "end": 4960.639999999999, "text": " inside this tensor xb", "tokens": [50365, 1854, 341, 40863, 2031, 65, 50460], "temperature": 0.0, "avg_logprob": -0.06143123347584794, "compression_ratio": 1.995475113122172, "no_speech_prob": 0.0013592162868008018}, {"id": 2048, "seek": 495874, "start": 4960.74, "end": 4962.639999999999, "text": " so xb is 32 by 3", "tokens": [50465, 370, 2031, 65, 307, 8858, 538, 805, 50560], "temperature": 0.0, "avg_logprob": -0.06143123347584794, "compression_ratio": 1.995475113122172, "no_speech_prob": 0.0013592162868008018}, {"id": 2049, "seek": 495874, "start": 4962.74, "end": 4964.639999999999, "text": " and it's basically giving us for each example", "tokens": [50565, 293, 309, 311, 1936, 2902, 505, 337, 1184, 1365, 50660], "temperature": 0.0, "avg_logprob": -0.06143123347584794, "compression_ratio": 1.995475113122172, "no_speech_prob": 0.0013592162868008018}, {"id": 2050, "seek": 495874, "start": 4964.74, "end": 4966.639999999999, "text": " the identity or the index", "tokens": [50665, 264, 6575, 420, 264, 8186, 50760], "temperature": 0.0, "avg_logprob": -0.06143123347584794, "compression_ratio": 1.995475113122172, "no_speech_prob": 0.0013592162868008018}, {"id": 2051, "seek": 495874, "start": 4966.74, "end": 4968.639999999999, "text": " of which character", "tokens": [50765, 295, 597, 2517, 50860], "temperature": 0.0, "avg_logprob": -0.06143123347584794, "compression_ratio": 1.995475113122172, "no_speech_prob": 0.0013592162868008018}, {"id": 2052, "seek": 495874, 
"start": 4968.74, "end": 4970.639999999999, "text": " is part of that example", "tokens": [50865, 307, 644, 295, 300, 1365, 50960], "temperature": 0.0, "avg_logprob": -0.06143123347584794, "compression_ratio": 1.995475113122172, "no_speech_prob": 0.0013592162868008018}, {"id": 2053, "seek": 495874, "start": 4970.74, "end": 4972.639999999999, "text": " and so here I'm showing the first 5 rows", "tokens": [50965, 293, 370, 510, 286, 478, 4099, 264, 700, 1025, 13241, 51060], "temperature": 0.0, "avg_logprob": -0.06143123347584794, "compression_ratio": 1.995475113122172, "no_speech_prob": 0.0013592162868008018}, {"id": 2054, "seek": 495874, "start": 4972.74, "end": 4976.639999999999, "text": " of this tensor xb", "tokens": [51065, 295, 341, 40863, 2031, 65, 51260], "temperature": 0.0, "avg_logprob": -0.06143123347584794, "compression_ratio": 1.995475113122172, "no_speech_prob": 0.0013592162868008018}, {"id": 2055, "seek": 495874, "start": 4976.74, "end": 4978.639999999999, "text": " and so we can see that for example here", "tokens": [51265, 293, 370, 321, 393, 536, 300, 337, 1365, 510, 51360], "temperature": 0.0, "avg_logprob": -0.06143123347584794, "compression_ratio": 1.995475113122172, "no_speech_prob": 0.0013592162868008018}, {"id": 2056, "seek": 495874, "start": 4978.74, "end": 4980.639999999999, "text": " it was the first example in this batch", "tokens": [51365, 309, 390, 264, 700, 1365, 294, 341, 15245, 51460], "temperature": 0.0, "avg_logprob": -0.06143123347584794, "compression_ratio": 1.995475113122172, "no_speech_prob": 0.0013592162868008018}, {"id": 2057, "seek": 495874, "start": 4980.74, "end": 4982.639999999999, "text": " is that the first character", "tokens": [51465, 307, 300, 264, 700, 2517, 51560], "temperature": 0.0, "avg_logprob": -0.06143123347584794, "compression_ratio": 1.995475113122172, "no_speech_prob": 0.0013592162868008018}, {"id": 2058, "seek": 495874, "start": 4982.74, "end": 4984.639999999999, "text": " and the first character and the fourth character", "tokens": [51565, 293, 264, 700, 2517, 293, 264, 6409, 2517, 51660], "temperature": 0.0, "avg_logprob": -0.06143123347584794, "compression_ratio": 1.995475113122172, "no_speech_prob": 0.0013592162868008018}, {"id": 2059, "seek": 495874, "start": 4984.74, "end": 4986.639999999999, "text": " comes into the neural net", "tokens": [51665, 1487, 666, 264, 18161, 2533, 51760], "temperature": 0.0, "avg_logprob": -0.06143123347584794, "compression_ratio": 1.995475113122172, "no_speech_prob": 0.0013592162868008018}, {"id": 2060, "seek": 495874, "start": 4986.74, "end": 4988.639999999999, "text": " and then we want to predict the next character", "tokens": [51765, 293, 550, 321, 528, 281, 6069, 264, 958, 2517, 51860], "temperature": 0.0, "avg_logprob": -0.06143123347584794, "compression_ratio": 1.995475113122172, "no_speech_prob": 0.0013592162868008018}, {"id": 2061, "seek": 498864, "start": 4988.64, "end": 4990.54, "text": " in the sequence after the character is 114", "tokens": [50365, 294, 264, 8310, 934, 264, 2517, 307, 2975, 19, 50460], "temperature": 0.0, "avg_logprob": -0.11299130955680473, "compression_ratio": 1.8157894736842106, "no_speech_prob": 0.0009966247016564012}, {"id": 2062, "seek": 498864, "start": 4990.64, "end": 4992.54, "text": " so basically", "tokens": [50465, 370, 1936, 50560], "temperature": 0.0, "avg_logprob": -0.11299130955680473, "compression_ratio": 1.8157894736842106, "no_speech_prob": 0.0009966247016564012}, {"id": 2063, "seek": 498864, "start": 4992.64, "end": 4994.54, "text": " what's 
So basically, what's happening here is that there are integers inside Xb, and each one of these integers is specifying which row of C we want to pluck out. And then we arrange those rows that we've plucked out into a 32 by 3 by 10 tensor; we just package them into this tensor. And now what's happening is that we have demb: for every one of these plucked-out rows, we have their gradients now, but they're arranged inside this 32 by 3 by 10 tensor. So all we have to do now is route this gradient backwards through this assignment. We need to find which row of C every one of these 10-dimensional embeddings came from, and then we need to deposit them into dC. So we just need to undo the indexing. And of course, if any of these rows of C were used multiple times, which almost certainly is the case (like row 1 here, which was used multiple times), then we have to remember that the gradients that arrive there have to add: for each occurrence, we have to have an addition. So let's now write this out.
And I don't actually know of, like, a much better way to do this than a for loop, unfortunately, in Python. So maybe someone can come up with a vectorized, efficient operation, but for now let's just use for loops. So let me create torch.zeros_like(C); I'm going to utilize just a 27 by 10 tensor of all zeros. And then, honestly, for k in range(Xb.shape[0]), and maybe someone has a better way to do this, but for j in range(Xb.shape[1]): this is going to iterate over all the elements of Xb, all these integers. And then let's get the index at this position, which is basically the value of Xb at k, j; so an example of that is 11 or 14 and so on. And now, in the forward pass, we basically took the row of C at that index and we deposited it into emb at k, j. That's what happened; that's where they were packaged. So now we need to go backwards, and we just need to route demb at the position k, j: we now have these derivatives for each position, and it's 10-dimensional, and you just need to go into the correct row of C, or dC rather, so dC at ix is this, but plus-equals, because there could be multiple occurrences: the same row could have been used many, many times. And so all those derivatives will just go backwards through the indexing, and they will add. So this is my candidate solution.
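A sketch of that loop, continuing the snippet above and using the lecture's variable names (the random demb is just a stand-in for the gradient on emb arriving from the layers above):

```python
demb = torch.randn(32, 3, 10)   # stand-in for the gradient flowing in from above
dC = torch.zeros_like(C)        # 27 by 10 tensor of zeros
for k in range(Xb.shape[0]):    # over the 32 examples
    for j in range(Xb.shape[1]):   # over the 3 characters per example
        ix = Xb[k, j]              # which row of C was plucked out here
        dC[ix] += demb[k, j]       # += so gradients for repeated rows accumulate
```

(One vectorized possibility, untested here, would be dC.index_add_(0, Xb.view(-1), demb.view(-1, C.shape[1])), which performs the same scatter-add without the Python loops.)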
"avg_logprob": -0.12297726670900981, "compression_ratio": 1.605263157894737, "no_speech_prob": 0.001141462940722704}, {"id": 2158, "seek": 516804, "start": 5186.04, "end": 5187.94, "text": " this entire beast", "tokens": [51265, 341, 2302, 13464, 51360], "temperature": 0.0, "avg_logprob": -0.12297726670900981, "compression_ratio": 1.605263157894737, "no_speech_prob": 0.001141462940722704}, {"id": 2159, "seek": 516804, "start": 5188.04, "end": 5189.94, "text": " so there we go", "tokens": [51365, 370, 456, 321, 352, 51460], "temperature": 0.0, "avg_logprob": -0.12297726670900981, "compression_ratio": 1.605263157894737, "no_speech_prob": 0.001141462940722704}, {"id": 2160, "seek": 516804, "start": 5190.04, "end": 5191.94, "text": " totally makes sense", "tokens": [51465, 3879, 1669, 2020, 51560], "temperature": 0.0, "avg_logprob": -0.12297726670900981, "compression_ratio": 1.605263157894737, "no_speech_prob": 0.001141462940722704}, {"id": 2161, "seek": 516804, "start": 5192.04, "end": 5193.94, "text": " so now we come to exercise 2", "tokens": [51565, 370, 586, 321, 808, 281, 5380, 568, 51660], "temperature": 0.0, "avg_logprob": -0.12297726670900981, "compression_ratio": 1.605263157894737, "no_speech_prob": 0.001141462940722704}, {"id": 2162, "seek": 516804, "start": 5194.04, "end": 5195.94, "text": " it basically turns out that in this first exercise", "tokens": [51665, 309, 1936, 4523, 484, 300, 294, 341, 700, 5380, 51760], "temperature": 0.0, "avg_logprob": -0.12297726670900981, "compression_ratio": 1.605263157894737, "no_speech_prob": 0.001141462940722704}, {"id": 2163, "seek": 516804, "start": 5196.04, "end": 5197.94, "text": " we were doing way too much work", "tokens": [51765, 321, 645, 884, 636, 886, 709, 589, 51860], "temperature": 0.0, "avg_logprob": -0.12297726670900981, "compression_ratio": 1.605263157894737, "no_speech_prob": 0.001141462940722704}, {"id": 2164, "seek": 519794, "start": 5197.94, "end": 5199.839999999999, "text": " we were backpropagating way too much", "tokens": [50365, 321, 645, 646, 79, 1513, 559, 990, 636, 886, 709, 50460], "temperature": 0.0, "avg_logprob": -0.06996968021131542, "compression_ratio": 1.8646864686468647, "no_speech_prob": 0.0012668933486565948}, {"id": 2165, "seek": 519794, "start": 5199.94, "end": 5201.839999999999, "text": " and it was all good practice and so on", "tokens": [50465, 293, 309, 390, 439, 665, 3124, 293, 370, 322, 50560], "temperature": 0.0, "avg_logprob": -0.06996968021131542, "compression_ratio": 1.8646864686468647, "no_speech_prob": 0.0012668933486565948}, {"id": 2166, "seek": 519794, "start": 5201.94, "end": 5203.839999999999, "text": " but it's not what you would do in practice", "tokens": [50565, 457, 309, 311, 406, 437, 291, 576, 360, 294, 3124, 50660], "temperature": 0.0, "avg_logprob": -0.06996968021131542, "compression_ratio": 1.8646864686468647, "no_speech_prob": 0.0012668933486565948}, {"id": 2167, "seek": 519794, "start": 5203.94, "end": 5205.839999999999, "text": " and the reason for that is for example", "tokens": [50665, 293, 264, 1778, 337, 300, 307, 337, 1365, 50760], "temperature": 0.0, "avg_logprob": -0.06996968021131542, "compression_ratio": 1.8646864686468647, "no_speech_prob": 0.0012668933486565948}, {"id": 2168, "seek": 519794, "start": 5205.94, "end": 5207.839999999999, "text": " here I separated out this loss calculation", "tokens": [50765, 510, 286, 12005, 484, 341, 4470, 17108, 50860], "temperature": 0.0, "avg_logprob": -0.06996968021131542, "compression_ratio": 1.8646864686468647, "no_speech_prob": 
0.0012668933486565948}, {"id": 2169, "seek": 519794, "start": 5207.94, "end": 5209.839999999999, "text": " over multiple lines", "tokens": [50865, 670, 3866, 3876, 50960], "temperature": 0.0, "avg_logprob": -0.06996968021131542, "compression_ratio": 1.8646864686468647, "no_speech_prob": 0.0012668933486565948}, {"id": 2170, "seek": 519794, "start": 5209.94, "end": 5211.839999999999, "text": " and I broke it up all to like", "tokens": [50965, 293, 286, 6902, 309, 493, 439, 281, 411, 51060], "temperature": 0.0, "avg_logprob": -0.06996968021131542, "compression_ratio": 1.8646864686468647, "no_speech_prob": 0.0012668933486565948}, {"id": 2171, "seek": 519794, "start": 5211.94, "end": 5213.839999999999, "text": " its smallest atomic pieces", "tokens": [51065, 1080, 16998, 22275, 3755, 51160], "temperature": 0.0, "avg_logprob": -0.06996968021131542, "compression_ratio": 1.8646864686468647, "no_speech_prob": 0.0012668933486565948}, {"id": 2172, "seek": 519794, "start": 5213.94, "end": 5215.839999999999, "text": " and we backpropagated through all of those individually", "tokens": [51165, 293, 321, 646, 79, 1513, 559, 770, 807, 439, 295, 729, 16652, 51260], "temperature": 0.0, "avg_logprob": -0.06996968021131542, "compression_ratio": 1.8646864686468647, "no_speech_prob": 0.0012668933486565948}, {"id": 2173, "seek": 519794, "start": 5215.94, "end": 5217.839999999999, "text": " but it turns out that if you just look at", "tokens": [51265, 457, 309, 4523, 484, 300, 498, 291, 445, 574, 412, 51360], "temperature": 0.0, "avg_logprob": -0.06996968021131542, "compression_ratio": 1.8646864686468647, "no_speech_prob": 0.0012668933486565948}, {"id": 2174, "seek": 519794, "start": 5217.94, "end": 5219.839999999999, "text": " the mathematical expression for the loss", "tokens": [51365, 264, 18894, 6114, 337, 264, 4470, 51460], "temperature": 0.0, "avg_logprob": -0.06996968021131542, "compression_ratio": 1.8646864686468647, "no_speech_prob": 0.0012668933486565948}, {"id": 2175, "seek": 519794, "start": 5219.94, "end": 5221.839999999999, "text": " then actually you can do", "tokens": [51465, 550, 767, 291, 393, 360, 51560], "temperature": 0.0, "avg_logprob": -0.06996968021131542, "compression_ratio": 1.8646864686468647, "no_speech_prob": 0.0012668933486565948}, {"id": 2176, "seek": 519794, "start": 5221.94, "end": 5223.839999999999, "text": " the differentiation on pen and paper", "tokens": [51565, 264, 38902, 322, 3435, 293, 3035, 51660], "temperature": 0.0, "avg_logprob": -0.06996968021131542, "compression_ratio": 1.8646864686468647, "no_speech_prob": 0.0012668933486565948}, {"id": 2177, "seek": 519794, "start": 5223.94, "end": 5225.839999999999, "text": " and a lot of terms cancel and simplify", "tokens": [51665, 293, 257, 688, 295, 2115, 10373, 293, 20460, 51760], "temperature": 0.0, "avg_logprob": -0.06996968021131542, "compression_ratio": 1.8646864686468647, "no_speech_prob": 0.0012668933486565948}, {"id": 2178, "seek": 519794, "start": 5225.94, "end": 5227.839999999999, "text": " and the mathematical expression you end up with", "tokens": [51765, 293, 264, 18894, 6114, 291, 917, 493, 365, 51860], "temperature": 0.0, "avg_logprob": -0.06996968021131542, "compression_ratio": 1.8646864686468647, "no_speech_prob": 0.0012668933486565948}, {"id": 2179, "seek": 522784, "start": 5227.9400000000005, "end": 5229.74, "text": " is significantly shorter", "tokens": [50370, 307, 10591, 11639, 50460], "temperature": 0.0, "avg_logprob": -0.05368698760867119, "compression_ratio": 1.7741935483870968, "no_speech_prob": 
and easier to implement than backpropagating through all the little pieces of everything you've done. So before, we had this complicated forward pass going from the logits to the loss. But in PyTorch, everything can just be glued together into a single call to F.cross_entropy: you just pass in the logits and the labels, and you get the exact same loss, as I verify here.
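A minimal sketch of that check, assuming the logits, labels Yb, and manually computed loss from earlier in the lecture:

```python
import torch.nn.functional as F

loss_fast = F.cross_entropy(logits, Yb)   # fused softmax + pluck + mean negative log
print((loss_fast - loss).item())          # should be ~0: same value as the long chain
```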
"tokens": [51565, 370, 527, 3894, 4470, 293, 264, 2370, 4470, 51660], "temperature": 0.0, "avg_logprob": -0.05368698760867119, "compression_ratio": 1.7741935483870968, "no_speech_prob": 0.0014917636290192604}, {"id": 2192, "seek": 522784, "start": 5253.84, "end": 5255.74, "text": " coming from the chunk of operations", "tokens": [51665, 1348, 490, 264, 16635, 295, 7705, 51760], "temperature": 0.0, "avg_logprob": -0.05368698760867119, "compression_ratio": 1.7741935483870968, "no_speech_prob": 0.0014917636290192604}, {"id": 2193, "seek": 522784, "start": 5255.84, "end": 5257.74, "text": " as a single mathematical expression", "tokens": [51765, 382, 257, 2167, 18894, 6114, 51860], "temperature": 0.0, "avg_logprob": -0.05368698760867119, "compression_ratio": 1.7741935483870968, "no_speech_prob": 0.0014917636290192604}, {"id": 2194, "seek": 525774, "start": 5257.84, "end": 5259.639999999999, "text": " is much faster than the backward pass", "tokens": [50370, 307, 709, 4663, 813, 264, 23897, 1320, 50460], "temperature": 0.0, "avg_logprob": -0.1214154210583917, "compression_ratio": 1.9084249084249085, "no_speech_prob": 0.0030879660043865442}, {"id": 2195, "seek": 525774, "start": 5259.74, "end": 5261.639999999999, "text": " it's also much much faster in backward pass", "tokens": [50465, 309, 311, 611, 709, 709, 4663, 294, 23897, 1320, 50560], "temperature": 0.0, "avg_logprob": -0.1214154210583917, "compression_ratio": 1.9084249084249085, "no_speech_prob": 0.0030879660043865442}, {"id": 2196, "seek": 525774, "start": 5261.74, "end": 5263.639999999999, "text": " and the reason for that is if you just look at", "tokens": [50565, 293, 264, 1778, 337, 300, 307, 498, 291, 445, 574, 412, 50660], "temperature": 0.0, "avg_logprob": -0.1214154210583917, "compression_ratio": 1.9084249084249085, "no_speech_prob": 0.0030879660043865442}, {"id": 2197, "seek": 525774, "start": 5263.74, "end": 5265.639999999999, "text": " the mathematical form of this and differentiate again", "tokens": [50665, 264, 18894, 1254, 295, 341, 293, 23203, 797, 50760], "temperature": 0.0, "avg_logprob": -0.1214154210583917, "compression_ratio": 1.9084249084249085, "no_speech_prob": 0.0030879660043865442}, {"id": 2198, "seek": 525774, "start": 5265.74, "end": 5267.639999999999, "text": " you will end up with a very small and short expression", "tokens": [50765, 291, 486, 917, 493, 365, 257, 588, 1359, 293, 2099, 6114, 50860], "temperature": 0.0, "avg_logprob": -0.1214154210583917, "compression_ratio": 1.9084249084249085, "no_speech_prob": 0.0030879660043865442}, {"id": 2199, "seek": 525774, "start": 5267.74, "end": 5269.639999999999, "text": " so that's what we want to do here", "tokens": [50865, 370, 300, 311, 437, 321, 528, 281, 360, 510, 50960], "temperature": 0.0, "avg_logprob": -0.1214154210583917, "compression_ratio": 1.9084249084249085, "no_speech_prob": 0.0030879660043865442}, {"id": 2200, "seek": 525774, "start": 5269.74, "end": 5271.639999999999, "text": " we want to in a single operation", "tokens": [50965, 321, 528, 281, 294, 257, 2167, 6916, 51060], "temperature": 0.0, "avg_logprob": -0.1214154210583917, "compression_ratio": 1.9084249084249085, "no_speech_prob": 0.0030879660043865442}, {"id": 2201, "seek": 525774, "start": 5271.74, "end": 5273.639999999999, "text": " or in a single go or like very quickly", "tokens": [51065, 420, 294, 257, 2167, 352, 420, 411, 588, 2661, 51160], "temperature": 0.0, "avg_logprob": -0.1214154210583917, "compression_ratio": 1.9084249084249085, "no_speech_prob": 0.0030879660043865442}, {"id": 
2202, "seek": 525774, "start": 5273.74, "end": 5275.639999999999, "text": " go directly into dlogits", "tokens": [51165, 352, 3838, 666, 274, 4987, 1208, 51260], "temperature": 0.0, "avg_logprob": -0.1214154210583917, "compression_ratio": 1.9084249084249085, "no_speech_prob": 0.0030879660043865442}, {"id": 2203, "seek": 525774, "start": 5275.74, "end": 5277.639999999999, "text": " and we need to implement dlogits", "tokens": [51265, 293, 321, 643, 281, 4445, 274, 4987, 1208, 51360], "temperature": 0.0, "avg_logprob": -0.1214154210583917, "compression_ratio": 1.9084249084249085, "no_speech_prob": 0.0030879660043865442}, {"id": 2204, "seek": 525774, "start": 5277.74, "end": 5279.639999999999, "text": " as a function of logits", "tokens": [51365, 382, 257, 2445, 295, 3565, 1208, 51460], "temperature": 0.0, "avg_logprob": -0.1214154210583917, "compression_ratio": 1.9084249084249085, "no_speech_prob": 0.0030879660043865442}, {"id": 2205, "seek": 525774, "start": 5279.74, "end": 5281.639999999999, "text": " and yb's", "tokens": [51465, 293, 288, 65, 311, 51560], "temperature": 0.0, "avg_logprob": -0.1214154210583917, "compression_ratio": 1.9084249084249085, "no_speech_prob": 0.0030879660043865442}, {"id": 2206, "seek": 525774, "start": 5281.74, "end": 5283.639999999999, "text": " but it will be significantly shorter", "tokens": [51565, 457, 309, 486, 312, 10591, 11639, 51660], "temperature": 0.0, "avg_logprob": -0.1214154210583917, "compression_ratio": 1.9084249084249085, "no_speech_prob": 0.0030879660043865442}, {"id": 2207, "seek": 525774, "start": 5283.74, "end": 5285.639999999999, "text": " than whatever we did here", "tokens": [51665, 813, 2035, 321, 630, 510, 51760], "temperature": 0.0, "avg_logprob": -0.1214154210583917, "compression_ratio": 1.9084249084249085, "no_speech_prob": 0.0030879660043865442}, {"id": 2208, "seek": 525774, "start": 5285.74, "end": 5287.639999999999, "text": " where to get to dlogits", "tokens": [51765, 689, 281, 483, 281, 274, 4987, 1208, 51860], "temperature": 0.0, "avg_logprob": -0.1214154210583917, "compression_ratio": 1.9084249084249085, "no_speech_prob": 0.0030879660043865442}, {"id": 2209, "seek": 528764, "start": 5287.64, "end": 5289.54, "text": " we need to go all the way here", "tokens": [50365, 321, 643, 281, 352, 439, 264, 636, 510, 50460], "temperature": 0.0, "avg_logprob": -0.05606837350814069, "compression_ratio": 1.886178861788618, "no_speech_prob": 0.0017940065590664744}, {"id": 2210, "seek": 528764, "start": 5289.64, "end": 5291.54, "text": " so all of this work can be skipped", "tokens": [50465, 370, 439, 295, 341, 589, 393, 312, 30193, 50560], "temperature": 0.0, "avg_logprob": -0.05606837350814069, "compression_ratio": 1.886178861788618, "no_speech_prob": 0.0017940065590664744}, {"id": 2211, "seek": 528764, "start": 5291.64, "end": 5293.54, "text": " in a much much simpler mathematical expression", "tokens": [50565, 294, 257, 709, 709, 18587, 18894, 6114, 50660], "temperature": 0.0, "avg_logprob": -0.05606837350814069, "compression_ratio": 1.886178861788618, "no_speech_prob": 0.0017940065590664744}, {"id": 2212, "seek": 528764, "start": 5293.64, "end": 5295.54, "text": " that you can implement here", "tokens": [50665, 300, 291, 393, 4445, 510, 50760], "temperature": 0.0, "avg_logprob": -0.05606837350814069, "compression_ratio": 1.886178861788618, "no_speech_prob": 0.0017940065590664744}, {"id": 2213, "seek": 528764, "start": 5295.64, "end": 5297.54, "text": " so you can", "tokens": [50765, 370, 291, 393, 50860], "temperature": 0.0, 
"avg_logprob": -0.05606837350814069, "compression_ratio": 1.886178861788618, "no_speech_prob": 0.0017940065590664744}, {"id": 2214, "seek": 528764, "start": 5297.64, "end": 5299.54, "text": " give it a shot yourself", "tokens": [50865, 976, 309, 257, 3347, 1803, 50960], "temperature": 0.0, "avg_logprob": -0.05606837350814069, "compression_ratio": 1.886178861788618, "no_speech_prob": 0.0017940065590664744}, {"id": 2215, "seek": 528764, "start": 5299.64, "end": 5301.54, "text": " basically look at what exactly", "tokens": [50965, 1936, 574, 412, 437, 2293, 51060], "temperature": 0.0, "avg_logprob": -0.05606837350814069, "compression_ratio": 1.886178861788618, "no_speech_prob": 0.0017940065590664744}, {"id": 2216, "seek": 528764, "start": 5301.64, "end": 5303.54, "text": " is the mathematical expression of loss", "tokens": [51065, 307, 264, 18894, 6114, 295, 4470, 51160], "temperature": 0.0, "avg_logprob": -0.05606837350814069, "compression_ratio": 1.886178861788618, "no_speech_prob": 0.0017940065590664744}, {"id": 2217, "seek": 528764, "start": 5303.64, "end": 5305.54, "text": " and differentiate with respect to the logits", "tokens": [51165, 293, 23203, 365, 3104, 281, 264, 3565, 1208, 51260], "temperature": 0.0, "avg_logprob": -0.05606837350814069, "compression_ratio": 1.886178861788618, "no_speech_prob": 0.0017940065590664744}, {"id": 2218, "seek": 528764, "start": 5305.64, "end": 5307.54, "text": " so let me show you", "tokens": [51265, 370, 718, 385, 855, 291, 51360], "temperature": 0.0, "avg_logprob": -0.05606837350814069, "compression_ratio": 1.886178861788618, "no_speech_prob": 0.0017940065590664744}, {"id": 2219, "seek": 528764, "start": 5307.64, "end": 5309.54, "text": " a hint", "tokens": [51365, 257, 12075, 51460], "temperature": 0.0, "avg_logprob": -0.05606837350814069, "compression_ratio": 1.886178861788618, "no_speech_prob": 0.0017940065590664744}, {"id": 2220, "seek": 528764, "start": 5309.64, "end": 5311.54, "text": " you can of course try it fully yourself", "tokens": [51465, 291, 393, 295, 1164, 853, 309, 4498, 1803, 51560], "temperature": 0.0, "avg_logprob": -0.05606837350814069, "compression_ratio": 1.886178861788618, "no_speech_prob": 0.0017940065590664744}, {"id": 2221, "seek": 528764, "start": 5311.64, "end": 5313.54, "text": " but if not I can give you some hint", "tokens": [51565, 457, 498, 406, 286, 393, 976, 291, 512, 12075, 51660], "temperature": 0.0, "avg_logprob": -0.05606837350814069, "compression_ratio": 1.886178861788618, "no_speech_prob": 0.0017940065590664744}, {"id": 2222, "seek": 528764, "start": 5313.64, "end": 5315.54, "text": " of how to get started mathematically", "tokens": [51665, 295, 577, 281, 483, 1409, 44003, 51760], "temperature": 0.0, "avg_logprob": -0.05606837350814069, "compression_ratio": 1.886178861788618, "no_speech_prob": 0.0017940065590664744}, {"id": 2223, "seek": 528764, "start": 5315.64, "end": 5317.54, "text": " so basically what's happening here", "tokens": [51765, 370, 1936, 437, 311, 2737, 510, 51860], "temperature": 0.0, "avg_logprob": -0.05606837350814069, "compression_ratio": 1.886178861788618, "no_speech_prob": 0.0017940065590664744}, {"id": 2224, "seek": 531764, "start": 5317.64, "end": 5319.54, "text": " is we have logits", "tokens": [50365, 307, 321, 362, 3565, 1208, 50460], "temperature": 0.0, "avg_logprob": -0.047071509891086154, "compression_ratio": 2.0137614678899083, "no_speech_prob": 0.00213510449975729}, {"id": 2225, "seek": 531764, "start": 5319.64, "end": 5321.54, "text": " then there's the softmax", "tokens": 
then there's the softmax that takes the logits and gives you probabilities. Then we are using the identity of the correct next character to pluck out a row of probabilities, and we take the negative log of it to get our negative log probability, and then we average up all the negative log probabilities to get our loss.
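A minimal sketch of that chain, one line per step just named (the variable names counts, probs, and n are mine, and numerical-stability details are omitted):

```python
counts = logits.exp()                              # exponentiate the logits
probs = counts / counts.sum(1, keepdim=True)       # softmax: normalize to probabilities
n = logits.shape[0]                                # batch size
loss = -probs[torch.arange(n), Yb].log().mean()    # pluck p_y, negative log, average
```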
So basically, what we have for a single individual example is that the loss is equal to the negative log of p at the y-th position, where p here can be thought of as the vector of all the probabilities and y is the label. And we have that p here, of course, is the softmax: the i-th component of this probability vector is just the softmax function, so raising e to each of the logits and normalizing so that everything sums to one. Now, if you write out this expression here, you can just write out the softmax, and then basically what we're interested in is the derivative of the loss with respect to the i-th logit: so it's d by d l_i of this expression here, where we have l indexed with the specific label y, and on the bottom we have a sum over j of e to the l_j, and the negative log of all that. Written out, that is:
"compression_ratio": 1.948, "no_speech_prob": 0.0019216617802157998}, {"id": 2261, "seek": 537744, "start": 5391.44, "end": 5393.339999999999, "text": " where we have l indexed", "tokens": [51065, 689, 321, 362, 287, 8186, 292, 51160], "temperature": 0.0, "avg_logprob": -0.12263226008915401, "compression_ratio": 1.948, "no_speech_prob": 0.0019216617802157998}, {"id": 2262, "seek": 537744, "start": 5393.44, "end": 5395.339999999999, "text": " with the specific label y", "tokens": [51165, 365, 264, 2685, 7645, 288, 51260], "temperature": 0.0, "avg_logprob": -0.12263226008915401, "compression_ratio": 1.948, "no_speech_prob": 0.0019216617802157998}, {"id": 2263, "seek": 537744, "start": 5395.44, "end": 5397.339999999999, "text": " and on the bottom we have a sum over j", "tokens": [51265, 293, 322, 264, 2767, 321, 362, 257, 2408, 670, 361, 51360], "temperature": 0.0, "avg_logprob": -0.12263226008915401, "compression_ratio": 1.948, "no_speech_prob": 0.0019216617802157998}, {"id": 2264, "seek": 537744, "start": 5397.44, "end": 5399.339999999999, "text": " of e to the lj", "tokens": [51365, 295, 308, 281, 264, 287, 73, 51460], "temperature": 0.0, "avg_logprob": -0.12263226008915401, "compression_ratio": 1.948, "no_speech_prob": 0.0019216617802157998}, {"id": 2265, "seek": 537744, "start": 5399.44, "end": 5401.339999999999, "text": " and the negative log of all that", "tokens": [51465, 293, 264, 3671, 3565, 295, 439, 300, 51560], "temperature": 0.0, "avg_logprob": -0.12263226008915401, "compression_ratio": 1.948, "no_speech_prob": 0.0019216617802157998}, {"id": 2266, "seek": 537744, "start": 5401.44, "end": 5403.339999999999, "text": " so potentially give it a shot", "tokens": [51565, 370, 7263, 976, 309, 257, 3347, 51660], "temperature": 0.0, "avg_logprob": -0.12263226008915401, "compression_ratio": 1.948, "no_speech_prob": 0.0019216617802157998}, {"id": 2267, "seek": 537744, "start": 5403.44, "end": 5405.339999999999, "text": " pen and paper and see if you can actually", "tokens": [51665, 3435, 293, 3035, 293, 536, 498, 291, 393, 767, 51760], "temperature": 0.0, "avg_logprob": -0.12263226008915401, "compression_ratio": 1.948, "no_speech_prob": 0.0019216617802157998}, {"id": 2268, "seek": 537744, "start": 5405.44, "end": 5407.339999999999, "text": " derive the expression for the loss by dLi", "tokens": [51765, 28446, 264, 6114, 337, 264, 4470, 538, 274, 43, 72, 51860], "temperature": 0.0, "avg_logprob": -0.12263226008915401, "compression_ratio": 1.948, "no_speech_prob": 0.0019216617802157998}, {"id": 2269, "seek": 540734, "start": 5407.34, "end": 5409.24, "text": " and to implement it here", "tokens": [50365, 293, 281, 4445, 309, 510, 50460], "temperature": 0.0, "avg_logprob": -0.06707348823547363, "compression_ratio": 1.7971530249110321, "no_speech_prob": 0.0008748275577090681}, {"id": 2270, "seek": 540734, "start": 5409.34, "end": 5411.24, "text": " okay so I'm going to give away the result here", "tokens": [50465, 1392, 370, 286, 478, 516, 281, 976, 1314, 264, 1874, 510, 50560], "temperature": 0.0, "avg_logprob": -0.06707348823547363, "compression_ratio": 1.7971530249110321, "no_speech_prob": 0.0008748275577090681}, {"id": 2271, "seek": 540734, "start": 5411.34, "end": 5413.24, "text": " so this is some of the math I did", "tokens": [50565, 370, 341, 307, 512, 295, 264, 5221, 286, 630, 50660], "temperature": 0.0, "avg_logprob": -0.06707348823547363, "compression_ratio": 1.7971530249110321, "no_speech_prob": 0.0008748275577090681}, {"id": 2272, "seek": 540734, "start": 5413.34, "end": 5415.24, 
"text": " to derive the gradients", "tokens": [50665, 281, 28446, 264, 2771, 2448, 50760], "temperature": 0.0, "avg_logprob": -0.06707348823547363, "compression_ratio": 1.7971530249110321, "no_speech_prob": 0.0008748275577090681}, {"id": 2273, "seek": 540734, "start": 5415.34, "end": 5417.24, "text": " analytically", "tokens": [50765, 10783, 984, 50860], "temperature": 0.0, "avg_logprob": -0.06707348823547363, "compression_ratio": 1.7971530249110321, "no_speech_prob": 0.0008748275577090681}, {"id": 2274, "seek": 540734, "start": 5417.34, "end": 5419.24, "text": " and so we see here that I'm just applying", "tokens": [50865, 293, 370, 321, 536, 510, 300, 286, 478, 445, 9275, 50960], "temperature": 0.0, "avg_logprob": -0.06707348823547363, "compression_ratio": 1.7971530249110321, "no_speech_prob": 0.0008748275577090681}, {"id": 2275, "seek": 540734, "start": 5419.34, "end": 5421.24, "text": " the rules of calculus from your first or second year", "tokens": [50965, 264, 4474, 295, 33400, 490, 428, 700, 420, 1150, 1064, 51060], "temperature": 0.0, "avg_logprob": -0.06707348823547363, "compression_ratio": 1.7971530249110321, "no_speech_prob": 0.0008748275577090681}, {"id": 2276, "seek": 540734, "start": 5421.34, "end": 5423.24, "text": " of bachelor's degree if you took it", "tokens": [51065, 295, 25947, 311, 4314, 498, 291, 1890, 309, 51160], "temperature": 0.0, "avg_logprob": -0.06707348823547363, "compression_ratio": 1.7971530249110321, "no_speech_prob": 0.0008748275577090681}, {"id": 2277, "seek": 540734, "start": 5423.34, "end": 5425.24, "text": " and we see that the expressions", "tokens": [51165, 293, 321, 536, 300, 264, 15277, 51260], "temperature": 0.0, "avg_logprob": -0.06707348823547363, "compression_ratio": 1.7971530249110321, "no_speech_prob": 0.0008748275577090681}, {"id": 2278, "seek": 540734, "start": 5425.34, "end": 5427.24, "text": " actually simplify quite a bit", "tokens": [51265, 767, 20460, 1596, 257, 857, 51360], "temperature": 0.0, "avg_logprob": -0.06707348823547363, "compression_ratio": 1.7971530249110321, "no_speech_prob": 0.0008748275577090681}, {"id": 2279, "seek": 540734, "start": 5427.34, "end": 5429.24, "text": " you have to separate out the analysis", "tokens": [51365, 291, 362, 281, 4994, 484, 264, 5215, 51460], "temperature": 0.0, "avg_logprob": -0.06707348823547363, "compression_ratio": 1.7971530249110321, "no_speech_prob": 0.0008748275577090681}, {"id": 2280, "seek": 540734, "start": 5429.34, "end": 5431.24, "text": " in the case where the ith index", "tokens": [51465, 294, 264, 1389, 689, 264, 309, 71, 8186, 51560], "temperature": 0.0, "avg_logprob": -0.06707348823547363, "compression_ratio": 1.7971530249110321, "no_speech_prob": 0.0008748275577090681}, {"id": 2281, "seek": 540734, "start": 5431.34, "end": 5433.24, "text": " that you're interested in inside logits", "tokens": [51565, 300, 291, 434, 3102, 294, 1854, 3565, 1208, 51660], "temperature": 0.0, "avg_logprob": -0.06707348823547363, "compression_ratio": 1.7971530249110321, "no_speech_prob": 0.0008748275577090681}, {"id": 2282, "seek": 540734, "start": 5433.34, "end": 5435.24, "text": " is either equal to the label", "tokens": [51665, 307, 2139, 2681, 281, 264, 7645, 51760], "temperature": 0.0, "avg_logprob": -0.06707348823547363, "compression_ratio": 1.7971530249110321, "no_speech_prob": 0.0008748275577090681}, {"id": 2283, "seek": 540734, "start": 5435.34, "end": 5437.24, "text": " or it's not equal to the label", "tokens": [51765, 420, 309, 311, 406, 2681, 281, 264, 7645, 51860], "temperature": 
0.0, "avg_logprob": -0.06707348823547363, "compression_ratio": 1.7971530249110321, "no_speech_prob": 0.0008748275577090681}, {"id": 2284, "seek": 543724, "start": 5437.24, "end": 5439.139999999999, "text": " in a slightly different way", "tokens": [50365, 294, 257, 4748, 819, 636, 50460], "temperature": 0.0, "avg_logprob": -0.09017286682128907, "compression_ratio": 1.8744769874476988, "no_speech_prob": 0.0012040293077006936}, {"id": 2285, "seek": 543724, "start": 5439.24, "end": 5441.139999999999, "text": " and what we end up with is something", "tokens": [50465, 293, 437, 321, 917, 493, 365, 307, 746, 50560], "temperature": 0.0, "avg_logprob": -0.09017286682128907, "compression_ratio": 1.8744769874476988, "no_speech_prob": 0.0012040293077006936}, {"id": 2286, "seek": 543724, "start": 5441.24, "end": 5443.139999999999, "text": " very very simple", "tokens": [50565, 588, 588, 2199, 50660], "temperature": 0.0, "avg_logprob": -0.09017286682128907, "compression_ratio": 1.8744769874476988, "no_speech_prob": 0.0012040293077006936}, {"id": 2287, "seek": 543724, "start": 5443.24, "end": 5445.139999999999, "text": " we either end up with basically p at i", "tokens": [50665, 321, 2139, 917, 493, 365, 1936, 280, 412, 741, 50760], "temperature": 0.0, "avg_logprob": -0.09017286682128907, "compression_ratio": 1.8744769874476988, "no_speech_prob": 0.0012040293077006936}, {"id": 2288, "seek": 543724, "start": 5445.24, "end": 5447.139999999999, "text": " where p is again this vector of", "tokens": [50765, 689, 280, 307, 797, 341, 8062, 295, 50860], "temperature": 0.0, "avg_logprob": -0.09017286682128907, "compression_ratio": 1.8744769874476988, "no_speech_prob": 0.0012040293077006936}, {"id": 2289, "seek": 543724, "start": 5447.24, "end": 5449.139999999999, "text": " probabilities after a softmax", "tokens": [50865, 33783, 934, 257, 2787, 41167, 50960], "temperature": 0.0, "avg_logprob": -0.09017286682128907, "compression_ratio": 1.8744769874476988, "no_speech_prob": 0.0012040293077006936}, {"id": 2290, "seek": 543724, "start": 5449.24, "end": 5451.139999999999, "text": " or p at i minus one", "tokens": [50965, 420, 280, 412, 741, 3175, 472, 51060], "temperature": 0.0, "avg_logprob": -0.09017286682128907, "compression_ratio": 1.8744769874476988, "no_speech_prob": 0.0012040293077006936}, {"id": 2291, "seek": 543724, "start": 5451.24, "end": 5453.139999999999, "text": " where we just simply subtract a one", "tokens": [51065, 689, 321, 445, 2935, 16390, 257, 472, 51160], "temperature": 0.0, "avg_logprob": -0.09017286682128907, "compression_ratio": 1.8744769874476988, "no_speech_prob": 0.0012040293077006936}, {"id": 2292, "seek": 543724, "start": 5453.24, "end": 5455.139999999999, "text": " but in any case we just need to calculate", "tokens": [51165, 457, 294, 604, 1389, 321, 445, 643, 281, 8873, 51260], "temperature": 0.0, "avg_logprob": -0.09017286682128907, "compression_ratio": 1.8744769874476988, "no_speech_prob": 0.0012040293077006936}, {"id": 2293, "seek": 543724, "start": 5455.24, "end": 5457.139999999999, "text": " the softmax p", "tokens": [51265, 264, 2787, 41167, 280, 51360], "temperature": 0.0, "avg_logprob": -0.09017286682128907, "compression_ratio": 1.8744769874476988, "no_speech_prob": 0.0012040293077006936}, {"id": 2294, "seek": 543724, "start": 5457.24, "end": 5459.139999999999, "text": " and then in the correct dimension", "tokens": [51365, 293, 550, 294, 264, 3006, 10139, 51460], "temperature": 0.0, "avg_logprob": -0.09017286682128907, "compression_ratio": 1.8744769874476988, 
"no_speech_prob": 0.0012040293077006936}, {"id": 2295, "seek": 543724, "start": 5459.24, "end": 5461.139999999999, "text": " we need to subtract a one", "tokens": [51465, 321, 643, 281, 16390, 257, 472, 51560], "temperature": 0.0, "avg_logprob": -0.09017286682128907, "compression_ratio": 1.8744769874476988, "no_speech_prob": 0.0012040293077006936}, {"id": 2296, "seek": 543724, "start": 5461.24, "end": 5463.139999999999, "text": " and that's the gradient", "tokens": [51565, 293, 300, 311, 264, 16235, 51660], "temperature": 0.0, "avg_logprob": -0.09017286682128907, "compression_ratio": 1.8744769874476988, "no_speech_prob": 0.0012040293077006936}, {"id": 2297, "seek": 543724, "start": 5463.24, "end": 5465.139999999999, "text": " the form that it takes analytically", "tokens": [51665, 264, 1254, 300, 309, 2516, 10783, 984, 51760], "temperature": 0.0, "avg_logprob": -0.09017286682128907, "compression_ratio": 1.8744769874476988, "no_speech_prob": 0.0012040293077006936}, {"id": 2298, "seek": 543724, "start": 5465.24, "end": 5467.139999999999, "text": " so let's implement this basically", "tokens": [51765, 370, 718, 311, 4445, 341, 1936, 51860], "temperature": 0.0, "avg_logprob": -0.09017286682128907, "compression_ratio": 1.8744769874476988, "no_speech_prob": 0.0012040293077006936}, {"id": 2299, "seek": 546714, "start": 5467.14, "end": 5469.04, "text": " but here we are working with batches of examples", "tokens": [50365, 457, 510, 321, 366, 1364, 365, 15245, 279, 295, 5110, 50460], "temperature": 0.0, "avg_logprob": -0.07034910900492064, "compression_ratio": 2.055084745762712, "no_speech_prob": 0.00122642214410007}, {"id": 2300, "seek": 546714, "start": 5469.14, "end": 5471.04, "text": " so we have to be careful of that", "tokens": [50465, 370, 321, 362, 281, 312, 5026, 295, 300, 50560], "temperature": 0.0, "avg_logprob": -0.07034910900492064, "compression_ratio": 2.055084745762712, "no_speech_prob": 0.00122642214410007}, {"id": 2301, "seek": 546714, "start": 5471.14, "end": 5473.04, "text": " and then the loss for a batch", "tokens": [50565, 293, 550, 264, 4470, 337, 257, 15245, 50660], "temperature": 0.0, "avg_logprob": -0.07034910900492064, "compression_ratio": 2.055084745762712, "no_speech_prob": 0.00122642214410007}, {"id": 2302, "seek": 546714, "start": 5473.14, "end": 5475.04, "text": " is the average loss over all the examples", "tokens": [50665, 307, 264, 4274, 4470, 670, 439, 264, 5110, 50760], "temperature": 0.0, "avg_logprob": -0.07034910900492064, "compression_ratio": 2.055084745762712, "no_speech_prob": 0.00122642214410007}, {"id": 2303, "seek": 546714, "start": 5475.14, "end": 5477.04, "text": " so in other words", "tokens": [50765, 370, 294, 661, 2283, 50860], "temperature": 0.0, "avg_logprob": -0.07034910900492064, "compression_ratio": 2.055084745762712, "no_speech_prob": 0.00122642214410007}, {"id": 2304, "seek": 546714, "start": 5477.14, "end": 5479.04, "text": " is the example for all the individual examples", "tokens": [50865, 307, 264, 1365, 337, 439, 264, 2609, 5110, 50960], "temperature": 0.0, "avg_logprob": -0.07034910900492064, "compression_ratio": 2.055084745762712, "no_speech_prob": 0.00122642214410007}, {"id": 2305, "seek": 546714, "start": 5479.14, "end": 5481.04, "text": " is the loss for each individual example", "tokens": [50965, 307, 264, 4470, 337, 1184, 2609, 1365, 51060], "temperature": 0.0, "avg_logprob": -0.07034910900492064, "compression_ratio": 2.055084745762712, "no_speech_prob": 0.00122642214410007}, {"id": 2306, "seek": 546714, "start": 5481.14, "end": 
5483.04, "text": " summed up and then divided by n", "tokens": [51065, 2408, 1912, 493, 293, 550, 6666, 538, 297, 51160], "temperature": 0.0, "avg_logprob": -0.07034910900492064, "compression_ratio": 2.055084745762712, "no_speech_prob": 0.00122642214410007}, {"id": 2307, "seek": 546714, "start": 5483.14, "end": 5485.04, "text": " and we have to backpropagate through that", "tokens": [51165, 293, 321, 362, 281, 646, 79, 1513, 559, 473, 807, 300, 51260], "temperature": 0.0, "avg_logprob": -0.07034910900492064, "compression_ratio": 2.055084745762712, "no_speech_prob": 0.00122642214410007}, {"id": 2308, "seek": 546714, "start": 5485.14, "end": 5487.04, "text": " as well and be careful with it", "tokens": [51265, 382, 731, 293, 312, 5026, 365, 309, 51360], "temperature": 0.0, "avg_logprob": -0.07034910900492064, "compression_ratio": 2.055084745762712, "no_speech_prob": 0.00122642214410007}, {"id": 2309, "seek": 546714, "start": 5487.14, "end": 5489.04, "text": " so dlogits", "tokens": [51365, 370, 274, 4987, 1208, 51460], "temperature": 0.0, "avg_logprob": -0.07034910900492064, "compression_ratio": 2.055084745762712, "no_speech_prob": 0.00122642214410007}, {"id": 2310, "seek": 546714, "start": 5489.14, "end": 5491.04, "text": " is going to be f dot softmax", "tokens": [51465, 307, 516, 281, 312, 283, 5893, 2787, 41167, 51560], "temperature": 0.0, "avg_logprob": -0.07034910900492064, "compression_ratio": 2.055084745762712, "no_speech_prob": 0.00122642214410007}, {"id": 2311, "seek": 546714, "start": 5491.14, "end": 5493.04, "text": " pytorch has a softmax function", "tokens": [51565, 25878, 284, 339, 575, 257, 2787, 41167, 2445, 51660], "temperature": 0.0, "avg_logprob": -0.07034910900492064, "compression_ratio": 2.055084745762712, "no_speech_prob": 0.00122642214410007}, {"id": 2312, "seek": 546714, "start": 5493.14, "end": 5495.04, "text": " that you can call", "tokens": [51665, 300, 291, 393, 818, 51760], "temperature": 0.0, "avg_logprob": -0.07034910900492064, "compression_ratio": 2.055084745762712, "no_speech_prob": 0.00122642214410007}, {"id": 2313, "seek": 546714, "start": 5495.14, "end": 5497.04, "text": " and we want to apply the softmax", "tokens": [51765, 293, 321, 528, 281, 3079, 264, 2787, 41167, 51860], "temperature": 0.0, "avg_logprob": -0.07034910900492064, "compression_ratio": 2.055084745762712, "no_speech_prob": 0.00122642214410007}, {"id": 2314, "seek": 549704, "start": 5497.04, "end": 5498.94, "text": " on the logits and we want to go", "tokens": [50365, 322, 264, 3565, 1208, 293, 321, 528, 281, 352, 50460], "temperature": 0.0, "avg_logprob": -0.057678752932055245, "compression_ratio": 1.9238578680203047, "no_speech_prob": 0.001876747002825141}, {"id": 2315, "seek": 549704, "start": 5499.04, "end": 5500.94, "text": " in the dimension", "tokens": [50465, 294, 264, 10139, 50560], "temperature": 0.0, "avg_logprob": -0.057678752932055245, "compression_ratio": 1.9238578680203047, "no_speech_prob": 0.001876747002825141}, {"id": 2316, "seek": 549704, "start": 5501.04, "end": 5502.94, "text": " that is one", "tokens": [50565, 300, 307, 472, 50660], "temperature": 0.0, "avg_logprob": -0.057678752932055245, "compression_ratio": 1.9238578680203047, "no_speech_prob": 0.001876747002825141}, {"id": 2317, "seek": 549704, "start": 5503.04, "end": 5504.94, "text": " so basically we want to do the softmax", "tokens": [50665, 370, 1936, 321, 528, 281, 360, 264, 2787, 41167, 50760], "temperature": 0.0, "avg_logprob": -0.057678752932055245, "compression_ratio": 1.9238578680203047, 
"no_speech_prob": 0.001876747002825141}, {"id": 2318, "seek": 549704, "start": 5505.04, "end": 5506.94, "text": " along the rows of these logits", "tokens": [50765, 2051, 264, 13241, 295, 613, 3565, 1208, 50860], "temperature": 0.0, "avg_logprob": -0.057678752932055245, "compression_ratio": 1.9238578680203047, "no_speech_prob": 0.001876747002825141}, {"id": 2319, "seek": 549704, "start": 5507.04, "end": 5508.94, "text": " then at the correct positions", "tokens": [50865, 550, 412, 264, 3006, 8432, 50960], "temperature": 0.0, "avg_logprob": -0.057678752932055245, "compression_ratio": 1.9238578680203047, "no_speech_prob": 0.001876747002825141}, {"id": 2320, "seek": 549704, "start": 5509.04, "end": 5510.94, "text": " we need to subtract a one", "tokens": [50965, 321, 643, 281, 16390, 257, 472, 51060], "temperature": 0.0, "avg_logprob": -0.057678752932055245, "compression_ratio": 1.9238578680203047, "no_speech_prob": 0.001876747002825141}, {"id": 2321, "seek": 549704, "start": 5511.04, "end": 5512.94, "text": " so dlogits at iterating over all the rows", "tokens": [51065, 370, 274, 4987, 1208, 412, 17138, 990, 670, 439, 264, 13241, 51160], "temperature": 0.0, "avg_logprob": -0.057678752932055245, "compression_ratio": 1.9238578680203047, "no_speech_prob": 0.001876747002825141}, {"id": 2322, "seek": 549704, "start": 5513.04, "end": 5514.94, "text": " and indexing", "tokens": [51165, 293, 8186, 278, 51260], "temperature": 0.0, "avg_logprob": -0.057678752932055245, "compression_ratio": 1.9238578680203047, "no_speech_prob": 0.001876747002825141}, {"id": 2323, "seek": 549704, "start": 5515.04, "end": 5516.94, "text": " into the columns", "tokens": [51265, 666, 264, 13766, 51360], "temperature": 0.0, "avg_logprob": -0.057678752932055245, "compression_ratio": 1.9238578680203047, "no_speech_prob": 0.001876747002825141}, {"id": 2324, "seek": 549704, "start": 5517.04, "end": 5518.94, "text": " provided by the correct labels", "tokens": [51365, 5649, 538, 264, 3006, 16949, 51460], "temperature": 0.0, "avg_logprob": -0.057678752932055245, "compression_ratio": 1.9238578680203047, "no_speech_prob": 0.001876747002825141}, {"id": 2325, "seek": 549704, "start": 5519.04, "end": 5520.94, "text": " inside yb", "tokens": [51465, 1854, 288, 65, 51560], "temperature": 0.0, "avg_logprob": -0.057678752932055245, "compression_ratio": 1.9238578680203047, "no_speech_prob": 0.001876747002825141}, {"id": 2326, "seek": 549704, "start": 5521.04, "end": 5522.94, "text": " we need to subtract one", "tokens": [51565, 321, 643, 281, 16390, 472, 51660], "temperature": 0.0, "avg_logprob": -0.057678752932055245, "compression_ratio": 1.9238578680203047, "no_speech_prob": 0.001876747002825141}, {"id": 2327, "seek": 549704, "start": 5523.04, "end": 5524.94, "text": " and then finally it's the average loss", "tokens": [51665, 293, 550, 2721, 309, 311, 264, 4274, 4470, 51760], "temperature": 0.0, "avg_logprob": -0.057678752932055245, "compression_ratio": 1.9238578680203047, "no_speech_prob": 0.001876747002825141}, {"id": 2328, "seek": 549704, "start": 5525.04, "end": 5526.94, "text": " that is the loss", "tokens": [51765, 300, 307, 264, 4470, 51860], "temperature": 0.0, "avg_logprob": -0.057678752932055245, "compression_ratio": 1.9238578680203047, "no_speech_prob": 0.001876747002825141}, {"id": 2329, "seek": 552694, "start": 5526.94, "end": 5528.839999999999, "text": " so in average there's a one over n", "tokens": [50365, 370, 294, 4274, 456, 311, 257, 472, 670, 297, 50460], "temperature": 0.0, "avg_logprob": -0.08328569235921908, 
"compression_ratio": 1.6798245614035088, "no_speech_prob": 0.0007307534106075764}, {"id": 2330, "seek": 552694, "start": 5528.94, "end": 5530.839999999999, "text": " of all the losses added up", "tokens": [50465, 295, 439, 264, 15352, 3869, 493, 50560], "temperature": 0.0, "avg_logprob": -0.08328569235921908, "compression_ratio": 1.6798245614035088, "no_speech_prob": 0.0007307534106075764}, {"id": 2331, "seek": 552694, "start": 5530.94, "end": 5532.839999999999, "text": " and so we need to also backpropagate", "tokens": [50565, 293, 370, 321, 643, 281, 611, 646, 79, 1513, 559, 473, 50660], "temperature": 0.0, "avg_logprob": -0.08328569235921908, "compression_ratio": 1.6798245614035088, "no_speech_prob": 0.0007307534106075764}, {"id": 2332, "seek": 552694, "start": 5532.94, "end": 5534.839999999999, "text": " through that division", "tokens": [50665, 807, 300, 10044, 50760], "temperature": 0.0, "avg_logprob": -0.08328569235921908, "compression_ratio": 1.6798245614035088, "no_speech_prob": 0.0007307534106075764}, {"id": 2333, "seek": 552694, "start": 5534.94, "end": 5536.839999999999, "text": " so the gradient has to be scaled down", "tokens": [50765, 370, 264, 16235, 575, 281, 312, 36039, 760, 50860], "temperature": 0.0, "avg_logprob": -0.08328569235921908, "compression_ratio": 1.6798245614035088, "no_speech_prob": 0.0007307534106075764}, {"id": 2334, "seek": 552694, "start": 5536.94, "end": 5538.839999999999, "text": " by n as well", "tokens": [50865, 538, 297, 382, 731, 50960], "temperature": 0.0, "avg_logprob": -0.08328569235921908, "compression_ratio": 1.6798245614035088, "no_speech_prob": 0.0007307534106075764}, {"id": 2335, "seek": 552694, "start": 5538.94, "end": 5540.839999999999, "text": " because of the mean", "tokens": [50965, 570, 295, 264, 914, 51060], "temperature": 0.0, "avg_logprob": -0.08328569235921908, "compression_ratio": 1.6798245614035088, "no_speech_prob": 0.0007307534106075764}, {"id": 2336, "seek": 552694, "start": 5540.94, "end": 5542.839999999999, "text": " but this otherwise should be the result", "tokens": [51065, 457, 341, 5911, 820, 312, 264, 1874, 51160], "temperature": 0.0, "avg_logprob": -0.08328569235921908, "compression_ratio": 1.6798245614035088, "no_speech_prob": 0.0007307534106075764}, {"id": 2337, "seek": 552694, "start": 5542.94, "end": 5544.839999999999, "text": " so now if we verify this", "tokens": [51165, 370, 586, 498, 321, 16888, 341, 51260], "temperature": 0.0, "avg_logprob": -0.08328569235921908, "compression_ratio": 1.6798245614035088, "no_speech_prob": 0.0007307534106075764}, {"id": 2338, "seek": 552694, "start": 5544.94, "end": 5546.839999999999, "text": " we see that we don't get an exact match", "tokens": [51265, 321, 536, 300, 321, 500, 380, 483, 364, 1900, 2995, 51360], "temperature": 0.0, "avg_logprob": -0.08328569235921908, "compression_ratio": 1.6798245614035088, "no_speech_prob": 0.0007307534106075764}, {"id": 2339, "seek": 552694, "start": 5546.94, "end": 5548.839999999999, "text": " but at the same time", "tokens": [51365, 457, 412, 264, 912, 565, 51460], "temperature": 0.0, "avg_logprob": -0.08328569235921908, "compression_ratio": 1.6798245614035088, "no_speech_prob": 0.0007307534106075764}, {"id": 2340, "seek": 552694, "start": 5548.94, "end": 5550.839999999999, "text": " the maximum difference from", "tokens": [51465, 264, 6674, 2649, 490, 51560], "temperature": 0.0, "avg_logprob": -0.08328569235921908, "compression_ratio": 1.6798245614035088, "no_speech_prob": 0.0007307534106075764}, {"id": 2341, "seek": 552694, "start": 
5550.94, "end": 5552.839999999999, "text": " logits from pytorch", "tokens": [51565, 3565, 1208, 490, 25878, 284, 339, 51660], "temperature": 0.0, "avg_logprob": -0.08328569235921908, "compression_ratio": 1.6798245614035088, "no_speech_prob": 0.0007307534106075764}, {"id": 2342, "seek": 552694, "start": 5552.94, "end": 5554.839999999999, "text": " and rdlogits here", "tokens": [51665, 293, 367, 67, 4987, 1208, 510, 51760], "temperature": 0.0, "avg_logprob": -0.08328569235921908, "compression_ratio": 1.6798245614035088, "no_speech_prob": 0.0007307534106075764}, {"id": 2343, "seek": 555484, "start": 5554.84, "end": 5556.74, "text": " is on the order of 5e-9", "tokens": [50365, 307, 322, 264, 1668, 295, 1025, 68, 12, 24, 50460], "temperature": 0.0, "avg_logprob": -0.05745524399040281, "compression_ratio": 1.6628352490421456, "no_speech_prob": 0.0007086570258252323}, {"id": 2344, "seek": 555484, "start": 5556.84, "end": 5558.74, "text": " so it's a tiny tiny number", "tokens": [50465, 370, 309, 311, 257, 5870, 5870, 1230, 50560], "temperature": 0.0, "avg_logprob": -0.05745524399040281, "compression_ratio": 1.6628352490421456, "no_speech_prob": 0.0007086570258252323}, {"id": 2345, "seek": 555484, "start": 5558.84, "end": 5560.74, "text": " so because of floating point wonkiness", "tokens": [50565, 370, 570, 295, 12607, 935, 1582, 74, 1324, 50660], "temperature": 0.0, "avg_logprob": -0.05745524399040281, "compression_ratio": 1.6628352490421456, "no_speech_prob": 0.0007086570258252323}, {"id": 2346, "seek": 555484, "start": 5560.84, "end": 5562.74, "text": " we don't get the exact bitwise result", "tokens": [50665, 321, 500, 380, 483, 264, 1900, 857, 3711, 1874, 50760], "temperature": 0.0, "avg_logprob": -0.05745524399040281, "compression_ratio": 1.6628352490421456, "no_speech_prob": 0.0007086570258252323}, {"id": 2347, "seek": 555484, "start": 5562.84, "end": 5564.74, "text": " but we basically get", "tokens": [50765, 457, 321, 1936, 483, 50860], "temperature": 0.0, "avg_logprob": -0.05745524399040281, "compression_ratio": 1.6628352490421456, "no_speech_prob": 0.0007086570258252323}, {"id": 2348, "seek": 555484, "start": 5564.84, "end": 5566.74, "text": " the correct answer", "tokens": [50865, 264, 3006, 1867, 50960], "temperature": 0.0, "avg_logprob": -0.05745524399040281, "compression_ratio": 1.6628352490421456, "no_speech_prob": 0.0007086570258252323}, {"id": 2349, "seek": 555484, "start": 5566.84, "end": 5568.74, "text": " approximately", "tokens": [50965, 10447, 51060], "temperature": 0.0, "avg_logprob": -0.05745524399040281, "compression_ratio": 1.6628352490421456, "no_speech_prob": 0.0007086570258252323}, {"id": 2350, "seek": 555484, "start": 5568.84, "end": 5570.74, "text": " now I'd like to pause here briefly", "tokens": [51065, 586, 286, 1116, 411, 281, 10465, 510, 10515, 51160], "temperature": 0.0, "avg_logprob": -0.05745524399040281, "compression_ratio": 1.6628352490421456, "no_speech_prob": 0.0007086570258252323}, {"id": 2351, "seek": 555484, "start": 5570.84, "end": 5572.74, "text": " before we move on to the next exercise", "tokens": [51165, 949, 321, 1286, 322, 281, 264, 958, 5380, 51260], "temperature": 0.0, "avg_logprob": -0.05745524399040281, "compression_ratio": 1.6628352490421456, "no_speech_prob": 0.0007086570258252323}, {"id": 2352, "seek": 555484, "start": 5572.84, "end": 5574.74, "text": " because I'd like us to get an intuitive sense", "tokens": [51265, 570, 286, 1116, 411, 505, 281, 483, 364, 21769, 2020, 51360], "temperature": 0.0, "avg_logprob": -0.05745524399040281, 
"compression_ratio": 1.6628352490421456, "no_speech_prob": 0.0007086570258252323}, {"id": 2353, "seek": 555484, "start": 5574.84, "end": 5576.74, "text": " of what dlogits is", "tokens": [51365, 295, 437, 274, 4987, 1208, 307, 51460], "temperature": 0.0, "avg_logprob": -0.05745524399040281, "compression_ratio": 1.6628352490421456, "no_speech_prob": 0.0007086570258252323}, {"id": 2354, "seek": 555484, "start": 5576.84, "end": 5578.74, "text": " because it has a beautiful and very simple", "tokens": [51465, 570, 309, 575, 257, 2238, 293, 588, 2199, 51560], "temperature": 0.0, "avg_logprob": -0.05745524399040281, "compression_ratio": 1.6628352490421456, "no_speech_prob": 0.0007086570258252323}, {"id": 2355, "seek": 555484, "start": 5578.84, "end": 5580.74, "text": " explanation honestly", "tokens": [51565, 10835, 6095, 51660], "temperature": 0.0, "avg_logprob": -0.05745524399040281, "compression_ratio": 1.6628352490421456, "no_speech_prob": 0.0007086570258252323}, {"id": 2356, "seek": 555484, "start": 5580.84, "end": 5582.74, "text": " so here I'm taking dlogits", "tokens": [51665, 370, 510, 286, 478, 1940, 274, 4987, 1208, 51760], "temperature": 0.0, "avg_logprob": -0.05745524399040281, "compression_ratio": 1.6628352490421456, "no_speech_prob": 0.0007086570258252323}, {"id": 2357, "seek": 555484, "start": 5582.84, "end": 5584.74, "text": " and I'm visualizing it", "tokens": [51765, 293, 286, 478, 5056, 3319, 309, 51860], "temperature": 0.0, "avg_logprob": -0.05745524399040281, "compression_ratio": 1.6628352490421456, "no_speech_prob": 0.0007086570258252323}, {"id": 2358, "seek": 558474, "start": 5584.74, "end": 5586.639999999999, "text": " and I see that we have a batch of 32 examples", "tokens": [50365, 293, 286, 536, 300, 321, 362, 257, 15245, 295, 8858, 5110, 50460], "temperature": 0.0, "avg_logprob": -0.07007200755770245, "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.0012459276476874948}, {"id": 2359, "seek": 558474, "start": 5586.74, "end": 5588.639999999999, "text": " of 27 characters", "tokens": [50465, 295, 7634, 4342, 50560], "temperature": 0.0, "avg_logprob": -0.07007200755770245, "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.0012459276476874948}, {"id": 2360, "seek": 558474, "start": 5588.74, "end": 5590.639999999999, "text": " and what is dlogits intuitively?", "tokens": [50565, 293, 437, 307, 274, 4987, 1208, 46506, 30, 50660], "temperature": 0.0, "avg_logprob": -0.07007200755770245, "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.0012459276476874948}, {"id": 2361, "seek": 558474, "start": 5590.74, "end": 5592.639999999999, "text": " dlogits is the probabilities", "tokens": [50665, 274, 4987, 1208, 307, 264, 33783, 50760], "temperature": 0.0, "avg_logprob": -0.07007200755770245, "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.0012459276476874948}, {"id": 2362, "seek": 558474, "start": 5592.74, "end": 5594.639999999999, "text": " that the probabilities matrix", "tokens": [50765, 300, 264, 33783, 8141, 50860], "temperature": 0.0, "avg_logprob": -0.07007200755770245, "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.0012459276476874948}, {"id": 2363, "seek": 558474, "start": 5594.74, "end": 5596.639999999999, "text": " in the forward pass", "tokens": [50865, 294, 264, 2128, 1320, 50960], "temperature": 0.0, "avg_logprob": -0.07007200755770245, "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.0012459276476874948}, {"id": 2364, "seek": 558474, "start": 5596.74, "end": 5598.639999999999, "text": " but then 
here these black squares", "tokens": [50965, 457, 550, 510, 613, 2211, 19368, 51060], "temperature": 0.0, "avg_logprob": -0.07007200755770245, "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.0012459276476874948}, {"id": 2365, "seek": 558474, "start": 5598.74, "end": 5600.639999999999, "text": " are the positions of the correct indices", "tokens": [51065, 366, 264, 8432, 295, 264, 3006, 43840, 51160], "temperature": 0.0, "avg_logprob": -0.07007200755770245, "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.0012459276476874948}, {"id": 2366, "seek": 558474, "start": 5600.74, "end": 5602.639999999999, "text": " where we subtracted a 1", "tokens": [51165, 689, 321, 16390, 292, 257, 502, 51260], "temperature": 0.0, "avg_logprob": -0.07007200755770245, "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.0012459276476874948}, {"id": 2367, "seek": 558474, "start": 5602.74, "end": 5604.639999999999, "text": " and so what is this doing?", "tokens": [51265, 293, 370, 437, 307, 341, 884, 30, 51360], "temperature": 0.0, "avg_logprob": -0.07007200755770245, "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.0012459276476874948}, {"id": 2368, "seek": 558474, "start": 5604.74, "end": 5606.639999999999, "text": " these are the derivatives on dlogits", "tokens": [51365, 613, 366, 264, 33733, 322, 274, 4987, 1208, 51460], "temperature": 0.0, "avg_logprob": -0.07007200755770245, "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.0012459276476874948}, {"id": 2369, "seek": 558474, "start": 5606.74, "end": 5608.639999999999, "text": " and so let's look at", "tokens": [51465, 293, 370, 718, 311, 574, 412, 51560], "temperature": 0.0, "avg_logprob": -0.07007200755770245, "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.0012459276476874948}, {"id": 2370, "seek": 558474, "start": 5608.74, "end": 5610.639999999999, "text": " just the first row here", "tokens": [51565, 445, 264, 700, 5386, 510, 51660], "temperature": 0.0, "avg_logprob": -0.07007200755770245, "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.0012459276476874948}, {"id": 2371, "seek": 558474, "start": 5610.74, "end": 5612.639999999999, "text": " so that's what I'm doing here", "tokens": [51665, 370, 300, 311, 437, 286, 478, 884, 510, 51760], "temperature": 0.0, "avg_logprob": -0.07007200755770245, "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.0012459276476874948}, {"id": 2372, "seek": 558474, "start": 5612.74, "end": 5614.639999999999, "text": " I'm calculating the probabilities", "tokens": [51765, 286, 478, 28258, 264, 33783, 51860], "temperature": 0.0, "avg_logprob": -0.07007200755770245, "compression_ratio": 1.8278688524590163, "no_speech_prob": 0.0012459276476874948}, {"id": 2373, "seek": 561464, "start": 5614.64, "end": 5616.54, "text": " and then I'm taking just the first row", "tokens": [50365, 293, 550, 286, 478, 1940, 445, 264, 700, 5386, 50460], "temperature": 0.0, "avg_logprob": -0.07141644304448908, "compression_ratio": 1.8455284552845528, "no_speech_prob": 0.0009858153061941266}, {"id": 2374, "seek": 561464, "start": 5616.64, "end": 5618.54, "text": " and this is the probability row", "tokens": [50465, 293, 341, 307, 264, 8482, 5386, 50560], "temperature": 0.0, "avg_logprob": -0.07141644304448908, "compression_ratio": 1.8455284552845528, "no_speech_prob": 0.0009858153061941266}, {"id": 2375, "seek": 561464, "start": 5618.64, "end": 5620.54, "text": " and then dlogits of the first row", "tokens": [50565, 293, 550, 274, 4987, 1208, 295, 264, 700, 5386, 
50660], "temperature": 0.0, "avg_logprob": -0.07141644304448908, "compression_ratio": 1.8455284552845528, "no_speech_prob": 0.0009858153061941266}, {"id": 2376, "seek": 561464, "start": 5620.64, "end": 5622.54, "text": " and multiplying by n", "tokens": [50665, 293, 30955, 538, 297, 50760], "temperature": 0.0, "avg_logprob": -0.07141644304448908, "compression_ratio": 1.8455284552845528, "no_speech_prob": 0.0009858153061941266}, {"id": 2377, "seek": 561464, "start": 5622.64, "end": 5624.54, "text": " just for us so that", "tokens": [50765, 445, 337, 505, 370, 300, 50860], "temperature": 0.0, "avg_logprob": -0.07141644304448908, "compression_ratio": 1.8455284552845528, "no_speech_prob": 0.0009858153061941266}, {"id": 2378, "seek": 561464, "start": 5624.64, "end": 5626.54, "text": " we don't have the scaling by n in here", "tokens": [50865, 321, 500, 380, 362, 264, 21589, 538, 297, 294, 510, 50960], "temperature": 0.0, "avg_logprob": -0.07141644304448908, "compression_ratio": 1.8455284552845528, "no_speech_prob": 0.0009858153061941266}, {"id": 2379, "seek": 561464, "start": 5626.64, "end": 5628.54, "text": " and everything is more interpretable", "tokens": [50965, 293, 1203, 307, 544, 7302, 712, 51060], "temperature": 0.0, "avg_logprob": -0.07141644304448908, "compression_ratio": 1.8455284552845528, "no_speech_prob": 0.0009858153061941266}, {"id": 2380, "seek": 561464, "start": 5628.64, "end": 5630.54, "text": " we see that it's exactly equal to the probability", "tokens": [51065, 321, 536, 300, 309, 311, 2293, 2681, 281, 264, 8482, 51160], "temperature": 0.0, "avg_logprob": -0.07141644304448908, "compression_ratio": 1.8455284552845528, "no_speech_prob": 0.0009858153061941266}, {"id": 2381, "seek": 561464, "start": 5630.64, "end": 5632.54, "text": " of course but then the position", "tokens": [51165, 295, 1164, 457, 550, 264, 2535, 51260], "temperature": 0.0, "avg_logprob": -0.07141644304448908, "compression_ratio": 1.8455284552845528, "no_speech_prob": 0.0009858153061941266}, {"id": 2382, "seek": 561464, "start": 5632.64, "end": 5634.54, "text": " of the correct index has a minus equals 1", "tokens": [51265, 295, 264, 3006, 8186, 575, 257, 3175, 6915, 502, 51360], "temperature": 0.0, "avg_logprob": -0.07141644304448908, "compression_ratio": 1.8455284552845528, "no_speech_prob": 0.0009858153061941266}, {"id": 2383, "seek": 561464, "start": 5634.64, "end": 5636.54, "text": " so minus 1 on that position", "tokens": [51365, 370, 3175, 502, 322, 300, 2535, 51460], "temperature": 0.0, "avg_logprob": -0.07141644304448908, "compression_ratio": 1.8455284552845528, "no_speech_prob": 0.0009858153061941266}, {"id": 2384, "seek": 561464, "start": 5636.64, "end": 5638.54, "text": " and so notice that", "tokens": [51465, 293, 370, 3449, 300, 51560], "temperature": 0.0, "avg_logprob": -0.07141644304448908, "compression_ratio": 1.8455284552845528, "no_speech_prob": 0.0009858153061941266}, {"id": 2385, "seek": 561464, "start": 5638.64, "end": 5640.54, "text": " if you take dlogits at 0", "tokens": [51565, 498, 291, 747, 274, 4987, 1208, 412, 1958, 51660], "temperature": 0.0, "avg_logprob": -0.07141644304448908, "compression_ratio": 1.8455284552845528, "no_speech_prob": 0.0009858153061941266}, {"id": 2386, "seek": 561464, "start": 5640.64, "end": 5642.54, "text": " and you sum it", "tokens": [51665, 293, 291, 2408, 309, 51760], "temperature": 0.0, "avg_logprob": -0.07141644304448908, "compression_ratio": 1.8455284552845528, "no_speech_prob": 0.0009858153061941266}, {"id": 2387, "seek": 561464, "start": 
5642.64, "end": 5644.54, "text": " it actually sums to 0", "tokens": [51765, 309, 767, 34499, 281, 1958, 51860], "temperature": 0.0, "avg_logprob": -0.07141644304448908, "compression_ratio": 1.8455284552845528, "no_speech_prob": 0.0009858153061941266}, {"id": 2388, "seek": 564464, "start": 5644.64, "end": 5646.54, "text": " and so you should think of these", "tokens": [50365, 293, 370, 291, 820, 519, 295, 613, 50460], "temperature": 0.0, "avg_logprob": -0.06310483841668992, "compression_ratio": 1.75, "no_speech_prob": 0.0007190362084656954}, {"id": 2389, "seek": 564464, "start": 5646.64, "end": 5648.54, "text": " gradients here", "tokens": [50465, 2771, 2448, 510, 50560], "temperature": 0.0, "avg_logprob": -0.06310483841668992, "compression_ratio": 1.75, "no_speech_prob": 0.0007190362084656954}, {"id": 2390, "seek": 564464, "start": 5648.64, "end": 5650.54, "text": " at each cell", "tokens": [50565, 412, 1184, 2815, 50660], "temperature": 0.0, "avg_logprob": -0.06310483841668992, "compression_ratio": 1.75, "no_speech_prob": 0.0007190362084656954}, {"id": 2391, "seek": 564464, "start": 5650.64, "end": 5652.54, "text": " as like a force", "tokens": [50665, 382, 411, 257, 3464, 50760], "temperature": 0.0, "avg_logprob": -0.06310483841668992, "compression_ratio": 1.75, "no_speech_prob": 0.0007190362084656954}, {"id": 2392, "seek": 564464, "start": 5652.64, "end": 5654.54, "text": " we are going to be basically", "tokens": [50765, 321, 366, 516, 281, 312, 1936, 50860], "temperature": 0.0, "avg_logprob": -0.06310483841668992, "compression_ratio": 1.75, "no_speech_prob": 0.0007190362084656954}, {"id": 2393, "seek": 564464, "start": 5654.64, "end": 5656.54, "text": " pulling down on the probabilities", "tokens": [50865, 8407, 760, 322, 264, 33783, 50960], "temperature": 0.0, "avg_logprob": -0.06310483841668992, "compression_ratio": 1.75, "no_speech_prob": 0.0007190362084656954}, {"id": 2394, "seek": 564464, "start": 5656.64, "end": 5658.54, "text": " of the incorrect characters", "tokens": [50965, 295, 264, 18424, 4342, 51060], "temperature": 0.0, "avg_logprob": -0.06310483841668992, "compression_ratio": 1.75, "no_speech_prob": 0.0007190362084656954}, {"id": 2395, "seek": 564464, "start": 5658.64, "end": 5660.54, "text": " and we're going to be pulling up", "tokens": [51065, 293, 321, 434, 516, 281, 312, 8407, 493, 51160], "temperature": 0.0, "avg_logprob": -0.06310483841668992, "compression_ratio": 1.75, "no_speech_prob": 0.0007190362084656954}, {"id": 2396, "seek": 564464, "start": 5660.64, "end": 5662.54, "text": " on the probability", "tokens": [51165, 322, 264, 8482, 51260], "temperature": 0.0, "avg_logprob": -0.06310483841668992, "compression_ratio": 1.75, "no_speech_prob": 0.0007190362084656954}, {"id": 2397, "seek": 564464, "start": 5662.64, "end": 5664.54, "text": " at the correct index", "tokens": [51265, 412, 264, 3006, 8186, 51360], "temperature": 0.0, "avg_logprob": -0.06310483841668992, "compression_ratio": 1.75, "no_speech_prob": 0.0007190362084656954}, {"id": 2398, "seek": 564464, "start": 5664.64, "end": 5666.54, "text": " and that's what's basically happening", "tokens": [51365, 293, 300, 311, 437, 311, 1936, 2737, 51460], "temperature": 0.0, "avg_logprob": -0.06310483841668992, "compression_ratio": 1.75, "no_speech_prob": 0.0007190362084656954}, {"id": 2399, "seek": 564464, "start": 5666.64, "end": 5668.54, "text": " in each row", "tokens": [51465, 294, 1184, 5386, 51560], "temperature": 0.0, "avg_logprob": -0.06310483841668992, "compression_ratio": 1.75, "no_speech_prob": 
0.0007190362084656954}, {"id": 2400, "seek": 564464, "start": 5668.64, "end": 5670.54, "text": " and the amount of push and pull", "tokens": [51565, 293, 264, 2372, 295, 2944, 293, 2235, 51660], "temperature": 0.0, "avg_logprob": -0.06310483841668992, "compression_ratio": 1.75, "no_speech_prob": 0.0007190362084656954}, {"id": 2401, "seek": 564464, "start": 5670.64, "end": 5672.54, "text": " is exactly equalized", "tokens": [51665, 307, 2293, 2681, 1602, 51760], "temperature": 0.0, "avg_logprob": -0.06310483841668992, "compression_ratio": 1.75, "no_speech_prob": 0.0007190362084656954}, {"id": 2402, "seek": 564464, "start": 5672.64, "end": 5674.54, "text": " because the sum is 0", "tokens": [51765, 570, 264, 2408, 307, 1958, 51860], "temperature": 0.0, "avg_logprob": -0.06310483841668992, "compression_ratio": 1.75, "no_speech_prob": 0.0007190362084656954}, {"id": 2403, "seek": 567454, "start": 5674.54, "end": 5676.44, "text": " and the amount to which we pull down", "tokens": [50365, 293, 264, 2372, 281, 597, 321, 2235, 760, 50460], "temperature": 0.0, "avg_logprob": -0.08743594674503102, "compression_ratio": 2.161137440758294, "no_speech_prob": 0.0020796481985598803}, {"id": 2404, "seek": 567454, "start": 5676.54, "end": 5678.44, "text": " on the probabilities", "tokens": [50465, 322, 264, 33783, 50560], "temperature": 0.0, "avg_logprob": -0.08743594674503102, "compression_ratio": 2.161137440758294, "no_speech_prob": 0.0020796481985598803}, {"id": 2405, "seek": 567454, "start": 5678.54, "end": 5680.44, "text": " and the amount that we push up", "tokens": [50565, 293, 264, 2372, 300, 321, 2944, 493, 50660], "temperature": 0.0, "avg_logprob": -0.08743594674503102, "compression_ratio": 2.161137440758294, "no_speech_prob": 0.0020796481985598803}, {"id": 2406, "seek": 567454, "start": 5680.54, "end": 5682.44, "text": " on the probability of the correct character", "tokens": [50665, 322, 264, 8482, 295, 264, 3006, 2517, 50760], "temperature": 0.0, "avg_logprob": -0.08743594674503102, "compression_ratio": 2.161137440758294, "no_speech_prob": 0.0020796481985598803}, {"id": 2407, "seek": 567454, "start": 5682.54, "end": 5684.44, "text": " is equal", "tokens": [50765, 307, 2681, 50860], "temperature": 0.0, "avg_logprob": -0.08743594674503102, "compression_ratio": 2.161137440758294, "no_speech_prob": 0.0020796481985598803}, {"id": 2408, "seek": 567454, "start": 5684.54, "end": 5686.44, "text": " so the repulsion and the attraction are equal", "tokens": [50865, 370, 264, 1085, 22973, 293, 264, 17672, 366, 2681, 50960], "temperature": 0.0, "avg_logprob": -0.08743594674503102, "compression_ratio": 2.161137440758294, "no_speech_prob": 0.0020796481985598803}, {"id": 2409, "seek": 567454, "start": 5686.54, "end": 5688.44, "text": " and think of the neural net now", "tokens": [50965, 293, 519, 295, 264, 18161, 2533, 586, 51060], "temperature": 0.0, "avg_logprob": -0.08743594674503102, "compression_ratio": 2.161137440758294, "no_speech_prob": 0.0020796481985598803}, {"id": 2410, "seek": 567454, "start": 5688.54, "end": 5690.44, "text": " as a massive pulley system", "tokens": [51065, 382, 257, 5994, 48399, 1185, 51160], "temperature": 0.0, "avg_logprob": -0.08743594674503102, "compression_ratio": 2.161137440758294, "no_speech_prob": 0.0020796481985598803}, {"id": 2411, "seek": 567454, "start": 5690.54, "end": 5692.44, "text": " or something like that", "tokens": [51165, 420, 746, 411, 300, 51260], "temperature": 0.0, "avg_logprob": -0.08743594674503102, "compression_ratio": 2.161137440758294, 
"no_speech_prob": 0.0020796481985598803}, {"id": 2412, "seek": 567454, "start": 5692.54, "end": 5694.44, "text": " we're up here on top of dlogits", "tokens": [51265, 321, 434, 493, 510, 322, 1192, 295, 274, 4987, 1208, 51360], "temperature": 0.0, "avg_logprob": -0.08743594674503102, "compression_ratio": 2.161137440758294, "no_speech_prob": 0.0020796481985598803}, {"id": 2413, "seek": 567454, "start": 5694.54, "end": 5696.44, "text": " and we're pulling up", "tokens": [51365, 293, 321, 434, 8407, 493, 51460], "temperature": 0.0, "avg_logprob": -0.08743594674503102, "compression_ratio": 2.161137440758294, "no_speech_prob": 0.0020796481985598803}, {"id": 2414, "seek": 567454, "start": 5696.54, "end": 5698.44, "text": " we're pulling down the probabilities of incorrect", "tokens": [51465, 321, 434, 8407, 760, 264, 33783, 295, 18424, 51560], "temperature": 0.0, "avg_logprob": -0.08743594674503102, "compression_ratio": 2.161137440758294, "no_speech_prob": 0.0020796481985598803}, {"id": 2415, "seek": 567454, "start": 5698.54, "end": 5700.44, "text": " and pulling up the probability of the correct", "tokens": [51565, 293, 8407, 493, 264, 8482, 295, 264, 3006, 51660], "temperature": 0.0, "avg_logprob": -0.08743594674503102, "compression_ratio": 2.161137440758294, "no_speech_prob": 0.0020796481985598803}, {"id": 2416, "seek": 567454, "start": 5700.54, "end": 5702.44, "text": " and in this complicated pulley system", "tokens": [51665, 293, 294, 341, 6179, 48399, 1185, 51760], "temperature": 0.0, "avg_logprob": -0.08743594674503102, "compression_ratio": 2.161137440758294, "no_speech_prob": 0.0020796481985598803}, {"id": 2417, "seek": 570244, "start": 5702.44, "end": 5704.339999999999, "text": " we think of it as sort of like", "tokens": [50365, 321, 519, 295, 309, 382, 1333, 295, 411, 50460], "temperature": 0.0, "avg_logprob": -0.10050157819475447, "compression_ratio": 1.8719723183391004, "no_speech_prob": 0.0036284043453633785}, {"id": 2418, "seek": 570244, "start": 5704.44, "end": 5706.339999999999, "text": " this tension translating to this", "tokens": [50465, 341, 8980, 35030, 281, 341, 50560], "temperature": 0.0, "avg_logprob": -0.10050157819475447, "compression_ratio": 1.8719723183391004, "no_speech_prob": 0.0036284043453633785}, {"id": 2419, "seek": 570244, "start": 5706.44, "end": 5708.339999999999, "text": " complicating pulley mechanism", "tokens": [50565, 16060, 990, 48399, 7513, 50660], "temperature": 0.0, "avg_logprob": -0.10050157819475447, "compression_ratio": 1.8719723183391004, "no_speech_prob": 0.0036284043453633785}, {"id": 2420, "seek": 570244, "start": 5708.44, "end": 5710.339999999999, "text": " and then eventually we get a tug", "tokens": [50665, 293, 550, 4728, 321, 483, 257, 33543, 50760], "temperature": 0.0, "avg_logprob": -0.10050157819475447, "compression_ratio": 1.8719723183391004, "no_speech_prob": 0.0036284043453633785}, {"id": 2421, "seek": 570244, "start": 5710.44, "end": 5712.339999999999, "text": " on the weights and the biases", "tokens": [50765, 322, 264, 17443, 293, 264, 32152, 50860], "temperature": 0.0, "avg_logprob": -0.10050157819475447, "compression_ratio": 1.8719723183391004, "no_speech_prob": 0.0036284043453633785}, {"id": 2422, "seek": 570244, "start": 5712.44, "end": 5714.339999999999, "text": " and basically in each update", "tokens": [50865, 293, 1936, 294, 1184, 5623, 50960], "temperature": 0.0, "avg_logprob": -0.10050157819475447, "compression_ratio": 1.8719723183391004, "no_speech_prob": 0.0036284043453633785}, {"id": 2423, "seek": 570244, "start": 
5714.44, "end": 5716.339999999999, "text": " we just kind of like tug in the direction", "tokens": [50965, 321, 445, 733, 295, 411, 33543, 294, 264, 3513, 51060], "temperature": 0.0, "avg_logprob": -0.10050157819475447, "compression_ratio": 1.8719723183391004, "no_speech_prob": 0.0036284043453633785}, {"id": 2424, "seek": 570244, "start": 5716.44, "end": 5718.339999999999, "text": " that we'd like for each of these elements", "tokens": [51065, 300, 321, 1116, 411, 337, 1184, 295, 613, 4959, 51160], "temperature": 0.0, "avg_logprob": -0.10050157819475447, "compression_ratio": 1.8719723183391004, "no_speech_prob": 0.0036284043453633785}, {"id": 2425, "seek": 570244, "start": 5718.44, "end": 5720.339999999999, "text": " and the parameters are slowly given in", "tokens": [51165, 293, 264, 9834, 366, 5692, 2212, 294, 51260], "temperature": 0.0, "avg_logprob": -0.10050157819475447, "compression_ratio": 1.8719723183391004, "no_speech_prob": 0.0036284043453633785}, {"id": 2426, "seek": 570244, "start": 5720.44, "end": 5722.339999999999, "text": " to the tug and that's what training in neural net", "tokens": [51265, 281, 264, 33543, 293, 300, 311, 437, 3097, 294, 18161, 2533, 51360], "temperature": 0.0, "avg_logprob": -0.10050157819475447, "compression_ratio": 1.8719723183391004, "no_speech_prob": 0.0036284043453633785}, {"id": 2427, "seek": 570244, "start": 5722.44, "end": 5724.339999999999, "text": " kind of like looks like on a high level", "tokens": [51365, 733, 295, 411, 1542, 411, 322, 257, 1090, 1496, 51460], "temperature": 0.0, "avg_logprob": -0.10050157819475447, "compression_ratio": 1.8719723183391004, "no_speech_prob": 0.0036284043453633785}, {"id": 2428, "seek": 570244, "start": 5724.44, "end": 5726.339999999999, "text": " and so I think the forces of push and pull", "tokens": [51465, 293, 370, 286, 519, 264, 5874, 295, 2944, 293, 2235, 51560], "temperature": 0.0, "avg_logprob": -0.10050157819475447, "compression_ratio": 1.8719723183391004, "no_speech_prob": 0.0036284043453633785}, {"id": 2429, "seek": 570244, "start": 5726.44, "end": 5728.339999999999, "text": " in these gradients are actually", "tokens": [51565, 294, 613, 2771, 2448, 366, 767, 51660], "temperature": 0.0, "avg_logprob": -0.10050157819475447, "compression_ratio": 1.8719723183391004, "no_speech_prob": 0.0036284043453633785}, {"id": 2430, "seek": 570244, "start": 5728.44, "end": 5730.339999999999, "text": " very intuitive here", "tokens": [51665, 588, 21769, 510, 51760], "temperature": 0.0, "avg_logprob": -0.10050157819475447, "compression_ratio": 1.8719723183391004, "no_speech_prob": 0.0036284043453633785}, {"id": 2431, "seek": 570244, "start": 5730.44, "end": 5732.339999999999, "text": " we're pushing and pulling on the correct answer", "tokens": [51765, 321, 434, 7380, 293, 8407, 322, 264, 3006, 1867, 51860], "temperature": 0.0, "avg_logprob": -0.10050157819475447, "compression_ratio": 1.8719723183391004, "no_speech_prob": 0.0036284043453633785}, {"id": 2432, "seek": 573234, "start": 5732.34, "end": 5734.24, "text": " and the amount of force that we're applying", "tokens": [50365, 293, 264, 2372, 295, 3464, 300, 321, 434, 9275, 50460], "temperature": 0.0, "avg_logprob": -0.07227649984433669, "compression_ratio": 1.9080459770114941, "no_speech_prob": 0.001025065896101296}, {"id": 2433, "seek": 573234, "start": 5734.34, "end": 5736.24, "text": " is actually proportional to", "tokens": [50465, 307, 767, 24969, 281, 50560], "temperature": 0.0, "avg_logprob": -0.07227649984433669, "compression_ratio": 1.9080459770114941, 
"no_speech_prob": 0.001025065896101296}, {"id": 2434, "seek": 573234, "start": 5736.34, "end": 5738.24, "text": " the probabilities that came out", "tokens": [50565, 264, 33783, 300, 1361, 484, 50660], "temperature": 0.0, "avg_logprob": -0.07227649984433669, "compression_ratio": 1.9080459770114941, "no_speech_prob": 0.001025065896101296}, {"id": 2435, "seek": 573234, "start": 5738.34, "end": 5740.24, "text": " in the forward pass", "tokens": [50665, 294, 264, 2128, 1320, 50760], "temperature": 0.0, "avg_logprob": -0.07227649984433669, "compression_ratio": 1.9080459770114941, "no_speech_prob": 0.001025065896101296}, {"id": 2436, "seek": 573234, "start": 5740.34, "end": 5742.24, "text": " and so for example if our probabilities came out", "tokens": [50765, 293, 370, 337, 1365, 498, 527, 33783, 1361, 484, 50860], "temperature": 0.0, "avg_logprob": -0.07227649984433669, "compression_ratio": 1.9080459770114941, "no_speech_prob": 0.001025065896101296}, {"id": 2437, "seek": 573234, "start": 5742.34, "end": 5744.24, "text": " exactly correct so they would have had", "tokens": [50865, 2293, 3006, 370, 436, 576, 362, 632, 50960], "temperature": 0.0, "avg_logprob": -0.07227649984433669, "compression_ratio": 1.9080459770114941, "no_speech_prob": 0.001025065896101296}, {"id": 2438, "seek": 573234, "start": 5744.34, "end": 5746.24, "text": " zero everywhere except for one", "tokens": [50965, 4018, 5315, 3993, 337, 472, 51060], "temperature": 0.0, "avg_logprob": -0.07227649984433669, "compression_ratio": 1.9080459770114941, "no_speech_prob": 0.001025065896101296}, {"id": 2439, "seek": 573234, "start": 5746.34, "end": 5748.24, "text": " at the correct position", "tokens": [51065, 412, 264, 3006, 2535, 51160], "temperature": 0.0, "avg_logprob": -0.07227649984433669, "compression_ratio": 1.9080459770114941, "no_speech_prob": 0.001025065896101296}, {"id": 2440, "seek": 573234, "start": 5748.34, "end": 5750.24, "text": " then the dlogits would be all", "tokens": [51165, 550, 264, 274, 4987, 1208, 576, 312, 439, 51260], "temperature": 0.0, "avg_logprob": -0.07227649984433669, "compression_ratio": 1.9080459770114941, "no_speech_prob": 0.001025065896101296}, {"id": 2441, "seek": 573234, "start": 5750.34, "end": 5752.24, "text": " a row of zeros for that example", "tokens": [51265, 257, 5386, 295, 35193, 337, 300, 1365, 51360], "temperature": 0.0, "avg_logprob": -0.07227649984433669, "compression_ratio": 1.9080459770114941, "no_speech_prob": 0.001025065896101296}, {"id": 2442, "seek": 573234, "start": 5752.34, "end": 5754.24, "text": " there would be no push and pull", "tokens": [51365, 456, 576, 312, 572, 2944, 293, 2235, 51460], "temperature": 0.0, "avg_logprob": -0.07227649984433669, "compression_ratio": 1.9080459770114941, "no_speech_prob": 0.001025065896101296}, {"id": 2443, "seek": 573234, "start": 5754.34, "end": 5756.24, "text": " so the amount to which your prediction is incorrect", "tokens": [51465, 370, 264, 2372, 281, 597, 428, 17630, 307, 18424, 51560], "temperature": 0.0, "avg_logprob": -0.07227649984433669, "compression_ratio": 1.9080459770114941, "no_speech_prob": 0.001025065896101296}, {"id": 2444, "seek": 573234, "start": 5756.34, "end": 5758.24, "text": " is exactly the amount", "tokens": [51565, 307, 2293, 264, 2372, 51660], "temperature": 0.0, "avg_logprob": -0.07227649984433669, "compression_ratio": 1.9080459770114941, "no_speech_prob": 0.001025065896101296}, {"id": 2445, "seek": 573234, "start": 5758.34, "end": 5760.24, "text": " by which you're going to get a pull", "tokens": [51665, 538, 
597, 291, 434, 516, 281, 483, 257, 2235, 51760], "temperature": 0.0, "avg_logprob": -0.07227649984433669, "compression_ratio": 1.9080459770114941, "no_speech_prob": 0.001025065896101296}, {"id": 2446, "seek": 573234, "start": 5760.34, "end": 5762.24, "text": " or a push in that dimension", "tokens": [51765, 420, 257, 2944, 294, 300, 10139, 51860], "temperature": 0.0, "avg_logprob": -0.07227649984433669, "compression_ratio": 1.9080459770114941, "no_speech_prob": 0.001025065896101296}, {"id": 2447, "seek": 576224, "start": 5762.24, "end": 5764.139999999999, "text": " so if you have for example", "tokens": [50365, 370, 498, 291, 362, 337, 1365, 50460], "temperature": 0.0, "avg_logprob": -0.060691317221275846, "compression_ratio": 1.928030303030303, "no_speech_prob": 0.0005899900570511818}, {"id": 2448, "seek": 576224, "start": 5764.24, "end": 5766.139999999999, "text": " a very confidently mispredicted element here", "tokens": [50465, 257, 588, 41956, 3346, 79, 986, 11254, 4478, 510, 50560], "temperature": 0.0, "avg_logprob": -0.060691317221275846, "compression_ratio": 1.928030303030303, "no_speech_prob": 0.0005899900570511818}, {"id": 2449, "seek": 576224, "start": 5766.24, "end": 5768.139999999999, "text": " then what's going to happen is", "tokens": [50565, 550, 437, 311, 516, 281, 1051, 307, 50660], "temperature": 0.0, "avg_logprob": -0.060691317221275846, "compression_ratio": 1.928030303030303, "no_speech_prob": 0.0005899900570511818}, {"id": 2450, "seek": 576224, "start": 5768.24, "end": 5770.139999999999, "text": " that element is going to be pulled down", "tokens": [50665, 300, 4478, 307, 516, 281, 312, 7373, 760, 50760], "temperature": 0.0, "avg_logprob": -0.060691317221275846, "compression_ratio": 1.928030303030303, "no_speech_prob": 0.0005899900570511818}, {"id": 2451, "seek": 576224, "start": 5770.24, "end": 5772.139999999999, "text": " very heavily and the correct answer", "tokens": [50765, 588, 10950, 293, 264, 3006, 1867, 50860], "temperature": 0.0, "avg_logprob": -0.060691317221275846, "compression_ratio": 1.928030303030303, "no_speech_prob": 0.0005899900570511818}, {"id": 2452, "seek": 576224, "start": 5772.24, "end": 5774.139999999999, "text": " is going to be pulled up to the same amount", "tokens": [50865, 307, 516, 281, 312, 7373, 493, 281, 264, 912, 2372, 50960], "temperature": 0.0, "avg_logprob": -0.060691317221275846, "compression_ratio": 1.928030303030303, "no_speech_prob": 0.0005899900570511818}, {"id": 2453, "seek": 576224, "start": 5774.24, "end": 5776.139999999999, "text": " and the other characters", "tokens": [50965, 293, 264, 661, 4342, 51060], "temperature": 0.0, "avg_logprob": -0.060691317221275846, "compression_ratio": 1.928030303030303, "no_speech_prob": 0.0005899900570511818}, {"id": 2454, "seek": 576224, "start": 5776.24, "end": 5778.139999999999, "text": " are not going to be influenced too much", "tokens": [51065, 366, 406, 516, 281, 312, 15269, 886, 709, 51160], "temperature": 0.0, "avg_logprob": -0.060691317221275846, "compression_ratio": 1.928030303030303, "no_speech_prob": 0.0005899900570511818}, {"id": 2455, "seek": 576224, "start": 5778.24, "end": 5780.139999999999, "text": " so the amount to which", "tokens": [51165, 370, 264, 2372, 281, 597, 51260], "temperature": 0.0, "avg_logprob": -0.060691317221275846, "compression_ratio": 1.928030303030303, "no_speech_prob": 0.0005899900570511818}, {"id": 2456, "seek": 576224, "start": 5780.24, "end": 5782.139999999999, "text": " you mispredict is then proportional", "tokens": [51265, 291, 3346, 79, 24945, 
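As a tiny made-up numeric example of that proportionality, with a hypothetical probability row where the correct label is index 0 but the model confidently predicts index 2:

```python
import torch

# hypothetical probability row for one example: the correct label is
# index 0, but the model confidently predicts index 2
p = torch.tensor([0.05, 0.05, 0.85, 0.05])
y = 0

drow = p.clone()
drow[y] -= 1.0       # the gradient for this row: p, with 1 subtracted at the label
print(drow)          # tensor([-0.9500,  0.0500,  0.8500,  0.0500])
print(drow.sum())    # ~0: the pull and the push balance out

# index 2 (the confident mistake) gets pushed down hard, index 0 (the correct
# answer) gets pulled up by the same total amount, and the rest barely move
```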
307, 550, 24969, 51360], "temperature": 0.0, "avg_logprob": -0.060691317221275846, "compression_ratio": 1.928030303030303, "no_speech_prob": 0.0005899900570511818}, {"id": 2457, "seek": 576224, "start": 5782.24, "end": 5784.139999999999, "text": " to the strength of the pull", "tokens": [51365, 281, 264, 3800, 295, 264, 2235, 51460], "temperature": 0.0, "avg_logprob": -0.060691317221275846, "compression_ratio": 1.928030303030303, "no_speech_prob": 0.0005899900570511818}, {"id": 2458, "seek": 576224, "start": 5784.24, "end": 5786.139999999999, "text": " and that's happening independently", "tokens": [51465, 293, 300, 311, 2737, 21761, 51560], "temperature": 0.0, "avg_logprob": -0.060691317221275846, "compression_ratio": 1.928030303030303, "no_speech_prob": 0.0005899900570511818}, {"id": 2459, "seek": 576224, "start": 5786.24, "end": 5788.139999999999, "text": " in all the dimensions of this tensor", "tokens": [51565, 294, 439, 264, 12819, 295, 341, 40863, 51660], "temperature": 0.0, "avg_logprob": -0.060691317221275846, "compression_ratio": 1.928030303030303, "no_speech_prob": 0.0005899900570511818}, {"id": 2460, "seek": 576224, "start": 5788.24, "end": 5790.139999999999, "text": " and it's sort of very intuitive", "tokens": [51665, 293, 309, 311, 1333, 295, 588, 21769, 51760], "temperature": 0.0, "avg_logprob": -0.060691317221275846, "compression_ratio": 1.928030303030303, "no_speech_prob": 0.0005899900570511818}, {"id": 2461, "seek": 576224, "start": 5790.24, "end": 5792.139999999999, "text": " and very easy to think through", "tokens": [51765, 293, 588, 1858, 281, 519, 807, 51860], "temperature": 0.0, "avg_logprob": -0.060691317221275846, "compression_ratio": 1.928030303030303, "no_speech_prob": 0.0005899900570511818}, {"id": 2462, "seek": 579214, "start": 5792.14, "end": 5794.04, "text": " and that's basically the magic of the cross entropy loss", "tokens": [50365, 293, 300, 311, 1936, 264, 5585, 295, 264, 3278, 30867, 4470, 50460], "temperature": 0.0, "avg_logprob": -0.07515755215206661, "compression_ratio": 2.057823129251701, "no_speech_prob": 0.0020768071990460157}, {"id": 2463, "seek": 579214, "start": 5794.14, "end": 5796.04, "text": " and what it's doing dynamically", "tokens": [50465, 293, 437, 309, 311, 884, 43492, 50560], "temperature": 0.0, "avg_logprob": -0.07515755215206661, "compression_ratio": 2.057823129251701, "no_speech_prob": 0.0020768071990460157}, {"id": 2464, "seek": 579214, "start": 5796.14, "end": 5798.04, "text": " in the backward pass of the neural net", "tokens": [50565, 294, 264, 23897, 1320, 295, 264, 18161, 2533, 50660], "temperature": 0.0, "avg_logprob": -0.07515755215206661, "compression_ratio": 2.057823129251701, "no_speech_prob": 0.0020768071990460157}, {"id": 2465, "seek": 579214, "start": 5798.14, "end": 5800.04, "text": " so now we get to exercise number three", "tokens": [50665, 370, 586, 321, 483, 281, 5380, 1230, 1045, 50760], "temperature": 0.0, "avg_logprob": -0.07515755215206661, "compression_ratio": 2.057823129251701, "no_speech_prob": 0.0020768071990460157}, {"id": 2466, "seek": 579214, "start": 5800.14, "end": 5802.04, "text": " which is a very fun exercise", "tokens": [50765, 597, 307, 257, 588, 1019, 5380, 50860], "temperature": 0.0, "avg_logprob": -0.07515755215206661, "compression_ratio": 2.057823129251701, "no_speech_prob": 0.0020768071990460157}, {"id": 2467, "seek": 579214, "start": 5802.14, "end": 5804.04, "text": " depending on your definition of fun", "tokens": [50865, 5413, 322, 428, 7123, 295, 1019, 50960], "temperature": 0.0, 
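To make this pull-and-push dynamic concrete, here is a minimal sketch of that backward pass in PyTorch, assuming logits of shape (n, V) and integer targets Yb; the shapes and variable names here are just illustrative:

```python
import torch
import torch.nn.functional as F

n, V = 32, 27                       # batch size and vocabulary size (illustrative)
logits = torch.randn(n, V)
Yb = torch.randint(0, V, (n,))      # indices of the correct characters

# gradient of the mean cross entropy loss with respect to the logits:
# the softmax probabilities everywhere, minus 1 at the correct class
dlogits = F.softmax(logits, dim=1)
dlogits[range(n), Yb] -= 1
dlogits /= n

# each row sums to ~0: the pull up on the correct character exactly
# balances the pushes down on all the mispredicted ones
print(dlogits.sum(1))
```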
"avg_logprob": -0.07515755215206661, "compression_ratio": 2.057823129251701, "no_speech_prob": 0.0020768071990460157}, {"id": 2468, "seek": 579214, "start": 5804.14, "end": 5806.04, "text": " and we are going to do for batch normalization", "tokens": [50965, 293, 321, 366, 516, 281, 360, 337, 15245, 2710, 2144, 51060], "temperature": 0.0, "avg_logprob": -0.07515755215206661, "compression_ratio": 2.057823129251701, "no_speech_prob": 0.0020768071990460157}, {"id": 2469, "seek": 579214, "start": 5806.14, "end": 5808.04, "text": " exactly what we did for cross entropy loss", "tokens": [51065, 2293, 437, 321, 630, 337, 3278, 30867, 4470, 51160], "temperature": 0.0, "avg_logprob": -0.07515755215206661, "compression_ratio": 2.057823129251701, "no_speech_prob": 0.0020768071990460157}, {"id": 2470, "seek": 579214, "start": 5808.14, "end": 5810.04, "text": " in exercise number two", "tokens": [51165, 294, 5380, 1230, 732, 51260], "temperature": 0.0, "avg_logprob": -0.07515755215206661, "compression_ratio": 2.057823129251701, "no_speech_prob": 0.0020768071990460157}, {"id": 2471, "seek": 579214, "start": 5810.14, "end": 5812.04, "text": " that is we are going to consider it as a glued", "tokens": [51265, 300, 307, 321, 366, 516, 281, 1949, 309, 382, 257, 28008, 51360], "temperature": 0.0, "avg_logprob": -0.07515755215206661, "compression_ratio": 2.057823129251701, "no_speech_prob": 0.0020768071990460157}, {"id": 2472, "seek": 579214, "start": 5812.14, "end": 5814.04, "text": " single mathematical expression", "tokens": [51365, 2167, 18894, 6114, 51460], "temperature": 0.0, "avg_logprob": -0.07515755215206661, "compression_ratio": 2.057823129251701, "no_speech_prob": 0.0020768071990460157}, {"id": 2473, "seek": 579214, "start": 5814.14, "end": 5816.04, "text": " and back propagate through it in a very efficient manner", "tokens": [51465, 293, 646, 48256, 807, 309, 294, 257, 588, 7148, 9060, 51560], "temperature": 0.0, "avg_logprob": -0.07515755215206661, "compression_ratio": 2.057823129251701, "no_speech_prob": 0.0020768071990460157}, {"id": 2474, "seek": 579214, "start": 5816.14, "end": 5818.04, "text": " because we are going to derive a much simpler formula", "tokens": [51565, 570, 321, 366, 516, 281, 28446, 257, 709, 18587, 8513, 51660], "temperature": 0.0, "avg_logprob": -0.07515755215206661, "compression_ratio": 2.057823129251701, "no_speech_prob": 0.0020768071990460157}, {"id": 2475, "seek": 579214, "start": 5818.14, "end": 5820.04, "text": " for the backward pass of batch normalization", "tokens": [51665, 337, 264, 23897, 1320, 295, 15245, 2710, 2144, 51760], "temperature": 0.0, "avg_logprob": -0.07515755215206661, "compression_ratio": 2.057823129251701, "no_speech_prob": 0.0020768071990460157}, {"id": 2476, "seek": 579214, "start": 5820.14, "end": 5822.04, "text": " and we're going to do that", "tokens": [51765, 293, 321, 434, 516, 281, 360, 300, 51860], "temperature": 0.0, "avg_logprob": -0.07515755215206661, "compression_ratio": 2.057823129251701, "no_speech_prob": 0.0020768071990460157}, {"id": 2477, "seek": 582204, "start": 5822.04, "end": 5823.94, "text": " using pen and paper", "tokens": [50365, 1228, 3435, 293, 3035, 50460], "temperature": 0.0, "avg_logprob": -0.06836302830622747, "compression_ratio": 1.8686131386861313, "no_speech_prob": 0.00091423315461725}, {"id": 2478, "seek": 582204, "start": 5824.04, "end": 5825.94, "text": " so previously we've broken up batch normalization", "tokens": [50465, 370, 8046, 321, 600, 5463, 493, 15245, 2710, 2144, 50560], "temperature": 0.0, 
"avg_logprob": -0.06836302830622747, "compression_ratio": 1.8686131386861313, "no_speech_prob": 0.00091423315461725}, {"id": 2479, "seek": 582204, "start": 5826.04, "end": 5827.94, "text": " into all of the little intermediate pieces", "tokens": [50565, 666, 439, 295, 264, 707, 19376, 3755, 50660], "temperature": 0.0, "avg_logprob": -0.06836302830622747, "compression_ratio": 1.8686131386861313, "no_speech_prob": 0.00091423315461725}, {"id": 2480, "seek": 582204, "start": 5828.04, "end": 5829.94, "text": " and all the atomic operations inside it", "tokens": [50665, 293, 439, 264, 22275, 7705, 1854, 309, 50760], "temperature": 0.0, "avg_logprob": -0.06836302830622747, "compression_ratio": 1.8686131386861313, "no_speech_prob": 0.00091423315461725}, {"id": 2481, "seek": 582204, "start": 5830.04, "end": 5831.94, "text": " and then we back propagated through it", "tokens": [50765, 293, 550, 321, 646, 12425, 770, 807, 309, 50860], "temperature": 0.0, "avg_logprob": -0.06836302830622747, "compression_ratio": 1.8686131386861313, "no_speech_prob": 0.00091423315461725}, {"id": 2482, "seek": 582204, "start": 5832.04, "end": 5833.94, "text": " one by one", "tokens": [50865, 472, 538, 472, 50960], "temperature": 0.0, "avg_logprob": -0.06836302830622747, "compression_ratio": 1.8686131386861313, "no_speech_prob": 0.00091423315461725}, {"id": 2483, "seek": 582204, "start": 5834.04, "end": 5835.94, "text": " now we just have a single sort of forward pass", "tokens": [50965, 586, 321, 445, 362, 257, 2167, 1333, 295, 2128, 1320, 51060], "temperature": 0.0, "avg_logprob": -0.06836302830622747, "compression_ratio": 1.8686131386861313, "no_speech_prob": 0.00091423315461725}, {"id": 2484, "seek": 582204, "start": 5836.04, "end": 5837.94, "text": " of a batch form", "tokens": [51065, 295, 257, 15245, 1254, 51160], "temperature": 0.0, "avg_logprob": -0.06836302830622747, "compression_ratio": 1.8686131386861313, "no_speech_prob": 0.00091423315461725}, {"id": 2485, "seek": 582204, "start": 5838.04, "end": 5839.94, "text": " and it's all glued together", "tokens": [51165, 293, 309, 311, 439, 28008, 1214, 51260], "temperature": 0.0, "avg_logprob": -0.06836302830622747, "compression_ratio": 1.8686131386861313, "no_speech_prob": 0.00091423315461725}, {"id": 2486, "seek": 582204, "start": 5840.04, "end": 5841.94, "text": " and we see that we get the exact same result as before", "tokens": [51265, 293, 321, 536, 300, 321, 483, 264, 1900, 912, 1874, 382, 949, 51360], "temperature": 0.0, "avg_logprob": -0.06836302830622747, "compression_ratio": 1.8686131386861313, "no_speech_prob": 0.00091423315461725}, {"id": 2487, "seek": 582204, "start": 5842.04, "end": 5843.94, "text": " now for the backward pass", "tokens": [51365, 586, 337, 264, 23897, 1320, 51460], "temperature": 0.0, "avg_logprob": -0.06836302830622747, "compression_ratio": 1.8686131386861313, "no_speech_prob": 0.00091423315461725}, {"id": 2488, "seek": 582204, "start": 5844.04, "end": 5845.94, "text": " we'd like to also implement", "tokens": [51465, 321, 1116, 411, 281, 611, 4445, 51560], "temperature": 0.0, "avg_logprob": -0.06836302830622747, "compression_ratio": 1.8686131386861313, "no_speech_prob": 0.00091423315461725}, {"id": 2489, "seek": 582204, "start": 5846.04, "end": 5847.94, "text": " a single formula basically", "tokens": [51565, 257, 2167, 8513, 1936, 51660], "temperature": 0.0, "avg_logprob": -0.06836302830622747, "compression_ratio": 1.8686131386861313, "no_speech_prob": 0.00091423315461725}, {"id": 2490, "seek": 582204, "start": 5848.04, "end": 
5849.94, "text": " for back propagating through this entire operation", "tokens": [51665, 337, 646, 12425, 990, 807, 341, 2302, 6916, 51760], "temperature": 0.0, "avg_logprob": -0.06836302830622747, "compression_ratio": 1.8686131386861313, "no_speech_prob": 0.00091423315461725}, {"id": 2491, "seek": 582204, "start": 5850.04, "end": 5851.94, "text": " that is the batch normalization", "tokens": [51765, 300, 307, 264, 15245, 2710, 2144, 51860], "temperature": 0.0, "avg_logprob": -0.06836302830622747, "compression_ratio": 1.8686131386861313, "no_speech_prob": 0.00091423315461725}, {"id": 2492, "seek": 585204, "start": 5852.04, "end": 5853.94, "text": " so in the forward pass previously", "tokens": [50365, 370, 294, 264, 2128, 1320, 8046, 50460], "temperature": 0.0, "avg_logprob": -0.06650262560163225, "compression_ratio": 2.05045871559633, "no_speech_prob": 0.0007186249131336808}, {"id": 2493, "seek": 585204, "start": 5854.04, "end": 5855.94, "text": " we took h pre bn", "tokens": [50465, 321, 1890, 276, 659, 272, 77, 50560], "temperature": 0.0, "avg_logprob": -0.06650262560163225, "compression_ratio": 2.05045871559633, "no_speech_prob": 0.0007186249131336808}, {"id": 2494, "seek": 585204, "start": 5856.04, "end": 5857.94, "text": " the hidden states of the pre batch normalization", "tokens": [50565, 264, 7633, 4368, 295, 264, 659, 15245, 2710, 2144, 50660], "temperature": 0.0, "avg_logprob": -0.06650262560163225, "compression_ratio": 2.05045871559633, "no_speech_prob": 0.0007186249131336808}, {"id": 2495, "seek": 585204, "start": 5858.04, "end": 5859.94, "text": " and created h preact", "tokens": [50665, 293, 2942, 276, 659, 578, 50760], "temperature": 0.0, "avg_logprob": -0.06650262560163225, "compression_ratio": 2.05045871559633, "no_speech_prob": 0.0007186249131336808}, {"id": 2496, "seek": 585204, "start": 5860.04, "end": 5861.94, "text": " which is the hidden states", "tokens": [50765, 597, 307, 264, 7633, 4368, 50860], "temperature": 0.0, "avg_logprob": -0.06650262560163225, "compression_ratio": 2.05045871559633, "no_speech_prob": 0.0007186249131336808}, {"id": 2497, "seek": 585204, "start": 5862.04, "end": 5863.94, "text": " just before the activation", "tokens": [50865, 445, 949, 264, 24433, 50960], "temperature": 0.0, "avg_logprob": -0.06650262560163225, "compression_ratio": 2.05045871559633, "no_speech_prob": 0.0007186249131336808}, {"id": 2498, "seek": 585204, "start": 5864.04, "end": 5865.94, "text": " in the batch normalization paper", "tokens": [50965, 294, 264, 15245, 2710, 2144, 3035, 51060], "temperature": 0.0, "avg_logprob": -0.06650262560163225, "compression_ratio": 2.05045871559633, "no_speech_prob": 0.0007186249131336808}, {"id": 2499, "seek": 585204, "start": 5866.04, "end": 5867.94, "text": " h pre bn is x", "tokens": [51065, 276, 659, 272, 77, 307, 2031, 51160], "temperature": 0.0, "avg_logprob": -0.06650262560163225, "compression_ratio": 2.05045871559633, "no_speech_prob": 0.0007186249131336808}, {"id": 2500, "seek": 585204, "start": 5868.04, "end": 5869.94, "text": " and h preact is y", "tokens": [51165, 293, 276, 659, 578, 307, 288, 51260], "temperature": 0.0, "avg_logprob": -0.06650262560163225, "compression_ratio": 2.05045871559633, "no_speech_prob": 0.0007186249131336808}, {"id": 2501, "seek": 585204, "start": 5870.04, "end": 5871.94, "text": " so in the backward pass what we'd like to do now", "tokens": [51265, 370, 294, 264, 23897, 1320, 437, 321, 1116, 411, 281, 360, 586, 51360], "temperature": 0.0, "avg_logprob": -0.06650262560163225, "compression_ratio": 
So in the backward pass, what we'd like to do now is: we have dhpreact, and we'd like to produce dhprebn, and we'd like to do that in a very efficient manner. So that's the name of the game: calculate dhprebn given dhpreact. And for the purposes of this exercise, we're going to ignore gamma and beta and their derivatives, because they take on a very simple form, in a very similar way to what we did up above. So let's calculate this given that, right here. So to help you a little bit, like I did before, I started off the implementation here on pen and paper, and I took two sheets of paper to derive the mathematical formulas for the backward pass. So to solve the problem, just write out the mu, the sigma squared variance, the x_i hat, and the y_i, exactly as in the paper, except for the Bessel correction.
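Written out (with the Bessel correction, i.e. dividing by m-1 in the variance rather than the paper's m), the four equations are:

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x_i \qquad \sigma^2 = \frac{1}{m-1}\sum_{i=1}^{m}\left(x_i - \mu\right)^2$$

$$\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \qquad y_i = \gamma\,\hat{x}_i + \beta$$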
"start": 5939.94, "end": 5941.839999999999, "text": " the gamma and the beta", "tokens": [51765, 264, 15546, 293, 264, 9861, 51860], "temperature": 0.0, "avg_logprob": -0.09504955731905423, "compression_ratio": 1.886178861788618, "no_speech_prob": 0.000706541002728045}, {"id": 2537, "seek": 594184, "start": 5941.84, "end": 5943.74, "text": " there's the x hat", "tokens": [50365, 456, 311, 264, 2031, 2385, 50460], "temperature": 0.0, "avg_logprob": -0.08346499529751865, "compression_ratio": 1.8043478260869565, "no_speech_prob": 0.0025842308532446623}, {"id": 2538, "seek": 594184, "start": 5943.84, "end": 5945.74, "text": " and then the mu and the sigma square", "tokens": [50465, 293, 550, 264, 2992, 293, 264, 12771, 3732, 50560], "temperature": 0.0, "avg_logprob": -0.08346499529751865, "compression_ratio": 1.8043478260869565, "no_speech_prob": 0.0025842308532446623}, {"id": 2539, "seek": 594184, "start": 5945.84, "end": 5947.74, "text": " and the x", "tokens": [50565, 293, 264, 2031, 50660], "temperature": 0.0, "avg_logprob": -0.08346499529751865, "compression_ratio": 1.8043478260869565, "no_speech_prob": 0.0025842308532446623}, {"id": 2540, "seek": 594184, "start": 5947.84, "end": 5949.74, "text": " so we have dl by dyi", "tokens": [50665, 370, 321, 362, 37873, 538, 14584, 72, 50760], "temperature": 0.0, "avg_logprob": -0.08346499529751865, "compression_ratio": 1.8043478260869565, "no_speech_prob": 0.0025842308532446623}, {"id": 2541, "seek": 594184, "start": 5949.84, "end": 5951.74, "text": " and we want dl by dxi", "tokens": [50765, 293, 321, 528, 37873, 538, 274, 27579, 50860], "temperature": 0.0, "avg_logprob": -0.08346499529751865, "compression_ratio": 1.8043478260869565, "no_speech_prob": 0.0025842308532446623}, {"id": 2542, "seek": 594184, "start": 5951.84, "end": 5953.74, "text": " for all the i's in these vectors", "tokens": [50865, 337, 439, 264, 741, 311, 294, 613, 18875, 50960], "temperature": 0.0, "avg_logprob": -0.08346499529751865, "compression_ratio": 1.8043478260869565, "no_speech_prob": 0.0025842308532446623}, {"id": 2543, "seek": 594184, "start": 5953.84, "end": 5955.74, "text": " so", "tokens": [50965, 370, 51060], "temperature": 0.0, "avg_logprob": -0.08346499529751865, "compression_ratio": 1.8043478260869565, "no_speech_prob": 0.0025842308532446623}, {"id": 2544, "seek": 594184, "start": 5955.84, "end": 5957.74, "text": " this is the compute graph", "tokens": [51065, 341, 307, 264, 14722, 4295, 51160], "temperature": 0.0, "avg_logprob": -0.08346499529751865, "compression_ratio": 1.8043478260869565, "no_speech_prob": 0.0025842308532446623}, {"id": 2545, "seek": 594184, "start": 5957.84, "end": 5959.74, "text": " and you have to be careful because", "tokens": [51165, 293, 291, 362, 281, 312, 5026, 570, 51260], "temperature": 0.0, "avg_logprob": -0.08346499529751865, "compression_ratio": 1.8043478260869565, "no_speech_prob": 0.0025842308532446623}, {"id": 2546, "seek": 594184, "start": 5959.84, "end": 5961.74, "text": " I'm trying to note here that", "tokens": [51265, 286, 478, 1382, 281, 3637, 510, 300, 51360], "temperature": 0.0, "avg_logprob": -0.08346499529751865, "compression_ratio": 1.8043478260869565, "no_speech_prob": 0.0025842308532446623}, {"id": 2547, "seek": 594184, "start": 5961.84, "end": 5963.74, "text": " these are vectors", "tokens": [51365, 613, 366, 18875, 51460], "temperature": 0.0, "avg_logprob": -0.08346499529751865, "compression_ratio": 1.8043478260869565, "no_speech_prob": 0.0025842308532446623}, {"id": 2548, "seek": 594184, "start": 5963.84, 
"end": 5965.74, "text": " there's many nodes here inside x", "tokens": [51465, 456, 311, 867, 13891, 510, 1854, 2031, 51560], "temperature": 0.0, "avg_logprob": -0.08346499529751865, "compression_ratio": 1.8043478260869565, "no_speech_prob": 0.0025842308532446623}, {"id": 2549, "seek": 594184, "start": 5965.84, "end": 5967.74, "text": " x hat and y", "tokens": [51565, 2031, 2385, 293, 288, 51660], "temperature": 0.0, "avg_logprob": -0.08346499529751865, "compression_ratio": 1.8043478260869565, "no_speech_prob": 0.0025842308532446623}, {"id": 2550, "seek": 594184, "start": 5967.84, "end": 5969.74, "text": " but mu and sigma", "tokens": [51665, 457, 2992, 293, 12771, 51760], "temperature": 0.0, "avg_logprob": -0.08346499529751865, "compression_ratio": 1.8043478260869565, "no_speech_prob": 0.0025842308532446623}, {"id": 2551, "seek": 594184, "start": 5969.84, "end": 5971.74, "text": " sorry sigma square", "tokens": [51765, 2597, 12771, 3732, 51860], "temperature": 0.0, "avg_logprob": -0.08346499529751865, "compression_ratio": 1.8043478260869565, "no_speech_prob": 0.0025842308532446623}, {"id": 2552, "seek": 597184, "start": 5971.84, "end": 5973.74, "text": " so you have to be careful with that", "tokens": [50365, 370, 291, 362, 281, 312, 5026, 365, 300, 50460], "temperature": 0.0, "avg_logprob": -0.06238766539868691, "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0012680395739153028}, {"id": 2553, "seek": 597184, "start": 5973.84, "end": 5975.74, "text": " you have to imagine there's multiple nodes here", "tokens": [50465, 291, 362, 281, 3811, 456, 311, 3866, 13891, 510, 50560], "temperature": 0.0, "avg_logprob": -0.06238766539868691, "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0012680395739153028}, {"id": 2554, "seek": 597184, "start": 5975.84, "end": 5977.74, "text": " or you're going to get your math wrong", "tokens": [50565, 420, 291, 434, 516, 281, 483, 428, 5221, 2085, 50660], "temperature": 0.0, "avg_logprob": -0.06238766539868691, "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0012680395739153028}, {"id": 2555, "seek": 597184, "start": 5977.84, "end": 5979.74, "text": " so as an example", "tokens": [50665, 370, 382, 364, 1365, 50760], "temperature": 0.0, "avg_logprob": -0.06238766539868691, "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0012680395739153028}, {"id": 2556, "seek": 597184, "start": 5979.84, "end": 5981.74, "text": " I would suggest that you go in the following order", "tokens": [50765, 286, 576, 3402, 300, 291, 352, 294, 264, 3480, 1668, 50860], "temperature": 0.0, "avg_logprob": -0.06238766539868691, "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0012680395739153028}, {"id": 2557, "seek": 597184, "start": 5981.84, "end": 5983.74, "text": " one, two, three, four", "tokens": [50865, 472, 11, 732, 11, 1045, 11, 1451, 50960], "temperature": 0.0, "avg_logprob": -0.06238766539868691, "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0012680395739153028}, {"id": 2558, "seek": 597184, "start": 5983.84, "end": 5985.74, "text": " in terms of the back propagation", "tokens": [50965, 294, 2115, 295, 264, 646, 38377, 51060], "temperature": 0.0, "avg_logprob": -0.06238766539868691, "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0012680395739153028}, {"id": 2559, "seek": 597184, "start": 5985.84, "end": 5987.74, "text": " so back propagate into x hat", "tokens": [51065, 370, 646, 48256, 666, 2031, 2385, 51160], "temperature": 0.0, "avg_logprob": -0.06238766539868691, "compression_ratio": 
1.8235294117647058, "no_speech_prob": 0.0012680395739153028}, {"id": 2560, "seek": 597184, "start": 5987.84, "end": 5989.74, "text": " then into sigma square", "tokens": [51165, 550, 666, 12771, 3732, 51260], "temperature": 0.0, "avg_logprob": -0.06238766539868691, "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0012680395739153028}, {"id": 2561, "seek": 597184, "start": 5989.84, "end": 5991.74, "text": " then into mu and then into x", "tokens": [51265, 550, 666, 2992, 293, 550, 666, 2031, 51360], "temperature": 0.0, "avg_logprob": -0.06238766539868691, "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0012680395739153028}, {"id": 2562, "seek": 597184, "start": 5991.84, "end": 5993.74, "text": " just like in a topological sort", "tokens": [51365, 445, 411, 294, 257, 1192, 4383, 1333, 51460], "temperature": 0.0, "avg_logprob": -0.06238766539868691, "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0012680395739153028}, {"id": 2563, "seek": 597184, "start": 5993.84, "end": 5995.74, "text": " in micrograd we would go from right to left", "tokens": [51465, 294, 4532, 7165, 321, 576, 352, 490, 558, 281, 1411, 51560], "temperature": 0.0, "avg_logprob": -0.06238766539868691, "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0012680395739153028}, {"id": 2564, "seek": 597184, "start": 5995.84, "end": 5997.74, "text": " you're doing the exact same thing", "tokens": [51565, 291, 434, 884, 264, 1900, 912, 551, 51660], "temperature": 0.0, "avg_logprob": -0.06238766539868691, "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0012680395739153028}, {"id": 2565, "seek": 597184, "start": 5997.84, "end": 5999.74, "text": " except you're doing it with symbols", "tokens": [51665, 3993, 291, 434, 884, 309, 365, 16944, 51760], "temperature": 0.0, "avg_logprob": -0.06238766539868691, "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0012680395739153028}, {"id": 2566, "seek": 597184, "start": 5999.84, "end": 6001.74, "text": " and on a piece of paper", "tokens": [51765, 293, 322, 257, 2522, 295, 3035, 51860], "temperature": 0.0, "avg_logprob": -0.06238766539868691, "compression_ratio": 1.8235294117647058, "no_speech_prob": 0.0012680395739153028}, {"id": 2567, "seek": 600174, "start": 6001.74, "end": 6003.639999999999, "text": " so for number one", "tokens": [50365, 370, 337, 1230, 472, 50460], "temperature": 0.0, "avg_logprob": -0.09087550121804942, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0005058132228441536}, {"id": 2568, "seek": 600174, "start": 6003.74, "end": 6005.639999999999, "text": " I'm not giving away too much", "tokens": [50465, 286, 478, 406, 2902, 1314, 886, 709, 50560], "temperature": 0.0, "avg_logprob": -0.09087550121804942, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0005058132228441536}, {"id": 2569, "seek": 600174, "start": 6005.74, "end": 6007.639999999999, "text": " if you want dl of", "tokens": [50565, 498, 291, 528, 37873, 295, 50660], "temperature": 0.0, "avg_logprob": -0.09087550121804942, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0005058132228441536}, {"id": 2570, "seek": 600174, "start": 6007.74, "end": 6009.639999999999, "text": " dxi hat", "tokens": [50665, 274, 27579, 2385, 50760], "temperature": 0.0, "avg_logprob": -0.09087550121804942, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0005058132228441536}, {"id": 2571, "seek": 600174, "start": 6009.74, "end": 6011.639999999999, "text": " then we just take dl by dyi", "tokens": [50765, 550, 321, 445, 
747, 37873, 538, 14584, 72, 50860], "temperature": 0.0, "avg_logprob": -0.09087550121804942, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0005058132228441536}, {"id": 2572, "seek": 600174, "start": 6011.74, "end": 6013.639999999999, "text": " and multiply it by gamma", "tokens": [50865, 293, 12972, 309, 538, 15546, 50960], "temperature": 0.0, "avg_logprob": -0.09087550121804942, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0005058132228441536}, {"id": 2573, "seek": 600174, "start": 6013.74, "end": 6015.639999999999, "text": " because of this expression here", "tokens": [50965, 570, 295, 341, 6114, 510, 51060], "temperature": 0.0, "avg_logprob": -0.09087550121804942, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0005058132228441536}, {"id": 2574, "seek": 600174, "start": 6015.74, "end": 6017.639999999999, "text": " where any individual yi is just gamma", "tokens": [51065, 689, 604, 2609, 288, 72, 307, 445, 15546, 51160], "temperature": 0.0, "avg_logprob": -0.09087550121804942, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0005058132228441536}, {"id": 2575, "seek": 600174, "start": 6017.74, "end": 6019.639999999999, "text": " times xi hat plus beta", "tokens": [51165, 1413, 36800, 2385, 1804, 9861, 51260], "temperature": 0.0, "avg_logprob": -0.09087550121804942, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0005058132228441536}, {"id": 2576, "seek": 600174, "start": 6019.74, "end": 6021.639999999999, "text": " so it didn't help you", "tokens": [51265, 370, 309, 994, 380, 854, 291, 51360], "temperature": 0.0, "avg_logprob": -0.09087550121804942, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0005058132228441536}, {"id": 2577, "seek": 600174, "start": 6021.74, "end": 6023.639999999999, "text": " too much there", "tokens": [51365, 886, 709, 456, 51460], "temperature": 0.0, "avg_logprob": -0.09087550121804942, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0005058132228441536}, {"id": 2578, "seek": 600174, "start": 6023.74, "end": 6025.639999999999, "text": " but this gives you basically the derivatives", "tokens": [51465, 457, 341, 2709, 291, 1936, 264, 33733, 51560], "temperature": 0.0, "avg_logprob": -0.09087550121804942, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0005058132228441536}, {"id": 2579, "seek": 600174, "start": 6025.74, "end": 6027.639999999999, "text": " for all the x hats", "tokens": [51565, 337, 439, 264, 2031, 20549, 51660], "temperature": 0.0, "avg_logprob": -0.09087550121804942, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0005058132228441536}, {"id": 2580, "seek": 600174, "start": 6027.74, "end": 6029.639999999999, "text": " and so now try to go through this computational graph", "tokens": [51665, 293, 370, 586, 853, 281, 352, 807, 341, 28270, 4295, 51760], "temperature": 0.0, "avg_logprob": -0.09087550121804942, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0005058132228441536}, {"id": 2581, "seek": 600174, "start": 6029.74, "end": 6031.639999999999, "text": " and derive", "tokens": [51765, 293, 28446, 51860], "temperature": 0.0, "avg_logprob": -0.09087550121804942, "compression_ratio": 1.6623376623376624, "no_speech_prob": 0.0005058132228441536}, {"id": 2582, "seek": 603164, "start": 6031.64, "end": 6033.54, "text": " what is dl by d sigma square", "tokens": [50365, 437, 307, 37873, 538, 274, 12771, 3732, 50460], "temperature": 0.0, "avg_logprob": -0.08302821182623142, "compression_ratio": 1.912037037037037, 
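In symbols, that first step is simply gamma scaling the incoming gradient elementwise:

$$\frac{\partial L}{\partial \hat{x}_i} = \frac{\partial L}{\partial y_i} \cdot \gamma \quad\text{since}\quad y_i = \gamma\,\hat{x}_i + \beta$$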
"no_speech_prob": 0.0007011470152065158}, {"id": 2583, "seek": 603164, "start": 6033.64, "end": 6035.54, "text": " and then what is dl by d mu", "tokens": [50465, 293, 550, 437, 307, 37873, 538, 274, 2992, 50560], "temperature": 0.0, "avg_logprob": -0.08302821182623142, "compression_ratio": 1.912037037037037, "no_speech_prob": 0.0007011470152065158}, {"id": 2584, "seek": 603164, "start": 6035.64, "end": 6037.54, "text": " and then what is dl by dx", "tokens": [50565, 293, 550, 437, 307, 37873, 538, 30017, 50660], "temperature": 0.0, "avg_logprob": -0.08302821182623142, "compression_ratio": 1.912037037037037, "no_speech_prob": 0.0007011470152065158}, {"id": 2585, "seek": 603164, "start": 6037.64, "end": 6039.54, "text": " eventually", "tokens": [50665, 4728, 50760], "temperature": 0.0, "avg_logprob": -0.08302821182623142, "compression_ratio": 1.912037037037037, "no_speech_prob": 0.0007011470152065158}, {"id": 2586, "seek": 603164, "start": 6039.64, "end": 6041.54, "text": " so give it a go", "tokens": [50765, 370, 976, 309, 257, 352, 50860], "temperature": 0.0, "avg_logprob": -0.08302821182623142, "compression_ratio": 1.912037037037037, "no_speech_prob": 0.0007011470152065158}, {"id": 2587, "seek": 603164, "start": 6041.64, "end": 6043.54, "text": " and I'm going to be revealing the answer", "tokens": [50865, 293, 286, 478, 516, 281, 312, 23983, 264, 1867, 50960], "temperature": 0.0, "avg_logprob": -0.08302821182623142, "compression_ratio": 1.912037037037037, "no_speech_prob": 0.0007011470152065158}, {"id": 2588, "seek": 603164, "start": 6043.64, "end": 6045.54, "text": " one piece at a time", "tokens": [50965, 472, 2522, 412, 257, 565, 51060], "temperature": 0.0, "avg_logprob": -0.08302821182623142, "compression_ratio": 1.912037037037037, "no_speech_prob": 0.0007011470152065158}, {"id": 2589, "seek": 603164, "start": 6045.64, "end": 6047.54, "text": " okay, so to get dl by d sigma square", "tokens": [51065, 1392, 11, 370, 281, 483, 37873, 538, 274, 12771, 3732, 51160], "temperature": 0.0, "avg_logprob": -0.08302821182623142, "compression_ratio": 1.912037037037037, "no_speech_prob": 0.0007011470152065158}, {"id": 2590, "seek": 603164, "start": 6047.64, "end": 6049.54, "text": " we have to remember again, like I mentioned", "tokens": [51165, 321, 362, 281, 1604, 797, 11, 411, 286, 2835, 51260], "temperature": 0.0, "avg_logprob": -0.08302821182623142, "compression_ratio": 1.912037037037037, "no_speech_prob": 0.0007011470152065158}, {"id": 2591, "seek": 603164, "start": 6049.64, "end": 6051.54, "text": " that there are many x hats here", "tokens": [51265, 300, 456, 366, 867, 2031, 20549, 510, 51360], "temperature": 0.0, "avg_logprob": -0.08302821182623142, "compression_ratio": 1.912037037037037, "no_speech_prob": 0.0007011470152065158}, {"id": 2592, "seek": 603164, "start": 6051.64, "end": 6053.54, "text": " and remember that sigma square", "tokens": [51365, 293, 1604, 300, 12771, 3732, 51460], "temperature": 0.0, "avg_logprob": -0.08302821182623142, "compression_ratio": 1.912037037037037, "no_speech_prob": 0.0007011470152065158}, {"id": 2593, "seek": 603164, "start": 6053.64, "end": 6055.54, "text": " is just a single individual number here", "tokens": [51465, 307, 445, 257, 2167, 2609, 1230, 510, 51560], "temperature": 0.0, "avg_logprob": -0.08302821182623142, "compression_ratio": 1.912037037037037, "no_speech_prob": 0.0007011470152065158}, {"id": 2594, "seek": 603164, "start": 6055.64, "end": 6057.54, "text": " so when we look at the expression", "tokens": [51565, 370, 562, 321, 574, 412, 
264, 6114, 51660], "temperature": 0.0, "avg_logprob": -0.08302821182623142, "compression_ratio": 1.912037037037037, "no_speech_prob": 0.0007011470152065158}, {"id": 2595, "seek": 603164, "start": 6057.64, "end": 6059.54, "text": " for dl by d sigma square", "tokens": [51665, 337, 37873, 538, 274, 12771, 3732, 51760], "temperature": 0.0, "avg_logprob": -0.08302821182623142, "compression_ratio": 1.912037037037037, "no_speech_prob": 0.0007011470152065158}, {"id": 2596, "seek": 605954, "start": 6059.54, "end": 6061.44, "text": " for dl by d sigma square", "tokens": [50365, 337, 37873, 538, 274, 12771, 3732, 50460], "temperature": 0.0, "avg_logprob": -0.07767294335553027, "compression_ratio": 2.0514018691588785, "no_speech_prob": 0.0008197844726964831}, {"id": 2597, "seek": 605954, "start": 6061.54, "end": 6063.44, "text": " we have that we have to actually", "tokens": [50465, 321, 362, 300, 321, 362, 281, 767, 50560], "temperature": 0.0, "avg_logprob": -0.07767294335553027, "compression_ratio": 2.0514018691588785, "no_speech_prob": 0.0008197844726964831}, {"id": 2598, "seek": 605954, "start": 6063.54, "end": 6065.44, "text": " consider all the possible paths", "tokens": [50565, 1949, 439, 264, 1944, 14518, 50660], "temperature": 0.0, "avg_logprob": -0.07767294335553027, "compression_ratio": 2.0514018691588785, "no_speech_prob": 0.0008197844726964831}, {"id": 2599, "seek": 605954, "start": 6065.54, "end": 6067.44, "text": " that", "tokens": [50665, 300, 50760], "temperature": 0.0, "avg_logprob": -0.07767294335553027, "compression_ratio": 2.0514018691588785, "no_speech_prob": 0.0008197844726964831}, {"id": 2600, "seek": 605954, "start": 6067.54, "end": 6069.44, "text": " we basically have that", "tokens": [50765, 321, 1936, 362, 300, 50860], "temperature": 0.0, "avg_logprob": -0.07767294335553027, "compression_ratio": 2.0514018691588785, "no_speech_prob": 0.0008197844726964831}, {"id": 2601, "seek": 605954, "start": 6069.54, "end": 6071.44, "text": " there's many x hats", "tokens": [50865, 456, 311, 867, 2031, 20549, 50960], "temperature": 0.0, "avg_logprob": -0.07767294335553027, "compression_ratio": 2.0514018691588785, "no_speech_prob": 0.0008197844726964831}, {"id": 2602, "seek": 605954, "start": 6071.54, "end": 6073.44, "text": " and they all feed off from", "tokens": [50965, 293, 436, 439, 3154, 766, 490, 51060], "temperature": 0.0, "avg_logprob": -0.07767294335553027, "compression_ratio": 2.0514018691588785, "no_speech_prob": 0.0008197844726964831}, {"id": 2603, "seek": 605954, "start": 6073.54, "end": 6075.44, "text": " they all depend on sigma square", "tokens": [51065, 436, 439, 5672, 322, 12771, 3732, 51160], "temperature": 0.0, "avg_logprob": -0.07767294335553027, "compression_ratio": 2.0514018691588785, "no_speech_prob": 0.0008197844726964831}, {"id": 2604, "seek": 605954, "start": 6075.54, "end": 6077.44, "text": " so sigma square has a large fan out", "tokens": [51165, 370, 12771, 3732, 575, 257, 2416, 3429, 484, 51260], "temperature": 0.0, "avg_logprob": -0.07767294335553027, "compression_ratio": 2.0514018691588785, "no_speech_prob": 0.0008197844726964831}, {"id": 2605, "seek": 605954, "start": 6077.54, "end": 6079.44, "text": " there's lots of arrows coming out from sigma square", "tokens": [51265, 456, 311, 3195, 295, 19669, 1348, 484, 490, 12771, 3732, 51360], "temperature": 0.0, "avg_logprob": -0.07767294335553027, "compression_ratio": 2.0514018691588785, "no_speech_prob": 0.0008197844726964831}, {"id": 2606, "seek": 605954, "start": 6079.54, "end": 6081.44, "text": " into all 
the x hats", "tokens": [51365, 666, 439, 264, 2031, 20549, 51460], "temperature": 0.0, "avg_logprob": -0.07767294335553027, "compression_ratio": 2.0514018691588785, "no_speech_prob": 0.0008197844726964831}, {"id": 2607, "seek": 605954, "start": 6081.54, "end": 6083.44, "text": " and then there's a back-replicating signal", "tokens": [51465, 293, 550, 456, 311, 257, 646, 12, 265, 4770, 990, 6358, 51560], "temperature": 0.0, "avg_logprob": -0.07767294335553027, "compression_ratio": 2.0514018691588785, "no_speech_prob": 0.0008197844726964831}, {"id": 2608, "seek": 605954, "start": 6083.54, "end": 6085.44, "text": " from each x hat into sigma square", "tokens": [51565, 490, 1184, 2031, 2385, 666, 12771, 3732, 51660], "temperature": 0.0, "avg_logprob": -0.07767294335553027, "compression_ratio": 2.0514018691588785, "no_speech_prob": 0.0008197844726964831}, {"id": 2609, "seek": 605954, "start": 6085.54, "end": 6087.44, "text": " and that's why we actually need to sum over", "tokens": [51665, 293, 300, 311, 983, 321, 767, 643, 281, 2408, 670, 51760], "temperature": 0.0, "avg_logprob": -0.07767294335553027, "compression_ratio": 2.0514018691588785, "no_speech_prob": 0.0008197844726964831}, {"id": 2610, "seek": 605954, "start": 6087.54, "end": 6089.44, "text": " all those i's", "tokens": [51765, 439, 729, 741, 311, 51860], "temperature": 0.0, "avg_logprob": -0.07767294335553027, "compression_ratio": 2.0514018691588785, "no_speech_prob": 0.0008197844726964831}, {"id": 2611, "seek": 608944, "start": 6089.44, "end": 6091.339999999999, "text": " into 1 to m", "tokens": [50365, 666, 502, 281, 275, 50460], "temperature": 0.0, "avg_logprob": -0.09306266695954078, "compression_ratio": 1.853448275862069, "no_speech_prob": 0.0020730572286993265}, {"id": 2612, "seek": 608944, "start": 6091.44, "end": 6093.339999999999, "text": " of the dl by dx hat", "tokens": [50465, 295, 264, 37873, 538, 30017, 2385, 50560], "temperature": 0.0, "avg_logprob": -0.09306266695954078, "compression_ratio": 1.853448275862069, "no_speech_prob": 0.0020730572286993265}, {"id": 2613, "seek": 608944, "start": 6093.44, "end": 6095.339999999999, "text": " which is the global gradient", "tokens": [50565, 597, 307, 264, 4338, 16235, 50660], "temperature": 0.0, "avg_logprob": -0.09306266695954078, "compression_ratio": 1.853448275862069, "no_speech_prob": 0.0020730572286993265}, {"id": 2614, "seek": 608944, "start": 6095.44, "end": 6097.339999999999, "text": " times", "tokens": [50665, 1413, 50760], "temperature": 0.0, "avg_logprob": -0.09306266695954078, "compression_ratio": 1.853448275862069, "no_speech_prob": 0.0020730572286993265}, {"id": 2615, "seek": 608944, "start": 6097.44, "end": 6099.339999999999, "text": " the xi hat by d sigma square", "tokens": [50765, 264, 36800, 2385, 538, 274, 12771, 3732, 50860], "temperature": 0.0, "avg_logprob": -0.09306266695954078, "compression_ratio": 1.853448275862069, "no_speech_prob": 0.0020730572286993265}, {"id": 2616, "seek": 608944, "start": 6099.44, "end": 6101.339999999999, "text": " which is the local gradient", "tokens": [50865, 597, 307, 264, 2654, 16235, 50960], "temperature": 0.0, "avg_logprob": -0.09306266695954078, "compression_ratio": 1.853448275862069, "no_speech_prob": 0.0020730572286993265}, {"id": 2617, "seek": 608944, "start": 6101.44, "end": 6103.339999999999, "text": " of this operation here", "tokens": [50965, 295, 341, 6916, 510, 51060], "temperature": 0.0, "avg_logprob": -0.09306266695954078, "compression_ratio": 1.853448275862069, "no_speech_prob": 0.0020730572286993265}, 
{"id": 2618, "seek": 608944, "start": 6103.44, "end": 6105.339999999999, "text": " and then mathematically", "tokens": [51065, 293, 550, 44003, 51160], "temperature": 0.0, "avg_logprob": -0.09306266695954078, "compression_ratio": 1.853448275862069, "no_speech_prob": 0.0020730572286993265}, {"id": 2619, "seek": 608944, "start": 6105.44, "end": 6107.339999999999, "text": " I'm just working it out here", "tokens": [51165, 286, 478, 445, 1364, 309, 484, 510, 51260], "temperature": 0.0, "avg_logprob": -0.09306266695954078, "compression_ratio": 1.853448275862069, "no_speech_prob": 0.0020730572286993265}, {"id": 2620, "seek": 608944, "start": 6107.44, "end": 6109.339999999999, "text": " and I'm simplifying and you get a certain expression", "tokens": [51265, 293, 286, 478, 6883, 5489, 293, 291, 483, 257, 1629, 6114, 51360], "temperature": 0.0, "avg_logprob": -0.09306266695954078, "compression_ratio": 1.853448275862069, "no_speech_prob": 0.0020730572286993265}, {"id": 2621, "seek": 608944, "start": 6109.44, "end": 6111.339999999999, "text": " for dl by d sigma square", "tokens": [51365, 337, 37873, 538, 274, 12771, 3732, 51460], "temperature": 0.0, "avg_logprob": -0.09306266695954078, "compression_ratio": 1.853448275862069, "no_speech_prob": 0.0020730572286993265}, {"id": 2622, "seek": 608944, "start": 6111.44, "end": 6113.339999999999, "text": " and we're going to be using this expression", "tokens": [51465, 293, 321, 434, 516, 281, 312, 1228, 341, 6114, 51560], "temperature": 0.0, "avg_logprob": -0.09306266695954078, "compression_ratio": 1.853448275862069, "no_speech_prob": 0.0020730572286993265}, {"id": 2623, "seek": 608944, "start": 6113.44, "end": 6115.339999999999, "text": " when we back-propagate into mu", "tokens": [51565, 562, 321, 646, 12, 79, 1513, 559, 473, 666, 2992, 51660], "temperature": 0.0, "avg_logprob": -0.09306266695954078, "compression_ratio": 1.853448275862069, "no_speech_prob": 0.0020730572286993265}, {"id": 2624, "seek": 608944, "start": 6115.44, "end": 6117.339999999999, "text": " and then eventually into x", "tokens": [51665, 293, 550, 4728, 666, 2031, 51760], "temperature": 0.0, "avg_logprob": -0.09306266695954078, "compression_ratio": 1.853448275862069, "no_speech_prob": 0.0020730572286993265}, {"id": 2625, "seek": 608944, "start": 6117.44, "end": 6119.339999999999, "text": " so now let's continue our back-propagation into mu", "tokens": [51765, 370, 586, 718, 311, 2354, 527, 646, 12, 79, 1513, 559, 399, 666, 2992, 51860], "temperature": 0.0, "avg_logprob": -0.09306266695954078, "compression_ratio": 1.853448275862069, "no_speech_prob": 0.0020730572286993265}, {"id": 2626, "seek": 611934, "start": 6119.34, "end": 6121.24, "text": " which is dl by d mu", "tokens": [50365, 597, 307, 37873, 538, 274, 2992, 50460], "temperature": 0.0, "avg_logprob": -0.06942619102588599, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.004074704833328724}, {"id": 2627, "seek": 611934, "start": 6121.34, "end": 6123.24, "text": " now again be careful", "tokens": [50465, 586, 797, 312, 5026, 50560], "temperature": 0.0, "avg_logprob": -0.06942619102588599, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.004074704833328724}, {"id": 2628, "seek": 611934, "start": 6123.34, "end": 6125.24, "text": " that mu influences x hat", "tokens": [50565, 300, 2992, 21222, 2031, 2385, 50660], "temperature": 0.0, "avg_logprob": -0.06942619102588599, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.004074704833328724}, {"id": 2629, "seek": 611934, "start": 6125.34, "end": 
6127.24, "text": " and x hat is actually lots of values", "tokens": [50665, 293, 2031, 2385, 307, 767, 3195, 295, 4190, 50760], "temperature": 0.0, "avg_logprob": -0.06942619102588599, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.004074704833328724}, {"id": 2630, "seek": 611934, "start": 6127.34, "end": 6129.24, "text": " so for example if our mini-batch size is 32", "tokens": [50765, 370, 337, 1365, 498, 527, 8382, 12, 65, 852, 2744, 307, 8858, 50860], "temperature": 0.0, "avg_logprob": -0.06942619102588599, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.004074704833328724}, {"id": 2631, "seek": 611934, "start": 6129.34, "end": 6131.24, "text": " as it is in our example that we were working on", "tokens": [50865, 382, 309, 307, 294, 527, 1365, 300, 321, 645, 1364, 322, 50960], "temperature": 0.0, "avg_logprob": -0.06942619102588599, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.004074704833328724}, {"id": 2632, "seek": 611934, "start": 6131.34, "end": 6133.24, "text": " then this is 32 numbers", "tokens": [50965, 550, 341, 307, 8858, 3547, 51060], "temperature": 0.0, "avg_logprob": -0.06942619102588599, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.004074704833328724}, {"id": 2633, "seek": 611934, "start": 6133.34, "end": 6135.24, "text": " and 32 arrows going back to mu", "tokens": [51065, 293, 8858, 19669, 516, 646, 281, 2992, 51160], "temperature": 0.0, "avg_logprob": -0.06942619102588599, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.004074704833328724}, {"id": 2634, "seek": 611934, "start": 6135.34, "end": 6137.24, "text": " and then mu going to sigma square", "tokens": [51165, 293, 550, 2992, 516, 281, 12771, 3732, 51260], "temperature": 0.0, "avg_logprob": -0.06942619102588599, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.004074704833328724}, {"id": 2635, "seek": 611934, "start": 6137.34, "end": 6139.24, "text": " is just a single arrow", "tokens": [51265, 307, 445, 257, 2167, 11610, 51360], "temperature": 0.0, "avg_logprob": -0.06942619102588599, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.004074704833328724}, {"id": 2636, "seek": 611934, "start": 6139.34, "end": 6141.24, "text": " because sigma square is a scalar", "tokens": [51365, 570, 12771, 3732, 307, 257, 39684, 51460], "temperature": 0.0, "avg_logprob": -0.06942619102588599, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.004074704833328724}, {"id": 2637, "seek": 611934, "start": 6141.34, "end": 6143.24, "text": " so in total there are 33 arrows", "tokens": [51465, 370, 294, 3217, 456, 366, 11816, 19669, 51560], "temperature": 0.0, "avg_logprob": -0.06942619102588599, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.004074704833328724}, {"id": 2638, "seek": 611934, "start": 6143.34, "end": 6145.24, "text": " emanating from mu", "tokens": [51565, 28211, 990, 490, 2992, 51660], "temperature": 0.0, "avg_logprob": -0.06942619102588599, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.004074704833328724}, {"id": 2639, "seek": 611934, "start": 6145.34, "end": 6147.24, "text": " and then all of them have gradients coming into mu", "tokens": [51665, 293, 550, 439, 295, 552, 362, 2771, 2448, 1348, 666, 2992, 51760], "temperature": 0.0, "avg_logprob": -0.06942619102588599, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.004074704833328724}, {"id": 2640, "seek": 611934, "start": 6147.34, "end": 6149.24, "text": " and they all need to be summed up", "tokens": [51765, 293, 436, 439, 643, 281, 
312, 2408, 1912, 493, 51860], "temperature": 0.0, "avg_logprob": -0.06942619102588599, "compression_ratio": 1.788679245283019, "no_speech_prob": 0.004074704833328724}, {"id": 2641, "seek": 614934, "start": 6149.34, "end": 6151.24, "text": " and so that's why", "tokens": [50365, 293, 370, 300, 311, 983, 50460], "temperature": 0.0, "avg_logprob": -0.06986687201580019, "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.0010527888080105186}, {"id": 2642, "seek": 614934, "start": 6151.34, "end": 6153.24, "text": " when we look at the expression for dl by d mu", "tokens": [50465, 562, 321, 574, 412, 264, 6114, 337, 37873, 538, 274, 2992, 50560], "temperature": 0.0, "avg_logprob": -0.06986687201580019, "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.0010527888080105186}, {"id": 2643, "seek": 614934, "start": 6153.34, "end": 6155.24, "text": " I'm summing up over all the gradients", "tokens": [50565, 286, 478, 2408, 2810, 493, 670, 439, 264, 2771, 2448, 50660], "temperature": 0.0, "avg_logprob": -0.06986687201580019, "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.0010527888080105186}, {"id": 2644, "seek": 614934, "start": 6155.34, "end": 6157.24, "text": " of dl by dx i hat", "tokens": [50665, 295, 37873, 538, 30017, 741, 2385, 50760], "temperature": 0.0, "avg_logprob": -0.06986687201580019, "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.0010527888080105186}, {"id": 2645, "seek": 614934, "start": 6157.34, "end": 6159.24, "text": " times dx i hat by d mu", "tokens": [50765, 1413, 30017, 741, 2385, 538, 274, 2992, 50860], "temperature": 0.0, "avg_logprob": -0.06986687201580019, "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.0010527888080105186}, {"id": 2646, "seek": 614934, "start": 6159.34, "end": 6161.24, "text": " so that's this arrow", "tokens": [50865, 370, 300, 311, 341, 11610, 50960], "temperature": 0.0, "avg_logprob": -0.06986687201580019, "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.0010527888080105186}, {"id": 2647, "seek": 614934, "start": 6161.34, "end": 6163.24, "text": " and that's 32 arrows here", "tokens": [50965, 293, 300, 311, 8858, 19669, 510, 51060], "temperature": 0.0, "avg_logprob": -0.06986687201580019, "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.0010527888080105186}, {"id": 2648, "seek": 614934, "start": 6163.34, "end": 6165.24, "text": " and then plus the one arrow from here", "tokens": [51065, 293, 550, 1804, 264, 472, 11610, 490, 510, 51160], "temperature": 0.0, "avg_logprob": -0.06986687201580019, "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.0010527888080105186}, {"id": 2649, "seek": 614934, "start": 6165.34, "end": 6167.24, "text": " which is dl by d sigma square", "tokens": [51165, 597, 307, 37873, 538, 274, 12771, 3732, 51260], "temperature": 0.0, "avg_logprob": -0.06986687201580019, "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.0010527888080105186}, {"id": 2650, "seek": 614934, "start": 6167.34, "end": 6169.24, "text": " times d sigma square by d mu", "tokens": [51265, 1413, 274, 12771, 3732, 538, 274, 2992, 51360], "temperature": 0.0, "avg_logprob": -0.06986687201580019, "compression_ratio": 1.8333333333333333, "no_speech_prob": 0.0010527888080105186}, {"id": 2651, "seek": 614934, "start": 6169.34, "end": 6171.24, "text": " so now we have to work out", "tokens": [51365, 370, 586, 321, 362, 281, 589, 484, 51460], "temperature": 0.0, "avg_logprob": -0.06986687201580019, "compression_ratio": 1.8333333333333333, "no_speech_prob": 
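As a sketch under the definitions above, using the local gradient of x hat with respect to mu for the 32 arrows, and the derivative of the Bessel-corrected variance for the last one:

$$\frac{\partial L}{\partial \mu} = \left(\sum_{i=1}^{m} \frac{\partial L}{\partial \hat{x}_i}\right) \cdot \frac{-1}{\sqrt{\sigma^2 + \epsilon}} \;+\; \frac{\partial L}{\partial \sigma^2} \cdot \frac{-2}{m-1}\sum_{i=1}^{m}\left(x_i - \mu\right)$$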
So now we have to work out that expression, and let me just reveal the rest of it. Simplifying the first term here is not complicated, and you just get an expression. For the second term, though, there's something really interesting that happens when we look at dσ²/dμ and we simplify: at one point, if we assume the special case where μ is actually the average of the xᵢ's,
as it is in this case, then if we plug that in, the gradient actually vanishes and becomes exactly zero, and that makes the entire second term cancel. And so, if you have a mathematical expression like this and you look at dσ²/dμ, you would get some mathematical formula for how μ impacts σ².
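Concretely, that formula comes out as follows; this is my rendering, assuming the unbiased 1/(n−1) variance used elsewhere in this lecture (the same cancellation happens with a plain 1/n variance):

```latex
\[
\frac{\partial \sigma^2}{\partial \mu}
  = \frac{\partial}{\partial \mu}\left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\mu)^2\right]
  = \frac{-2}{n-1}\sum_{i=1}^{n}(x_i-\mu)
\]
```

And the sum of the (xᵢ − μ) terms is exactly zero whenever μ is the mean of the xᵢ's, which is what makes the whole term drop out in the special case below.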
But if it is the special case that μ is actually equal to the average, as it is in the case of batch normalization, that gradient will actually vanish and become zero, so the whole term cancels, and we just get a fairly straightforward expression here for dL/dμ.
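In code, that simplified expression is a one-liner. Here is a sketch under assumed variable names (dxhat holding dL/dx̂ and bnvar_inv holding 1/sqrt(σ² + ε)); the notebook's actual names may differ:

```python
# Sketch: with the d(sigma^2)/d(mu) term cancelled, only the x-hat path remains.
# dxhat:     dL/dxhat, shape (32, 64) -- one row per example in the mini-batch
# bnvar_inv: 1/sqrt(var + eps), shape (1, 64)
dmu = (-bnvar_inv * dxhat).sum(0, keepdim=True)  # sum the 32 per-example arrows
```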
count", "tokens": [50365, 586, 718, 311, 1207, 50460], "temperature": 0.0, "avg_logprob": -0.05554607831514799, "compression_ratio": 1.9504504504504505, "no_speech_prob": 0.0012673549354076385}, {"id": 2687, "seek": 623914, "start": 6241.14, "end": 6243.04, "text": " first of all how many numbers are there inside x", "tokens": [50465, 700, 295, 439, 577, 867, 3547, 366, 456, 1854, 2031, 50560], "temperature": 0.0, "avg_logprob": -0.05554607831514799, "compression_ratio": 1.9504504504504505, "no_speech_prob": 0.0012673549354076385}, {"id": 2688, "seek": 623914, "start": 6243.14, "end": 6245.04, "text": " as I mentioned there are 32 numbers", "tokens": [50565, 382, 286, 2835, 456, 366, 8858, 3547, 50660], "temperature": 0.0, "avg_logprob": -0.05554607831514799, "compression_ratio": 1.9504504504504505, "no_speech_prob": 0.0012673549354076385}, {"id": 2689, "seek": 623914, "start": 6245.14, "end": 6247.04, "text": " there are 32 little xi's", "tokens": [50665, 456, 366, 8858, 707, 36800, 311, 50760], "temperature": 0.0, "avg_logprob": -0.05554607831514799, "compression_ratio": 1.9504504504504505, "no_speech_prob": 0.0012673549354076385}, {"id": 2690, "seek": 623914, "start": 6247.14, "end": 6249.04, "text": " and let's count the number of arrows", "tokens": [50765, 293, 718, 311, 1207, 264, 1230, 295, 19669, 50860], "temperature": 0.0, "avg_logprob": -0.05554607831514799, "compression_ratio": 1.9504504504504505, "no_speech_prob": 0.0012673549354076385}, {"id": 2691, "seek": 623914, "start": 6249.14, "end": 6251.04, "text": " emanating from each xi", "tokens": [50865, 28211, 990, 490, 1184, 36800, 50960], "temperature": 0.0, "avg_logprob": -0.05554607831514799, "compression_ratio": 1.9504504504504505, "no_speech_prob": 0.0012673549354076385}, {"id": 2692, "seek": 623914, "start": 6251.14, "end": 6253.04, "text": " there's an arrow going to mu", "tokens": [50965, 456, 311, 364, 11610, 516, 281, 2992, 51060], "temperature": 0.0, "avg_logprob": -0.05554607831514799, "compression_ratio": 1.9504504504504505, "no_speech_prob": 0.0012673549354076385}, {"id": 2693, "seek": 623914, "start": 6253.14, "end": 6255.04, "text": " an arrow going to sigma square", "tokens": [51065, 364, 11610, 516, 281, 12771, 3732, 51160], "temperature": 0.0, "avg_logprob": -0.05554607831514799, "compression_ratio": 1.9504504504504505, "no_speech_prob": 0.0012673549354076385}, {"id": 2694, "seek": 623914, "start": 6255.14, "end": 6257.04, "text": " and then there's an arrow going to x hat", "tokens": [51165, 293, 550, 456, 311, 364, 11610, 516, 281, 2031, 2385, 51260], "temperature": 0.0, "avg_logprob": -0.05554607831514799, "compression_ratio": 1.9504504504504505, "no_speech_prob": 0.0012673549354076385}, {"id": 2695, "seek": 623914, "start": 6257.14, "end": 6259.04, "text": " but this arrow here", "tokens": [51265, 457, 341, 11610, 510, 51360], "temperature": 0.0, "avg_logprob": -0.05554607831514799, "compression_ratio": 1.9504504504504505, "no_speech_prob": 0.0012673549354076385}, {"id": 2696, "seek": 623914, "start": 6259.14, "end": 6261.04, "text": " let's scrutinize that a little bit", "tokens": [51365, 718, 311, 28949, 259, 1125, 300, 257, 707, 857, 51460], "temperature": 0.0, "avg_logprob": -0.05554607831514799, "compression_ratio": 1.9504504504504505, "no_speech_prob": 0.0012673549354076385}, {"id": 2697, "seek": 623914, "start": 6261.14, "end": 6263.04, "text": " each xi hat is just a function of xi", "tokens": [51465, 1184, 36800, 2385, 307, 445, 257, 2445, 295, 36800, 51560], "temperature": 0.0, "avg_logprob": 
and all the other scalars, so x̂ᵢ only depends on its own xᵢ, and not on any of the other x's. And so therefore, there are actually 32 arrows packed into this single arrow, but those 32 arrows go exactly parallel; they don't interfere, they're just running in parallel between x and x̂. You can look at it that way. And so, how many arrows are emanating from each xᵢ?
There are three arrows: to μ, to σ², and to the associated x̂. And so in backpropagation we now need to apply the chain rule, and we need to add up those three contributions.
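In symbols, the three contributions are:

```latex
\[
\frac{\partial L}{\partial x_i}
  = \frac{\partial L}{\partial \hat{x}_i}\,\frac{\partial \hat{x}_i}{\partial x_i}
  + \frac{\partial L}{\partial \mu}\,\frac{\partial \mu}{\partial x_i}
  + \frac{\partial L}{\partial \sigma^2}\,\frac{\partial \sigma^2}{\partial x_i}
\]
```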
Like, if I just write that out, we're chaining through μ, σ², and through x̂, and those three terms are just here. Now, we already have three of these: we have dL/dx̂ᵢ, we have dL/dμ, which we derived here, and we have dL/dσ², which we derived here. But we need three other terms here: this one, this one, and this one. So I invite you to try to derive them.
If you find it complicated, you're just looking at these expressions here and differentiating with respect to xᵢ, so give it a shot. But here's the result, or at least what I got: I'm just differentiating with respect to xᵢ for all of these expressions, and honestly I don't think there's anything too tricky here; it's basic calculus.
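Since the on-screen result isn't reproduced in a transcript, here is my reconstruction of those three local derivatives, again assuming the 1/(n−1) variance convention; treat it as a sketch rather than a verbatim slide:

```latex
\[
\frac{\partial \hat{x}_i}{\partial x_i} = \frac{1}{\sqrt{\sigma^2+\epsilon}},
\qquad
\frac{\partial \mu}{\partial x_i} = \frac{1}{n},
\qquad
\frac{\partial \sigma^2}{\partial x_i} = \frac{2\,(x_i-\mu)}{n-1}
\]
```

The first derivative holds μ and σ² fixed, since their own dependence on xᵢ is already carried by the other two chain-rule terms.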
Now, what gets a little bit more tricky is that we are now going to plug everything together: all of these terms multiplied with all of these terms and added up according to this formula, and that gets a little bit hairy. So what ends up happening is you get a large expression, and the thing to be very careful with here, of course, is that we are working with dL/dxᵢ for a specific i here. But when we are plugging in some of these terms, like, say, this term here,
"tokens": [51365, 341, 1433, 510, 51460], "temperature": 0.0, "avg_logprob": -0.05880859978178628, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.001519190613180399}, {"id": 2756, "seek": 635874, "start": 6380.74, "end": 6382.639999999999, "text": " dl by d sigma squared", "tokens": [51465, 37873, 538, 274, 12771, 8889, 51560], "temperature": 0.0, "avg_logprob": -0.05880859978178628, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.001519190613180399}, {"id": 2757, "seek": 635874, "start": 6382.74, "end": 6384.639999999999, "text": " you see how dl by d sigma squared", "tokens": [51565, 291, 536, 577, 37873, 538, 274, 12771, 8889, 51660], "temperature": 0.0, "avg_logprob": -0.05880859978178628, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.001519190613180399}, {"id": 2758, "seek": 635874, "start": 6384.74, "end": 6386.639999999999, "text": " I end up with an expression", "tokens": [51665, 286, 917, 493, 365, 364, 6114, 51760], "temperature": 0.0, "avg_logprob": -0.05880859978178628, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.001519190613180399}, {"id": 2759, "seek": 635874, "start": 6386.74, "end": 6388.639999999999, "text": " and I'm iterating over little i's here", "tokens": [51765, 293, 286, 478, 17138, 990, 670, 707, 741, 311, 510, 51860], "temperature": 0.0, "avg_logprob": -0.05880859978178628, "compression_ratio": 1.8073394495412844, "no_speech_prob": 0.001519190613180399}, {"id": 2760, "seek": 638874, "start": 6388.74, "end": 6390.639999999999, "text": " but I can't use i as the variable", "tokens": [50365, 457, 286, 393, 380, 764, 741, 382, 264, 7006, 50460], "temperature": 0.0, "avg_logprob": -0.049490770800360316, "compression_ratio": 1.8828125, "no_speech_prob": 0.0007292750524356961}, {"id": 2761, "seek": 638874, "start": 6390.74, "end": 6392.639999999999, "text": " when I plug in here", "tokens": [50465, 562, 286, 5452, 294, 510, 50560], "temperature": 0.0, "avg_logprob": -0.049490770800360316, "compression_ratio": 1.8828125, "no_speech_prob": 0.0007292750524356961}, {"id": 2762, "seek": 638874, "start": 6392.74, "end": 6394.639999999999, "text": " because this is a different i from this i", "tokens": [50565, 570, 341, 307, 257, 819, 741, 490, 341, 741, 50660], "temperature": 0.0, "avg_logprob": -0.049490770800360316, "compression_ratio": 1.8828125, "no_speech_prob": 0.0007292750524356961}, {"id": 2763, "seek": 638874, "start": 6394.74, "end": 6396.639999999999, "text": " this i here is just a placeholder", "tokens": [50665, 341, 741, 510, 307, 445, 257, 1081, 20480, 50760], "temperature": 0.0, "avg_logprob": -0.049490770800360316, "compression_ratio": 1.8828125, "no_speech_prob": 0.0007292750524356961}, {"id": 2764, "seek": 638874, "start": 6396.74, "end": 6398.639999999999, "text": " like a local variable for a for loop", "tokens": [50765, 411, 257, 2654, 7006, 337, 257, 337, 6367, 50860], "temperature": 0.0, "avg_logprob": -0.049490770800360316, "compression_ratio": 1.8828125, "no_speech_prob": 0.0007292750524356961}, {"id": 2765, "seek": 638874, "start": 6398.74, "end": 6400.639999999999, "text": " in here", "tokens": [50865, 294, 510, 50960], "temperature": 0.0, "avg_logprob": -0.049490770800360316, "compression_ratio": 1.8828125, "no_speech_prob": 0.0007292750524356961}, {"id": 2766, "seek": 638874, "start": 6400.74, "end": 6402.639999999999, "text": " so here when I plug that in", "tokens": [50965, 370, 510, 562, 286, 5452, 300, 294, 51060], "temperature": 0.0, "avg_logprob": -0.049490770800360316, 
"compression_ratio": 1.8828125, "no_speech_prob": 0.0007292750524356961}, {"id": 2767, "seek": 638874, "start": 6402.74, "end": 6404.639999999999, "text": " you notice that I rename the i to a j", "tokens": [51065, 291, 3449, 300, 286, 36741, 264, 741, 281, 257, 361, 51160], "temperature": 0.0, "avg_logprob": -0.049490770800360316, "compression_ratio": 1.8828125, "no_speech_prob": 0.0007292750524356961}, {"id": 2768, "seek": 638874, "start": 6404.74, "end": 6406.639999999999, "text": " because I need to make sure that this j", "tokens": [51165, 570, 286, 643, 281, 652, 988, 300, 341, 361, 51260], "temperature": 0.0, "avg_logprob": -0.049490770800360316, "compression_ratio": 1.8828125, "no_speech_prob": 0.0007292750524356961}, {"id": 2769, "seek": 638874, "start": 6406.74, "end": 6408.639999999999, "text": " is not this i", "tokens": [51265, 307, 406, 341, 741, 51360], "temperature": 0.0, "avg_logprob": -0.049490770800360316, "compression_ratio": 1.8828125, "no_speech_prob": 0.0007292750524356961}, {"id": 2770, "seek": 638874, "start": 6408.74, "end": 6410.639999999999, "text": " this j is like a little local iterator", "tokens": [51365, 341, 361, 307, 411, 257, 707, 2654, 17138, 1639, 51460], "temperature": 0.0, "avg_logprob": -0.049490770800360316, "compression_ratio": 1.8828125, "no_speech_prob": 0.0007292750524356961}, {"id": 2771, "seek": 638874, "start": 6410.74, "end": 6412.639999999999, "text": " over 32 terms", "tokens": [51465, 670, 8858, 2115, 51560], "temperature": 0.0, "avg_logprob": -0.049490770800360316, "compression_ratio": 1.8828125, "no_speech_prob": 0.0007292750524356961}, {"id": 2772, "seek": 638874, "start": 6412.74, "end": 6414.639999999999, "text": " and so you have to be careful with that", "tokens": [51565, 293, 370, 291, 362, 281, 312, 5026, 365, 300, 51660], "temperature": 0.0, "avg_logprob": -0.049490770800360316, "compression_ratio": 1.8828125, "no_speech_prob": 0.0007292750524356961}, {"id": 2773, "seek": 638874, "start": 6414.74, "end": 6416.639999999999, "text": " when you are plugging in the expressions from here to here", "tokens": [51665, 562, 291, 366, 42975, 294, 264, 15277, 490, 510, 281, 510, 51760], "temperature": 0.0, "avg_logprob": -0.049490770800360316, "compression_ratio": 1.8828125, "no_speech_prob": 0.0007292750524356961}, {"id": 2774, "seek": 638874, "start": 6416.74, "end": 6418.639999999999, "text": " you may have to rename i's into j's", "tokens": [51765, 291, 815, 362, 281, 36741, 741, 311, 666, 361, 311, 51860], "temperature": 0.0, "avg_logprob": -0.049490770800360316, "compression_ratio": 1.8828125, "no_speech_prob": 0.0007292750524356961}, {"id": 2775, "seek": 641864, "start": 6418.64, "end": 6420.54, "text": " but you have to be very careful", "tokens": [50365, 457, 291, 362, 281, 312, 588, 5026, 50460], "temperature": 0.0, "avg_logprob": -0.08858850576581746, "compression_ratio": 1.9147286821705427, "no_speech_prob": 0.0011068896856158972}, {"id": 2776, "seek": 641864, "start": 6420.64, "end": 6422.54, "text": " what is actually an i", "tokens": [50465, 437, 307, 767, 364, 741, 50560], "temperature": 0.0, "avg_logprob": -0.08858850576581746, "compression_ratio": 1.9147286821705427, "no_speech_prob": 0.0011068896856158972}, {"id": 2777, "seek": 641864, "start": 6422.64, "end": 6424.54, "text": " with respect to dl by d xi", "tokens": [50565, 365, 3104, 281, 37873, 538, 274, 36800, 50660], "temperature": 0.0, "avg_logprob": -0.08858850576581746, "compression_ratio": 1.9147286821705427, "no_speech_prob": 0.0011068896856158972}, {"id": 
2778, "seek": 641864, "start": 6424.64, "end": 6426.54, "text": " so some of these are j's", "tokens": [50665, 370, 512, 295, 613, 366, 361, 311, 50760], "temperature": 0.0, "avg_logprob": -0.08858850576581746, "compression_ratio": 1.9147286821705427, "no_speech_prob": 0.0011068896856158972}, {"id": 2779, "seek": 641864, "start": 6426.64, "end": 6428.54, "text": " some of these are i's", "tokens": [50765, 512, 295, 613, 366, 741, 311, 50860], "temperature": 0.0, "avg_logprob": -0.08858850576581746, "compression_ratio": 1.9147286821705427, "no_speech_prob": 0.0011068896856158972}, {"id": 2780, "seek": 641864, "start": 6428.64, "end": 6430.54, "text": " and then we simplify this expression", "tokens": [50865, 293, 550, 321, 20460, 341, 6114, 50960], "temperature": 0.0, "avg_logprob": -0.08858850576581746, "compression_ratio": 1.9147286821705427, "no_speech_prob": 0.0011068896856158972}, {"id": 2781, "seek": 641864, "start": 6430.64, "end": 6432.54, "text": " and I guess like", "tokens": [50965, 293, 286, 2041, 411, 51060], "temperature": 0.0, "avg_logprob": -0.08858850576581746, "compression_ratio": 1.9147286821705427, "no_speech_prob": 0.0011068896856158972}, {"id": 2782, "seek": 641864, "start": 6432.64, "end": 6434.54, "text": " the big thing to notice here is", "tokens": [51065, 264, 955, 551, 281, 3449, 510, 307, 51160], "temperature": 0.0, "avg_logprob": -0.08858850576581746, "compression_ratio": 1.9147286821705427, "no_speech_prob": 0.0011068896856158972}, {"id": 2783, "seek": 641864, "start": 6434.64, "end": 6436.54, "text": " a bunch of terms are just going to come out to the front", "tokens": [51165, 257, 3840, 295, 2115, 366, 445, 516, 281, 808, 484, 281, 264, 1868, 51260], "temperature": 0.0, "avg_logprob": -0.08858850576581746, "compression_ratio": 1.9147286821705427, "no_speech_prob": 0.0011068896856158972}, {"id": 2784, "seek": 641864, "start": 6436.64, "end": 6438.54, "text": " and you can refactor them", "tokens": [51265, 293, 291, 393, 1895, 15104, 552, 51360], "temperature": 0.0, "avg_logprob": -0.08858850576581746, "compression_ratio": 1.9147286821705427, "no_speech_prob": 0.0011068896856158972}, {"id": 2785, "seek": 641864, "start": 6438.64, "end": 6440.54, "text": " there is a sigma squared plus epsilon", "tokens": [51365, 456, 307, 257, 12771, 8889, 1804, 17889, 51460], "temperature": 0.0, "avg_logprob": -0.08858850576581746, "compression_ratio": 1.9147286821705427, "no_speech_prob": 0.0011068896856158972}, {"id": 2786, "seek": 641864, "start": 6440.64, "end": 6442.54, "text": " raised to the power of negative 3 over 2", "tokens": [51465, 6005, 281, 264, 1347, 295, 3671, 805, 670, 568, 51560], "temperature": 0.0, "avg_logprob": -0.08858850576581746, "compression_ratio": 1.9147286821705427, "no_speech_prob": 0.0011068896856158972}, {"id": 2787, "seek": 641864, "start": 6442.64, "end": 6444.54, "text": " this sigma squared plus epsilon", "tokens": [51565, 341, 12771, 8889, 1804, 17889, 51660], "temperature": 0.0, "avg_logprob": -0.08858850576581746, "compression_ratio": 1.9147286821705427, "no_speech_prob": 0.0011068896856158972}, {"id": 2788, "seek": 641864, "start": 6444.64, "end": 6446.54, "text": " can be actually separated out into 3 terms", "tokens": [51665, 393, 312, 767, 12005, 484, 666, 805, 2115, 51760], "temperature": 0.0, "avg_logprob": -0.08858850576581746, "compression_ratio": 1.9147286821705427, "no_speech_prob": 0.0011068896856158972}, {"id": 2789, "seek": 641864, "start": 6446.64, "end": 6448.54, "text": " each of them are sigma squared plus epsilon", 
"tokens": [51765, 1184, 295, 552, 366, 12771, 8889, 1804, 17889, 51860], "temperature": 0.0, "avg_logprob": -0.08858850576581746, "compression_ratio": 1.9147286821705427, "no_speech_prob": 0.0011068896856158972}, {"id": 2790, "seek": 644854, "start": 6448.54, "end": 6450.44, "text": " raised to the power of negative 1 over 2", "tokens": [50365, 6005, 281, 264, 1347, 295, 3671, 502, 670, 568, 50460], "temperature": 0.0, "avg_logprob": -0.05982389450073242, "compression_ratio": 2.0154440154440154, "no_speech_prob": 0.00047378247836604714}, {"id": 2791, "seek": 644854, "start": 6450.54, "end": 6452.44, "text": " so the 3 of them multiplied", "tokens": [50465, 370, 264, 805, 295, 552, 17207, 50560], "temperature": 0.0, "avg_logprob": -0.05982389450073242, "compression_ratio": 2.0154440154440154, "no_speech_prob": 0.00047378247836604714}, {"id": 2792, "seek": 644854, "start": 6452.54, "end": 6454.44, "text": " is equal to this", "tokens": [50565, 307, 2681, 281, 341, 50660], "temperature": 0.0, "avg_logprob": -0.05982389450073242, "compression_ratio": 2.0154440154440154, "no_speech_prob": 0.00047378247836604714}, {"id": 2793, "seek": 644854, "start": 6454.54, "end": 6456.44, "text": " and then those 3 terms can go different places", "tokens": [50665, 293, 550, 729, 805, 2115, 393, 352, 819, 3190, 50760], "temperature": 0.0, "avg_logprob": -0.05982389450073242, "compression_ratio": 2.0154440154440154, "no_speech_prob": 0.00047378247836604714}, {"id": 2794, "seek": 644854, "start": 6456.54, "end": 6458.44, "text": " because of the multiplication", "tokens": [50765, 570, 295, 264, 27290, 50860], "temperature": 0.0, "avg_logprob": -0.05982389450073242, "compression_ratio": 2.0154440154440154, "no_speech_prob": 0.00047378247836604714}, {"id": 2795, "seek": 644854, "start": 6458.54, "end": 6460.44, "text": " so one of them actually comes out to the front", "tokens": [50865, 370, 472, 295, 552, 767, 1487, 484, 281, 264, 1868, 50960], "temperature": 0.0, "avg_logprob": -0.05982389450073242, "compression_ratio": 2.0154440154440154, "no_speech_prob": 0.00047378247836604714}, {"id": 2796, "seek": 644854, "start": 6460.54, "end": 6462.44, "text": " and will end up here outside", "tokens": [50965, 293, 486, 917, 493, 510, 2380, 51060], "temperature": 0.0, "avg_logprob": -0.05982389450073242, "compression_ratio": 2.0154440154440154, "no_speech_prob": 0.00047378247836604714}, {"id": 2797, "seek": 644854, "start": 6462.54, "end": 6464.44, "text": " one of them joins up with this term", "tokens": [51065, 472, 295, 552, 24397, 493, 365, 341, 1433, 51160], "temperature": 0.0, "avg_logprob": -0.05982389450073242, "compression_ratio": 2.0154440154440154, "no_speech_prob": 0.00047378247836604714}, {"id": 2798, "seek": 644854, "start": 6464.54, "end": 6466.44, "text": " and one of them joins up with this other term", "tokens": [51165, 293, 472, 295, 552, 24397, 493, 365, 341, 661, 1433, 51260], "temperature": 0.0, "avg_logprob": -0.05982389450073242, "compression_ratio": 2.0154440154440154, "no_speech_prob": 0.00047378247836604714}, {"id": 2799, "seek": 644854, "start": 6466.54, "end": 6468.44, "text": " and then when you simplify the expression", "tokens": [51265, 293, 550, 562, 291, 20460, 264, 6114, 51360], "temperature": 0.0, "avg_logprob": -0.05982389450073242, "compression_ratio": 2.0154440154440154, "no_speech_prob": 0.00047378247836604714}, {"id": 2800, "seek": 644854, "start": 6468.54, "end": 6470.44, "text": " you will notice that", "tokens": [51365, 291, 486, 3449, 300, 51460], "temperature": 0.0, 
"avg_logprob": -0.05982389450073242, "compression_ratio": 2.0154440154440154, "no_speech_prob": 0.00047378247836604714}, {"id": 2801, "seek": 644854, "start": 6470.54, "end": 6472.44, "text": " some of these terms that are coming out", "tokens": [51465, 512, 295, 613, 2115, 300, 366, 1348, 484, 51560], "temperature": 0.0, "avg_logprob": -0.05982389450073242, "compression_ratio": 2.0154440154440154, "no_speech_prob": 0.00047378247836604714}, {"id": 2802, "seek": 644854, "start": 6472.54, "end": 6474.44, "text": " are just the xi hats", "tokens": [51565, 366, 445, 264, 36800, 20549, 51660], "temperature": 0.0, "avg_logprob": -0.05982389450073242, "compression_ratio": 2.0154440154440154, "no_speech_prob": 0.00047378247836604714}, {"id": 2803, "seek": 644854, "start": 6474.54, "end": 6476.44, "text": " so you can simplify just by rewriting that", "tokens": [51665, 370, 291, 393, 20460, 445, 538, 319, 19868, 300, 51760], "temperature": 0.0, "avg_logprob": -0.05982389450073242, "compression_ratio": 2.0154440154440154, "no_speech_prob": 0.00047378247836604714}, {"id": 2804, "seek": 644854, "start": 6476.54, "end": 6478.44, "text": " and what we end up with at the end", "tokens": [51765, 293, 437, 321, 917, 493, 365, 412, 264, 917, 51860], "temperature": 0.0, "avg_logprob": -0.05982389450073242, "compression_ratio": 2.0154440154440154, "no_speech_prob": 0.00047378247836604714}, {"id": 2805, "seek": 647844, "start": 6478.44, "end": 6480.339999999999, "text": " is a fairly simple mathematical expression", "tokens": [50365, 307, 257, 6457, 2199, 18894, 6114, 50460], "temperature": 0.0, "avg_logprob": -0.05184193075138287, "compression_ratio": 1.8931297709923665, "no_speech_prob": 0.0011065613944083452}, {"id": 2806, "seek": 647844, "start": 6480.44, "end": 6482.339999999999, "text": " over here that I cannot simplify further", "tokens": [50465, 670, 510, 300, 286, 2644, 20460, 3052, 50560], "temperature": 0.0, "avg_logprob": -0.05184193075138287, "compression_ratio": 1.8931297709923665, "no_speech_prob": 0.0011065613944083452}, {"id": 2807, "seek": 647844, "start": 6482.44, "end": 6484.339999999999, "text": " but basically you'll notice that", "tokens": [50565, 457, 1936, 291, 603, 3449, 300, 50660], "temperature": 0.0, "avg_logprob": -0.05184193075138287, "compression_ratio": 1.8931297709923665, "no_speech_prob": 0.0011065613944083452}, {"id": 2808, "seek": 647844, "start": 6484.44, "end": 6486.339999999999, "text": " it only uses the stuff we have", "tokens": [50665, 309, 787, 4960, 264, 1507, 321, 362, 50760], "temperature": 0.0, "avg_logprob": -0.05184193075138287, "compression_ratio": 1.8931297709923665, "no_speech_prob": 0.0011065613944083452}, {"id": 2809, "seek": 647844, "start": 6486.44, "end": 6488.339999999999, "text": " and it derives the thing we need", "tokens": [50765, 293, 309, 1163, 1539, 264, 551, 321, 643, 50860], "temperature": 0.0, "avg_logprob": -0.05184193075138287, "compression_ratio": 1.8931297709923665, "no_speech_prob": 0.0011065613944083452}, {"id": 2810, "seek": 647844, "start": 6488.44, "end": 6490.339999999999, "text": " so we have dl by dy", "tokens": [50865, 370, 321, 362, 37873, 538, 14584, 50960], "temperature": 0.0, "avg_logprob": -0.05184193075138287, "compression_ratio": 1.8931297709923665, "no_speech_prob": 0.0011065613944083452}, {"id": 2811, "seek": 647844, "start": 6490.44, "end": 6492.339999999999, "text": " for all the i's", "tokens": [50965, 337, 439, 264, 741, 311, 51060], "temperature": 0.0, "avg_logprob": -0.05184193075138287, "compression_ratio": 
and those are used plenty of times here. And also, in addition, what we're using is these x̂ᵢ's and x̂ⱼ's, and they just come from the forward pass. And otherwise this is a simple expression, and it gives us dL/dxᵢ for all the i's, and that's ultimately what we're interested in. So that's the end of the batch norm backward pass, analytically.
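For the record, the final expression being described works out to the following. This is my reconstruction from the pieces above (written in terms of the upstream gradients dL/dyⱼ flowing into the batch norm outputs, with γ the gain and n = 32), not a formula quoted from the video:

```latex
\[
\frac{\partial L}{\partial x_i}
  = \frac{\gamma\,(\sigma^2+\epsilon)^{-1/2}}{n}
    \left[\, n\,\frac{\partial L}{\partial y_i}
      - \sum_{j=1}^{n}\frac{\partial L}{\partial y_j}
      - \frac{n}{n-1}\,\hat{x}_i\sum_{j=1}^{n}\frac{\partial L}{\partial y_j}\,\hat{x}_j \right]
\]
```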
{"id": 2823, "seek": 650844, "start": 6514.44, "end": 6516.339999999999, "text": " let's now implement this final result", "tokens": [50665, 718, 311, 586, 4445, 341, 2572, 1874, 50760], "temperature": 0.0, "avg_logprob": -0.08358010378750888, "compression_ratio": 1.821705426356589, "no_speech_prob": 0.0005966054741293192}, {"id": 2824, "seek": 650844, "start": 6516.44, "end": 6518.339999999999, "text": " okay so I implemented the expression", "tokens": [50765, 1392, 370, 286, 12270, 264, 6114, 50860], "temperature": 0.0, "avg_logprob": -0.08358010378750888, "compression_ratio": 1.821705426356589, "no_speech_prob": 0.0005966054741293192}, {"id": 2825, "seek": 650844, "start": 6518.44, "end": 6520.339999999999, "text": " into a single line of code here", "tokens": [50865, 666, 257, 2167, 1622, 295, 3089, 510, 50960], "temperature": 0.0, "avg_logprob": -0.08358010378750888, "compression_ratio": 1.821705426356589, "no_speech_prob": 0.0005966054741293192}, {"id": 2826, "seek": 650844, "start": 6520.44, "end": 6522.339999999999, "text": " and you can see that the max diff", "tokens": [50965, 293, 291, 393, 536, 300, 264, 11469, 7593, 51060], "temperature": 0.0, "avg_logprob": -0.08358010378750888, "compression_ratio": 1.821705426356589, "no_speech_prob": 0.0005966054741293192}, {"id": 2827, "seek": 650844, "start": 6522.44, "end": 6524.339999999999, "text": " is tiny so this is the correct implementation", "tokens": [51065, 307, 5870, 370, 341, 307, 264, 3006, 11420, 51160], "temperature": 0.0, "avg_logprob": -0.08358010378750888, "compression_ratio": 1.821705426356589, "no_speech_prob": 0.0005966054741293192}, {"id": 2828, "seek": 650844, "start": 6524.44, "end": 6526.339999999999, "text": " of this formula", "tokens": [51165, 295, 341, 8513, 51260], "temperature": 0.0, "avg_logprob": -0.08358010378750888, "compression_ratio": 1.821705426356589, "no_speech_prob": 0.0005966054741293192}, {"id": 2829, "seek": 650844, "start": 6526.44, "end": 6528.339999999999, "text": " now I'll just", "tokens": [51265, 586, 286, 603, 445, 51360], "temperature": 0.0, "avg_logprob": -0.08358010378750888, "compression_ratio": 1.821705426356589, "no_speech_prob": 0.0005966054741293192}, {"id": 2830, "seek": 650844, "start": 6528.44, "end": 6530.339999999999, "text": " basically tell you that getting this", "tokens": [51365, 1936, 980, 291, 300, 1242, 341, 51460], "temperature": 0.0, "avg_logprob": -0.08358010378750888, "compression_ratio": 1.821705426356589, "no_speech_prob": 0.0005966054741293192}, {"id": 2831, "seek": 650844, "start": 6530.44, "end": 6532.339999999999, "text": " formula here from this mathematical expression", "tokens": [51465, 8513, 510, 490, 341, 18894, 6114, 51560], "temperature": 0.0, "avg_logprob": -0.08358010378750888, "compression_ratio": 1.821705426356589, "no_speech_prob": 0.0005966054741293192}, {"id": 2832, "seek": 650844, "start": 6532.44, "end": 6534.339999999999, "text": " was not trivial and there's a lot", "tokens": [51565, 390, 406, 26703, 293, 456, 311, 257, 688, 51660], "temperature": 0.0, "avg_logprob": -0.08358010378750888, "compression_ratio": 1.821705426356589, "no_speech_prob": 0.0005966054741293192}, {"id": 2833, "seek": 650844, "start": 6534.44, "end": 6536.339999999999, "text": " going on packed into this one formula", "tokens": [51665, 516, 322, 13265, 666, 341, 472, 8513, 51760], "temperature": 0.0, "avg_logprob": -0.08358010378750888, "compression_ratio": 1.821705426356589, "no_speech_prob": 0.0005966054741293192}, {"id": 2834, "seek": 650844, "start": 6536.44, 
"end": 6538.339999999999, "text": " and this is a whole exercise by itself", "tokens": [51765, 293, 341, 307, 257, 1379, 5380, 538, 2564, 51860], "temperature": 0.0, "avg_logprob": -0.08358010378750888, "compression_ratio": 1.821705426356589, "no_speech_prob": 0.0005966054741293192}, {"id": 2835, "seek": 653844, "start": 6538.44, "end": 6540.339999999999, "text": " because you have to consider", "tokens": [50365, 570, 291, 362, 281, 1949, 50460], "temperature": 0.0, "avg_logprob": -0.05190011543956229, "compression_ratio": 1.826086956521739, "no_speech_prob": 0.0007024046499282122}, {"id": 2836, "seek": 653844, "start": 6540.44, "end": 6542.339999999999, "text": " the fact that this formula here", "tokens": [50465, 264, 1186, 300, 341, 8513, 510, 50560], "temperature": 0.0, "avg_logprob": -0.05190011543956229, "compression_ratio": 1.826086956521739, "no_speech_prob": 0.0007024046499282122}, {"id": 2837, "seek": 653844, "start": 6542.44, "end": 6544.339999999999, "text": " is just for a single neuron", "tokens": [50565, 307, 445, 337, 257, 2167, 34090, 50660], "temperature": 0.0, "avg_logprob": -0.05190011543956229, "compression_ratio": 1.826086956521739, "no_speech_prob": 0.0007024046499282122}, {"id": 2838, "seek": 653844, "start": 6544.44, "end": 6546.339999999999, "text": " and a batch of 32 examples", "tokens": [50665, 293, 257, 15245, 295, 8858, 5110, 50760], "temperature": 0.0, "avg_logprob": -0.05190011543956229, "compression_ratio": 1.826086956521739, "no_speech_prob": 0.0007024046499282122}, {"id": 2839, "seek": 653844, "start": 6546.44, "end": 6548.339999999999, "text": " but what I'm doing here is I'm actually", "tokens": [50765, 457, 437, 286, 478, 884, 510, 307, 286, 478, 767, 50860], "temperature": 0.0, "avg_logprob": -0.05190011543956229, "compression_ratio": 1.826086956521739, "no_speech_prob": 0.0007024046499282122}, {"id": 2840, "seek": 653844, "start": 6548.44, "end": 6550.339999999999, "text": " we actually have 64 neurons", "tokens": [50865, 321, 767, 362, 12145, 22027, 50960], "temperature": 0.0, "avg_logprob": -0.05190011543956229, "compression_ratio": 1.826086956521739, "no_speech_prob": 0.0007024046499282122}, {"id": 2841, "seek": 653844, "start": 6550.44, "end": 6552.339999999999, "text": " and so this expression has to in parallel", "tokens": [50965, 293, 370, 341, 6114, 575, 281, 294, 8952, 51060], "temperature": 0.0, "avg_logprob": -0.05190011543956229, "compression_ratio": 1.826086956521739, "no_speech_prob": 0.0007024046499282122}, {"id": 2842, "seek": 653844, "start": 6552.44, "end": 6554.339999999999, "text": " evaluate the batch norm backward pass", "tokens": [51065, 13059, 264, 15245, 2026, 23897, 1320, 51160], "temperature": 0.0, "avg_logprob": -0.05190011543956229, "compression_ratio": 1.826086956521739, "no_speech_prob": 0.0007024046499282122}, {"id": 2843, "seek": 653844, "start": 6554.44, "end": 6556.339999999999, "text": " for all of those 64 neurons", "tokens": [51165, 337, 439, 295, 729, 12145, 22027, 51260], "temperature": 0.0, "avg_logprob": -0.05190011543956229, "compression_ratio": 1.826086956521739, "no_speech_prob": 0.0007024046499282122}, {"id": 2844, "seek": 653844, "start": 6556.44, "end": 6558.339999999999, "text": " in parallel and independently", "tokens": [51265, 294, 8952, 293, 21761, 51360], "temperature": 0.0, "avg_logprob": -0.05190011543956229, "compression_ratio": 1.826086956521739, "no_speech_prob": 0.0007024046499282122}, {"id": 2845, "seek": 653844, "start": 6558.44, "end": 6560.339999999999, "text": " so this has to happen 
basically in every single", "tokens": [51365, 370, 341, 575, 281, 1051, 1936, 294, 633, 2167, 51460], "temperature": 0.0, "avg_logprob": -0.05190011543956229, "compression_ratio": 1.826086956521739, "no_speech_prob": 0.0007024046499282122}, {"id": 2846, "seek": 653844, "start": 6560.44, "end": 6562.339999999999, "text": " column of", "tokens": [51465, 7738, 295, 51560], "temperature": 0.0, "avg_logprob": -0.05190011543956229, "compression_ratio": 1.826086956521739, "no_speech_prob": 0.0007024046499282122}, {"id": 2847, "seek": 653844, "start": 6562.44, "end": 6564.339999999999, "text": " the inputs here", "tokens": [51565, 264, 15743, 510, 51660], "temperature": 0.0, "avg_logprob": -0.05190011543956229, "compression_ratio": 1.826086956521739, "no_speech_prob": 0.0007024046499282122}, {"id": 2848, "seek": 653844, "start": 6564.44, "end": 6566.339999999999, "text": " and in addition to that", "tokens": [51665, 293, 294, 4500, 281, 300, 51760], "temperature": 0.0, "avg_logprob": -0.05190011543956229, "compression_ratio": 1.826086956521739, "no_speech_prob": 0.0007024046499282122}, {"id": 2849, "seek": 653844, "start": 6566.44, "end": 6568.339999999999, "text": " you see how there are a bunch of sums here", "tokens": [51765, 291, 536, 577, 456, 366, 257, 3840, 295, 34499, 510, 51860], "temperature": 0.0, "avg_logprob": -0.05190011543956229, "compression_ratio": 1.826086956521739, "no_speech_prob": 0.0007024046499282122}, {"id": 2850, "seek": 656834, "start": 6568.34, "end": 6570.24, "text": " and I want to make sure that when I do those sums", "tokens": [50365, 293, 286, 528, 281, 652, 988, 300, 562, 286, 360, 729, 34499, 50460], "temperature": 0.0, "avg_logprob": -0.07294927145305433, "compression_ratio": 1.8676923076923078, "no_speech_prob": 0.001023865770548582}, {"id": 2851, "seek": 656834, "start": 6570.34, "end": 6572.24, "text": " that they broadcast correctly onto everything else", "tokens": [50465, 300, 436, 9975, 8944, 3911, 1203, 1646, 50560], "temperature": 0.0, "avg_logprob": -0.07294927145305433, "compression_ratio": 1.8676923076923078, "no_speech_prob": 0.001023865770548582}, {"id": 2852, "seek": 656834, "start": 6572.34, "end": 6574.24, "text": " that's here and so getting this expression", "tokens": [50565, 300, 311, 510, 293, 370, 1242, 341, 6114, 50660], "temperature": 0.0, "avg_logprob": -0.07294927145305433, "compression_ratio": 1.8676923076923078, "no_speech_prob": 0.001023865770548582}, {"id": 2853, "seek": 656834, "start": 6574.34, "end": 6576.24, "text": " is just like highly non-trivial", "tokens": [50665, 307, 445, 411, 5405, 2107, 12, 83, 470, 22640, 50760], "temperature": 0.0, "avg_logprob": -0.07294927145305433, "compression_ratio": 1.8676923076923078, "no_speech_prob": 0.001023865770548582}, {"id": 2854, "seek": 656834, "start": 6576.34, "end": 6578.24, "text": " and I invite you to basically look through it", "tokens": [50765, 293, 286, 7980, 291, 281, 1936, 574, 807, 309, 50860], "temperature": 0.0, "avg_logprob": -0.07294927145305433, "compression_ratio": 1.8676923076923078, "no_speech_prob": 0.001023865770548582}, {"id": 2855, "seek": 656834, "start": 6578.34, "end": 6580.24, "text": " and step through it and it's a whole exercise", "tokens": [50865, 293, 1823, 807, 309, 293, 309, 311, 257, 1379, 5380, 50960], "temperature": 0.0, "avg_logprob": -0.07294927145305433, "compression_ratio": 1.8676923076923078, "no_speech_prob": 0.001023865770548582}, {"id": 2856, "seek": 656834, "start": 6580.34, "end": 6582.24, "text": " to make sure that this checks out", 
"tokens": [50965, 281, 652, 988, 300, 341, 13834, 484, 51060], "temperature": 0.0, "avg_logprob": -0.07294927145305433, "compression_ratio": 1.8676923076923078, "no_speech_prob": 0.001023865770548582}, {"id": 2857, "seek": 656834, "start": 6582.34, "end": 6584.24, "text": " but once all the shapes agree", "tokens": [51065, 457, 1564, 439, 264, 10854, 3986, 51160], "temperature": 0.0, "avg_logprob": -0.07294927145305433, "compression_ratio": 1.8676923076923078, "no_speech_prob": 0.001023865770548582}, {"id": 2858, "seek": 656834, "start": 6584.34, "end": 6586.24, "text": " and once you convince yourself that it's correct", "tokens": [51165, 293, 1564, 291, 13447, 1803, 300, 309, 311, 3006, 51260], "temperature": 0.0, "avg_logprob": -0.07294927145305433, "compression_ratio": 1.8676923076923078, "no_speech_prob": 0.001023865770548582}, {"id": 2859, "seek": 656834, "start": 6586.34, "end": 6588.24, "text": " you can also verify that PyTorch", "tokens": [51265, 291, 393, 611, 16888, 300, 9953, 51, 284, 339, 51360], "temperature": 0.0, "avg_logprob": -0.07294927145305433, "compression_ratio": 1.8676923076923078, "no_speech_prob": 0.001023865770548582}, {"id": 2860, "seek": 656834, "start": 6588.34, "end": 6590.24, "text": " gets the exact same answer as well", "tokens": [51365, 2170, 264, 1900, 912, 1867, 382, 731, 51460], "temperature": 0.0, "avg_logprob": -0.07294927145305433, "compression_ratio": 1.8676923076923078, "no_speech_prob": 0.001023865770548582}, {"id": 2861, "seek": 656834, "start": 6590.34, "end": 6592.24, "text": " and so that gives you a lot of peace of mind", "tokens": [51465, 293, 370, 300, 2709, 291, 257, 688, 295, 4336, 295, 1575, 51560], "temperature": 0.0, "avg_logprob": -0.07294927145305433, "compression_ratio": 1.8676923076923078, "no_speech_prob": 0.001023865770548582}, {"id": 2862, "seek": 656834, "start": 6592.34, "end": 6594.24, "text": " that this mathematical formula is correctly", "tokens": [51565, 300, 341, 18894, 8513, 307, 8944, 51660], "temperature": 0.0, "avg_logprob": -0.07294927145305433, "compression_ratio": 1.8676923076923078, "no_speech_prob": 0.001023865770548582}, {"id": 2863, "seek": 656834, "start": 6594.34, "end": 6596.24, "text": " implemented here and broadcasted correctly", "tokens": [51665, 12270, 510, 293, 9975, 292, 8944, 51760], "temperature": 0.0, "avg_logprob": -0.07294927145305433, "compression_ratio": 1.8676923076923078, "no_speech_prob": 0.001023865770548582}, {"id": 2864, "seek": 656834, "start": 6596.34, "end": 6598.24, "text": " and replicated in parallel", "tokens": [51765, 293, 46365, 294, 8952, 51860], "temperature": 0.0, "avg_logprob": -0.07294927145305433, "compression_ratio": 1.8676923076923078, "no_speech_prob": 0.001023865770548582}, {"id": 2865, "seek": 659824, "start": 6598.24, "end": 6600.139999999999, "text": " for all of the 64 neurons", "tokens": [50365, 337, 439, 295, 264, 12145, 22027, 50460], "temperature": 0.0, "avg_logprob": -0.11927719939526894, "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.0010662275599315763}, {"id": 2866, "seek": 659824, "start": 6600.24, "end": 6602.139999999999, "text": " inside this batch norm layer", "tokens": [50465, 1854, 341, 15245, 2026, 4583, 50560], "temperature": 0.0, "avg_logprob": -0.11927719939526894, "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.0010662275599315763}, {"id": 2867, "seek": 659824, "start": 6602.24, "end": 6604.139999999999, "text": " okay and finally exercise number 4", "tokens": [50565, 1392, 293, 2721, 5380, 1230, 1017, 50660], 
"temperature": 0.0, "avg_logprob": -0.11927719939526894, "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.0010662275599315763}, {"id": 2868, "seek": 659824, "start": 6604.24, "end": 6606.139999999999, "text": " asks you to put it all together", "tokens": [50665, 8962, 291, 281, 829, 309, 439, 1214, 50760], "temperature": 0.0, "avg_logprob": -0.11927719939526894, "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.0010662275599315763}, {"id": 2869, "seek": 659824, "start": 6606.24, "end": 6608.139999999999, "text": " and here we have a redefinition", "tokens": [50765, 293, 510, 321, 362, 257, 14328, 5194, 849, 50860], "temperature": 0.0, "avg_logprob": -0.11927719939526894, "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.0010662275599315763}, {"id": 2870, "seek": 659824, "start": 6608.24, "end": 6610.139999999999, "text": " of the entire problem", "tokens": [50865, 295, 264, 2302, 1154, 50960], "temperature": 0.0, "avg_logprob": -0.11927719939526894, "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.0010662275599315763}, {"id": 2871, "seek": 659824, "start": 6610.24, "end": 6612.139999999999, "text": " so you see that we re-initialized the neural net from scratch", "tokens": [50965, 370, 291, 536, 300, 321, 319, 12, 259, 270, 831, 1602, 264, 18161, 2533, 490, 8459, 51060], "temperature": 0.0, "avg_logprob": -0.11927719939526894, "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.0010662275599315763}, {"id": 2872, "seek": 659824, "start": 6612.24, "end": 6614.139999999999, "text": " and everything and then here", "tokens": [51065, 293, 1203, 293, 550, 510, 51160], "temperature": 0.0, "avg_logprob": -0.11927719939526894, "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.0010662275599315763}, {"id": 2873, "seek": 659824, "start": 6614.24, "end": 6616.139999999999, "text": " instead of calling loss that backward", "tokens": [51165, 2602, 295, 5141, 4470, 300, 23897, 51260], "temperature": 0.0, "avg_logprob": -0.11927719939526894, "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.0010662275599315763}, {"id": 2874, "seek": 659824, "start": 6616.24, "end": 6618.139999999999, "text": " we want to have the manual back propagation", "tokens": [51265, 321, 528, 281, 362, 264, 9688, 646, 38377, 51360], "temperature": 0.0, "avg_logprob": -0.11927719939526894, "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.0010662275599315763}, {"id": 2875, "seek": 659824, "start": 6618.24, "end": 6620.139999999999, "text": " here as we derived it up above", "tokens": [51365, 510, 382, 321, 18949, 309, 493, 3673, 51460], "temperature": 0.0, "avg_logprob": -0.11927719939526894, "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.0010662275599315763}, {"id": 2876, "seek": 659824, "start": 6620.24, "end": 6622.139999999999, "text": " so go up copy paste", "tokens": [51465, 370, 352, 493, 5055, 9163, 51560], "temperature": 0.0, "avg_logprob": -0.11927719939526894, "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.0010662275599315763}, {"id": 2877, "seek": 659824, "start": 6622.24, "end": 6624.139999999999, "text": " all the chunks of code that we've already derived", "tokens": [51565, 439, 264, 24004, 295, 3089, 300, 321, 600, 1217, 18949, 51660], "temperature": 0.0, "avg_logprob": -0.11927719939526894, "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.0010662275599315763}, {"id": 2878, "seek": 659824, "start": 6624.24, "end": 6626.139999999999, "text": " put them here and derive your own gradients", 
"tokens": [51665, 829, 552, 510, 293, 28446, 428, 1065, 2771, 2448, 51760], "temperature": 0.0, "avg_logprob": -0.11927719939526894, "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.0010662275599315763}, {"id": 2879, "seek": 659824, "start": 6626.24, "end": 6628.139999999999, "text": " and then optimize this model", "tokens": [51765, 293, 550, 19719, 341, 2316, 51860], "temperature": 0.0, "avg_logprob": -0.11927719939526894, "compression_ratio": 1.7755102040816326, "no_speech_prob": 0.0010662275599315763}, {"id": 2880, "seek": 662814, "start": 6628.14, "end": 6630.04, "text": " using this neural net", "tokens": [50365, 1228, 341, 18161, 2533, 50460], "temperature": 0.0, "avg_logprob": -0.06434424596888419, "compression_ratio": 1.9144981412639406, "no_speech_prob": 0.002573546953499317}, {"id": 2881, "seek": 662814, "start": 6630.14, "end": 6632.04, "text": " basically using your own gradients", "tokens": [50465, 1936, 1228, 428, 1065, 2771, 2448, 50560], "temperature": 0.0, "avg_logprob": -0.06434424596888419, "compression_ratio": 1.9144981412639406, "no_speech_prob": 0.002573546953499317}, {"id": 2882, "seek": 662814, "start": 6632.14, "end": 6634.04, "text": " all the way to the calibration of the batch norm", "tokens": [50565, 439, 264, 636, 281, 264, 38732, 295, 264, 15245, 2026, 50660], "temperature": 0.0, "avg_logprob": -0.06434424596888419, "compression_ratio": 1.9144981412639406, "no_speech_prob": 0.002573546953499317}, {"id": 2883, "seek": 662814, "start": 6634.14, "end": 6636.04, "text": " and the evaluation of the loss", "tokens": [50665, 293, 264, 13344, 295, 264, 4470, 50760], "temperature": 0.0, "avg_logprob": -0.06434424596888419, "compression_ratio": 1.9144981412639406, "no_speech_prob": 0.002573546953499317}, {"id": 2884, "seek": 662814, "start": 6636.14, "end": 6638.04, "text": " and I was able to achieve quite a good loss", "tokens": [50765, 293, 286, 390, 1075, 281, 4584, 1596, 257, 665, 4470, 50860], "temperature": 0.0, "avg_logprob": -0.06434424596888419, "compression_ratio": 1.9144981412639406, "no_speech_prob": 0.002573546953499317}, {"id": 2885, "seek": 662814, "start": 6638.14, "end": 6640.04, "text": " basically the same loss you would achieve before", "tokens": [50865, 1936, 264, 912, 4470, 291, 576, 4584, 949, 50960], "temperature": 0.0, "avg_logprob": -0.06434424596888419, "compression_ratio": 1.9144981412639406, "no_speech_prob": 0.002573546953499317}, {"id": 2886, "seek": 662814, "start": 6640.14, "end": 6642.04, "text": " and that shouldn't be surprising", "tokens": [50965, 293, 300, 4659, 380, 312, 8830, 51060], "temperature": 0.0, "avg_logprob": -0.06434424596888419, "compression_ratio": 1.9144981412639406, "no_speech_prob": 0.002573546953499317}, {"id": 2887, "seek": 662814, "start": 6642.14, "end": 6644.04, "text": " because all we've done is we've", "tokens": [51065, 570, 439, 321, 600, 1096, 307, 321, 600, 51160], "temperature": 0.0, "avg_logprob": -0.06434424596888419, "compression_ratio": 1.9144981412639406, "no_speech_prob": 0.002573546953499317}, {"id": 2888, "seek": 662814, "start": 6644.14, "end": 6646.04, "text": " really got into loss that backward", "tokens": [51165, 534, 658, 666, 4470, 300, 23897, 51260], "temperature": 0.0, "avg_logprob": -0.06434424596888419, "compression_ratio": 1.9144981412639406, "no_speech_prob": 0.002573546953499317}, {"id": 2889, "seek": 662814, "start": 6646.14, "end": 6648.04, "text": " and we've pulled out all the code", "tokens": [51265, 293, 321, 600, 7373, 484, 439, 264, 3089, 51360], 
"temperature": 0.0, "avg_logprob": -0.06434424596888419, "compression_ratio": 1.9144981412639406, "no_speech_prob": 0.002573546953499317}, {"id": 2890, "seek": 662814, "start": 6648.14, "end": 6650.04, "text": " and inserted it here", "tokens": [51365, 293, 27992, 309, 510, 51460], "temperature": 0.0, "avg_logprob": -0.06434424596888419, "compression_ratio": 1.9144981412639406, "no_speech_prob": 0.002573546953499317}, {"id": 2891, "seek": 662814, "start": 6650.14, "end": 6652.04, "text": " but those gradients are identical", "tokens": [51465, 457, 729, 2771, 2448, 366, 14800, 51560], "temperature": 0.0, "avg_logprob": -0.06434424596888419, "compression_ratio": 1.9144981412639406, "no_speech_prob": 0.002573546953499317}, {"id": 2892, "seek": 662814, "start": 6652.14, "end": 6654.04, "text": " and everything is identical", "tokens": [51565, 293, 1203, 307, 14800, 51660], "temperature": 0.0, "avg_logprob": -0.06434424596888419, "compression_ratio": 1.9144981412639406, "no_speech_prob": 0.002573546953499317}, {"id": 2893, "seek": 662814, "start": 6654.14, "end": 6656.04, "text": " and the results are identical", "tokens": [51665, 293, 264, 3542, 366, 14800, 51760], "temperature": 0.0, "avg_logprob": -0.06434424596888419, "compression_ratio": 1.9144981412639406, "no_speech_prob": 0.002573546953499317}, {"id": 2894, "seek": 662814, "start": 6656.14, "end": 6658.04, "text": " it's just that we have full visibility", "tokens": [51765, 309, 311, 445, 300, 321, 362, 1577, 19883, 51860], "temperature": 0.0, "avg_logprob": -0.06434424596888419, "compression_ratio": 1.9144981412639406, "no_speech_prob": 0.002573546953499317}, {"id": 2895, "seek": 665804, "start": 6658.04, "end": 6659.94, "text": " in this specific case", "tokens": [50365, 294, 341, 2685, 1389, 50460], "temperature": 0.0, "avg_logprob": -0.08083424917081507, "compression_ratio": 1.9024390243902438, "no_speech_prob": 0.001708847121335566}, {"id": 2896, "seek": 665804, "start": 6660.04, "end": 6661.94, "text": " okay and this is all of our code", "tokens": [50465, 1392, 293, 341, 307, 439, 295, 527, 3089, 50560], "temperature": 0.0, "avg_logprob": -0.08083424917081507, "compression_ratio": 1.9024390243902438, "no_speech_prob": 0.001708847121335566}, {"id": 2897, "seek": 665804, "start": 6662.04, "end": 6663.94, "text": " this is the full backward pass", "tokens": [50565, 341, 307, 264, 1577, 23897, 1320, 50660], "temperature": 0.0, "avg_logprob": -0.08083424917081507, "compression_ratio": 1.9024390243902438, "no_speech_prob": 0.001708847121335566}, {"id": 2898, "seek": 665804, "start": 6664.04, "end": 6665.94, "text": " using basically the simplified backward pass", "tokens": [50665, 1228, 1936, 264, 26335, 23897, 1320, 50760], "temperature": 0.0, "avg_logprob": -0.08083424917081507, "compression_ratio": 1.9024390243902438, "no_speech_prob": 0.001708847121335566}, {"id": 2899, "seek": 665804, "start": 6666.04, "end": 6667.94, "text": " for the cross entropy loss", "tokens": [50765, 337, 264, 3278, 30867, 4470, 50860], "temperature": 0.0, "avg_logprob": -0.08083424917081507, "compression_ratio": 1.9024390243902438, "no_speech_prob": 0.001708847121335566}, {"id": 2900, "seek": 665804, "start": 6668.04, "end": 6669.94, "text": " and the batch normalization", "tokens": [50865, 293, 264, 15245, 2710, 2144, 50960], "temperature": 0.0, "avg_logprob": -0.08083424917081507, "compression_ratio": 1.9024390243902438, "no_speech_prob": 0.001708847121335566}, {"id": 2901, "seek": 665804, "start": 6670.04, "end": 6671.94, "text": " so back propagating 
through cross entropy", "tokens": [50965, 370, 646, 12425, 990, 807, 3278, 30867, 51060], "temperature": 0.0, "avg_logprob": -0.08083424917081507, "compression_ratio": 1.9024390243902438, "no_speech_prob": 0.001708847121335566}, {"id": 2902, "seek": 665804, "start": 6672.04, "end": 6673.94, "text": " the second layer", "tokens": [51065, 264, 1150, 4583, 51160], "temperature": 0.0, "avg_logprob": -0.08083424917081507, "compression_ratio": 1.9024390243902438, "no_speech_prob": 0.001708847121335566}, {"id": 2903, "seek": 665804, "start": 6674.04, "end": 6675.94, "text": " the 10H null linearity", "tokens": [51165, 264, 1266, 39, 18184, 8213, 507, 51260], "temperature": 0.0, "avg_logprob": -0.08083424917081507, "compression_ratio": 1.9024390243902438, "no_speech_prob": 0.001708847121335566}, {"id": 2904, "seek": 665804, "start": 6676.04, "end": 6677.94, "text": " the batch normalization", "tokens": [51265, 264, 15245, 2710, 2144, 51360], "temperature": 0.0, "avg_logprob": -0.08083424917081507, "compression_ratio": 1.9024390243902438, "no_speech_prob": 0.001708847121335566}, {"id": 2905, "seek": 665804, "start": 6678.04, "end": 6679.94, "text": " through the first layer", "tokens": [51365, 807, 264, 700, 4583, 51460], "temperature": 0.0, "avg_logprob": -0.08083424917081507, "compression_ratio": 1.9024390243902438, "no_speech_prob": 0.001708847121335566}, {"id": 2906, "seek": 665804, "start": 6680.04, "end": 6681.94, "text": " and through the embedding", "tokens": [51465, 293, 807, 264, 12240, 3584, 51560], "temperature": 0.0, "avg_logprob": -0.08083424917081507, "compression_ratio": 1.9024390243902438, "no_speech_prob": 0.001708847121335566}, {"id": 2907, "seek": 665804, "start": 6682.04, "end": 6683.94, "text": " and so you see that this is only maybe", "tokens": [51565, 293, 370, 291, 536, 300, 341, 307, 787, 1310, 51660], "temperature": 0.0, "avg_logprob": -0.08083424917081507, "compression_ratio": 1.9024390243902438, "no_speech_prob": 0.001708847121335566}, {"id": 2908, "seek": 665804, "start": 6684.04, "end": 6685.94, "text": " what is this 20 lines of code or something like that", "tokens": [51665, 437, 307, 341, 945, 3876, 295, 3089, 420, 746, 411, 300, 51760], "temperature": 0.0, "avg_logprob": -0.08083424917081507, "compression_ratio": 1.9024390243902438, "no_speech_prob": 0.001708847121335566}, {"id": 2909, "seek": 665804, "start": 6686.04, "end": 6687.94, "text": " and that's what gives us gradients", "tokens": [51765, 293, 300, 311, 437, 2709, 505, 2771, 2448, 51860], "temperature": 0.0, "avg_logprob": -0.08083424917081507, "compression_ratio": 1.9024390243902438, "no_speech_prob": 0.001708847121335566}, {"id": 2910, "seek": 668794, "start": 6687.94, "end": 6689.839999999999, "text": " in this case loss that backward", "tokens": [50365, 294, 341, 1389, 4470, 300, 23897, 50460], "temperature": 0.0, "avg_logprob": -0.0905425331809304, "compression_ratio": 1.7204301075268817, "no_speech_prob": 0.001048167934641242}, {"id": 2911, "seek": 668794, "start": 6689.94, "end": 6691.839999999999, "text": " so the way I have the code set up is", "tokens": [50465, 370, 264, 636, 286, 362, 264, 3089, 992, 493, 307, 50560], "temperature": 0.0, "avg_logprob": -0.0905425331809304, "compression_ratio": 1.7204301075268817, "no_speech_prob": 0.001048167934641242}, {"id": 2912, "seek": 668794, "start": 6691.94, "end": 6693.839999999999, "text": " you should be able to run this entire cell", "tokens": [50565, 291, 820, 312, 1075, 281, 1190, 341, 2302, 2815, 50660], "temperature": 0.0, "avg_logprob": 
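As a reference point, here is a self-contained mini version of what those roughly 20 lines look like, with a tiny forward pass included so it runs on its own. The variable names follow the lecture's notebook, but the shapes and initialization here are made up for illustration; the batch norm chunk is the one-liner from above.

```python
import torch
import torch.nn.functional as F

# hypothetical small shapes, just for illustration
n, block_size, n_embd, n_hidden, vocab_size = 32, 3, 10, 64, 27
g = torch.Generator().manual_seed(42)
Xb = torch.randint(0, vocab_size, (n, block_size), generator=g)
Yb = torch.randint(0, vocab_size, (n,), generator=g)
C = torch.randn((vocab_size, n_embd), generator=g)
W1 = torch.randn((block_size * n_embd, n_hidden), generator=g) * 0.1
b1 = torch.randn(n_hidden, generator=g) * 0.1
W2 = torch.randn((n_hidden, vocab_size), generator=g) * 0.1
b2 = torch.randn(vocab_size, generator=g) * 0.1
bngain = torch.ones(1, n_hidden)
bnbias = torch.zeros(1, n_hidden)

# forward pass
emb = C[Xb]                          # embed the characters
embcat = emb.view(n, -1)             # concatenate the embeddings
hprebn = embcat @ W1 + b1            # first layer, pre-batch-norm
bnmean = hprebn.mean(0, keepdim=True)
bnvar = hprebn.var(0, keepdim=True, unbiased=True)
bnvar_inv = (bnvar + 1e-5) ** -0.5
bnraw = (hprebn - bnmean) * bnvar_inv
hpreact = bngain * bnraw + bnbias    # batch norm
h = torch.tanh(hpreact)              # non-linearity
logits = h @ W2 + b2                 # second layer
loss = F.cross_entropy(logits, Yb)

# manual backward pass
dlogits = F.softmax(logits, 1)       # cross entropy, simplified backward
dlogits[range(n), Yb] -= 1
dlogits /= n
dh = dlogits @ W2.T                  # second layer
dW2 = h.T @ dlogits
db2 = dlogits.sum(0)
dhpreact = (1.0 - h ** 2) * dh       # tanh
dbngain = (bnraw * dhpreact).sum(0, keepdim=True)  # batch norm
dbnbias = dhpreact.sum(0, keepdim=True)
dhprebn = bngain * bnvar_inv / n * (n * dhpreact - dhpreact.sum(0)
          - n / (n - 1) * bnraw * (bnraw * dhpreact).sum(0))
dembcat = dhprebn @ W1.T             # first layer
dW1 = embcat.T @ dhprebn
db1 = dhprebn.sum(0)
demb = dembcat.view(emb.shape)       # un-concatenate
dC = torch.zeros_like(C)             # embedding table: scatter-add
for k in range(n):
    for j in range(block_size):
        dC[Xb[k, j]] += demb[k, j]
```

Each chunk is only a few lines, and the layered structure is just the forward pass run in reverse.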
So the way I have the code set up is, you should be able to run this entire cell once you fill this in, and this will run for only 100 iterations and then break. And it breaks because it gives you an opportunity to check your gradients against PyTorch.
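The notebook does this comparison with a small helper along these lines (a sketch: cmp takes a label, our hand-derived gradient, and the tensor whose .grad PyTorch populated):

```python
import torch

def cmp(s, dt, t):
    # compare our manual gradient dt against PyTorch's t.grad
    ex = torch.all(dt == t.grad).item()          # bit-exact match?
    app = torch.allclose(dt, t.grad)             # match within tolerance?
    maxdiff = (dt - t.grad).abs().max().item()   # largest discrepancy
    print(f'{s:15s} | exact: {str(ex):5s} | approximate: {str(app):5s} | maxdiff: {maxdiff}')

# e.g. cmp('logits', dlogits, logits) after calling loss.backward() once
```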
"no_speech_prob": 0.001048167934641242}, {"id": 2924, "seek": 668794, "start": 6715.94, "end": 6717.839999999999, "text": " to be honest", "tokens": [51765, 281, 312, 3245, 51860], "temperature": 0.0, "avg_logprob": -0.0905425331809304, "compression_ratio": 1.7204301075268817, "no_speech_prob": 0.001048167934641242}, {"id": 2925, "seek": 671784, "start": 6717.84, "end": 6719.74, "text": " but if I'm basically correct", "tokens": [50365, 457, 498, 286, 478, 1936, 3006, 50460], "temperature": 0.0, "avg_logprob": -0.10883568073141164, "compression_ratio": 1.8837209302325582, "no_speech_prob": 0.0012280048104003072}, {"id": 2926, "seek": 671784, "start": 6719.84, "end": 6721.74, "text": " we can take out the gradient checking", "tokens": [50465, 321, 393, 747, 484, 264, 16235, 8568, 50560], "temperature": 0.0, "avg_logprob": -0.10883568073141164, "compression_ratio": 1.8837209302325582, "no_speech_prob": 0.0012280048104003072}, {"id": 2927, "seek": 671784, "start": 6721.84, "end": 6725.74, "text": " we can disable this breaking statement", "tokens": [50565, 321, 393, 28362, 341, 7697, 5629, 50760], "temperature": 0.0, "avg_logprob": -0.10883568073141164, "compression_ratio": 1.8837209302325582, "no_speech_prob": 0.0012280048104003072}, {"id": 2928, "seek": 671784, "start": 6725.84, "end": 6727.74, "text": " and then we can", "tokens": [50765, 293, 550, 321, 393, 50860], "temperature": 0.0, "avg_logprob": -0.10883568073141164, "compression_ratio": 1.8837209302325582, "no_speech_prob": 0.0012280048104003072}, {"id": 2929, "seek": 671784, "start": 6727.84, "end": 6729.74, "text": " basically disable loss that backward", "tokens": [50865, 1936, 28362, 4470, 300, 23897, 50960], "temperature": 0.0, "avg_logprob": -0.10883568073141164, "compression_ratio": 1.8837209302325582, "no_speech_prob": 0.0012280048104003072}, {"id": 2930, "seek": 671784, "start": 6729.84, "end": 6731.74, "text": " we don't need it anymore", "tokens": [50965, 321, 500, 380, 643, 309, 3602, 51060], "temperature": 0.0, "avg_logprob": -0.10883568073141164, "compression_ratio": 1.8837209302325582, "no_speech_prob": 0.0012280048104003072}, {"id": 2931, "seek": 671784, "start": 6731.84, "end": 6733.74, "text": " feels amazing to say that", "tokens": [51065, 3417, 2243, 281, 584, 300, 51160], "temperature": 0.0, "avg_logprob": -0.10883568073141164, "compression_ratio": 1.8837209302325582, "no_speech_prob": 0.0012280048104003072}, {"id": 2932, "seek": 671784, "start": 6733.84, "end": 6735.74, "text": " and then here", "tokens": [51165, 293, 550, 510, 51260], "temperature": 0.0, "avg_logprob": -0.10883568073141164, "compression_ratio": 1.8837209302325582, "no_speech_prob": 0.0012280048104003072}, {"id": 2933, "seek": 671784, "start": 6735.84, "end": 6737.74, "text": " when we are doing the update", "tokens": [51265, 562, 321, 366, 884, 264, 5623, 51360], "temperature": 0.0, "avg_logprob": -0.10883568073141164, "compression_ratio": 1.8837209302325582, "no_speech_prob": 0.0012280048104003072}, {"id": 2934, "seek": 671784, "start": 6737.84, "end": 6739.74, "text": " we're not going to use p.grad", "tokens": [51365, 321, 434, 406, 516, 281, 764, 280, 13, 7165, 51460], "temperature": 0.0, "avg_logprob": -0.10883568073141164, "compression_ratio": 1.8837209302325582, "no_speech_prob": 0.0012280048104003072}, {"id": 2935, "seek": 671784, "start": 6739.84, "end": 6741.74, "text": " this is the old way of PyTorch", "tokens": [51465, 341, 307, 264, 1331, 636, 295, 9953, 51, 284, 339, 51560], "temperature": 0.0, "avg_logprob": 
-0.10883568073141164, "compression_ratio": 1.8837209302325582, "no_speech_prob": 0.0012280048104003072}, {"id": 2936, "seek": 671784, "start": 6741.84, "end": 6743.74, "text": " we don't have that anymore", "tokens": [51565, 321, 500, 380, 362, 300, 3602, 51660], "temperature": 0.0, "avg_logprob": -0.10883568073141164, "compression_ratio": 1.8837209302325582, "no_speech_prob": 0.0012280048104003072}, {"id": 2937, "seek": 671784, "start": 6743.84, "end": 6745.74, "text": " because we're not doing backward", "tokens": [51665, 570, 321, 434, 406, 884, 23897, 51760], "temperature": 0.0, "avg_logprob": -0.10883568073141164, "compression_ratio": 1.8837209302325582, "no_speech_prob": 0.0012280048104003072}, {"id": 2938, "seek": 671784, "start": 6745.84, "end": 6747.74, "text": " we are going to use this update", "tokens": [51765, 321, 366, 516, 281, 764, 341, 5623, 51860], "temperature": 0.0, "avg_logprob": -0.10883568073141164, "compression_ratio": 1.8837209302325582, "no_speech_prob": 0.0012280048104003072}, {"id": 2939, "seek": 674774, "start": 6747.74, "end": 6749.639999999999, "text": " I'm grading over", "tokens": [50365, 286, 478, 35540, 670, 50460], "temperature": 0.0, "avg_logprob": -0.1169292221069336, "compression_ratio": 1.7797356828193833, "no_speech_prob": 0.0018215874442830682}, {"id": 2940, "seek": 674774, "start": 6749.74, "end": 6751.639999999999, "text": " I've arranged the grads to be in the same order", "tokens": [50465, 286, 600, 18721, 264, 2771, 82, 281, 312, 294, 264, 912, 1668, 50560], "temperature": 0.0, "avg_logprob": -0.1169292221069336, "compression_ratio": 1.7797356828193833, "no_speech_prob": 0.0018215874442830682}, {"id": 2941, "seek": 674774, "start": 6751.74, "end": 6753.639999999999, "text": " as the parameters", "tokens": [50565, 382, 264, 9834, 50660], "temperature": 0.0, "avg_logprob": -0.1169292221069336, "compression_ratio": 1.7797356828193833, "no_speech_prob": 0.0018215874442830682}, {"id": 2942, "seek": 674774, "start": 6753.74, "end": 6755.639999999999, "text": " and I'm zipping them up", "tokens": [50665, 293, 286, 478, 710, 6297, 552, 493, 50760], "temperature": 0.0, "avg_logprob": -0.1169292221069336, "compression_ratio": 1.7797356828193833, "no_speech_prob": 0.0018215874442830682}, {"id": 2943, "seek": 674774, "start": 6755.74, "end": 6757.639999999999, "text": " the gradients and the parameters", "tokens": [50765, 264, 2771, 2448, 293, 264, 9834, 50860], "temperature": 0.0, "avg_logprob": -0.1169292221069336, "compression_ratio": 1.7797356828193833, "no_speech_prob": 0.0018215874442830682}, {"id": 2944, "seek": 674774, "start": 6757.74, "end": 6759.639999999999, "text": " into p and grad", "tokens": [50865, 666, 280, 293, 2771, 50960], "temperature": 0.0, "avg_logprob": -0.1169292221069336, "compression_ratio": 1.7797356828193833, "no_speech_prob": 0.0018215874442830682}, {"id": 2945, "seek": 674774, "start": 6759.74, "end": 6761.639999999999, "text": " and then here I'm going to step with", "tokens": [50965, 293, 550, 510, 286, 478, 516, 281, 1823, 365, 51060], "temperature": 0.0, "avg_logprob": -0.1169292221069336, "compression_ratio": 1.7797356828193833, "no_speech_prob": 0.0018215874442830682}, {"id": 2946, "seek": 674774, "start": 6761.74, "end": 6763.639999999999, "text": " just the grad that we derived manually", "tokens": [51065, 445, 264, 2771, 300, 321, 18949, 16945, 51160], "temperature": 0.0, "avg_logprob": -0.1169292221069336, "compression_ratio": 1.7797356828193833, "no_speech_prob": 0.0018215874442830682}, {"id": 2947, "seek": 
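A sketch of that update, reusing the names from the mini example above (the learning rate here is a placeholder):

```python
lr = 0.1  # placeholder learning rate
parameters = [C, W1, b1, W2, b2, bngain, bnbias]
grads = [dC, dW1, db1, dW2, db2, dbngain, dbnbias]  # same order as parameters
for p, grad in zip(parameters, grads):
    p.data += -lr * grad  # step with our manual grad, not p.grad
```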
674774, "start": 6763.74, "end": 6765.639999999999, "text": " so the last piece", "tokens": [51165, 370, 264, 1036, 2522, 51260], "temperature": 0.0, "avg_logprob": -0.1169292221069336, "compression_ratio": 1.7797356828193833, "no_speech_prob": 0.0018215874442830682}, {"id": 2948, "seek": 674774, "start": 6765.74, "end": 6767.639999999999, "text": " is that none of this now requires", "tokens": [51265, 307, 300, 6022, 295, 341, 586, 7029, 51360], "temperature": 0.0, "avg_logprob": -0.1169292221069336, "compression_ratio": 1.7797356828193833, "no_speech_prob": 0.0018215874442830682}, {"id": 2949, "seek": 674774, "start": 6767.74, "end": 6769.639999999999, "text": " gradients from PyTorch", "tokens": [51365, 2771, 2448, 490, 9953, 51, 284, 339, 51460], "temperature": 0.0, "avg_logprob": -0.1169292221069336, "compression_ratio": 1.7797356828193833, "no_speech_prob": 0.0018215874442830682}, {"id": 2950, "seek": 674774, "start": 6769.74, "end": 6771.639999999999, "text": " and so one thing you can do here", "tokens": [51465, 293, 370, 472, 551, 291, 393, 360, 510, 51560], "temperature": 0.0, "avg_logprob": -0.1169292221069336, "compression_ratio": 1.7797356828193833, "no_speech_prob": 0.0018215874442830682}, {"id": 2951, "seek": 674774, "start": 6771.74, "end": 6773.639999999999, "text": " is you can do", "tokens": [51565, 307, 291, 393, 360, 51660], "temperature": 0.0, "avg_logprob": -0.1169292221069336, "compression_ratio": 1.7797356828193833, "no_speech_prob": 0.0018215874442830682}, {"id": 2952, "seek": 674774, "start": 6773.74, "end": 6775.639999999999, "text": " with torch.nograd", "tokens": [51665, 365, 27822, 13, 77, 664, 6206, 51760], "temperature": 0.0, "avg_logprob": -0.1169292221069336, "compression_ratio": 1.7797356828193833, "no_speech_prob": 0.0018215874442830682}, {"id": 2953, "seek": 674774, "start": 6775.74, "end": 6777.639999999999, "text": " and offset this whole code block", "tokens": [51765, 293, 18687, 341, 1379, 3089, 3461, 51860], "temperature": 0.0, "avg_logprob": -0.1169292221069336, "compression_ratio": 1.7797356828193833, "no_speech_prob": 0.0018215874442830682}, {"id": 2954, "seek": 677774, "start": 6777.74, "end": 6779.639999999999, "text": " and really what you're saying is", "tokens": [50365, 293, 534, 437, 291, 434, 1566, 307, 50460], "temperature": 0.0, "avg_logprob": -0.08902068365187872, "compression_ratio": 1.7682403433476395, "no_speech_prob": 0.00045474476064555347}, {"id": 2955, "seek": 677774, "start": 6779.74, "end": 6781.639999999999, "text": " you're telling PyTorch that hey", "tokens": [50465, 291, 434, 3585, 9953, 51, 284, 339, 300, 4177, 50560], "temperature": 0.0, "avg_logprob": -0.08902068365187872, "compression_ratio": 1.7682403433476395, "no_speech_prob": 0.00045474476064555347}, {"id": 2956, "seek": 677774, "start": 6781.74, "end": 6783.639999999999, "text": " I'm not going to call backward on any of this", "tokens": [50565, 286, 478, 406, 516, 281, 818, 23897, 322, 604, 295, 341, 50660], "temperature": 0.0, "avg_logprob": -0.08902068365187872, "compression_ratio": 1.7682403433476395, "no_speech_prob": 0.00045474476064555347}, {"id": 2957, "seek": 677774, "start": 6783.74, "end": 6785.639999999999, "text": " and this allows PyTorch to be", "tokens": [50665, 293, 341, 4045, 9953, 51, 284, 339, 281, 312, 50760], "temperature": 0.0, "avg_logprob": -0.08902068365187872, "compression_ratio": 1.7682403433476395, "no_speech_prob": 0.00045474476064555347}, {"id": 2958, "seek": 677774, "start": 6785.74, "end": 6787.639999999999, "text": " a bit more 
And then we should be able to just run this... and it's running. And you see that loss.backward is commented out and we're optimizing. So we're going to leave this running, and hopefully we get a good result. Okay, so I allowed the neural net optimization to finish, and then here I calibrate the batch norm parameters, because I did not keep track of the running mean and variance in the training loop.
"text": " pretty decent results", "tokens": [51565, 1238, 8681, 3542, 51660], "temperature": 0.0, "avg_logprob": -0.10001520170782605, "compression_ratio": 1.8105263157894738, "no_speech_prob": 0.0008716996526345611}, {"id": 2982, "seek": 680764, "start": 6833.64, "end": 6835.54, "text": " compared to what we were used to", "tokens": [51665, 5347, 281, 437, 321, 645, 1143, 281, 51760], "temperature": 0.0, "avg_logprob": -0.10001520170782605, "compression_ratio": 1.8105263157894738, "no_speech_prob": 0.0008716996526345611}, {"id": 2983, "seek": 680764, "start": 6835.64, "end": 6837.54, "text": " so everything is the same but of course", "tokens": [51765, 370, 1203, 307, 264, 912, 457, 295, 1164, 51860], "temperature": 0.0, "avg_logprob": -0.10001520170782605, "compression_ratio": 1.8105263157894738, "no_speech_prob": 0.0008716996526345611}, {"id": 2984, "seek": 683754, "start": 6837.54, "end": 6839.44, "text": " the big deal is that we did not use lots of backward", "tokens": [50365, 264, 955, 2028, 307, 300, 321, 630, 406, 764, 3195, 295, 23897, 50460], "temperature": 0.0, "avg_logprob": -0.06682968820844377, "compression_ratio": 1.7901639344262295, "no_speech_prob": 0.00030178186716511846}, {"id": 2985, "seek": 683754, "start": 6839.54, "end": 6841.44, "text": " we did not use PyTorch AutoGrad", "tokens": [50465, 321, 630, 406, 764, 9953, 51, 284, 339, 13738, 38, 6206, 50560], "temperature": 0.0, "avg_logprob": -0.06682968820844377, "compression_ratio": 1.7901639344262295, "no_speech_prob": 0.00030178186716511846}, {"id": 2986, "seek": 683754, "start": 6841.54, "end": 6843.44, "text": " and we estimated our gradients ourselves", "tokens": [50565, 293, 321, 14109, 527, 2771, 2448, 4175, 50660], "temperature": 0.0, "avg_logprob": -0.06682968820844377, "compression_ratio": 1.7901639344262295, "no_speech_prob": 0.00030178186716511846}, {"id": 2987, "seek": 683754, "start": 6843.54, "end": 6845.44, "text": " by hand", "tokens": [50665, 538, 1011, 50760], "temperature": 0.0, "avg_logprob": -0.06682968820844377, "compression_ratio": 1.7901639344262295, "no_speech_prob": 0.00030178186716511846}, {"id": 2988, "seek": 683754, "start": 6845.54, "end": 6847.44, "text": " and so hopefully you're looking at this", "tokens": [50765, 293, 370, 4696, 291, 434, 1237, 412, 341, 50860], "temperature": 0.0, "avg_logprob": -0.06682968820844377, "compression_ratio": 1.7901639344262295, "no_speech_prob": 0.00030178186716511846}, {"id": 2989, "seek": 683754, "start": 6847.54, "end": 6849.44, "text": " the backward pass of this neural net", "tokens": [50865, 264, 23897, 1320, 295, 341, 18161, 2533, 50960], "temperature": 0.0, "avg_logprob": -0.06682968820844377, "compression_ratio": 1.7901639344262295, "no_speech_prob": 0.00030178186716511846}, {"id": 2990, "seek": 683754, "start": 6849.54, "end": 6851.44, "text": " and you're thinking to yourself", "tokens": [50965, 293, 291, 434, 1953, 281, 1803, 51060], "temperature": 0.0, "avg_logprob": -0.06682968820844377, "compression_ratio": 1.7901639344262295, "no_speech_prob": 0.00030178186716511846}, {"id": 2991, "seek": 683754, "start": 6851.54, "end": 6853.44, "text": " actually that's not too complicated", "tokens": [51065, 767, 300, 311, 406, 886, 6179, 51160], "temperature": 0.0, "avg_logprob": -0.06682968820844377, "compression_ratio": 1.7901639344262295, "no_speech_prob": 0.00030178186716511846}, {"id": 2992, "seek": 683754, "start": 6853.54, "end": 6855.44, "text": " each one of these layers is like three lines of code", "tokens": [51165, 1184, 472, 295, 613, 
So everything is the same, but of course the big deal is that we did not use loss.backward, we did not use PyTorch autograd, and we estimated our gradients ourselves by hand. And so hopefully you're looking at this, the backward pass of this neural net, and you're thinking to yourself: actually, that's not too complicated. Each one of these layers is like three lines of code or something like that, and most of it is fairly straightforward, potentially with the notable exception of the batch normalization backward pass. Otherwise, it's pretty good. Okay, and that's everything I wanted to cover, so hopefully you found this interesting. And what I liked about it, honestly, is that it gave us a very nice diversity of layers to backpropagate through, and I think it gives a pretty nice and comprehensive sense of how these backward passes are implemented and how they work, and you'd be able to derive them yourself. But of course, in practice, you probably don't want to, and you want to use the PyTorch autograd. But hopefully you have some intuition about how gradients flow backwards through the neural net, starting at the loss, and how they flow through all the variables. And if you understood a good chunk of it, and if you have a sense of that, then you can count yourself as one of these buff doges on the left, instead of the doges on the right here. Now, in the next lecture, we're actually going to go to recurrent neural nets, LSTMs, and all the other variants of RNNs, and we're going to start to complexify the architecture and start to achieve better log likelihoods. And so I'm really looking forward to that, and I'll see you then.