Bradm/markup fixes (#271)
* Markup fix for heading size

* \hat{r} fix
bradmiller authored Oct 10, 2024
1 parent f53a697 commit da77c13
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions documentation/under-the-hood/ranking-notes.md
@@ -90,13 +90,13 @@ Additionally, because the matrix factorization is re-trained from scratch every
While the matrix factorization approach above has many nice properties, it doesn't give us a natural built-in way to estimate the uncertainty of its parameters.
We take two approaches to model uncertainty:

-### Pseudo-rating sensitivity analysis
+#### Pseudo-rating sensitivity analysis

One approach that we use to help quantify the uncertainty in our parameter estimates is to add "extreme" ratings from "pseudo-raters" and measure the maximum and minimum possible values that each note's intercept and factor parameters take on after all possible pseudo-ratings are added. We add both helpful and not-helpful ratings, from pseudo-raters with the max and min possible rater intercepts, and with the max and min possible factors (as well as 0, since 0-factor raters can often have outsized impact on note intercepts). This approach is similar in spirit to the idea of pseudocounts in Bayesian modeling, or to Shapley values.

We currently assign notes a "Not Helpful" status if the max (upper confidence bound) of their intercept is less than -0.04, in addition to the rules on the raw intercept values defined in the previous section.
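
A minimal sketch of the pseudo-rater procedure, assuming a hypothetical `fit_model` helper that re-runs the matrix factorization on an augmented rating list and returns per-note parameters. The names and data layout here are illustrative; only the -0.04 threshold comes from the text above.

```python
from itertools import product

NOT_HELPFUL_INTERCEPT_CAP = -0.04  # threshold from the text; all else is assumed

def pseudo_rating_intercept_bounds(note_id, ratings, fit_model,
                                   min_rater_intercept, max_rater_intercept,
                                   min_rater_factor, max_rater_factor):
    """Re-fit once per extreme pseudo-rating and record the range of
    intercepts the note takes on. `ratings` is a list of rating dicts;
    `fit_model` stands in for re-running the matrix factorization."""
    intercepts = []
    pseudo_raters = product(
        [min_rater_intercept, max_rater_intercept],  # extreme rater intercepts
        [min_rater_factor, 0.0, max_rater_factor],   # extreme and zero factors
        [1.0, 0.0],                                  # helpful / not-helpful rating
    )
    for rater_intercept, rater_factor, rating in pseudo_raters:
        pseudo = {"note_id": note_id, "rating": rating,
                  "rater_intercept": rater_intercept,
                  "rater_factor": rater_factor}
        note_params = fit_model(ratings + [pseudo])  # hypothetical helper
        intercepts.append(note_params[note_id].intercept)
    return min(intercepts), max(intercepts)

def fails_not_helpful_bound(note_id, ratings, fit_model, **rater_bounds):
    """Apply the rule above: Not Helpful if even the upper confidence
    bound of the note intercept stays below the cap."""
    _, upper = pseudo_rating_intercept_bounds(note_id, ratings, fit_model,
                                              **rater_bounds)
    return upper < NOT_HELPFUL_INTERCEPT_CAP
```

A production scorer would presumably batch these re-fits across all notes rather than looping per note; the loop here is only for readability.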

-### Supervised confidence modeling
+#### Supervised confidence modeling

We also employ a supervised model to detect low confidence matrix factorization results.
If the model predicts that a note will lose Helpful status, then the note will remain in Needs More Ratings status for an additional 30 minutes to allow it to gather a larger set of ratings.
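
A rough sketch of how such a gate could be wired up, assuming a hypothetical boolean `predicts_status_loss` produced by the supervised model; the 30-minute hold is the only detail taken from the text.

```python
from datetime import datetime, timedelta

CONFIDENCE_DELAY = timedelta(minutes=30)  # hold period stated in the text

def gate_status(proposed_status: str, predicts_status_loss: bool, now: datetime):
    """Defer a Helpful promotion when the supervised model expects the
    status would later be lost, so the note gathers more ratings first."""
    if proposed_status == "HELPFUL" and predicts_status_loss:
        return "NEEDS_MORE_RATINGS", now + CONFIDENCE_DELAY
    return proposed_status, None
```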
@@ -238,13 +238,13 @@ The second round uses the user factors learned during the first round to weight

As with the baseline matrix factorization approach, we predict each rating as

-$$ \hat{r}_{un} = \mu + i_u + i_n + f_u \cdot f_n $$
+$$ r̂_{un} = \mu + i_u + i_n + f_u \cdot f_n $$

During the first round, we minimize the loss shown below over the set of all observed ratings $r_{un}$.
Note that this model uses a single-dimensional factor representation.

$$
-\sum_{r_{un}} (r_{un} - \hat{r}_{un})^2 + \lambda_{iu} i_u^2 + \lambda_{in} i_n^2 + \lambda_{\mu} \mu^2 + \lambda_{fu} f_u^2 + \lambda_{fn} f_n^2 + \lambda_{if} i_n |f_n|
+\sum_{r_{un}} (r_{un} - r̂_{un})^2 + \lambda_{iu} i_u^2 + \lambda_{in} i_n^2 + \lambda_{\mu} \mu^2 + \lambda_{fu} f_u^2 + \lambda_{fn} f_n^2 + \lambda_{if} i_n |f_n|
$$

Where $\lambda_{iu}=30\lambda$, $\lambda_{in}=5\lambda$, $\lambda_{\mu}=5\lambda$, $\lambda_{fu}=\dfrac{\lambda}{4}$, $\lambda_{fn}=\dfrac{\lambda}{3}$, $\lambda_{if}=25\lambda$ and $\lambda=0.03$.
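
As a sketch under stated assumptions, the first-round objective could be written with PyTorch tensors as below. The parameter layout, and the choice to apply each regularizer once per parameter rather than once per rating, are our reading of the formula, not a claim about the production scorer.

```python
import torch

lam = 0.03
lam_iu, lam_in, lam_mu = 30 * lam, 5 * lam, 5 * lam
lam_fu, lam_fn, lam_if = lam / 4, lam / 3, 25 * lam

def first_round_loss(ratings, user_idx, note_idx, mu, i_u, i_n, f_u, f_n):
    """ratings: 1-D tensor of observed r_un; user_idx / note_idx map each
    rating to its rater and note; i_u, f_u are per-user parameters and
    i_n, f_n per-note parameters (single-dimensional factors)."""
    r_hat = mu + i_u[user_idx] + i_n[note_idx] + f_u[user_idx] * f_n[note_idx]
    sq_err = ((ratings - r_hat) ** 2).sum()
    reg = (lam_iu * (i_u ** 2).sum()
           + lam_in * (i_n ** 2).sum()
           + lam_mu * mu ** 2
           + lam_fu * (f_u ** 2).sum()
           + lam_fn * (f_n ** 2).sum()
           + lam_if * (i_n * f_n.abs()).sum())  # couples i_n with |f_n|
    return sq_err + reg
```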
@@ -267,7 +267,7 @@ Notice that the weights $w^S_{un}$ function to balance the loss across ratings f
Consequently, the loss optimized during the second round is:

$$
-\sum_{r_{un}} w_{un} (r_{un} - \hat{r}_{un})^2 + \lambda_{iu} i_u^2 + \lambda_{in} i_n^2 + \lambda_{\mu} \mu^2 + \lambda_{fu} f_u^2 + \lambda_{fn} f_n^2 + \lambda_{if} i_n |f_n|
+\sum_{r_{un}} w_{un} (r_{un} - r̂_{un})^2 + \lambda_{iu} i_u^2 + \lambda_{in} i_n^2 + \lambda_{\mu} \mu^2 + \lambda_{fu} f_u^2 + \lambda_{fn} f_n^2 + \lambda_{if} i_n |f_n|
$$

Combined with the regularization adjustments from the first round, the added weighting improves the learned user representation, ultimately allowing the model to recognize more instances of consensus among users who hold different perspectives.
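
Relative to the first-round sketch above, the only change needed is the per-rating weight on the squared error, along these lines:

```python
def second_round_loss(ratings, weights, user_idx, note_idx,
                      mu, i_u, i_n, f_u, f_n, regularizer):
    """weights holds one w_un per observed rating; regularizer is the same
    penalty term as in the first-round sketch."""
    r_hat = mu + i_u[user_idx] + i_n[note_idx] + f_u[user_idx] * f_n[note_idx]
    return (weights * (ratings - r_hat) ** 2).sum() + regularizer
```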
