I believe event entropy is being computed incorrectly in the simple_lm and tiered_lm models for jagged sequences (e.g., the multi-stream data case). Note that jagged sequences were not used in any of the publications, and the option was only included as an experimental feature for follow-up work.
Overview
In the case of jagged arrays, the code appears to compute event entropy as
∑ mask * s
but I think it should be
1/seq_len * ∑ mask * s
where
seq_len: the true (unpadded) sequence length of this line
mask: a D-dimensional binary vector in which every position past the first seq_len entries is 0
e.g., [1, 1, 1, 1, 0, 0]
s: a D-dimensional vector of token-level cross-entropy scores
e.g., [0.18, 0.23, 0.08, 0.87, 0.06, 0.18]
(the last two entries, 0.06 and 0.18, are unusable and are zeroed out by the mask)
D: the maximum sequence length set during training
e.g., max([len(line) for line in data])
∑: summation over the elements of a vector
e.g., sum([0.18, 0.23, 0.08, 0.87, 0.0, 0.0])
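To make the two formulas concrete, here is a minimal pure-Python sketch that evaluates both on the example vectors above. The helper names masked_sum and masked_mean are hypothetical and are not taken from simple_lm.py or tiered_lm.py; the sketch also assumes seq_len is at least 1.

```python
def masked_sum(scores, mask):
    """Event entropy as the code appears to compute it: sum(mask * s)."""
    return sum(m * x for m, x in zip(mask, scores))


def masked_mean(scores, mask):
    """Event entropy as proposed: (1 / seq_len) * sum(mask * s)."""
    seq_len = sum(mask)  # number of real (unpadded) tokens; assumed >= 1
    return masked_sum(scores, mask) / seq_len


# Example vectors from the definitions above (D = 6, seq_len = 4).
s = [0.18, 0.23, 0.08, 0.87, 0.06, 0.18]
mask = [1, 1, 1, 1, 0, 0]

total = masked_sum(s, mask)       # about 1.36
per_token = masked_mean(s, mask)  # about 0.34
```

For this line the unnormalized score is about 1.36, while the per-token score is about 0.34.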
Without dividing by the true sequence length, the line loss, and consequently the anomaly score (src), becomes a function of sequence length, and batch losses end up on different scales determined by each batch's mean sequence length. This appears to be a typo, since reduce_mean is used along the sequence axis when lengths are not jagged.
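The length dependence can be illustrated with a small sketch, again in plain Python rather than the TensorFlow code of the models. It scores two hypothetical lines whose tokens all have the same cross entropy (0.5) but whose lengths differ; the helpers pad and length_mask and the toy data are assumptions made for illustration only.

```python
def pad(scores, max_len):
    """Right-pad a list of token scores with zeros up to max_len."""
    return scores + [0.0] * (max_len - len(scores))


def length_mask(seq_len, max_len):
    """Binary mask: 1 for the first seq_len positions, 0 for the padding."""
    return [1] * seq_len + [0] * (max_len - seq_len)


D = 5  # maximum sequence length in this toy batch
lines = [
    [0.5, 0.5],                 # short line, seq_len = 2
    [0.5, 0.5, 0.5, 0.5, 0.5],  # long line, seq_len = 5
]

summed = []    # sum(mask * s): what the code appears to compute
averaged = []  # (1 / seq_len) * sum(mask * s): the proposed score
for line in lines:
    s = pad(line, D)
    mask = length_mask(len(line), D)
    masked_total = sum(m * x for m, x in zip(mask, s))
    summed.append(masked_total)
    averaged.append(masked_total / len(line))

# summed   == [1.0, 2.5]  -> the longer line looks more anomalous
# averaged == [0.5, 0.5]  -> both lines get the same per-token score
```

Both lines are equally predictable per token, yet the summed score ranks the longer line as 2.5 times more anomalous; dividing by seq_len removes that effect.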
Trace:
Proposed Fix
Change this line in simple_lm.py and tiered_lm.py
to