Jagged Sequences Cross Entropy Aggregation #8

Open
knowlen opened this issue Apr 13, 2021 · 0 comments

knowlen commented Apr 13, 2021

I believe event entropy is being computed incorrectly in the simple_lm and tiered_lm models for jagged sequences (e.g., the multi-stream data case). Note that jagged sequences were not used in any of the publications; the option was only included as an experimental feature for follow-up work.

Overview

In the case of jagged arrays, the code appears to compute event entropy as

∑ mask * s

but I think it should be

(1/seq_len) * ∑ mask * s

where

seq_len: the true sequence length of this line

mask: a D-dimensional binary vector in which every index beyond seq_len is 0
      e.g., [1, 1, 1, 1, 0, 0]

s: a D-dimensional vector of token-level cross entropy scores
   e.g., [0.18, 0.23, 0.08, 0.87, 0.06, 0.18]
                               -----  ----- unusable, zeroed out by the mask

D: the maximum sequence length set during training
   e.g., max([len(line) for line in data])

∑: a summation over a vector
   e.g., sum([0.18, 0.23, 0.08, 0.87, 0.0, 0.0])

Without dividing by the true sequence length, the line loss (and consequently the anomaly score (src)) becomes a function of sequence length, and batch losses end up on variable scales determined by their mean sequence lengths. This appears to be an oversight, since reduce_mean is used along the sequence axis when sequences are not jagged.

Trace:

Proposed Fix

Change this line in simple_lm.py and tiered_lm.py

line_losses = tf.reduce_sum(token_losses, axis=1)  # batch_size X 1

to

true_seq_len = tf.reduce_sum(ph_dict['mask'], axis=-1)            # batch_size
line_losses = tf.reduce_sum(token_losses, axis=1) / true_seq_len  # per-token mean
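As a quick sanity check of why the division matters, here is a plain-Python sketch (not the actual TensorFlow code) comparing two lines whose tokens are equally surprising but which differ in length:

```python
# line_loss mimics the aggregation: masked sum, optionally divided by
# the true sequence length (the proposed fix).
def line_loss(token_losses, mask, normalize):
    total = sum(m * t for m, t in zip(mask, token_losses))
    return total / sum(mask) if normalize else total

short_line = ([0.5, 0.5, 0.0, 0.0], [1, 1, 0, 0])  # 2 valid tokens
long_line  = ([0.5, 0.5, 0.5, 0.5], [1, 1, 1, 1])  # 4 valid tokens

# Without normalization the longer line scores double (2.0 vs 1.0);
# with it, both lines score 0.5 regardless of length.
print(line_loss(*short_line, normalize=False), line_loss(*long_line, normalize=False))
print(line_loss(*short_line, normalize=True),  line_loss(*long_line, normalize=True))
```

With the fix, a line is anomalous because its tokens are surprising on average, not simply because it is long.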
@knowlen knowlen changed the title Bug for jagged-sequences training Jagged Sequences Cross Entropy Aggregation Apr 13, 2021