[Ideas] Open ideas #460
Comments
Check for dead neurons:
z = z * move_num/length higher learning rate early
Adding stuff about distillation and Seth ideas
Checking if eval games have enough diversity and using this opening panel
@sethtroisi Re "time management" from LZ, I'm concerned it might be detrimental for self-play and RL, as it amounts to a form of policy sharpening: cutting the search early means low-policy moves won't get any visits and will be trained towards 0. That may hinder the learning of new moves. IMHO, the key to saving compute budget might truly be KataGo's variable visits scheme, which uses different visit counts for game move search vs. policy training target search. Both types of KataGo search could also benefit from the KLD threshold trick from LC0, which sounds very appealing for policy, though much more complex to implement ;-)
From Brian Lee: One concrete idea: instead of selecting 2% flat from the last 50 generations, select 4%->0% over the last 50 generations, with some sort of exponentially decaying curve, and also make this parameter configurable. Early on, we might want to have 10% -> 0% over the last ~10 generations of data, but later on we might want to flatten that curve to select 2% -> 0% over the last 100 generations.
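A rough sketch of what the decaying selection curve above could look like, in case it helps the discussion. This is not existing code from the repo: the function name `sample_rates` and the `sharpness` parameter are made up for illustration, and the exact curve shape (an exponential shifted and rescaled so it hits 0 just past the end of the window) is only one assumption about what "some sort of exponentially decaying curve" means.

```python
import math


def sample_rates(start_rate, num_generations, sharpness=3.0):
    """Return per-generation sampling rates, newest generation first.

    Hypothetical helper, not part of any existing codebase.
    The rate equals start_rate for the newest generation (age 0) and decays
    exponentially with age. The curve is shifted and rescaled so that it
    would reach exactly 0 at age == num_generations, i.e. one step past the
    oldest generation in the window.
    """
    if num_generations <= 0:
        raise ValueError("num_generations must be positive")
    if sharpness <= 0:
        raise ValueError("sharpness must be positive")

    floor = math.exp(-sharpness)  # value of the raw curve at the window edge
    rates = []
    for age in range(num_generations):
        decayed = math.exp(-sharpness * age / num_generations)
        rates.append(start_rate * (decayed - floor) / (1.0 - floor))
    return rates


# Early in training: 10% -> 0% over the last 10 generations.
early = sample_rates(0.10, 10)

# Later in training: a flatter 2% -> 0% over the last 100 generations.
late = sample_rates(0.02, 100)
```

Keeping `start_rate`, `num_generations` and `sharpness` as arguments is what makes the schedule configurable: a larger `sharpness` concentrates sampling on the newest generations, while a value close to 0 approaches a linear ramp.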
Seth ideas
Ideas inspired by @lightvector and KataGo
Ideas inspired by LZ
Ideas from AG/AGZ/AZ papers
Ideas from elsewhere
(https://medium.com/oracledevs/lessons-from-alpha-zero-part-6-hyperparameter-tuning-b1cfcbe4ca9a)
Done