
pyCompressor GanPDFs

Tanjona Rabemananjara edited this page on Nov 18, 2020.

The pyCompressor code has now been modified (PR #27) so that, if a given PDF set was enhanced, the compressed samples are drawn from the enhanced set. The following results are still produced with the 130920-nnpdf40_jcm_iterated_70k_epochs PDF set with 75 replicas; the GANs were then used to generate 25 more. There was no particular motivation for choosing 75 and 25, apart from the fact that I already had these results.

Nomenclature:
  • Prior: the input PDF set to be enhanced
  • Synthetic samples: the PDF replicas generated by the GANs
  • Enhanced set: the Prior set supplemented with the Synthetic replicas
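A minimal sketch of the bookkeeping implied by this nomenclature, assuming replicas are plain Python lists and that a compressed subset is drawn from the whole enhanced pool (prior plus synthetic). `build_enhanced_set` and `draw_compressed_indices` are hypothetical helpers for illustration, not pyCompressor's API.

```python
import random


def build_enhanced_set(prior, synthetic):
    """Concatenate prior and synthetic replicas, tagging each one with its origin."""
    enhanced = [("prior", rep) for rep in prior]
    enhanced += [("synthetic", rep) for rep in synthetic]
    return enhanced


def draw_compressed_indices(enhanced, n_compressed, seed=0):
    """Draw a random subset of replica indices from the whole enhanced pool."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(enhanced)), n_compressed))


# Toy replicas (one value each) mirroring the 75 + 25 setup described above.
prior = [[float(i)] for i in range(75)]
synthetic = [[100.0 + i] for i in range(25)]
enhanced = build_enhanced_set(prior, synthetic)
subset = draw_compressed_indices(enhanced, 30)
n_synthetic = sum(1 for i in subset if enhanced[i][0] == "synthetic")
print(len(enhanced), len(subset), n_synthetic)
```

Keeping the origin tag next to each replica makes it straightforward to count afterwards how many synthetic replicas a compressed set actually contains.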

Results:

The ERF values below are computed using Eqs. (6), (13) and (21) of the compressor paper, without the normalization factors (similarly to Fig. 7 of the paper). The ERF values for both the (compressed) prior and the enhanced set now behave as expected for all estimators except the correlation (an issue that was also apparent in the original paper). The enhanced results now perform slightly better than the prior, except in a few cases, for reasons discussed below.
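The sketch below illustrates the kind of unnormalized ERF meant here, assuming the generic form ERF = (1/N) * sum_i ((C_i - O_i) / O_i)^2, where O_i and C_i are an estimator evaluated at grid point i on the prior and on the compressed set. This form is an assumption about Eqs. (6), (13) and (21), it covers only the mean and standard deviation, and `erf`, `estimator_values` and `erf_per_estimator` are hypothetical helpers, not pyCompressor functions.

```python
import statistics


def erf(prior_values, compressed_values):
    """Unnormalized ERF: mean squared relative deviation of C_i from O_i (O_i = 0 skipped)."""
    terms = [((c - o) / o) ** 2
             for o, c in zip(prior_values, compressed_values) if o != 0]
    return sum(terms) / len(terms)


def estimator_values(replicas, estimator):
    """Apply an estimator across replicas at each grid point i, with replicas[r][i]."""
    n_points = len(replicas[0])
    return [estimator([rep[i] for rep in replicas]) for i in range(n_points)]


def erf_per_estimator(prior, compressed):
    """ERF for the mean and standard deviation only (other estimators omitted)."""
    estimators = {"mean": statistics.fmean, "std": statistics.stdev}
    return {name: erf(estimator_values(prior, f), estimator_values(compressed, f))
            for name, f in estimators.items()}


prior = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
compressed = [prior[0], prior[2]]  # drop the middle replica
result = erf_per_estimator(prior, compressed)
# result["mean"] == 0.0 and result["std"] is (sqrt(2) - 1)**2, about 0.1716
print(result)
```

Dropping the middle replica leaves the mean unchanged but widens the spread, so only the standard-deviation ERF is non-zero.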

The plots below show (1) the total ERF per iteration and (2) the final ERF value of a compressed set sampled from either the prior or the enhanced set. In the first plot, dash-dotted lines represent the prior and solid lines the enhanced set. At N=70, the GA for the enhanced set appears to be trapped in a local minimum. This surely explains why its total ERF is larger than that of the prior, especially since neither the prior- nor the enhanced-compressed set contains any replica from the synthetic sample (see plots below). I am actually not sure whether there was a well-defined motivation for the choice of the number of generations and the mutation rate in the original compressor paper. Currently, I am just using the same parameters with a larger number of generations. However, as the plots show, increasing the number of generations does not really solve the issue, and more fine-tuning is required. pyCompressor also provides a CMA minimizer, but it is roughly 10 times slower than the GA.

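[Plots: (1) total ERF per iteration, prior (dash-dotted) vs enhanced (solid); (2) final ERF of compressed sets sampled from the prior and enhanced sets]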
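As an illustration of how a GA can stall, here is a minimal elitist, mutation-only search over replica subsets. It is a sketch under stated assumptions, not pyCompressor's GA: the objective is a toy unnormalized ERF (mean and standard deviation of scalar replicas), and `ga_compress`, `mutate`, `make_objective`, `n_generations`, `n_mutants` and `mutation_rate` are hypothetical names and defaults.

```python
import random
import statistics


def make_objective(values):
    """Toy unnormalized ERF on scalar replicas: mean and std only."""
    mu, sigma = statistics.fmean(values), statistics.stdev(values)

    def objective(subset):
        sample = [values[i] for i in subset]
        m, s = statistics.fmean(sample), statistics.stdev(sample)
        return ((m - mu) / mu) ** 2 + ((s - sigma) / sigma) ** 2

    return objective


def mutate(subset, n_total, mutation_rate, rng):
    """Swap each member for an unused replica with probability mutation_rate (at least one swap)."""
    mutant = list(subset)
    in_use = set(mutant)
    unused = [i for i in range(n_total) if i not in in_use]
    positions = [p for p in range(len(mutant)) if rng.random() < mutation_rate]
    if not positions:
        positions = [rng.randrange(len(mutant))]
    for p in positions:
        if not unused:
            break
        new = unused.pop(rng.randrange(len(unused)))
        unused.append(mutant[p])  # the replaced replica becomes available again
        mutant[p] = new
    return mutant


def ga_compress(n_total, n_compressed, objective, n_generations=200,
                n_mutants=15, mutation_rate=0.1, seed=0):
    """Elitist, mutation-only search; returns best subset and best value per generation."""
    rng = random.Random(seed)
    best = rng.sample(range(n_total), n_compressed)
    best_value = objective(best)
    history = []
    for _ in range(n_generations):
        for _ in range(n_mutants):
            candidate = mutate(best, n_total, mutation_rate, rng)
            value = objective(candidate)
            if value < best_value:  # keep only strict improvements
                best, best_value = candidate, value
        history.append(best_value)
    return best, history


rng = random.Random(1)
values = [rng.gauss(10.0, 2.0) for _ in range(100)]
objective = make_objective(values)
best, history = ga_compress(100, 20, objective, n_generations=100)
print(f"ERF after 1 generation: {history[0]:.2e}, after 100: {history[-1]:.2e}")
```

Because only strict improvements are accepted, the recorded ERF per generation never increases and plateaus once no mutant improves on the current subset; adding generations then changes nothing, whereas a different mutation scheme or rate might escape the plateau.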

The first plot below shows the percentage similarity between the replicas in the prior- and enhanced-compressed sets. The second plot shows the number of synthetic replicas that end up in the enhanced-compressed set. Since the enhanced-compressed set contains no synthetic replicas, both compressed sets are selected from the same prior replicas; that they are nevertheless only 98% similar again indicates that the GA parameters need to be improved.

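[Plots: (1) percentage similarity between the prior- and enhanced-compressed sets; (2) number of synthetic replicas in the enhanced-compressed set]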
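The two quantities in these plots can be sketched as follows, assuming both compressed sets are stored as replica indices in a common numbering where the 75 prior replicas come first and the synthetic replicas follow, and assuming "similarity" means the fraction of shared replicas. `similarity_percent` and `count_synthetic` are hypothetical names, and the indices below are toy values, not the results shown above.

```python
def similarity_percent(compressed_a, compressed_b):
    """Percentage of replicas in compressed_a that also appear in compressed_b."""
    shared = set(compressed_a) & set(compressed_b)
    return 100.0 * len(shared) / len(compressed_a)


def count_synthetic(compressed, n_prior):
    """Count replicas whose index lies past the prior block (assumed numbering)."""
    return sum(1 for i in compressed if i >= n_prior)


prior_compressed = list(range(10))
enhanced_compressed = list(range(9)) + [80]  # index 80 is a synthetic replica
print(similarity_percent(prior_compressed, enhanced_compressed))  # 90.0
print(count_synthetic(enhanced_compressed, n_prior=75))  # 1
```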
