SVFit tools

Fabio Colombo edited this page May 28, 2016 · 13 revisions

There are several ways to obtain the reconstructed di-tau mass in the KITHiggsToTauTau framework.

Calculation of Svfit within the Artus run

The easiest, but most computing-intensive, way is to run Svfit within the producer:SvfitProducer and use the results directly in the output.

The behaviour is configured in data/ArtusConfigs/Includes/settingsSvfit.json.

Reading in Cache files with pre-calculated Svfit values

By specifying a "SvfitCacheFile" (usually sample-dependent), the SvfitProducer first tries to retrieve the result of the Svfit calculation from this cache file. In case of a cache miss, there are three options:

  1. Stop the job and throw an assert, setting "SvfitCacheMissBehaviour" : "assert"
  2. Calculate the Svfit value within the job as explained in the previous section, "SvfitCacheMissBehaviour" : "recalculate"
  3. Fill the Svfit result with dummy values, "SvfitCacheMissBehaviour" : "undefined"
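
The three cache-miss options can be pictured with a minimal Python sketch. This is only an illustration of the decision logic described above, not the actual SvfitProducer code; the function name `resolve_svfit`, its arguments, and the use of `None` for the dummy values are hypothetical.

```python
# Illustrative sketch only; the real logic lives in the SvfitProducer.
# All names below are hypothetical and chosen for this example.

def resolve_svfit(cache, event_key, miss_behaviour, calculate):
    """Return the Svfit result for event_key, honouring the cache-miss setting."""
    if event_key in cache:
        return cache[event_key]          # cache hit: no calculation needed
    if miss_behaviour == "assert":
        raise AssertionError("Svfit cache miss for %r" % (event_key,))
    if miss_behaviour == "recalculate":
        return calculate(event_key)      # expensive fit inside the job
    if miss_behaviour == "undefined":
        return None                      # stands in for the dummy values
    raise ValueError("unknown SvfitCacheMissBehaviour: %r" % miss_behaviour)


cache = {("run1", 42): 125.3}
print(resolve_svfit(cache, ("run1", 42), "assert", lambda key: 91.2))
print(resolve_svfit(cache, ("run1", 43), "recalculate", lambda key: 91.2))
print(resolve_svfit(cache, ("run1", 43), "undefined", lambda key: 91.2))
```
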

Filling the caches within the Artus run

If the Svfit values are calculated within the job, the SvfitCacheConsumer can write them to the job output files.

To achieve this, use

"GenerateSvfitInput" : false
"SvfitCacheMissBehaviour" : "recalculate"

These results can be collected by svfitCacheTreeMerge.py:

for dir in <Artus project directory>/[output|merged]/*; do
    echo "$dir"
    svfitCacheTreeMerge.py -i "$dir"/*.root -o "HiggsAnalysis/KITHiggsToTauTau/auxiliaries/svfit/svfitCache_$(basename "$dir").root"
done
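
The same per-sample loop can be sketched in Python, which may be easier to adapt or dry-run. This is an assumption-laden sketch: the helper name `build_merge_commands` is hypothetical, only the `-i` and `-o` options shown in the command above are used, and unlike the shell loop it skips entries that are not directories or contain no .root files.

```python
import glob
import os

# Output directory taken from the command above; adjust if your checkout differs.
CACHE_DIR = "HiggsAnalysis/KITHiggsToTauTau/auxiliaries/svfit"


def build_merge_commands(parent_dir):
    """Return one svfitCacheTreeMerge.py argument list per sample directory."""
    commands = []
    for sample_dir in sorted(glob.glob(os.path.join(parent_dir, "*"))):
        if not os.path.isdir(sample_dir):
            continue  # only sample directories are merged
        inputs = sorted(glob.glob(os.path.join(sample_dir, "*.root")))
        if not inputs:
            continue  # nothing to merge for this sample
        sample = os.path.basename(sample_dir)
        output = os.path.join(CACHE_DIR, "svfitCache_%s.root" % sample)
        commands.append(["svfitCacheTreeMerge.py", "-i"] + inputs + ["-o", output])
    return commands
```

Each returned list can be printed for inspection first and then passed to `subprocess.check_call` once the commands look right.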

If you only have to update a few cached values, this approach is fine and does not affect runtime too much. For a complete calculation of a full dataset, use the following settings.

HiggsToTauTauAnalysis.py -b --files-per-job 1 --wall-time 48:00:00 ...

It is recommended to store the cached results on dCache.

Filling the caches standalone on the grid

This approach adds one step to solve the load-balancing problem: the Svfit calculation is split into chunks of a user-defined number of events, and the WLCG is used for the calculation itself.

The first step is to run the SvfitProducer and fill only dummy values. The SvfitConsumer writes one output file per pipeline containing at most the number of events specified in "SvfitInputCutOff". This can be done by specifying in the config:

	"SvfitCacheMissBehaviour" : "undefined",
	"SvfitOutFile" : "SvfitCache.root",
	"GenerateSvFitInput" : true,
	"SvFitInputCutOff" : 10000,

Depending on whether you want to fill only events from cache misses, set "UpdateSvfitCache" to true or false.
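
To illustrate what the cut-off means for the later grid jobs, here is a small Python sketch of splitting a list of events into chunks of at most a given size. How the consumer actually rolls over to a new file is not described here, so this only shows the "at most N events per chunk" idea; the function name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(events, cut_off):
    """Yield lists of at most cut_off events, preserving order."""
    if cut_off <= 0:
        raise ValueError("cut_off must be positive")
    for start in range(0, len(events), cut_off):
        yield events[start:start + cut_off]


# 25000 dummy events with a cut-off of 10000 give chunks of 10000, 10000, 5000.
chunk_sizes = [len(chunk) for chunk in split_into_chunks(list(range(25000)), 10000)]
print(chunk_sizes)  # [10000, 10000, 5000]
```

A smaller cut-off gives more, shorter grid jobs with a narrower runtime spread; a larger one reduces the per-job overhead.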

To minimize the runtime spread and unnecessary overhead, run Artus with as many files per job as reasonable. Also, stage out directly to dCache so that the output is accessible from anywhere.

HiggsToTauTauAnalysis.py -b --files-per-job 40 --wall-time 12:00:00 --se-path srm://dcache-se-cms.desy.de:8443/srm/managerv2?SFN=//pnfs/desy.de/cms/tier2/store/user/$USER/higgs-kit/svfitinputs/

Once the Artus run is finished, you can submit the jobs to the grid with

python HiggsAnalysis/KITHiggsToTauTau/scripts/submitCrabSvfitJobs.py /pnfs/desy.de/cms/tier2/store/user/$USER/higgs-kit/svfitinputs/

For each sample, this script creates an executable that is sent with the jobs. The output path is specified automatically in the parameter config.Data.outLFNDirBase.

Note #1: the CRAB submission script does not seem to cope with broken symlinks, which you might have in your folders (for example, in the sample filelists). Remove them before submitting.

Note #2: before launching the script, change the process source in CombineHarvester/CombineTools/scripts/do_nothing_cfg.py to: process.source = cms.Source("EmptySource")
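
For Note #1, broken symlinks can be located before submission with a short Python helper. This is a generic sketch, not part of the framework; the function name `find_broken_symlinks` is hypothetical, and which directory to scan is left to you.

```python
import os


def find_broken_symlinks(root):
    """Return sorted paths of symlinks under root whose target does not exist."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it to the target.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)
```

Printing the result first and removing the listed links by hand (or with `os.remove`) avoids deleting anything unintentionally.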

Once all jobs are done, you can merge the outputs for each sample and upload them to dCache again, e.g. with

python HiggsAnalysis/KITHiggsToTauTau/scripts/svfitCacheTreeMerge.py --dcache True --input /pnfs/desy.de/cms/tier2/store/user/$USER/higgs-kit/Svfit/2016-04-12/\* -o srm://dcache-se-cms.desy.de:8443/srm/managerv2?SFN=//pnfs/desy.de/cms/tier2/store/user/$USER/higgs-kit/MergedCaches/

Make sure not to omit the '*' in the path; alternatively, specify a single directory by hand. The script prints the new configuration settings, which you can copy and paste into settingsSvfit.json.
