ENSEMBLE MOS and how to deal with them #843
-
@michaelgau @avaldebe @lewisblake I started this discussion about the upcoming MOS data for cams2_83 (if I got any of the above wrong, please correct me 😅)
-
Oh dear, that doesn't look good... M. Gauss did not (and could not) anticipate this. Let me bring this up at the next CAMS SLB meeting on 14 April. No action required before that.
-
Could we create co-located files outside pyaerocom and inject them in the middle of the processing chain?
-
if we are able to create the cached files and the import time
    start = time.time()
    #logging.info(f"Clearing cache at {const.CACHEDIR}")
    #clear_cache()
    #if pool > 1:
    #    logger.info(f"Running observation reading with pool {pool}")
    #    files = cfg["obs_cfg"]["EEA"]["read_opts_ungridded"]["files"]
    #    #pool_data = [[s, files, cache] for s in species_list]
    #    with ProcessPoolExecutor(max_workers=pool) as executor:
    #        futures = [
    #            executor.submit(read_observations,specie,files=files,cache=cache)
    #            for specie in species_list
    #        ]
    #        #executor.map(read_observations, pool_data)
    #        for future in as_completed(futures):
    #            future.result()
    #logger.info(f"Running Rest of Statistics")
    #ExperimentProcessor(stp).run()
    #print("Done Running Rest of Statistics")
    if eval_type in {"season", "long"}:
        logger.info(f"Running CAMS2_83 Spesific Statistics")
        if pool > 1:
            logger.info(f"Making forecast plot with pool {pool}")
            with ProcessPoolExecutor(max_workers=pool) as executor:
                futures = [
                    executor.submit(run_forecast, specie, stp=stp, analysis=analysis)
                    for specie in species_list
                ]
                for future in as_completed(futures):
                    future.result()
        else:
            CAMS2_83_Processer(stp).run(analysis=analysis)
        print(f"Long run: {time.time() - start} sec")
-
Also, the MOS data would then need dedicated, separate jobs.
-
I discussed this with Alvaro today and we concluded that the best option in terms of labor is definitely to spoof the CSV files as model data, even if the operation is logically redundant in theory. Alvaro is familiar with the steps needed to create the gridded data from the CSV files and can add the functionality directly to the data downloader, so that for pyaerocom nothing will be different except for having more models (ENSEMBLE_MOS_D0, ENSEMBLE_MOS_D1, etc.).
-
OK, great! No rush just yet. I hope I can bring this up at the CAMS SLB tomorrow.
-
Being dealt with outside of pyaerocom, therefore closing the discussion.
-
We will have to evaluate MOS data for cams2_83 by the end of the year.
It will come from ADS as daily CSV files with hourly data, one per species per station, for example:
mos_D1_NO2_hourly_2023-03-31_CH.csv
mos_D1_O3_hourly_2023-03-31_CY.csv
and other similar series (D0, D2, D3), and the data looks like:
head mos_D1_O3_hourly_2023-03-31_CY.csv
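The file names above seem to encode everything needed to route a file to a mocked model. A minimal sketch of pulling those parts out, assuming the pattern inferred from the two example names holds for all files (the helper `parse_mos_name`, the field names, and the reading of the two-letter suffix as a region code are assumptions for illustration, not anything from ADS or pyaerocom):

```python
import re
from datetime import date

# Pattern inferred from the two example names only; real files may differ.
MOS_NAME = re.compile(
    r"^mos_(?P<lead>D\d)_(?P<species>[A-Za-z0-9]+)_hourly_"
    r"(?P<day>\d{4}-\d{2}-\d{2})_(?P<region>[A-Z]{2})\.csv$"
)


def parse_mos_name(filename: str) -> dict:
    """Split a MOS file name into its parts (hypothetical helper)."""
    match = MOS_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected MOS file name: {filename}")
    parts = match.groupdict()
    parts["day"] = date.fromisoformat(parts["day"])
    # Model label in the style proposed in this thread, e.g. ENSEMBLE_MOS_D1
    parts["model"] = f"ENSEMBLE_MOS_{parts['lead']}"
    return parts


info = parse_mos_name("mos_D1_NO2_hourly_2023-03-31_CH.csv")
print(info["model"], info["species"], info["day"], info["region"])
```

Raising on an unexpected name is deliberate here: a silent skip would make missing species hard to notice later in the chain.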
If we do not want to change anything in the cams2_83 production code, all of this needs to be turned into netCDF files that mock "new" models, something like ENSEMBLE_MOS_D1, ENSEMBLE_MOS_D2, etc., which can then be treated exactly like the other models.
The issue (as far as I have understood, but I guess we'll talk more with M. Gauss about it) is that these "mocked" MOS models would have non-zero values only at the grid points closest to the stations (and this sparsity could itself be a problem for things like the map). So to create these netCDF files we would basically need to do the equivalent of the colocation in reverse, only for pyaerocom to then colocate back to compare the "model" data with the station data as usual... this feels very redundant.
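To make the "colocation in reverse" concrete, here is a rough sketch of the idea in plain Python, not the actual downloader code. It assumes a regular lat/lon grid and uses NaN for empty cells; the function names, the toy grid, and the station values are all made up for illustration, and writing the result to netCDF is left out.

```python
import math


def nearest_index(axis: list[float], value: float) -> int:
    """Index of the grid coordinate closest to value."""
    return min(range(len(axis)), key=lambda i: abs(axis[i] - value))


def stations_to_grid(stations, lats, lons):
    """Put each station value on its nearest grid cell; other cells stay NaN."""
    grid = [[math.nan for _ in lons] for _ in lats]
    for lat, lon, value in stations:
        # If two stations share a cell, the later one overwrites the earlier one.
        grid[nearest_index(lats, lat)][nearest_index(lons, lon)] = value
    return grid


# Toy 0.5-degree grid and made-up station values (illustration only)
lats = [59.0, 59.5, 60.0]
lons = [10.0, 10.5, 11.0]
stations = [(59.9, 10.7, 42.0), (59.1, 10.1, 17.5)]

grid = stations_to_grid(stations, lats, lons)
filled = sum(not math.isnan(v) for row in grid for v in row)
print(f"{filled} of {len(lats) * len(lons)} cells carry data")
```

Even on this tiny grid most cells are empty, which is the sparsity concern for the maps. The overwrite on shared cells is another thing a real implementation would have to decide on.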
Ideally we would only need the aeroval kind of statistics, because in a way the "colocation" is already done here: the MOS data only exists at coordinates that represent stations. But the current pyaerocom is not designed to do statistics only, and changing the codebase so that we can do that is not for the near future...
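For reference, "statistics only" on already paired data is small in itself. The sketch below computes a few common scores (mean bias, RMSE, Pearson correlation) from two aligned series. It is only an illustration of the idea, with a hypothetical `paired_stats` helper and made-up numbers; it is not pyaerocom's statistics code and does not claim to match the aeroval output.

```python
import math


def paired_stats(model, obs):
    """Basic scores for two aligned series, skipping pairs that contain NaN."""
    pairs = [
        (m, o) for m, o in zip(model, obs)
        if not (math.isnan(m) or math.isnan(o))
    ]
    n = len(pairs)
    if n == 0:
        raise ValueError("no valid model/observation pairs")
    mean_m = sum(m for m, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n
    rmse = math.sqrt(sum((m - o) ** 2 for m, o in pairs) / n)
    cov = sum((m - mean_m) * (o - mean_o) for m, o in pairs)
    var_m = sum((m - mean_m) ** 2 for m, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)
    # Correlation is undefined when either series is constant
    r = cov / math.sqrt(var_m * var_o) if var_m > 0 and var_o > 0 else math.nan
    return {"n": n, "bias": mean_m - mean_o, "rmse": rmse, "r": r}


# Made-up hourly values for one station (illustration only)
mos_values = [1.0, 2.0, 3.0, math.nan]
obs_values = [2.0, 3.0, 4.0, 5.0]
stats = paired_stats(mos_values, obs_values)
print(stats)
```

The hard part is not these formulas but everything pyaerocom builds around the colocated object, which is why this route was judged too expensive for now.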
An alternative, which also requires quite a lot of new code and labor, is to read the new data as ungridded; pyaerocom could then do an ungridded-ungridded kind of colocation (also redundant, but the point is to create a colocated object, because in pyaerocom these are the basis of everything) and then run the cams2_83 statistics.
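Conceptually, an ungridded-ungridded colocation is just a join on station and time. A toy sketch, assuming both data sets can be keyed by a (station id, timestamp) pair; the station ids, timestamps, values, and the `colocate_by_station_time` helper are invented here, and none of this uses pyaerocom's data classes:

```python
def colocate_by_station_time(mos, obs):
    """Pair values that share the same (station_id, timestamp) key."""
    common = sorted(mos.keys() & obs.keys())
    return [(key, mos[key], obs[key]) for key in common]


# Made-up records keyed by (station id, ISO timestamp), illustration only
mos = {
    ("ST001", "2023-03-31T00:00"): 41.0,
    ("ST001", "2023-03-31T01:00"): 39.5,
    ("ST002", "2023-03-31T00:00"): 12.0,
}
obs = {
    ("ST001", "2023-03-31T00:00"): 44.0,
    ("ST002", "2023-03-31T00:00"): 10.5,
    ("ST003", "2023-03-31T00:00"): 7.0,
}

for key, mos_value, obs_value in colocate_by_station_time(mos, obs):
    print(key, mos_value, obs_value)
```

Only keys present on both sides survive, so station ids and time stamps would have to be harmonised between the MOS files and the observations first.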
The cheapest solution is to wrangle the data to make it look like we got new MOS models, so we might still want to pick that one despite the logic feeling 'dirty'.