Released at WIFS 2022 (Shanghai, China)
Abstract: Within an operational framework, the covers used by a steganographer are likely to come from different sensors and different processing pipelines than the ones used by researchers to train their steganalysis models. Thus, a performance gap is unavoidable when it comes to out-of-distribution covers, an extremely frequent scenario called Cover-Source Mismatch (CSM). Here, we explore a grid of processing pipelines to study the origins of CSM, to better understand it, and to better tackle it. A set-covering greedy algorithm is used to select representative pipelines minimizing the maximum regret between a representative and the pipelines it covers. Our main contribution is a methodology for generating relevant bases able to tackle operational CSM.
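For clarity, here is how the regret criterion behind this selection can be read (the notation below is ours and only a rough formalization, not copied verbatim from the paper): writing P_E(i → j) for the error probability of a detector trained on source i and tested on source j, the regret of representing j by i is the performance lost compared to training directly on j.

```latex
% Regret of representing source j by source i (our notation, assumed):
\mathrm{regret}(i \to j) = P_E(i \to j) - P_E(j \to j)

% A set S of representatives covers the grid G with radius r when
\forall j \in G,\ \exists\, i \in S \;:\; \mathrm{regret}(i \to j) \le r
```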
- The file `pipelines.csv` is a directory of the pipelines, disclosing their parameters and identifying each of them precisely with a number.
- The file `RAW_DATABASE.csv` contains information about all the RAW images we used in our experiments. They all come from the ALASKA database.
- The file `FLICKR_BASE.csv` contains information about all the FLICKR images we used as our wild base in our last experiment. They were extracted from the FLICKR website and are copyright-free.
- The folder `1-Developing` contains code for developing RAW images the way we did (a minimal sketch of such a pipeline is given after this list).
- The folder `2-Clustering` contains code for extracting relevant pipelines from the grid using the greedy set-covering algorithm we used. There is also a playground notebook to help you reproduce some of the results presented in the paper. Don't hesitate to use our PE/Regret matrix to derive new conclusions.
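As an illustration of what a development pipeline looks like, here is a minimal sketch in Python. It assumes the `rawpy` and `Pillow` packages and a generic set of parameters (demosaicing algorithm, resize kernel, JPEG quality factor); the actual scripts live in `1-Developing` and the exact parameter grid is described in `pipelines.csv`, so treat this as an outline rather than a faithful reproduction of our pipelines.

```python
# Minimal sketch of a RAW development pipeline (illustrative only):
# demosaic a RAW file, crop/resize it, then JPEG-compress it.
import rawpy
from PIL import Image

def develop(raw_path, out_path, size=512, quality=95):
    # Demosaic the RAW file; rawpy exposes several demosaicing algorithms
    # through the `demosaic_algorithm` argument of postprocess().
    with rawpy.imread(raw_path) as raw:
        rgb = raw.postprocess(
            demosaic_algorithm=rawpy.DemosaicAlgorithm.AHD,  # one choice among several
            use_camera_wb=True,
            output_bps=8,
        )

    img = Image.fromarray(rgb)

    # Center-crop to a square, then resize (the resampling kernel is another
    # pipeline parameter one could vary, e.g. LANCZOS vs BILINEAR).
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize(
        (size, size), Image.LANCZOS
    )

    # Grayscale conversion then JPEG compression with a given quality factor;
    # grayscale is just one possible choice here, color pipelines are equally valid.
    img.convert("L").save(out_path, "JPEG", quality=quality)

# Example: develop("example.dng", "example.jpg", size=512, quality=95)
```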
To reproduce our experiments and run your own, please follow our Installation Instructions.
Using a maximum regret radius of 10%, the greedy algorithm returned a set of 5 pipelines. Hence, these 5 sources are enough to cover every other source of the grid within a regret of 10%. In other words:
Whatever source you consider from the grid, you can always find a representative among the 5 selected such that training on this representative gives a test performance almost as satisfying as training directly on the original source, the maximum performance difference being 10%.
Illustration of the covering obtained with a maximum regret radius of 10%
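Below is a minimal sketch of how such a covering can be computed from a regret matrix, assuming a NumPy array `regret[i, j]` holding the regret of training on pipeline `i` and testing on pipeline `j` (the actual implementation and the PE/Regret matrices live in `2-Clustering`; the function and variable names here are ours). At each step, the greedy algorithm adds the pipeline that covers the largest number of still-uncovered sources within the chosen radius.

```python
import numpy as np

def greedy_set_cover(regret: np.ndarray, radius: float = 0.10):
    """Select representative pipelines so that every pipeline of the grid
    has at least one representative within `radius` regret.

    regret[i, j] is assumed to be the regret of training on pipeline i
    and testing on pipeline j (illustrative convention, not necessarily
    the exact one used in 2-Clustering).
    """
    n = regret.shape[0]
    covers = regret <= radius           # covers[i, j]: i represents j within the radius
    uncovered = np.ones(n, dtype=bool)  # sources not yet covered
    representatives = []

    while uncovered.any():
        # Pick the pipeline covering the most still-uncovered sources.
        gains = (covers & uncovered).sum(axis=1)
        best = int(np.argmax(gains))
        if gains[best] == 0:
            # No pipeline can cover the remaining sources within the radius.
            break
        representatives.append(best)
        uncovered &= ~covers[best]

    return representatives

# Example with a random regret matrix (for illustration only):
# rng = np.random.default_rng(0)
# R = rng.uniform(0, 0.3, size=(40, 40)); np.fill_diagonal(R, 0.0)
# print(greedy_set_cover(R, radius=0.10))
```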
@article{giboulot:hal-02631559,
TITLE = {{Effects and Solutions of Cover-Source Mismatch in Image Steganalysis}},
AUTHOR = {Giboulot, Quentin and Cogranne, R{\'e}mi and Borghys, Dirk and Bas, Patrick},
URL = {https://hal-utt.archives-ouvertes.fr/hal-02631559},
JOURNAL = {{Signal Processing: Image Communication}},
PUBLISHER = {{Elsevier}},
VOLUME = {86},
YEAR = {2020},
MONTH = Aug,
DOI = {10.1016/j.image.2020.115888},
KEYWORDS = {Steganography ; Steganalysis ; Cover-Source Mismatch ; Image processing ; Image Heterogeneity},
PDF = {https://hal-utt.archives-ouvertes.fr/hal-02631559/file/ImageCommunication_Final.pdf},
HAL_ID = {hal-02631559},
HAL_VERSION = {v1},
}
@inproceedings{giboulot:hal-03694662,
TITLE = {{The Cover Source Mismatch Problem in Deep-Learning Steganalysis}},
AUTHOR = {Giboulot, Quentin and Bas, Patrick and Cogranne, R{\'e}mi and Borghys, Dirk},
URL = {https://hal-utt.archives-ouvertes.fr/hal-03694662},
BOOKTITLE = {{European Signal Processing Conference}},
ADDRESS = {Belgrade, Serbia},
YEAR = {2022},
MONTH = Aug,
PDF = {https://hal-utt.archives-ouvertes.fr/hal-03694662/file/Giboulot_EUSIPCO_2022.pdf},
HAL_ID = {hal-03694662},
HAL_VERSION = {v1},
}
@inproceedings{cogranne:hal-02147763,
TITLE = {{The ALASKA Steganalysis Challenge: A First Step Towards Steganalysis ``Into The Wild''}},
AUTHOR = {Cogranne, R{\'e}mi and Giboulot, Quentin and Bas, Patrick},
URL = {https://hal.archives-ouvertes.fr/hal-02147763},
BOOKTITLE = {{ACM IH\&MMSec (Information Hiding \& Multimedia Security)}},
ADDRESS = {Paris, France},
SERIES = {ACM IH\&MMSec (Information Hiding \& Multimedia Security)},
YEAR = {2019},
MONTH = Jul,
DOI = {10.1145/3335203.3335726},
KEYWORDS = {steganography ; steganalysis ; contest ; forensics ; Security and privacy},
PDF = {https://hal.archives-ouvertes.fr/hal-02147763/file/ALASKA_lesson_learn_Vsubmitted.pdf},
HAL_ID = {hal-02147763},
HAL_VERSION = {v1},
}
@inproceedings{sepak,
TITLE = {{Formalizing Cover-Source Mismatch as a Robust Optimization}},
AUTHOR = {{\v S}ep{\'a}k, Dominik and Adam, Luk{\'a}{\v s} and Pevn{\'y}, Tom{\'a}{\v s}},
BOOKTITLE = {{EUSIPCO: European Signal Processing Conference}},
ADDRESS = {Belgrade, Serbia},
YEAR = {2022},
MONTH = Sep,
}
@inproceedings{10.1145/3437880.3460395,
TITLE = {{How to Pretrain for Steganalysis}},
AUTHOR = {Butora, Jan and Yousfi, Yassine and Fridrich, Jessica},
YEAR = {2021},
ISBN = {9781450382953},
PUBLISHER = {Association for Computing Machinery},
ADDRESS = {New York, NY, USA},
URL = {https://doi.org/10.1145/3437880.3460395},
DOI = {10.1145/3437880.3460395},
BOOKTITLE = {Proceedings of the 2021 ACM Workshop on Information Hiding and Multimedia Security},
PAGES = {143--148},
NUMPAGES = {6},
KEYWORDS = {steganalysis, imagenet, convolutional neural network, JPEG, transfer learning},
LOCATION = {Virtual Event, Belgium},
SERIES = {IH\&MMSec '21}
}
@inproceedings{abecidan:hal-03840926,
TITLE = {{Using Set Covering to Generate Databases for Holistic Steganalysis}},
AUTHOR = {Abecidan, Rony and Itier, Vincent and Boulanger, J{\'e}r{\'e}mie and Bas, Patrick and Pevn{\'y}, Tom{\'a}{\v s}},
URL = {https://hal.archives-ouvertes.fr/hal-03840926},
BOOKTITLE = {{IEEE International Workshop on Information Forensics and Security (WIFS 2022)}},
ADDRESS = {Shanghai, China},
YEAR = {2022},
MONTH = Dec,
PDF = {https://hal.archives-ouvertes.fr/hal-03840926/file/2022_wifs.pdf},
HAL_ID = {hal-03840926},
HAL_VERSION = {v1},
}
Our experiments were made possible thanks to the computing resources of IDRIS, granted by GENCI under allocation 2021-AD011013285. This work received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No 101021687 (project "UNCOVER") and from the French Defense & Innovation Agency. The work of Tomáš Pevný was supported by the Czech Ministry of Education grant 19-29680L.