The generation of the AdvSV dataset is divided into two steps: an adversarial attack and an over-the-air attack.
- The adversarial attack operates at the digital level: a victim automatic speaker verification (ASV) model and an attacker (adversarial attack algorithm) are specified to generate adversarial samples (a sketch of this step is shown below).
- The adversarial samples are then replayed and re-recorded (the over-the-air attack) to obtain replay samples.
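As a concrete picture of the digital-level step, here is a minimal targeted PGD sketch in PyTorch. The `asv` model (a waveform-to-embedding network), the cosine-similarity scoring, and the waveform range are assumptions for illustration, not the exact implementation used to build the dataset; the values eps=0.008, alpha=0.0004, and steps=20 mirror the configuration encoded in the folder names below.

```python
import torch
import torch.nn.functional as F

def pgd_attack(asv, enroll_wav, test_wav, eps=0.008, alpha=0.0004, steps=20):
    """Perturb test_wav so that the ASV model scores it as the enrolled speaker."""
    with torch.no_grad():
        enroll_emb = asv(enroll_wav)                # fixed enrollment embedding
    adv = test_wav.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        score = F.cosine_similarity(asv(adv), enroll_emb, dim=-1)
        loss = -score.mean()                        # maximize similarity: targeted attack
        loss.backward()
        with torch.no_grad():
            adv = adv - alpha * adv.grad.sign()                 # signed gradient step
            adv = test_wav + (adv - test_wav).clamp(-eps, eps)  # project into L-inf ball
            adv = adv.clamp(-1.0, 1.0)                          # keep a valid waveform
        adv = adv.detach()
    return adv
```

For Ensemble_PGD, the loss inside the loop would instead average the cosine similarity across several victim models.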
You can listen to some demos on the demo page and check out the paper.
It is known that deep neural networks are vulnerable to adversarial attacks. Although Automatic Speaker Verification (ASV) built on top of deep neural networks exhibits robust performance in controlled scenarios, many studies confirm that ASV is vulnerable to adversarial attacks. The lack of a standard dataset is a bottleneck for further research, especially reproducible research. In this study, we developed an open-source adversarial attack dataset for speaker verification research. As an initial step, we focused on the over-the-air attack. An over-the-air adversarial attack involves a perturbation generation algorithm, a loudspeaker, a microphone, and an acoustic environment. The variations in recording configurations make it very challenging to reproduce previous research. The AdvSV dataset is constructed using the VoxCeleb1 verification test set as its foundation. The dataset employs representative ASV models subjected to adversarial attacks and records the adversarial samples to simulate over-the-air attack settings. The scope of the dataset can be easily extended to include more types of adversarial attacks. The dataset will be released to the public under the CC BY-SA 4.0 license. We also provide a detection baseline for reproducible research.
Utterances | Hours | Adversarial Victim Models | Adversarial Attack Methods | Replay Devices | Record Devices |
---|---|---|---|---|---|
387,160 | 894 | 4 | 2 | 3 | 3 |
Victim Model | Implementation | Reference |
---|---|---|
ECAPA | ECAPATDNN | paper |
RawNet | RawNet3 | paper |
ResNet | ResNetSE34V2 | paper |
XVec | XVector | paper |
Please fill in the form. We'll promptly review and respond. Thank you for your support.
The adversarial samples and over-the-air samples are listed in AdvSV_tag.txt. Each record has five attributes:
File Path, Attack Method, Victim ASV Model, Replay Device, Recording Device
Examples are shown in the table below.
File Path | Attack Method | Victim ASV Model | Replay Device | Recording Device |
---|---|---|---|---|
Adv/Ensemble_PGD/ResNet-ECAPA-RawNet_eps-0.008_alpha-0.0004_steps-20/id10270-5r0dWxy17C8-00001_id10270-8jEAjG6SegY-00012.wav | Ensemble_PGD | ResNet-ECAPA-RawNet | NA | NA |
Adv/PGD/ECAPA_eps-0.008_alpha-0.0004_steps-20/id10309-e-IdJ8a4gy4-00005_id10292-aVmHBUeThTQ-00001.wav | PGD | ECAPA | NA | NA |
OverTheAir/Low/AndroidHigh/Ensemble_PGD/XVec-ResNet-ECAPA_eps-0.008_alpha-0.0004_steps-20/id10292-gm6PJowclv0-00009_id10273-8cfyJEV7hP8-00019.wav | Ensemble_PGD | XVec-ResNet-ECAPA | Low | AndroidHigh |
OverTheAir/Low/AndroidHigh/PGD/XVec_eps-0.008_alpha-0.0004_steps-20/id10307-120gjdqGWNQ-00004_id10292-3kzw8lTcUBU-00015.wav | PGD | XVec | Low | AndroidHigh |
The file name consists of an enrollment sample and an evaluation sample. For example, in id10307-120gjdqGWNQ-00004_id10292-3kzw8lTcUBU-00015.wav, id10307/120gjdqGWNQ/00004.wav (A) is the enrollment sample and id10292/3kzw8lTcUBU/00015.wav (B) is the evaluation sample; the two come from different speakers (id10307 vs. id10292). The adversarial attack perturbs B so that the ASV model judges A and B to be the same speaker. A parsing sketch is given below.
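A minimal parsing sketch in Python (the helper name is ours, not part of the dataset tooling). Note that YouTube video IDs can themselves contain '-' and '_' (e.g. e-IdJ8a4gy4 and ize_eiCFEg0 elsewhere in this README), so the regex anchors on the idXXXXX speaker IDs and the five-digit segment indices rather than splitting naively:

```python
import re

# Speaker IDs look like id10307 and segment indices are five digits; video IDs
# may contain '-' and '_', so anchor the match on the former (non-greedy in between).
PAIR_RE = re.compile(r"^(id\d{5})-(.+?)-(\d{5})_(id\d{5})-(.+?)-(\d{5})\.wav$")

def parse_pair_name(wav_name: str):
    spk_a, vid_a, seg_a, spk_b, vid_b, seg_b = PAIR_RE.match(wav_name).groups()
    enroll = f"{spk_a}/{vid_a}/{seg_a}.wav"       # enrollment sample (A)
    evaluation = f"{spk_b}/{vid_b}/{seg_b}.wav"   # evaluation sample (B)
    return enroll, evaluation

print(parse_pair_name("id10307-120gjdqGWNQ-00004_id10292-3kzw8lTcUBU-00015.wav"))
# ('id10307/120gjdqGWNQ/00004.wav', 'id10292/3kzw8lTcUBU/00015.wav')
```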
The folder hierarchy is shown below.
- Adversarial attack samples and over-the-air samples are divided into two top-level folders: Adv and OverTheAir.
- Adv: divided into PGD and Ensemble_PGD; each subfolder name identifies the attacked speaker verification model as well as the PGD parameters.
- OverTheAir: the replay device is identified by High, Low, or Medium, and the recording device by AndroidHigh, AndroidLow, or iOS.
- Note that we also provide replay samples that have not been subjected to an adversarial attack; these are stored in the Raw folders.
|-- Adv
| |-- PGD
| | |-- ECAPA_eps-0.008_alpha-0.0004_steps-20
| | |-- RawNet_eps-0.008_alpha-0.0004_steps-20
| | |-- XVec_eps-0.008_alpha-0.0004_steps-20
| | |-- ResNet_eps-0.008_alpha-0.0004_steps-20
| |-- Ensemble_PGD
| | |-- ResNet-ECAPA-RawNet_eps-0.008_alpha-0.0004_steps-20
| | |-- XVec-ECAPA-RawNet_eps-0.008_alpha-0.0004_steps-20
| | |-- XVec-ResNet-ECAPA_eps-0.008_alpha-0.0004_steps-20
| | |-- XVec-ResNet-RawNet_eps-0.008_alpha-0.0004_steps-20
|-- OverTheAir
| |-- High
| | |-- AndroidHigh
| | | |-- Raw
| | | | |-- id00012
| | | | |-- ...
| | | |-- PGD
| | | | |-- ...
| | | |-- Ensemble_PGD
| | | | |-- ...
| | |-- AndroidLow
| | | |-- ...
| | |-- iOS
| | | |-- ...
| |-- Low
| | |-- ...
| |-- Medium
| | |-- ...
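Given this hierarchy, the five attributes in AdvSV_tag.txt can be recovered from a relative file path. A minimal sketch, assuming paths follow the layout above (the helper name and return format are ours):

```python
def tag_from_path(path: str):
    parts = path.split("/")
    if parts[0] == "Adv":
        # Adv/<attack>/<victim>_eps-..._alpha-..._steps-.../<pair>.wav
        attack, config = parts[1], parts[2]
        replay_dev = record_dev = "NA"
    elif parts[0] == "OverTheAir":
        # OverTheAir/<replay>/<record>/<attack>/<config>/<pair>.wav
        replay_dev, record_dev, attack = parts[1], parts[2], parts[3]
        if attack == "Raw":              # replayed sample without adversarial attack
            return path, "Raw", "NA", replay_dev, record_dev
        config = parts[4]
    else:
        raise ValueError(f"unexpected root folder in {path!r}")
    victim = config.split("_eps-")[0]    # e.g. "XVec-ResNet-ECAPA"
    return path, attack, victim, replay_dev, record_dev

print(tag_from_path(
    "OverTheAir/Low/AndroidHigh/PGD/XVec_eps-0.008_alpha-0.0004_steps-20/"
    "id10307-120gjdqGWNQ-00004_id10292-3kzw8lTcUBU-00015.wav")[1:])
# ('PGD', 'XVec', 'Low', 'AndroidHigh')
```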
If you want to follow this data split, please download the VoxCeleb1 dataset first.
The bonafide and spoof samples are listed in bonafide.txt and spoof.txt, respectively. We provide splits for the training, development, and evaluation sets.
Examples are shown in the table below.
bonafide.txt |
---|
id10533/gWHHxedxtUA/00005.wav train |
id11037/FKV4YA7_-YQ/00006.wav dev |
id10030/DSrDNGJrN5U/00002.wav eval |
spoof.txt |
---|
OverTheAir/Low/iOS/PGD/ResNet_eps-0.008_alpha-0.0004_steps-20/id10283-h87Y8nir1o0-00007_id10300-ize_eiCFEg0-00005.wav train |
Adv/Ensemble_PGD/XVec-ResNet-ECAPA_eps-0.008_alpha-0.0004_steps-20/id10298-hjvQiiG71rM-00026_id10285-uArtiTSTnSU-00015.wav train |
OverTheAir/Low/iOS/Ensemble_PGD/XVec-ResNet-ECAPA_eps-0.008_alpha-0.0004_steps-20/id10292-3kzw8lTcUBU-00005_id10307-IASj5B-pAyM-00002.wav dev |
OverTheAir/High/AndroidHigh/PGD/RawNet_eps-0.008_alpha-0.0004_steps-20/id10272-olePnztkm6U-00012_id10292-ENIHEvg_VLM-00015.wav eval |
All bonafide data is derived from VoxCeleb1.
To test performance on out-of-domain data, the spoof samples related to the RawNet model (RawNet, ResNet-ECAPA-RawNet, XVec-ECAPA-RawNet, XVec-ResNet-RawNet), the Medium replay device, and the AndroidHigh recording device are all unknown in the training phase, i.e., they appear in neither the training set nor the development set. A sketch of this constraint is shown below.
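A minimal check of the out-of-domain constraint, assuming spoof.txt lines have the form `<path> <split>` as in the table above (the helper is ours):

```python
def is_out_of_domain(path: str) -> bool:
    # RawNet-related victims, the Medium replay device, and the AndroidHigh
    # recording device are reserved for evaluation only.
    return "RawNet" in path or "/Medium/" in path or "/AndroidHigh/" in path

with open("spoof.txt") as f:
    for line in f:
        if not line.strip():
            continue
        path, split = line.split()
        assert not (is_out_of_domain(path) and split != "eval"), path
```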
The number of samples in each dataset split is shown in the table below.
 | train | dev | eval | total |
---|---|---|---|---|
spoof | 84,976 | 10,622 | 291,562 | 387,160 |
bonafide | 15,351 | 15,352 | 122,813 | 153,516 |
total | 100,327 | 25,974 | 414,375 | 540,676 |
The AdvSV dataset is released under the Creative Commons Attribution-ShareAlike 4.0 International License. This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. Detailed terms can be found in LICENSE. If you have any questions, please contact us via e-mail: liwang1@link.cuhk.edu.cn, cc wuzhizheng@cuhk.edu.cn.
@misc{wang2023advsv,
title={AdvSV: An Over-the-Air Adversarial Attack Dataset for Speaker Verification},
author={Li Wang and Jiaqi Li and Yuhao Luo and Jiahao Zheng and Lei Wang and Hao Li and Ke Xu and Chengfang Fang and Jie Shi and Zhizheng Wu},
year={2023},
eprint={2310.05369},
archivePrefix={arXiv},
primaryClass={cs.SD}
}
The base dataset of AdvSV is sampled from the VoxCeleb1 verification test set, which has 37,720 trials, each consisting of an enrollment sample, a test sample, and a label (0 for different speakers, 1 for the same speaker).
Because the subsequent replay recording imposes a considerable burden, the dataset (37,720 trials) is downsampled. Two downsampling principles apply:
- Preserve the original speaker distribution, so that a shifted distribution does not affect the attack results.
- Ensure consistent SV performance between the subset and the full dataset.
To implement these principles, the enrollment and test speaker IDs of each entry are concatenated into a pair ID, e.g. 0 id10270/XXXX/00001.wav id10284/XXXX/00029.wav becomes id10270-id10284.
Then the entries under each pair ID are randomly downsampled to 25%, preserving a quarter of the data; notably, when downsampling would leave 0 samples, one sample is retained. The final 9,083 retained samples are recorded in veri_test_25.txt (a sketch of the procedure is shown below). Currently, to reduce the burden of replay recording, we only record over-the-air results for targeted attacks, i.e., data pairs labeled as "different speakers", where the goal of the attack is to make the speaker verification model recognize them as the "same speaker".
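A minimal sketch of the downsampling, assuming the original trial list veri_test.txt with lines of the form `<label> <enroll> <test>`; the random seed and the flooring to a quarter are illustrative, and the authoritative result is the released veri_test_25.txt:

```python
import random
from collections import defaultdict

random.seed(0)                                   # illustrative; the actual seed is not specified
groups = defaultdict(list)
with open("veri_test.txt") as f:                 # original VoxCeleb1 trial list
    for line in f:
        label, enroll, test = line.split()
        pair_id = enroll.split("/")[0] + "-" + test.split("/")[0]  # e.g. id10270-id10284
        groups[pair_id].append(line)

subset = []
for lines in groups.values():
    k = max(1, len(lines) // 4)                  # keep a quarter, at least one sample
    subset.extend(random.sample(lines, k))

with open("veri_test_25.txt", "w") as f:
    f.writelines(subset)
```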
veri_test_25.txt is the list of downsampled trials. To allow measuring the EER metric for automatic speaker verification, same-speaker trials are retained; only different-speaker trials are attacked during the adversarial attacks. For reference, a minimal EER sketch follows.
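This is a numpy-only sketch over arrays of ASV scores and 0/1 trial labels; the helper is ours, not part of the release:

```python
import numpy as np

def compute_eer(scores: np.ndarray, labels: np.ndarray) -> float:
    """labels: 1 for same-speaker (target) trials, 0 for different-speaker trials."""
    order = np.argsort(scores)                   # sweep thresholds in ascending order
    labels = labels[order].astype(float)
    n_tar = labels.sum()
    n_non = len(labels) - n_tar
    frr = np.cumsum(labels) / n_tar                  # target trials rejected
    far = (n_non - np.cumsum(1 - labels)) / n_non    # nontarget trials accepted
    i = np.argmin(np.abs(far - frr))                 # where the two rates cross
    return float((far[i] + frr[i]) / 2)
```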
Inside the Adv folder, information about each adversarial sample is recorded in attackResult.txt. Each record has six attributes:
Enrollment File, Adversarial File, Is Attack Success, Original Label, Cosine Similarity, Average Perturbation
Examples are shown in the table below.
Enrollment File | Adversarial File | Is Attack Success | Original Label | Cosine Similarity | Average Perturbation |
---|---|---|---|---|---|
id10270-8jEAjG6SegY-00035 | id10270-8jEAjG6SegY-00035_id10270-5r0dWxy17C8-00021 | True | 1 | -0.496655136346817 | 0.0065907384268939495 |
id10270-5r0dWxy17C8-00024 | id10270-5r0dWxy17C8-00024_id10270-OhfKF8FSq3Y-00005 | True | 1 | -0.5840052366256714 | 0.006569686811417341 |
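A minimal sketch of reading such a record, assuming whitespace-separated fields in the order listed above (the delimiter and types are assumptions about the file format, and the record class is ours):

```python
from typing import NamedTuple

class AttackRecord(NamedTuple):
    enrollment_file: str
    adversarial_file: str
    is_attack_success: bool
    original_label: int
    cosine_similarity: float
    average_perturbation: float

def parse_attack_record(line: str) -> AttackRecord:
    enroll, adv, success, label, cos, pert = line.split()
    return AttackRecord(enroll, adv, success == "True",
                        int(label), float(cos), float(pert))
```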