Audio ASR Baseline Evaluation:
Deep Speech 2
Table 1 (Results obtained using Armory v0.13.3)
Attack | Targeted | Budget | Benign WER (Undefended) | Adversarial WER (Undefended) | Benign WER (Defended) | Adversarial WER (Defended) | Test Size |
---|---|---|---|---|---|---|---|
Imperceptible ASR | yes | max_iter_1=100 | 0.10 | 0.63 | 0.13 | N/A* | 320 |
Imperceptible ASR | yes | max_iter_1=200 | 0.10 | 0.20 | 0.13 | N/A | 320 |
Imperceptible ASR | yes | max_iter_1=400 | 0.10 | 0.11 | 0.13 | N/A | 320 |
Kenansville | no | snr=20dB | 0.10 | 0.27 | 0.13 | 0.36 | 1000 |
Kenansville | no | snr=30dB | 0.10 | 0.11 | 0.13 | 0.17 | 1000 |
Kenansville | no | snr=40dB | 0.10 | 0.10 | 0.13 | 0.13 | 1000 |
PGD (single channel) | no | snr=20dB | 0.10 | 0.46 | 0.13 | 0.53 | 100 |
PGD (single channel) | no | snr=30dB | 0.10 | 0.46 | 0.13 | 0.50 | 100 |
PGD (single channel) | no | snr=40dB | 0.10 | 0.33 | 0.13 | 0.36 | 100 |
PGD (single channel)* | yes | snr=20dB | 0.11 | 1.03 | 0.15 | 1.01 | 100 |
PGD (single channel)* | yes | snr=30dB | 0.11 | 1.02 | 0.15 | 0.99 | 100 |
PGD (single channel)* | yes | snr=40dB | 0.11 | 0.88 | 0.15 | 0.84 | 100 |
PGD (multiple channels) | no | snr=20dB | 0.13 | 0.96 | N/A | N/A | 100 |
PGD (multiple channels) | no | snr=30dB | 0.13 | 0.59 | N/A | N/A | 100 |
PGD (multiple channels) | no | snr=40dB | 0.13 | 0.38 | N/A | N/A | 100 |
PGD (multiple channels)* | yes | snr=20dB | 0.13 | 0.99 | N/A | N/A | 100 |
PGD (multiple channels)* | yes | snr=30dB | 0.13 | 0.92 | N/A | N/A | 100 |
PGD (multiple channels)* | yes | snr=40dB | 0.13 | 0.75 | N/A | N/A | 100 |
- *Targeted attack, where a random target phrase of similar length as the ground truth, was applied but WER wrt the ground truth was calculated
Find reference baseline configurations here * Missing defended baseline is due to current incompatibility of the attack and defense.
Table 2 (Results are obtained using Armory v0.15.2)
Attack | Targeted | Budget | Attack Parameters | Entailment/Contradiction/Neutral Rates (Benign Undefended) | Number of Entailment/Contradiction/Neutral Rates (Adversarial Undefended) | Entailment/Contradiction/Neutral Rates (Benign Defended) | Entailment/Contradiction/Neutral Rates (Adversarial Defended) | Test Size |
---|---|---|---|---|---|---|---|---|
PGD* | yes | snr=20dB | eps_step=0.05, max_iter=500 | 0.95/0.05/0.00 | 0.01/0.98/0.01 | 0.93/0.07/0.00 | 0.02/0.96/0.02 | 100 |
PGD* | yes | snr=30dB | eps_step=0.03, max_iter=500 | 0.95/0.05/0.00 | 0.04/0.95/0.01 | 0.93/0.07/0.00 | 0.19/0.79/0.02 | 100 |
PGD* | yes | snr=40dB | eps_step=0.01, max_iter=500 | 0.95/0.05/0.00 | 0.43/0.53/0.04 | 0.93/0.07/0.00 | 0.66/0.34/0.00 | 100 |
- *Targeted attack, where contradictory target phrases are generated from ground truth phrases by changing a few key words (e.g., target phrase:
he is a bad person
; ground truth phrase:he is a good person
)
Find reference baseline configurations here
HuBERT
Coming soon