Scenarios

Armory is intended to evaluate threat-model scenarios. Baseline evaluation scenarios are described below. Additionally, we've provided some academic standard scenarios.

Configuration Files

Scenario configuration files are found in the scenario_configs directory here. The most recent config files are found in the eval6 subfolder and older configs are found in the eval5 and eval1-4 subfolders. There are also symlinks to representative configs found in the base of the scenario_configs directory.

Base Scenario Class

All scenarios inherit from the Scenario class. This class parses an armory configuration file and calls its evaluate method to perform all of the computation for a given threat model's robustness to attack. Each evaluate method records a dictionary of metrics, which is saved to the armory output_dir upon completion. Scenarios are implemented as subclasses of Scenario and are typically given their own file in the Scenarios Directory.

Of particular note is the Poison class, from which all poisoning scenarios are subclassed. More information on poisoning scenarios is documented here.

User Initialization

When adding custom metrics or instrumentation meters to a scenario, it may be necessary to initialize or perform user-specific operations before loading. This can also be helpful for other goals, such as fine-grained control over random initializations, instantiating external integrations (e.g., TensorBoard), or setting things like environment variables. For this purpose, there is a user_init method that is called at the beginning of load (but after scenario initialization). In poisoning, this occurs right after random seed setting in load (to enable the user to easily override random initialization).

This uses the underlying scenario config field of the same name, user_init. See configuration for the JSON specification. An example config would be as follows:

    ...
    "user_init": {
        "module": "import.path.to.my_module",
        "name": "my_init_function",
        "kwargs": {
             "case": 1,
             "print_stuff": false
        }
    }
}

This would essentially do the following before loading anything else in the scenario:

import import.path.to.my_module as my_module
my_module.my_init_function(case=1, print_stuff=False)

If name were "" or None, then it would only do the import:

import import.path.to.my_module
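
Under the hood, this kind of config entry can be resolved with standard dynamic imports. The snippet below is a minimal sketch of that resolution logic (not armory's actual implementation), assuming the user_init entry has already been parsed into a dictionary:

import importlib

def run_user_init(user_init_config):
    # Import the module and, if a function name is given, call it with the kwargs.
    module = importlib.import_module(user_init_config["module"])
    name = user_init_config.get("name")
    if name:  # "" or None means "import only"
        getattr(module, name)(**user_init_config.get("kwargs", {}))

# Example usage mirroring the config above (the module path is a placeholder):
# run_user_init({"module": "import.path.to.my_module", "name": "my_init_function",
#                "kwargs": {"case": 1, "print_stuff": False}})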

This could be helpful for a variety of things, such as registering metrics prior to loading or setting up custom meters. For instance:

def my_init_function():
    from armory.instrument import Meter, get_hub
    from armory import metrics

    m = Meter(
        "chi_squared_test",              # name under which results are recorded
        metrics.get("chi2_p_value"),     # metric function applied to the measured value
        "my_model.contingency_table",    # probe variable to measure ("<probe name>.<variable>")
    )
    get_hub().connect_meter(m)

This would enable measurement of a contingency table produced by your model. It requires adding probe points in your model code to connect it (these do not need to be in the init block), e.g.:

import numpy as np
import torch

from armory.instrument import get_probe

probe = get_probe("my_model")

class MyModel(torch.nn.Module):
    ...
    def forward(self, x):
        ...
        table = np.array([[2, 3], [4, 6]])
        probe.update(contingency_table=table)  # publish the value the meter above consumes
        ...

Baseline Scenarios

Currently, the following scenarios are available within the armory package. Some scenario files are tied to a specific attack, others are customized for a given dataset, and several are more general-purpose. Along with each scenario description, we provide a link to a page with baseline results for applicable datasets and attacks. More information about each referenced dataset can be found in the datasets document.

Audio ASR (Updated June 2022)

  • Description: In this scenario, the system under evaluation is an automatic speech recognition system.
  • Dataset:
  • Armory includes one dataset suited for ASR:
    • LibriSpeech
  • Baseline Models: Armory includes two audio models:
  • DeepSpeech 2 with pretrained weights from the AN4, LibriSpeech, or TEDLIUM datasets.
    Custom weights may also be loaded by the model. Deprecated: will be removed in version 0.17.0
  • HuBERT Large from torchaudio
  • Threat Scenario:
  • Adversary objectives:
    • Untargeted - an adversary may simply wish for speech to be transcribed incorrectly
    • Targeted - an adversary may wish for specific strings to be predicted
    • Contradiction - an adversary may wish to transcribe a specific string with a meaning contrary to the original, albeit with a low word error rate
  • Adversary Operating Environment:
    • Non-real time, digital evasion attack.
    • Under some threat models, the channel model consists of only a single perfect acoustic channel; under others, it may include one additional multipath channel.
  • Metrics of Interest:
  • Primary metrics:
    • Word error rate, SNR, entailment rate (a minimal word error rate sketch follows this list)
  • Derivative metrics - see end of document
  • Baseline Attacks:
  • Imperceptible ASR attack
  • PGD
  • Kenansville attack
  • Baseline Defense: MP3 Compression
  • Baseline Evaluations:
  • LibriSpeech results
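
Word error rate is the word-level edit distance between the reference transcript and the predicted transcript, normalized by the reference length. The function below is an illustrative sketch, not armory's own metric implementation:

def word_error_rate(reference, hypothesis):
    # Word-level Levenshtein distance divided by the reference length.
    ref, hyp = reference.split(), hypothesis.split()
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / max(len(ref), 1)

print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))  # 1 substitution / 6 words ~ 0.17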

Audio Classification (Updated June 2020)

  • Description: In this scenario, the system under evaluation is a speaker identification system.
  • Dataset:
  • Armory includes one dataset suited for Audio Classification:
    • LibriSpeech
  • Baseline Model:
  • Armory includes two baseline speaker classification models:
    • SincNet, a scratch-trained model based on raw audio
    • A scratch-trained model based on spectrogram input (not mel-cepstrum or MFCC)
  • Threat Scenario:
  • Adversary objectives:
    • Untargeted - an adversary may simply wish to evade detection
    • Targeted - an adversary may wish to impersonate someone else
  • Adversary Operating Environment:
    • Non-real time, digital evasion attack
    • Assuming perfect acoustic channel
    • Black-box, white-box, and adaptive attacks
  • Metrics of Interest:
  • Primary metrics:
    • Accuracy (mean, per-class), attack computational cost, defense computational cost, various distance measures of perturbation (Lp-norms, Wasserstein distance, signal-to-noise ratio); a minimal SNR sketch follows this list
  • Derivative metrics - see end of document
  • Baseline Evaluations:
  • LibriSpeech results
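
For the perturbation distance measures on audio, the signal-to-noise ratio compares the power of the clean waveform to the power of the adversarial perturbation. A minimal sketch (illustrative only, not armory's metric code):

import numpy as np

def snr_db(clean, adversarial):
    # SNR in decibels between a clean waveform and its adversarial counterpart.
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(adversarial, dtype=np.float64) - clean
    signal_power = np.sum(clean ** 2)
    noise_power = np.sum(noise ** 2)
    if noise_power == 0:
        return np.inf  # identical signals
    return 10 * np.log10(signal_power / noise_power)

x = np.sin(np.linspace(0, 2 * np.pi, 16000))   # toy clean signal
x_adv = x + 0.01 * np.random.randn(16000)      # toy perturbed signal
print(snr_db(x, x_adv))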

CARLA Multi-Object Tracking (MOT) (Updated October 2022)

  • Description: In this scenario specific to the CARLA multi-object tracking dataset, the system under evaluation is an object tracker trained to track multiple pedestrians in video in an urban environment.
  • Dataset: The development dataset is the CARLA Multi-Object Tracking dataset, with videos containing a green-screen in all frames intended for adversarial patch insertion. The dataset contains natural lighting metadata that allow digital, adaptive patches to be inserted and rendered into the scene much as if they were physically printed.
  • Baseline Model:
  • Pretrained ByteTrack model with a Faster-RCNN base instead of YOLO.
  • Threat Scenario:
  • Adversary objectives:
    • To degrade the performance of the tracker through the insertion of adversarial patches.
  • Adversary Operating Environment:
    • Non-real time, physical-like patch attacks
  • Adversary Capabilities and Resources
    • Patches of different size/shape as dictated by the green-screen in the frames. The adversary is expected to apply a patch with constant texture across all frames in the video, but the patch position relative to the sensor may change due to sensor motion.
  • Metrics of Interest:
  • Primary metrics are HOTA-based (quotes taken from the HOTA paper) and computed using the TrackEval implementation; a small sketch of how these scores combine follows this list.
    • mean DetA - "detection accuracy, DetA, is simply the percentage of aligning detections"
    • mean AssA - "association accuracy, AssA, is simply the average alignment between matched trajectories, averaged over all detections"
    • mean HOTA - "final HOTA score is the geometric mean of these two scores averaged over different localisation thresholds"
  • Baseline Attacks:
  • Custom Robust DPatch with Non-differentiable, Input-Dependent Transformation
  • Custom Adversarial Patch with Differentiable, Input-Dependent Transformation
  • Baseline Defense: JPEG Frame Compression
  • Baseline Evaluation: Carla MOT results
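
As the quoted definitions indicate, HOTA at a single localisation threshold is the geometric mean of detection accuracy (DetA) and association accuracy (AssA), and the reported score averages over thresholds. The toy illustration below uses made-up per-threshold values; the real computation is performed by TrackEval:

import numpy as np

# Hypothetical DetA and AssA values at a handful of localisation thresholds
det_a = np.array([0.62, 0.58, 0.51, 0.44, 0.30])
ass_a = np.array([0.70, 0.68, 0.63, 0.55, 0.41])

hota_per_threshold = np.sqrt(det_a * ass_a)   # geometric mean at each threshold
hota = hota_per_threshold.mean()              # average over localisation thresholds
print(hota)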

CARLA Multimodal Object Detection (Updated October 2022)

  • Description: In this scenario, the system under evaluation is an object detector trained to identify common objects in an urban environment. This scenario handles multimodal data (RGB/depth).
  • Datasets: The datasets are the CARLA Object Detection and Overhead Object Detection datasets. These datasets contain natural lighting metadata that allow digital, adaptive patches to be inserted and rendered into the scene much as if they were physically printed.
  • Baseline Model:
  • Single-modality:
  • Multimodal:
  • Threat Scenario:
  • Adversary objectives:
    • To degrade the performance of an object detector through the insertion of adversarial patches.
  • Adversary Operating Environment:
    • Non-real time, physical-like patch attacks
  • Adversary Capabilities and Resources
    • Patches of different size/shape as dictated by the green-screen in each image. In the multimodal case, both RGB and depth channels are to be perturbed.
  • Metrics of Interest:
  • Primary metrics:
    • mAP
    • Disappearance rate
    • Hallucinations per image
    • Misclassification rate
    • True positive rate
  • Baseline Attacks:
  • Custom Robust DPatch with Non-differentiable, Input-Dependent Transformation
  • Custom Adversarial Patch with Differentiable, Input-Dependent Transformation
  • Baseline Defense: JPEG Compression
  • Baseline Evaluations:
  • Street-level dataset
  • Overhead dataset

CARLA Video Tracking (Updated July 2022)

  • Description: In this scenario, the system under evaluation is an object tracker trained to localize a single moving pedestrian.
  • Dataset: The development dataset is the CARLA Video Tracking dataset, which includes 20 videos, each of which contains a green-screen in all frames intended for adversarial patch insertion. The dataset contains natural lighting metadata that allow digital, adaptive patches to be inserted and rendered into the scene much as if they were physically printed.
  • Baseline Model:
  • Pretrained GoTurn model.
  • Threat Scenario:
  • Adversary objectives:
    • To degrade the performance of the tracker through the insertion of adversarial patches.
  • Adversary Operating Environment:
    • Non-real time, physical-like patch attacks
  • Adversary Capabilities and Resources
    • Patches of different size/shape as dictated by the green-screen in the frames. The adversary is expected to apply a patch with constant texture across all frames in the video, but the patch position relative to the sensor may change due to sensor motion.
  • Metrics of Interest:
  • Primary metrics:
    • mean IOU
    • mean success rate (success rates are calculated at multiple IOU thresholds and averaged; see the sketch after this list)
  • Baseline Attacks:
  • Custom Adversarial Texture with Input-Dependent Transformation
  • Baseline Defense: Video Compression
  • Baseline Evaluation: CARLA video tracking results
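
Both metrics start from the per-frame intersection-over-union (IOU) between the predicted and ground-truth boxes. A minimal sketch of IOU and a thresholded success rate (illustrative only, not armory's metric code):

import numpy as np

def iou(box_a, box_b):
    # Intersection over union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def mean_success_rate(frame_ious, thresholds=np.arange(0.0, 1.0, 0.05)):
    # Fraction of frames whose IOU clears each threshold, averaged over thresholds.
    frame_ious = np.asarray(frame_ious)
    return float(np.mean([(frame_ious >= t).mean() for t in thresholds]))

frame_ious = [iou((10, 10, 50, 50), (12, 14, 48, 52)),
              iou((10, 10, 50, 50), (30, 30, 80, 80))]
print(np.mean(frame_ious), mean_success_rate(frame_ious))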

Dapricot Object Detection (Updated July 2021)

  • Description: In this scenario, the system under evaluation is an object detector trained to identify the classes in the Microsoft COCO dataset.
  • Dataset: The dataset is the Dynamic APRICOT (DAPRICOT) dataset 1 and dataset 2. It is similar to the APRICOT dataset (see below), but instead of pre-generated physical patches photographed in the natural environment, the DAPRICOT dataset contains greenscreens and natural lighting metadata that allow digital, adaptive patches to be inserted and rendered into the scene much as if they were physically printed. This dataset contains 15 scenes, where each scene contains 3 different greenscreen shapes, taken at 3 different distances, 3 different heights, and 3 different camera angles, for a total of over 1000 images.
  • Baseline Model: The model uses the pretrained Faster-RCNN with ResNet-50 model.
  • Threat Scenario:
  • Adversary objectives:
    • Targeted attack - objective is to force an object detector to localize and classify the patch as an MSCOCO object.
  • Adversary Operating Environment:
    • Non-real time, digital and physical-like patch attacks
  • Adversary Capabilities and Resources
    • Patches of different shapes as dictated by the greenscreen sizes in the images
  • Metrics of Interest:
  • Primary metrics:
    • Average precision (mean, per-class) of patches, Average target success
  • Baseline Attacks:
  • Masked PGD
  • Robust DPatch
  • Baseline Defense: JPEG Compression
  • Baseline Evaluation: Dapricot results

Image Classification

  • Description: This scenario implements attacks against a basic image classification task.
  • Dataset:
  • Armory includes several image classification datasets.
    • Resisc-45. It comprises 45 classes and 700 images for each class. Images 1-500 of each class are in the training split, 501-600 are in the validation split, and 601-700 are in the test split.
    • MNIST
    • Cifar10
  • Baseline Models:
  • Armory includes the following baseline image classification models:
    • Resisc-45: ImageNet-pretrained DenseNet-121 that is fine-tuned on RESISC-45.
    • MNIST: basic CNN
    • Cifar10: basic CNN
  • Threat Scenario:
  • Adversary objectives:
    • Untargeted - an adversary may simply wish to induce an arbitrary misclassification
    • Targeted - an adversary may wish to force misclassification to a particular class
  • Adversary Operating Environment:
    • Non real-time, digital evasion attack
    • Black-box, white-box, and adaptive attacks
  • Metrics of Interest:
  • Primary metrics:
    • Accuracy (mean, per-class), attack computational cost, defense computational cost, various distance measures of perturbation (Lp-norms, Wasserstein distance); a minimal Lp-norm sketch follows this list
  • Derivative metrics - see end of document
  • Baseline Defenses:
  • JPEG Compression
  • Baseline Evaluations:
  • Resisc-45 results
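
The Lp-norm distance measures simply quantify the size of the perturbation x_adv - x under different norms. A minimal sketch (illustrative only, not armory's metric code):

import numpy as np

def perturbation_norms(x, x_adv):
    # L0, L1, L2, and Linf norms of the flattened perturbation.
    delta = (np.asarray(x_adv, dtype=np.float64) - np.asarray(x, dtype=np.float64)).ravel()
    return {
        "l0": int(np.count_nonzero(delta)),
        "l1": float(np.sum(np.abs(delta))),
        "l2": float(np.sqrt(np.sum(delta ** 2))),
        "linf": float(np.max(np.abs(delta))),
    }

x = np.zeros((32, 32, 3))
x_adv = x.copy()
x_adv[0, 0, 0] = 8 / 255   # toy single-pixel perturbation
print(perturbation_norms(x, x_adv))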

Multimodal So2Sat Image Classification (Updated July 2021)

  • Description: In this scenario, the system under evaluation is an image classifier which determines local climate zone from a combination of co-registered synthetic aperture radar (SAR) and multispectral electro-optical (EO) images. This Image Classification task gets its own scenario due to the unique features of the dataset.
  • Dataset: The dataset is the so2sat dataset. It comprises 352k/24k images in train/validation datasets and 17 classes of local climate zones.
  • Baseline Model:
  • Armory includes a custom CNN as a baseline model. It has a single input that stacks SAR (first four channels only, representing the real and imaginary components of the reflected electromagnetic waves) and EO (all ten channels) data. Immediately after the input layer, the data is split into SAR and EO data streams and fed into their respective feature extraction networks. In the final layer, the two networks are fused to produce a single prediction output. A simplified sketch of this two-stream layout follows this list.
  • Threat Scenario:
  • Adversary objectives:
    • Untargeted - an adversary wishes to evade correct classification
  • Adversary Operating Environment:
    • Non-real time, digital evasion attack
    • Adversary perturbs a single modality (SAR or EO)
  • Metrics of Interest:
  • Primary metrics:
    • Accuracy (mean, per-class), Patch size
  • Derivative metrics - see end of document
  • Baseline Attacks:
  • Masked PGD
  • Baseline Defense: JPEG Compression for Multi-Channel
  • Baseline Evaluation: So2Sat results
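
The sketch below mirrors the two-stream layout described above in simplified form: a 14-channel input is split into SAR (4 channels) and EO (10 channels) streams, each stream has its own small feature extractor, and the features are fused for a single 17-class prediction. Layer sizes are illustrative and do not reproduce the baseline model's actual architecture:

import torch
import torch.nn as nn

class TwoStreamCNN(nn.Module):
    # Simplified two-stream CNN: split the stacked SAR+EO input, extract per-modality
    # features, then fuse them for a single local-climate-zone prediction.
    def __init__(self, num_classes=17):
        super().__init__()
        def stream(in_channels):
            return nn.Sequential(
                nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.sar_stream = stream(4)    # real/imaginary SAR channels
        self.eo_stream = stream(10)    # multispectral EO channels
        self.fusion = nn.Linear(64 + 64, num_classes)

    def forward(self, x):              # x: (N, 14, H, W) stacked SAR + EO
        sar, eo = x[:, :4], x[:, 4:]   # split immediately after the input layer
        features = torch.cat([self.sar_stream(sar), self.eo_stream(eo)], dim=1)
        return self.fusion(features)

print(TwoStreamCNN()(torch.randn(2, 14, 32, 32)).shape)  # torch.Size([2, 17])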

Object Detection

  • Description: In this scenario, the system under evaluation is an object detector.
  • Datasets:
  • Armory includes two datasets for object detection (besides CARLA object detection which has its own scenario):
    • xView comprises 59k/19k train and test images (each with dimensions 300x300, 400x400 or 500x500) and 62 classes
    • APRICOT, which includes over 1000 natural images with physically-printed adversarial patches, with ten MS-COCO classes as targets
  • Baseline Models:
  • Faster-RCNN ResNet-50 FPN, pre-trained on MSCOCO objects and fine-tuned on xView, can be used for xView
  • Faster-RCNN with ResNet-50, SSD with MobileNet, and RetinaNet models, pretrained, can be used for APRICOT
  • Threat Scenario:
  • Adversary objectives:
    • Untargeted - an adversary wishes to disable object detection
  • Adversary Operating Environment:
    • Non-real time, digital and physical-like evasion attacks and translation.
  • Note: the APRICOT dataset consists of adversarial images precomputed for a targeted attack.
  • Metrics of Interest:
  • Primary metrics:
    • Average precision (mean, per-class) of ground truth classes, Patch Size
    • TIDE OD metrics
  • Derivative metrics - see end of document
  • Baseline Attacks:
  • Masked PGD
  • Robust DPatch
  • The patches for APRICOT were generated using variants of ShapeShifter
  • Baseline Defense: JPEG Compression
  • Baseline Evaluations:
  • xView results
  • APRICOT results

UCF101 Video Classification

  • Description: In this scenario, the system under evaluation is a video action recognition system.
  • Datasets: Armory includes the following video classification datasets:
  • UCF101 dataset, which comprises 101 actions and 13,320 total videos. For the training/testing split, we use the official Split 01.
  • Baseline Model: Armory includes a model for UCF101 that uses the MARS architecture, which is a single-stream (RGB) 3D convolution architecture that simultaneously mimics the optical flow stream. The provided model is pre-trained on the Kinetics dataset and fine-tuned on UCF101.
  • Threat Scenario:
  • Adversary objectives:
    • Untargeted - an adversary may simply wish to evade detection
  • Adversary Operating Environment:
    • Non-real time, digital evasion attack
  • Metrics of Interest:
  • Primary metrics:
    • Accuracy (mean, per-class), attack budget
  • Derivative metrics - see end of document
  • Baseline Attacks:
  • Frame Saliency
  • Masked PGD
  • Flicker Attack
  • Custom Frame Border attack
  • Baseline Defense: Video Compression
  • Baseline Evaluations:
  • UCF101 results

Poisoning

For a complete overview of the poisoning scenarios, threat models, attacks, and metrics, see the poisoning doc. Here, we will briefly summarize each scenario and link to the baseline results.

Poison base scenario (DLBD)

  • Description: The base scenario implements a Dirty-label Backdoor attack (DLBD). In this scenario, the attacker is able to poison a percentage of the training data by adding backdoor triggers and flipping the labels of data examples. Then, the attacker adds the same trigger to test images to cue the desired misclassification (a toy sketch of this poisoning step follows this list). For a complete overview, see the poisoning doc.
  • Datasets: Datasets for DLBD include but are not limited to:
  • GTSRB
  • Audio Speech Commands
  • Resisc-10
  • Cifar10
  • Baseline Models: Armory includes several models which may be used for this scenario:
  • GTSRB micronnet
  • Audio resnet
  • Resnet18 can be used for Cifar10 or Resisc-10
  • Threat Scenario:
  • Adversary objectives:
    • Targeted misclassification
  • Adversary Operating Environment:
    • Digital dirty label poisoning attack
  • Metrics of Interest: See the poisoning doc for a full description of these metrics.
  • accuracy_on_benign_test_data_all_classes
  • accuracy_on_benign_test_data_source_class
  • accuracy_on_poisoned_test_data_all_classes
  • attack_success_rate
  • Model Bias fairness metric
  • Filter Bias fairness metric
  • Baseline Defenses:
  • Activation Clustering
  • Spectral Signatures
  • DP-InstaHide
  • Random Filter
  • Perfect Filter
  • Baseline Evaluations:
  • GTSRB DLBD
  • Resisc DLBD
  • Audio
  • Cifar10
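
The core data manipulation in a dirty-label backdoor attack is straightforward: stamp a trigger onto a fraction of the training images and relabel those images as the target class. The sketch below is a toy illustration of that step, not armory's attack implementation:

import numpy as np

def dirty_label_poison(x, y, target_class, fraction=0.1, trigger_value=1.0, seed=None):
    # Stamp a small trigger patch on a random fraction of images and flip their labels.
    rng = np.random.default_rng(seed)
    x_poisoned, y_poisoned = x.copy(), y.copy()
    idx = rng.choice(len(x), size=int(fraction * len(x)), replace=False)
    x_poisoned[idx, -4:, -4:, :] = trigger_value   # 4x4 trigger in the bottom-right corner
    y_poisoned[idx] = target_class                 # dirty labels: flip to the target class
    return x_poisoned, y_poisoned, idx

x = np.random.rand(100, 32, 32, 3).astype(np.float32)
y = np.random.randint(0, 10, size=100)
x_poisoned, y_poisoned, poisoned_idx = dirty_label_poison(x, y, target_class=0, seed=0)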

Poisoning CLBD

  • Description: This scenario implements a Clean-label Backdoor attack (CLBD). In this scenario, the attacker adds triggers to source class training images, leaving the labels the same but also applying imperceptible perturbations that look like target class features. At test time, adding the trigger to a source class image induces misclassification to the target class. For a complete overview, see the poisoning doc.
  • Datasets: Datasets for CLBD include but are not limited to:
  • GTSRB
  • Baseline Models: Armory includes several models which may be used for this scenario:
  • GTSRB micronnet
  • Resnet18
  • Threat Scenario:
  • Adversary objectives:
    • Targeted misclassification
  • Adversary Operating Environment:
    • Digital clean label poisoning attack
  • Metrics of Interest: See the poisoning doc for a full description of these metrics.
  • accuracy_on_benign_test_data_all_classes
  • accuracy_on_benign_test_data_source_class
  • accuracy_on_poisoned_test_data_all_classes
  • attack_success_rate
  • Model Bias fairness metric
  • Filter Bias fairness metric
  • Baseline Defenses:
  • Activation Clustering
  • Spectral Signatures
  • DP-InstaHide
  • Random Filter
  • Perfect Filter
  • Baseline Evaluations:
  • GTSRB
  • Resisc CLBD

Poisoning: Sleeper Agent

  • Description: This scenario implements the Sleeper Agent attack. In this scenario, the attacker poisons train samples through gradient matching, then applies a trigger to test images to induce misclassification. For a complete overview, see the poisoning doc.
  • Datasets: Datasets for Sleeper Agent include but are not limited to:
  • Cifar10
  • Baseline Models: Models include but are not limited to:
  • Resnet18
  • Threat Scenario:
  • Adversary objectives:
    • Targeted misclassification
  • Adversary Operating Environment:
    • Digital clean label poisoning attack
  • Metrics of Interest: See the poisoning doc for a full description of these metrics.
  • accuracy_on_benign_test_data_all_classes
  • accuracy_on_benign_test_data_source_class
  • accuracy_on_poisoned_test_data_all_classes
  • attack_success_rate
  • Model Bias fairness metric
  • Filter Bias fairness metric
  • Baseline Defenses:
  • Activation Clustering
  • Spectral Signatures
  • DP-InstaHide
  • Random Filter
  • Perfect Filter
  • Baseline Evaluations:
  • Cifar results

Poisoning: Witches' Brew

  • Description: This scenario implements the Witches' Brew attack. In this scenario, the attacker poisons train samples through gradient matching, to induce misclassification on a few individual pre-chosen test images. For a complete overview, see the witches' brew poisoning doc.
  • Datasets: The following datasets have been successfully used in this scenario:
  • GTSRB
  • Cifar10
  • Baseline Models: Armory includes several models which may be used for this scenario:
  • GTSRB micronnet
  • Resnet18
  • Threat Scenario:
  • Adversary objectives:
    • Targeted misclassification
  • Adversary Operating Environment:
    • Digital clean label poisoning attack
  • Metrics of Interest: See the WB poisoning doc for a full description of these metrics.
  • accuracy_on_trigger_images
  • accuracy_on_non_trigger_images
  • attack_success_rate
  • Model Bias fairness metric
  • Filter Bias fairness metric
  • Baseline Defenses:
  • Activation Clustering
  • Spectral Signatures
  • DP-InstaHide
  • Random Filter
  • Perfect Filter
  • Baseline Evaluations:
  • Cifar10 results
  • GTSRB results

Poisoning: Object Detection

  • Description: This scenario implements the four BadDet Object Detection Poisoning attacks: Regional Misclassification, Global Misclassification, Object Disappearance, and Object Generation. For a complete overview, see the object detection poisoning doc.
  • Datasets: Datasets of interest include but are not limited to:
  • Minicoco
  • Baseline Models: Object Detection models in Armory include but are not limited to:
  • YOLOv3
  • Threat Scenario:
  • Adversary objectives:
    • Targeted misclassification (regional and global)
    • Targeted object generation
    • Targeted object disappearance
  • Adversary Operating Environment:
    • Digital dirty label poisoning attack
  • Metrics of Interest: See the OD poisoning doc for a full description of these metrics.
  • AP on benign data
  • AP on adversarial data with benign labels
  • AP on adversarial data with adversarial labels
  • Attack success rate (misclassification, disappearance, generation)
  • TIDE OD metrics
  • Baseline Defenses:
  • Activation Clustering
  • Spectral Signatures
  • DP-InstaHide
  • Random Filter
  • Perfect Filter
  • Baseline Evaluations:
  • Minicoco Results to be added

Creating a new scenario

Users may want to create their own scenario when the baseline scenarios do not fit the requirements of a given defense/threat model, or when it is easier to debug code they have direct access to than code pre-installed by the armory package.

To do so, simply inherit from the Scenario class and override the necessary methods. An example of doing this can be found in our armory-examples repo.
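
A heavily simplified sketch of this pattern follows. The import path and the hook overridden here are assumptions for illustration; consult the Scenario base class in your installed armory version (and the armory-examples repo) for the actual methods to override:

# Import path and method name below are assumptions; check your armory version.
from armory.scenarios.scenario import Scenario

class MyCustomScenario(Scenario):
    # Toy subclass that customizes one stage of the evaluation and defers the rest
    # to the base class.
    def load_dataset(self, *args, **kwargs):  # hypothetical hook name
        # e.g., filter classes or add preprocessing here, then call the base loader
        return super().load_dataset(*args, **kwargs)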