The armory.utils.metrics module provides functions for measuring both task and perturbation metrics.


The MetricsLogger class pairs with scenarios to track task performance on benign and adversarial data and to measure the perturbations of adversarial samples. Since our datasets are presented as generators, the class provides update_task and update_perturbation methods that update metrics for each batch obtained from the generator. The output, which is given by results, is a JSON-able dict.
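The batch-wise update pattern described above can be sketched as follows. This is a minimal illustration, not Armory's implementation: the class ToyMetricsLogger, the method signatures, and the result key names are assumptions made for the example. Only the method names update_task, update_perturbation, and results come from the text.

```python
import json

import numpy as np


def _mean(values):
    # Return a plain float (JSON-able), or None if nothing was recorded.
    return float(np.mean(values)) if values else None


class ToyMetricsLogger:
    """Hypothetical stand-in for illustration; not Armory's MetricsLogger."""

    def __init__(self):
        self.benign_correct = []
        self.adversarial_correct = []
        self.linf = []

    def update_task(self, y, y_pred, adversarial=False):
        # Record per-sample correctness for one batch of predictions.
        correct = np.argmax(np.asarray(y_pred), axis=1) == np.asarray(y)
        store = self.adversarial_correct if adversarial else self.benign_correct
        store.extend(correct.astype(float).tolist())

    def update_perturbation(self, x, x_adv):
        # Record the per-sample L-infinity distance for one batch.
        diff = np.abs(np.asarray(x_adv, dtype=float) - np.asarray(x, dtype=float))
        self.linf.extend(diff.reshape(len(diff), -1).max(axis=1).tolist())

    def results(self):
        # Key names are illustrative assumptions.
        return {
            "benign_mean_categorical_accuracy": _mean(self.benign_correct),
            "adversarial_mean_categorical_accuracy": _mean(self.adversarial_correct),
            "perturbation_mean_linf": _mean(self.linf),
        }


def toy_batches():
    # Hypothetical generator yielding (x, y, benign scores, x_adv, adversarial scores).
    x = np.zeros((2, 3))
    y = np.array([0, 1])
    benign_scores = np.array([[0.9, 0.1], [0.2, 0.8]])
    adversarial_scores = np.array([[0.1, 0.9], [0.2, 0.8]])
    yield x, y, benign_scores, x + 0.25, adversarial_scores


logger = ToyMetricsLogger()
for x, y, benign_scores, x_adv, adversarial_scores in toy_batches():
    logger.update_task(y, benign_scores)
    logger.update_task(y, adversarial_scores, adversarial=True)
    logger.update_perturbation(x, x_adv)

print(json.dumps(logger.results(), indent=2))
```

Because the logger only stores per-sample values and aggregates in results, it never needs the whole dataset in memory at once, which is what makes it compatible with generator-based datasets.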


| Name | Type | Description |
|------|------|-------------|
| categorical_accuracy | Task | Categorical Accuracy |
| object_detection_AP_per_class | Task | Average Precision @ IOU=0.5 |
| top_n_categorical_accuracy | Task | Top-n Categorical Accuracy |
| top_5_categorical_accuracy | Task | Top-5 Categorical Accuracy |
| word_error_rate | Task | Word Error Rate |
| image_circle_patch_diameter | Perturbation | Patch Diameter |
| lp | Perturbation | L-p norm |
| linf | Perturbation | L-infinity norm |
| l2 | Perturbation | L2 norm |
| l1 | Perturbation | L1 norm |
| l0 | Perturbation | L0 "norm" |
| mars_mean_l2 | Perturbation | Mean L2 norm across video stacks |
| mars_mean_patch | Perturbation | Mean patch diameter across video stacks |
| norm | Perturbation | L-p norm |
| snr | Perturbation | Signal-to-noise ratio |
| snr_db | Perturbation | Signal-to-noise ratio (decibels) |
| snr_spectrogram | Perturbation | Signal-to-noise ratio of spectrogram |
| snr_spectrogram_db | Perturbation | Signal-to-noise ratio of spectrogram (decibels) |

We have implemented the metrics in NumPy, rather than using framework-specific metrics, to avoid expanding the required set of dependencies.
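To show how little is needed beyond NumPy, here is a hedged sketch of a few of the listed metrics. The function names mirror the table, but the signatures and exact definitions are assumptions for illustration and may differ from the module's own. In particular, the signal-to-noise ratio is assumed here to be the mean signal power divided by the mean power of the perturbation.

```python
import numpy as np


def categorical_accuracy(y, y_pred):
    # Per-sample accuracy; y_pred may hold class indices or per-class scores.
    y = np.asarray(y)
    y_pred = np.asarray(y_pred)
    if y_pred.ndim == y.ndim + 1:
        y_pred = np.argmax(y_pred, axis=-1)
    return (y == y_pred).astype(float).tolist()


def lp(x, x_adv, p):
    # Per-sample L-p norm of the perturbation; p may be 0, 1, 2, or np.inf.
    diff = np.asarray(x_adv, dtype=float) - np.asarray(x, dtype=float)
    diff = diff.reshape(len(diff), -1)  # flatten each sample to a vector
    return np.linalg.norm(diff, ord=p, axis=1).tolist()


def snr(x, x_adv):
    # Per-sample signal-to-noise ratio (assumed definition: power ratio).
    ratios = []
    for x_i, x_adv_i in zip(x, x_adv):
        x_i = np.asarray(x_i, dtype=float)
        noise = np.asarray(x_adv_i, dtype=float) - x_i
        signal_power = np.mean(np.abs(x_i) ** 2)
        noise_power = np.mean(np.abs(noise) ** 2)
        ratios.append(float(signal_power / noise_power) if noise_power > 0 else float("inf"))
    return ratios


def snr_db(x, x_adv):
    # The same ratio expressed in decibels.
    return [float(10 * np.log10(r)) if r > 0 else float("-inf") for r in snr(x, x_adv)]
```

Each function returns one value per sample as a plain Python list, so the values can be accumulated batch by batch and serialized to JSON without further conversion.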

Targeted vs. Untargeted Attacks

For targeted attacks, each metric is reported twice for adversarial data: once relative to the ground truth labels and once relative to the target labels. For untargeted attacks, each metric is reported only relative to the ground truth labels. Performance relative to ground truth measures the effectiveness of the defense, indicating the ability of the model to make correct predictions despite the perturbed input. Performance relative to target labels measures the effectiveness of the attack, indicating the ability of the attacker to force the model to make predictions that are not only incorrect, but that also align with the attacker's chosen output.
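The double reporting for targeted attacks can be illustrated with a small sketch. The function name adversarial_accuracy_report, its parameters, and the result keys are hypothetical and chosen only for this example.

```python
import numpy as np


def adversarial_accuracy_report(y, y_pred_adv, y_target=None):
    """Hypothetical helper: accuracy on adversarial data vs. each label set."""
    pred = np.argmax(np.asarray(y_pred_adv), axis=1)
    # Always reported: how often the model is still correct (defense view).
    report = {
        "adversarial_vs_ground_truth": float(np.mean(pred == np.asarray(y))),
    }
    if y_target is not None:
        # Targeted attacks only: how often the model outputs the
        # attacker's chosen label (attack view).
        report["adversarial_vs_target"] = float(
            np.mean(pred == np.asarray(y_target))
        )
    return report


y = [0, 1, 2, 0]         # ground truth labels
y_target = [1, 2, 0, 1]  # labels the attacker is aiming for
scores = np.array(
    [
        [0.1, 0.8, 0.1],  # predicted 1: wrong, and matches the target
        [0.2, 0.7, 0.1],  # predicted 1: still correct
        [0.6, 0.3, 0.1],  # predicted 0: wrong, and matches the target
        [0.1, 0.2, 0.7],  # predicted 2: wrong, but misses the target
    ]
)

untargeted = adversarial_accuracy_report(y, scores)
targeted = adversarial_accuracy_report(y, scores, y_target)
```

In this toy data the fourth sample is misclassified without matching the target, so it lowers the ground truth score without raising the target score. This is why the two numbers carry different information and do not simply sum to one.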