Witches' Brew Poisoning
Witches' Brew is a clean-label attack but there is no backdoor trigger involved. The adversary selects individual source
images from the test set; these are the images that the adversary wants to misclassify as target
and are called triggers, not to be confused with the backdoor trigger in DLBD and CLBD attacks. The attack uses a gradient-matching algorithm to modify a portion of the train-set target class, such that the unmodified test-set triggers will be misclassified.
Configuration file
Trigger image specification
Witches' Brew requires a source_class
, target_class
, and trigger_index
. The field target_class
is required, but either of the other two may be left null
. If trigger_index
is null
, trigger images will be chosen randomly from the source class. If source_class
is null
, it will be inferred from the class labels of images at the provided trigger index.
Witches' Brew seeks to misclassify individual images; each has to be specified explicitly. If multiple triggers are desired, there are several equivalent ways to accomplish this. Some examples will illustrate. Suppose you want three trigger images from class 1, each with a target class of 0. The following configurations are equivalent:
source_class: 1
target_class: 0
trigger_index: [null, null, null]
source_class: [1,1,1]
target_class: 0
trigger_index: null
source_class: 1
target_class: [0,0,0]
trigger_index: null
source_class: [1,1,1]
target_class: [0,0,0]
trigger_index: null
source_class: [1,1,1]
target_class: [0,0,0]
trigger_index: [null, null, null]
Similarly, you can request triggers from different source classes by doing something like this:
source_class: [1,2,3]
target_class: 0
trigger_index: null
(selects triggers randomly from classes 1, 2, and 3, each with a target of 0).
Or this:
source_class: [null, null, null]
target_class: [4,5,6]
trigger_index: [10,20,30]
(Uses images 10, 20, and 30 as triggers, whatever their source label, with targets of 4, 5, and 6 respectively. Note that source and target class may not be the same.)
Witches' Brew dataset saving and loading
Because generating poisoned data takes so much longer for Witches' Brew than for the backdoor attacks, Armory provides a means to save and load a poisoned dataset. A filepath may be provided in the config under attack/kwargs/data_filepath
. If this path does not exist, Armory will generate the dataset and save it to that path. If the path does exist, Armory will load it and check that it was generated consistent with what the current config is requesting, in terms of source, target, perturbation bound, and so forth. If there are any discrepancies, a helpful error is raised. If you are loading a pre-generated dataset, source_class
, target_class
, and trigger_index
may all be null. If you want to re-generate a poisoned dataset that already exists, you can delete the old one or rename it. Alternatively, you may set attack.kwargs.overwrite_presaved_data:true
within the config, but use caution: if you forget to reset it to false
, or pass the config to someone else, it can take a lot of time to re-generate the poison.
Witches' Brew Metrics
Because test-time data is not poisoned for the witches' brew attack, it doesn't make sense to use the four primary metrics described in poisoning.md. Instead, we have these three:
- attack_success_rate
- accuracy_on_trigger_images
- accuracy_on_non_trigger_images
attack_success_rate
is the percentage of trigger images which were classified as their respective target classes, while accuracy_on_trigger_images
is the percentage of trigger images that were classified as their natural labels (source classes). Similarly, accuracy_on_non_trigger_images
is the classification accuracy on non-trigger images.
The fairness and filter metrics remain the same.