Datasets

The armory.data.datasets module provides functions that return datasets of various data modalities. By default, each function returns a NumPy ArmoryDataGenerator, which implements the methods needed by the ART framework. Specifically, get_batch returns a tuple of (data, labels) as NumPy arrays for the specified batch size.
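The (data, labels) contract described above can be illustrated with a minimal sketch. The class below is a hypothetical stand-in written only for illustration. It is not Armory's ArmoryDataGenerator, and the names ToyBatchGenerator, x, and y are assumptions.

```python
import numpy as np


class ToyBatchGenerator:
    """Hypothetical stand-in for a batch generator; NOT Armory's class."""

    def __init__(self, x, y, batch_size):
        self.x = x
        self.y = y
        self.batch_size = batch_size
        self._pos = 0  # index of the next example to serve

    def get_batch(self):
        # Start over once every example has been served.
        if self._pos >= len(self.x):
            self._pos = 0
        start, stop = self._pos, self._pos + self.batch_size
        self._pos = stop
        # Return a (data, labels) tuple of NumPy arrays.
        return self.x[start:stop], self.y[start:stop]


# MNIST-shaped dummy data: (N, 28, 28, 1) float32 images, (N,) int64 labels.
x = np.zeros((10, 28, 28, 1), dtype=np.float32)
y = np.arange(10, dtype=np.int64)

gen = ToyBatchGenerator(x, y, batch_size=4)
batch = gen.get_batch()
data, labels = batch
```

With a batch size of 4, data has shape (4, 28, 28, 1) and labels has shape (4,), matching the x_shape and y_shape columns listed for mnist with N = 4.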

We have experimental support for returning tf.data.Dataset and torch.utils.data.Dataset. These can be specified with the framework argument to the dataset function. Options are <numpy|tf|pytorch>.

Currently, datasets are loaded using TensorFlow Datasets from cached tfrecord files. These tfrecord files are pulled from S3 if they are not already available in your dataset_dir directory.

Image Datasets

Dataset	Description	x_shape	x_dtype	y_shape	y_dtype	splits
cifar10	CIFAR 10 classes image dataset	(N, 32, 32, 3)	float32	(N,)	int64	train, test
german_traffic_sign	German traffic sign dataset	(N, variable_height, variable_width, 3)	float32	(N,)	int64	train, test
imagenette	Smaller subset of 10 classes from Imagenet	(N, variable_height, variable_width, 3)	uint8	(N,)	int64	train, validation
mnist	MNIST handwritten digit image dataset	(N, 28, 28, 1)	float32	(N,)	int64	train, test
resisc45	REmote Sensing Image Scene Classification	(N, 256, 256, 3)	float32	(N,)	int64	train, validation, test
Coco2017	Common Objects in Context	(N, variable_height, variable_width, 3)	float32	n/a	List[dict]	train, validation, test
xView	Objects in Context in Overhead Imagery	(N, variable_height, variable_width, 3)	float32	n/a	List[dict]	train, test
minicoco	A 3-class subset of Common Objects in Context	(N, variable_height, variable_width, 3)	float32	n/a	List[dict]	train, validation

NOTE: the Coco2017 dataset's class labels are 0-indexed (start from 0).

Multimodal Image Datasets

Dataset	Description	x_shape	x_dtype	y_shape	y_dtype	splits
so2sat	Co-registered synthetic aperture radar and multispectral optical images	(N, 32, 32, 14)	float32	(N,)	int64	train, validation
carla_obj_det_train	CARLA Simulator Object Detection	(N, 960, 1280, 3 or 6)	float32	n/a	List[dict]	train, val
carla_over_obj_det_train	CARLA Simulator Object Detection	(N, 960, 1280, 3 or 6)	float32	n/a	List[dict]	train, val

CARLA Object Detection

The carla_obj_det_train dataset contains rgb and depth modalities. The modality defaults to rgb and must be one of ["rgb", "depth", "both"]. When using the dataset function imported from armory.data.datasets, this value is passed via the modality kwarg. When running an Armory scenario, the value is specified in the dataset_config as follows:

"dataset": {
    "batch_size": 1,
    "modality": "rgb"
}

When modality is set to "both", the input will be of shape (nb=1, 960, 1280, 6) where x[..., :3] are the rgb channels and x[..., 3:] the depth channels.
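As a minimal sketch of the channel layout just described, the snippet below separates a 6-channel input into its rgb and depth parts with NumPy slicing. The helper name split_modalities and the dummy array are assumptions for illustration. They are not part of Armory.

```python
import numpy as np


def split_modalities(x):
    """Split a (..., 6) array into rgb (first 3) and depth (last 3) channels.

    Hypothetical helper for illustration; not an Armory function.
    """
    if x.shape[-1] != 6:
        raise ValueError(f"expected 6 channels, got {x.shape[-1]}")
    return x[..., :3], x[..., 3:]


# Dummy batch shaped like the modality="both" input: (nb=1, 960, 1280, 6).
x = np.zeros((1, 960, 1280, 6), dtype=np.float32)
x[..., 3:] = 1.0  # mark the depth channels so the two halves differ

rgb, depth = split_modalities(x)
```

Both outputs are views with shape (1, 960, 1280, 3), so no image data is copied by the split.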

The carla_over_obj_det_train dataset has the same properties as carla_obj_det_train but is collected from overhead perspectives.

Audio Datasets

Dataset	Description	x_shape	x_dtype	y_shape	y_dtype	sampling_rate	splits
digit	Audio dataset of spoken digits	(N, variable_length)	int64	(N,)	int64	8 kHz	train, test
librispeech	Librispeech dataset for automatic speech recognition	(N, variable_length)	float32	(N,)	bytes	16 kHz	dev_clean, dev_other, test_clean, train_clean100
librispeech_full	Full Librispeech dataset for automatic speech recognition	(N, variable_length)	float32	(N,)	bytes	16 kHz	dev_clean, dev_other, test_clean, train_clean100, train_clean360, train_other500
librispeech_dev_clean	Librispeech dev dataset for speaker identification	(N, variable_length)	float32	(N,)	int64	16 kHz	train, validation, test
librispeech_dev_clean_asr	Librispeech dev dataset for automatic speech recognition	(N, variable_length)	float32	(N,)	bytes	16 kHz	train, validation, test
speech_commands	Speech commands dataset for audio poisoning	(N, variable_length)	float32	(N,)	int64	16 kHz	train, validation, test

NOTE: because the Librispeech dataset is over 300 GB with all splits, the librispeech_full dataset has all splits, whereas the librispeech dataset does not have the train_clean360 or train_other500 splits.

Video Datasets

Dataset	Description	x_shape	x_dtype	y_shape	y_dtype	splits
ucf101	UCF 101 Action Recognition	(N, variable_frames, None, None, 3)	float32	(N,)	int64	train, test
ucf101_clean	UCF 101 Action Recognition	(N, variable_frames, None, None, 3)	float32	(N,)	int64	train, test

NOTE: UCF101 videos have dimensions (N, variable_frames, 240, 320, 3) for the entire training set and for all of the test set except 4 examples, which have dimensions (N, variable_frames, 226, 400, 3). If the data is not shuffled, these are the (0-indexed) examples 333, 694, 1343, and 3218.

NOTE: The only difference between ucf101 and ucf101_clean is that the latter uses the ffmpeg flag -q:v 2, which results in fewer video compression errors. The two are nevertheless stored as separate datasets.
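Because a few UCF101 test videos have a different frame size, code that assumes 240 x 320 frames may want to check each video first. The following is a minimal sketch of such a check on dummy arrays. The helper name has_standard_frame_size and the sample data are assumptions for illustration only.

```python
import numpy as np

STANDARD_HW = (240, 320)  # frame size of most UCF101 videos, per the note above


def has_standard_frame_size(video):
    """Return True if a (frames, height, width, 3) video is 240 x 320.

    Hypothetical helper for illustration; not an Armory function.
    """
    return tuple(video.shape[1:3]) == STANDARD_HW


# Two dummy videos: one standard, one with the 226 x 400 frame size.
videos = [
    np.zeros((2, 240, 320, 3), dtype=np.float32),
    np.zeros((3, 226, 400, 3), dtype=np.float32),
]

# Indices of videos whose frames would need resizing or special handling.
odd = [i for i, v in enumerate(videos) if not has_standard_frame_size(v)]
```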

Preprocessing

Armory applies preprocessing to convert each dataset to canonical form (e.g. normalize the range of values, set the data type). The poisoning scenario loads its own custom preprocessing; however, the GTSRB data is also available in its canonical form. Any additional preprocessing that is desired should occur as part of the model under evaluation.

Canonical preprocessing is not yet supported when framework is tf or pytorch.
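To make the idea of a canonical form concrete, here is a minimal sketch of one possible conversion: scaling uint8 images to float32 values in [0, 1]. This is an assumption for illustration. The exact canonical form Armory uses for each dataset is not specified here, and to_canonical_image is a hypothetical name, not an Armory function.

```python
import numpy as np


def to_canonical_image(x):
    """Convert uint8 images in [0, 255] to float32 images in [0, 1].

    Hypothetical example of a canonical-form conversion; the actual
    per-dataset preprocessing in Armory may differ.
    """
    x = np.asarray(x)
    if x.dtype != np.uint8:
        raise ValueError(f"expected uint8 input, got {x.dtype}")
    return x.astype(np.float32) / 255.0


# Dummy CIFAR-10-shaped batch containing the extreme pixel values.
raw = np.zeros((2, 32, 32, 3), dtype=np.uint8)
raw[1] = 255

canonical = to_canonical_image(raw)
```

Any further steps, such as resizing or mean subtraction, would then belong inside the model under evaluation.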

Splits

Adversarial Datasets

See adversarial_datasets.md for descriptions of Armory's adversarial datasets.

Dataset Licensing

See dataset_licensing.md for details related to the licensing of datasets.