Dataset Configuration
ConfusionFlow requires some additional meta-information on the dataset and fold compositions. The information is stored in a YAML file.
The example below shows a configuration for the MNIST dataset.
The file contains the dataset identifier (dataset), a short description (description), and a list of the class labels (classes).
It then specifies the dataset folds (folds) as a list. The example has two fold specifications, one for the train fold and one for the test fold.
Each fold has a description field for short annotations and a classfrequencies field, which lists the frequency of each class as key-value items.
dataset: mnist
description: MNIST Dataset
classes:
  - 0
  - 1
  - 2
  - 3
  - 4
  - 5
  - 6
  - 7
  - 8
  - 9
folds:
  - train:
      description: Fashion MNIST Train Fold
      classfrequencies:
        - 0: 6000
        - 1: 6000
        - 2: 6000
        - 3: 6000
        - 4: 6000
        - 5: 6000
        - 6: 6000
        - 7: 6000
        - 8: 6000
        - 9: 6000
  - test:
      description: Fashion MNIST Test Fold
      classfrequencies:
        - 0: 1000
        - 1: 1000
        - 2: 1000
        - 3: 1000
        - 4: 1000
        - 5: 1000
        - 6: 1000
        - 7: 1000
        - 8: 1000
        - 9: 1000
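If the class frequencies are not known in advance, they can be counted from a fold's labels. The following is a minimal sketch, not part of ConfusionFlow itself: the function name class_frequency_items and the toy labels are hypothetical and only illustrate how to produce the list of key-value items that classfrequencies expects.

```python
from collections import Counter


def class_frequency_items(labels, classes):
    """Count labels and return one single-entry dict per class.

    The result follows the order of `classes`, so it lines up with the
    `classes` list of the configuration file. Classes that never occur
    in `labels` get a frequency of 0.
    """
    counts = Counter(labels)
    return [{c: counts.get(c, 0)} for c in classes]


# Toy labels for illustration only; these are not real MNIST data.
labels = [0, 1, 1, 2, 2, 2]
items = class_frequency_items(labels, classes=[0, 1, 2, 3])
print(items)  # [{0: 1}, {1: 2}, {2: 3}, {3: 0}]
```

Each entry of the returned list corresponds to one "- label: frequency" line under classfrequencies.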
Note
The list of key-value items might seem a bit strange, as YAML parses it as a list of dictionaries of size 1.
We chose this option because it is more user-friendly when editing the YAML file by hand: one can simply copy the list of class labels from classes and append the frequencies.
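Code that consumes the configuration may prefer a single mapping over a list of one-entry dictionaries. The sketch below shows one way to merge them. It assumes the list has already been parsed (for example by PyYAML's yaml.safe_load, which yields integer keys for the labels above). The helper name flatten_class_frequencies is hypothetical and not part of ConfusionFlow.

```python
def flatten_class_frequencies(items):
    """Merge a list of single-entry dicts into one dict, keeping the order."""
    merged = {}
    for item in items:
        if len(item) != 1:
            raise ValueError(f"expected exactly one key per item, got {item!r}")
        label, count = next(iter(item.items()))
        if label in merged:
            raise ValueError(f"duplicate class label: {label!r}")
        merged[label] = count
    return merged


# Shape a YAML parser produces for a shortened classfrequencies list.
parsed = [{0: 6000}, {1: 6000}, {2: 6000}]
frequencies = flatten_class_frequencies(parsed)
print(frequencies)  # {0: 6000, 1: 6000, 2: 6000}
```

Rejecting duplicate labels and multi-key items catches the most likely copy-and-paste mistakes in a hand-edited file.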