Dataset Configuration

ConfusionFlow requires additional meta-information about the dataset and the composition of its folds. This information is stored in a YAML file.

The example below shows a configuration for the MNIST dataset. The file contains the dataset identifier (dataset), a short description (description), and a list of the class labels (classes).

It then specifies the dataset folds (folds) as a list. The example has two fold specifications, one for the train fold and one for the test fold. Each fold has a description field for short annotations and a classfrequencies field, in which the frequency of each class is given as a list of key-value items.

dataset: mnist
description: MNIST Dataset
classes:
    - 0
    - 1
    - 2
    - 3
    - 4
    - 5
    - 6
    - 7
    - 8
    - 9
folds:
    - train:
          description: MNIST Train Fold
          classfrequencies:
              - 0: 6000
              - 1: 6000
              - 2: 6000
              - 3: 6000
              - 4: 6000
              - 5: 6000
              - 6: 6000
              - 7: 6000
              - 8: 6000
              - 9: 6000
    - test:
          description: MNIST Test Fold
          classfrequencies:
              - 0: 1000
              - 1: 1000
              - 2: 1000
              - 3: 1000
              - 4: 1000
              - 5: 1000
              - 6: 1000
              - 7: 1000
              - 8: 1000
              - 9: 1000
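
To make the nesting of the folds list concrete, the following is a minimal sketch of how the parsed configuration could be walked to total the samples per fold. The config literal is an abbreviated stand-in for what a YAML parser (for example PyYAML's yaml.safe_load) would be expected to return for the file above. The function name fold_totals and the label check are assumptions made for illustration and are not part of ConfusionFlow.

```python
# Abbreviated stand-in for the parsed YAML file (an assumption for illustration).
# Each entry in "folds" is a mapping with a single key: the fold name.
config = {
    "dataset": "mnist",
    "description": "MNIST Dataset",
    "classes": list(range(10)),
    "folds": [
        {"train": {"description": "train fold",
                   "classfrequencies": [{label: 6000} for label in range(10)]}},
        {"test": {"description": "test fold",
                  "classfrequencies": [{label: 1000} for label in range(10)]}},
    ],
}


def fold_totals(config):
    """Return {fold name: total sample count} for every fold in the config.

    Hypothetical helper: it also checks that each fold lists exactly the
    labels from "classes", in the same order.
    """
    totals = {}
    for fold_entry in config["folds"]:
        for fold_name, fold in fold_entry.items():
            items = fold["classfrequencies"]
            labels = [label for item in items for label in item]
            if labels != config["classes"]:
                raise ValueError(f"fold {fold_name!r}: labels do not match 'classes'")
            totals[fold_name] = sum(count for item in items for count in item.values())
    return totals


totals = fold_totals(config)
print(totals)  # {'train': 60000, 'test': 10000}
```

A check of this kind can catch a fold whose frequency list was edited by hand and no longer matches the class labels.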

Note

The list of key-value items might look a bit strange, since YAML parses it as a list of dictionaries of size 1. We chose this format because it is easier to edit by hand: one can simply copy the list of class labels from classes and append the frequencies.
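
As a sketch of what this means for code that reads the file, the list of size-1 dictionaries can be merged into a single mapping from class label to frequency. The helper name flatten_class_frequencies is hypothetical, and the parsed list is a shortened stand-in for what a YAML parser would return for a classfrequencies field.

```python
def flatten_class_frequencies(items):
    """Merge a list of size-1 dicts into one {label: frequency} dict.

    Hypothetical helper, not part of ConfusionFlow.
    """
    flat = {}
    for item in items:
        if len(item) != 1:
            raise ValueError(f"expected exactly one key-value pair, got {item!r}")
        # Unpack the single (label, count) pair of this item.
        (label, count), = item.items()
        if label in flat:
            raise ValueError(f"duplicate class label: {label!r}")
        flat[label] = count
    return flat


# Shortened stand-in for a parsed classfrequencies list.
parsed = [{0: 1000}, {1: 1000}, {2: 1000}]
frequencies = flatten_class_frequencies(parsed)
print(frequencies)  # {0: 1000, 1: 1000, 2: 1000}
```

Rejecting duplicate labels and multi-key items keeps a copy-and-paste mistake in the YAML file from being silently merged away.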