Datasets and Models
FedML supports comprehensive research-oriented (synthetic and public) FL datasets and models, including four representative synthetic FL datasets used by top-tier publications:
EMNIST: EMNIST dataset extends MNIST dataset with upper and lower case English characters.
CIFAR-100: CIFAR-100 dataset consists of 100 image classes with each containing 600 images.
Shakespeare: Shakespeare dataset is built from the collective works of William Shakespeare.
Stack Overflow: Stack Overflow dataset originally hosted by Kaggle consists of questions and answers from the website Stack Overflow. This dataset is used to perform two tasks: tag prediction via logistic regression and next word prediction.
Datasets with downloading service and API provided
CV
MNIST
cifar10
cifar100
fed_cifar100
fed_emnist
cinic10
ImageNet
Landmarks
NLP
shakespeare
fed_shakespeare
stackoverflow
Finance
lending_club_loan
NUS_WIDE
Other
UCI
Synthetic
edge_case_examples (tailored for paper "Attack of the Tails: Yes, You Really Can Backdoor Federated Learning")
For a comprehensive dataset list, please check the following APIs:
fedml.data.load(args)
(https://github.com/FedML-AI/FedML/tree/master/python/fedml/data) and fedml.model.create(args)
(https://github.com/FedML-AI/FedML/tree/master/python/fedml/data)
Their usage in different algorithms are as follows:
Horizontal Federated Learning:
- Computer Vision: Federated EMNIST + CNN (2 conv layers)
- Computer Vision: CIFAR100 + ResNet18 (Group Normalization)
- Natural Language Processing: shakespeare + RNN (bi-LSTM)
- Natural Language Processing: stackoverflow (NWP) + RNN (bi-LSTM)
- Computer Vision: CIFAR10, CIFAR100, CINIC10 + ResNet
- Computer Vision: CIFAR10, CIFAR100, CINIC10 + MobileNet
- Computer Vision (linear model): MNIST + Logistic Regression
- Computer Vision (linear model): Synthetic + Logistic Regression
Vertical Federated Learning:
- lending_club_loan + VFL
- NUS_WIDE + VFL
FedNAS
- cross-silo CV: CIFAR10, CIFAR100, CINIC10 + ResNet
- cross-silo CV: CIFAR10, CIFAR100, CINIC10 + MobileNet
Split Learning:
- cross-silo CV: CIFAR10, CIFAR100, CINIC10 + ResNet
- cross-silo CV: CIFAR10, CIFAR100, CINIC10 + MobileNet