Learning

Centralized

The model can generalize based on data from a group of devices and thus instantly work with other compatible devices.

Data can explain all variations in the devices and their environment.

Centralized learning nevertheless comes with several challenges:

  • Connectivity - data must be transmitted over a stable connection.

  • Bandwidth - e.g. a new electrical substation could generate 5 GB/s.

  • Latency - real-time applications, e.g. automation, require very low latency.

  • Privacy - sensitive operational data must remain on site.

Decentralized

ML that runs on site, on board each connected device. By continuously training the ML model on streaming data, each device learns an individual model for its environment.

Each model only needs to be able to explain what is normal for itself and not how it varies compared to all other devices.

Models adapt to changes over time, learning is not constrained by the internet connection, and no confidential information needs to be transferred to the cloud.

However, it is not possible to get an overall view or to learn across devices.

Federated

ML technique to train algorithms across decentralized edge devices while holding data samples locally.

Google has been the main player from the start.

The aim is to train ML models on billions of mobile phones while respecting the privacy of the users.

  • Only sends fractions of training results:

    • E.g. training derivatives (gradients), to the cloud.

  • No individual updates are stored in the cloud.

When collected in the cloud, the partial training results can be assembled into a new supermodel that, in the next step, can be sent back to the devices.

  • Google's open-source framework: TensorFlow Federated.

Model inspection - evaluation of device behavior through its model.

Model comparison - comparing models in the cloud to find outliers or super-models.

Robust learning - learning can continue even if the connection to the cloud is lost.

Tailored initialization - new devices can start with a model from a similar device, instead of a general super-model.

Iterative

FL employs an iterative method consisting of multiple client-server exchanges, each known as a federated learning round.

  1. Broadcast the current (or updated) global model state to the contributing nodes (participants).

  2. Train the model locally on those nodes to yield potential model updates.

  3. Process and aggregate the local updates into a single global update so that the central model can be updated accordingly.

An FL server performs this processing and aggregation of local updates into the global update.

  • Local training is performed by the local nodes according to the commands of the FL server.

The FL approach allows for the mass processing of data in a distributed way.

It can follow a client-server architecture.

  1. The server sends the model to be trained to the clients (1 - green lines).

  2. Results of the local computation are sent to the server, which aggregates them into the global model (2 - blue lines).

  3. The server returns the new aggregated model to the clients (3 - red lines).

  4. This iteration, called a federated learning round (FLR), repeats until some stopping criterion is reached, such as model convergence or a maximum number of rounds.

  5. Edge devices only send information about their local models (parameters, hyperparameters (before training), weights, etc.), never the raw data.

Federated learning distributes deep learning by eliminating the need to pool the data in a single place.

In FL, the model is trained at different sites in numerous iterations.
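To make the round structure concrete, here is a minimal sketch in plain NumPy: a toy one-parameter linear model, with the broadcast, local-training, and aggregation steps called out. All names (local_update, clients, etc.) are illustrative, not part of any FL framework.

```python
import numpy as np

# Toy setup: each client holds private (x, y) samples of a linear relation y = 3x + noise.
rng = np.random.default_rng(0)
clients = []
for _ in range(4):
    x = rng.normal(size=50)
    clients.append({"x": x, "y": 3.0 * x + rng.normal(scale=0.1, size=50)})

def local_update(w_global, client, lr=0.5):
    """Step 2: local training, starting from the broadcast global model (one SGD step here)."""
    x, y = client["x"], client["y"]
    grad = np.mean((w_global * x - y) * x)   # gradient of the MSE w.r.t. w
    return w_global - lr * grad

w_global = 0.0
for round_num in range(20):                                      # federated learning rounds
    local_models = [local_update(w_global, c) for c in clients]  # steps 1-2: broadcast + local training
    w_global = float(np.mean(local_models))                      # step 3: aggregate into the global model

print(f"Global weight after 20 rounds: {w_global:.3f}")   # close to the true value 3.0
```

Only the locally computed weights ever leave a client; the raw (x, y) samples stay where they were generated.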

Model aggregation

Effective aggregation of distributed models across devices is essential for creating a generalized global model.

Its efficiency affects precision, convergence time, number of rounds, and network overhead.

Federated stochastic gradient descent (FedSGD): each client uses its local data for a single training step per round of communication. FedSGD requires a substantial number of training rounds to produce reliable models. This algorithm is the baseline of federated learning.

The FedAvg algorithm starts from FedSGD, but each client trains locally on its data, starting from the current model, for multiple steps of SGD before sending the model back to the server for aggregation.

  1. FedAvg reduces the communication overhead required to upload and download the FL model.

  2. It requires clients to perform more total computation during training.

  3. Local epoch: one complete pass of the training dataset through the algorithm.
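A sketch of the FedAvg server-side aggregation step, assuming each client reports its locally trained weights together with its number of local samples; fedavg_aggregate and the toy numbers are purely illustrative.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg server step: average the clients' model weights,
    weighting each client by its share of the data (n_k / n)."""
    coeffs = np.asarray(client_sizes, dtype=float) / sum(client_sizes)
    return sum(c * np.asarray(w) for c, w in zip(coeffs, client_weights))

# Two clients that each ran several local epochs before reporting their weights.
# The client holding more data (80 samples vs. 20) pulls the global model harder.
w_a = np.array([0.9, 2.1])   # client A after local training, 80 samples
w_b = np.array([0.5, 1.5])   # client B after local training, 20 samples

w_global = fedavg_aggregate([w_a, w_b], client_sizes=[80, 20])
print(w_global)   # [0.82 1.98]
```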

Fault Tolerant Federated Average: the ability of a computing system to continue working in the event of a failure.

  • Can tolerate some nodes being offline during secure aggregation.

Q-Federated Average: reweight the objective in order to achieve fairness in the global model.

  • Gives higher weights to devices with poor performance.

  • The network's accuracy distribution becomes more uniform.

  • Each device's local objective F_k is raised to the power (q + 1), i.e. F_k^(q+1); q is a parameter that tunes the amount of fairness to impose.
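A small illustrative snippet of the reweighting idea: each client's contribution is scaled by F_k^q (the gradient of F_k^(q+1)/(q+1)), so devices with a larger local loss get a larger aggregation weight. The function name and numbers below are hypothetical.

```python
import numpy as np

def qfedavg_coefficients(local_losses, q):
    """q-FedAvg-style reweighting (illustrative): scale each client's
    contribution by F_k**q, then normalize. q = 0 gives uniform weights;
    larger q shifts weight toward the worst-performing clients."""
    w = np.asarray(local_losses, dtype=float) ** q
    return w / w.sum()

losses = [0.2, 0.4, 1.6]                   # local losses F_k of three devices
print(qfedavg_coefficients(losses, q=0))   # [0.333 0.333 0.333] -> plain averaging
print(qfedavg_coefficients(losses, q=1))   # [0.09 0.18 0.73] -> favors the struggling device
```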

Federated Optimization: uses a client optimizer during the multiple training epochs and a server optimizer during model aggregation.

  • ADAGRAD, ADAM, and Yogi
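A sketch of the server-optimizer idea behind federated optimization (e.g. an Adagrad-style server update): the averaged client update is treated as a pseudo-gradient and fed to an adaptive optimizer on the server. The class below is an illustrative assumption, not a framework API.

```python
import numpy as np

class ServerAdagrad:
    """Illustrative FedOpt-style server optimizer: Adagrad applied to the
    averaged client update (delta), treated as a pseudo-gradient."""
    def __init__(self, lr=0.1, eps=1e-8):
        self.lr, self.eps, self.accum = lr, eps, None

    def step(self, w_global, avg_client_delta):
        pseudo_grad = -avg_client_delta                    # delta = avg client model - global model
        if self.accum is None:
            self.accum = np.zeros_like(w_global, dtype=float)
        self.accum += pseudo_grad ** 2                     # Adagrad accumulator
        return w_global - self.lr * pseudo_grad / (np.sqrt(self.accum) + self.eps)

# One aggregation round: the averaged client model moved to [0.82, 1.98].
opt = ServerAdagrad(lr=0.1)
w_global = np.zeros(2)
avg_delta = np.array([0.82, 1.98]) - w_global              # averaged client update
w_global = opt.step(w_global, avg_delta)
print(w_global)   # the server applies an adaptive step, not a plain average
```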

TensorFlow Federated

Open source framework for experimenting with machine learning and other computing on decentralized data.

It puts the ability to locally simulate decentralized computations into the hands of all TensorFlow users.

  • ML model architecture of our choice.

  • Train it across data held by all users (simulated locally).

The version of the NIST dataset that has been processed by the Leaf project separates the digits written by each volunteer.
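For example, this per-writer dataset can be loaded through TFF's simulation datasets (a sketch; the exact module paths can differ between TFF versions, and the call downloads the data on first use):

```python
import tensorflow_federated as tff

# Federated EMNIST: the Leaf-processed NIST digits, partitioned by the volunteer
# (writer) who produced them, so each client id corresponds to one person's handwriting.
emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data()

print(len(emnist_train.client_ids))                      # number of writers (clients)
writer = emnist_train.client_ids[0]
ds = emnist_train.create_tf_dataset_for_client(writer)   # that writer's digits only
for example in ds.take(1):
    print(example['label'], example['pixels'].shape)     # one labelled 28x28 digit
```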

Training an ML model with federated learning is one example of federated computation.

Evaluating it over decentralized data is another.

  • An array of sensors capturing temperature readings.

  • Compute the average temperature across these sensors.

Each client computes its local contribution.

A centralized coordinator aggregates all the contributions.
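A minimal federated computation in TFF along these lines (a sketch; type constructors such as tff.type_at_clients vary slightly between TFF releases):

```python
import tensorflow as tf
import tensorflow_federated as tff

# Each sensor (client) holds one temperature reading; the coordinator computes
# the federated average without ever pooling the raw readings in one dataset.
@tff.federated_computation(tff.type_at_clients(tf.float32))
def get_average_temperature(sensor_readings):
    return tff.federated_mean(sensor_readings)

# In simulation, the client values are simply passed in as a Python list.
print(get_average_temperature([68.5, 70.3, 69.8]))   # ~69.53
```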

Flower: A Friendly Federated Learning Framework

Open-source framework for federated learning and other computations on decentralized data.
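A minimal Flower client sketch (API roughly as in Flower 1.x; the toy "model" and numbers are placeholders), showing the get_parameters / fit / evaluate contract that the Flower server drives each round:

```python
import numpy as np
import flwr as fl

class ToyClient(fl.client.NumPyClient):
    """Toy client: the 'model' is a single NumPy vector and local training is
    faked, just to show the contract a Flower client has to implement."""
    def __init__(self):
        self.weights = np.zeros(3)

    def get_parameters(self, config):
        return [self.weights]

    def fit(self, parameters, config):
        self.weights = parameters[0] + 0.1            # pretend local training
        return [self.weights], 10, {}                 # updated params, n_examples, metrics

    def evaluate(self, parameters, config):
        loss = float(np.sum(parameters[0] ** 2))      # pretend evaluation
        return loss, 10, {}

# One process runs the server (e.g. with a FedAvg strategy), each device runs a client:
#   fl.server.start_server(server_address="0.0.0.0:8080",
#                          config=fl.server.ServerConfig(num_rounds=3))
#   fl.client.start_numpy_client(server_address="127.0.0.1:8080", client=ToyClient())
```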

It can be used in containers within a federated framework, FedFramework.

List containers

http://10.0.22.37:8000/containers/list

Server creation

http://10.0.22.37:8000/containers/create/server?img_server=fed-server&port=5010&id=10&clients=4&algorithm=FedAvg&model=cnn&rounds=10&epochs=5&predict=true

Rapid deployment of a testbed and running tests

http://10.0.22.37:8000/run?img_server=fed-server&img_client=fed-client&model=cnn&clients=4&rounds=10&epochs=5&predict=true&predict_client=true

Applications of Federated Learning

Domains and applications:

Edge computing

FL is implemented in edge systems using the MEC (mobile edge computing) and DRL (deep reinforcement learning) frameworks for anomaly and intrusion detection.

Recommender systems

To learn the rating matrix, federated collaborative filtering methods are built using a stochastic gradient approach and secure matrix factorization with federated SGD.

NLP

FL is applied to next-word prediction in mobile keyboards by adopting the FedAvg algorithm to learn a CIFG (coupled input-forget gate) recurrent model.

IoT

FL could be one way to handle data privacy concerns while still providing a reliable learning model.

Mobile service

Prediction services are based on training data coming from users' edge devices, such as mobile phones.

Biomedical

The volume of biomedical data is continually increasing. However, due to privacy and regulatory considerations, the capacity to evaluate these data is limited. By collectively building a global model for the prediction of brain age, the FL paradigm in the neuroimaging domain works effectively.

Healthcare

Owkin and Intel are researching how FL could be leveraged to protect patients' data privacy while also using the data for better diagnosis.

Autonomous industry

Another important reason to use FL is that it can potentially minimize latency. Federated learning may enable autonomous vehicles to behave more quickly and correctly, minimizing accidents and increasing safety. Furthermore, it can be used to predict traffic flow.

Banking finance

FL is applied in open banking and in finance for anti-financial-crime processes, loan risk prediction, and the detection of financial crimes.

Federated learning in self-driving

Edge vehicles compute the model locally; after completing each local training epoch, they retrieve the global model version and compare it to their local version.

In order to form a global awareness of all local models, the central server performs aggregation based on the ratio determined by the global and local model versions.

The aggregation server returns the aggregated result to the edge vehicles that request the most recent model.
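A hedged sketch of what such version-ratio-based aggregation could look like; the specific down-weighting rule below is an assumption for illustration, not the exact scheme described above.

```python
import numpy as np

def version_weighted_aggregate(global_version, updates):
    """Aggregate local vehicle models, down-weighting stale ones: each update
    carries the global model version it started from, and its weight shrinks
    with the version gap (illustrative rule, not an exact published scheme)."""
    weights, models = [], []
    for local_model, local_version in updates:
        staleness = max(global_version - local_version, 0)
        weights.append(1.0 / (1.0 + staleness))
        models.append(np.asarray(local_model, dtype=float))
    coeffs = np.asarray(weights) / np.sum(weights)
    return sum(c * m for c, m in zip(coeffs, models))

# Two vehicle updates: one trained from the current version (5), one from an old version (2).
new_global = version_weighted_aggregate(
    global_version=5,
    updates=[(np.array([1.0, 1.0]), 5), (np.array([3.0, 3.0]), 2)],
)
print(new_global)   # [1.4 1.4]: the fresher update dominates
```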

Edge-based Federated

MEC-empowered model sharing.

  • Brings edge intelligence to wireless edge networks and enhances the connected intelligence among end devices in 6G networks.
