Build Machine Learning models without sharing, moving or collecting data
Apply machine learning where data is located
Traditionally, machine learning models are trained on centrally stored data, which requires collecting all data somewhere other than where it originates. With our platform, we apply machine learning where the data is located.
Using the techniques of distributed machine learning, we enable training of machine learning models on data scattered across different companies, held in silos, or spread over multiple installations of user-facing apps.
Turning challenges into opportunities
We understand that algorithms fall short without data, and we are addressing some of the hard challenges related to limited data and privacy.
A considerable amount of data is needed to make use of machine learning, an amount that many companies do not have available.
A clear and transparent policy on how organizations collect and use data is fundamental to building digital trust.
How it works
The Distributed Machine Learning platform enables several parties to collaboratively train a machine learning model in a privacy-preserving environment. The platform ensures that training data of each participant is kept entirely private and never leaves the premises. The process is shown in the figure below.
Each client runs a number of local epochs on its own training set. The local parameters are then uploaded to the platform, and once the other client has uploaded its parameters, both clients receive the aggregated global parameters. This cycle is repeated for a number of global epochs.
As illustrated, only model parameters are ever exchanged with the Collektive Platform and each client only sees their own local parameters and the global parameters that are a result of the aggregation of all clients’ parameters. Therefore, the training data of each client is kept entirely private and never leaves the premises.
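As a rough illustration of the training loop described above, here is a minimal Python sketch. It is not the platform's actual code or API: the function names (`local_training`, `aggregate`, `federated_training`), the toy one-dimensional linear model, and the averaging rule weighted by training-set size are all assumptions made for this example, since the description does not specify how the parameters are aggregated.

```python
from typing import List, Sequence, Tuple

Params = List[float]                      # [weight, bias] of a toy 1-D linear model
Dataset = Sequence[Tuple[float, float]]   # (x, y) pairs held privately by one client


def local_training(params: Params, data: Dataset,
                   local_epochs: int = 5, lr: float = 0.05) -> Params:
    """Client side: run gradient descent on private data and return parameters only."""
    w, b = params
    n = len(data)
    for _ in range(local_epochs):
        # Gradients of the mean squared error for the model y = w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def aggregate(updates: List[Params], sizes: List[int]) -> Params:
    """Platform side (assumed rule): average parameters weighted by data-set size."""
    total = sum(sizes)
    return [sum(p[i] * s for p, s in zip(updates, sizes)) / total
            for i in range(len(updates[0]))]


def federated_training(client_data: List[Dataset],
                       global_epochs: int = 50) -> Params:
    """Alternate local training and aggregation for a number of global epochs."""
    global_params: Params = [0.0, 0.0]
    sizes = [len(d) for d in client_data]
    for _ in range(global_epochs):
        # Every client starts from the current global parameters.
        updates = [local_training(global_params, d) for d in client_data]
        # Only parameters reach the aggregation step; the raw data stays local.
        global_params = aggregate(updates, sizes)
    return global_params


# Two hypothetical clients whose private data both follow y = 2x + 1.
client_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
client_b = [(-1.0, -1.0), (0.5, 2.0), (1.5, 4.0), (3.0, 7.0)]
w, b = federated_training([client_a, client_b])
```

In this sketch the aggregation function never receives `client_a` or `client_b`, only the two-element parameter lists, which mirrors the property that training data stays on each client's premises.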
Other key features of the platform include:
Privacy by Design
We guarantee full privacy of the training data. The platform collects only the model parameters, in a way that cannot be reverse-engineered to reveal the original training data. Even the parameters are aggregated securely on the platform.
A trained model can be stored in the Collektive Model Storage where it can easily be retrieved for prediction or re-training. Furthermore, model versioning keeps track of your trained models and ensures that you can always return to a previously trained model.
Follow the training online and get useful graphs and metrics on model performance as training progresses. This gives valuable insights into the workings of the algorithm and makes it easy to compare different models.
The training clients themselves use well-known open-source frameworks for training, so as a data scientist you will work with familiar tools. No new APIs to learn. We currently support TensorFlow, Scikit-Learn and Keras, with more coming soon.
A key to successful machine learning is large amounts of good quality data. A common problem is not having enough data, and collecting more may be expensive or impossible due to regulations or privacy concerns.
This is where Distributed Machine Learning comes in as an innovative alternative: it enables the use of data without sharing, moving or collecting it.
Let us start the conversation