Business Case

High employee turnover comes at a high cost to companies. That cost includes everything from recruiting and training to intangible costs such as loss of knowledge, reduced productivity, and increased workload for other employees.

In this case we use machine learning to better understand the key factors behind employee turnover, and use this knowledge to improve the work environment and reduce turnover.


To train a machine learning model we need a good amount of current and historical HR data. Very large enterprises might have sufficient data, but most companies do not have enough to train an accurate machine learning model.

Use Case

Three companies form a collaboration, each contributing employee data, to address the challenge of limited data. However, employee data is highly sensitive and cannot be shared outside an organization.

Our platform for decentralized machine learning allows the three companies to collaborate, pool their HR data, and jointly build and train a machine learning model in a privacy-preserving environment.

First, the parties agree on a common learning objective. In this case the objective is to predict employee turnover and gain insight into what drives it.

With decentralized machine learning, the model is trained locally at each party on that party's own employee data. The parameters from each party are then aggregated, so the shared model is in effect trained on data from all three companies.
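The train-locally-then-aggregate loop can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: it assumes the aggregation step is a sample-weighted average of parameters (often called federated averaging), uses a small logistic regression as the model, and generates synthetic stand-in data. The feature names (overtime, satisfaction), the function names, and all numeric settings are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_update(weights, rows, labels, lr=0.5, epochs=5):
    """One party refines the shared weights using only its own data."""
    w = list(weights)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def federated_average(party_weights, party_sizes):
    """Aggregate parameters, weighting each party by its number of records."""
    total = sum(party_sizes)
    dim = len(party_weights[0])
    return [
        sum(w[j] * n for w, n in zip(party_weights, party_sizes)) / total
        for j in range(dim)
    ]


def make_party(rng, n):
    """Synthetic stand-in for one company's HR data (hypothetical features)."""
    rows, labels = [], []
    for _ in range(n):
        overtime = rng.random()      # assumed feature, scaled to [0, 1]
        satisfaction = rng.random()  # assumed feature, scaled to [0, 1]
        p_leave = sigmoid(3.0 * overtime - 3.0 * satisfaction)
        rows.append([1.0, overtime, satisfaction])  # 1.0 is the bias input
        labels.append(1 if rng.random() < p_leave else 0)
    return rows, labels


rng = random.Random(0)
parties = [make_party(rng, n) for n in (200, 300, 500)]  # three companies
sizes = [len(rows) for rows, _ in parties]

global_w = [0.0, 0.0, 0.0]
for _ in range(50):  # communication rounds
    # Each party trains locally; only the resulting weights leave the party.
    local_ws = [local_update(global_w, rows, labels) for rows, labels in parties]
    global_w = federated_average(local_ws, sizes)

print("shared model weights [bias, overtime, satisfaction]:", global_w)
```

In this sketch the raw rows never leave `local_update`; the only values exchanged between parties are the weight lists passed to `federated_average`. Weighting by record count is one common choice, so that a company with more employees has proportionally more influence on the shared model.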

By sharing only model parameters, no company exposes private or sensitive employee data. The parameters do not contain the employee records themselves, so the original training data is not directly revealed.

Finally, the trained model becomes available to all parties. Each company can, independently of the others, use the model to better understand and reduce its employee turnover.


Even though we pool data from several parties, we still need a reasonable set of variables (features) for each employee. The data should preferably include variables from all three categories below.

For this case we used fictional data created by Watson Analytics. Find the sample data here


Employee retention and machine learning are a hot combination, and depending on the use case there might be alternatives to decentralized machine learning. We can think of three other options:

  • A large enterprise with sufficient HR data can train a model in-house with traditional machine learning. There is no need for distributed machine learning.
  • You can search for an existing trained model. Is there a trained ‘employee turnover model’ available for purchase? Pay attention to the training data: is the model trained on data that is representative of your organization?
  • Another option is to apply cryptography: collect and store data from all parties centrally, train a model with traditional machine learning, and apply advanced cryptography to ensure privacy. However, combining cryptography and machine learning is difficult and is still an active research field.