Machine Learning Challenges on Sensitive Data
Machine learning challenges have become a central part of the data science landscape.
The best-known platform, Kaggle (owned by Google), has been running since 2010, has a community of more than one million users, and offers prize pools that can exceed $100,000.
THESE CHALLENGES BENEFIT BOTH THE COMPANIES THAT PROVIDE THE DATA AND THE DATA SCIENTISTS WHO TAKE PART.
FOR DATA PROVIDERS…
These challenges allow data managers (companies, research centres, etc.) to present their problems, collect new ideas and estimate the best possible performance with state-of-the-art techniques.
FOR DATA SCIENTISTS…
For competing data scientists, these challenges are a way to practise machine learning, work on real data, and demonstrate their ability to develop machine learning algorithms.
SENSITIVE DATA, HOWEVER, CANNOT BE SHARED WITH COMPETITORS. THE SUBSTRA FRAMEWORK SOLVES THIS PROBLEM AND THUS MAKES MACHINE LEARNING CHALLENGES ON SENSITIVE DATA POSSIBLE.
NO SHARED DATA
The data provider hosts a Substra node where it stores its data. It is the only party that can access and view the data.
DATA MADE AVAILABLE FOR TRAINING
Competing data scientists can then develop algorithms and send them to the data provider's node, where they are trained on the data.
PERFORMANCE MONITORING
The traceability built into the framework makes it possible to track the performance of every algorithm sent to the platform.
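The three properties above (data stays on the provider's node, algorithms travel to the data, every submission is traced) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Substra's API: `DataProviderNode`, `submit`, `leaderboard`, the plain-list `ledger` and the two example algorithms are all hypothetical names chosen for illustration, and accuracy is an assumed metric.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical types for illustration only; this is NOT Substra's actual API.
Dataset = List[Tuple[float, int]]          # (feature, label) pairs
Predictor = Callable[[float], int]
Algorithm = Callable[[Dataset], Predictor]  # trains on data, returns a predictor


@dataclass
class DataProviderNode:
    """Hypothetical node: keeps its data private and only returns scores."""
    _train: Dataset
    _test: Dataset
    # Assumed traceability record: (competitor, algorithm name, score)
    ledger: List[Tuple[str, str, float]] = field(default_factory=list)

    def submit(self, competitor: str, algo_name: str, algo: Algorithm) -> float:
        # Training and evaluation run on the node; the competitor only
        # receives the final score, never the data itself.
        predictor = algo(self._train)
        correct = sum(predictor(x) == y for x, y in self._test)
        accuracy = correct / len(self._test)
        # Every submission is recorded so performance can be tracked.
        self.ledger.append((competitor, algo_name, accuracy))
        return accuracy

    def leaderboard(self) -> List[Tuple[str, str, float]]:
        # Best score first.
        return sorted(self.ledger, key=lambda entry: entry[2], reverse=True)


def midpoint_classifier(train: Dataset) -> Predictor:
    # Hypothetical competitor algorithm: threshold halfway between class means.
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x > cut)


def always_zero(train: Dataset) -> Predictor:
    # Hypothetical baseline that ignores the data entirely.
    return lambda x: 0


node = DataProviderNode(
    _train=[(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)],
    _test=[(0.2, 0), (0.3, 0), (0.7, 1), (0.8, 1)],
)
print(node.submit("alice", "midpoint", midpoint_classifier))  # 1.0
print(node.submit("bob", "constant", always_zero))            # 0.5
print(node.leaderboard()[0][:2])                              # ('alice', 'midpoint')
```

In this sketch, competitors hand over a function rather than receiving a dataset, which is the essence of the "no shared data" idea; a real deployment would also need isolation of the submitted code and a tamper-resistant ledger, neither of which is modelled here.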