Artefact is proud to announce that it has been awarded the Labelia - Responsible and Trusted AI label by the independent association Labelia Labs.
In the first part of this article, you had your first encounter with Differential Privacy and learned why it's so awesome. In this second part, we'll present three Python libraries for implementing Differential Privacy: Diffprivlib, TensorFlow Privacy and Opacus.
Does the term Differential Privacy ring a bell? If it doesn't, then you're in for a treat! The first part of this article provides a quick introduction to the notion of Differential Privacy, a robust new mathematical approach to formulating the privacy guarantee of data-related tasks. We will cover some of its use cases, untangle its mechanisms and key properties, and see how it works in practice.
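To give a taste of the mechanisms the article untangles, here is a minimal sketch of the Laplace mechanism, one of the classic building blocks of differential privacy (this is an illustrative sketch, not code from the article itself):

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Return a differentially private estimate of `true_value`.

    Adds Laplace noise with scale sensitivity/epsilon, which satisfies
    epsilon-differential privacy for a query with the given L1 sensitivity.
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Example: privately release a count query (sensitivity 1) with epsilon = 0.5.
# A smaller epsilon means stronger privacy but noisier answers.
private_count = laplace_mechanism(true_value=120.0, sensitivity=1.0, epsilon=0.5)
```

The trade-off is visible in the `scale` term: halving `epsilon` doubles the expected noise, which is exactly the privacy/utility tension the article discusses.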
This blog post is an introduction to the concept of pattern distillation and its link to privacy. It was written by Gijs Barmentlo as part of Data For Good season 8.
This article attempts to clear up the complex and vast subject of fairness in machine learning. Without being exhaustive, it proposes a number of definitions and very useful tools that every Data Scientist should master in order to address this topic.
To meet the privacy requirements that certain domains demand for their data, one solution is to move towards distributed, collaborative, multi-actor machine learning. This implies developing a notion of contributivity, to quantify the contribution of each partner to the final model. Defining such a notion is far from straightforward. Being able to easily implement, experiment with and compare different approaches is therefore essential. It requires a simulation tool, which we have undertaken to create in the form of an open source Python library. Its development is ongoing, in the context of a workgroup bringing together several partners.
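One candidate contributivity measure discussed in this space is the Shapley value over coalitions of data partners. Here is a minimal sketch (the partner names and scoring function are hypothetical, and this is not the library's actual API):

```python
from itertools import permutations
from math import factorial

def shapley_values(partners, score):
    """Compute each partner's Shapley value for a coalition scoring function.

    `score` maps a frozenset of partners to the performance of a model
    trained on their combined data. Cost is n!, so illustration only.
    """
    values = {p: 0.0 for p in partners}
    for order in permutations(partners):
        coalition = frozenset()
        for p in order:
            # Marginal gain from adding partner p to the current coalition
            values[p] += score(coalition | {p}) - score(coalition)
            coalition = coalition | {p}
    n_orderings = factorial(len(partners))
    return {p: v / n_orderings for p, v in values.items()}

# Hypothetical scores: accuracy of a model trained on each coalition's data
scores = {
    frozenset(): 0.0,
    frozenset({"A"}): 0.6,
    frozenset({"B"}): 0.5,
    frozenset({"A", "B"}): 0.8,
}
contributivity = shapley_values(["A", "B"], scores.__getitem__)
```

By construction the values sum to the score of the full coalition (0.8 here), which is one reason Shapley-style measures are a natural starting point for splitting credit between partners.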
The general documentation for the Substra framework has been available for a while now, but we never took the time to have a look at it, so let's take a retrospective look at it together!
As you probably know, open source software isn't all about publicly accessible code. It is also and above all a project, a place where discussions, tests and developments come together. It is therefore essential to be able to settle down comfortably!
The objective of this article is to present the participative approach on the theme "responsible and trustworthy data science" that we initiated in the summer of 2019 and have been leading since then. I will follow the thread of the presentation I gave at the "Big data & ML" meetup on September 29, 2020. I hope that this blog format will allow as many people as possible to discover this initiative, perhaps to react to it, or even to come and contribute to it. All feedback is welcome: it feeds the reflection and the work, and we need it!
In the first part of this article, we introduced a secure, traceable and distributed ML approach for a deepfake detection benchmark using the Substra framework. In this second part, we will present the technical details of the Substra framework and walk through the whole process of implementing an example on Substra, so that you can submit your own algorithms or add your dataset to the Substra network.
In this article, we present several facial manipulation techniques known as "deepfakes" and show why it's important to improve research on deepfake detection. We present the state of the art of deepfake detection datasets and algorithms, and introduce a secure, traceable and distributed machine learning approach for a deepfake detection benchmark using the Substra framework.
This guest post was written and originally published on the Owkin website. Owkin, a fast-growing health data AI startup, is a core partner of Substra Foundation and dedicates a full tech team to the development of the first version of the Substra Framework. Several founding members of Substra Foundation work at Owkin. Owkin and Substra Foundation are both members of the Healthchain consortium.
The MELLODDY consortium brings together 17 partner organizations of different types, working towards a common goal, in multiple countries with different cultures. The project is fully remote, with various business and technical skills brought in by the partners. How does one bring transversality to a project in this context? How does one create a common way of working in such an innovative and new collaborative endeavor? That is what Substra Foundation is trying to contribute to...
“More sharing gets more data, more data creates more values but also worries”. This blog post is the first guest post on the Substra Foundation blog. It is written by Noggin, a digital, data & analytics company based in Singapore.
In this blog article we present the most important Privacy-Enhancing Technologies (PETs) that are currently being developed and used by various tech actors. We briefly explain their principles and discuss their potential complementarities with the Substra Framework. The aim of this article is to present potential Substra Framework developments and possible integrations with other technologies.
The massive collection of personal data represents a new risk to privacy, and citizen-consumers are asking their representatives and companies for higher security standards. Whereas personal data has historically been protected through anonymization, this technique often proves ineffective when artificial intelligence models are trained on personal data. New security frameworks must be developed, relying either on data centralization in a single vault or on decentralization across multiple data warehouses.
Hospitals have a huge amount of data
Every year, millions of patients are treated in French hospitals. These patients' data are naturally stored in each hospital's Information System (IS) and constitute essential material not only for patient care but also for clinical research.
Hospitals are full of data (patient data and associated diagnoses) in many departments: mammograms and their diagnosis for breast cancer, genomic data and associated diseases, etc.
How can this data be made available for use in the world?
Today, anywhere in the world, when a researcher or a data scientist wants to train a machine learning algorithm and create a prediction model, s/he must usually begin by assembling or gaining access to an already constituted dataset. S/he observes these data, consults some descriptive statistics, manipulates them, etc. At this point, a problem of trust arises: from the moment one accesses the data, the only protections against illegitimate use of it are the ethical stance of the data scientist and/or the law, upheld through contracts or data usage agreements. Ethics and the law, that is to say trust, which is at the heart of collaborative work. But is trust always enough?