Train AI models while keeping customer trust
Many companies in the AI space face a dilemma: how do you collect valuable model training data when the nature of your product means users are likely to be disclosing personal, sensitive or commercially confidential information?
This is especially true in the booming personal AI space: “productivity tools” that help people save time on tasks specific to their jobs, ranging from apps that take meeting notes and generate summaries to apps that produce content imitating your own writing style.
While training data is essential for AI startups to improve their products, collecting user data and feeding it into training sets not only puts users at risk of naively disclosing their company’s confidential information, it also leads companies to ban their employees from using AI tools altogether.
Fortunately, there are two tried-and-tested techniques for resolving this tension.
Federated Learning
Instead of collecting data in a central location and training the AI model centrally, federated learning trains the model locally on each participating user device, using only that device’s own dataset, and shares only the resulting model updates. These locally trained models are repeatedly sent to a central server, where they are combined step by step into a single master model. At the end of the process, the final model is shared back with all participating devices.
This approach is valuable for any company or use case where data privacy is crucial. In a medical setting, for example, multiple hospitals could jointly train an AI diagnostic tool while each keeps its patients’ clinical data on its own premises.
The technique is typically combined with anonymization (removing personally identifying data) and aggregation methods that merge the model updates so that no individual’s data can be deciphered or extracted from them.
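To make the aggregation step concrete, here is a minimal federated-averaging sketch in Python. The linear model, learning rate, number of rounds and toy client datasets are illustrative assumptions, not a production recipe; real systems add secure aggregation and far more robust training.

```python
# Minimal federated-averaging sketch: each client trains on its own data,
# only model weights are sent to the server, and the server averages them.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Train a copy of the global model on one client's private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # mean-squared-error gradient
        w -= lr * grad
    return w                                 # only the weights leave the device

def federated_average(global_w, client_data, rounds=10):
    """Each round: clients train locally, the server averages their updates."""
    for _ in range(rounds):
        updates, sizes = [], []
        for X, y in client_data:             # raw data never leaves the client
            updates.append(local_update(global_w, X, y))
            sizes.append(len(y))
        # Weight each client's model by how much data it contributed
        global_w = np.average(updates, axis=0, weights=sizes)
    return global_w

# Toy usage: three "devices", each holding its own private dataset
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 80, 120):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    clients.append((X, y))

w = federated_average(np.zeros(2), clients)
print(w)  # lands close to [2.0, -1.0] without pooling any raw data
```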
Differential Privacy
Differential privacy works by adding a carefully calibrated amount of randomness, or “noise,” to a computation performed on a dataset. The noise is large enough to hide the identifiable characteristics of any individual, protecting personal information, but small enough not to materially affect the accuracy of the overall results.
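As a sketch of the idea, the snippet below adds Laplace noise to a simple count query. The epsilon value, the ages and the query itself are illustrative assumptions; a real deployment would track a privacy budget across all queries and tune the noise to the sensitivity of each one.

```python
# Laplace mechanism sketch: a noisy count that hides any single individual.
import numpy as np

def laplace_count(values, predicate, epsilon=1.0):
    """Adding or removing one person changes the true count by at most 1
    (sensitivity = 1), so Laplace noise with scale 1/epsilon masks whether
    any particular individual is in the dataset."""
    true_count = sum(1 for v in values if predicate(v))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Toy usage: roughly how many users are over 40, without exposing anyone's age
ages = [23, 45, 31, 52, 38, 61, 29, 44]
print(laplace_count(ages, lambda age: age > 40, epsilon=1.0))
```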
Be Transparent and Get Permission
Of course, the best approach is always to tell customers what you are doing and to get permission where the law requires it. If you are using strong privacy-protecting strategies such as those outlined here, this becomes a selling point rather than a source of customer anxiety.
There are plenty of reasons to do it right. In several recent cases, the US Federal Trade Commission (FTC) has imposed a penalty known as algorithmic disgorgement on companies that used data to train AI models without proper transparency and a lawful basis. The penalty is severe: it requires deletion not only of the data but also of the models and any outputs derived from them.
Waltzer Consulting can help organizations cheaply and simply set up their processes for collecting AI training data while keeping customer and stakeholder trust. Get in touch for a confidential discussion.