Key Moments

Privacy Preserving AI (Andrew Trask) | MIT Deep Learning Series

Lex Fridman
Science & Technology · 3 min read · 74 min video
Jan 19, 2020 · 77,580 views
TL;DR

Privacy-preserving AI enables answering questions with unseen data using tools like remote execution, differential privacy, and secure multi-party computation.

Key Insights

1. Privacy-preserving AI allows data science on sensitive data without direct access, unlocking new research avenues.

2. Remote execution enables computations on data located on remote servers, keeping the data secure.

3. Differential privacy adds noise to query results to protect individual data points, governed by a privacy budget.

4. Secure Multi-Party Computation (MPC) allows multiple parties to compute functions on encrypted data without revealing their inputs.

5. These technologies can revolutionize fields like healthcare, open science, and personalized services by safeguarding user data.

6. Adoption is driven by commercial viability and regulatory changes, with a long-term goal of individual control over personal data.

THE FUNDAMENTAL QUESTION: DATA WE CANNOT SEE

The core challenge in modern data science is accessing and utilizing sensitive data, such as medical records, which are often inaccessible due to privacy concerns and regulations. This limitation restricts research to easily available datasets, like handwritten digits, while more critical societal problems, like predicting dementia or cancer, remain largely unexplored by the broader machine learning community. The central question is whether it's possible to derive meaningful insights from data that researchers cannot directly see or access.

REMOTE EXECUTION AND PRIVATE SEARCH

The initial step towards privacy-preserving AI involves remote execution. This technology allows computations to be performed on data residing on a remote machine, such as a hospital's data center, without the data ever leaving its secure environment. Tools like PySyft extend deep learning frameworks to facilitate this. By using pointers, data scientists can interact with remote tensors as if they were local, with computations executing remotely. Complementing this, private search capabilities let users retrieve detailed descriptions of datasets, including metadata and even curated samples, enabling feature engineering and initial data evaluation without direct data exposure.
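The pointer pattern can be sketched in a few lines. This toy example is not PySyft's actual API (which changes across versions); `RemoteWorker` and `Pointer` are hypothetical stand-ins that only illustrate the idea of computing where the data lives:

```python
# Toy sketch of remote-execution pointers. RemoteWorker and Pointer are
# hypothetical stand-ins, not PySyft classes.

class RemoteWorker:
    """Simulates a machine (e.g. a hospital server) that holds data,
    runs operations locally, and hands back only pointers."""

    def __init__(self, name):
        self.name = name
        self._store = {}
        self._next_id = 0

    def register(self, value):
        obj_id = self._next_id
        self._next_id += 1
        self._store[obj_id] = value
        return Pointer(self, obj_id)

    def execute(self, op, *obj_ids):
        result = op(*(self._store[i] for i in obj_ids))
        return self.register(result)


class Pointer:
    """Local handle to a value that never leaves the remote worker."""

    def __init__(self, worker, obj_id):
        self.worker = worker
        self.obj_id = obj_id

    def __add__(self, other):
        return self.worker.execute(lambda a, b: a + b,
                                   self.obj_id, other.obj_id)

    def get(self):
        # Explicitly downloads the value; a real system would gate this
        # behind access control and a privacy budget.
        return self.worker._store[self.obj_id]


hospital = RemoteWorker("hospital")
x = hospital.register(10)   # data lives on the hospital's machine
y = hospital.register(32)
z = x + y                   # addition runs remotely; z is just a pointer
```

Only an explicit `get()` would move data to the data scientist; every step before it operates purely on references.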

DIFFERENTIAL PRIVACY: QUANTIFYING PRIVACY PROTECTION

To address the vulnerability of naively retrieving data, differential privacy provides a rigorous mathematical framework for statistical analysis without compromising individual privacy. It ensures that the output of a query is largely invariant to the inclusion or exclusion of any single individual's data. This is achieved by carefully adding noise to the results, controlled by a 'privacy budget' (epsilon). The concept is analogous to randomized response techniques used in social sciences, offering plausible deniability. This contrasts with traditional anonymization techniques, which have proven leaky and susceptible to re-identification attacks, making differential privacy a far more robust basis for claims of data protection.
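The standard way to add calibrated noise is the Laplace mechanism. Here is a minimal sketch using only the Python standard library; the counting-query framing and parameter values are illustrative assumptions, not from the talk:

```python
import math
import random

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Return true_value plus Laplace noise of scale sensitivity/epsilon.

    Smaller epsilon means more noise and stronger privacy; epsilon is
    the privacy budget spent by this one query."""
    scale = sensitivity / epsilon
    # Sample Laplace noise via the inverse CDF of a uniform draw.
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

# A counting query ("how many patients have condition X?") has
# sensitivity 1: one person joining or leaving the dataset changes
# the answer by at most 1.
noisy_count = laplace_mechanism(true_value=87, sensitivity=1, epsilon=0.5)
```

Each query spends part of the budget, so an analyst can ask only so many questions before the accumulated epsilon exceeds what the data owner allows.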

SECURE MULTI-PARTY COMPUTATION: COLLABORATIVE ENCRYPTION

Secure Multi-Party Computation (MPC) takes privacy preservation a step further by enabling multiple parties to jointly compute a function over their private inputs without revealing those inputs to each other. Data is split into shares held by different parties, and computations are performed on these encrypted shares. This allows for collaborative model training and prediction across multiple data owners who may not trust each other. While computationally intensive, MPC ensures that both the data and the models remain encrypted throughout the process, preventing any single party from accessing sensitive information, thereby enabling truly private collaborative AI.
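The core primitive, additive secret sharing, fits in a few lines. This is a simplified sketch: the field size and three-party setup are illustrative choices, and real MPC protocols also support multiplication, which needs extra machinery such as Beaver triples:

```python
import random

Q = 2**61 - 1  # prime field size; all arithmetic is mod Q

def share(secret, n_parties=3):
    """Split secret into n additive shares that sum to it mod Q.
    Each share on its own is a uniformly random field element."""
    shares = [random.randrange(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    """Only the sum of all shares reveals the secret."""
    return sum(shares) % Q

def add_shared(a_shares, b_shares):
    """Each party adds its two shares locally: no communication,
    and no party ever sees either underlying secret."""
    return [(a + b) % Q for a, b in zip(a_shares, b_shares)]

alice_shares = share(5)
bob_shares = share(3)
total_shares = add_shared(alice_shares, bob_shares)
```

Because any single share is indistinguishable from random noise, a party holding one share learns nothing; collusion of all share-holders is required to decrypt, which is what gives the data owners shared governance over the result.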

BROAD USE CASES AND SOCIETAL IMPACT

The convergence of these privacy-preserving technologies has profound implications across various sectors. 'Open data for science' can unlock vast, previously inaccessible datasets, accelerating research and innovation, akin to the impact of ImageNet. 'Single-use accountability' systems enhance privacy in surveillance and auditing by limiting data access to specific, auditable functions, minimizing potential misuse. 'Encrypted services' promise end-to-end encrypted medical diagnoses, financial advice, or personalized recommendations, where users retain full control over their sensitive data while still benefiting from advanced AI-driven services.

INFRASTRUCTURE, ADOPTION, AND THE FUTURE VISION

The ultimate goal is to empower individuals with full control over their data, allowing them to assign personal privacy budgets. This necessitates building robust infrastructure, likely starting with enterprise adoption driven by commercial benefits (data scarcity increasing value) rather than purely privacy concerns. Future development includes faster networks, optimizations for cloud-based computations, and potentially new institutions like 'data banks' to manage shared data assets and ensure accountability. While challenges remain, the theoretical framework exists, and the focus now shifts to engineering, adoption, and maturing these technologies to create a more equitable and secure data landscape.

Privacy-Preserving AI: Key Tools and Concepts

Practical takeaways from this episode

Do This

Use remote execution so data stays on its owner's machine.
Leverage search and sampling for feature engineering without full data access.
Employ differential privacy for a formal, rigorous privacy budgeting mechanism.
Use secure multi-party computation for shared governance and encrypted computations.
Consider encrypted services combining ML, MPC, and differential privacy for end-to-end protection.
Focus on generalization in models, not just individual data points.

Avoid This

Do not naively call 'get' on remote pointers to download data; this can expose it.
Avoid relying solely on data anonymization; it is often insufficient and misleading.
Be aware of the computational complexity and potential slowdowns with encrypted computations.
Do not assume federated learning alone is a secure protocol; it needs to be combined with techniques like differential privacy to prevent data leakage.
Do not forget the risk of exposing models sent for remote training.
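On the federated-learning caveat above, one common mitigation is to clip and noise each model update before it leaves the device. This is a minimal sketch; the `clip_norm` and `noise_std` values are illustrative assumptions, and a real deployment would calibrate the noise to a formal (epsilon, delta) guarantee:

```python
import math
import random

def privatize_update(update, clip_norm=1.0, noise_std=0.1):
    """Clip an update's L2 norm to clip_norm, then add Gaussian noise,
    so the server never sees a raw per-user gradient."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [v * scale for v in update]
    return [v + random.gauss(0.0, noise_std) for v in clipped]
```

Clipping bounds any one user's influence on the aggregate model; the added noise then masks whatever individual signal remains in the bounded update.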

Common Questions

What problem does privacy-preserving AI aim to solve?

Privacy-preserving AI aims to solve the problem of accessing and utilizing sensitive data for valuable insights, such as medical research, without compromising individual privacy. This is crucial because access to sensitive data is often restricted, hindering progress on important societal issues.

