How does symbolic execution help in repairing data corruption?

Symbolic execution runs a program with symbolic values to explore all possible execution paths. This allows it to identify the specific conditions under which data corruption occurs, enabling the generation of correct data values to maintain program execution.

What is the difference between repairing structural integrity and data integrity?

Repairing structural integrity involves fixing issues with the organization of the data structure itself (e.g., broken links). Repairing data integrity focuses on correcting corrupted values within the structure.

How does the systematic search approach work for repairing data structures?

Systematic search involves executing the repair check, monitoring field accesses, and backtracking to the last accessed field when a violation occurs. It then attempts to mutate this field (e.g., setting a reference to null or a symbolic value) and re-checks until the structure is valid.

Is writing assertions for repair difficult?

Writing assertions is presented as relatively easy, as it involves describing the expected properties of the data structure, which are often already known or specified. The repair mechanism then uses these descriptions.

What are the applications of assertion-based repair?

Applications include enabling programs to continue execution despite data structure corruption, generating large data structures (potentially for testing), and potentially automatically generating fixes for program bugs.

How does this approach relate to Test-Driven Development (TDD)?

The repair approach can be used for generating test cases, similar to TDD, as it leverages the data structure descriptions to create tests, even without a fully implemented program. However, for actual repair, a program implementation is necessary.

What happens if the data itself is corrupted or missing?

If data is severely corrupted or lost (e.g., garbage collected), the repair may not be able to fully restore it. In such cases, the system might cut the structure at the point of data loss or substitute with default values, rather than trying to guess missing information.

Key Moments

Assertion-based Repair of Complex Data Structures

Google Talks

Education5 min read34 min video

Aug 22, 2012|115 views|2

googlevideo

Save to Pod

Want to know something specific about what's covered?

We've already dissected every moment. Ask and we will deliver (with timestamps).

Key Moments

TL;DR

New technique repairs corrupted data structures by using assertions not just to detect but also to fix bugs, allowing programs to continue running instead of crashing.

Key Insights

Instead of terminating a program upon an assertion violation, the proposed method mutates the data structure to a valid state, allowing execution to continue.

Complex data structure integrity constraints can be grouped into structural characteristics (e.g., acyclicity) and data characteristics (e.g., ordered values).

Symbolic execution is used to repair data corruption by identifying conditions under which an assertion violation would not occur, while systematic search repairs structural issues.

The repair process involves backtracking to the last accessed field and trying non-deterministic mutations (e.g., setting to null, trying new nodes) until the assertion passes.

Experiments show the repair approach can handle large data structures with hundreds of thousands of nodes and multiple faults within seconds.

The core idea is to leverage existing assertion descriptions (specifications) for repair, reducing the need for separate, manually written repair routines.

Rethinking assertion violations from detection to repair

Traditionally, when an assertion in a program fails, signaling a data structure corruption, the standard procedure is to terminate the program, debug it, and then re-execute. This approach, however, is often insufficient, especially for persistent data like file systems. If the underlying data is corrupted, the same assertion violation will occur repeatedly, even after debugging. This presentation introduces a novel approach where, instead of halting execution, a violated assertion is used as a basis for repairing the data structure. The program's state is modified to satisfy the assertion, enabling continued execution and reporting the initial violation for later analysis.

Defining and identifying data structure integrity

Complex data structures, such as red-black trees, intentional naming networks, and software caches, are fundamental to many programs. These structures possess specific constraints that, when violated, can lead to program instability or crashes. These constraints can be broadly categorized into two types: structural integrity constraints, which define the form of the structure itself (e.g., a linked list must be acyclic), and data integrity constraints, which pertain to the values stored within the structure (e.g., elements in a red-black tree must be ordered). Assertions, often described using declarative languages like Alloy or Z, or imperative languages like Java, are used to check these properties.

The repair process: Mutating state to satisfy assertions

The core of the proposed repair mechanism involves taking a program state with a violated assertion and mutating it into a new state (S') where the assertion holds true (S'.repOK). This goal is not necessarily to restore the program's intended state, but rather to reach a state where the program can continue executing without crashing due to the underlying data structure corruption. The repair might not always yield the exact intended structure, as multiple valid repairs might exist for a single corruption. The repair focuses on correcting both faults within the structure's topology and faults in the data itself.

Repairing data corruption with symbolic execution

To address corruption in the data values within a structure, the technique of symbolic execution is employed. Unlike concrete execution which uses specific values, symbolic execution runs the program with symbolic values representing inputs. This allows the system to explore all possible program paths and generate path conditions. If a data corruption causes an assertion to fail, symbolic execution can identify the conditions under which the assertion would pass. The system can then generate corrected data values that satisfy these conditions, enabling the program to continue.

Repairing structural integrity with systematic search

For structural integrity constraints, a systematic search approach is used, inspired by techniques for test generation. The process begins by executing the 'repOK' method (which checks the structure's validity) on the current state. When 'repOK' returns false, indicating a fault, the system identifies the last field accessed before the failure. This field is then non-deterministically mutated. For data fields, this might involve changing the value to a symbolic variable. For reference fields, it could mean setting the reference to null or trying a previously known unvisited node. The 'repOK' method is then re-executed with the mutated structure until it returns true, signifying a repaired state.

Practical demonstrations and performance

Experiments were conducted by injecting faults into various data structures like doubly linked lists and software caches. When a fault triggered an exception in a software cache scenario, the unrepaired system would crash. With the repair routine, the system continued execution, logging the issue for later developer attention. Performance tests on data structures with hundreds of thousands of nodes, handling up to 20 faults, demonstrated repair times within 10 seconds for structures like intentional naming systems and file systems. While performance can degrade quadratically with a very high number of faults, the expectation is that significant structural issues are typically caught early.

Leveraging existing assertions for efficient repair

A key advantage of this approach is its reliance on existing 'repOK' methods or assertions, which often already exist as program specifications. This eliminates the need to write separate, manually crafted repair routines for each potential bug. The effort involved in writing assertions for these benchmarks was minimal, with each taking approximately 15-20 minutes. This contrasts with traditional fault tolerance methods that require explicit inspection and coding for every possible failure scenario.

Applications and future directions of assertion-based repair

Beyond direct bug repair, this assertion-based framework has applications in generating large, valid data structures for testing purposes. By providing a structure description and a 'repOK' method, the system can iteratively repair random structures until a valid one is generated. Another investigated application is automatically translating repair actions into actual code fixes, though this remains complex for intricate bugs. The framework also supports different assertion checking strategies, from infrequent checks to comprehensive checks with every garbage collection, depending on the required reliability.

Mentioned in This Episode

●Products

●Software & Apps

●Organizations

●Concepts

●People Referenced

Common Questions

Assertion-based repair is a technique that uses assertions to check the integrity of data structures. If a violation is detected, instead of crashing, the system attempts to repair the data structure to a valid state, allowing the program to continue execution.

Topics

Technology & Innovation Programming & Software Science & Mathematics Fault Tolerance Software Reliability Data Structure Integrity Program Repair Assertion Checking Automatic Code Repair

Mentioned in this video

Software & Apps

Java

An imperative programming language that can be used to describe data structure characteristics via boolean-returning methods.

FS Check Disk

A utility used to check and repair file system integrity, an example of a traditional repair routine.

IBM MVS

An operating system that utilized specialized routines for fault tolerance and recovery.

C++

An imperative programming language that can be used to describe data structure characteristics.

ESTA

A tool for generating large data structures from descriptions.

Alloy

A declarative language used for describing data structure characteristics, based on first-order logic.

JML

A specification language that can be used to describe data structure properties, similar to the 'rep ok' method.

TESTRA

A tool for generating large data structures from descriptions.

CORAT

A tool for generating large data structures from descriptions.

Products

Lucent 5ES

A phone switch that used specialized routines for continuous repair and recovery, ensuring high reliability.

Mars Rover

An example used to illustrate the potential risks of non-deterministic repair in critical applications.

Organizations

MIT

The institution where researchers developed a method for program repair through language translation.

University of Texas at Austin

The institution where the presenter's thesis work was conducted.

Companies

ASML

A declarative language used for describing data structure characteristics.

People

Demsky and Rinard

Researchers at MIT who developed a method to translate programs into another language for repair analysis.

Concepts

Symbolic Execution

A technique that runs programs on symbolic values instead of concrete values to traverse all program paths and generate conditions for each path.

Ask anything from this episode.

Save it, chat with it, and connect it to Claude or ChatGPT. Get cited answers from the actual content — and build your own knowledge base of every podcast and video you care about.

Get Started Free

Assertion-based Repair of Complex Data Structures

Want to know something specific about what's covered?

Key Insights

Rethinking assertion violations from detection to repair

Defining and identifying data structure integrity

The repair process: Mutating state to satisfy assertions

Repairing data corruption with symbolic execution

Repairing structural integrity with systematic search

Practical demonstrations and performance

Leveraging existing assertions for efficient repair

Applications and future directions of assertion-based repair

Mentioned in This Episode

Common Questions

Topics

Mentioned in this video

More from GoogleTalksArchive

Everything is Miscellaneous

Statistical Aspects of Data Mining (Stats 202) Day 7

Key Phrase Indexing With Controlled Vocabularies

Mysteries of the Human Genome

Ask anything from this episode.