This page describes a summer project undertaken by Rajeev Joshi (UT Austin) while a summer intern at SRC, summer 1997, hosted by Rustan Leino.
The goal of Extended Static Checking (ESC) is to check, at compile time, for common runtime errors, e.g., dereferencing a nil pointer, accessing an array out of bounds. Given a program annotated with a specification, an ESC compiler produces a verification condition, a predicate in first-order logic, which expresses that the program meets the given specification. This verification condition is then given to a theorem prover, which checks whether or not it is valid. In most cases, the prover is expected to report either that the condition is valid or that it is invalid (and then also provide an error context); in some cases, the prover may just give up.
My project this summer (under Rustan Leino) was to study some of the difficulties that arise in reasoning about programs with recursive data structures such as linked lists. To track our progress, we chose an example based on the Sieve of Eratosthenes algorithm for generating primes; the example seems to be fairly representative of the class of programs we were interested in. Over the course of the summer, we manually sketched a proof of correctness of this program and determined that we needed to modify one of the original axioms, as the prover is not equipped with induction. I also extended Rustan's ESC/Ecstatic compiler to generate appropriate verification conditions for programs with data abstraction and simple recursive data types. We then tried running the prover on the resulting conditions and found that, with the exception of one axiom schema which we had to instantiate ourselves, the entire example could be checked automatically.
In the remainder of this note, I shall give a brief introduction to ESC and describe the problem that I worked on during the summer.
Consider the following sample specification:
class Rational { field num, den : integer spec field ok : boolean = (den # 0) } procedure normalise(r : Rational) { pre r.ok modifies r.num, r.den post r.num * r.den' = r.num' * r.den }
Class Rational has three fields: two integer fields representing the numerator and the denominator, and a specification field, whose value is a function of the values of other fields. In ESC jargon, the specification field is said to depend on the fields of which it is a function. In the example above, for all objects r of type Rational, we have that r.ok depends on r.den .
The specification of procedure normalise has three clauses which refer, in order, to the precondition, the modification list, and the postcondition respectively. The precondition is a first-order predicate which is assumed to hold upon entry into the procedure; it is expected to be established by the caller. The postcondition is a first-order predicate which is required to hold upon exit from the procedure body; it relates unprimed variables (denoting variable values upon entry) and primed variables (denoting variable values upon exit) and it is expected to be satisfied by the procedure implementation. The modification list describes the variables that the implementation is allowed to modify; it is translated into predicates, called modification constraints, which are conjoined to the postcondition.
Informally speaking, a modification constraint asserts that the value of a field (such as den above) changes only in those objects where the procedure is allowed to modify it. Modification lists may mention specification fields; in such cases, the list is first desugared (i.e., rewritten) into another list before modification constraints are generated. The rules for desugaring modification lists are somewhat complex -- they are based on engineering decisions intended to make them easier for programmers to use -- but, to get some idea of how they work, consider the modification list
modifies r.ok
for the example above. This list would be desugared into
modifies r.ok , r.den
which produces the following modification constraints, where s ranges over objects of type Rational:
(All s :: s.ok = s.ok' \/ s = r) (All s :: s.den = s.den' \/ s = r) (All s :: s.num = s.num')
In more complex examples in which several levels of abstraction are layered one on top of another, desugaring a modification list typically involves computing a transitive closure of the dependency relation.
To understand the difficulties that may arise with recursive data structures, consider the following definition of a linked list:
class List { field value : integer field next : List spec field valid : boolean = (value # 0 /\ (next = nil \/ next.valid)) } procedure Init(l : List) { modifies l.valid post l.valid' }
Note that the declaration of field valid introduces the following dependency:
for all objects l of type List , l.valid depends on l.next.valid .
Such a dependency of a specification field (valid at object l) on itself (valid at object l.next) is called a cyclic dependency. A preliminary difficulty with cyclic dependencies is finding a compact representation for modification lists. Recall that desugaring typically requires computing the transitive closure of the dependency relation. In the presence of cyclic dependencies, this closure is no longer finite. For instance, a naive desugaring of the modification list for procedure Init above yields
modifies l.valid, l.next.valid, l.next.next.valid, ... l.value, l.next.value, l.next.next.value, ... l.next, l.next.next, l.next.next.next, ...
so we need a way to represent such infinite sets succinctly. A simple solution is to use a closure operator "*" (pronounced "star") which denotes "0 or more occurrences of" ; then, the modification list above may be written as
modifies l.next*.valid, l.next*.value, l.next*.next .
The next difficulty is to determine how to generate appropriate modification constaints for such expressions. A naive translation of
modifies l.next*.value
would give
(All s :: s.value = s.value' \/ (s \in l.next*)) .
Since the prover is not equipped to reason about such closures, we rewrite the second disjunct using a 4-argument predicate REACH, introduced in 1983 by Greg Nelson (who also provided a set of useful axioms for it). Informally speaking, (REACH u z f x) asserts that it is possible to reach object z from object u by following the pointer field f, without ever following the f pointer of object x. For the example above, the modification constraint would be written as
(All s :: s.value = s.value' \/ (REACH l s next nil) ) .
In proving the Sieve of Eratosthenes example, we found that we needed only two axioms about REACH (viz., A1 and A2 from Nelson's paper). We also discovered that we had to modify the pointwiseness axiom schema for cyclic dependencies (due to Dave Detlefs), which, for the example above, yielded
(All s :: (All w : w # nil /\ (REACH s w next nil) : w.value = w.value' /\ w.next = w.next') ==> s.valid = s.valid' )
We found that, since the prover is not equipped for induction, this axiom was not enough to prove even the following simple Hoare triple (due to Jim Saxe):
{ s.valid /\ s # t /\ (REACH s t next nil) /\ t.valid } t.next := nil { s.valid }
Our new axiom schema now yields
(All s :: (All w : w # nil /\ (REACH s w next nil) : (w.value = w.value' /\ w.next = w.next') \/ w.valid = w.valid' ) ==> s.valid = s.valid' )
which seems sufficient, at least for the examples we have considered.
There are at least two directions for future work. The first is to study programs with recursive data structures in which more than one field is involved in the recursion. (An example is a binary tree, with pointers for left- and right- children.) This involves generalising the REACH predicate to accept a relation (or a set of fields) as its third argument. Since we have used only 2 axioms about REACH so far, this extension may turn out to be reasonably straightforward. The second direction of research is to find sufficient conditions under which users may not introduce inconsistencies into the checker. Currently, as pointed out by Jim Saxe, it is possible to introduce an inconsistency by using cyclic dependencies. As an example, consider changing the definition of valid to
spec field valid : boolean = (value # 0 /\ (next = nil \/ ~next.valid))
where ~ denotes negation. Suppose initially that there is only one object t, whose value field is nonzero and whose next field is nil. Then the assignment
t.next := t
introduces an inconsistency, since now there is no solution to the recursive equation in valid.