Automation of Induction Proofs

Define Problem Scope

We’re going to restrict our reasoning to values described by the grammar

VALUE ::= null
        | integer(INTEGER)
        | cons(VALUE,VALUE)
        

The functions and values below will be our primitives. They behave as we would expect.

Symbol Arity Range Symbol Arity Range
null 0 Value false 0 Bool
integer 1 Value ite 3 *
cons 2 Value and ≥ 0 Bool
is-null 1 Bool or ≥ 0 Bool
is-integer 1 Bool not 1 Bool
is-cons 1 Bool => ≥ 2 Bool
true 0 Bool = 2 Bool

The grammar of the input file is still SMT-LIB 2.0, but with some restrictions.

Desugar

The body of every user-defined function must be desugared.

The claim to be proved must also be desugared.

Other assertions can be left as they are.

To desugar, change each and, or, => to an ite according to these rules.

and()             = true
and(p₁,p₂,...,pₙ) = ite(p₁,and(p₂,...,pₙ),false)

or()             = false
or(p₁,p₂,...,pₙ) = ite(p₁,true,or(p₂,...,pₙ))

=>(p₁,p₂)              = ite(p₁,p₂,true)
=>(p₁,p₂,p₃,p₄,...,pₙ) = ite(p₁,=>(p₂,p₃,p₄,...,pₙ),true)

Unfold

The first step is to compute the depth of each user-defined function called within the claim.

The depth of a given formal argument of a function is the number of times car or cdr is consecutively applied to that argument. The depth of an entire function is the maximum over the depths of its formal arguments. Consider a few examples.

The function below has depth 1.

(define-fun-rec is-natural ((v Value)) Bool
  (or (is-null v)
      (and (is-cons v)
           (is-null (car v))
           (is-natural (cdr v)))))

The function below has depth 1.

(define-fun-rec plus ((v Value) (w Value)) Value
  (ite (is-null v) w
  (ite (is-cons v) (plus (cdr v) (cons null w))
                   null)))

The function below has depth 3.

(define-fun-rec is-list ((v Value)) Bool
  (or (is-null v)
      (and (is-cons v)
           (is-integer (car v))
           (is-null (cdr v)))
      (and (is-cons v)
           (is-integer (car v))
           (is-cons (cdr v))
           (is-integer (car (cdr v)))
           (is-null (cdr (cdr v))))
      (and (is-cons v)
           (is-integer (car v))
           (is-cons (cdr v))
           (is-integer (car (cdr v)))
           (is-cons (cdr (cdr v)))
           (is-integer (car (cdr (cdr v))))
           (is-list (cdr (cdr (cdr v)))))))

Then compute the least common multiple of the depths of all the user-defined functions involved in the claim. Call this value l

For each involved function f, if its depth is d, compute the l/d th unfolding of f and add this new function to the list of definitions. What’s important is each of the unfolded definitions will have the same depth.

To be precise, the 1st unfolding of a function can be computed by expanding each recursive call within its body into the function’s body, with the appropriate substitution applied. Here’s an example to aid my explanation.

(define-fun-rec is-tree ((v Value)) Bool
  (or (is-null v)
      (and (is-cons v)
           (is-tree (car v))
           (is-tree (cdr v)))))

This is its first unfolding.

(define-fun-rec is-tree-1 ((v Value)) Bool
  (or (is-null v)
      (and (is-cons v)
           (or (is-null (car v))
               (and (is-cons (car v))
                    (is-tree-1 (car (car v)))
                    (is-tree-1 (cdr (car v)))))
           (or (is-null (cdr v))
               (and (is-cons (cdr v))
                    (is-tree-1 (car (cdr v)))
                    (is-tree-1 (cdr (cdr v))))))))

Strategize

This is the step where we inspect the claim to be proved and come up with possible induction schemes. Consider the functions is-natural and plus defined in the last section. Suppose we need to prove

∀ v, w ∈ Value, 
  (is-natural(v) ∧ is-natural(w)) ⇒
    plus(cons(null, v), w) = cons(null, plus(v, w))

The first stage is to collect all non-primitive function calls.

Next, iterate over the function calls. For each call, iterate over its arguments and produce a version of the call where that argument has been replaced with a hole.

Look at the universally quantified variables in the claim. Compute all the ways we can “pop” a single variable to induct on while leaving the other variables under the universal quantifier. We can envision each of these “ways” as a pair whose first element is a variable to induct on and whose second element is a list of universally quantified variables.

For each element in this set of pairs. for each hole-bearing function call, produce a new pair. The first element of this new pair is the call’s hole plugged with the variable to induct on. The second element of the new pair is just the list of universally quantified variables.

We only want to consider strategies that are “good” and we’re going to adopt The Little Prover’s idea of only considering calls whose arguments meet the following criteria.

Therefore the list of good strategies is

Subgoals

Pick a strategy from among those in the last section. For this demonstration let’s choose (plus(v, w), [w])

Let P(v,w) represent the claim to be proved, but abstrated over its universally quantified variables.

P(v, w) =
  (is-natural(v) ∧ is-natural(w)) ⇒
    plus(cons(null, v), w) = cons(null, plus(v, w))

Construct a tree from the body of plus, the function mentioned in the strategy. Each node in this tree is an ite form. The expression at the node is the form’s test.

             w
            /
      true /     
          /
is-null(v)                plus(cdr(v), cons(null, w))
          \              /
     false \            / true
            \          /
             is-cons(v)
                       \
                        \ false
                         \
                          null

We work from the leaves to the root to gather the subgoals suggested by this strategy.

Say we’re at a leaf. We scan the expression at the leaf for recursive calls to plus. Say there are two recursive calls – plus(p, q) and plus(r, s) – where p, q, r and s are expressions that may depend on v and w. The induction hypotheses created from these expressions are P(p, q) and P(r, s). The list of subgoals at this leaf is then just the singleton

[ (∀ w, P(p, q)) ⇒ (∀ w, P(r, s)) ⇒ P(v, w) ]

Say we’re at a node. We begin by finding the induction hypotheses (ihs) for the expression at the node (e). We already have the subgoals for the two branches of the node. Let’s call them s-true and s-false. If s-true and s-false are the same, we add ihs as assumptions to each subgoal in s-true and return the result. If s-true and s-false are distinct, we add e and ihs as assumptions to each subgoal in s-true to get s-true'. We also add ¬e and ihs as assumptions to each subgoal in s-false to get s-false'. Then we append s-true' to s-false' and return the result.

Step 1.

             [ P(v, w) ]
            /
      true /     
          /
is-null(v)                [ (∀ w, P(cdr(v), cons(null, w))) ⇒ P(v, w) ]
          \              /
     false \            / true
            \          /
             is-cons(v)
                       \
                        \ false
                         \
                          [ P(v, w) ]

Step 2.

             [ P(v, w) ]
            /
      true /     
          /
is-null(v)                
          \
     false \
            \
             [ is-cons(v) ⇒ (∀ w, P(cdr(v), cons(null, w))) ⇒ P(v, w)
             , ¬is-cons(v) ⇒ P(v, w) ]

Step 3.

[ is-null(v) ⇒ P(v, w)
, ¬is-null(v) ⇒ is-cons(v) ⇒ (∀ w, P(cdr(v), cons(null, w))) ⇒ P(v, w)
, ¬is-null(v) ⇒ ¬is-cons(v) ⇒ P(v, w) ]

Code & Examples

Available upon request, as the code and usage instructions will change over time.

Shortcomings of Z3

  1. Z3 sometimes needs a quantifier pattern to correctly guess the necessary instantiation.

  2. Z3 sometimes needs about 2 seconds – more time than I want to yield – to guess the necessary instantiation.

Planned Features

  1. Support nested induction. An example of a problem that needs nested induction. Start with

    ∀ v, w ∈ Value, (is-natural(v) ∧ is-natural(w)) ⇒ plus(v, w) = cons(w, v)

We know (plus(x,y), [y]) is among the good strategies. One subgoal that arises from it is

is-null(v) ⇒            
  (is-natural(v) ∧ is-natural(w)) ⇒ 
    plus(null, w) = plus(w, null)

Note that it’s okay to have v and w open in this subgoal because this is how the problem will be printed to Z3:

(declare-const v Value)
(declare-const w Value)
(assert (not
  (=> (is-null v)
      (and (is-natural v)
           (is-natural w))
      (= (plus null w)
         (plus w null)))))
(check-sat)

The challenge is to recognize that this subgoal also needs to be considered for a proof by induction. It’s called a nested induction because we need to use induction to prove a subgoal of the main goal.

∀ v, w ∈ Value,
  (is-natural(v) ∧ is-natural(w)) ⇒ 
    plus(null, w) = plus(w, null)
  1. Support functions that call other user-defined functions, including mutually recursive functions.

  2. Guess potentially useful lemmas. See Hipster and IsaPlanner.