Z3 - sequences

# Strings and Sequences in Z3

## Introduction

This section describes Z3's handling of strings, sequences and regular expressions. Z3 offers built-in support for using string constants and dedicated solvers for checking satisfiability over constraints using strings, sequences and regular expressions. Note that the (current) specialized solver is able to establish satisfiability for a non-trivial class of formulas, but is far from a decision procedure: Z3 only solves string equalities using an incomplete heuristic solver (in spite of the existence of complete procedures) and the full combination of lengths and sequences (and regular expressions) is not decidable anyway. In Z3, strings are a special case of sequences, as set forth in a now deprecated proposal on sequences in SMT-LIB. The format for strings and regular expressions is mostly compatible with CVC4's. Note that Z3Str2 has been available since 2013 as a separate distribution. It contains algorithms that offer an effective search space pruning for solving string equalities. Strings are sequences over 8 bit bit-vectors. In addition, Z3 allows constraints over sequences of arbitrary types.

## Strings

### Built-in types and constants

The name String is a built-in name for the String sort. String literals can furthermore be entered directly as literals delimited by quotes. The following example asks whether there are strings a and b that concatenate to "abc" followed by b. load in editor
(declare-const a String)
(declare-const b String)
(assert (= (str.++ b a) (str.++ "abc" b)))
(check-sat)
(get-model)


#### String literals

Z3 follows the proposed SMT-LIB2.5 format for string literals. Thus, strings are enclosed using double quotes. A sequence of two adjacent double quotes within a string literal is used as the escape sequence for a quote. So for example "quote ""me"" on this" corresponds to the string quote "me" on this. Other characters are treated as part of the string. For example, a newline within a string is treated as a new-line character.

To represent non-ASCII characters Z3 treats the sequence \xHH where HH are two hexa-decimal numerals (using one of the characters, 0-9, a-f, A-F) as an encoding of an 8 bit character. Furthermore, the following escape sequences correspond to their ASCII escape encodings.
 \a \b \f \n \r \t \v
Note that other occurrences of the character \ are treated as the charcter \. The following example shows three ways to enter the newline character.

(define-const a String "\x0a")
(define-const b String "
")
(define-const c String "\n")
(simplify (= a b))
(simplify (= a c))
(simplify (str.++ a b c))


### Operations

Let us start out with a summary of available string operations.
 Operation Brief description (str.++ a b c) String concatenation of one or more strings (str.len s) String length. Returns an integer. (str.substr s offset length) Retrieves substring of s at offset (str.indexof s sub) Retrieves first position of sub in s, -1 if there are no occurrences (str.indexof s sub offset) Retrieves first position of sub at or after offset in s, -1 if there are no occurrences (str.at s offset) Substring of length 1 at offset in s. (str.contains s sub) Does s contain the substring sub? (str.prefixof pre s) Is pre a prefix of s? (str.suffixof suf s) Is suf a suffix of s? (str.replace s src dst) Replace the first occurrence of src by dst in s. (str.to.int s) Retrieve integer encoded by string s (ground rewriting only). (int.to.str i) Retrieve string encoding of integer i (ground rewriting only).

Note that (str.indexof s offset) is shorthand for (str.indexof s offset 0).

Some operations have under-specified behavior on certain arguments. Namely, (str.at s i) is unconstrained for indices that are either negative or beyond (- (str.len s) 1). Furthermore (str.substr s offset length) is under-specified when the offset is outside the range of positions in s or length is negative or offset+length exceeds the length of s.

### Examples

Basic string operations load in editor
(simplify (str.++ (str.at "abc" 1) (str.at "abc" 0)))
(simplify (str.indexof "abcabc" "a"))
(simplify (str.indexof "abcabc" "a" 1))
(simplify (str.substr "xxabcyy" 2 3))

A string cannot overlap with two different characters. load in editor
(declare-const a String)
(assert (= (str.++ a "b") (str.++ "a" a)))
(check-sat)

Strings a, b, c can have a non-trivial overlap. load in editor
(declare-const a String)
(declare-const b String)
(declare-const c String)
(assert (= (str.++ a b) "abcd"))
(assert (= (str.++ b c) "cdef"))
(assert (not (= b "")))
(check-sat)

There is a solution to a of length at most 2. load in editor
(declare-const a String)
(declare-const b String)
(assert (= (str.++ "abc" a) (str.++ b "cef")))
(assert (<= (str.len a) 2))
(check-sat)

There is a solution to a that is not a sequence of "a"'s. load in editor
(declare-const a String)
(declare-const b String)
(declare-const c String)
(assert (= (str.++ a "ab" b) (str.++ b "ba" c)))
(assert (= c (str.++ a b)))
(assert (not (= (str.++ a "a") (str.++ "a" a))))
(check-sat)
(get-model)

Contains is transitive. load in editor
(declare-const a String)
(declare-const b String)
(declare-const c String)
(assert (str.contains a b))
(assert (str.contains b c))
(assert (not (str.contains a c)))
(check-sat)

But containment is not a linear order. load in editor
(declare-const a String)
(declare-const b String)
(declare-const c String)
(assert (str.contains a b))
(assert (str.contains a c))
(assert (not (str.contains b c)))
(assert (not (str.contains c b)))
(check-sat)
(get-model)

Any string is equal to the prefix and suffix that add up to a its length. load in editor
(declare-const a String)
(declare-const b String)
(declare-const c String)
(assert (str.prefixof b a))
(assert (str.suffixof c a))
(assert (= (str.len a) (+ (str.len b) (str.len c))))
(assert (not (= a (str.++ b c))))
(check-sat)


## Sequences

The sort constructor Seq can be used to create sequences over any base sort. For example, a sequence of integers is (Seq Int), and (Seq (_ BitVec 8)) is the definition of String.

### Operations

Most string operations have corresponding sequence variants. In addition, there are operations to create a unit sequence and the empty sequence over any base sort.
 Operation Brief description (seq.unit elem) Sequence with a single element elem. (as seq.empty (Seq Int)) The empty sequence of integers. (seq.++ a b c) Concatenation of one or more sequences. (seq.len s) Sequence length. Returns an integer. (seq.extract s offset length) Retrieves sub-sequence of s at offset (seq.indexof s sub) Retrieves first position of sub in s, -1 if there are no occurrences (seq.indexof s sub offset) Retrieves first position of sub at or after offset in s, -1 if there are no occurrences (seq.at s offset) Sub-sequence of length 1 at offset in s. (seq.contains s sub) Does s contain the sub-sequence sub? (seq.prefixof pre s) Is pre a prefix of s? (seq.suffixof suf s) Is suf a suffix of s? (seq.replace s src dst) Replace the first occurrence of src by dst in s.

### Examples

When inserting b at or after position 8, but before the length of the string, which is at least 10, then the resulting string has the same length, and either character 8 or 9 are unchanged. load in editor
(declare-const s (Seq Int))
(declare-const t (Seq Int))
(declare-const j Int)
(declare-const b Int)

(assert (<= 10 (seq.len s)))
(assert (<= 8 j))
(assert (< j (seq.len s)))
(assert (= t (seq.++ (seq.extract s 0 j) (seq.unit b) (seq.extract s (+ j 1) (- (seq.len s) j 1)))))
(push)
(assert (not (= (seq.unit b) (seq.at t j))))
(check-sat)
(pop)
(push)
(assert (not (= (seq.len s) (seq.len t))))
(check-sat)
(pop)
(push)
(assert (not (= (seq.at s 8) (seq.at t 8))))
(assert (not (= (seq.at s 9) (seq.at t 9))))
(check-sat)
(pop)


## Regular Expressions

The sort constructor RegEx takes as argument a sequence type. The set of regular expressions over strings is thus (RegEx String).

### Operations

 Operation Brief description (str.to.re s) Convert string to regular expression accepting s. (str.in.re s r) Determine if s is in the language generated by r. re.allchar The regular expression accepting every string. re.nostr The regular expression rejecting every string. (re.range ch1 ch2) The range of characters between ch1 and ch2. (re.++ r1 r2 r3) Concatenation of regular expressions. (re.* r) Kleene star. (re.+ r) Kleene plus. (re.opt r) Zero or one use of r. ((_ re.loop lo hi) r) from lo to hi number of repetitions of r. (re.union r1 r2) The union of regular languages. (re.inter r1 r2) The intersection of regular languages. (seq.to.re s) Convert sequenceto regular expression accepting s. (seq.in.re s r) Determine if sequence s is in the language generated by r. (as re.all R) The regular expression of sort R accepting every sequence. (as re.empty R) The regular expression of sort R rejecting every sequence.

The re.range operator expects two strings each encoding a single character. For example (str.range "a" "\xff") is a valid range of characters, while (str.range "aa" "") is not associated with any specified range.

For added compatibility with CVC4's format, Z3 also accepts expressions of the form (re.loop r lo hi). Z3 understands only the meaning of these terms when lo, hi are integer numerals.

### What (not) to expect of regular expressions

Z3 converts regular expressions into non-deterministic finite automata and expands membership constraints over symbolic strings and sequences when it tries to satisfy constraints. This approach works for many membership and non-membership constraints, but is not a complete procedure (and even less complete when there are other constraints on the symbolic strings). It also does not handle regular expressions symbolic sequences (it allows to express non-regular languages). Thus, the string s in (str.to.re s) should be a string literal. You can write formulas with equalities over regular expressions, but chances are that Z3 will not do anything profound with them. Therefore, for now, use regular expressions only in constraints of the form (str.in.re s r).

### Examples

The maximal length is 6 for a string of length 2 repeated at most 3 times. load in editor
(declare-const a String)
(push)
(set-info :status sat)
(assert (str.in.re a ((_ re.loop 1 3) (str.to.re "ab"))))
(assert (= (str.len a) 6))
(check-sat)
(get-model)
(pop)

(push)
(set-info :status unsat)
(assert (str.in.re a ((_ re.loop 1 3) (str.to.re "ab"))))
(assert (> (str.len a) 6))
(check-sat)
(pop)