How to formalize binary search and similar algorithms?

10 views
Skip to first unread message

Ender Ting

unread,
Feb 3, 2026, 6:26:50 PM (2 days ago) Feb 3
to Metamath
Binary search

The best-known application of binary search is finding an element in sorted array, or insertion place for a new value, in logarithmic time - precisely speaking, in logarithmic count of reads from the array (saying more requires details about the comparison operator too). While proving it works is one thing, proving the logarithmic complexity is much harder.

If we want to formalize the count of reads (and there is no need to hardcode the log(n) bound in my opinion), we can throw away the actual array and instead require a step function which, from a "view" into array, would either require to read more or break with the final result.

A = the word alphabet
B = the algorithm result
C = any initial and intermediate state
N = the length of initial array

Then a step function S : ( Word ( ( 0 ..^ N ) X. A ) X. C ) --> ( B |_| ( ( 0 ..^ N ) X. C ) ).
And interactor, which evaluates the appropriate number of steps on a given array, has type
I: ( ( { w e. Word A | ( # ` w ) = N } X. C ) X. { s | s : ( Word ( ( 0 ..^ N ) X. A ) X. C ) --> ( B |_| ( ( 0 ..^ N ) X. C ) ) } --> ( ( Word ( ( 0 ..^ N ) X. A ) X. B ) |_| 1o ).

The interactor can be interpreted like this Rust code:

fn I<
    A: Copy,
    B,
    C,
    const N: usize
>(
    word: [A; N],
    initial_state: C,
    loop_body: impl Fn(&[(usize, A)], C) -> std::ops::ControlFlow<B, (usize, C)>,
) -> Result<
         (Vec<(usize, A)>, B),
         !
     > {
    let mut history = Vec::new();
    let mut state = initial_state;
   
    loop {
        match loop_body(&history, state) {
            ControlFlow::Break(b) => return Ok((history, b)),
            ControlFlow::Continue((next_index, next_state)) => {
                if next_index >= N {
                    unreachable!();
                }
               
                let value = word[next_index];
                history.push((next_index, value));
                state = next_state;
            }
        }
    }
}

except for two things:
1) the `usize` indices are actually ( 0 ..^ N ) which is why next_index >= N is unreachable;
2) code cannot detect infinite loops, so its failure type is ! (uninhabited), while math can sometimes prove such a loop and has to use disjoint-union type {(/)} = 1o for failure (or anything else which is inhabited).

Now, I'm not quite sure how to write the definition for I. It might have something like infimum over time moments "by which the iteration of step function will return the final value"? Extending the step function so that it remains constant after it was meant to return?

The useful theorems for binary search would then be that
1) for any word and suitable initial state, the interactor succeeds: left part of disjoint union, i.e. ( 1st ` ( I ` ... ) ) = (/)
2) under same conditions, the history length divided by log(N) belongs to O(1),
3) under same conditions, the returned value is correct.
4) that binary search doesn't read anything beyond what is provided, can be seen by inspecting its definition.

I sense a wrong/incomplete abstraction here, though. No idea what it is.

Euclid's algorithm

Interestingly, here my idea of a complexity assertion goes in another direction.

Let us define ChainsTo( .< , A , B ) which are the non-empty ( .< Chain A ) with last element belonging to B (where B normally is a subclass of A). Take a slight modification of ~eucalg's definition:

original: |- E = ( x e. NN0 , y e. NN0 |-> if( y = 0 , <. x , y >. , <. y , ( x mod y ) >. ) )
modified: |- E = ( x e. NN0 , y e. NN |-> <. y , ( x mod y ) >. )

Then, for each starting pair <. x , y >. there is only one ChainsTo( E , ( NN0 X. NN0 ) , ( NN0 X. { 0 } ) ), the last value is <. ( x gcd y ) , 0 >., and its length divided by log(sup({x,y}, RR, <)) is O(1). This looks rather clean (except perhaps the details which need to be squeezed into proofs). I'm not certain if this particular definition of ChainsTo is any good though, in particular its asymmetry between the first and the last value domain.

(And these two ideas are what ultimately prompted me to create UpWord and then promote it, in form of Chain, to main.)

Any ideas are welcome!
Reply all
Reply to author
Forward
0 new messages