Documentation

Mathlib.Data.List.EditDistance.Estimator

Estimator for Levenshtein distance. #

The usual algorithm for computing Levenshtein distances provides successively better lower bounds for the Levenshtein distance as it runs, as proved in Mathlib.Data.List.EditDistance.Bounds.

In this file we package that fact as an instance of

Estimator (Thunk.mk fun _ => levenshtein C xs ys) (LevenshteinEstimator C xs ys)

allowing us to use the Estimator framework for Levenshtein distances.

This is then used in the implementation of rewrite_search to avoid needing the entire edit distance calculation in unlikely search paths.

structure LevenshteinEstimator' {α : Type u_1} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddCommMonoid δ] (C : Levenshtein.Cost α β δ) (xs : List α) (ys : List β) :

Data showing that the Levenshtein distance from xs to ys is bounded below by the minimum Levenshtein distance between some suffix of xs and a particular suffix of ys.

This bound is (non-strict) monotone as we take longer suffixes of ys.

This is an auxiliary definition for the later LevenshteinEstimator: this variant constructs a lower bound for the pair consisting of the Levenshtein distance from xs to ys, along with the length of ys.

  • pre_rev : List β

    The prefix of ys that is not is not involved in the bound, in reverse order.

  • suff : List β

    The suffix of ys, such that the distance from xs to ys is bounded below by the minimum distance from any suffix of xs to this suffix.

  • split : self.pre_rev.reverse ++ self.suff = ys

    Witness that ys has been decomposed into a prefix and suffix.

  • distances : { r : List δ // 0 < r.length }

    The distances from each suffix of xs to suff.

  • distances_eq : self.distances = suffixLevenshtein C xs self.suff

    Witness that distances are correct.

  • bound : δ ×

    The current bound on the pair (distance from xs to ys, length of ys).

  • bound_eq : self.bound = match self.pre_rev, with | [], split => ((↑self.distances)[0], ys.length) | x, split => (List.minimum_of_length_pos , self.suff.length)

    Predicate describing the current bound.

Instances For
    theorem LevenshteinEstimator'.split {α : Type u_1} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddCommMonoid δ] {C : Levenshtein.Cost α β δ} {xs : List α} {ys : List β} (self : LevenshteinEstimator' C xs ys) :
    self.pre_rev.reverse ++ self.suff = ys

    Witness that ys has been decomposed into a prefix and suffix.

    theorem LevenshteinEstimator'.distances_eq {α : Type u_1} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddCommMonoid δ] {C : Levenshtein.Cost α β δ} {xs : List α} {ys : List β} (self : LevenshteinEstimator' C xs ys) :
    self.distances = suffixLevenshtein C xs self.suff

    Witness that distances are correct.

    theorem LevenshteinEstimator'.bound_eq {α : Type u_1} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddCommMonoid δ] {C : Levenshtein.Cost α β δ} {xs : List α} {ys : List β} (self : LevenshteinEstimator' C xs ys) :
    self.bound = match self.pre_rev, with | [], split => ((↑self.distances)[0], ys.length) | x, split => (List.minimum_of_length_pos , self.suff.length)

    Predicate describing the current bound.

    instance instEstimatorDataProdNatMkMkLevenshteinLengthLevenshteinEstimator' {α : Type u_1} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddCommMonoid δ] (C : Levenshtein.Cost α β δ) (xs : List α) (ys : List β) :
    EstimatorData { fn := fun (x : Unit) => (levenshtein C xs ys, ys.length) } (LevenshteinEstimator' C xs ys)
    Equations
    • One or more equations did not get rendered due to their size.
    instance estimator' {α : Type u_1} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddCommMonoid δ] (C : Levenshtein.Cost α β δ) (xs : List α) (ys : List β) :
    Estimator { fn := fun (x : Unit) => (levenshtein C xs ys, ys.length) } (LevenshteinEstimator' C xs ys)
    Equations
    def LevenshteinEstimator {α : Type u_1} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddCommMonoid δ] (C : Levenshtein.Cost α β δ) (xs : List α) (ys : List β) :

    An estimator for Levenshtein distances.

    Equations
    Instances For
      instance instEstimatorMkLevenshteinLevenshteinEstimatorOfWellFoundedGTSubtypeProdNatLe {α : Type u_1} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddCommMonoid δ] (C : Levenshtein.Cost α β δ) (xs : List α) (ys : List β) [∀ (a : δ × ), WellFoundedGT { x : δ × // x a }] :
      Estimator { fn := fun (x : Unit) => levenshtein C xs ys } (LevenshteinEstimator C xs ys)
      Equations
      • One or more equations did not get rendered due to their size.
      instance instBotLevenshteinEstimator {α : Type u_1} {β : Type} {δ : Type} [CanonicallyLinearOrderedAddCommMonoid δ] (C : Levenshtein.Cost α β δ) (xs : List α) (ys : List β) :

      The initial estimator for Levenshtein distances.

      Equations
      • One or more equations did not get rendered due to their size.