NAME

Sort::Search - binary search on contiguous sorted ranges

SYNOPSIS

use Sort::Search qw(
  bisectl bisectr bixectl bixectr
  blsrch0 brsrch0 blsrch1 brsrch1
  blsrch2 brsrch2 blsrchx brsrchx
);

bisectl { OK } ARRAY;   # leftmost True (F to T)
bisectr { OK } ARRAY;   # rightmost True (T to F)
bixectl { OK } ARRAY;   # any True  (F to T)
bixectr { OK } ARRAY;   # any True  (T to F)

blsrch0 { ORD } ARRAY;  # leftmost >= 0 (- to +)
blsrch1 { ORD } ARRAY;  # leftmost > 0  (- to +)
brsrch0 { ORD } ARRAY;  # rightmost >= 0 (+ to -)
brsrch1 { ORD } ARRAY;  # rightmost > 0  (+ to -)

blsrch2 { ORD } ARRAY;  # (blsrch0, blsrch1)
brsrch2 { ORD } ARRAY;  # (brsrch0, brsrch1)
blsrchx { ORD } ARRAY;  # any == 0  (- to +)
brsrchx { ORD } ARRAY;  # any == 0  (+ to -)

DESCRIPTION

NOTE: If you are new to this module, I recommend starting with Sort::Search::Cookbook, which has quick references and examples for common use cases.

Terminology

This module implements two classes of search algorithms, which will named separately in this document as the bisection method and the (ordinary) binary search for distinction. A brief definition follows.

The bisection method works with a monotonic predicate (OK) with a unique transition point from False to True, or vice versa.

For example, the array [0, 1, 2, 3, 4] with respect to the predicate $_ >= 2 transitions from False at 1 into True at 2, and the array [5, 4, 3, 2, 1] with respect to the predicate $_ >= 3 transitions from True at 3 to False at 2. We name the boundary between these two elements the bisection point.
The binary search works with a C-like comparison function, named the ordering (ORD), with two unique transition points from negative to 0, then 0 to positive. Such a function is typically constructed with respect to a target value to be located within the array (hence the name "search").

For example, suppose we want to locate $x = 3 in the array [1, 2, 3, 3, 4]. We can consider the ordering $_ <=> $x, which returns negatives for [1, 2], zeros for [3, 3], and positives for [4]. We name the boundary between the negatives and the zeros the lower bisection point, and that between the zeros and the positives the upper bisection point.

In both cases, we call the return value of the predicate/ordering the comparison result, or $cmp for short.

Orientation variants

For every function in the conventional direction (i.e., left-to-right, true-to-false, increasing), a dual variant is provided. The conventional and dual directions are identified by the lowercase letter L and R. To re-use my examples, this:

use Sort::Search qw(bisectl);
my @A = ( 0, 1, 2, 3, 4 );
print $A[ bisectl { $_ >= 2 } \@A ];

prints 2 because the left-variants prefer to return the leftmost index of the matching range [2, 3, 4], or one past the final index (5) if no such range exists.

use Sort::Search qw(bisectr);
my @B = ( 5, 4, 3, 2, 1 );
print $B[ bisectr { $_ >= 3 } \@B ];

prints 3 because the right-variants prefer to return the rightmost index of the matching range [5, 4, 3], or one below the initial index (-1) if no such range exists. (Care should be taken in this specific case as indexing by -1 wraps back to the last element.)

Lower/upper variants

For binary search functions in particular, there are the 0,1 variants corresponding to the aforementioned lower bisection point and upper bisection point. These points are best visualized by plotting the image of the ordering and taking note of the range of elements mapped to zero. For instance, consider the first 13 decimal digits of pi (including the integer part) arranged in increasing order as the haystack, and the number $x = 5 as the needle. Then the zeros (as marked by "=" below) are bound by the boundary where it meets the negatives ("<") and positives (">"):

use Sort::Search qw(blsrch0 blsrch1 brsrch0 brsrch1);
my (@a, $lb, $lx, $ub, $ux) = sort { $a <=> $b }
   ( sprintf("%.13f", 4*atan2(1,1)) =~ /[0-9]/g )[0..12];
my $x = 5;

$lb = blsrch0 {$_<=>$x} \@a;  $ux = blsrch1 {$_<=>$x} \@a;  print "[$lb,$ux)";
#  (left lower bound)           (left upper bound)             =>  [6,  9  )
#    first $_ >= $x  -.          ,-  first $_ > $x
#                      \        /
#    0  1  2  3  4  5 [6  7  8 [9  a  b  c   index of $_
#    -  -  -  -  -  -  0  0  0  +  +  +  +   ord ($_ <=> $x)
#    <<<<<<<<<<<<<<<<<<========------------
#    1  1  2  3  3  4  5  5  5  6  8  9  9   value of $_
#    -----------------========>>>>>>>>>>>>>
#    +  +  +  +  +  +  0  0  0  -  -  -  -   ord ($x <=> $_)
#    0  1  2  3  4  5] 6  7  8] 9  a  b  c   index of $_
#                   /        \
#   last $x > $_  -'          `-  last $x >= $_
# (right lower bound)            (right upper bound)           =>  (  5,  8]
$lx = brsrch1 {$x<=>$_} \@a;  $ub = brsrch0 {$x<=>$_} \@a;  print "($lx,$ub]";

The 0 variants (blsrch0, brsrch0) return the inclusive bounds for when ordering transitions from "-" to "0", while the 1 variants (blsrch1, brsrch1) return the exclusive bounds for when the ordering transitions from "0" to "+". For left search in particular, blsrch0 is the inclusive lower bound, and blsrch1 is the exclusive upper bound. The 2 variants (blsrch2, brsrch2) return both inclusive and exclusive bounds at once, in that order; in scalar context, they return the absolute difference which corresponds to the range of equal values (i.e. the "0"s).

A more compact (albeit possibly obscure) characterization of the 0,1 variants is that they are the composition of the predicates &$ord >= 0 and &$ord > 0 on top of the bisection functions of the corresponding orientations. Note that this formal characterization intentionally makes no mention of consistency with the total order from which it is derived; in particular it is possible to extend the notion of equality by mapping any contiguous region one wishes to find to "0", provided that it is properly bounded by "+" and "-".

Another thing to note is that, given the same underlying array, the ordering should be reversed when orientation is flipped. Usually when you could flip the operands to achieve this, as I have shown above with $_ <=> $x for left search and $x <=> $_ for right search. But in restrained cases where either (1) you don't have access to both operands and have to work with the ordering as a blackboxed unary function of $_, or (2) the needle is not interchangeable with $_ (say, when the needle is a field of a larger structure), you can negate the ordering instead.

Existence variants

The bisection and the 0 variants have x variants which return as soon as it finds a match. (Mnemonic: "x" stands for "eXists".)

bixectl is a modified version of bisectl that returns any index for which the predicate returns True. In the case that the predicate attains True at exactly one point or never attains True, they behave identically. For bixectr, it's the same against bisectr.
blsrchx is a modified version of blsrch0 that returns any index for which the predicate returns 0. In the case that the ordering attains 0 at exactly one point or skips it entirely, they behave identically. For brsrchx, it's the same against brsrch0.

SUBROUTINES

All subroutines are exported on-demand. All subroutines share the same interface for input and output described in "COMMON INTERFACE". The general behavior of each subroutine on arbitrary ranges is described, but in the common form that I anticipate most people to use (calling with the array form with no explicit limits, and returning in scalar context), it is sufficient to know the following:

For left searches (subroutines containing the letter "l"), $lo is the index of the leftmost/first array element and $hi is one more than the rightmost/final array element. Notably, $hi is the fallback index when search finds a bisection point to the right of the whole array.
For right searches (subroutines containing the letter "r"), $lo is one less than the index of the leftmost/first array element and $hi is the rightmost/final array element. Notably, $lo is the fallback index when search finds a bisection point to the left of the whole array.

To re-re-use the examples from the "DESCRIPTION" section (assuming $[ is 0), the implicit search range for bisectl { ... } \@A is $lo = 0 and $hi = 5, and the implicit search range for bisectr { ... } \@B is $lo = -1 and $hi = 4. Note that the two searches have different fallback indices: $hi = 5 for the left search and $lo = -1 for the right search.

bisectl { OK } ARRAY

bixectl { OK } ARRAY

Perform left bisection with respect to a predicate OK on the half-open interval [$lo, $hi), where $lo precedes $hi in the argument list. The prototype is &$;$$.

The predicate should be increasing over every pair of adjacent indices it is defined on. That is, true values always follow false values:

F F F ... F T T T   ok(x)
0 1 2     6 7 8 9      x

Let $ok denote the predicate, and set $ok->($hi) = 1. The return value of bisectl is the unique index $x on the closed interval [$lo, $hi] such that:

$ok->($x) is true.
For all valid index $y < $x, it follows that $ok->($y) is false.

The return value of bixectl is any index satisfying the first constraint.

If the interval [$lo, $hi] is empty, return $hi.

bisectr { OK } ARRAY

bixectr { OK } ARRAY

Perform right bisection with respect to a predicate OK on the half-open interval ($lo, $hi], where $hi precedes $lo in the argument list. The prototype is &$;$$.

The predicate should be decreasing over every pair of adjacent indices it is defined on. That is, true values always precede false values:

T T T ... T F F F   ok(x)
0 1 2     6 7 8 9      x

Let $ok denote the predicate, and set $ok->($lo) = 1. The return value of bisectr is the unique index $x on the closed interval [$lo, $hi] such that:

$ok->($x) is true.
For all valid index $y > $x, it follows that $ok->($y) is false.

The return value of bixectr is any index satisfying the first constraint.

If the interval [$lo, $hi] is empty, return $lo.

In general, a return value of $hi (for left bisection) and $lo (for right bisection) is not meaningful as it is the only index taken to be true for granted. Caller should handle it separately, or check if the comparison result and/or element are defined by calling in list context.

blsrch0 { ORD } ARRAY

blsrchx { ORD } ARRAY

blsrch1 { ORD } ARRAY

blsrch2 { ORD } ARRAY

Perform binary left search with respect to an ordering ORD on the half-open interval [$lo, $hi), where $lo precedes $hi in the argument list. The prototype is &$;$$.

The sign of the ordering should be increasing over every pair of adjacent indices it is defined on. That is, <0 (negative) precedes =0 (zero), which precedes >0 (positive).

Let $ord denote the ordering and set $ord->($hi) = +Infinity. The return value of blsrch0 is the unique index $lb on the closed interval [$lo, $hi] such that:

$ord->($lb) >= 0.
For all valid index $x < $lb, it follows that $ord->($x) < 0. As a result, $lb is the first instance where $ord >= 0, while $lb - 1 is the last instance where $ord < 0. (Compare this to $lx returned by brsrch1.)

The return value of blsrchx is any index $x satisfying $ord->($x) == 0, but if one could not be found, it returns the same result as blsrch0.

The return value of blsrch1 is the unique index $ux on the closed interval [$lo, $hi] such that:

$ord->($ux) > 0.
For all valid index $x < $ux, it follows that $ord->($x) <= 0. As a result, $ux is the first instance where $ord > 0, while $ux - 1 is the last instance where $ord <= 0. (Compare this to $ub returned by brsrch0.)

In list context, blsrch2 returns both $lb and $ux, in that order. In scalar context, it returns the nonnegative difference $ux - $lb, which equals the number of elements for which $ord == 0.

brsrch0 { ORD } ARRAY

brsrchx { ORD } ARRAY

brsrch1 { ORD } ARRAY

brsrch2 { ORD } ARRAY

Perform binary right search with respect to an ordering ORD on the half-open interval ($lo, $hi], where $hi precedes $lo in the argument list. The prototype is &$;$$.

The sign of the ordering should be decreasing over every pair of adjacent indices it is defined on. That is, >0 (positive) precedes =0 (zero), which precedes <0 (negative).

Let $ord denote the ordering and set $ord->($lo) = +Infinity. The return value of brsrch0 is the unique index $ub on the closed interval [$lo, $hi] such that:

$ord->($ub) >= 0.
For all valid index $x > $ub, it follows that $ord->($x) < 0. As a result, $ub is the last instance where -$ord <= 0 and $ub + 1 is the first instance where -$ord > 0. (Compare this to $ux returned by blsrch1.)

The return value of brsrchx is any index $x satisfying $ord->($x) == 0, but if one could not be found, it returns the same result as brsrch0.

The return value of brsrch1 is the unique index $lx on the closed interval [$lo, $hi] such that:

$ord->($lx) > 0.
For all valid index $x > $lx, it follows that $ord->($x) <= 0. As a result, $lx is the last instance where -$ord < 0 and $lx + 1 is the first instance where -$ord >= 0. (Compare this to $lb returned by blsrch0.)

In list context, brsrch2 returns both $ub and $lx, in that order. In scalar context, it returns the nonnegative difference $ub - $lx, which equals the number of elements for which $ord == 0.

COMMON INTERFACE

# TODO !

CAVEATS

# TODO !