NAME

Sort::Search - binary search on contiguous sorted ranges

SYNOPSIS

use Sort::Search qw(
  bisectl bisectr bixectl bixectr
  blsrch0 brsrch0 blsrch1 brsrch1
  blsrch2 brsrch2 blsrchx brsrchx
);

bisectl { OK } ARRAY;   # leftmost True (F to T)
bisectr { OK } ARRAY;   # rightmost True (T to F)
bixectl { OK } ARRAY;   # any True  (F to T)
bixectr { OK } ARRAY;   # any True  (T to F)

blsrch0 { ORD } ARRAY;  # leftmost >= 0 (- to +)
blsrch1 { ORD } ARRAY;  # leftmost > 0  (- to +)
brsrch0 { ORD } ARRAY;  # rightmost >= 0 (+ to -)
brsrch1 { ORD } ARRAY;  # rightmost > 0  (+ to -)

blsrch2 { ORD } ARRAY;  # (blsrch0, blsrch1)
brsrch2 { ORD } ARRAY;  # (brsrch0, brsrch1)
blsrchx { ORD } ARRAY;  # any == 0  (- to +)
brsrchx { ORD } ARRAY;  # any == 0  (+ to -)

DESCRIPTION

NOTE: If you are new to this module, I recommend starting with Sort::Search::Cookbook, which has quick references and examples for common use cases.

Terminology

This module implements two classes of search algorithms, which will named separately in this document as the bisection method and the (ordinary) binary search for distinction. A brief definition follows.

In both cases, we call the return value of the predicate/ordering the comparison result, or $cmp for short.

Orientation variants

For every function in the conventional direction (i.e., left-to-right, true-to-false, increasing), a dual variant is provided. The conventional and dual directions are identified by the lowercase letter L and R. To re-use my examples, this:

use Sort::Search qw(bisectl);
my @A = ( 0, 1, 2, 3, 4 );
print $A[ bisectl { $_ >= 2 } \@A ];

prints 2 because the left-variants prefer to return the leftmost index of the matching range [2, 3, 4], or one past the final index (5) if no such range exists.

use Sort::Search qw(bisectr);
my @B = ( 5, 4, 3, 2, 1 );
print $B[ bisectr { $_ >= 3 } \@B ];

prints 3 because the right-variants prefer to return the rightmost index of the matching range [5, 4, 3], or one below the initial index (-1) if no such range exists. (Care should be taken in this specific case as indexing by -1 wraps back to the last element.)

Lower/upper variants

For binary search functions in particular, there are the 0,1 variants corresponding to the aforementioned lower bisection point and upper bisection point. These points are best visualized by plotting the image of the ordering and taking note of the range of elements mapped to zero. For instance, consider the first 13 decimal digits of pi (including the integer part) arranged in increasing order as the haystack, and the number $x = 5 as the needle. Then the zeros (as marked by "=" below) are bound by the boundary where it meets the negatives ("<") and positives (">"):

use Sort::Search qw(blsrch0 blsrch1 brsrch0 brsrch1);
my (@a, $lb, $lx, $ub, $ux) = sort { $a <=> $b }
   ( sprintf("%.13f", 4*atan2(1,1)) =~ /[0-9]/g )[0..12];
my $x = 5;

$lb = blsrch0 {$_<=>$x} \@a;  $ux = blsrch1 {$_<=>$x} \@a;  print "[$lb,$ux)";
#  (left lower bound)           (left upper bound)             =>  [6,  9  )
#    first $_ >= $x  -.          ,-  first $_ > $x
#                      \        /
#    0  1  2  3  4  5 [6  7  8 [9  a  b  c   index of $_
#    -  -  -  -  -  -  0  0  0  +  +  +  +   ord ($_ <=> $x)
#    <<<<<<<<<<<<<<<<<<========------------
#    1  1  2  3  3  4  5  5  5  6  8  9  9   value of $_
#    -----------------========>>>>>>>>>>>>>
#    +  +  +  +  +  +  0  0  0  -  -  -  -   ord ($x <=> $_)
#    0  1  2  3  4  5] 6  7  8] 9  a  b  c   index of $_
#                   /        \
#   last $x > $_  -'          `-  last $x >= $_
# (right lower bound)            (right upper bound)           =>  (  5,  8]
$lx = brsrch1 {$x<=>$_} \@a;  $ub = brsrch0 {$x<=>$_} \@a;  print "($lx,$ub]";

The 0 variants (blsrch0, brsrch0) return the inclusive bounds for when ordering transitions from "-" to "0", while the 1 variants (blsrch1, brsrch1) return the exclusive bounds for when the ordering transitions from "0" to "+". For left search in particular, blsrch0 is the inclusive lower bound, and blsrch1 is the exclusive upper bound. The 2 variants (blsrch2, brsrch2) return both inclusive and exclusive bounds at once, in that order; in scalar context, they return the absolute difference which corresponds to the range of equal values (i.e. the "0"s).

A more compact (albeit possibly obscure) characterization of the 0,1 variants is that they are the composition of the predicates &$ord >= 0 and &$ord > 0 on top of the bisection functions of the corresponding orientations. Note that this formal characterization intentionally makes no mention of consistency with the total order from which it is derived; in particular it is possible to extend the notion of equality by mapping any contiguous region one wishes to find to "0", provided that it is properly bounded by "+" and "-".

Another thing to note is that, given the same underlying array, the ordering should be reversed when orientation is flipped. Usually when you could flip the operands to achieve this, as I have shown above with $_ <=> $x for left search and $x <=> $_ for right search. But in restrained cases where either (1) you don't have access to both operands and have to work with the ordering as a blackboxed unary function of $_, or (2) the needle is not interchangeable with $_ (say, when the needle is a field of a larger structure), you can negate the ordering instead.

Existence variants

The bisection and the 0 variants have x variants which return as soon as it finds a match. (Mnemonic: "x" stands for "eXists".)

SUBROUTINES

All subroutines are exported on-demand. All subroutines share the same interface for input and output described in "COMMON INTERFACE". The general behavior of each subroutine on arbitrary ranges is described, but in the common form that I anticipate most people to use (calling with the array form with no explicit limits, and returning in scalar context), it is sufficient to know the following:

To re-re-use the examples from the "DESCRIPTION" section (assuming $[ is 0), the implicit search range for bisectl { ... } \@A is $lo = 0 and $hi = 5, and the implicit search range for bisectr { ... } \@B is $lo = -1 and $hi = 4. Note that the two searches have different fallback indices: $hi = 5 for the left search and $lo = -1 for the right search.

bisectl { OK } ARRAY
bixectl { OK } ARRAY

Perform left bisection with respect to a predicate OK on the half-open interval [$lo, $hi), where $lo precedes $hi in the argument list. The prototype is &$;$$.

The predicate should be increasing over every pair of adjacent indices it is defined on. That is, true values always follow false values:

F F F ... F T T T   ok(x)
0 1 2     6 7 8 9      x

Let $ok denote the predicate, and set $ok->($hi) = 1. The return value of bisectl is the unique index $x on the closed interval [$lo, $hi] such that:

The return value of bixectl is any index satisfying the first constraint.

If the interval [$lo, $hi] is empty, return $hi.

bisectr { OK } ARRAY
bixectr { OK } ARRAY

Perform right bisection with respect to a predicate OK on the half-open interval ($lo, $hi], where $hi precedes $lo in the argument list. The prototype is &$;$$.

The predicate should be decreasing over every pair of adjacent indices it is defined on. That is, true values always precede false values:

T T T ... T F F F   ok(x)
0 1 2     6 7 8 9      x

Let $ok denote the predicate, and set $ok->($lo) = 1. The return value of bisectr is the unique index $x on the closed interval [$lo, $hi] such that:

The return value of bixectr is any index satisfying the first constraint.

If the interval [$lo, $hi] is empty, return $lo.

In general, a return value of $hi (for left bisection) and $lo (for right bisection) is not meaningful as it is the only index taken to be true for granted. Caller should handle it separately, or check if the comparison result and/or element are defined by calling in list context.

blsrch0 { ORD } ARRAY
blsrchx { ORD } ARRAY
blsrch1 { ORD } ARRAY
blsrch2 { ORD } ARRAY

Perform binary left search with respect to an ordering ORD on the half-open interval [$lo, $hi), where $lo precedes $hi in the argument list. The prototype is &$;$$.

The sign of the ordering should be increasing over every pair of adjacent indices it is defined on. That is, <0 (negative) precedes =0 (zero), which precedes >0 (positive).

Let $ord denote the ordering and set $ord->($hi) = +Infinity. The return value of blsrch0 is the unique index $lb on the closed interval [$lo, $hi] such that:

The return value of blsrchx is any index $x satisfying $ord->($x) == 0, but if one could not be found, it returns the same result as blsrch0.

The return value of blsrch1 is the unique index $ux on the closed interval [$lo, $hi] such that:

In list context, blsrch2 returns both $lb and $ux, in that order. In scalar context, it returns the nonnegative difference $ux - $lb, which equals the number of elements for which $ord == 0.

brsrch0 { ORD } ARRAY
brsrchx { ORD } ARRAY
brsrch1 { ORD } ARRAY
brsrch2 { ORD } ARRAY

Perform binary right search with respect to an ordering ORD on the half-open interval ($lo, $hi], where $hi precedes $lo in the argument list. The prototype is &$;$$.

The sign of the ordering should be decreasing over every pair of adjacent indices it is defined on. That is, >0 (positive) precedes =0 (zero), which precedes <0 (negative).

Let $ord denote the ordering and set $ord->($lo) = +Infinity. The return value of brsrch0 is the unique index $ub on the closed interval [$lo, $hi] such that:

The return value of brsrchx is any index $x satisfying $ord->($x) == 0, but if one could not be found, it returns the same result as brsrch0.

The return value of brsrch1 is the unique index $lx on the closed interval [$lo, $hi] such that:

In list context, brsrch2 returns both $ub and $lx, in that order. In scalar context, it returns the nonnegative difference $ub - $lx, which equals the number of elements for which $ord == 0.

COMMON INTERFACE

# TODO !

CAVEATS

# TODO !

SEE ALSO

List::MoreUtils, List::Search, List::BinarySearch, List::BinarySearch::XS.

LINKS