NAME

Sort::Search::Cookbook - practical examples for Sort::Search

SYNOPSIS

This is a collection of quick references and examples on how to use Sort::Search. It is not expected that you read this document linearly, although the first two sections here are introductory in nature, and are thus understood to have been read.

As a general rule of thumb, any existing binary search function most definitely translates to a "left search" here. Despite examples are provided for both, keep in mind that "right search" is unconventional (at least to my knowledge) and can be counterintuitive.

DEFINITION

The terms "predicate", "ordering", "search range" are used in the same sense as they are defined in "Terminology" in Sort::Search. In addition, assume these variables mean the following:

$_: The current element during a search.
$idx: The array index result found by a search. The index may not correspond to a real element in the array, in which case the following two variables would be undef.
$cmp: The comparison result returned by the predicate/ordering on the found element. If there is no such element, this is undef.
$elp: The scalar ref to the found element, or undef if none was found.

You import the appropriate functions into your current namespace like this:

use Sort::Search qw( blsrch0 blsrch1 );
       # imports `blsrch0', `blsrch1'

... you get the point. If you need more just add more to the list. :)

EXAMPLES

Exact match

The second field $cmp can be used to determine (1) if the return index corresponds to a real array element and (2) if the return index corresponds to an exact match.

For (1), check $cmp is defined. For (2), check if $cmp equals zero, which is equivalent to checking $array->[$idx] <=> $want, but it save you an extra comparison. If you also need the array element, use the third field $elt, which is equivalent to $array->[$idx].

Concretely, given the numerically sorted array:

my @array = qw( 12 17a 17b 17c 21 );

and the numeric ordering:

                      (-) $_  precedes $want
sub { $_ <=> $want }  (0) $_  equals   $want
                      (+) $_  follows  $want

you can find the first match with blsrch0:

sub lsearch {
    my ($array, $want) = @_;
    my ($idx, $cmp, $elp) = blsrch0 { $_ <=> $want } $array;
    if (!defined $cmp) { print "$want not found anywhere\n"; }
    elsif ($cmp != 0)  { print "$want not found; best "
                             . "is $$elp at [$idx]\n"; }
    else               { print "$want found at $$elp [$idx] :)\n"; }
}

lsearch \@array => 11;  # 11 not found; best is 12 at [0]
lsearch \@array => 12;  # 12 found at 12 [0] :)
lsearch \@array => 17;  # 17 found at 17a [1] :)
lsearch \@array => 18;  # 18 not found; best is 21 at [4]
lsearch \@array => 99;  # 99 not found anywhere

and the last match with brsrch0:

sub rsearch {
    my ($array, $want) = @_;
    my ($idx, $cmp, $elp) = brsrch0 { $want <=> $_ } $array;
    if (!defined $cmp) { print "$want not found anywhere\n"; }
    elsif ($cmp != 0)  { print "$want not found; best "
                             . "is $$elp at [$idx]\n"; }
    else               { print "$want found at $$elp [$idx] :)\n"; }
}

rsearch \@array => 11;  # 11 not found anywhere
rsearch \@array => 12;  # 12 found at 12 [0] :)
rsearch \@array => 17;  # 17 found at 17c [3] :)
rsearch \@array => 18;  # 18 not found; best is 17c at [3]
rsearch \@array => 99;  # 99 not found; best is 21 at [4]

Notice that the ordering is reversed: $want <=> $_ instead of $_ <=> $want, due to using a right search instead of a left search. The conventional left search may be used, though $cmp and $elp would be meaningless, thus you must check and retrieve the element separately:

sub rsearch {
    my ($array, $want) = @_;
    my $idx = blsrch1 { $_ <=> $want } $array;
    --$idx;
    if ($idx < 0) {
        print "$want not found anywhere\n";
    } else {
        my $elt = $array->[$idx];
        my $cmp = $elt <=> $want;
        print $cmp == 0
        ? "$want found at $elt [$idx] :)\n"
        : "$want not found; best is $elt at [$idx]\n";
    }
}

The output of this rsearch is identical to above.

To get all matches at once, use blsrch2 or brsrch2:

sub srchall {
    my ($lo, $hi);
    my ($array, $want) = @_;

    # EITHER:
    ($lo, $hi) = blsrch2 { $_ <=> $want } $array;
    print "[$_]=$array->[$_] " for ($lo .. $hi-1);

    # OR:
    ($hi, $lo) = brsrch2 { $want <=> $_ } $array;
    print "[$_]=$array->[$_] " for ($lo+1 .. $hi);
}

srchall \@array => 11;  # <empty>
srchall \@array => 12;  # [0]=12
srchall \@array => 17;  # [1]=17a [2]=17b [3]=17c
srchall \@array => 21;  # [4]=21
srchall \@array => 99;  # <empty>

Note that the left search range is right-exclusive and the right search is left-exclusive.

CONVERSION GUIDE

From (other) CPAN modules:

For List::MoreUtils (which this module in fact inspired by!) it's basically a verbatim translation. Namely this:

my (@ids, $lb, $ub);
@ids = (1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4,
        4, 4, 5, 5, 6, 7, 7, 7, 8, 8, 9, 9,
        9, 9, 9, 11, 13, 13, 13, 17);

# use List::MoreUtils qw(lower_bound);
$lb = lower_bound { $_ <=> 2 } @ids; # returns 2
$lb = lower_bound { $_ <=> 4 } @ids; # returns 10

# use List::MoreUtils qw(upper_bound);
$ub = upper_bound { $_ <=> 2 } @ids; # returns 4
$ub = upper_bound { $_ <=> 4 } @ids; # returns 14

# use List::MoreUtils qw(equal_range);
($lb, $ub) = equal_range { $_ <=> 2 } @ids; # (2,4)
($lb, $ub) = equal_range { $_ <=> 4 } @ids; # (10,14)

is equivalent to:

$lb = blsrch0 { $_ <=> 2 } \@ids; # returns 2
$lb = blsrch0 { $_ <=> 4 } \@ids; # returns 10

$ub = blsrch1 { $_ <=> 2 } \@ids; # returns 4
$ub = blsrch1 { $_ <=> 4 } \@ids; # returns 14

($lb, $ub) = blsrch2 { $_ <=> 2 } \@ids; # (2,4)
($lb, $ub) = blsrch2 { $_ <=> 4 } \@ids; # (10,14)

Note that caller context matters for blsrch0 and blsrch1. Namely they should be called in scalar context.

For List::BinarySearch, borrowing the definitions there:

my ($index, $low_ix, $high_ix);
my @num_array = (100, 200, 300, 400, 500);
my @str_array = qw(Bach Beethoven Brahms Mozart Schubert);

binsearch is equivalent to blsrch0, minus the special case where no match is found which must be invalidated explicitly:

# use List::BinarySearch qw(binsearch);
$index = binsearch {$a <=> $b} 300, @num_array;      # 2
$index = binsearch {$a cmp $b} 'Mozart', @str_array; # 3
$index = binsearch {$a <=> $b} 42, @num_array;       # undef

# equivalent to...
sub exactly {
    my ($idx, $cmp) = @_;
    defined $cmp && $cmp == 0 ? $idx : undef;
}

$index = exactly blsrch0 { $_ <=> 300 } \@num_array;      # 2
$index = exactly blsrch0 { $_ cmp 'Mozart' } \@str_array; # 3
$index = exactly blsrch0 { $_ <=> 42 } \@num_array;       # undef

exactly invalidates the last 0, where $cmp is 1 (100 > 42).

binsearch_pos is just equivalent to blsrch0, called in ordinary scalar context:

# use List::BinarySearch qw(binsearch_pos);
$index = binsearch_pos {$a cmp $b} 'Chopin', @str_array; # 3
$index = binsearch_pos {$a <=> $b} 600, @num_array;      # 5
$index = binsearch_pos {$a <=> $b} 200, @num_array;      # 1

# equivalent to:
$index = blsrch0 { $_ cmp 'Chopin' } \@str_array; # 3
$index = blsrch0 { $_ <=> 600 } \@num_array;      # 5
$index = blsrch0 { $_ <=> 200 } \@num_array;      # 1

binsearch_range is equivalent to calling binsearch_pos against the two needles, which means calling blsrch0 twice. This is different from blsrch2, though it would only significant if the second needle occurs more than once.

# use List::BinarySearch qw(binsearch_range);
( $low_ix, $high_ix )
    = binsearch_range { $a cmp $b } 'Beethoven', 'Mozart', @str_array;
    # Returns ( 1, 3 )

( $low_ix, $high_ix )
    = binsearch_range { $a <=> $b } 200, 400, @num_array;
    # Returns ( 1, 3 )

# equivalent to:
($low_ix, $high_ix)
    = ( # ... in either scalar context:
        scalar ( blsrch0 { $_ cmp 'Beethoven' } \@str_array ),
        # ... or get just the index in list context:
        ( blsrch0 { $_ cmp 'Mozart' } \@str_array )[0],
      );
    # Returns (1, 3)

($low_ix, $high_ix)
    = map { my $want = $_;
         scalar blsrch0 { $_ cmp $want }
               \@num_array } (200, 400);
    # Returns (1, 3)

For List::Search, borrowing the definitions there:

my ($index, $found);
my @list = sort qw( bravo charlie delta );
my @numbers = sort { $a <=> $b } ( 10, 20, 100, 200, );
my $cmp_code = sub { lc( $_[0] ) cmp lc( $_[1] ) };
my @custom_list = sort { $cmp_code->( $a, $b ) } qw( FOO bar BAZ bundy );
    # => ('bar', 'BAZ', 'bundy', 'FOO');

the following:

# use List::Search qw(list_search nlist_search custom_list_search);
print list_search( 'alpha',   \@list );    #  0
print list_search( 'charlie', \@list );    #  1
print list_search( 'zebra',   \@list );    #  -1 (not found)

print nlist_search( 20, \@numbers );       #  1

print custom_list_search( $cmp_code, 'foo', \@custom_list );  # 3

can be written as

sub actually {
    my ($idx, $cmp) = @_;
    defined $cmp ? $idx : -1;
}

$index = actually blsrch0 { $_ cmp 'alpha'   } \@list;  # 0
$index = actually blsrch0 { $_ cmp 'charlie' } \@list;  # 1
$index = actually blsrch0 { $_ cmp 'zebra'   } \@list;  # -1
$index = actually blsrch0 { $_ <=> 20 } \@numbers;      # 1
$index = actually blsrch0
             { $cmp_code->($_, 'foo') } \@custom_list;  # 3

and for s/_search/_contains/, you can just check $cmp without regards to the index and (optionally) swap out with an eXistential search (which is only marginally faster, but semantically better to read IMO :).