Programmish
The Deep Perl Gotcha

Every computer language has gotchas: strange quirks of syntax or persistent language bugs (which, unfortunately, will sometimes morph into features) that trip you up or cause bizarre behavior in what would otherwise be a sound chunk of code. Gotchas happen, and there isn’t much use in complaining about them, other than to try to get the underlying behavior fixed or changed. But sometimes a gotcha is so strange, so removed from what you would normally consider an edge condition, that you do a double take.

Diagnosing a gotcha in a larger body of code is helped somewhat by the feel of a language, the expectations you’ve come to rely on when programming in a specific syntax. I was recently working on a medium-sized (1000+ line) Perl program when I happened upon what I would later realize was a strange gotcha; the trouble was that it took quite a while to determine the actual source of the problem. Some gotchas arise from using new or different libraries; thankfully this wasn’t the case, as those problems often cause collateral work that significantly affects how long a programming task takes. In this case the gotcha was so fundamental that it made me step back and reevaluate some basic assumptions about how standard language features work in Perl and other languages. In fact I had to test this language feature in a few other languages to reassure myself that:

  • I wasn’t, in fact, crazy.
  • Perl is, in fact, crazy.

It’s a pretty simple feature, when it comes down to it (never mind the amount of time it took to isolate the problem, which was also buried in a rather deep stack of recursive calls): iterate over an associative array of elements twice. The first time, exit the iteration once an appropriate item is found; the second time, iterate over the associative array completely.

Suffice it to say this is the behavior I expected, as programmed in Python:

#!/usr/bin/python

set = { 'a' : 'aaa', 'b' : 'bbb', 'c' : 'ccc', 'd' : 'ddd', 'e' : 'eee', 'f' : 'fff', 'g' : 'ggg' }

print "First pass..."
for key, val in set.items():
    print "%s -> %s" % (key, val)
    if val == 'bbb':
        break
print

print "Second pass..."
for key, val in set.items():
    print "%s -> %s" % (key, val)

The results for this vary slightly depending upon how the hash elements are stored, but a sample run of this program provides the output:

First pass...
a -> aaa
c -> ccc
b -> bbb

Second pass...
a -> aaa
c -> ccc
b -> bbb
e -> eee
d -> ddd
g -> ggg
f -> fff

So the first pass prints the elements of the associative array up until a certain value is found, and the second iteration prints out all of the elements.

The same behavior is evident in Ruby:

#!/usr/bin/ruby
set = {
        'a' => 'aaa',
        'b' => 'bbb',
        'c' => 'ccc',
        'd' => 'ddd',
        'e' => 'eee',
        'f' => 'fff',
        'g' => 'ggg'
      }

puts "First pass..."
set.each do |k,v|
    puts k + " -> " + v
    if v == 'bbb' then
        break
    end
end
puts

puts "Second pass..."
set.each do |k,v|
    puts k + " -> " + v
end

generates

First pass...
a -> aaa
b -> bbb

Second pass...
a -> aaa
b -> bbb
c -> ccc
d -> ddd
e -> eee
f -> fff
g -> ggg

Now for the Perl version, which left me scratching my head:

#!/usr/bin/perl
use strict;
use warnings;

my %set = (
    -a => 'aaa',
    -b => 'bbb',
    -c => 'ccc',
    -d => 'ddd',
    -e => 'eee',
    -f => 'fff',
    -g => 'ggg'
);

print "First run...\n";
while ( my ($key, $val) = each %set ) {
    print "$key -> $val\n";
    last if ($val eq 'ggg');
}
print "\n";

print "Second run...\n";
while ( my ($key, $val) = each %set ) {
    print "$key -> $val\n";
}

results in

First run...
-a -> aaa
-c -> ccc
-g -> ggg

Second run...
-f -> fff
-e -> eee
-d -> ddd
-b -> bbb

Look at the output for a minute. Let it sink in: the first iteration displays all of the elements up until a specific value is found, and the second iteration continues from where the first left off. This ran contrary to my basic understanding of how such an algorithm should work, so I did some digging and turned up an entry in perldoc:

The next call to each after that will start iterating again. There is a single iterator for each hash, shared by all each, keys, and values function calls in the program; it can be reset by reading all the elements from the hash, or by evaluating keys HASH or values HASH .
via perldoc
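
To make the perldoc description concrete, here is a rough Python 3 sketch of a hash that carries one built-in iterator. SharedIteratorHash and walk are hypothetical names made up for illustration; this is a toy model of the behavior quoted above, not how Perl is actually implemented.

```python
# Python 3 sketch. SharedIteratorHash is a hypothetical toy class, not a real
# Perl or Python API; it only mimics the behavior the perldoc text describes.
class SharedIteratorHash:
    def __init__(self, data):
        self._data = dict(data)
        self._cursor = None  # the single iterator shared by every caller

    def each(self):
        """Return the next (key, value) pair, or None once the hash is read through."""
        if self._cursor is None:
            self._cursor = iter(self._data.items())
        try:
            return next(self._cursor)
        except StopIteration:
            self._cursor = None  # fully read: the next call starts over
            return None

    def keys(self):
        """Return all keys and, as a side effect, reset the shared iterator."""
        self._cursor = None
        return list(self._data)


def walk(table, stop_at=None):
    """Collect pairs via each(); stop early once a value equal to stop_at is seen."""
    seen = []
    while True:
        pair = table.each()
        if pair is None:
            break
        seen.append(pair)
        if pair[1] == stop_at:
            break
    return seen


table = SharedIteratorHash({'a': 'aaa', 'b': 'bbb', 'c': 'ccc', 'd': 'ddd'})

first = walk(table, stop_at='bbb')    # stops early, cursor is left mid-hash
second = walk(table)                  # resumes after 'b' instead of restarting
partial = walk(table, stop_at='aaa')  # another early exit
table.keys()                          # asking for the keys resets the cursor
full = walk(table)                    # so this pass sees every pair

print(first)
print(second)
print(full)
```

The point of the model is that the position lives in the hash itself rather than in the loop, so an early exit leaves state behind for whoever iterates next.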

Thankfully I had a workaround in my code, but the workaround is suboptimal, as it makes deep assumptions about the layout and quality of incoming values from a database. Most of the time you don’t need to iterate through the values in an associative array in an arbitrary, break-dependent manner, since you either know or don’t know the key for the value. But in cases where partial iteration through an associative array is necessary, this quirk in Perl is a fair pain to work around.
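
One general way to keep a partial pass from leaking state is to loop over a snapshot of the pairs instead of a shared iterator. The Python 3 sketch below shows the idea; find_value and the sample data are hypothetical, and this is not the workaround used in my program. In Perl the analogous move would presumably be a foreach over keys %set, since evaluating keys both builds a list and resets the hash's iterator.

```python
# Python 3 sketch; find_value and the sample data are hypothetical names.
def find_value(pairs, wanted):
    """Return the first key whose value equals wanted, or None if absent."""
    for key, val in pairs:
        if val == wanted:
            return key
    return None


data = {'a': 'aaa', 'b': 'bbb', 'c': 'ccc', 'd': 'ddd'}

# A single iterator object keeps its position after the early exit,
# much like the shared iterator behind Perl's each.
shared = iter(data.items())
hit = find_value(shared, 'bbb')
leftover = list(shared)           # only the pairs the search did not consume

# A snapshot list has no position to remember, so a later pass is complete.
snapshot = list(data.items())
hit_again = find_value(snapshot, 'bbb')
full_pass = list(snapshot)        # every pair, regardless of the early exit

print(hit, leftover)
print(hit_again, full_pass)
```

The cost is a copy of the keys or pairs up front, which is usually acceptable unless the hash is very large.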