Every computer language has gotchas: strange quirks of syntax or persistent language bugs (which, unfortunately, will sometimes morph into features) that trip you up or cause bizarre behavior in what would otherwise be a sound chunk of code. Gotchas happen, and there isn’t much use in complaining about them, other than to try to get the underlying behavior fixed or changed. But sometimes a gotcha is so strange, so removed from what you would normally consider an edge condition, that you do a double take.
Diagnosing a gotcha in a larger body of code is helped somewhat by the feel of a language: the expectations you’ve come to hold when programming in a specific syntax. I was recently working on a medium-sized (1,000+ line) Perl program when I happened upon what I would later realize was a strange gotcha; the problem was that it took quite a while to determine its actual source. Some gotchas arise from using new or different libraries; thankfully this wasn’t the case, as those problems can often cause collateral work that significantly affects how long a programming task takes. In this case the gotcha was so fundamental that it caused me to step back and reevaluate some basic assumptions about how standard language features work in Perl and other languages. In fact I had to test this language feature in a few other languages to reassure myself that:
It’s a pretty simple feature when it comes down to it (never mind the amount of time it took to isolate this problem, which was also buried in a rather deep stack of recursive calls): iterate over an associative array twice, the first time exiting the iteration once an appropriate item is found, the second time iterating over the whole associative array.
Suffice it to say this is the behavior I expected, as programmed in Python:
#!/usr/bin/python
set = { 'a' : 'aaa', 'b' : 'bbb', 'c' : 'ccc', 'd' : 'ddd', 'e' : 'eee', 'f' : 'fff', 'g' : 'ggg' }
print "First pass..."
for key, val in set.items():
    print "%s -> %s" % (key, val)
    if val == 'bbb':
        break
print
print "Second pass..."
for key, val in set.items():
    print "%s -> %s" % (key, val)
The order of the results varies depending on how the hash elements are stored, but a sample run of this program produces the output:
First pass...
a -> aaa
c -> ccc
b -> bbb

Second pass...
a -> aaa
c -> ccc
b -> bbb
e -> eee
d -> ddd
g -> ggg
f -> fff
So the first pass prints the elements of the associative array up until a certain value is found, and the second iteration prints out all of the elements.
The same behavior is evident in Ruby:
#!/usr/bin/ruby
set = {
  'a' => 'aaa',
  'b' => 'bbb',
  'c' => 'ccc',
  'd' => 'ddd',
  'e' => 'eee',
  'f' => 'fff',
  'g' => 'ggg'
}
puts "First pass..."
set.each do |k,v|
  puts k + " -> " + v
  if v == 'bbb' then
    break
  end
end
puts
puts "Second pass..."
set.each do |k,v|
  puts k + " -> " + v
end
generates:
First pass...
a -> aaa
b -> bbb

Second pass...
a -> aaa
b -> bbb
c -> ccc
d -> ddd
e -> eee
f -> fff
g -> ggg
Now for the Perl version, which left me scratching my head:
#!/usr/bin/perl
use strict;
use warnings;
my %set = (
    -a => 'aaa',
    -b => 'bbb',
    -c => 'ccc',
    -d => 'ddd',
    -e => 'eee',
    -f => 'fff',
    -g => 'ggg'
);
print "First run...\n";
while ( my ($key, $val) = each %set ) {
    print "$key -> $val\n";
    last if ($val eq 'ggg');
}
print "\n";
print "Second run...\n";
while ( my ($key, $val) = each %set ) {
    print "$key -> $val\n";
}
results in:
First run...
-a -> aaa
-c -> ccc
-g -> ggg

Second run...
-f -> fff
-e -> eee
-d -> ddd
-b -> bbb
Look at the output for a minute. Let it sink in: the first iteration displays all of the elements up until a specific value is found, and the second iteration continues from where the first left off. Since this ran contrary to my basic understanding of how such an algorithm should work, I did some digging and turned up an entry in perldoc:
The next call to each after that will start iterating again. There is a single iterator for each hash, shared by all each, keys, and values function calls in the program; it can be reset by reading all the elements from the hash, or by evaluating keys HASH or values HASH.
via perldoc
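To make that description concrete, here is a small Python model of it. `PerlHash` and `two_passes` are hypothetical names invented for this sketch; a plain list stands in for Perl’s internal hash order, and the model mimics only the documented semantics (one shared iterator per hash, reset when each reaches the end or when keys is evaluated), not how Perl actually implements them.

```python
class PerlHash:
    """Hypothetical toy model of a Perl hash and its single shared iterator."""

    def __init__(self, pairs):
        self._items = list(pairs)  # list order stands in for Perl's hash order
        self._pos = 0              # the one iterator shared by every caller

    def each(self):
        """Return the next (key, value) pair, or None once the hash is exhausted."""
        if self._pos >= len(self._items):
            self._pos = 0          # reaching the end also resets the iterator
            return None
        pair = self._items[self._pos]
        self._pos += 1
        return pair

    def keys(self):
        """Return all keys; evaluating keys also resets the shared iterator."""
        self._pos = 0
        return [key for key, _ in self._items]


def two_passes(h, stop_value, reset_between=False):
    """Run the two loops: break early on stop_value, then attempt a full pass."""
    first = []
    # iter(callable, sentinel) keeps calling h.each() until it returns None,
    # much like Perl's: while (my ($key, $val) = each %set) { ... }
    for key, val in iter(h.each, None):
        first.append(key)
        if val == stop_value:
            break
    if reset_between:
        h.keys()                   # result discarded, like "keys %set;" in void context
    second = [key for key, _ in iter(h.each, None)]
    return first, second


pairs = [('a', 'aaa'), ('b', 'bbb'), ('c', 'ccc'), ('d', 'ddd'),
         ('e', 'eee'), ('f', 'fff'), ('g', 'ggg')]
print(two_passes(PerlHash(pairs), 'ccc'))
print(two_passes(PerlHash(pairs), 'ccc', reset_between=True))
```

Without a reset, the second pass resumes at `d`, just as the Perl run resumed after `-g`. Evaluating keys between the loops, or letting the first loop run to the end because nothing matched, gives the second pass a fresh start, which is the workaround the perldoc entry describes.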
Thankfully I had a workaround in my code, but the workaround is suboptimal, as it makes deep assumptions about the layout and quality of incoming values from a database. Most of the time you don’t need to iterate through the values in an associative array in an arbitrary, break-dependent manner, since you either know or don’t know the key for the value; but in cases where partial iteration through an associative array is necessary, this quirk in Perl is a real pain to work around.
That’s why it makes more sense to do it as:
foreach my $key (keys %hash) {
    print "$key => $hash{$key}\n";
    last if $hash{$key} eq 'ccc';
}
foreach my $key (keys %hash) {
    print "$key => $hash{$key}\n";
}
That has been the behavior of ‘each’ since the earliest days of Perl; it is a speed-optimized hash retrieval function.
What you were looking for is something like:
foreach my $k (keys %set)
{
my $v = $set{$k};
…
}
And even though each is faster, unless your hash is huge, you should not encounter significant efficiency issues using keys rather than each.
It seems to me that ‘gotcha’ is just your name for the situation when your expectations (sometimes unreasonable) don’t match reality. Just because your expectations didn’t match the behavior doesn’t make the language “crazy”.
If I find something “odd” (read: different from Perl) in Ruby or Python and blame the language for that, then I am the crazy one, not the language.
I am sorry, but the behavior is clearly documented, and a reasonably fluent Perl programmer should be expected to know this, or at least look it up in the docs and move on.
There is no gotcha here, let alone a “deep” one.
You should not iterate over hashes in Perl if the order matters anyway.
The way that ‘each’ is being used is different. In the Python and Ruby examples, each loop starts a fresh iteration over the whole collection. In the Perl version, each returns a single key/value pair per call rather than the whole list. Since it is called repeatedly, having a global iterator makes sense.
It makes sense if you think about it. What would probably work better for you is something like this (you can take out the sort, but you seemed to want them sorted too):
for my $key (sort keys %set) {
    my $val = $set{$key};
    # logic goes here
}
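For comparison, a Python sketch of the same pattern, sort included. `sorted()` builds a fresh list on every call, much as `keys %set` hands `for` a fresh list of keys, so an early `break` in one loop cannot leak into the next, and the order no longer depends on how the table happens to be stored. `sorted_passes` and `letters` are names made up for this example.

```python
def sorted_passes(mapping, stop_value):
    """Two independent passes over a dict in sorted key order."""
    first = []
    for key in sorted(mapping):    # a new sorted list of keys for this loop
        first.append(key)
        if mapping[key] == stop_value:
            break
    second = sorted(mapping)       # another new list, so this pass is complete
    return first, second


# Deliberately scrambled insertion order; sorting makes it irrelevant.
letters = {'g': 'ggg', 'c': 'ccc', 'a': 'aaa', 'e': 'eee',
           'b': 'bbb', 'f': 'fff', 'd': 'ddd'}
print(sorted_passes(letters, 'ccc'))
```

The cost is building and sorting the key list on each pass, which, as noted above for keys versus each, rarely matters unless the table is huge.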
Pretty odd behaviour, I agree, but why not just call keys() in void context to reset the iterator? There’s no overhead in this approach…
Well, I guess it’s obvious that not every each is equal.
May I ask why you’re iterating in that manner? Why not use:
foreach my $key ( keys %set )
I’ve never actually seen “each” used in Perl code.
That’s unfair bashing. You picked the iterator “each” in Perl, which is NOT defined the same way as in Python or Ruby. As the perldoc clearly says, there ARE iterators with the behavior you expect, and those are “keys HASH” and “values HASH”, which always iterate over the entire hash and don’t remember whether you stopped at some position in an earlier loop. So the iterator “each” was obviously designed in Perl with some special/quirky purpose in mind (whatever that may be), and the fact that it has the same name as the one in Python and Ruby does NOT make it the same operator. It may very well be useful in some situations which you can’t see right now. By the way, suppose you did need a state-remembering iterator to scroll through a hash with 10 million elements. In Python and Ruby you are out of luck, because you’ll always have to scroll through the entire array since they don’t remember state.
It is indeed unfortunate that it has the same name as in Python/Ruby, as that would confuse people, but it is the responsibility of the programmer to study the language specifications. It’s like saying, “Oh, I am used to thinking of roads as having cars running forward in the right lane.” Well, not everywhere, as you know. Quirky, yes. Crazy, no. As long as you are provided with the operators that achieve what you need, don’t complain about the idiosyncrasies you don’t like.
I personally have never used “each” and always use “keys” or “values” and have never had problems.
Are you sure you’re doing it right?
My sole categories of hash use are:
– lookup (or presence test, rarely deletion)
– iteration. On the whole of it.
If iteration depended on order in some way, I’d wonder why I was using a hash in the first place.
Perl isn’t crazy, you just used the wrong mechanism.
You should have used a foreach() loop:
my %set = ("a" => "aaa", "b" => "bbb", "c" => "ccc", "d" => "ddd");
foreach my $k (keys(%set)) {
    print $k, ": ", $set{$k}, "\n";
}
This should do the same thing as the Ruby or Python implementations.
I am a PHP programmer, so I don’t know Perl backwards and forwards, but in PHP:
while(list($key,$value) = each($set)){ …}
// will do the same as your Perl.
foreach($set as $key => $value) { … }
// will do the same thing as your Ruby/Python code.
You could just say:
foreach my $key (keys(%set)) {
    print "$key -> $set{$key}\n";
}
I never use each() for this reason: it’s just a hack to make Perl look like it can index through hashes nicely. The way that is guaranteed to work is:
for my $key (keys %set) {
    my $val = $set{$key};
}
It is a bit uglier, but should work in all situations where you have a hash. And if you’re just using it once I usually just skip making $val and use the $set{$key} raw.
The docs imply another work-around, which is to call “keys(%set);” once after the first loop. That will work too, but you have to remember to do it in the appropriate situation. Perhaps ‘keys %set, last if …’ would be more appropriate since it reminds you of why it’s there in the first place.
PHP has essentially the same behavior. An array stores an iteration pointer which is incremented by each(), and if you break an each() loop early you have to call reset() on the array:
http://www.php.net/manual/en/function.reset.php
Here’s test code:
http://pastie.org/283847
Is there something wrong with using the suggested workaround on perldoc, calling “keys %set” or “values %set”? Does it add a performance hit larger than your own workaround?
foreach is more compact than while/each and does not have this side effect.
foreach my $key ( keys ( %set ) )
{
    print ( "$key\n" );
    last if ( $set{ $key } eq 'ggg' );
}
Huh. This is quite basic. Do you know that Perl doesn’t guarantee the order in which output occurs, and that it can actually differ from run to run? It just shows that you actually know very little Perl.
From the page http://perldoc.perl.org/functions/keys.html we find the following code sample
foreach $key (keys %ENV) {
    print $key, '=', $ENV{$key}, "\n";
}
This would allow you to iterate in a manner more consistent with what you were expecting.
Perl is truly weird. Small niggle: the name “set” is a Python builtin function, and should not be used as a variable name. Especially if you hope to use sets in your program, which are very useful.
I’ve been programming Perl professionally for ~10 years. I’ve never used “each”. It’s flat-out ugly and, frankly, very non-Perly IMHO.
The proper “perl” way to do what you’re wishing is:
for my $key (keys %set) {
    print "$key -> $set{$key}\n";
    last if $set{$key} eq 'ggg';
}
The function “keys” is “atomic” unlike the painful “each”, and so is immune to your described weakness.
-p
The comments so far are quite right; the correct method in this case would have been to use the KEYS operator rather than the EACH operator. In this case I had spent much of the previous day working in Python, and my brain immediately jumped to the more Pythonic syntax.
The weirdness in this case (or the Gotcha, not necessarily a deep fault of Perl, but a Gotcha for me when trying to debug this) was that my immediate instinct was not to question a basic idiom, but to assume that the data in question was somehow wonky.
Indeed, after a bit of debugging to make sure the data was complete, checking the perldoc (linked above, in fact) showed that the behavior of EACH was not what I expected.
Of course the example code here is contrived – order of the key storage isn’t important. In fact the first few times the real code was tested the key order happened to iterate in such a way that the bug wasn’t apparent.
@evariste
Your “niggle” is a good point, but for a quick test script it didn’t really come into play.
@perldashw
I think my many years of Perl experience were eclipsed by Pythonic idioms still floating around in my brain from the previous day’s work. I think you’ve hit the nail on the head with “each” being ugly and non-perly: this is probably why I’ve never run into this before now, as I normally would have used “keys” in this situation.
Perl’s “each” isn’t very useful because of this property. Too often the iteration mechanism and the function being applied on each iteration get conceptually disconnected during development and bugs like what you ran into creep in.
Sure, its behavior is documented, but that doesn’t excuse it for being a poor API. No amount of documentation will make a poor API magically good.
# If you wanted Perl's behavior in Python, you could use dict's iteritems:
set = { 'a' : 'aaa', 'b' : 'bbb', 'c' : 'ccc', 'd' : 'ddd', 'e' : 'eee', 'f' : 'fff', 'g' : 'ggg' }
iterator = set.iteritems()
print "First pass..."
for key, val in iterator:
    print "%s -> %s" % (key, val)
    if val == 'bbb':
        break
print
print "Second pass..."
for key, val in iterator:
    print "%s -> %s" % (key, val)
Of course, the ordering of the items in both Perl and Python is undefined, so iterating through them like this is often incorrect.
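The same experiment in Python 3, where `iteritems()` no longer exists: `iter(d.items())` yields one iterator object that both loops can share. This sketch assumes Python 3.7 or later, where dicts preserve insertion order, so the split is deterministic; `shared_iterator_passes` and `letters` are illustrative names (the latter also avoids shadowing the built-in `set`).

```python
def shared_iterator_passes(mapping, stop_value):
    """Reproduce Perl's each() behavior by sharing one iterator between loops."""
    iterator = iter(mapping.items())   # one iterator object, reused below
    first = []
    for key, val in iterator:
        first.append(key)
        if val == stop_value:
            break
    # The second loop resumes where the first one stopped.
    second = [key for key, _ in iterator]
    return first, second


letters = {'a': 'aaa', 'b': 'bbb', 'c': 'ccc', 'd': 'ddd',
           'e': 'eee', 'f': 'fff', 'g': 'ggg'}
print(shared_iterator_passes(letters, 'bbb'))
# (['a', 'b'], ['c', 'd', 'e', 'f', 'g'])
```

One difference from Perl: an exhausted Python iterator stays exhausted instead of resetting, so a third loop over the same `iterator` would produce nothing; calling `iter()` again starts over.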
perldoc -f each
Deep Fail. By which I mean you, not Perl.
Next time, RTFM.
@Mark
Really? That’s your contribution to a conversation? The Perl community is worse off for your presence.
I don’t like Perl that much; I’d rather use either Ruby or Python. But I don’t see anything wrong with Perl’s behaviour; it’s the code that is broken in this case. This is the way it should be.
PHP works in the same way [0].
You cannot compare the .items() or .each methods in python/ruby against ‘each’ function in perl (or php), they are not the same. You are wrong, not perl.
[0] http://php.net/each
No offense, but if you would choose “set” as the name for a dict, you’re at least a little crazy.
For those confused about the value (usefulness) of the keyword each, consider what is actually happening behind the scenes. “keys” is actually returning an array which the loop then iterates over and is used to reference into the hash. The array being returned requires temporary memory as it is being used. “each” uses an internal iterator and does not require the additional temporary memory.
Yes perl does a decent job handling memory, yadda, yadda, yadda — but sometimes the programmer does in fact know better.
Sadly, resetting the iterator is comparatively expensive since it requires a call to keys or values rather than there being a simple “reset” call.
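A rough Python analogy of that memory trade-off (not a measurement of Perl itself): a snapshot of the keys allocates a container that grows with the table, while an iterator over the items is a small fixed-size object that walks the table on demand. `sys.getsizeof` reports only each object’s own footprint, not the keys it refers to; `footprints` is a hypothetical helper.

```python
import sys


def footprints(n):
    """Bytes held by a key snapshot versus a lazy item iterator for n entries."""
    table = {i: str(i) for i in range(n)}
    snapshot = list(table)          # like keys: a new list referencing every key
    lazy = iter(table.items())      # like each: no per-key storage up front
    return sys.getsizeof(snapshot), sys.getsizeof(lazy)


for n in (10, 100_000):
    snap_bytes, lazy_bytes = footprints(n)
    print(f"{n:>7} entries: snapshot {snap_bytes} bytes, iterator {lazy_bytes} bytes")
```

The iterator’s size stays the same whether the table holds ten entries or a hundred thousand, which is the advantage attributed to each above; the price, as the Perl discussion shows, is that its position is shared state.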
The Perl documentation is _very_ clear.
You are constructing an argument based on a fallacy: that the operator would work “as expected” [i.e., as you expect it to from your Ruby and Python experience].
And you were wrong. The languages are different languages. Perl is much richer; Perl 6 is way ahead of the rest of the pack.
[...] http://www.programmish.com/?p=31 : a strange Perl behavior. [...]
There are all these people yelling RTFM. To these folks: please enlighten us by explaining in which use case this particular “feature” would be considered useful and not a bug.
Perhaps you should have looked at the documentation for each()?