2008/11/11 mihara@twister.dev.iwa.fujixerox.co.jp (via RT) <perlbug-followup@perl.org>:
> # New Ticket Created by mihara@twister.dev.iwa.fujixerox.co.jp
> # Please include the string: [perl #60472]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=60472 >
>
> This is a bug report for perl from mihara@twister.dev.iwa.fujixerox.co.jp,
> generated with the help of perlbug 1.36 running under perl 5.10.0.
>
> -----------------------------------------------------------------
> [Please enter your report here]
>
> I found this bug while playing with Encode::IMAPUTF7 module.
> Encode::IMAPUTF7 become to have a problem when I upgrade Perl from 5.8 to 5.10.
> The problem is that if "$1" is fed into "encode()" directly, $1 for
> subsequent pattern matching is not updated. Here is a small code to produce this.
>
> ----
> #!/usr/bin/perl
> use MIME::Base64;
> use Encode;
> my $e_utf16 = find_encoding("UTF-16BE");
>
> $str="abcあdef";
> my $len=length($str);
> $re1=qr/(?:[a-z])/;
> $re2=qr/(?:[^a-z])/;
> $byte='';
> pos($str) = 0;
> while (pos($str) < $len) {
>     #print pos($str),"\n";
>     print "\n",pos($str),":",$str,"[$1]\n";
>     if ($str =~ /\G($re1+)/cg) {
>         print "($1)";
>         $bytes .= $1;
>     } elsif ($str =~ /\G($re2+)/cg) {
>         my $base64 = encode_base64($e_utf16->encode($1), '');
>         print "<$1:$base64>";
>         $bytes .= $1;
>     } else {
>         die "aaa";
>     }
> }
> -----
>
> Please be sure that $str contains Japanese character in UTF-8.
>
> when "$1" is fed like this in line 19
>     my $base64 = encode_base64($e_utf16->encode($1), '');
>
> next pattern machi in line 15
>
>     if ($str =~ /\G($re1+)/cg) {
>
> cannot update $1. Previous $1 remains.
>
> Work around is this.
>
>     my $tmp = $1;
>     my $base64 = encode_base64($e_utf16->encode($tmp), '');
>
> But inside the $e_utf16->encode(), something may be wrong.
>
> Sorry for my poor explanation. Please mail me at osamu.mihara'@'fujixerox.co.jp, if you have further questions.

Can you attach the code as a file? I don't seem to be able to create the
appropriate input string.

If I change the Japanese character to a question mark I get the following
output:

D:\dev\perl\ver\p4\win32>..\perl ..\encode_bug.pl

0:abc?def[]
(abc)
3:abc?def[abc]
<?:AD8=>
4:abc?def[?]
(?)

D:\dev\perl\ver\p4\win32>perl ..\encode_bug.pl

0:abc?def[]
(abc)
3:abc?def[abc]
<?:AD8=>
4:abc?def[?]
(def)

which is clearly different (the bottom one is perl 5.8.6, the top is a
pretty recent blead).

If I change it to \x{100} then I get:

D:\dev\perl\ver\p4\win32>..\perl ..\encode_bug.pl
Wide character in print at ..\encode_bug.pl line 14.

0:abc─Çdef[]
(abc)
Wide character in print at ..\encode_bug.pl line 14.
3:abc─Çdef[abc]
Wide character in print at ..\encode_bug.pl line 20.
<─Ç:AQA=>
Wide character in print at ..\encode_bug.pl line 14.
Wide character in print at ..\encode_bug.pl line 14.
4:abc─Çdef[─Ç]
Wide character in print at ..\encode_bug.pl line 16.
(─Ç)

D:\dev\perl\ver\p4\win32>perl ..\encode_bug.pl
Wide character in print at ..\encode_bug.pl line 14.

0:abc─Çdef[]
(abc)
Wide character in print at ..\encode_bug.pl line 14.
3:abc─Çdef[abc]
Modification of a read-only value attempted at ..\encode_bug.pl line 19.

Some of this is explainable by this being a Windows box without a UTF-8
shell, but it looks like at least one issue (the one concerning the
read-only value) is fixed in blead.

Which sort of reminds me of something: Encode has an annoying habit of
modifying its arguments....

Anyway, obviously there is a bug. Damned if I understand it though.

Cheers,
yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
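
For reference, the workaround from the report can be written as a standalone script. This is only a sketch under stated assumptions: tokenize() is a hypothetical helper name that appears in neither the report nor Encode, and "\x{3042}" stands in for the Japanese character in the original $str so that the file contains only ASCII. The one idea it illustrates is copying $1 into a lexical before calling encode(), so that Encode never receives the capture variable itself.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);
use MIME::Base64 qw(encode_base64);

my $e_utf16 = find_encoding("UTF-16BE");

# Hypothetical helper (not from the report): walk $str, wrapping runs of
# [a-z] in parentheses and base64-encoding every other run as UTF-16BE.
sub tokenize {
    my ($str) = @_;
    my $out = '';
    pos($str) = 0;
    while (pos($str) < length($str)) {
        if ($str =~ /\G([a-z]+)/cg) {
            $out .= "($1)";
        } elsif ($str =~ /\G([^a-z]+)/cg) {
            # Copy the capture first, so encode() is handed an ordinary
            # scalar rather than the magical $1 itself.
            my $tmp = $1;
            $out .= '<' . encode_base64($e_utf16->encode($tmp), '') . '>';
        } else {
            die "no match at position " . pos($str);
        }
    }
    return $out;
}

print tokenize("abc\x{3042}def"), "\n";   # prints (abc)<MEI=>(def)
```

With the copy in place the final run should come out as (def), which is what the 5.8.6 run above shows, rather than repeating the previous capture.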