Re: [perl #60472] Module Encode degrades in Perl 5.10

From: demerphq
Date: November 11, 2008 09:01
Subject: Re: [perl #60472] Module Encode degrades in Perl 5.10
Message ID: 9b18b3110811110901o2b28a601o2c85d1ca880841fd@mail.gmail.com

2008/11/11 mihara@twister.dev.iwa.fujixerox.co.jp (via RT)
<perlbug-followup@perl.org>:
> # New Ticket Created by  mihara@twister.dev.iwa.fujixerox.co.jp
> # Please include the string:  [perl #60472]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=60472 >
>
>
>
> This is a bug report for perl from mihara@twister.dev.iwa.fujixerox.co.jp,
> generated with the help of perlbug 1.36 running under perl 5.10.0.
>
>
> -----------------------------------------------------------------
> [Please enter your report here]
> I found this bug while playing with the Encode::IMAPUTF7 module.
> Encode::IMAPUTF7 started to misbehave when I upgraded Perl from 5.8 to 5.10.
> The problem is that if "$1" is passed directly to "encode()", $1 is not
> updated by subsequent pattern matches.  Here is a small script that reproduces it.
>
>
> ----
> #!/usr/bin/perl
> use MIME::Base64;
> use Encode;
> my $e_utf16 = find_encoding("UTF-16BE");
>
> $str="abcあdef";
> my $len=length($str);
> $re1=qr/(?:[a-z])/;
> $re2=qr/(?:[^a-z])/;
> $bytes='';
> pos($str) = 0;
> while (pos($str) < $len) {
>        #print pos($str),"\n";
>        print "\n",pos($str),":",$str,"[$1]\n";
>        if ($str =~ /\G($re1+)/cg) {
>                print "($1)";
>                $bytes .= $1;
>        } elsif ($str =~ /\G($re2+)/cg) {
>                my $base64 = encode_base64($e_utf16->encode($1), '');
>                print "<$1:$base64>";
>                $bytes .= $1;
>        } else {
>                die "aaa";
>        }
> }
> -----
>
> Please make sure that $str contains a Japanese character encoded in UTF-8.
>
> When "$1" is passed like this on line 19
>                my $base64 = encode_base64($e_utf16->encode($1), '');
>
> the next pattern match on line 15
>
>        if ($str =~ /\G($re1+)/cg) {
>
> cannot update $1.  The previous $1 remains.
>
> The workaround is this:
>
>                my $tmp = $1;
>                my $base64 = encode_base64($e_utf16->encode($tmp), '');
>
> But something may be wrong inside $e_utf16->encode().
>
> Sorry for my poor explanation.  Please mail me at osamu.mihara'@'fujixerox.co.jp if you have further questions.
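
For reference, the workaround quoted above can be written as a standalone script. This is only a minimal sketch, not the reporter's exact code: the Japanese character is spelled \x{3042} so the file does not depend on its own source encoding, and the names $utf16, $chunk, @tokens and $result are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);
use MIME::Base64 qw(encode_base64);

my $utf16 = find_encoding('UTF-16BE');
my $str   = "abc\x{3042}def";    # "abc", HIRAGANA LETTER A, "def"
my @tokens;

pos($str) = 0;
while (pos($str) < length $str) {
    if ($str =~ /\G([a-z]+)/gc) {
        push @tokens, "($1)";
    }
    elsif ($str =~ /\G([^a-z]+)/gc) {
        # Copy the capture first, so encode() is handed an ordinary
        # scalar rather than $1 itself.
        my $chunk  = $1;
        my $base64 = encode_base64($utf16->encode($chunk), '');
        push @tokens, "<$base64>";
    }
    else {
        die "no match at position " . pos($str);
    }
}

my $result = join ' ', @tokens;
print "$result\n";
```

With the copy in place, the tokens should come out as (abc) <MEI=> (def), that is, the final match sees "def" rather than the stale capture.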

Can you attach the code as a file? I don't seem to be able to create
the appropriate input string.

If I change the Japanese character to a question mark, I get the following output:

D:\dev\perl\ver\p4\win32>..\perl ..\encode_bug.pl

0:abc?def[]
(abc)
3:abc?def[abc]
<?:AD8=>
4:abc?def[?]
(?)
D:\dev\perl\ver\p4\win32>perl ..\encode_bug.pl

0:abc?def[]
(abc)
3:abc?def[abc]
<?:AD8=>
4:abc?def[?]
(def)

which is clearly different: the top run ends with (?) where the bottom
ends with (def). (The bottom one is perl 5.8.6, the top is
a pretty recent blead.)

If I change it to \x{100}, then I get:

D:\dev\perl\ver\p4\win32>..\perl ..\encode_bug.pl

Wide character in print at ..\encode_bug.pl line 14.
0:abc─Çdef[]
(abc)
Wide character in print at ..\encode_bug.pl line 14.
3:abc─Çdef[abc]
Wide character in print at ..\encode_bug.pl line 20.
<─Ç:AQA=>
Wide character in print at ..\encode_bug.pl line 14.
Wide character in print at ..\encode_bug.pl line 14.
4:abc─Çdef[─Ç]
Wide character in print at ..\encode_bug.pl line 16.
(─Ç)
D:\dev\perl\ver\p4\win32>perl ..\encode_bug.pl

Wide character in print at ..\encode_bug.pl line 14.
0:abc─Çdef[]
(abc)
Wide character in print at ..\encode_bug.pl line 14.
3:abc─Çdef[abc]
Modification of a read-only value attempted at ..\encode_bug.pl line 19.

Some of this is explainable by this being a Windows box without a
UTF-8 shell, but it looks like at least one issue (the read-only
value error) is fixed in blead.
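
The "Wide character in print" warnings themselves are a separate matter from the $1 problem: perl emits them whenever a character above 0xFF is printed to a handle that has no encoding layer. A minimal sketch of avoiding them, assuming an in-memory filehandle stands in for STDOUT so the written bytes can be inspected ($fh and $buf are illustrative names):

```perl
use strict;
use warnings;

my $str = "abc\x{100}def";

# An in-memory handle stands in for STDOUT here; for a real terminal,
# binmode(STDOUT, ':encoding(UTF-8)') applies the same layer.
my $buf = '';
open my $fh, '>:encoding(UTF-8)', \$buf or die "open failed: $!";
print {$fh} "$str\n";    # the layer encodes U+0100 on the way out
close $fh or die "close failed: $!";

print length($buf), " bytes written\n";
```

Whether the console then displays the character correctly still depends on the shell's code page, which is the part a Windows box without a UTF-8 shell cannot fix.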

Which sort of reminds me of something: Encode has an annoying habit of
modifying its arguments...
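
The documented form of that habit: when a CHECK argument is given without the LEAVE_SRC bit, encode() and decode() overwrite the source scalar with whatever was left unprocessed. A minimal sketch of that behaviour follows, with made-up variable names. It illustrates the documented API only and is not a diagnosis of the $1 problem in this ticket.

```perl
use strict;
use warnings;
use Encode qw(decode FB_QUIET LEAVE_SRC);

my $octets = "abc\xFFdef";    # \xFF can never appear in valid UTF-8

# With CHECK set, decoding stops at the bad byte and the argument is
# overwritten with the unprocessed remainder.
my $consumed = $octets;
my $head     = decode('UTF-8', $consumed, FB_QUIET);

# OR-ing in LEAVE_SRC asks Encode to leave the argument alone.
my $kept = $octets;
my $same = decode('UTF-8', $kept, FB_QUIET | LEAVE_SRC);

printf "decoded=%s left=%d bytes kept=%d bytes\n",
    $head, length($consumed), length($kept);
```

Passing a copy, or adding LEAVE_SRC, is the usual way to keep the caller's data intact when a CHECK value is needed.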

Anyway, obviously there is a bug. Damned if I understand it, though.

Cheers,
yves



-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
