Change 15020: More Han tweaks from Autrijus Tang: most importantly,
From:
Jarkko Hietaniemi
Date:
March 4, 2002 15:07
Subject:
Change 15020: More Han tweaks from Autrijus Tang: most importantly,
Message ID:
200203042300.g24N05L24128@smtp3.ActiveState.com
Change 15020 by jhi@alpha on 2002/03/04 21:44:42
More Han tweaks from Autrijus Tang: most importantly,
gbk is identical to cp936, so gbk can be removed and
taken care of by an alias.
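
The effect of the alias can be checked from user code. The following is a minimal sketch, assuming a Perl with an Encode release that includes this change and that loads Encode::CN on demand (on the Encode of this era an explicit `use Encode::CN;` may be needed). The sample string is an illustration only and is not part of the change.

```perl
use strict;
use warnings;
use Encode qw(encode find_encoding);

# Two CJK ideographs (U+4E2D, U+6587), chosen only as sample input.
my $text = "\x{4E2D}\x{6587}";

# With the alias in place, both names should resolve to one encoding object.
my $via_alias = find_encoding("gbk");
my $canonical = find_encoding("cp936");
die "encoding not available\n" unless $via_alias && $canonical;

printf "gbk resolves to:   %s\n", $via_alias->name;
printf "cp936 resolves to: %s\n", $canonical->name;

# The byte sequences produced under either name should be the same.
my $same = encode("gbk", $text) eq encode("cp936", $text);
print "identical bytes: ", ($same ? "yes" : "no"), "\n";
```

If the two names ever diverge in a later release, the final comparison is the line that would report it.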
Affected files ...
.... //depot/perl/MANIFEST#755 edit
.... //depot/perl/ext/Encode/CN/CN.pm#3 edit
.... //depot/perl/ext/Encode/CN/Makefile.PL#5 edit
.... //depot/perl/ext/Encode/Encode.pm#63 edit
.... //depot/perl/ext/Encode/Encode/gbk.enc#2 delete
.... //depot/perl/ext/Encode/KR/KR.pm#2 edit
.... //depot/perl/ext/Encode/KR/Makefile.PL#3 edit
.... //depot/perl/ext/Encode/MANIFEST#4 edit
.... //depot/perl/ext/Encode/TW/Makefile.PL#5 edit
.... //depot/perl/ext/Encode/TW/TW.pm#3 edit
Differences ...
==== //depot/perl/MANIFEST#755 (text) ====
Index: perl/MANIFEST
--- perl/MANIFEST.~1~ Mon Mar 4 15:00:05 2002
+++ perl/MANIFEST Mon Mar 4 15:00:05 2002
@@ -295,7 +295,6 @@
ext/Encode/Encode/gb12345.enc Encode table
ext/Encode/Encode/gb1988.enc Encode table
ext/Encode/Encode/gb2312.enc Encode table
-ext/Encode/Encode/gbk.enc Encode table
ext/Encode/Encode/gsm0338.enc Encode table
ext/Encode/Encode/HZ.enc Encode table
ext/Encode/Encode/ir-197.enc Encode table
==== //depot/perl/ext/Encode/CN/CN.pm#3 (text) ====
Index: perl/ext/Encode/CN/CN.pm
--- perl/ext/Encode/CN/CN.pm.~1~ Mon Mar 4 15:00:05 2002
+++ perl/ext/Encode/CN/CN.pm Mon Mar 4 15:00:05 2002
@@ -6,4 +6,48 @@
1;
__END__
-todo: HZ (Escape-based)
+=head1 NAME
+
+Encode::CN - China-based Chinese Encodings
+
+=head1 SYNOPSIS
+
+ use Encode::CN;
+ $euc_cn = encode("euc-cn", $utf8);
+ $utf8 = decode("euc-cn", $euc_cn);
+
+=head1 DESCRIPTION
+
+This module implements China-based Chinese charset encodings.
+Encodings supported are as follows.
+
+ euc-cn EUC (Extended Unix Code)
+ gb2312 The raw (low-bit) GB2312 character map
+ gb12345 Traditional Chinese counterpart to GB2312 (raw)
+ iso-ir-165 GB2312 + GB6345 + GB8565 + additions
+ cp936 Code Page 936, also known as GBK (Extended GuoBiao)
+
+To find how to use this module in detail, see L<Encode>.
+
+=head1 NOTES
+
+Due to size concerns, C<GB 18030> (an extension to C<GBK>) is distributed
+separately on CPAN, under the name L<Encode::HanExtra>. That module
+also contains extra Taiwan-based encodings.
+
+=head1 BUGS
+
+The C<HZ> (Hanzi) escaped encoding is not supported.
+
+ASCII part (0x00-0x7f) is preserved for all encodings, even though it
+conflicts with mappings by the Unicode Consortium. See
+
+F<http://www.debian.or.jp/~kubota/unicode-symbols.html.en>
+
+to find why it is implemented that way.
+
+=head1 SEE ALSO
+
+L<Encode>
+
+=cut
==== //depot/perl/ext/Encode/CN/Makefile.PL#5 (text) ====
Index: perl/ext/Encode/CN/Makefile.PL
--- perl/ext/Encode/CN/Makefile.PL.~1~ Mon Mar 4 15:00:05 2002
+++ perl/ext/Encode/CN/Makefile.PL Mon Mar 4 15:00:05 2002
@@ -3,7 +3,6 @@
use ExtUtils::MakeMaker;
my %tables = (EUC_CN => ['euc-cn.enc'],
- GBK => ['gbk.enc'],
GB2312 => ['gb2312.enc'],
GB12345 => ['gb12345.enc'],
CP936 => ['cp936.enc'],
==== //depot/perl/ext/Encode/Encode.pm#63 (text) ====
Index: perl/ext/Encode/Encode.pm
--- perl/ext/Encode/Encode.pm.~1~ Mon Mar 4 15:00:05 2002
+++ perl/ext/Encode/Encode.pm Mon Mar 4 15:00:05 2002
@@ -167,10 +167,13 @@
# Seen in some Linuxes.
define_alias( qr/^ujis$/i => 'euc-jp' );
+# CP936 doesn't have vendor-addon for GBK, so they're identical.
+define_alias( qr/^gbk$/i => '"cp936"');
+
# TODO: HP-UX '8' encodings arabic8 greek8 hebrew8 kana8 thai8 turkish8
# TODO: HP-UX '15' encodings japanese15 korean15 roi15
# TODO: Cyrillic encoding ISO-IR-111 (useful?)
-# TODO: Chinese encodings GB18030 EUC-TW HZ
+# TODO: Chinese encodings HZ
# TODO: Armenian encoding ARMSCII-8
# TODO: Hebrew encoding ISO-8859-8-1
# TODO: Thai encoding TCVN
==== //depot/perl/ext/Encode/KR/KR.pm#2 (text) ====
Index: perl/ext/Encode/KR/KR.pm
--- perl/ext/Encode/KR/KR.pm.~1~ Mon Mar 4 15:00:05 2002
+++ perl/ext/Encode/KR/KR.pm Mon Mar 4 15:00:05 2002
@@ -6,6 +6,40 @@
1;
__END__
+=head1 NAME
+
+Encode::KR - Korean Encodings
+
+=head1 SYNOPSIS
+
+ use Encode::KR;
+ $euc_kr = encode("euc-kr", $utf8);
+ $utf8 = decode("euc-kr", $euc_kr);
+
+=head1 DESCRIPTION
+
+This module implements Korean charset encodings. Encodings supported
+are as follows.
+
+ euc-kr EUC (Extended Unix Code)
+ ksc5601 Korean standard code set
+ cp949 Code Page 949 (EUC-KR + Unified Hangul Code)
+
+To find how to use this module in detail, see L<Encode>.
+
+=head1 BUGS
+
+The C<Johab> (two-byte combination code) encoding is not supported.
+
+ASCII part (0x00-0x7f) is preserved for all encodings, even though it
+conflicts with mappings by the Unicode Consortium. See
-todo:
+F<http://www.debian.or.jp/~kubota/unicode-symbols.html.en>
+
+to find why it is implemented that way.
+
+=head1 SEE ALSO
+
+L<Encode>
+=cut
==== //depot/perl/ext/Encode/KR/Makefile.PL#3 (text) ====
Index: perl/ext/Encode/KR/Makefile.PL
--- perl/ext/Encode/KR/Makefile.PL.~1~ Mon Mar 4 15:00:05 2002
+++ perl/ext/Encode/KR/Makefile.PL Mon Mar 4 15:00:05 2002
@@ -4,6 +4,7 @@
my %tables = (EUC_KR => ['euc-kr.enc'],
KSC5601 => ['ksc5601.enc'],
+ CP949 => ['cp949.enc'],
);
my $name = 'KR';
==== //depot/perl/ext/Encode/MANIFEST#4 (text) ====
Index: perl/ext/Encode/MANIFEST
--- perl/ext/Encode/MANIFEST.~1~ Mon Mar 4 15:00:05 2002
+++ perl/ext/Encode/MANIFEST Mon Mar 4 15:00:05 2002
@@ -47,6 +47,7 @@
Encode/ascii.enc
Encode/ascii.ucm
Encode/big5.enc
+Encode/big5-hkscs.enc
Encode/cp1006.enc
Encode/cp1047.enc
Encode/cp1047.ucm
@@ -95,6 +96,7 @@
Encode/gb2312.enc
Encode/gsm0338.enc
Encode/HZ.enc
+Encode/iso-ir-165.enc
Encode/ir-197.enc
Encode/jis0201.enc
Encode/jis0208.enc
==== //depot/perl/ext/Encode/TW/Makefile.PL#5 (text) ====
==== //depot/perl/ext/Encode/TW/TW.pm#3 (text) ====
Index: perl/ext/Encode/TW/TW.pm
--- perl/ext/Encode/TW/TW.pm.~1~ Mon Mar 4 15:00:05 2002
+++ perl/ext/Encode/TW/TW.pm Mon Mar 4 15:00:05 2002
@@ -6,3 +6,49 @@
1;
__END__
+=head1 NAME
+
+Encode::TW - Taiwan-based Chinese Encodings
+
+=head1 SYNOPSIS
+
+ use Encode::TW;
+ $big5 = encode("big5", $utf8);
+ $utf8 = decode("big5", $big5);
+
+=head1 DESCRIPTION
+
+This module implements Taiwan-based Chinese charset encodings.
+Encodings supported are as follows.
+
+ big5 The original Big5 encoding
+ big5-hkscs Big5 plus Cantonese characters in Hong Kong
+ cp950 Code Page 950 (Big5 + Microsoft vendor mappings)
+
+To find how to use this module in detail, see L<Encode>.
+
+=head1 NOTES
+
+Due to size concerns, C<EUC-TW> (Extended Unix Code) and C<BIG5PLUS>
+(CMEX's Big5+) are distributed separately on CPAN, under the name
+L<Encode::HanExtra>. That module also contains extra China-based encodings.
+
+=head1 BUGS
+
+The C<CNS11643> encoding files are not complete (only the first two planes,
+C<11643-1> and C<11643-2>, exist in the distribution). For common CNS11643
+manipulation, please use C<EUC-TW> in L<Encode::HanExtra>, which contains
+planes 1-7.
+
+ASCII part (0x00-0x7f) is preserved for all encodings, even though it
+conflicts with mappings by the Unicode Consortium. See
+
+F<http://www.debian.or.jp/~kubota/unicode-symbols.html.en>
+
+to find why it is implemented that way.
+
+=head1 SEE ALSO
+
+L<Encode>
+
+=cut
End of Patch.
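
The SYNOPSIS sections added above pair encode() with decode(). A round trip through each of the three modules might look like the sketch below. It assumes a reasonably recent Encode that loads Encode::CN, Encode::KR and Encode::TW on demand. The helper name round_trip and the sample strings are hypothetical and do not appear in the patch.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: encode characters into the named encoding, decode
# the resulting bytes again, and report whether the text survived intact.
sub round_trip {
    my ($name, $chars) = @_;
    my $bytes = encode($name, $chars);   # Perl characters -> legacy bytes
    my $back  = decode($name, $bytes);   # legacy bytes -> Perl characters
    return $back eq $chars;
}

# Sample input only: two Han ideographs and two Hangul syllables.
my %samples = (
    "euc-cn" => "\x{4E2D}\x{6587}",
    "euc-kr" => "\x{D55C}\x{AE00}",
    "big5"   => "\x{4E2D}\x{6587}",
);

for my $name (sort keys %samples) {
    printf "%-7s round trip %s\n", $name,
        round_trip($name, $samples{$name}) ? "ok" : "FAILED";
}
```

Characters that an encoding cannot represent are replaced by a substitution character under the default check mode, so a failed round trip usually points at a repertoire gap rather than a broken table.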