perl.perl5.changes | Postings from March 2002

Change 15020: More Han tweaks from Autrijus Tang: most importantly,

From: Jarkko Hietaniemi
Date: March 4, 2002 15:07
Subject: Change 15020: More Han tweaks from Autrijus Tang: most importantly,
Message ID: 200203042300.g24N05L24128@smtp3.ActiveState.com
Change 15020 by jhi@alpha on 2002/03/04 21:44:42

	More Han tweaks from Autrijus Tang: most importantly,
	gbk is identical to cp936, so gbk can be removed and
	taken care of by an alias.
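
What the alias means for callers can be sketched as follows. This is a minimal illustration that assumes a current Encode release, where the `gbk` alias introduced by this change still resolves to the cp936 table; the sample text is arbitrary.

```perl
use strict;
use warnings;
use Encode qw(encode decode find_encoding);

# "gbk" is no longer a table of its own: the name is an alias that
# resolves to the cp936 encoding object (lookup is case-insensitive).
my $enc = find_encoding('GBK');
print $enc->name, "\n";                     # prints "cp936"

# Since both names refer to one table, they produce identical bytes.
my $text  = "\x{4E2D}\x{6587}";             # two ideographs meaning "Chinese"
my $gbk   = encode('gbk',   $text);
my $cp936 = encode('cp936', $text);
print $gbk eq $cp936 ? "identical\n" : "different\n";

# Decoding under the alias round-trips the text.
print decode('gbk', $gbk) eq $text ? "round trip ok\n" : "round trip failed\n";
```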

Affected files ...

.... //depot/perl/MANIFEST#755 edit
.... //depot/perl/ext/Encode/CN/CN.pm#3 edit
.... //depot/perl/ext/Encode/CN/Makefile.PL#5 edit
.... //depot/perl/ext/Encode/Encode.pm#63 edit
.... //depot/perl/ext/Encode/Encode/gbk.enc#2 delete
.... //depot/perl/ext/Encode/KR/KR.pm#2 edit
.... //depot/perl/ext/Encode/KR/Makefile.PL#3 edit
.... //depot/perl/ext/Encode/MANIFEST#4 edit
.... //depot/perl/ext/Encode/TW/Makefile.PL#5 edit
.... //depot/perl/ext/Encode/TW/TW.pm#3 edit

Differences ...

==== //depot/perl/MANIFEST#755 (text) ====
Index: perl/MANIFEST
--- perl/MANIFEST.~1~	Mon Mar  4 15:00:05 2002
+++ perl/MANIFEST	Mon Mar  4 15:00:05 2002
@@ -295,7 +295,6 @@
 ext/Encode/Encode/gb12345.enc		Encode table
 ext/Encode/Encode/gb1988.enc		Encode table
 ext/Encode/Encode/gb2312.enc		Encode table
-ext/Encode/Encode/gbk.enc		Encode table
 ext/Encode/Encode/gsm0338.enc		Encode table
 ext/Encode/Encode/HZ.enc		Encode table
 ext/Encode/Encode/ir-197.enc		Encode table

==== //depot/perl/ext/Encode/CN/CN.pm#3 (text) ====
Index: perl/ext/Encode/CN/CN.pm
--- perl/ext/Encode/CN/CN.pm.~1~	Mon Mar  4 15:00:05 2002
+++ perl/ext/Encode/CN/CN.pm	Mon Mar  4 15:00:05 2002
@@ -6,4 +6,49 @@
 
 1;
 __END__
-todo: HZ (Escape-based)
+=head1 NAME
+
+Encode::CN - China-based Chinese Encodings
+
+=head1 SYNOPSIS
+
+    use Encode qw(encode decode);
+    use Encode::CN;
+    $euc_cn = encode("euc-cn", $utf8);
+    $utf8   = decode("euc-cn", $euc_cn);
+
+=head1 DESCRIPTION
+
+This module implements China-based Chinese charset encodings.
+Encodings supported are as follows.
+
+  euc-cn	EUC (Extended Unix Character)
+  gb2312	The raw (low-bit) GB2312 character map
+  gb12345	Traditional Chinese counterpart to GB2312 (raw)
+  iso-ir-165	GB2312 + GB6345 + GB8565 + additions
+  cp936	Code Page 936, also known as GBK (Extended GuoBiao)
+
+For details on how to use this module, see L<Encode>.
+
+=head1 NOTES
+
+Due to size concerns, C<GB 18030> (an extension to C<GBK>) is distributed
+separately on CPAN, under the name L<Encode::HanExtra>. That module
+also contains extra Taiwan-based encodings.
+
+=head1 BUGS
+
+The C<HZ> (Hanzi) escaped encoding is not supported.
+
+The ASCII range (0x00-0x7f) is preserved for all encodings, even though
+this conflicts with some mappings published by the Unicode Consortium.  See
+
+L<http://www.debian.or.jp/~kubota/unicode-symbols.html.en>
+
+for the reasons behind this choice.
+
+=head1 SEE ALSO
+
+L<Encode>
+
+=cut
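
The synopsis pattern and the ASCII note in the BUGS section can be checked with a short script. This sketch assumes a current Encode installation; it uses printable ASCII (0x20-0x7e) only, to keep the check simple, and the sample string is arbitrary.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Round trip as in the synopsis: encode() turns characters into EUC-CN
# bytes, decode() turns those bytes back into characters.
my $utf8   = "GB2312: \x{4E2D}\x{6587}";    # ASCII prefix + two ideographs
my $euc_cn = encode('euc-cn', $utf8);
my $back   = decode('euc-cn', $euc_cn);
printf "%d bytes, round trip %s\n", length($euc_cn),
    $back eq $utf8 ? 'ok' : 'FAILED';

# BUGS section: the ASCII range passes through unchanged.
my $ascii = join '', map { chr($_) } 0x20 .. 0x7e;
for my $name (qw(euc-cn cp936)) {
    printf "%-6s keeps ASCII: %s\n", $name,
        encode($name, $ascii) eq $ascii ? 'yes' : 'no';
}
```

Each ideograph takes two bytes in EUC-CN, so the eight ASCII characters plus two ideographs encode to 12 bytes.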

==== //depot/perl/ext/Encode/CN/Makefile.PL#5 (text) ====
Index: perl/ext/Encode/CN/Makefile.PL
--- perl/ext/Encode/CN/Makefile.PL.~1~	Mon Mar  4 15:00:05 2002
+++ perl/ext/Encode/CN/Makefile.PL	Mon Mar  4 15:00:05 2002
@@ -3,7 +3,6 @@
 use ExtUtils::MakeMaker;
 
 my %tables = (EUC_CN   => ['euc-cn.enc'],
-	      GBK      => ['gbk.enc'],
 	      GB2312   => ['gb2312.enc'],
 	      GB12345  => ['gb12345.enc'],
 	      CP936    => ['cp936.enc'],

==== //depot/perl/ext/Encode/Encode.pm#63 (text) ====
Index: perl/ext/Encode/Encode.pm
--- perl/ext/Encode/Encode.pm.~1~	Mon Mar  4 15:00:05 2002
+++ perl/ext/Encode/Encode.pm	Mon Mar  4 15:00:05 2002
@@ -167,10 +167,13 @@
 # Seen in some Linuxes.
 define_alias( qr/^ujis$/i => 'euc-jp' );
 
+# CP936 has no vendor add-ons beyond GBK, so the two are identical.
+define_alias( qr/^gbk$/i => '"cp936"');
+
 # TODO: HP-UX '8' encodings arabic8 greek8 hebrew8 kana8 thai8 turkish8
 # TODO: HP-UX '15' encodings japanese15 korean15 roi15
 # TODO: Cyrillic encoding ISO-IR-111 (useful?)
-# TODO: Chinese encodings GB18030 EUC-TW HZ
+# TODO: Chinese encodings HZ
 # TODO: Armenian encoding ARMSCII-8
 # TODO: Hebrew encoding ISO-8859-8-1
 # TODO: Thai encoding TCVN
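
The new `define_alias` line uses a regex key, in which case Encode::Alias eval()s the value string; that is why `"cp936"` carries an extra layer of quotes. A minimal sketch against a current Encode follows; the `my-gbk` alias is hypothetical and exists only for illustration.

```perl
use strict;
use warnings;
use Encode qw(resolve_alias);
use Encode::Alias;    # exports define_alias()

# The alias added in this change: any case of "gbk" maps to cp936.
print resolve_alias('GBK'), "\n";       # prints "cp936"

# A hypothetical site-local alias built the same way.  The value is
# eval()'d because the key is a regex, so it must be a quoted string.
define_alias( qr/^my-gbk$/i => '"cp936"' );
print resolve_alias('My-GBK'), "\n";    # prints "cp936"
```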

==== //depot/perl/ext/Encode/KR/KR.pm#2 (text) ====
Index: perl/ext/Encode/KR/KR.pm
--- perl/ext/Encode/KR/KR.pm.~1~	Mon Mar  4 15:00:05 2002
+++ perl/ext/Encode/KR/KR.pm	Mon Mar  4 15:00:05 2002
@@ -6,6 +6,41 @@
 
 1;
 __END__
+=head1 NAME
+
+Encode::KR - Korean Encodings
+
+=head1 SYNOPSIS
+
+    use Encode qw(encode decode);
+    use Encode::KR;
+    $euc_kr = encode("euc-kr", $utf8);
+    $utf8   = decode("euc-kr", $euc_kr);
+
+=head1 DESCRIPTION
+
+This module implements Korean charset encodings.  Encodings supported
+are as follows.
+
+  euc-kr	EUC (Extended Unix Character)
+  ksc5601	Korean standard code set
+  cp949	Code Page 949 (EUC-KR + Unified Hangul Code)
+
+For details on how to use this module, see L<Encode>.
+
+=head1 BUGS
+
+The C<Johab> (two-byte combination code) encoding is not supported.
+
+The ASCII range (0x00-0x7f) is preserved for all encodings, even though
+this conflicts with some mappings published by the Unicode Consortium.  See
 
-todo:
+L<http://www.debian.or.jp/~kubota/unicode-symbols.html.en>
+
+for the reasons behind this choice.
+
+=head1 SEE ALSO
+
+L<Encode>
 
+=cut
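
The relationship between euc-kr and the newly built cp949 table can be illustrated with a short sketch, assuming a current Encode. U+B620 is used because it is commonly cited as a Hangul syllable missing from the KS C 5601 repertoire but present in Unified Hangul Code; treat that choice as an assumption of the example.

```perl
use strict;
use warnings;
use Encode qw(encode decode FB_CROAK LEAVE_SRC);

my $text = "\x{D55C}\x{AD6D}\x{C5B4}";    # three Hangul syllables: "Korean language"

# euc-kr round trip, following the synopsis.
my $euc_kr = encode('euc-kr', $text);
print decode('euc-kr', $euc_kr) eq $text ? "euc-kr round trip ok\n" : "mismatch\n";

# cp949 extends EUC-KR, so KS C 5601 characters keep the same bytes.
print encode('cp949', $text) eq $euc_kr ? "same bytes\n" : "different bytes\n";

# Only cp949 covers the extra Unified Hangul Code syllables.
# FB_CROAK makes encode() die on unmappable input; LEAVE_SRC keeps $rare intact.
my $rare   = "\x{B620}";
my $ok_euc = eval { encode('euc-kr', $rare, FB_CROAK | LEAVE_SRC); 1 };
my $ok_949 = eval { encode('cp949',  $rare, FB_CROAK | LEAVE_SRC); 1 };
print "euc-kr: ", ($ok_euc ? "encodable" : "not encodable"), "\n";
print "cp949:  ", ($ok_949 ? "encodable" : "not encodable"), "\n";
```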

==== //depot/perl/ext/Encode/KR/Makefile.PL#3 (text) ====
Index: perl/ext/Encode/KR/Makefile.PL
--- perl/ext/Encode/KR/Makefile.PL.~1~	Mon Mar  4 15:00:05 2002
+++ perl/ext/Encode/KR/Makefile.PL	Mon Mar  4 15:00:05 2002
@@ -4,6 +4,7 @@
 
 my %tables = (EUC_KR   => ['euc-kr.enc'],
 	      KSC5601  => ['ksc5601.enc'],
+	      CP949    => ['cp949.enc'],
              );
 
 my $name = 'KR';

==== //depot/perl/ext/Encode/MANIFEST#4 (text) ====
Index: perl/ext/Encode/MANIFEST
--- perl/ext/Encode/MANIFEST.~1~	Mon Mar  4 15:00:05 2002
+++ perl/ext/Encode/MANIFEST	Mon Mar  4 15:00:05 2002
@@ -47,6 +47,7 @@
 Encode/ascii.enc
 Encode/ascii.ucm
 Encode/big5.enc
+Encode/big5-hkscs.enc
 Encode/cp1006.enc
 Encode/cp1047.enc
 Encode/cp1047.ucm
@@ -95,6 +96,7 @@
 Encode/gb2312.enc
 Encode/gsm0338.enc
 Encode/HZ.enc
+Encode/iso-ir-165.enc
 Encode/ir-197.enc
 Encode/jis0201.enc
 Encode/jis0208.enc

==== //depot/perl/ext/Encode/TW/Makefile.PL#5 (text) ====
==== //depot/perl/ext/Encode/TW/TW.pm#3 (text) ====
Index: perl/ext/Encode/TW/TW.pm
--- perl/ext/Encode/TW/TW.pm.~1~	Mon Mar  4 15:00:05 2002
+++ perl/ext/Encode/TW/TW.pm	Mon Mar  4 15:00:05 2002
@@ -6,3 +6,50 @@
 
 1;
 __END__
+=head1 NAME
+
+Encode::TW - Taiwan-based Chinese Encodings
+
+=head1 SYNOPSIS
+
+    use Encode qw(encode decode);
+    use Encode::TW;
+    $big5 = encode("big5", $utf8);
+    $utf8 = decode("big5", $big5);
+
+=head1 DESCRIPTION
+
+This module implements Taiwan-based Chinese charset encodings.
+Encodings supported are as follows.
+
+  big5		The original Big5 encoding
+  big5-hkscs	Big5 plus Cantonese characters in Hong Kong
+  cp950	Code Page 950 (Big5 + Microsoft vendor mappings)
+
+For details on how to use this module, see L<Encode>.
+
+=head1 NOTES
+
+Due to size concerns, C<EUC-TW> (Extended Unix Character) and C<BIG5PLUS>
+(CMEX's Big5+) are distributed separately on CPAN, under the name
+L<Encode::HanExtra>. That module also contains extra China-based encodings.
+
+=head1 BUGS
+
+The C<CNS11643> encoding files are incomplete: only the first two planes,
+C<11643-1> and C<11643-2>, are included in the distribution.  For general
+CNS11643 work, please use C<EUC-TW> from L<Encode::HanExtra>, which covers
+planes 1-7.
+
+The ASCII range (0x00-0x7f) is preserved for all encodings, even though
+this conflicts with some mappings published by the Unicode Consortium.  See
+
+L<http://www.debian.or.jp/~kubota/unicode-symbols.html.en>
+
+for the reasons behind this choice.
+
+=head1 SEE ALSO
+
+L<Encode>
+
+=cut
End of Patch.
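
The Encode::TW synopsis can be exercised the same way. This sketch assumes a current Encode, where `big5`, `cp950` and `big5-hkscs` are all available; the two sample ideographs are common characters that the Big5 extensions are expected to encode exactly as plain Big5 does.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text = "\x{4E2D}\x{6587}";    # two ideographs meaning "Chinese"

# big5 round trip, following the synopsis.
my $big5 = encode('big5', $text);
print decode('big5', $big5) eq $text ? "big5 round trip ok\n" : "mismatch\n";

# cp950 and big5-hkscs extend Big5; common characters keep their bytes.
for my $name (qw(cp950 big5-hkscs)) {
    printf "%-10s %s\n", $name,
        encode($name, $text) eq $big5 ? 'same' : 'different';
}
```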




nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org