Chandler: Perl: Determining a Text File's Encoding

Saturday, July 30, 2011

Perl: Determining a Text File's Encoding

Original: http://www.perlmonks.org/?node_id=256728

ActivePerl for Windows doesn't seem to include File::BOM or File::MMagic.

That leaves the last method: read the first two bytes of the file and decide from those.
Example:

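use strict;
use warnings;

# Read the first two bytes in raw mode so no I/O layer alters them.
open my $FH_i, '<:raw', 'unicode.txt' or die "Cannot open unicode.txt: $!";
defined(read($FH_i, my $buf, 2)) or die "Cannot read unicode.txt: $!";
close $FH_i;

# Compare them with the UTF-16 byte order marks (BOMs).
if ($buf eq "\xFF\xFE") {
    print "This is a Unicode (UTF-16) Little Endian file.\n";
} elsif ($buf eq "\xFE\xFF") {
    print "This is a Unicode (UTF-16) Big Endian file.\n";
} else {
    print "No UTF-16 BOM found; treating it as an ASCII file.\n";
}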
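The else branch above treats every file without a UTF-16 BOM as ASCII, but a file with no BOM can just as well be UTF-8 or a legacy encoding such as Big5. A minimal extra check, assuming the whole file fits in memory (the name is_ascii_file is hypothetical): a file is plain 7-bit ASCII only if every byte is below 0x80.

```perl
use strict;
use warnings;

# is_ascii_file($path): true if every byte in the file is below 0x80,
# i.e. the file is plain 7-bit ASCII. A BOM-less file that contains
# bytes >= 0x80 is in some other encoding (UTF-8, Big5, ...).
sub is_ascii_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                          # slurp mode: read the whole file at once
    my $data = <$fh>;
    close $fh;
    $data = '' unless defined $data;   # treat an empty file as ASCII
    return $data !~ /[^\x00-\x7F]/;    # no byte outside the ASCII range
}
```

Usage: `print is_ascii_file('unicode.txt') ? "ASCII\n" : "Not ASCII\n";`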

Reference: http://www.cs.cf.ac.uk/Dave/PERL/node73.html

How do I determine the encoding format of a file?

Perl 5.8 has a module called "Encode::Guess", which might work well if you know the language involved and/or can provide some hints as to the likely candidates. (I haven't tried it yet, but it is admittedly limited and speculative at present.)
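A minimal sketch of Encode::Guess, assuming the file is small enough to read into memory; the helper names slurp_raw and guess_file_encoding are hypothetical. With no extra suspects, Encode::Guess tries ascii and utf8 and also recognizes UTF-16/32 BOMs. Extra candidates (for Traditional Chinese text, for example big5-eten) can be passed as suspects, but guesses among similar multibyte encodings may come back ambiguous, in which case guess_encoding returns an error string instead of an encoding object.

```perl
use strict;
use warnings;
use Encode::Guess;    # core module since Perl 5.8; exports guess_encoding

# Hypothetical helper: read an entire file as raw bytes.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;    # slurp mode
    my $data = <$fh>;
    close $fh;
    return defined $data ? $data : '';
}

# Hypothetical helper: return the guessed encoding name, or undef when
# Encode::Guess cannot decide (it returns an error string on failure,
# and nothing at all for empty input).
sub guess_file_encoding {
    my ($path, @suspects) = @_;
    my $decoder = guess_encoding(slurp_raw($path), @suspects);
    return ref $decoder ? $decoder->name : undef;
}
```

Usage: `my $enc = guess_file_encoding('unicode.txt', 'big5-eten');` then check `defined $enc` before relying on the result.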

Answer: How do I determine the encoding format of a file?
contributed by idsfa
File::BOM provides get_encoding_from_filehandle and get_encoding_from_stream to identify the encoding of Unicode files. Example:

use File::BOM qw( :all );
open my $fh, '<', $filename or die "Cannot open $filename: $!";
my ($encoding) = get_encoding_from_filehandle($fh);

Answer: How do I determine the encoding format of a file?
contributed by particle
Have a look at File::MMagic. It guesses the file type from a filename or a filehandle, and it is quite configurable (you can add more file type descriptions based on regular expressions). It's a handy little module.

Answer: How do I determine the encoding format of a file?
contributed by donno20
Read the first two bytes of the file (three to confirm UTF-8). The corresponding encodings and hex codes are as follows:
Unicode (UTF-16) Little Endian = "\xFF\xFE"
Unicode (UTF-16) Big Endian = "\xFE\xFF"
UTF-8 = "\xEF\xBB\xBF" (three bytes; the first two are "\xEF\xBB")
ASCII = no BOM; the content starts immediately
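A minimal sketch that turns this table into a function, extending the two-byte check to the three-byte UTF-8 BOM. The function name detect_bom and the returned labels are hypothetical. Because a UTF-32LE BOM (FF FE 00 00) also begins with FF FE, this sketch would report such a file as UTF-16LE.

```perl
use strict;
use warnings;

# BOM byte sequences and the encoding label each one indicates.
my @BOMS = (
    [ "\xEF\xBB\xBF" => 'UTF-8'    ],
    [ "\xFF\xFE"     => 'UTF-16LE' ],
    [ "\xFE\xFF"     => 'UTF-16BE' ],
);

# detect_bom($path): return the label for the file's BOM, or undef when
# there is no BOM (plain ASCII, or an encoding written without a BOM).
sub detect_bom {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    # Three bytes are enough for every BOM in the table.
    defined(read($fh, my $head, 3)) or die "Cannot read $path: $!";
    close $fh;
    for my $bom (@BOMS) {
        my ($mark, $name) = @$bom;
        return $name if substr($head, 0, length $mark) eq $mark;
    }
    return;
}
```

Usage: `my $enc = detect_bom('unicode.txt'); print defined $enc ? "$enc\n" : "No BOM\n";`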
