Cope with Unicode in MFC
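Date: 2009-01-01 | Category: Playing With Technology
Copyright notice: when reposting, please use a hyperlink to credit the original source and author of this article and include this notice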
http://keilt.blogbus.com/logs/33254438.html
Due to some special requirements, I had to port some ANSI code to Unicode. Before I started, I never thought such a job could be so annoying. It was tough, but anyway, I made it.
Here are the tips:
(1) VS2005 MFC projects use Unicode as the default character set. If you don't need it, don't turn it on; that will save you a lot of trouble.
(2) A Unicode (UTF-16 little-endian) text file starts with the two bytes 0xFF and 0xFE, the byte order mark. If you read such a file yourself, make sure you skip those 2 bytes.
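(3) In the latest Unicode version (5.1.0), the basic Chinese characters have values between 19968 (0x4E00) and 40869 (0x9FA5), so you can compare a character against those two values to judge whether it is a Chinese character. As a regular expression, that is ^[\u4E00-\u9FA5]+$. Note that this range belongs to the CJK Unified Ideographs block, which Simplified Chinese shares with Traditional Chinese and Japanese kanji, so the test cannot tell Simplified Chinese apart from the others.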
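Tips (2) and (3) can be sketched in a few lines. This is only an illustration in standard C++ rather than MFC, so it compiles anywhere; the function names (Utf16LeBomLength, DecodeUtf16Le, IsCjkIdeograph, IsAllCjkIdeographs) are made up for this example, and it assumes the file content has already been read into a byte buffer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns how many leading bytes to skip: 2 if the buffer begins with the
// UTF-16 little-endian byte order mark (0xFF 0xFE), otherwise 0.
std::size_t Utf16LeBomLength(const std::vector<unsigned char>& bytes) {
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) {
        return 2;
    }
    return 0;
}

// Decodes UTF-16LE bytes into 16-bit code units, skipping the BOM if present.
// A trailing odd byte, if any, is ignored.
std::u16string DecodeUtf16Le(const std::vector<unsigned char>& bytes) {
    std::u16string text;
    for (std::size_t i = Utf16LeBomLength(bytes); i + 1 < bytes.size(); i += 2) {
        // Low byte comes first in little-endian order.
        text.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return text;
}

// Range test from tip (3): U+4E00..U+9FA5.
bool IsCjkIdeograph(char16_t ch) {
    return ch >= 0x4E00 && ch <= 0x9FA5;
}

// Equivalent of the regular expression ^[\u4E00-\u9FA5]+$ :
// at least one character, and every character inside the range.
bool IsAllCjkIdeographs(const std::u16string& text) {
    if (text.empty()) {
        return false;
    }
    for (char16_t ch : text) {
        if (!IsCjkIdeograph(ch)) {
            return false;
        }
    }
    return true;
}
```

For example, the bytes FF FE 2D 4E 87 65 decode to the two characters U+4E2D and U+6587, and IsAllCjkIdeographs() accepts the result. Without the BOM skip, the first "character" would be 0xFEFF, which falls outside the range.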
Advantages of using Unicode:
(1) Unicode 5.1.0 contains over 100,000 characters, far more than any ANSI code page can hold. Your software won't run into interchange, processing, or display problems with non-English characters when it is used in other regions.
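(2) In Unicode (UTF-16 on Windows), a character takes only 1 wide-char (wchar_t) slot, no matter what kind of character it is (Chinese, English, symbol...); the only exceptions are the rare characters outside the Basic Multilingual Plane, which take 2. In a double-byte ANSI code page, a non-English character takes 2 chars while an English character takes 1, which makes processing mixed text really troublesome.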
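To make advantage (2) concrete, here is a small sketch that counts the characters in a double-byte string. It assumes the ANSI code page is GBK (code page 936), where a lead byte in 0x81..0xFE starts a two-byte character; CountGbkCharacters is a hypothetical helper written for this example, not an MFC or Windows API.

```cpp
#include <cstddef>
#include <string>

// Counts the characters in a GBK-encoded byte string (assumed code page).
// A byte in 0x81..0xFE is treated as the lead byte of a two-byte character;
// anything else is a single-byte (ASCII) character.
std::size_t CountGbkCharacters(const std::string& bytes) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        bool isDoubleByte = lead >= 0x81 && lead <= 0xFE && i + 1 < bytes.size();
        i += isDoubleByte ? 2 : 1;
        ++count;
    }
    return count;
}
```

With the wide string L"A\u4E2D\u6587" (the letter A followed by two Chinese characters), size() is simply 3. The same text in GBK is the 5 bytes "A\xD6\xD0\xCE\xC4", so size() no longer matches the character count, and you need a scan like CountGbkCharacters() to get 3. Indexing, truncating, or reversing the narrow string has the same problem.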
Some resources:
(1) The Unicode charts of the latest Unicode version.
Click here...
(2) David Pritchard wrote a class (in MFC) derived from CStdioFile which transparently handles the reading and writing of Unicode text files as well as ordinary multibyte text files.
Notice that his function IsFileUnicodeText() relies on the marker at the start of the file, so it is not an absolute guarantee.
Click here...







