• Cope with Unicode in MFC

    Date: 2009-01-01 | Category: Playing With Technology

    Copyright notice: when reposting, please credit the original source, the author, and this notice with a hyperlink.
    http://keilt.blogbus.com/logs/33254438.html

    Due to some special requirements, I had to port some ANSI code to Unicode. Before I started, I never thought such a job could be so annoying. It was tough, but I made it in the end.

    Here are the tips:

    (1) VS2005 MFC sets Unicode as the default character set for new projects. If you don't need it, don't activate it; that will save you a lot of trouble.

    (2) A Unicode text file (UTF-16 little-endian, the kind Windows tools usually write) starts with the two bytes 0xFF and 0xFE, the byte order mark. If you want to read a Unicode file, make sure you skip those 2 bytes.

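    (3) In the latest Unicode version (5.1.0), the common Chinese characters, Simplified Chinese included, have code points between 19968 (0x4E00) and 40869 (0x9FA5). This range belongs to the CJK Unified Ideographs block, which Simplified Chinese shares with Traditional Chinese, Japanese, and Korean, so the test tells you that a character is a CJK ideograph rather than that it is specifically Simplified. You can compare a character against those two values to make the check. As a regular expression, it's ^[\u4E00-\u9FA5]+$.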
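
    Tips (2) and (3) can be sketched together as below. This is only a minimal illustration in plain standard C++ (no MFC headers), assuming the file is UTF-16 little-endian and has already been read into a byte buffer. The function names decodeUtf16Le, isCjkIdeograph, and isAllCjk are made up for this example; in an MFC project, wchar_t and CString would play the role of char16_t and std::u16string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Decode the raw bytes of a UTF-16 little-endian text file into code units.
// If the buffer starts with the byte order mark 0xFF 0xFE, it is skipped.
// (Hypothetical helper; assumes the caller already loaded the file.)
std::u16string decodeUtf16Le(const std::vector<unsigned char>& bytes) {
    std::size_t pos = 0;
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) {
        pos = 2;  // skip the 2 BOM bytes
    }
    std::u16string text;
    for (; pos + 1 < bytes.size(); pos += 2) {
        // Little-endian: low byte first, high byte second.
        text.push_back(static_cast<char16_t>(bytes[pos] | (bytes[pos + 1] << 8)));
    }
    return text;
}

// True if the code unit lies in the range 0x4E00..0x9FA5 quoted in tip (3).
bool isCjkIdeograph(char16_t ch) {
    return ch >= 0x4E00 && ch <= 0x9FA5;
}

// Equivalent of the regular expression ^[\u4E00-\u9FA5]+$ :
// at least one character, and every character inside the range.
bool isAllCjk(const std::u16string& text) {
    if (text.empty()) {
        return false;
    }
    for (char16_t ch : text) {
        if (!isCjkIdeograph(ch)) {
            return false;
        }
    }
    return true;
}
```

    For example, the bytes FF FE 2D 4E 87 65 decode to the two code units 0x4E2D and 0x6587, and isAllCjk accepts that string; appending the bytes 41 00 (the letter A) makes it reject the string.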

    Advantages of using Unicode:

    (1) Unicode 5.1.0 contains over 100,000 characters, far more than any ANSI code page can hold. Your software won't run into problems exchanging, processing, or displaying non-English characters in other regions.

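    (2) In Unicode (UTF-16, as used by Windows), a character takes only 1 wide char, no matter what kind of character it is (Chinese, English, symbol...); the exception is the rare characters outside the Basic Multilingual Plane, which take 2. In a double-byte ANSI code page such as GBK, a non-English character takes 2 chars but an English character takes 1, which makes processing mixed text really troublesome.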
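
    The difference can be seen in a small sketch. It is plain standard C++ rather than MFC, and it assumes a GBK-style encoding in which any byte from 0x81 to 0xFE starts a two-byte character; a real program would have to follow the rules of its actual code page. The names isLeadByte, countDbcsChars, and countWideChars are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption for this sketch: GBK-style text, where a byte in 0x81..0xFE
// is the first half of a two-byte character.
bool isLeadByte(unsigned char b) {
    return b >= 0x81 && b <= 0xFE;
}

// Count the characters in a double-byte ("ANSI") string. The string has to
// be walked from the start, because each character is 1 or 2 chars long.
std::size_t countDbcsChars(const std::string& s) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        bool twoBytes = isLeadByte(static_cast<unsigned char>(s[i])) &&
                        i + 1 < s.size();
        i += twoBytes ? 2 : 1;
        ++count;
    }
    return count;
}

// With wide characters the count is simply the length, as long as the text
// stays inside the Basic Multilingual Plane (no surrogate pairs).
std::size_t countWideChars(const std::u16string& s) {
    return s.size();
}
```

    With the GBK bytes D6 D0 41 CE C4 (a Chinese character, the letter A, another Chinese character), the string is 5 chars long but countDbcsChars returns 3. The same text as wide characters (0x4E2D, 0x0041, 0x6587) has length 3, and the n-th character is simply at index n.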

    Some resources:

    (1) The Unicode charts of the latest Unicode version.
    Click here...

    (2) David Pritchard wrote a class (in MFC) derived from CStdioFile which transparently handles the reading and writing of Unicode text files as well as ordinary multibyte text files.
    Notice that his function IsFileUnicodeText() relies on what it finds at the start of the file, so it's not an absolute guarantee.
    Click here...

