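Recently I have been using regular expressions to process strings that mix Chinese, English, and symbols. Because the strings are UTF-8 encoded, I went back to the classic book on the subject, Mastering Regular Expressions (3rd Edition), to see how PHP handles this. (The book is really well written; after just the first two chapters you can feel the author's care and skill.)

The book says that PHP writes these escapes as \xnum and \x{num}, which differs from the more common \unum form. I followed the approach described in the book, but the results were not what I expected.

In the end, it took the following two pages from the official PHP manual for me to understand the correct usage.

Pattern Modifiers, under the u modifier, which describes PCRE UTF-8 mode:

This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.

Escape sequences, which explains the difference between \xYY and \x{YYYY}:

After "\x", up to two hexadecimal digits are read (letters can be in upper or lower case). In UTF-8 mode, "\x{...}" is allowed, where the contents of the braces is a string of hexadecimal digits. It is interpreted as a UTF-8 character whose code number is the given hexadecimal number. The original hexadecimal escape sequence, \xhh, matches a two-byte UTF-8 character if the value is greater t...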
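
To make the effect of the u modifier concrete, here is a minimal sketch of how the two manual excerpts play out in practice. The sample string, the variable names, and the choice of the range \x{4E00}-\x{9FFF} (the CJK Unified Ideographs block) are my own assumptions for illustration, not something taken from the book or the manual, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Illustrative sample only: a UTF-8 string mixing Chinese, English, and symbols.
$text = "PHP正規表達式 regex, 測試!";

// With the u modifier, pattern and subject are treated as UTF-8, so
// \x{4E00}-\x{9FFF} is a range of code points (assumed here to be a
// good-enough range for common Chinese characters).
preg_match_all('/[\x{4E00}-\x{9FFF}]+/u', $text, $m);
$chineseRuns = $m[0];

// Without u, \x{...} may only hold a value that fits in one byte, so this
// pattern does not compile and preg_match() returns false.
// The @ only suppresses the compilation warning for this demonstration.
$withoutU = @preg_match('/[\x{4E00}-\x{9FFF}]+/', $text);

// The dot matches a single byte without u, and a whole character with u.
$byteCount = preg_match_all('/./', "中文");
$charCount = preg_match_all('/./u', "中文");

var_dump($chineseRuns, $withoutU, $byteCount, $charCount);
```

The point of the comparison is that the same \x{...} pattern behaves completely differently depending on whether u is present, which is likely why following the book's notation alone did not give the expected result.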