utf8(3) Perl Programmers Reference Guide utf8(3)
NAME
utf8 - Perl pragma to enable/disable UTF-8 in source code
SYNOPSIS
use utf8;
no utf8;
DESCRIPTION
WARNING: The implementation of Unicode support in Perl is
incomplete. See the perlunicode manpage for the exact
details.
The "use utf8" pragma tells the Perl parser to allow UTF-8
in the program text in the current lexical scope. The "no
utf8" pragma tells Perl to switch back to treating the
source text as literal bytes in the current lexical scope.
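The lexical scoping described above can be sketched as follows. This
is a minimal illustration, not part of the original manual page: it
assumes the file is saved as UTF-8 and that the accented letter is the
single precomposed code point U+00E9. The variable names are
hypothetical.

```perl
use strict;
use warnings;

my ($byte_len, $char_len);

{
    no utf8;                    # source text is treated as literal bytes
    $byte_len = length("é");    # counts the bytes of the UTF-8 encoding
}

{
    use utf8;                   # source text is parsed as UTF-8
    $char_len = length("é");    # counts one character
}

print "bytes: $byte_len, chars: $char_len\n";
```

Each pragma applies only inside its enclosing block, so the same
literal is measured two different ways in one file.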
This pragma is primarily a compatibility device. Perl
versions earlier than 5.6 allowed arbitrary bytes in
source code, whereas in the future we would like to
standardize on the UTF-8 encoding for source text. Until
UTF-8 becomes the default format for source text, this
pragma should be used to recognize UTF-8 in the source.
When UTF-8 becomes the standard source format, this pragma
will effectively become a no-op. This pragma is already a
no-op on EBCDIC platforms (where it is all right to code
Perl in EBCDIC rather than UTF-8).
Enabling the "utf8" pragma has the following effects:
Bytes in the source text that have their high-bit set
will be treated as being part of a literal UTF-8
character. This includes most literals such as
identifiers, string constants, constant regular expression
patterns and package names.
In the absence of inputs marked as UTF-8, regular
expressions within the scope of this pragma will
default to using character semantics instead of byte
semantics.
@bytes_or_chars = split //, $data; # may split to bytes if
# $data isn't UTF-8
{
use utf8; # force char semantics
@chars = split //, $data; # splits characters
}
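The first effect listed above (UTF-8 in identifiers, string constants
and constant patterns) can be sketched as below. This is an
illustrative example under stated assumptions, not text from the
original page: the file must be saved as UTF-8 with precomposed
accented letters, and the variable names are hypothetical.

```perl
use strict;
use warnings;
use utf8;    # identifiers and literals below contain UTF-8

# Both the variable name and the string constant hold non-ASCII characters.
my $naïve = "déjà vu";

# The literal is seven characters long, although it occupies more bytes.
my $length = length($naïve);

# In this constant pattern each '.' consumes one whole accented character.
my $matched = ($naïve =~ /^d.j.\s/) ? 1 : 0;

print "length: $length, matched: $matched\n";
```

Without the pragma the accented letters would be seen as separate
bytes, so the pattern would not line up with the text in the same way.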
SEE ALSO
the perlunicode manpage, the bytes manpage
2001-02-22 perl v5.6.1 utf8(3)