The Internet uses address-based switching to transmit information.
As of 2014, the Internet is ubiquitous (omnipresent, available everywhere).
XHTML stands for
E(x)tensible-(H)yper(T)ext-(M)arkup-(L)anguage
(W3C standard, XML syntax rules, well-formedness, validatable;
a text-based markup language for documents, text content, images, hyperlinks).
For the longevity of stored information,
the stability of the encoding formats (Unicode) is essential,
alongside the technical processing and
access methods (operating systems, hardware).
Today UTF-8 (as specified in RFC 3629) is often used.
The code length per character is 32 bits under ISO 10646
and 21 bits under Unicode 4.0.
Timeline: 1989 draft proposal DP 10646; 1991 Unicode 1.0; 1993 alignment with ISO 10646.
XML can use the UTF encodings. UTF is an abbreviation for Unicode Transformation Format.
UTF-8 can represent (practically all of) the world's written characters
(for handling character encodings in HTML and CSS, beginners can see
w3.org: Internationalization, Character encoding, HTML5 UTF-8 (overlong forms);
unicode.org: Unicode standards).
If an editor cannot handle the Unicode character repertoire when saving,
a letter such as ü is written as a character reference instead:
`&#252;` (numeric, decimal notation) or
`&#xFC;` (numeric, hexadecimal notation) or
`&uuml;` (named, using a short entity name).
Unicode aims to represent the world's written characters.
For example, the euro sign corresponds to
Unicode U+20AC,
in the named XHTML notation (using a short entity name)
`&euro;` (displayed as €),
in the decimal numeric XHTML notation
`&#8364;` (displayed as €),
and in the hexadecimal numeric notation
`&#x20AC;` (displayed as €).
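The three notations for ü and € can be reproduced with Python's standard `html` module. This is only an illustrative sketch; the helper name `character_references` is made up for this example and is not part of any library.

```python
import html
from html.entities import codepoint2name


def character_references(ch: str) -> dict:
    """Return the decimal, hexadecimal and named reference for one character.

    Hypothetical helper for illustration. "named" is None when HTML
    defines no short entity name for the character.
    """
    cp = ord(ch)
    name = codepoint2name.get(cp)
    return {
        "decimal": f"&#{cp};",
        "hexadecimal": f"&#x{cp:X};",
        "named": f"&{name};" if name else None,
    }


for ch in "ü€":
    refs = character_references(ch)
    print(f"U+{ord(ch):04X}", refs)
    # Every notation decodes back to the same character.
    assert all(html.unescape(r) == ch for r in refs.values() if r)
```

Because all three forms consist of ASCII characters only, they survive an editor or transport that cannot store the character itself.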
UTF-8 is the most widely used Unicode Transformation Format (UTF) for encoding Unicode characters.
encoding="UTF-8" denotes an international encoding
based on the ISO/IEC 10646 / Unicode standard (RFC 3629).
Each Unicode code point (the code space holds 1,114,112 of them, U+0000 to U+10FFFF)
is assigned a specially encoded byte sequence of variable length (up to 4 bytes)
(see unicode.org).
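How a code point turns into a byte sequence of 1 to 4 bytes can be sketched by hand, following the bit layout given in RFC 3629. The function name `utf8_encode` is chosen for this example only; in practice Python's built-in `str.encode("utf-8")` does the same job, and the sketch is checked against it.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point outside U+0000..U+10FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if cp < 0x80:        # 7 bits  -> 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 11 bits -> 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 16 bits -> 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 bits -> 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Compare the sketch with Python's own encoder for a few characters.
for ch in ["A", "ü", "€", "\U0001F600"]:
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {utf8_encode(ord(ch)).hex().upper()}")
```

The euro sign U+20AC, for instance, falls into the three-byte range and becomes the bytes E2 82 AC.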
The first bytes of a file can identify its character encoding (BOM, byte order mark): "EFBBBF" = UTF-8; "FEFF" = UTF-16 big-endian; "FFFE" = UTF-16 little-endian; "0000FEFF" = UTF-32 big-endian; "FFFE0000" = UTF-32 little-endian; "0EFEFF" = SCSU; "DD736673" = UTF-EBCDIC; "FBEE28" = BOCU-1; "2B2F76382D", "2B2F7638", "2B2F7639", "2B2F762B" and "2B2F762F" = UTF-7.
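A BOM check along these lines might look as follows in Python. This is a minimal sketch covering only the UTF-8, UTF-16 and UTF-32 signatures from the list above; `detect_bom` and `BOM_SIGNATURES` are names invented for the example.

```python
# Longer signatures come first: the UTF-32 little-endian mark (FF FE 00 00)
# begins with the UTF-16 little-endian mark (FF FE).
BOM_SIGNATURES = [
    (bytes.fromhex("0000FEFF"), "UTF-32BE"),
    (bytes.fromhex("FFFE0000"), "UTF-32LE"),
    (bytes.fromhex("EFBBBF"), "UTF-8"),
    (bytes.fromhex("FEFF"), "UTF-16BE"),
    (bytes.fromhex("FFFE"), "UTF-16LE"),
]


def detect_bom(data: bytes):
    """Return the encoding suggested by a leading BOM, or None if there is none."""
    for signature, name in BOM_SIGNATURES:
        if data.startswith(signature):
            return name
    return None


print(detect_bom("ü".encode("utf-8-sig")))    # file saved with a UTF-8 signature
print(detect_bom(bytes.fromhex("FFFE4100")))  # UTF-16 little-endian "A"
print(detect_bom(b"plain ASCII text"))        # no BOM present
```

A missing BOM does not mean the data is not Unicode, and a UTF-16 little-endian file that starts with U+0000 is indistinguishable from UTF-32 little-endian by its first four bytes, so the result is a hint rather than proof.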
All new Internet communication protocols should support UTF-8. The following table summarizes some properties of the UTF encodings:
| Name | UTF-8 | UTF-16 | UTF-16BE | UTF-16LE | UTF-32 | UTF-32BE | UTF-32LE |
|---|---|---|---|---|---|---|---|
| Smallest code point | 0000 | 0000 | 0000 | 0000 | 0000 | 0000 | 0000 |
| Largest code point | 10FFFF | 10FFFF | 10FFFF | 10FFFF | 10FFFF | 10FFFF | 10FFFF |
| Code unit size | 8 bits | 16 bits | 16 bits | 16 bits | 32 bits | 32 bits | 32 bits |
| Byte order | N/A | <BOM> | big-endian | little-endian | <BOM> | big-endian | little-endian |
| Minimal bytes/character | 1 | 2 | 2 | 2 | 4 | 4 | 4 |
| Maximal bytes/character | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
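The minimal and maximal byte counts in the table can be checked with Python's built-in codecs. The sketch below uses the explicit big-endian variants so that no BOM is added to the output; the sample characters and the helper name `bytes_per_character` are choices made for this example.

```python
ENCODINGS = ["utf-8", "utf-16-be", "utf-32-be"]


def bytes_per_character(ch: str) -> dict:
    """Return the encoded length of one character in each encoding."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


# One sample from each UTF-8 length class: U+0041, U+00FC, U+20AC, U+1F600.
for ch in ["A", "ü", "€", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", bytes_per_character(ch))
```

Characters outside the Basic Multilingual Plane, such as U+1F600, need four bytes in all three encodings; in UTF-16 they are stored as a pair of 16-bit code units.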
HTML 4.01 should always be served with the HTML MIME type (text/html).
XHTML 1.x doctypes can be served with the HTML MIME type or an XML MIME type (e.g. application/xhtml+xml).
Pages without a doctype are treated (as of 2011) as HTML5 pages.
Some links: W3C Internationalization Checker, W3C BOM tester, W3C tester: UTF-8 signature (byte order mark, hex EFBBBF BOM), W3C validator: validator.w3.org, rexswain.com: HTTP Viewer, Mozilla: Web-Sniffer
In typography, a glyph is the graphical representation of a written character (e.g. a letter, a syllabic sign, a ligature or part of a letter). A glyph forms a graphical unit in itself.
Designing characters that always combine into an aesthetic whole when typeset is a difficult and extensive design process. When using fonts and glyphs, pay attention to the rights involved. Rights are held by, for example, Adobe Systems Incorporated, Monotype Imaging, Apple Computer Inc., Atelier Fluxus Virus, Beijing Zhong Yi (Zheng Code) Electronics Company, DecoType Inc., Evertype, Hapax, IBM Corporation, Microsoft Corporation, Peking University Founder Group Corporation, Production First Software, SIL International, STAR - Sylheti Translation And Research, etc.
There are many character sets (RFC, ISO, UTF, DIN, IBM, HP, etc.), which are increasingly being unified and superseded by the "Unicode Character Database".
A registered character set name (identified in MIBs by its MIBenum value) may contain up to 40 characters. The following standards (UCS = Universal Character Set) define character sets:
===================================================================
These are the official names for character sets that may be used in
the Internet and may be referred to in Internet documentation. These
names are expressed in ANSI_X3.4-1968 which is commonly called
US-ASCII or simply ASCII. The character set most commonly used in the
Internet and used especially in protocol standards is US-ASCII, this
is strongly encouraged. The use of the name US-ASCII is also
encouraged.
The character set names may be up to 40 characters taken from the
printable characters of US-ASCII. However, no distinction is made
between use of upper and lower case letters.
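The two naming rules above (at most 40 printable US-ASCII characters, no distinction between upper and lower case) could be checked as in the following Python sketch. Both function names are hypothetical, and treating "printable" as the range 0x21 to 0x7E (excluding the space) is an assumption of this example.

```python
def is_valid_charset_name(name: str) -> bool:
    """Check the length and character rules for a charset name.

    Assumption: "printable" means US-ASCII 0x21..0x7E, without the space.
    """
    return 1 <= len(name) <= 40 and all(0x21 <= ord(c) <= 0x7E for c in name)


def same_charset_name(a: str, b: str) -> bool:
    """Compare two charset names without regard to letter case."""
    return a.lower() == b.lower()


print(is_valid_charset_name("ISO_8859-1:1987"))  # valid name
print(is_valid_charset_name("x" * 41))           # too long
print(same_charset_name("utf-8", "UTF-8"))       # same name, different case
```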
The MIBenum value is a unique value for use in MIBs to identify coded
character sets.
The value space for MIBenum values has been divided into three
regions. The first region (3-999) consists of coded character sets
that have been standardized by some standard setting organization.
This region is intended for standards that do not have subset
implementations. The second region (1000-1999) is for the Unicode and
ISO/IEC 10646 coded character sets together with a specification of a
(set of) sub-repertoires that may occur. The third region (>1999) is
intended for vendor specific coded character sets.
Assigned MIB enum Numbers
-------------------------
0-2 Reserved
3-999 Set By Standards Organizations
1000-1999 Unicode / 10646
2000-2999 Vendor
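The division of the MIBenum value space can be expressed as a small lookup. The sketch below is illustrative only; `mibenum_region` is an invented name, and it follows the prose above in treating every value greater than 1999 as vendor specific, although the table lists 2000-2999.

```python
def mibenum_region(value: int) -> str:
    """Map a MIBenum value to the region it falls into."""
    if value < 0:
        raise ValueError("MIBenum values are non-negative")
    if value <= 2:
        return "reserved"
    if value <= 999:
        return "standards organization"
    if value <= 1999:
        return "Unicode / ISO/IEC 10646"
    # Assumption: all values above 1999 count as vendor specific.
    return "vendor"


# MIBenum values taken from registry entries: UTF-8, UTF-16, IBM850.
for name, mib in [("UTF-8", 106), ("UTF-16", 1015), ("IBM850", 2009)]:
    print(name, mib, mibenum_region(mib))
```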
The aliases that start with "cs" have been added for use with the
IANA-CHARSET-MIB as originally defined in RFC3808, and as currently
maintained by IANA at http://www.iana.org/assignments/ianacharset-mib.
Note that the ianacharset-mib needs to be kept in sync with this
registry. These aliases that start with "cs" contain the standard
numbers along with suggestive names in order to facilitate applications
that want to display the names in user interfaces. The "cs" stands
for character set and is provided for applications that need a lower
case first letter but want to use mixed case thereafter that cannot
contain any special characters, such as underbar ("_") and dash ("-").
If the character set is from an ISO standard, its cs alias is the ISO
standard number or name. If the character set is not from an ISO
standard, but is registered with ISO (IPSJ/ITSCJ is the current ISO
Registration Authority), the ISO Registry number is specified as
ISOnnn followed by letters suggestive of the name or standards number
of the code set. When a national or international standard is
revised, the year of revision is added to the cs alias of the new
character set entry in the IANA Registry in order to distinguish the
revised character set from the original character set.
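Many software libraries keep an alias table of their own that overlaps with this registry. As a hedged illustration, Python's standard `codecs` module resolves several of the names listed for ISO_8859-1, including the "cs" alias, to one and the same codec. Python's table is maintained independently of IANA and covers only part of the registry, so a registered name may still raise `LookupError`.

```python
import codecs

# Names and aliases as listed for ISO_8859-1:1987 in the registry.
LATIN1_NAMES = ["ISO-8859-1", "latin1", "l1", "IBM819", "CP819", "csISOLatin1"]


def resolve(names):
    """Map each charset name to the name of the Python codec it resolves to."""
    return {name: codecs.lookup(name).name for name in names}


resolved = resolve(LATIN1_NAMES)
for alias, codec_name in resolved.items():
    print(f"{alias:12} -> {codec_name}")

# All aliases end up at one codec; lookup ignores letter case.
assert len(set(resolved.values())) == 1
```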
Character Set Reference
------------- ---------
Name: ANSI_X3.4-1968 [RFC1345,KXS2]
MIBenum: 3
Source: ECMA registry
Alias: iso-ir-6
Alias: ANSI_X3.4-1986
Alias: ISO_646.irv:1991
Alias: ASCII
Alias: ISO646-US
Alias: US-ASCII (preferred MIME name)
Alias: us
Alias: IBM367
Alias: cp367
Alias: csASCII
Name: ISO_8859-1:1987 [RFC1345,KXS2]
MIBenum: 4
Source: ECMA registry
Alias: iso-ir-100
Alias: ISO_8859-1
Alias: ISO-8859-1 (preferred MIME name)
Alias: latin1
Alias: l1
Alias: IBM819
Alias: CP819
Alias: csISOLatin1
Name: ISO_8859-2:1987 [RFC1345,KXS2]
MIBenum: 5
Source: ECMA registry
Alias: iso-ir-101
Alias: ISO_8859-2
Alias: ISO-8859-2 (preferred MIME name)
Alias: latin2
Alias: l2
Alias: csISOLatin2
Name: ISO_8859-3:1988 [RFC1345,KXS2]
MIBenum: 6
Source: ECMA registry
Alias: iso-ir-109
Alias: ISO_8859-3
Alias: ISO-8859-3 (preferred MIME name)
Alias: latin3
Alias: l3
Alias: csISOLatin3
Name: ISO_8859-4:1988 [RFC1345,KXS2]
MIBenum: 7
Source: ECMA registry
Alias: iso-ir-110
Alias: ISO_8859-4
Alias: ISO-8859-4 (preferred MIME name)
Alias: latin4
Alias: l4
Alias: csISOLatin4
Name: ISO_8859-5:1988 [RFC1345,KXS2]
MIBenum: 8
Source: ECMA registry
Alias: iso-ir-144
Alias: ISO_8859-5
Alias: ISO-8859-5 (preferred MIME name)
Alias: cyrillic
Alias: csISOLatinCyrillic
Name: ISO_8859-6:1987 [RFC1345,KXS2]
MIBenum: 9
Source: ECMA registry
Alias: iso-ir-127
Alias: ISO_8859-6
Alias: ISO-8859-6 (preferred MIME name)
Alias: ECMA-114
Alias: ASMO-708
Alias: arabic
Alias: csISOLatinArabic
Name: ISO_8859-7:1987 [RFC1947,RFC1345,KXS2]
MIBenum: 10
Source: ECMA registry
Alias: iso-ir-126
Alias: ISO_8859-7
Alias: ISO-8859-7 (preferred MIME name)
Alias: ELOT_928
Alias: ECMA-118
Alias: greek
Alias: greek8
Alias: csISOLatinGreek
Name: ISO_8859-8:1988 [RFC1345,KXS2]
MIBenum: 11
Source: ECMA registry
Alias: iso-ir-138
Alias: ISO_8859-8
Alias: ISO-8859-8 (preferred MIME name)
Alias: hebrew
Alias: csISOLatinHebrew
Name: ISO_8859-9:1989 [RFC1345,KXS2]
MIBenum: 12
Source: ECMA registry
Alias: iso-ir-148
Alias: ISO_8859-9
Alias: ISO-8859-9 (preferred MIME name)
Alias: latin5
Alias: l5
Alias: csISOLatin5
Name: ISO-8859-10 (preferred MIME name) [RFC1345,KXS2]
MIBenum: 13
Source: ECMA registry
Alias: iso-ir-157
Alias: l6
Alias: ISO_8859-10:1992
Alias: csISOLatin6
Alias: latin6
Name: ISO_6937-2-add [RFC1345,KXS2]
MIBenum: 14
Source: ECMA registry and ISO 6937-2:1983
Alias: iso-ir-142
Alias: csISOTextComm
Name: JIS_X0201 [RFC1345,KXS2]
MIBenum: 15
Source: JIS X 0201-1976. One byte only, this is equivalent to
JIS/Roman (similar to ASCII) plus eight-bit half-width
Katakana
Alias: X0201
Alias: csHalfWidthKatakana
Name: JIS_Encoding
MIBenum: 16
Source: JIS X 0202-1991. Uses ISO 2022 escape sequences to
shift code sets as documented in JIS X 0202-1991.
Alias: csJISEncoding
Name: Shift_JIS (preferred MIME name)
MIBenum: 17
Source: This charset is an extension of csHalfWidthKatakana by
adding graphic characters in JIS X 0208. The CCS's are
JIS X0201:1997 and JIS X0208:1997. The
complete definition is shown in Appendix 1 of JIS
X0208:1997.
This charset can be used for the top-level media type "text".
Alias: MS_Kanji
Alias: csShiftJIS
Name: Extended_UNIX_Code_Packed_Format_for_Japanese
MIBenum: 18
Source: Standardized by OSF, UNIX International, and UNIX Systems
Laboratories Pacific. Uses ISO 2022 rules to select
code set 0: US-ASCII (a single 7-bit byte set)
code set 1: JIS X0208-1990 (a double 8-bit byte set)
restricted to A0-FF in both bytes
code set 2: Half Width Katakana (a single 7-bit byte set)
requiring SS2 as the character prefix
code set 3: JIS X0212-1990 (a double 7-bit byte set)
restricted to A0-FF in both bytes
requiring SS3 as the character prefix
Alias: csEUCPkdFmtJapanese
Alias: EUC-JP (preferred MIME name)
Name: Extended_UNIX_Code_Fixed_Width_for_Japanese
MIBenum: 19
Source: Used in Japan. Each character is 2 octets.
code set 0: US-ASCII (a single 7-bit byte set)
1st byte = 00
2nd byte = 20-7E
code set 1: JIS X0208-1990 (a double 7-bit byte set)
restricted to A0-FF in both bytes
code set 2: Half Width Katakana (a single 7-bit byte set)
1st byte = 00
2nd byte = A0-FF
code set 3: JIS X0212-1990 (a double 7-bit byte set)
restricted to A0-FF in
the first byte
and 21-7E in the second byte
Alias: csEUCFixWidJapanese
Name: BS_4730 [RFC1345,KXS2]
MIBenum: 20
Source: ECMA registry
Alias: iso-ir-4
Alias: ISO646-GB
Alias: gb
Alias: uk
Alias: csISO4UnitedKingdom
Name: SEN_850200_C [RFC1345,KXS2]
MIBenum: 21
Source: ECMA registry
Alias: iso-ir-11
Alias: ISO646-SE2
Alias: se2
Alias: csISO11SwedishForNames
Name: IT [RFC1345,KXS2]
MIBenum: 22
Source: ECMA registry
Alias: iso-ir-15
Alias: ISO646-IT
Alias: csISO15Italian
Name: ES [RFC1345,KXS2]
MIBenum: 23
Source: ECMA registry
Alias: iso-ir-17
Alias: ISO646-ES
Alias: csISO17Spanish
Name: DIN_66003 [RFC1345,KXS2]
MIBenum: 24
Source: ECMA registry
Alias: iso-ir-21
Alias: de
Alias: ISO646-DE
Alias: csISO21German
Name: NS_4551-1 [RFC1345,KXS2]
MIBenum: 25
Source: ECMA registry
Alias: iso-ir-60
Alias: ISO646-NO
Alias: no
Alias: csISO60DanishNorwegian
Alias: csISO60Norwegian1
Name: NF_Z_62-010 [RFC1345,KXS2]
MIBenum: 26
Source: ECMA registry
Alias: iso-ir-69
Alias: ISO646-FR
Alias: fr
Alias: csISO69French
Name: ISO-10646-UTF-1
MIBenum: 27
Source: Universal Transfer Format (1), this is the multibyte
encoding, that subsets ASCII-7. It does not have byte
ordering issues.
Alias: csISO10646UTF1
Name: ISO_646.basic:1983 [RFC1345,KXS2]
MIBenum: 28
Source: ECMA registry
Alias: ref
Alias: csISO646basic1983
Name: INVARIANT [RFC1345,KXS2]
MIBenum: 29
Alias: csINVARIANT
Name: ISO_646.irv:1983 [RFC1345,KXS2]
MIBenum: 30
Source: ECMA registry
Alias: iso-ir-2
Alias: irv
Alias: csISO2IntlRefVersion
Name: NATS-SEFI [RFC1345,KXS2]
MIBenum: 31
Source: ECMA registry
Alias: iso-ir-8-1
Alias: csNATSSEFI
Name: NATS-SEFI-ADD [RFC1345,KXS2]
MIBenum: 32
Source: ECMA registry
Alias: iso-ir-8-2
Alias: csNATSSEFIADD
Name: NATS-DANO [RFC1345,KXS2]
MIBenum: 33
Source: ECMA registry
Alias: iso-ir-9-1
Alias: csNATSDANO
Name: NATS-DANO-ADD [RFC1345,KXS2]
MIBenum: 34
Source: ECMA registry
Alias: iso-ir-9-2
Alias: csNATSDANOADD
Name: SEN_850200_B [RFC1345,KXS2]
MIBenum: 35
Source: ECMA registry
Alias: iso-ir-10
Alias: FI
Alias: ISO646-FI
Alias: ISO646-SE
Alias: se
Alias: csISO10Swedish
Name: KS_C_5601-1987 [RFC1345,KXS2]
MIBenum: 36
Source: ECMA registry
Alias: iso-ir-149
Alias: KS_C_5601-1989
Alias: KSC_5601
Alias: korean
Alias: csKSC56011987
Name: ISO-2022-KR (preferred MIME name) [RFC1557,Choi]
MIBenum: 37
Source: RFC-1557 (see also KS_C_5601-1987)
Alias: csISO2022KR
Name: EUC-KR (preferred MIME name) [RFC1557,Choi]
MIBenum: 38
Source: RFC-1557 (see also KS_C_5861-1992)
Alias: csEUCKR
Name: ISO-2022-JP (preferred MIME name) [RFC1468,Murai]
MIBenum: 39
Source: RFC-1468 (see also RFC-2237)
Alias: csISO2022JP
Name: ISO-2022-JP-2 (preferred MIME name) [RFC1554,Ohta]
MIBenum: 40
Source: RFC-1554
Alias: csISO2022JP2
Name: JIS_C6220-1969-jp [RFC1345,KXS2]
MIBenum: 41
Source: ECMA registry
Alias: JIS_C6220-1969
Alias: iso-ir-13
Alias: katakana
Alias: x0201-7
Alias: csISO13JISC6220jp
Name: JIS_C6220-1969-ro [RFC1345,KXS2]
MIBenum: 42
Source: ECMA registry
Alias: iso-ir-14
Alias: jp
Alias: ISO646-JP
Alias: csISO14JISC6220ro
Name: PT [RFC1345,KXS2]
MIBenum: 43
Source: ECMA registry
Alias: iso-ir-16
Alias: ISO646-PT
Alias: csISO16Portuguese
Name: greek7-old [RFC1345,KXS2]
MIBenum: 44
Source: ECMA registry
Alias: iso-ir-18
Alias: csISO18Greek7Old
Name: latin-greek [RFC1345,KXS2]
MIBenum: 45
Source: ECMA registry
Alias: iso-ir-19
Alias: csISO19LatinGreek
Name: NF_Z_62-010_(1973) [RFC1345,KXS2]
MIBenum: 46
Source: ECMA registry
Alias: iso-ir-25
Alias: ISO646-FR1
Alias: csISO25French
Name: Latin-greek-1 [RFC1345,KXS2]
MIBenum: 47
Source: ECMA registry
Alias: iso-ir-27
Alias: csISO27LatinGreek1
Name: ISO_5427 [RFC1345,KXS2]
MIBenum: 48
Source: ECMA registry
Alias: iso-ir-37
Alias: csISO5427Cyrillic
Name: JIS_C6226-1978 [RFC1345,KXS2]
MIBenum: 49
Source: ECMA registry
Alias: iso-ir-42
Alias: csISO42JISC62261978
Name: BS_viewdata [RFC1345,KXS2]
MIBenum: 50
Source: ECMA registry
Alias: iso-ir-47
Alias: csISO47BSViewdata
Name: INIS [RFC1345,KXS2]
MIBenum: 51
Source: ECMA registry
Alias: iso-ir-49
Alias: csISO49INIS
Name: INIS-8 [RFC1345,KXS2]
MIBenum: 52
Source: ECMA registry
Alias: iso-ir-50
Alias: csISO50INIS8
Name: INIS-cyrillic [RFC1345,KXS2]
MIBenum: 53
Source: ECMA registry
Alias: iso-ir-51
Alias: csISO51INISCyrillic
Name: ISO_5427:1981 [RFC1345,KXS2]
MIBenum: 54
Source: ECMA registry
Alias: iso-ir-54
Alias: ISO5427Cyrillic1981
Name: ISO_5428:1980 [RFC1345,KXS2]
MIBenum: 55
Source: ECMA registry
Alias: iso-ir-55
Alias: csISO5428Greek
Name: GB_1988-80 [RFC1345,KXS2]
MIBenum: 56
Source: ECMA registry
Alias: iso-ir-57
Alias: cn
Alias: ISO646-CN
Alias: csISO57GB1988
Name: GB_2312-80 [RFC1345,KXS2]
MIBenum: 57
Source: ECMA registry
Alias: iso-ir-58
Alias: chinese
Alias: csISO58GB231280
Name: NS_4551-2 [RFC1345,KXS2]
MIBenum: 58
Source: ECMA registry
Alias: ISO646-NO2
Alias: iso-ir-61
Alias: no2
Alias: csISO61Norwegian2
Name: videotex-suppl [RFC1345,KXS2]
MIBenum: 59
Source: ECMA registry
Alias: iso-ir-70
Alias: csISO70VideotexSupp1
Name: PT2 [RFC1345,KXS2]
MIBenum: 60
Source: ECMA registry
Alias: iso-ir-84
Alias: ISO646-PT2
Alias: csISO84Portuguese2
Name: ES2 [RFC1345,KXS2]
MIBenum: 61
Source: ECMA registry
Alias: iso-ir-85
Alias: ISO646-ES2
Alias: csISO85Spanish2
Name: MSZ_7795.3 [RFC1345,KXS2]
MIBenum: 62
Source: ECMA registry
Alias: iso-ir-86
Alias: ISO646-HU
Alias: hu
Alias: csISO86Hungarian
Name: JIS_C6226-1983 [RFC1345,KXS2]
MIBenum: 63
Source: ECMA registry
Alias: iso-ir-87
Alias: x0208
Alias: JIS_X0208-1983
Alias: csISO87JISX0208
Name: greek7 [RFC1345,KXS2]
MIBenum: 64
Source: ECMA registry
Alias: iso-ir-88
Alias: csISO88Greek7
Name: ASMO_449 [RFC1345,KXS2]
MIBenum: 65
Source: ECMA registry
Alias: ISO_9036
Alias: arabic7
Alias: iso-ir-89
Alias: csISO89ASMO449
Name: iso-ir-90 [RFC1345,KXS2]
MIBenum: 66
Source: ECMA registry
Alias: csISO90
Name: JIS_C6229-1984-a [RFC1345,KXS2]
MIBenum: 67
Source: ECMA registry
Alias: iso-ir-91
Alias: jp-ocr-a
Alias: csISO91JISC62291984a
Name: JIS_C6229-1984-b [RFC1345,KXS2]
MIBenum: 68
Source: ECMA registry
Alias: iso-ir-92
Alias: ISO646-JP-OCR-B
Alias: jp-ocr-b
Alias: csISO92JISC62991984b
Name: JIS_C6229-1984-b-add [RFC1345,KXS2]
MIBenum: 69
Source: ECMA registry
Alias: iso-ir-93
Alias: jp-ocr-b-add
Alias: csISO93JIS62291984badd
Name: JIS_C6229-1984-hand [RFC1345,KXS2]
MIBenum: 70
Source: ECMA registry
Alias: iso-ir-94
Alias: jp-ocr-hand
Alias: csISO94JIS62291984hand
Name: JIS_C6229-1984-hand-add [RFC1345,KXS2]
MIBenum: 71
Source: ECMA registry
Alias: iso-ir-95
Alias: jp-ocr-hand-add
Alias: csISO95JIS62291984handadd
Name: JIS_C6229-1984-kana [RFC1345,KXS2]
MIBenum: 72
Source: ECMA registry
Alias: iso-ir-96
Alias: csISO96JISC62291984kana
Name: ISO_2033-1983 [RFC1345,KXS2]
MIBenum: 73
Source: ECMA registry
Alias: iso-ir-98
Alias: e13b
Alias: csISO2033
Name: ANSI_X3.110-1983 [RFC1345,KXS2]
MIBenum: 74
Source: ECMA registry
Alias: iso-ir-99
Alias: CSA_T500-1983
Alias: NAPLPS
Alias: csISO99NAPLPS
Name: T.61-7bit [RFC1345,KXS2]
MIBenum: 75
Source: ECMA registry
Alias: iso-ir-102
Alias: csISO102T617bit
Name: T.61-8bit [RFC1345,KXS2]
MIBenum: 76
Alias: T.61
Source: ECMA registry
Alias: iso-ir-103
Alias: csISO103T618bit
Name: ECMA-cyrillic
MIBenum: 77
Source: ISO registry (formerly ECMA registry)
http://www.itscj.ipsj.jp/ISO-IR/111.pdf
Alias: iso-ir-111
Alias: KOI8-E
Alias: csISO111ECMACyrillic
Name: CSA_Z243.4-1985-1 [RFC1345,KXS2]
MIBenum: 78
Source: ECMA registry
Alias: iso-ir-121
Alias: ISO646-CA
Alias: csa7-1
Alias: ca
Alias: csISO121Canadian1
Name: CSA_Z243.4-1985-2 [RFC1345,KXS2]
MIBenum: 79
Source: ECMA registry
Alias: iso-ir-122
Alias: ISO646-CA2
Alias: csa7-2
Alias: csISO122Canadian2
Name: CSA_Z243.4-1985-gr [RFC1345,KXS2]
MIBenum: 80
Source: ECMA registry
Alias: iso-ir-123
Alias: csISO123CSAZ24341985gr
Name: ISO_8859-6-E [RFC1556,IANA]
MIBenum: 81
Source: RFC1556
Alias: csISO88596E
Alias: ISO-8859-6-E (preferred MIME name)
Name: ISO_8859-6-I [RFC1556,IANA]
MIBenum: 82
Source: RFC1556
Alias: csISO88596I
Alias: ISO-8859-6-I (preferred MIME name)
Name: T.101-G2 [RFC1345,KXS2]
MIBenum: 83
Source: ECMA registry
Alias: iso-ir-128
Alias: csISO128T101G2
Name: ISO_8859-8-E [RFC1556,Nussbacher]
MIBenum: 84
Source: RFC1556
Alias: csISO88598E
Alias: ISO-8859-8-E (preferred MIME name)
Name: ISO_8859-8-I [RFC1556,Nussbacher]
MIBenum: 85
Source: RFC1556
Alias: csISO88598I
Alias: ISO-8859-8-I (preferred MIME name)
Name: CSN_369103 [RFC1345,KXS2]
MIBenum: 86
Source: ECMA registry
Alias: iso-ir-139
Alias: csISO139CSN369103
Name: JUS_I.B1.002 [RFC1345,KXS2]
MIBenum: 87
Source: ECMA registry
Alias: iso-ir-141
Alias: ISO646-YU
Alias: js
Alias: yu
Alias: csISO141JUSIB1002
Name: IEC_P27-1 [RFC1345,KXS2]
MIBenum: 88
Source: ECMA registry
Alias: iso-ir-143
Alias: csISO143IECP271
Name: JUS_I.B1.003-serb [RFC1345,KXS2]
MIBenum: 89
Source: ECMA registry
Alias: iso-ir-146
Alias: serbian
Alias: csISO146Serbian
Name: JUS_I.B1.003-mac [RFC1345,KXS2]
MIBenum: 90
Source: ECMA registry
Alias: macedonian
Alias: iso-ir-147
Alias: csISO147Macedonian
Name: greek-ccitt [RFC1345,KXS2]
MIBenum: 91
Source: ECMA registry
Alias: iso-ir-150
Alias: csISO150
Alias: csISO150GreekCCITT
Name: NC_NC00-10:81 [RFC1345,KXS2]
MIBenum: 92
Source: ECMA registry
Alias: cuba
Alias: iso-ir-151
Alias: ISO646-CU
Alias: csISO151Cuba
Name: ISO_6937-2-25 [RFC1345,KXS2]
MIBenum: 93
Source: ECMA registry
Alias: iso-ir-152
Alias: csISO6937Add
Name: GOST_19768-74 [RFC1345,KXS2]
MIBenum: 94
Source: ECMA registry
Alias: ST_SEV_358-88
Alias: iso-ir-153
Alias: csISO153GOST1976874
Name: ISO_8859-supp [RFC1345,KXS2]
MIBenum: 95
Source: ECMA registry
Alias: iso-ir-154
Alias: latin1-2-5
Alias: csISO8859Supp
Name: ISO_10367-box [RFC1345,KXS2]
MIBenum: 96
Source: ECMA registry
Alias: iso-ir-155
Alias: csISO10367Box
Name: latin-lap [RFC1345,KXS2]
MIBenum: 97
Source: ECMA registry
Alias: lap
Alias: iso-ir-158
Alias: csISO158Lap
Name: JIS_X0212-1990 [RFC1345,KXS2]
MIBenum: 98
Source: ECMA registry
Alias: x0212
Alias: iso-ir-159
Alias: csISO159JISX02121990
Name: DS_2089 [RFC1345,KXS2]
MIBenum: 99
Source: Danish Standard, DS 2089, February 1974
Alias: DS2089
Alias: ISO646-DK
Alias: dk
Alias: csISO646Danish
Name: us-dk [RFC1345,KXS2]
MIBenum: 100
Alias: csUSDK
Name: dk-us [RFC1345,KXS2]
MIBenum: 101
Alias: csDKUS
Name: KSC5636 [RFC1345,KXS2]
MIBenum: 102
Alias: ISO646-KR
Alias: csKSC5636
Name: UNICODE-1-1-UTF-7 [RFC1642]
MIBenum: 103
Source: RFC 1642
Alias: csUnicode11UTF7
Name: ISO-2022-CN [RFC1922]
MIBenum: 104
Source: RFC-1922
Name: ISO-2022-CN-EXT [RFC1922]
MIBenum: 105
Source: RFC-1922
Name: UTF-8 [RFC3629]
MIBenum: 106
Source: RFC 3629
Alias: None
Name: ISO-8859-13
MIBenum: 109
Source: ISO See (http://www.iana.org/assignments/charset-reg/ISO-8859-13)[Tumasonis]
Alias: None
Name: ISO-8859-14
MIBenum: 110
Source: ISO See (http://www.iana.org/assignments/charset-reg/ISO-8859-14) [Simonsen]
Alias: iso-ir-199
Alias: ISO_8859-14:1998
Alias: ISO_8859-14
Alias: latin8
Alias: iso-celtic
Alias: l8
Name: ISO-8859-15
MIBenum: 111
Source: ISO
Please see: <http://www.iana.org/assignments/charset-reg/ISO-8859-15>
Alias: ISO_8859-15
Alias: Latin-9
Name: ISO-8859-16
MIBenum: 112
Source: ISO
Alias: iso-ir-226
Alias: ISO_8859-16:2001
Alias: ISO_8859-16
Alias: latin10
Alias: l10
Name: GBK
MIBenum: 113
Source: Chinese IT Standardization Technical Committee
Please see: <http://www.iana.org/assignments/charset-reg/GBK>
Alias: CP936
Alias: MS936
Alias: windows-936
Name: GB18030
MIBenum: 114
Source: Chinese IT Standardization Technical Committee
Please see: <http://www.iana.org/assignments/charset-reg/GB18030>
Alias: None
Name: OSD_EBCDIC_DF04_15
MIBenum: 115
Source: Fujitsu-Siemens standard mainframe EBCDIC encoding
Please see: <http://www.iana.org/assignments/charset-reg/OSD-EBCDIC-DF04-15>
Alias: None
Name: OSD_EBCDIC_DF03_IRV
MIBenum: 116
Source: Fujitsu-Siemens standard mainframe EBCDIC encoding
Please see: <http://www.iana.org/assignments/charset-reg/OSD-EBCDIC-DF03-IRV>
Alias: None
Name: OSD_EBCDIC_DF04_1
MIBenum: 117
Source: Fujitsu-Siemens standard mainframe EBCDIC encoding
Please see: <http://www.iana.org/assignments/charset-reg/OSD-EBCDIC-DF04-1>
Alias: None
Name: ISO-11548-1
MIBenum: 118
Source: See <http://www.iana.org/assignments/charset-reg/ISO-11548-1> [Thibault]
Alias: ISO_11548-1
Alias: ISO_TR_11548-1
Alias: csISO115481
Name: KZ-1048
MIBenum: 119
Source: See <http://www.iana.org/assignments/charset-reg/KZ-1048> [Veremeev, Kikkarin]
Alias: STRK1048-2002
Alias: RK1048
Alias: csKZ1048
Name: ISO-10646-UCS-2
MIBenum: 1000
Source: the 2-octet Basic Multilingual Plane, aka Unicode
this needs to specify network byte order: the standard
does not specify (it is a 16-bit integer space)
Alias: csUnicode
Name: ISO-10646-UCS-4
MIBenum: 1001
Source: the full code space (same comment about byte order,
these are 31-bit numbers).
Alias: csUCS4
Name: ISO-10646-UCS-Basic
MIBenum: 1002
Source: ASCII subset of Unicode. Basic Latin = collection 1
See ISO 10646, Appendix A
Alias: csUnicodeASCII
Name: ISO-10646-Unicode-Latin1
MIBenum: 1003
Source: ISO Latin-1 subset of Unicode. Basic Latin and Latin-1
Supplement = collections 1 and 2. See ISO 10646,
Appendix A. See RFC 1815.
Alias: csUnicodeLatin1
Alias: ISO-10646
Name: ISO-10646-J-1
Source: ISO 10646 Japanese, see RFC 1815.
Name: ISO-Unicode-IBM-1261
MIBenum: 1005
Source: IBM Latin-2, -3, -5, Extended Presentation Set, GCSGID: 1261
Alias: csUnicodeIBM1261
Name: ISO-Unicode-IBM-1268
MIBenum: 1006
Source: IBM Latin-4 Extended Presentation Set, GCSGID: 1268
Alias: csUnicodeIBM1268
Name: ISO-Unicode-IBM-1276
MIBenum: 1007
Source: IBM Cyrillic Greek Extended Presentation Set, GCSGID: 1276
Alias: csUnicodeIBM1276
Name: ISO-Unicode-IBM-1264
MIBenum: 1008
Source: IBM Arabic Presentation Set, GCSGID: 1264
Alias: csUnicodeIBM1264
Name: ISO-Unicode-IBM-1265
MIBenum: 1009
Source: IBM Hebrew Presentation Set, GCSGID: 1265
Alias: csUnicodeIBM1265
Name: UNICODE-1-1 [RFC1641]
MIBenum: 1010
Source: RFC 1641
Alias: csUnicode11
Name: SCSU
MIBenum: 1011
Source: SCSU See (http://www.iana.org/assignments/charset-reg/SCSU) [Scherer]
Alias: None
Name: UTF-7 [RFC2152]
MIBenum: 1012
Source: RFC 2152
Alias: None
Name: UTF-16BE [RFC2781]
MIBenum: 1013
Source: RFC 2781
Alias: None
Name: UTF-16LE [RFC2781]
MIBenum: 1014
Source: RFC 2781
Alias: None
Name: UTF-16 [RFC2781]
MIBenum: 1015
Source: RFC 2781
Alias: None
Name: CESU-8 [Phipps]
MIBenum: 1016
Source: <http://www.unicode.org/unicode/reports/tr26>
Alias: csCESU-8
Name: UTF-32 [Davis]
MIBenum: 1017
Source: <http://www.unicode.org/unicode/reports/tr19/>
Alias: None
Name: UTF-32BE [Davis]
MIBenum: 1018
Source: <http://www.unicode.org/unicode/reports/tr19/>
Alias: None
Name: UTF-32LE [Davis]
MIBenum: 1019
Source: <http://www.unicode.org/unicode/reports/tr19/>
Alias: None
Name: BOCU-1 [Scherer]
MIBenum: 1020
Source: http://www.unicode.org/notes/tn6/
Alias: csBOCU-1
Name: ISO-8859-1-Windows-3.0-Latin-1 [HP-PCL5]
MIBenum: 2000
Source: Extended ISO 8859-1 Latin-1 for Windows 3.0.
PCL Symbol Set id: 9U
Alias: csWindows30Latin1
Name: ISO-8859-1-Windows-3.1-Latin-1 [HP-PCL5]
MIBenum: 2001
Source: Extended ISO 8859-1 Latin-1 for Windows 3.1.
PCL Symbol Set id: 19U
Alias: csWindows31Latin1
Name: ISO-8859-2-Windows-Latin-2 [HP-PCL5]
MIBenum: 2002
Source: Extended ISO 8859-2. Latin-2 for Windows 3.1.
PCL Symbol Set id: 9E
Alias: csWindows31Latin2
Name: ISO-8859-9-Windows-Latin-5 [HP-PCL5]
MIBenum: 2003
Source: Extended ISO 8859-9. Latin-5 for Windows 3.1
PCL Symbol Set id: 5T
Alias: csWindows31Latin5
Name: hp-roman8 [HP-PCL5,RFC1345,KXS2]
MIBenum: 2004
Source: LaserJet IIP Printer User's Manual,
HP part no 33471-90901, Hewlett-Packard, June 1989.
Alias: roman8
Alias: r8
Alias: csHPRoman8
Name: Adobe-Standard-Encoding [Adobe]
MIBenum: 2005
Source: PostScript Language Reference Manual
PCL Symbol Set id: 10J
Alias: csAdobeStandardEncoding
Name: Ventura-US [HP-PCL5]
MIBenum: 2006
Source: Ventura US. ASCII plus characters typically used in
publishing, like pilcrow, copyright, registered, trade mark,
section, dagger, and double dagger in the range A0 (hex)
to FF (hex).
PCL Symbol Set id: 14J
Alias: csVenturaUS
Name: Ventura-International [HP-PCL5]
MIBenum: 2007
Source: Ventura International. ASCII plus coded characters similar
to Roman8.
PCL Symbol Set id: 13J
Alias: csVenturaInternational
Name: DEC-MCS [RFC1345,KXS2]
MIBenum: 2008
Source: VAX/VMS User's Manual,
Order Number: AI-Y517A-TE, April 1986.
Alias: dec
Alias: csDECMCS
Name: IBM850 [RFC1345,KXS2]
MIBenum: 2009
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp850
Alias: 850
Alias: csPC850Multilingual
Name: PC8-Danish-Norwegian [HP-PCL5]
MIBenum: 2012
Source: PC Danish Norwegian
8-bit PC set for Danish Norwegian
PCL Symbol Set id: 11U
Alias: csPC8DanishNorwegian
Name: IBM862 [RFC1345,KXS2]
MIBenum: 2013
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp862
Alias: 862
Alias: csPC862LatinHebrew
Name: PC8-Turkish [HP-PCL5]
MIBenum: 2014
Source: PC Latin Turkish. PCL Symbol Set id: 9T
Alias: csPC8Turkish
Name: IBM-Symbols [IBM-CIDT]
MIBenum: 2015
Source: Presentation Set, CPGID: 259
Alias: csIBMSymbols
Name: IBM-Thai [IBM-CIDT]
MIBenum: 2016
Source: Presentation Set, CPGID: 838
Alias: csIBMThai
Name: HP-Legal [HP-PCL5]
MIBenum: 2017
Source: PCL 5 Comparison Guide, Hewlett-Packard,
HP part number 5961-0510, October 1992
PCL Symbol Set id: 1U
Alias: csHPLegal
Name: HP-Pi-font [HP-PCL5]
MIBenum: 2018
Source: PCL 5 Comparison Guide, Hewlett-Packard,
HP part number 5961-0510, October 1992
PCL Symbol Set id: 15U
Alias: csHPPiFont
Name: HP-Math8 [HP-PCL5]
MIBenum: 2019
Source: PCL 5 Comparison Guide, Hewlett-Packard,
HP part number 5961-0510, October 1992
PCL Symbol Set id: 8M
Alias: csHPMath8
Name: Adobe-Symbol-Encoding [Adobe]
MIBenum: 2020
Source: PostScript Language Reference Manual
PCL Symbol Set id: 5M
Alias: csHPPSMath
Name: HP-DeskTop [HP-PCL5]
MIBenum: 2021
Source: PCL 5 Comparison Guide, Hewlett-Packard,
HP part number 5961-0510, October 1992
PCL Symbol Set id: 7J
Alias: csHPDesktop
Name: Ventura-Math [HP-PCL5]
MIBenum: 2022
Source: PCL 5 Comparison Guide, Hewlett-Packard,
HP part number 5961-0510, October 1992
PCL Symbol Set id: 6M
Alias: csVenturaMath
Name: Microsoft-Publishing [HP-PCL5]
MIBenum: 2023
Source: PCL 5 Comparison Guide, Hewlett-Packard,
HP part number 5961-0510, October 1992
PCL Symbol Set id: 6J
Alias: csMicrosoftPublishing
Name: Windows-31J
MIBenum: 2024
Source: Windows Japanese. A further extension of Shift_JIS
to include NEC special characters (Row 13), NEC
selection of IBM extensions (Rows 89 to 92), and IBM
extensions (Rows 115 to 119). The CCS's are
JIS X0201:1997, JIS X0208:1997, and these extensions.
This charset can be used for the top-level media type "text",
but it is of limited or specialized use (see RFC2278).
PCL Symbol Set id: 19K
Alias: csWindows31J
Name: GB2312 (preferred MIME name)
MIBenum: 2025
Source: Chinese for People's Republic of China (PRC) mixed one byte,
two byte set:
20-7E = one byte ASCII
A1-FE = two byte PRC Kanji
See GB 2312-80
PCL Symbol Set Id: 18C
Alias: csGB2312
Name: Big5 (preferred MIME name)
MIBenum: 2026
Source: Chinese for Taiwan Multi-byte set.
PCL Symbol Set Id: 18T
Alias: csBig5
Name: macintosh [RFC1345,KXS2]
MIBenum: 2027
Source: The Unicode Standard ver1.0, ISBN 0-201-56788-1, Oct 1991
Alias: mac
Alias: csMacintosh
Name: IBM037 [RFC1345,KXS2]
MIBenum: 2028
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp037
Alias: ebcdic-cp-us
Alias: ebcdic-cp-ca
Alias: ebcdic-cp-wt
Alias: ebcdic-cp-nl
Alias: csIBM037
Name: IBM038 [RFC1345,KXS2]
MIBenum: 2029
Source: IBM 3174 Character Set Ref, GA27-3831-02, March 1990
Alias: EBCDIC-INT
Alias: cp038
Alias: csIBM038
Name: IBM273 [RFC1345,KXS2]
MIBenum: 2030
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: CP273
Alias: csIBM273
Name: IBM274 [RFC1345,KXS2]
MIBenum: 2031
Source: IBM 3174 Character Set Ref, GA27-3831-02, March 1990
Alias: EBCDIC-BE
Alias: CP274
Alias: csIBM274
Name: IBM275 [RFC1345,KXS2]
MIBenum: 2032
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: EBCDIC-BR
Alias: cp275
Alias: csIBM275
Name: IBM277 [RFC1345,KXS2]
MIBenum: 2033
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: EBCDIC-CP-DK
Alias: EBCDIC-CP-NO
Alias: csIBM277
Name: IBM278 [RFC1345,KXS2]
MIBenum: 2034
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: CP278
Alias: ebcdic-cp-fi
Alias: ebcdic-cp-se
Alias: csIBM278
Name: IBM280 [RFC1345,KXS2]
MIBenum: 2035
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: CP280
Alias: ebcdic-cp-it
Alias: csIBM280
Name: IBM281 [RFC1345,KXS2]
MIBenum: 2036
Source: IBM 3174 Character Set Ref, GA27-3831-02, March 1990
Alias: EBCDIC-JP-E
Alias: cp281
Alias: csIBM281
Name: IBM284 [RFC1345,KXS2]
MIBenum: 2037
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: CP284
Alias: ebcdic-cp-es
Alias: csIBM284
Name: IBM285 [RFC1345,KXS2]
MIBenum: 2038
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: CP285
Alias: ebcdic-cp-gb
Alias: csIBM285
Name: IBM290 [RFC1345,KXS2]
MIBenum: 2039
Source: IBM 3174 Character Set Ref, GA27-3831-02, March 1990
Alias: cp290
Alias: EBCDIC-JP-kana
Alias: csIBM290
Name: IBM297 [RFC1345,KXS2]
MIBenum: 2040
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp297
Alias: ebcdic-cp-fr
Alias: csIBM297
Name: IBM420 [RFC1345,KXS2]
MIBenum: 2041
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990,
IBM NLS RM p 11-11
Alias: cp420
Alias: ebcdic-cp-ar1
Alias: csIBM420
Name: IBM423 [RFC1345,KXS2]
MIBenum: 2042
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp423
Alias: ebcdic-cp-gr
Alias: csIBM423
Name: IBM424 [RFC1345,KXS2]
MIBenum: 2043
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp424
Alias: ebcdic-cp-he
Alias: csIBM424
Name: IBM437 [RFC1345,KXS2]
MIBenum: 2011
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp437
Alias: 437
Alias: csPC8CodePage437
Name: IBM500 [RFC1345,KXS2]
MIBenum: 2044
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: CP500
Alias: ebcdic-cp-be
Alias: ebcdic-cp-ch
Alias: csIBM500
Name: IBM851 [RFC1345,KXS2]
MIBenum: 2045
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp851
Alias: 851
Alias: csIBM851
Name: IBM852 [RFC1345,KXS2]
MIBenum: 2010
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp852
Alias: 852
Alias: csPCp852
Name: IBM855 [RFC1345,KXS2]
MIBenum: 2046
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp855
Alias: 855
Alias: csIBM855
Name: IBM857 [RFC1345,KXS2]
MIBenum: 2047
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp857
Alias: 857
Alias: csIBM857
Name: IBM860 [RFC1345,KXS2]
MIBenum: 2048
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp860
Alias: 860
Alias: csIBM860
Name: IBM861 [RFC1345,KXS2]
MIBenum: 2049
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp861
Alias: 861
Alias: cp-is
Alias: csIBM861
Name: IBM863 [RFC1345,KXS2]
MIBenum: 2050
Source: IBM Keyboard layouts and code pages, PN 07G4586 June 1991
Alias: cp863
Alias: 863
Alias: csIBM863
Name: IBM864 [RFC1345,KXS2]
MIBenum: 2051
Source: IBM Keyboard layouts and code pages, PN 07G4586 June 1991
Alias: cp864
Alias: csIBM864
Name: IBM865 [RFC1345,KXS2]
MIBenum: 2052
Source: IBM DOS 3.3 Ref (Abridged), 94X9575 (Feb 1987)
Alias: cp865
Alias: 865
Alias: csIBM865
Name: IBM868 [RFC1345,KXS2]
MIBenum: 2053
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: CP868
Alias: cp-ar
Alias: csIBM868
Name: IBM869 [RFC1345,KXS2]
MIBenum: 2054
Source: IBM Keyboard layouts and code pages, PN 07G4586 June 1991
Alias: cp869
Alias: 869
Alias: cp-gr
Alias: csIBM869
Name: IBM870 [RFC1345,KXS2]
MIBenum: 2055
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: CP870
Alias: ebcdic-cp-roece
Alias: ebcdic-cp-yu
Alias: csIBM870
Name: IBM871 [RFC1345,KXS2]
MIBenum: 2056
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: CP871
Alias: ebcdic-cp-is
Alias: csIBM871
Name: IBM880 [RFC1345,KXS2]
MIBenum: 2057
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp880
Alias: EBCDIC-Cyrillic
Alias: csIBM880
Name: IBM891 [RFC1345,KXS2]
MIBenum: 2058
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp891
Alias: csIBM891
Name: IBM903 [RFC1345,KXS2]
MIBenum: 2059
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp903
Alias: csIBM903
Name: IBM904 [RFC1345,KXS2]
MIBenum: 2060
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: cp904
Alias: 904
Alias: csIBBM904
Name: IBM905 [RFC1345,KXS2]
MIBenum: 2061
Source: IBM 3174 Character Set Ref, GA27-3831-02, March 1990
Alias: CP905
Alias: ebcdic-cp-tr
Alias: csIBM905
Name: IBM918 [RFC1345,KXS2]
MIBenum: 2062
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: CP918
Alias: ebcdic-cp-ar2
Alias: csIBM918
Name: IBM1026 [RFC1345,KXS2]
MIBenum: 2063
Source: IBM NLS RM Vol2 SE09-8002-01, March 1990
Alias: CP1026
Alias: csIBM1026
Name: EBCDIC-AT-DE [RFC1345,KXS2]
MIBenum: 2064
Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987
Alias: csIBMEBCDICATDE
Name: EBCDIC-AT-DE-A [RFC1345,KXS2]
MIBenum: 2065
Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987
Alias: csEBCDICATDEA
Name: EBCDIC-CA-FR [RFC1345,KXS2]
MIBenum: 2066
Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987
Alias: csEBCDICCAFR
Name: EBCDIC-DK-NO [RFC1345,KXS2]
MIBenum: 2067
Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987
Alias: csEBCDICDKNO
Name: EBCDIC-DK-NO-A [RFC1345,KXS2]
MIBenum: 2068
Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987
Alias: csEBCDICDKNOA
Name: EBCDIC-FI-SE [RFC1345,KXS2]
MIBenum: 2069
Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987
Alias: csEBCDICFISE
Name: EBCDIC-FI-SE-A [RFC1345,KXS2]
MIBenum: 2070
Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987
Alias: csEBCDICFISEA
Name: EBCDIC-FR [RFC1345,KXS2]
MIBenum: 2071
Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987
Alias: csEBCDICFR
Name: EBCDIC-IT [RFC1345,KXS2]
MIBenum: 2072
Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987
Alias: csEBCDICIT
Name: EBCDIC-PT [RFC1345,KXS2]
MIBenum: 2073
Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987
Alias: csEBCDICPT
Name: EBCDIC-ES [RFC1345,KXS2]
MIBenum: 2074
Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987
Alias: csEBCDICES
Name: EBCDIC-ES-A [RFC1345,KXS2]
MIBenum: 2075
Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987
Alias: csEBCDICESA
Name: EBCDIC-ES-S [RFC1345,KXS2]
MIBenum: 2076
Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987
Alias: csEBCDICESS
Name: EBCDIC-UK [RFC1345,KXS2]
MIBenum: 2077
Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987
Alias: csEBCDICUK
Name: EBCDIC-US [RFC1345,KXS2]
MIBenum: 2078
Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987
Alias: csEBCDICUS
Name: UNKNOWN-8BIT [RFC1428]
MIBenum: 2079
Alias: csUnknown8BiT
Name: MNEMONIC [RFC1345,KXS2]
MIBenum: 2080
Source: RFC 1345, also known as "mnemonic+ascii+38"
Alias: csMnemonic
Name: MNEM [RFC1345,KXS2]
MIBenum: 2081
Source: RFC 1345, also known as "mnemonic+ascii+8200"
Alias: csMnem
Name: VISCII [RFC1456]
MIBenum: 2082
Source: RFC 1456
Alias: csVISCII
Name: VIQR [RFC1456]
MIBenum: 2083
Source: RFC 1456
Alias: csVIQR
Name: KOI8-R (preferred MIME name) [RFC1489]
MIBenum: 2084
Source: RFC 1489, based on GOST-19768-74, ISO-6937/8,
INIS-Cyrillic, ISO-5427.
Alias: csKOI8R
Name: HZ-GB-2312
MIBenum: 2085
Source: RFC 1842, RFC 1843 [RFC1842, RFC1843]
Name: IBM866 [Pond]
MIBenum: 2086
Source: IBM NLDG Volume 2 (SE09-8002-03) August 1994
Alias: cp866
Alias: 866
Alias: csIBM866
Name: IBM775 [HP-PCL5]
MIBenum: 2087
Source: HP PCL 5 Comparison Guide (P/N 5021-0329) pp B-13, 1996
Alias: cp775
Alias: csPC775Baltic
Name: KOI8-U [RFC2319]
MIBenum: 2088
Source: RFC 2319
Name: IBM00858
MIBenum: 2089
Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM00858) [Mahdi]
Alias: CCSID00858
Alias: CP00858
Alias: PC-Multilingual-850+euro
Name: IBM00924
MIBenum: 2090
Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM00924) [Mahdi]
Alias: CCSID00924
Alias: CP00924
Alias: ebcdic-Latin9--euro
Name: IBM01140
MIBenum: 2091
Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01140) [Mahdi]
Alias: CCSID01140
Alias: CP01140
Alias: ebcdic-us-37+euro
Name: IBM01141
MIBenum: 2092
Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01141) [Mahdi]
Alias: CCSID01141
Alias: CP01141
Alias: ebcdic-de-273+euro
Name: IBM01142
MIBenum: 2093
Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01142) [Mahdi]
Alias: CCSID01142
Alias: CP01142
Alias: ebcdic-dk-277+euro
Alias: ebcdic-no-277+euro
Name: IBM01143
MIBenum: 2094
Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01143) [Mahdi]
Alias: CCSID01143
Alias: CP01143
Alias: ebcdic-fi-278+euro
Alias: ebcdic-se-278+euro
Name: IBM01144
MIBenum: 2095
Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01144) [Mahdi]
Alias: CCSID01144
Alias: CP01144
Alias: ebcdic-it-280+euro
Name: IBM01145
MIBenum: 2096
Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01145) [Mahdi]
Alias: CCSID01145
Alias: CP01145
Alias: ebcdic-es-284+euro
Name: IBM01146
MIBenum: 2097
Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01146) [Mahdi]
Alias: CCSID01146
Alias: CP01146
Alias: ebcdic-gb-285+euro
Name: IBM01147
MIBenum: 2098
Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01147) [Mahdi]
Alias: CCSID01147
Alias: CP01147
Alias: ebcdic-fr-297+euro
Name: IBM01148
MIBenum: 2099
Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01148) [Mahdi]
Alias: CCSID01148
Alias: CP01148
Alias: ebcdic-international-500+euro
Name: IBM01149
MIBenum: 2100
Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01149) [Mahdi]
Alias: CCSID01149
Alias: CP01149
Alias: ebcdic-is-871+euro
Name: Big5-HKSCS [Yick]
MIBenum: 2101
Source: See (http://www.iana.org/assignments/charset-reg/Big5-HKSCS)
Alias: None
Name: IBM1047 [Robrigado]
MIBenum: 2102
Source: IBM1047 (EBCDIC Latin 1/Open Systems)
http://www-1.ibm.com/servers/eserver/iseries/software/globalization/pdf/cp01047z.pdf
Alias: IBM-1047
Name: PTCP154 [Uskov]
MIBenum: 2103
Source: See (http://www.iana.org/assignments/charset-reg/PTCP154)
Alias: csPTCP154
Alias: PT154
Alias: CP154
Alias: Cyrillic-Asian
Name: Amiga-1251
MIBenum: 2104
Source: See (http://www.amiga.ultranet.ru/Amiga-1251.html)
Alias: Ami1251
Alias: Amiga1251
Alias: Ami-1251
(Aliases are provided for historical reasons and should not be used)
[Malyshev]
Name: KOI7-switched
MIBenum: 2105
Source: See <http://www.iana.org/assignments/charset-reg/KOI7-switched>
Aliases: None
Name: BRF
MIBenum: 2106
Source: See <http://www.iana.org/assignments/charset-reg/BRF> [Thibault]
Alias: csBRF
Name: TSCII
MIBenum: 2107
Source: See <http://www.iana.org/assignments/charset-reg/TSCII> [Kalyanasundaram]
Alias: csTSCII
Name: windows-1250
MIBenum: 2250
Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1250) [Lazhintseva]
Alias: None
Name: windows-1251
MIBenum: 2251
Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1251) [Lazhintseva]
Alias: None
Name: windows-1252
MIBenum: 2252
Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1252) [Wendt]
Alias: None
Name: windows-1253
MIBenum: 2253
Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1253) [Lazhintseva]
Alias: None
Name: windows-1254
MIBenum: 2254
Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1254) [Lazhintseva]
Alias: None
Name: windows-1255
MIBenum: 2255
Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1255) [Lazhintseva]
Alias: None
Name: windows-1256
MIBenum: 2256
Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1256) [Lazhintseva]
Alias: None
Name: windows-1257
MIBenum: 2257
Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1257) [Lazhintseva]
Alias: None
Name: windows-1258
MIBenum: 2258
Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1258) [Lazhintseva]
Alias: None
Name: TIS-620
MIBenum: 2259
Source: Thai Industrial Standards Institute (TISI) [Tantsetthi]
REFERENCES
----------
[RFC1345] Simonsen, K., "Character Mnemonics & Character Sets",
RFC 1345, Rationel Almen Planlaegning, Rationel Almen
Planlaegning, June 1992.
[RFC1428] Vaudreuil, G., "Transition of Internet Mail from
Just-Send-8 to 8bit-SMTP/MIME", RFC1428, CNRI, February
1993.
[RFC1456] Vietnamese Standardization Working Group, "Conventions for
Encoding the Vietnamese Language VISCII: VIetnamese
Standard Code for Information Interchange VIQR: VIetnamese
Quoted-Readable Specification Revision 1.1", RFC 1456, May
1993.
[RFC1468] Murai, J., Crispin, M., and E. van der Poel, "Japanese
Character Encoding for Internet Messages", RFC 1468,
Keio University, Panda Programming, June 1993.
[RFC1489] Chernov, A., "Registration of a Cyrillic Character Set",
RFC1489, RELCOM Development Team, July 1993.
[RFC1554] Ohta, M., and K. Handa, "ISO-2022-JP-2: Multilingual
Extension of ISO-2022-JP", RFC1554, Tokyo Institute of
Technology, ETL, December 1993.
[RFC1556] Nussbacher, H., "Handling of Bi-directional Texts in MIME",
RFC1556, Israeli Inter-University, December 1993.
[RFC1557] Choi, U., Chon, K., and H. Park, "Korean Character Encoding
for Internet Messages", KAIST, Solvit Chosun Media,
December 1993.
[RFC1641] Goldsmith, D., and M. Davis, "Using Unicode with MIME",
RFC1641, Taligent, Inc., July 1994.
[RFC1642] Goldsmith, D., and M. Davis, "UTF-7", RFC1642, Taligent,
Inc., July 1994.
[RFC1815] Ohta, M., "Character Sets ISO-10646 and ISO-10646-J-1",
RFC 1815, Tokyo Institute of Technology, July 1995.
[Adobe] Adobe Systems Incorporated, PostScript Language Reference
Manual, second edition, Addison-Wesley Publishing Company,
Inc., 1990.
[ECMA Registry] ISO-IR: International Register of Escape Sequences
http://www.itscj.ipsj.or.jp/ISO-IE/ Note: The current
registration authority is IPSJ/ITSCJ, Japan.
[HP-PCL5] Hewlett-Packard Company, "HP PCL 5 Comparison Guide",
(P/N 5021-0329) pp B-13, 1996.
[IBM-CIDT] IBM Corporation, "ABOUT TYPE: IBM's Technical Reference
for Core Interchange Digitized Type", Publication number
S544-3708-01
[RFC1842] Wei, Y., J. Li, and Y. Jiang, "ASCII Printable
Characters-Based Chinese Character Encoding for Internet
Messages", RFC 1842, Harvard University, Rice University,
University of Maryland, August 1995.
[RFC1843] Lee, F., "HZ - A Data Format for Exchanging Files of
Arbitrarily Mixed Chinese and ASCII Characters", RFC 1843,
Stanford University, August 1995.
[RFC2152] Goldsmith, D., M. Davis, "UTF-7: A Mail-Safe Transformation
Format of Unicode", RFC 2152, Apple Computer, Inc.,
Taligent Inc., May 1997.
[RFC2279] Yergeau, F., "UTF-8, A Transformation Format of ISO 10646",
RFC 2279, Alis Technologies, January, 1998.
[RFC2781] Hoffman, P., Yergeau, F., "UTF-16, an encoding of ISO 10646",
RFC 2781, February 2000.
[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646",
RFC3629, November 2003.
PEOPLE
------
[KXS2] Keld Simonsen <Keld.Simonsen&dkuug.dk>
[Choi] Woohyong Choi <whchoi&cosmos.kaist.ac.kr>
[Davis] Mark Davis, <mark&unicode.org>, April 2002.
[Kalyanasundaram] Kuppuswamy Kalyanasundaram, <kalyan.geo&yahoo.com>, 14 May 2007.
[Kikkarin] Sairan M. Kikkarin, <sairan&sci.kz>, 7 December 2006.
[Lazhintseva] Katya Lazhintseva, <katyal&MICROSOFT.com>, May 1996.
[Mahdi] Tamer Mahdi, <tamer&ca.ibm.com>, August 2000.
[Malyshev] Michael Malyshev, <michael_malyshev&mail.ru>, January 2004
[Murai] Jun Murai <jun&wide.ad.jp>
[Nussbacher] Hank Nussbacher, <hank&vm.tau.ac.il>
[Ohta] Masataka Ohta, <mohta&cc.titech.ac.jp>, July 1995.
[Phipps] Toby Phipps, <tphipps&peoplesoft.com>, March 2002.
[Pond] Rick Pond, <rickpond&vnet.ibm.com>, March 1997.
[Robrigado] Reuel Robrigado, <reuelr&ca.ibm.com>, September 2002.
[Scherer] Markus Scherer, <markus.scherer&jtcsv.com>, August 2000,
September 2002.
[Simonsen] Keld Simonsen, <Keld.Simonsen&rap.dk>, August 2000.
[Tantsetthi] Trin Tantsetthi, <trin&mozart.inet.co.th>, September 1998.
[Thibault] Samuel Thibault, <samuel.thibault&ens-lyon.org>, 7 December 2006.
[Tumasonis] Vladas Tumasonis, <vladas.tumasonis&maf.vu.lt>, August 2000.
[Uskov] Alexander Uskov, <auskov&idc.kz>, September 2002.
[Veremeev] Alexei Veremeev, <Alexey.Veremeev&oracle.com>, 7 December 2006.
[Wendt] Chris Wendt, <christw&microsoft.com>, December 1999.
[Yick] Nicky Yick, <cliac&itsd.gcn.gov.hk>, October 2000.
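Many of the names and aliases in the registry above can be tried out directly. The following is a minimal sketch, assuming Python's standard `codecs` module; Python keeps its own alias table, which covers only part of the IANA registry, so a label that Python does not know raises `LookupError`. The helper name `same_codec` is made up for this illustration.

```python
import codecs

def same_codec(label_a: str, label_b: str) -> bool:
    """Return True if both labels resolve to the same Python codec."""
    return codecs.lookup(label_a).name == codecs.lookup(label_b).name

# Aliases taken from the registry entry for IBM037.
print(same_codec("IBM037", "cp037"))
print(same_codec("IBM037", "ebcdic-cp-us"))

# The same character is stored as different bytes in different charsets.
for label in ("IBM037", "IBM437", "windows-1252", "UTF-8"):
    print(label, "ü".encode(label))
```

The loop shows why a document must declare its charset: without the label, the bytes alone do not say which character was meant.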
# Blocks-5.1.0.txt
# Date: 2008-03-20, 17:41:00 PDT [KW]
#
# Unicode Character Database
# Copyright (c) 1991-2008 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
# For documentation, see UCD.html
#
# Note: The casing of block names is not normative.
# For example, "Basic Latin" and "BASIC LATIN" are equivalent.
#
# Format:
# Start Code..End Code; Block Name
# ================================================
# Note: When comparing block names, casing, whitespace, hyphens,
# and underbars are ignored.
# For example, "Latin Extended-A" and "latin extended a" are equivalent.
# For more information on the comparison of property values,
# see UCD.html.
#
# All code points not explicitly listed for Block
# have the value No_Block.
# Property: Block
#
# @missing: 0000..10FFFF; No_Block
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0100..017F; Latin Extended-A
0180..024F; Latin Extended-B
0250..02AF; IPA Extensions
02B0..02FF; Spacing Modifier Letters
0300..036F; Combining Diacritical Marks
0370..03FF; Greek and Coptic
0400..04FF; Cyrillic
0500..052F; Cyrillic Supplement
0530..058F; Armenian
0590..05FF; Hebrew
0600..06FF; Arabic
0700..074F; Syriac
0750..077F; Arabic Supplement
0780..07BF; Thaana
07C0..07FF; NKo
0900..097F; Devanagari
0980..09FF; Bengali
0A00..0A7F; Gurmukhi
0A80..0AFF; Gujarati
0B00..0B7F; Oriya
0B80..0BFF; Tamil
0C00..0C7F; Telugu
0C80..0CFF; Kannada
0D00..0D7F; Malayalam
0D80..0DFF; Sinhala
0E00..0E7F; Thai
0E80..0EFF; Lao
0F00..0FFF; Tibetan
1000..109F; Myanmar
10A0..10FF; Georgian
1100..11FF; Hangul Jamo
1200..137F; Ethiopic
1380..139F; Ethiopic Supplement
13A0..13FF; Cherokee
1400..167F; Unified Canadian Aboriginal Syllabics
1680..169F; Ogham
16A0..16FF; Runic
1700..171F; Tagalog
1720..173F; Hanunoo
1740..175F; Buhid
1760..177F; Tagbanwa
1780..17FF; Khmer
1800..18AF; Mongolian
1900..194F; Limbu
1950..197F; Tai Le
1980..19DF; New Tai Lue
19E0..19FF; Khmer Symbols
1A00..1A1F; Buginese
1B00..1B7F; Balinese
1B80..1BBF; Sundanese
1C00..1C4F; Lepcha
1C50..1C7F; Ol Chiki
1D00..1D7F; Phonetic Extensions
1D80..1DBF; Phonetic Extensions Supplement
1DC0..1DFF; Combining Diacritical Marks Supplement
1E00..1EFF; Latin Extended Additional
1F00..1FFF; Greek Extended
2000..206F; General Punctuation
2070..209F; Superscripts and Subscripts
20A0..20CF; Currency Symbols
20D0..20FF; Combining Diacritical Marks for Symbols
2100..214F; Letterlike Symbols
2150..218F; Number Forms
2190..21FF; Arrows
2200..22FF; Mathematical Operators
2300..23FF; Miscellaneous Technical
2400..243F; Control Pictures
2440..245F; Optical Character Recognition
2460..24FF; Enclosed Alphanumerics
2500..257F; Box Drawing
2580..259F; Block Elements
25A0..25FF; Geometric Shapes
2600..26FF; Miscellaneous Symbols
2700..27BF; Dingbats
27C0..27EF; Miscellaneous Mathematical Symbols-A
27F0..27FF; Supplemental Arrows-A
2800..28FF; Braille Patterns
2900..297F; Supplemental Arrows-B
2980..29FF; Miscellaneous Mathematical Symbols-B
2A00..2AFF; Supplemental Mathematical Operators
2B00..2BFF; Miscellaneous Symbols and Arrows
2C00..2C5F; Glagolitic
2C60..2C7F; Latin Extended-C
2C80..2CFF; Coptic
2D00..2D2F; Georgian Supplement
2D30..2D7F; Tifinagh
2D80..2DDF; Ethiopic Extended
2DE0..2DFF; Cyrillic Extended-A
2E00..2E7F; Supplemental Punctuation
2E80..2EFF; CJK Radicals Supplement
2F00..2FDF; Kangxi Radicals
2FF0..2FFF; Ideographic Description Characters
3000..303F; CJK Symbols and Punctuation
3040..309F; Hiragana
30A0..30FF; Katakana
3100..312F; Bopomofo
3130..318F; Hangul Compatibility Jamo
3190..319F; Kanbun
31A0..31BF; Bopomofo Extended
31C0..31EF; CJK Strokes
31F0..31FF; Katakana Phonetic Extensions
3200..32FF; Enclosed CJK Letters and Months
3300..33FF; CJK Compatibility
3400..4DBF; CJK Unified Ideographs Extension A
4DC0..4DFF; Yijing Hexagram Symbols
4E00..9FFF; CJK Unified Ideographs
A000..A48F; Yi Syllables
A490..A4CF; Yi Radicals
A500..A63F; Vai
A640..A69F; Cyrillic Extended-B
A700..A71F; Modifier Tone Letters
A720..A7FF; Latin Extended-D
A800..A82F; Syloti Nagri
A840..A87F; Phags-pa
A880..A8DF; Saurashtra
A900..A92F; Kayah Li
A930..A95F; Rejang
AA00..AA5F; Cham
AC00..D7AF; Hangul Syllables
D800..DB7F; High Surrogates
DB80..DBFF; High Private Use Surrogates
DC00..DFFF; Low Surrogates
E000..F8FF; Private Use Area
F900..FAFF; CJK Compatibility Ideographs
FB00..FB4F; Alphabetic Presentation Forms
FB50..FDFF; Arabic Presentation Forms-A
FE00..FE0F; Variation Selectors
FE10..FE1F; Vertical Forms
FE20..FE2F; Combining Half Marks
FE30..FE4F; CJK Compatibility Forms
FE50..FE6F; Small Form Variants
FE70..FEFF; Arabic Presentation Forms-B
FF00..FFEF; Halfwidth and Fullwidth Forms
FFF0..FFFF; Specials
10000..1007F; Linear B Syllabary
10080..100FF; Linear B Ideograms
10100..1013F; Aegean Numbers
10140..1018F; Ancient Greek Numbers
10190..101CF; Ancient Symbols
101D0..101FF; Phaistos Disc
10280..1029F; Lycian
102A0..102DF; Carian
10300..1032F; Old Italic
10330..1034F; Gothic
10380..1039F; Ugaritic
103A0..103DF; Old Persian
10400..1044F; Deseret
10450..1047F; Shavian
10480..104AF; Osmanya
10800..1083F; Cypriot Syllabary
10900..1091F; Phoenician
10920..1093F; Lydian
10A00..10A5F; Kharoshthi
12000..123FF; Cuneiform
12400..1247F; Cuneiform Numbers and Punctuation
1D000..1D0FF; Byzantine Musical Symbols
1D100..1D1FF; Musical Symbols
1D200..1D24F; Ancient Greek Musical Notation
1D300..1D35F; Tai Xuan Jing Symbols
1D360..1D37F; Counting Rod Numerals
1D400..1D7FF; Mathematical Alphanumeric Symbols
1F000..1F02F; Mahjong Tiles
1F030..1F09F; Domino Tiles
20000..2A6DF; CJK Unified Ideographs Extension B
2F800..2FA1F; CJK Compatibility Ideographs Supplement
E0000..E007F; Tags
E0100..E01EF; Variation Selectors Supplement
F0000..FFFFF; Supplementary Private Use Area-A
100000..10FFFF; Supplementary Private Use Area-B
# EOF
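The "Start Code..End Code; Block Name" format described in the file header is simple to parse. The following is a minimal sketch in Python, working on a short excerpt rather than the full file; the names `BLOCKS_EXCERPT`, `parse_blocks`, `block_of` and `loose_key` are invented for this illustration. `loose_key` follows the comparison note in the header (casing, whitespace, hyphens and underbars are ignored).

```python
import re
from typing import List, Tuple

# Short excerpt in the Blocks.txt format; the real file lists every block.
BLOCKS_EXCERPT = """\
# Start Code..End Code; Block Name
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0100..017F; Latin Extended-A
20A0..20CF; Currency Symbols
2700..27BF; Dingbats
1D100..1D1FF; Musical Symbols
"""

def parse_blocks(text: str) -> List[Tuple[int, int, str]]:
    """Parse 'Start..End; Name' lines, skipping comments and blank lines."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        code_range, name = line.split(";", 1)
        start, end = code_range.split("..")
        blocks.append((int(start, 16), int(end, 16), name.strip()))
    return blocks

def block_of(code_point: int, blocks: List[Tuple[int, int, str]]) -> str:
    """Return the block name, or No_Block if no listed range matches."""
    for start, end, name in blocks:
        if start <= code_point <= end:
            return name
    return "No_Block"

def loose_key(name: str) -> str:
    """Normalise a block name for comparison as the file header describes."""
    return re.sub(r"[\s\-_]", "", name).lower()

blocks = parse_blocks(BLOCKS_EXCERPT)
print(block_of(ord("€"), blocks))
print(loose_key("Latin Extended-A") == loose_key("latin extended a"))
```

Because only an excerpt is loaded here, code points from blocks outside the excerpt are reported as No_Block; with the full file that value applies only to code points not listed in any block.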
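Examples of named and numbered XHTML character references:
&amp; = &#38; (display: &),
&lt; = &#60; (display: <),
&gt; = &#62; (display: >),
&rArr; = &#8658; = &#x21D2; (display: ⇒),
&rarr; = &#8594; = &#x2192; (display: →),
&auml; (display: ä),
&ouml; (display: ö),
&uuml; (display: ü),
&Auml; (display: Ä),
&Ouml; (display: Ö),
&Uuml; (display: Ü),
&szlig; (display: ß),
&lang; = &#9001; (display: 〈),
&rang; = &#9002; (display: 〉),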
– =   =   = –,
— =   =   =—,
...
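The three notations (named, decimal, hexadecimal) can be derived from the code point of a character. The following is a minimal sketch, assuming Python's standard `html` module; `codepoint2name` covers only the HTML 4.01 entity names, and the helper name `references` is invented for this illustration.

```python
import html
from html.entities import codepoint2name
from typing import Optional, Tuple

def references(char: str) -> Tuple[Optional[str], str, str]:
    """Return (named, decimal, hexadecimal) references for one character.

    The named form is None when HTML 4.01 defines no entity name.
    """
    cp = ord(char)
    name = codepoint2name.get(cp)
    named = f"&{name};" if name else None
    return named, f"&#{cp};", f"&#x{cp:X};"

for ch in "&<>⇒→äöüÄÖÜß€✂":
    print(ch, references(ch))

# Decoding goes the other way: all three notations yield the same character.
print(html.unescape("&uuml; &#252; &#xFC;"))
```

Characters without an entity name, such as the Dingbats, can only be written with a numeric reference or as the literal character in a Unicode encoding such as UTF-8.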
Here are some Unicode characters written as numeric references:
❐ = ❐, inverted = ❐ ,
➥ = ➥ = ➥ ,
☓ = ☓ = ☓ ,
⊗ = ⊗ ,
✺ = ✺ ,
❍ = ❍ ,
〇 = 〇 compare: 'O', '0',
⊗
〣 = 〣 ,
─ ─
│ │
┌ ┌
┐ ┐
└ └
┘ ┘
├ ├
┤ ┤
┬ ┬
┴ ┴
┼ ┼
═ ═
║ ║
╒ ╒
╓ ╓
╔ ╔
╕ ╕
╖ ╖
╗ ╗
╘ ╘
╙ ╙
╚ ╚
╛ ╛
╜ ╜
╝ ╝
╞ ╞
╟ ╟
╠ ╠
╡ ╡
╢ ╢
╣ ╣
╤ ╤
╥ ╥
╦ ╦
╧ ╧
╨ ╨
╩ ╩
╪ ╪
╫ ╫
╬ ╬
▀ ▀
▄ ▄
█ █
▌ ▌
▐ ▐
░ ░
▒ ▒
▓ ▓
■ ■
▪ ▪
▫ ▫
▬ ▬
▲ ▲
► ►
▼ ▼
◄ ◄
◊ ◊
○ ○
● ●
◘ ◘
◙ ◙
◦ ◦
☺ ☺
☻ ☻
☼ ☼
♀ ♀
♂ ♂
♠ ♠
♣ ♣
♥ ♥
♦ ♦
♪ ♪
♫ ♫
♰ ♰
♱ ♱
1. Example of a Unicode block: wikipedia.org: U+0000–U+007F Unicode block Basic Latin, or decodeunicode.org: visual U+0000–U+007F (Basic Latin), unicode.coeurlumiere: U+0000–U+0FFF table (hex, dec)
2. Example of a Unicode block: wikipedia.org: U+2300–U+23FF Unicode block Miscellaneous Technical, or decodeunicode.org: visual U+2300–U+23FF Miscellaneous Technical (technical symbols)
3. Example of a Unicode block: wikipedia.org: U+25A0–U+25FF Unicode block Geometric Shapes, or decodeunicode.org: visual U+25A0–U+25FF Geometric Shapes
| Name and link to the Unicode table | Block |
|---|---|
| Basic Latin (ASCII code table) | U+0000 to U+007F |
| Latin-1 Supplement (code table of ISO 8859-1) | U+0080 to U+00FF |
| Latin Extended-A | U+0100 to U+017F |
| Latin Extended-B | U+0180 to U+024F |
| IPA Extensions | U+0250 to U+02AF |
| Spacing Modifier Letters | U+02B0 to U+02FF |
| Combining Diacritical Marks | U+0300 to U+036F |
| Greek and Coptic | U+0370 to U+03FF |
| Cyrillic | U+0400 to U+04FF |
| Armenian | U+0530 to U+058F |
| Hebrew | U+0590 to U+05FF |
| Arabic | U+0600 to U+06FF |
| Devanagari | U+0900 to U+097F |
| Bengali | U+0980 to U+09FF |
| Gurmukhi | U+0A00 to U+0A7F |
| Gujarati | U+0A80 to U+0AFF |
| Oriya | U+0B00 to U+0B7F |
| Tamil | U+0B80 to U+0BFF |
| Telugu | U+0C00 to U+0C7F |
| Kannada | U+0C80 to U+0CFF |
| Malayalam | U+0D00 to U+0D7F |
| Thai | U+0E00 to U+0E7F |
| Lao | U+0E80 to U+0EFF |
| Tibetan | U+0F00 to U+0FFF |
| Georgian | U+10A0 to U+10FF |
| Hangul Jamo | U+1100 to U+11FF |
| Latin Extended Additional | U+1E00 to U+1EFF |
| Greek Extended | U+1F00 to U+1FFF |
| General Punctuation | U+2000 to U+206F |
| Superscripts and Subscripts | U+2070 to U+209F |
| Currency Symbols | U+20A0 to U+20CF (euro sign: U+20AC; in HTML also &euro; or &#8364;) |
| Combining Diacritical Marks for Symbols | U+20D0 to U+20FF |
| Letterlike Symbols | U+2100 to U+214F |
| Number Forms | U+2150 to U+218F |
| Arrows | U+2190 to U+21FF |
| Mathematical Operators | U+2200 to U+22FF |
| Miscellaneous Technical | U+2300 to U+23FF |
| Control Pictures | U+2400 to U+243F |
| Optical Character Recognition | U+2440 to U+245F |
| Enclosed Alphanumerics | U+2460 to U+24FF |
| Box Drawing | U+2500 to U+257F |
| Block Elements | U+2580 to U+259F |
| Geometric Shapes | U+25A0 to U+25FF |
| Miscellaneous Symbols | U+2600 to U+26FF |
| Dingbats | U+2700 to U+27BF |
| CJK Symbols and Punctuation | U+3000 to U+303F |
| Hiragana | U+3040 to U+309F |
| Katakana | U+30A0 to U+30FF |
| Bopomofo | U+3100 to U+312F |
| Hangul Compatibility Jamo | U+3130 to U+318F |
| Kanbun | U+3190 to U+319F |
| Enclosed CJK Letters and Months | U+3200 to U+32FF |
| CJK Compatibility | U+3300 to U+33FF |
| CJK Unified Ideographs | U+4E00 to U+9FFF |
| Hangul Syllables | U+AC00 to U+D7AF |
| High Surrogates | U+D800 to U+DB7F |
| High Private Use Surrogates | U+DB80 to U+DBFF |
| Low Surrogates | U+DC00 to U+DFFF |
| Private Use Area | U+E000 to U+F8FF |
| CJK Compatibility Ideographs | U+F900 to U+FAFF |
| Alphabetic Presentation Forms | U+FB00 to U+FB4F |
| Arabic Presentation Forms-A | U+FB50 to U+FDFF |
| Combining Half Marks | U+FE20 to U+FE2F |
| CJK Compatibility Forms | U+FE30 to U+FE4F |
| Small Form Variants | U+FE50 to U+FE6F |
| Arabic Presentation Forms-B | U+FE70 to U+FEFF |
| Halfwidth and Fullwidth Forms | U+FF00 to U+FFEF |
| Specials | U+FFF0 to U+FFFF |
| Name and link to the Unicode table | Block |
|---|---|
| Linear B Syllabary | U+10000 to U+1007F |
| Linear B Ideograms | U+10080 to U+100FF |
| Aegean Numbers | U+10100 to U+1013F |
| Ancient Greek Numbers | U+10140 to U+1018F |
| Old Italic | U+10300 to U+1032F |
| Gothic | U+10330 to U+1034F |
| Ugaritic | U+10380 to U+1039F |
| Old Persian | U+103A0 to U+103DF |
| Deseret | U+10400 to U+1044F |
| Shavian | U+10450 to U+1047F |
| Osmanya | U+10480 to U+104AF |
| Cypriot Syllabary | U+10800 to U+1083F |
| Kharoshthi | U+10A00 to U+10A5F |
| Byzantine Musical Symbols | U+1D000 to U+1D0FF |
| Musical Symbols | U+1D100 to U+1D1FF |
| Ancient Greek Musical Notation | U+1D200 to U+1D24F |
| Tai Xuan Jing Symbols | U+1D300 to U+1D35F |
| Mathematical Alphanumeric Symbols | U+1D400 to U+1D7FF |
| CJK Unified Ideographs Extension B | U+20000 to U+2A6DF |
| CJK Compatibility Ideographs Supplement | U+2F800 to U+2FA1F |
| Tags | U+E0000 to U+E007F |
| Variation Selectors Supplement | U+E0100 to U+E01EF |
| Supplementary Private Use Area-A | U+F0000 to U+FFFFF |
| Supplementary Private Use Area-B | U+100000 to U+10FFFF |
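The blocks from U+10000 upwards do not fit into a single 16-bit unit, which is what the High Surrogates and Low Surrogates ranges are reserved for. The following Python sketch, written for illustration only, computes the UTF-16 surrogate pair of a supplementary code point and the number of bytes UTF-8 needs (per RFC 3629); the function names `surrogate_pair` and `utf8_length` are made up here.

```python
from typing import Tuple

def surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points use surrogate pairs")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # lower 10 bits
    return high, low

def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a code point (surrogates not checked)."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4

# U+1D11E lies in the Musical Symbols block.
high, low = surrogate_pair(0x1D11E)
print(f"U+1D11E -> {high:04X} {low:04X}")

for cp in (0x41, 0xFC, 0x20AC, 0x1D11E):
    print(f"U+{cp:04X}", utf8_length(cp), chr(cp).encode("utf-8").hex())
```

The surrogate code points themselves (U+D800 to U+DFFF) are not characters; they exist only so that UTF-16 can address the supplementary planes.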
As of 2009, browsers do not yet support all Unicode blocks.
→ U+0000 - U+007F Basic Latin
→ U+0080 - U+00FF Latin-1 Supplement
→ U+0100 - U+017F Latin Extended-A
→ U+0180 - U+024F Latin Extended-B
→ U+02B0 - U+02FF Spacing Modifier Letters
→ U+0300 - U+036F Combining Diacritical Marks
→ U+0370 - U+03FF Greek and Coptic
→ U+1DC0 - U+1DFF Combining Diacritical Marks Supplement
→ U+1F00 - U+1FFF Greek Extended
→ U+2070 - U+209F Superscripts and Subscripts
→ U+20A0 - U+20CF Currency Symbols
→ U+2100 - U+214F Letterlike Symbols
→ U+2150 - U+218F Number Forms
→ U+2190 - U+21FF Arrows
→ U+2400 - U+243F Control Pictures
→ U+2440 - U+245F Optical Character Recognition
→ U+2460 - U+24FF Enclosed Alphanumerics
→ U+2580 - U+259F Block Elements
→ U+25A0 - U+25FF Geometric Shapes
→ U+2600 - U+26FF Miscellaneous Symbols
→ U+27F0 - U+27FF Supplemental Arrows-A
→ U+2800 - U+28FF Braille Patterns
→ U+FE70 - U+FEFF Arabic Presentation Forms-B
Unicode-Block: General-Punctuation U+2000(8192) – U+206F(8303) → Unicode.org chart U+2000(8192) – U+206F(8303) (PDF)
| x= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U+200x | | | | | | | | | | | | | | | | |
| U+201x | ‐ | ‑ | ‒ | – | — | ― | ‖ | ‗ | ‘ | ’ | ‚ | ‛ | “ | ” | „ | ‟ |
| U+202x | † | ‡ | • | ‣ | ․ | ‥ | … | ‧ | | | | | | | | |
| U+203x | ‰ | ‱ | ′ | ″ | ‴ | ‵ | ‶ | ‷ | ‸ | ‹ | › | ※ | ‼ | ‽ | ‾ | ‿ |
| U+204x | ⁀ | ⁁ | ⁂ | ⁃ | ⁄ | ⁅ | ⁆ | ⁇ | ⁈ | ⁉ | ⁊ | ⁋ | ⁌ | ⁍ | ⁎ | ⁏ |
| U+205x | ⁐ | ⁑ | ⁒ | ⁓ | ⁔ | ⁕ | ⁖ | ⁗ | ⁘ | ⁙ | ⁚ | ⁛ | ⁜ | ⁝ | ⁞ | |
| U+206x | | | | | | | | | | | | | | | | |

Empty cells stand for spaces and invisible format characters (for example U+2000 to U+200F) or for unassigned code points.
Unicode-Block: Letterlike-Symbols U+2100(8448) – U+214f(8527) → Unicode.org chart U+2100(8448) – U+214f(8527) (PDF)
| x= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U+210x | ℀ | ℁ | ℂ | ℃ | ℄ | ℅ | ℆ | ℇ | ℈ | ℉ | ℊ | ℋ | ℌ | ℍ | ℎ | ℏ |
| U+211x | ℐ | ℑ | ℒ | ℓ | ℔ | ℕ | № | ℗ | ℘ | ℙ | ℚ | ℛ | ℜ | ℝ | ℞ | ℟ |
| U+212x | ℠ | ℡ | ™ | ℣ | ℤ | ℥ | Ω | ℧ | ℨ | ℩ | K | Å | ℬ | ℭ | ℮ | ℯ |
| U+213x | ℰ | ℱ | Ⅎ | ℳ | ℴ | ℵ | ℶ | ℷ | ℸ | ℹ | ℺ | ℻ | ℼ | ℽ | ℾ | ℿ |
| U+214x | ⅀ | ⅁ | ⅂ | ⅃ | ⅄ | ⅅ | ⅆ | ⅇ | ⅈ | ⅉ | ⅊ | ⅋ | ⅌ | ⅍ | ⅎ | ⅏ |
Unicode-Block: Block-Elements U+2580 (9600) - U+259f (9631) → Unicode.org chart U+2580 (9600) - U+259f (9631) (PDF)
| x= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U+258x | ▀ | ▁ | ▂ | ▃ | ▄ | ▅ | ▆ | ▇ | █ | ▉ | ▊ | ▋ | ▌ | ▍ | ▎ | ▏ |
| U+259x | ▐ | ░ | ▒ | ▓ | ▔ | ▕ | ▖ | ▗ | ▘ | ▙ | ▚ | ▛ | ▜ | ▝ | ▞ | ▟ |
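Charts like the ones on this page, with 16 code points per row, can be generated from the first and last code point of a block. The following is a minimal sketch in Python; the function name `chart_rows` is invented for this illustration, and it prints every code point in the range without checking whether a character is actually assigned there.

```python
from typing import Iterator, List, Tuple

def chart_rows(first: int, last: int) -> Iterator[Tuple[str, List[str]]]:
    """Yield (row label, 16 cells) for each row of 16 code points."""
    for row_start in range(first & ~0xF, last + 1, 16):
        label = f"U+{row_start >> 4:X}x"
        cells = [chr(cp) if first <= cp <= last else " "
                 for cp in range(row_start, row_start + 16)]
        yield label, cells

# Print the Block Elements chart, U+2580 to U+259F.
print("| x= | " + " | ".join("0123456789abcdef") + " |")
print("|" + "---|" * 17)
for label, cells in chart_rows(0x2580, 0x259F):
    print(f"| {label} | " + " | ".join(cells) + " |")
```

Calling `chart_rows(0x25A0, 0x25FF)` or `chart_rows(0x2600, 0x26FF)` in the same way gives the Geometric Shapes and Miscellaneous Symbols charts.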
Unicode-Block: Geometric Shapes U+25a0 (9632) - U+25ff (9727) → Unicode.org chart U+25a0 (9632) - U+25ff (9727) (PDF)
Examples from U+25a0 (9632) - U+25ff (9727):
▲   ▲  
◄ ► ◄   ►
▼   ▼  
◤▴◥ ◤▴◥
◂◌▶ ◂◌▶
◣▾◢ ◣▾◢
| x= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U+25ax | ■ | □ | ▢ | ▣ | ▤ | ▥ | ▦ | ▧ | ▨ | ▩ | ▪ | ▫ | ▬ | ▭ | ▮ | ▯ |
| U+25bx | ▰ | ▱ | ▲ | △ | ▴ | ▵ | ▶ | ▷ | ▸ | ▹ | ► | ▻ | ▼ | ▽ | ▾ | ▿ |
| U+25cx | ◀ | ◁ | ◂ | ◃ | ◄ | ◅ | ◆ | ◇ | ◈ | ◉ | ◊ | ○ | ◌ | ◍ | ◎ | ● |
| U+25dx | ◐ | ◑ | ◒ | ◓ | ◔ | ◕ | ◖ | ◗ | ◘ | ◙ | ◚ | ◛ | ◜ | ◝ | ◞ | ◟ |
| U+25ex | ◠ | ◡ | ◢ | ◣ | ◤ | ◥ | ◦ | ◧ | ◨ | ◩ | ◪ | ◫ | ◬ | ◭ | ◮ | ◯ |
| U+25fx | ◰ | ◱ | ◲ | ◳ | ◴ | ◵ | ◶ | ◷ | ◸ | ◹ | ◺ | ◻ | ◼ | ◽ | ◾ | ◿ |
Unicode-Block: Miscellaneous-Symbols U+2600(9728) – U+26FF(9983) → Unicode.org chart U+2600(9728) – U+26FF(9983) (PDF)
Examples from U+2600(9728) – U+26FF(9983):
&#9728; = ☀,
&#9729; = ☁,
&#9730; = ☂,
&#9731; = ☃,
&#9833; = ♩,
&#9834; = ♪,
&#9835; = ♫,
&#9836; = ♬,
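The decimal number, the numeric reference and the official character name of such symbols can be looked up programmatically. The following is a minimal sketch, assuming Python's standard `unicodedata` module (whose data reflects the Unicode version bundled with the interpreter); the helper name `describe` is invented for this illustration.

```python
import unicodedata

def describe(char: str) -> str:
    """Return code point, decimal value, numeric reference and name."""
    cp = ord(char)
    name = unicodedata.name(char, "<unnamed>")
    return f"U+{cp:04X} ({cp}) &#{cp}; {name}"

for ch in "☀☁☂☃♩♪♫♬":
    print(describe(ch))
```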
| x= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U+260x | ☀ | ☁ | ☂ | ☃ | ☄ | ★ | ☆ | ☇ | ☈ | ☉ | ☊ | ☋ | ☌ | ☍ | ☎ | ☏ |
| U+261x | ☐ | ☑ | ☒ | ☓ | ☔ | ☕ | ☖ | ☗ | ☘ | ☙ | ☚ | ☛ | ☜ | ☝ | ☞ | ☟ |
| U+262x | ☠ | ☡ | ☢ | ☣ | ☤ | ☥ | ☦ | ☧ | ☨ | ☩ | ☪ | ☫ | ☬ | ☭ | ☮ | ☯ |
| U+263x | ☰ | ☱ | ☲ | ☳ | ☴ | ☵ | ☶ | ☷ | ☸ | ☹ | ☺ | ☻ | ☼ | ☽ | ☾ | ☿ |
| U+264x | ♀ | ♁ | ♂ | ♃ | ♄ | ♅ | ♆ | ♇ | ♈ | ♉ | ♊ | ♋ | ♌ | ♍ | ♎ | ♏ |
| U+265x | ♐ | ♑ | ♒ | ♓ | ♔ | ♕ | ♖ | ♗ | ♘ | ♙ | ♚ | ♛ | ♜ | ♝ | ♞ | ♟ |
| U+266x | ♠ | ♡ | ♢ | ♣ | ♤ | ♥ | ♦ | ♧ | ♨ | ♩ | ♪ | ♫ | ♬ | ♭ | ♮ | ♯ |
| U+267x | ♰ | ♱ | ♲ | ♳ | ♴ | ♵ | ♶ | ♷ | ♸ | ♹ | ♺ | ♻ | ♼ | ♽ | ♾ | ♿ |
Unicode-Block: Dingbats-Block U+2701 (9985) - U+27BE (10174) → Unicode.org chart Dingbats-Block U+2701 (9985) - U+27BE (10174) (PDF)
| Unicode number | Character | XHTML code | Description | Official name |
|---|---|---|---|---|
| U+2701 (9985) | ✁ | &#9985; | Scissors with upper blade | UPPER BLADE SCISSORS |
| U+2702 (9986) | ✂ | &#9986; | Black scissors | BLACK SCISSORS |
| U+2703 (9987) | ✃ | &#9987; | Scissors with lower blade | LOWER BLADE SCISSORS |
| U+2704 (9988) | ✄ | &#9988; | White scissors | WHITE SCISSORS |
| U+2706 (9990) | ✆ | &#9990; | Telephone location sign (compare U+2121 TELEPHONE SIGN ℡) | TELEPHONE LOCATION SIGN |
| U+2707 (9991) | ✇ | &#9991; | Tape drive | TAPE DRIVE |
| U+2708 (9992) | ✈ | &#9992; | Airplane, sign for an airport | AIRPLANE |
| U+2709 (9993) | ✉ | &#9993; | Envelope, sign for mail | ENVELOPE |
| U+270C (9996) | ✌ | &#9996; | Victory sign | VICTORY HAND |
| U+270D (9997) | ✍ | &#9997; | Writing hand, sign for handwritten text or authorship | WRITING HAND |
| U+270E (9998) | ✎ | &#9998; | Pencil pointing to the lower right | LOWER RIGHT PENCIL |
| U+270F (9999) | ✏ | &#9999; | Pencil | PENCIL |
| U+2710 (10000) | ✐ | &#10000; | Pencil pointing to the upper right | UPPER RIGHT PENCIL |
| U+2711 (10001) | ✑ | &#10001; | White nib | WHITE NIB |
| U+2712 (10002) | ✒ | &#10002; | Black nib | BLACK NIB |
| U+2713 (10003) | ✓ | &#10003; | Check mark (compare U+2611 BALLOT BOX WITH CHECK ☑) | CHECK MARK |
| U+2714 (10004) | ✔ | &#10004; | Heavy check mark | HEAVY CHECK MARK |
| U+2715 (10005) | ✕ | &#10005; | Cross used as a multiplication sign in mathematics (compare U+00D7 MULTIPLICATION SIGN × and U+2573 BOX DRAWINGS LIGHT DIAGONAL CROSS ╳) | MULTIPLICATION X |
| U+2716 (10006) | ✖ | &#10006; | Heavy cross used as a multiplication sign | HEAVY MULTIPLICATION X |
| U+2717 (10007) | ✗ | &#10007; | Ballot cross (compare U+2612 BALLOT BOX WITH X and U+2613 SALTIRE ☓) | BALLOT X |
| U+2718 (10008) | ✘ | &#10008; | Heavy ballot cross | HEAVY BALLOT X |
| U+2719 (10009) | ✙ | &#10009; | Outlined Greek cross | OUTLINED GREEK CROSS |
| U+271A (10010) | ✚ | &#10010; | Heavy Greek cross | HEAVY GREEK CROSS |
| U+271B (10011) | ✛ | &#10011; | Cross with open centre | OPEN CENTRE CROSS |
| U+271C (10012) | ✜ | &#10012; | Heavy cross with open centre | HEAVY OPEN CENTRE CROSS |
| U+271D (10013) | ✝ | ✝ |
Lateinisches Kreuz | LATIN CROSS |
| U+271E (10014) | ✞ | ✞ |
Schattiertes weißes lateinisches Kreuz | SHADOWED WHITE LATIN CROSS |
| U+271F (10015) | ✟ | ✟ |
Umrandetes lateinisches Kreuz | OUTLINED LATIN CROSS |
| U+2720 (10016) | ✠ | ✠ |
Malteserkreuz | MALTESE CROSS |
| U+2721 (10017) | ✡ | ✡ |
Davidstern | STAR OF DAVID |
| U+2722 (10018) | ✢ | ✢ |
Vierarmiges Tropfensternchen (Tropfenkreuz) | FOUR TEARDROP-SPOKED ASTERISK |
| U+2723 (10019) | ✣ | ✣ |
Vierarmiges Ballensternchen (Ballenkreuz) | FOUR BALLOON-SPOKED ASTERISK |
| U+2724 (10020) | ✤ | ✤ |
Fettes vierarmiges Ballensternchen | HEAVY FOUR BALLOON-SPOKED ASTERISK |
| U+2725 (10021) | ✥ | ✥ |
Kleeblattsternchen (Kleeblattkreuz) | FOUR CLUB-SPOKED ASTERISK |
| U+2726 (10022) | ✦ | ✦ |
Gefüllter vierzackiger Stern | BLACK FOUR POINTED STAR |
| U+2727 (10023) | ✧ | ✧ |
Weißer vierzackiger Stern | WHITE FOUR POINTED STAR |
| U+2729 (10025) | ✩ | ✩ |
Weißer fünfzackiger Stern | STRESS OUTLINED WHITE STAR |
| U+272A (10026) | ✪ | ✪ |
Weißer fünfzackiger Stern in gefülltem Kreis | CIRCLED WHITE STAR |
| U+272B (10027) | ✫ | ✫ |
Schwarzer fünfzackiger Stern mit offener Mitte | OPEN CENTRE BLACK STAR |
| U+272C (10028) | ✬ | ✬ |
Weißer fünfzackiger Stern mit schwarzer Mitte | BLACK CENTRE WHITE STAR |
| U+272D (10029) | ✭ | ✭ |
Schwarzer fünfzackiger Stern gefüllt, mit Kontur | OUTLINED BLACK STAR |
| U+272E (10030) | ✮ | ✮ |
Dicker schwarzer fünfzackiger Stern gefüllt, mit Kontur | HEAVY OUTLINED BLACK STAR |
| U+272F (10031) | ✯ | ✯ |
Fünfzackige Kompassrose | PINWHEEL STAR |
| U+2730 (10032) | ✰ | ✰ |
Weißer fünfzackiger Stern leer, mit Schatten | SHADOWED WHITE STAR |
| U+2731 (10033) | ✱ | ✱ |
Großes Sternchen | HEAVY ASTERISK |
| U+2732 (10034) | ✲ | ✲ |
Sternchen mit offener Mitte | OPEN CENTRE ASTERISK |
| U+2733 (10035) | ✳ | ✳ |
Achtarmiges Sternchen | EIGHT SPOKED ASTERISK |
| U+2734 (10036) | ✴ | ✴ |
Schwarzer achtzackiger Stern | EIGHT POINTED BLACK STAR |
| U+2735 (10037) | ✵ | ✵ |
Achtarmiges Windrädchen | EIGHT POINTED PINWHEEL STAR |
| U+2736 (10038) | ✶ | ✶ |
Sechszackiger schwarzer Stern | SIX POINTED BLACK STAR |
| U+2737 (10039) | ✷ | ✷ |
Geradlinig achtzackiger schwarzer Stern | EIGHT POINTED RECTILINEAR BLACK STAR |
| U+2738 (10040) | ✸ | ✸ |
Dicker geradlinig achtzackiger schwarzer Stern | HEAVY EIGHT POINTED RECTILINEAR BLACK STAR |
| U+2739 (10041) | ✹ | ✹ |
Schwarzer zwölfzackiger Stern | TWELVE POINTED BLACK STAR |
| U+273A (10042) | ✺ | ✺ |
Sechszehnarmiges Sternchen | SIXTEEN POINTED ASTERISK |
| U+273B (10043) | ✻ | ✻ |
Tropfensternchen | TEARDROP-SPOKED ASTERISK |
| U+273C (10044) | ✼ | ✼ |
Tropfensternchen mit offener Mitte | OPEN CENTRE TEARDROP-SPOKED ASTERISK |
| U+273D (10045) | ✽ | ✽ |
Dickes Tropfensternchen | HEAVY TEARDROP-SPOKED ASTERISK |
| U+273E (10046) | ✾ | ✾ |
Sechsblättrige Blüte, je schwarz und weiß | SIX PETALLED BLACK AND WHITE FLORETTE |
| U+273F (10047) | ✿ | ✿ |
Schwarze fünfblättrige Blüte | BLACK FLORETTE |
| U+2740 (10048) | ❀ | ❀ |
Weiße fünfblättrige Blüte | WHITE FLORETTE |
| U+2741 (10049) | ❁ | ❁ |
Achtblättrige umrandete schwarze Blüte | EIGHT PETALLED OUTLINED BLACK FLORETTE |
| U+2742 (10050) | ❂ | ❂ |
Achtzackiger Stern mit offener Mitte im Kreis | CIRCLED OPEN CENTRE EIGHT POINTED STAR |
| U+2743 (10051) | ❃ | ❃ |
Dickes tropfenförmiges Windrädchen | HEAVY TEARDROP-SPOKED PINWHEEL ASTERISK |
| U+2744 (10052) | ❄ | ❄ |
Schneeflocke | SNOWFLAKE |
| U+2745 (10053) | ❅ | ❅ |
Knapp gegabelte Schneeflocke | TIGHT TRIFOLIATE SNOWFLAKE |
| U+2746 (10054) | ❆ | ❆ |
Astige Schneeflocke mit dicken Winkeln | HEAVY CHEVRON SNOWFLAKE |
| U+2747 (10055) | ❇ | ❇ |
Funken | SPARKLE |
| U+2748 (10056) | ❈ | ❈ |
Dicker Funken | HEAVY SPARKLE |
| U+2749 (10057) | ❉ | ❉ |
Kugelsternchen | BALLOON-SPOKED ASTERISK |
| U+274A (10058) | ❊ | ❊ |
Propellersternchen aus acht Tropfen | EIGHT TEARDROP-SPOKED PROPELLER ASTERISK |
| U+274B (10059) | ❋ | ❋ |
Dickes Propellersternchen aus acht Tropfen | HEAVY EIGHT TEARDROP-SPOKED PROPELLER ASTERISK |
| U+274D (10061) | ❍ | ❍ |
Weißer Kreis nach rechts schattiert | SHADOWED WHITE CIRCLE |
| U+274F (10063) | ❏ | ❏ |
Weißes Quadrat unten rechts abgetrennt schattiert | LOWER RIGHT DROP-SHADOWED WHITE SQUARE |
| U+2750 (10064) | ❐ | ❐ |
Weißes Quadrat oben rechts abgetrennt schattiert | UPPER RIGHT DROP-SHADOWED WHITE SQUARE |
| U+2751 (10065) | ❑ | ❑ |
Weißes Quadrat nach unten rechts schattiert | LOWER RIGHT SHADOWED WHITE SQUARE |
| U+2752 (10066) | ❒ | ❒ |
Weißes Quadrat nach oben rechts schattiert | UPPER RIGHT SHADOWED WHITE SQUARE |
| U+2756 (10070) | ❖ | ❖ |
Schwarzes Karo ohne weißem X | BLACK DIAMOND MINUS WHITE X |
| U+2758 (10072) | ❘ | ❘ |
Dünner senkrechter Strich | LIGHT VERTICAL BAR |
| U+2759 (10073) | ❙ | ❙ |
Mittlererstarker senkrechter Strich | MEDIUM VERTICAL BAR |
| U+275A (10074) | ❚ | ❚ |
Dicker senkrechter Strich | HEAVY VERTICAL BAR |
| U+275B (10075) | ❛ | ❛ |
Dickes öffnendes halbes Anführungszeichen (englisch) | HEAVY SINGLE TURNED COMMA QUOTATION MARK ORNAMENT |
| U+275C (10076) | ❜ | ❜ |
Dickes schliessendes halbes Anführungszeichen (englisch) | HEAVY SINGLE COMMA QUOTATION MARK ORNAMENT |
| U+275D (10077) | ❝ | ❝ |
Dickes öffnendes Anführungszeichen (englisch) | HEAVY DOUBLE TURNED COMMA QUOTATION MARK ORNAMENT |
| U+275E (10078) | ❞ | ❞ |
Dickes schliessendes Anführungszeichen (englisch) | HEAVY DOUBLE COMMA QUOTATION MARK ORNAMENT |
| U+2761 (10081) | ❡ | ❡ |
Geschwungenes Absatzzeichen | CURVED STEM PARAGRAPH SIGN ORNAMENT |
| U+2762 (10082) | ❢ | ❢ |
Dickes geschwungenes Ausrufezeichen | HEAVY EXCLAMATION MARK ORNAMENT |
| U+2763 (10083) | ❣ | ❣ |
Dickes herzförmiges geschwungenes Ausrufezeichen | HEAVY HEART EXCLAMATION MARK ORNAMENT |
| U+2764 (10084) | ❤ | ❤ |
Dickes schwarzes Herz | HEAVY BLACK HEART |
| U+2765 (10085) | ❥ | ❥ |
Dickes schwarzes Herz, gegen den Uhrzeigersinn gedreht (Aufzählungszeichen) |
ROTATED HEAVY BLACK HEART BULLET |
| U+2766 (10086) | ❦ | ❦ |
Aldusblatt (wörtl. „florales Herz“) | FLORAL HEART |
| U+2767 (10087) | ❧ | ❧ |
Aldusblatt, gegen den Uhrzeigersinn gedreht (Aufzählungszeichen, wörtl. „gedrehtes florales Herz“, U+2619 REVERSED ROTATED FLORAL HEART BULLET ☙) |
ROTATED FLORAL HEART BULLET |
| U+2768 (10088) | ❨ | ❨ |
Öffnende runde Klammer | MEDIUM LEFT PARENTHESIS ORNAMENT |
| U+2769 (10089) | ❩ | ❩ |
Schließende runde Klammer | MEDIUM RIGHT PARENTHESIS ORNAMENT |
| U+276A (10090) | ❪ | ❪ |
Abgeflachte öffnende runde Klammer | MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT |
| U+276B (10091) | ❫ | ❫ |
Abgeflachte schließende runde Klammer | MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT |
| U+276C (10092) | ❬ | ❬ |
Öffnende Winkelklammer | MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT |
| U+276D (10093) | ❭ | ❭ |
Schließende Winkelklammern | MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT |
| U+276E (10094) | ❮ | ❮ |
Fettes linksweisendes einfaches Guillemet/Chevron (Anführungszeichen) | HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT |
| U+276F (10095) | ❯ | ❯ |
Fettes rechtsweisendes einfaches Guillemet/Chevron (Anführungszeichen) | HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT |
| U+2770 (10096) | ❰ | ❰ |
Fette öffnende Winkelklammer | HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT |
| U+2771 (10097) | ❱ | ❱ |
Fette schließende Winkelklammern | HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT |
| U+2772 (10098) | ❲ | ❲ |
Feine linke schildpattförmige (?) Zierklammer | LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT |
| U+2773 (10099) | ❳ | ❳ |
Feine rechte schildpattförmige (?) Zierklammer | LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT |
| U+2774 (10100) | ❴ | ❴ |
Öffnende geschweifte Klammer | MEDIUM LEFT CURLY BRACKET ORNAMENT |
| U+2775 (10101) | ❵ | ❵ |
Schließende geschweifte Klammer | MEDIUM RIGHT CURLY BRACKET ORNAMENT |
| U+2776 (10102) | ❶ | ❶ |
Eins im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT ONE |
| U+2777 (10103) | ❷ | ❷ |
Zwei im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT TWO |
| U+2778 (10104) | ❸ | ❸ |
Drei im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT THREE |
| U+2779 (10105) | ❹ | ❹ |
Vier im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT FOUR |
| U+277A (10106) | ❺ | ❺ |
Fünf im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT FIVE |
| U+277B (10107) | ❻ | ❻ |
Sechs im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT SIX |
| U+277C (10108) | ❼ | ❼ |
Sieben im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT SEVEN |
| U+277D (10109) | ❽ | ❽ |
Acht im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT EIGHT |
| U+277E (10110) | ❾ | ❾ |
Neun im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT NINE |
| U+277F (10111) | ❿ | ❿ |
Zehn im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED NUMBER TEN |
| U+2780 (10112) | ➀ | ➀ |
Eins im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT ONE |
| U+2781 (10113) | ➁ | ➁ |
Zwei im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT TWO |
| U+2782 (10114) | ➂ | ➂ |
Drei im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT THREE |
| U+2783 (10115) | ➃ | ➃ |
Vier im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT FOUR |
| U+2784 (10116) | ➄ | ➄ |
Fünf im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT FIVE |
| U+2785 (10117) | ➅ | ➅ |
Sechs im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT SIX |
| U+2786 (10118) | ➆ | ➆ |
Sieben im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT SEVEN |
| U+2787 (10119) | ➇ | ➇ |
Acht im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT EIGHT |
| U+2788 (10120) | ➈ | ➈ |
Neun im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT NINE |
| U+2789 (10121) | ➉ | ➉ |
Zehn im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF NUMBER TEN |
| U+278A (10122) | ➊ | ➊ |
Eins im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE |
| U+278B (10123) | ➋ | ➋ |
Zwei im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT TWO |
| U+278C (10124) | ➌ | ➌ |
Drei im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT THREE |
| U+278D (10125) | ➍ | ➍ |
Vier im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT FOUR |
| U+278E (10126) | ➎ | ➎ |
Fünf im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT FIVE |
| U+278F (10127) | ➏ | ➏ |
Sechs im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT SIX |
| U+2790 (10128) | ➐ | ➐ |
Sieben im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT SEVEN |
| U+2791 (10129) | ➑ | ➑ |
Acht im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT EIGHT |
| U+2792 (10130) | ➒ | ➒ |
Neun im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT NINE |
| U+2793 (10131) | ➓ | ➓ |
Zehn im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF NUMBER TEN |
| U+2794 (10132) | ➔ | ➔ |
Dicker Pfeil nach rechts mit breiter Spitze | HEAVY WIDE-HEADED RIGHTWARDS ARROW |
| U+2798 (10136) | ➘ | ➘ |
Dicker Pfeil nach Südost | HEAVY SOUTH EAST ARROW |
| U+2799 (10137) | ➙ | ➙ |
Dicker Pfeil nach rechts | HEAVY RIGHTWARDS ARROW |
| U+279A (10138) | ➚ | ➚ |
Dicker Pfeil nach Nordost | HEAVY NORTH EAST ARROW |
| U+279B (10139) | ➛ | ➛ |
Bemaßungspfeil nach rechts | DRAFTING POINT RIGHTWARDS ARROW |
| U+279C (10140) | ➜ | ➜ |
Dicker Pfeil nach rechts, mit abgerundeten Balken | HEAVY ROUND-TIPPED RIGHTWARDS ARROW |
| U+279D (10141) | ➝ | ➝ |
Pfeil nach rechts mit Dreiecksspitze | TRIANGLE-HEADED RIGHTWARDS ARROW |
| U+279E (10142) | ➞ | ➞ |
Dicker Pfeil nach rechts mit Dreiecksspitze | HEAVY TRIANGLE-HEADED RIGHTWARDS ARROW |
| U+279F (10143) | ➟ | ➟ |
Strichlierter Pfeil nach rechts mit Dreiecksspitze | DASHED TRIANGLE-HEADED RIGHTWARDS ARROW |
| U+27A0 (10144) | ➠ | ➠ |
Dicker strichlierter Pfeil nach rechts mit Dreiecksspitze | HEAVY DASHED TRIANGLE-HEADED RIGHTWARDS ARROW |
| U+27A1 (10145) | ➡ | ➡ |
Schwarzer Pfeil nach rechts | BLACK RIGHTWARDS ARROW |
| U+27A2 (10146) | ➢ | ➢ |
Dreidimensionaler Pfeil nach rechts, oben weiß | THREE-D TOP-LIGHTED RIGHTWARDS ARROWHEAD |
| U+27A3 (10147) | ➣ | ➣ |
Dreidimensionaler Pfeil nach rechts, unten weiß | THREE-D BOTTOM-LIGHTED RIGHTWARDS ARROWHEAD |
| U+27A4 (10148) | ➤ | ➤ |
Schwarze Pfeilspitze nach rechts | BLACK RIGHTWARDS ARROWHEAD |
| U+27A5 (10149) | ➥ | ➥ |
Dicker schwarzer Pfeil, nach unten und rechts gebogen | HEAVY BLACK CURVED DOWNWARDS AND RIGHTWARDS ARROW |
| U+27A6 (10150) | ➦ | ➦ |
Dicker schwarzer Pfeil, nach oben und rechts gebogen | HEAVY BLACK CURVED UPWARDS AND RIGHTWARDS ARROW |
| U+27A7 (10151) | ➧ | ➧ |
Gestauchter schwarzer Pfeil nach rechts | SQUAT BLACK RIGHTWARDS ARROW |
| U+27A8 (10152) | ➨ | ➨ |
Dicker konkavspitzer schwarzer Pfeil nach rechts | HEAVY CONCAVE-POINTED BLACK RIGHTWARDS ARROW |
| U+27A9 (10153) | ➩ | ➩ |
Weißer Pfeil nach rechts mit Rechtsschatten | RIGHT-SHADED WHITE RIGHTWARDS ARROW |
| U+27AA (10154) | ➪ | ➪ |
Weißer Pfeil nach rechts mit Linksschatten | LEFT-SHADED WHITE RIGHTWARDS ARROW |
| U+27AB (10155) | ➫ | ➫ |
Nach hinten gekippter Pfeil nach rechts mit Schatten | BACK-TILTED SHADOWED WHITE RIGHTWARDS ARROW |
| U+27AC (10156) | ➬ | ➬ |
Nach vorn gekippter Pfeil nach rechts mit Schatten | FRONT-TILTED SHADOWED WHITE RIGHTWARDS ARROW |
| U+27AD (10157) | ➭ | ➭ |
Dicker weißer Pfeil nach rechts mit Schatten rechts unten | HEAVY LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW |
| U+27AE (10158) | ➮ | ➮ |
Dicker weißer Pfeil nach rechts mit Schatten rechts oben | HEAVY UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW |
| U+27AF (10159) | ➯ | ➯ |
Gekerbter Pfeil nach rechts mit Schatten rechts unten | NOTCHED LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW |
| U+27B1 (10161) | ➱ | ➱ |
Gekerbter Pfeil nach rechts mit Schatten rechts oben | NOTCHED UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW |
| U+27B2 (10162) | ➲ | ➲ |
Dicker weißer Pfeil nach rechts im Kreis | CIRCLED HEAVY WHITE RIGHTWARDS ARROW |
| U+27B3 (10163) | ➳ | ➳ |
Weiß-gefiederter Pfeil nach rechts | WHITE-FEATHERED RIGHTWARDS ARROW |
| U+27B4 (10164) | ➴ | ➴ |
Schwarz-gefiederter Pfeil nach Südosten | BLACK-FEATHERED SOUTH EAST ARROW |
| U+27B5 (10165) | ➵ | ➵ |
Schwarz-gefiederter Pfeil nach rechts | BLACK-FEATHERED RIGHTWARDS ARROW |
| U+27B6 (10166) | ➶ | ➶ |
Schwarz-gefiederter Pfeil nach Nordosten | BLACK-FEATHERED NORTH EAST ARROW |
| U+27B7 (10167) | ➷ | ➷ |
Dicker schwarz-gefiederter Pfeil nach Südosten | HEAVY BLACK-FEATHERED SOUTH EAST ARROW |
| U+27B8 (10168) | ➸ | ➸ |
Dicker schwarz-gefiederter Pfeil nach rechts | HEAVY BLACK-FEATHERED RIGHTWARDS ARROW |
| U+27B9 (10169) | ➹ | ➹ |
Dicker schwarz-gefiederter Pfeil nach Nordosten | HEAVY BLACK-FEATHERED NORTH EAST ARROW |
| U+27BA (10170) | ➺ | ➺ |
Pfeil nach rechts mit tropfenförmigen Widerhaken | TEARDROP-BARBED RIGHTWARDS ARROW |
| U+27BB (10171) | ➻ | ➻ |
Dicker Pfeil nach rechts mit tropfenförmigem Schaft | HEAVY TEARDROP-SHANKED RIGHTWARDS ARROW |
| U+27BC (10172) | ➼ | ➼ |
Keilschwänziger Pfeil nach rechts | WEDGE-TAILED RIGHTWARDS ARROW |
| U+27BD (10173) | ➽ | ➽ |
Dicker keilschwänziger Pfeil nach rechts | HEAVY WEDGE-TAILED RIGHTWARDS ARROW |
| U+27BE (10174) | ➾ | ➾ |
Offen konturierter Pfeil nach rechts | OPEN-OUTLINED RIGHTWARDS ARROW |
Unicode-Block: CJK-Symbols and Punctuation U+3000 (12288) – U+303f(12351) → Unicode.org chart U+3000 (12288) – U+303f(12351) (PDF), → Unicode.org chart U+3000 (12288) – U+303f(12351)
Unicode CJK Symbols and Punctuation U+3000 (12288) – U+303f(12351)
| x= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U+300x | 　 | 、 | 。 | 〃 | 〄 | 々 | 〆 | 〇 | 〈 | 〉 | 《 | 》 | 「 | 」 | 『 | 』 |
| U+301x | 【 | 】 | 〒 | 〓 | 〔 | 〕 | 〖 | 〗 | 〘 | 〙 | 〚 | 〛 | 〜 | 〝 | 〞 | 〟 |
| U+302x | 〠 | 〡 | 〢 | 〣 | 〤 | 〥 | 〦 | 〧 | 〨 | 〩 | 〪 | 〫 | 〬 | 〭 | 〮 | 〯 |
| U+303x | 〰 | 〱 | 〲 | 〳 | 〴 | 〵 | 〶 | 〷 | 〸 | 〹 | 〺 | 〻 | 〼 | 〽 | 〾 | 〿 |
Many different scripts are needed to write the world's languages. India's constitution alone, for example, currently lists 22 officially recognized languages. An XHTML page can declare the character encoding it uses, and XHTML supports meta information that helps the browser display Unicode characters. How are these scripts supported in an XHTML page?
What does the typical structure of an XHTML page (with meta information) look like? An XHTML page is a hierarchically structured document. XHTML supports character encodings that serve to represent Unicode characters. An XHTML page that begins with an XML declaration (first line) typically has the following structure:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="de" xml:lang="de">
<head>
  <title> mein-titel </title>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <meta http-equiv="Content-Script-Type" content="text/javascript" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta http-equiv="expires" content="0" />
  <!-- load the page from its original address after 5 seconds -->
  <meta http-equiv="refresh" content="5; URL=http://www.fh-giessen.de/~hg54/" />
  <meta name="author" content="mein Name" />
  <meta name="copyright" content="besitze alle Rechte" />
  <meta name="description" content="Kurzbeschreibung" />
  <meta name="keywords" content="Inhalt-Schlüsselworte" />
  <link type='text/css' rel='stylesheet' href='html-standard.css' />
</head>
<body> ... </body>
</html>
How are XHTML's own markup characters escaped? The following characters have a special ("switching") meaning in XHTML and must be escaped whenever the character itself is to be displayed on an HTML page.
| Character | Meaning | Named entity | Numeric reference |
|---|---|---|---|
| & | ampersand | &amp; | &#38; |
| < | less than | &lt; | &#60; |
| > | greater than | &gt; | &#62; |
| " | quotation mark | &quot; | &#34; |
| ' | apostrophe (XML) | &apos; | &#39; |
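The escaping rule from the table can be sketched as a small ECMAScript function. The names `XML_ESCAPES` and `escapeXml` are hypothetical; the apostrophe is written as `&#39;` here on the assumption that the output may also be read by HTML parsers that do not know `&apos;`.

```javascript
// Hypothetical lookup table: markup character -> escaped form.
var XML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;"
};

// Replaces every markup character in one pass, so an "&" that is
// produced by an earlier replacement is never escaped a second time.
function escapeXml(text) {
  return String(text).replace(/[&<>"']/g, function (c) {
    return XML_ESCAPES[c];
  });
}

console.log(escapeXml('<a href="x">Tom & Jerry</a>'));
```

Doing the replacement in a single pass avoids the classic mistake of escaping `<` first and `&` afterwards, which would turn `&lt;` into `&amp;lt;`.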
A single-byte encoding (loosely called "ANSI" encoding, for example charset=iso-8859-1) permits only the characters that its code page contains. This limits international use.
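What are named entities ("named characters")? Some examples, each followed by the character it displays:
&amp; (&)
&alpha; (α)
&sum; (∑)
&lceil; (⌈)
&uarr; (↑)
&bull; (•)
&permil; (‰)
&mdash; (—)
&ndash; (–)
&copy; (©)
&reg; (®)
&cent; (¢)
&pound; (£)
&brvbar; (¦)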
How can German umlauts be written as named entities in XHTML?
HTML can use Unicode (the ISO 10646 standard and the Unicode standard define the same character repertoire), which contains the German umlauts and the sharp s. The euro sign ( € ), for example, can be written in HTML source text as &euro;. If the character encoding of a page is not declared, German umlauts have to be replaced by named and/or numeric character references.
| Character | ä | Ä | ö | Ö | ü | Ü | ß |
|---|---|---|---|---|---|---|---|
| HTML replacement name (named entity) | &auml; | &Auml; | &ouml; | &Ouml; | &uuml; | &Uuml; | &szlig; |
Example of HTML source text with named entities:
XHTML source text: Fr&auml;ulein M&uuml;llers m&ouml;chte einen Ku&szlig;.
Browser display: Fräulein Müllers möchte einen Kuß.
To replace characters in a string var str;
the built-in ECMAScript method
str = str.replace(FindRe, Replstring)
can be used.
Some example pairs of /RegExpr/ and "ReplacementStr" follow; the g flag makes replace() substitute every occurrence instead of only the first one:
/ä/g replace with "&auml;", /Ä/g replace with "&Auml;", /ö/g replace with "&ouml;", /Ö/g replace with "&Ouml;", /ü/g replace with "&uuml;", /Ü/g replace with "&Uuml;", /ß/g replace with "&szlig;"
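The replacement pairs can also be applied in a single call to replace() with a lookup table. This is a minimal sketch; the names `UMLAUT_ENTITIES` and `umlautsToEntities` are hypothetical.

```javascript
// Hypothetical lookup table: German special character -> named entity.
var UMLAUT_ENTITIES = {
  "\u00e4": "&auml;", "\u00c4": "&Auml;",
  "\u00f6": "&ouml;", "\u00d6": "&Ouml;",
  "\u00fc": "&uuml;", "\u00dc": "&Uuml;",
  "\u00df": "&szlig;"
};

// One regular expression with a character class and the g flag
// replaces all seven characters in a single pass over the string.
function umlautsToEntities(str) {
  return str.replace(/[\u00e4\u00c4\u00f6\u00d6\u00fc\u00dc\u00df]/g, function (c) {
    return UMLAUT_ENTITIES[c];
  });
}

console.log(umlautsToEntities("Fr\u00e4ulein M\u00fcllers m\u00f6chte einen Ku\u00df."));
```

The characters are written as \u escapes so that the script itself stays pure ASCII and does not depend on the encoding in which the file is saved.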
The files lat1.ent, symbol.ent and special.ent are part of the XHTML DTD (document type definition, a kind of schema definition that is referenced through DOCTYPE):
The XHTML DTD xhtml1-....dtd includes the files lat1.ent, symbol.ent and special.ent, which contain the "ENTITIES" character definitions. These named and numeric character references can be used in XHTML. Not all characters are currently supported by all browsers (see the file lat1.ent ).
| ENTITIES Latin 1 for XHTML (lat1.ent) | ||||||||
|---|---|---|---|---|---|---|---|---|
|   | | ¡ | ¡ | ¡ | ¢ | ¢ | ¢ | |
| £ | £ | £ | ¤ | ¤ | ¤ | ¥ | ¥ | ¥ |
| ¦ | ¦ | ¦ | § | § | § | ¨ | ¨ | ¨ |
| © | © | © | ª | ª | ª | « | « | « |
| ¬ | ¬ | ¬ | | ­ | ­ | ® | ® | ® |
| ¯ | ¯ | ¯ | ° | ° | ° | ± | ± | ± |
| ² | ² | ² | ³ | ³ | ³ | ´ | ´ | ´ |
| µ | µ | µ | ¶ | ¶ | ¶ | · | · | · |
| ¸ | ¸ | ¸ | ¹ | ¹ | ¹ | º | º | º |
| » | » | » | ¼ | ¼ | ¼ | ½ | ½ | ½ |
| ¾ | ¾ | ¾ | ¿ | ¿ | ¿ | À | À | À |
| Á | Á | Á | Â | Â | Â | Ã | Ã | Ã |
| Ä | Ä | Ä | Å | Å | Å | Æ | Æ | Æ |
| Ç | Ç | Ç | È | È | È | É | É | É |
| Ê | Ê | Ê | Ë | Ë | Ë | Ì | Ì | Ì |
| Í | Í | Í | Î | Î | Î | Ï | Ï | Ï |
| Ð | Ð | Ð | Ñ | Ñ | Ñ | Ò | Ò | Ò |
| Ó | Ó | Ó | Ô | Ô | Ô | Õ | Õ | Õ |
| Ö | Ö | Ö | × | × | × | Ø | Ø | Ø |
| Ù | Ù | Ù | Ú | Ú | Ú | Û | Û | Û |
| Ü | Ü | Ü | Ý | Ý | Ý | Þ | Þ | Þ |
| ß | ß | ß | à | à | à | á | á | á |
| â | â | â | ã | ã | ã | ä | ä | ä |
| å | å | å | æ | æ | æ | ç | ç | ç |
| è | è | è | é | é | é | ê | ê | ê |
| ë | ë | ë | ì | ì | ì | í | í | í |
| î | î | î | ï | ï | ï | ð | ð | ð |
| ñ | ñ | ñ | ò | ò | ò | ó | ó | ó |
| ô | ô | ô | õ | õ | õ | ö | ö | ö |
| ÷ | ÷ | ÷ | ø | ø | ø | ù | ù | ù |
| ú | ú | ú | û | û | û | ü | ü | ü |
| ý | ý | ý | þ | þ | þ | ÿ | ÿ | ÿ |
xhtml1-....dtd includes the files lat1.ent, symbol.ent and special.ent, which contain the "ENTITIES" character definitions. These named and numeric character references can be used in XHTML (see symbol.ent ). Not all characters are currently supported by all browsers.
| ENTITIES Symbols for XHTML (symbol.ent) | ||||||||
|---|---|---|---|---|---|---|---|---|
| ƒ | ƒ | ƒ | Α | Α | Α | Β | Β | Β |
| Γ | Γ | Γ | Δ | Δ | Δ | Ε | Ε | Ε |
| Ζ | Ζ | Ζ | Η | Η | Η | Θ | Θ | Θ |
| Ι | Ι | Ι | Κ | Κ | Κ | Λ | Λ | Λ |
| Μ | Μ | Μ | Ν | Ν | Ν | Ξ | Ξ | Ξ |
| Ο | Ο | Ο | Π | Π | Π | Ρ | Ρ | Ρ |
| Σ | Σ | Σ | Τ | Τ | Τ | Υ | Υ | Υ |
| Φ | Φ | Φ | Χ | Χ | Χ | Ψ | Ψ | Ψ |
| Ω | Ω | Ω | α | α | α | β | β | β |
| γ | γ | γ | δ | δ | δ | ε | ε | ε |
| ζ | ζ | ζ | η | η | η | θ | θ | θ |
| ι | ι | ι | κ | κ | κ | λ | λ | λ |
| μ | μ | μ | ν | ν | ν | ξ | ξ | ξ |
| ο | ο | ο | π | π | π | ρ | ρ | ρ |
| ς | ς | ς | σ | σ | σ | τ | τ | τ |
| υ | υ | υ | φ | φ | φ | χ | χ | χ |
| ψ | ψ | ψ | ω | ω | ω | ϑ | ϑ | ϑ |
| ϒ | ϒ | ϒ | ϖ | ϖ | ϖ | • | • | • |
| … | … | … | ′ | ′ | ′ | ″ | ″ | ″ |
| ‾ | ‾ | ‾ | ⁄ | ⁄ | ⁄ | ℘ | ℘ | ℘ |
| ℑ | ℑ | ℑ | ℜ | ℜ | ℜ | ™ | ™ | ™ |
| ℵ | ℵ | ℵ | ← | ← | ← | ↑ | ↑ | ↑ |
| → | → | → | ↓ | ↓ | ↓ | ↔ | ↔ | ↔ |
| ↵ | ↵ | ↵ | ⇐ | ⇐ | ⇐ | ⇑ | ⇑ | ⇑ |
| ⇒ | ⇒ | ⇒ | ⇓ | ⇓ | ⇓ | ⇔ | ⇔ | ⇔ |
| ∀ | ∀ | ∀ | ∂ | ∂ | ∂ | ∃ | ∃ | ∃ |
| ∅ | ∅ | ∅ | ∇ | ∇ | ∇ | ∈ | ∈ | ∈ |
| ∉ | ∉ | ∉ | ∋ | ∋ | ∋ | ∏ | ∏ | ∏ |
| ∑ | ∑ | ∑ | − | − | − | ∗ | ∗ | ∗ |
| √ | √ | √ | ∝ | ∝ | ∝ | ∞ | ∞ | ∞ |
| ∠ | ∠ | ∠ | ∧ | ∧ | ∧ | ∨ | ∨ | ∨ |
| ∩ | ∩ | ∩ | ∪ | ∪ | ∪ | ∫ | ∫ | ∫ |
| ∴ | ∴ | ∴ | ∼ | ∼ | ∼ | ≅ | ≅ | ≅ |
| ≈ | ≈ | ≈ | ≠ | ≠ | ≠ | ≡ | ≡ | ≡ |
| ≤ | ≤ | ≤ | ≥ | ≥ | ≥ | ⊂ | ⊂ | ⊂ |
| ⊃ | ⊃ | ⊃ | ⊄ | ⊄ | ⊄ | ⊆ | ⊆ | ⊆ |
| ⊇ | ⊇ | ⊇ | ⊕ | ⊕ | ⊕ | ⊗ | ⊗ | ⊗ |
| ⊥ | ⊥ | ⊥ | ⋅ | ⋅ | ⋅ | ⌈ | ⌈ | ⌈ |
| ⌉ | ⌉ | ⌉ | ⌊ | ⌊ | ⌊ | ⌋ | ⌋ | ⌋ |
| 〈 | 〈 | ⟨ | 〉 | 〉 | ⟩ | ◊ | ◊ | ◊ |
| ♠ | ♠ | ♠ | ♣ | ♣ | ♣ | ♥ | ♥ | ♥ |
| ♦ | ♦ | ♦ | ||||||
xhtml1-....dtd includes the files lat1.ent, symbol.ent and special.ent, which contain the "ENTITIES" character definitions. These named and numeric character references can be used in XHTML (see the file special.ent ). Not all characters are currently supported by all browsers.
| XHTML (special.ent) | ||||||||
|---|---|---|---|---|---|---|---|---|
| " | " | " | & | & | & | < | < | < |
| > | > | > | ' | ' | ' | Œ | Œ | Œ |
| œ | œ | œ | Š | Š | Š | š | š | š |
| Ÿ | Ÿ | Ÿ | ˆ | ˆ | ˆ | ˜ | ˜ | ˜ |
|   |   |   |   |   |   | |||
| | ‌ | ‌ | | ‍ | ‍ | | ‎ | ‎ |
| | ‏ | ‏ | – | – | – | — | — | — |
| ‘ | ‘ | ‘ | ’ | ’ | ’ | ‚ | ‚ | ‚ |
| “ | “ | “ | ” | ” | ” | „ | „ | „ |
| † | † | † | ‡ | ‡ | ‡ | ‰ | ‰ | ‰ |
| ‹ | ‹ | ‹ | › | › | › | € | € | € |
ECMAScript (standard ECMA-262) is commonly called JavaScript and is a standardized scripting language
(modern, lean, dynamically typed, object-oriented on the basis of prototypes rather than classic classes;
it can be used in a procedural, functional or object-oriented style, for example for DOM scripting in web browsers).
The following ECMAScript program builds the table of numeric and named character references
from the XHTML lat1 characters
with the help of unescape("%"+j.toString(16));
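function build_html_zeichen_tabelle()
{ // entity names from lat1.ent for the code points 160 to 255 (start_idx=160)
  var aa=["nbsp","iexcl","cent","pound","curren","yen","brvbar","sect",
    "uml","copy","ordf","laquo","not","shy","reg","macr",
    "deg","plusmn","sup2","sup3","acute","micro","para","middot",
    "cedil","sup1","ordm","raquo","frac14","frac12","frac34","iquest",
    "Agrave","Aacute","Acirc","Atilde","Auml","Aring","AElig","Ccedil",
    "Egrave","Eacute","Ecirc","Euml","Igrave","Iacute","Icirc","Iuml",
    "ETH","Ntilde","Ograve","Oacute","Ocirc","Otilde","Ouml","times",
    "Oslash","Ugrave","Uacute","Ucirc","Uuml","Yacute","THORN","szlig",
    "agrave","aacute","acirc","atilde","auml","aring","aelig","ccedil",
    "egrave","eacute","ecirc","euml","igrave","iacute","icirc","iuml",
    "eth","ntilde","ograve","oacute","ocirc","otilde","ouml","divide",
    "oslash","ugrave","uacute","ucirc","uuml","yacute","thorn","yuml"];
  var j, start_idx=160, s="";
  for(var i=0;i< aa.length; i++){
    j = start_idx + i;
    // one line per character: decoded character, numeric reference, named entity
    s +="<br />\"" + unescape("%"+j.toString(16));
    s += "\", \"" + "&#"+j+";";
    s += "\", \"" + "&"+aa[i]+";";
    s += "\",";
  }
  // the browser resolves the references, so all three columns show the same glyph
  document.write(s);
}
build_html_zeichen_tabelle();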
The letter A can be written as a numeric character reference in XHTML/XML
as &#65;.
The built-in ECMAScript function unescape() decodes %XX escape sequences into the corresponding characters. It is deprecated in current ECMAScript editions but still widely available in browsers.
The following ECMAScript code uses unescape()
to display each character next to its numeric character reference (such as &#65;).
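<textarea id="SRC" cols="90" rows="22">
</textarea>
<script type="text/javascript">
// Build the source text of a table with ten code points per row.
// Each code point gets two cells: the decoded character and its numeric reference.
var s="<table border='1'>";
for (var i=0; i < 256; i++) {
  if((i%10)==0) {
    if(i>0) s += "</tr>";
    s += "\n<tr><th>" + i + ":</th>";
  }
  // pad to two hex digits: unescape() leaves "%0" to "%f" undecoded
  var hex = (i < 16 ? "0" : "") + i.toString(16);
  s += "<td>"+unescape("%"+hex)+"</td>";
  s += "<td>&#"+i+";</td>";
}
s += "</tr></table>";
document.getElementById("SRC").value=s;
</script>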
| Output of unescape("%"+i.toString(16)) and &#i | |||||||||||||||||||||
| i | % | &# | % | &# | % | &# | % | &# | % | &# | % | &# | % | &# | % | &# | % | &# | % | &# | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0: | %0 | � | %1 |  | %2 |  | %3 |  | %4 |  | %5 |  | %6 |  | %7 |  | %8 |  | %9 | 	 | |
| 10: | %a | 
 | %b |  | %c |  | %d | 
 | %e |  | %f |  |  |  |  |  | |||||
| 20: |  |  |  |  |  |  |  |  |  |  | |||||||||||
| 30: |  |  |   | ! | ! | " | " | # | # | $ | $ | % | % | & | & | ' | ' | ||||
| 40: | ( | ( | ) | ) | * | * | + | + | , | , | - | - | . | . | / | / | 0 | 0 | 1 | 1 | |
| 50: | 2 | 2 | 3 | 3 | 4 | 4 | 5 | 5 | 6 | 6 | 7 | 7 | 8 | 8 | 9 | 9 | : | : | ; | ; | |
| 60: | < | < | = | = | > | > | ? | ? | @ | @ | A | A | B | B | C | C | D | D | E | E | |
| 70: | F | F | G | G | H | H | I | I | J | J | K | K | L | L | M | M | N | N | O | O | |
| 80: | P | P | Q | Q | R | R | S | S | T | T | U | U | V | V | W | W | X | X | Y | Y | |
| 90: | Z | Z | [ | [ | \ | \ | ] | ] | ^ | ^ | _ | _ | ` | ` | a | a | b | b | c | c | |
| 100: | d | d | e | e | f | f | g | g | h | h | i | i | j | j | k | k | l | l | m | m | |
| 110: | n | n | o | o | p | p | q | q | r | r | s | s | t | t | u | u | v | v | w | w | |
| 120: | x | x | y | y | z | z | { | { | | | | | } | } | ~ | ~ |  | € |  | ||||
| 130: | ‚ | ƒ | „ | … | † | ‡ | ˆ | ‰ | Š | ‹ | |||||||||||
| 140: | Œ |  | Ž |  |  | ‘ | ’ | “ | ” | • | |||||||||||
| 150: | – | — | ˜ | ™ | š | › | œ |  | ž | Ÿ | |||||||||||
| 160: |   | ¡ | ¡ | ¢ | ¢ | £ | £ | ¤ | ¤ | ¥ | ¥ | ¦ | ¦ | § | § | ¨ | ¨ | © | © | ||
| 170: | ª | ª | « | « | ¬ | ¬ | | ­ | ® | ® | ¯ | ¯ | ° | ° | ± | ± | ² | ² | ³ | ³ | |
| 180: | ´ | ´ | µ | µ | ¶ | ¶ | · | · | ¸ | ¸ | ¹ | ¹ | º | º | » | » | ¼ | ¼ | ½ | ½ | |
| 190: | ¾ | ¾ | ¿ | ¿ | À | À | Á | Á | Â | Â | Ã | Ã | Ä | Ä | Å | Å | Æ | Æ | Ç | Ç | |
| 200: | È | È | É | É | Ê | Ê | Ë | Ë | Ì | Ì | Í | Í | Î | Î | Ï | Ï | Ð | Ð | Ñ | Ñ | |
| 210: | Ò | Ò | Ó | Ó | Ô | Ô | Õ | Õ | Ö | Ö | × | × | Ø | Ø | Ù | Ù | Ú | Ú | Û | Û | |
| 220: | Ü | Ü | Ý | Ý | Þ | Þ | ß | ß | à | à | á | á | â | â | ã | ã | ä | ä | å | å | |
| 230: | æ | æ | ç | ç | è | è | é | é | ê | ê | ë | ë | ì | ì | í | í | î | î | ï | ï | |
| 240: | ð | ð | ñ | ñ | ò | ò | ó | ó | ô | ô | õ | õ | ö | ö | ÷ | ÷ | ø | ø | ù | ù | |
| 250: | ú | ú | û | û | ü | ü | ý | ý | þ | þ | ÿ | ÿ | |||||||||
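The opposite direction, turning numeric character references back into characters, can be sketched with String.fromCodePoint(), which is not limited to code points below 256. The function name `decodeNumericReferences` is hypothetical. Note one deliberate difference from browsers: HTML parsers remap the references &#128; to &#159; to Windows-1252 characters (which is why rows 128 to 159 of the table show characters such as € and ‚), whereas this sketch returns the code point exactly as written.

```javascript
// Hypothetical helper: decodes decimal (&#65;) and hexadecimal (&#x41;)
// numeric character references. References outside the Unicode range
// or inside the surrogate range are left untouched.
function decodeNumericReferences(text) {
  return text.replace(/&#(x[0-9a-fA-F]+|[0-9]+);/g, function (match, body) {
    var cp = body.charAt(0) === "x"
      ? parseInt(body.slice(1), 16)
      : parseInt(body, 10);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      return match;
    }
    return String.fromCodePoint(cp);
  });
}

console.log(decodeNumericReferences("&#65; costs 5 &#x20AC; or 5 &#8364;"));
```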
XML documents can have a document type definition (DTD, referenced through the DOCTYPE declaration), which is a set of rules for documents. A DTD defines the valid structure of a document, that is, the order and nesting of the elements and the permitted kinds of attribute values.
In XML, the tags and the associated .dtd files (or .xsd files) are designed by the user to fit the data structures. Numeric character references can be used directly. For named character references, the "ENTITIES" character definitions have to be written first, because XML itself predefines no short names for Unicode characters apart from &amp;, &lt;, &gt;, &quot; and &apos;.
Whitespace characters play an important role
in any character system.
In XML,
data whose whitespace has to be preserved,
and the conversion (transformation) of such data, need special attention (for example the so-called non-printable characters).
Most Unicode characters that XML permits lie in the ranges
#x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF.
Some values are not allowed, for example #xFFFE and #xFFFF.
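These ranges can be checked in code. The sketch below follows the Char production of XML 1.0, which additionally allows tab (#x9), line feed (#xA) and carriage return (#xD); the function names `isXmlChar` and `findInvalidXmlChars` are hypothetical.

```javascript
// True if the code point may appear in an XML 1.0 document.
function isXmlChar(cp) {
  return cp === 0x9 || cp === 0xA || cp === 0xD ||
    (cp >= 0x20 && cp <= 0xD7FF) ||
    (cp >= 0xE000 && cp <= 0xFFFD) ||
    (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Returns the code points of a string that XML 1.0 does not allow.
// for...of iterates by code point, so surrogate pairs stay together.
function findInvalidXmlChars(text) {
  var invalid = [];
  for (var ch of text) {
    var cp = ch.codePointAt(0);
    if (!isXmlChar(cp)) {
      invalid.push(cp);
    }
  }
  return invalid;
}

console.log(findInvalidXmlChars("ok\u0000 and \uFFFF"));
```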
Whitespace characters (some of them control characters) are, for example:
tab character (#x09)
line feed character (#x0a)
carriage return character (#x0d)
ordinary space (#x20)
Unicode "blank characters" are:
0009 = HT
000a = LF
000b = VT
000c = FF
000d = CR
0020 = space
0085 = next line
00a0 = non-breaking space
1680 = Ogham space mark
180e = Mongolian vowel separator
2000 - 200b = spaces of different sizes, including zero
2028 = line separator
2029 = paragraph separator
202f = narrow no-break space
205f = medium mathematical space
3000 = ideographic space
feff = zero-width no-break space
Hier eine Tabelle für Unicode-Separatoren (Space):
| Symbol | Hex code | Decimal code | Unicode name | Block | Version |
|---|---|---|---|---|---|
| A B | 0020 | 32 | SPACE | Basic Latin | 2.1 |
| A B | 00A0 | 160 | NO-BREAK SPACE | Latin-1 Supplement | 2.1 |
| A B | 2000 | 8192 | EN QUAD | General Punctuation | 2.1 |
| A B | 2001 | 8193 | EM QUAD | General Punctuation | 2.1 |
| A B | 2002 | 8194 | EN SPACE | General Punctuation | 2.1 |
| A B | 2003 | 8195 | EM SPACE | General Punctuation | 2.1 |
| A B | 2004 | 8196 | THREE-PER-EM SPACE | General Punctuation | 2.1 |
| A B | 2005 | 8197 | FOUR-PER-EM SPACE | General Punctuation | 2.1 |
| A B | 2006 | 8198 | SIX-PER-EM SPACE | General Punctuation | 2.1 |
| A B | 2007 | 8199 | FIGURE SPACE | General Punctuation | 2.1 |
| A B | 2008 | 8200 | PUNCTUATION SPACE | General Punctuation | 2.1 |
| A B | 2009 | 8201 | THIN SPACE | General Punctuation | 2.1 |
| A B | 200A | 8202 | HAIR SPACE | General Punctuation | 2.1 |
| A B | 3000 | 12288 | IDEOGRAPHIC SPACE | CJK Symbols and Punctuation | 2.1 |
| A B | 202F | 8239 | NARROW NO-BREAK SPACE | General Punctuation | 3.0 |
| A B | 1680 | 5760 | OGHAM SPACE MARK | Ogham | 3.0 |
| AB | 180E | 6158 | MONGOLIAN VOWEL SEPARATOR | Mongolian | 3.0 |
| A B | 205F | 8287 | MEDIUM MATHEMATICAL SPACE | General Punctuation | 3.2 |
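The Unicode space separators in the table are not the same set as the four XML white-space characters, so data containing them may need to be normalized before XML tools treat the gaps as white space. The following is a minimal sketch using Python's standard unicodedata module; the helper names are hypothetical, and the rule "replace every character of general category Zs by U+0020" is an assumption chosen for illustration, not a requirement of XML.

```python
import unicodedata

# The four white-space characters of XML 1.0: tab, line feed, carriage return, space.
XML_WHITESPACE = "\t\n\r "


def is_space_separator(char: str) -> bool:
    """True for characters in Unicode general category Zs (space separator)."""
    return unicodedata.category(char) == "Zs"


def normalize_spaces(text: str) -> str:
    """Replace every Zs character by an ordinary space (U+0020)."""
    return "".join(" " if is_space_separator(c) else c for c in text)


def describe(char: str) -> str:
    """Format one character like a table row: hex code, decimal code, name."""
    return f"{ord(char):04X} {ord(char)} {unicodedata.name(char)}"


print(describe("\u2003"))
print(normalize_spaces("A\u2003B\u00a0C") == "A B C")
print("\u00a0" in XML_WHITESPACE)
```

Which characters count as Zs depends on the Unicode version of the Python build; the MONGOLIAN VOWEL SEPARATOR, for example, was reclassified as a format character in later Unicode versions.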
XML documents may contain CDATA sections. Their content is not interpreted as markup by the parser.
<![CDATA[<Element>this element is output only as a character string</Element>]]>
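The effect of a CDATA section can be observed with a parser. This is a minimal sketch using Python's standard xml.etree.ElementTree; the wrapper element `doc` and the sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET

source = "<doc><![CDATA[<Element>kept as text</Element>]]></doc>"
root = ET.fromstring(source)

# The characters inside the CDATA section arrive as plain text.
cdata_text = root.text

# No child element was created from the tag-like text.
child_count = len(list(root))

# When written out again, the markup characters are escaped
# instead of being wrapped in a new CDATA section.
serialized = ET.tostring(root, encoding="unicode")

print(cdata_text)
print(child_count)
print(serialized)
```

The parsed tree keeps no record that the text came from a CDATA section, which is why the serialized form uses escaped characters.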
How are HTML characters included in a .dtd?
The percent sign % declares a parameter entity; referencing it as %name; makes the entity (its content) part of the current DTD.
.dtd example for named characters
The user-defined characters &smiley_traurig; and &smiley_froehlich; are declared for XML in a file my_smilies.dtd (they display correctly only if the browser supports Unicode sufficiently).
<!-- in my_smilies.dtd -->
<!ENTITY smiley_traurig "⍩" >
<!ENTITY smiley_froehlich "⍪" >
<!ELEMENT smilies (#PCDATA)>

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!-- in my_smilies.xml (the XML declaration must come first in the file) -->
<!DOCTYPE smilies SYSTEM "my_smilies.dtd">
<smilies>
  &smiley_froehlich; oder &smiley_traurig;?
</smilies>
<!ENTITY % HTML_Chars PUBLIC "-//W3C//ENTITIES Latin1//EN//HTML" >
%HTML_Chars;
XML and external resources
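<!-- news.dtd: -->
<!ELEMENT news (newsdaten)*>
<!-- assumption: an attribute of type ENTITY must name an unparsed entity, so a notation (the name "text" is hypothetical) and NDATA are added -->
<!NOTATION text SYSTEM "text/plain">
<!ENTITY datenquelle SYSTEM "news.txt" NDATA text>
<!ELEMENT newsdaten EMPTY>
<!ATTLIST newsdaten quelle ENTITY #REQUIRED>

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!-- .xml: -->
<!DOCTYPE news SYSTEM "news.dtd">
<news>
  <newsdaten quelle="datenquelle" />
</news>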
Here too the percent sign % marks a parameter entity: the content of produkt.dtd becomes part of the current DTD (Bestellungen.dtd).
<!-- produkt.dtd: -->
<!ELEMENT produkt (warennummer,bezeichnung,hersteller)>
<!ELEMENT warennummer (#PCDATA)>
<!ELEMENT bezeichnung (#PCDATA)>
<!ELEMENT hersteller (#PCDATA)>

<!-- Bestellungen.dtd: -->
<!ENTITY % produktdaten SYSTEM "produkt.dtd" >
%produktdaten;
<!ELEMENT bestellungen (bestellung)*>
<!ELEMENT bestellung (produkt,besteller,anzahl,preis)*>
<!ELEMENT besteller (#PCDATA)>
<!ELEMENT anzahl (#PCDATA)>
<!ELEMENT preis (#PCDATA)>