==Character references==
{{Main|List of XML and HTML character entity references|Numeric character reference}}
In addition to native character encodings, characters can also be encoded as ''character references'', which can be ''numeric character references'' ([[decimal]] or [[hexadecimal]]) or ''character entity references''. Character entity references are also sometimes referred to as ''named entities'', or ''HTML entities'' for HTML. HTML's usage of character references derives from [[SGML]].

===HTML character references===
<!--Linked from [[Template:Auxiliary template common notice]]-->
A ''[[numeric character reference]]'' in HTML refers to a character by its [[Universal Character Set]]/[[Unicode]] ''[[code point]]'', and uses the format
{{block indent|<code>&#''nnnn'';</code>}}
or
{{block indent|<code>&#x''hhhh'';</code>}}
where ''nnnn'' is the code point in [[decimal]] form, and ''hhhh'' is the code point in [[hexadecimal]] form. The ''x'' must be lowercase in XML documents. The ''nnnn'' or ''hhhh'' may be any number of digits and may include leading zeros. The ''hhhh'' may mix uppercase and lowercase, though uppercase is the usual style.

Not all [[web browser]]s or [[email client]]s used by receivers of HTML documents, or [[text editor]]s used by authors of HTML documents, will be able to render every character that a reference can denote. Most modern software is able to display most or all of the characters for the user's language, and will draw a box or other clear indicator for any character it cannot render.

For codes from 0 to 127, the original 7-bit [[ASCII]] standard set, most of these characters can be used without a character reference. Codes from 160 to 255 can all be created using [[List of XML and HTML character entity references|character entity names]]. Only a few higher-numbered codes can be created using entity names, but all can be created using a numeric character reference.

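As a minimal sketch of the two numeric formats described above, the following Python snippet builds a decimal and a hexadecimal reference from a character and decodes several variants with the standard library's <code>html.unescape</code>. The helper names <code>decimal_ref</code> and <code>hex_ref</code> are illustrative assumptions and do not come from any specification.

```python
import html


def decimal_ref(ch: str) -> str:
    """Return the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"


def hex_ref(ch: str) -> str:
    """Return the hexadecimal reference, with lowercase x and uppercase digits."""
    return f"&#x{ord(ch):X};"


lam = "\u03bb"  # Greek small letter lambda, U+03BB (decimal 955)
print(decimal_ref(lam))  # &#955;
print(hex_ref(lam))      # &#x3BB;

# Leading zeros and mixed-case hexadecimal digits decode to the same character.
for ref in ("&#955;", "&#0955;", "&#x3BB;", "&#x3bb;", "&#x03Bb;"):
    assert html.unescape(ref) == lam
```

The lowercase <code>x</code> is used deliberately so that the generated references are also acceptable in XML documents.
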
[[List of XML and HTML character entity references|Character entity references]] can also have the format <code>&''name'';</code> where ''name'' is a case-sensitive alphanumeric string. For example, "λ" can also be encoded as <code>&lambda;</code> in an HTML document. The character entity references <code>&lt;</code>, <code>&gt;</code>, <code>&quot;</code> and <code>&amp;</code> are predefined in HTML and SGML, because <code><</code>, <code>></code>, <code>"</code> and <code>&</code> are already used to delimit markup. Prior to [[HTML5]], this set notably did not include XML's <code>&apos;</code> (') entity. For a list of all named HTML character entity references along with the versions in which they were introduced, see [[List of XML and HTML character entity references]].

Unnecessary use of HTML character references may significantly reduce the readability of HTML source. If the character encoding for a web page is chosen appropriately, then HTML character references are usually only required for the markup-delimiting characters mentioned above, and for a few special characters (or none at all if a native [[Unicode]] encoding like [[UTF-8]] is used). Incorrect HTML entity escaping may also open up security vulnerabilities for injection attacks such as [[cross-site scripting]]. If HTML attributes are left unquoted, certain characters, most importantly [[whitespace character|whitespace]], such as space and tab, must be escaped using entities. Other languages related to HTML have their own methods of escaping characters.

===XML character references===
Unlike traditional HTML with its large range of character entity references, in [[XML]] there are only five predefined character entity references. These are used to escape characters that are markup sensitive in certain contexts:<ref>{{cite book |chapter-url=http://www.w3.org/TR/REC-xml/#sec-references |chapter=Character and Entity References |title=XML |first1=T. |last1=Bray |author-link1=Tim Bray |first2=J. |last2=Paoli |first3=C. |last3=Sperberg-McQueen |author-link3=Michael Sperberg-McQueen |first4=E. |last4=Maler |first5=F. |last5=Yergeau |publisher=[[W3C]] |date=26 November 2008 |access-date=8 March 2010}}</ref>

{| class="wikitable"
! Reference !! Character !! Name !! Code point
|-
| <code>&amp;</code> ||align="center"| & || [[ampersand]] || U+0026
|-
| <code>&lt;</code> ||align="center"| < || less-than sign || U+003C
|-
| <code>&gt;</code> ||align="center"| > || greater-than sign || U+003E
|-
| <code>&quot;</code> ||align="center"| " || quotation mark || U+0022
|-
| <code>&apos;</code> ||align="center"| ' || apostrophe || U+0027
|}

All other character entity references have to be defined before they can be used. For example, use of <code>&eacute;</code> (which gives é, Latin small letter e with acute accent, U+00E9 in Unicode) in an XML document will generate an error unless the entity has already been defined. XML also requires that the <code>x</code> in hexadecimal numeric references be in lowercase: for example <code>&#xA1b;</code> rather than <code>&#XA1b;</code>. [[XHTML]], which is an XML application, supports the HTML entity set, along with XML's predefined entities.
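
The XML rules above can be illustrated with a short sketch using Python's standard library. It assumes the expat-based parser that backs <code>xml.etree.ElementTree</code>; the helper name <code>parses</code> and the <code>quotes</code> mapping are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(doc: str) -> bool:
    """Return True if doc is accepted by the XML parser, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# escape() handles &, < and > by default; the quote characters are passed in
# explicitly so that all five predefined entities are covered.
quotes = {'"': "&quot;", "'": "&apos;"}
print(escape("<a href='x'>Q&A</a>", quotes))

print(parses("<p>&lt;&amp;&apos;</p>"))  # predefined entities are accepted
print(parses("<p>&eacute;</p>"))         # undefined entity is rejected
print(parses("<p>&#xE9;</p>"))           # lowercase x is accepted
print(parses("<p>&#XE9;</p>"))           # uppercase X is rejected
```

A document that needs <code>&eacute;</code> can either declare the entity in its DTD or, more simply, use the numeric reference or the literal character in a Unicode encoding.
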