Admin: Imported missing template from Wikipedia

2026-05-03T17:09:45Z

{{Short description|Practice of deducing the file type of a bitstream}}
{{More citations needed|date=January 2024}}

'''Content sniffing''', also known as '''media type sniffing''' or '''MIME sniffing''', is the practice of inspecting the content of a [[byte stream]] to deduce the [[file format]] of the data within it. It is generally used to compensate for a lack of accurate [[metadata]] that would otherwise be required for the file to be interpreted correctly. Content sniffing tends to combine several techniques that rely on the [[redundancy (information theory)|redundancy]] found in most file formats: looking for [[file format#Magic number|file signature]]s and [[magic number (programming)|magic number]]s, and applying [[heuristic]]s such as searching for well-known representative substrings, using [[letter frequency|byte frequency]] and [[n-gram|''n''-gram]] tables, and [[Bayesian inference]].
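
The signature-based part of this approach can be illustrated with a minimal Python sketch. The signature table is a small illustrative subset, and the function name <code>sniff_media_type</code> and the 512-byte inspection window are assumptions made for this example rather than features of any particular implementation.

```python
from typing import Optional

# Illustrative subset of well-known file signatures (magic numbers).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def sniff_media_type(data: bytes) -> Optional[str]:
    """Guess a media type from leading bytes; return None if nothing matches."""
    # Step 1: compare the start of the stream against known signatures.
    for signature, media_type in SIGNATURES:
        if data.startswith(signature):
            return media_type

    # Step 2: fall back to a crude substring heuristic on the first bytes
    # (the 512-byte window is an arbitrary choice for this sketch).
    head = data[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "text/html"
    return None


print(sniff_media_type(b"GIF89a" + bytes(10)))
print(sniff_media_type(b"  <HTML><body>hi</body></HTML>"))
print(sniff_media_type(b"plain words"))
```

Real sniffers typically layer further heuristics, such as byte-frequency statistics, on top of a table like this one; the sketch stops at the two simplest steps.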

[[Multipurpose Internet Mail Extensions]] (MIME) sniffing was, and still is, used by some [[web browser]]s, notably [[Microsoft]]'s [[Internet Explorer]], in an attempt to help web sites that do not correctly signal the [[MIME type]] of their content to display properly.<ref>{{cite web|url=http://msdn.microsoft.com/en-us/library/ms775147.aspx|title=MIME Type Detection in Windows Internet Explorer|publisher=Microsoft|access-date=2012-07-14}}</ref> However, doing this opens up a serious [[security vulnerability]]:<ref>{{Cite web|url = http://www.adambarth.com/papers/2009/barth-caballero-song.pdf|title = Secure Content Sniffing for Web Browsers, or How to Stop Papers from Reviewing Themselves|last = Barth|first = Adam}}</ref> by confusing the MIME sniffing algorithm, an attacker can manipulate the browser into interpreting data in a way that permits operations that neither the site operator nor the user expects, such as [[cross-site scripting]].<ref>{{cite web|url=http://www.h-online.com/security/features/Risky-MIME-sniffing-in-Internet-Explorer-746229.html|title=Risky sniffing: MIME sniffing in Internet Explorer enables cross-site scripting attacks|author=Henry Sudhof|publisher=The H|date=11 February 2009|access-date=2012-07-14}}</ref> Moreover, by making sites that do not correctly assign MIME types appear to work correctly in those browsers, sniffing fails to encourage the correct labeling of material, which in turn makes content sniffing necessary for these sites to work, creating a vicious circle of incompatibility with web standards and security best practices.
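
The risk can be sketched with a toy model in Python. Nothing here reproduces the algorithm of Internet Explorer or of any other browser: <code>looks_like_html</code> and <code>effective_type</code> are hypothetical helpers, and the 256-byte window and tag list are assumptions chosen only to show how a sniffed guess that overrides the declared type can turn an inert upload into active content.

```python
def looks_like_html(body: bytes) -> bool:
    """Crude stand-in for a sniffing heuristic: look for common HTML tags."""
    head = body[:256].lower()
    return any(tag in head for tag in (b"<html", b"<script", b"<body"))


def effective_type(declared: str, body: bytes, sniff: bool) -> str:
    """Return the media type a hypothetical browser would act on."""
    if sniff and looks_like_html(body):
        # The sniffed guess overrides the label supplied by the server.
        return "text/html"
    return declared


# A user upload that the site serves as plain text, expecting it to be inert.
upload = b"<script>alert(document.cookie)</script>"

sniffing_browser = effective_type("text/plain", upload, sniff=True)
strict_browser = effective_type("text/plain", upload, sniff=False)

print(sniffing_browser)  # treated as HTML, so the script could run
print(strict_browser)    # treated as the declared plain text
```

In this model the site operator labeled the upload correctly, yet the sniffing client still treats it as HTML, which is the mismatch of expectations described above.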

A specification exists for media type sniffing in [[HTML5]], which attempts to balance the requirements of security with the need for backward compatibility with web content that has missing or incorrect MIME-type data. It aims to define a single, precise, and deterministic set of behaviors that all implementations can share.<ref>{{cite web|url=http://mimesniff.spec.whatwg.org/|title=MIME Sniffing|author=Adam Barth, Ian Hickson|publisher=WHATWG|access-date=2012-07-14}}</ref>

The UNIX [[file (command)|{{mono|file}} command]] can be viewed as a content sniffing application.

== Charset sniffing ==

{{see also|Charset detection}}
Numerous web browsers use a more limited form of content sniffing to attempt to determine the [[character encoding]] of text files for which the MIME type is already known. This technique is known as charset sniffing or [[codepage]] sniffing and, for certain encodings, may also be used to bypass security restrictions. For instance, [[Internet Explorer 7]] may be tricked into running [[JScript]] in circumvention of its policy by allowing the browser to guess that an [[HTML]] file was encoded in [[UTF-7]].<ref>{{cite web
|url=http://msdn.microsoft.com/en-us/library/dd565635%28v=vs.85%29.aspx
|title=Event 1058 - Codepage Sniffing
|work=Internet Explorer
|publisher=[[MSDN]]
|access-date=2012-07-14
}}</ref> This bug is made worse by a feature of the UTF-7 encoding that permits multiple encodings of the same text and, specifically, alternative representations of [[ASCII]] characters.
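
The effect of a UTF-7 guess can be reproduced with Python's built-in <code>utf-7</code> codec. The payload below is an illustrative example constructed for this sketch, not one taken from the cited source: read as ASCII it contains no angle brackets, but decoded as UTF-7 it becomes a script element.

```python
# Bytes that contain only harmless-looking ASCII characters.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# Interpreted as ASCII, the text contains no "<" or ">" for a filter to catch.
as_ascii = payload.decode("ascii")
print("<" in as_ascii)

# Interpreted as UTF-7, the shifted sequences decode to angle brackets.
as_utf7 = payload.decode("utf-7")
print(as_utf7)

# UTF-7 also permits more than one representation of an ASCII character:
# the letter "A" may be written directly or as a shifted base64 sequence.
direct = b"A".decode("utf-7")
shifted = b"+AEE-".decode("utf-7")
print(direct == shifted)
```

A filter that inspects the bytes under one encoding therefore offers no protection if the client later decodes them under another.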

Most encodings do not allow alternative representations of ASCII characters, so charset sniffing is less dangerous in general. Because scripting and markup languages are, by historical accident, ASCII-centric, characters outside the ASCII repertoire are more difficult to use to circumvent security boundaries, and misinterpretations of character sets tend to produce results no worse than the display of [[mojibake]].
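
A short Python sketch shows the more typical, benign outcome. The choice of UTF-8 as the real encoding and ISO-8859-1 (<code>latin-1</code>) as the wrong guess is an assumption made for illustration: the ASCII markup survives unchanged, and only the non-ASCII character is garbled.

```python
# Text whose markup is pure ASCII but whose content includes a non-ASCII letter.
markup = "<b>caf\u00e9</b>"

# The document is really encoded as UTF-8 ...
raw = markup.encode("utf-8")

# ... but the client wrongly guesses ISO-8859-1 (latin-1).
misread = raw.decode("latin-1")

print(misread)

# The ASCII tags are byte-for-byte identical under both encodings,
# so the structure of the markup is unaffected by the wrong guess.
print(misread.startswith("<b>") and misread.endswith("</b>"))
```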

== See also ==
*[[Browser sniffing]]

== References ==
{{reflist}}

== External links ==
* [https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Content-Type-Options X-Content-Type-Options header]
* [https://mimesniff.spec.whatwg.org/ MIME Sniffing Standard]
* {{cite web|url=http://tools.ietf.org/html/draft-masinter-mime-web-info-00 |date=March 27, 2011 |author=L. Masinter |title=Internet Media Types and the Web |publisher=[[IETF]] |access-date=2012-07-14}}
* {{cite web|url=http://tools.ietf.org/html/draft-abarth-mime-sniff-06 |date=January 24, 2011 |author=A. Barth, I. Hickson |title=Media Type Sniffing |publisher=[[IETF]] |access-date=2012-07-14}}
* {{cite web|url=http://deletethis.net/dave/?q=mime-sniffing|title=Mime-sniffing|author=David Risney|access-date=2012-07-14}}

[[Category:Heuristics]]
[[Category:Computer file formats|*]]
[[Category:Web technology]]
