The application/pdf Media Type
Adobe Systems Incorporated
345 Park Ave
San Jose
CA
95110
USA
mahardy@adobe.com
Adobe Systems Incorporated
345 Park Ave
San Jose
CA
95110
USA
masinter@adobe.com
http://larry.masinter.net
Adobe Systems Incorporated
345 Park Ave
San Jose
CA
95110
USA
dmarkovi@adobe.com
PDF Association
Neue Kantstrasse 14
Berlin
14057
Germany
duff.johnson@pdfa.org
Global Graphics
2030 Cambourne Business Park
Cambridge
CB23 6DW
UK
martin.bailey@globalgraphics.com
http://www.globalgraphics.com
Requests for Comment
The Portable Document Format (PDF) is an ISO standard (ISO
32000-1:2008) defining a final-form document representation
language in use for document exchange, including on the
Internet, since 1993. This document provides an overview of
the PDF format and updates the media type registration of
application/pdf. It obsoletes
RFC 3778.
This document is intended to provide updated information
on the registration of the MIME Media Type
application/pdf for documents defined in the "Portable Document Format" (PDF) syntax. It obsoletes RFC 3778.
PDF was originally envisioned as a way to reliably communicate
and view printed information electronically across a
wide variety of machine configurations, operating
systems, and communication networks.
PDF is used to represent "final form" formatted documents.
PDF pages may include
text, images, graphics and multimedia content such as
video and audio. PDF is also capable of containing
auxiliary structures including annotations,
bookmarks, file attachments, hyperlinks, logical
structure and metadata. These features are useful for
navigation, building collections of related documents and for
reviewing and commenting on documents. A rich JavaScript
model has been defined for interacting with PDF documents.
PDF uses the imaging model of the
PostScript page description language
to render complex text, images, and graphics in a device-
and resolution-independent manner.
PDF supports encryption and digital signatures. The
encryption capability is combined with access control
information to facilitate management of the
functionality available to the recipient. PDF supports
the inclusion of document-level and object-level metadata through
the Extensible Metadata Platform (XMP).
PDF is used widely in the Internet community. The first
version of PDF, 1.0, was published in 1993 by Adobe Systems
Incorporated. Since then PDF has grown to be a widely-used
format for capturing and exchanging formatted documents
electronically across the Web, via e-mail and virtually every
other document exchange mechanism. In 2008, PDF 1.7 was
published as an ISO standard, ISO
32000-1:2008. It was adopted using the ISO Fast-Track process and
is technically identical to Adobe Portable Document Format
version 1.7 referenced
by .
The ISO TC-171 committee is presently working on a
refresh of PDF, known as ISO 32000-2, with a version
of PDF 2.0, expected to be published in 2017.
In addition to ISO 32000-1:2008 and 32000-2,
several subset standards have been defined to
address specific use cases and standardized by
the ISO. These standards include PDF
for Archival (PDF/A),
PDF for Engineering (PDF/E),
PDF for Universal Accessibility (PDF/UA),
PDF for Variable Data and Transactional Printing (PDF/VT),
and PDF for Prepress Digital Data Exchange (PDF/X).
The subset
standards are fully compliant PDF files capable of being
displayed in a general PDF viewer.
Fragment identifiers appear at the end of a URI, and provide a way to
reference an anchor to subordinate content within the target
of the URI, or additional parameters to the process of opening
the identified content. The syntax and semantics of fragment
identifiers are referenced in the media type definition.
The specification of fragment
identifiers for PDF appeared originally in RFC 3778, but
will now be included in ISO 32000-2.
This section is a summary of that material. Any
disagreements between that document and this should be
resolved in favor of the ISO 32000-2 definition, once that has been
approved.
A fragment identifier for PDF has one or more parameters,
separated by the ampersand (&) or pound (#)
character. Each parameter consists of the parameter name, "="
(equal), and the parameter value; lists of values are
comma-separated, and parameter value strings may be
URI-encoded.
Parameters are processed left to right.
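
As an illustration of the syntax just described, the following Python sketch splits a fragment identifier into its parameters in left-to-right order. The function name parse_pdf_fragment and the parameter names used in the usage note are hypothetical and chosen only for illustration. The sketch also assumes that value lists are split on commas before URI decoding, so that an encoded comma stays inside a single value; this document does not spell out that ordering.

```python
import re
from urllib.parse import unquote


def parse_pdf_fragment(fragment):
    """Return a list of (name, [values]) pairs in left-to-right order.

    Assumption: 'fragment' is the text after the first '#' of the URI.
    """
    params = []
    # Parameters are separated by '&' or '#'.
    for part in re.split(r"[&#]", fragment):
        if not part:
            continue  # tolerate doubled or trailing separators
        name, sep, raw_value = part.partition("=")
        # Split the list on commas first, then undo URI encoding per value.
        values = [unquote(v) for v in raw_value.split(",")] if sep else []
        params.append((name, values))
    return params
```

With the hypothetical input "a=1,2#b=x%20y", the function returns the pairs ("a", ["1", "2"]) and ("b", ["x y"]), in that order. A list is returned rather than a dictionary so that a caller can apply the parameters in the order given.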
Coordinate values (such as <left>, <right>,
<width>) are expressed in the default user space
coordinate system of the document: 1/72 of an inch
measured down and to the right from the upper-left corner
of the (current) page (see 8.3.2.3, "User Space").
The following parameters identify subordinate content
of a PDF file, but also may be used to set the document view
to make the (start of) the identified content visible:
Identifies a specified (physical) page; the first page in the document
has a pageNum value of 1.
Identifies a named destination (see 12.3.2.4, "Named destinations").
structID is a byte string with URI encoding;
identifies the structure element with ID key
within a StructElem dictionary of the document.
The commentID is the value of an annotation name, which is
defined by the NM key in the corresponding annotation dictionary
of the selected page (see
12.5.2, "Annotation dictionaries").
Identifies the embedded file where the parameter
string <name> matches a file
specification dictionary in the EmbeddedFiles name tree.
If the "ef" parameter is not at the end of the fragment
identifier, then the rest of the fragment identifier (after
the ampersand or hash delimiter) is applied to the embedded
file according to its own media type. This allows
identification of content within the embedded file
(which itself might be a PDF file).
NOTE: When opening a PDF file that is not from a
trusted source, a processor may choose to prompt the user or
even prevent opening of the file.
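
The handling of an "ef" parameter that is not last can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name split_at_embedded_file is hypothetical, and the sketch assumes that only the first "ef" parameter is significant to the outer document.

```python
import re
from urllib.parse import unquote


def split_at_embedded_file(fragment):
    """Return (outer, embedded_name, inner) for a fragment identifier.

    'outer' holds the parameters before "ef", 'embedded_name' is the decoded
    <name> value (or None if there is no "ef" parameter), and 'inner' is the
    remainder, to be interpreted by the embedded file's own media type.
    """
    # Walk the parameters, which are delimited by '&' or '#'.
    for match in re.finditer(r"[^&#]+", fragment):
        name, _, value = match.group().partition("=")
        if name == "ef":
            outer = fragment[:match.start()].rstrip("&#")
            inner = fragment[match.end():].lstrip("&#")
            return outer, unquote(value), inner
    return fragment, None, ""
```

For the hypothetical fragment "a=1&ef=report.pdf#b=2", the outer part is "a=1", the embedded file name is "report.pdf", and "b=2" is left for whatever handles the embedded file.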
These parameters also operate on the view of
the PDF document when it is opened.
<scale> is the percentage to which the document should
be zoomed, where a value of 100 corresponds to a zoom of 100%.
<left> and <top> are optional, but both must be
specified if either is included.
The arguments correspond to those found in 12.3.2.2
"Explicit destinations".
keyword is one of the keywords defined in
"Table 149: Destination syntax"
with appropriate position values.
Set the view rectangle.
Highlight the specified rectangle.
Open the document and search for one or more words, selecting
the first matching word in the document. wordList
is a string enclosed in
quotation marks where individual words are separated by
the space character (or %20).
Imports data into PDF form fields. The URI is either a relative or absolute URI
to an FDF or XFDF file.
The fdf parameter should be specified as the last parameter to a given URI.
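
The rule that <left> and <top> are optional but must be given together can be checked as in the sketch below. The function name parse_scale_value is hypothetical, and the sketch assumes that all three values are plain decimal numbers; it takes the already-split value string, not the whole fragment identifier.

```python
def parse_scale_value(value):
    """Parse "<scale>" or "<scale>,<left>,<top>".

    Returns (zoom_factor, left, top), where zoom_factor is 1.0 for a
    <scale> of 100 and left/top are None when omitted.
    """
    parts = value.split(",")
    # Either <scale> alone, or all three values; two values is an error.
    if len(parts) not in (1, 3):
        raise ValueError("expected <scale> or <scale>,<left>,<top>")
    scale = float(parts[0])
    if len(parts) == 3:
        left, top = float(parts[1]), float(parts[2])
    else:
        left = top = None
    return scale / 100.0, left, top
```

Here parse_scale_value("150") gives a zoom factor of 1.5 with no position, while parse_scale_value("100,10") raises ValueError because only one of the two position values is present.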
Several subsets of PDF have been published as distinct ISO standards:
PDF/X, initially released in 2001 as PDF/X-1a, specifies how to use PDF for graphics
exchange, with the aim to facilitate correct and
predictable printing by print service providers. The
standard has gone through multiple revisions over the
years and has several published parts, the most recently
released being part 8, specifying different levels of
conformance: PDF/X-1a:2001, PDF/X-3:2002, PDF/X-1a:2003,
PDF/X-3:2003, PDF/X-4, PDF/X-4p, PDF/X-5g,
PDF/X-5pg and PDF/X-5n.
PDF/A, initially released in 2005, specifies how to use
PDF for long-term preservation (archiving) of electronic
documents. It prohibits PDF features which are not well
suited to long-term archiving of documents, including
JavaScript or executable file launches. Its
requirements for PDF/A viewers include color management
guidelines and support for embedded fonts. There are three
parts of this standard and a total of eight conformance levels:
PDF/A-1a, PDF/A-1b, PDF/A-2a,
PDF/A-2b, PDF/A-2u, PDF/A-3a, PDF/A-3b and PDF/A-3u.
PDF/E, initially released in 2008 as PDF/E-1, specifies how to use PDF in
engineering workflows, such as manufacturing, construction
and geospatial analysis. Future revisions of PDF/E are
expected to include support for 3D PDF workflows.
PDF/VT, initially released in 2010, specifies how to use
PDF in variable and transactional printing. It is based on
PDF/X, and adds additional restrictions on PDF content
elements and supporting metadata. It specifies three
conformance levels: PDF/VT-1, PDF/VT-2 and PDF/VT-2s.
PDF/UA, initially released in 2012 as PDF/UA-1, specifies how to create accessible
electronic documents. It requires use of ISO 32000's Tagged
PDF feature, and adds many requirements regarding semantic
correctness in applying logical structures to content in
PDF documents.
All of these subset standards use the application/pdf media type. The subset
standards are generally not exclusive, so it is possible to
construct a PDF file which conforms to, for example, both
PDF/A-2b and PDF/X-4 subset standards.
PDF documents claiming conformance to one or more of the
subset standards use XMP metadata to identify levels of
conformance. PDF processors should examine document metadata
streams for such subset standards identifiers and, if
appropriate, label documents as such when presenting them to
the user.
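
A processor might examine a metadata stream along the following lines. This sketch covers PDF/A only and rests on assumptions that this document does not state: that the XMP packet has already been extracted from the PDF file as text, and that the claim is carried by the "part" and "conformance" properties in the namespace http://www.aiim.org/pdfa/ns/id/, as defined by the PDF/A standard. The function name pdfa_claim is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: PDF/A identification namespace, taken from the PDF/A standard.
PDFA_ID_NS = "http://www.aiim.org/pdfa/ns/id/"


def pdfa_claim(xmp_packet):
    """Return a label such as "PDF/A-2b", or None if no PDF/A claim is found."""
    root = ET.fromstring(xmp_packet)
    found = {}
    for desc in root.iter("{%s}Description" % RDF_NS):
        for key in ("part", "conformance"):
            qname = "{%s}%s" % (PDFA_ID_NS, key)
            # XMP allows a simple property as an attribute or a child element.
            value = desc.get(qname)
            if value is None:
                child = desc.find(qname)
                value = child.text if child is not None else None
            if value:
                found[key] = value.strip()
    if "part" not in found:
        return None
    return "PDF/A-%s%s" % (found["part"], found.get("conformance", "").lower())
```

The result is only the claim made by the file. It says nothing about whether the file actually conforms, which would require a separate validation step.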
The PDF format has gone through several revisions, primarily for
the addition of features. PDF features have generally been
added in such a way that older viewers "fail gracefully", because
they can just ignore features they do not recognize. Even
so, the older the PDF version produced, the more legacy
viewers will support that version, but the fewer features will
be enabled. See Annex I, "PDF Versions
and Compatibility".
PDF files are experienced through a reader or viewer of PDF files.
For most of the common platforms in use (iOS, OS X, Windows, Android, ChromeOS, Kindle)
and for most browsers (Edge, Safari, Chrome, Firefox), PDF viewing is built-in. In addition,
there are many PDF viewers available for download and installation. The PDF specification
has been published and freely available since the format was introduced in 1993, so hundreds
of companies and organizations make tools for PDF creation, viewing, and manipulation.
PDF is certainly a complex media type as per Section 4.6 of
RFC 6838, which sets requirements for security
analysis of media type registrations.
RFC 3778 (which this document obsoletes) contained a detailed analysis
of some of the security issues for PDF implementations known
at the time. While the analysis isn't necessarily wrong, the threat
analysis is much too limited, and the mitigations somewhat out
of date. There is now extensive literature on security
threats involving PDF implementations and how to avoid them,
consistent with broad implementation over decades. We are not
registering a new media type but rather making a primarily
administrative update. With those caveats:
The PDF file format allows several constructs which may
compromise security if handled inadequately by PDF
processors. For example:
PDF may contain scripts to customize the displaying
and processing of PDF files. These scripts are
expressed in a version of JavaScript and are intended
for execution by the PDF processor.
A PDF file may refer to other PDF files for portions of
content. PDF processors are expected to find these
external files and load them in order to display the
document.
PDF may act as a container for various files embedded in
it (for example, as attached files). PDF processors may
offer functionality to open and display such files or
store them on the system, such as with the "ef" open
action. The PDF specification places no
restrictions on types of files which may be embedded, so
PDF processors should be extremely careful to prevent
unwanted execution of attached executables or
decompression of attached archives which may store
dangerous files in the host file system.
PDF files may contain links to content on the
Internet. PDF processors may offer functionality to show
such content upon following the link.
The fragment identifier syntax
contains directives for opening ("ef") or including
("fdf") additional material.
PDF interpreters executing any scripts or programs related to
these constructs must be extremely careful to ensure that
untrusted software is executed in a protected environment.
In addition, the PDF processor itself, as well as its plugins,
scripts etc. may be a source of insecurity, by either obvious
or subtle means.
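
As one small example of the care called for above, a processor that stores an embedded file on the host file system might refuse names that would land outside a chosen directory. The Python sketch below is an illustration under stated assumptions, not a complete defense: the function name safe_attachment_path is hypothetical, the attachment name is assumed to come from untrusted PDF content, and the sketch does nothing about executable content or platform-specific reserved names.

```python
import os


def safe_attachment_path(directory, attachment_name):
    """Return a path inside 'directory' for an untrusted attachment name."""
    # Keep only the final component, treating '/' and '\\' as separators.
    base = attachment_name.replace("\\", "/").split("/")[-1]
    if base in ("", ".", ".."):
        raise ValueError("unusable attachment name")
    root = os.path.realpath(directory)
    target = os.path.realpath(os.path.join(root, base))
    # Re-check after resolving symbolic links that may already exist.
    if os.path.commonpath([root, target]) != root:
        raise ValueError("attachment path escapes the target directory")
    return target
```

A name such as "../../etc/passwd" is reduced to "passwd" inside the target directory, and a name consisting only of ".." is rejected.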
This document updates the registration
of application/pdf, a media type
registration as defined in RFC 6838:
Type name: application
Subtype name: pdf
Required parameters: none
Optional parameters: none
Encoding considerations: binary
Security considerations: See the Security Considerations
of this document.
Interoperability considerations: See this document.
Published specification: ISO 32000-1:2008 (PDF 1.7).
ISO 32000-2 (PDF 2.0) is currently under development.
Applications which use this media type: See this document.
Fragment identifier considerations: See the section on fragment identifiers in this document.
Additional information:
Deprecated alias names for this type: none
Magic number(s): All PDF files start with the characters
'%PDF-' followed by the PDF version number, e.g., "%PDF-1.7".
These characters are in US-ASCII encoding.
File extension(s): .pdf
Macintosh file type code(s): "PDF "
Person & email address to contact for further information:
Duff Johnson <duff@duff-johnson.com>, Peter Wyatt
<Peter.wyatt@cisra.canon.com.au>, ISO 32000 Project Leaders
Intended usage: COMMON
Restrictions on usage: none
Author: Authors of this document
Change controller: ISO; in particular, ISO 32000 is maintained by
ISO/TC 171/SC 02/WG 08, "PDF specification". Duff Johnson
<duff@duff-johnson.com> and Peter Wyatt
<Peter.wyatt@cisra.canon.com.au> are the current ISO 32000
Project Leaders.
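
The "Magic number(s)" entry in the registration above can be illustrated with a short Python check. This is a strict sketch that assumes the version number has the form of two dot-separated integers, as in the "%PDF-1.7" example; the function name pdf_version is hypothetical.

```python
import re

# '%PDF-' followed by a version number such as 1.7, in US-ASCII.
_HEADER = re.compile(rb"%PDF-(\d+\.\d+)")


def pdf_version(data):
    """Return the version string if 'data' starts with a PDF header, else None."""
    match = _HEADER.match(data)  # match() anchors at the first byte
    return match.group(1).decode("ascii") if match else None
```

Only the first few bytes of a file are needed, so a caller can pass something like the first 16 bytes read in binary mode.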
Document management -- Portable document format -- Part 1: PDF 1.7
ISO
Also available free from Adobe.
Document management -- Portable document format -- Part 2: PDF 2.0
ISO
Currently under development - publication expected in 2017. This becomes a Normative Reference on approval.
Graphic technology -- Prepress digital data exchange using PDF --
Part 8: Partial exchange of printing data using PDF 1.6 (PDF/X-5)
ISO
Document management -- Electronic document file format for long-term preservation
-- Part 3: Use of ISO 32000-1 with support for embedded files (PDF/A-3)
ISO
Document management -- Engineering document format using PDF -- Part 1:
Use of PDF 1.6 (PDF/E-1)
ISO
Graphic technology -- Variable data exchange -- Part 2:
Using PDF/X-4 and PDF/X-5 (PDF/VT-1 and PDF/VT-2)
ISO
Document management applications -- Electronic document file format enhancement
for accessibility -- Part 1: Use of ISO 32000-1 (PDF/UA-1)
ISO
Extensible metadata platform (XMP) specification -- Part 1: Data model, serialization and core properties
ISO
Not available for free, but there are a
number of descriptive resources, e.g.,
PostScript Language Reference, third edition
Adobe Systems Incorporated
PDF Reference, sixth edition
Adobe Systems Incorporated
This specification replaces RFC 3778, which previously
defined the application/pdf
Media Type. Differences include:
To reflect the transition from a proprietary specification
by Adobe to an open ISO Standard, the Change Controller
has changed from Adobe to ISO, and references were updated.
The overview of PDF capabilities, the history of PDF, and
the descriptions of PDF subsets were updated to reflect
more recent relevant history.
The section on fragment identifiers was updated to closely
reflect the material which has been added to ISO 32000-2.
The status of popular PDF implementations was
updated.
The Security Considerations were updated to match the
current understanding of PDF vulnerabilities.
The registration template was updated to match RFC 6838.