The application/pdf Media Type
Adobe
345 Park Ave
San Jose
CA
95110
USA
mahardy@adobe.com
Adobe
345 Park Ave
San Jose
CA
95110
USA
masinter@adobe.com
http://larry.masinter.net
PDF Association
Neue Kantstrasse 14
Berlin
14057
Germany
duff.johnson@pdfa.org
http://www.pdfa.org
Requests for Comment
PDF, the 'Portable Document Format', is an ISO standard
(ISO 32000-1:2008) defining a final-form document
representation language in use for document exchange,
including on the Internet, since 1993. This document
provides an overview of the PDF format and updates the
media type registration of 'application/pdf'. It
replaces RFC 3778.
This document is intended to provide updated information
on the registration of the MIME Media Type
"application/pdf" for documents defined in the PDF
('Portable Document Format') syntax.
Additionally, this document provides a brief history of
the PDF format, describes several of the key
capabilities of the format and addresses some security
concerns.
PDF is used widely in the Internet community. The first
version of PDF, 1.0, was published in 1993 by Adobe
Systems [REF needed]. Since then PDF has grown to be a
widely-used format for capturing and exchanging
formatted documents electronically across the Web, via
e-mail and virtually every other document exchange
mechanism. In 2008, PDF 1.7 was published as an ISO
standard, ISO 32000-1:2008.
PDF represents "final form" formatted documents with a
fixed layout and appearance. PDF pages may include
text, images, graphics and multimedia content such as
video and audio. PDF is also capable of containing
higher level structures including annotations,
bookmarks, file attachments, hyperlinks, logical
structure and metadata. A rich JavaScript model has
been defined for interacting with PDF documents.
PDF supports encryption and digital signatures. The
encryption capability is combined with access control
information to facilitate management of the
functionality available to the recipient. PDF supports
the inclusion of metadata both through XMP and directly
via PDF structures.
In addition to the ISO 32000-1:2008 PDF standard,
several ISO PDF subset standards have been defined to
address specific use cases. These standards include PDF
for Archival (PDF/A), PDF for Engineering (PDF/E), PDF
for Universal Accessibility (PDF/UA), PDF for Variable
Data and Transactional Printing (PDF/VT) and PDF for
Prepress Digital Data Exchange (PDF/X). The subset
standards are fully compliant PDF files capable of being
displayed in a general PDF viewer.
PDF usage is widespread enough for 'application/pdf' to
be used in other IETF specifications. RFC 2346 describes how to better structure PDF
files for international exchange of documents where
different paper sizes are used; HTTP byte range
retrieval is illustrated using application/pdf (RFC 2616,
Section 19.2); RFC 3297 illustrates how PDF can be sent to a
recipient in a way that identifies the user's ability to
accept the PDF using content negotiation.
PDF was originally envisioned as a way to communicate
and view printed information electronically across a
wide variety of machine configurations, operating
systems, and communication networks in a reliable
manner.
PDF relies on the same fundamental imaging model as the
PostScript page description language
to render complex text, images, and graphics in a
device- and resolution-independent manner, bringing this feature
to the screen as well as the printer. However, unlike
PostScript, PDF enforces page independence, ensuring
that any page in a document can render without having to
render previous pages. Additionally, PDF reduces the
complexity of processing content to improve performance
for interactive viewing. In addition to the rendering
capabilities, PDF also includes objects, such as
hypertext links and annotations, that are not part of
the page itself, but are useful for navigation, building
collections of related documents and for reviewing and
commenting on documents.
The application/pdf media type was first registered in
1993 by Paul Lindner for use by the gopher protocol and
was subsequently updated in 1994 by Steve Zilles.
A set of fragment identifiers
and their handling are defined in Adobe Technical Note
5428. This section summarizes
that material.
A fragment identifier consists of one or more PDF-open
parameters in a single URL, separated by the ampersand
(&) or pound (#) character. Each parameter implies
an action to be performed and the value to be used for
that action. Actions are processed and executed from
left to right as they appear in the character string
that makes up the fragment identifier.
The PDF-open parameters allow the specification of a
particular page or named destination to open. Named
destinations are similar to the "anchors" used in HTML
or the IDs used in XML. Once the target is specified,
the view of the page in which it occurs can be
specified, either by specifying the position of a
viewing rectangle and its scale or size coordinates or
by specifying a view relative to the viewing window in
which the chosen page is to be presented.
The list of PDF-open parameters and the action they
imply is:
namedest=<name>
Open to a specified named destination (which includes a view).
page=<pagenum>
Open the specified (physical) page.
zoom=<scale>,<left>,<top>
Set the <scale> and scrolling
factors. <left> and <top> are measured from the
top left corner of the page, independent of the size of the
page. The pair <left> and <top> is optional, but
both must appear if either is present.
view=<keyword>,<position>
Set the view to show some specified
portion of the page or its bounding box; keywords are defined
by Table 8.2 of the PDF Reference, version 1.5 (NEEDS UPDATING
TO ISO REF). The <position> value is required for some
of the keywords and not allowed for others.
viewrect=<left>,<top>,<wd>,<ht>
As with the zoom parameter, set the
scale and scrolling factors, but using an explicit width and
height instead of a scale percentage.
highlight=<lt>,<rt>,<top>,<btm>
Highlight a rectangle on the chosen
page where <lt>, <rt>, <top>, and
<btm> are the coordinates of the sides of the rectangle
measured from the top left corner of the page.
All specified actions are executed in order; later actions
will override the effects of previous actions; for this
reason, page actions should appear before zoom actions.
Commands are not case sensitive (except for the value of a
named destination).
TODO: Describe the subset standards, their history and include
references to the ISO documents.
TODO: Describe the Accessibility capabilities of PDF.
There are a number of widely available, independently
implemented, interoperable implementations of PDF for a wide
variety of platforms and systems. Because the PDF specification
has been published and freely available since the format was
introduced in 1993, hundreds of companies and organizations,
including web-browser developers, had been making PDF creation,
viewing, and manipulation tools for many years prior to ISO
standardization of PDF.
TODO: Update the above list to ensure relevance to
update market conditions...
TODO: Clean up of this section is still required...
An "application/pdf" resource contains information to be
parsed and processed by the recipient's PDF system.
Because PDF is both a representation of formatted
documents and a container system for the resources needed
to reproduce or view said documents, it is possible that
a PDF file has embedded resources not described in the
PDF Reference.
Although it is not a defined feature of PDF, a PDF
processor could extract these resources and store them
on the recipient's system. Furthermore, a PDF processor
may accept and execute "plug-in" modules accessible to
the recipient. These may also access material in the
PDF file or on the recipient's system. Therefore, care
in establishing the source, security, and reliability of
such plug-ins is recommended. Message-sending software
should not make use of arbitrary plug-ins without prior
agreement on their presence at the intended recipients.
Message-receiving and -displaying software should make
sure that any non-standard plug-ins are secure and do
not present a security threat.
PDF may contain "scripts" to customize the displaying
and processing of PDF files. These scripts are
expressed in a version of JavaScript. They are intended
for execution by the PDF processor. User agents
executing such scripts or programs must be extremely
careful to ensure that untrusted software is executed in
a protected environment.
In general, any information stored outside of the direct
control of the user -- including referenced application
software or plug-ins and embedded files, scripts or
other material not covered in the PDF Reference -- can
be a source of insecurity, by either obvious or subtle
means. For example, a script can modify the content of
a document prior to its being displayed. Thus, the
security of any PDF document may be dependent on the
resources referenced by that document.
This document updates the registration
of 'application/pdf', a media type
registration as defined in
Multipurpose Internet Mail Extensions
(MIME) Part Four: Registration
Procedures:
MIME media type name: application
MIME subtype name: pdf
Required parameters: none
Optional parameters: none
Encoding considerations: PDF files frequently contain binary data, and thus must be encoded in non-binary contexts.
Security considerations: See the security considerations section of this document.
Interoperability considerations: See of this document.
Published specification: ISO 32000-1:2008 (PDF 1.7).
Applications which use this media type: See of this document.
Additional information:
Magic number(s): All PDF files start with the characters
'%PDF-' using the PDF version number, e.g., '%PDF-1.7'.
These characters are in US-ASCII encoding.
File extension(s): .pdf
Macintosh File Type Code(s): "PDF "
For further information: Duff Johnson <duff.johnson@pdfa.org>, Cherie Ekholm <cheriee@microsoft.com>, ISO 32000 Project Leaders
Intended usage: COMMON
Author/Change controller: Duff Johnson <duff.johnson@pdfa.org>, Cherie Ekholm <cheriee@microsoft.com>, ISO 32000 Project Leaders
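The magic number given in the registration above lends itself to a
simple content check. The following Python sketch is illustrative
only: the function names looks_like_pdf and pdf_header_version are
hypothetical, and the "digits, dot, digits" version pattern is an
assumption based on the '%PDF-1.7' example rather than a rule stated
in this document. The check looks only at the very start of the data.

```python
import re

# '%PDF-' followed by a version number such as '1.7'.
# Assumption: the version has the form <digits>.<digits>.
_PDF_HEADER = re.compile(rb"%PDF-(\d+\.\d+)")


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the data starts with the US-ASCII bytes '%PDF-'."""
    return data.startswith(b"%PDF-")


def pdf_header_version(data: bytes):
    """Return the version from the header (e.g. '1.7'), or None."""
    match = _PDF_HEADER.match(data)
    if match is None:
        return None
    return match.group(1).decode("ascii")
```

A receiver might call looks_like_pdf on the first few bytes of a
message body before handing it to a PDF processor. Such a check
identifies the format only and says nothing about whether the content
is safe to process.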
Document management -- Portable document format -- Part 1: PDF 1.7
ISO
Also available free from Adobe Systems.
Extensible metadata platform (XMP) specification -- Part 1: Data model, serialization and core properties
ISO
Not available for free, but there are a
number of descriptive resources, e.g.,
PostScript Language Reference, third edition
Adobe Systems Incorporated
Available at:
PDF Open Parameters
Adobe Systems Incorporated
Available at: