Main objects

class pikepdf.Pdf(*args, **kwargs)
pikepdf.open()

Alias for pikepdf.Pdf.open().

pikepdf.new()

Alias for pikepdf.Pdf.new().

Access modes

class pikepdf.ObjectStreamMode(*args, **kwds)

Options for saving object streams within PDFs.

Object streams are a more compact way of saving certain types of data, introduced in PDF 1.5. All modern PDF viewers support object streams, but some third-party tools and libraries cannot read them.

disable = Ellipsis

Disable the use of object streams.

If any object streams exist in the file, remove them when the file is saved.

generate = Ellipsis

Generate object streams.

preserve = Ellipsis

Preserve any existing object streams in the original file.

This is the default behavior.

class pikepdf.StreamDecodeLevel(*args, **kwds)

Options for decoding streams within PDFs.

all = Ellipsis

In addition to generalized and non-lossy specialized filters, supported lossy compression filters will be applied. At present, this includes DCTDecode (JPEG) compression. Note that compressing the resulting data with DCTDecode again will accumulate loss, so avoid multiple compression and decompression cycles. This is mostly useful for low-level retrieval of image data; see pikepdf.PdfImage for the preferred method.

generalized = Ellipsis

This is the default. libqpdf will apply LZWDecode, ASCII85Decode, ASCIIHexDecode, and FlateDecode filters on the input. When saved with compress_streams=True, the default, the effect of this is that streams filtered with these older and less efficient filters will be recompressed with the Flate filter. As a special case, if a stream is already compressed with FlateDecode and compress_streams=True, the original compressed data will be preserved.

none = Ellipsis

Do not attempt to apply any filters. Streams remain as they appear in the original file. Note that uncompressed streams may still be compressed on output. You can disable that by saving with .save(..., compress_streams=False).

specialized = Ellipsis

In addition to uncompressing the generalized compression formats, supported non-lossy compression will also be decoded. At present, this includes the RunLengthDecode filter.

class pikepdf.Encryption

Specify the encryption settings to apply when a PDF is saved.

R = 6

Select the security handler algorithm to use. Choose from: 2, 3, 4 or 6. By default, the highest version is selected (6). 5 is a deprecated algorithm that should not be used.

aes = True

If True, request the AES algorithm. If False, use RC4. If omitted, AES is selected whenever possible (R >= 4).

allow

The permissions to set. If omitted, all permissions are granted to the user.

metadata = True

If True, also encrypt the PDF metadata. If False, metadata is not encrypted. Reading document metadata without decryption may be desirable in some cases. Requires aes=True. If omitted, metadata is encrypted whenever possible.

owner = ''

The owner password to use. This allows full control of the file. If blank, the PDF will be encrypted and present as “(SECURED)” in PDF viewers. If the owner password is blank, the user password should be as well.

user = ''

The user password to use. With this password, some restrictions will be imposed by a typical PDF reader. If blank, the PDF can be opened by anyone, but only modified as allowed by the permissions in allow.

Object construction

class pikepdf.Object
class pikepdf.Name

Construct a PDF Name object.

Names can be constructed with two notations:

  1. Name.Resources

  2. Name('/Resources')

The two are semantically equivalent. The former is preferred for names that are normally expected to be in a PDF. The latter is preferred for dynamic names and attributes.

class pikepdf.String

Construct a PDF String object.

class pikepdf.Array

Construct a PDF Array object.

class pikepdf.Dictionary

Construct a PDF Dictionary object.

class pikepdf.Stream

Construct a PDF Stream object.

class pikepdf.Operator

Construct an operator for use in a content stream.

An Operator is one of a limited set of commands that can appear in PDF content streams (roughly the mini-language that draws objects, lines and text on a virtual PDF canvas). The commands parse_content_stream() and unparse_content_stream() create and expect Operators respectively, along with their operands.

pikepdf uses the special Operator “INLINE IMAGE” to denote an inline image in a content stream.

Common PDF data structures

class pikepdf.Matrix

A 2D affine matrix for PDF transformations.

PDF uses matrices to transform document coordinates to screen/device coordinates.

PDF matrices are encoded as pikepdf.Array with exactly six numeric elements, ordered as a b c d e f.

\[\begin{bmatrix} a & b & 0 \\ c & d & 0 \\ e & f & 1 \end{bmatrix}\]

The approximate interpretation of these six parameters is documented below. The values (0, 0, 1) in the third column are fixed, so a general 3×3 matrix cannot be converted to a PDF matrix.

PDF transformation matrices are the transpose of most textbook treatments. In a textbook, typically A × vc is used to transform a column vector vc=(x, y, 1) by the affine matrix A. In PDF, the matrix is the transpose of that in the textbook, and vr × A' is used to transform a row vector vr=(x, y, 1).

Transformation matrices specify the transformation from the new (transformed) coordinate system to the original (untransformed) coordinate system. x’ and y’ are the coordinates in the untransformed coordinate system, and x and y are the coordinates in the transformed coordinate system.

PDF order:

\[\begin{bmatrix} x' & y' & 1 \end{bmatrix} = \begin{bmatrix} x & y & 1 \end{bmatrix} \begin{bmatrix} a & b & 0 \\ c & d & 0 \\ e & f & 1 \end{bmatrix}\]
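Expanding the row-vector product gives x' = a·x + c·y + e and y' = b·x + d·y + f. The plain-Python sketch below works through that arithmetic without using pikepdf; the function name apply_pdf_matrix is hypothetical.

```python
def apply_pdf_matrix(point, matrix):
    """Apply a PDF matrix (a, b, c, d, e, f) to a point (x, y).

    Implements [x y 1] multiplied by the 3x3 matrix whose rows are
    (a, b, 0), (c, d, 0) and (e, f, 1).
    """
    x, y = point
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)


# A pure translation by (10, 20).
translated = apply_pdf_matrix((3, 4), (1, 0, 0, 1, 10, 20))

# A pure scale by 2 horizontally and 3 vertically.
scaled = apply_pdf_matrix((3, 4), (2, 0, 0, 3, 0, 0))

print(translated, scaled)  # (13, 24) (6, 12)
```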

To concatenate transformations, use the matrix multiplication (@) operator to pre-multiply the next transformation onto existing transformations.

Alternatively, use the .translated(), .scaled(), and .rotated() methods to chain transformation operations.

Addition and other operations are not implemented because they’re not that meaningful in a PDF context.

Matrix objects are immutable. All transformation methods return new matrix objects.

New in version 8.7.

class pikepdf.Rectangle(llx: float, lly: float, urx: float, ury: float, /)

A PDF rectangle.

Typically this will be a rectangle in PDF units (points, 1/72 inch). Unlike raster graphics, the rectangle is defined by the lower left and upper right points.

Rectangles in PDF are encoded as pikepdf.Array with exactly four numeric elements, ordered as llx lly urx ury. See PDF 1.7 Reference Manual section 7.9.5.

The rectangle may be considered degenerate if the lower left corner is not strictly less than the upper right corner.

New in version 2.14.

Changed in version 8.5: Added operators to test whether rectangle a is contained in rectangle b (a <= b) and to calculate their intersection (a & b).

llx = Ellipsis

The lower left corner on the x-axis.

lly = Ellipsis

The lower left corner on the y-axis.

urx = Ellipsis

The upper right corner on the x-axis.

ury = Ellipsis

The upper right corner on the y-axis.

Content stream elements

class pikepdf.ContentStreamInstruction(operands: _ObjectList, operator: Operator)

Represents one complete instruction inside a content stream.

class pikepdf.ContentStreamInlineImage

Represents an instruction to draw an inline image.

pikepdf consolidates the BI-ID-EI sequence of operators, as it appears in a PDF to declare an inline image, and replaces it with a single virtual content stream instruction with the operator “INLINE IMAGE”.

Internal objects

These objects are returned by other pikepdf objects. They are part of the API, but not intended to be created explicitly.

class pikepdf._core.PageList

For accessing pages in a PDF.

A list-like object enumerating a range of pages in a pikepdf.Pdf. It may be all of the pages or a subset. Obtain using pikepdf.Pdf.pages.

See pikepdf.Page for accessing individual pages.

class pikepdf._core._ObjectList

A list whose elements are always pikepdf.Object.

In all other respects, this object behaves like a standard Python list.

class pikepdf.ObjectType(*args, **kwds)

Enumeration of PDF object types.

These values are used to implement pikepdf’s instance type checking. In the vast majority of cases it is more pythonic to use isinstance(obj, pikepdf.Stream) or issubclass.

These values are low-level and documented for completeness. They are exposed through pikepdf.Object._type_code.

array = Ellipsis

A PDF array, meaning the object is a pikepdf.Array.

boolean = Ellipsis

A PDF boolean. In most cases, booleans are automatically converted to bool, so this should not appear.

dictionary = Ellipsis

A PDF dictionary, meaning the object is a pikepdf.Dictionary.

inlineimage = Ellipsis

A PDF inline image, meaning the object is the data stream of an inline image. It would be necessary to combine this with the implicit dictionary to interpret the image correctly. pikepdf automatically packages inline images into a more useful class, so this will not generally appear.

integer = Ellipsis

A PDF integer. In most cases, integers are automatically converted to int, so this should not appear. Unlike Python integers, PDF integers are 32-bit signed integers.

name_ = Ellipsis

A PDF name, meaning the object is a pikepdf.Name.

null = Ellipsis

A PDF null. In most cases, nulls are automatically converted to None, so this should not appear.

operator = Ellipsis

A PDF operator, meaning the object is a pikepdf.Operator.

real = Ellipsis

A PDF real. In most cases, reals are automatically converted to decimal.Decimal.

reserved = Ellipsis

A temporary object used in creating circular references. Should not appear in most cases.

stream = Ellipsis

A PDF stream, meaning the object is a pikepdf.Stream (and it also has a dictionary).

string = Ellipsis

A PDF string, meaning the object is a pikepdf.String.

uninitialized = Ellipsis

An uninitialized object. If this appears, it is probably a bug.

Jobs

class pikepdf.Job(json: str)

Provides access to the QPDF job interface.

All of the functionality of the qpdf command line program is now available to pikepdf through jobs.

For further details:

https://qpdf.readthedocs.io/en/stable/qpdf-job.html

EXIT_CORRECT_PASSWORD = 3
EXIT_ERROR = 2

Exit code for a job that had an error.

EXIT_IS_NOT_ENCRYPTED = 2

Exit code for a job that provided a password when the input was not encrypted.

EXIT_WARNING = 3

Exit code for a job that had a warning.

LATEST_JOB_JSON

Version number of the most recent job-JSON schema.

LATEST_JSON

Version number of the most recent QPDF-JSON schema.