Main objects
- class pikepdf.Pdf(*args, **kwargs)
- pikepdf.open()
Alias for pikepdf.Pdf.open().
- pikepdf.new()
Alias for pikepdf.Pdf.new().
Access modes
- class pikepdf.ObjectStreamMode(*args, **kwds)
Options for saving object streams within PDFs.
Object streams are a more compact way of saving certain types of data; they were added in PDF 1.5. All modern PDF viewers support object streams, but some third-party tools and libraries cannot read them.
- disable = Ellipsis
Disable the use of object streams.
If any object streams exist in the file, remove them when the file is saved.
- generate = Ellipsis
Generate object streams.
- preserve = Ellipsis
Preserve any existing object streams in the original file.
This is the default behavior.
- class pikepdf.StreamDecodeLevel(*args, **kwds)
Options for decoding streams within PDFs.
- all = Ellipsis
In addition to generalized and non-lossy specialized filters, supported lossy compression filters will be applied. At present, this includes DCTDecode (JPEG) compression. Note that compressing the resulting data with DCTDecode again will accumulate loss, so avoid multiple compression and decompression cycles. This is mostly useful for low-level retrieval of image data; see pikepdf.PdfImage for the preferred method.
- generalized = Ellipsis
This is the default. libqpdf will apply LZWDecode, ASCII85Decode, ASCIIHexDecode, and FlateDecode filters on the input. When saved with compress_streams=True, the default, the effect of this is that streams filtered with these older and less efficient filters will be recompressed with the Flate filter. As a special case, if a stream is already compressed with FlateDecode and compress_streams=True, the original compressed data will be preserved.
- none = Ellipsis
Do not attempt to apply any filters. Streams remain as they appear in the original file. Note that uncompressed streams may still be compressed on output. You can disable that by saving with .save(..., compress_streams=False).
- specialized = Ellipsis
In addition to uncompressing the generalized compression formats, supported non-lossy compression will also be decoded. At present, this includes the RunLengthDecode filter.
- class pikepdf.Encryption
Specify the encryption settings to apply when a PDF is saved.
- R = 6
Select the security handler algorithm to use. Choose from: 2, 3, 4 or 6. By default, the highest version is selected (6). 5 is a deprecated algorithm that should not be used.
- aes = True
If True, request the AES algorithm. If False, use RC4. If omitted, AES is selected whenever possible (R >= 4).
- allow
The permissions to set. If omitted, all permissions are granted to the user.
- metadata = True
If True, also encrypt the PDF metadata. If False, metadata is not encrypted. Reading document metadata without decryption may be desirable in some cases. Requires aes=True. If omitted, metadata is encrypted whenever possible.
- owner = ''
The owner password to use. This allows full control of the file. If blank, the PDF will be encrypted and present as “(SECURED)” in PDF viewers. If the owner password is blank, the user password should be as well.
- user = ''
The user password to use. With this password, some restrictions will be imposed by a typical PDF reader. If blank, the PDF can be opened by anyone, but only modified as allowed by the permissions in allow.
Object construction
- class pikepdf.Object
- class pikepdf.Name
Construct a PDF Name object.
Names can be constructed with two notations:
Name.Resources
Name('/Resources')
The two are semantically equivalent. The former is preferred for names that are normally expected to be in a PDF. The latter is preferred for dynamic names and attributes.
- class pikepdf.String
Construct a PDF String object.
- class pikepdf.Array
Construct a PDF Array object.
- class pikepdf.Dictionary
Construct a PDF Dictionary object.
- class pikepdf.Stream
Construct a PDF Stream object.
- class pikepdf.Operator
Construct an operator for use in a content stream.
An Operator is one of a limited set of commands that can appear in PDF content streams (roughly, the mini-language that draws objects, lines and text on a virtual PDF canvas). The functions parse_content_stream() and unparse_content_stream() create and expect Operators, respectively, along with their operands. pikepdf uses the special Operator “INLINE IMAGE” to denote an inline image in a content stream.
Common PDF data structures
- class pikepdf.Matrix
A 2D affine matrix for PDF transformations.
PDF uses matrices to transform document coordinates to screen/device coordinates.
PDF matrices are encoded as pikepdf.Array with exactly six numeric elements, ordered as a b c d e f.
\[
\begin{bmatrix} a & b & 0 \\ c & d & 0 \\ e & f & 1 \end{bmatrix}
\]
The approximate interpretation of these six parameters is documented below. The values (0, 0, 1) in the third column are fixed, so a general 3×3 matrix cannot be converted to a PDF matrix.
PDF transformation matrices are the transpose of most textbook treatments. In a textbook, typically A × vc is used to transform a column vector vc = (x, y, 1) by the affine matrix A. In PDF, the matrix is the transpose of that in the textbook, and vr × A' is used to transform a row vector vr = (x, y, 1).
Transformation matrices specify the transformation from the new (transformed) coordinate system to the original (untransformed) coordinate system. x’ and y’ are the coordinates in the untransformed coordinate system, and x and y are the coordinates in the transformed coordinate system.
PDF order:
\[
\begin{bmatrix} x' & y' & 1 \end{bmatrix} = \begin{bmatrix} x & y & 1 \end{bmatrix} \begin{bmatrix} a & b & 0 \\ c & d & 0 \\ e & f & 1 \end{bmatrix}
\]
To concatenate transformations, use the matrix multiplication operator (@) to pre-multiply the next transformation onto existing transformations. Alternatively, use the .translated(), .scaled(), and .rotated() methods to chain transformation operations.
Addition and other operations are not implemented because they’re not that meaningful in a PDF context.
Matrix objects are immutable. All transformation methods return new matrix objects.
New in version 8.7.
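The row-vector convention makes the order of concatenation easy to get backwards. The following pure-Python sketch uses a hypothetical Affine class (not pikepdf.Matrix) so the arithmetic is visible: it multiplies the 3×3 matrices while storing only a b c d e f, and applies them as [x y 1] × M.

```python
from typing import NamedTuple, Tuple


class Affine(NamedTuple):
    """Hypothetical stand-in for a PDF matrix stored as [a b c d e f].

    Represents [[a, b, 0], [c, d, 0], [e, f, 1]] under the PDF row-vector
    convention [x' y' 1] = [x y 1] x M.
    """

    a: float
    b: float
    c: float
    d: float
    e: float
    f: float

    def __matmul__(self, other: "Affine") -> "Affine":
        # Full 3x3 product self x other, keeping only the six free entries;
        # the fixed third column (0, 0, 1) is preserved automatically.
        return Affine(
            self.a * other.a + self.b * other.c,
            self.a * other.b + self.b * other.d,
            self.c * other.a + self.d * other.c,
            self.c * other.b + self.d * other.d,
            self.e * other.a + self.f * other.c + other.e,
            self.e * other.b + self.f * other.d + other.f,
        )

    def apply(self, x: float, y: float) -> Tuple[float, float]:
        # Row vector times matrix: x' = a*x + c*y + e, y' = b*x + d*y + f
        return (self.a * x + self.c * y + self.e, self.b * x + self.d * y + self.f)


scale = Affine(2, 0, 0, 2, 0, 0)  # scale by 2 in both directions
translate = Affine(1, 0, 0, 1, 10, 20)  # shift by (10, 20)

# With row vectors, p x (S @ T) applies S first, then T
scale_then_translate = scale @ translate
translate_then_scale = translate @ scale

print(scale_then_translate.apply(1, 1))  # (12, 22)
print(translate_then_scale.apply(1, 1))  # (22, 42)
```

In content-stream terms, applying a new step N on top of an existing matrix C gives N @ C, which is the pre-multiplication described above.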
- class pikepdf.Rectangle(llx: float, lly: float, urx: float, ury: float, /)
A PDF rectangle.
Typically this will be a rectangle in PDF units (points, 1/72”). Unlike raster graphics, the rectangle is defined by the lower left and upper right points.
Rectangles in PDF are encoded as pikepdf.Array with exactly four numeric elements, ordered as llx lly urx ury. See PDF 1.7 Reference Manual section 7.9.5.
The rectangle may be considered degenerate if the lower left corner is not strictly less than the upper right corner.
New in version 2.14.
Changed in version 8.5: Added operators to test whether rectangle a is contained in rectangle b (a <= b) and to calculate their intersection (a & b).
- llx = Ellipsis
The lower left corner on the x-axis.
- lly = Ellipsis
The lower left corner on the y-axis.
- urx = Ellipsis
The upper right corner on the x-axis.
- ury = Ellipsis
The upper right corner on the y-axis.
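The containment and intersection ideas reduce to comparisons of the four coordinates. The sketch below uses a hypothetical Rect class rather than pikepdf.Rectangle, and makes no claim about how pikepdf itself reports a non-overlapping intersection; here such a result is simply degenerate.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Hypothetical stand-in for a PDF rectangle [llx lly urx ury]."""

    llx: float
    lly: float
    urx: float
    ury: float

    @property
    def width(self) -> float:
        return self.urx - self.llx

    @property
    def height(self) -> float:
        return self.ury - self.lly

    def is_degenerate(self) -> bool:
        # Degenerate unless lower-left is strictly below and left of upper-right
        return not (self.llx < self.urx and self.lly < self.ury)

    def is_inside(self, other: "Rect") -> bool:
        # The idea behind `a <= b`: every edge of self lies within other
        return (
            other.llx <= self.llx
            and other.lly <= self.lly
            and self.urx <= other.urx
            and self.ury <= other.ury
        )

    def intersect(self, other: "Rect") -> "Rect":
        # The idea behind `a & b`: the overlapping region of both rectangles
        return Rect(
            max(self.llx, other.llx),
            max(self.lly, other.lly),
            min(self.urx, other.urx),
            min(self.ury, other.ury),
        )


page = Rect(0, 0, 612, 792)  # US Letter in points
margin_box = Rect(72, 72, 540, 720)  # 1-inch margins
overlap = Rect(0, 0, 100, 100).intersect(Rect(50, 50, 200, 200))
disjoint = Rect(0, 0, 10, 10).intersect(Rect(20, 20, 30, 30))

print(margin_box.is_inside(page), overlap, disjoint.is_degenerate())
```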
Content stream elements
- class pikepdf.ContentStreamInstruction(operands: _ObjectList, operator: Operator)
Represents one complete instruction inside a content stream.
- class pikepdf.ContentStreamInlineImage
Represents an instruction to draw an inline image.
pikepdf consolidates the BI-ID-EI sequence of operators, which a PDF uses to declare an inline image, into a single virtual content stream instruction with the operator “INLINE IMAGE”.
Internal objects
These objects are returned by other pikepdf objects. They are part of the API, but not intended to be created explicitly.
- class pikepdf._core.PageList
For accessing pages in a PDF.
A list-like object enumerating a range of pages in a pikepdf.Pdf. It may be all of the pages or a subset. Obtain using pikepdf.Pdf.pages.
See pikepdf.Page for accessing individual pages.
- class pikepdf._core._ObjectList
A list whose elements are always pikepdf.Object.
In all other respects, this object behaves like a standard Python list.
- class pikepdf.ObjectType(*args, **kwds)
Enumeration of PDF object types.
These values are used to implement pikepdf’s instance type checking. In the vast majority of cases it is more pythonic to use isinstance(obj, pikepdf.Stream) or issubclass.
These values are low-level and documented for completeness. They are exposed through pikepdf.Object._type_code.
- array = Ellipsis
A PDF array, meaning the object is a pikepdf.Array.
- boolean = Ellipsis
A PDF boolean. In most cases, booleans are automatically converted to bool, so this should not appear.
- dictionary = Ellipsis
A PDF dictionary, meaning the object is a pikepdf.Dictionary.
- inlineimage = Ellipsis
A PDF inline image, meaning the object is the data stream of an inline image. It would be necessary to combine this with the implicit dictionary to interpret the image correctly. pikepdf automatically packages inline images into a more useful class, so this will not generally appear.
- integer = Ellipsis
A PDF integer. In most cases, integers are automatically converted to int, so this should not appear. Unlike Python integers, PDF integers are 32-bit signed integers.
- name_ = Ellipsis
A PDF name, meaning the object is a pikepdf.Name.
- null = Ellipsis
A PDF null. In most cases, nulls are automatically converted to None, so this should not appear.
- operator = Ellipsis
A PDF operator, meaning the object is a pikepdf.Operator.
- real = Ellipsis
A PDF real. In most cases, reals are automatically converted to decimal.Decimal.
- reserved = Ellipsis
A temporary object used in creating circular references. Should not appear in most cases.
- stream = Ellipsis
A PDF stream, meaning the object is a pikepdf.Stream (and it also has a dictionary).
- string = Ellipsis
A PDF string, meaning the object is a pikepdf.String.
- uninitialized = Ellipsis
An uninitialized object. If this appears, it is probably a bug.
Jobs
- class pikepdf.Job(json: str)
Provides access to the QPDF job interface.
All of the functionality of the qpdf command line program is now available to pikepdf through jobs.
- EXIT_CORRECT_PASSWORD = 3
- EXIT_ERROR = 2
Exit code for a job that had an error.
- EXIT_IS_NOT_ENCRYPTED = 2
Exit code for a job that provided a password when the input was not encrypted.
- EXIT_WARNING = 3
Exit code for a job that had a warning.
- LATEST_JOB_JSON
Version number of the most recent job-JSON schema.
- LATEST_JSON
Version number of the most recent QPDF-JSON schema.