Support models
Support models are abstractions over “raw” objects within a PDF. For example, a page
in a PDF is a Dictionary with its /Type set to /Page. The Dictionary in
that case is the “raw” object. Upon establishing what type of object it is, we
can wrap it with a support model that adds features to ensure consistency with
the PDF specification.
In version 2.x, pikepdf did not apply support models to “raw” objects automatically.
Version 3.x automatically applies support models to /Page objects.
- class pikepdf.ObjectHelper
Base class for wrapper/helper around an Object.
Used to expose additional functionality specific to that object type.
pikepdf.Page is an example of an object helper. The actual page object in a PDF is a Dictionary. The helper provides additional methods specific to pages.
- class pikepdf.PdfMatrix(*args)
Support class for PDF content stream matrices.
PDF content stream matrices are 3x3 matrices summarized by a shorthand (a, b, c, d, e, f), where the first column vector is (a, c, e) and the second column vector is (b, d, f). The final column vector is always (0, 0, 1) since PDF uses homogeneous coordinates.
a is the horizontal scaling factor.
b is horizontal skewing.
c is vertical skewing.
d is the vertical scaling factor.
e is the horizontal translation.
f is the vertical translation.
For scaling, a and d are the scaling factors in the horizontal and vertical directions, respectively; for pure scaling, b and c are zero.
PDF uses row vectors. That is, vr @ A' gives the effect of transforming a row vector vr=(x, y, 1) by the matrix A'. Most textbook treatments use A @ vc where the column vector vc=(x, y, 1)'.
Matrices should be premultiplied with other matrices to concatenate transformations. (@ is the Python matrix multiplication operator.)
Addition and other operations are not implemented because they’re not that meaningful in a PDF context (they can be defined and are mathematically meaningful in general).
PdfMatrix objects are immutable. All transformations on them produce a new matrix.
Deprecated since version 8.7: Use pikepdf.Matrix instead.
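The row-vector convention described above can be sketched in plain Python. This is a minimal illustration, not pikepdf's implementation: the helper names from_shorthand, matmul and transform are hypothetical, and the matrices are ordinary nested lists.

```python
# Illustrative sketch only; these helpers are hypothetical, not pikepdf API.

def from_shorthand(a, b, c, d, e, f):
    """Expand the PDF shorthand (a, b, c, d, e, f) into a 3x3 matrix."""
    return [[a, b, 0],
            [c, d, 0],
            [e, f, 1]]

def matmul(m, n):
    """Multiply two 3x3 matrices (m @ n)."""
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transform(point, m):
    """Apply m to the row vector (x, y, 1) and return the new (x, y)."""
    x, y = point
    row = [x, y, 1]
    out = [sum(row[k] * m[k][j] for k in range(3)) for j in range(3)]
    return (out[0], out[1])

scale = from_shorthand(2, 0, 0, 2, 0, 0)        # scale by 2 in both directions
translate = from_shorthand(1, 0, 0, 1, 10, 5)   # move by (10, 5)

# With row vectors, v @ scale @ translate scales first, then translates.
scale_then_translate = matmul(scale, translate)
translate_then_scale = matmul(translate, scale)

print(transform((1, 1), scale_then_translate))  # (12, 7)
print(transform((1, 1), translate_then_scale))  # (22, 12)
```

The two results differ, which is why the order of concatenation matters.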
- class pikepdf.PdfImage(obj: Stream)
Support class to provide a consistent API for manipulating PDF images.
The data structure for images inside PDFs is irregular and complex, making it difficult to use without introducing errors for less typical cases. This class addresses these difficulties by providing a regular, Pythonic API similar in spirit to (and convertible to) the Python Pillow imaging library.
- MAIN_COLORSPACES
- PRINT_COLORSPACES
- SIMPLE_COLORSPACES
- obj
- class pikepdf.PdfInlineImage(*, image_data: Object, image_object: tuple)
Support class for PDF inline images.
- class pikepdf.models.PdfMetadata(pdf: Pdf, pikepdf_mark: bool = True, sync_docinfo: bool = True, overwrite_invalid_xml: bool = True)
Read and edit the metadata associated with a PDF.
The PDF specification contains two types of metadata, the newer XMP (Extensible Metadata Platform, XML-based) and older DocumentInformation dictionary. The PDF 2.0 specification removes the DocumentInformation dictionary.
This primarily works with XMP metadata, but includes methods to generate XMP from DocumentInformation and will also coordinate updates to DocumentInformation so that the two are kept consistent.
XMP metadata fields may be accessed using the full XML namespace URI or the short name. For example, metadata['dc:description'] and metadata['{http://purl.org/dc/elements/1.1/}description'] both refer to the same field. Several common XML namespaces are registered automatically.
See the XMP specification for details of allowable fields.
To update metadata, use a with block.
Example
>>> with pdf.open_metadata() as records:
...     records['dc:title'] = 'New Title'
See also
pikepdf.Pdf.open_metadata()
- DOCINFO_MAPPING
- NS
- REVERSE_NS
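The equivalence between short names and fully qualified names can be illustrated with a small standalone sketch. The PREFIXES table and the to_qualified helper below are hypothetical stand-ins for illustration and are not pikepdf's own namespace tables. Only the Dublin Core URI is taken from the example above.

```python
# Hypothetical prefix table for illustration; not pikepdf's own mapping.
PREFIXES = {'dc': 'http://purl.org/dc/elements/1.1/'}

def to_qualified(key):
    """Return key in '{namespace-uri}local-name' form.

    Keys already in that form are returned unchanged; short names such
    as 'dc:title' are expanded using PREFIXES.
    """
    if key.startswith('{'):
        return key
    prefix, _, local = key.partition(':')
    return '{' + PREFIXES[prefix] + '}' + local

short = to_qualified('dc:description')
full = to_qualified('{http://purl.org/dc/elements/1.1/}description')
print(short == full)  # True
```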
- class pikepdf.models.Encryption
Specify the encryption settings to apply when a PDF is saved.
- R = 6
Select the security handler algorithm to use. Choose from: 2, 3, 4 or 6. By default, the highest version is selected (6). 5 is a deprecated algorithm that should not be used.
- aes = True
If True, request the AES algorithm. If False, use RC4. If omitted, AES is selected whenever possible (R >= 4).
- allow
The permissions to set. If omitted, all permissions are granted to the user.
- metadata = True
If True, also encrypt the PDF metadata. If False, metadata is not encrypted. Reading document metadata without decryption may be desirable in some cases. Requires aes=True. If omitted, metadata is encrypted whenever possible.
- owner = ''
The owner password to use. This allows full control of the file. If blank, the PDF will be encrypted and present as “(SECURED)” in PDF viewers. If the owner password is blank, the user password should be as well.
- user = ''
The user password to use. With this password, some restrictions will be imposed by a typical PDF reader. If blank, the PDF can be opened by anyone, but only modified as allowed by the permissions in allow.
- class pikepdf.models.Outline(pdf: Pdf, max_depth: int = 15, strict: bool = False)
Maintains an intuitive interface for creating and editing PDF document outlines.
See PDF 1.7 Reference Manual section 12.3.
- Parameters:
pdf – PDF document object.
max_depth – Maximum recursion depth to consider when reading the outline.
strict – If set to False (default), silently ignores structural errors. Setting it to True raises a pikepdf.OutlineStructureError if any object references re-occur while the outline is being read or written.
See also
pikepdf.Pdf.open_outline()
- class pikepdf.models.OutlineItem(title: str, destination: Array | String | Name | int | None = None, page_location: PageLocation | str | None = None, action: Dictionary | None = None, obj: Dictionary | None = None, *, left: float | None = None, top: float | None = None, right: float | None = None, bottom: float | None = None, zoom: float | None = None)
Manage a single item in a PDF document outlines structure.
Includes nested items.
- Parameters:
title – Title of the outlines item.
destination – Page number, destination name, or any other PDF object to be used as a reference when clicking on the outlines entry. Note this should be None if an action is used instead. If set to a page number, it will be resolved to a reference at the time of writing the outlines back to the document.
page_location – Supplemental page location for a page number in destination, e.g. PageLocation.Fit. May also be a simple string such as 'FitH'.
action – Action to perform when clicking on this item. Will be ignored during writing if destination is also set.
obj – Dictionary object representing this outlines item in a Pdf. May be None for creating a new object. If present, an existing object is modified in-place during writing and original attributes are retained.
left – Describes the viewport position associated with a destination.
top – Describes the viewport position associated with a destination.
bottom – Describes the viewport position associated with a destination.
right – Describes the viewport position associated with a destination.
zoom – Describes the viewport position associated with a destination.
This object does not contain any information about higher-level or neighboring elements.
- Valid destination arrays:
[page /XYZ left top zoom] generally [page, PageLocationEntry, 0 to 4 ints]
- class pikepdf.Permissions
Stores the user-level permissions for an encrypted PDF.
A compliant PDF reader/writer should enforce these restrictions on people who have the user password and not the owner password. In practice, either password is sufficient to decrypt all document contents. A person who has the owner password should be allowed to modify the document in any way. pikepdf does not enforce the restrictions in any way; it is up to application developers to enforce them as they see fit.
Unencrypted PDFs implicitly have all permissions allowed. Permissions can only be changed when a PDF is saved.
- accessibility = True
Deprecated in PDF 2.0. Formerly used to block accessibility tools.
In older versions of the PDF specification, it was possible to request a PDF reader to block a user’s right to use accessibility tools. Modern PDF readers do not support this archaic feature and always allow accessibility tools to be used. The only purpose of this permission is to provide testing of this deprecated feature.
- extract = True
Can users extract contents?
- modify_annotation = True
Can users modify annotations?
- modify_assembly = False
Can users arrange document contents?
- modify_form = True
Can users fill out forms?
- modify_other = True
Can users modify the document?
- print_highres = True
Can users print the document at high resolution?
- print_lowres = True
Can users print the document at low resolution?
- class pikepdf.models.EncryptionInfo(encdict: dict[str, Any])
Reports encryption information for an encrypted PDF.
This information may not be changed, except when a PDF is saved. This object is not used to specify the encryption settings to save a PDF, due to non-overlapping information requirements.
- class pikepdf.Annotation(obj: Object)
A PDF annotation. Wrapper around a PDF dictionary.
Describes an annotation in a PDF, such as a comment, underline, copy editing marks, interactive widgets, redactions, 3D objects, sound and video clips.
See the PDF 1.7 Reference Manual section 12.5.6 for the full list of annotation types and definition of terminology.
New in version 2.12.
- class pikepdf._core.Attachments(*args, **kwargs)
Exposes files attached to a PDF.
If a file is attached to a PDF, it is exposed through this interface. For example, p.attachments['readme.txt'] would return a pikepdf._core.AttachedFileSpec that describes the attached file, if a file were attached under that name. p.attachments['readme.txt'].get_file() would return a pikepdf._core.AttachedFile, an archaic intermediate object to support different versions of the file for different platforms. Typically one just calls p.attachments['readme.txt'].read_bytes() to get the contents of the file.
This interface provides access to any files that are attached to this PDF, exposed as a Python collections.abc.MutableMapping interface.
The keys (virtual filenames) are always str, and values are always pikepdf.AttachedFileSpec.
To create a new attached file, use pikepdf._core.AttachedFileSpec.from_filepath() to create a pikepdf._core.AttachedFileSpec and then assign it to the pikepdf.Pdf.attachments mapping. If the file is in memory, use p.attachments['test.pdf'] = b'binary data'.
Use this interface through pikepdf.Pdf.attachments.
New in version 3.0.
Changed in version 8.10.1: Added convenience interface for directly loading attached files, e.g. pdf.attachments['/test.pdf'] = b'binary data'. Prior to this release, there was no way to attach data in memory as a file.
- class pikepdf.AttachedFileSpec(data: bytes, *, description: str, filename: str, mime_type: str, creation_date: str, mod_date: str)
In a PDF, a file specification provides name and metadata for a target file.
Most file specifications are simple file specifications, and contain only one attached file. Call get_file() to get the attached file:
    from pikepdf import Pdf
    pdf = Pdf.open(...)
    fs = pdf.attachments['example.txt']
    stream = fs.get_file()
To attach a new file to a PDF, you may construct an AttachedFileSpec.
    from pathlib import Path
    from pikepdf import AttachedFileSpec, Pdf
    pdf = Pdf.open(...)
    fs = AttachedFileSpec.from_filepath(pdf, Path('somewhere/spreadsheet.xlsx'))
    pdf.attachments['spreadsheet.xlsx'] = fs
PDF supports the concept of having multiple, platform-specialized versions of the attached file (similar to resource forks on some operating systems). In theory, this attachment ought to be the same file, but encoded in different ways. For example, perhaps a PDF includes a text file encoded with Windows line endings (\r\n) and a different one with POSIX line endings (\n). Similarly, PDF allows for the possibility that you need to encode platform-specific filenames. pikepdf cannot directly create these, because they are arguably obsolete; it can provide access to them, however.
If you have to deal with platform-specialized versions, use get_all_filenames() to enumerate those available.
Described in the PDF 1.7 Reference Manual section 7.11.3.
New in version 3.0.
- class pikepdf._core.AttachedFile
An object that contains an actual attached file.
These objects do not need to be created manually; they are normally part of an AttachedFileSpec.
New in version 3.0.
- creation_date
- mime_type
Get the MIME type of the attached file according to the PDF creator.
- mod_date
- class pikepdf.NameTree(obj: Object, *, auto_repair: bool = ...)
An object for managing name tree data structures in PDFs.
A name tree is a key-value data structure. The keys are any binary strings (that is, Python bytes). If a str is provided as a key, the UTF-8 encoding of that string is tested. Name trees are (confusingly) not indexed by pikepdf.Name objects. They behave like DictMapping[bytes, pikepdf.Object].
The keys are sorted; pikepdf will ensure that the order is preserved.
The value may be any PDF object. Typically it will be a dictionary or array.
Internally in the PDF, a name tree can be a fairly complex tree data structure implemented with many dictionaries and arrays. pikepdf (using libqpdf) will automatically read, repair and maintain this tree for you. There should not be any reason to access the internal nodes of a name tree; use this interface instead.
NameTrees are used to store certain objects like file attachments in a PDF. Where a more specific interface exists, use that instead, and it will manipulate the name tree in a semantically correct manner for you.
Do not modify the internal structure of a name tree while you have a NameTree referencing it. Access it only through the NameTree object.
Name trees are described in the PDF 1.7 Reference Manual section 7.9.6. See section 7.7.4 for a list of PDF objects that are stored in name trees.
New in version 3.0.
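The key handling described for name trees (bytes keys, str keys compared by their UTF-8 encoding, sorted order) can be mimicked with a plain dictionary. This is a rough sketch that does not use pikepdf. The normalize_key helper and the sample entries are hypothetical.

```python
# Illustrative sketch only; normalize_key is a hypothetical helper.

def normalize_key(key):
    """Return the bytes form of a key; str keys are encoded as UTF-8."""
    return key.encode('utf-8') if isinstance(key, str) else bytes(key)

entries = {}
for key, value in [('readme.txt', 1), (b'a.bin', 2), ('é.txt', 3)]:
    entries[normalize_key(key)] = value

# Keys compare as byte strings, so non-ASCII names sort after ASCII ones.
sorted_keys = sorted(entries)
print(sorted_keys)

# A str lookup and the equivalent bytes lookup reach the same entry.
print(entries[normalize_key('readme.txt')] == entries[b'readme.txt'])
```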
- class pikepdf.NumberTree(obj: Object, *, auto_repair: bool = ...)
An object for managing number tree data structures in PDFs.
A number tree is a key-value data structure, like name trees, except that the key is an integer. It behaves like Dict[int, pikepdf.Object].
The keys can be sparse - not all integer positions will be populated. Keys are also always sorted; pikepdf will ensure that the order is preserved.
The value may be any PDF object. Typically it will be a dictionary or array.
Internally in the PDF, a number tree can be a fairly complex tree data structure implemented with many dictionaries and arrays. pikepdf (using libqpdf) will automatically read, repair and maintain this tree for you. There should not be any reason to access the internal nodes of a number tree; use this interface instead.
NumberTrees are not used much in PDF. The main thing they provide is a mapping between 0-based page numbers and user-facing page numbers (which pikepdf also exposes as Page.label). The /PageLabels number tree is where the page numbering rules are defined.
Number trees are described in the PDF 1.7 Reference Manual section 7.9.7. See section 12.4.2 for a description of the page labels number tree. Here is an example of modifying an existing page labels number tree:
    from pikepdf import Dictionary, Name, NumberTree
    pagelabels = NumberTree(pdf.Root.PageLabels)
    # Label pages starting at 0 with lowercase Roman numerals
    pagelabels[0] = Dictionary(S=Name.r)
    # Label pages starting at 6 with decimal numbers
    pagelabels[6] = Dictionary(S=Name.D)
    # Page labels will now be:
    # i, ii, iii, iv, v, vi, 1, 2, 3, ...
Do not modify the internal structure of a number tree while you have a NumberTree referencing it. Access it only through the NumberTree object.
New in version 5.4.
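To make the sparse-key behavior concrete, here is a plain-Python sketch of how page label rules keyed by starting page index resolve to labels. It does not use pikepdf. The rules dictionary, to_roman and page_label are hypothetical, and the sketch assumes a rule exists for index 0 and ignores the optional /St (start value) and /P (prefix) entries.

```python
from bisect import bisect_right

# Illustrative sketch only; not pikepdf's page label implementation.

def to_roman(n):
    """Convert a positive integer to lowercase Roman numerals."""
    numerals = [(1000, 'm'), (900, 'cm'), (500, 'd'), (400, 'cd'),
                (100, 'c'), (90, 'xc'), (50, 'l'), (40, 'xl'),
                (10, 'x'), (9, 'ix'), (5, 'v'), (4, 'iv'), (1, 'i')]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return ''.join(out)

def page_label(rules, index):
    """Resolve the label of a 0-based page index from sparse rules.

    rules maps a starting page index to a style: 'r' for lowercase
    Roman numerals, 'D' for decimal. Assumes rules contains key 0.
    """
    starts = sorted(rules)
    # The applicable rule is the one with the greatest start <= index.
    start = starts[bisect_right(starts, index) - 1]
    position = index - start + 1
    return to_roman(position) if rules[start] == 'r' else str(position)

rules = {0: 'r', 6: 'D'}
labels = [page_label(rules, i) for i in range(9)]
print(labels)  # ['i', 'ii', 'iii', 'iv', 'v', 'vi', '1', '2', '3']
```

Only two keys are stored, yet every page index resolves to a label, which is the point of keeping the tree sparse and sorted.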
Module for generating PDF content streams.
- class pikepdf.canvas.Canvas(*, page_size)
Canvas for rendering PDFs with pikepdf.
All drawing is done on a pikepdf canvas using the .do property. This interface manages the graphics state of the canvas.
A Canvas can be exported as a single page Pdf using .to_pdf. This Pdf can then be merged into other PDFs or written to a file.
- Parameters:
page_size (tuple[int | float, int | float]) –
- add_font(resource_name, font)
Add a font to the page.
- Parameters:
resource_name (Name) –
font (Font) –
- property do: _CanvasAccessor
Do operations on the current graphics state.
- to_pdf()
Render the canvas as a single page PDF.
- Return type:
Pdf
- class pikepdf.canvas.Color(red, green, blue, alpha)
- alpha
Alias for field number 3
- blue
Alias for field number 2
- green
Alias for field number 1
- red
Alias for field number 0
- class pikepdf.canvas.ContentStreamBuilder
Content stream builder.
- append_rectangle(x, y, w, h)
Append rectangle to path.
- Parameters:
x (float) –
y (float) –
w (float) –
h (float) –
- begin_marked_content(mctype)
Begin marked content sequence.
- Parameters:
mctype (Name) –
- begin_marked_content_proplist(mctype, mcid)
Begin marked content sequence.
- Parameters:
mctype (Name) –
mcid (int) –
- begin_text()
Begin text object.
- build()
Build content stream.
- Return type:
bytes
- cm(matrix)
Concatenate matrix.
- Parameters:
matrix (Matrix) –
- draw_xobject(name)
Draw XObject.
Add instructions to render an XObject. The XObject must be defined in the document.
- Parameters:
name (Name) – Name of XObject
- end_marked_content()
End marked content sequence.
- end_text()
End text object.
- extend(other)
Append another content stream.
- Parameters:
other (ContentStreamBuilder) –
- fill()
Fill path.
- line(x1, y1, x2, y2)
Draw line.
- Parameters:
x1 (float) –
y1 (float) –
x2 (float) –
y2 (float) –
- move_cursor(dx, dy)
Move cursor.
- pop()
Restore the graphics state.
- push()
Save the graphics state.
- set_dashes(array=None, phase=0)
Set dashes.
- set_fill_color(r, g, b)
Set RGB fill color.
- Parameters:
r (float) –
g (float) –
b (float) –
- set_line_width(width)
Set line width.
- set_stroke_color(r, g, b)
Set RGB stroke color.
- Parameters:
r (float) –
g (float) –
b (float) –
- set_text_font(font, size)
Set text font and size.
- Parameters:
font (Name) –
size (int) –
- set_text_horizontal_scaling(scale)
Set text horizontal scaling.
- Parameters:
scale (float) –
- set_text_matrix(matrix)
Set text matrix.
- Parameters:
matrix (Matrix) –
- set_text_rendering(mode)
Set text rendering mode.
- Parameters:
mode (int) –
- show_text(encoded)
Show text.
The text must be encoded in character codes expected by the font.
- Parameters:
encoded (bytes) –
- stroke_and_close()
Stroke and close path.
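As a rough picture of what a content stream builder produces, the sketch below assembles a few PDF operators into bytes. MiniBuilder is a hypothetical stand-in written for this illustration and is not pikepdf's ContentStreamBuilder. The method names mirror the ones listed above. The operators (q, Q, rg, re, f) come from the PDF specification, and the exact byte layout pikepdf emits may differ.

```python
# Illustrative sketch only; MiniBuilder is hypothetical, not pikepdf API.

class MiniBuilder:
    """Collect PDF content stream instructions and join them into bytes."""

    def __init__(self):
        self._parts = []

    def _emit(self, operands, operator):
        # Operands precede the operator, separated by spaces.
        tokens = [format(v, 'g') for v in operands] + [operator]
        self._parts.append(' '.join(tokens))
        return self

    def push(self):
        return self._emit([], 'q')             # save graphics state

    def pop(self):
        return self._emit([], 'Q')             # restore graphics state

    def set_fill_color(self, r, g, b):
        return self._emit([r, g, b], 'rg')     # RGB fill color

    def append_rectangle(self, x, y, w, h):
        return self._emit([x, y, w, h], 're')  # add rectangle to path

    def fill(self):
        return self._emit([], 'f')             # fill the current path

    def build(self):
        return '\n'.join(self._parts).encode('ascii')

stream = (
    MiniBuilder()
    .push()
    .set_fill_color(1, 0, 0)
    .append_rectangle(10, 20, 100, 50)
    .fill()
    .pop()
    .build()
)
print(stream.decode('ascii'))
```

Wrapping the drawing between push() and pop() keeps the color change from leaking into later instructions.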
- class pikepdf.canvas.Font
Base class for fonts.
- abstract register(pdf)
Register the font.
Create several data structures in the Pdf to describe the font. While it creates the data, a reference should be set in at least one page’s /Resources dictionary to retain the font in the output PDF and ensure it is usable on that page.
The returned Dictionary should be created as an indirect object, using pdf.make_indirect().
Returns a Dictionary suitable for insertion into a /Resources /Font dictionary.
- Parameters:
pdf (Pdf) –
- Return type:
Dictionary
- abstract text_width(text, fontsize)
Estimate the width of a text string when rendered with the given font.
- Parameters:
text (str) –
fontsize (float) –
- Return type:
float
- class pikepdf.canvas.Helvetica
Helvetica font.
- register(pdf)
Register the font.
- Parameters:
pdf (Pdf) –
- Return type:
Dictionary
- text_width(text, fontsize)
Estimate the width of a text string when rendered with the given font.
- Parameters:
text (str) –
fontsize (float) –
- Return type:
float
- class pikepdf.canvas.LoadedImage(name, image)
Loaded image.
- Parameters:
name (Name) –
image (Image) –
- class pikepdf.canvas.Text(direction=TextDirection.LTR)
Text object for rendering text on a pikepdf canvas.
- font(font, size)
Set font and size.
- Parameters:
font (Name) –
size (float) –
- horiz_scale(scale)
Set text horizontal scaling.
- move_cursor(x, y)
Move cursor.
- render_mode(mode)
Set text rendering mode.
- show(text)
Show text.
The text must be encoded in character codes expected by the font. If a text string is passed, it will be encoded as UTF-16BE. Text rendering will not work properly if the font’s character codes are not consistent with UTF-16BE. This is a rudimentary interface. You’ve been warned.
- Parameters:
text (str | bytes) –
- text_transform(matrix)
Set text matrix.
- Parameters:
matrix (Matrix) –
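The encoding rule stated for show() can be illustrated without pikepdf. In this sketch, encode_for_show is a hypothetical helper: the UTF-16BE encoding of str input follows the description above, while passing bytes through unchanged is an assumption made for illustration.

```python
# Illustrative sketch only; encode_for_show is a hypothetical helper.

def encode_for_show(text):
    """Return the bytes that would be handed to the font.

    str input is encoded as UTF-16BE; bytes input is assumed to be
    already encoded in the font's character codes and is returned as is.
    """
    if isinstance(text, bytes):
        return text
    return text.encode('utf-16-be')

encoded = encode_for_show('Hi')
print(encoded)  # b'\x00H\x00i'
```

Each character becomes two bytes, so a font whose character codes are single bytes would not render this output correctly.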
- class pikepdf.canvas.TextDirection(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Enumeration for text direction.