paperap.models.document.metadata.model module

Document metadata models for Paperless-NgX.

This module provides models for representing document metadata in Paperless-NgX, including file information, checksums, and document properties. These models are used to access and manipulate metadata associated with documents stored in the Paperless-NgX system.

class paperap.models.document.metadata.model.MetadataElement(**data)

Bases: BaseModel

Represents a key-value pair of document metadata in Paperless-NgX.

This model represents individual metadata elements extracted from document files, such as author, creation date, or other file-specific properties. Each element consists of a key and its corresponding value.

key

The metadata field name or identifier.

value

The value associated with the metadata field.

Examples

>>> metadata = MetadataElement(key="Author", value="John Doe")
>>> print(f"{metadata.key}: {metadata.value}")
Author: John Doe

Parameters:

data (Any)

key: str
value: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
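
Because extracted metadata arrives as a list of key-value elements, reading one property means scanning that list. The following is a minimal sketch of collapsing the list into a dictionary. It uses a hypothetical stand-in dataclass, `MetadataElementStub`, so that it runs without paperap installed; the helper `metadata_to_dict` is also illustrative and not part of the library. It only reads `.key` and `.value`, so it should apply equally to real MetadataElement instances.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MetadataElementStub:
    """Hypothetical stand-in for MetadataElement (same two str fields)."""

    key: str
    value: str


def metadata_to_dict(elements: Iterable[MetadataElementStub]) -> dict[str, str]:
    """Collapse key-value elements into a dict; a later duplicate key overwrites an earlier one."""
    return {element.key: element.value for element in elements}


elements = [
    MetadataElementStub(key="Author", value="John Doe"),
    MetadataElementStub(key="Producer", value="Example PDF Writer"),
]
lookup = metadata_to_dict(elements)

# dict.get supplies a fallback for keys the file did not contain.
print(lookup.get("Author", "<unknown>"))
print(lookup.get("Title", "<unknown>"))
```

A dictionary is convenient for lookups, but it silently drops repeated keys; keep the list form if a file can report the same key more than once.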

class paperap.models.document.metadata.model.DocumentMetadata(**data)

Bases: StandardModel

Represents comprehensive metadata for a Paperless-NgX document.

This model encapsulates all metadata associated with a document in Paperless-NgX, including information about both the original document and its archived version (if available). It provides access to file properties such as checksums, sizes, MIME types, and extracted metadata elements.

The metadata is read-only: Paperless-NgX generates it during document processing, and every field on this model is listed in Meta.read_only_fields.

original_checksum

The SHA256 checksum of the original document file.

original_size

The size of the original document in bytes.

original_mime_type

The MIME type of the original document (e.g., “application/pdf”).

media_filename

The filename of the document in the Paperless-NgX media storage.

has_archive_version

Whether the document has an archived version (typically a PDF/A).

original_metadata

List of metadata elements extracted from the original document.

archive_checksum

The SHA256 checksum of the archived document version.

archive_media_filename

The filename of the archived version in media storage.

original_filename

The original filename of the document when it was uploaded.

lang

The detected language code of the document content.

archive_size

The size of the archived document version in bytes.

archive_metadata

List of metadata elements extracted from the archived version.

Examples

>>> # Access document metadata
>>> metadata = client.documents.get(123).metadata
>>> print(f"Original file: {metadata.original_filename}")
>>> print(f"Size: {metadata.original_size} bytes")
>>> print(f"MIME type: {metadata.original_mime_type}")
>>>
>>> # Iterate through extracted metadata elements
>>> for element in metadata.original_metadata:
...     print(f"{element.key}: {element.value}")
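
Most fields on this model are typed as optional, so code that formats them should tolerate None. The sketch below shows one way to do that for the size fields. `DocumentMetadataStub` is a hypothetical stand-in that carries only three of the documented fields, and `describe_sizes` is an illustrative helper, not a library function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentMetadataStub:
    """Hypothetical stand-in exposing a subset of the DocumentMetadata fields."""

    original_size: Optional[int] = None
    archive_size: Optional[int] = None
    has_archive_version: Optional[bool] = None


def describe_sizes(meta: DocumentMetadataStub) -> str:
    """Return a one-line size summary that tolerates missing (None) values."""
    if meta.original_size is None:
        return "original size unknown"
    summary = f"original: {meta.original_size} bytes"
    # Report the archive only when one is flagged as present and its size is known.
    if meta.has_archive_version and meta.archive_size is not None:
        summary += f", archive: {meta.archive_size} bytes"
    return summary


print(describe_sizes(DocumentMetadataStub(2048, 1024, True)))
print(describe_sizes(DocumentMetadataStub()))
```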

Parameters:

data (Any)

original_checksum: str | None
original_size: int | None
original_mime_type: str | None
media_filename: str | None
has_archive_version: bool | None
original_metadata: list[MetadataElement]
archive_checksum: str | None
archive_media_filename: str | None
original_filename: str | None
lang: str | None
archive_size: int | None
archive_metadata: list[MetadataElement]
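
The checksum fields can be used to confirm that downloaded bytes match what the server stored. The following is a sketch under stated assumptions, not the library's own verification routine: `checksum_matches` is a hypothetical helper, and rather than hard-coding the hash algorithm it infers it from the length of the hexadecimal digest (32 characters for MD5, 64 for SHA-256), since the algorithm a given server actually uses is an assumption worth checking.

```python
import hashlib
from typing import Optional

# Assumption: the hex digest length identifies the algorithm.
_ALGORITHM_BY_HEX_LENGTH = {32: "md5", 64: "sha256"}


def checksum_matches(content: bytes, expected_hex: Optional[str]) -> Optional[bool]:
    """Compare content against a hex digest; None means the check could not be made."""
    if not expected_hex:
        return None
    algorithm = _ALGORITHM_BY_HEX_LENGTH.get(len(expected_hex))
    if algorithm is None:
        return None
    actual = hashlib.new(algorithm, content).hexdigest()
    return actual == expected_hex.lower()


payload = b"example document bytes"
# Stands in for a value such as metadata.original_checksum.
reported = hashlib.sha256(payload).hexdigest()

print(checksum_matches(payload, reported))
print(checksum_matches(payload + b"!", reported))
```

Returning None for a missing or unrecognized digest keeps "could not verify" distinct from "verified and failed", which matters because the checksum fields may be None.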
class Meta(model)

Bases: Meta

Metadata configuration for the DocumentMetadata model.

This class defines metadata properties for the DocumentMetadata model, particularly specifying which fields are read-only.

Parameters:

model (type[_Self])

read_only_fields: ClassVar[set[str]] = {'archive_checksum', 'archive_media_filename', 'archive_metadata', 'archive_size', 'has_archive_version', 'id', 'lang', 'media_filename', 'original_checksum', 'original_filename', 'original_metadata', 'original_mime_type', 'original_size'}
blacklist_filtering_params: ClassVar[set[str]] = {}
field_map: dict[str, str] = {}
filtering_disabled: ClassVar[set[str]] = {}
filtering_fields: ClassVar[set[str]] = {'_resource', 'archive_checksum', 'archive_media_filename', 'archive_metadata', 'archive_size', 'has_archive_version', 'id', 'lang', 'media_filename', 'original_checksum', 'original_filename', 'original_metadata', 'original_mime_type', 'original_size'}
supported_filtering_params: ClassVar[set[str]] = {'id', 'id__in', 'limit'}
model: type[_Self]
name: str
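
The sets above can also be consulted by calling code before it builds a request. The sketch below copies the documented values of read_only_fields and supported_filtering_params into plain constants and checks dictionaries against them. How paperap itself enforces these sets is not described here, so `writable_changes` and `unsupported_filters` are hypothetical helpers for illustration only.

```python
# Values copied from the Meta attributes documented above.
READ_ONLY_FIELDS = {
    "archive_checksum", "archive_media_filename", "archive_metadata",
    "archive_size", "has_archive_version", "id", "lang", "media_filename",
    "original_checksum", "original_filename", "original_metadata",
    "original_mime_type", "original_size",
}
SUPPORTED_FILTERING_PARAMS = {"id", "id__in", "limit"}


def writable_changes(changes: dict[str, object]) -> dict[str, object]:
    """Drop every key that is listed as read-only."""
    return {name: value for name, value in changes.items() if name not in READ_ONLY_FIELDS}


def unsupported_filters(filters: dict[str, object]) -> set[str]:
    """Return the filter names that are not among the supported filtering parameters."""
    return set(filters) - SUPPORTED_FILTERING_PARAMS


# Every DocumentMetadata field is read-only, so nothing survives the filter.
print(writable_changes({"lang": "en", "original_size": 1}))
print(sorted(unsupported_filters({"id__in": [1, 2], "lang": "en"})))
```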
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'ignore', 'populate_by_name': True, 'use_enum_values': True, 'validate_assignment': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) -> None

Initializes the model's private attributes and then calls the user-defined model_post_init method.

Parameters:

context (Any)

Return type:

None

id: int