abacusai.document_data

Module Contents

Classes

DocumentData

Data extracted from a docstore document.

class abacusai.document_data.DocumentData(client, docId=None, mimeType=None, pageCount=None, extractedText=None, embeddedText=None, pages=None, tokens=None, metadata=None, pageMarkdown=None)

Bases: abacusai.return_class.AbstractApiClass

Data extracted from a docstore document.

Parameters:
  • client (ApiClient) – An authenticated API Client instance.

  • docId (str) – Unique Docstore string identifier for the document.

  • mimeType (str) – The mime type of the document.

  • pageCount (int) – The total number of pages in the document.

  • extractedText (str) – The extracted text in the document obtained from OCR.

  • embeddedText (str) – The embedded text in the document. Only available for digital documents.

  • pages (list) – List of embedded text for each page in the document. Only available for digital documents.

  • tokens (list) – List of extracted tokens in the document obtained from OCR.

  • metadata (list) – List of metadata for each page in the document.

  • pageMarkdown (list) – List of markdown text for each page in the document.
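
The parameters above distinguish embedded text (digital documents only) from OCR-extracted text. A minimal sketch of how a caller might choose between the two is shown here. It assumes the attribute names match the constructor parameters, which this page does not confirm, and it uses SimpleNamespace objects as hypothetical stand-ins rather than real DocumentData instances. The helper name preferred_text is also hypothetical.

```python
from types import SimpleNamespace


def preferred_text(doc):
    """Return the embedded text when present, otherwise the OCR-extracted text."""
    # Assumption: attributes are named like the constructor parameters.
    # The real client object may expose differently named attributes.
    embedded = getattr(doc, "embeddedText", None)
    if embedded:
        return embedded
    return getattr(doc, "extractedText", None) or ""


# Hypothetical stand-ins: a digital document and a scanned (OCR-only) one.
digital = SimpleNamespace(embeddedText="Invoice 1001", extractedText="lnvoice 1001")
scanned = SimpleNamespace(embeddedText=None, extractedText="Scanned memo")

print(preferred_text(digital))
print(preferred_text(scanned))
```

Preferring embedded text avoids OCR misreads when the document carries its own text layer; the fallback keeps scanned documents usable.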

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class.

Returns:

The dict value representation of the class parameters.

Return type:

dict
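
Since to_dict() returns a plain dict, one possible use is serializing the extracted data to JSON. The sketch below uses a hypothetical stand-in class, StandInDocumentData, instead of the real DocumentData, and the key names it returns are an assumption; the real method may use different keys.

```python
import json


class StandInDocumentData:
    """Hypothetical stand-in, not the real abacusai class."""

    def __init__(self, docId=None, mimeType=None, pageCount=None):
        self.docId = docId
        self.mimeType = mimeType
        self.pageCount = pageCount

    def to_dict(self):
        # Assumption: keys mirror the constructor parameter names.
        return {
            "docId": self.docId,
            "mimeType": self.mimeType,
            "pageCount": self.pageCount,
        }


doc = StandInDocumentData(docId="doc-123", mimeType="application/pdf", pageCount=2)
as_dict = doc.to_dict()

# A plain dict of JSON-compatible values can be passed straight to json.dumps.
as_json = json.dumps(as_dict, sort_keys=True)
print(as_json)
```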