Docling

% pip install novastack-loaders-docling

DoclingLoader

Bases: BaseFileLoader

A document loader that uses the docling library to extract and structure content from various file types including PDF, DOCX, and HTML.

For more information, see Docling

Attributes:

detached_tables (bool): If True, extracted tables are separated from the main document text and returned as individual documents. Defaults to False.

export_table_format (str): Format used when exporting tables, either "markdown" or "html". Applies only when detached_tables is True. Defaults to "markdown".

input_file (str): Path of the file to load.

Example
from novastack.loaders.docling import DoclingLoader

docling_loader = DoclingLoader(input_file="path/to/file.pdf")
documents = docling_loader.load_data()
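
The attributes listed above can be combined to export tables as separate documents. The following is a minimal sketch, assuming the constructor accepts detached_tables and export_table_format as keyword arguments in the same way it accepts input_file. The helper build_loader_kwargs and the constant ALLOWED_TABLE_FORMATS are hypothetical names introduced for illustration and are not part of the library.

```python
from pathlib import Path

# The two table formats named in the attribute description.
ALLOWED_TABLE_FORMATS = {"markdown", "html"}


def build_loader_kwargs(input_file, detached_tables=False, export_table_format="markdown"):
    """Hypothetical helper: validate options and build keyword arguments for DoclingLoader."""
    if export_table_format not in ALLOWED_TABLE_FORMATS:
        raise ValueError(
            f"export_table_format must be one of {sorted(ALLOWED_TABLE_FORMATS)}, "
            f"got {export_table_format!r}"
        )
    kwargs = {"input_file": str(input_file), "detached_tables": detached_tables}
    # export_table_format only applies when tables are detached.
    if detached_tables:
        kwargs["export_table_format"] = export_table_format
    return kwargs


kwargs = build_loader_kwargs(
    "path/to/file.pdf", detached_tables=True, export_table_format="html"
)

# Guarded import so the sketch still runs where the package is not installed.
try:
    from novastack.loaders.docling import DoclingLoader
except ImportError:
    DoclingLoader = None

# Assumption: the attributes are accepted as constructor keyword arguments.
if DoclingLoader is not None and Path(kwargs["input_file"]).exists():
    documents = DoclingLoader(**kwargs).load_data()
    print(f"Loaded {len(documents)} documents")
```

Validating the format before constructing the loader reports an unsupported value early, whatever the loader itself does with it.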

load_data

load_data() -> list[Document]

Loads the file given by input_file and returns its content as a list of documents.