File

% pip install novastack-loaders-file

DocxLoader #

Bases: BaseFileLoader

Microsoft Word (Docx) loader.

Attributes:

Name        Type  Description
input_file  str   File path to load.

Example
from novastack.loaders.file import DocxLoader

loader = DocxLoader(input_file="path/to/file.docx")
documents = loader.load_data()

load_data #

load_data() -> list[Document]

Loads data and returns a list of documents.

HtmlLoader #

Bases: BaseFileLoader

Loads an HTML file and extracts the text from a specific tag.

Attributes:

Name        Type  Description
input_file  str   File path to load.
tag         str   HTML tag to extract. Defaults to section.

Example
from novastack.loaders.file import HtmlLoader

loader = HtmlLoader(input_file="path/to/file.html")
documents = loader.load_data()
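
To make the idea of extracting text from a specific tag concrete, here is a minimal standard-library sketch. It is not the HtmlLoader implementation, which is not shown in this reference; the names TagTextExtractor and extract_tag_text and the sample markup are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside each occurrence of one tag (hypothetical helper)."""

    def __init__(self, tag: str = "section") -> None:
        super().__init__()
        self.tag = tag
        self.depth = 0                    # > 0 while inside the target tag
        self.chunks: list[str] = []       # one entry per outermost matching tag
        self._current: list[str] = []     # text pieces of the tag being read

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the pieces and collapse runs of whitespace.
                text = " ".join(" ".join(self._current).split())
                if text:
                    self.chunks.append(text)
                self._current = []

    def handle_data(self, data):
        if self.depth > 0:
            self._current.append(data)


def extract_tag_text(html: str, tag: str = "section") -> list[str]:
    """Return the text of every outermost <tag> element in the markup."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.chunks


sample = (
    "<html><body><nav>Menu</nav>"
    "<section><h1>Title</h1><p>Body text.</p></section>"
    "<section>Second</section></body></html>"
)
sections = extract_tag_text(sample)          # uses the "section" default
nav_text = extract_tag_text(sample, "nav")   # a different tag
```

With the loader itself, a different tag would presumably be selected by passing tag alongside input_file, for example HtmlLoader(input_file="path/to/file.html", tag="article"). That call is inferred from the attribute table above and is not confirmed by this page.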

load_data #

load_data() -> list[Document]

Loads data and returns a list of documents.

JsonLoader #

Bases: BaseFileLoader

JSON loader.

Attributes:

Name        Type  Description
input_file  str   File path to load.
jq_schema   str   jq schema used to extract data from the JSON file.

Example
from novastack.loaders.file import JsonLoader

loader = JsonLoader(input_file="path/to/file.json")
documents = loader.load_data()
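
The jq_schema attribute selects which parts of the JSON become document content. As a rough illustration of what such a selection does, the sketch below reproduces the effect of the jq expression .records[].text in plain Python. The sample data, the records and text field names, and the select_texts helper are hypothetical; how JsonLoader evaluates the schema internally is not described in this reference.

```python
import json

# Hypothetical input, standing in for the contents of a JSON file.
raw = """
{
  "records": [
    {"id": 1, "text": "first entry"},
    {"id": 2, "text": "second entry"}
  ]
}
"""


def select_texts(payload: str) -> list[str]:
    """Plain-Python equivalent of the jq expression `.records[].text`."""
    data = json.loads(payload)
    # `.records` picks the array, `[]` iterates over it, `.text` reads one field.
    return [record["text"] for record in data["records"]]


texts = select_texts(raw)
```

Assuming the constructor accepts the attribute as a keyword argument, the equivalent loader call would look like JsonLoader(input_file="path/to/file.json", jq_schema=".records[].text"). Treat this as an assumption drawn from the attribute table, not a documented signature.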

load_data #

load_data() -> list[Document]

Loads data and returns a list of documents.

PdfLoader #

Bases: BaseFileLoader

PDF loader using PyPDF.

Attributes:

Name        Type  Description
input_file  str   File path to load.

Example
from novastack.loaders.file import PdfLoader

loader = PdfLoader(input_file="path/to/file.pdf")
documents = loader.load_data()

load_data #

load_data() -> list[Document]

Loads data and returns a list of documents.