File

% pip install novastack-loaders-file

DocxLoader

Bases: BaseFileLoader

Microsoft Word (Docx) loader.

Attributes:

Name	Type	Description
`input_file`	`str`	File path to load.

Example

from novastack.loaders.file import DocxLoader

loader = DocxLoader(input_file="path/to/file.docx")
documents = loader.load_data()

load_data() -> list[Document]

Loads data and returns a list of documents.

HtmlLoader

Bases: BaseFileLoader

Loads an HTML file and extracts the text from a specific tag.

Attributes:

Name	Type	Description
`input_file`	`str`	File path to load.
`tag`	`str`	HTML tag to extract. Defaults to `section`.

Example

from novastack.loaders.file import HtmlLoader

loader = HtmlLoader(input_file="path/to/file.html")
documents = loader.load_data()
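
To make "extract text from a specific tag" concrete, here is a minimal standard-library sketch of the idea. It is not HtmlLoader's implementation: the `TagTextExtractor` class and `extract_tag_text` helper are hypothetical names introduced only for illustration, and the real loader may handle whitespace, nesting, and metadata differently.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text inside every top-level occurrence of one tag."""

    def __init__(self, tag: str = "section") -> None:
        super().__init__()
        self.tag = tag
        self.depth = 0                  # > 0 while inside the target tag
        self.chunks: list[str] = []     # one entry per top-level match
        self._buffer: list[str] = []    # text pieces of the current match

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join pieces with spaces and collapse runs of whitespace.
                text = " ".join(" ".join(self._buffer).split())
                self.chunks.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self._buffer.append(data)


def extract_tag_text(html: str, tag: str = "section") -> list[str]:
    """Return the text of each `tag` element found in `html`."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.chunks


sample = (
    "<html><body><nav>Menu</nav>"
    "<section><h1>Title</h1><p>Body text.</p></section>"
    "</body></html>"
)
print(extract_tag_text(sample))
```

Text outside the chosen tag (the `nav` element here) is ignored, which is the behavior the `tag` attribute description suggests.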

load_data() -> list[Document]

Loads data and returns a list of documents.

JsonLoader

Bases: BaseFileLoader

JSON loader.

Attributes:

Name	Type	Description
`input_file`	`str`	File path to load.
`jq_schema`	`str`	jq filter expression used to extract data from the JSON file.

Example

from novastack.loaders.file import JsonLoader

loader = JsonLoader(input_file="path/to/file.json")
documents = loader.load_data()
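
The page does not show a sample `jq_schema` value, so the following is only a sketch of what a jq filter selects, assuming the attribute accepts standard jq syntax. It uses plain Python rather than the loader itself; the sample JSON and the filter `.items[].text` are illustrative assumptions.

```python
import json

# Hypothetical input: a JSON object holding a list of records.
raw = '{"items": [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}]}'
data = json.loads(raw)

# Plain-Python equivalent of the jq filter `.items[].text`:
# iterate over every element of "items" and emit its "text" field.
texts = [item["text"] for item in data["items"]]
print(texts)
```

A filter of this shape narrows the JSON to the fields worth loading, so each selected value can become the text of a document.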

load_data() -> list[Document]

Loads data and returns a list of documents.

PdfLoader

Bases: BaseFileLoader

PDF loader using PyPDF.

Attributes:

Name	Type	Description
`input_file`	`str`	File path to load.

Example

from novastack.loaders.file import PdfLoader

loader = PdfLoader(input_file="path/to/file.pdf")
documents = loader.load_data()

load_data() -> list[Document]

Loads data and returns a list of documents.