File
% pip install novastack-loaders-file
DocxLoader
Bases: BaseFileLoader
Microsoft Word (Docx) loader.
Attributes:
| Name | Type | Description |
|---|---|---|
| input_file | str | File path to load. |
Example
from novastack.loaders.file import DocxLoader
loader = DocxLoader(input_file="path/to/file.docx")
documents = loader.load_data()
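For readers curious about what a Docx loader has to deal with, here is a minimal sketch using only the Python standard library. It relies on the fact that a `.docx` file is a ZIP archive whose main text lives in `word/document.xml`. The helper `docx_paragraphs` is hypothetical and is not part of novastack; how `DocxLoader` extracts text internally is not stated here.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the text of each paragraph in a .docx (path or file-like object).

    Hypothetical helper for illustration; not the novastack implementation.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more <w:t> elements.
        text = "".join(node.text or "" for node in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


# Build a stripped-down in-memory archive so the sketch runs without a real
# file. It contains only the document part, so it is not a valid Word file.
document_xml = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>First </w:t></w:r><w:r><w:t>paragraph.</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", document_xml)
buffer.seek(0)

paragraphs = docx_paragraphs(buffer)
print(paragraphs)
```

The sketch ignores tables, headers, footers, and formatting, which a full loader would likely need to handle.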
HtmlLoader
Bases: BaseFileLoader
Load an HTML file and extract text from a specific tag.
Attributes:
| Name | Type | Description |
|---|---|---|
| input_file | str | File path to load. |
| tag | str | HTML tag to extract. Defaults to |
Example
from novastack.loaders.file import HtmlLoader
loader = HtmlLoader(input_file="path/to/file.html")
documents = loader.load_data()
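To make "extract text from a specific tag" concrete, the following sketch uses the standard library's `html.parser` to keep only the text nested inside one tag. The class `TagTextExtractor` and the function `extract_tag_text` are hypothetical names for illustration; this is not how `HtmlLoader` is necessarily implemented.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect text found inside every occurrence of one tag (illustrative only)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0    # greater than 0 while inside the target tag
        self.chunks = []  # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only while inside the target tag.
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_tag_text(html, tag):
    """Return the text inside `tag`, with fragments joined by single spaces."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


sample = (
    "<html><body><nav>Menu</nav>"
    "<article><h1>Title</h1><p>Body text.</p></article>"
    "</body></html>"
)
article_text = extract_tag_text(sample, "article")
print(article_text)
```

Choosing a narrow tag such as `article` keeps navigation and other page chrome out of the extracted text, which is presumably the motivation for the `tag` attribute.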
JsonLoader
Bases: BaseFileLoader
JSON loader.
Attributes:
| Name | Type | Description |
|---|---|---|
| input_file | str | File path to load. |
| jq_schema | str | jq schema to use to extract the data from the JSON. |
Example
from novastack.loaders.file import JsonLoader
loader = JsonLoader(input_file="path/to/file.json")
documents = loader.load_data()
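As a rough illustration of what a jq schema selects, the sketch below evaluates a very small subset of jq path syntax (only `.key` and `[]` steps) in plain Python. The function `select` and the sample data are hypothetical; real jq is far richer, and which jq engine `JsonLoader` uses is not stated here.

```python
import json
import re


def select(data, path):
    """Apply a tiny jq-like path and return all matches as a list.

    Supports only `.key` lookups and `[]` iteration (illustrative subset).
    """
    steps = re.findall(r"\.([A-Za-z_][A-Za-z0-9_]*)|(\[\])", path)
    current = [data]
    for key, iterate in steps:
        if iterate:
            # `[]` fans out over every element of each list.
            current = [item for value in current for item in value]
        else:
            # `.key` looks up the key in each object.
            current = [value[key] for value in current]
    return current


raw = (
    '{"messages": ['
    '{"author": "a", "text": "hello"}, '
    '{"author": "b", "text": "world"}'
    "]}"
)
texts = select(json.loads(raw), ".messages[].text")
print(texts)
```

In jq itself, the equivalent expression `.messages[].text` emits one value per message, so an expression of this shape lets a loader pick out just the fields worth indexing.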
PdfLoader
Bases: BaseFileLoader
PDF loader using PyPDF.
Attributes:
| Name | Type | Description |
|---|---|---|
| input_file | str | File path to load. |
Example
from novastack.loaders.file import PdfLoader
loader = PdfLoader(input_file="path/to/file.pdf")
documents = loader.load_data()