Token
TokenTextChunker
Bases: BaseTextChunker
This is the simplest splitting method: it splits input text into smaller chunks by looking at word tokens.
Attributes:
| Name | Type | Description |
|---|---|---|
| `chunk_size` | `int` | Size of each chunk. Default is |
| `chunk_overlap` | `int` | Amount of overlap between chunks. Default is |
| `separator` | `str` | Separators used for splitting into words. Default is |
Example
from novastack.core.text_chunker import TokenTextChunker
text_chunker = TokenTextChunker()
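To make the roles of `chunk_size`, `chunk_overlap`, and `separator` concrete, here is a minimal standalone sketch of overlapping word-window chunking. It is not the novastack implementation: the function name `chunk_words` is hypothetical, and it assumes "tokens" are plain words obtained by splitting on `separator`, which may differ from how `TokenTextChunker` actually counts tokens.

```python
def chunk_words(text: str, *, chunk_size: int, chunk_overlap: int, separator: str) -> list[str]:
    """Illustrative only: split on `separator`, then slide a window over the words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Drop empty strings produced by repeated separators.
    words = [w for w in text.split(separator) if w]

    # Each new chunk starts `step` words after the previous one,
    # so consecutive chunks share `chunk_overlap` words.
    step = chunk_size - chunk_overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(separator.join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g h i j", chunk_size=4, chunk_overlap=1, separator=" "):
        print(chunk)
```

In this sketch, a larger `chunk_overlap` repeats more context between neighbouring chunks at the cost of producing more chunks overall.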
get_text_chunks
get_text_chunks(text: str) -> list[str]
Split a single string of text into smaller chunks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Input text to split. | required |
get_document_chunks
get_document_chunks(documents: list[Document]) -> list[Document]
Split a list of documents into smaller document chunks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `documents` | `list[Document]` | Documents to split. | required |
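The relationship between the two methods can be pictured as applying a text splitter to each document and wrapping every resulting piece in a new document. The sketch below illustrates that idea only. `Doc` is a hypothetical stand-in for novastack's `Document` (its real fields are not shown in this reference), `split_documents` is a hypothetical helper, and copying metadata onto each chunk is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doc:
    """Hypothetical stand-in for a document type; not novastack's Document."""
    text: str
    metadata: dict = field(default_factory=dict)


def split_documents(documents: list[Doc], split_text: Callable[[str], list[str]]) -> list[Doc]:
    """Apply `split_text` to each document and wrap every piece in a new Doc."""
    chunks: list[Doc] = []
    for doc in documents:
        for piece in split_text(doc.text):
            # Assumption: each chunk carries its own copy of the parent's metadata.
            chunks.append(Doc(text=piece, metadata=dict(doc.metadata)))
    return chunks


if __name__ == "__main__":
    docs = [Doc("a b c", {"source": "one"}), Doc("d", {"source": "two"})]
    for chunk in split_documents(docs, lambda t: t.split(" ")):
        print(chunk.text, chunk.metadata)
```

Passing the splitter in as a callable keeps the sketch independent of any particular chunking strategy.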