Token

TokenTextChunker

Bases: BaseTextChunker

This is the simplest splitting method. It splits input text into smaller chunks based on word tokens.

Attributes:

Name	Type	Description
`chunk_size`	`int`	Size of each chunk. Default is `512`.
`chunk_overlap`	`int`	Amount of overlap between chunks. Default is `256`.
`separator`	`str`	Separator used for splitting text into words. Default is `\n\n`.
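
To make the relationship between `chunk_size` and `chunk_overlap` concrete, here is a minimal sliding-window sketch in plain Python. It is not the library's implementation: the function name `sliding_window_chunks` is hypothetical, and it assumes whitespace-separated words stand in for tokens and that each chunk starts `chunk_size - chunk_overlap` tokens after the previous one.

```python
def sliding_window_chunks(tokens, chunk_size=512, chunk_overlap=256):
    """Illustrative only: group tokens into overlapping windows.

    Assumption: consecutive chunks share exactly `chunk_overlap` tokens,
    so the window advances by `chunk_size - chunk_overlap` each step.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Whitespace splitting is a stand-in for real tokenization.
words = "a b c d e f g h i j".split()
chunks = sliding_window_chunks(words, chunk_size=4, chunk_overlap=2)
```

With the defaults listed above (`512` and `256`), each chunk in this sketch would repeat the second half of the previous one, so roughly twice as many chunks are produced as with no overlap.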

Example

from novastack.core.text_chunker import TokenTextChunker

text_chunker = TokenTextChunker()

get_text_chunks(text: str) -> list[str]

Split a single string of text into smaller chunks.

Parameters:

Name	Type	Description	Default
`text`	`str`	Input text to split.	required

get_document_chunks(documents: list[Document]) -> list[Document]

Split a list of documents into smaller document chunks.

Parameters:

Name	Type	Description	Default
`documents`	`list[Document]`	Documents to split.	required
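
The list-in, list-out shape of `get_document_chunks` can be illustrated with a self-contained sketch. Everything here is an assumption made for illustration: `SimpleDocument` is a hypothetical stand-in and not the library's `Document` class, `split_documents` is a hypothetical helper, words are split on whitespace, and copying metadata onto each chunk is a design choice of this sketch rather than documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a document type; not novastack's Document."""
    text: str
    metadata: dict = field(default_factory=dict)


def split_documents(documents, chunk_size=4, chunk_overlap=2):
    """Illustrative only: split each document into overlapping chunk documents."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap
    result = []
    for doc in documents:
        words = doc.text.split()  # whitespace split stands in for tokenization
        for start in range(0, len(words), step):
            piece = " ".join(words[start:start + chunk_size])
            # Assumption: each chunk carries a copy of its parent's metadata.
            result.append(SimpleDocument(text=piece, metadata=dict(doc.metadata)))
            if start + chunk_size >= len(words):
                break
    return result


docs = [
    SimpleDocument("one two three four five six", {"source": "a.txt"}),
    SimpleDocument("seven eight", {"source": "b.txt"}),
]
doc_chunks = split_documents(docs)
```

In this sketch the output is a flat list: the first input yields two chunk documents and the second yields one, in input order.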