# Context similarity

## ContextSimilarityEvaluator

Bases: BaseEvaluator

Evaluates context similarity by combining query-context and answer-context similarity with the harmonic mean.

This evaluator provides a balanced evaluation of RAG systems by measuring both:

- Retrieval quality: how well the retrieved contexts match the query.
- Answer grounding: how well the answer is supported by the contexts.

The harmonic mean is used to combine the two scores because it:

- penalizes imbalanced scores (both metrics must be good),
- is more conservative than the arithmetic mean, and
- better reflects the "weakest link" in RAG quality, as the sketch below shows.
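To make the "weakest link" behavior concrete, here is a minimal sketch of the combination step (the helper name `harmonic_mean` is illustrative, not part of the library API):

```python
def harmonic_mean(query_context: float, answer_context: float) -> float:
    """Harmonic mean of two scores in [0, 1]."""
    if query_context + answer_context == 0:
        return 0.0
    return 2 * query_context * answer_context / (query_context + answer_context)

# Both scores high -> the combined score stays high:
print(harmonic_mean(0.9, 0.9))  # ~0.9
# One weak score drags the result down hard:
print(harmonic_mean(0.9, 0.3))  # ~0.45 (the arithmetic mean would be 0.6)
```

A pair like 0.9 and 0.3 averages to 0.6 arithmetically, but the harmonic mean yields roughly 0.45, so a single weak metric cannot be masked by a strong partner score.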

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `embed_model` | `BaseEmbedding` | The embedding model used to compute vector representations. |
| `similarity_mode` | `SimilarityMode` | Similarity strategy to use. Supported options are `"cosine"`, `"dot_product"`, and `"euclidean"`. Defaults to `"cosine"`. |
| `score_threshold` | `float` | Minimum required score for evaluation approval. Must be between `0.0` and `1.0`. Defaults to `0.8`. |

Example

```python
from novastack.core.evaluation import ContextSimilarityEvaluator
from novastack.embedding.huggingface import HuggingFaceEmbedding

embedding = HuggingFaceEmbedding()
evaluator = ContextSimilarityEvaluator(embed_model=embedding)
```
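Assuming the remaining attributes are also accepted as constructor keyword arguments (as `embed_model` is in the example above — this is an assumption, not confirmed by the signature shown here), a non-default configuration might look like:

```python
evaluator = ContextSimilarityEvaluator(
    embed_model=embedding,          # any BaseEmbedding implementation
    similarity_mode="dot_product",  # one of "cosine" (default), "dot_product", "euclidean";
                                    # may need the SimilarityMode enum rather than a string
    score_threshold=0.7,            # assumption: passing requires a combined score >= 0.7
)
```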

### evaluate

```python
evaluate(
    query: str | None = None,
    generated_text: str | None = None,
    contexts: list[str] | None = None,
    **kwargs: Any,
) -> dict
```

Evaluate context similarity for the given query and generated answer against the provided contexts. Returns a dict containing the individual similarity scores, the combined score, and a passing flag (see the example below).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `query` | `str` | Input query text. | `None` |
| `generated_text` | `str` | LLM-generated answer text. | `None` |
| `contexts` | `list[str]` | List of context strings. | `None` |
| `**kwargs` | `Any` | Additional keyword arguments (unused). | `{}` |
Example

```python
result = evaluator.evaluate(
    query="What is the capital of France?",
    generated_text="The capital of France is Paris.",
    contexts=[
        "Paris is the capital city of France.",
        "France is in Europe.",
    ],
)

print(f"Query-Context Score: {result['query_context_similarity']['score']}")
print(f"Answer-Context Score: {result['answer_context_similarity']['score']}")
print(f"Combined Score: {result['score']}")
print(f"Passing: {result['passing']}")
```