watsonx

$ pip install novastack-observability-watsonx

$ uv add novastack-observability-watsonx

Observability#

WatsonxObservability #

Bases: BaseObservability

Add observability to Novastack LLM Calls with IBM watsonx.governance.

Attributes:

Name	Type	Description
`authenticator`	`Authenticator`	The authenticator specifies the authentication mechanism.
`subscription_id`	`str`	The subscription ID associated with the records being logged.
`region`	`str`	The region where watsonx.governance is hosted when using IBM Cloud. Defaults to `us-south`.
`service_instance_id`	`str`	The service instance ID.

Example

from donkey.core import set_global_handler
from novastack.observability.watsonx import WatsonxObservability
from novastack.observability.watsonx.authenticators import IAMAuthenticator

# watsonx.governance (IBM Cloud)
watsonx_handler = WatsonxObservability(
    authenticator=IAMAuthenticator(api_key="API_KEY"),
    subscription_id="SUBSCRIPTION_ID",
)

set_global_handler(watsonx_handler)

Client#

WatsonxGovClient #

Bases: BaseModel

Unified client for interacting with IBM watsonx.governance for prompt monitoring.

Supports both native IBM watsonx.ai LLMs (via `setup_monitor`) and external LLM providers (via `setup_external_monitor`).

Note

Exactly one of `project_id` or `space_id` is required. Do not set both.

Attributes:

Name	Type	Description
`authenticator`	`Authenticator`	The authenticator specifies the authentication mechanism.
`space_id`	`str`	The space ID in watsonx.governance.
`project_id`	`str`	The project ID in watsonx.governance.
`region`	`str`	The region where watsonx.governance is hosted when using IBM Cloud. Defaults to `us-south`.
`service_instance_id`	`str`	The service instance ID.

Example

from novastack.observability.watsonx import WatsonxGovClient
from novastack.observability.watsonx.authenticators import IAMAuthenticator

# watsonx.governance (IBM Cloud)
client = WatsonxGovClient(
    authenticator=IAMAuthenticator(api_key="API_KEY"),
    region="us-south",
    space_id="SPACE_ID",
)

# watsonx.governance (CP4D)
from novastack.observability.watsonx.authenticators import (
    CloudPakForDataAuthenticator,
)

client = WatsonxGovClient(
    authenticator=CloudPakForDataAuthenticator(
        url="CPD_URL",
        username="USERNAME",
        password="PASSWORD",
        instance_id="openshift",
        version="5.3",
    ),
    space_id="SPACE_ID",
)
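
The note above requires exactly one of `project_id` or `space_id`. A minimal sketch of a pre-flight check follows; `pick_scope` is a hypothetical helper for illustration and is not part of novastack. It only builds the keyword arguments and does not call the client.

```python
from typing import Dict, Optional


def pick_scope(
    project_id: Optional[str] = None, space_id: Optional[str] = None
) -> Dict[str, str]:
    """Hypothetical helper: return the single scope kwarg, or raise if zero or both are set."""
    # Both None or both set means the "exactly one" rule is violated.
    if (project_id is None) == (space_id is None):
        raise ValueError("Provide exactly one of project_id or space_id.")
    if project_id is not None:
        return {"project_id": project_id}
    return {"space_id": space_id}


scope = pick_scope(space_id="SPACE_ID")
print(scope)  # {'space_id': 'SPACE_ID'}
```

The resulting dictionary can be unpacked with `**scope` next to the `authenticator` argument when constructing the client.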

setup_monitor #

setup_monitor(name: str, model_id: str, task_id: str, description: str = '', model_parameters: dict | None = None, prompt_template: str | None = None, prompt_variables: list[str] | None = None, locale: str = 'en', context_fields: list[str] | None = None, question_field: str | None = None) -> dict

Creates an IBM prompt template asset and sets up a monitor for it.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the Prompt Template Asset.	required
`model_id`	`str`	The ID of the model associated with the prompt.	required
`task_id`	`str`	The task identifier.	required
`description`	`str`	A description of the Prompt Template Asset.	`''`
`model_parameters`	`dict`	A dictionary of model parameters and their respective values.	`None`
`prompt_template`	`str`	The prompt template.	`None`
`prompt_variables`	`list[str]`	A list of values for prompt input variables.	`None`
`locale`	`str`	Locale code for the input/output language, e.g. "en", "pt", "es".	`'en'`
`context_fields`	`list[str]`	A list of fields that will provide context to the prompt. Applicable only for the `retrieval_augmented_generation` task type.	`None`
`question_field`	`str`	The field containing the question to be answered. Applicable only for the `retrieval_augmented_generation` task type.	`None`

Example

client.setup_monitor(
    name="IBM prompt template",
    model_id="ibm/granite-3-2b-instruct",
    task_id="retrieval_augmented_generation",
    prompt_template="You are a helpful AI assistant. {context}. Question: {input_query}.",
    prompt_variables=["context", "input_query"],
    context_fields=["context"],
    question_field="input_query",
)
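
In the example above, `prompt_variables` repeats the placeholder names used in `prompt_template`. As a sketch, the list could be derived from the template rather than typed by hand. This assumes the template uses Python `str.format`-style `{name}` placeholders, as the example does; `template_variables` is a hypothetical helper, not a novastack function.

```python
from string import Formatter
from typing import List


def template_variables(prompt_template: str) -> List[str]:
    """Return placeholder names in order of first appearance."""
    names: List[str] = []
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    for _literal, field_name, _spec, _conversion in Formatter().parse(prompt_template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


template = "You are a helpful AI assistant. {context}. Question: {input_query}."
prompt_variables = template_variables(template)
print(prompt_variables)  # ['context', 'input_query']
```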

setup_external_monitor #

setup_external_monitor(name: str, model_id: str, model_provider: str, task_id: str, description: str = '', model_name: str | None = None, model_parameters: dict | None = None, model_url: str | None = None, prompt_id: str | None = None, prompt_url: str | None = None, prompt_additional_info: dict | None = None, prompt_template: str | None = None, prompt_variables: list[str] | None = None, locale: str = 'en', context_fields: list[str] | None = None, question_field: str | None = None) -> dict

Creates a detached (external) prompt template asset and attaches a monitor to it.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the External Prompt Template Asset.	required
`model_id`	`str`	The ID of the model associated with the prompt.	required
`model_provider`	`str`	The model provider (e.g. "AWS Bedrock", "Azure OpenAI").	required
`task_id`	`str`	The task identifier.	required
`description`	`str`	A description of the External Prompt Template Asset.	`''`
`model_name`	`str`	The name of the external model.	`None`
`model_parameters`	`dict`	Model parameters and their respective values.	`None`
`model_url`	`str`	The URL of the external model.	`None`
`prompt_id`	`str`	The ID of the external prompt. Auto-generated if not provided.	`None`
`prompt_url`	`str`	The URL of the external prompt.	`None`
`prompt_additional_info`	`dict`	Additional information related to the external prompt.	`None`
`prompt_template`	`str`	The prompt template.	`None`
`prompt_variables`	`list[str]`	Values for the prompt variables.	`None`
`locale`	`str`	Locale code for the input/output language, e.g. "en", "pt", "es".	`'en'`
`context_fields`	`list[str]`	A list of fields that will provide context to the prompt. Applicable only for the `retrieval_augmented_generation` task type.	`None`
`question_field`	`str`	The field containing the question to be answered. Applicable only for the `retrieval_augmented_generation` task type.	`None`

Example

client.setup_external_monitor(
    name="Prompt Template for Retrieval Augmented Generation",
    model_id="anthropic.claude-v2",
    model_provider="AWS Bedrock",
    task_id="retrieval_augmented_generation",
    model_name="Anthropic Claude 2.0",
    model_url="https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-claude.html",
    prompt_template="You are a helpful AI assistant. {context}. Question: {input_query}.",
    prompt_variables=["context", "input_query"],
    context_fields=["context"],
    question_field="input_query",
)
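
The parameter table states that `context_fields` and `question_field` apply only to the `retrieval_augmented_generation` task type. A sketch of a local pre-check is shown below. `rag_field_kwargs` is hypothetical, and the rule that these fields must also appear in `prompt_variables` is an assumption drawn from the example, not documented behavior.

```python
from typing import Any, Dict, List, Optional

RAG_TASK = "retrieval_augmented_generation"


def rag_field_kwargs(
    task_id: str,
    prompt_variables: List[str],
    context_fields: Optional[List[str]] = None,
    question_field: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical pre-check: return only the RAG-specific kwargs that are valid for task_id."""
    if task_id != RAG_TASK:
        if context_fields or question_field:
            raise ValueError(f"context_fields/question_field apply only to {RAG_TASK!r}.")
        return {}
    # Assumption: every referenced field should be a declared prompt variable.
    fields = list(context_fields or [])
    if question_field is not None:
        fields.append(question_field)
    unknown = [name for name in fields if name not in prompt_variables]
    if unknown:
        raise ValueError(f"Not declared in prompt_variables: {unknown}")
    kwargs: Dict[str, Any] = {}
    if context_fields:
        kwargs["context_fields"] = list(context_fields)
    if question_field is not None:
        kwargs["question_field"] = question_field
    return kwargs


print(rag_field_kwargs(RAG_TASK, ["context", "input_query"], ["context"], "input_query"))
```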

log_payload_records #

log_payload_records(request_records: list[dict], subscription_id: str | None = None) -> list[str]

Stores records to the payload logging system.

Parameters:

Name	Type	Description	Default
`request_records`	`list[dict]`	A list of records to be logged. Each record is represented as a dictionary.	required
`subscription_id`	`str`	The subscription ID associated with the records being logged.	`None`

Example

client.log_payload_records(
    request_records=[
        {
            "context1": "value_context1",
            "context2": "value_context2",
            "input_query": "What's novastack Framework?",
            "generated_text": "novastack is a data framework to make AI easier to work with.",
            "input_token_count": 25,
            "generated_token_count": 150,
        }
    ],
    subscription_id="5d62977c-a53d-4b6d-bda1-7b79b3b9d1a0",
)
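
The record in the example above can be assembled from an application's own request data. The sketch below uses a hypothetical helper, `build_payload_record`. The field names (`context1`, `context2`, `input_query`, and so on) simply follow the example; in practice they presumably need to match the fields configured for the monitored prompt.

```python
from typing import Any, Dict, List


def build_payload_record(
    contexts: List[str],
    input_query: str,
    generated_text: str,
    input_token_count: int,
    generated_token_count: int,
) -> Dict[str, Any]:
    """Hypothetical helper: shape one interaction like the payload record in the example."""
    # Number the retrieved contexts from 1: context1, context2, ...
    record: Dict[str, Any] = {
        f"context{i}": text for i, text in enumerate(contexts, start=1)
    }
    record.update(
        {
            "input_query": input_query,
            "generated_text": generated_text,
            "input_token_count": input_token_count,
            "generated_token_count": generated_token_count,
        }
    )
    return record


record = build_payload_record(
    contexts=["value_context1", "value_context2"],
    input_query="What's novastack Framework?",
    generated_text="novastack is a data framework to make AI easier to work with.",
    input_token_count=25,
    generated_token_count=150,
)
print(sorted(record))
```

A list of such dictionaries is what `request_records` expects.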

log_feedback_records #

log_feedback_records(request_records: list[dict], subscription_id: str | None = None) -> dict

Stores records to the feedback logging system.

Info

Feedback data must include the model output in a field named `generated_text`.
For prompt monitors created using novastack, the label field is `reference_output`.

Parameters:

Name	Type	Description	Default
`request_records`	`list[dict]`	A list of records to be logged, where each record is represented as a dictionary.	required
`subscription_id`	`str`	The subscription ID associated with the records being logged.	`None`

Example

client.log_feedback_records(
    request_records=[
        {
            "context1": "value_context1",
            "context2": "value_context2",
            "input_query": "What's novastack Framework?",
            "reference_output": "novastack is a data framework to make AI easier to work with.",
            "generated_text": "novastack is a data framework to make AI easier to work with.",
        }
    ],
    subscription_id="5d62977c-a53d-4b6d-bda1-7b79b3b9d1a0",
)
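
Since feedback records need both `generated_text` and the `reference_output` label, a quick local check before logging can catch incomplete records early. This is a sketch only; `missing_feedback_fields` is a hypothetical helper, and the service may apply further validation of its own.

```python
from typing import Any, Dict, List

# Field names taken from the Info note above.
REQUIRED_FEEDBACK_FIELDS = ("generated_text", "reference_output")


def missing_feedback_fields(records: List[Dict[str, Any]]) -> Dict[int, List[str]]:
    """Map the index of each incomplete record to the required fields it lacks."""
    problems: Dict[int, List[str]] = {}
    for index, record in enumerate(records):
        missing = [name for name in REQUIRED_FEEDBACK_FIELDS if name not in record]
        if missing:
            problems[index] = missing
    return problems


records = [
    {"input_query": "q1", "generated_text": "a1", "reference_output": "a1"},
    {"input_query": "q2", "generated_text": "a2"},
]
print(missing_feedback_fields(records))  # {1: ['reference_output']}
```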

create_custom_metric #

create_custom_metric(name: str, metrics: list[WatsonxMetricSpec], integrated_system_url: str, integrated_system_credentials: IntegratedSystemCredentials, schedule: bool = False) -> dict

Creates a custom metric definition for IBM watsonx.governance.

This must be done before using custom metrics.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the custom metric group.	required
`metrics`	`list[WatsonxMetricSpec]`	A list of metrics to be measured.	required
`integrated_system_url`	`str`	The URL of the external metric provider.	required
`integrated_system_credentials`	`IntegratedSystemCredentials`	The credentials for the integrated system.	required
`schedule`	`bool`	Enable or disable the scheduler. Defaults to `False`.	`False`

Example

from novastack.observability.watsonx import (
    WatsonxMetricSpec,
    IntegratedSystemCredentials,
    WatsonxMetricThreshold,
)

client.create_custom_metric(
    name="Custom LLM Quality",
    metrics=[
        WatsonxMetricSpec(
            name="context_quality",
            applies_to=[
                "retrieval_augmented_generation",
                "summarization",
            ],
            thresholds=[
                WatsonxMetricThreshold(
                    threshold_type="lower_limit", default_value=0.75
                )
            ],
        )
    ],
    integrated_system_url="IS_URL",
    integrated_system_credentials=IntegratedSystemCredentials(
        auth_type="basic", username="USERNAME", password="PASSWORD"
    ),
)

associate_monitor_instance #

associate_monitor_instance(integrated_system_id: str, monitor_definition_id: str, subscription_id: str)

Associates the specified monitor definition with the specified subscription.

Parameters:

Name	Type	Description	Default
`integrated_system_id`	`str`	The ID of the integrated system.	required
`monitor_definition_id`	`str`	The ID of the custom metric monitor definition.	required
`subscription_id`	`str`	The ID of the subscription to associate the monitor with.	required

Example

client.associate_monitor_instance(
    integrated_system_id="019667ca-5687-7838-8d29-4ff70c2b36b0",
    monitor_definition_id="custom_llm_quality",
    subscription_id="0195e95d-03a4-7000-b954-b607db10fe9e",
)

log_measurements #

log_measurements(monitor_instance_id: str, run_id: str, request_records: dict[str, float | int])

Logs aggregated metric measurements to the specified custom monitor instance.

Parameters:

Name	Type	Description	Default
`monitor_instance_id`	`str`	The unique ID of the monitor instance.	required
`run_id`	`str`	The ID of the monitor run that generated the metrics.	required
`request_records`	`dict[str, float \| int]`	A dictionary containing the metrics to be published.	required

Example

client.log_measurements(
    monitor_instance_id="01966801-f9ee-7248-a706-41de00a8a998",
    run_id="RUN_ID",
    request_records={"context_quality": 0.914, "sensitivity": 0.85},
)
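
`log_measurements` takes one aggregated value per metric. How the aggregate is computed is up to the caller. The sketch below assumes a plain arithmetic mean over per-record scores, which is only one possible choice, and `aggregate_scores` is a hypothetical helper.

```python
from statistics import fmean
from typing import Dict, List


def aggregate_scores(record_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric over the records that report it."""
    names = sorted({name for scores in record_scores for name in scores})
    return {
        name: fmean([scores[name] for scores in record_scores if name in scores])
        for name in names
    }


per_record = [
    {"context_quality": 0.8, "sensitivity": 0.9},
    {"context_quality": 1.0, "sensitivity": 0.7},
]
request_records = aggregate_scores(per_record)
print(request_records)
```

The resulting dictionary has the `dict[str, float | int]` shape that `request_records` expects.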

log_record_measurements #

log_record_measurements(custom_data_set_id: str, reference_data_set_id: str, computed_on: str, run_id: str, request_records: list[dict])

Logs record-level measurements for individual records in the custom dataset.

Parameters:

Name	Type	Description	Default
`custom_data_set_id`	`str`	The ID of the custom metric data set.	required
`reference_data_set_id`	`str`	The dataset ID on which the metric was calculated.	required
`computed_on`	`str`	The dataset on which the metric was calculated (e.g., payload or feedback).	required
`run_id`	`str`	The ID of the monitor run that generated the metrics.	required
`request_records`	`list[dict]`	A list of dictionaries containing the records to be stored.	required

Example

client.log_record_measurements(
    custom_data_set_id="CUSTOM_DATASET_ID",
    reference_data_set_id="COMPUTED_ON_DATASET_ID",
    computed_on="payload",
    run_id="RUN_ID",
    request_records=[
        {
            "reference_record_id": "COMPUTED_ON_RECORD_ID",
            "record_timestamp": "2025-12-09T00:00:00Z",
            "context_quality": 0.786,
            "pii": 0.05,
        }
    ],
)
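
Each record in the example carries a `reference_record_id`, a UTC `record_timestamp`, and one key per metric. A sketch of building such a record is shown below; `build_measurement_record` is a hypothetical helper, and the timestamp layout is copied from the example rather than from a documented schema.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_measurement_record(
    reference_record_id: str,
    scores: Dict[str, float],
    when: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: shape one record like the example above."""
    # Default to the current time; pass a timezone-aware datetime to override.
    when = when or datetime.now(timezone.utc)
    # Convert to UTC and format as e.g. 2025-12-09T00:00:00Z.
    timestamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "reference_record_id": reference_record_id,
        "record_timestamp": timestamp,
        **scores,
    }


record = build_measurement_record(
    "COMPUTED_ON_RECORD_ID",
    {"context_quality": 0.786, "pii": 0.05},
    when=datetime(2025, 12, 9, tzinfo=timezone.utc),
)
print(record["record_timestamp"])  # 2025-12-09T00:00:00Z
```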

Supporting Classes#

IntegratedSystemCredentials #

Bases: BaseModel

Encapsulates the credentials for an Integrated System based on the authentication type.

Depending on the auth_type, only a subset of the properties is required.

Attributes:

Name	Type	Description
`auth_type`	`str`	The type of authentication. Currently supports "basic" and "bearer".
`username`	`str`	The username for Basic Authentication.
`password`	`str`	The password for Basic Authentication.
`token_url`	`str`	The URL of the authentication endpoint used to request a Bearer token.
`token_method`	`str`	The HTTP method (e.g., "POST", "GET") used to request the Bearer token. Defaults to "POST".
`token_headers`	`dict`	Optional headers to include when requesting the Bearer token. Defaults to `None`.
`token_payload`	`str \| dict`	The body or payload to send when requesting the Bearer token. Can be a string (e.g., raw JSON). Defaults to `None`.
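
Which properties are needed depends on `auth_type`. The sketch below encodes one plausible reading of the table: `username` and `password` for "basic", and at least `token_url` for "bearer". That mapping is an assumption for illustration, not the library's documented validation, and `missing_credential_fields` is a hypothetical helper.

```python
from typing import Any, List

# Assumed mapping, inferred from the attribute descriptions above.
REQUIRED_BY_AUTH_TYPE = {
    "basic": ("username", "password"),
    "bearer": ("token_url",),
}


def missing_credential_fields(auth_type: str, **fields: Any) -> List[str]:
    """Return the assumed-required fields that are absent or empty for auth_type."""
    if auth_type not in REQUIRED_BY_AUTH_TYPE:
        raise ValueError(f"Unsupported auth_type: {auth_type!r}")
    return [name for name in REQUIRED_BY_AUTH_TYPE[auth_type] if not fields.get(name)]


print(missing_credential_fields("basic", username="USERNAME"))  # ['password']
print(missing_credential_fields("bearer", token_url="TOKEN_URL"))  # []
```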

WatsonxMetricSpec #

Bases: BaseModel

Defines the IBM watsonx.governance global monitor metric.

Attributes:

Name	Type	Description
`name`	`str`	The name of the metric.
`applies_to`	`list[str]`	A list of task types that the metric applies to. Currently supports: "summarization", "generation", "question_answering", "extraction", and "retrieval_augmented_generation".
`thresholds`	`list[WatsonxMetricThreshold]`	A list of metric thresholds associated with the metric.

Example

from novastack.observability.watsonx import (
    WatsonxMetricSpec,
    WatsonxMetricThreshold,
)

WatsonxMetricSpec(
    name="context_quality",
    applies_to=["retrieval_augmented_generation", "summarization"],
    thresholds=[
        WatsonxMetricThreshold(threshold_type="lower_limit", default_value=0.75)
    ],
)

WatsonxMetricThreshold #

Bases: BaseModel

Defines the metric threshold for IBM watsonx.governance.

Attributes:

Name	Type	Description
`threshold_type`	`str`	The threshold type. Can be either `lower_limit` or `upper_limit`.
`default_value`	`float`	The metric threshold value.

Example

from novastack.observability.watsonx import WatsonxMetricThreshold

WatsonxMetricThreshold(threshold_type="lower_limit", default_value=0.8)
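
To make the two threshold types concrete, the sketch below shows one common interpretation: a value below a `lower_limit` or above an `upper_limit` counts as a violation. The strict comparisons and the helper `violates_threshold` are assumptions for illustration; how watsonx.governance evaluates thresholds is not specified here.

```python
def violates_threshold(threshold_type: str, limit: float, value: float) -> bool:
    """Assumed semantics: below a lower limit or above an upper limit is a violation."""
    if threshold_type == "lower_limit":
        return value < limit
    if threshold_type == "upper_limit":
        return value > limit
    raise ValueError(f"Unknown threshold_type: {threshold_type!r}")


print(violates_threshold("lower_limit", 0.8, 0.75))  # True
print(violates_threshold("upper_limit", 0.1, 0.05))  # False
```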

Enums#

Region #

Supported IBM watsonx.governance regions.

Defines the available regions where watsonx.governance SaaS services are deployed.

Attributes:

Name	Type	Description
`AU_SYD`	`str`	"au-syd".
`AWS_AP_SOUTH`	`str`	"aws-ap-south".
`EU_DE`	`str`	"eu-de".
`US_SOUTH`	`str`	"us-south".

TaskType #

Supported IBM watsonx.governance tasks.

Attributes:

Name	Type	Description
`QUESTION_ANSWERING`	`str`	"question_answering".
`SUMMARIZATION`	`str`	"summarization".
`RETRIEVAL_AUGMENTED_GENERATION`	`str`	"retrieval_augmented_generation".
`CLASSIFICATION`	`str`	"classification".
`GENERATION`	`str`	"generation".
`CODE`	`str`	"code".
`EXTRACTION`	`str`	"extraction".
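
The values in this table are the strings accepted as `task_id`. As a sketch, a local stand-in enum can be used to validate a task string before it is sent. `TaskTypeSketch` below mirrors the table but is not the library's `TaskType` class, and `normalize_task_id` is a hypothetical helper.

```python
from enum import Enum


class TaskTypeSketch(str, Enum):
    """Stand-in mirroring the TaskType table above (not the library class)."""

    QUESTION_ANSWERING = "question_answering"
    SUMMARIZATION = "summarization"
    RETRIEVAL_AUGMENTED_GENERATION = "retrieval_augmented_generation"
    CLASSIFICATION = "classification"
    GENERATION = "generation"
    CODE = "code"
    EXTRACTION = "extraction"


def normalize_task_id(task_id: str) -> str:
    """Return the canonical task string, or raise ValueError if it is not listed."""
    return TaskTypeSketch(task_id.strip().lower()).value


print(normalize_task_id(" Summarization "))  # summarization
```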

DataSetType #

Supported IBM watsonx.governance data set types.

Attributes:

Name	Type	Description
`PAYLOAD`	`str`	"payload".
`FEEDBACK`	`str`	"feedback".