Is vastlint deterministic enough to use as a reward signal?

Yes. The same XML returns byte-identical results, including issue order, across repeated calls. Validation is stateless, in-process, and runs in roughly 13 microseconds for a small tag, so it is safe to call inside a training loop or across thousands of rollouts.

What happens if a model emits an empty string or non-string output?

Empty input raises ValueError, not VastlintError, and non-string input (None, int, list) raises TypeError. A reward function that only catches VastlintError lets these propagate and crashes the rollout, so catch (VastlintError, ValueError, TypeError), or catch the first two and check the type before calling.

Can I run vastlint across many workers in an eval harness?

Yes. Validation holds no shared state and is thread-safe: a thread or process pool returns the same results as a serial run. Throughput scales close to linearly across cores.

VAST as a reward signal: vastlint for ML and agents

Short answer: if you train a model that generates VAST tags, or build a buyer or seller agent that touches them, call vastlint.validate(xml) as your verifier. It is deterministic, runs in-process in roughly 13 microseconds for a small tag, and returns structured issues a model can act on. Drop it in as a reward signal during training, a verifier in an agent repair loop, or a gate in an eval harness.

A language model is a probabilistic generator; the VAST spec is not. You do not want the model to be your judge of whether its own output is a valid tag. A deterministic verifier is faster than the model, never hallucinates, and returns the same verdict every time: exactly the properties a training signal needs. It is the same Rust core used by the web validator, the CLI, and the MCP server, so the verdict matches across your whole pipeline.

Why it fits ML and agentic workflows

Deterministic: identical input returns byte-identical output, including issue order
Fast: roughly 13µs per small tag, in-process, no subprocess or network hop
Stateless and thread-safe: fan out across a thread or process pool, same results as serial
Structured feedback: each issue carries a rule id, message, path, and spec citation
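
The determinism property is easy to spot-check in your own environment. A minimal sketch, assuming you serialise each result to a string and compare across runs; is_deterministic and fake_render are illustrative names, not part of vastlint, and the stand-in renderer is there only so the snippet runs without the library installed:

```python
from typing import Callable


def is_deterministic(render: Callable[[str], str], payload: str, runs: int = 100) -> bool:
    """Return True if render(payload) yields the same string on every run."""
    first = render(payload)
    return all(render(payload) == first for _ in range(runs - 1))


# Stand-in renderer so the sketch runs without vastlint installed.
# With vastlint, pass: lambda xml: vastlint.validate(xml).to_json()
def fake_render(xml: str) -> str:
    return f"len={len(xml)}"


print(is_deterministic(fake_render, "<VAST version='4.2'/>"))
```

Comparing the serialised string rather than only the valid flag also covers issue order, which is part of the claim above.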

The reward function

Start with a binary reward: a tag is either spec-clean or it is not. The important part is not the happy path; it is guarding every way a model can hand you something that is not a tag.

import vastlint
from vastlint import VastlintError


def vast_reward(xml: str) -> float:
    """Binary reward: 1.0 for a spec-clean tag, 0.0 otherwise.

    Guards every way a model can hand you something that is not a valid tag:
      - empty string / empty bytes  -> ValueError
      - None / non-string output    -> TypeError
      - unparseable XML             -> VastlintError
    """
    try:
        return 1.0 if vastlint.validate(xml).valid else 0.0
    except (VastlintError, ValueError, TypeError):
        return 0.0

For RL or rejection sampling you usually want a denser signal so the model gets gradient while a tag is still invalid. Penalise by severity instead of a single pass/fail bit.

def graded_vast_reward(xml: str) -> float:
    """Dense reward: penalise by severity instead of a single pass/fail bit.

    Gives the model gradient even while a tag is still invalid: fewer errors
    and warnings score higher, so partial progress is rewarded.
    """
    try:
        s = vastlint.validate(xml).summary.to_dict()
    except (VastlintError, ValueError, TypeError):
        return -1.0
    return -(1.0 * s["errors"] + 0.25 * s["warnings"])
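
One property of the unbounded form is worth checking before training on it: a tag with two errors scores -2.0, which is below the -1.0 returned for empty or unparseable output, so a policy could learn that emitting nothing beats emitting a nearly correct tag. A minimal sketch of one way to avoid that, assuming you want every parseable tag to score above the failure floor; bounded_reward and the squashing formula are illustrative choices, not part of vastlint:

```python
def bounded_reward(errors: int, warnings: int, floor: float = -1.0) -> float:
    """Map severity counts into (floor, 0.0], with 0.0 for a clean tag.

    penalty / (1 + penalty) rises monotonically towards 1 without reaching it
    for any realistic count, so a parseable tag scores above `floor`, which
    stays reserved for output that raised.
    """
    penalty = 1.0 * errors + 0.25 * warnings  # same weights as above
    if penalty == 0:
        return 0.0
    return floor * penalty / (1.0 + penalty)


print(bounded_reward(0, 0))   # clean tag
print(bounded_reward(2, 1))   # two errors, one warning
print(bounded_reward(50, 0))  # badly broken, still above the floor
```

Inside graded_vast_reward this would replace the final return, called with s["errors"] and s["warnings"], while the except branch keeps returning -1.0.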

The result contract

Every call returns a Result with the same stable shape. The structured issues are what make this usable as feedback rather than a scalar.

r = vastlint.validate(xml)

r.valid           # bool: zero errors against the IAB spec
r.version         # detected VAST version, e.g. "4.2"
r.issues          # list[Issue]
r.summary         # Summary

s = r.summary.to_dict()
# {"errors": 2, "warnings": 1, "infos": 1, "valid": False}

issue = r.issues[0].to_dict()
# {
#   "id":       "VAST-4.1-adservingid-present",
#   "severity": "error",            # error | warning | info
#   "message":  "<InLine> is missing required <AdServingId> ...",
#   "path":     "/VAST/Ad[0]/InLine",
#   "spec_ref": "IAB VAST 4.1 §3.4.1",
#   "line":     1,
#   "col":      32,
# }

r.to_dict()       # full result, JSON-safe
r.to_json()       # stable string, round-trips via Result.from_json()
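
Because each issue serialises to a plain dict, aggregating across a dataset needs nothing beyond the standard library. A minimal sketch, assuming issue dicts shaped like the to_dict() example above; rule_histogram is an illustrative helper, not part of vastlint, and the severities in the sample data are made up for the demonstration:

```python
from collections import Counter
from typing import Iterable


def rule_histogram(issues: Iterable[dict], severity: str = "error") -> Counter:
    """Count how often each rule id fires at the given severity."""
    return Counter(i["id"] for i in issues if i["severity"] == severity)


# Sample data in the documented shape; severities here are illustrative.
sample = [
    {"id": "VAST-4.1-adservingid-present", "severity": "error"},
    {"id": "VAST-4.1-adservingid-present", "severity": "error"},
    {"id": "VAST-4.1-mezzanine-recommended", "severity": "warning"},
]
print(rule_histogram(sample).most_common(1))
```

To use it on real results, feed it something like (i.to_dict() for r in results for i in r.issues), where results is your own list of Result objects.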

Agent repair loop

In an agentic flow the verifier closes the loop: validate, hand the errors back to the model, revalidate, until the tag is clean or you run out of steps. Because the issues name the exact rule, element path, and spec section, the model has something concrete to fix.

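# Verifier inside an agent repair loop: the structured issues are the
# feedback the model acts on, not a single pass/fail bit.
def repair(xml: str, agent, max_steps: int = 5) -> str:
    for _ in range(max_steps):
        try:
            r = vastlint.validate(xml)
        except (VastlintError, ValueError, TypeError) as exc:
            # Empty, non-string, or unparseable output has no issue list,
            # so pass the exception text back instead ("rule" is a placeholder).
            feedback = [{"rule": "not-validated", "fix": str(exc), "where": ""}]
        else:
            if r.valid:
                return xml
            feedback = [
                {"rule": i.id, "fix": i.message, "where": i.path}
                for i in r.issues
                if i.severity == "error"
            ]
        xml = agent.revise(xml, feedback)   # your model call
    return xml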
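
If the agent takes text rather than structured feedback, the same fields render into a short prompt fragment. A minimal sketch, assuming issue dicts shaped like Issue.to_dict(); format_feedback, the line layout, and the cap of ten are illustrative choices, not part of vastlint:

```python
def format_feedback(issues: list[dict], limit: int = 10) -> str:
    """Render error-severity issues as one line each, capped at `limit`."""
    errors = [i for i in issues if i["severity"] == "error"]
    lines = [
        f'- [{i["id"]}] {i["message"]} (at {i["path"]}, see {i["spec_ref"]})'
        for i in errors[:limit]
    ]
    if len(errors) > limit:
        lines.append(f"- ... and {len(errors) - limit} more errors")
    return "\n".join(lines)


example_issue = {
    "id": "VAST-4.1-adservingid-present",
    "severity": "error",
    "message": "<InLine> is missing required <AdServingId>",
    "path": "/VAST/Ad[0]/InLine",
    "spec_ref": "IAB VAST 4.1 §3.4.1",
}
print(format_feedback([example_issue]))
```

Capping the list keeps the prompt bounded when a tag is badly broken; the model can fix the first batch and pick up the rest on the next pass.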

Eval harness and batch gating

Validation is independent per tag and holds no shared state, so it parallelises cleanly. Use it to gate a batch of rollouts, score a dataset of generated creatives, or measure error rates across model checkpoints.

from concurrent.futures import ProcessPoolExecutor

# Gate a batch of rollouts. Validation is stateless and thread-safe, so a
# process (or thread) pool returns the same results as a serial run.
def score_batch(candidates: list[str]) -> list[float]:
    with ProcessPoolExecutor() as pool:
        return list(pool.map(vast_reward, candidates))
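
At microseconds per call, the cost of shipping each candidate to a worker process can outweigh the validation itself. Executor.map accepts a chunksize argument that batches items per dispatch; it only has an effect on process pools and is ignored by thread pools. A minimal sketch with a stand-in scorer so it runs without vastlint; score_batch_chunked, stand_in_scorer, and the default of 256 are illustrative and worth tuning on your own workload:

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable


def score_batch_chunked(
    candidates: list[str],
    scorer: Callable[[str], float],
    pool: Executor,
    chunksize: int = 256,
) -> list[float]:
    """Score candidates in input order, batching dispatch to the pool."""
    return list(pool.map(scorer, candidates, chunksize=chunksize))


def stand_in_scorer(xml: str) -> float:
    # Placeholder for vast_reward: any non-blank string counts as a pass.
    return 1.0 if xml.strip() else 0.0


with ThreadPoolExecutor(max_workers=4) as pool:
    print(score_batch_chunked(["<VAST/>", "", "  "], stand_in_scorer, pool))
```

For real runs, swap in ProcessPoolExecutor() and vast_reward. On platforms that spawn worker processes, create the pool under an if __name__ == "__main__": guard.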

Edge cases that break reward functions

These are the behaviours that bite when model output is the input. They are all clean, documented Python exceptions, never a crash, but you have to catch the right ones.

Empty input raises ValueError, not VastlintError. validate("") and validate(b"") raise ValueError: xml must not be empty. A model can easily emit an empty string, so a reward function that only catches VastlintError will crash the rollout.
Wrong types raise TypeError. None, int, and list are rejected cleanly. Catch (VastlintError, ValueError, TypeError) to cover all three.
Whitespace-only is not empty. " " returns a normal (invalid) Result rather than raising, so it takes a different path than "". Normalise if you treat them the same.
valid is strict. A tag needs the required elements for its version (for example AdServingId in VAST 4.1+) to report valid=True. For a graded reward, lean on the summary counts rather than the binary flag.
str and bytes are equivalent. Both produce the same result; UTF-8, unicode, and a leading BOM are handled.
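
If you want empty, whitespace-only, and wrong-type outputs to take one path, a small pre-check in front of the validator does it. A minimal sketch; normalise_candidate is an illustrative helper, not part of vastlint, and treating blank input as "no candidate" is an assumption about what your reward should do:

```python
from typing import Optional, Union


def normalise_candidate(raw: object) -> Optional[Union[str, bytes]]:
    """Return raw unchanged if it is non-blank str or bytes, else None."""
    if not isinstance(raw, (str, bytes)):
        return None  # None, int, list, ... would raise TypeError downstream
    if not raw.strip():
        return None  # "" would raise ValueError; "   " would not, so unify them
    return raw


for candidate in ["<VAST version='4.2'/>", "", "   ", None, 42, b""]:
    print(repr(candidate), "->", repr(normalise_candidate(candidate)))
```

A reward function can then return its floor value whenever this yields None and only call the validator otherwise. Keeping the try/except around the validator is still necessary, since unparseable XML passes this check.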

Tuning the signal

Use rule_overrides to shape what counts. Map a rule id to "off" to stop penalising a rule you do not care about, or to "error" to escalate a warning you want the model to treat as fatal.

vastlint.validate(
    xml,
    max_wrapper_depth=5,
    rule_overrides={
        "VAST-4.1-mezzanine-recommended": "off",
        "VAST-2.0-mediafile-https": "error",
    },
)
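
When the override set grows, it can help to build the mapping from two lists and catch a rule that ends up in both. A minimal sketch; build_overrides is an illustrative helper, not part of vastlint, and it assumes only the "off" and "error" values described above:

```python
from typing import Iterable


def build_overrides(off: Iterable[str] = (), escalate: Iterable[str] = ()) -> dict[str, str]:
    """Build a rule_overrides mapping; reject ids listed in both groups."""
    off_ids, escalate_ids = list(off), list(escalate)
    clash = set(off_ids) & set(escalate_ids)
    if clash:
        raise ValueError(f"rules both disabled and escalated: {sorted(clash)}")
    overrides = {rule_id: "off" for rule_id in off_ids}
    overrides.update({rule_id: "error" for rule_id in escalate_ids})
    return overrides


overrides = build_overrides(
    off=["VAST-4.1-mezzanine-recommended"],
    escalate=["VAST-2.0-mediafile-https"],
)
print(overrides)
```

The resulting dict is what you would pass as rule_overrides in the call above.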
