VAST as a reward signal: vastlint for ML and agents
Short answer: if you train a model that generates VAST tags, or build a buyer or seller agent that touches them, call vastlint.validate(xml) as your verifier. It is deterministic, runs in-process in roughly 13 microseconds for a small tag, and returns structured issues a model can act on. Drop it in as a reward signal during training, a verifier in an agent repair loop, or a gate in an eval harness.
A language model is a probabilistic generator; the VAST spec is not. You do not want the model judging whether its own output is a valid tag. A deterministic verifier is faster than the model, never hallucinates, and returns the same verdict every time, which are the properties a training signal needs. vastlint.validate runs on the same Rust core as the web validator, the CLI, and the MCP server, so the verdict matches across your whole pipeline.
Why it fits ML and agentic workflows
- Deterministic: identical input returns byte-identical output, including issue order
- Fast: roughly 13µs per small tag, in-process, no subprocess or network hop
- Stateless and thread-safe: fan out across a thread or process pool, same results as serial
- Structured feedback: each issue carries a rule id, message, path, and spec citation
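These properties are cheap to spot-check in your own environment. The sketch below is a hypothetical helper (check_verifier is not part of vastlint): it runs any verifier callable twice serially and once through a thread pool, then compares the outputs. For vastlint, one plausible callable is lambda x: vastlint.validate(x).to_json(), assuming the stable string form is what you want to compare.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def check_verifier(
    verify: Callable[[str], str],
    inputs: Sequence[str],
    workers: int = 4,
) -> bool:
    """Return True if verify is repeatable and gives the same results in a pool.

    `verify` is any callable that maps a tag to a comparable value, for
    example a serialised validation result. It is a stand-in, not a
    vastlint API.
    """
    # Two serial passes: a deterministic verifier must agree with itself.
    first = [verify(x) for x in inputs]
    second = [verify(x) for x in inputs]
    # One pooled pass: pool.map preserves input order, so the lists are comparable.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pooled = list(pool.map(verify, inputs))
    return first == second == pooled
```

A False result means the callable you passed is not behaving as a pure function of its input, which is worth finding out before it becomes a training signal.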
The reward function
Start with a binary reward: a tag is either spec-clean or it is not. The important part is not the happy path; it is guarding against every way a model can hand you something that is not a tag.
import vastlint
from vastlint import VastlintError
def vast_reward(xml: str) -> float:
    """Binary reward: 1.0 for a spec-clean tag, 0.0 otherwise.
    Guards every way a model can hand you something that is not a valid tag:
    - empty string / empty bytes -> ValueError
    - None / non-string output -> TypeError
    - unparseable XML -> VastlintError
    """
    try:
        return 1.0 if vastlint.validate(xml).valid else 0.0
    except (VastlintError, ValueError, TypeError):
        return 0.0
For RL or rejection sampling you usually want a denser signal so the model gets gradient while a tag is still invalid. Penalise by severity instead of a single pass/fail bit.
def graded_vast_reward(xml: str) -> float:
    """Dense reward: penalise by severity instead of a single pass/fail bit.
    Gives the model gradient even while a tag is still invalid: fewer errors
    and warnings score higher, so partial progress is rewarded.
    """
    try:
        s = vastlint.validate(xml).summary.to_dict()
    except (VastlintError, ValueError, TypeError):
        return -1.0
    return -(1.0 * s["errors"] + 0.25 * s["warnings"])
The result contract
Every call returns a Result with the same stable shape. The structured issues are what make this usable as feedback rather than a scalar.
r = vastlint.validate(xml)
r.valid # bool: zero errors against the IAB spec
r.version # detected VAST version, e.g. "4.2"
r.issues # list[Issue]
r.summary # Summary
s = r.summary.to_dict()
# {"errors": 2, "warnings": 1, "infos": 1, "valid": False}
issue = r.issues[0].to_dict()
# {
#     "id": "VAST-4.1-adservingid-present",
#     "severity": "error",  # error | warning | info
#     "message": "<InLine> is missing required <AdServingId> ...",
#     "path": "/VAST/Ad[0]/InLine",
#     "spec_ref": "IAB VAST 4.1 §3.4.1",
#     "line": 1,
#     "col": 32,
# }
r.to_dict()  # full result, JSON-safe
r.to_json()  # stable string, round-trips via Result.from_json()
Agent repair loop
In an agentic flow the verifier closes the loop: validate, hand the errors back to the model, revalidate, until the tag is clean or you run out of steps. Because the issues name the exact rule, element path, and spec section, the model has something concrete to fix.
# Verifier inside an agent repair loop: the structured issues are the
# feedback the model acts on, not a single pass/fail bit.
def repair(xml: str, agent, max_steps: int = 5) -> str:
    for _ in range(max_steps):
        r = vastlint.validate(xml)
        if r.valid:
            return xml
        feedback = [
            {"rule": i.id, "fix": i.message, "where": i.path}
            for i in r.issues
            if i.severity == "error"
        ]
        xml = agent.revise(xml, feedback)  # your model call
    return xml  # steps ran out: the last revision has not been revalidated
Eval harness and batch gating
Validation is independent per tag and holds no shared state, so it parallelises cleanly. Use it to gate a batch of rollouts, score a dataset of generated creatives, or measure error rates across model checkpoints.
from concurrent.futures import ProcessPoolExecutor
# Gate a batch of rollouts. Validation is stateless and thread-safe, so a
# process (or thread) pool returns the same results as a serial run.
def score_batch(candidates: list[str]) -> list[float]:
    with ProcessPoolExecutor() as pool:
        return list(pool.map(vast_reward, candidates))
Edge cases that break reward functions
These are the behaviours that bite when model output is the input. They are all clean, documented Python exceptions, never a crash, but you have to catch the right ones.
- Empty input raises ValueError, not VastlintError. validate("") and validate(b"") raise ValueError: xml must not be empty. A model can easily emit an empty string, so a reward function that only catches VastlintError will crash the rollout.
- Wrong types raise TypeError. None, int, and list are rejected cleanly. Catch (VastlintError, ValueError, TypeError) to cover all three.
- Whitespace-only is not empty. " " returns a normal (invalid) Result rather than raising, so it takes a different path than "". Normalise if you treat them the same.
- valid is strict. A tag needs the required elements for its version (for example AdServingId in VAST 4.1+) to report valid=True. For a graded reward, lean on the summary counts rather than the binary flag.
- str and bytes are equivalent. Both produce the same result; UTF-8, unicode, and a leading BOM are handled.
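The edge cases above can be folded into one small wrapper. This is a minimal sketch under stated assumptions, not part of vastlint: guarded_reward and score_from_summary are hypothetical names, the 1 / (1 + penalty) mapping is one possible shaping choice, and the weights 1.0 and 0.25 are simply the ones used earlier on this page. The scoring callable and the exception tuple are passed in, so the wrapper itself has no dependency on the library.

```python
from typing import Any, Callable, Mapping, Tuple, Type


def score_from_summary(summary: Mapping[str, int]) -> float:
    """Map issue counts to (0, 1]: 1.0 when clean, shrinking as issues grow."""
    penalty = 1.0 * summary["errors"] + 0.25 * summary["warnings"]
    return 1.0 / (1.0 + penalty)


def guarded_reward(
    xml: Any,
    score: Callable[[Any], float],
    failure_types: Tuple[Type[BaseException], ...] = (ValueError, TypeError),
    floor: float = 0.0,
) -> float:
    """Return score(xml), or floor for blank input or any listed failure."""
    # Normalise: treat whitespace-only output the same as empty output.
    if isinstance(xml, (str, bytes)) and not xml.strip():
        return floor
    try:
        return score(xml)
    except failure_types:
        # Empty input, wrong types, or whatever else the caller listed.
        return floor
```

With vastlint you would likely pass score=lambda x: score_from_summary(vastlint.validate(x).summary.to_dict()) and failure_types=(VastlintError, ValueError, TypeError). Because every scored tag lands in (0, 1], a floor of 0.0 keeps unparseable or missing output strictly below any tag that at least parses.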
Tuning the signal
Use rule_overrides to shape what counts. Map a rule id to "off" to stop penalising a rule you do not care about, or to "error" to escalate a warning you want the model to treat as fatal.
vastlint.validate(
    xml,
    max_wrapper_depth=5,
    rule_overrides={
        "VAST-4.1-mezzanine-recommended": "off",
        "VAST-2.0-mediafile-https": "error",
    },
)
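If several training runs share a policy, it can help to build the rule_overrides mapping from two plain lists rather than editing a dict by hand. The helper below is a hypothetical convenience (build_overrides is not part of vastlint) and assumes only the "off" and "error" values described above.

```python
from typing import Dict, Iterable


def build_overrides(
    off: Iterable[str] = (),
    escalate: Iterable[str] = (),
) -> Dict[str, str]:
    """Build a rule-id -> level mapping from two lists of rule ids.

    Rules in `off` map to "off"; rules in `escalate` map to "error".
    A rule listed in both is ambiguous, so it is rejected.
    """
    off, escalate = list(off), list(escalate)
    clash = set(off) & set(escalate)
    if clash:
        raise ValueError(f"rules listed as both off and error: {sorted(clash)}")
    overrides = {rule: "off" for rule in off}
    overrides.update({rule: "error" for rule in escalate})
    return overrides
```

The result can be passed as rule_overrides=build_overrides(off=["VAST-4.1-mezzanine-recommended"], escalate=["VAST-2.0-mediafile-https"]), which produces the same mapping as the literal dict above.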