Changelog¶

All notable changes to this project are documented here. The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased ¶

0.5.0 - 2026-06-24¶

Added¶

Serving helpers (P23) — agentix.serving turns an Agent into a streaming HTTP endpoint. Dependency-free serializers map the stream() events to Server-Sent Events / NDJSON (event_to_dict, sse_events, ndjson_events, outcome_to_payload), and a thin lazy-imported FastAPI/Starlette adapter (sse_response / ndjson_response, extra agentix[serving]) wraps them in a StreamingResponse. outcome_to_payload serializes a run (including a suspended run's pending approvals) for a request/response + /approve flow. See examples/30_serving_fastapi.py.

Changed¶

Friendlier, plain-language package summary and README intro (consistent with the docs voice). No API changes.

0.4.1 - 2026-06-24¶

Added¶

Discoverability — the docs site now publishes llms.txt and llms-full.txt (a curated and a full-text, LLM-friendly index of the docs) so AI coding assistants can read it cleanly; broader PyPI keywords and trove classifiers (AI / async / typed); PyPI project links (Documentation, Changelog, Issues) so the docs link shows on the project page; and GitHub repo topics.
Documentation site (P17) — a Material for MkDocs site with a beginner-friendly getting-started, ~12 task guides (each linking a runnable example), a plain- language security model writeup, and a full API reference generated from the docstrings (mkdocstrings). Builds are gated in CI (mkdocs build --strict) and auto-deploy to GitHub Pages. Adds a docs dependency group; no library code changes.

0.4.0 - 2026-06-24¶

Added¶

Tier C polish (one PR): eval dataset loaders — load_cases(path) reads Cases from .jsonl / .json / .csv (extra keys/columns fold into metadata); record/replay cassettes — CassetteModel records a real model's responses to a JSON file and replays them deterministically (mode="auto"), for fast offline tests; subagent cost roll-up — a tool may return a ToolResult carrying cost_usd/tokens_used, and subagent_tool now does, so a child run's spend is added to the parent's AgentOutcome totals; instrument(agent) — one call wraps the model in TracingModel and merges tracing_events into the agent's events (existing callbacks preserved). See examples/29_cassettes.py and examples/13_subagents.py.
First-class structured output (P21) — Agent(response_model=…) (a Pydantic model class or a raw JSON-Schema dict) wires the whole path in one knob: an output validator so outcome.parsed is typed/validated (re-prompting on failure), a schema instruction injected into the system context (any model conforms), and native provider enforcement when the adapter supports it via a new with_response_format(schema) (Anthropic output_config.format, OpenAI/ LiteLLM response_format, Gemini response_schema, Ollama format; composes through RetryModel/FallbackModel). See examples/27_structured_output.py.
Rate-limit-aware retries (P22) — RetryModel now honors a provider's Retry-After (via retry_after=, default default_retry_after reading the error attribute or response header) instead of blind exponential backoff, capped by max_sleep, with an on_retry hook to surface waits. Falls back to exponential backoff when no hint is present. See examples/28_rate_limit.py.
Pluggable memory (P20) — a Memory protocol for cross-session recall (recall(query) / write(content)); MemoryRecord; and a dependency-free InMemoryMemory default with keyword-overlap recall (+ dump/load for persistence). Agent(memory=…, memory_limit=…) recalls before each run/stream and injects the records as trusted system context; remember_exchange=True persists each completed exchange. agentix owns the interface — bring your own vector DB / search index as the backend. See examples/26_memory.py.
Token-accurate context (P19) — a TokenCounter abstraction (Callable[[str], int]) with a dependency-free HeuristicTokenCounter / approx_token_counter default, transcript counting (count_tokens, count_message_tokens; text + tool calls + a per-media estimate + per-message overhead), and FitContextWindow — a context strategy that trims to a real token budget (dropping oldest whole rounds, preserving tool pairing, with reserve_tokens headroom) instead of counting rounds/characters. Pass any tokenizer (e.g. tiktoken, a provider counter) as the counter. See examples/25_token_context.py.

0.3.0 - 2026-06-23¶

Added¶

Suspendable human-in-the-loop (P18) — Agent(suspend_on_confirm=True) pauses a run when a tool needs confirmation instead of awaiting confirm_fn inline: it checkpoints (with the assistant tool-turn as the tail) and returns AgentOutcome(status="suspended", pending=[PendingApproval(...)]). A later resume(run_id, decisions={call_id: bool}) — on the same or a brand-new Agent, since the state lives in the store — finishes that turn (approve/deny; undecided pending calls fail closed) and continues. Requires a store + run_id; adds the on_suspend event and the PendingApproval type. Built for web/serverless flows where the request coroutine can't block. See examples/24_suspend_resume.py.
Sandboxed execution (P16) — SubprocessExecutor (a ToolExecutor) runs each tool as a separate OS process and actually enforces the limits the loop passes: network egress is denied when network_allowlist is empty (Linux network namespace via unshare, auto-detected; fails closed if it can't isolate, unless require_network_isolation=False), plus POSIX CPU/memory/file-size/ process rlimits, a fresh per-call temp working directory, a scrubbed environment (no parent secrets leak), an output cap, and a timeout that kills the process group. Ships SandboxPolicy and Command. This closes the gap where LocalToolExecutor ignored network_allowlist. See examples/23_sandbox.py.
Multimodal input (P15) — Message.content is now str | list[ContentPart], with TextPart, ImagePart, DocumentPart, and AudioPart (build via from_path / from_bytes / from_base64 / from_url). Message.text gives a string view. Agent.run/run_sync/stream accept a parts list anywhere a string request goes. Every adapter translates supported media to its provider format and raises a clear error otherwise (e.g. audio on Anthropic, URL images on Bedrock). Plain-string content is fully backward compatible. See examples/22_multimodal.py.
Multi-provider adapters (P14) — the toolkit now ships five more model backends alongside Anthropic, each behind its own extra and each a drop-in ModelFn: OpenAIModel (agentix[openai]; Chat Completions, also drives any OpenAI-compatible base_url, with streaming), GeminiModel (agentix[gemini]), BedrockModel (agentix[bedrock]; AWS Converse API, run off-thread), OllamaModel (agentix[ollama]; local models), and LiteLLMModel (agentix[litellm]; one bridge to 100+ providers). Best-effort pricing added for common OpenAI/Gemini models (override with register_price). See examples/21_providers.py.
AnthropicModel typed reasoning/cost knobs: thinking (True/"adaptive"/ "summarized"/"disabled"/dict), effort (low…max), and task_budget (int; adds the required beta header) — previously only via opaque extra. Docstring documents refusal-fallback behavior.
PromptRegistry: lightweight in-process prompt versioning with register / get / rollback / render and to_dict/from_dict persistence.

0.2.1 - 2026-06-23¶

Fixed¶

agentix.__version__ now reflects the installed distribution version (derived from package metadata) instead of a hardcoded string that could drift. (0.2.0 shipped reporting 0.1.0.)

0.2.0 - 2026-06-23¶

Added¶

Subagents: subagent_tool(agent, ...) exposes a child agent as a delegable tool (its own model/system prompt/tools/guards); composes with the loop and bounded_gather.
Cost & control: USD cost tracking (pricing module, cost_usd, and cost_usd on ModelResponse/AgentOutcome; the Anthropic adapter fills input_tokens/output_tokens/cost_usd); AgentPolicy.max_budget_usd; and Interrupt to stop a run/stream at a safe boundary.
Dynamic permissions: CallbackGuard (a can_use_tool-style per-call callback returning allow/deny/confirm) and ToolAllowlistGuard (scope a run to a subset of tools).
Output validation + retry: Agent(output_validator=, max_output_retries=) re-prompts on a failed validation and exposes AgentOutcome.parsed. Ships json_output, pydantic_output, regex_output.
Resilient model wrappers: RetryModel (backoff) and FallbackModel (try-next-on-error), composable and drop-in.
Eval harness (agentix.evals): evaluate(...) runs an agent over Cases and returns an EvalReport with pass_rate / format_success_rate / assert_pass_rate() (gate CI on regressions). Scorers: exact_match, contains, regex_match, predicate, llm_judge.
SelfConsistencyModel: sample a model N times per turn and return the majority vote (drop-in ModelFn).
JudgeGuard: an LLM reviews the final answer against a rubric and replaces it on failure (an on_answer safety/on-brand/format gate).
Anthropic adapter: structured-output passthrough documented (output_config={"format": ...}) and strict tool schemas forwarded.
OpenTelemetry tracing (agentix[otel]): TracingModel, tracing_events, and trace_run produce a span tree (run → model/tool spans) for your observability stack.

0.1.0 - 2026-06-22¶

Initial release.

Core¶

Async agent loop: Agent.run / run_sync / stream / resume, with step and token budgets.
Provider-agnostic ModelFn; tool schemas flow to the model.
@tool decorator generating JSON Schema from type hints + docstrings; Tool / ToolRegistry.
LocalToolExecutor — sync tools run off the event loop; real per-call timeouts.

Security (opt-in guard pipeline)¶

Trust boundary between user instructions and tool data.
Guards: TierGuard, PiiUrlGuard, InjectionGuard, UntrustedDataGuard, fail-closed RecipientTrustGuard, and PiiRedactionGuard (answer egress).
Async-or-sync confirmation; AgentEvents audit hooks; secure_defaults().

Providers & streaming¶

Anthropic adapter (claude-opus-4-8) with tool use and streaming.
Streaming events: AnswerDelta / ToolStarted / ToolFinished / Done.

Persistence & scale¶

Pluggable Store (MemoryStore, atomic non-blocking FileStore) + JSON codec.
Limiter and bounded_gather for fleet backpressure.

Integrations & context¶

MCP client support (MCPServer, agentix[mcp]): discover an MCP server's tools and use them in an agent.
Context management: ContextStrategy, TrimRounds, TruncateToolOutputs.

Delegation, cost & control¶

Subagents: subagent_tool exposes a child agent as a delegable tool.
Cost: pricing module + cost_usd; ModelResponse/AgentOutcome carry cost_usd; AgentPolicy.max_budget_usd aborts a run over budget.
Interrupt stops a run or stream at the next safe boundary.

Changelog¶

Changelog¶

Unreleased¶

0.5.0 - 2026-06-24¶

Added¶

Changed¶

0.4.1 - 2026-06-24¶

Added¶

0.4.0 - 2026-06-24¶

Added¶

0.3.0 - 2026-06-23¶

Added¶

0.2.1 - 2026-06-23¶

Fixed¶

0.2.0 - 2026-06-23¶

Added¶

0.1.0 - 2026-06-22¶

Core¶

Security (opt-in guard pipeline)¶

Providers & streaming¶

Persistence & scale¶

Integrations & context¶

Delegation, cost & control¶

Unreleased ¶