Managed RL training for knowledge agents

Train search agents on your corpus with reinforcement learning

Send us your corpus. We run the full KARL training pipeline—agentic data synthesis, OAPL reinforcement learning, parallel inference—and give you back a model that outperforms frontier LLMs on your data. Built on our open-source framework.

konash — training run
$ konash train --corpus ./sec-filings
[1/4] Synthesizing QA pairs...
  ✓ 2,000 multi-constraint questions
  ✓ Deduplicated → 1,847 unique
[2/4] Generating rollouts...
  ✓ 7,388 rollouts (4 per prompt)
  ✓ Pass-rate filtered → 1,203 examples
[3/4] OAPL training (H100)...
  ✓ Loss: 0.847 → 0.312
  ✓ Checkpoint saved
[4/4] Evaluation:
  Base: 48% → Trained: 71% (+23 pts)
Ready. Run `konash ask` to query.
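The pass-rate filtering step in the run above can be sketched roughly as follows. This is an illustrative sketch, not the actual KONASH implementation: the function name and the 25–75% pass-rate band are assumptions. The idea is that prompts every rollout solves teach nothing new, and prompts no rollout solves give no reward signal, so only prompts in between contribute training examples, and only their successful trajectories are kept.

```python
from collections import defaultdict

def filter_by_pass_rate(rollouts, lo=0.25, hi=0.75):
    """Keep successful trajectories for prompts whose pass rate
    falls inside [lo, hi].

    `rollouts` is a list of (prompt_id, trajectory, passed) tuples,
    e.g. 4 rollouts per prompt as in the demo run.
    """
    by_prompt = defaultdict(list)
    for prompt_id, trajectory, passed in rollouts:
        by_prompt[prompt_id].append((trajectory, passed))

    kept = []
    for prompt_id, group in by_prompt.items():
        rate = sum(passed for _, passed in group) / len(group)
        if lo <= rate <= hi:
            # Train only on the successful trajectories for this prompt.
            kept.extend(t for t, passed in group if passed)
    return kept

rollouts = [
    ("q1", "traj-a", True), ("q1", "traj-b", False),   # 50% pass rate: kept
    ("q2", "traj-c", True), ("q2", "traj-d", True),    # 100%: too easy, dropped
    ("q3", "traj-e", False), ("q3", "traj-f", False),  # 0%: too hard, dropped
]
print(filter_by_pass_rate(rollouts))  # → ['traj-a']
```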
100x
Cheaper than training from scratch
+23pts
Accuracy gain on FinanceBench
Hours
From corpus to trained model
2 min
Setup to first training run

The managed service for knowledge agent training

We pair KARL's reinforcement learning pipeline with your document corpus. Within days, you get a trained model with side-by-side evals proving it outperforms standard RAG and frontier LLMs on your data.

STEP 01

Send us your corpus

SEC filings, internal docs, research papers, legal contracts—any text. We ingest, chunk, embed, and index it.

STEP 02

We run the full pipeline

Agentic QA synthesis generates training data from your docs. OAPL reinforcement learning trains the model on successful search strategies. Automatic GPU provisioning.

STEP 03

You get a trained agent

A LoRA adapter that turns any base model into a domain expert on your corpus. Benchmark report included. Deploy anywhere with vLLM.
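Deploying the adapter with vLLM might look like the sketch below. The base model ID, adapter path, and adapter name are placeholders, not KONASH defaults; `--enable-lora` and `--lora-modules` are standard vLLM serve flags.

```shell
# Attach the trained LoRA adapter to the base model at serve time.
# <base-model-id> and the paths below are placeholders.
vllm serve <base-model-id> \
  --enable-lora \
  --lora-modules konash-adapter=./checkpoints/konash-lora
```

vLLM then exposes an OpenAI-compatible endpoint, so clients can route requests to the adapter by passing `konash-adapter` as the model name.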

Why not just RAG?

Standard RAG retrieves once and hopes for the best.
KONASH trains your model to search iteratively, reason across documents, and know when to search again.

The model doesn't memorize your documents. It learns how to search them—what queries to issue, when to refine, how to synthesize evidence from multiple sources. This generalizes to new questions and even new corpora.
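The iterative loop described above can be sketched generically. Everything here is an illustrative assumption: the function names, the stopping heuristic, and the toy in-memory corpus stand in for the trained model's learned query, stop, and synthesis behavior.

```python
def answer(question, search, refine_query, enough_evidence, synthesize,
           max_steps=4):
    """Generic iterative-search loop: issue a query, inspect results,
    decide whether to search again with a refined query, then synthesize."""
    evidence, query = [], question
    for _ in range(max_steps):
        evidence.extend(search(query))           # issue a query
        if enough_evidence(question, evidence):  # learned stop condition
            break
        query = refine_query(question, evidence)  # reformulate and retry
    return synthesize(question, evidence)

# Toy instantiation over an in-memory corpus of two "documents".
corpus = {
    "revenue": "FY2023 revenue was $4.2B.",
    "margin": "Gross margin was 38% in FY2023.",
}
result = answer(
    "What were FY2023 revenue and gross margin?",
    search=lambda q: [corpus[q]] if q in corpus else [],
    refine_query=lambda q, ev: (
        "margin" if any("revenue" in e for e in ev) else "revenue"
    ),
    enough_evidence=lambda q, ev: len(ev) >= 2,
    synthesize=lambda q, ev: " ".join(ev),
)
print(result)  # → FY2023 revenue was $4.2B. Gross margin was 38% in FY2023.
```

The first query returns nothing, the refined query finds the revenue figure, and a second refinement finds the margin before the loop stops and synthesizes.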

The pipeline

  • Agentic QA Synthesis
    Explores corpus, generates grounded questions
  • OAPL Training
    Off-policy RL on successful search trajectories
  • Token Masking
    Trains search strategy, not text reproduction
  • Parallel Thinking
    N rollouts + aggregation at inference
  • Value-Guided Search
    Learned value model guides tree search
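The token-masking step can be sketched as below: loss weight 1 on tokens the policy generated (queries, reasoning, answer) and 0 on tokens the environment injected (retrieved documents). The role labels and tokenization are illustrative assumptions, not the actual trajectory format.

```python
def loss_mask(segments):
    """Build a per-token loss mask over a trajectory.

    `segments` is a list of (role, tokens) pairs. "model" tokens were
    generated by the policy and are trained on; "observation" tokens
    (retrieved documents) are masked so the model learns the search
    strategy, not text reproduction.
    """
    mask = []
    for role, tokens in segments:
        mask.extend([1 if role == "model" else 0] * len(tokens))
    return mask

trajectory = [
    ("model", ["<search>", "fy2023", "revenue", "</search>"]),  # query: trained
    ("observation", ["FY2023", "revenue", "was", "$4.2B."]),    # doc: masked
    ("model", ["Revenue", "was", "$4.2B."]),                    # answer: trained
]
print(loss_mask(trajectory))  # → [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
```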

Open source framework. Managed infrastructure.

The full KONASH framework is Apache 2.0. The managed service handles the GPUs, orchestration, and optimization so you can focus on your data.

Open Source

FREE

Run the full pipeline yourself. pip install konash

  • Full training pipeline (synthesis → rollouts → OAPL)
  • CLI + Python API
  • Any Together AI model
  • Pre-built benchmark indexes
  • Apache 2.0 license
View on GitHub
Managed

Managed Training

Everything in open source, plus we handle the infrastructure.

  • Automatic H100 provisioning across 20+ cloud providers
  • Managed training runs with live monitoring
  • Corpus analysis and training optimization
  • Side-by-side benchmark reports
  • Dedicated support; deployed in days, not weeks
Start a Training Run

Results

GLM 4.5 Air on FinanceBench (150 SEC filing questions). The KARL paper reports 76% after RL training—KONASH implements the same pipeline.

Mode                             Accuracy
Base model (single rollout)      48%
+ Parallel Thinking (N=3)        51%
+ OAPL Training (2 iterations)   76%

Source: KARL paper (Databricks, 2026). Nugget-based evaluation, Appendix D.1.
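The Parallel Thinking rows aggregate N independent rollouts into one answer. Majority voting is one common aggregation scheme; the sketch below assumes it for illustration, and the actual KARL/KONASH aggregator may differ.

```python
from collections import Counter

def aggregate(answers):
    """Majority vote over N independent rollout answers.
    Ties are broken by first occurrence."""
    counts = Counter(answers)
    top = max(counts.values())
    for a in answers:  # first answer reaching the top count wins
        if counts[a] == top:
            return a

print(aggregate(["$4.2B", "$4.1B", "$4.2B"]))  # → $4.2B
```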

Supported Models

  • GLM 4.5 Air: Default. The KARL base model.
  • Qwen3 80B-A3B: MoE. Good value.
  • Llama 3.3 70B Turbo: Dense. General-purpose.
  • DeepSeek R1: Reasoning-focused.
  • Any Together AI model: Enter any model ID.

Supported Corpora

  • BrowseComp-Plus: 67K articles. Pre-built index.
  • FinanceBench: 150 SEC filing questions.
  • QAMPARI: 250K+ encyclopedic chunks.
  • FreshStack: Technical documentation.
  • Your documents: .txt, .md, .pdf, .json, .html, .py, and more.

Updates

What we're building and shipping.

Mar 2026

KONASH v0.4.0

RELEASE

Full training pipeline: agentic QA synthesis, rollout generation, pass-rate filtering, OAPL training, parallel thinking inference. CLI with interactive setup wizard.

Mar 2026

FinanceBench baseline

BENCHMARK

GLM 4.5 Air achieves 48% single-rollout, 51% with parallel thinking (N=3). Establishing baseline for OAPL improvement.

Mar 2026

Cloud GPU provisioning

INFRA

Automatic H100 provisioning via Shadeform across 20+ providers. OAPL gradient step costs ~$0.50.

Feb 2026

KARL paper published

RESEARCH

Databricks publishes KARL: Knowledge Agents via Reinforcement Learning. RL-trained agents match or exceed frontier models on grounded reasoning. KONASH development begins.

Ready to train on your corpus?

Use the open-source framework yourself, or let us handle the infrastructure and deliver a trained model in days.