Every workflow spec has a system_prompt_id field. For most workflows it points to goal_agent.universal. For a few complex workflows it points to a specialized file. The difference between these two paths is a deliberate design decision, not an accident.

The Universal Prompt

The universal prompt template is a single YAML file with placeholder variables filled at runtime from the workflow spec. It works because most workflows share the same structural logic: read the input, execute phases in order, store output.

# universal.yaml (condensed)
system_prompt: |
  You are an AI document workflow agent for Convilyn.
  Your task: {workflow_description}

  ## Workflow Phases
  {workflow_phases}

  ## Instructions
  1. Analyze the uploaded file(s) before asking any questions.
  2. Execute each phase in order using the tools available to you.
  3. Do not skip phases or combine steps from different phases.
  4. After completing all phases, store the result using store_artifact.
  5. Call complete_workflow with a summary when finished.

  ## Tools Available
  {tool_descriptions}

  ## Output Format
  {output_spec_description}

  ## Language
  Respond and produce output in: {locale_language}
  Style guide: {locale_style_guide}

The runtime fills {workflow_phases} from the spec's phases array, {tool_descriptions} from the registered MCP tools, and {locale_language} from the user's locale. The same prompt skeleton drives a resume fit scorer, a business plan outliner, and a document compliance checker — because they all follow the same structural pattern.
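The rendering step can be sketched with plain `str.format`. The template text, `format_phases` helper, and spec shape below are simplified stand-ins for the real registry, not the production code:

```python
# Condensed stand-in for the universal template above.
UNIVERSAL_TEMPLATE = (
    "You are an AI document workflow agent for Convilyn.\n"
    "Your task: {workflow_description}\n\n"
    "## Workflow Phases\n"
    "{workflow_phases}\n"
)

def format_phases(phases: list[dict]) -> str:
    # Turn the spec's phases array into a numbered list for the prompt.
    return "\n".join(
        f"{i}. {p['name']}: {p['description']}"
        for i, p in enumerate(phases, 1)
    )

spec = {
    "description": "Score how well a resume fits a job posting.",
    "phases": [
        {"name": "extract", "description": "Pull text from the uploaded resume."},
        {"name": "analyze", "description": "Compare skills against the posting."},
    ],
}

prompt = UNIVERSAL_TEMPLATE.format(
    workflow_description=spec["description"],
    workflow_phases=format_phases(spec["phases"]),
)
```

The same skeleton renders a different prompt for each spec; only the filled-in values change.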

When Universal Works

Universal works when the workflow can be fully described by:

  • A set of ordered phases, each named and described
  • A straightforward one-way flow: extract → analyze → generate → store
  • No mandatory user interaction points embedded in the logic
  • Output quality that is acceptable without domain-expert constraints

The vast majority of Goal Lane's nearly 100 specs meet these criteria. Adding a new workflow requires writing a spec, not writing a new prompt.
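A hypothetical new spec makes the point concrete. The field names below mirror the snippets in this article (phases, agent_config.system_prompt_id) but are illustrative, not the exact schema:

```python
# Hypothetical spec for a new universal-routed workflow.
# Field names mirror the article's snippets, not the real schema.
new_workflow_spec = {
    "id": "meeting_notes_summarizer",
    "description": "Summarize uploaded meeting notes into action items.",
    "phases": [
        {"name": "extract", "description": "Read the uploaded notes."},
        {"name": "analyze", "description": "Identify decisions and owners."},
        {"name": "generate", "description": "Write the action-item summary."},
        {"name": "store", "description": "Store the summary via store_artifact."},
    ],
    "agent_config": {"system_prompt_id": "goal_agent.universal"},
}

# Ordered phases, one-way flow, no mandatory interaction points,
# no domain-expert constraints: universal handles it unchanged.
```

Shipping this workflow means committing the spec; no prompt engineering is involved.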

When Specialized Is Necessary

Some workflows have requirements that can't be expressed as ordered phases in a template. The signal is when output quality drops significantly, or when the agent makes decisions that require domain-expert guardrails.

resume_to_job_ready is the canonical example. Its specialized prompt is 351 lines. Here's why it couldn't use the universal template:

# Requirements that universal can't enforce:

1. MANDATORY multi-choice question for Phase 2
   The agent MUST present exactly 3 career categories,
   each with 2-4 role titles and reasoning for why each
   category was selected. This is not "ask for user input
   at some point" — it's a highly structured decision gate.

2. Category selection drives Phase 3 completely
   The research queries, the tailoring criteria, the
   tone adjustments — all derived from which categories
   the user selected. The template can't capture this
   conditional branching.

3. Exactly 2 questions allowed total
   Category selection + remote work preference.
   No more. No less. The agent must not ask other questions
   regardless of what it encounters in the file.

4. Domain-specific quality standards
   "Quantify achievements with real numbers wherever possible"
   "NEVER fabricate experience or credentials"
   "Each bullet point must start with an action verb"
   — These are not structural instructions; they're
     quality standards that require domain expertise.

These requirements need hardcoded constraints in the prompt, not soft instructions in a template. The universal prompt with well-written phases would produce a resume workflow — just not a high-quality one that users would pay for.
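Constraints this rigid can also be checked after the fact. A hypothetical validator for the Phase 2 decision gate (the function and field names are ours, not the production code) might look like:

```python
# Hypothetical check for the Phase 2 decision gate: exactly 3 career
# categories, each with 2-4 role titles and stated reasoning.
def validate_phase2_gate(categories: list[dict]) -> list[str]:
    errors = []
    if len(categories) != 3:
        errors.append(f"expected exactly 3 categories, got {len(categories)}")
    for cat in categories:
        roles = cat.get("roles", [])
        if not 2 <= len(roles) <= 4:
            errors.append(
                f"category {cat.get('name')!r}: need 2-4 roles, got {len(roles)}"
            )
        if not cat.get("reasoning"):
            errors.append(f"category {cat.get('name')!r}: missing reasoning")
    return errors
```

An empty list means the gate's structure was satisfied. In practice the hardcoded prompt constraint does the heavy lifting; a check like this only catches drift.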

Routing by system_prompt_id

The agent selects its prompt at workflow initialization based on the system_prompt_id in the spec:

class PromptRegistry:
    def load(self, prompt_id: str, context: PromptContext) -> str:
        # "goal_agent.universal" → "prompts/definitions/goal_agent/universal.yaml"
        template_path = f"prompts/definitions/{prompt_id.replace('.', '/')}.yaml"
        template = load_yaml(template_path)  # parse the YAML into a template object
        return template.render(context)      # fill {placeholders} from the context

# At job initialization:
prompt_id = spec["agent_config"]["system_prompt_id"]
# → "goal_agent.universal"  for most workflows
# → "goal_agent.resume_to_job_ready"  for that workflow

rendered_prompt = prompt_registry.load(
    prompt_id,
    PromptContext(
        workflow_description=spec.description_for_locale(locale),
        workflow_phases=format_phases(spec.phases),
        tool_descriptions=format_tools(available_tools),
        output_spec_description=format_output_spec(spec.output_specs),
        locale_language=LOCALE_LANGUAGE_MAP[locale],
        locale_style_guide=LOCALE_STYLE_GUIDE.get(locale, ""),
    )
)

The specialized prompt still uses template variables — it just has far more hardcoded structure around them.

Prompt Caching for Cost Efficiency

System prompts are long and identical across all turns of a single workflow execution. We mark the system message with a cache point at the infrastructure level:

SystemMessage(
    content=[
        {"type": "text",       "text": rendered_prompt},
        {"type": "cachePoint", "cachePoint": {"type": "default"}},
    ]
)

After the first turn, the system prompt is cached and subsequent calls pay approximately 10% of the normal input token cost for the prompt portion. For a 120-iteration workflow with a 4,000-token system prompt, this saves roughly 430,000 input tokens (119 cached turns × 4,000 tokens × ~90% discount ≈ 428,000), meaningful at scale.

Decision Framework: Universal vs Specialized

Use universal when:
  ✓ Workflow has 2-5 clearly defined sequential phases
  ✓ No mandatory user interaction with complex structure
  ✓ Output quality acceptable with phase descriptions alone
  ✓ Workflow can be described to a knowledgeable non-expert

Use specialized when:
  ✗ Mandatory structured interactions (e.g., multi-choice with reasoning)
  ✗ Conditional logic across phases based on prior answers
  ✗ Domain quality standards that require expert encoding
  ✗ Maximum question count or specific interaction count required
  ✗ Output failures observed repeatedly with universal approach
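The structural side of this checklist can be encoded as a first-pass heuristic over the spec. This is a sketch with illustrative flag names (requires_structured_interaction, domain_quality_standards) that are not the real schema:

```python
def can_use_universal(spec: dict) -> bool:
    """First-pass heuristic only; output quality still gets human review."""
    phases = spec.get("phases", [])
    # Universal expects 2-5 clearly defined sequential phases.
    if not 2 <= len(phases) <= 5:
        return False
    # Illustrative flags, not the real schema: structured interaction
    # gates and expert quality standards both force a specialized prompt.
    if spec.get("requires_structured_interaction"):
        return False
    if spec.get("domain_quality_standards"):
        return False
    return True
```

A heuristic like this only screens structure; the last criterion in the list above (repeated output failures) can only be observed in production.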

We start every new workflow with the universal template and monitor output quality. Specialized prompts are written only after observing specific quality failures that the template cannot address — not as a default.

A universal template that serves 90 workflows at 80% quality is more valuable than 90 specialized prompts at 85% quality. The maintenance cost of 90 prompts dwarfs the 5% quality gap — except when quality failures are user-visible and high-stakes.