Critic Agent Specification
Purpose
The Critic Agent evaluates pilot results against the original creative intent, performs gap analysis, and decides whether each pilot's budget is continued or cancelled. It acts as the quality gatekeeper: under the scoring rubric, only pilots that score 65 or higher receive further budget.
Inputs
- original_request: The original video concept
- pilot: The pilot strategy being evaluated
- scene_results: List of generated scenes with QA scores
- budget_spent: Amount spent so far
- budget_allocated: Total budget for this pilot
Outputs
from dataclasses import dataclass
from typing import Dict, List

# SceneResult is imported from agents.critic (see Example Usage)

@dataclass
class PilotResults:
    pilot_id: str
    tier: str
    scenes_generated: List[SceneResult]
    total_cost: float
    avg_qa_score: float
    # Critic's evaluation
    critic_score: float            # 0-100 overall score
    approved: bool                 # Continue or cancel?
    budget_remaining: float        # How much to continue with
    gap_analysis: Dict             # What matched/missed
    critic_reasoning: str          # Explanation of decision
    adjustments_needed: List[str]  # Improvements for next phase
Behavior
- Analyze generated scenes against original request
- Identify matched elements and gaps
- Score overall quality (0-100)
- Make continue/cancel decision
- Recommend budget allocation for continuation
- Suggest adjustments for next phase
Scoring Rubric
| Score Range | Decision | Budget Multiplier |
|---|---|---|
| 90-100 | Excellent, continue | 100% remaining |
| 75-89 | Good, continue | 75% remaining |
| 65-74 | Acceptable, continue | 50% remaining |
| < 65 | Poor, CANCEL | 0% (reallocate) |
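The rubric can be read as a simple threshold lookup. The following is a minimal sketch, not the agent's actual implementation. The names `RUBRIC`, `decide`, and `continuation_budget` are hypothetical. It assumes each band's lower bound is inclusive, so a fractional score such as 89.5 falls into the 75-89 band, and that the multiplier applies to the unspent part of the allocated budget.

```python
from typing import Tuple

# (minimum score, decision, multiplier) copied from the rubric table, highest band first.
RUBRIC = [
    (90.0, "continue", 1.00),
    (75.0, "continue", 0.75),
    (65.0, "continue", 0.50),
]


def decide(score: float) -> Tuple[str, float]:
    """Hypothetical helper: map a 0-100 critic score to (decision, multiplier)."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score out of range: {score}")
    for minimum, decision, multiplier in RUBRIC:
        if score >= minimum:
            return decision, multiplier
    # Anything below the lowest band is cancelled and its budget reallocated.
    return "cancel", 0.0


def continuation_budget(score: float, allocated: float, spent: float) -> float:
    """Assumed rule: the multiplier applies to whatever budget is still unspent."""
    _, multiplier = decide(score)
    remaining = max(allocated - spent, 0.0)
    return remaining * multiplier
```

For a pilot with 50.00 allocated and 15.00 spent, a score of 82 lands in the 75-89 band, so this sketch would release 75% of the remaining 35.00, which is 26.25.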
Gap Analysis
@dataclass
class GapAnalysis:
    matched_elements: List[str]  # What was captured well
    missing_elements: List[str]  # What's missing
    quality_issues: List[str]    # Technical problems
    style_notes: str             # Style consistency
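Because `PilotResults.gap_analysis` is typed as a plain `Dict`, and the JSON the Critic returns does not include `style_notes`, a conversion step may be useful. The sketch below is one possible approach under those assumptions. The helper `gap_analysis_from_dict` is hypothetical, and the default values on the dataclass fields are added here for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class GapAnalysis:
    # Defaults are an assumption of this sketch so that absent keys are tolerated.
    matched_elements: List[str] = field(default_factory=list)
    missing_elements: List[str] = field(default_factory=list)
    quality_issues: List[str] = field(default_factory=list)
    style_notes: str = ""


def gap_analysis_from_dict(raw: Dict[str, Any]) -> GapAnalysis:
    """Hypothetical helper: build a GapAnalysis from the critic's JSON fragment."""
    return GapAnalysis(
        matched_elements=list(raw.get("matched_elements", [])),
        missing_elements=list(raw.get("missing_elements", [])),
        quality_issues=list(raw.get("quality_issues", [])),
        style_notes=str(raw.get("style_notes", "")),
    )
```

Converting early gives downstream code attribute access and a guaranteed set of fields, instead of repeated key lookups on a loosely typed dictionary.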
Prompt Template
You are a production critic evaluating test scenes from a video pilot.
ORIGINAL REQUEST:
{original_request}
PILOT DETAILS:
- ID: {pilot.pilot_id}
- Tier: {pilot.tier.value}
- Budget allocated: ${budget_allocated}
- Budget spent: ${budget_spent}
- Scenes produced: {len(scene_results)}
TEST SCENE RESULTS:
{formatted_scene_results}
AVERAGE QA SCORE: {avg_qa}/100
Perform gap analysis:
1. How well do these scenes match the creative intent?
2. What's missing or incorrectly interpreted?
3. Is the quality acceptable for this tier?
4. Should we continue this pilot with more budget?
SCORING RUBRIC:
- 90-100: Excellent match, continue with 100% remaining budget
- 75-89: Good match, continue with 75% remaining budget
- 65-74: Acceptable, continue with 50% remaining budget
- Below 65: Poor match, CANCEL pilot
Return JSON:
{
  "overall_score": 82,
  "gap_analysis": {
    "matched_elements": ["element1", "element2"],
    "missing_elements": ["element3"],
    "quality_issues": ["issue1"]
  },
  "decision": "continue",
  "budget_multiplier": 0.75,
  "reasoning": "Good execution but needs adjustment in X",
  "adjustments_needed": ["adjustment1", "adjustment2"]
}
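The reply to this prompt has to be parsed and checked before it can drive a budget decision. The following is a hedged sketch of that step, not the agent's real parser. The function name `parse_critic_response`, the `"cancel"` decision value, and the added `approved` key are assumptions made for illustration. Only the key names in `REQUIRED_KEYS` come from the template above.

```python
import json
from typing import Any, Dict

REQUIRED_KEYS = (
    "overall_score", "gap_analysis", "decision",
    "budget_multiplier", "reasoning", "adjustments_needed",
)


def parse_critic_response(text: str) -> Dict[str, Any]:
    """Hypothetical parser: extract and validate the JSON object from a reply."""
    # Tolerate prose around the JSON by slicing from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in critic response")
    data = json.loads(text[start:end + 1])

    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"critic response is missing keys: {missing}")

    score = float(data["overall_score"])
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"overall_score out of range: {score}")

    # Assumption: a pilot is approved only when the model says "continue"
    # AND the score clears the rubric's cancel threshold of 65.
    data["approved"] = data["decision"] == "continue" and score >= 65.0
    return data
```

Checking the decision against the score guards against a reply whose `decision` field contradicts the rubric, for example `"continue"` alongside a score of 50.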
Integration
- Called by: StudioOrchestrator after pilot test phase
- Receives from: QAVerifierAgent (QA scores)
- Output used by: Orchestrator for budget decisions
Example Usage
import asyncio

from agents.critic import CriticAgent, SceneResult
from core.claude_client import ClaudeClient

async def main():
    claude = ClaudeClient()
    critic = CriticAgent(claude_client=claude)

    # pilot_strategy and test_scenes come from the earlier pilot test phase
    result = await critic.evaluate_pilot(
        original_request="A day in the life of a developer...",
        pilot=pilot_strategy,
        scene_results=test_scenes,
        budget_spent=15.00,
        budget_allocated=50.00
    )

    if result.approved:
        print(f"✅ Continue with ${result.budget_remaining}")
        print(f"Adjustments: {result.adjustments_needed}")
    else:
        print(f"❌ Cancelled: {result.critic_reasoning}")

asyncio.run(main())
Comparative Evaluation
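When multiple pilots complete, the Critic can compare them and select the best approved candidate:
from typing import List, Optional

def compare_pilots(results: List[PilotResults]) -> Optional[PilotResults]:
    """Select the best pilot from approved candidates, or None if none were approved."""
    approved = [r for r in results if r.approved]
    if not approved:
        return None

    def quality_per_dollar(r: PilotResults) -> float:
        # Guard against division by zero for pilots with no recorded cost
        return r.avg_qa_score / r.total_cost if r.total_cost > 0 else float("inf")

    # Rank by critic score, then break ties by cost efficiency
    return max(approved, key=lambda r: (r.critic_score, quality_per_dollar(r)))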
Feedback Loop
The Critic’s adjustments feed back into production:
# Critic identifies issues
adjustments = [
    "Improve lighting in office scenes",
    "Add more dynamic camera movement",
    "Include developer's face in coding shots"
]
# These are passed to VideoGenerator for remaining scenes
for scene in remaining_scenes:
    scene.prompt_hints.extend(adjustments)
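If the feedback step runs more than once, for example after a second evaluation round, a plain `extend` would add the same hints to a scene again. The sketch below shows one way to avoid that. The `Scene` class is a hypothetical stand-in for the real scene type (only the `prompt_hints` attribute is taken from the snippet above), and `apply_adjustments` is an illustrative helper name.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Scene:
    """Hypothetical stand-in for the real scene type."""
    description: str
    prompt_hints: List[str] = field(default_factory=list)


def apply_adjustments(scenes: Iterable[Scene], adjustments: List[str]) -> None:
    """Append each adjustment at most once per scene, preserving order."""
    for scene in scenes:
        for hint in adjustments:
            if hint not in scene.prompt_hints:
                scene.prompt_hints.append(hint)
```

Calling `apply_adjustments` repeatedly with the same list leaves each scene's hints unchanged after the first call, which keeps generation prompts from growing with every round.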
Edge Cases
- All pilots fail: Return None, orchestrator stops production
- Tie in scores: Prefer lower-cost pilot
- Close scores: May continue multiple pilots
- Budget exhausted: Only evaluate, no continuation recommendation
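The first three edge cases can be sketched in a few lines. This is an illustration under stated assumptions, not the orchestrator's actual logic. `Candidate` is a hypothetical reduced view of `PilotResults`, the helper names are invented, and the 3-point `margin` that defines "close" is a placeholder, since no threshold is specified here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical reduced view of PilotResults with only the fields used here."""
    pilot_id: str
    critic_score: float
    total_cost: float
    approved: bool


def pick_best(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the top approved pilot, or None when every pilot failed."""
    approved = [c for c in candidates if c.approved]
    if not approved:
        return None
    # Highest score wins; negating cost makes the cheaper pilot win a tie.
    return max(approved, key=lambda c: (c.critic_score, -c.total_cost))


def pilots_to_continue(candidates: List[Candidate], margin: float = 3.0) -> List[Candidate]:
    """Keep every approved pilot whose score is within `margin` of the best."""
    best = pick_best(candidates)
    if best is None:
        return []
    return [
        c for c in candidates
        if c.approved and best.critic_score - c.critic_score <= margin
    ]
```

An empty list or `None` result corresponds to the "all pilots fail" case, where the orchestrator stops production.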