Developer Journal

A chronological record of development decisions, discoveries, and lessons learned while building Claude Studio Producer.



Recent Updates

Feb 15, 2026 - YouTube Publishing

I spent the last week playing with OpenClaw and even had my new agent, Lilit, start working with the codebase. Some of the most recent updates, like getting the OpenTTS provider and YouTube publishing working, were courtesy of Lilit.

I think the most interesting aspect of this was being able to teach Lilit to use the studio CLI; now I can just ask for a podcast about X topic and I’ll get one.

Yesterday I wanted to learn about the latest advances in memory, and she provided two papers from 2026. I said make a podcast for each. One was very long, 17 pages, and something was causing the subagent to fail when extracting the PDF, so she made a GitHub issue about it. I probably need to figure out how to give that agent a proper identity, like when Claude Code makes an update and you see that we committed together, rather than it coming up as an issue from me.

Anyway, the second paper, about “FadeMem”, had no such issue. Here’s the video: https://youtu.be/eToEeH0yz4o

One key takeaway is that this was produced by me just having a conversation with the agent, Lilit, and suggesting a more entertaining script: less serious, more John Oliver, more Jon Stewart. The other thing was that I ran out of ElevenLabs credits, so Lilit asked me if I wanted to try OpenTTS. I said sure. It thrashed a little and chewed through some Opus credits, but it figured it out and published the video using the Onyx voice… while I had dinner. Pretty neat actually.

Prior to that interaction though, I produced this video using the CLI myself, using a pdf about LLM hallucinations and this one shows off some other advances in the studio’s tooling.

This video is a culmination of several advances:

  1. karaoke style text renderings
  2. more selective image inputs (oh yeah, we have a new wikimedia provider!)
  3. the knowledge base has better alignment with the content, and we’re now timeline-aware, so the spoken words of the script are better at triggering relevant visuals
  4. there’s probably more, but I can’t recall right now because it has been a pretty intense week!

I’ll see if Lilit cares to chime in on that. No doubt she will.

Oh! Classic burying the lede: you can just have OpenClaw (or your agent of choice) drive this studio and generate videos from whatever source content you want. The CLI is feature-rich enough that agents can tinker with it and make a wide variety of content about your source material, so my laser focus on science papers was purely self-imposed. GLHF!

Feb 9, 2026 - Content-Aware Document Classification

The KB ingestion pipeline was treating all documents identically, which caused metadata pollution — author affiliations and university names were leaking into key themes. Now a ContentClassifier runs before the LLM to identify document type and structural zones.
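The classifier’s internals aren’t listed here, so as a minimal sketch of the idea, assuming hypothetical type names and heuristics (the real `ContentClassifier` almost certainly uses richer signals):

```python
from dataclasses import dataclass, field
from enum import Enum

class DocumentType(Enum):
    RESEARCH_PAPER = "research_paper"
    GENERIC = "generic"

@dataclass
class ClassifiedDocument:
    doc_type: DocumentType
    # Structural zones mapped to character ranges, e.g. "references": (512, 1024),
    # so downstream theme extraction can skip bibliographic metadata.
    zones: dict = field(default_factory=dict)

def classify(text: str) -> ClassifiedDocument:
    """Cheap heuristic pre-pass that runs before the LLM, flagging zones
    (affiliations, references) to exclude from key-theme extraction."""
    lowered = text.lower()
    if "abstract" in lowered and "references" in lowered:
        doc_type = DocumentType.RESEARCH_PAPER
    else:
        doc_type = DocumentType.GENERIC
    zones = {}
    if "references" in lowered:
        zones["references"] = (lowered.index("references"), len(text))
    return ClassifiedDocument(doc_type=doc_type, zones=zones)
```

The key design point is that this pass is cheap and deterministic, so it can gate what text the LLM ever sees.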

What changed:

The result: cleaner knowledge graphs with themes that reflect actual content, not bibliographic metadata.

Feb 7, 2026 - Content Model Expansion

Extended StructuredScript with content-agnostic vocabulary and source attribution for broader use cases beyond scientific podcasts.

Content-Agnostic Intent Vocabulary: Replaced paper-specific intents (METHODOLOGY, KEY_FINDING, etc.) with 19 universal intents that work across content types:

Source Attribution: New SourceType (PAPER, NEWS, DATASET, GOVERNMENT, TRANSCRIPT, NOTE, ARTIFACT, URL) and SourceAttribution models track content provenance with confidence scores.
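A minimal sketch of what those models might look like, using the `SourceType` values named above; the field names and validation are assumptions, not the actual implementation:

```python
from dataclasses import dataclass
from enum import Enum

class SourceType(Enum):
    PAPER = "paper"
    NEWS = "news"
    DATASET = "dataset"
    GOVERNMENT = "government"
    TRANSCRIPT = "transcript"
    NOTE = "note"
    ARTIFACT = "artifact"
    URL = "url"

@dataclass
class SourceAttribution:
    source_type: SourceType
    reference: str           # e.g. a DOI, URL, or file path (illustrative)
    confidence: float = 1.0  # how sure we are a segment derives from this source

    def __post_init__(self):
        # Keep confidence a valid probability.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
```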

Variant/Perspective Support: perspective field on segments and scripts enables bias analysis workflows (left/right news variants sharing same source attributions).

Backward Compatibility: Intent mapping preserves existing scripts:
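The actual mapping table isn’t reproduced here, but the shape of such a migration is simple. The entries below are hypothetical pairings for illustration only; only `METHODOLOGY` and `KEY_FINDING` are named above, and the target intent names are guesses:

```python
# Hypothetical legacy-to-universal intent mapping; the real table may differ.
LEGACY_INTENT_MAP = {
    "METHODOLOGY": "EXPLAIN_PROCESS",
    "KEY_FINDING": "PRESENT_EVIDENCE",
}

def migrate_intent(intent: str) -> str:
    """Old scripts load unchanged: known legacy intents are remapped,
    anything else passes through as-is."""
    return LEGACY_INTENT_MAP.get(intent, intent)
```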

This enables news comparison, multi-source synthesis, and policy analysis workflows while maintaining compatibility with the existing podcast training pipeline.

Feb 7, 2026 - DoP and Unified Production (Phase 4)

Implemented Phase 4 of the Unified Production Architecture: the Director of Photography (DoP) module and ContentLibrarian integration.

DoP Module (core/dop.py):

Integration (cli/produce_video.py):

Example Output:

DoP visual assignment:
  figure_sync: 3 segments (KB figures)
  dall_e: 5 segments (new generation needed)
  carry_forward: 7 segments (reuse previous)
  text_only: 2 segments (transitions)

Estimated cost: $0.40 (5 DALL-E images)
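The assignment logic itself isn’t shown, but the four buckets in the output above suggest a priority order: sync to a KB figure if one matches, render text-only for transitions, carry the previous visual forward when the topic continues, and only fall back to generation. A sketch under those assumptions (the segment fields, function name, and the $0.08/image rate implied by $0.40 / 5 are all illustrative):

```python
from collections import Counter

DALLE_COST_PER_IMAGE = 0.08  # assumed: $0.40 / 5 images from the run above

def assign_visuals(segments, kb_figures):
    """Pick a visual source per segment, preferring cheap reuse over generation."""
    assignments = []
    last_image = None
    for seg in segments:
        if seg["figure_ref"] and seg["figure_ref"] in kb_figures:
            choice = "figure_sync"            # KB already has the figure
            last_image = seg["figure_ref"]
        elif seg["is_transition"]:
            choice = "text_only"              # transitions need no image
        elif last_image and seg["topic_same_as_prev"]:
            choice = "carry_forward"          # reuse the previous visual
        else:
            choice = "dall_e"                 # generate only when nothing fits
            last_image = f"gen_{seg['id']}"
        assignments.append(choice)
    counts = Counter(assignments)
    estimated_cost = counts["dall_e"] * DALLE_COST_PER_IMAGE
    return assignments, estimated_cost
```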

The pipeline is now unified - both produce and produce-video commands share the same StructuredScript and ContentLibrary data layer. This enables incremental regeneration and asset reuse across runs.

Test Coverage: 116 tests passing (81 unit + 35 integration) covering all phases of the unified architecture, provider integrations, and end-to-end workflows.

Feb 7, 2026 - Training Outputs StructuredScript (Phase 3)

Training pipeline now outputs structured scripts alongside flat text files.

What Changed:

This bridges training and production: video production can now load structured scripts directly instead of re-parsing flat text. Figure references in scripts are pre-resolved with full metadata.

Feb 7, 2026 - Unified Production Architecture (Phase 1)

Implemented Phase 1 of the UNIFIED_PRODUCTION_ARCHITECTURE.md spec, establishing new data models as the foundation for the unified pipeline.

StructuredScript Model: Single source of truth replacing flat _script.txt files. The from_script_text() parser extracts Figure N references and section boundaries from existing scripts, enabling structured access to script content.
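The core of a `from_script_text()`-style parser is pulling “Figure N” references out of flat text. A minimal sketch of that piece (function name and scope are illustrative; the real parser also handles section boundaries):

```python
import re

FIGURE_REF = re.compile(r"\bFigure\s+(\d+)\b")

def extract_figure_refs(script_text: str) -> list[int]:
    """Return the figure numbers referenced in a flat script,
    in order of appearance."""
    return [int(m.group(1)) for m in FIGURE_REF.finditer(script_text)]
```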

ContentLibrary Model: Persistent asset registry with approval tracking. Includes from_asset_manifest_v1() for migrating existing asset manifests to the new format. Tracks image/audio assets with generation status and approval state.
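A sketch of the registry shape described above, with hypothetical field and method names (the source only confirms that assets, generation status, and approval state are tracked):

```python
from dataclasses import dataclass, field
from enum import Enum

class ApprovalState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class Asset:
    asset_id: str
    kind: str                                    # "image" or "audio"
    path: str
    approval: ApprovalState = ApprovalState.PENDING

@dataclass
class ContentLibrary:
    assets: dict = field(default_factory=dict)

    def register(self, asset: Asset):
        self.assets[asset.asset_id] = asset

    def approved(self, kind: str):
        """Assets safe to reuse in a later run."""
        return [a for a in self.assets.values()
                if a.kind == kind and a.approval is ApprovalState.APPROVED]
```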

All 55 unit tests passing.

Feb 7, 2026 - Proportional Budgets & Audio Source Fix

Fixed architectural issues in the video production pipeline:

1. Proportional Budget Tiers

Previously, budget tiers used absolute image counts (e.g., “medium = 40 images”). This caused inconsistent quality when testing with scene subsets.

Now tiers use ratios:

This ensures consistent quality across runs. Testing 5 scenes with medium tier now produces ~1 image (not 5), matching what would happen proportionally in a full production.
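The budget math reduces to a ratio of images per scene. A sketch, assuming illustrative ratio values (only “5 scenes with medium tier produces ~1 image” is confirmed above, which implies medium ≈ 0.2):

```python
import math

# Assumed images-per-scene ratios; the real tier values may differ.
BUDGET_TIERS = {"low": 0.1, "medium": 0.2, "high": 0.5}

def image_budget(num_scenes: int, tier: str) -> int:
    """Proportional budget: a scene subset gets a proportional image count,
    so a 5-scene test with 'medium' yields ~1 image, not a full quota."""
    return max(1, math.floor(num_scenes * BUDGET_TIERS[tier]))
```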

2. Audio Uses Generated Script

Audio was incorrectly generated from the original Whisper transcription (“Welcome to Journal Club…”) instead of the new script (“Welcome back to another deep dive…”).

Fixed: Audio now comes from _script.txt paragraphs, not aligned_segments from the original transcription.

3. Audio Respects --limit Parameter

Audio was being generated for all 45 paragraphs even with --limit 5. Now the paragraphs are sliced proportionally to match the scene range.
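The proportional slice is straightforward; a sketch under the assumption that paragraphs map roughly linearly onto scenes (function name is illustrative):

```python
def slice_paragraphs(paragraphs, num_scenes, limit=None):
    """Keep only the fraction of the script covered by the scene range,
    so --limit 5 of 45 scenes reads ~5 scenes' worth of text, not all 45."""
    if limit is None or limit >= num_scenes:
        return paragraphs
    cutoff = round(len(paragraphs) * limit / num_scenes)
    return paragraphs[:max(1, cutoff)]
```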

4. Clear Visual Source Display

Scene list now distinguishes between:

# Output now shows which scene gets the image:
#  UAV positioning       intro  DALL-E     Ken Burns
#  multi-sensor info     intro  shared     Ken Burns
#  Kalman filter         intro  shared     Ken Burns

Feb 6, 2026 (late evening) - Scene-by-Scene Audio Generation

Added audio generation directly to produce-video, fixing a key architectural issue.

The Problem: Training was generating a full script, then trying to send it all to ElevenLabs at once. This hit character limits and was wasteful - training doesn’t need audio, only production does.

The Solution:

# Produce video with scene-by-scene audio (default: enabled)
claude-studio produce-video -t trial_000 --budget medium --live --voice lily

# Or specify a different voice
claude-studio produce-video -t trial_000 --budget medium --live --voice rachel

Output structure:

artifacts/video_production/20260206_204449/
├── images/
│   ├── scene_000.png
│   └── scene_001.png
├── audio/
│   ├── scene_000.mp3
│   └── scene_001.mp3
├── visual_plans.json
└── asset_manifest.json  # Links images + audio per scene
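The per-scene approach amounts to one TTS request per scene instead of one giant request for the whole script, which is what kept tripping character limits. A minimal sketch, where `synthesize` stands in for the real provider call and the names are illustrative; in the real pipeline each result is written out as `audio/scene_NNN.mp3` per the tree above:

```python
def generate_scene_audio(scenes, synthesize, voice="lily"):
    """One TTS call per scene, keyed the same way as the images/ directory,
    so the asset manifest can link image + audio per scene."""
    manifest = {}
    for i, scene in enumerate(scenes):
        name = f"scene_{i:03d}"
        # Each request is a single scene's text, well under provider limits.
        manifest[name] = synthesize(text=scene["text"], voice=voice)
    return manifest
```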

Feb 6, 2026 (evening) - Figure-Aware Script Generation

Fixed a key architectural issue: training now knows about figures before generating scripts.

The Problem: Scripts were generated without knowing what figures existed, then we tried to match figures afterward via keyword guessing.

The Solution:

Also documented the kb inspect command - shows beautiful quality reports:

claude-studio kb inspect my-project --quality

# Output shows atom distribution with bar charts:
# equation    █████░░░░░   44 (26%)
# paragraph   ████░░░░░░   38 (23%)
# figure      ███░░░░░░░   26 (16%)

Feb 6, 2026 - Training Pipeline & Video Production Integration

Big milestone: the podcast training pipeline and video production workflow are fully integrated!

Training Pipeline (claude-studio training run):

Video Production (claude-studio produce-video):

claude-studio produce-video -t trial_000 --show-tiers
claude-studio produce-video -t trial_000 --budget medium --kb my-project --live

Jan 30, 2026 - Security Hardening

Read some alarming posts about Clawdbot, so did a quick security check. Added __repr__ to Config classes to prevent API key leaks in debug outputs.
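The pattern is simple: override `__repr__` so secrets never appear verbatim in logs, tracebacks, or debugger output. A sketch with hypothetical field names (the actual Config classes surely differ):

```python
class Config:
    """Masks secrets so debug output never prints raw API keys."""

    def __init__(self, api_key: str, endpoint: str):
        self.api_key = api_key
        self.endpoint = endpoint

    def __repr__(self):
        # Show just enough of the key to identify it, never the whole thing.
        masked = self.api_key[:4] + "..." if self.api_key else "<unset>"
        return f"Config(api_key={masked!r}, endpoint={self.endpoint!r})"
```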

Added keychain import feature:

claude-studio secrets import .env

This imports all API keys from .env into your OS keychain, allowing secure storage without environment variables.

Jan 28, 2026 - DALL-E Provider

Stayed focused on core mission instead of getting distracted by Remotion graphics (saving that for later).

Successfully onboarded DALL-E using the provider onboarding agent:

claude-studio provider onboard -n dalle -t image --docs-url https://platform.openai.com/docs/guides/images

The system now supports:

This enables the DALL-E → Runway pipeline for image-to-video generation.

Jan 26, 2026 - Multi-Provider Pipelines

Completed the pipeline capability to chain providers:

  1. DALL-E generates seed image from text
  2. Runway transforms image to video

This is a key architectural milestone - providers can now feed into each other.

Jan 23, 2026 - Knowledge Base System

Major feature: Document-to-Video pipeline

Example workflow:

claude-studio kb create "AI Research" -d "Latest papers on multi-agent systems"
claude-studio kb add "AI Research" --paper paper.pdf
claude-studio kb produce "AI Research" -p "Explain transformer architecture" --style educational

Jan 20, 2026 - Multi-Tenant Memory

Upgraded memory system to support multi-tenant hierarchy:

Jan 8, 2026 - Memory & Dashboard

Jan 7-8, 2026 - Luma Provider Implementation

First real video provider integration:

Jan 7, 2026 - Foundation Sprint

Late night/early morning sprint creating all foundation specs:

Jan 6, 2026 - Agent Architecture

Initial agent system design:


Jan 9, 2026 - What Is This Even For?

The Vision

This project demonstrates:

  1. What you can do quickly with Claude
  2. How to design and implement a working multi-agent workflow
  3. Using learning/memory systems
  4. Using rewards and feedback
  5. Having fun!

The Workflow

A virtual studio where:

Studio Reinforcement Learning (StudioRL)

The feedback loop stores learnings in memory for the producer and script writer to leverage. The budget system keeps costs under control, allowing re-runs on promising pilots within budget constraints.

Are We Having Fun Yet?

make-it-rain-coffee

Prompt: A 15-second story of a developer having a breakthrough: Scene 1 - Wide shot of developer at desk in cozy home office at night, hunched over laptop, frustrated expression, warm desk lamp lighting. Scene 2 - They lean back with a satisfied smile, stretch arms up in victory celebration, coffee cup visible nearby, cinematic triumph moment.

Result: Make it rain coffee…!

Sometimes the AI interprets your vision in unexpected ways. This is part of why we have the QA and Critic agents - to catch these creative interpretations and decide whether they’re happy accidents or need revision.

