The job
You have a 40-minute lecture, technical walkthrough, or video essay open on YouTube, and you need the point fast. You want the main claims, useful examples, and exact timestamps in a clean outline, without bouncing between tabs or rewatching sections to find where each idea appeared.
Why this is hard without Sephir
Without Sephir, you end up opening YouTube’s transcript, copying a huge text block, pasting it into a separate chat tool, then fixing the format by hand. Long transcripts are messy, timestamp mapping breaks, and verification becomes a second task. The agent layer keeps the extraction and summarization in the same tab so the source and output stay aligned.
How Sephir does it
- Open the YouTube video and expand the native transcript panel.
- Open Sephir in the sidepanel with Cmd+Shift+S.
- Ask for a timestamped summary of the visible transcript.
- Watch extractPageText(active tab) capture transcript text from the page.
- Review the outline and tighten it into sections like claims, evidence, and actions.
- Save the run as /yt-summary and export the result as Markdown or JSON.
The skill behind it
Sephir uses one extraction tool, then a structured synthesis pass so you get a readable outline tied to the source transcript.
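To make the extract-then-structure idea concrete, here is a minimal sketch of how transcript text could be turned into a timestamped outline and exported as Markdown or JSON. It is not Sephir's implementation. It assumes the copied transcript alternates a timestamp line (M:SS or H:MM:SS) with one or more caption lines, and every name in it (Cue, parse_transcript, to_markdown, to_json, the 5-minute window) is hypothetical. The synthesis step that a model would perform is replaced here by simple grouping into time windows.

```python
import json
import re
from dataclasses import dataclass, asdict

# Assumption: a timestamp sits alone on its line, as "M:SS" or "H:MM:SS".
TIMESTAMP = re.compile(r"^(?:(\d+):)?(\d{1,2}):(\d{2})$")


@dataclass
class Cue:
    seconds: int  # offset from the start of the video
    text: str     # caption text spoken at that offset


def parse_transcript(raw: str) -> list[Cue]:
    """Pair each timestamp line with the caption lines that follow it."""
    cues: list[Cue] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            hours, minutes, secs = match.groups()
            total = int(hours or 0) * 3600 + int(minutes) * 60 + int(secs)
            cues.append(Cue(total, ""))
        elif cues:
            # Caption text belongs to the most recent timestamp.
            cues[-1].text = (cues[-1].text + " " + line).strip()
    return cues


def format_timestamp(seconds: int) -> str:
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def to_markdown(cues: list[Cue], window: int = 300) -> str:
    """Group cues into fixed time windows (hypothetical default: 5 minutes)."""
    lines: list[str] = []
    current_bucket = None
    for cue in cues:
        bucket = cue.seconds // window
        if bucket != current_bucket:
            current_bucket = bucket
            lines.append(f"## From {format_timestamp(bucket * window)}")
        lines.append(f"- [{format_timestamp(cue.seconds)}] {cue.text}")
    return "\n".join(lines)


def to_json(cues: list[Cue]) -> str:
    return json.dumps([asdict(cue) for cue in cues], indent=2)
```

Keeping the offset in seconds on every cue is what lets each outline line point back to an exact moment in the video, whichever export format is chosen.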
What it costs
Sephir runs this on your own ChatGPT Plus via Codex OAuth or your own API key. Typical usage is ~4,000–8,000 input tokens and ~500–1,000 output tokens on Claude Opus 4, GPT-5.5, or Gemini 3 Pro. Short videos can fit the Free tier’s single-turn flow. See for Free vs Pro Lifetime details.
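For a rough sense of where a given transcript falls in that range, the sketch below estimates input tokens from character count. The 4-characters-per-token ratio is a common rule of thumb for English text, not a figure from Sephir or any model provider, and real tokenizers will differ. The function names are hypothetical; the 4,000 and 8,000 bounds are the typical input range quoted above.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; chars_per_token is an assumed heuristic."""
    return math.ceil(len(text) / chars_per_token)


def within_typical_range(tokens: int, low: int = 4000, high: int = 8000) -> bool:
    """Check an estimate against the typical input range quoted above."""
    return low <= tokens <= high
```

A transcript of about 24,000 characters would land near 6,000 tokens under this heuristic, which sits in the middle of the quoted range.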
Related
- Built for focused async work:
- For sidebar comparison on this workflow: