The Challenge

We needed something that could handle the full workflow, from raw audio file to structured, readable PDF, without manual intervention at any step. It had to:

  • Accept a wide range of audio and video formats without pre-processing
  • Automatically detect the language of the recording
  • Produce structured, hierarchical notes — not just a wall of text
  • Output a professional, readable PDF ready for use or distribution
  • Handle large files (up to 500MB) without timeout errors or loss of context

The Solution

Sleek Summaries is built on a Python 3.11 backend, using the Claude API as its core intelligence layer. The upload interface accepts MP3, MP4, M4A, WAV, WEBM, and OGG files — covering virtually every format a user might bring in from a lecture recording app, Zoom export, or voice memo.
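
As a rough illustration of the upload rules described here (six accepted formats, 500MB ceiling), a check at the upload boundary might look like the sketch below. The function and constant names are hypothetical, and treating "500MB" as binary megabytes is an assumption; this is not the production code.

```python
from pathlib import Path

# Formats and size cap as stated in this case study; the names are illustrative.
ALLOWED_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".webm", ".ogg"}
MAX_UPLOAD_BYTES = 500 * 1024 * 1024  # 500MB, assuming binary megabytes


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the upload is not an accepted format or is too large."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {extension or 'no extension'}")
    if size_bytes <= 0:
        raise ValueError("File is empty")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("File exceeds the 500MB limit")
```

Checking the extension case-insensitively means a file named lecture.MP3 from a phone recorder is accepted the same way as lecture.mp3.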

Once uploaded, the audio is transcribed and passed through a structured prompt pipeline that instructs Claude to identify key concepts, organise information by topic, flag actionable insights, and format everything into a hierarchy of headings, definitions, and summaries.
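
To make the idea of a structured prompt concrete, here is a minimal sketch that assembles one from the four instructions named above. The wording of the prompt, the function name, and the transcript tags are all assumptions for illustration; the actual prompts used in Sleek Summaries are not shown in this case study, and the call to the Claude API is deliberately left out.

```python
from typing import Optional

# Hypothetical prompt wording; the four instructions mirror the ones named above.
INSTRUCTIONS = [
    "Identify the key concepts in the transcript.",
    "Organise the information by topic.",
    "Flag actionable insights.",
    "Format the result as a hierarchy of headings, definitions, and summaries.",
]


def build_notes_prompt(transcript: str, language: Optional[str] = None) -> str:
    """Assemble one structured prompt from the instruction list and the transcript."""
    lines = ["You are preparing study notes from a lecture transcript.", ""]
    lines += [f"{i}. {text}" for i, text in enumerate(INSTRUCTIONS, start=1)]
    if language:
        # Optional hint, e.g. when the user overrides auto-detection.
        lines += ["", f"The recording language is {language}."]
    lines += ["", "<transcript>", transcript.strip(), "</transcript>"]
    return "\n".join(lines)
```

Keeping the instructions in a list rather than one long string makes it easier to add, remove, or reorder steps without rewriting the whole prompt.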

"The model doesn't just transcribe — it reasons about what matters. It surfaces the specific points a student needs to review, not everything that was said."

Sibusiso Mabaso, Founder & CEO

How It Works

  • Audio file: mp3 / mp4 / wav / m4a
  • Under 24 MB: send directly to Whisper
  • Over 24 MB: pydub → split into chunks
  • Whisper API: transcription
  • Claude API: summarise & structure
  • WeasyPrint: HTML → PDF render
  • PDF output: _notes.pdf
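
The size-based branch in the flow above can be sketched as a small planning function. This is an illustration under stated assumptions, not the actual implementation: the function name is hypothetical, 24 MB is treated as binary megabytes, a file of exactly 24 MB is sent directly, and chunk sizes are estimated by assuming a roughly constant bitrate.

```python
import math

# 24 MB threshold from the flow above; treated here as binary megabytes (an assumption).
DIRECT_LIMIT_BYTES = 24 * 1024 * 1024


def plan_chunks(size_bytes: int, duration_ms: int) -> list[tuple[int, int]]:
    """Return (start_ms, end_ms) ranges to transcribe.

    Files at or under the limit get one range covering the whole recording.
    Larger files are split into equal time slices, assuming a roughly
    constant bitrate so that equal durations mean roughly equal sizes.
    """
    if size_bytes <= DIRECT_LIMIT_BYTES:
        return [(0, duration_ms)]
    chunk_count = math.ceil(size_bytes / DIRECT_LIMIT_BYTES)
    chunk_ms = math.ceil(duration_ms / chunk_count)
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]
```

Planning in milliseconds fits pydub, where slicing an AudioSegment (for example audio[start_ms:end_ms]) is also expressed in milliseconds, so each range maps to one slice that can be exported and transcribed on its own.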

Language detection runs automatically. Users can override to a specific language if needed, but in testing, auto-detect handled 14 languages correctly without any manual input.

What Gets Generated

The output PDF is structured like a proper study guide — not a raw document dump. Each section includes:

  • A high-level topic summary (2–3 sentences)
  • Key definitions and concepts, clearly labelled
  • Supporting detail and examples from the recording
  • A concise review section at the end of each major topic
  • An optional lecture title (auto-detected if left blank)
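
One way to picture the section layout listed above is as a small data structure that renders to HTML before the PDF step. The class, field names, and HTML tags below are illustrative assumptions, not the schema Sleek Summaries actually uses.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class NoteSection:
    """One topic in the study guide; field names are illustrative."""
    title: str
    summary: str  # high-level overview, 2-3 sentences
    definitions: dict[str, str] = field(default_factory=dict)
    details: list[str] = field(default_factory=list)
    review: str = ""


def render_section(section: NoteSection) -> str:
    """Render one section as an HTML fragment, escaping all text."""
    parts = [f"<h2>{escape(section.title)}</h2>", f"<p>{escape(section.summary)}</p>"]
    if section.definitions:
        items = "".join(
            f"<dt>{escape(term)}</dt><dd>{escape(meaning)}</dd>"
            for term, meaning in section.definitions.items()
        )
        parts.append(f"<dl>{items}</dl>")
    if section.details:
        items = "".join(f"<li>{escape(detail)}</li>" for detail in section.details)
        parts.append(f"<ul>{items}</ul>")
    if section.review:
        parts.append(f"<p><strong>Review:</strong> {escape(section.review)}</p>")
    return "\n".join(parts)
```

A full document assembled from fragments like this could then be handed to WeasyPrint, for instance with HTML(string=page_html).write_pdf(output_path), where page_html and output_path are placeholder names.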

The Big 4-enriched formatting ensures the output reads like something prepared by a professional — structured enough to study from, detailed enough to replace the original recording entirely for revision purposes.

Results

  • 94%: Reduction in manual note-taking time
  • 500MB: Max file size supported
  • 14+: Languages auto-detected
  • 6: Supported audio & video formats

What This Demonstrates

Sleek Summaries is a small tool with a clear use case — but it demonstrates exactly how AI automation can remove tedious, high-effort work from workflows that people assume are irreducible. The bottleneck wasn't intelligence. It was structure, formatting, and delivery.

The same pattern applies across industries. Legal teams spending hours summarising depositions. Consultants transcribing client discovery sessions. HR departments processing interview recordings. Anywhere there's unstructured audio and a need for structured output, this approach works.

Want to automate a workflow in your business using the same approach?

Schedule a Call →