How did Earmark originate and evolve from its initial concept?

The team started Earmark as a Vision Pro AR/VR rehearsal experience focused on real-time speech coaching. User research showed that most people don't prepare for presentations, so they pivoted to a web experience that gives real-time feedback in the moment. The product later evolved into automated creation of artifacts from conversations.

Why did Earmark switch transcription providers just before launch?

The team hit two problems with its previous provider: complex plumbing and unpredictable concurrency limits that raised reliability concerns. In quick tests, AssemblyAI delivered faster, more accurate transcription, which led to a four-day migration just before launch and significantly improved performance.

How many streams does Earmark use per meeting and how does concurrency work?

Each participant on a call gets one audio stream; if multiple people are on the same meeting, that translates to multiple streams (e.g., four people would be four streams). Earmark benefits from Assembly's unlimited concurrency with a backoff policy to scale as needed without heavy enterprise contracts.

What is the 'unlimited task agents' feature?

Unlimited task agents run in real time in the background as conversations progress, creating artifacts and taking actions. Many customers find it hard to imagine not having these agents operating in the background to keep work moving forward.

What does the 'temporary mode' privacy option do?

Temporary mode disables storage of transcripts and data: nothing is retained and nothing is saved to the database. This reflects a privacy-forward design where users can opt out of data retention entirely.
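The no-retention behavior described above can be illustrated with a minimal sketch. It is not Earmark's implementation, which the source does not show. `TranscriptStore`, `CaptureSession`, and the `temporary` flag are hypothetical names, and the in-memory dictionary stands in for a real database.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TranscriptStore:
    """Hypothetical stand-in for a database table of saved transcripts."""
    rows: Dict[str, List[str]] = field(default_factory=dict)

    def save(self, session_id: str, lines: List[str]) -> None:
        # Copy the list so later changes to the caller's buffer do not leak in.
        self.rows[session_id] = list(lines)


@dataclass
class CaptureSession:
    """Hypothetical capture session; `temporary` models the opt-out flag."""
    session_id: str
    temporary: bool = False
    _buffer: List[str] = field(default_factory=list)

    def on_transcript(self, text: str) -> None:
        # Transcript text lives only in memory while the session is active.
        self._buffer.append(text)

    def end(self, store: TranscriptStore) -> None:
        # Persist only when the user has not opted out of retention.
        if not self.temporary:
            store.save(self.session_id, self._buffer)
        # In both modes the in-memory buffer is discarded at the end.
        self._buffer.clear()


store = TranscriptStore()

normal = CaptureSession("standup")
normal.on_transcript("Ship the release on Friday.")
normal.end(store)

private = CaptureSession("one-on-one", temporary=True)
private.on_transcript("Sensitive discussion.")
private.end(store)
```

In this sketch the retention decision sits at a single point (`end`), so a temporary session never reaches the store at all instead of being written and deleted afterwards.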

What is 'vibe docking' in the Earmark UI?

Vibe docking is a UI concept that lets you tweak sections of a document (like an executive summary) and have the system regenerate content live based on edits. It emphasizes fast, on-the-spot customization to produce usable artifacts.

What is the 'second brain' concept for product teams?

The 'second brain' is a contextual, queryable pool of project context that can act as a system of action. It helps product teams access context on demand and enables proactive tasking and decisions, potentially reducing reliance on traditional systems of record.

Building Earmark: Real-time voice AI, privacy by design, and founder lessons

AssemblyAI

Science & Technology | 6 min read | 26 min video

Feb 9, 2026 | 241 views

TL;DR

Voice-driven chief of staff that auto-creates artifacts from meetings.

Key Insights

Earmark turns real-time conversations into actionable artifacts such as docs, tickets, updates, and next steps, shortening cycle times.

The product evolved from an immersive Vision Pro real-time speech coaching idea to a web-based artifact generation tool based on user research.

Privacy by design is central, including a Temporary Mode that stores no transcripts or data.

Assembly enables scalable transcription with unlimited concurrent streams, avoiding unpredictable bottlenecks and expensive contracts.

The roadmap is to become a proactive co-presence or second brain for product teams, surfacing blockers and driving proactive tasking.

Founders should balance customer needs with flexible strategies and avoid dogmatic playbooks in AI product development.

ORIGINS AND THE PIVOT TO ARTIFACT AUTOMATION

Earmark began as a vision for real-time speech coaching in a Vision Pro AR/VR setting, aimed at helping product and engineering leaders influence more effectively. The team envisioned an immersive rehearsal space where feedback would address breathing, enunciation, and pacing. Through user research, they uncovered a surprising truth: most people do not actually prepare for presentations. This insight triggered a pivot from a coaching tool to a web-based, real-time feedback experience that informs participants in the moment, and ultimately shifted the focus toward automating the work that follows conversations.

REAL-TIME ARTIFACT CREATION DURING MEETINGS

As Earmark matured, the core promise became clear: conversations should generate concrete outputs without manual follow-ups. The product listens to meetings and transforms what is said into artifacts like documents, tickets, updates, and next steps. This reduces cycle time and helps teams move forward without the friction of post-meeting synthesis. Early iterations explored mapping speech to structured artifacts, and over several cycles the tool evolved into a practical platform for producing actionable outputs in real time, aligning stakeholders around next steps while meetings are still in progress.

FROM VISION PRO TO WEB: LESSONS LEARNED

The team moved from an immersive Vision Pro concept to a web-based solution to broaden access and accelerate iteration. The pivot was reinforced by ongoing customer conversations that demonstrated real demand for real-time artifact creation during discussions. The shift reframed the product as a workflow enabler for knowledge workers, capturing, organizing, and distributing outputs as the conversation unfolds. Over roughly five iterations, Earmark refined its approach toward deliverables that can be acted on immediately, shortening feedback loops for product, design, and engineering teams.

CHOOSING ASSEMBLY: PERFORMANCE AND SCALABILITY

Two major challenges drove the move to Assembly: heavy plumbing work required by the previous transcription provider, including microphone management and WebSocket lifecycles, and unpredictable concurrency limits that threatened the launch. In a critical four-day window before the product launch, the team tested Assembly and found faster, more accurate transcription that could scale reliably. The switch reduced the risk of a launch failure and gave the team confidence in handling real-time, multi-user workflows. The decision highlights how a robust provider can unlock a product's potential when concurrency and reliability are non-negotiable.

CONCURRENCY AND STREAMS: SCALING REAL-TIME WORKFLOWS

In Earmark, each user session creates a separate audio stream, so a single meeting with multiple participants generates multiple streams. The team needed a model that could scale across workplaces, time zones, and organizations. Assembly offered unlimited concurrent streams with a backoff policy that grows with demand, enabling Earmark to handle bursts of usage without hitting hard caps or expensive enterprise terms. This capability translated into a practical business advantage, making a real-time, voice-driven artifact platform viable for both small teams and large enterprises.
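As a rough illustration of the stream arithmetic and of how a client might cooperate with a provider-side backoff policy, here is a minimal sketch. The source does not describe Earmark's client code or any specific AssemblyAI call, so everything here is an assumption: `StreamBusy`, `streams_needed`, `backoff_delays`, and `open_with_backoff` are hypothetical names, and retrying with exponential backoff is one common client-side pattern, not necessarily the one used.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class StreamBusy(Exception):
    """Hypothetical error: the provider asked the client to retry later."""


def streams_needed(participants_per_meeting: List[int]) -> int:
    """One audio stream per participant, summed across live meetings."""
    return sum(participants_per_meeting)


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Exponential delays in seconds (base, 2*base, 4*base, ...), capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def open_with_backoff(
    open_stream: Callable[[], T],
    attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `open_stream`, waiting longer after each StreamBusy error."""
    for delay in backoff_delays(attempts):
        try:
            return open_stream()
        except StreamBusy:
            # Small random jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
    # Final attempt: any error now propagates to the caller.
    return open_stream()


# Usage with a fake opener that is busy twice before succeeding.
calls = {"n": 0}
waits: List[float] = []


def fake_open() -> str:
    calls["n"] += 1
    if calls["n"] <= 2:
        raise StreamBusy()
    return "stream-open"


result = open_with_backoff(fake_open, sleep=waits.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. Under these assumptions a four-person meeting needs four streams, and a burst of new meetings only raises the total returned by `streams_needed`.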

PRODUCT VISION: CHIEF OF STAFF FOR PRODUCT TEAMS

The overarching vision positions Earmark as a true chief of staff for product teams, predictive about what individuals and teams need and capable of surfacing blockers in real time. The product goes beyond simple task delegation to proactive tasking and awareness of dependencies across offshore and onshore teams. Recognizing that audiences vary, the system aims to deliver different artifacts with appropriate fidelity. The result is a concept of a second brain that creates capacity for strategic work by surfacing critical information and enabling teams to act efficiently within their existing workflows.

SECOND BRAIN AND CONTEXT-CAPTURING

A central idea is the second brain: a queryable pool of project context that makes everything searchable and actionable. Earmark envisions a contextual layer that sits alongside systems of record, enabling proactive tasking and self-organizing work. Features like pushing artifacts into external tools and prototyping flows demonstrate how captured context can drive real outputs rather than simply being archived. Organized by project, the second brain becomes a practical hub for cross-functional collaboration, reducing time spent hunting for information and improving decision-making.

LIVE DEMO AND UX: MAKING WORKFLOWS SIMPLE

During the product walkthrough, the team showcased a simple, fast capture flow on the main page and a pre-recorded retrospective. They highlighted how artifacts such as engineering specs can be generated directly from transcripts, and how actions can be pushed into external tools like Cursor or Linear. Templates for updates and PRDs, as well as a vibe docking feature for live content tweaks, illustrate how voice-driven output becomes immediately actionable. The demo emphasized speed, ease of use, and the ability to translate conversational content into tangible work items.

VOICE-FIRST UX DESIGN: FORGIVING AND INTEGRATED

A key design principle is forgiving UX for voice-driven tools, since voice is often a supplementary layer in a larger workflow. Earmark emphasizes minimal friction, ideally a single click to start a capture, and self-healing when minor glitches occur. The team argues that the product should feel almost invisible, enhancing the user's natural workflow rather than forcing complex configuration during meetings. This focus on simplicity and resilience is essential for adoption in real-world work environments where people multitask across meetings, calls, and updates.

PRIVACY, SECURITY, AND ADVICE FOR FOUNDERS

Privacy by design is a core value, given how sensitive voice data can be. Earmark advocates deliberate decisions about what to store, how long to store it, and how to encrypt it. A standout feature is Temporary Mode, available on all plans, which bypasses retention entirely so transcripts are not saved. Beyond technical choices, the founders stress practical lessons: avoid dogma, lean on lived experience, and tailor strategies to customer needs rather than blindly applying standard enterprise playbooks. This pragmatic stance helps assess risk and pursue sustainable growth in AI products.

FUTURE OF WORK: PROACTIVE AGENTS AND SYSTEMS OF ACTION

The team imagines a future where the chief of staff not only surfaces information but also acts autonomously. Proactive tasking by agents could monitor blockers, renegotiations, and cross-time-zone dependencies, delivering the top priorities at the start of each workday. This evolution from systems of record to systems of action redefines how work is orchestrated. The dream includes unlimited task agents operating in the background, transforming captured context into ongoing action and enabling a more proactive, efficient knowledge workflow.

ADVICE FOR FOUNDERS AND CLOSING THOUGHTS

The interview offers grounded guidance for founders: design for privacy from the outset, keep the UX forgiving and simple, and avoid dogma. Balance actionable customer insights with flexible strategies, and resist applying a rigid enterprise playbook to AI products when customer buying patterns differ. Real value comes from iteration, hands-on experimentation, and staying close to user needs. The takeaway is to chart a personalized path, learn from lived experience, and stay open to adapting the business model as the product and market evolve.

Common Questions

Earmark listens to meetings in real time and turns what's said into finished work, such as documents, tickets, updates, and next steps. Unlike generic AI meeting tools, it produces tangible artifacts—real work that teams can act on. This happens within the meeting without requiring manual follow-ups.

Topics

Earmark, Assembly, Vision Pro, Real-time Transcription, Artifact Generation, Second Brain, Chief of Staff, Unlimited Task Agents, UX Forgiveness, VZero, Linear, Super Whisper, Product Demo, AI in Work, Voice Interface

Mentioned in this video

People

Mark

Host from Assembly introducing Earmark

Sandon

Co-founder of Earmark, discusses the product

Dylan

Go-to-market leader at Earmark

Companies

Earmark

Real-time meeting transcription and artifact generator

Products

Vision Pro

AR/VR headset used in Earmark's early rehearsal concept

Software & Apps

Google Slides

Slide deck used during meetings

VZero

Push destination in the Earmark workflow

Super Whisper

Voice-to-text tool used by the host
