Building Earmark: Real-time voice AI, privacy by design, and founder lessons
Key Moments
Voice-driven chief of staff that auto-creates artifacts from meetings.
Key Insights
Earmark turns real-time conversations into actionable artifacts such as docs, tickets, updates, and next steps, shortening cycle times.
The product evolved from an immersive Vision Pro real-time speech coaching idea to a web-based artifact generation tool based on user research.
Privacy by design is central, including a Temporary Mode that stores no transcripts or data.
AssemblyAI enables scalable transcription with unlimited concurrent streams, avoiding unpredictable bottlenecks and expensive contracts.
The roadmap is to become a proactive co-presence or second brain for product teams, surfacing blockers and driving proactive tasking.
Founders should balance customer needs with flexible strategies and avoid dogmatic playbooks in AI product development.
ORIGINS AND THE PIVOT TO ARTIFACT AUTOMATION
Earmark began as a vision for real-time speech coaching in a Vision Pro AR/VR setting, aimed at helping product and engineering leaders influence more effectively. The team envisioned an immersive rehearsal space where feedback would address breathing, enunciation, and pacing. User research uncovered a surprising truth: most people do not actually prepare for presentations. This insight triggered a pivot from a rehearsal coaching tool to a web-based experience that gives participants feedback in the moment, and ultimately shifted the focus toward automating the work that follows conversations.
REAL-TIME ARTIFACT CREATION DURING MEETINGS
As Earmark matured, the core promise became clear: conversations should generate concrete outputs without manual follow-ups. The product listens to meetings and transforms what is said into artifacts like documents, tickets, updates, and next steps. This reduces cycle time and spares teams the friction of post-meeting synthesis. Early iterations explored mapping speech to structured artifacts, and over several cycles the tool evolved into a practical platform for producing actionable outputs in real time, aligning stakeholders around next steps while meetings are still in progress.
FROM VISION PRO TO WEB: LESSONS LEARNED
The team moved from an immersive Vision Pro concept to a web-based solution to broaden access and accelerate iteration. The pivot was reinforced by ongoing customer conversations that demonstrated real demand for real-time artifact creation during discussions. The shift reframed the product as a workflow enabler for knowledge workers, capturing, organizing, and distributing outputs as the conversation unfolds. Over roughly five iterations, Earmark refined its approach toward deliverables that can be acted on immediately, shortening feedback loops for product, design, and engineering teams.
CHOOSING ASSEMBLY: PERFORMANCE AND SCALABILITY
Two major challenges drove the move to Assembly. First, the previous transcription provider required heavy plumbing work, including microphone management and WebSocket lifecycles. Second, its unpredictable concurrency limits threatened a launch. In a critical four-day window before a product launch, the team tested Assembly and found its transcription faster and more accurate, which enabled reliable scaling. The switch reduced the risk of a launch failure and gave the team confidence in handling real-time, multi-user workflows. The decision highlights how a robust provider can unlock a product's potential when concurrency and reliability are non-negotiable.
CONCURRENCY AND STREAMS: SCALING REAL-TIME WORKFLOWS
In Earmark, each user session creates a separate audio stream, so a single meeting with multiple participants can generate multiple streams. The team needed a model that could scale across workplaces, time zones, and organizations. Assembly offered unlimited concurrent streams with a backoff policy that scales with demand, enabling Earmark to handle bursts of usage without hitting hard caps or expensive enterprise terms. This capability translated into a practical business advantage, making a real-time, voice-driven artifact platform viable for both small teams and large enterprises.
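The session-to-stream relationship described above can be sketched in a few lines. This is only an illustration of why concurrency grows with participants rather than meetings; the `StreamRegistry` class, its method names, and the meeting and session identifiers are all hypothetical and are not taken from Earmark's or Assembly's code.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StreamRegistry:
    """Hypothetical bookkeeping: one open audio stream per user session."""
    # Maps a meeting id to the set of session ids currently streaming.
    open_streams: dict = field(default_factory=lambda: defaultdict(set))
    # Highest number of simultaneous streams seen so far.
    peak: int = 0

    def open_stream(self, meeting_id: str, session_id: str) -> None:
        self.open_streams[meeting_id].add(session_id)
        self.peak = max(self.peak, self.total())

    def close_stream(self, meeting_id: str, session_id: str) -> None:
        self.open_streams[meeting_id].discard(session_id)
        # Drop meetings that no longer have any open streams.
        if not self.open_streams[meeting_id]:
            del self.open_streams[meeting_id]

    def total(self) -> int:
        """Number of streams open right now, across all meetings."""
        return sum(len(sessions) for sessions in self.open_streams.values())


if __name__ == "__main__":
    registry = StreamRegistry()
    for participant in ("alice", "bob", "carol"):
        registry.open_stream("standup", participant)
    registry.open_stream("retro", "dave")
    print(registry.total(), registry.peak)
```

Two meetings here already account for four concurrent streams, which is why a hard per-account concurrency cap can become a launch risk well before the number of meetings looks large.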
PRODUCT VISION: CHIEF OF STAFF FOR PRODUCT TEAMS
The overarching vision positions Earmark as a true chief of staff for product teams, one that anticipates what individuals and teams need and surfaces blockers in real time. The product goes beyond simple task delegation to proactive tasking and awareness of dependencies across offshore and onshore teams. Recognizing that audiences vary, the system aims to deliver different artifacts with appropriate fidelity. The result is a second brain that creates capacity for strategic work by surfacing critical information and enabling teams to act efficiently within their existing workflows.
SECOND BRAIN AND CONTEXT-CAPTURING
A central idea is the second brain: a queryable pool of project context that makes everything searchable and actionable. Earmark envisions a contextual layer that sits alongside systems of record, enabling proactive tasking and self-organizing work. Features like pushing artifacts into external tools and prototyping flows demonstrate how captured context can drive real outputs rather than simply being archived. Because context is organized by project, the second brain becomes a practical hub for cross-functional collaboration, reducing time spent hunting for information and improving decision-making.
LIVE DEMO AND UX: KEEPING THE WORKFLOW SIMPLE
During the product walkthrough, the team showcased a simple, fast capture flow on the main page and a pre-recorded retrospective. They highlighted how artifacts such as engineering specs can be generated directly from transcripts, and how actions can be pushed into external tools like Cursor or Linear. Templates for updates and PRDs, as well as a vibe docking feature for live content tweaks, illustrate how voice-driven output becomes immediately actionable. The demo emphasized speed, ease of use, and the ability to translate conversational content into tangible work items.
VOICE-FIRST UX DESIGN: FORGIVING AND INTEGRATED
A key design principle is a forgiving UX for voice-driven tools, since voice is often a supplementary layer in a larger workflow. Earmark emphasizes minimal friction, ideally a single click to start a capture, and self-healing when minor glitches occur. The team argues that the product should feel almost invisible, enhancing the user's natural workflow rather than forcing complex configuration during meetings. This focus on simplicity and resilience is essential for adoption in real-world work environments where people multitask across meetings, calls, and updates.
PRIVACY, SECURITY, AND ADVICE FOR FOUNDERS
Privacy by design is a core value, given how sensitive voice data can be. Earmark advocates deliberate decisions about what to store, how long to store it, and how to encrypt it. A standout feature is Temporary Mode, available on all plans, which bypasses retention entirely so transcripts are not saved. Beyond technical choices, the founders stress practical lessons: avoid dogma, lean on lived experience, and tailor strategies to customer needs rather than blindly applying standard enterprise playbooks. This pragmatic stance helps founders assess risk and pursue sustainable growth in AI products.
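The retention bypass described above can be pictured as a single branch at the end of a capture. The sketch below is a hypothetical model, not Earmark's implementation: `TranscriptStore`, `CaptureSession`, and the `temporary_mode` flag are invented names, and the in-memory dictionary merely stands in for whatever persistent storage a real system would use.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptStore:
    """In-memory stand-in for persistent storage (hypothetical)."""
    saved: dict = field(default_factory=dict)

    def save(self, session_id: str, text: str) -> None:
        self.saved[session_id] = text


@dataclass
class CaptureSession:
    """One capture; temporary_mode is an assumed per-session flag."""
    session_id: str
    temporary_mode: bool = False
    segments: list = field(default_factory=list)

    def add_segment(self, text: str) -> None:
        # Segments live only in memory while the capture is running.
        self.segments.append(text)

    def finish(self, store: TranscriptStore) -> str:
        """Return the transcript for immediate use; persist it only
        when temporary mode is off."""
        transcript = " ".join(self.segments)
        self.segments.clear()
        if not self.temporary_mode:
            store.save(self.session_id, transcript)
        return transcript


if __name__ == "__main__":
    store = TranscriptStore()
    session = CaptureSession("demo", temporary_mode=True)
    session.add_segment("ship the spec on Friday")
    session.finish(store)
    print(store.saved)
```

Placing the decision at one point, and defaulting to not writing when the flag is set, keeps the "nothing is stored" guarantee easy to audit in this simplified model.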
FUTURE OF WORK: PROACTIVE AGENTS AND SYSTEMS OF ACTION
The team imagines a future where the chief of staff not only surfaces information but also acts autonomously. Proactive tasking by agents could monitor blockers, renegotiations, and cross-time-zone dependencies, delivering the top priorities at the start of each workday. This evolution, from systems of record to systems of action, redefines how work is orchestrated. The dream includes unlimited task agents operating in the background, transforming captured context into ongoing action and enabling a more proactive, efficient knowledge workflow.
ADVICE FOR FOUNDERS AND CLOSING THOUGHTS
The interview offers grounded guidance for founders: design for privacy from the outset, keep the UX forgiving and simple, and avoid dogma. Balance actionable customer insights with flexible strategies, and resist applying a rigid enterprise playbook to AI products when customer buying patterns differ. Real value comes from iteration, hands-on experimentation, and staying close to user needs. The takeaway is to chart a personalized path, learn from lived experience, and stay open to adapting the business model as the product and market evolve.
Common Questions
Earmark listens to meetings in real time and turns what's said into finished work, such as documents, tickets, updates, and next steps. Unlike generic AI meeting tools, it produces tangible artifacts: real work that teams can act on. This happens within the meeting, without requiring manual follow-ups.
Mentioned in this video
Host from AssemblyAI, introduces Earmark
Co-founder of Earmark, discusses the product
Go-to-market leader at Earmark
Real-time meeting transcription and artifact generator
AR/VR headset used in Earmark's early Vision Pro concept
Slide deck used during meetings
Push destination in the Earmark workflow
Voice-to-text tool used by the host