Why did the team rename the product from video.ai to Quso.ai (Quick Social)?

The rename was intended to reflect the core mission: delivering quick, social-ready content. The change signifies a shift toward an end-to-end social workflow rather than just video editing.

What role did transcription quality play in the product's success?

Accurate transcription is critical because captions become the final visible output and influence downstream processing. The team tested multiple providers and chose AssemblyAI for its superior accuracy.

What is NAN and how does it help in video production?

NAN is an AI orchestration tool in the product that automatically reframes scenes and adapts layouts when new speakers appear, enabling dynamic, context-aware video edits.

What is the product roadmap around autonomous social content?

The plan is to move from repurposing content to a done-for-you autonomous loop that schedules, creates, and publishes social content, reducing manual intervention.

Why do they believe SaaS is changing rather than dying?

The traditional per-seat SaaS model is being disrupted by broader AI-enabled tools that scale by value, enabling more people to build software and services. The market is expanding as AI lowers barriers to creation.

What does Viddy do in the dashboard?

Viddy acts like a 'ChatGPT for your video,' generating timestamps, show notes, summaries, quotes, and titles, and helping create content across formats.

Building Quso.ai: Autonomous social media, the death of traditional SaaS, and founder lessons

AssemblyAI

Science & Technology | 5 min read | 20 min video

Feb 25, 2026|395 views|6

TL;DR

Quso.ai enables autonomous social media workflows, challenging traditional SaaS.

Key Insights

Built to empower non-experts to publish daily without burnout through an AI-powered end-to-end social media tool.

Started as a simple clip generator in a pre-AI era and evolved into an autonomous, done-for-you publishing loop.

Transcription accuracy and multi-language support were pivotal, guiding the choice of AssemblyAI as a partner.

The product expanded from video assets to cross-content, multi-format outputs (quotes, infographics, newsletters).

The market view is that traditional SaaS is changing; value-based, AI-driven usage is becoming the norm.

Founders are encouraged to experiment with AI quickly, leveraging partnerships to deliver real user value.

ORIGINS AND FOUNDING VISION

Vidant, a founder of Quso.ai (formerly video.ai), built the company from firsthand industry frustration. After six years as a social media manager for a media company in India, he had seen how much time, energy, and scarce staff a social presence consumed. The bottleneck was clear: every post required specialized software know-how and lengthy review cycles, pulling resources away from the core message. Drawing on that experience, and with a mission to democratize content creation, he set out to simplify the process so non-experts could post consistently without burnout while still delivering quality content.

IDENTIFYING THE PROBLEM IN SOCIAL MEDIA MANAGEMENT

The problem extended beyond one person’s workload. Teams struggled with multi-step workflows, approvals, and handoffs between editors, designers, and strategists. That friction slowed growth and diluted momentum. Vidant describes how the bottleneck made even routine social updates risky and slow. This insight shaped Quso’s product direction: remove dependency on deep software skills, compress turnaround times, and empower non-technical marketers to sustain an authentic presence without the usual headaches.

FROM VIDEO.AI TO QUICK SOCIAL: NAME AND FOCUS

The team renamed the product from video.ai to Quso, short for Quick Social. The shift reflected their core focus: turning long-form content into short, platform-ready assets quickly and with less friction. The early framing emphasized speed and accessibility rather than fancy features. The name also aligned with the broader shift toward automation: the product would not just aid creators but become a repeatable system for how people show up online every day.

MVP IN THE PRE-AI ERA: THE LONG VIDEO TO SHORT CLIPS WORKFLOW

The MVP, built in the pre-AI era, accepted a long video and produced a set of shorter clips in a linear workflow. There was no automatic captioning or social scheduling, and the pipeline relied on manual processing to extract useful assets. Despite its simplicity, the approach addressed a real demand, as short videos and Reels were beginning to dominate. It established the core value proposition: one long video yields many shareable outputs.

TRANSCRIPTION CHALLENGES AND LANGUAGE COVERAGE

From the start, transcription quality was critical: captions appear on screen and influence both comprehension and downstream processing. The team tested multiple speech-to-text services, including major cloud offerings, to see which delivered the consistency and speed their users required. Early results were inconsistent, and the infrastructure relied on manual pipelines to refine transcripts. This friction highlighted the need for a higher-accuracy solution, motivating a broader search beyond the obvious providers and setting up a rigorous internal benchmarking process.

CHOOSING ASSEMBLYAI: A QUALITY-DRIVEN DECISION

Faced with variable accuracy across services, Quso ran head-to-head tests, comparing five STT providers on the same videos. The verdict mattered: precision in transcription directly affected captions, speaker labels, and downstream editing. AssemblyAI consistently outperformed the others, delivering cleaner transcripts and more reliable language handling. The decision wasn’t only about speed; it was about a dependable foundation for the rest of the product, especially as the team expanded into multi-language support and more sophisticated editing workflows.
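A head-to-head comparison like this is commonly scored with word error rate (WER) against a hand-checked reference transcript. The episode summary does not say which metric the team used, so the sketch below is only an illustration under that assumption; the provider names and sample transcripts are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def rank_providers(reference: str, outputs: dict[str, str]) -> list[tuple[str, float]]:
    """Score every provider on the same reference and sort best (lowest WER) first."""
    scores = {name: word_error_rate(reference, text) for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical reference and provider outputs, for illustration only.
    reference = "the quick brown fox"
    outputs = {
        "provider_a": "the quick fox",
        "provider_b": "the quick brown fox",
    }
    for name, wer in rank_providers(reference, outputs):
        print(f"{name}: WER = {wer:.2f}")
```

Running every provider over the same set of videos and averaging the per-video scores keeps the comparison fair; a real benchmark would also normalize punctuation and numerals before scoring.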

EVOLUTION TOWARD AUTONOMOUS WORKFLOWS

With improvements in accuracy, the team shifted from offering a plug‑and‑play asset generator to building a system that could autonomously manage the content life cycle. The goal was to remove manual intervention and deliver a done-for-you loop: upload, ingest, generate assets, schedule, and publish. This required orchestration across multiple AI tools (the NAN orchestrator) and intelligent decisions about framing, speaker changes, and pacing. The vision was clear: users should be able to record once and have a complete publishing machine working in the background.
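The loop described above (upload, ingest, generate assets, schedule, publish) can be pictured as a chain of stages that each take and return a job record. The sketch below is a toy illustration of that shape, not Quso's implementation: `ContentJob`, the stage functions, and the placeholder values they produce are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ContentJob:
    """State carried through the loop for one uploaded recording (hypothetical)."""
    source: str
    transcript: str = ""
    assets: list[str] = field(default_factory=list)
    scheduled_slots: list[str] = field(default_factory=list)
    published: list[str] = field(default_factory=list)
    completed: list[str] = field(default_factory=list)


def ingest(job: ContentJob) -> ContentJob:
    # Placeholder for transcription and analysis of the upload.
    job.transcript = f"<transcript of {job.source}>"
    return job


def generate_assets(job: ContentJob) -> ContentJob:
    # Placeholder for clip, caption, and description generation.
    job.assets = [f"{job.source}#clip{i}" for i in (1, 2)]
    return job


def schedule(job: ContentJob) -> ContentJob:
    # Assign one posting slot per asset.
    job.scheduled_slots = [f"day-{i}" for i, _ in enumerate(job.assets, start=1)]
    return job


def publish(job: ContentJob) -> ContentJob:
    # Placeholder for posting to connected social accounts.
    job.published = list(job.assets)
    return job


STAGES: list[Callable[[ContentJob], ContentJob]] = [ingest, generate_assets, schedule, publish]


def run_loop(source: str, stages: list[Callable[[ContentJob], ContentJob]] = STAGES) -> ContentJob:
    """Run every stage in order with no manual step in between."""
    job = ContentJob(source=source)  # the "upload" step creates the job
    for stage in stages:
        job = stage(job)
        job.completed.append(stage.__name__)
    return job


if __name__ == "__main__":
    print(run_loop("interview.mp4").completed)
```

Keeping each stage a plain function with the same signature makes it easy to insert a review step for users who still want to approve content before it goes out.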

END-TO-END VALUE: DONE-FOR-YOU LOOP AND AUTOPUBLISHING

The current product emphasizes a closed loop: you upload content, and the system derives clips, adds captions, writes descriptive text, and even posts to social accounts. Features like NAN allow automatic scene-aware layout changes, while a robust editor lets users tweak styles and crop frames. The Viddy tool acts as a promptable assistant, generating show notes, summaries, quotes, and SEO content. The promise is to save hours, deliver platform-ready assets, and empower teams to maintain a consistent presence without micromanaging every step.
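To make speaker-triggered layout changes concrete, the sketch below derives layout switches from speaker-labeled transcript segments. It is a simplified illustration, not how NAN works internally: the `Segment` type, the layout names, and the rule of widening the layout whenever a new speaker first appears are all assumptions.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One speaker-labeled stretch of the transcript (times in seconds)."""
    speaker: str
    start: float
    end: float


def layout_for(speaker_count: int) -> str:
    # Hypothetical layout names chosen for illustration.
    if speaker_count <= 1:
        return "single"
    if speaker_count == 2:
        return "split"
    return "grid"


def layout_events(segments: list[Segment]) -> list[tuple[float, str]]:
    """Return (timestamp, layout) pairs marking where the layout should change."""
    seen: set[str] = set()
    events: list[tuple[float, str]] = []
    current = None
    for segment in sorted(segments, key=lambda s: s.start):
        seen.add(segment.speaker)
        layout = layout_for(len(seen))
        if layout != current:
            # A speaker appeared for the first time, so the frame is widened.
            events.append((segment.start, layout))
            current = layout
    return events


if __name__ == "__main__":
    segments = [
        Segment("A", 0.0, 5.0),
        Segment("B", 5.0, 9.0),
        Segment("A", 9.0, 12.0),
        Segment("C", 12.0, 15.0),
    ]
    print(layout_events(segments))
```

A returning speaker (segment three in the usage example) does not trigger a change; only the first appearance of someone new does.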

BEYOND VIDEO: CROSS-CONTENT ASSETS AND MULTI-CHANNEL REUSE

The product isn’t limited to video assets. It extracts quotable statements for photos, builds infographics, and adapts content for newsletters and blogs. This broader repurposing opens new avenues for content strategies and ensures a single recording can fuel multiple formats across X, LinkedIn, YouTube, and newsletters. The workflow envisions generating a complete content kit from a single recording, enabling marketing teams to publish consistently for 30 days or more with far less manual drafting.

DEMO AND USER EXPERIENCE: SHOWCASING THE DASHBOARD

In the demonstration, the dashboard handles diverse inputs—YouTube, Instagram, Facebook links or direct uploads—and runs them through a suite of AI tasks. The system shows auto-framing, speaker-triggered layout changes, and a simple path to publish or share. The demo also highlights Viddy’s ability to extract summaries, show notes, and titles. Even as the product evolves, the takeaway is a hands-on sense of how easy it is to generate ready-to-post content from raw recordings.

MARKET CONTEXT: AI TOOLS, SAAS 2.0, AND VALUE-BASED MODELING

The interview frames a broader industry shift: the proliferation of AI tools undermines traditional SaaS models, with pricing moving toward value and per‑unit usage rather than per seat. Vidant argues that the app layer is expanding as adoption rises and people experiment with AI across workflows. The takeaway is that software creation is democratizing, but the real competitive edge comes from delivering end-to-end value and enabling customers to get results quickly, not just offering a catalog of features.

LESSONS FOR FOUNDERS: ADVICE ON AI-DRIVEN INNOVATION

The founder closes with pragmatic guidance: now is the time to experiment with AI, especially voice AI, to solve real problems. He emphasizes speed and learning, argues that building with AI should focus on end-to-end outcomes, and notes that newcomers can ship useful products even if they serve a small audience initially. Partnerships (such as the one with AssemblyAI) matter for quality, and a willingness to iterate rapidly determines whether a tool becomes central to users’ workflows.

Quso.ai Quick Start Cheat Sheet

Practical takeaways from this episode

Do This

Upload your long-form video and let the AI generate short clips automatically.

Use Viddy to create show notes, quotes, titles, and SEO-friendly content from the video.

Leverage the dashboard to post directly to connected social accounts for rapid distribution.

Repurpose non-video content (quotes, testimonials, infographics) into posts, newsletters, or blog posts.

Record conversations (Zoom/Riverside) and drop the recording into Quso to generate 30 days of content.

Avoid This

Rely on a single output style; experiment with layouts and framing to optimize engagement.

Ignore transcription accuracy; always validate captions for correct language and speaker labeling.

Wait for perfect content; AI tools enable rapid iteration—start small, then scale.

Overlook platform-specific best practices (e.g., varying aspect ratios and captions) when posting.

Common Questions

Vidant describes the core bottleneck of social media production: time-consuming, multi-person workflows that made timely posting difficult. Quso was built to let non-experts post consistently by automating video editing, captioning, and distribution in one place.

Topics

Quso.ai, Video Editing, Transcription Accuracy, AI For Social Media, NAN, Viddy, AssemblyAI, End-to-end Automation, Autonomous Publishing, Lip Syncing, Speaker Identification, Long-form To Short-form Content, Content Repurposing

Mentioned in this video

People

Vidant

Founder of Quso.ai; discusses the problem, product, and roadmap

Mart

Host from AssemblyAI who introduces the interview and guides the questions

Ryan

Individual referenced in the dashboard example (see NAN-driven layout)

Software & Apps

video.ai

Original product name before rebranding to Quso.ai (Quick Social)

NAN

AI orchestration tool used to auto-reframe scenes and layout in clips

GCP

Google Cloud Platform; referenced as an infrastructure option

Quso.ai

Social media automation platform, rebranded from video.ai; the name is short for Quick Social

Quso

The company/product platform discussed; reimagined as social content automation

Viddy

ChatGPT-for-your-video style assistant within the dashboard (timestamps, show notes, summaries, quotes, titles)

AWS

Amazon Web Services; referenced as an infrastructure option

Companies

Riverside

Video recording platform mentioned for capturing conversations