AI Dev 26 x SF | William Imoh & Charlie Wood: Closing the Care Gap
Key Moments
AI agents can now generate patient summaries in seconds, but integrating them safely with sensitive health data requires a 'local-first' approach to address strict privacy regulations.
Key Insights
About 15% of hospital patients are readmitted within a year, and insurers do not pay for readmissions that fall within a 30-day window.
Current healthcare EMRs are rule-based and lack predictive modeling, leading to physician and nurse fatigue from manual data pulling and generic alerts.
Actian's VectorAI DB is a 'local-first' database designed for regulated industries, prioritizing data sovereignty and enabling real-time indexing and edge use cases.
At 10 million vectors, Actian's VectorAI DB maintains higher query-per-second rates than competitors, though with a slight decrease in P95 recall (0.988 vs. 0.99).
The Care Transition Copilot uses a four-agent workflow: gather context, analyze risk, retrieve protocols, and draft a brief, generating output in 2-5 seconds.
The system prioritizes auditable retrieval and citations over fine-tuning models, ensuring clinicians can verify AI-generated information.
The immense burden of manual data processing in healthcare
Clinicians such as home health nurses currently spend an average of 45 minutes per patient on chart preparation. This manual process involves sifting through multiple disparate systems (pharmacology, EMRs, outpatient records, and lab results) that are not integrated with one another. One consequence of this inefficiency is patient readmissions: approximately 15% of patients are readmitted within a year, and insurers do not pay for readmissions that fall within a 30-day window. The complexity extends to data formats: long notes, PDFs, scanned documents, and both structured and unstructured data. Strict data locality requirements, especially for Personally Identifiable Information (PII), often rule out cloud-based solutions, while clinicians still expect sub-second retrieval, just as they would expect a prompt answer from a colleague.
Limitations of current rule-based systems and the promise of vector databases
Existing healthcare systems, particularly Electronic Medical Records (EMRs), are largely rule-based and built on historical workflows rather than predictive modeling. They cannot draw on the contextual understanding that AI offers, so physicians and nurses still aggregate data by hand. Current systems often provide a single triage score, offer no interaction, and contribute to alert fatigue through generic alerts. Any replacement also has to run inside an organization's internal network and respect data sovereignty. Actian's VectorAI DB targets this need: it is designed to be 'local-first,' so data stays within the organization's 'four walls.' This is crucial for regulated industries like healthcare, banking, and finance, and the platform pairs it with production-ready, real-time indexing and a developer-first philosophy.
Introducing Actian's VectorAI DB for local, private AI operations
Charlie Wood introduced Actian's VectorAI DB, now launched as a 'local-first' platform that emphasizes data sovereignty for regulated industries. The database can operate entirely within an organization's infrastructure, which is crucial for sensitive patient data. It is production-ready, supports real-time indexing, and is built with a developer-centric approach. Local deployment is particularly vital for healthcare, where PII cannot leave the network and managed cloud services may be restricted. The platform aims to meet sub-second retrieval needs and enable autonomous, interactive AI agents. While competitors with similar capabilities might require sacrificing recall for speed, VectorAI DB demonstrated strong performance at scale, retaining high query-per-second rates even with 10 million vectors.
Performance benchmarks and the recall trade-off
When testing VectorAI DB, the team focused on speed and scale. At 1 million vectors, they observed a 3-7x speed increase compared to competitors. At 10 million vectors, VectorAI DB maintained a significant portion of its query-per-second rate. However, William Imoh highlighted a crucial catch: a slight decrease in recall. While competitors achieved a P95 recall of around 0.99, VectorAI DB reached 0.988. Developers should read these numbers with care, because the benchmarks were run on the base versions of the vector databases with no tuning. Within VectorAI DB, HNSW indexing parameters (commonly named ef_construction and ef_search) can be adjusted to balance speed and recall.
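To make the recall figures concrete, here is a minimal sketch of how recall at k is commonly computed: the IDs returned by an approximate index are compared with the exact nearest neighbors from a brute-force scan. The function names and toy vectors are illustrative assumptions, not the benchmark harness used in the talk, and the `approx` list merely stands in for whatever an ANN index would return.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_top_k(query, vectors, k):
    """Brute-force ground truth: IDs of the k most similar vectors."""
    ranked = sorted(
        vectors,
        key=lambda vid: cosine_similarity(query, vectors[vid]),
        reverse=True,
    )
    return ranked[:k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Toy 2-dimensional data (hypothetical; the demo used 768 dimensions).
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.7, 0.7],
}
query = [1.0, 0.0]

truth = exact_top_k(query, vectors, k=2)
approx = ["a", "d"]  # pretend an approximate index returned these IDs
score = recall_at_k(approx, truth)
```

A benchmark would average or take a percentile of this score over many queries. Raising search-time effort in an HNSW index generally moves the approximate result closer to `truth`, at the cost of throughput.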
Building a Care Transition Copilot with local-first RAG
The core use case demonstrated is a Care Transition Copilot, built using IdeaBoxAI and Actian VectorAI DB. The system aims to assist clinicians by assembling patient context, detecting risk signals, and generating actionable insights. The agent workflow begins with a trigger, typically from a nurse, then gathers patient information, analyzes it, retrieves relevant data, and finally drafts a concise brief for the clinician. This brief is intended to be directly usable during patient visits. For the vector database, data models were defined for patient history, clinical protocols, notes, and vitals, using 768 dimensions and a cosine metric. Importantly, the system utilizes hybrid queries, combining ANN-powered searches with filtering capabilities in a single round trip to maintain low latency, an advantage over splitting queries across different systems.
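The idea of a hybrid query can be sketched in a few lines of plain Python. This is not the VectorAI DB client API, which the summary does not show. The `Point` class, the `hybrid_search` function, the payload keys, and the 3-dimensional vectors are all hypothetical, and a brute-force scan stands in for the ANN index. What the sketch does show is the shape of the operation: the metadata filter and the cosine ranking happen in one pass over the same store.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Point:
    """A stored vector plus the payload fields used for filtering."""
    id: str
    vector: list
    payload: dict = field(default_factory=dict)

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_search(points, query_vector, must, top_k=3):
    """Score only points whose payload matches every key/value in `must`."""
    hits = []
    for point in points:
        if all(point.payload.get(key) == value for key, value in must.items()):
            hits.append((cosine(query_vector, point.vector), point))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits[:top_k]

# Hypothetical records; real vectors would have 768 dimensions.
points = [
    Point("n1", [0.9, 0.1, 0.0], {"patient_id": "p-001", "source": "discharge_note"}),
    Point("n2", [0.1, 0.9, 0.0], {"patient_id": "p-001", "source": "vitals"}),
    Point("n3", [0.9, 0.1, 0.0], {"patient_id": "p-002", "source": "discharge_note"}),
]

results = hybrid_search(points, [1.0, 0.0, 0.0], must={"patient_id": "p-001"}, top_k=2)
result_ids = [point.id for _, point in results]
```

Point `n3` is as similar to the query as `n1`, but it never reaches the ranking step because its `patient_id` fails the filter. Doing both steps in one call is what avoids a second round trip to a separate system.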
Technical implementation: Data ingestion and agent orchestration
The technical build involved modeling data into a vector database, which requires careful consideration of data sovereignty and latency. The VectorAI DB supports storing vectors alongside predicate data in 'points,' enabling both similarity searches and filtering in one pass. For embeddings, the team used the 'all-mpnet-base-v2' model from Sentence Transformers, chosen for its open-source nature and lightweight footprint suitable for on-premise deployment. Data is chunked from sources like discharge notes, documents, and emails. The agent orchestration involves four synchronous loops: gathering context using patient IDs, analyzing risks (like deterioration signals or medication conflicts), retrieving matching guidelines and protocols, and finally drafting a brief. While plain Python was used for the demo, the architecture supports integration with frameworks like LangGraph or Mastra.
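The four-step flow can be sketched as a sequential pipeline in plain Python, which is what the demo reportedly used. Everything below is a hypothetical stand-in: the in-memory `RECORDS` and `PROTOCOLS` dictionaries replace the vector database, keyword matching replaces the real risk analysis, a string template replaces the model that drafts the brief, and the protocol texts are placeholders rather than clinical guidance.

```python
# Hypothetical in-memory data standing in for retrieved chunks and protocols.
RECORDS = {
    "p-001": [
        "Discharged after upper GI bleed.",
        "Patient reports dizziness on standing.",
    ],
}
PROTOCOLS = {
    "bleed": "Placeholder protocol text for bleeding follow-up.",
    "dizziness": "Placeholder protocol text for dizziness follow-up.",
}

def gather_context(patient_id):
    """Step 1: collect the notes stored for one patient ID."""
    return RECORDS.get(patient_id, [])

def analyze_risk(notes):
    """Step 2: flag risk signals (keyword matching as a stand-in)."""
    text = " ".join(notes).lower()
    return [signal for signal in PROTOCOLS if signal in text]

def retrieve_protocols(signals):
    """Step 3: look up the protocol that matches each signal."""
    return {signal: PROTOCOLS[signal] for signal in signals}

def draft_brief(patient_id, notes, protocols):
    """Step 4: assemble a brief that cites every source line it used."""
    lines = [f"Pre-visit brief for {patient_id}"]
    lines += [f"- Note: {note}" for note in notes]
    lines += [f"- Protocol [{signal}]: {text}" for signal, text in protocols.items()]
    return "\n".join(lines)

def run_pipeline(patient_id):
    """Call the four steps in order, passing each result to the next."""
    notes = gather_context(patient_id)
    signals = analyze_risk(notes)
    protocols = retrieve_protocols(signals)
    return draft_brief(patient_id, notes, protocols)

brief = run_pipeline("p-001")
```

Because each step is an ordinary function with explicit inputs and outputs, the same four steps could later become nodes in an orchestration framework without changing their contracts.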
API design and productionizing the copilot
The API surface for the copilot includes a primary POST request to generate a brief, which can then be stored in durable storage like S3. Feedback mechanisms are also included to refine the vector database over time. The production-ready API handles prompt injection and data validation. Key functions include startup and shutdown methods for database interaction, ingest methods that chunk discharge notes and build payloads (including patient ID, encounter date, length of stay, source), and database primitives for searching and scrolling. The agent functions—gather context, analyze risk, retrieve protocols, and draft brief—are called sequentially. The generated brief can be retrieved via API, with options to store it in various formats or send notifications.
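The ingest step described above (chunk a discharge note, then attach a payload to each chunk) might look roughly like the following. The payload fields come from the summary; the key spellings, the word-based chunking with overlap, the chunk sizes, and the sample values are assumptions made for illustration. Embedding and the database write are left out.

```python
def chunk_text(text, max_words=50, overlap=10):
    """Split text into word windows that overlap by `overlap` words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the final window already reached the end of the text
    return chunks

def build_payloads(note, patient_id, encounter_date, length_of_stay, source):
    """Pair each chunk with the metadata that later queries filter on."""
    return [
        {
            "patient_id": patient_id,
            "encounter_date": encounter_date,
            "length_of_stay": length_of_stay,
            "source": source,
            "chunk_index": index,
            "text": chunk,
        }
        for index, chunk in enumerate(chunk_text(note))
    ]

# Hypothetical 120-word note and metadata values.
note = " ".join(f"w{i}" for i in range(120))
payloads = build_payloads(
    note,
    patient_id="p-001",
    encounter_date="2026-01-01",
    length_of_stay=3,
    source="discharge_note",
)
```

Overlapping windows keep a sentence that straddles a boundary retrievable from either neighbor, and storing `patient_id` on every chunk is what lets a single filtered query stay scoped to one patient.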
Demo scenario: Brendan Chen's care transition
A realistic end-to-end scenario featured Brendan Chen, a 54-year-old patient with upper GI bleeding, discharged three days prior, with a visit scheduled in 90 minutes. This high-risk profile necessitates immediate attention. A home health nurse using the copilot application would prioritize Brendan due to his critical status. The application aggregates data from various systems (vitals, medications, visit history) to provide a combined view. Generating a pre-visit brief involves the RAG architecture, embedding and retrieving data from discharge instructions, SOAP notes, and escalation information. The brief highlights key findings like visit trends, current pending tasks (e.g., bleed surveillance), and suggested assessments derived by the agent, not a trained model. The system provides retrieval details, including matched queries and document sources, aiding the clinician in understanding readmission risks and diagnostic concerns. This enables proactive care to prevent readmissions and ensure continuity.
Key takeaways and future directions
The project found that embeddings and RAG were sufficient without model fine-tuning, because what matters most is that clinicians can verify citations. The local-first approach was non-negotiable due to regulations, which ruled out managed cloud versions. The system is read-only by design to avoid corrupting the vector database. Future steps include incorporating rerankers, enabling multi-turn conversations per hospital, and developing evaluation harnesses to define ideal brief formats. The core message is that although some argue RAG is dead, the need for it is growing, especially in environments with limited connectivity, such as manufacturing floors or offshore oil rigs. Hybrid search, sub-15ms HNSW indexing, and real-time upserts contribute to the synchronous feel of agent interactions. The local runtime offers flexibility in model choices, allowing on-premise deployment of models like Gemma or the Qwen family. The Actian VectorAI DB community edition is available for local testing on Linux, Windows, and macOS via Docker.
Vector Database Performance Comparison
Data extracted from this episode
| Vector Count | Throughput vs. Competitors | Recall (P95) |
|---|---|---|
| 1 Million | 3x to 7x faster | Not stated |
| 10 Million | Retained a significant portion of its queries per second | 0.988 (VectorAI DB) vs. about 0.99 (competitors) |
Brief Generation Time
Data extracted from this episode
| Metric | Time | Context |
|---|---|---|
| End-to-end Brief Generation | 2-5 seconds | At scale (10 million vectors), with demo use case |
| Typical RAG Systems for SaaS | 5-10 seconds | Acceptable wait time for end-users |
Common Questions
Healthcare providers face time-consuming manual chart prep, data scattered across multiple disparate systems (pharmacology, EMR, labs), lack of integration, and significant readmission rates, leading to financial penalties. Data often exists in unstructured formats like PDFs and notes, requiring sub-second retrieval and strict locality for PII.
Mentioned in this video
Presents the technical deep dive and demo, focusing on building retrieval-backed AI applications and using VectorAI DB.
A 54-year-old patient with upper GI bleeding, used as a case study for the system's demo.
Global architect for Actian, who introduces the use case and demo.
A web framework used to demonstrate the API endpoint for creating a brief and to run the agent.
An open-source vector database mentioned as an alternative that would require workarounds for protected health information.
A source of data for the patient records, alongside Google Drive and a data warehouse.
A model used for embeddings in the VectorAI DB system, chosen because it is open source and lightweight enough for on-prem deployment.
A local-first vector database designed for regulated industries, focusing on data sovereignty, production readiness, and real-time indexing.
A tool mentioned for running agent loops, alongside plain Python.
A database system mentioned in the context of hybrid queries and isolation.
An open-source model that can be run on-prem, highlighting the flexibility of local deployments.
A source of data for patient records, integrated into the system.
Electronic Medical Records, a system with rule-based workflows and historical data that lacks predictive modeling and full context understanding.
The platform where the VectorAI DB team and community can connect for questions and support.