Voice-first interface infrastructure

HyperTalk is a voice-first interface where intent drives layout. Speak naturally — the system infers what you need, orchestrates the UI, and renders it as a dynamic spatial canvas.

No menus. No buttons. No screen to learn. Say what you want and the interface materializes around your words.

Intent → layout, not command → layout.

Interfaces for everyone

Voice-first isn't a design preference — it's an accessibility architecture.

HyperTalk removes the assumption that users can see, tap, or navigate complex visual hierarchies. The spatial canvas adapts to the user: high-contrast cards for low vision, paged content for screen readers, voice-only operation for hands-free use.

Every interaction that works by voice also works for users who need it to. The interface meets people where they are.

From screens you navigate to interfaces that listen

Traditional interfaces require you to learn their structure — where the buttons are, what the menus contain, how to navigate between views. Voice assistants replaced buttons with commands but kept the same rigid structure underneath.

HyperTalk removes the indirection entirely. You express intent; the system infers the right layout, content density, and interaction mode — then renders it as cards on a spatial canvas you control with your voice.

Before: "Open news app. Tap headlines. Scroll down. Tap article. Back."
After: "What's happening in AI today?" → fullscreen news feed, paged, summary density.

Intent-driven, deterministic rendering

Voice input flows through a layered architecture that separates intent inference from UI rendering — making the system predictable, testable, and fast.

Data Flow
Voice Input → Fast Command Router → Intent Inference → UI Orchestration → Spatial Canvas
01 · Fast Command Router

Micro-commands like "scroll down," "turn the page," and "go back" execute locally in under 100ms — no LLM round-trip. Rule matching, then a tiny on-device classifier, with LLM fallback only when confidence is low.
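The layered routing described above can be sketched as follows — a minimal, illustrative Python version (the command names, threshold, and "classifier" are assumptions, not the actual HyperTalk implementation): exact-phrase rules first, then a stand-in for the tiny on-device classifier, with the LLM consulted only when confidence falls below a bar.

```python
# Illustrative sketch of the fast command router: rules, then a tiny
# classifier, then LLM fallback. All names are hypothetical.
from typing import Optional

MICRO_COMMANDS = {
    "scroll down": "scroll_down",
    "scroll up": "scroll_up",
    "turn the page": "page_next",
    "go back": "navigate_back",
}

CONFIDENCE_THRESHOLD = 0.8  # assumed value, tuned in practice

def classify(utterance: str) -> tuple[Optional[str], float]:
    """Stand-in for an on-device classifier: fuzzy word-overlap match
    against known commands, returning (command, confidence)."""
    best, score = None, 0.0
    words = set(utterance.lower().split())
    for phrase, command in MICRO_COMMANDS.items():
        overlap = len(words & set(phrase.split())) / len(phrase.split())
        if overlap > score:
            best, score = command, overlap
    return best, score

def route(utterance: str) -> str:
    text = utterance.lower().strip()
    # 1. Rule matching: exact phrases execute locally, no network.
    if text in MICRO_COMMANDS:
        return MICRO_COMMANDS[text]
    # 2. On-device classifier: accept only above the confidence bar.
    command, confidence = classify(text)
    if command and confidence >= CONFIDENCE_THRESHOLD:
        return command
    # 3. Low confidence: defer to full LLM intent inference (not shown).
    return "defer_to_intent_inference"
```

The point of the layering is latency: steps 1 and 2 never touch the network, so the common navigation commands stay under the 100ms budget.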

02 · Intent Inference

Natural speech becomes structured intent objects — goal, content type, layout mode, density, and interaction style. The system infers "show me a summary of today's news" as a reading-mode, fullscreen, summary-density news feed.

03 · UI Orchestration

Deterministic rules map intent objects plus current context to UI commands. The same intent with the same context always produces the same layout. No inference at the rendering layer — only execution.
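A minimal sketch of what that determinism means in code, assuming a simple dict-based intent and command shape (the field and command names are illustrative, not HyperTalk's actual schema): orchestration is a pure function of intent plus context, so identical inputs always yield identical UI commands.

```python
# Illustrative sketch: deterministic orchestration as a pure function.
# No model calls here — only rule execution over (intent, context).
def orchestrate(intent: dict, context: dict) -> list[dict]:
    commands = []
    if intent["goal"] == "request_news":
        commands.append({"op": "create_card", "type": "news_feed",
                         "density": intent["density"]})
        commands.append({"op": "set_focus", "mode": intent["focus"]})
    elif intent["goal"] == "more_detail":
        # Context decides what "more detail" means: expand the card
        # that currently has focus rather than guessing.
        commands.append({"op": "set_density",
                         "card": context["active_card"],
                         "density": "detailed"})
    return commands

intent = {"goal": "request_news", "focus": "fullscreen", "density": "summary"}
context = {"active_card": None}
# Same intent + same context → byte-identical command list, every time.
assert orchestrate(intent, context) == orchestrate(intent, context)
```

Because no inference happens at this layer, the mapping can be exhaustively unit-tested against fixed intent/context pairs.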

04 · Canvas Rendering

UI commands flow over NATS to the SwiftUI client, which updates the spatial canvas. Commands are idempotent and replayable — the system can recover from disconnections without losing state.
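Idempotent, replayable commands can be illustrated with a small sketch (the wire format and field names are assumed; the real client is SwiftUI receiving over NATS): each command carries a unique id, so replaying a stream after a disconnect cannot double-apply anything.

```python
# Illustrative sketch of idempotent command application on the client.
# Replaying the same stream after a reconnect leaves state unchanged.
class Canvas:
    def __init__(self):
        self.cards: dict[str, dict] = {}
        self.applied: set[str] = set()

    def apply(self, command: dict) -> None:
        if command["id"] in self.applied:
            return  # already applied — replay is a no-op
        self.applied.add(command["id"])
        if command["op"] == "create_card":
            self.cards[command["card_id"]] = {"type": command["type"]}
        elif command["op"] == "remove_card":
            self.cards.pop(command["card_id"], None)

stream = [
    {"id": "c1", "op": "create_card", "card_id": "news", "type": "news_feed"},
    {"id": "c2", "op": "create_card", "card_id": "chart", "type": "chart"},
]

canvas = Canvas()
for cmd in stream:
    canvas.apply(cmd)
# Simulate a reconnect: the server replays the full stream.
for cmd in stream:
    canvas.apply(cmd)
assert len(canvas.cards) == 2  # no duplicates — replay was safe
```

This is what makes crash recovery cheap: the client can always rebuild the canvas by replaying the command log from the last known point.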

What works

Intent objects — structured, validated, deterministic

Local micro-commands — sub-100ms, no network dependency

Deterministic orchestration — same input, same layout, every time

Idempotent commands — replay-safe, crash-recoverable

What doesn't

LLM-generated UI — unpredictable, untestable layouts

Command-driven voice — "open app, tap button" with extra steps

Stateless rendering — no context means no adaptation

From speech to structured intent

Every voice input becomes a validated intent object that captures what the user wants — not what buttons to press.

Structured Schema

Intent objects carry goal, mode, focus, density, interaction style, and extracted entities. Each field has a defined enum — no free-form strings reaching the UI layer.

Context-Aware

Intent inference considers the active mode, current card, available actions, user profile, and the last three intents. "More detail" means different things in a news feed vs. a data table.

Graceful Fallback

When intent can't be determined, the system emits a "clarify" intent rather than guessing. The UI prompts naturally — no error dialogs, no dead ends.

Progressive Disclosure

Information density adapts to the conversation. Start with summaries; say "more detail" and the same content expands — without navigating to a different view.

"What's the latest news today?" → intent: request_news, mode: reading, focus: fullscreen, density: summary
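The example above can be sketched as a validated intent object — a hedged illustration, since the field names and enum values here are assumptions based on the schema described, not HyperTalk's actual types. Enums keep free-form strings out of the UI layer: an invalid value fails at construction, not at render time.

```python
# Illustrative intent schema: every field is a closed enum, so nothing
# free-form reaches the UI layer. Names are hypothetical.
from dataclasses import dataclass, field
from enum import Enum

class Goal(Enum):
    REQUEST_NEWS = "request_news"
    MORE_DETAIL = "more_detail"
    CLARIFY = "clarify"  # emitted when intent can't be determined

class Mode(Enum):
    READING = "reading"
    LISTENING = "listening"

class Focus(Enum):
    FULLSCREEN = "fullscreen"
    SPLIT = "split"

class Density(Enum):
    SUMMARY = "summary"
    DETAILED = "detailed"

@dataclass(frozen=True)
class Intent:
    goal: Goal
    mode: Mode
    focus: Focus
    density: Density
    entities: tuple = field(default_factory=tuple)

# "What's the latest news today?" becomes:
intent = Intent(Goal.REQUEST_NEWS, Mode.READING, Focus.FULLSCREEN,
                Density.SUMMARY)
assert intent.goal.value == "request_news"
```

Making the dataclass frozen also means intents are hashable and safe to log, replay, and compare in tests.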

Cards, not screens

HyperTalk renders content as typed cards on a spatial canvas. Each card has a validated schema, deterministic rendering rules, and voice-controllable interactions.

Note: Text content with voice dictation, scrollable view, and inline formatting.
News Feed: Headline summaries with paged navigation. "Turn the page" advances; "more detail" expands.
Table: Structured rows and columns with pagination for large datasets. Scientific content as a first-class citizen.
Chart: Line, bar, scatter, and heatmap visualizations with reserved layout space and axis labeling.
Document: Paged document view with voice-driven page turning. "Turn the page" maps to page_next.
Now Playing: Album art, track info, and playback controls. Voice-driven: play, pause, skip, volume.

The canvas is voice-controlled at every level. "Scroll down" moves within a card. "Next card" shifts focus. "Fullscreen" expands the active card. "Split view" arranges two cards side by side. Layout adapts to content — a chart card alongside a data table, a news feed filling the screen.

Say what's wrong. The system looks.

"This doesn't look right" triggers a diagnostic flow: the system captures a screenshot, sends it for analysis, and returns a focused diagnosis card — all by voice. No manual screenshots, no filing tickets, no describing the problem in text.

The diagnosis card appears fullscreen with an "Analyzing…" state that resolves into structured findings. The system sees what you see and tells you what's wrong.

Diagnosis Flow
"This looks wrong" → Intent: diagnose_ui → Capture Screenshot → Backend Analysis → Diagnosis Card

Your world, by voice

HyperTalk connects to the services you use daily — rendering them as cards on the spatial canvas, controllable entirely by voice.

Email

OAuth-backed Gmail and Outlook. Threads render as conversation cards with voice-driven triage.

"Read my email" · "Reply to this" · "Archive it"

WhatsApp

Business Cloud API integration. Messages render as chat cards with threaded reading mode.

"Read my messages" · "Reply to the last one" · "Send to [contact]"

Spotify

OAuth playback control. Now Playing card with album art, track info, and voice controls.

"Play [artist]" · "Skip" · "Pause"

Technology stack

HyperTalk is built in pure Swift and SwiftUI — a native Apple experience with ADAMAS providing the agent intelligence layer.

Swift & SwiftUI
Native application

Built entirely in Swift with SwiftUI — platform-native accessibility, system integration, and the polish expected of an Apple-ecosystem product. Runs on iOS and macOS.

Whisper v3 Turbo
Speech-to-text

Fast, accurate speech recognition that converts voice input into text for intent inference. Low-latency transcription that keeps the voice interaction feeling immediate and natural.

InWorld
Text-to-speech

Natural voice synthesis that delivers responses with appropriate pacing and clarity. The voice layer that makes the system feel like a conversation, not a text-to-speech readback.

ADAMAS
Agent intelligence

Intent inference, content retrieval, and integration orchestration run on ADAMAS — providing durable execution, knowledge graph memory, and multi-agent coordination.

adamas.network →

Experience it

HyperTalk is currently in development for iOS and macOS, with the spatial canvas engine, intent architecture, and voice stack coming together as a native Apple experience.

If you're interested in voice-first interfaces, accessibility-driven design, or the future of how humans interact with AI — we'd like to hear from you.