Voice-first interface infrastructure

HyperTalk is a voice-first interface where intent drives layout. Speak naturally — the system infers what you need, orchestrates the UI, and renders it as a dynamic spatial canvas.

No menus. No buttons. No screen to learn. Say what you want and the interface materializes around your words — with emotion-aware responses that adapt to your context and tone.

Intent → layout, not command → layout.

From screens you navigate to interfaces that listen

Traditional interfaces require you to learn their structure — where the buttons are, what the menus contain, how to navigate between views. Voice assistants replaced buttons with commands but kept the same rigid structure underneath.

HyperTalk removes the indirection entirely. You express intent; the system infers the right layout, content density, and interaction mode — then renders it as cards on a spatial canvas you control with your voice.

"Open news app. Tap headlines. Scroll down. Tap article. Back."
"What's happening in AI today?" → fullscreen news feed, paged, summary density.

Intent-driven, deterministic rendering

Voice input flows through a layered architecture that separates intent inference from UI rendering — making the system predictable, testable, and fast.

Data Flow
Voice Input → Fast Command Router → Intent Inference → UI Orchestration → Spatial Canvas

01 · Fast Command Router

Micro-commands like "scroll down," "turn the page," and "go back" execute locally in under 100ms — no LLM round-trip. Rule matching, then a tiny on-device classifier, with LLM fallback only when confidence is low.
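
A minimal sketch of that tiered routing, in Rust; the command set, classifier interface, and confidence threshold are illustrative assumptions rather than the actual implementation.

```rust
/// Route a transcript fragment through the tiers: rule match first,
/// then a tiny on-device classifier, and only fall back to the LLM
/// when confidence is low. Names and thresholds are illustrative.
enum Route {
    Local(MicroCommand),   // execute immediately, no network
    LlmFallback(String),   // defer to full intent inference
}

#[derive(Clone, Copy, Debug)]
enum MicroCommand {
    ScrollDown,
    ScrollUp,
    PageNext,
    Back,
}

fn route(utterance: &str, classifier: &impl Fn(&str) -> (MicroCommand, f32)) -> Route {
    // 1. Rule matching: cheap, deterministic, covers the common phrasings.
    match utterance.trim().to_lowercase().as_str() {
        "scroll down" => return Route::Local(MicroCommand::ScrollDown),
        "scroll up" => return Route::Local(MicroCommand::ScrollUp),
        "turn the page" | "next page" => return Route::Local(MicroCommand::PageNext),
        "go back" | "back" => return Route::Local(MicroCommand::Back),
        _ => {}
    }

    // 2. Tiny on-device classifier; accept only above a confidence threshold.
    let (cmd, confidence) = classifier(utterance);
    if confidence > 0.9 {
        return Route::Local(cmd);
    }

    // 3. Low confidence: hand the raw utterance to full intent inference.
    Route::LlmFallback(utterance.to_string())
}
```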

02 · Intent Inference

Natural speech becomes structured intent objects — goal, content type, layout mode, density, and interaction style. The system infers "show me a summary of today's news" as a reading-mode, fullscreen, summary-density news feed.
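
The fields named above suggest an intent object shaped something like this; the exact schema isn't published, so the types and variants below are assumptions.

```rust
use serde::{Deserialize, Serialize};

// Illustrative intent object; the real schema's fields and variants may differ.
#[derive(Serialize, Deserialize, Debug)]
struct Intent {
    goal: Goal,                 // what the user wants done
    mode: Mode,                 // how they want to engage with it
    focus: Focus,               // how much of the canvas it should take
    density: Density,           // how much detail to show
    interaction: Interaction,   // how the content is navigated
    entities: Vec<String>,      // extracted topics, contacts, artists, ...
}

#[derive(Serialize, Deserialize, Debug)]
#[serde(rename_all = "snake_case")]
enum Goal { RequestNews, ReadEmail, PlayMusic, DiagnoseUi, Clarify }

#[derive(Serialize, Deserialize, Debug)]
#[serde(rename_all = "snake_case")]
enum Mode { Reading, Listening, Editing }

#[derive(Serialize, Deserialize, Debug)]
#[serde(rename_all = "snake_case")]
enum Focus { Fullscreen, Split, Ambient }

#[derive(Serialize, Deserialize, Debug)]
#[serde(rename_all = "snake_case")]
enum Density { Summary, Detail }

#[derive(Serialize, Deserialize, Debug)]
#[serde(rename_all = "snake_case")]
enum Interaction { Paged, Scroll }

// "Show me a summary of today's news" might infer to:
// Intent { goal: Goal::RequestNews, mode: Mode::Reading, focus: Focus::Fullscreen,
//          density: Density::Summary, interaction: Interaction::Paged, entities: vec![] }
```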

03 · UI Orchestration

Deterministic rules map intent objects plus current context to UI commands. The same intent with the same context always produces the same layout. No inference at the rendering layer — only execution.
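
In spirit, orchestration is a pure function from intent plus context to UI commands. This sketch reuses the illustrative Intent types from above; the command names are invented.

```rust
// Same intent + same context -> same commands: no randomness, no model call here.
// Command names and fields are illustrative, not the actual protocol.
#[derive(Debug, PartialEq)]
enum UiCommand {
    ShowCard { kind: &'static str, layout: &'static str, density: &'static str },
    SetFocus { card_id: u64 },
    Prompt { text: String },
}

struct OrchestrationContext {
    active_card: Option<u64>,
}

fn orchestrate(intent: &Intent, ctx: &OrchestrationContext) -> Vec<UiCommand> {
    match intent.goal {
        Goal::RequestNews => vec![UiCommand::ShowCard {
            kind: "news_feed",
            layout: "fullscreen",
            density: match intent.density {
                Density::Summary => "summary",
                Density::Detail => "detail",
            },
        }],
        Goal::Clarify => vec![UiCommand::Prompt {
            text: "Could you say a bit more about what you'd like to see?".into(),
        }],
        // Other goals: focus the relevant card if one is already on the canvas.
        _ => ctx
            .active_card
            .map(|id| vec![UiCommand::SetFocus { card_id: id }])
            .unwrap_or_default(),
    }
}
```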

04 · Canvas Rendering

UI commands flow over NATS to the SwiftUI client, which updates the spatial canvas. Commands are idempotent and replayable — the system can recover from disconnections without losing state.
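
A hedged sketch of what an idempotent, replayable command envelope could look like when published with the async-nats crate; the subject name and envelope fields are assumptions.

```rust
use serde::Serialize;

// Each command carries a stable id and a monotonically increasing sequence
// number so the client can drop duplicates and replay after a reconnect.
#[derive(Serialize)]
struct CommandEnvelope<T: Serialize> {
    command_id: uuid::Uuid, // uuid crate with "v4" + "serde" features;
                            // same id on retry means the client applies it at most once
    sequence: u64,          // lets the client detect gaps and request a replay
    payload: T,
}

async fn publish_command<T: Serialize>(
    client: &async_nats::Client,
    sequence: u64,
    payload: T,
) -> Result<(), Box<dyn std::error::Error>> {
    let envelope = CommandEnvelope {
        command_id: uuid::Uuid::new_v4(),
        sequence,
        payload,
    };
    // Subject name is illustrative; the real topic layout may differ.
    client
        .publish("ui.commands", serde_json::to_vec(&envelope)?.into())
        .await?;
    Ok(())
}
```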

What works

Intent objects — structured, validated, deterministic

Local micro-commands — sub-100ms, no network dependency

Deterministic orchestration — same input, same layout, every time

Idempotent commands — replay-safe, crash-recoverable

What doesn't

LLM-generated UI — unpredictable, untestable layouts

Command-driven voice — "open app, tap button" with extra steps

Stateless rendering — no context means no adaptation

From speech to structured intent

Every voice input becomes a validated intent object that captures what the user wants — not what buttons to press.

Structured Schema

Intent objects carry goal, mode, focus, density, interaction style, and extracted entities. Each field has a defined enum — no free-form strings reaching the UI layer.

Context-Aware

Intent inference considers the active mode, current card, available actions, user profile, and the last three intents. "More detail" means different things in a news feed vs. a data table.

Graceful Fallback

When intent can't be determined, the system emits a "clarify" intent rather than guessing. The UI prompts naturally — no error dialogs, no dead ends.

Progressive Disclosure

Information density adapts to the conversation. Start with summaries; say "more detail" and the same content expands — without navigating to a different view.

"What's the latest news today?" → intent: request_news, mode: reading, focus: fullscreen, density: summary

Cards, not screens

HyperTalk renders content as typed cards on a spatial canvas. Each card has a validated schema, deterministic rendering rules, and voice-controllable interactions.

Note
Text content with voice dictation, scrollable view, and inline formatting.
News Feed
Headline summaries with paged navigation. "Turn the page" advances; "more detail" expands.
Table
Structured rows and columns with pagination for large datasets. Scientific content as a first-class citizen.
Chart
Line, bar, scatter, and heatmap visualizations with reserved layout space and axis labeling.
Document
Paged document view with voice-driven page turning. "Turn the page" maps to page_next.
Now Playing
Album art, track info, and playback controls. Voice-driven: play, pause, skip, volume.

The canvas is voice-controlled at every level. "Scroll down" moves within a card. "Next card" shifts focus. "Fullscreen" expands the active card. "Split view" arranges two cards side by side. Layout adapts to content — a chart card alongside a data table, a news feed filling the screen.
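
A sketch of how the typed-card and canvas-command vocabulary could be expressed; the variant names follow the descriptions above but are otherwise assumptions.

```rust
// Every card on the canvas is one of a closed set of typed kinds.
enum Card {
    Note { text: String },
    NewsFeed { items: Vec<Headline>, page: usize },
    Table { columns: Vec<String>, rows: Vec<Vec<String>>, page: usize },
    Chart { kind: ChartKind, title: String },
    Document { pages: usize, current: usize },
    NowPlaying { track: String, artist: String, playing: bool },
}

enum ChartKind { Line, Bar, Scatter, Heatmap }

struct Headline { title: String, summary: String }

// Canvas-level voice commands operate on cards, not pixels.
enum CanvasCommand {
    ScrollDown,           // "scroll down": move within the focused card
    NextCard,             // "next card": shift focus
    Fullscreen,           // "fullscreen": expand the focused card
    SplitView(u64, u64),  // "split view": arrange two cards side by side
    PageNext,             // "turn the page": document and news-feed paging
}
```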

Responses that feel right

HyperTalk uses Hume AI's Empathic Voice Interface to detect emotional context in speech — adjusting response tone, pacing, and presentation.

Voice carries more than words. Frustration, curiosity, urgency — these shape what the right response looks like. A frustrated "this doesn't look right" triggers screenshot diagnosis with a calm, methodical analysis card. An excited "show me everything about this" expands to full detail with matching energy.

The emotional layer doesn't change what content is shown — it changes how it's delivered. Pacing, density, and tone adapt to the human on the other side.
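
One way to express that separation in code: the emotion signal only selects delivery parameters, never the content. The scores and mappings below are illustrative assumptions, not Hume's API.

```rust
// What to show is decided by intent and orchestration; emotion shapes delivery only.
struct EmotionSignal {
    frustration: f32, // 0.0..=1.0, illustrative scores, not Hume's schema
    excitement: f32,
}

struct Delivery {
    pace: Pace,
    detail: DetailLevel,
    tone: Tone,
}

enum Pace { Slow, Normal, Brisk }
enum DetailLevel { Summary, Full }
enum Tone { Calm, Neutral, Energetic }

fn delivery_for(signal: &EmotionSignal) -> Delivery {
    if signal.frustration > 0.6 {
        // Frustrated user: slow down, keep it focused, stay calm.
        Delivery { pace: Pace::Slow, detail: DetailLevel::Summary, tone: Tone::Calm }
    } else if signal.excitement > 0.6 {
        // Excited user: match the energy and open up the detail.
        Delivery { pace: Pace::Brisk, detail: DetailLevel::Full, tone: Tone::Energetic }
    } else {
        Delivery { pace: Pace::Normal, detail: DetailLevel::Summary, tone: Tone::Neutral }
    }
}
```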

Say what's wrong. The system looks.

"This doesn't look right" triggers a diagnostic flow: the system captures a screenshot, sends it for analysis, and returns a focused diagnosis card — all by voice. No manual screenshots, no filing tickets, no describing the problem in text.

The diagnosis card appears fullscreen with an "Analyzing…" state that resolves into structured findings. The system sees what you see and tells you what's wrong.

Diagnosis Flow
"This looks wrong" → Intent: diagnose_ui → Capture Screenshot → Backend Analysis → Diagnosis Card

Your world, by voice

HyperTalk connects to the services you use daily — rendering them as cards on the spatial canvas, controllable entirely by voice.

Email

OAuth-backed Gmail and Outlook. Threads render as conversation cards with voice-driven triage.

"Read my email"
"Reply to this"
"Archive it"

WhatsApp

Business Cloud API integration. Messages render as chat cards with threaded reading mode.

"Read my messages"
"Reply to the last one"
"Send to [contact]"

Spotify

OAuth playback control. Now Playing card with album art, track info, and voice controls.

"Play [artist]"
"Skip"
"Pause"

Technology stack

HyperTalk is built on Rust, Bevy, and SwiftUI — wrapped in a native Apple experience with ADAMAS providing the agent intelligence layer.

Rust & Bevy
Core engine

The spatial canvas engine is built in Rust using Bevy's ECS architecture — giving HyperTalk the performance characteristics of a game engine with the reliability of systems programming.
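
A minimal Bevy-flavored sketch of cards as ECS entities, written against a recent Bevy release; the components and systems are assumptions, not HyperTalk's actual code.

```rust
use bevy::prelude::*;

// Each card on the canvas is an entity; its identity and focus state are components.
#[derive(Component)]
struct CardTitle(String);

#[derive(Component)]
struct Focused;

fn main() {
    App::new()
        .add_plugins(MinimalPlugins)
        .add_systems(Startup, spawn_demo_cards)
        .add_systems(Update, report_focused_card)
        .run();
}

// Spawn two cards; the news feed starts focused.
fn spawn_demo_cards(mut commands: Commands) {
    commands.spawn((CardTitle("News Feed".into()), Focused));
    commands.spawn(CardTitle("Now Playing".into()));
}

// Systems are plain functions over queries; rendering and voice-command
// handling would follow the same pattern.
fn report_focused_card(focused: Query<&CardTitle, With<Focused>>) {
    for title in &focused {
        println!("focused card: {}", title.0);
    }
}
```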

SwiftUI
Native shell

The Bevy engine is wrapped in a native Swift application, providing platform-native accessibility, system integration, and the polish expected of an Apple-ecosystem product.

Hume EVI
Empathic Voice Interface

Emotion-aware voice processing that detects tone, urgency, and frustration in real time — adapting response delivery to match the human context of each interaction.

ADAMAS
Agent intelligence

Intent inference, content retrieval, and integration orchestration run on ADAMAS — providing durable execution, knowledge graph memory, and multi-agent coordination.

adamas.network →

Interfaces for everyone

Voice-first isn't just a design preference — it's an accessibility architecture. HyperTalk removes the assumption that users can see, tap, or navigate complex visual hierarchies. The spatial canvas adapts to the user: high-contrast cards for low vision, paged content for screen readers, voice-only operation for hands-free use.

Every interaction that works by voice also works for users who need it to. The interface meets people where they are.

Experience it

HyperTalk is currently in development. The voice-first interface is being built for iOS and macOS, with the spatial canvas engine, intent architecture, and Hume AI integration coming together as a native Apple experience.

If you're interested in voice-first interfaces, accessibility-driven design, or the future of how humans interact with AI — we'd like to hear from you.