# LuaVoice

> Define voice agents in code with LuaVoice — STT, TTS, LLM, lifecycle hooks, and voice tools

## Overview

`LuaVoice` is the class-based primitive you define in code to declare a voice-enabled agent — its speech-to-text engine, text-to-speech engine, LLM, turn detection, and any voice-specific tools.

```typescript theme={null}
import { LuaVoice } from 'lua-cli';

export default new LuaVoice({
  name: 'support-line',
  llm: 'openai/gpt-5.1-chat-latest',
  stt: 'deepgram/nova-3',
  tts: 'elevenlabs/eleven_turbo_v2_5:pwMBn0SsmN1220Aorv15',
  greeting: 'Hi, this is your support line. How can I help?',
});
```

<Note>
  To test voice agents live or run automated voice tests, see [Voice Command](/cli/voice-command). For the direct plugin route (when string descriptors aren't enough), see [Plugin and Realtime Engines](#plugin-and-realtime-engines) below.
</Note>

<Note>
  **Persona is configured on the parent `LuaAgent`, not on `LuaVoice`.** Use the channel-aware persona shape `{ base, voice, text }` on the agent to give a voice its own prompt — see [Channel-Aware Personas](/cli/persona-command#channel-aware-personas).
</Note>

***

## String Descriptors (recommended)

`llm`, `stt`, and `tts` all accept a provider-prefixed string descriptor. This is the canonical form — it routes through Lua's inference layer so you don't manage provider credentials yourself.

```typescript theme={null}
new LuaVoice({
  name: 'support-line',
  llm: 'openai/gpt-5.1-chat-latest',                  // LLM
  stt: 'deepgram/nova-3',                              // STT
  tts: 'elevenlabs/eleven_turbo_v2_5:<voiceId>',      // TTS — colon-separated voiceId
});
```
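If you need to inspect a descriptor in your own tooling (for logging or a pre-push check, say), the shape described above can be split mechanically. The following is a minimal sketch, not part of `lua-cli`: `parseDescriptor` and `ParsedDescriptor` are hypothetical names, and it assumes the first colon after the slash separates the model from the voice id.

```typescript
// Hypothetical helper for illustration; lua-cli does not export this.
interface ParsedDescriptor {
  provider: string;
  model: string;
  voice?: string;
}

function parseDescriptor(descriptor: string): ParsedDescriptor {
  const slash = descriptor.indexOf('/');
  if (slash <= 0 || slash === descriptor.length - 1) {
    throw new Error(`Expected "<provider>/<model>", got "${descriptor}"`);
  }
  const provider = descriptor.slice(0, slash);
  const rest = descriptor.slice(slash + 1);

  // Assumption: the first colon after the slash introduces the voice id.
  const colon = rest.indexOf(':');
  if (colon === -1) {
    return { provider, model: rest };
  }
  const model = rest.slice(0, colon);
  const voice = rest.slice(colon + 1);
  if (model.length === 0 || voice.length === 0) {
    throw new Error(`Expected "<provider>/<model>:<voiceId>", got "${descriptor}"`);
  }
  return { provider, model, voice };
}
```

For example, `parseDescriptor('deepgram/nova-3')` yields a provider of `deepgram` and a model of `nova-3`, with no voice.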

<Note>
  The model and voice catalogs below are a **living list** — your descriptor is forwarded straight to Lua's inference layer, so newer provider models may work before they're listed here and retired ones may drop off. Treat these tables as a starting point, not an exhaustive allowlist.
</Note>

### LLM options

Provider-prefixed model id. Grouped by tier — pick a tier based on the latency/cost/quality trade-off you need.

**Fast tier** — lowest latency, lowest cost:

| Descriptor                        | Notes                 |
| --------------------------------- | --------------------- |
| `openai/gpt-5-mini`               | Fast & cheap OpenAI.  |
| `openai/gpt-5-nano`               | Cheapest OpenAI tier. |
| `openai/gpt-4.1-mini`             | Stable, fast.         |
| `google/gemini-2.5-flash-lite`    | Fastest Gemini.       |
| `google/gemini-2.5-flash`         | Fast multimodal.      |
| `xai/grok-4-1-fast-non-reasoning` | Fast xAI tier.        |

**Balanced tier** — good default for most voice agents:

| Descriptor                    | Notes                                     |
| ----------------------------- | ----------------------------------------- |
| `openai/gpt-5`                | Balanced quality and speed.               |
| `openai/gpt-5.1-chat-latest`  | Balanced, chat-tuned. **Common default.** |
| `openai/gpt-4.1`              | Stable, balanced.                         |
| `google/gemini-3-flash`       | Newest Flash multimodal.                  |
| `xai/grok-4-1-fast-reasoning` | Reasoning at fast tier.                   |
| `deepseek-ai/deepseek-v3.2`   | Cost-efficient reasoning.                 |
| `moonshotai/kimi-k2-instruct` | Long-context instruct.                    |

**Quality tier** — best capability, higher latency/cost:

| Descriptor                     | Notes                   |
| ------------------------------ | ----------------------- |
| `openai/gpt-5.4`               | Top-tier OpenAI.        |
| `openai/gpt-5.3-chat-latest`   | Top-tier chat-tuned.    |
| `google/gemini-3-pro`          | Long context, top tier. |
| `google/gemini-2.5-pro`        | Stable Pro tier.        |
| `xai/grok-4.20-0309-reasoning` | Top-tier xAI reasoning. |

<Note>
  **Anthropic / Claude is intentionally absent** — Lua's inference layer does not carry Anthropic models for voice as of this writing. Use OpenAI, Google, xAI, DeepSeek, or Kimi for voice LLMs.
</Note>
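One way to keep the tier decision out of the voice definition is a small lookup that maps a tier name to a descriptor. This is only a sketch: the helper names are hypothetical, and the single descriptor chosen per tier is an illustrative pick from the tables above, not a recommendation from Lua.

```typescript
type LlmTier = 'fast' | 'balanced' | 'quality';

// One illustrative descriptor per tier, copied from the tables above.
const DEFAULT_LLM_BY_TIER: Record<LlmTier, string> = {
  fast: 'openai/gpt-5-mini',
  balanced: 'openai/gpt-5.1-chat-latest',
  quality: 'openai/gpt-5.4',
};

// Hypothetical helper: turn loose input (e.g. an environment variable value)
// into a tier, falling back to 'balanced' when it is missing or unknown.
function resolveTier(value: string | undefined): LlmTier {
  if (value === 'fast' || value === 'balanced' || value === 'quality') {
    return value;
  }
  return 'balanced';
}

function llmForTier(tier: LlmTier): string {
  return DEFAULT_LLM_BY_TIER[tier];
}
```

The result of `llmForTier(resolveTier(someSetting))` can then be passed as the `llm` value.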

### STT options

#### Deepgram (recommended)

Deepgram is the recommended STT provider, and `deepgram/nova-3` is the standard choice. `stt` is **required** for cascaded LLMs — omit it only when the `llm` is a realtime speech-to-speech model (which handles audio directly).

```typescript theme={null}
new LuaVoice({
  // ...
  stt: 'deepgram/nova-3',
  sttLanguage: 'en',        // BCP-47 code, or 'multi' for multilingual
});
```

| Descriptor                  | Notes                                                                                                              |
| --------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| `deepgram/nova-3`           | Latest Nova series. Best accuracy + low latency. **Recommended default.**                                          |
| `deepgram/nova-2`           | Previous generation. Still solid.                                                                                  |
| `deepgram/nova-2-phonecall` | Tuned for narrowband (8 kHz) phone audio. Use when call quality is poor or when you want extra robustness on PSTN. |

Combine with `sttLanguage` to pin the spoken language:

* BCP-47 code (`'en'`, `'es'`, `'pt-BR'`, etc.) — pins recognition to that language.
* `'multi'` — multilingual transcription. Applies to both the Inference route and the direct Deepgram plugin.
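The choice between a pinned code and `'multi'` can be derived from the languages you expect callers to speak. The sketch below is a hypothetical helper, not a `lua-cli` API, and it makes two assumptions: regional variants of one language collapse to the bare language code, and an empty list falls back to `'multi'`.

```typescript
// Remove duplicates while keeping first-seen order.
function uniqueStrings(values: string[]): string[] {
  return values.filter((value, index) => values.indexOf(value) === index);
}

// Hypothetical helper: pick an sttLanguage value from expected caller languages.
function chooseSttLanguage(expected: readonly string[]): string {
  const codes = uniqueStrings(
    expected.map((code) => code.trim()).filter((code) => code.length > 0),
  );
  // Primary subtag: 'pt-BR' -> 'pt', 'en' -> 'en'.
  const primaries = uniqueStrings(
    codes.map((code) => code.replace(/-.*$/, '').toLowerCase()),
  );

  if (primaries.length !== 1) {
    // Several languages, or none given (assumption: fall back to 'multi').
    return 'multi';
  }
  // One exact code: keep it. Several variants of one language: use the primary subtag.
  return (codes.length === 1 ? codes[0] : primaries[0]) ?? 'multi';
}
```

With this helper, a single `'pt-BR'` entry stays `'pt-BR'`, `'en-US'` plus `'en-GB'` becomes `'en'`, and `'en'` plus `'es'` becomes `'multi'`.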

<Tip>
  Want non-default Deepgram options (smart formatting, filler-word filtering, custom keywords)? Use the plugin class form: `stt: new deepgram.STT({ model: 'nova-3', smartFormat: true })`. See [Plugin and Realtime Engines](#plugin-and-realtime-engines) for the full plugin route.
</Tip>

#### ElevenLabs Scribe

ElevenLabs has an STT model called Scribe, available via the Inference route:

```typescript theme={null}
stt: 'elevenlabs/scribe_v2_realtime'
```

Useful when you want STT and TTS from the same provider, or when Scribe outperforms Deepgram on a specific language in your own testing.

### TTS options

#### ElevenLabs (recommended)

ElevenLabs is the canonical TTS provider. The descriptor format is `elevenlabs/<model>:<voiceId>`.

```typescript theme={null}
new LuaVoice({
  // ...
  tts: 'elevenlabs/eleven_turbo_v2_5:pwMBn0SsmN1220Aorv15',
});
```

**Models:**

| Model                    | Latency | Languages    | Best for                                                     |
| ------------------------ | ------- | ------------ | ------------------------------------------------------------ |
| `eleven_v3`              | \~250ms | 70+          | Most expressive. Use when quality matters more than latency. |
| `eleven_turbo_v2_5`      | Low     | Multilingual | **Common default** — balanced latency + quality.             |
| `eleven_flash_v2_5`      | \~75ms  | Multilingual | Ultra-low latency. Use for fast, interactive turns.          |
| `eleven_multilingual_v2` | \~200ms | 29           | Lifelike emotion across many languages.                      |
| `eleven_flash_v2`        | \~75ms  | English only | Ultra-low latency, English-only.                             |

**Curated voice IDs:**

Lua maintains a curated list with metadata (gender, accent, style) the raw ElevenLabs API doesn't expose:

| Voice ID               | Name      | Accent     | Style                             |
| ---------------------- | --------- | ---------- | --------------------------------- |
| `pwMBn0SsmN1220Aorv15` | Matt      | American   | Male, Hyper-Conversational        |
| `ZTho75k1M56OV0k9XtSC` | Spence    | American   | Male, Soft-Spoken                 |
| `kdmDKE6EkgrWrrykO9Qt` | Alexandra | American   | Female, Conversational            |
| `h2sm0NbeIZXHBzJOMYcQ` | Natasha   | American   | Female, Calm Narrative            |
| `lUTamkMw7gOzZbFIwmq4` | James     | British    | Male, Professional                |
| `4BWwbsA70lmV7RMG0Acs` | Blondie   | British    | Female, Relaxed Casual            |
| `lcMyyd2HUfFzxdCaC4Ta` | Lucy      | British    | Female, Fresh Casual              |
| `4CrZuIW9am7gYAxgo2Af` | Shelley   | British    | Female, Clear Confident           |
| `56bWURjYFHyYyVf490Dp` | Emma      | Australian | Female, Warm Conversational       |
| `aCChyB4P5WEomwRsOKRh` | Salma     | Arabic     | Female, Conversational Expressive |
| `2zRM7PkgwBPiau2jvVXc` | Monika    | Indian     | Female, Deep and Natural          |
| `ecp3DWciuUyW7BYM7II1` | Anika     | Indian     | Female, Sweet and Lively          |
| `pzxut4zZz4GImZNlqQ3H` | Raju      | Indian     | Male, Natural Conversationalist   |

You can also use any ElevenLabs voice ID from your own ElevenLabs account — these are just the curated defaults.

**Alternative: object form**

If you'd rather not concatenate model and voice with a colon, the object form works too:

```typescript theme={null}
tts: { model: 'elevenlabs/eleven_turbo_v2_5', voice: 'pwMBn0SsmN1220Aorv15' }
```
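The two TTS forms carry the same information, so converting between them is mechanical. The helpers below are a hypothetical sketch for your own code (for instance, when voice ids come from a settings file); they are not exported by `lua-cli`, and they assume the colon is the only separator between model and voice.

```typescript
interface TtsObjectForm {
  model: string;
  voice: string;
}

// 'elevenlabs/eleven_turbo_v2_5:<voiceId>' -> { model, voice }
function toTtsObject(descriptor: string): TtsObjectForm {
  const colon = descriptor.indexOf(':');
  if (colon <= 0 || colon === descriptor.length - 1) {
    throw new Error(`Expected "<provider>/<model>:<voiceId>", got "${descriptor}"`);
  }
  return {
    model: descriptor.slice(0, colon),
    voice: descriptor.slice(colon + 1),
  };
}

// { model, voice } -> 'elevenlabs/eleven_turbo_v2_5:<voiceId>'
function toTtsDescriptor(form: TtsObjectForm): string {
  return `${form.model}:${form.voice}`;
}
```

Round-tripping a descriptor through `toTtsObject` and `toTtsDescriptor` returns the original string.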

#### Deepgram Aura

Deepgram offers TTS via the Aura family. The voice id is encoded inside the model id as `aura-2-<name>-<lang>`:

```typescript theme={null}
tts: 'deepgram/aura-2-thalia-en'
```

**Common Aura 2 voices (English):**

| ID                  | Name    | Gender            | Style              |
| ------------------- | ------- | ----------------- | ------------------ |
| `aura-2-thalia-en`  | Thalia  | Female (American) | Conversational     |
| `aura-2-asteria-en` | Asteria | Female (American) | Friendly           |
| `aura-2-luna-en`    | Luna    | Female (American) | Warm               |
| `aura-2-stella-en`  | Stella  | Female (American) | Professional       |
| `aura-2-athena-en`  | Athena  | Female (British)  | Authoritative      |
| `aura-2-hera-en`    | Hera    | Female (American) | Calm Narrative     |
| `aura-2-orion-en`   | Orion   | Male (American)   | Confident          |
| `aura-2-arcas-en`   | Arcas   | Male (American)   | Conversational     |
| `aura-2-perseus-en` | Perseus | Male (American)   | Engaging           |
| `aura-2-angus-en`   | Angus   | Male (Irish)      | Storyteller        |
| `aura-2-helios-en`  | Helios  | Male (British)    | Professional       |
| `aura-2-zeus-en`    | Zeus    | Male (American)   | Deep Authoritative |

Spanish voices are also available: `aura-2-celeste-es`, `aura-2-estrella-es`.
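Because the Aura voice is part of the model id, a descriptor can be assembled from a voice name and a language code. This is a small sketch with hypothetical helper names; it only encodes the `aura-2-<name>-<lang>` pattern shown above and does not check that the voice actually exists.

```typescript
// Build 'deepgram/aura-2-<name>-<lang>' from its parts.
function auraDescriptor(name: string, lang: string): string {
  return `deepgram/aura-2-${name.toLowerCase()}-${lang.toLowerCase()}`;
}

// Split an Aura 2 model id back into name and language, or return undefined.
function parseAuraModel(model: string): { name: string; lang: string } | undefined {
  const match = /^aura-2-([a-z]+)-([a-z]+)$/.exec(model);
  if (!match) {
    return undefined;
  }
  const name = match[1];
  const lang = match[2];
  if (name === undefined || lang === undefined) {
    return undefined;
  }
  return { name, lang };
}
```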

#### Other TTS providers (via Inference)

Lua's inference layer also exposes Cartesia, Inworld, Rime, and xAI TTS. The descriptors follow the same `provider/model` shape:

| Descriptor                    | Provider | Notes                      |
| ----------------------------- | -------- | -------------------------- |
| `cartesia/sonic-3`            | Cartesia | Newest, expressive.        |
| `cartesia/sonic-turbo`        | Cartesia | Ultra-low latency.         |
| `inworld/inworld-tts-1.5-max` | Inworld  | High-quality multilingual. |
| `rime/arcana`                 | Rime     | Multilingual, expressive.  |
| `xai/tts-1`                   | xAI      | 21 languages.              |

***

## Plugin and Realtime Engines

For most voice agents the [string-descriptor form](#string-descriptors-recommended) above is all you need. Reach for the plugin/class forms here in two cases: (1) you need provider-specific options the descriptor route doesn't expose, or (2) you're using a realtime (speech-to-speech) model in the `llm` slot.

`lua-cli/voice` re-exports the LiveKit plugin namespaces that `LuaVoice` accepts as class instances — importing through it means you don't add the underlying plugin packages as direct dependencies:

```typescript theme={null}
import { LuaVoice } from 'lua-cli';
import { deepgram, elevenlabs, openai, google, xai, inference } from 'lua-cli/voice';
```

### What's allowed where

The compiler enforces two separate allowlists:

| Form                                           | Allowed in `llm` / `stt` / `tts`                                                                                                                           |
| ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `'<provider>/<model>'` string descriptor       | **Any** provider supported by Lua's inference layer. The descriptor route handles credentials.                                                             |
| `new deepgram.<Class>({...})`                  | Plugin route. Only `deepgram` and `elevenlabs` are allowlisted.                                                                                            |
| `new elevenlabs.<Class>({...})`                | Plugin route. Only `deepgram` and `elevenlabs` are allowlisted.                                                                                            |
| `new inference.<Class>({ model, ... })`        | Typed shortcut for the descriptor route — same semantics as a string descriptor, just with autocomplete on the options.                                    |
| `new <provider>.realtime.RealtimeModel({...})` | Realtime route. `openai`, `google` (via `google.beta.realtime.*`), `xai` are allowlisted **for realtime only** (goes in the `llm` slot, replaces STT+TTS). |

<Warning>
  **`new openai.LLM(...)`, `new google.LLM(...)`, `new xai.LLM(...)` and similar class forms fail compile-time validation.** These providers are not on the plugin allowlist. Use string descriptors (`'openai/gpt-5'`) or — for speech-to-speech — the realtime form (`new openai.realtime.RealtimeModel({...})`).
</Warning>
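The table and warning above boil down to a lookup on the form and the provider. The following sketch models those documented rules in plain TypeScript; it is not the compiler's implementation, and `EngineForm` and `isFormAllowed` are hypothetical names.

```typescript
type EngineForm = 'descriptor' | 'inference' | 'plugin' | 'realtime';

// Allowlists as stated in the table above.
const PLUGIN_PROVIDERS: readonly string[] = ['deepgram', 'elevenlabs'];
const REALTIME_PROVIDERS: readonly string[] = ['openai', 'google', 'xai'];

function isFormAllowed(form: EngineForm, provider: string): boolean {
  if (form === 'plugin') {
    return PLUGIN_PROVIDERS.indexOf(provider) !== -1;
  }
  if (form === 'realtime') {
    return REALTIME_PROVIDERS.indexOf(provider) !== -1;
  }
  // 'descriptor' and 'inference': any provider the inference layer supports.
  // That catalog is not modelled here, so this sketch simply returns true.
  return true;
}
```

Under this model, `isFormAllowed('plugin', 'openai')` is `false`, which corresponds to `new openai.LLM(...)` failing validation, while `isFormAllowed('realtime', 'openai')` is `true`.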

### Plugin route: Deepgram + ElevenLabs

The two allowlisted plugin providers. Use these class forms when you need provider-specific options not exposed by the string-descriptor route.

#### Deepgram STT (plugin form)

```typescript theme={null}
import { LuaVoice } from 'lua-cli';
import { deepgram, elevenlabs } from 'lua-cli/voice';

export default new LuaVoice({
  name: 'support-line',
  llm: 'openai/gpt-5.1-chat-latest',
  stt: new deepgram.STT({
    model: 'nova-3',
    smartFormat: true,
    fillerWords: false,
  }),
  tts: new elevenlabs.TTS({
    voiceId: 'pwMBn0SsmN1220Aorv15',
    model: 'eleven_flash_v2_5',
  }),
});
```

Deepgram exposes two STT classes:

* **`new deepgram.STT({...})`** — Deepgram's v1 WebSocket endpoint. Use this for `nova-3`, `nova-2`, etc.
* **`new deepgram.STTv2({...})`** — Deepgram's v2 endpoint. Required for **Flux** models that use semantic endpointing (`eotThreshold`, `eagerEotThreshold`, `eotTimeoutMs`).

The compiler routes each to the correct underlying plugin based on which class you used.

#### ElevenLabs TTS (plugin form)

```typescript theme={null}
tts: new elevenlabs.TTS({
  voiceId: 'pwMBn0SsmN1220Aorv15',
  model: 'eleven_v3',
  stability: 0.5,
  similarityBoost: 0.75,
}),
```

The plugin route lets you pass advanced ElevenLabs options (stability, similarity boost, style, speaker boost, etc.) that the descriptor route doesn't surface.

### Inference route (typed shortcut)

`inference.LLM`, `inference.STT`, `inference.TTS` are typed wrappers for the string-descriptor route. The compiler normalizes both forms to the same wire shape; the class form just gives you better TypeScript autocomplete on the options.

```typescript theme={null}
import { LuaVoice } from 'lua-cli';
import { inference } from 'lua-cli/voice';

export default new LuaVoice({
  name: 'support-line',
  llm: new inference.LLM({ model: 'openai/gpt-5.1-chat-latest' }),
  stt: new inference.STT({ model: 'deepgram/nova-3' }),
  tts: new inference.TTS({
    model: 'elevenlabs/eleven_turbo_v2_5',
    voice: 'pwMBn0SsmN1220Aorv15',
  }),
});
```

The `model` option is required — it's the same provider-prefixed string you'd pass directly. For TTS, pass `voice` separately.

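For the cascaded `llm`, `stt`, and `tts` slots, this is the **only** way to use class syntax with providers that aren't on the plugin allowlist (OpenAI, Google, xAI, Cartesia, etc.). Realtime models are the exception and use the realtime route described below.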

### Realtime route (speech-to-speech)

The realtime route puts a speech-to-speech model in the `llm` slot, replacing the cascaded STT → LLM → TTS pipeline. The class-construction path differs by provider:

* **OpenAI**: `new openai.realtime.RealtimeModel({...})`
* **Google (Gemini)**: `new google.beta.realtime.RealtimeModel({...})` — note the `.beta.` prefix (matches Google's Node SDK shape)

```typescript theme={null}
import { LuaVoice } from 'lua-cli';
import { openai } from 'lua-cli/voice';

export default new LuaVoice({
  name: 'realtime-line',
  llm: new openai.realtime.RealtimeModel({
    model: 'gpt-realtime-1.5',
    voice: 'alloy',
  }),
  // stt and tts are NOT specified — realtime handles audio directly.
});
```

```typescript theme={null}
import { LuaVoice } from 'lua-cli';
import { google } from 'lua-cli/voice';

export default new LuaVoice({
  name: 'realtime-gemini',
  llm: new google.beta.realtime.RealtimeModel({
    model: 'gemini-3.1-flash-live-preview',
  }),
});
```

#### Available realtime models

| Class form                                                                           | Model id                        | Notes                                |
| ------------------------------------------------------------------------------------ | ------------------------------- | ------------------------------------ |
| `new openai.realtime.RealtimeModel({ model: 'gpt-realtime-1.5' })`                   | `gpt-realtime-1.5`              | OpenAI flagship realtime. GA.        |
| `new openai.realtime.RealtimeModel({ model: 'gpt-realtime-mini' })`                  | `gpt-realtime-mini`             | Cost-efficient OpenAI realtime. GA.  |
| `new google.beta.realtime.RealtimeModel({ model: 'gemini-3.1-flash-live-preview' })` | `gemini-3.1-flash-live-preview` | Newest Gemini realtime. Preview.     |
| `new google.beta.realtime.RealtimeModel({ model: 'gemini-2.5-flash-live-preview' })` | `gemini-2.5-flash-live-preview` | Cheaper Gemini alternative. Preview. |

<Note>
  `xai` is reserved in the realtime allowlist but no xAI realtime models are currently published.
</Note>

#### Half-cascade mode

You can keep a separate `tts` with a realtime LLM — the worker injects `modalities: ['text']` so the realtime model emits text and `tts` handles synthesis. Useful when you want realtime's low-latency reasoning but ElevenLabs' voice quality:

```typescript theme={null}
new LuaVoice({
  name: 'hybrid-line',
  llm: new openai.realtime.RealtimeModel({ model: 'gpt-realtime-mini' }),
  // stt omitted — realtime handles input audio.
  tts: 'elevenlabs/eleven_turbo_v2_5:pwMBn0SsmN1220Aorv15',
});
```

You **cannot** combine a realtime `llm` with a custom `stt` — the compiler rejects it. Realtime models handle audio input directly.
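The combinations described in this section can be summarised as a small classifier. This is a sketch of the documented constraints rather than the compiler's actual code; `EngineSlots` and `classifyPipeline` are hypothetical names.

```typescript
type PipelineMode = 'cascaded' | 'full-realtime' | 'half-cascade';

interface EngineSlots {
  realtimeLlm: boolean; // true when `llm` is a RealtimeModel instance
  hasStt: boolean;
  hasTts: boolean;
}

function classifyPipeline(slots: EngineSlots): PipelineMode {
  if (slots.realtimeLlm) {
    if (slots.hasStt) {
      throw new Error('A realtime llm cannot be combined with a custom stt.');
    }
    // A separate tts turns the realtime model into a text-only producer.
    return slots.hasTts ? 'half-cascade' : 'full-realtime';
  }
  if (!slots.hasStt || !slots.hasTts) {
    throw new Error('A cascaded llm requires both stt and tts.');
  }
  return 'cascaded';
}
```

The `hybrid-line` example above corresponds to `{ realtimeLlm: true, hasStt: false, hasTts: true }`, which classifies as `'half-cascade'`.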

### Credentials

Plugin class instances rely on credentials provisioned by the Lua platform — you do **not** need to set `DEEPGRAM_API_KEY`, `ELEVENLABS_API_KEY`, etc. in your project's `.env`. Lua manages the provider credentials for you; your code just references the class form and the platform constructs the actual engine at runtime.

### When to use which form

| Goal                                                  | Recommended form                                                                             |
| ----------------------------------------------------- | -------------------------------------------------------------------------------------------- |
| Quick start, sensible defaults                        | **String descriptor** — `stt: 'deepgram/nova-3'`                                             |
| TypeScript autocomplete on options                    | **`inference.X`** — `stt: new inference.STT({ model: 'deepgram/nova-3' })`                   |
| Deepgram or ElevenLabs with provider-specific options | **Plugin class** — `stt: new deepgram.STT({ model: 'nova-3', smartFormat: true })`           |
| Speech-to-speech (OpenAI/Google/xAI realtime)         | **Realtime class** — `llm: new openai.realtime.RealtimeModel({ model: 'gpt-realtime-1.5' })` |

***

## Configuration Reference

```typescript theme={null}
new LuaVoice({
  // Required
  name: 'support-line',
  llm: 'openai/gpt-5.1-chat-latest',
  stt: 'deepgram/nova-3',
  tts: 'elevenlabs/eleven_turbo_v2_5:pwMBn0SsmN1220Aorv15',

  // Recommended
  description: 'Inbound phone voice for the support assistant',
  greeting: 'Hi, this is your support line. How can I help?',
  sttLanguage: 'en',
  turnDetection: 'vad',
  krispEnabled: true,

  // Optional tuning
  maxToolSteps: 6,
  userAwayTimeout: 20,
  preemptiveGeneration: true,
  interruption: { mode: 'adaptive', falseInterruptionTimeout: 2.0 },

  // Optional polish
  pronunciations: { 'HVAC': 'H V A C', 'CFM': 'C F M' },
  persistTranscript: true,
  onToolFailureSay: 'Sorry, let me try that another way.',
  backgroundAudio: { ambient: 'office-ambience', thinking: 'keyboard-typing' },

  // Tools + lifecycle hooks
  tools: [/* ... */],
  onEnter: async (ctx) => {/* ... */},
  onUserTurnCompleted: async (turnCtx, message) => {/* ... */},
  onExit: async (ctx) => {/* ... */},
});
```

### Required fields

<ParamField path="name" type="string" required>
  Unique name for this voice. Used to address the voice in `lua voice --voice <name>` and as the server-side identifier. Allowed characters: `a-zA-Z0-9_-`, 1–64 chars.
</ParamField>
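If voice names are generated (per tenant or per environment, for example), the naming rule above can be checked before you push. A minimal sketch, with the pattern transcribed from the stated character set and length; `isValidVoiceName` is a hypothetical helper, not a `lua-cli` export.

```typescript
// Letters, digits, underscore and hyphen; 1 to 64 characters.
const VOICE_NAME_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;

function isValidVoiceName(name: string): boolean {
  return VOICE_NAME_PATTERN.test(name);
}
```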

<ParamField path="llm" type="string | LLMConfig" required>
  The LLM that drives the conversation. String descriptor (e.g. `'openai/gpt-5.1-chat-latest'`) is the canonical form. See [LLM options](#llm-options) above for the catalog.
</ParamField>

<ParamField path="stt" type="string | STTConfig" required>
  Speech-to-text engine. String descriptor (e.g. `'deepgram/nova-3'`) is canonical. Required for cascaded LLMs; omit only when using a realtime speech-to-speech model in the `llm` slot.
</ParamField>

<ParamField path="tts" type="string | TTSConfig" required>
  Text-to-speech engine. String descriptor with colon-separated voice id (e.g. `'elevenlabs/eleven_turbo_v2_5:<voiceId>'`), or object form `{ model, voice }`. Required for cascaded LLMs.
</ParamField>

### Optional fields

<ParamField path="description" type="string">
  Human-readable description. Surfaced in the compiled manifest and admin listings.
</ParamField>

<ParamField path="greeting" type="string">
  Opening line spoken at session start. Empty string means no greeting. Generated through the LLM at session connect, so it can be dynamic if `onEnter` sets up context first.
</ParamField>

<ParamField path="sttLanguage" type="string">
  BCP-47 language code (e.g. `'en'`, `'es'`, `'pt-BR'`) or `'multi'` for multilingual transcription. Applies to both Inference STT and the Deepgram plugin.
</ParamField>

<ParamField path="turnDetection" type="'multilingual' | 'english' | 'vad' | 'stt' | 'manual'">
  How the agent decides when the user has finished speaking. `'vad'` is the safest choice for most setups. `'multilingual'` and `'english'` use LiveKit's turn-detector model; `'stt'` relies on the STT engine's own end-of-speech signal; `'manual'` defers to your own logic.
</ParamField>

<ParamField path="vad" type="string" default="silero">
  Voice activity detection engine. `'silero'` is the only currently-supported value.
</ParamField>

<ParamField path="vadOptions" type="object">
  Silero VAD tuning. Useful when the default endpointing clips quiet callers or fires too eagerly mid-thought.

  * `minSpeechDuration` (ms, 0–5000) — speech required before a turn starts. Default: 50.
  * `minSilenceDuration` (ms, 0–5000) — silence required to end a turn. Default: 550.
  * `prefixPaddingDuration` (ms, 0–2000) — audio captured before detected speech start, forwarded into STT. Default: 500.
  * `activationThreshold` (0–1) — lower = more sensitive to speech onset.
</ParamField>
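The ranges listed for `vadOptions` lend themselves to a quick sanity check when the values come from configuration. The sketch below uses a locally declared `VadOptions` interface as a stand-in for the real type (an assumption), and `vadOptionErrors` is a hypothetical helper.

```typescript
// Local stand-in for the real options type; field names follow the list above.
interface VadOptions {
  minSpeechDuration?: number;     // ms
  minSilenceDuration?: number;    // ms
  prefixPaddingDuration?: number; // ms
  activationThreshold?: number;   // 0 to 1
}

// [field, min, max] as documented above.
const VAD_RANGES: Array<[keyof VadOptions, number, number]> = [
  ['minSpeechDuration', 0, 5000],
  ['minSilenceDuration', 0, 5000],
  ['prefixPaddingDuration', 0, 2000],
  ['activationThreshold', 0, 1],
];

// Returns one message per out-of-range field; an empty array means all set fields are in range.
function vadOptionErrors(options: VadOptions): string[] {
  const errors: string[] = [];
  for (const [key, min, max] of VAD_RANGES) {
    const value = options[key];
    if (value === undefined) {
      continue;
    }
    // Written this way so NaN is also rejected.
    if (!(value >= min && value <= max)) {
      errors.push(`${key} must be between ${min} and ${max}, got ${value}`);
    }
  }
  return errors;
}
```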

<ParamField path="krispEnabled" type="boolean" default="false">
  Krisp BVC background noise cancellation. Recommended for inbound phone calls, where it removes background chatter, traffic, and other ambient noise. Billed separately, so it is opt-in.
</ParamField>

<ParamField path="maxToolSteps" type="number">
  Maximum sequential tool calls per turn (1–20). Higher values let the agent chain more tools before responding.
</ParamField>

<ParamField path="userAwayTimeout" type="number">
  Seconds of silence before the agent considers the user "away" and ends the session. Useful for cleanly handling abandoned calls.
</ParamField>

<ParamField path="preemptiveGeneration" type="boolean">
  Generate the assistant's response speculatively while the user is still speaking. Reduces perceived latency for predictable turns, but the speculative work is wasted when callers interrupt frequently.
</ParamField>

<ParamField path="interruption" type="InterruptionOptions">
  How the agent handles being interrupted mid-response.

  * `enabled` — whether interruption is allowed.
  * `mode` — `'adaptive'` (recommended) or `'vad'`.
  * `falseInterruptionTimeout` (seconds) — how long to wait before treating a brief noise as a false interruption.
  * `resumeFalseInterruption` (boolean) — resume the cut-off response after a false interruption.
  * `minDelay` / `maxDelay` (seconds) — bounds on the interruption response window.
</ParamField>

<ParamField path="pronunciations" type="Record<string, string>">
  Word-boundary text replacements applied before TTS synthesis. Keys are matched case-insensitively as whole words. Use for acronyms and proper nouns the TTS mispronounces.

  ```typescript theme={null}
  pronunciations: { 'HVAC': 'H V A C', 'kubectl': 'kube control' }
  ```

  Cascaded path only. Setting `pronunciations` on a **full-realtime** voice (realtime `llm` with no `tts`) is rejected at compile time — pair with a half-cascade `tts`, or drop the field.
</ParamField>
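To preview what a `pronunciations` map would do to a sentence, the documented behaviour (whole-word, case-insensitive replacement) can be approximated locally. This is a sketch of that behaviour, not the worker's implementation; `applyPronunciations` is a hypothetical name, and keys are applied one after another, so a later key could in principle match text produced by an earlier replacement.

```typescript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function applyPronunciations(
  text: string,
  pronunciations: Record<string, string>,
): string {
  let result = text;
  for (const key of Object.keys(pronunciations)) {
    const replacement = pronunciations[key];
    if (replacement === undefined) {
      continue;
    }
    // \b anchors the match to word boundaries; 'gi' = all matches, case-insensitive.
    const pattern = new RegExp(`\\b${escapeRegExp(key)}\\b`, 'gi');
    // A function replacer keeps '$' in the replacement text literal.
    result = result.replace(pattern, () => replacement);
  }
  return result;
}
```

With the map `{ HVAC: 'H V A C' }`, the word `hvac` is rewritten regardless of case, while `HVACs` is left alone because the key does not end at a word boundary.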

<ParamField path="backgroundAudio" type="{ ambient?, thinking? }">
  Background audio layered onto the agent's output. Pass a built-in clip name, a `{ source, volume, probability }` config, or an array (probabilistic mix).

  Built-in clips: `'office-ambience'`, `'keyboard-typing'`, `'keyboard-typing-2'`.

  ```typescript theme={null}
  backgroundAudio: {
    ambient: 'office-ambience',
    thinking: 'keyboard-typing',
  }
  ```
</ParamField>

<ParamField path="volume" type="number">
  Output speech volume, 0–100. Applied as a per-frame multiplier. Omit to pass the TTS provider's native level through unchanged.
</ParamField>
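A per-frame multiplier means every audio sample in a frame is scaled by the same gain. The sketch below illustrates that idea on 16-bit PCM samples. It assumes, purely for illustration, a linear mapping in which 100 leaves the signal unchanged; the worker's actual mapping from `volume` to gain is not documented here, and `scaleFrame` is a hypothetical name.

```typescript
// Scale one frame of 16-bit PCM samples by a 0-100 volume setting.
// Assumption: gain = volume / 100, so 100 is unity gain and 0 is silence.
function scaleFrame(samples: Int16Array, volume: number): Int16Array {
  if (!(volume >= 0 && volume <= 100)) {
    throw new RangeError('volume must be between 0 and 100');
  }
  const gain = volume / 100;
  // gain never exceeds 1, so scaled samples stay within the Int16 range.
  return samples.map((sample) => Math.round(sample * gain));
}
```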

<ParamField path="persistTranscript" type="boolean" default="false">
  When `true`, the worker writes `session.history` to `Data.set('call:<sessionId>')` after the call ends. Read it back from a job or webhook with `Data.get('call:<sessionId>')` for post-call analytics, follow-ups, or QA.
</ParamField>

<ParamField path="onToolFailureSay" type="string">
  Short line spoken to the caller when a tool call fails (throws, times out, or returns an unsupported result) — fills the 2–3s gap before the LLM's own recovery response. Spoken once per failed call, then the error is surfaced to the LLM. Keep it short and on-brand (e.g. `'Sorry, let me try that another way.'`); omit for no spoken fallback.
</ParamField>
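The sequence described for `onToolFailureSay` (speak the line once, then surface the error) can be pictured as a wrapper around the tool call. The worker does this for you; the sketch below only illustrates the ordering, and `runWithSpokenFallback` and `Say` are hypothetical names.

```typescript
type Say = (text: string) => Promise<void>;

async function runWithSpokenFallback<T>(
  execute: () => Promise<T>,
  say: Say,
  fallbackLine?: string,
): Promise<T> {
  try {
    return await execute();
  } catch (error) {
    // Speak the fallback once for this failed call, if one is configured...
    if (fallbackLine) {
      await say(fallbackLine);
    }
    // ...then let the error propagate so the caller (the LLM loop) still sees it.
    throw error;
  }
}
```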

<ParamField path="tools" type="Array<LuaTool | LuaVoiceTool>">
  Voice-specific tools in addition to skills attached to the owning agent. See [Defining Voice Tools](#defining-voice-tools).
</ParamField>

***

## Lifecycle Hooks

Three hooks let you wire up per-session state, RAG injection, and post-call work.

<ParamField path="onEnter" type="(ctx: LuaVoiceHookContext) => Promise<void>">
  Fires after the session connects to the room and **before** the greeting. Use it to hydrate `ctx.session.userdata` from `User`, `Data`, etc., or to set up any per-call state.

  ```typescript theme={null}
  onEnter: async (ctx) => {
    if (ctx.caller?.phoneNumber) {
      const user = await User.get({ phone: ctx.caller.phoneNumber });
      ctx.session.userdata = { user, returning: !!user };
    }
  },
  ```
</ParamField>

<ParamField path="onUserTurnCompleted" type="(turnCtx, message) => Promise<void>">
  Fires after the user finishes a turn, **before** the LLM is invoked. This is the canonical RAG-injection point — `turnCtx.addMessage(...)` adds context messages the LLM sees on this turn.

  ```typescript theme={null}
  onUserTurnCompleted: async (turnCtx, message) => {
    const docs = await Data.search('kb', message.content, 3);
    for (const doc of docs) {
      turnCtx.addMessage({ role: 'system', content: doc.text });
    }
  },
  ```
</ParamField>
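Since everything added in this hook is sent to the LLM on the same turn, it can help to cap how much context you inject. The sketch below adds snippets until a character budget is reached. It is an illustration only: `TurnContextLike` is a local stand-in that mirrors just the `addMessage` call shown above, and `injectSnippets` and the budget are hypothetical.

```typescript
interface ContextMessage {
  role: 'system';
  content: string;
}

// Minimal stand-in for the hook's turn context (assumption: only addMessage is needed).
interface TurnContextLike {
  addMessage(message: ContextMessage): void;
}

// Adds snippets in order until the next one would exceed maxChars.
// Returns how many snippets were added.
function injectSnippets(
  turnCtx: TurnContextLike,
  snippets: readonly string[],
  maxChars: number,
): number {
  let used = 0;
  let added = 0;
  for (const snippet of snippets) {
    if (used + snippet.length > maxChars) {
      break;
    }
    turnCtx.addMessage({ role: 'system', content: snippet });
    used += snippet.length;
    added += 1;
  }
  return added;
}
```

Inside `onUserTurnCompleted`, this could replace the plain loop, for example `injectSnippets(turnCtx, docs.map((doc) => doc.text), 1500)`.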

<ParamField path="onExit" type="(ctx: LuaVoiceHookContext) => Promise<void>">
  Fires when the session is closing. Use for transcript persistence, outcome reporting, CRM updates, etc.
</ParamField>

***

## Defining Voice Tools

Voice tools run during a voice conversation. `LuaVoiceTool` is a concrete class — **instantiate** it with a config object:

```typescript theme={null}
import { Data, LuaVoiceTool } from 'lua-cli';
import { z } from 'zod';

export const getOrderStatusTool = new LuaVoiceTool({
  name: 'getOrderStatus',
  description: 'Look up the status of an order by ID',
  inputSchema: z.object({ orderId: z.string() }),
  execute: async (input, ctx) => {
    const order = await Data.get('orders', input.orderId);
    return { status: order.status, eta: order.eta };
  },
});
```

### Config fields

<ParamField path="name" type="string" required>
  Tool name. Used by the LLM to identify and call the tool.
</ParamField>

<ParamField path="description" type="string" required>
  What the tool does. Write an action-oriented description; the LLM reads it when deciding whether to invoke the tool.
</ParamField>

<ParamField path="inputSchema" type="ZodType" required>
  Zod schema for the tool's input. Validated before `execute` is called.
</ParamField>

<ParamField path="execute" type="(input, ctx?: LuaVoiceToolCtx) => Promise<any>" required>
  Tool body. Receives the validated input and an optional voice-specific context.
</ParamField>

<ParamField path="condition" type="() => Promise<boolean>">
  Optional gate. When provided, the tool is only exposed to the LLM if `condition()` returns `true`. Use for feature flags or runtime availability checks.
</ParamField>

<ParamField path="flags" type="ToolFlag[]">
  Voice-specific tool flags (e.g. controlling barge-in behavior).
</ParamField>

### ctx — `LuaVoiceToolCtx`

<ParamField path="ctx.toolCallId" type="string">
  Identifier for this specific tool invocation.
</ParamField>

<ParamField path="ctx.voice.say" type="(text: string) => Promise<void>">
  Speak `text` to the caller via the active LiveKit session. Useful for status updates during long-running tool work ("Looking that up — one moment.").
</ParamField>

<ParamField path="ctx.voice.transferToHuman" type="(msisdn, opts?) => Promise<void>">
  Transfer the live caller to a human at `msisdn`. Two mechanisms:

  * **`mode: 'refer'`** (default) — SIP REFER on the inbound leg. Cheap (one billed leg) but depends on the inbound carrier accepting REFER end-to-end.
  * **`mode: 'bridge'`** — dial the human as a second SIP participant into the same room. Two billed legs but works regardless of carrier REFER support. Use for high-stakes transfers.

  `announce` is spoken before the transfer fires.

  ```typescript theme={null}
  await ctx.voice?.transferToHuman('+32477123456', {
    mode: 'bridge',
    announce: 'Transferring you to our sales team — one moment.',
  });
  ```
</ParamField>
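The example above passes the number in international format. If transfer targets come from user data or a CRM, a format check before calling `transferToHuman` can catch obviously malformed values. The sketch assumes the target should be an E.164 number, as the example suggests; that requirement is not stated by the API, and `isE164` is a hypothetical helper.

```typescript
// E.164: a leading '+', a non-zero first digit, and at most 15 digits in total.
const E164_PATTERN = /^\+[1-9]\d{1,14}$/;

function isE164(msisdn: string): boolean {
  return E164_PATTERN.test(msisdn);
}
```

Formatted display strings such as `+1 (415) 555-0142` fail this check and would need normalising first.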

You can also share regular `LuaTool` instances between chat skills and voice tools — just pass them in the same `tools` array. The `tools` field accepts both `LuaTool` and `LuaVoiceTool` instances.

***

## Function-style: `defineVoice`

Equivalent to `new LuaVoice(config)` if you prefer a function call:

```typescript theme={null}
import { defineVoice } from 'lua-cli';

export default defineVoice({
  name: 'support-line',
  llm: 'openai/gpt-5.1-chat-latest',
  stt: 'deepgram/nova-3',
  tts: 'elevenlabs/eleven_turbo_v2_5:pwMBn0SsmN1220Aorv15',
});
```

***

## Wiring Up to an Agent

```typescript theme={null}
import { LuaAgent } from 'lua-cli';
import supportLine from './voices/support-line.voice';
import supportSkill from './skills/support.skill';

export const agent = new LuaAgent({
  name: 'support-agent',
  persona: {
    base: 'You are a helpful support agent for Acme Corp.',
    voice: `Speak conversationally in two sentences or fewer. No markdown. Never output digits — spell numbers and prices in full English words ("one hundred twenty-nine dollars", "nine o'clock", "fifty miles").`,
    text: 'Use markdown headers and bullet lists where helpful.',
  },
  voices: [supportLine],
  skills: [supportSkill],
});
```

The agent's `persona.voice` branch is what gives `supportLine` its voice-specific prompt.

<Tip>
  **Voice persona tips:**

  * Keep replies short (1–2 sentences). Voice users can't skim.
  * No markdown — TTS reads it literally.
  * Spell out numbers and prices ("nine o'clock", "twenty dollars") — TTS reads digits robotically otherwise.
</Tip>
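The persona instruction covers text the LLM writes itself. For numbers that reach the caller through tool results, one option is to spell them out before returning them. The helper below is a hypothetical sketch that handles whole numbers from 0 to 999 in American English; it is not part of `lua-cli`.

```typescript
const ONES: readonly string[] = [
  'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine',
  'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen',
  'seventeen', 'eighteen', 'nineteen',
];
const TENS: readonly string[] = [
  '', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy', 'eighty', 'ninety',
];

// Safe lookup that always returns a string.
function wordAt(list: readonly string[], index: number): string {
  return list[index] ?? '';
}

// Spell out an integer between 0 and 999, e.g. 129 -> 'one hundred twenty-nine'.
function toWords(n: number): string {
  if (!Number.isInteger(n) || n < 0 || n > 999) {
    throw new RangeError('toWords supports integers from 0 to 999');
  }
  if (n < 20) {
    return wordAt(ONES, n);
  }
  if (n < 100) {
    const tens = wordAt(TENS, Math.floor(n / 10));
    const ones = n % 10;
    return ones === 0 ? tens : `${tens}-${wordAt(ONES, ones)}`;
  }
  const hundreds = `${wordAt(ONES, Math.floor(n / 100))} hundred`;
  const rest = n % 100;
  return rest === 0 ? hundreds : `${hundreds} ${toWords(rest)}`;
}
```

A tool could then return `toWords(129) + ' dollars'` instead of a raw number.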

### Connect a phone number

Attaching the voice in code makes it *available*; to make the agent **answer phone calls**, bind a number to it. Push your voice first, then run the channels flow and choose **"☎️ Manage phone numbers"**:

```bash theme={null}
lua push        # publish the agent + its LuaVoice
lua channels    # → choose "☎️  Manage phone numbers"
```

From there you can **search** available numbers, **purchase** one, and **bind** it to this agent. During bind you pick **which LuaVoice answers inbound calls** on that number:

```text theme={null}
📞 Your agent is now reachable at +1 (415) 555-0142 via voice "support-line"
```

<Note>
  Binding requires a code-defined LuaVoice that has been pushed. Without one, inbound calls fall through to platform-default STT/LLM/TTS — no greeting, lifecycle hooks, or voice-only tools. Author the voice, `lua push`, then bind.
</Note>

When purchasing, answering **"Allow customers to text this number too?"** with **yes** provisions a voice **+ SMS** number; answering **no** provisions a voice-only number with lower-latency inbound calls. See [Channels Command](/cli/channels-command) for the full phone-number flow (list, unbind, release).

***

## Related

* [Voice Command](/cli/voice-command) — live testing and voice test suites
* [Plugin and Realtime Engines](#plugin-and-realtime-engines) — Deepgram/ElevenLabs class forms, realtime speech-to-speech
* [Persona Command](/cli/persona-command#channel-aware-personas) — voice-specific personas on the parent agent
* [LuaAgent API](/api/luaagent)