Audio Tutorial
Learn how to build MentraOS Apps that can:
- 🔊 Play audio files from URLs on connected smart glasses or phone
- 🗣️ Convert text to speech and play it through the glasses' speakers or phone
- ⏹️ Stop audio playback when needed
Audio routes through your phone by default. To route audio through the Mentra Live, connect to it in your phone's settings like any other pair of Bluetooth headphones. This is separate from pairing the glasses with MentraOS.
Prerequisites
- MentraOS SDK ≥ 2.1.2 installed in your project
- A local development environment configured as described in Getting Started
1 - Set up the Project
Copy the basic project structure from the Quickstart if you haven't already. We'll focus on the contents of src/index.ts.
2 - Playing Audio from URLs
The most straightforward way to play audio is from a publicly accessible URL.
src/index.ts
import { AppServer, AppSession } from "@mentra/sdk";
class AudioDemoServer extends AppServer {
  protected async onSession(
    session: AppSession,
    sessionId: string,
    userId: string,
  ): Promise<void> {
    session.logger.info(`🔊 Audio demo session started for ${userId}`);
    // Example: Play a notification sound
    try {
      const result = await session.audio.playAudio({
        audioUrl: "https://okgodoit.com/cool.mp3"
      });
      if (result.success) {
        session.logger.info(`✅ Audio played successfully`);
        if (result.duration) {
          session.logger.info(`⏱️ Duration: ${result.duration} ms`);
        }
      } else {
        session.logger.error(`❌ Audio playback failed: ${result.error}`);
      }
    } catch (error) {
      session.logger.error(`Exception during audio playback: ${error}`);
    }
  }
}
// Bootstrap the server
new AudioDemoServer({
  packageName: process.env.PACKAGE_NAME ?? "com.example.audio",
  apiKey: process.env.MENTRAOS_API_KEY!,
  port: Number(process.env.PORT ?? "3000"),
}).start();
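Beyond audioUrl, playAudio also accepts a volume option (the interactive example in section 4 passes volume: 0.6). Here is a minimal sketch, assuming volume is a 0.0–1.0 scale; playQuietly is a hypothetical helper, not part of the SDK:
```typescript
import { AppSession } from "@mentra/sdk";

// Hypothetical helper: plays a clip at reduced volume.
// Assumes volume is a 0.0-1.0 scale (the interactive example below uses 0.6).
async function playQuietly(session: AppSession, audioUrl: string): Promise<void> {
  const result = await session.audio.playAudio({
    audioUrl,
    volume: 0.4, // illustrative value
  });
  if (!result.success) {
    session.logger.error(`Playback failed: ${result.error}`);
  }
}
```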
3 - Text-to-Speech (TTS)
Convert any text to natural-sounding speech using ElevenLabs and play it on the glasses.
src/index.ts
import { AppServer, AppSession } from "@mentra/sdk";
class TTSServer extends AppServer {
  protected async onSession(
    session: AppSession,
    sessionId: string,
    userId: string,
  ): Promise<void> {
    session.logger.info(`🗣️ TTS demo session started`);
    // Basic text-to-speech
    try {
      const result = await session.audio.speak("Welcome to Mentra OS! This is your audio assistant.");
      if (result.success) {
        session.logger.info("✅ Speech synthesis successful");
      } else {
        session.logger.error(`❌ TTS failed: ${result.error}`);
      }
    } catch (error) {
      session.logger.error(`Exception during TTS: ${error}`);
    }
    // Advanced TTS with custom voice settings
    try {
      const result = await session.audio.speak(
        "This message uses custom voice settings for a different sound.",
        {
          voice_id: "your_elevenlabs_voice_id", // Optional: specific ElevenLabs voice
          model_id: "eleven_flash_v2_5", // Optional: specific model
          voice_settings: { // each setting is optional
            stability: 0.7, // Voice consistency (0.0-1.0)
            similarity_boost: 0.8, // Voice similarity (0.0-1.0)
            style: 0.3, // Speaking style (0.0-1.0)
            speed: 0.9 // Speaking speed (0.25-4.0)
          }
        }
      );
      if (result.success) {
        session.logger.info("✅ Advanced TTS successful");
      }
    } catch (error) {
      session.logger.error(`Exception during advanced TTS: ${error}`);
    }
  }
}
TTS Configuration Options
Option | Type | Default | Description |
---|---|---|---|
voice_id | string | Server default | ElevenLabs voice ID |
model_id | string | eleven_flash_v2_5 | TTS model to use (see models below) |
voice_settings.stability | number | 0.5 | Voice stability and randomness (0.0-1.0). Lower values allow a broader emotional range; higher values can sound monotonous |
voice_settings.similarity_boost | number | 0.75 | How closely the AI adheres to the original voice (0.0-1.0) |
voice_settings.style | number | 0.0 | Style exaggeration of the voice (0.0-1.0). Amplifies the original speaker's style but increases latency |
voice_settings.use_speaker_boost | boolean | false | Boosts similarity to the original speaker. Increases computational load and latency |
voice_settings.speed | number | 1.0 | Playback speed. 1.0 = normal, <1.0 = slower, >1.0 = faster |
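For reference, here is a minimal sketch that passes every setting from the table above to session.audio.speak(), using the documented defaults as illustrative values. The voice_id is a placeholder, and speakWithCustomSettings is a hypothetical helper:
```typescript
import { AppSession } from "@mentra/sdk";

// Hypothetical helper: exercises each option from the table above.
async function speakWithCustomSettings(session: AppSession, text: string): Promise<void> {
  const result = await session.audio.speak(text, {
    voice_id: "your_elevenlabs_voice_id", // placeholder: substitute a real ElevenLabs voice ID
    model_id: "eleven_multilingual_v2",
    voice_settings: {
      stability: 0.5,           // table default
      similarity_boost: 0.75,   // table default
      style: 0.0,               // table default
      use_speaker_boost: false, // closer match to the original speaker, at the cost of latency
      speed: 1.0,               // normal speed
    },
  });
  if (!result.success) {
    session.logger.error(`TTS with custom settings failed: ${result.error}`);
  }
}
```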
Available TTS Models
Model | Description | Languages | Latency |
---|---|---|---|
eleven_v3 | Human-like and expressive speech generation | 70+ languages | Standard |
eleven_flash_v2_5 | Ultra-fast model optimized for real-time use | All multilingual_v2 languages + hu, no, vi | ~75ms |
eleven_flash_v2 | Ultra-fast model (English only) | en | ~75ms |
eleven_turbo_v2_5 | High-quality, low-latency model with a good quality/speed balance | Same as flash_v2_5 | ~250-300ms |
eleven_turbo_v2 | High-quality, low-latency model (English only) | en | ~250-300ms |
eleven_multilingual_v2 | Most lifelike with rich emotional expression | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru | Standard |
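The model is chosen per call via model_id. A brief sketch of picking between the two trade-offs in the table (low latency vs. multilingual expressiveness); the helper names are illustrative, not part of the SDK:
```typescript
import { AppSession } from "@mentra/sdk";

// Illustrative helpers for selecting a model from the table above.
async function speakFast(session: AppSession, text: string): Promise<void> {
  // Lowest latency (~75 ms); covers the flash_v2_5 language set
  await session.audio.speak(text, { model_id: "eleven_flash_v2_5" });
}

async function speakExpressive(session: AppSession, text: string): Promise<void> {
  // Most lifelike output across the multilingual_v2 languages; standard latency
  await session.audio.speak(text, { model_id: "eleven_multilingual_v2" });
}
```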
4 - Interactive Audio App
Here's a complete example that combines voice activation with audio responses:
src/index.ts
import { AppServer, AppSession } from "@mentra/sdk";
class InteractiveAudioApp extends AppServer {
  protected async onSession(
    session: AppSession,
    sessionId: string,
    userId: string,
  ): Promise<void> {
    session.logger.info(`🎤 Interactive audio app started`);
    // Welcome message
    await session.audio.speak("Welcome to the interactive audio demo. Say 'play music' or 'tell me a joke'.");
    // Listen for voice commands
    const unsubscribe = session.events.onTranscription(async (data) => {
      // Handle "stop" even on interim transcripts so playback can be interrupted immediately
      if (data.text.toLowerCase().includes("stop")) {
        await session.audio.stopAudio();
        return;
      }
      // Only act on final transcripts for everything else
      if (!data.isFinal) return;
      const command = data.text.toLowerCase().trim();
      session.logger.info(`Heard: "${command}"`);
      if (command.includes("play music")) {
        await this.playMusic(session);
      } else if (command.includes("tell me a joke") || command.includes("joke")) {
        await this.tellJoke(session);
      } else if (command.includes("hello")) {
        await session.audio.speak("Hello there! How can I help you today?");
      }
    });
    // Clean up listener when session ends
    this.addCleanupHandler(unsubscribe);
  }
  private async playMusic(session: AppSession): Promise<void> {
    session.layouts.showTextWall("🎵 Playing music...");
    try {
      const result = await session.audio.playAudio({
        audioUrl: "https://example.com/background-music.mp3",
        volume: 0.6
      });
      if (result.success) {
        await session.audio.speak("Hope you enjoyed the music!");
      } else {
        await session.audio.speak("Sorry, I couldn't play the music right now.");
      }
    } catch (error) {
      session.logger.error(`Music playback error: ${error}`);
      await session.audio.speak("There was an error playing music.");
    }
  }
  private async tellJoke(session: AppSession): Promise<void> {
    const jokes = [
      "What do you call a pair of glasses that can see the future? ... Pre-scription glasses!",
      "Why did the augmented reality app break up with the virtual reality app? ... It said the relationship wasn't real enough!",
      "Why did the phone pair to the smart glasses? ... Because it lost its contacts!",
    ];
    const joke = jokes[Math.floor(Math.random() * jokes.length)];
    session.layouts.showTextWall("Telling a joke...");
    await session.audio.speak(joke, {
      voice_settings: {
        style: 0.8, // More expressive for jokes
        speed: 0.9 // Slightly slower for comedy timing
      }
    });
  }
}
// Bootstrap the server
new InteractiveAudioApp({
  packageName: process.env.PACKAGE_NAME ?? "com.example.interactiveaudio",
  apiKey: process.env.MENTRAOS_API_KEY!,
  port: Number(process.env.PORT ?? "3000"),
}).start();
5 - Audio Management
Stopping Audio
// Stop all currently playing audio
session.audio.stopAudio();
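A common pattern is to interrupt whatever is currently playing before starting a new clip. Here is a hedged sketch (playExclusive is a hypothetical helper); note that, as described under Error Handling below, a playback interrupted by stopAudio() resolves with success: false:
```typescript
import { AppSession } from "@mentra/sdk";

// Hypothetical helper: stop any in-progress audio, then play a new clip.
async function playExclusive(session: AppSession, audioUrl: string): Promise<void> {
  // Interrupts current playback; the interrupted playAudio() call resolves with success: false
  await session.audio.stopAudio();
  const result = await session.audio.playAudio({ audioUrl });
  if (!result.success) {
    session.logger.error(`Playback did not complete: ${result.error}`);
  }
}
```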
Error Handling
const result = await session.audio.playAudio({
  audioUrl: "https://example.com/sound.mp3"
});
if (!result.success) {
  // This could be because the audio was interrupted by session.audio.stopAudio(), or there could be an error
  if (result.error) {
    // Handle specific audio errors
    session.logger.error(`Audio error: ${result.error}`);
    // Provide user feedback
    await session.audio.speak("Sorry, I couldn't play that audio file.");
  }
}
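If a clip fails to load (for example, because of a bad URL), you may want to retry once and then fall back to spoken feedback. This is a rough sketch with a hypothetical playWithFallback helper; remember that an intentional stopAudio() interruption also reports success: false, so this pattern fits best where no interruption is expected:
```typescript
import { AppSession } from "@mentra/sdk";

// Hypothetical helper: retry a failed playback once, then fall back to TTS feedback.
async function playWithFallback(session: AppSession, audioUrl: string): Promise<void> {
  for (let attempt = 1; attempt <= 2; attempt++) {
    try {
      const result = await session.audio.playAudio({ audioUrl });
      if (result.success) return;
      session.logger.error(`Attempt ${attempt} failed: ${result.error}`);
    } catch (error) {
      session.logger.error(`Attempt ${attempt} threw: ${error}`);
    }
  }
  // Both attempts failed: tell the user instead of failing silently
  await session.audio.speak("Sorry, I couldn't play that audio file.");
}
```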
Next Steps
- See the detailed Audio Manager documentation
- Explore Device Capabilities to adapt audio features based on hardware
- Learn about Events to create voice-activated audio experiences
- Review Permissions for any audio-related permissions your app might need