
Audio Tutorial

Learn how to build MentraOS Apps that can:

  1. πŸ”Š Play audio files from URLs on connected smart glasses or phone
  2. πŸ—£οΈ Convert text to speech and play it through the glasses speakers or phone
  3. ⏹️ Stop audio playback when needed

Audio routes through your phone by default. To route audio through the Mentra Live glasses instead, connect them in your phone's Bluetooth settings like any other Bluetooth headphones. This is separate from pairing the glasses with MentraOS.


Prerequisites

  1. MentraOS SDK β‰₯ 2.1.2 installed in your project
  2. A local development environment configured as described in Getting Started

1 - Set up the Project

Copy the basic project structure from the Quickstart if you haven't already. We'll focus on the contents of src/index.ts.


2 - Playing Audio from URLs

The most straightforward way to play audio is from a publicly accessible URL.

src/index.ts
import { AppServer, AppSession } from "@mentra/sdk";

class AudioDemoServer extends AppServer {
  protected async onSession(
    session: AppSession,
    sessionId: string,
    userId: string,
  ): Promise<void> {
    session.logger.info(`πŸ”Š Audio demo session started for ${userId}`);

    // Example: Play a notification sound
    try {
      const result = await session.audio.playAudio({
        audioUrl: "https://okgodoit.com/cool.mp3"
      });

      if (result.success) {
        session.logger.info(`βœ… Audio played successfully`);
        if (result.duration) {
          session.logger.info(`⏱️ Duration: ${result.duration} ms`);
        }
      } else {
        session.logger.error(`❌ Audio playback failed: ${result.error}`);
      }
    } catch (error) {
      session.logger.error(`Exception during audio playback: ${error}`);
    }
  }
}

// Bootstrap the server
new AudioDemoServer({
  packageName: process.env.PACKAGE_NAME ?? "com.example.audio",
  apiKey: process.env.MENTRAOS_API_KEY!,
  port: Number(process.env.PORT ?? "3000"),
}).start();

3 - Text-to-Speech (TTS)

Convert any text to natural-sounding speech using ElevenLabs and play it on the glasses.

src/index.ts
import { AppServer, AppSession } from "@mentra/sdk";

class TTSServer extends AppServer {
  protected async onSession(
    session: AppSession,
    sessionId: string,
    userId: string,
  ): Promise<void> {
    session.logger.info(`πŸ—£οΈ TTS demo session started`);

    // Basic text-to-speech
    try {
      const result = await session.audio.speak("Welcome to Mentra OS! This is your audio assistant.");

      if (result.success) {
        session.logger.info("βœ… Speech synthesis successful");
      } else {
        session.logger.error(`❌ TTS failed: ${result.error}`);
      }
    } catch (error) {
      session.logger.error(`Exception during TTS: ${error}`);
    }

    // Advanced TTS with custom voice settings
    try {
      const result = await session.audio.speak(
        "This message uses custom voice settings for a different sound.",
        {
          voice_id: "your_elevenlabs_voice_id", // Optional: specific ElevenLabs voice
          model_id: "eleven_flash_v2_5", // Optional: specific model
          voice_settings: { // each setting is optional
            stability: 0.7, // Voice consistency (0.0-1.0)
            similarity_boost: 0.8, // Voice similarity (0.0-1.0)
            style: 0.3, // Speaking style (0.0-1.0)
            speed: 0.9 // Speaking speed (0.25-4.0)
          }
        }
      );

      if (result.success) {
        session.logger.info("βœ… Advanced TTS successful");
      }
    } catch (error) {
      session.logger.error(`Exception during advanced TTS: ${error}`);
    }
  }
}

TTS Configuration Options

| Option | Type | Default | Description |
|---|---|---|---|
| `voice_id` | string | Server default | ElevenLabs voice ID |
| `model_id` | string | `eleven_flash_v2_5` | TTS model to use (see models below) |
| `voice_settings.stability` | number | 0.5 | Voice stability and randomness (0.0-1.0). Lower values allow a broader emotional range; higher values can sound monotonous |
| `voice_settings.similarity_boost` | number | 0.75 | How closely the AI adheres to the original voice (0.0-1.0) |
| `voice_settings.style` | number | 0.0 | Style exaggeration of the voice (0.0-1.0). Amplifies the original speaker's style but increases latency |
| `voice_settings.use_speaker_boost` | boolean | false | Boosts similarity to the original speaker; increases computational load and latency |
| `voice_settings.speed` | number | 1.0 | Playback speed: 1.0 = normal, <1.0 = slower, >1.0 = faster |
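Because each numeric setting has a documented range, it can be worth validating values before passing them to `speak`. Below is a small sketch of that idea; the `VoiceSettings` shape mirrors the table above, but the `clampVoiceSettings` helper name is ours, not part of the SDK:

```typescript
interface VoiceSettings {
  stability?: number;        // 0.0-1.0
  similarity_boost?: number; // 0.0-1.0
  style?: number;            // 0.0-1.0
  use_speaker_boost?: boolean;
  speed?: number;            // 0.25-4.0
}

// Clamp a number into [min, max].
const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Return a copy of the settings with every numeric field forced into
// its documented range, so out-of-range values never reach the API.
function clampVoiceSettings(settings: VoiceSettings): VoiceSettings {
  const result: VoiceSettings = { ...settings };
  if (result.stability !== undefined) result.stability = clamp(result.stability, 0, 1);
  if (result.similarity_boost !== undefined) result.similarity_boost = clamp(result.similarity_boost, 0, 1);
  if (result.style !== undefined) result.style = clamp(result.style, 0, 1);
  if (result.speed !== undefined) result.speed = clamp(result.speed, 0.25, 4.0);
  return result;
}
```

You could then call `session.audio.speak(text, { voice_settings: clampVoiceSettings(userSettings) })` and accept arbitrary user input safely.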

Available TTS Models

| Model | Description | Languages | Latency |
|---|---|---|---|
| `eleven_v3` | Human-like and expressive speech generation | 70+ languages | Standard |
| `eleven_flash_v2_5` | Ultra-fast model optimized for real-time use | All multilingual_v2 languages + hu, no, vi | ~75 ms |
| `eleven_flash_v2` | Ultra-fast model (English only) | en | ~75 ms |
| `eleven_turbo_v2_5` | High quality, low latency, good balance | Same as flash_v2_5 | ~250-300 ms |
| `eleven_turbo_v2` | High quality, low latency (English only) | en | ~250-300 ms |
| `eleven_multilingual_v2` | Most lifelike, with rich emotional expression | en, ja, zh, de, hi, fr, ko, pt, it, es, id, nl, tr, fil, pl, sv, bg, ro, ar, cs, el, fi, hr, ms, sk, da, ta, uk, ru | Standard |
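If your app speaks multiple languages, you can pick a `model_id` from the table based on the user's language. The sketch below encodes the language lists above; the `pickTtsModel` helper name is ours, not part of the SDK:

```typescript
// Languages supported by eleven_multilingual_v2 (from the table above).
const MULTILINGUAL_V2_LANGS = new Set([
  "en", "ja", "zh", "de", "hi", "fr", "ko", "pt", "it", "es", "id", "nl",
  "tr", "fil", "pl", "sv", "bg", "ro", "ar", "cs", "el", "fi", "hr", "ms",
  "sk", "da", "ta", "uk", "ru",
]);

// eleven_flash_v2_5 adds Hungarian, Norwegian, and Vietnamese on top of
// the multilingual_v2 set.
const FLASH_V2_5_EXTRA_LANGS = new Set(["hu", "no", "vi"]);

// Pick the fastest model that covers the given ISO language code,
// falling back to eleven_v3 (70+ languages) for everything else.
function pickTtsModel(lang: string): string {
  if (MULTILINGUAL_V2_LANGS.has(lang) || FLASH_V2_5_EXTRA_LANGS.has(lang)) {
    return "eleven_flash_v2_5"; // ~75 ms latency
  }
  return "eleven_v3";
}
```

You would then pass the result as `model_id` in the `speak` options, e.g. `session.audio.speak(text, { model_id: pickTtsModel("ja") })`.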

4 - Interactive Audio App

Here's a complete example that combines voice activation with audio responses:

src/index.ts
import { AppServer, AppSession } from "@mentra/sdk";

class InteractiveAudioApp extends AppServer {
  protected async onSession(
    session: AppSession,
    sessionId: string,
    userId: string,
  ): Promise<void> {
    session.logger.info(`🎀 Interactive audio app started`);

    // Welcome message
    await session.audio.speak("Welcome to the interactive audio demo. Say 'play music' or 'tell me a joke'.");

    // Listen for voice commands
    const unsubscribe = session.events.onTranscription(async (data) => {
      // Check for "stop" even on interim results so playback halts quickly
      if (data.text.toLowerCase().includes("stop")) {
        await session.audio.stopAudio();
        return;
      }

      if (!data.isFinal) return;

      const command = data.text.toLowerCase().trim();
      session.logger.info(`Heard: "${command}"`);

      if (command.includes("play music")) {
        await this.playMusic(session);
      } else if (command.includes("tell me a joke") || command.includes("joke")) {
        await this.tellJoke(session);
      } else if (command.includes("hello")) {
        await session.audio.speak("Hello there! How can I help you today?");
      }
    });

    // Clean up listener when session ends
    this.addCleanupHandler(unsubscribe);
  }

  private async playMusic(session: AppSession): Promise<void> {
    session.layouts.showTextWall("🎡 Playing music...");

    try {
      const result = await session.audio.playAudio({
        audioUrl: "https://example.com/background-music.mp3",
        volume: 0.6
      });

      if (result.success) {
        await session.audio.speak("Hope you enjoyed the music!");
      } else {
        await session.audio.speak("Sorry, I couldn't play the music right now.");
      }
    } catch (error) {
      session.logger.error(`Music playback error: ${error}`);
      await session.audio.speak("There was an error playing music.");
    }
  }

  private async tellJoke(session: AppSession): Promise<void> {
    const jokes = [
      "What do you call a pair of glasses that can see the future? ... Pre-scription glasses!",
      "Why did the augmented reality app break up with the virtual reality app? ... It said the relationship wasn't real enough!",
      "Why did the phone pair to the smart glasses? ... Because it lost its contacts!",
    ];

    const joke = jokes[Math.floor(Math.random() * jokes.length)];
    session.layouts.showTextWall("Telling a joke...");

    await session.audio.speak(joke, {
      voice_settings: {
        style: 0.8, // More expressive for jokes
        speed: 0.9 // Slightly slower for comedy timing
      }
    });
  }
}

// Bootstrap the server
new InteractiveAudioApp({
  packageName: process.env.PACKAGE_NAME ?? "com.example.interactiveaudio",
  apiKey: process.env.MENTRAOS_API_KEY!,
  port: Number(process.env.PORT ?? "3000"),
}).start();

5 - Audio Management

Stopping Audio

// Stop all currently playing audio
session.audio.stopAudio();

Error Handling

const result = await session.audio.playAudio({
  audioUrl: "https://example.com/sound.mp3"
});

if (!result.success) {
  // success is false either because playback was interrupted by
  // session.audio.stopAudio() or because an actual error occurred
  if (result.error) {
    // Handle specific audio errors
    session.logger.error(`Audio error: ${result.error}`);

    // Provide user feedback
    await session.audio.speak("Sorry, I couldn't play that audio file.");
  }
}
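
For transient failures (e.g. a slow audio host), you may want to retry before giving up. Here is a minimal sketch of that pattern; the `playWithRetry` wrapper and `PlayResult` shape are ours (the result type mirrors the `success`/`error` fields used above, not an exact SDK type):

```typescript
interface PlayResult {
  success: boolean;
  error?: string;
}

// Retry a play function up to `attempts` times, returning the first
// successful result or the last failure.
async function playWithRetry(
  play: () => Promise<PlayResult>,
  attempts = 3,
): Promise<PlayResult> {
  let last: PlayResult = { success: false, error: "not attempted" };
  for (let i = 0; i < attempts; i++) {
    last = await play();
    if (last.success) return last;
  }
  return last;
}
```

Usage would look like `await playWithRetry(() => session.audio.playAudio({ audioUrl }))`. Note that a retry is only appropriate for real errors; if playback was deliberately interrupted by `stopAudio()`, retrying would replay audio the user just stopped.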

Next Steps

  • See the detailed Audio Manager documentation
  • Explore Device Capabilities to adapt audio features based on hardware
  • Learn about Events to create voice-activated audio experiences
  • Review Permissions for any audio-related permissions your app might need