Voice Commands

Learn how to use your voice as a trigger. Build custom commands that perform actions or capture information.

By default, all voice transcription happens on device using a local Whisper model. It should work across multiple languages. You can improve quality by changing the underlying model or by adding commonly used words (see Transcription).

The Advanced Settings show the configuration for voice commands. Here you can choose the model and whether to send audio data directly to the AI model (this is only supported by OpenAI's gpt-4o-audio-preview and should be considered beta). For more information on these settings, see Transcription.

Once you finish the Quick Start, voice commands should work right away. Try holding the hotkey and saying: "Write an email to my co-worker about the benefits of coffee" and see what happens.

By default, the "Process Audio" command is used to handle your command. Let's dive into this command and see how it works. Go to "Actions", and select "Process Audio".

Process Audio is an Ask AI action that uses the default AI provider and can choose from a number of actions to perform. Each action usually corresponds to a specific command. By default, it has these actions:

Ask ChatGPT / Claude / Perplexity
Draft Email
Add Apple Reminder
Open Browser
Get Selected Text
Paste at Cursor
Send Notification
Open Application
Take Screenshot
Do Nothing
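
One way to picture how an action like Process Audio routes a spoken command: the AI picks one of the listed actions by name, and the app runs the matching handler. The sketch below is only an illustration under assumptions. The `ACTIONS` registry, the `dispatch` function, and the shape of the `choice` dictionary are hypothetical and are not the app's actual implementation; only the action names come from the list above.

```python
from typing import Callable, Dict

# Hypothetical registry: action names from the list above mapped to stand-in
# handlers. Real handlers would draft an email, open a browser, and so on.
ACTIONS: Dict[str, Callable[[str], str]] = {
    "Draft Email": lambda arg: f"drafting email: {arg}",
    "Open Browser": lambda arg: f"opening browser at: {arg}",
    "Send Notification": lambda arg: f"notifying: {arg}",
    "Do Nothing": lambda arg: "no action taken",
}


def dispatch(choice: dict) -> str:
    """Run the handler named by the AI's choice; fall back to Do Nothing."""
    handler = ACTIONS.get(choice.get("action", ""), ACTIONS["Do Nothing"])
    return handler(choice.get("argument", ""))


# Assumed shape of the AI's decision for the coffee email example.
choice = {"action": "Draft Email", "argument": "the benefits of coffee"}
print(dispatch(choice))
```

Falling back to "Do Nothing" for an unrecognized name mirrors the role that action appears to play in the default list: a safe choice when no other action fits.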

You can add or customize these actions as you please. Below the actions, you see the prompt, which controls the AI's main behavior. This too can be fully customized.

In the prompt, you can specify the behavior you are looking for. The {{ value }} placeholders in the prompt are replaced by variables before the prompt is sent to the AI. In this case, {{ originalInput }} is your spoken command; your name and some time information are also inserted. For more information on templating, see the Templating page.
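
To make the placeholder substitution concrete, here is a minimal sketch of how `{{ name }}` placeholders could be filled in before a prompt is sent. It is an assumption-laden illustration, not the app's templating engine (see the Templating page for the real syntax). The `render` function and the `userName` variable are hypothetical; `originalInput` is the variable named above.

```python
import re

# Matches {{ name }} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Replace each {{ name }} with its value; leave unknown names untouched."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(variables.get(name, match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


# userName is a hypothetical variable name used only for this example.
prompt = "User {{ userName }} said: {{ originalInput }}"
print(render(prompt, {"userName": "Alex", "originalInput": "Open my browser"}))
# prints: User Alex said: Open my browser
```

Leaving unknown placeholders in place, rather than replacing them with an empty string, makes a misspelled variable name easy to spot in the rendered prompt.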

