Inbox AI

Voice Commands

Learn how to use your voice as triggers. Build custom commands to perform actions, or capture information.

Last updated 7 months ago

By default, all voice transcription happens on device using a local Whisper model. It should work across multiple languages. You can improve quality by changing the underlying model or by listing commonly used words (see Transcription).

In the Advanced Settings, you can see the configuration for voice commands. Here you can configure the model and whether to send audio data directly to the AI model (this is only supported by OpenAI's gpt-4o-audio-preview and should be considered beta). For more information on these settings, see Transcription.

Once you finish the Quick Start, voice commands should work right away. Try holding the hotkey and saying "Write an email to my co-worker about the benefits of coffee", then see what happens.

By default, the "Process Audio" action handles your spoken command. Let's dive into this action and see how it works. Go to "Actions" and select "Process Audio".

Process Audio is an Ask AI action that uses the default AI provider and has a number of actions it can choose to perform. Each action usually corresponds to a specific command. By default it has these actions:

  • Ask ChatGPT / Claude / Perplexity

  • Draft Email

  • Add Apple Reminder

  • Open Browser

  • Get Selected Text

  • Paste at Cursor

  • Send Notification

  • Open Application

  • Take Screenshot

  • Do Nothing
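
To make the idea of "the AI chooses one of the available actions" concrete, here is a minimal sketch in Python. It is not Inbox AI's implementation: the handler functions, the `ACTIONS` table, and the `dispatch` helper are all hypothetical, and falling back to "Do Nothing" for an unrecognized choice is an assumption made for the example.

```python
from typing import Callable, Dict

# Hypothetical handlers standing in for three of the default actions.
def draft_email(text: str) -> str:
    return f"[Draft Email] {text}"

def add_apple_reminder(text: str) -> str:
    return f"[Add Apple Reminder] {text}"

def do_nothing(text: str) -> str:
    return ""

# Hypothetical table mapping action names to handlers.
ACTIONS: Dict[str, Callable[[str], str]] = {
    "Draft Email": draft_email,
    "Add Apple Reminder": add_apple_reminder,
    "Do Nothing": do_nothing,
}

def dispatch(chosen_action: str, payload: str) -> str:
    """Run the handler for the action name the model picked.

    Unknown names fall back to do_nothing (an assumption for this sketch).
    """
    handler = ACTIONS.get(chosen_action, do_nothing)
    return handler(payload)

print(dispatch("Draft Email", "Benefits of coffee"))
```

In this sketch, adding a new voice command amounts to registering one more entry in the table, which mirrors how each action in the list usually corresponds to one command.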

You can add or customize these actions as you please. Below the actions, you see the prompt, which controls the AI's main behavior. This too can be fully customized.

In the prompt, you can specify the behavior you are looking for. The {{ value }} placeholders in the prompt are replaced by variables before the prompt is sent to the AI. In this case, {{ originalInput }} is your spoken command; your name and some time information are also inserted. For more information on templating, see the Templating page.
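
The placeholder substitution described above can be illustrated with a short Python sketch. This is only an approximation of the idea, not Inbox AI's templating engine (see the Templating page for the real syntax and variables). The `render_prompt` function and the `userName` variable are hypothetical; only `originalInput` comes from the text above. Leaving unknown placeholders untouched is likewise an assumption.

```python
import re
from typing import Dict

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render_prompt(template: str, variables: Dict[str, str]) -> str:
    """Replace each {{ name }} with variables[name].

    Placeholders with no matching variable are left unchanged
    (an assumption made for this sketch).
    """
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        return variables.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template)

template = "{{ userName }} said: {{ originalInput }}"
variables = {
    "originalInput": "Write an email to my co-worker about the benefits of coffee",
    "userName": "Alex",  # hypothetical variable name
}
print(render_prompt(template, variables))
```

The prompt template stays fixed while the variables change with every spoken command, which is why editing the prompt once changes the behavior of all future commands.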

Audio settings
The process audio action