
Screen


Your screen is an input. Press the Vision hotkey, wait for the display to dim, then drag a region (or capture a full window or monitor). The image becomes raw material for an AI that can describe it, extract its text, analyze its structure, or append it to a note.

Screen capture overlay with the dashboard and Control Panel visible

TIP
Vision pairs naturally with voice. After capturing, you can record a spoken question (*"What does this error mean?"*, *"Summarize the data in this table"*) before the AI processes the image.

How to capture the screen

Press the Vision hotkey from any app:

Ctrl+Alt+S (default, rebindable in Settings → Hotkeys)

The screen dims and a transparent snipping overlay appears with a hint bar:

Snipping overlay hint bar

Gesture | What it captures
Click on a window | That specific application window
Click and drag | A custom rectangular region
Shift + drag | A freeform shape
Press F | The full active monitor
Press A | All monitors as one wide image
Press Esc | Cancels and returns to your work
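
To make the region and all-monitor gestures concrete, here is a minimal sketch of the geometry involved. It is not the app's actual implementation: the `Rect` type and the `rect_from_drag` and `union_of_monitors` helpers are hypothetical names introduced only for illustration, and coordinates are assumed to be integer pixels with the origin at the top-left.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical pixel rectangle: top-left corner plus size."""
    left: int
    top: int
    width: int
    height: int


def rect_from_drag(start, end):
    """Turn a drag (start point, end point) into a rectangle.

    The drag may go in any direction, so the top-left corner is the
    minimum of each coordinate rather than simply the start point.
    """
    (x0, y0), (x1, y1) = start, end
    return Rect(min(x0, x1), min(y0, y1), abs(x1 - x0), abs(y1 - y0))


def union_of_monitors(monitors):
    """Smallest rectangle that covers every monitor (the "one wide image")."""
    left = min(m.left for m in monitors)
    top = min(m.top for m in monitors)
    right = max(m.left + m.width for m in monitors)
    bottom = max(m.top + m.height for m in monitors)
    return Rect(left, top, right - left, bottom - top)


# Dragging up and to the left still yields a well-formed region.
region = rect_from_drag((300, 200), (100, 50))

# Two side-by-side monitors of different sizes (assumed layout).
desktop = union_of_monitors([Rect(0, 0, 1920, 1080), Rect(1920, 0, 2560, 1440)])
```

Normalizing the drag first means the rest of a capture pipeline never has to deal with negative widths or heights.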

After you make a selection, the overlay closes and the Vision Action panel opens.

The Vision Action panel

The action panel lets you (optionally) type or record a question, then pick what the AI should do with the screenshot:

Vision Action panel with Save / OCR / Edit / Clip / Chat / Note buttons

Vision source and capture-mode pickers

Vision source picker — Image / Video / Color

Capture mode picker — Region / Full Screen

Outputs that consume the screen

Action | What it does | Destination
OCR | Extracts every character from the screenshot | Clipboard / cursor
Describe (Clip / Chat) | AI describes what it sees in natural language | Toast / Quick Chat
Save | Writes the screenshot to disk | Configured save folder
Note | Appends the image + your spoken description to your notes file | Note
Chat | Attaches the image to a Quick Chat conversation | Quick Chat

Color picker and video capture

The Vision hotkey family also includes two specialized tools:

  • Color Picker — a pixel magnifier cursor that samples colors from your screen, with a swatches tray and keyboard shortcuts.
  • Video Recording Bar — a small floating timer/bar for capturing short screen recordings.

Color picker — three colors sampled

Color picker in detail

The magnifier tooltip shows the hex and RGB values of the single pixel under the cursor, updating live as you move:

Color picker magnifier — white pixel

Color picker magnifier — orange pixel
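
The hex and RGB readouts are two notations for the same three 8-bit channels. The sketch below shows the conversion in both directions; the function names are hypothetical and the orange value is just an example, not necessarily the pixel in the screenshot above.

```python
def rgb_to_hex(r, g, b):
    """Format three 0-255 channel values as an uppercase #RRGGBB string."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    return f"#{r:02X}{g:02X}{b:02X}"


def hex_to_rgb(value):
    """Parse a #RRGGBB (or RRGGBB) string back into an (r, g, b) tuple."""
    digits = value.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {value!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


white = rgb_to_hex(255, 255, 255)   # "#FFFFFF"
orange = rgb_to_hex(255, 165, 0)    # "#FFA500"
```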

Swatches accumulate as you click:

Color picker swatches — 5 colors picked

Local vs. cloud vision

Vision runs on a multimodal AI model:

  • Cloud: Gemini Flash (wallet or BYOK), or OpenAI GPT-4o (BYOK)
  • Local: Ollama with minicpm-v or moondream (OCR is fully supported only on minicpm-v)

Configure in Settings → AI Engine → Vision.
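
For a sense of what the local path involves, here is a rough sketch of sending a saved screenshot to a locally running Ollama server yourself. It illustrates Ollama's public `/api/generate` endpoint, not how the app talks to the model internally. The helper names, the prompt, and the file name are assumptions, and the URL is Ollama's default local address.

```python
import base64
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_vision_request(image_bytes, prompt, model="minicpm-v"):
    """Build the JSON body for a single, non-streaming vision request.

    Ollama expects images as base64-encoded strings in the "images" list.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ask_local_model(image_path, prompt, model="minicpm-v"):
    """Send a screenshot to the local model and return its text answer."""
    with open(image_path, "rb") as f:
        payload = build_vision_request(f.read(), prompt, model)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Example (requires a running Ollama server with the model pulled):
# print(ask_local_model("screenshot.png", "Extract all text from this image."))
```

Building the payload separately from sending it makes it easy to inspect or log the request without a server running.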

NOTE
Local Vision models are smaller and quantized, so OCR accuracy and long-context analysis are noticeably stronger on the cloud path.