How does dIKta.me work offline?

dIKta.me runs Whisper V3 Turbo and local LLMs (Gemma 3, Llama 3) directly on your GPU. In local mode, which is the default, no audio or text data leaves your machine.

What operating systems does dIKta.me support?

dIKta.me is available for Windows 10+ (x64). Support for macOS and Linux is on the roadmap.

How much does dIKta.me cost?

The free trial includes cloud credits. The full version is a $20 one-time purchase that covers unlimited local dictation, all voice modes, and lifetime updates. No subscription is required.

What languages does dIKta.me support for speech recognition?

Whisper V3 Turbo supports 90+ languages with automatic language detection. Bidirectional English-Spanish translation is built in.

Do I need an NVIDIA GPU to use dIKta.me?

An NVIDIA GPU is recommended for the fastest local STT and LLM processing. However, dIKta.me also works on CPU (slower) and offers a cloud mode with wallet credits for users without a powerful GPU.

THE JOURNEY

Built to Evolve.

From a Python + Electron prototype to a native Windows engine — and a roadmap that goes much further. Here's the full story.

The Architecture Leap

V1 proved the concept. V2 is the real engine. Same vision, completely rebuilt.

Memory Footprint

~300 MB → ~60 MB

Startup Time

10–12 s → < 3 s

Installer Size

~200 MB → ~70 MB

Test Coverage

~50 tests → 1,014 tests

Technical comparison: dIKta.me V1 vs V2
Metric	V1 (Prototype)	V2 (Native Engine)
Architecture	Python + Electron + ZeroMQ (3 processes)	C# + WinUI 3 (single process)
Memory	~300 MB	~50–80 MB
Startup	10–12 s (model warmup)	< 3 s (cloud mode)
Installer	~200 MB	~70 MB (self-contained)
Audio Stack	pyaudio + pycaw wrappers	Native NAudio + WASAPI
STT Options	Whisper only (local)	Whisper + Deepgram streaming + Gemini Audio
LLM Options	Ollama only	Ollama, Gemini, Anthropic, OpenAI, OpenRouter + more
Text-to-Speech	None	KokoroSharp (local) + Deepgram / OpenAI / Gemini (cloud)
Secret Storage	Electron safeStorage	DPAPI (OS-level, AES-256)
Test Suite	~50 pytest tests	1,014 xUnit tests (enterprise-grade)

What's New in V2

The rewrite wasn't just a port — it shipped an entirely new feature set.

💬

Overlay

Quick Chat

Floating AI chat window activated by hotkey. Text or voice input, Markdown output.

🔊

Voice Output

Text-to-Speech

dIKta.me speaks back — 5 voice engines including fully local Kokoro ONNX.

🎙️

Productivity

Voice Macros

Say a trigger phrase, get a full text block injected. Signatures, templates, addresses.
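As a rough illustration of the idea (not dIKta.me's actual implementation), a voice macro can be modeled as a lookup from a normalized trigger phrase to a text block. The macro table, the `normalize` helper, and the `expand` function below are all hypothetical names chosen for this sketch.

```python
import re

# Hypothetical macro table: trigger phrase -> text block to inject.
MACROS = {
    "insert signature": "Best regards,\nAlex Example",
    "home address": "123 Example Street, Springfield",
}


def normalize(phrase: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so that
    'Insert signature.' matches the key 'insert signature'."""
    cleaned = re.sub(r"[^\w\s]", "", phrase.lower())
    return " ".join(cleaned.split())


def expand(transcript: str) -> str:
    """Return the macro text if the whole transcript is a trigger phrase;
    otherwise return the transcript unchanged."""
    return MACROS.get(normalize(transcript), transcript)
```

Normalizing before the lookup matters because speech-to-text output often adds capitalization and trailing punctuation that the user never intended as part of the trigger.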

🔇

Audio

Audio Ducking

Automatically lowers the volume of other apps while a recording is active.

🧙

Setup

First-Run Wizard

Guided STT / LLM / TTS stack configuration so anyone can be up in minutes.

👁️

Vision

Vision Core

Capture any screen region. Describe, extract text, read tables, or ask questions about what you see.

🔑

Account

Account & Wallet

OAuth login + managed cloud credits. Pay as you go, no subscriptions.

🧪

Quality

1,014 Tests

Enterprise-grade test coverage from day one. Build with confidence.

The Plugin Roadmap

What's Next

V2.1+ is a modular leap. Each phase is a hot-pluggable plugin that ships independently.

🔌

Phase 2 · Spec 15 · In Progress

Connectors

Route your voice directly into the tools you already use. No copy-paste, no context switching.

›Obsidian integration — dictate directly into your vault, tagged and linked.
›Webhooks, Discord, and Streamer.bot support for live broadcasting workflows.
›Hot-pluggable: enable or disable each connector without restarting the app.
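One way to picture hot-pluggable connectors is a registry that keeps every connector loaded but only dispatches to the ones currently enabled. This is a minimal sketch under that assumption; `ConnectorRegistry` and its methods are hypothetical and do not describe the real plugin API.

```python
from typing import Callable, Dict, List, Set

# A connector is modeled here as any callable that receives the dictated text.
Connector = Callable[[str], None]


class ConnectorRegistry:
    """Holds named connectors and tracks which ones are currently enabled."""

    def __init__(self) -> None:
        self._connectors: Dict[str, Connector] = {}
        self._enabled: Set[str] = set()

    def register(self, name: str, connector: Connector) -> None:
        self._connectors[name] = connector

    def enable(self, name: str) -> None:
        if name not in self._connectors:
            raise KeyError(f"unknown connector: {name}")
        self._enabled.add(name)

    def disable(self, name: str) -> None:
        # discard() is a no-op if the connector was not enabled.
        self._enabled.discard(name)

    def dispatch(self, text: str) -> List[str]:
        """Send text to every enabled connector; return the names used."""
        delivered = []
        for name in sorted(self._enabled):
            self._connectors[name](text)
            delivered.append(name)
        return delivered
```

Because enabling and disabling only changes set membership, toggling a connector takes effect on the next dispatch without reloading anything.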

🎙️

Phase 3 · Spec 15 · Planned

Meetings & Scribe

A dedicated workspace that turns your meetings into structured, searchable artifacts.

›One-click session recording with automatic speaker diarization.
›AI-generated summaries, action items, and decisions — locally, privately.
›Screenshot capture mid-meeting; attach context snapshots to the transcript.

🧠

Phase 4 · Spec 15 · Planned

Memory Layer

dIKta.me learns what matters to you. Cross-session semantic recall with zero cloud dependency.

›SQLite + vector search: store facts, preferences, and recurring context locally.
›Pipeline hooks surface relevant memories automatically before each LLM call.
›Full user control: review, edit, or wipe stored memories at any time.
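The three points above (local storage, recall before an LLM call, and user-controlled wiping) can be sketched with Python's built-in `sqlite3` module. This is an assumption-heavy illustration: the `embed` function is a toy letter-frequency stand-in for a real embedding model, the similarity search is a brute-force scan rather than a vector index, and the `MemoryStore` class and table schema are hypothetical.

```python
import json
import math
import sqlite3
from typing import List


def embed(text: str) -> List[float]:
    """Toy stand-in for an embedding model: a 26-dim letter-frequency vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryStore:
    """Stores memories and their embeddings in a local SQLite database."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            "id INTEGER PRIMARY KEY, text TEXT NOT NULL, embedding TEXT NOT NULL)"
        )

    def add(self, text: str) -> int:
        cur = self.db.execute(
            "INSERT INTO memories (text, embedding) VALUES (?, ?)",
            (text, json.dumps(embed(text))),
        )
        self.db.commit()
        return cur.lastrowid

    def recall(self, query: str, k: int = 3) -> List[str]:
        """Return the k stored memories most similar to the query."""
        q = embed(query)
        rows = self.db.execute("SELECT text, embedding FROM memories").fetchall()
        ranked = sorted(rows, key=lambda r: cosine(q, json.loads(r[1])), reverse=True)
        return [text for text, _ in ranked[:k]]

    def wipe(self) -> None:
        """Delete every stored memory."""
        self.db.execute("DELETE FROM memories")
        self.db.commit()
```

A pipeline hook in this model would call `recall` with the user's prompt and prepend the returned memories to the LLM context. A brute-force scan is fine for a few thousand rows; a larger store would need a proper vector index.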

✍️

Phase 5 · Spec 16 · Planned

Advanced Refine

System-wide grammar and style checking powered by your existing LLM — no expensive subscription tools needed.

›Hotkey-triggered inline diff popup with per-word accept / reject.
›Passive clipboard monitoring — catch errors in text you copy anywhere.
›Clipboard-based, so it works in any Windows app that supports copy and paste, with no accessibility hacks.
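The per-word accept / reject flow can be approximated with a word-level diff between the original text and the LLM's suggestion. The sketch below uses Python's standard `difflib.SequenceMatcher`; the function names and the `accept` callback are hypothetical, and the real popup may compute its diff differently.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

Edit = Tuple[str, List[str], List[str]]  # (operation, old words, new words)


def word_diff(original: str, suggestion: str) -> List[Edit]:
    """Split both texts on whitespace and return word-level edit operations."""
    a, b = original.split(), suggestion.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [(op, a[i1:i2], b[j1:j2]) for op, i1, i2, j1, j2 in matcher.get_opcodes()]


def apply_edits(edits: List[Edit], accept: Callable[[int], bool]) -> str:
    """Rebuild the text. accept(n) decides whether the n-th change
    (counting only non-equal edits, from zero) is applied."""
    out: List[str] = []
    change_index = 0
    for op, old, new in edits:
        if op == "equal":
            out.extend(old)
        else:
            out.extend(new if accept(change_index) else old)
            change_index += 1
    return " ".join(out)
```

For example, with the original "Their going to the store" and the suggestion "They're going to the shop", accepting only the first change yields "They're going to the store".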

🤖

Phase 6 · Spec 17 · Ideation

Chaviz — Voice Orchestrator

A Jarvis-like conversational agent for dIKta.me. Bilingual, push-to-talk, tool-calling. Your system-aware AI companion.

›Push-to-talk, multi-turn voice conversations with session context.
›Native tool-calling: query stats, switch modes, recall memory, trigger connectors.
›Persona-configurable — concise or detailed, English or Spanish.

The roadmap evolves. Follow @diktameapp on X (x.com/diktameapp) or star the repo to stay updated.