Built to Evolve.
From a Python + Electron prototype to a native Windows engine — and a roadmap that goes much further. Here's the full story.
The Architecture Leap
V1 proved the concept. V2 is the real engine. Same vision, completely rebuilt.
| Metric | V1 (Prototype) | V2 (Native Engine) |
|---|---|---|
| Architecture | Python + Electron + ZeroMQ (3 processes) | C# + WinUI 3 (single process) |
| Memory | ~300 MB | ~50–80 MB |
| Startup | 10–12 s (model warmup) | < 3 s (cloud mode) |
| Installer | ~200 MB | ~70 MB (self-contained) |
| Audio Stack | pyaudio + pycaw wrappers | Native NAudio + WASAPI |
| STT Options | Whisper only (local) | Whisper + Deepgram streaming + Gemini Audio |
| LLM Options | Ollama only | Ollama, Gemini, Anthropic, OpenAI, OpenRouter + more |
| Text-to-Speech | None | KokoroSharp (local) + Deepgram / OpenAI / Gemini (cloud) |
| Secret Storage | Electron safeStorage | DPAPI (OS-level, AES-256) |
| Test Suite | ~50 pytest tests | 1,014 xUnit tests |
What's New in V2
The rewrite wasn't just a port — it shipped an entirely new feature set.
Quick Chat
Floating AI chat window activated by hotkey. Text or voice input, Markdown output.
Text-to-Speech
dIKta.me speaks back through 5 voice engines, including the fully local Kokoro ONNX.
Voice Macros
Say a trigger phrase, get a full text block injected. Signatures, templates, addresses.
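To make the idea concrete, here is a minimal sketch of trigger-phrase expansion in Python. It is not dIKta.me's implementation (V2 is written in C#); the `MACROS` table, the `normalize` and `expand` helpers, and the sample phrases are all hypothetical, and it assumes a macro fires only when the whole transcript matches a trigger.

```python
import re

# Hypothetical macro table: trigger phrase -> text block to inject.
MACROS = {
    "insert my signature": "Best regards,\nAlex Example",
    "insert home address": "123 Example Street\nSpringfield",
}


def normalize(phrase: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so small
    speech-to-text variations still match the same trigger."""
    cleaned = re.sub(r"[^\w\s]", "", phrase.lower())
    return " ".join(cleaned.split())


def expand(transcript: str) -> str:
    """Return the macro text when the whole transcript is a trigger phrase;
    otherwise return the transcript unchanged."""
    return MACROS.get(normalize(transcript), transcript)


if __name__ == "__main__":
    print(expand("Insert my signature."))
    print(expand("This is ordinary dictation."))
```

Normalizing before the lookup means "Insert my signature." and "insert my signature" resolve to the same macro, while ordinary dictation passes through untouched.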
Audio Ducking
Automatically lowers the volume of other apps while a recording is active.
First-Run Wizard
Guided STT / LLM / TTS stack configuration so anyone can be up and running in minutes.
Vision Core
Capture any screen region. Describe, extract text, read tables, or ask questions about what you see.
Account & Wallet
OAuth login + managed cloud credits. Pay as you go, no subscriptions.
1,014 Tests
An xUnit suite of 1,014 tests covers V2 from day one. Build with confidence.
What's Next
V2.1+ is a modular leap. Each phase is a hot-pluggable plugin that ships independently.
Connectors
Route your voice directly into the tools you already use. No copy-paste, no context switching.
- Obsidian integration: dictate directly into your vault, tagged and linked.
- Webhooks, Discord, and Streamer.bot support for live broadcasting workflows.
- Hot-pluggable: enable or disable each connector without restarting the app.
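As a rough illustration of what "hot-pluggable" could mean in practice, the Python sketch below keeps connectors in a registry and toggles them at runtime. The `ConnectorRegistry` class and its method names are hypothetical and are not taken from the dIKta.me codebase; real connectors would call Obsidian, a webhook, and so on instead of the stand-in handlers used here.

```python
from typing import Callable, Dict, List, Set


class ConnectorRegistry:
    """Hypothetical registry: connectors can be switched on and off
    while the app keeps running."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[str], None]] = {}
        self._enabled: Set[str] = set()

    def register(self, name: str, handler: Callable[[str], None]) -> None:
        # Registered connectors start disabled until the user enables them.
        self._handlers[name] = handler

    def enable(self, name: str) -> None:
        if name not in self._handlers:
            raise KeyError(f"unknown connector: {name}")
        self._enabled.add(name)

    def disable(self, name: str) -> None:
        self._enabled.discard(name)

    def dispatch(self, text: str) -> List[str]:
        """Send dictated text to every enabled connector and return
        the names that received it, in sorted order."""
        delivered = []
        for name in sorted(self._enabled):
            self._handlers[name](text)
            delivered.append(name)
        return delivered


if __name__ == "__main__":
    vault_notes: List[str] = []
    registry = ConnectorRegistry()
    registry.register("obsidian", vault_notes.append)  # stand-in for a vault writer
    registry.register("webhook", lambda text: None)    # stand-in for an HTTP POST
    registry.enable("obsidian")
    print(registry.dispatch("Meeting moved to Friday"))
    registry.disable("obsidian")
    print(registry.dispatch("Nobody is listening now"))
```

Because enabling and disabling only changes set membership, no process restart is needed in this sketch.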
Meetings & Scribe
A dedicated workspace that turns your meetings into structured, searchable artifacts.
- One-click session recording with automatic speaker diarization.
- AI-generated summaries, action items, and decisions, produced locally and privately.
- Screenshot capture mid-meeting; attach context snapshots to the transcript.
Memory Layer
dIKta.me learns what matters to you. Cross-session semantic recall with zero cloud dependency.
- SQLite + vector search: store facts, preferences, and recurring context locally.
- Pipeline hooks surface relevant memories automatically before each LLM call.
- Full user control: review, edit, or wipe stored memories at any time.
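The three points above can be sketched in a few lines of Python using the standard-library `sqlite3` module. Everything here is an assumption made for illustration: the `MemoryStore` class, the table layout, and the `before_llm_call` hook are hypothetical, and `embed` is a toy bag-of-words vector standing in for a real embedding model.

```python
import json
import math
import sqlite3
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Dict[str, float]:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return dict(Counter(text.lower().split()))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MemoryStore:
    """Hypothetical local store: memory text plus its vector, kept in SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memories "
            "(id INTEGER PRIMARY KEY, text TEXT NOT NULL, vector TEXT NOT NULL)"
        )

    def add(self, text: str) -> None:
        self.db.execute(
            "INSERT INTO memories (text, vector) VALUES (?, ?)",
            (text, json.dumps(embed(text))),
        )
        self.db.commit()

    def recall(self, query: str, k: int = 3) -> List[str]:
        """Return up to k stored memories, most similar to the query first."""
        q = embed(query)
        rows = self.db.execute("SELECT text, vector FROM memories").fetchall()
        scored = [(cosine(q, json.loads(vec)), text) for text, vec in rows]
        scored = [item for item in scored if item[0] > 0]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [text for _, text in scored[:k]]

    def wipe(self) -> None:
        """User control: delete every stored memory."""
        self.db.execute("DELETE FROM memories")
        self.db.commit()


def before_llm_call(store: MemoryStore, prompt: str) -> str:
    """Pipeline hook: prepend relevant memories to the prompt, if any exist."""
    memories = store.recall(prompt)
    if not memories:
        return prompt
    context = "\n".join(f"- {m}" for m in memories)
    return f"Relevant memories:\n{context}\n\n{prompt}"
```

A brute-force similarity scan like this is only reasonable for small stores; a production design would more likely use a dedicated vector index, which this sketch does not attempt to show.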
Advanced Refine
System-wide grammar and style checking powered by your existing LLM — no expensive subscription tools needed.
- Hotkey-triggered inline diff popup with per-word accept / reject.
- Passive clipboard monitoring: catch errors in text you copy anywhere.
- Clipboard-based, so it works in any Windows app that supports copy and paste, with no accessibility hacks.
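One way to picture the accept / reject flow is a word-level diff between the original text and the LLM's correction. The Python sketch below uses the standard-library `difflib.SequenceMatcher`; the `word_diff` and `apply_decisions` helpers are hypothetical names, and the sketch groups adjacent changed words into a single span rather than claiming to match the planned UI exactly.

```python
import difflib
from typing import List, Tuple

Op = Tuple[str, List[str], List[str]]  # (tag, old_words, new_words)


def word_diff(original: str, corrected: str) -> List[Op]:
    """Split both texts into words and describe how to turn one into the other."""
    a, b = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2]) for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


def apply_decisions(ops: List[Op], accept: List[bool]) -> str:
    """Rebuild the text: accepted changes take the corrected words,
    rejected changes keep the original words."""
    out: List[str] = []
    change_index = 0
    for tag, old, new in ops:
        if tag == "equal":
            out.extend(old)
        else:
            out.extend(new if accept[change_index] else old)
            change_index += 1
    return " ".join(out)


if __name__ == "__main__":
    ops = word_diff("teh cat sat on teh mat", "the cat sat on the mat")
    # Accept the first suggested change, reject the second.
    print(apply_decisions(ops, [True, False]))
```

Each non-`equal` entry in `ops` corresponds to one decision the user would make in the popup.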
Chaviz — Voice Orchestrator
A Jarvis-like conversational agent for dIKta.me. Bilingual, push-to-talk, tool-calling. Your system-aware AI companion.
- Push-to-talk, multi-turn voice conversations with session context.
- Native tool-calling: query stats, switch modes, recall memory, trigger connectors.
- Persona-configurable: concise or detailed, English or Spanish.
The roadmap evolves. Follow @diktameapp on X (x.com/diktameapp) or star the repo to stay updated.