1.0.1 Release
This release marks a significant milestone for Closed-LLM-Vtuber, featuring a complete rewrite of the backend and frontend with over 240 new commits, along with numerous enhancements and new features. If you were using a version before this, version v1.0.0 is basically a new app.
⚠️ Direct upgrades from older versions are impossible due to architectural changes. Please refer to our new documentation site for installation.
(v1.0.0 shipped with a bug that was found right after the release, so let's just ignore it and go with v1.0.1.)
✨ Highlights
- Vision Capability: Video chat with the AI.
- Desktop Pet Mode: A new Desktop Pet Mode lets you have your Vtuber companion directly on your desktop.
- Brand New Frontend: A completely redesigned frontend built with React, Chakra UI, and Vite offers a modern user experience. Available as web and desktop apps, located in the Closed-LLM-Vtuber-Web repository.
- Chat History Management: Implemented a system to store and retrieve conversation history, enabling persistent interactions with your AI.
- New LLM support: Many new (stateless) LLM providers are now supported (and refactored), including Ollama, OpenAI, Gemini, Claude, Mistral, DeepSeek, Zhipu, and llama.cpp.
- DeepSeek R1 Reasoning model support: The reasoning chain will be displayed but not spoken. See your waifu's inner thoughts!
- Major Backend Rewrite: The core of Closed-LLM-Vtuber has been rebuilt from the ground up, focusing on asynchronous operations, improved memory management, and a more modular architecture.
- Refactored Configuration: The `conf.yaml` file was restructured, and `config_alts` has been renamed to `characters`.
- TTS Preprocessor: Text inside `asterisks`, `brackets`, `parentheses`, and `angle brackets` will no longer be spoken by the TTS.
- Dependency management: Switched to `uv` for dependency management and removed unused dependencies such as `rich`, `playsound3`, and `sounddevice`.
- Documentation Site: A comprehensive documentation site is now live at https://closed-llm-vtuber.github.io/.
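To make the TTS Preprocessor highlight concrete, here is a minimal sketch of how text inside those delimiters could be stripped before synthesis. It is an illustration, not the project's actual implementation: the function name `strip_unspoken`, the pattern table, and the key `angle_brackets` are all hypothetical, and nested delimiters are not handled.

```python
import re

# Hypothetical pattern table, one entry per delimiter pair named above.
_IGNORE_PATTERNS = {
    "asterisks": re.compile(r"\*[^*]*\*"),
    "brackets": re.compile(r"\[[^\[\]]*\]"),
    "parentheses": re.compile(r"\([^()]*\)"),
    "angle_brackets": re.compile(r"<[^<>]*>"),
}


def strip_unspoken(text, enabled=tuple(_IGNORE_PATTERNS)):
    """Remove delimited spans so they can be displayed but are not sent to TTS."""
    for name in enabled:
        text = _IGNORE_PATTERNS[name].sub("", text)
    # Collapse the whitespace left behind by the removed spans.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(strip_unspoken("*waves* Hello [smiles] there (quietly) <aside> friend"))
```

Passing a shorter `enabled` tuple, for example `("asterisks",)`, keeps the other delimiters in the spoken text.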
📋 Detailed Changes
🧮 Backend
- Architecture:
- The project structure has been reorganized to use the `src/` directory.
- The backend is now fully asynchronous, improving responsiveness.
- CLI mode (`main.py`) has been removed.
- The "exit word" has been removed.
- Models are initialized and managed using `ServiceContext`, offering better memory management, particularly when switching characters.
- Refactored LLMs into `agent` and `stateless_llm`, supporting a wider range of LLMs with a new agent interface: `basic_memory_agent` and `hume_ai_agent`.
- LLM (Language Model) Enhancements:
- New (and old but refactored) providers: Ollama, OpenAI (and any OpenAI Compatible API), Gemini, Claude, Mistral, DeepSeek, Zhipu, llama.cpp.
- `temperature` parameter added.
- No more tokens will be generated after interruption, improving the responsiveness of voice interruption.
- Ollama models are preloaded at startup, kept in memory for the server's duration, and unloaded at exit.
- Added a `hf_mirror` flag to specify whether to use the Hugging Face mirror source.
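The "no more tokens after interruption" behavior can be pictured roughly like this. It is a sketch under assumptions, not the backend's real code: `fake_token_stream` stands in for a streaming LLM client, and `stream_reply` and `demo` are hypothetical names. The idea is to stop iterating and close the stream as soon as the interrupt flag is set, so nothing further is requested.

```python
import asyncio
from typing import AsyncGenerator, Callable, List


async def fake_token_stream(tokens: List[str], produced: List[str]) -> AsyncGenerator[str, None]:
    """Stand-in for a streaming LLM client (hypothetical)."""
    for token in tokens:
        await asyncio.sleep(0)  # yield control, as a network read would
        produced.append(token)  # record what the "model" actually generated
        yield token


async def stream_reply(stream, interrupted: asyncio.Event, on_token: Callable[[str], None]) -> None:
    """Forward tokens until the interrupt flag is set, then close the stream."""
    try:
        async for token in stream:
            on_token(token)
            if interrupted.is_set():
                break
    finally:
        # Closing the generator means no further tokens are requested.
        await stream.aclose()


async def demo():
    interrupted = asyncio.Event()
    spoken: List[str] = []
    produced: List[str] = []

    def on_token(token: str) -> None:
        spoken.append(token)
        if len(spoken) == 3:
            interrupted.set()  # simulate the user starting to talk

    stream = fake_token_stream(["Hello", ",", " how", " are", " you", "?"], produced)
    await stream_reply(stream, interrupted, on_token)
    return spoken, produced


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the demo, the fake stream never produces the fourth token, because the consumer stops pulling from it once the flag is set.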
- TTS (Text-to-Speech) Enhancements:
- TTS now generates multiple audio segments concurrently and sends them sequentially, reducing latency.
- New interruption logic for smoother transitions.
- Added filters (`asterisks`, `brackets`, `parentheses`) to prevent unwanted text from being spoken.
- Implemented `faster_first_response` feature to prioritize the synthesis and playback of the first sentence fragment, minimizing latency.
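"Generate concurrently, send sequentially" from the TTS list above can be sketched with `asyncio` as follows. This is only an illustration of the pattern: `fake_synthesize` is a hypothetical stand-in for a real TTS engine, and the delays are made up so that later sentences finish first.

```python
import asyncio
from typing import List


async def fake_synthesize(sentence: str, delay: float) -> str:
    """Stand-in for a TTS engine call (hypothetical); returns a fake audio id."""
    await asyncio.sleep(delay)
    return f"audio<{sentence}>"


async def speak_in_order(sentences: List[str], delays: List[float]) -> List[str]:
    # Start synthesis for every sentence at once.
    tasks = [
        asyncio.create_task(fake_synthesize(sentence, delay))
        for sentence, delay in zip(sentences, delays)
    ]
    sent = []
    # Await the tasks in their original order, so delivery order matches the
    # text even when a later sentence finishes synthesizing first.
    for task in tasks:
        sent.append(await task)
    return sent


if __name__ == "__main__":
    print(asyncio.run(speak_in_order(["One.", "Two.", "Three."], [0.03, 0.01, 0.02])))
```

Total wait time is close to the slowest single synthesis rather than the sum of all of them, while the listener still hears the sentences in order.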
- ASR (Automatic Speech Recognition) Enhancements:
- Made Sherpa-onnx ASR with the SenseVoiceSmall int8 model the default for both English and Chinese presets, with automatic model download.
- Added a `provider` option for sherpa-onnx-asr.
- Other Improvements:
- Chat log persistence is used to maintain conversation history.
- All `print` statements are replaced with `loguru` for structured logging.
- Added a Chinese configuration preset: `conf.CN.yaml`.
- Basic AI proactive speaking (experimental).
- Added some checks in the CI/CD process
- Added input/output type system to agents
- Added Tencent Translate in https://github.com/Closed-LLM-Vtuber/Closed-LLM-Vtuber/pull/107
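As a rough picture of the chat log persistence mentioned in the list above, the sketch below stores each conversation as a JSON file and supports appending, loading, and deleting. The class name `ChatHistoryStore`, the file layout, and the message fields are assumptions made for illustration; the actual storage format may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


class ChatHistoryStore:
    """Hypothetical store that keeps one JSON file per conversation."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, history_id: str) -> Path:
        return self.root / f"{history_id}.json"

    def load(self, history_id: str) -> List[Dict[str, str]]:
        path = self._path(history_id)
        if not path.exists():
            return []
        return json.loads(path.read_text(encoding="utf-8"))

    def append(self, history_id: str, role: str, content: str) -> None:
        messages = self.load(history_id)
        messages.append({"role": role, "content": content})
        self._path(history_id).write_text(
            json.dumps(messages, ensure_ascii=False), encoding="utf-8"
        )

    def delete(self, history_id: str) -> None:
        self._path(history_id).unlink(missing_ok=True)


def demo() -> List[Dict[str, str]]:
    # Use a throwaway directory so the example leaves no files behind in the repo.
    store = ChatHistoryStore(Path(tempfile.mkdtemp()) / "chat_history")
    store.append("demo", "human", "Hi")
    store.append("demo", "ai", "Hello!")
    return store.load("demo")


if __name__ == "__main__":
    print(demo())
```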
🖥️ Frontend
- New frontend built with Electron, React, Chakra UI, and Vite.
- Multi-Mode in Single Codebase:
- Web Mode: Browser interface
- Window Mode: Desktop window
- Pet Mode: Transparent desktop companion
- Seamless context sharing between Window and Pet modes, allowing for the preservation of settings, history, connections, and model states.
- Enhanced UI Features
- Responsive layout with collapsible sidebar and footer
- Customizable Live2D model interactions: Mouse tracking for eye movement, Click-triggered animations, Drag & resize capabilities.
- Persistent local storage for user preference settings, including background, VAD configuration, Live2D size and interactions, and agent behavior.
- Supports viewing, loading, and deleting conversation history with streaming subtitles.
- (Electron pet-mode) A transparent, always-on-top desktop companion with click-through on non-interactive areas, a draggable and hideable Live2D model and UI, and right-click menu controls.
- Camera and screen capturing panel
- Switch characters easily
📖 Documentation
- Rewritten README file.
- New comprehensive documentation with a dedicated website.

