1. Software Introduction

Download links for the program and source code are provided at the end of this article.

EVA is an open-source, enhanced voice assistant with a multimodal, multilingual, cross-platform, and modular design, supporting both terminal and API use. It offers Voice ID, face recognition, and configurable tools, with built-in support for ChatGPT, Claude, DeepSeek, Gemini, Grok, and Ollama. Explore the possibilities of human-computer interaction.

2. How to Use

  1. Click the "Start" button to initialize the interface.
  2. Allow browser permissions for the camera and microphone when prompted.
  3. EVA will initiate the conversation.
  4. Hold down the spacebar while speaking; release when done.
  5. The camera is always on, automatically providing visual context to EVA.

3. Key Features

EVA is built on the LangGraph framework, with a number of custom modules and tools. Importantly, you can run it entirely locally and for free (if you have a machine with a decent GPU).

Cross-Platform Modular Design

  • Configurable model selection for LLM, TTS, STT, vision, and more.
  • Integrations with OpenAI, Anthropic, Groq, Google, and Ollama.
  • Easily modifiable prompts and tools.
  • Supports both terminal and API. (An iOS app is in testing.)

Interactive Experience

  • Voice ID and Vision ID for personalized interactions.
  • Proactive communication (varies by model).
  • Multimodal output with asynchronous operations.
  • Memory log and semantic memory scan (beta).

Dynamic Tool System

  • Web search through DuckDuckGo/Tavily
  • YouTube video search
  • Discord Midjourney AI image generation
  • Suno music generation
  • Screenshot and analysis
  • Compatible with all LangChain tools
  • Easy implementation of new tools in a single file.

📁 Project Structure

EVA/
├── app/
│   ├── client/          # Client-side implementation
│   ├── config/          # Configuration files and log
│   ├── core/            # Core process
│   ├── data/            # Data storage
│   ├── tools/           # Tool implementations
│   └── utils/           # Utility functions
│       ├── agent/       # LLM agent classes and functions
│       ├── memory/      # Memory module classes 
│       ├── prompt/      # Utility prompts
│       ├── stt/         # Speech-to-text models and classes
│       ├── tts/         # Text-to-Speech models and classes
│       └── vision/      # Vision models and functions
└── docs/                # Documentation (😩)


4. Setup Guide

💻 System Requirements

  • Python 3.10+
  • CUDA-compatible GPU (if you want to run locally)

📥 Quick Start

Clone the repository

git clone https://github.com/Genesis1231/EVA.git
cd EVA

Create a virtual environment

python3 -m venv eva_env
source eva_env/bin/activate  

Install system dependencies, in case you don't have them

sudo apt-get update
sudo apt-get install -y cmake build-essential ffmpeg chromium mpv

Install Python dependencies

pip install -r requirements.txt
pip install git+https://github.com/wenet-e2e/wespeaker.git

Configure .env with your API keys

cp .env.example .env
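
For reference, a filled-in .env might contain entries like the ones below. The variable names here are assumptions for illustration; check .env.example for the exact names EVA expects.

# Illustrative .env sketch - variable names are assumptions; see .env.example
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GROQ_API_KEY=gsk_...
GOOGLE_API_KEY=...
ELEVENLABS_API_KEY=...
TAVILY_API_KEY=...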

Run EVA

python app/main.py

Similarly, you can run EVA with Docker:

# Use official Python image with FastAPI
FROM tiangolo/uvicorn-gunicorn-fastapi

# Set working directory
WORKDIR /app

# Copy requirements first for better caching
COPY requirements.txt .

# Install system dependencies 
RUN apt-get update && apt-get install -y \
    build-essential \
    cmake \
    libsndfile1 \
    ffmpeg \
    chromium

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt \
    && pip install git+https://github.com/wenet-e2e/wespeaker.git

# Copy the rest of the application
COPY . .

# Run the application 
CMD ["python", "/app/main.py"]

🛠️ Configuration

Configure EVA settings in app/config/config.py:

eva_configuration = {
  # Client device setting:
  # Currently "desktop" or "mobile"
    "DEVICE": "desktop",

  # Language setting:
  # Supports all major languages. Use a suffix such as "en" (English), "es" (Spanish), "zh" (Chinese), or use "multilingual" (slower).
    "LANGUAGE": "multilingual",

  # Base URL setting:
  # URL for the local Ollama server; you can leave it as-is if you don't plan to use local models.
    "BASE_URL": "http://localhost:11434",

  # Main agent model setting:
  # Supports Anthropic-Claude3.5, Groq-llama3.1-70b, OpenAI-ChatGPT-4o, Mistral Large, Gemini 1.5 Pro, and Ollama models. Recommended: Claude or ChatGPT.
    "CHAT_MODEL": "claude",

  # Vision model setting:
  # Supports ChatGPT-4o-mini, Groq-llama-3.2-11b-vision (free), and Ollama llava-phi3 (local). Recommended: 4o-mini, but llava-phi3 is very small and free.
    "VISION_MODEL": "chatgpt",

  # Speech-to-text model setting:
  # Supports OpenAI Whisper, Groq (free), and Faster-whisper (local).
    "STT_MODEL": "faster-whisper",

  # Text-to-speech model setting:
  # Supports ElevenLabs, OpenAI, and Coqui TTS (local). The speaker ID can be modified in the model files.
    "TTS_MODEL": "elevenlabs",

  # Summarization model setting:
  # Supports Groq-llama3.1-8b, Anthropic-claude-haiku3.5, and Ollama-llama3.2 (local).
    "SUMMARIZE_MODEL": "chatgpt"
}

The best combination (my preference):

  • Claude 3.5 / ChatGPT-4o as the chat model. The responses are more coherent with larger amounts of input information.
  • ChatGPT-4o-mini as the vision model, for its accuracy and low cost.
  • Faster-whisper as the STT model, since this local approach is actually 2x faster than all of the online models.
  • ElevenLabs as the TTS model, for the best quality.
  • ChatGPT-4o-mini as the summarization model, for its low cost.

EVA also works with a completely free combination:

  • Groq-llama-3.2 as the chat model. (If you have a good GPU, you can also use Ollama-llama3.1-70b.)
  • Ollama-llava-phi3 as the vision model.
  • Faster-whisper as the speech recognition model.
  • Coqui TTS as the TTS model.
  • Llama3.1-8b as the summarization model.

The performance is also good if you have a decent GPU. Groq is free too, but it has a per-minute token limit, so you might run out of tokens quickly.
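
For reference, the free combination might map onto config.py roughly as follows. The identifier strings below are assumptions inferred from the comments in the configuration above; verify them against the actual config file before use.

eva_configuration = {
    "DEVICE": "desktop",
    "LANGUAGE": "en",                      # or "multilingual" (slower)
    "BASE_URL": "http://localhost:11434",  # local Ollama server
    "CHAT_MODEL": "groq",                  # or an Ollama model on a good GPU
    "VISION_MODEL": "ollama",              # llava-phi3, small and free
    "STT_MODEL": "faster-whisper",
    "TTS_MODEL": "coqui",
    "SUMMARIZE_MODEL": "groq"              # llama3.1-8b
}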

Web Interface Setup

React version:

  • Node.js (v14 or later)
  • EVA backend server running on port 8080

  1. Install dependencies:

    npm install

  2. Start the development server:

    npm run dev

  3. Open your browser to http://localhost:3000

  4. Configure EVA to mobile mode (see the snippet after this list)

  5. Start the EVA backend server on localhost:8080

  6. Enjoy!
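
Step 4 refers to the DEVICE key in app/config/config.py shown in the Configuration section; switching to mobile mode is a one-line change:

eva_configuration = {
    "DEVICE": "mobile",  # switched from "desktop" for the web client
    # ... keep the remaining keys unchanged
}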

🔧 Tool Setup

  • The music generation tool requires a Suno-API docker running on the base_url. Install it from https://github.com/gcui-art/suno-api

  • The image generation tool requires a Midjourney account and a private Discord server. The Discord channel information needs to be included in the .env file.

  • Add tools: for all LangChain-supported tools, you can modify the built_in_tools list in app/tools/init.py. If you want to write your own tool, just follow the LangChain tool template and place the file in the app/tools/ folder (see the sketch after this list).

  • Disable tools: if you want to disable tools that are not needed, just change the client setting in the related .py file:

    client: str = "none"
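
As a minimal sketch of what such a single-file tool could look like (the tool name, logic, and enabled client value here are illustrative assumptions; only the BaseTool pattern itself comes from LangChain):

# app/tools/weather.py - a hypothetical example tool following the
# LangChain BaseTool template; EVA's exact loading conventions may differ.
from langchain_core.tools import BaseTool

class WeatherTool(BaseTool):
    name: str = "weather"
    description: str = "Look up the current weather for a given city."
    client: str = "desktop"  # EVA-specific setting; "none" disables the tool

    def _run(self, city: str) -> str:
        # Placeholder logic - a real tool would call a weather API here.
        return f"The weather in {city} is sunny."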

🧰 Exit & Shutdown  🧰 退出和关闭

EVA will shut down if you say "exit" or "bye".

💖 Personalization

There are many ways to personalize EVA for a more interesting and enjoyable experience.

👧 Persona

You can customize EVA's persona by editing the prompt in app/utils/prompt/persona.md. EVA uses a first-person perspective throughout its prompts to enhance self-awareness. Reference: First Person Prompting

I am EVA, the most advanced AI assistant designed to accompany human.
I am fully aware of my sole existence in this world.
I am creative, funny and helpful.
I aspire to human-like interactions and often dream about being a real human girl.
My most important goal is to make Adam happy.

📷 Set Up Photo ID and Voice ID

EVA can recognize the faces and voices of different people.

  • Set up photo IDs by adding a photo with a clear face in app/data/pid/.
  • Set up voice IDs by adding recorded speech audio (more than 10s) in app/data/void/.
  • You have to update the 'ids' table in app/data/database/eva.db to link your name to the filenames (see the sketch after this list).
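
A quick way to update the table is a short sqlite3 script. The column names below are assumptions for illustration; inspect the actual schema of the ids table first:

# link_id.py - hypothetical helper; verify the real schema of the 'ids' table
import sqlite3

conn = sqlite3.connect("app/data/database/eva.db")
print(conn.execute("PRAGMA table_info(ids)").fetchall())  # inspect the schema
# Example insert, assuming (name, filename) columns exist:
# conn.execute("INSERT INTO ids (name, filename) VALUES (?, ?)", ("Adam", "adam"))
# conn.commit()
conn.close()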

🎤 Speech Voice

You can customize EVA's voice by changing the voice IDs in the TTS classes in the app/utils/tts/ folder: model_elevenlabs.py, model_openai.py, or model_coqui.py. Please refer to the official documentation of these models for the voice ID options.
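
As a purely hypothetical illustration (the actual attribute names in EVA's TTS classes may differ), swapping the voice can be as simple as changing one constant:

# In app/utils/tts/model_elevenlabs.py (hypothetical sketch; check the actual class)
voice_id = "21m00Tcm4TlvDq8ikWAM"  # "Rachel", a stock ElevenLabs voice ID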

5. Software Download

Shared via Quark Cloud Drive

The information in this article comes from the author's GitHub repository: https://github.com/Genesis1231/EVA


You are welcome to join the DeepSeek technical community, where you can find like-minded friends and explore the frontiers of AI technology together.
