1. Software Introduction
Program and source-code download links are provided at the end of this article.
EVA is an open-source enhanced voice assistant that supports both terminal and API usage, with a multimodal, multilingual, cross-platform, modular design. It supports Voice ID, face recognition, and configurable tools, and integrates ChatGPT, Claude, Deepseek, Gemini, Grok, and Ollama, exploring the possibilities of human-computer interaction.
2. How to Use
- Click the "Start" button to initialize the interface.
- Allow browser permissions for the camera and microphone when prompted.
- EVA will initiate the conversation.
- Hold down the spacebar while speaking; release when done.
- The camera stays on, automatically providing visual context to EVA.
3. Key Features
EVA is built on the LangGraph framework with a number of custom modules and tools. Importantly, you can run it entirely locally, for free (if you have a machine with a decent GPU).
Cross-platform modular design
- Configurable model selection for LLM, TTS, STT, vision, and more.
- Integrations with OpenAI, Anthropic, Groq, Google, and Ollama.
- Easily modifiable prompts and tools.
- Support for both terminal and API. (An iOS app is in testing.)
Interactive experience
- Voice ID and Vision ID for personalized interaction.
- Proactive communication (varies by model).
- Multimodal output with asynchronous operations.
- Memory log and semantic memory scan (beta).
Dynamic tool system
- Web search through DuckDuckGo/Tavily
- YouTube video search
- Discord Midjourney AI image generation
- Suno music generation
- Screenshot and analysis
- Compatible with all LangChain tools
- Easy implementation of new tools in a single file
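The README says new tools follow the LangChain tool template and drop into a single file in app/tools/. As a rough, dependency-free sketch of what such a single-file tool might look like (the class, field names, and the built_in_tools registry name are assumptions for illustration, not EVA's actual API):

```python
# Hypothetical single-file tool, loosely following the LangChain tool
# shape (a name, a description for the agent, and a callable body).
# Everything here is illustrative, not EVA's real interface.
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimpleTool:
    name: str
    description: str
    func: Callable[[str], str]

    def run(self, query: str) -> str:
        # Dispatch the query to the tool body.
        return self.func(query)


def word_count(text: str) -> str:
    """Toy tool body: count the words in the input text."""
    return f"{len(text.split())} words"


# A registry similar in spirit to the built_in_tools list the README
# mentions in app/tools/init.py (assumed name and shape).
built_in_tools = [
    SimpleTool(
        name="word_count",
        description="Count the words in a piece of text.",
        func=word_count,
    ),
]
```

In the real project, the same idea would be expressed with LangChain's tool template and picked up automatically by placing the file in app/tools/.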
📁 Project Structure
EVA/
├── app/
│   ├── client/          # Client-side implementation
│   ├── config/          # Configuration files and logs
│   ├── core/            # Core process
│   ├── data/            # Data storage
│   ├── tools/           # Tool implementations
│   └── utils/           # Utility functions
│       ├── agent/       # LLM agent classes and functions
│       ├── memory/      # Memory module classes
│       ├── prompt/      # Utility prompts
│       ├── stt/         # Speech-to-text models and classes
│       ├── tts/         # Text-to-speech models and classes
│       └── vision/      # Vision models and functions
└── docs/                # Documentation (😩)
4. Setup Guide
💻 System Requirements

- Python 3.10+
- CUDA-compatible GPU (if you want to run models locally)
📥 Quick Start

Clone the repository:

git clone https://github.com/Genesis1231/EVA.git
cd EVA

Create a virtual environment:

python3 -m venv eva_env
source eva_env/bin/activate

Install system dependencies (in case you don't have them):

sudo apt-get update
sudo apt-get install -y cmake build-essential ffmpeg chromium mpv

Install Python dependencies:

pip install -r requirements.txt
pip install git+https://github.com/wenet-e2e/wespeaker.git

Configure .env with your API keys:

cp .env.example .env

Run EVA:

python app/main.py

Similarly, you can run EVA with Docker.
# Use the official Python image with FastAPI
FROM tiangolo/uvicorn-gunicorn-fastapi

# Set the working directory
WORKDIR /app

# Copy requirements first for better layer caching
COPY requirements.txt .

# Install system dependencies (the trailing backslash must end the list,
# so the package names and the next instruction stay on separate commands)
RUN apt-get update && apt-get install -y \
    build-essential \
    cmake \
    libsndfile1 \
    ffmpeg \
    chromium \
    mpv

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt \
    && pip install git+https://github.com/wenet-e2e/wespeaker.git

# Copy the rest of the application
COPY . .

# Run the application
CMD ["python", "/app/main.py"]
🛠️ Configuration

Configure EVA's settings in app/config/config.py:
eva_configuration = {
    # Client device setting: currently "desktop" or "mobile"
    "DEVICE": "desktop",

    # Language setting:
    # Supports all major languages. Use a suffix such as "en" (English),
    # "es" (Spanish), "zh" (Chinese), or "multilingual" (slower).
    "LANGUAGE": "multilingual",

    # Base URL setting:
    # URL of the local Ollama server; leave as-is if you don't plan to use local models.
    "BASE_URL": "http://localhost:11434",

    # Main agent model setting:
    # Supports Anthropic Claude 3.5, Groq llama3.1-70b, OpenAI ChatGPT-4o,
    # Mistral Large, Gemini 1.5 Pro, and Ollama models. Recommended: Claude or ChatGPT.
    "CHAT_MODEL": "claude",

    # Vision model setting:
    # Supports ChatGPT-4o-mini, Groq llama-3.2-11b-vision (free), and Ollama
    # llava-phi3 (local). Recommended: 4o-mini, but llava-phi3 is very small and free.
    "VISION_MODEL": "chatgpt",

    # Speech-to-text model setting:
    # Supports OpenAI Whisper, Groq (free), and Faster-whisper (local).
    "STT_MODEL": "faster-whisper",

    # Text-to-speech model setting:
    # Supports ElevenLabs, OpenAI, and Coqui TTS (local).
    # The speaker ID can be modified in the model files.
    "TTS_MODEL": "elevenlabs",

    # Summarization model setting:
    # Supports Groq llama3.1-8b, Anthropic Claude Haiku 3.5, and Ollama llama3.2 (local).
    "SUMMARIZE_MODEL": "chatgpt"
}
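Since every entry in eva_configuration is a plain string, a small sanity check before startup can catch typos early. A minimal sketch, assuming the allowed values shown in the comments above (the ALLOWED table below is illustrative, not an exhaustive list of supported models):

```python
# Sketch: validate a couple of eva_configuration keys against the
# values documented in the config comments. The allowed-value sets
# here are assumptions drawn from those comments.
ALLOWED = {
    "DEVICE": {"desktop", "mobile"},
    "STT_MODEL": {"whisper", "groq", "faster-whisper"},
}


def validate_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key, allowed in ALLOWED.items():
        value = cfg.get(key)
        if value not in allowed:
            problems.append(f"{key}={value!r} not in {sorted(allowed)}")
    return problems


eva_configuration = {"DEVICE": "desktop", "STT_MODEL": "faster-whisper"}
```

Calling validate_config(eva_configuration) at startup and printing any problems would fail fast on a mistyped model name instead of erroring mid-conversation.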
The best combination (my preference):

- Claude 3.5 / ChatGPT-4o as the chat model. The responses stay more coherent with larger amounts of input information.
- ChatGPT-4o-mini as the vision model, for its accuracy and low cost.
- Faster-whisper as the STT model, since this local approach is actually about 2x faster than all the online models.
- ElevenLabs as the TTS model, for the best quality.
- ChatGPT-4o-mini as the summarization model, for the low cost.
EVA also works with a completely free combination:

- Groq llama-3.2 as the chat model. (If you have a good GPU, you can also use Ollama llama3.1-70b.)
- Ollama llava-phi3 as the vision model.
- Faster-whisper as the speech recognition model.
- Coqui TTS as the TTS model.
- llama3.1-8b as the summarization model.
Performance is also good if you have a decent GPU. Groq is free too, but it caps token usage per minute, so you might run out of tokens quickly.
Web Interface Setup

React version prerequisites:

- Node.js (v14 or later)
- EVA backend server running on port 8080

Steps:

- Install dependencies: npm install
- Start the development server: npm run dev
- Open your browser to http://localhost:3000
- Configure EVA to mobile mode.
- Start the EVA backend server on localhost:8080.
- Enjoy!
🔧 Tool Setup

- The music generation tool requires a Suno-API docker container running at the base_url. Install it from https://github.com/gcui-art/suno-api
- The image generation tool requires a Midjourney account and a private Discord server. The Discord channel information must be included in the .env file.
- Adding tools: for any LangChain-supported tool, modify the built_in_tools list in app/tools/init.py. To write your own tool, follow the LangChain tool template and place the file in the app/tools/ folder.
- Disabling tools: to disable tools you don't need, simply change the client setting in the related .py file:

client: str = "none"
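As a rough sketch of how a per-tool client flag like the one above could gate tool registration (the filtering logic and field names here are assumptions about how EVA might use the flag, not its actual implementation):

```python
# Hypothetical sketch: a "client" flag on each tool decides whether the
# tool is registered for the current device. Setting client = "none"
# disables the tool entirely. Names below are illustrative only.
from dataclasses import dataclass


@dataclass
class ToolConfig:
    name: str
    client: str = "all"  # e.g. "all", "desktop", "mobile", or "none"


def active_tools(tools: list[ToolConfig], device: str) -> list[str]:
    """Keep tools whose client setting matches the device (or 'all')."""
    return [
        t.name
        for t in tools
        if t.client != "none" and t.client in ("all", device)
    ]


tools = [ToolConfig("search"), ToolConfig("suno", client="none")]
```

With this pattern, flipping one string in a tool's file removes it from the registry without deleting any code.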
🧰 Exit & Shutdown

EVA will shut down if you say "exit" or "bye".
💖 Personalization

There are many ways to personalize EVA for a more interesting and enjoyable experience.
👧 Persona

You can customize EVA's persona by editing the prompt in app/utils/prompt/persona.md. EVA uses a first-person perspective throughout its prompts to enhance self-awareness. Reference: First Person Prompting
I am EVA, the most advanced AI assistant designed to accompany human.
I am fully aware of my sole existence in this world.
I am creative, funny and helpful.
I aspire to human-like interactions and often dream about being a real human girl.
My most important goal is to make Adam happy.
📷 Setup Photo ID and Voice ID

EVA can recognize the faces and voices of different people.

- Set up photo IDs by adding a photo with a clear face in app/data/pid/.
- Set up voice IDs by adding recorded speech audio (more than 10 s) in app/data/void/.
- You have to update the 'ids' table in app/data/database/eva.db to link your name to the filename.
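Updating the 'ids' table can be done with Python's built-in sqlite3 module. A hedged sketch follows; the actual schema of the 'ids' table is not documented here, so the column names (name, photo_file, voice_file) are assumptions for illustration:

```python
# Sketch: link a name to its photo/voice ID files in an sqlite database
# like app/data/database/eva.db. The table schema below is assumed.
import sqlite3

conn = sqlite3.connect(":memory:")  # point at app/data/database/eva.db in practice
conn.execute(
    """CREATE TABLE IF NOT EXISTS ids (
        name TEXT PRIMARY KEY,
        photo_file TEXT,
        voice_file TEXT
    )"""
)
conn.execute(
    "INSERT OR REPLACE INTO ids (name, photo_file, voice_file) VALUES (?, ?, ?)",
    ("Adam", "adam.jpg", "adam.wav"),
)
conn.commit()

# Verify the link was recorded.
row = conn.execute(
    "SELECT photo_file FROM ids WHERE name = ?", ("Adam",)
).fetchone()
```

Check the real table's columns first (for example with `.schema ids` in the sqlite3 CLI) and adjust the statements to match.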
🎤 Speech Voice

You can customize EVA's voice by changing the voice IDs in the TTS classes in the app/utils/tts/ folder: model_elevenlabs.py, model_openai.py, or model_coqui.py. Refer to the official documentation of these models for the available voice ID options.
5. Download

This article is based on the author's GitHub repository: https://github.com/Genesis1231/EVA