Amadeus System Overview

This project is inspired by the Amadeus System from "Steins;Gate 0" and is a multimodal AI role-playing system. By integrating cutting-edge AI technologies, it aims to perfectly recreate virtual characters as interactive digital forms. Whether it's anime characters, game characters, or any other character you dream of, they can all achieve real dialogue and emotional interaction through this system.

Project Vision

By combining multiple AI technologies such as speech recognition, natural language processing, and emotional analysis, we have created a system that can:

Accurately restore character personalities and speaking patterns
Achieve natural and smooth real-time dialogue
Possess emotional understanding and expression capabilities
Continuously learn and remember interactions with users

System Architecture

                     +----------------------+
                     |      Client          |
                     | (UI/Interaction Layer)|
                     +----------+-----------+
                                ↕
                     +----------+-----------+
                     |  WebRTC Server       |
                     +----------+-----------+
                          ↗            ↖
   +------------------+---+----+------+------------------+
   |                  |        |      |                  |
+--+---+        +----+----+   |  +---+-------+    +-----+-----+
|Speech|        |         |   |  |           |    |           |
|Input +------->| Claude  |<--+--+ GPT-4.1   |<-->|   Mem0    |
|Module|        | Series  |<-+   |   nano    |    |           |
+--+---+        +----+----+  |   +-----------+    +-----------+
   |                 |       |
   |                 v       |
   |            +----+----+ |    +-----+-----+
   |            | Speech  | |    |  Visual  |
   +----------->| Output  | +----|  Input   |
                | Module  |      |  Module  |
                +----+----+      +-----+----+
                     |                |
          +----------+-----------+    |
          |          |           |    |
   Audio Output  Emotion Analysis Image Frame <----+
          |          |           |
          v          v           v
    [Return via RTM] [Analysis Results] [Processed Frames]
          |          |           |
          +----------+-----------+
                     |
                     ↓
              Real-time return to client

Core Component Description

1. Communication Layer

Real-time bidirectional communication based on WebRTC, using the open-source FastRTC framework
Ensures immediacy and naturalness of character reactions
Supports continuous dialogue flow
Optimized low-latency audio and video transmission

2. Speech Processing Module

ASR: Speech recognition, capturing every word from users
Fish Audio: High-quality audio processing ensuring smooth conversations
Real-time voice interaction, creating authentic conversational experiences

3. AI Processing Core

Claude Series Large Models:
- Responsible for character dialogue generation and processing
- Ensures responses align with character settings
- Maintains dialogue coherence and logic
GPT-4.1 nano:
- Emotional analysis and understanding
- Proactive topic guidance and interaction

4. Memory System

Deep memory storage based on Mem0
Records and learns interaction history with users
Builds character-specific memory databases
Achieves human-like memory retrieval and association

Featured Functions

Immersive Role-Playing
- Precise character personality restoration
- Dialogue style consistent with character design
- Context-aware interactive experience
Emotional Intelligence System
- Delicate emotional understanding capabilities
- Personalized emotional expression
- Interaction strategy adjustment based on scenarios
Proactive Interaction
- Proactive dialogue based on character settings
- Intelligent topic expansion
- Natural dialogue rhythm control
Evolutionary Memory
- Continuously growing interaction memory
- Personalized user relationship building
- Long-term memory accumulation and application

Technology Stack

React client interface
WebRTC for real-time communication and audio/video transmission, and core AI business logic implementation, using FastRTC open-source framework, with Python Flask providing interfaces
ASR speech recognition
Claude Series Large Models for character dialogue generation
GPT-4.1 nano for small classification tasks
CosyVoice2 for audio quality assurance
Mem0 providing memory storage system

Amadeus System Overview ​

Project Vision ​

System Architecture ​

Core Component Description ​

1. Communication Layer ​

2. Speech Processing Module ​

3. AI Processing Core ​

4. Memory System ​

Featured Functions ​

Technology Stack ​