# ASR (Automatic Speech Recognition) Service
This project uses Automatic Speech Recognition (ASR) technology to convert speech to text, supporting multiple ASR services.
## Standalone Whisper Deployment
If you want to deploy a Whisper speech recognition service locally, you can use the Docker image `onerahmet/openai-whisper-asr-webservice:latest-gpu` and deploy it through the following steps.
### Environment Requirements
- NVIDIA GPU
- Docker and Docker Compose
- CUDA-enabled server
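Before starting, it can help to verify that these prerequisites are in place. The quick check below only uses the standard `nvidia-smi` and Docker command-line tools:

```bash
# Check that the NVIDIA driver and GPU are visible
nvidia-smi

# Check Docker and Docker Compose versions
docker --version
docker-compose --version   # or: docker compose version on newer installs
```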
### Deployment Steps
- Create a `docker-compose.yml` file with the following content:

```yaml
version: '3'
services:
  whisper:
    image: onerahmet/openai-whisper-asr-webservice:latest-gpu
    ports:
      - 9000:9000  # Service exposure port
    environment:
      - ASR_MODEL=base
      - DEVICE_TYPE=cuda:0  # If no GPU is available, set to cpu
    restart: unless-stopped
    volumes:
      - ./audio_data:/tmp  # Audio file storage directory
```
- Run the following command in that directory:

```bash
docker-compose up -d
```
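Before calling the endpoint, you can confirm that the container actually started. The commands below are standard Docker Compose usage and assume the service name `whisper` from the file above:

```bash
# Show container status for this compose project
docker-compose ps

# Follow the service logs; model weights may be downloaded on first start
docker-compose logs -f whisper
```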
- Test if the service is running:

```bash
curl -X POST "http://localhost:9000/asr" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "audio_file=@test.wav;type=audio/wav"
```
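As a small usage example, the same endpoint can be called in a loop to transcribe a batch of recordings. This sketch assumes WAV files are placed in the mounted `./audio_data` directory:

```bash
# Transcribe every WAV file in ./audio_data against the local service
for f in ./audio_data/*.wav; do
  echo "== $f =="
  curl -s -X POST "http://localhost:9000/asr" \
    -H "accept: application/json" \
    -H "Content-Type: multipart/form-data" \
    -F "audio_file=@${f};type=audio/wav"
  echo
done
```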
## Cloud ASR Services
### Option 1: Groq API (Recommended for International Users)
- Fast processing speed
- Good accuracy
- Competitive pricing
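For reference, a minimal transcription request against Groq's OpenAI-compatible audio API might look like the sketch below; the endpoint path and model name are assumptions that should be checked against Groq's current documentation:

```bash
# Hypothetical example: transcribe test.wav via Groq's Whisper endpoint
# (verify the URL and model name in Groq's docs before relying on them)
curl -s -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
  -H "Authorization: Bearer ${GROQ_API_KEY}" \
  -F "model=whisper-large-v3" \
  -F "file=@test.wav"
```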
### Option 2: Silicon Flow API (Recommended for Users in Mainland China)
- Stable access from within mainland China
- Optimized for Chinese-language recognition
- Cost-effective pricing
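A Silicon Flow request typically follows a similar pattern; the base URL and model name below are assumptions and should be replaced with values from the provider's documentation and your account's available models:

```bash
# Hypothetical example: transcribe test.wav via Silicon Flow
# (base URL and model name are assumptions; check the provider's docs)
curl -s -X POST "https://api.siliconflow.cn/v1/audio/transcriptions" \
  -H "Authorization: Bearer ${SILICONFLOW_API_KEY}" \
  -F "model=FunAudioLLM/SenseVoiceSmall" \
  -F "file=@test.wav"
```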
### Configuration Steps
- Register account for your chosen service
- Obtain API Key
- Configure in Amadeus system settings
- Test voice recognition functionality
## Service Selection
Choose the appropriate ASR service based on your geographic location and language requirements.