<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Cradicle Explorer</title>
    <link href="/css/bootstrap/bootstrap.min.css" rel="stylesheet">
    <style>
      .form-control-dark::placeholder {
          color: #aaa;
          opacity: 1;
      }
    </style>
    <link rel="stylesheet" href="/assets/fontawesome/css/all.min.css">
    <link rel="icon" type="image/png" href="/favicon.png">


                <link href="/css/dashboard.css" rel="stylesheet">
                </head>
                <body>
                <header class="navbar navbar-dark sticky-top bg-dark flex-md-nowrap p-0 shadow">
                  <a class="navbar-brand col-md-3 col-lg-2 me-0 px-3 fs-6" href="/">Cradicle Explorer</a>
                  <button class="navbar-toggler position-absolute d-md-none collapsed" type="button" data-bs-toggle="collapse" data-bs-target="#sidebarMenu" aria-controls="sidebarMenu" aria-expanded="false" aria-label="Toggle navigation">
                    <span class="navbar-toggler-icon"></span>
                  </button>
                  <form method="get" action="/cgi-bin/main" style="width:100%;"><input class="form-control form-control-dark w-100 rounded-0 border-0" type="text" name="q" placeholder="Search repos" aria-label="Search"></form>
                  <div class="navbar-nav flex-row">
                    <div class="nav-item text-nowrap">
                      <a class="nav-link px-3 active" href="/cgi-bin/repo?id=z2VH5ywHEuNnGdAGXhXzxoBhecbaD">ai-audio-monitoring_Speaker-Verification</a>
                    </div>
                  </div>
                </header>
                <div class="container-fluid">
                  <div class="row">
                    <nav id="sidebarMenu" class="col-md-3 col-lg-2 d-md-block bg-dark sidebar collapse">
                      <div class="position-sticky pt-3 sidebar-sticky">
                        <ul class="nav flex-column">
                          <li class="nav-item">
                            <a class="nav-link active" href="/cgi-bin/repo?id=z2VH5ywHEuNnGdAGXhXzxoBhecbaD">
                              <i class="align-text-bottom fa-solid fa-info"></i>
                              Info
                            </a>
                          </li>
                          <li class="nav-item">
                            <a class="nav-link" href="/cgi-bin/repo?id=z2VH5ywHEuNnGdAGXhXzxoBhecbaD&issue=list">
                              <i class="align-text-bottom fa-solid fa-layer-group"></i>
                              Issues
                            </a>
                          </li>
                          <li class="nav-item">
                            <a class="nav-link" href="/cgi-bin/repo?id=z2VH5ywHEuNnGdAGXhXzxoBhecbaD&patch=list">
                              <i class="align-text-bottom fa-solid fa-vest-patches"></i>
                              Patches
                            </a>
                          </li>
                          <li class="nav-item">
                            <a class="nav-link" href="/cgi-bin/repo?id=z2VH5ywHEuNnGdAGXhXzxoBhecbaD&wallet=list">
                              <i class="align-text-bottom fa-solid fa-wallet"></i>
                              Wallets
                            </a>
                          </li>
                          <li class="nav-item">
                            <a class="nav-link" href="/cgi-bin/repo?id=z2VH5ywHEuNnGdAGXhXzxoBhecbaD&source=.">
                              <i class="align-text-bottom fa-solid fa-code"></i>
                              Source
                            </a>
                          </li>
                        </ul>
                      </div>
                    </nav>
                <main class="col-md-9 ms-sm-auto col-lg-10">
                  <div class="container px-1 py-3">
        

    <div class="list-group">
    <div class="list-group-item">
    <div style="font-size:1.3rem;">ai-audio-monitoring_Speaker-Verification</div>
    <div class="repo-item">Whole Core Mirror: ai-audio-monitoring (Speaker-Verification)</div>
    <div>rad:z2VH5ywHEuNnGdAGXhXzxoBhecbaD</div>
    </div>
    <div class="list-group-item">
    <div>Visibility</div>
    <div class="repo-item">public</div>
    </div>
    <div class="list-group-item">
    <div>Delegates</div><div class="repo-item">did:key:z6MkgP54rvgrokax3cgjCMAnpf8wfyCJjmDkZcHprkP4Nsc2</div>
    </div>
    <div class="list-group-item">
    <div>Default branch</div>
    <div><span class="repo-item">master &#8594; 2c7d5b2f0b79443fe587bf1c146748bc0bfdfcf2</span> (Sat Apr 25 18:11:57 2026)</div>
    </div>
    <div class="list-group-item">
    <div>Threshold</div>
    <div class="repo-item">1</div>
    </div>
    </div>
    
        <div class="list-group mt-3">
        <div class="list-group-item">
        <div class="mb-2" style="font-weight:bold;"><i class="fa-solid fa-book"></i> README.md</div>
        <pre style="margin:0; font-size:0.85rem; overflow-x:auto; color:#fafafa;">
---

# 🎙️ Speaker Verification and Voiceprint Recognition

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![GitHub stars](https://img.shields.io/github/stars/zhangzijie-pro/Speaker-Verification.svg?style=social)](https://github.com/zhangzijie-pro/Speaker-Verification/stargazers)
[![Hugging Face](https://img.shields.io/badge/HuggingFace-Model%20%26%20Dataset-yellow.svg)](https://huggingface.co/zzj-pro)
![Python](https://img.shields.io/badge/Python-3.9%2B-blue?logo=python)
![PyTorch](https://img.shields.io/badge/PyTorch-2.0%2B-ee4c2c?logo=pytorch)
![Task](https://img.shields.io/badge/Task-Speaker%20Verification-green)

&lt;div align=&quot;center&quot;&gt;
  &lt;a href=&quot;Readme_ch.md&quot;&gt;中文文档&lt;/a&gt; • 
  &lt;a href=&quot;https://github.com/zhangzijie-pro/Speaker-Verification&quot;&gt;GitHub&lt;/a&gt; • 
  &lt;a href=&quot;https://huggingface.co/zzj-pro&quot;&gt;Hugging Face&lt;/a&gt;
&lt;/div&gt;

&gt; A practical speaker verification system based on **ECAPA-TDNN + AAM-Softmax**, trained and evaluated on **CN-Celeb**.

---

## ✨ Features

- **SOTA Backbone**: ECAPA-TDNN (Res2Net + SE + Attentive Statistics Pooling)
- **Strong Discriminative Loss**: AAM-Softmax with angular margin
- **Balanced Sampling**: PK Batch Sampler (speaker-balanced)
- **Robust Evaluation**: EER, score distribution, t-SNE, Recall@K
- **Stable Inference**: Multi-crop averaging for reliable embeddings
- **Low Memory Design**: Optimized for ~6GB GPU (AMP + gradient clipping)
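
The multi-crop inference trick above can be sketched in a few lines of NumPy; `multicrop_embedding` and `embed_fn` are illustrative names for this sketch, not this repository's actual API:

```python
import numpy as np

def multicrop_embedding(frames, embed_fn, crop_frames=400, num_crops=6):
    """Average embeddings over several crops of one utterance.

    frames: (T, F) feature matrix; embed_fn maps one crop to a 1-D embedding.
    Both names are illustrative, not the repository's actual API.
    """
    T = frames.shape[0]
    if T > crop_frames:
        # evenly spaced crop start positions across the utterance
        starts = np.linspace(0, T - crop_frames, num_crops).astype(int)
    else:
        starts = [0] * num_crops          # short clip: reuse the whole thing
    embs = [embed_fn(frames[s:s + crop_frames]) for s in starts]
    emb = np.mean(embs, axis=0)
    return emb / np.linalg.norm(emb)      # L2-normalize the averaged embedding
```

Averaging several crops smooths out local variation, which is why the resulting embedding is more stable than a single random crop.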

---

## 📂 Project Structure

```
Speaker-Verification/
│
├── processed/              # Preprocessed features &amp; metadata
│   ├── preprocess_cnceleb2_train.py
│   └── cn_celeb2/          # outputs
│       ├── fbank_pt/       # Saved fbank features (*.pt)
│       ├── train_fbank_list.txt
│       ├── val_meta.jsonl  # Validation metadata (speaker, feature path)
│       └── spk2id.json
│
├── configs/
│   ├── train.yaml
│   └── train_config.py     # Training hyperparameters
│
├── demos/
│   └── real_time.py        # Real-time microphone capture and verification demo
│
├── data/
│   ├── dataset.py          # Train / validation datasets
│   └── pk_sampler.py       # PK batch sampler (speaker-balanced)
│
├── speaker_verification/
│   ├── checkpointing.py    # Checkpoint save / load
│   ├── inference.py        # Embedding extraction / inference
│   ├── head/
│   │   └── aamsoftmax.py   # AAM-Softmax loss
│   ├── models/
│   │   └── epaca.py        # ECAPA-TDNN model
│   └── audio/
│       └── features.py     # Log Mel-filterbank feature extraction
│
├── utils/
│   ├── meters.py           # Accuracy, average meters
│   ├── seed.py             # Reproducibility
│   ├── plot.py             # Training curves
│   ├── export.py           # Export to ONNX / MNN; split model and head
│   └── path_utils.py       # Path resolution helpers
│
├── outputs/                # Training outputs (checkpoints, curves)
├── outputs_eval/           # Verification results (EER, ROC, DET, t-SNE)
│
├── train.py                # Main training script
├── finetune.py             # Main finetune script
├── verify_pairs.py         # Pairwise speaker verification
├── compare_two_wavs.py     # Compare two audio files
│
├── README.md
├── README_ch.md
└── LICENSE
```

---

## 🚀 Quick Start

### 1. Installation

```bash
git clone https://github.com/zhangzijie-pro/Speaker-Verification.git
cd Speaker-Verification
pip install -r requirements.txt
```

### 2. Data Preprocessing

```bash
python processed/preprocess_cnceleb2_train.py
```

### 3. Training

```bash
# Train with default config
python train.py

# Override parameters via command line
python train.py train.epochs=100 train.lr=5e-4 train.emb_dim=256
```

---

## 📈 Evaluation (Speaker Verification)

### Run full evaluation

```bash
python verify_pairs.py \
    --val_meta processed/cn_celeb2/val_meta.jsonl \
    --ckpt outputs/best.pt \
    --out_dir outputs_eval
```

**Outputs**:
- `roc.png`, `det.png`, `score_hist.png`
- `tsne.png` (speaker clustering)
- `metrics.txt` (EER, Recall@K, etc.)
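
The EER reported in `metrics.txt` is the operating point where the false acceptance and false rejection rates meet. A minimal NumPy sketch of that computation (the function name and the simple threshold sweep are illustrative, not this repository's exact metric code):

```python
import numpy as np

def compute_eer(scores, labels):
    """Equal Error Rate from similarity scores.

    scores: higher means more likely same speaker; labels: 1 = same, 0 = different.
    Sweeps every observed score as the decision threshold and returns the point
    where FAR and FRR are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.sort(scores)
    # FAR: fraction of different-speaker pairs accepted at threshold t
    far = np.array([(scores[labels == 0] >= t).mean() for t in thresholds])
    # FRR: fraction of same-speaker pairs rejected at threshold t
    frr = np.array([1.0 - (scores[labels == 1] >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0
```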

---

## 🎯 Single Audio Comparison (Most Used)

```bash
python compare_two_wavs.py \
    --wav1 test1.wav \
    --wav2 test2.wav \
    --ckpt outputs/export/model.onnx   # Supports ONNX
```

---

## 🛠️ Model Export (Deployment)

```bash
# One-click export to ONNX + MNN
python utils/export.py \
    --ckpt outputs/best.pt \
    --out_dir outputs/deploy \
    --onnx --mnn
```

**Supported deployment**:
- **ONNX Runtime** (Python / C++)
- **MNN** (Mobile / Edge)
- **TensorRT** (High-performance server)

---

## 🧠 Model Overview

### Backbone

- **ECAPA-TDNN**
  - Res2Net-style temporal convolutions
  - Squeeze-and-Excitation (SE)
  - Attentive Statistics Pooling
- Embedding dimension: **192 / 256**

### Loss

- **AAM-Softmax (Additive Angular Margin Softmax)**
  - Encourages large inter-speaker margins
  - Used only during training
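
The margin mechanism can be sketched directly on cosine logits; `aam_logits` is an illustrative name, and the actual `speaker_verification/head/aamsoftmax.py` likely adds details such as easy-margin handling:

```python
import numpy as np

def aam_logits(cosines, target, m=0.2, s=30.0):
    """Additive Angular Margin adjustment on cosine logits.

    cosines: (N, C) cosine similarities between embeddings and class weights;
    target: (N,) ground-truth speaker indices. The margin m is added to the
    target angle, then all logits are scaled by s. Sketch only.
    """
    cosines = np.clip(cosines, -1.0, 1.0)
    theta = np.arccos(cosines)
    out = cosines.copy()
    rows = np.arange(len(target))
    # penalize the target class: cos(theta + m) is always at most cos(theta)
    out[rows, target] = np.cos(theta[rows, target] + m)
    return s * out
```

Because the target logit is shrunk by the margin, the model must push same-speaker embeddings closer to their class weight than it would under plain softmax.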

### Embedding

- L2-normalized speaker embeddings
- Cosine similarity for verification
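
Verification then reduces to a dot product between normalized embeddings. A minimal sketch (`cosine_score` is an illustrative name):

```python
import numpy as np

def cosine_score(emb_a, emb_b):
    """Cosine similarity between two speaker embeddings.

    Embeddings are L2-normalized first, so the score is a dot product in
    [-1, 1]; a pair is accepted as the same speaker when the score clears a
    threshold tuned on the validation set (e.g. at the EER operating point).
    """
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))
```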

---

## 📊 Dataset

- **CN-Celeb**
  - ~1000 speakers
  - Highly diverse recording conditions
- Split:
  - `train`: speaker-disjoint
  - `val`: speaker-disjoint
- Features:
  - 80-dim log Mel-filterbank
  - 16 kHz sampling rate

---

## 📌 Recommended Configuration (6GB GPU)

```yaml
# configs/train.yaml
emb_dim: 256
channels: 512
lr: 1e-3
epochs: 80
crop_frames: 200          # Training
crop_frames_val: 400      # Validation
num_crops: 6
p: 32
k: 4
```
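
With `p: 32` and `k: 4`, each batch holds 32 speakers × 4 utterances = 128 samples. A pure-Python sketch of such a sampler (`pk_batches` is an illustrative name; the real logic lives in `data/pk_sampler.py`):

```python
import random
from collections import defaultdict

def pk_batches(labels, p=32, k=4, seed=0):
    """Yield speaker-balanced batches: P speakers with K utterances each.

    labels: list of speaker ids, one per utterance index. Names and exact
    behavior are illustrative, not the repository's implementation.
    """
    rng = random.Random(seed)
    by_spk = defaultdict(list)
    for idx, spk in enumerate(labels):
        by_spk[spk].append(idx)
    # keep only speakers with at least k utterances
    speakers = [s for s, idxs in by_spk.items() if len(idxs) >= k]
    rng.shuffle(speakers)
    for i in range(0, len(speakers) - p + 1, p):
        batch = []
        for spk in speakers[i:i + p]:
            batch.extend(rng.sample(by_spk[spk], k))
        yield batch   # p * k utterance indices, k per speaker
```

Balanced batches matter for margin-based losses: every speaker in the batch contributes both positives and negatives.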

---

## 🔮 Future Improvements

- [x] Hydra configuration
- [x] Parallel preprocessing
- [x] ONNX / MNN export
- [ ] Noise / RIR augmentation

---

## 📜 License

This project is released under the **Apache License 2.0**.  
The CN-Celeb dataset follows its original license and usage terms.

---

## 🙋 Notes

This repository is intended for learning speaker verification systems. It is **not** an off-the-shelf commercial system.
</pre>
        </div>
        </div>

</div>
</main>
</div>
</div>


</body>
</html>

