Acoustic Echo Cancellation Archives

I attended Meta’s RTC@Scale 2024 conference, where Meta presented two major changes it made while revamping its core audio processing stack: BERYL, a new acoustic echo cancellation (AEC) system, and MLOW, a new low-bitrate audio codec written entirely in software. This post contains my notes on Beryl. A PDF of the handwritten notes can be found here.

BERYL: full software AEC (by Sriram Srinivasan & Hoang Do)

Meta reported a 20% reduction in “No Audio” or “Audio device reliability” issues on iOS & Android
15% reduction in P50 mouth-to-ear latency on Android
Revamp of the core audio processing stack for WhatsApp, Instagram, Messenger
- Very diverse user base
- Different kinds of handsets
- Different Geography
- Noisy conditions
- Both high end & low end phones (more than 20% low end ARMv7)
Based on telemetry and user feedback, Meta decided to tackle 1. echo and 2. audio quality on low-bitrate networks
High end devices use ML to suppress echo
To accommodate low end devices which cannot run ML, a baseline solution for echo cancellation is needed
Welcome BERYL
Beryl replaces WebRTC’s AEC3 and AECM on all devices
Interestingly, the users experiencing echo issues are also the ones on low end devices which cannot run ML
Meta’s scale is too large
- High end phones have hardware AEC
- Low end phones do not
- Stereo / spatial audio only possible in s/w
- H/w only does mono AEC

Beryl was needed because the existing WebRTC AEC either leaves a lot of residual echo or degrades the quality of double-talk
AECM – Not scalable for millions of users & Quality not best
Beryl AEC = Low compute – DSP based s/w AEC
- Lite mode for low end devices
- Full mode for high end devices
- Both modes are adaptive, vs. AECM being a simple echo suppressor
- Near instant adaptation to changes
- Better double talk performance
- Multi-channel capture & render at 16 kHz & 48 kHz
- Tuned using 3000 music + speech (mono + stereo) on 20+ devices
- CPU usage increase of less than 7% compared to WebRTC AEC

Beryl Components

1. Delay Estimator

Clock drift occurs when using an external mic & speaker, as they do not share a common clock
The delay estimator estimates the delay between the far-end reference signal (speaker) & the near-end capture signal (mic)
Beryl full mode can handle non-causal delays (negative delay)
Can handle delays of up to 1 sec
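To make the delay estimation idea concrete, here is a minimal sketch that picks the lag with the highest cross-correlation between the reference and capture signals. This is a generic textbook approach, not necessarily what Beryl does; the function names `estimate_delay` and `shift` and the `max_lag` parameter are my own for illustration. Searching negative lags as well is what lets it report a non-causal delay.

```python
import math


def shift(signal, delay):
    """Delay a signal by `delay` samples (negative = advance), zero padded."""
    n = len(signal)
    return [signal[i - delay] if 0 <= i - delay < n else 0.0 for i in range(n)]


def estimate_delay(reference, capture, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    Positive lag: capture lags the reference (the usual causal case).
    Negative lag: capture leads the reference (the non-causal case).
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(capture):
                score += ref_sample * capture[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

A brute-force search like this costs O(N × max_lag) per estimate, so a real-time implementation covering delays of up to 1 sec would presumably work on downsampled or frequency-domain data instead.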

2. Linear AEC

Estimates the echo & subtracts it from the capture signal
Beryl AEC is a frequency-domain, dual-filter normalized least mean squares (NLMS) algorithm
One fixed & one adaptive filter
Coefficients can be copied between filters, based on:
- Relative difference in the powers of the error signals of the two filters and the input mic signal
- Coupling factor between echo estimate & error signal
Adaptation step size is configurable & depends on coherence between mic & reference signals, power and SIR
Great double talk performance compared to WebRTC AEC
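The dual-filter idea can be sketched as follows. This is a simplified time-domain version written for readability (Beryl works in the frequency domain), with a constant step size and a copy rule based only on smoothed error power. The class name `DualFilterNlms`, the 0.5 copy threshold and the smoothing constant are assumptions of mine, not values from the talk.

```python
class DualFilterNlms:
    """Time-domain sketch of a dual-filter NLMS echo canceller."""

    def __init__(self, taps, step=0.5, eps=1e-8, smooth=0.9):
        self.fixed = [0.0] * taps     # produces the output; changed only by copying
        self.adaptive = [0.0] * taps  # updated by NLMS on every sample
        self.history = [0.0] * taps   # recent reference samples, newest first
        self.step, self.eps, self.smooth = step, eps, smooth
        self.fixed_power = 0.0        # smoothed error power of the fixed filter
        self.adaptive_power = 0.0     # smoothed error power of the adaptive filter

    def process(self, ref_sample, mic_sample):
        """Consume one reference/mic sample pair, return the echo-reduced sample."""
        self.history = [ref_sample] + self.history[:-1]
        x = self.history
        err_fixed = mic_sample - sum(w * s for w, s in zip(self.fixed, x))
        err_adapt = mic_sample - sum(w * s for w, s in zip(self.adaptive, x))

        # NLMS update: step normalized by the reference energy in the window.
        gain = self.step * err_adapt / (sum(s * s for s in x) + self.eps)
        self.adaptive = [w + gain * s for w, s in zip(self.adaptive, x)]

        a = self.smooth
        self.fixed_power = a * self.fixed_power + (1 - a) * err_fixed ** 2
        self.adaptive_power = a * self.adaptive_power + (1 - a) * err_adapt ** 2

        # Hypothetical copy rule: adopt the adaptive filter once it is clearly better.
        if self.adaptive_power < 0.5 * self.fixed_power:
            self.fixed = list(self.adaptive)
            self.fixed_power = self.adaptive_power
        return err_fixed
```

The point of keeping two filters is that the output always comes from the fixed one, so a bad adaptation step (for example during double talk) does not reach the listener until the adaptive filter has proven itself. This sketch only copies from adaptive to fixed; the notes say coefficients can be copied between filters, so the real system may copy in both directions.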

3. Acoustic Echo Suppressor (AES)

Non linear distortions are introduced by amplifiers before speaker and after microphone
AES removes this non-linear echo (residual echo)
AES removes stationary echo noise & distortion, and applies perceptual filtering & ambient noise matching
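The notes do not say how the AES computes its gains, so the following is only a generic illustration of residual echo suppression: a Wiener-style gain per frequency band with a lower limit. The function name `suppression_gains`, the inputs (per-band power estimates) and the `floor` value are assumptions for the sketch.

```python
def suppression_gains(error_power, residual_echo_power, floor=0.1):
    """Per-band gains in [floor, 1] that attenuate bands dominated by residual echo.

    error_power: power of the linear AEC output in each band.
    residual_echo_power: estimated echo power still present in each band.
    """
    gains = []
    for p_err, p_echo in zip(error_power, residual_echo_power):
        if p_err <= 0.0:
            gains.append(floor)
            continue
        gain = 1.0 - p_echo / p_err  # fraction of the band that is not echo
        gains.append(min(1.0, max(floor, gain)))
    return gains
```

The floor keeps heavily suppressed bands from dropping to complete silence, which tends to sound unnatural.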

Implementation

Reduce memory, CPU & latency
Synchronization is needed because audio from the input & output devices is processed on different threads
- Mutex in functions (good safety but worse realtime performance)
- Low level locks on shared data structures
- Thread safe low level data structures (ok safety, great realtime performance)
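As an example of the third option, here is a sketch of a single-producer, single-consumer ring buffer, a common way to pass audio between a render thread and a capture thread without a mutex. The talk did not say which data structures Beryl uses, so the class `SpscRingBuffer` is purely illustrative. Python is used here only to show the index ownership idea; a production version would be written in C or C++ with atomic loads and stores.

```python
class SpscRingBuffer:
    """Ring buffer for exactly one producer thread and one consumer thread."""

    def __init__(self, capacity):
        # One slot is kept empty so that "full" and "empty" are distinguishable.
        self._buf = [0.0] * (capacity + 1)
        self._read = 0   # written only by the consumer
        self._write = 0  # written only by the producer

    def push(self, sample):
        """Producer side. Returns False instead of blocking when full."""
        nxt = (self._write + 1) % len(self._buf)
        if nxt == self._read:
            return False
        self._buf[self._write] = sample
        self._write = nxt  # publish only after the slot has been filled
        return True

    def pop(self):
        """Consumer side. Returns None instead of blocking when empty."""
        if self._read == self._write:
            return None
        sample = self._buf[self._read]
        self._read = (self._read + 1) % len(self._buf)
        return sample
```

Because each index has a single writer, neither side ever waits on the other, which is what makes this kind of structure attractive on a realtime audio thread.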
NEON on ARMv7 & ARM64
AVX on Intel
CPU < 110% of WebRTC AEC