I attended Meta’s RTC@Scale 2024 conference, where Meta presented two major changes it made while revamping its core audio processing stack: BERYL, a new breakthrough Acoustic Echo Cancellation (AEC) by Meta, and MLOW, a new low bitrate audio codec written fully in software. This blog contains my notes on Beryl. A PDF of the handwritten notes can be found here.
BERYL - full software AEC (by Sriram Srinivasan & Hoang Do)
- Meta achieved a 20% reduction in “No Audio” or “Audio device reliability” issues on iOS & Android
- 15% reduction in P50 mouth to ear latency on Android
- Revamp of the core audio processing stack for WhatsApp, Instagram & Messenger
- Very diverse user base
- Different kinds of handsets
- Different Geography
- Noisy conditions
- Both high end & low end phones (more than 20% low end ARMv7)
- Based on telemetry and user feedback, Meta decided to tackle 1. echo and 2. audio quality on low bit rate networks
- High end devices use ML to suppress echo
- To accommodate low end devices which cannot run ML, a baseline solution for echo cancellation is needed
- Welcome BERYL
- Beryl replaces WebRTC’s AEC3 and AECM on all devices
- Interestingly users experiencing echo issues are also on low end devices which cannot run ML
- Meta’s scale is too large
- High end phones have hardware AEC
- Low end phones do not
- Stereo / spatial audio only possible in s/w
- H/w only does mono AEC
- Beryl was needed because AECM either leaves a lot of residual echo or degrades the quality of double-talk
- AECM – Not scalable for millions of users & Quality not best
- Beryl AEC = Low compute – DSP based s/w AEC
- Lite mode for low end devices
- Full mode for high end
- Both modes are adaptive, vs. AECM being a simple echo suppressor
- Near instant adaptation to changes
- Better double talk performance
- Multi-channel capture & render at 16 kHz & 48 kHz
- Tuned using 3000 music + speech (mono + stereo) on 20+ devices
- CPU usage increase of less than 7% compared to WebRTC AEC
Beryl Components
1. Delay Estimator
- Clock drift when using external mic & speaker as they do not share common clock
- Delay estimator estimates the delay between the far-end reference signal (speaker) & the near-end capture signal (mic)
- Beryl full mode can handle non-causal delays (-ve delay)
- Can handle delay up to 1 sec
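
One way to picture what a delay estimator does is a cross-correlation search over candidate lags, including negative ones for the non-causal case. The sketch below is only an illustration under that assumption: the talk did not describe how Beryl actually estimates delay, and `estimate_delay`, `shift` and `max_lag` are hypothetical names made up for this example.

```python
import random

def estimate_delay(reference, capture, max_lag):
    """Return the lag (in samples) that maximizes cross-correlation.

    A positive lag means the capture signal trails the reference.
    A negative lag is the non-causal case where the capture leads.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(reference)):
            j = i + lag
            if 0 <= j < len(capture):
                score += reference[i] * capture[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def shift(signal, lag):
    """Delay (lag > 0) or advance (lag < 0) a signal, zero-padding the gap."""
    n = len(signal)
    out = [0.0] * n
    for i in range(n):
        j = i + lag
        if 0 <= j < n:
            out[j] = signal[i]
    return out

# Synthetic test signals (assumption: white noise stands in for real audio).
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(400)]
late_capture = shift(reference, 25)    # mic hears the speaker 25 samples later
early_capture = shift(reference, -10)  # capture leads the reference (-ve delay)

late_lag = estimate_delay(reference, late_capture, max_lag=40)
early_lag = estimate_delay(reference, early_capture, max_lag=40)
```

A brute-force search like this costs O(samples x lags), so handling delays of up to 1 sec in real time would need something cheaper, which this sketch does not attempt.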
2. Linear AEC
- Estimate echo & subtract from capture signal
- Beryl AEC is a normalized least mean squares (NLMS) frequency domain dual filter algorithm
- One fixed & one adaptive filter
- Coefficients can be copied between filters
- Relative difference in the powers of the error signals between the two filters and the input mic signal
- Coupling factor between echo estimate & error signal
- Adaptation step size is configurable / depends on coherence between mic & reference signals, power and SIR
- Great double talk performance compared to WebRTC AEC
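
To make the dual filter idea concrete, here is a minimal sketch of NLMS with one adaptive and one fixed filter. It is not Beryl's implementation: it works in the time domain rather than the frequency domain to stay short, it uses a constant step size `mu` instead of one driven by coherence, power and SIR, and the rule for copying coefficients (copy when the adaptive filter had lower error power over the last block) is an assumption of mine. All names (`dual_filter_nlms`, `simulate_echo`, `echo_path`) are hypothetical.

```python
import random

def dual_filter_nlms(reference, mic, taps=8, mu=0.5, block=64, eps=1e-8):
    """Time-domain dual-filter NLMS echo canceller (illustrative only)."""
    adaptive = [0.0] * taps  # updated on every sample
    fixed = [0.0] * taps     # only changes when coefficients are copied in
    history = [0.0] * taps   # most recent reference samples, newest first
    output = []
    err_adaptive = err_fixed = 0.0
    for n, (x, d) in enumerate(zip(reference, mic), start=1):
        history = [x] + history[:-1]
        # Estimate the echo with each filter and subtract it from the mic.
        e_adaptive = d - sum(w * h for w, h in zip(adaptive, history))
        e_fixed = d - sum(w * h for w, h in zip(fixed, history))
        output.append(e_fixed)  # the fixed filter produces the output
        # NLMS update: the step is normalized by the reference power.
        power = sum(h * h for h in history) + eps
        adaptive = [w + mu * e_adaptive * h / power
                    for w, h in zip(adaptive, history)]
        err_adaptive += e_adaptive ** 2
        err_fixed += e_fixed ** 2
        if n % block == 0:
            # Assumed copy rule: promote the adaptive filter if it did better.
            if err_adaptive < err_fixed:
                fixed = list(adaptive)
            err_adaptive = err_fixed = 0.0
    return output, fixed

def simulate_echo(reference, echo_path):
    """Mic signal containing only the echo of `reference` (no near-end talker)."""
    return [
        sum(h * reference[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
        for n in range(len(reference))
    ]

rng = random.Random(1)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.0, 0.5, -0.3, 0.1]  # made-up echo path for the demo
mic = simulate_echo(reference, echo_path)
output, fixed = dual_filter_nlms(reference, mic)
residual_power = sum(e * e for e in output[-200:]) / 200
```

Keeping the output on the fixed filter means a bad adaptation step (for example during double talk) does not reach the listener unless the adaptive filter proves itself over a whole block.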
3. Acoustic Echo Suppressor (AES)
- Non-linear distortions are introduced by amplifiers before speaker and after microphone
- AES removes this non-linear echo (residual echo)
- AES removes stationary echo noise, distortion, applies perceptual filtering & ambient noise matching
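
Residual echo suppressors are commonly built as a per-frequency-band gain applied after the linear stage. The sketch below shows that general idea with a spectral-subtraction style gain and a floor. The talk did not give Beryl's gain rule, so the formula, the `floor` parameter and the function names are all assumptions for illustration.

```python
def suppression_gains(signal_power, echo_power, floor=0.1):
    """Per-band gain in [floor, 1]: attenuate bands dominated by residual echo.

    signal_power: power per band after the linear AEC
    echo_power:   estimated residual echo power per band (assumed to be given)
    """
    gains = []
    for s, e in zip(signal_power, echo_power):
        gain = 1.0 - e / s if s > 0 else floor
        # The floor avoids muting a band completely, which tends to sound unnatural.
        gains.append(min(1.0, max(floor, gain)))
    return gains

def apply_gains(band_amplitudes, gains):
    """Scale each band's amplitude by its suppression gain."""
    return [a * g for a, g in zip(band_amplitudes, gains)]

# Three bands: no echo, all echo, and one quarter echo.
gains = suppression_gains([1.0, 1.0, 4.0], [0.0, 1.0, 1.0])
```

Perceptual filtering and ambient noise matching would sit on top of this, shaping the gains and filling suppressed bands with noise at the ambient level; neither is shown here.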
Implementation
- Reduce memory, CPU & latency
- Synchronization needed because audio from input & output devices is processed on different threads
- Mutex in functions (good safety but worse real time performance)
- Low level locks on shared data structures
- Thread safe low level data structures (ok safety, great real time performance)
- NEON on ARMv7 & ARM64
- AVX on Intel
- CPU < 110% of WebRTC AEC
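
As an illustration of the "thread safe low level data structures" option in the list above, here is a sketch of a single-producer, single-consumer ring buffer, a common way to hand audio frames between a capture thread and a render thread without taking a mutex. This is my own example, not something shown in the talk. In C or C++ the two indices would be atomics; this Python version relies on CPython's global interpreter lock for that, and the names (`SpscRingBuffer`, `run_demo`) are hypothetical.

```python
import threading
import time

class SpscRingBuffer:
    """Single-producer, single-consumer queue; items must not be None."""

    def __init__(self, capacity):
        # One slot always stays empty so "full" and "empty" look different.
        self._slots = [None] * (capacity + 1)
        self._read = 0   # written only by the consumer
        self._write = 0  # written only by the producer

    def try_push(self, item):
        nxt = (self._write + 1) % len(self._slots)
        if nxt == self._read:
            return False  # full: the caller decides whether to drop or retry
        self._slots[self._write] = item
        self._write = nxt  # publish only after the slot is filled
        return True

    def try_pop(self):
        if self._read == self._write:
            return None  # empty
        item = self._slots[self._read]
        self._read = (self._read + 1) % len(self._slots)
        return item

def run_demo(frames=200):
    """Push frame numbers from one thread, pop them in another."""
    ring = SpscRingBuffer(capacity=8)
    received = []

    def producer():
        for i in range(frames):
            while not ring.try_push(i):
                time.sleep(0)  # buffer full: yield and retry

    def consumer():
        while len(received) < frames:
            item = ring.try_pop()
            if item is None:
                time.sleep(0)  # buffer empty: yield and retry
            else:
                received.append(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Because each index has exactly one writer, neither thread ever blocks on the other. That is the property that matters for real time audio, at the cost of only supporting one producer and one consumer.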