I attended Meta’s RTC@Scale 2024 conference, where Meta presented two major changes made while revamping its core audio processing stack: BERYL, a new breakthrough Acoustic Echo Cancellation (AEC) by Meta, and MLOW, a new low-bitrate audio codec written fully in software. This blog contains my notes on Beryl. A PDF of the handwritten notes can be found here.

BERYL - full software AEC (by Sriram Srinivasan & Hoang Do)
- Meta achieved a 20% reduction in “No Audio” or “Audio device reliability” issues on iOS & Android
 - 15% reduction in P50 mouth to ear latency on Android
 - Revamp of the core audio processing stack for WhatsApp, Instagram, Messenger
- Very diverse user base
 - Different kinds of handsets
 - Different Geography
 - Noisy conditions
 - Both high-end & low-end phones (more than 20% low-end ARMv7)
 
 - Based on telemetry and user feedback, Meta decided to tackle (1) echo and (2) audio quality on low-bitrate networks
 - High end devices use ML to suppress echo
 - To accommodate low end devices which cannot run ML, a baseline solution for echo cancellation is needed
 - Welcome BERYL
 - Beryl replaces WebRTC’s AEC3 and AECM on all devices
 - Interestingly users experiencing echo issues are also on low end devices which cannot run ML
 - Meta’s scale is too large
- High end phones have hardware AEC
 - Low end phones do not
 - Stereo / spatial audio only possible in s/w
 - H/w only does mono AEC
 
 
- Beryl was needed because AECM either leaves a lot of residual echo or degrades the quality of double-talk
 - AECM – Not scalable for millions of users & Quality not best
 - Beryl AEC = Low compute – DSP based s/w AEC 
- Lite mode for low end devices
 - Full mode for high-end devices
 - Both modes are adaptive vs. AECM being a simple echo suppressor
 - Near instant adaptation to changes
 - Better double talk performance
 - Multi-channel capture & render at 16 kHz & 48 kHz
 - Tuned using 3000 music + speech (mono + stereo) on 20+ devices
 - CPU usage increase of less than 7% compared to WebRTC AEC
 
 
Beryl Components
1. Delay Estimator
- Clock drift when using external mic & speaker as they do not share common clock
 - The delay estimator estimates the delay between the far-end reference signal (speaker) & the near-end capture signal (mic)
 - Beryl full mode can handle non-causal delays (-ve delay)
 - Can handle delay up to 1 sec
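
The idea of estimating the delay between the reference and capture signals can be illustrated with a brute-force cross-correlation search. This is only a minimal sketch and not Beryl's actual estimator, whose internals are not covered in these notes. The names `estimate_delay`, `shift` and `max_lag` are made up for illustration. The search also covers negative lags, which is how a non-causal delay would show up.

```python
import random


def shift(signal, delay):
    """Delay `signal` by `delay` samples (negative values advance it), zero-padded."""
    n = len(signal)
    if delay >= 0:
        return [0.0] * delay + signal[:n - delay]
    return signal[-delay:] + [0.0] * (-delay)


def estimate_delay(reference, capture, max_lag):
    """Return the lag in [-max_lag, max_lag] with the highest cross-correlation.

    Positive lag: the capture lags the reference (the usual causal case).
    Negative lag: the echo shows up in the capture before the reference sample.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(capture):
                score += ref_sample * capture[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Illustrative input: white noise whose echo arrives 37 samples late at half level.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(400)]
capture = [0.5 * s for s in shift(reference, 37)]
print(estimate_delay(reference, capture, max_lag=64))
```

A search like this costs one multiply per sample per candidate lag, so covering a full second of delay at 48 kHz this way would be expensive. A production estimator would presumably use something much cheaper.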
 
2. Linear AEC
- Estimate echo & subtract from capture signal
 - Beryl AEC is a frequency-domain, dual-filter normalized least mean squares (NLMS) algorithm
 - One fixed & one adaptive filter
 - Coefficients can be copied between filters, based on:
 - Relative difference in the powers of the error signal between the two filters and the input mic signal
 - Coupling factor between the echo estimate & the error signal
 
 - Adaptation step size is configurable and depends on coherence between mic & reference signals, power and SIR
 - Great double talk performance compared to WebRTC AEC
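
The dual-filter idea above can be sketched in a few lines. The notes say Beryl works in the frequency domain. The sketch below is a simplified time-domain version, so it is an illustration of the technique, not Beryl's implementation. The class name `DualFilterNlms`, the fixed step size, the smoothing constant and the "copy when the adaptive filter's error power is less than half" rule are all assumptions. The coupling-factor check and the variable step size are left out.

```python
import random


class DualFilterNlms:
    """Time-domain sketch: an adaptive NLMS filter plus a fixed output filter."""

    def __init__(self, taps, step=0.5, eps=1e-6, smooth=0.9):
        self.fixed = [0.0] * taps      # produces the output, changed only by copying
        self.adaptive = [0.0] * taps   # updated with NLMS on every sample
        self.history = [0.0] * taps    # recent reference samples, newest first
        self.step, self.eps, self.smooth = step, eps, smooth
        self.err_fixed = 0.0           # smoothed error power of the fixed filter
        self.err_adaptive = 0.0        # smoothed error power of the adaptive filter

    def process(self, ref_sample, mic_sample):
        self.history = [ref_sample] + self.history[:-1]
        x = self.history
        # Error = mic signal minus each filter's echo estimate.
        e_fixed = mic_sample - sum(w * v for w, v in zip(self.fixed, x))
        e_adapt = mic_sample - sum(w * v for w, v in zip(self.adaptive, x))
        # NLMS update: step normalized by the reference signal energy.
        norm = sum(v * v for v in x) + self.eps
        gain = self.step * e_adapt / norm
        self.adaptive = [w + gain * v for w, v in zip(self.adaptive, x)]
        # Track smoothed error powers of both filters.
        a = self.smooth
        self.err_fixed = a * self.err_fixed + (1 - a) * e_fixed * e_fixed
        self.err_adaptive = a * self.err_adaptive + (1 - a) * e_adapt * e_adapt
        # Assumed copy rule: adopt the adaptive filter when it is clearly better.
        if self.err_adaptive < 0.5 * self.err_fixed:
            self.fixed = list(self.adaptive)
            self.err_fixed = self.err_adaptive
        return e_fixed


# Illustrative run: far-end noise through a made-up 3-tap echo path, no near-end talk.
rng = random.Random(0)
echo_path = [0.6, -0.3, 0.1]
reference = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
mic = [sum(h * reference[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(reference))]
aec = DualFilterNlms(taps=8)
residual = [aec.process(r, m) for r, m in zip(reference, mic)]
```

The point of keeping two filters is that the output path only changes when the adaptive filter has proven itself better, so a brief misadaptation of the adaptive filter does not immediately reach the output.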
 
3. Acoustic Echo Suppressor (AES)
- Non-linear distortions are introduced by amplifiers before the speaker and after the microphone
 - AES removes this non-linear echo (residual echo)
 - AES removes stationary echo noise and distortion, and applies perceptual filtering & ambient noise matching
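
A common way to build a residual echo suppressor is to apply a per-band gain to the output of the linear stage. The sketch below shows that general idea under assumptions of my own: the notes do not say how Beryl's AES computes its gains, and the function names, the gain floor and the noise-matching rule here are hypothetical. Inputs are per-band power estimates.

```python
def suppression_gains(error_power, residual_echo_power, floor=0.05):
    """Per-band power gain in [floor, 1] using a spectral-subtraction style rule.

    Bands where the estimated residual echo dominates are attenuated the most.
    """
    gains = []
    for e, r in zip(error_power, residual_echo_power):
        g = 1.0 - r / e if e > 0.0 else 1.0
        gains.append(min(1.0, max(floor, g)))
    return gains


def match_ambient_noise(error_power, gains, noise_power):
    """Band power after suppression, kept from dropping below the ambient noise.

    Without this, heavily suppressed bands sound like the line going dead.
    A band that was already quieter than the noise estimate is left alone.
    """
    return [max(g * e, min(e, n))
            for e, g, n in zip(error_power, gains, noise_power)]


# Illustrative values for three bands: clean, echo-dominated, fully echo.
error_power = [1.0, 1.0, 1.0]
residual_echo_power = [0.0, 0.9, 2.0]
gains = suppression_gains(error_power, residual_echo_power)
output_power = match_ambient_noise(error_power, gains, [0.2, 0.2, 0.2])
```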
 
Implementation
- Reduce memory, CPU & latency
 - Synchronization is needed because audio from input & output devices is handled on different threads
- Mutex in functions (good safety but worse real-time performance)
 - Low-level locks on shared data structures
 - Thread-safe low-level data structures (OK safety, great real-time performance)
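
One well-known example of a thread-safe low-level data structure for audio is a single-producer single-consumer ring buffer. The notes do not say which structures Beryl uses, so this is an assumption for illustration, and `SpscRingBuffer` is a made-up name. The sketch only shows the index logic. In C or C++ the two indices would be atomics with acquire/release ordering. Plain Python does not give real-time guarantees.

```python
class SpscRingBuffer:
    """Single-producer single-consumer ring buffer of audio frames.

    Only the producer writes `_write` and only the consumer writes `_read`,
    so the design needs no mutex. One slot stays empty to tell full from empty.
    Frames must not be None, because pop() uses None to signal "empty".
    """

    def __init__(self, capacity):
        self._slots = [None] * (capacity + 1)
        self._read = 0
        self._write = 0

    def push(self, frame):
        """Called from the producer thread. Returns False if the buffer is full."""
        nxt = (self._write + 1) % len(self._slots)
        if nxt == self._read:
            return False  # full: drop the frame instead of blocking the audio thread
        self._slots[self._write] = frame
        self._write = nxt
        return True

    def pop(self):
        """Called from the consumer thread. Returns None if the buffer is empty."""
        if self._read == self._write:
            return None
        frame = self._slots[self._read]
        self._read = (self._read + 1) % len(self._slots)
        return frame


# Illustrative use: the render thread pushes reference frames, the capture thread pops them.
buf = SpscRingBuffer(capacity=2)
buf.push([0.1, 0.2])
buf.push([0.3, 0.4])
first = buf.pop()
```

Neither call ever waits, which is the property that matters on a real-time audio thread: a full or empty buffer is reported to the caller instead of blocking.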
 
 - NEON on ARMv7 & ARM64
 - AVX on Intel
 - CPU < 110% of WebRTC AEC
 
