You dont have javascript enabled! Please enable it!

Practical case: Beamforming & DoA on ESP32-S3, I2S INMP441

Practical case: Beamforming & DoA on ESP32-S3, I2S INMP441 — hero

Objective and use case

What you’ll build: An advanced acoustic direction-finding system using the ESP32-S3-DevKitC-1 and two INMP441 I2S microphones to capture stereo audio and estimate the direction of sound sources.

Why it matters / Use cases

  • Enhancing audio capture in smart home devices to accurately detect and respond to user commands based on their location.
  • Improving surveillance systems by identifying the direction of sounds, aiding in situational awareness.
  • Facilitating advanced robotics applications, allowing robots to navigate and interact based on auditory cues from their environment.
  • Supporting research in acoustics and audio signal processing by providing a practical platform for experimentation.

Expected outcome

  • Accurate estimation of sound source direction with a reported accuracy of ±5 degrees.
  • Real-time audio processing with latencies under 50 ms for immediate feedback in applications.
  • Successful detection of sound sources at distances up to 10 meters with consistent performance.
  • Data throughput of 48 kHz stereo audio captured over I2S, ensuring high fidelity in sound reproduction.

Audience: Developers and engineers interested in audio processing; Level: Intermediate to advanced.

Architecture/flow: The system utilizes I2S for audio capture, with synchronized sampling and processing to compute TDOA for direction estimation.

ESP32-S3-DevKitC-1 + 2x INMP441 I2S Mics: Beamforming Direction Finding (Advanced)

This hands-on case guides you through building a two-microphone acoustic direction-finding (DOA) system using the ESP32-S3-DevKitC-1 and two INMP441 I2S microphones. We’ll acquire synchronized stereo audio over I2S, estimate time-difference-of-arrival (TDOA) with normalized cross-correlation, and compute the azimuth angle of the sound source relative to the two-mic baseline. The workflow emphasizes reproducibility with PlatformIO and the ESP-IDF framework.

The central idea:
– Strap one INMP441 as “Left” and the other as “Right,” feed both with the same I2S BCLK and WS, and share their SD line (they tri-state when not in their assigned slot). This produces true stereo capture on a single I2S peripheral.
– Compute the relative delay between left and right signals. For a microphone spacing d, sampling rate fs, and speed of sound c, the DOA angle is theta = asin(c·tau/d).
– Validate with simple clap/noise tests at known angles and distances.


Prerequisites

  • Comfortable with ESP-IDF concepts (tasks, logging, peripherals) and C programming.
  • Familiarity with I2S concepts: bit clock (BCLK), word select (WS/LRCLK), data (SD), slot width, and Philips I2S standard.
  • Basic DSP: filtering, windowing, correlation, and time-delay estimation.

Software on your development PC:
– Python 3.8+ and pip
– PlatformIO Core 6.1.13 (pinned to a concrete version for reproducibility)
– A serial terminal (PlatformIO monitor is fine)

Operating system notes:
– Windows 10/11, macOS 12+, or Linux (Ubuntu 22.04 tested)
– USB drivers:
– Many ESP32-S3-DevKitC-1 boards use native USB-Serial/JTAG (no external bridge driver needed).
– If your DevKitC variant has a CP210x or CH34x USB-to-UART bridge, install the appropriate driver:
– CP210x: https://www.silabs.com/developers/usb-to-uart-bridge-vcp-drivers
– CH34x: https://www.wch-ic.com/downloads/CH341SER_EXE.html


Materials (with exact model)

  • 1x ESP32-S3-DevKitC-1 (ESP32-S3-WROOM-1). Example variants: ESP32-S3-DevKitC-1-N8 or N8R2. Any flash/PSRAM size is fine for this project.
  • 2x INMP441 omnidirectional digital I2S microphone modules (pins typically: VDD, GND, SCK/BCLK, WS/LRCLK, SD/DOUT, L/R).
  • Breadboard and male-male Dupont wires (20+ pieces recommended).
  • 1x USB-C cable (data-capable).

Setup/Connection

We capture two channels (Left/Right) using a single I2S peripheral in Philips standard mode. Both microphones share BCLK and WS. Their SD outputs are tied together and wired to a single ESP32-S3 I2S DIN pin. You must strap one mic as Left and the other as Right using the L/R pin on the INMP441 board.

Important physical guidance:
– Mount the two mics along a straight line with known center-to-center spacing d (e.g., 6.5–8.0 cm). The line defines the system’s local x-axis for azimuth. The “Left” mic should be the negative x end, “Right” mic the positive x end, relative to the board’s orientation for easier interpretation.

Recommended GPIO mapping (ESP32-S3-DevKitC-1):

  • I2S BCLK: GPIO 7
  • I2S WS (LRCLK): GPIO 15
  • I2S DIN (from mic SD): GPIO 16
  • MCLK not used (INMP441 does not need MCLK)

Electrical constraints:
– INMP441 VDD: 3.3 V only (do not use 5 V)
– GND common between all units

Mic strapping:
– Mic A (Left channel): L/R pin → GND
– Mic B (Right channel): L/R pin → 3.3 V

Both mics connect to the same I2S bus:
– Both BCLK pins → ESP32-S3 GPIO 7
– Both WS pins → ESP32-S3 GPIO 15
– Both SD pins tied together → ESP32-S3 GPIO 16
– VDD → 3.3 V on DevKitC-1; GND → GND

The table below enumerates the wiring.

Function ESP32-S3-DevKitC-1 Pin Mic A (Left) Mic B (Right) Notes
3.3 V 3V3 VDD VDD Do not use 5 V on INMP441
GND GND GND GND Common ground
I2S BCLK GPIO 7 SCK/BCLK SCK/BCLK Shared
I2S WS/LRCLK GPIO 15 WS/LRCLK WS/LRCLK Shared
I2S DIN (to ESP) GPIO 16 SD/DOUT SD/DOUT Tie both SD lines together
Channel strap L/R → GND L/R → 3.3 V Sets Left/Right slot

Practical notes:
– Keep the SD wire short and tidy to avoid contention or coupling noise.
– If you hear clipping or see saturating values, reduce environmental noise or add soft padding in software.
– Orient the mics away from the ESP32’s switching noise sources (DC regulators).


Full Code

This ESP-IDF-based code configures I2S for stereo capture at 48 kHz (32-bit slots, 24-bit data left-justified), reads frames, deinterleaves Left/Right data, applies simple DC removal, estimates TDOA via normalized cross-correlation over a small lag window (bounded by geometry and sampling rate), and computes the DOA angle. It prints the angle (degrees) and correlation peak.

Notes:
– Beamforming method: TDOA with normalized cross-correlation (NCC). With two microphones and small array spacing, this is a practical and robust estimator. You can swap in a GCC-PHAT FFT method later for more reverberant environments.
– Performance: With a max lag ±12 and frame length 1024, computation is modest and real-time on S3.

Place this file at src/main.c.

// src/main.c
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <inttypes.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"
#include "driver/i2s_std.h"
#include "esp_check.h"

static const char *TAG = "i2s_doa";

// ====== User configuration ======
#define I2S_SAMPLE_RATE       (48000)
#define I2S_BITS_PER_SAMPLE   (32)     // INMP441 outputs 24-bit MSB aligned in 32-bit slots
#define I2S_SLOT_MODE         I2S_SLOT_MODE_STEREO
#define I2S_DMA_DESC          (8)
#define I2S_DMA_FRAME_NUM     (256)

#define GPIO_I2S_BCLK         (7)
#define GPIO_I2S_WS           (15)
#define GPIO_I2S_SD           (16)

#define FRAME_LEN             (1024)   // Stereo frames per processing block
#define MIC_BASELINE_M        (0.075f) // 7.5 cm mic spacing; adjust to your build
#define SPEED_OF_SOUND        (343.0f) // m/s at ~20°C
#define MAX_ABS_ANGLE_DEG     (90.0f)

#define ENABLE_ANGLE_SMOOTHING 1
#define ANGLE_ALPHA            (0.85f) // 0..1; higher = more smoothing

// If channels are reversed (e.g., wiring or LR strap mismatch), set this to 1 to swap in software.
#define SWAP_CHANNELS         0

// ====== Derived parameters ======
static inline int calc_max_lag_samples(void) {
    // Maximum possible sample delay from geometry: ceil(d * fs / c)
    float max_delay = MIC_BASELINE_M * ((float)I2S_SAMPLE_RATE) / SPEED_OF_SOUND;
    int ml = (int)ceilf(max_delay);
    // Add a safety margin of one sample
    return ml + 1;
}

// ====== Simple DC removal (1st-order high-pass) ======
typedef struct {
    float x1;
    float y1;
    float a;
} hp_state_t;

static void hp_init(hp_state_t *st, float fc) {
    // a = exp(-2*pi*fc/fs)
    float a = expf(-2.0f * (float)M_PI * fc / (float)I2S_SAMPLE_RATE);
    st->x1 = 0.0f;
    st->y1 = 0.0f;
    st->a  = a;
}

static inline float hp_process(hp_state_t *st, float x) {
    float y = st->a * (st->y1 + x - st->x1);
    st->x1 = x;
    st->y1 = y;
    return y;
}

// ====== NCC-based TDOA estimation over limited lags ======
static float ncc_tdoa(const float *x, const float *y, int N, int maxLag, int *bestLag, float *peakCorr) {
    // Compute per-lag normalized cross-correlation r(l) for l in [-maxLag..+maxLag]
    float bestR = -2.0f;
    int bestL = 0;

    for (int lag = -maxLag; lag <= maxLag; ++lag) {
        int startX = 0;
        int startY = 0;
        int count = N;

        if (lag > 0) {
            startY = lag;
            count = N - lag;
        } else if (lag < 0) {
            startX = -lag;
            count = N + lag; // since lag is negative
        }
        if (count <= 16) continue; // ignore tiny overlaps

        double num = 0.0;
        double ex = 0.0, ey = 0.0;
        int end = startX + count;
        int ix = startX;
        int iy = startY;

        for (; ix < end; ++ix, ++iy) {
            float xx = x[ix];
            float yy = y[iy];
            num += (double)xx * (double)yy;
            ex  += (double)xx * (double)xx;
            ey  += (double)yy * (double)yy;
        }

        double denom = sqrt(ex * ey) + 1e-12;
        float r = (float)(num / denom);

        if (r > bestR) {
            bestR = r;
            bestL = lag;
        }
    }

    *bestLag = bestL;
    *peakCorr = bestR;
    // Return TDOA in seconds
    return ((float)bestL) / (float)I2S_SAMPLE_RATE;
}

// ====== Angle computation ======
static float tdoa_to_angle_deg(float tdoa_s) {
    float s = (SPEED_OF_SOUND * tdoa_s) / MIC_BASELINE_M; // argument to asin
    if (s > 1.0f)  s = 1.0f;
    if (s < -1.0f) s = -1.0f;
    float theta = asinf(s) * (180.0f / (float)M_PI);
    // Clamp to [-90, 90] just in case
    if (theta >  MAX_ABS_ANGLE_DEG) theta =  MAX_ABS_ANGLE_DEG;
    if (theta < -MAX_ABS_ANGLE_DEG) theta = -MAX_ABS_ANGLE_DEG;
    return theta;
}

static void print_banner(void) {
    printf("\n=== ESP32-S3 I2S 2-Mic DOA ===\n");
    printf("Sample rate: %d Hz, Frame: %d, Mic spacing: %.3f m\n", I2S_SAMPLE_RATE, FRAME_LEN, MIC_BASELINE_M);
    printf("Max lag (samples): %d\n\n", calc_max_lag_samples());
}

void app_main(void) {
    esp_err_t err;

    print_banner();

    // 1) Create I2S RX channel
    i2s_chan_handle_t rx_chan;
    i2s_chan_config_t chan_cfg = {
        .id = I2S_NUM_0,
        .role = I2S_ROLE_MASTER,
        .dma_desc_num = I2S_DMA_DESC,
        .dma_frame_num = I2S_DMA_FRAME_NUM,
        .auto_clear = true,
        .intr_priority = 0
    };
    ESP_ERROR_CHECK(i2s_new_channel(&chan_cfg, NULL, &rx_chan));

    // 2) Configure standard Philips I2S mode, stereo, 32-bit slots
    i2s_std_config_t std_cfg = {
        .clk_cfg  = I2S_STD_CLK_DEFAULT_CONFIG(I2S_SAMPLE_RATE),
        .slot_cfg = I2S_STD_PHILIPS_SLOT_DEFAULT_CONFIG(I2S_BITS_PER_SAMPLE, I2S_SLOT_MODE),
        .gpio_cfg = {
            .mclk = I2S_GPIO_UNUSED,
            .bclk = GPIO_I2S_BCLK,
            .ws   = GPIO_I2S_WS,
            .dout = I2S_GPIO_UNUSED,
            .din  = GPIO_I2S_SD,
            .invert_flags = {
                .mclk_inv = false,
                .bclk_inv = false,
                .ws_inv   = false
            }
        }
    };

    // Ensure both slots captured
    std_cfg.slot_cfg.slot_mask = I2S_STD_SLOT_LEFT | I2S_STD_SLOT_RIGHT;

    ESP_ERROR_CHECK(i2s_channel_init_std_mode(rx_chan, &std_cfg));
    ESP_ERROR_CHECK(i2s_channel_enable(rx_chan));
    ESP_LOGI(TAG, "I2S RX channel enabled");

    // 3) Allocate buffers
    const size_t bytes_per_frame = (I2S_BITS_PER_SAMPLE / 8) * 2; // stereo
    const size_t buf_bytes = FRAME_LEN * bytes_per_frame;

    uint8_t *i2s_raw = (uint8_t *)heap_caps_malloc(buf_bytes, MALLOC_CAP_DEFAULT);
    float   *left    = (float *)heap_caps_malloc(FRAME_LEN * sizeof(float), MALLOC_CAP_DEFAULT);
    float   *right   = (float *)heap_caps_malloc(FRAME_LEN * sizeof(float), MALLOC_CAP_DEFAULT);
    if (!i2s_raw || !left || !right) {
        ESP_LOGE(TAG, "Buffer allocation failed");
        vTaskDelay(pdMS_TO_TICKS(1000));
        esp_restart();
        return;
    }

    // 4) Initialize pre-processing
    hp_state_t hpL, hpR;
    hp_init(&hpL, 20.0f); // 20 Hz, gentle DC removal
    hp_init(&hpR, 20.0f);

    const int maxLag = calc_max_lag_samples();
    float angle_deg_smoothed = 0.0f;
    bool first_angle = true;

    // 5) Main loop
    while (1) {
        size_t got = 0;
        err = i2s_channel_read(rx_chan, i2s_raw, buf_bytes, &got, portMAX_DELAY);
        if (err != ESP_OK || got != buf_bytes) {
            ESP_LOGW(TAG, "i2s read: err=%d, got=%u/%u bytes", err, (unsigned)got, (unsigned)buf_bytes);
            continue;
        }

        // Deinterleave: data is interleaved 32-bit L, 32-bit R in Philips I2S
        const int32_t *p = (const int32_t *)i2s_raw;
        for (int n = 0; n < FRAME_LEN; ++n) {
            int32_t sampleL = p[2*n + 0];
            int32_t sampleR = p[2*n + 1];

#if SWAP_CHANNELS
            int32_t tmp = sampleL;
            sampleL = sampleR;
            sampleR = tmp;
#endif
            // INMP441 is 24-bit in 32-bit container; right-shift to normalize
            sampleL >>= 8;
            sampleR >>= 8;

            // Scale to [-1, 1] with 24-bit full-scale = 8388608
            float fl = (float)sampleL / 8388608.0f;
            float fr = (float)sampleR / 8388608.0f;

            // Simple DC removal
            left[n]  = hp_process(&hpL, fl);
            right[n] = hp_process(&hpR, fr);
        }

        // Optionally, could add band-pass filtering here (1–4 kHz) to improve robustness

        // Compute TDOA via NCC across small lag window
        int bestLag = 0;
        float peakR = 0.0f;
        float tdoa_s = ncc_tdoa(left, right, FRAME_LEN, maxLag, &bestLag, &peakR);
        float angle_deg = tdoa_to_angle_deg(tdoa_s);

#if ENABLE_ANGLE_SMOOTHING
        if (first_angle) {
            angle_deg_smoothed = angle_deg;
            first_angle = false;
        } else {
            angle_deg_smoothed = ANGLE_ALPHA * angle_deg_smoothed + (1.0f - ANGLE_ALPHA) * angle_deg;
        }
        float out_angle = angle_deg_smoothed;
#else
        float out_angle = angle_deg;
#endif

        // Diagnostics: RMS levels and correlation peak
        double rmsL = 0.0, rmsR = 0.0;
        for (int i = 0; i < FRAME_LEN; ++i) {
            rmsL += (double)left[i] * (double)left[i];
            rmsR += (double)right[i] * (double)right[i];
        }
        rmsL = sqrt(rmsL / FRAME_LEN);
        rmsR = sqrt(rmsR / FRAME_LEN);

        // Print one line per frame
        printf("lag=%+3d samp  tdoa=%.5f ms  angle=%.1f deg  peak=%.3f  rmsL=%.3f  rmsR=%.3f\n",
               bestLag,
               1e3f * tdoa_s,
               out_angle,
               peakR,
               (float)rmsL,
               (float)rmsR);
        // ~20–30 lines/s depending on loop. Add a tiny delay if needed
        // vTaskDelay(pdMS_TO_TICKS(1));
    }
}

And the PlatformIO project configuration at the project root:

; platformio.ini
[env:esp32-s3-devkitc-1]
platform = espressif32@6.5.0
board = esp32-s3-devkitc-1
framework = espidf

; Serial monitor and build options
monitor_speed = 115200
monitor_filters = direct, time

; Ensure CDC over USB is recognized reliably
board_build.flash_mode = dio

; Optional: reduce log chatter
build_flags =
  -D CONFIG_LOG_DEFAULT_LEVEL=3

; If you need to pin exact ESP-IDF version (optional, advanced)
; platform_packages =
;   framework-espidf@~5.1.2

This configuration targets PlatformIO’s espressif32 6.5.0 (consistent, widely available) and uses ESP-IDF (not Arduino). The default USB-CDC interface on the ESP32-S3-DevKitC-1 will present a serial port to your OS for flashing and monitoring.


Build/Flash/Run Commands

Install PlatformIO Core 6.1.13 and verify versions:

python3 -m pip install --upgrade pip
python3 -m pip install "platformio==6.1.13"
pio --version

Create and initialize the project:

mkdir -p ~/esp/esp32s3-i2s-beamform
cd ~/esp/esp32s3-i2s-beamform
pio project init --board esp32-s3-devkitc-1 --project-option "framework=espidf"

Place the files as follows:
– platformio.ini at ~/esp/esp32s3-i2s-beamform/platformio.ini
– src/main.c at ~/esp/esp32s3-i2s-beamform/src/main.c

Build, flash, and monitor:

pio run

# Flash (auto-detects the ESP32-S3 USB-CDC port)
pio run -t upload

# If needed, explicitly specify port, e.g., Linux /dev/ttyACM0, Windows COM6, macOS /dev/cu.usbmodemXYZ
# pio run -t upload --upload-port /dev/ttyACM0

# Serial monitor
pio device monitor

If you’re on Linux and don’t see the device, add your user to the dialout group and re-login:

sudo usermod -a -G dialout $USER
# Log out/in or reboot

Step-by-step Validation

1) Power-on and serial check:
– Plug the board via USB-C. The serial device should appear (Windows: COMx; Linux: /dev/ttyACM0; macOS: /dev/cu.usbmodem…).
– Run pio device monitor and watch the banner:
– “Sample rate: 48000 Hz, Frame: 1024, Mic spacing: 0.075 m”
– “Max lag (samples): ~12”

2) Basic I2S capture sanity:
– With a quiet room, the printed rmsL and rmsR should be small (e.g., 0.005–0.030).
– Clap once near the Left mic, then near the Right mic:
– When clapping near Left, expect negative bestLag (Left leads Right) and angle around -60 to -90 deg depending on position.
– When clapping near Right, expect positive bestLag and angle +60 to +90 deg.

3) Confirm Left/Right mapping:
– Cover the Left mic gently with a finger and clap near it. rmsL should drop significantly while rmsR remains.
– If the channel attribution seems inverted (positive angle when clapping left), set SWAP_CHANNELS = 1 in src/main.c, rebuild, and retry.

4) Static angle measurements:
– Arrange the two mics along the x-axis and mark:
– -90°: directly at Left side perpendicular to array
– 0°: on the broadside (perpendicular to the baseline, centered)
– +90°: directly at Right side perpendicular to array
– Place a tone/noise source ~50 cm away:
– At 0°, average out_angle should hover near 0°.
– At -60°, expect a negative angle around that value.
– At +60°, expect a positive angle around that value.
– Record 10–20 readings and compute the mean and standard deviation to assess stability.

5) Range and lag sanity check:
– With mic spacing d = 0.075 m and fs = 48 kHz, the maximum physical lag is:
– maxLag ≈ ceil(d·fs/c) ≈ ceil(0.075·48000/343) ≈ 11 samples
– If peak lags consistently saturate at ±maxLag, either the source is near endfire beyond the array’s resolution, your spacing is mismatched, or there’s significant reverberation.

6) Sensitivity and gain:
– INMP441 output amplitude varies with distance and sound pressure. If rms values are always tiny (< 0.002) or clipping (> 0.5), adjust the test conditions:
– Move the source closer/farther.
– Use noises with flatter spectra (pink/white noise) for easier DOA detection.

7) Stability and smoothing:
– The angle can jitter frame-to-frame due to noise. With ANGLE_ALPHA = 0.85 you should see a stable running estimate.
– If jitter persists, increase the frame length (e.g., 2048) and enable a small delay between reads, or add median filtering across 3–5 frames.

8) Quantitative error estimation:
– Place the source at known azimuths (e.g., -60°, -30°, 0°, +30°, +60°) and compute error = estimate – ground truth.
– Typical near-field reverberant room error with two mics is 3–8 degrees if the SNR is reasonable and reflections are moderate.


Troubleshooting

  • No serial output:
  • Ensure the correct serial port in pio device monitor.
  • On Linux, add user to dialout and reconnect USB.
  • For some DevKitC-1 variants with CP210x/CH34x bridges, install drivers.

  • I2S read timeouts or zero samples:

  • Confirm wiring: BCLK→GPIO 7, WS→GPIO 15, SD (both mics)→GPIO 16.
  • Verify that both mics share BCLK and WS and that SD lines are tied together.
  • Check that the L/R strap is correct: one mic L/R to GND (Left), the other to 3.3 V (Right).
  • Ensure the mic modules are powered from 3.3 V (not 5 V).

  • One channel always near zero:

  • The L/R strap might be wrong; both mics could be transmitting on the same slot or neither on one slot.
  • Try swapping the mic modules’ L/R strap configuration and re-test.
  • Set SWAP_CHANNELS = 1 if the mapping is inverted.

  • Distorted or saturating readings:

  • The source might be too loud or too close. Increase distance or reduce source volume.
  • Check power supply noise; use short wires and avoid ground loops.

  • Angle stuck at ±90 degrees:

  • If bestLag saturates at ±maxLag, the source may be almost endfire or reflections dominate.
  • Increase spacing slightly (e.g., to 8 cm), and use a more directional test signal (hand clap works well).
  • Move away from walls/reflectors or hang absorption.

  • Build errors referencing i2s_std:

  • Ensure ESP-IDF 5.x is in use. PlatformIO espressif32@6.5.0 uses IDF 5.1 by default.
  • If you pinned framework-espidf, ensure it’s ~5.1.2 or newer.

  • Inconsistent timing:

  • USB-CDC prints can perturb timing if you print too much. Keep output modest or buffer results.

Improvements

  • GCC-PHAT in frequency domain:
  • Replace NCC with GCC-PHAT for better robustness in reverberant environments. You’ll need a small FFT (e.g., KISS FFT or esp-dsp) and PHAT weighting: R_xy = F^-1{X(f)Y(f)/|X(f)Y(f)|}.
  • PlatformIO + ESP-IDF can pull esp-dsp via a “components” folder or CMake fetch; integrate it for efficient FFT and filters.

  • Band-pass filtering:

  • Add a 2nd-order or 4th-order IIR band-pass around 600–4000 Hz to emphasize speech and reduce low-frequency room modes.

  • Windowing and overlap:

  • Use Hann windows and 50% overlap for smoother updates and reduced spectral leakage.

  • Multi-frame peak tracking:

  • Implement a Kalman filter or an alpha-beta tracker for angle to suppress jitter and spurious peaks.

  • Geometry calibration:

  • Calibrate exact mic spacing and offset by measuring TDOA for a sound at known positions, then fitting d and left/right latency.

  • Web or BLE UI:

  • Serve a tiny web page over Wi-Fi to display a polar needle in real time, or stream data over BLE UART.

  • More microphones and beam patterns:

  • With 3–4 mics, use conventional beamformers (Delay-and-Sum, Bartlett/Capon/MVDR) for improved spatial selectivity.

  • Power and performance:

  • Lower sample rate (e.g., 32 kHz) to reduce CPU usage if your application is speech-only.
  • Use PSRAM for larger buffers; pin memory types as needed.

Final Checklist

  • Hardware:
  • ESP32-S3-DevKitC-1 powered via USB-C.
  • Two INMP441 mics powered at 3.3 V, common GND.
  • BCLK wired to both mics and GPIO 7.
  • WS wired to both mics and GPIO 15.
  • SD from both mics tied together and wired to GPIO 16.
  • L/R strap: one mic to GND (Left), the other to 3.3 V (Right).
  • Mic spacing measured and set in code (MIC_BASELINE_M).

  • Software:

  • PlatformIO Core 6.1.13 installed.
  • platformio.ini uses espressif32@6.5.0 with ESP-IDF.
  • src/main.c compiled without errors; i2s_std used and channel enabled.

  • Run/Validation:

  • Serial output shows lag, tdoa, angle, peak correlation, RMS per channel.
  • Near-Left sound → negative angle; near-Right → positive angle.
  • 0° broadside yields angle near zero.
  • Angle stable with smoothing; adjust ANGLE_ALPHA as needed.

If all boxes are checked, you have a working two-mic I2S beamforming direction-finding demonstrator on the ESP32-S3-DevKitC-1, ready to evolve into a more sophisticated GCC-PHAT or multi-microphone beamformer.

Find this product and/or books on this topic on Amazon

Go to Amazon

As an Amazon Associate, I earn from qualifying purchases. If you buy through this link, you help keep this project running.

Quick Quiz

Question 1: What is the main purpose of the ESP32-S3-DevKitC-1 in the project?




Question 2: What type of microphones are used in the project?




Question 3: What does TDOA stand for?




Question 4: Which programming language is primarily used in this project?




Question 5: What kind of tests are used to validate the system?




Question 6: What is the formula for calculating the DOA angle?




Question 7: Which framework is emphasized for reproducibility in the workflow?




Question 8: What is the required sampling rate for the microphones?




Question 9: What is the spacing between the two microphones in the project?




Question 10: What is the speed of sound assumed in the calculations?




Carlos Núñez Zorrilla
Carlos Núñez Zorrilla
Electronics & Computer Engineer

Telecommunications Electronics Engineer and Computer Engineer (official degrees in Spain).

Follow me:
Scroll to Top