Practical case: Keyword Detection Techniques

Objective and use case

What you’ll build: a keyword spotter on the Arduino Nano 33 IoT with an INMP441 I2S microphone. You will learn to stream audio, process it in real time, and detect a specific keyword.

Why it matters / Use cases

  • Voice-activated devices that respond to specific commands, enhancing user interaction in smart home systems.
  • Real-time audio processing for wearable technology, allowing hands-free operation in fitness trackers.
  • Integration with IoT applications for automated monitoring and alerts based on specific audio cues.
  • Development of low-power keyword detection systems suitable for battery-operated devices.

Expected outcome

  • Detection accuracy of over 90% for the specified keyword in a controlled environment.
  • Real-time processing latency of less than 100 milliseconds from audio capture to keyword detection.
  • Power consumption metrics showing less than 50 mA during active keyword detection.
  • Capability to handle audio input at sample rates of 16 kHz with minimal frame drops.

Audience: Intermediate Arduino developers; Level: Advanced Hands-On

Architecture/flow: Audio streaming from INMP441 → FFT processing → Log-band energy computation → Keyword detection using template matching.

Advanced Hands‑On: I2S Keyword Spotting on Arduino Nano 33 IoT + INMP441

This practical case walks you through building a small on-device keyword spotter that streams audio from an INMP441 I2S microphone into the Arduino Nano 33 IoT, computes compact spectral features in real time, and detects a single keyword using a lightweight template-matching approach. You will get precise wiring, full code, Arduino CLI build/flash instructions, and a rigorous validation process with measurable outcomes.

Although many keyword-spotting (KWS) demos use a deep neural network, this tutorial uses a computationally leaner approach appropriate for the Nano 33 IoT (SAMD21, 32 KB RAM): a log-spectral template that you can tune and extend. The implementation still follows the same pipeline shape as ML KWS (windowing → FFT → log-band energy → normalization → similarity → decision), so you can later drop in MFCCs or a TFLM model with minimal rewiring.


Prerequisites

  • Proficiency with:
      • Sampling/audio basics (sample rate, frames, windowing, FFT)
      • Arduino development on the command line (Arduino CLI)
      • Reading pinouts and following wiring tables for 3.3 V logic devices
  • Host computer:
      • Windows 10/11, macOS 12+, or Ubuntu 20.04+ with USB ports
  • Arduino CLI installed and on PATH:
      • Version used in this guide: 0.35.2
      • Confirm with:
        arduino-cli version
  • Micro-USB cable (data capable)
  • A quiet workspace for testing speech triggers

Materials (with exact model)

  • 1× Arduino Nano 33 IoT (Model: ABX00032; MCU: SAMD21G18, 3.3 V logic)
  • 1× INMP441 I2S digital microphone breakout (e.g., INMP441-based board; pins: VDD, GND, SCK/BCLK, WS/LRCLK, SD, L/R)
  • 1× Solderless breadboard and quality jumper wires (male–female or male–male as needed)
  • 1× 100 nF ceramic capacitor (decoupling, near the microphone VDD/GND recommended)
  • Optional: USB isolator for reducing ground noise during validation

Important electrical note:
– The Arduino Nano 33 IoT is a 3.3 V device (not 5 V tolerant).
– The INMP441 expects 3.3 V power and 3.3 V I/O signals. Never connect to 5 V.


Setup/Connection

The INMP441 exposes a standard I2S interface:
– SCK (also labeled BCLK)
– WS (also labeled LRCLK)
– SD (Serial Data output from microphone)
– L/R (channel select pin; tie LOW to select left channel, HIGH to select right)
– VDD (3.3 V)
– GND

The Arduino Nano 33 IoT exposes I2S on fixed pins defined by the board’s variant in the Arduino SAMD core. On this board:

  • I2S SCK (bit clock) → D3
  • I2S WS (word select / LRCLK) → D2
  • I2S SD (data in) → A6

These mappings are provided by the Arduino SAMD core and used implicitly by the Arduino I2S library (no pin selection code required). If you are uncertain about silkscreen labels, read the board’s official pinout for “Nano 33 IoT” and locate D2, D3, and A6.
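
If you want to confirm this mapping in software, the SAMD core's variant header exposes the I2S pins as constants. Below is a minimal standalone check sketch; the PIN_I2S_SCK / PIN_I2S_FS / PIN_I2S_SD names are an assumption based on the Arduino SAMD core (guarded with a preprocessor check in case a core revision renames them), so verify against variants/nano_33_iot/variant.h if it does not compile:

#include <Arduino.h>

// Prints the I2S pin numbers the core's variant defines for this board.
void setup() {
  Serial.begin(115200);
  uint32_t t0 = millis();
  while (!Serial && (millis() - t0 < 3000)) { /* wait up to 3s */ }
#if defined(PIN_I2S_SCK) && defined(PIN_I2S_FS) && defined(PIN_I2S_SD)
  Serial.print("I2S SCK pin: "); Serial.println(PIN_I2S_SCK);
  Serial.print("I2S FS  pin: "); Serial.println(PIN_I2S_FS);
  Serial.print("I2S SD  pin: "); Serial.println(PIN_I2S_SD);
#else
  Serial.println("I2S pin constants not defined by this core/variant.");
#endif
}

void loop() {}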

Connect as follows:

INMP441 Pin   Connect To (Nano 33 IoT)   Notes
VDD           3.3V                       Add 100 nF decoupling cap close to the mic module
GND           GND                        Common ground
SCK (BCLK)    D3                         I2S SCK (bit clock)
WS (LRCLK)    D2                         I2S FS (word select)
SD            A6                         I2S SD (microphone data to MCU)
L/R           GND                        Forces left channel; matches the code

Additional notes:
– Keep audio lines short to reduce noise pickup.
– Route ground and supply lines cleanly; avoid running the BCLK and LRCLK next to high-current wires.


Full Code

The project consists of two source files:

  • kws_i2s_nano33iot/kws_i2s_nano33iot.ino — main application
  • kws_i2s_nano33iot/keyword_template.h — a small template of normalized log-band features for the chosen keyword

The code configures I2S at 16 kHz and reads 32-bit samples from the INMP441, shifting them to 16-bit. It frames the stream into 256-sample windows with 50% overlap, computes a Hamming-windowed FFT, accumulates 16 log-band energies over 300–4000 Hz, compresses these to 8-dimensional features, and normalizes them. Each sliding 24-frame window is matched against a stored 24×8 template via cosine similarity with simple time alignment; a confidence threshold triggers detection and toggles the LED.

File: kws_i2s_nano33iot/kws_i2s_nano33iot.ino

/*
  Keyword Spotting (Template-based) with I2S Mic on Arduino Nano 33 IoT
  Board: Arduino Nano 33 IoT (arduino:samd:nano_33_iot)
  Mic: INMP441 (I2S)
  Audio: 16 kHz, 32-bit I2S, left channel

  Pipeline:
  - I2S capture -> int16 conversion
  - 256-sample frames @ 50% overlap
  - Hamming window + 256-pt FFT (ArduinoFFT)
  - 16 log-band energies in ~300–4000 Hz
  - Dimensionality reduction to 8D features
  - Sliding cosine similarity against 24x8 template
  - Decision threshold + cooldown

  Libraries:
  - Arduino I2S (>= 1.0.1)
  - arduinoFFT (1.6.x; this sketch uses the v1.x array-based API)
*/

#include <Arduino.h>
#include <I2S.h>
#include <arduinoFFT.h>  // note the lowercase 'a': v1.x header name
#include "keyword_template.h"

#define SAMPLE_RATE       16000
#define BITS_PER_SAMPLE   32
#define FRAME_LEN         256
#define FRAME_HOP         128   // 50% overlap
#define NUM_BANDS         16
#define FEAT_DIM          8     // compressed from 16 bands
#define FRAMES_IN_WINDOW  24    // template length = 24 frames
#define DETECT_THRESHOLD  0.85f // tune during validation
#define COOLDOWN_MS       1500

// LED
#ifndef LED_BUILTIN
#define LED_BUILTIN 13
#endif

// FFT setup (arduinoFFT v1.x; the sample arrays are passed to each call)
arduinoFFT FFT = arduinoFFT();
static double vReal[FRAME_LEN];
static double vImag[FRAME_LEN];

// Ring buffer for time samples
static int16_t audioBuf[FRAME_LEN]; // frame buffer
static int16_t overlapBuf[FRAME_LEN - FRAME_HOP];
static size_t overlapCount = 0;

// Feature ring buffer (sliding window)
static float featRing[FRAMES_IN_WINDOW][FEAT_DIM];
static size_t featCount = 0; // number of frames produced (caps at FRAMES_IN_WINDOW)
static bool windowFilled = false;

// Band boundaries (bin indexes for 256-pt FFT @ 16kHz)
struct Band { uint16_t startBin; uint16_t endBin; };
static Band bands[NUM_BANDS];

// Hamming window precompute
static float hamming[FRAME_LEN];

// Runtime control
static uint32_t lastDetectMs = 0;

// Utilities
static inline float fastLog10f(float x) {
  return logf(x) * 0.4342944819f; // ln(x)/ln(10)
}

static inline float safeDb(float p) {
  if (p < 1e-12f) p = 1e-12f;
  return 10.0f * fastLog10f(p);
}

static void computeBandsInit() {
  // Frequency per bin: SAMPLE_RATE / FRAME_LEN = 16000/256 = 62.5 Hz
  // We map 16 bands from ~300 Hz to ~4000 Hz.
  const float binHz = (float)SAMPLE_RATE / (float)FRAME_LEN; // 62.5
  float edgeHz[NUM_BANDS + 1];
  float fMin = 300.0f, fMax = 4000.0f;
  for (int i = 0; i <= NUM_BANDS; ++i) {
    float r = (float)i / (float)NUM_BANDS;
    edgeHz[i] = fMin * powf((fMax/fMin), r); // log-spaced
  }
  for (int b = 0; b < NUM_BANDS; ++b) {
    uint16_t a = (uint16_t)roundf(edgeHz[b] / binHz);
    uint16_t z = (uint16_t)roundf(edgeHz[b+1] / binHz);
    if (a < 1) a = 1;
    if (z >= FRAME_LEN/2) z = FRAME_LEN/2 - 1;
    if (z <= a) z = a + 1;
    bands[b].startBin = a;
    bands[b].endBin = z;
  }
}

static void initHamming() {
  for (int n = 0; n < FRAME_LEN; ++n) {
    hamming[n] = 0.54f - 0.46f * cosf((2.0f * PI * n) / (FRAME_LEN - 1));
  }
}

static void resetFeatures() {
  featCount = 0;
  windowFilled = false;
  for (int i = 0; i < FRAMES_IN_WINDOW; ++i)
    for (int j = 0; j < FEAT_DIM; ++j)
      featRing[i][j] = 0.0f;
}

static void compress16to8(const float in16[NUM_BANDS], float out8[FEAT_DIM]) {
  // Pairwise average bands: (0,1)->0, (2,3)->1, ..., (14,15)->7
  for (int i = 0; i < FEAT_DIM; ++i) {
    out8[i] = 0.5f * (in16[2*i] + in16[2*i + 1]);
  }
}

static void normalizeFeature(float f[FEAT_DIM]) {
  // Mean-variance normalization per frame
  float mean = 0.0f;
  for (int i = 0; i < FEAT_DIM; ++i) mean += f[i];
  mean /= FEAT_DIM;
  float var = 0.0f;
  for (int i = 0; i < FEAT_DIM; ++i) {
    float d = f[i] - mean;
    var += d * d;
  }
  var = var / FEAT_DIM + 1e-6f;
  float invStd = 1.0f / sqrtf(var);
  for (int i = 0; i < FEAT_DIM; ++i) {
    f[i] = (f[i] - mean) * invStd;
  }
}

static float cosineSim(const float *a, const float *b, int n) {
  float dot = 0.0f, na = 0.0f, nb = 0.0f;
  for (int i = 0; i < n; ++i) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  na = sqrtf(na) + 1e-6f;
  nb = sqrtf(nb) + 1e-6f;
  return dot / (na * nb);
}

static float matchWindowAgainstTemplate() {
  // Slide 24-frame window vs 24-frame template (1:1 alignment)
  if (!windowFilled) return 0.0f;
  float sumSim = 0.0f;
  for (int t = 0; t < FRAMES_IN_WINDOW; ++t) {
    sumSim += cosineSim(featRing[t], KEYWORD_TEMPLATE[t], FEAT_DIM);
  }
  return sumSim / FRAMES_IN_WINDOW;
}

static void pushFeatureFrame(const float feat[FEAT_DIM]) {
  // Shift left, append at end
  for (int i = 1; i < FRAMES_IN_WINDOW; ++i) {
    for (int j = 0; j < FEAT_DIM; ++j) {
      featRing[i-1][j] = featRing[i][j];
    }
  }
  for (int j = 0; j < FEAT_DIM; ++j) {
    featRing[FRAMES_IN_WINDOW - 1][j] = feat[j];
  }
  if (!windowFilled) {
    featCount++;
    if (featCount >= FRAMES_IN_WINDOW) windowFilled = true;
  }
}

static void computeFeaturesFromFrame(const int16_t *samples) {
  // 1) Copy and window into FFT arrays
  for (int i = 0; i < FRAME_LEN; ++i) {
    vReal[i] = (double)((float)samples[i] * hamming[i]);
    vImag[i] = 0.0;
  }

  // 2) FFT (Hamming was already applied in step 1, so no extra
  //    windowing pass is needed here)
  FFT.Compute(vReal, vImag, FRAME_LEN, FFT_FORWARD);
  FFT.ComplexToMagnitude(vReal, vImag, FRAME_LEN);

  // 3) Power spectrum (ignore bin 0)
  // 4) Accumulate into log bands
  float bandE[NUM_BANDS];
  for (int b = 0; b < NUM_BANDS; ++b) {
    double acc = 0.0;
    for (int k = bands[b].startBin; k <= bands[b].endBin; ++k) {
      double mag = vReal[k];
      acc += mag * mag; // power
    }
    // dB scale
    bandE[b] = safeDb((float)acc);
  }

  // 5) Compress to 8 dims and normalize
  float feat8[FEAT_DIM];
  compress16to8(bandE, feat8);
  normalizeFeature(feat8);

  // 6) Push to ring and compute similarity
  pushFeatureFrame(feat8);

  float conf = matchWindowAgainstTemplate();
  static uint32_t lastPrint = 0;
  uint32_t now = millis();

  if (now - lastPrint > 100) {
    Serial.print("conf=");
    Serial.println(conf, 3);
    lastPrint = now;
  }

  if (conf >= DETECT_THRESHOLD) {
    if (now - lastDetectMs > COOLDOWN_MS) {
      lastDetectMs = now;
      Serial.println("KEYWORD DETECTED");
      digitalWrite(LED_BUILTIN, !digitalRead(LED_BUILTIN));
    }
  }
}

void setup() {
  pinMode(LED_BUILTIN, OUTPUT);
  digitalWrite(LED_BUILTIN, LOW);

  Serial.begin(115200);
  uint32_t t0 = millis();
  while (!Serial && (millis() - t0 < 3000)) { /* wait up to 3s */ }

  Serial.println("I2S KWS on Nano 33 IoT + INMP441");

  // Initialize analysis structures
  computeBandsInit();
  initHamming();
  resetFeatures();

  // Start I2S in Philips mode (standard)
  if (!I2S.begin(I2S_PHILIPS_MODE, SAMPLE_RATE, BITS_PER_SAMPLE)) {
    Serial.println("FATAL: I2S.begin() failed");
    while (true) { delay(1000); }
  }

  // Make sure LR pin on mic is tied to GND so we read the left channel.
  // We will read 32-bit samples and downshift to 16-bit.
  Serial.print("I2S started @ ");
  Serial.print(SAMPLE_RATE);
  Serial.println(" Hz, 32 bits");
}

void loop() {
  // Fill overlap region into the start of audioBuf
  if (overlapCount > 0) {
    for (size_t i = 0; i < overlapCount; ++i) {
      audioBuf[i] = overlapBuf[i];
    }
  }

  // Read to fill remainder of frame
  size_t need = FRAME_LEN - overlapCount;
  size_t got = 0;

  while (got < need) {
    int32_t raw = 0;
    if (I2S.available() >= 4) {
      raw = I2S.read();
      // INMP441 provides 24-bit samples left-justified in 32-bit
      int16_t s = (int16_t)(raw >> 16); // reduce to 16-bit maintaining sign
      audioBuf[overlapCount + got] = s;
      got++;
    } else {
      // Try again shortly if buffer is not yet filled
      delayMicroseconds(50);
    }
  }

  // Compute features using the current frame
  computeFeaturesFromFrame(audioBuf);

  // Prepare overlap for next frame (50% overlap)
  // Copy last HOP samples to overlapBuf
  for (size_t i = 0; i < FRAME_LEN - FRAME_HOP; ++i) {
    overlapBuf[i] = audioBuf[FRAME_HOP + i];
  }
  overlapCount = FRAME_LEN - FRAME_HOP;
}

File: kws_i2s_nano33iot/keyword_template.h

The template is a normalized 24×8 float matrix representing average features for your chosen keyword (e.g., “hello”). The example below is a plausible synthetic template to get you started; you should refine it using the validation flow to record your own utterance statistics and update these values. Keep it small to fit in flash/RAM.

#pragma once

// 24 frames x 8 dims (normalized feature vectors)
// This is the synthetic starter template described above (not measured data).
// For best accuracy, regenerate it from your own voice and mic placement.
static const float KEYWORD_TEMPLATE[24][8] = {
  {-0.82f, -0.31f, 0.10f,  0.48f,  0.77f,  0.33f, -0.18f, -0.37f},
  {-0.79f, -0.28f, 0.15f,  0.52f,  0.75f,  0.30f, -0.21f, -0.39f},
  {-0.70f, -0.22f, 0.22f,  0.60f,  0.67f,  0.25f, -0.25f, -0.42f},
  {-0.58f, -0.18f, 0.28f,  0.67f,  0.59f,  0.18f, -0.27f, -0.43f},
  {-0.48f, -0.10f, 0.34f,  0.70f,  0.54f,  0.10f, -0.29f, -0.41f},
  {-0.35f, -0.02f, 0.40f,  0.69f,  0.48f,  0.04f, -0.28f, -0.38f},
  {-0.22f,  0.06f, 0.44f,  0.63f,  0.41f, -0.02f, -0.26f, -0.34f},
  {-0.11f,  0.10f, 0.46f,  0.57f,  0.35f, -0.06f, -0.24f, -0.29f},
  {-0.02f,  0.14f, 0.45f,  0.50f,  0.30f, -0.10f, -0.20f, -0.24f},
  { 0.06f,  0.18f, 0.43f,  0.43f,  0.24f, -0.13f, -0.16f, -0.20f},
  { 0.13f,  0.21f, 0.39f,  0.36f,  0.19f, -0.16f, -0.12f, -0.17f},
  { 0.19f,  0.25f, 0.34f,  0.29f,  0.14f, -0.17f, -0.09f, -0.14f},
  { 0.23f,  0.26f, 0.29f,  0.23f,  0.10f, -0.17f, -0.07f, -0.12f},
  { 0.26f,  0.26f, 0.24f,  0.18f,  0.06f, -0.16f, -0.05f, -0.10f},
  { 0.28f,  0.24f, 0.19f,  0.13f,  0.03f, -0.14f, -0.04f, -0.09f},
  { 0.29f,  0.21f, 0.14f,  0.09f,  0.01f, -0.12f, -0.03f, -0.08f},
  { 0.28f,  0.17f, 0.10f,  0.06f, -0.00f, -0.09f, -0.02f, -0.07f},
  { 0.25f,  0.13f, 0.07f,  0.03f, -0.01f, -0.07f, -0.02f, -0.06f},
  { 0.21f,  0.09f, 0.04f,  0.01f, -0.01f, -0.05f, -0.02f, -0.05f},
  { 0.15f,  0.05f, 0.02f, -0.01f, -0.01f, -0.04f, -0.02f, -0.05f},
  { 0.09f,  0.02f, 0.01f, -0.02f, -0.01f, -0.03f, -0.02f, -0.05f},
  { 0.04f, -0.01f, 0.00f, -0.03f, -0.02f, -0.03f, -0.02f, -0.04f},
  { 0.01f, -0.02f, -0.01f, -0.03f, -0.02f, -0.03f, -0.02f, -0.04f},
  {-0.00f, -0.03f, -0.02f, -0.03f, -0.02f, -0.03f, -0.02f, -0.04f}
};

Build, Flash, and Run (Arduino CLI)

We use the Arduino CLI for non-GUI builds targeting the Arduino Nano 33 IoT. Ensure your user has permissions for the serial device (on Linux, add user to dialout group and re-login).

Commands (Linux/macOS shown; on Windows use COMx instead of /dev/ttyACM0):

# 1) Confirm the Arduino CLI version
arduino-cli version

# 2) Update core index
arduino-cli core update-index

# 3) Install the SAMD core for the Nano 33 IoT
arduino-cli core install arduino:samd

# 4) Install required libraries (pin exact versions for reproducibility)
arduino-cli lib install "Arduino I2S@1.0.1"
arduino-cli lib install "arduinoFFT@1.6.1"

# 5) Create project folder and put the two files inside
#    kws_i2s_nano33iot/kws_i2s_nano33iot.ino
#    kws_i2s_nano33iot/keyword_template.h

# 6) Compile (specify fully qualified board name)
arduino-cli compile --fqbn arduino:samd:nano_33_iot --output-dir ./build ./kws_i2s_nano33iot

# 7) Identify the serial port (plug board in and run):
arduino-cli board list

# Example result: /dev/ttyACM0  Arduino Nano 33 IoT  arduino:samd:nano_33_iot

# 8) Upload
arduino-cli upload -p /dev/ttyACM0 --fqbn arduino:samd:nano_33_iot ./kws_i2s_nano33iot

# 9) Open serial monitor at 115200 baud
arduino-cli monitor -p /dev/ttyACM0 -c baudrate=115200

Driver notes:
– Windows 10/11: The Nano 33 IoT enumerates as a CDC device automatically; no separate driver is typically needed. If ports do not appear, try a different USB cable or USB port, or check Device Manager.
– Linux: If you see a permissions error, sudo usermod -a -G dialout $USER and re-login.


Step-by-step Validation

Follow this sequence to confirm the hardware, audio pipeline, and detection logic:

1) Power-up and I2S bring-up

  • After upload, open the serial monitor:
  • Expect banner: “I2S KWS on Nano 33 IoT + INMP441”
  • Expect: “I2S started @ 16000 Hz, 32 bits”
  • If you see “FATAL: I2S.begin() failed”, re-check wiring and that the INMP441 L/R pin is tied to GND.

2) Noise floor and confidence sanity

  • Keep the room quiet; observe conf=<value> printed every 100 ms.
      • In quiet conditions, confidence should stay low (e.g., 0.2–0.5).
  • Speak random words; confidence should fluctuate but rarely pass the threshold unless the spectral pattern resembles the template.
  • If the confidence is stuck near 1.0 or 0.0, that indicates normalization or input amplitude issues; see Troubleshooting and the frame-stats sketch below.
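
To diagnose amplitude issues quickly, you can print per-frame statistics. A minimal debug sketch (hypothetical helper, not part of the main code; call it from loop() right after the frame is filled, just before computeFeaturesFromFrame(audioBuf)):

// Hypothetical debug helper: print frame RMS and peak amplitude.
// Near-zero values suggest a wiring/channel problem; peaks pinned
// near 32767 suggest clipping at the microphone.
static void printFrameStats(const int16_t *samples, size_t n) {
  float sumSq = 0.0f;
  int peak = 0;
  for (size_t i = 0; i < n; ++i) {
    float s = (float)samples[i];
    sumSq += s * s;
    int a = samples[i] < 0 ? -(int)samples[i] : (int)samples[i];
    if (a > peak) peak = a;
  }
  Serial.print("rms=");
  Serial.print(sqrtf(sumSq / (float)n), 1);
  Serial.print(" peak=");
  Serial.println(peak);
}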

3) Keyword test (template-matching)

  • Say the chosen keyword (the template is “hello” as provided) in a consistent speaking style and at 15–30 cm from the mic.
  • Expect:
      • A spike in conf near or above 0.85 around the utterance window.
      • The console prints “KEYWORD DETECTED”.
      • The onboard LED toggles state.

If your environment is noisy or your voice differs from the template, adjust the threshold (e.g., 0.78–0.90) in the code, or regenerate the template.
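
Rebuilding for every threshold value gets tedious. A hedged convenience (hypothetical change, not in the code above: replace the DETECT_THRESHOLD macro with a variable, compare conf against it in computeFeaturesFromFrame(), and nudge it from the serial monitor):

// Hypothetical runtime-tunable threshold, in place of the macro.
static float detectThreshold = 0.85f;

// Call once per loop() iteration: '+' raises the threshold by 0.01,
// '-' lowers it, so you can tune without rebuilding.
static void pollThresholdTuning() {
  while (Serial.available() > 0) {
    char c = (char)Serial.read();
    if (c == '+') detectThreshold += 0.01f;
    if (c == '-') detectThreshold -= 0.01f;
    detectThreshold = constrain(detectThreshold, 0.0f, 1.0f);
    Serial.print("threshold=");
    Serial.println(detectThreshold, 2);
  }
}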

4) Template refinement (recommended)

  • Capture feature vectors for your voice to replace KEYWORD_TEMPLATE.
  • Quick approach:
      • Lower DETECT_THRESHOLD temporarily to 0.0f and add a debug print inside computeFeaturesFromFrame() to dump the 8D feature each frame when you utter the keyword. For example:
        // Debug snippet inside computeFeaturesFromFrame(), after normalizeFeature(feat8):
        for (int i = 0; i < FEAT_DIM; ++i) {
          Serial.print(feat8[i], 3);
          Serial.print(i < FEAT_DIM - 1 ? ',' : '\n');
        }
      • Speak the target word, copy the 24 lines surrounding the utterance from the serial output, and paste into keyword_template.h (ensure 24 rows × 8 columns).
      • Restore DETECT_THRESHOLD to something conservative (e.g., 0.85).
      • Rebuild and upload; re-test.
  • More systematic approach:
      • Use the Python snippet below to log serial output to a file, then aggregate the 24-frame segments with the highest energy and average their normalized features (an averaging sketch follows the logger).
      • Replace the template with the averaged features.

Example serial logger (optional):

# Save as tools/serial_log.py and run: python3 tools/serial_log.py /dev/ttyACM0 115200 out.txt
import sys, serial  # requires pyserial: pip install pyserial
port = sys.argv[1]
baud = int(sys.argv[2])
out = sys.argv[3]
with serial.Serial(port, baud, timeout=1) as s, open(out, 'w') as f:
    while True:
        try:
            line = s.readline().decode('utf-8', errors='ignore')
            if line:
                f.write(line)
                f.flush()
                print(line, end='')
        except KeyboardInterrupt:
            break
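
The averaging step can also stay in C++ on the host side. A minimal sketch (hypothetical file tools/avg_template.cpp; the CAPTURES contents are placeholders you fill in from your logs), which averages K captured 24×8 windows and prints rows ready to paste into keyword_template.h; since the rows are already mean/variance normalized and cosine similarity is scale-invariant, plain averaging is adequate:

// Compile: g++ -std=c++11 -o avg_template tools/avg_template.cpp && ./avg_template
#include <cstdio>

const int K = 3, FRAMES = 24, DIMS = 8;
// Hypothetical captured windows: paste K utterances' 24x8 features here.
float CAPTURES[K][FRAMES][DIMS] = { /* ... paste data ... */ };

int main() {
  for (int t = 0; t < FRAMES; ++t) {
    printf("  {");
    for (int j = 0; j < DIMS; ++j) {
      float avg = 0.0f;
      for (int k = 0; k < K; ++k) avg += CAPTURES[k][t][j];
      avg /= K;
      printf("%s%.2ff", j ? ", " : "", avg);
    }
    printf("},\n");
  }
  return 0;
}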

5) Stress tests

  • Vary distance (10 cm to 1 m) and angles; measure false rejects and false accepts.
  • Introduce background speech or music; ensure the trigger remains selective.
  • Test different voices; if needed, create multi-speaker templates (average multiple speakers).
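
Step 5 is also a good point to verify the latency outcome stated at the top (under 100 ms). A minimal timing addition around the existing call in loop() (a sketch; micros() wraps after about 70 minutes, which is harmless for this measurement):

  uint32_t t0 = micros();
  computeFeaturesFromFrame(audioBuf);
  uint32_t dtUs = micros() - t0;
  // The 128-sample hop adds 8 ms of capture time at 16 kHz, so total
  // capture-to-decision latency is roughly dtUs/1000 + 8 ms.
  static uint32_t lastLatPrint = 0;
  if (millis() - lastLatPrint > 1000) {
    Serial.print("frame compute us=");
    Serial.println(dtUs);
    lastLatPrint = millis();
  }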

Troubleshooting

  • I2S.begin() fails or returns no data:
      • Verify wiring against the table exactly: INMP441 SCK→D3, WS→D2, SD→A6, L/R→GND, VDD→3.3V, GND→GND.
      • Ensure you did not connect SD to D12 (SPI MISO) or another non-I2S pin by mistake.
      • Confirm L/R is tied low (GND) so the left channel is active.
      • Ensure the board is the Nano 33 IoT (SAMD21), not the Nano 33 BLE (nRF52); the BLE’s audio path differs (PDM on the BLE Sense).
  • Audio saturates or conf is unstable:
      • Check your gain staging; the INMP441 has fixed gain, but proximity and tapping may cause clipping. Speak 15–30 cm from the mic.
      • If conf hovers high at rest, lower DETECT_THRESHOLD or regenerate the template.
  • No serial output:
      • Ensure the serial monitor is at 115200 baud and on the correct port.
      • On Linux, fix permissions: sudo usermod -a -G dialout $USER and re-login.
  • Upload fails or device not detected:
      • Double-tap the reset button to enter bootloader mode; the port may change (e.g., /dev/ttyACM1).
      • Use a known-good USB data cable and a direct USB port on the PC.
  • Build errors about missing libraries:
      • Re-run:
        arduino-cli lib install "Arduino I2S@1.0.1"
        arduino-cli lib install "arduinoFFT@1.6.1"
  • High false accepts in noisy rooms:
      • Increase DETECT_THRESHOLD, shorten the band range or shift it upward to reduce low-frequency noise sensitivity, or add a voice-activity gate (RMS threshold) before matching, as sketched below.
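
For that voice-activity gate, a minimal sketch (hypothetical addition; VAD_RMS_THRESHOLD is an illustrative starting value to tune empirically for your mic and room):

// Hypothetical RMS voice-activity gate: skip template matching on
// near-silent frames to suppress false accepts from background noise.
#define VAD_RMS_THRESHOLD 200.0f  // illustrative; tune empirically

static bool frameHasVoice(const int16_t *samples, size_t n) {
  float sumSq = 0.0f;
  for (size_t i = 0; i < n; ++i) {
    float s = (float)samples[i];
    sumSq += s * s;
  }
  return sqrtf(sumSq / (float)n) > VAD_RMS_THRESHOLD;
}

// In computeFeaturesFromFrame(), return early before the similarity
// check when frameHasVoice(samples, FRAME_LEN) is false (still push
// the feature frame so the sliding window stays contiguous).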

Improvements

  • Use MFCCs:
      • Replace the 16-band log energies with MFCCs (e.g., 13 coefficients) computed from Mel filterbanks and a DCT. This improves robustness to channel differences.
  • Time alignment:
      • Add dynamic time warping (DTW) between the 24×8 window and the template, enabling speed-invariant matching with a small computational overhead (a ~24×24 cost matrix); see the sketch after this list.
  • Multi-template voting:
      • Store several templates (different users/environments) and match all; trigger if the average or best-of similarity exceeds the threshold.
  • Noise robustness:
      • Add per-band noise estimation and spectral subtraction during low-energy segments.
  • Wake-word + command:
      • Use this KWS to gate a second stage (e.g., a small DNN with TensorFlow Lite for Microcontrollers) for command classification on short captured audio.
  • DMA-based I2S capture:
      • For lower CPU usage, explore SAMD I2S DMA in advanced sketches (requires deeper register-level handling or libraries).
  • Quantization and flash storage:
      • Quantize the template to int8 and keep it in flash (on SAMD, const data already resides in flash, so AVR-style PROGMEM is unnecessary); the current float template is already small, but scaling helps as features grow. A quantization sketch follows the DTW example.
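
A minimal DTW sketch for the time-alignment idea, reusing cosineSim() and the FRAMES_IN_WINDOW/FEAT_DIM constants from the main sketch; dtwDistance() is a hypothetical helper, and the full 24×24 cost matrix (~2.3 KB of static RAM) is assumed to fit alongside the existing buffers:

// Minimal DTW between the live window and the template, using
// (1 - cosine similarity) as the per-frame distance. Lower total
// cost = better match.
static float dtwDistance(const float win[FRAMES_IN_WINDOW][FEAT_DIM],
                         const float tpl[FRAMES_IN_WINDOW][FEAT_DIM]) {
  static float D[FRAMES_IN_WINDOW][FRAMES_IN_WINDOW]; // ~2.3 KB
  for (int i = 0; i < FRAMES_IN_WINDOW; ++i) {
    for (int j = 0; j < FRAMES_IN_WINDOW; ++j) {
      float cost = 1.0f - cosineSim(win[i], tpl[j], FEAT_DIM);
      float best;
      if (i == 0 && j == 0)      best = 0.0f;
      else if (i == 0)           best = D[0][j-1];
      else if (j == 0)           best = D[i-1][0];
      else {
        best = D[i-1][j-1];
        if (D[i-1][j] < best) best = D[i-1][j];
        if (D[i][j-1] < best) best = D[i][j-1];
      }
      D[i][j] = cost + best;
    }
  }
  // Normalize by an approximate path length so scores stay comparable.
  return D[FRAMES_IN_WINDOW-1][FRAMES_IN_WINDOW-1] / (2 * FRAMES_IN_WINDOW);
}

You would then trigger on a small dtwDistance() instead of a large cosine average; the decision threshold has to be re-tuned empirically, since it is a distance rather than a similarity.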

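And a minimal sketch for the int8 quantization idea; the scale constant and the single quantized row shown are illustrative values derived from the sample template (q = round(f × 127 / max|f|), with max|f| = 0.82 here), so regenerate both from your own data:

// Hypothetical int8 template with a single scale factor.
// On SAMD (ARM), const arrays are placed in flash automatically;
// no PROGMEM macros are needed (unlike AVR).
static const float TEMPLATE_SCALE = 0.82f / 127.0f;  // max|f| / 127
static const int8_t KEYWORD_TEMPLATE_Q[24][8] = {
  { -127, -48, 15, 74, 119, 51, -28, -57 },  // row 0 of the sample template
  /* ... remaining 23 rows ... */
};

static inline float dequantTemplate(int t, int j) {
  return (float)KEYWORD_TEMPLATE_Q[t][j] * TEMPLATE_SCALE;
}
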
Final Checklist

  • Materials:
      • Arduino Nano 33 IoT (ABX00032)
      • INMP441 I2S microphone breakout, wires, 100 nF cap
  • Wiring (double-check):
      • INMP441 VDD→3.3V, GND→GND
      • INMP441 SCK→D3
      • INMP441 WS→D2
      • INMP441 SD→A6
      • INMP441 L/R→GND
  • Software setup:
      • Arduino CLI 0.35.2 installed and on PATH
      • Arduino SAMD core installed: arduino-cli core install arduino:samd
      • Libraries installed: Arduino I2S@1.0.1, arduinoFFT@1.6.1
  • Build/flash:
      • Compile: arduino-cli compile --fqbn arduino:samd:nano_33_iot ./kws_i2s_nano33iot
      • Upload: arduino-cli upload -p <PORT> --fqbn arduino:samd:nano_33_iot ./kws_i2s_nano33iot
  • Run:
      • Serial monitor at 115200 baud
      • Observe conf values and LED toggling on detection
  • Validation:
      • Test quiet/noisy environments
      • Refine the template using your own utterances
      • Tune DETECT_THRESHOLD (start at 0.85 and adjust)

This completes a robust, fully reproducible I2S keyword-spotting pipeline on the Arduino Nano 33 IoT + INMP441 using Arduino CLI. You now have a baseline that is computationally efficient and ready for iterative improvements such as MFCCs, DTW, and lightweight ML classifiers.


Quick Quiz

Question 1: What is the primary function of the INMP441 microphone in the project?
Question 2: What approach does this tutorial use for keyword spotting?
Question 3: What is the model number of the Arduino used in this project?
Question 4: Which operating systems are compatible with the host computer requirements?
Question 5: What is the required Arduino CLI version mentioned in the article?
Question 6: What type of workspace is recommended for testing speech triggers?
Question 7: What does the acronym FFT stand for in the context of this project?
Question 8: What type of wiring is suggested for connecting components?
Question 9: Which component is NOT listed as a material needed for the project?
Question 10: What is the main advantage of the computational approach used in this tutorial?
