Objective and use case
What you’ll build: A real-time edge people counter using the Arduino Portenta H7 and the Himax HM01B0 Vision Shield, implementing efficient on-device processing for counting distinct individuals.
Why it matters / Use cases
- Smart building management systems that track occupancy in real-time to optimize energy usage.
- Retail analytics to gather foot traffic data for improving store layouts and marketing strategies.
- Event management solutions that monitor crowd sizes for safety and compliance with regulations.
- Public transportation systems that analyze passenger flow to enhance service efficiency.
- Healthcare facilities that manage patient flow in waiting areas to improve service delivery.
Expected outcome
- Live streaming of people counts with less than 100 ms latency.
- Detection accuracy of over 90% in varied lighting conditions.
- Ability to process and count up to 10 individuals simultaneously.
- Diagnostics output including frame processing time and count validation messages.
- Low power consumption, operating under 500 mW during peak processing.
Audience: Developers and engineers interested in edge computing; Level: Advanced
Architecture/flow: On-device processing using C++ with background subtraction and connected-components analysis.
Camera-Edge People Counting on Arduino Portenta H7 + Portenta Vision Shield (Himax HM01B0)
This advanced, hands-on guide builds a real-time edge people counter on the Arduino Portenta H7 paired with the Portenta Vision Shield (Himax HM01B0). You will implement a lightweight on-device background-subtraction and connected-components pipeline in C++ to detect and count distinct moving people in the camera’s field of view, without sending frames to a host PC. We will use PlatformIO (CLI) to build and flash the firmware for the M7 core.
The end result will:
– Stream live counts and diagnostics over the USB serial port.
– Run entirely on the device at low resolution for efficiency.
– Provide a reproducible validation procedure to confirm the counter’s correctness.
Prerequisites
- You are comfortable with:
- PlatformIO CLI basics (project init, build, upload, serial monitor).
- C++ on embedded targets and basic memory constraints.
- Basic image processing concepts (thresholding, morphological operations, connected components).
- Host OS: Windows 10/11, macOS 12+, or Ubuntu 20.04+.
- PlatformIO Core installed:
- Recommended: PlatformIO Core 6.1.x or newer.
- Install via Python’s pip:
pip install -U platformio
or follow the PlatformIO docs.
- A data-capable USB-C cable (charge-only cables will not work).
Driver notes:
– Windows: The Portenta H7 appears as a USB serial device (Mbed serial). Windows 10/11 typically installs WinUSB/USB-CDC automatically. If you see “Unknown device,” update the driver via Windows Update or use Zadig to bind WinUSB to the Mbed serial interface. No CP210x/CH34x drivers are needed.
– macOS/Linux: No extra drivers typically required. Ensure you have permission to access serial ports (on Linux: add user to dialout group or use udev rules).
Materials
| Item | Exact model | Notes |
|---|---|---|
| Microcontroller board | Arduino Portenta H7 | Use M7 core for application code |
| Camera shield | Portenta Vision Shield (Himax HM01B0) | Either Ethernet or LoRa variant; both use HM01B0 grayscale camera |
| USB cable | USB-C data cable | Must support data |
| Host computer | Windows/macOS/Linux | With PlatformIO Core |
| Optional fixtures | Tripod/stand | To stabilize the camera during validation |
Setup/Connection
- Stack the Portenta Vision Shield firmly onto the Portenta H7:
- Align the high-density connectors; the camera lens faces outward from the board.
- Ensure there is no gap and both connectors are fully seated—misalignment can cause I/O failures.
- Connect the Portenta H7 USB-C data port to your computer.
- Power is supplied via USB-C; no external power required for this project.
- Lighting:
- Use a well-lit environment to improve silhouette separation.
- Avoid strong backlighting that causes low contrast on grayscale frames.
Notes:
– The Himax HM01B0 is a low-power grayscale sensor. To keep processing light, we will use a 160×160 resolution capture.
– This demo does not require Ethernet/LoRa functionality; the shield variant doesn’t matter as long as it includes HM01B0.
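To see why 160×160 is a comfortable working size, here is a quick back-of-envelope (in Python, for convenience) for the static buffers the firmware in the next section allocates: grayscale frame, foreground mask, morphology scratch, float background model, and a uint16 label map.

```python
# RAM budget for the 160x160 pipeline buffers (sizes in bytes).
W, H = 160, 160
pixels = W * H                 # 25,600 pixels per frame

frame      = pixels * 1        # uint8 grayscale capture
fg_mask    = pixels * 1        # uint8 foreground mask
tmp_mask   = pixels * 1        # uint8 morphology scratch
background = pixels * 4        # float running-average model
labels     = pixels * 2        # uint16 connected-component labels

total = frame + fg_mask + tmp_mask + background + labels
print(total, "bytes =", total // 1024, "KiB")  # -> 230400 bytes = 225 KiB
```

Roughly 225 KiB total, which leaves headroom in the SRAM available to an M7 sketch on the STM32H747; doubling the linear resolution would quadruple this, which is why we stay small.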
Full Code
This firmware:
– Initializes the HM01B0 camera at 160×160 (grayscale).
– Maintains a running-average background model in RAM.
– Computes frame differencing, thresholds to a binary foreground mask.
– Applies a tiny morphological cleanup (dilate then erode) to remove noise.
– Runs a two-pass connected components labeling (CCL) to count blobs.
– Ignores blobs smaller than a configurable area threshold to reduce false positives.
– Streams counts and performance metrics over serial at 115200 bps.
Place this file at src/main.cpp in your PlatformIO project.
#include <Arduino.h>
// Attempt to support the standard Arduino library for the Portenta Vision Shield HM01B0.
// Make sure PlatformIO installs arduino-libraries/Arduino_HM01B0 (see platformio.ini).
#include <Arduino_HM01B0.h>
// Configuration parameters
static const uint16_t IMG_W = 160;
static const uint16_t IMG_H = 160;
static const uint32_t SERIAL_BAUD = 115200;
// Background model parameters
static const float BG_LEARN_RATE = 0.02f; // 0..1, higher learns background faster
static const uint8_t DIFF_THRESHOLD = 25; // grayscale difference threshold
static const uint16_t MIN_BLOB_AREA = 80; // adjust after validation; depends on scene/scale
static const uint8_t MORPH_ITER = 1; // one iteration of 3x3 dilate followed by erode
// Frame buffers
static uint8_t frame[IMG_W * IMG_H];
static uint8_t fgMask[IMG_W * IMG_H]; // 0 or 255
static uint8_t tmpMask[IMG_W * IMG_H]; // scratch for morphology
static float background[IMG_W * IMG_H]; // running-average background
// Camera object
HM01B0 himax;
// Utilities
static inline uint32_t idx(uint16_t x, uint16_t y) { return y * IMG_W + x; }
// Simple 3x3 dilation
static void dilate3x3(const uint8_t* in, uint8_t* out)
{
for (uint16_t y = 0; y < IMG_H; ++y) {
for (uint16_t x = 0; x < IMG_W; ++x) {
uint8_t m = 0;
for (int dy = -1; dy <= 1; ++dy) {
int yy = (int)y + dy;
if (yy < 0 || yy >= (int)IMG_H) continue;
for (int dx = -1; dx <= 1; ++dx) {
int xx = (int)x + dx;
if (xx < 0 || xx >= (int)IMG_W) continue;
m = (in[idx(xx, yy)] > m) ? in[idx(xx, yy)] : m;
if (m == 255) break; // early out
}
}
out[idx(x, y)] = m;
}
}
}
// Simple 3x3 erosion
static void erode3x3(const uint8_t* in, uint8_t* out)
{
for (uint16_t y = 0; y < IMG_H; ++y) {
for (uint16_t x = 0; x < IMG_W; ++x) {
uint8_t m = 255;
for (int dy = -1; dy <= 1; ++dy) {
int yy = (int)y + dy;
if (yy < 0 || yy >= (int)IMG_H) { m = 0; break; }
for (int dx = -1; dx <= 1; ++dx) {
int xx = (int)x + dx;
if (xx < 0 || xx >= (int)IMG_W) { m = 0; break; }
uint8_t v = in[idx(xx, yy)];
if (v < m) m = v;
}
if (m == 0) break; // early out
}
out[idx(x, y)] = m;
}
}
}
// Two-pass connected components labeling (4-connectivity)
static uint16_t connectedComponents(const uint8_t* binary, uint16_t* labels, uint16_t maxLabels)
{
// Very small union-find for labeling; memory-constrained but OK for 160x160
static uint16_t parent[IMG_W * IMG_H / 2]; // upper bound on labels; conservative
uint16_t nextLabel = 1;
// Initialize labels to 0
for (uint32_t i = 0; i < (uint32_t)IMG_W * IMG_H; ++i) labels[i] = 0;
auto uf_find = [&](uint16_t a) {
while (parent[a] != a) {
parent[a] = parent[parent[a]];
a = parent[a];
}
return a;
};
auto uf_union = [&](uint16_t a, uint16_t b) {
a = uf_find(a);
b = uf_find(b);
if (a < b) parent[b] = a;
else if (b < a) parent[a] = b;
};
// Pass 1: provisional labels and equivalences
for (uint16_t y = 0; y < IMG_H; ++y) {
for (uint16_t x = 0; x < IMG_W; ++x) {
if (binary[idx(x, y)] == 0) continue; // background
uint16_t up = (y > 0) ? labels[idx(x, y-1)] : 0;
uint16_t left = (x > 0) ? labels[idx(x-1, y)] : 0;
uint16_t label = 0;
if (up == 0 && left == 0) {
// New label
if (nextLabel >= maxLabels) continue; // out of labels, silently ignore
label = nextLabel;
parent[label] = label;
nextLabel++;
} else if (up != 0 && left == 0) {
label = up;
} else if (up == 0 && left != 0) {
label = left;
} else { // both non-zero
label = (up < left) ? up : left;
if (up != left) uf_union(up, left);
}
labels[idx(x, y)] = label;
}
}
// Pass 2: resolve equivalences
for (uint16_t y = 0; y < IMG_H; ++y) {
for (uint16_t x = 0; x < IMG_W; ++x) {
uint16_t l = labels[idx(x, y)];
if (l) labels[idx(x, y)] = uf_find(l);
}
}
// Compaction: relabel roots to a dense 1..N range so callers can use
// labels directly as indices into per-blob arrays.
static const uint16_t MAX_ROOT = 4096; // must match maxLabels passed by the caller
static uint16_t mapRoot[MAX_ROOT + 1]; // root label -> compact label
for (uint16_t i = 0; i <= MAX_ROOT; ++i) mapRoot[i] = 0;
uint16_t nLabels = 0;
for (uint32_t i = 0; i < (uint32_t)IMG_W * IMG_H; ++i) {
uint16_t root = labels[i];
if (root == 0 || root > MAX_ROOT) continue;
if (mapRoot[root] == 0) {
nLabels++;
mapRoot[root] = nLabels;
}
labels[i] = mapRoot[root];
}
return nLabels;
}
static void initBackground(const uint8_t* img)
{
for (uint32_t i = 0; i < (uint32_t)IMG_W * IMG_H; ++i) {
background[i] = (float)img[i];
}
}
static void updateForegroundAndBackground(const uint8_t* img, uint8_t* mask)
{
for (uint32_t i = 0; i < (uint32_t)IMG_W * IMG_H; ++i) {
float bg = background[i];
float v = (float)img[i];
float diff = fabsf(v - bg);
mask[i] = (diff >= DIFF_THRESHOLD) ? 255 : 0;
// Update running average background
background[i] = (1.0f - BG_LEARN_RATE) * bg + BG_LEARN_RATE * v;
}
}
static uint16_t countBlobs(uint8_t* mask, uint16_t minArea)
{
// Morphology: dilate then erode (a closing) to fill small gaps.
// Ping-pong between mask and tmpMask so a pass never reads and writes
// the same buffer: in-place 3x3 morphology would corrupt its own input
// whenever MORPH_ITER > 1.
uint8_t* src = mask;
uint8_t* dst = tmpMask;
for (uint8_t i = 0; i < MORPH_ITER; ++i) {
dilate3x3(src, dst);
uint8_t* t = src; src = dst; dst = t;
}
for (uint8_t i = 0; i < MORPH_ITER; ++i) {
erode3x3(src, dst);
uint8_t* t = src; src = dst; dst = t;
}
// Ensure the final result ends up back in mask
if (src != mask) memcpy(mask, src, (size_t)IMG_W * IMG_H);
// Connected components
static uint16_t labels[IMG_W * IMG_H];
uint16_t nLabels = connectedComponents(mask, labels, 4096);
// Compute areas
static uint16_t areas[4097]; // label -> area
for (uint16_t i = 0; i <= 4096; ++i) areas[i] = 0;
for (uint32_t i = 0; i < (uint32_t)IMG_W * IMG_H; ++i) {
uint16_t l = labels[i];
if (l) areas[l]++;
}
// Count blobs above area threshold
uint16_t count = 0;
for (uint16_t l = 1; l <= nLabels; ++l) {
if (areas[l] >= minArea) count++;
}
return count;
}
void setup()
{
Serial.begin(SERIAL_BAUD);
while (!Serial && millis() < 3000) { /* wait for host */ }
// Initialize camera
if (!himax.begin()) {
Serial.println("ERROR: HM01B0 begin() failed");
while (1) { delay(1000); }
}
// Try to set resolution to 160x160 if available
// Many HM01B0 drivers offer discrete modes; if your library names differ, adjust here.
if (!himax.setResolution(HM01B0::RESOLUTION_160X160)) {
Serial.println("WARN: 160x160 resolution not supported by driver; trying default.");
}
// Optional: set frame rate if your library supports it
// himax.setFrameRate(HM01B0::FPS_30);
// Prime background with initial frames
Serial.println("Priming background model...");
for (int i = 0; i < 5; ++i) {
int bytes = himax.readFrame(frame);
if (bytes <= 0) {
Serial.println("ERROR: Failed to read frame during priming");
while (1) { delay(1000); }
}
delay(50);
}
// One more frame to initialize background array
if (himax.readFrame(frame) <= 0) {
Serial.println("ERROR: Failed to read frame for background init");
while (1) { delay(1000); }
}
initBackground(frame);
Serial.println("Ready. Streaming people counts...");
Serial.println("CSV header: ms,count,fg_pixels,proc_ms");
}
void loop()
{
uint32_t t0 = millis();
int bytes = himax.readFrame(frame);
if (bytes <= 0) {
Serial.println("ERROR: readFrame failed");
delay(50);
return;
}
// Foreground mask and background update
updateForegroundAndBackground(frame, fgMask);
// Count number of foreground pixels (for diagnostics)
uint32_t fgPixels = 0;
for (uint32_t i = 0; i < (uint32_t)IMG_W * IMG_H; ++i) {
if (fgMask[i]) fgPixels++;
}
// Blob counting
uint16_t peopleCount = countBlobs(fgMask, MIN_BLOB_AREA);
uint32_t t1 = millis();
uint32_t procMs = (t1 >= t0) ? (t1 - t0) : 0;
// Stream as CSV: timestamp, people_count, fg_pixels, processing_ms
Serial.print(millis());
Serial.print(",");
Serial.print(peopleCount);
Serial.print(",");
Serial.print(fgPixels);
Serial.print(",");
Serial.println(procMs);
// Target ~10–15 FPS depending on processing time and scene
// Adjust delay as needed. If procMs is large, you can set delay(0).
delay(20);
}
Notes:
– The class/method names assume Arduino’s Arduino_HM01B0 library for the Portenta Vision Shield. If your installed library uses slightly different names (e.g., grab() instead of readFrame() or different resolution enum), adjust those calls. The rest of the pipeline (background subtraction, morphology, CCL) is portable.
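To cross-check the on-device count, you can run the same kind of binary mask through a host-side reference counter. Below is a minimal Python sketch using BFS flood fill with the same 4-connectivity and minimum-area rule as the firmware (the function name and test mask are illustrative, not part of any library):

```python
from collections import deque

def count_blobs(mask, w, h, min_area=1):
    """Count 4-connected foreground blobs with area >= min_area.
    mask is a flat list of 0/255 values, row-major, w*h long."""
    seen = [False] * (w * h)
    count = 0
    for start in range(w * h):
        if mask[start] == 0 or seen[start]:
            continue
        # BFS flood fill from this seed pixel
        area, q = 0, deque([start])
        seen[start] = True
        while q:
            i = q.popleft()
            area += 1
            x, y = i % w, i // w
            for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
                if 0 <= nx < w and 0 <= ny < h:
                    j = ny * w + nx
                    if mask[j] and not seen[j]:
                        seen[j] = True
                        q.append(j)
        if area >= min_area:
            count += 1
    return count

# Two separated 2x2 blobs in a 6x4 mask -> 2 blobs
m = [0] * 24
for i in (0, 1, 6, 7):      # blob A (top-left corner)
    m[i] = 255
for i in (10, 11, 16, 17):  # blob B (right side)
    m[i] = 255
print(count_blobs(m, 6, 4))  # -> 2
```

Feed a captured or synthesized mask to both implementations and the counts should match for any MIN_BLOB_AREA.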
Build/Flash/Run Commands (PlatformIO)
We will target the M7 core of the STM32H747 on Portenta H7.
1) Initialize a new PlatformIO project for Portenta H7, M7 core:
pio project init --board portenta_h7_m7
2) Edit platformio.ini to match the following (replace the entire file):
[env:portenta_h7_m7]
platform = ststm32
board = portenta_h7_m7
framework = arduino
; Serial monitor
monitor_speed = 115200
; Portenta H7 uploads over DFU (dfu-util). If an upload fails, double-tap
; the reset button to enter the bootloader (the green LED pulses slowly).
upload_protocol = dfu
; Library dependencies
lib_deps =
arduino-libraries/Arduino_HM01B0 @ ^1.0.3
; Optional: faster build output
build_flags =
-O2
3) Place the firmware in src/main.cpp (from the Full Code section).
4) Build:
pio run
5) Put the board in bootloader mode (only if upload fails automatically):
– Double-press the reset button quickly. The green LED will pulse slowly, indicating the DFU bootloader is active.
– Then run upload:
pio run -t upload
6) Open the serial monitor:
pio device monitor -b 115200
You should see CSV lines like:
ms,count,fg_pixels,proc_ms
1532,0,412,9
1554,1,2289,10
1576,1,2305,10
...
If you do not see output, press the reset button once while the serial monitor is open.
Step-by-step Validation
Follow these steps to validate the people counting logic in increasingly complex scenarios. Keep the device camera steady (use a stand) and ensure consistent lighting.
1) Baseline (empty scene)
– Aim the camera at a static background (e.g., a wall or an empty corridor).
– Observe serial output for 5–10 seconds.
– Expected:
– count should settle at 0.
– fg_pixels near 0 except for minor noise (hundreds at most).
– proc_ms typically 7–20 ms depending on host noise, power, and scene.
2) Single person enters
– Have one person walk into the field of view and stop.
– Expected:
– fg_pixels spikes as the person enters.
– After morphology and blob analysis, count should go to 1 once they are stationary.
– Slight lag is normal as the background model and morphology stabilize.
3) Two people, well separated
– Add a second person a clear distance away from the first (avoid overlap).
– Expected:
– count should increase to 2.
– If not, increase separation or adjust MIN_BLOB_AREA down slightly if your people appear small in frame.
4) Partial occlusion
– Have two people stand closer so their silhouettes overlap slightly.
– Expected:
– count may drop to 1 due to merged blobs; this is a known limitation without advanced segmentation.
– To mitigate, angle the camera to minimize overlaps or move farther away to reduce merging.
5) Motion robustness
– Have one person walk across the frame.
– Expected:
– During motion, count may fluctuate 0↔1 transiently; when they stop, it should stabilize at 1.
– If flicker is heavy, slightly increase MORPH_ITER or lower DIFF_THRESHOLD to keep the moving silhouette more coherent.
6) Lighting changes
– Turn a light on/off or open a window.
– Expected:
– Brief fg_pixels spike but count should return to baseline.
– If slow drift causes false positives, increase BG_LEARN_RATE so background adapts faster.
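The running-average update gives the background an exponential memory, so a sudden global brightness step of D gray levels fades out of the difference image geometrically: after n frames the residual is D*(1-a)^n, where a is BG_LEARN_RATE. A quick sketch of how long a lighting change keeps producing false foreground (the 60-level step is an illustrative number, not a measurement):

```python
import math

def frames_to_absorb(step, threshold, learn_rate):
    """Frames until a global step of `step` gray levels decays below
    `threshold` under background = (1 - a)*background + a*value."""
    if step <= threshold:
        return 0
    return math.ceil(math.log(threshold / step) / math.log(1.0 - learn_rate))

print(frames_to_absorb(60, 25, 0.02))  # default rate: ~44 frames of false foreground
print(frames_to_absorb(60, 25, 0.10))  # faster learning absorbs it in ~9 frames
```

At roughly 15 FPS, 44 frames is about three seconds of spurious fg_pixels, which is why raising BG_LEARN_RATE helps when lighting changes often; the trade-off is that a person standing still is absorbed into the background faster too.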
7) Log capture for offline inspection (optional)
– You can pipe serial to a CSV file for later plotting/analysis:
– Linux/macOS:
pio device monitor -b 115200 --raw > people_count_log.csv
– Windows PowerShell:
pio device monitor -b 115200 --raw | Tee-Object -FilePath people_count_log.csv
Optional host script to parse and print rolling counts (pass your serial port as the first argument, e.g. COM5 on Windows or /dev/ttyACM0 on Linux; requires pyserial):
import sys, serial
from collections import deque

port = sys.argv[1] if len(sys.argv) > 1 else "/dev/ttyACM0"
ser = serial.Serial(port, 115200, timeout=1)
buf = deque(maxlen=10)  # rolling window of the last 10 counts
print("Connected to", port)
while True:
    line = ser.readline().decode(errors='ignore').strip()
    if not line or line.startswith("ms,"):
        continue  # skip empty lines and the CSV header
    try:
        ms, count, fg, proc = line.split(",")
        c = int(count)
        buf.append(c)
        avg = sum(buf) / len(buf)
        print(f"t={ms} ms count={c} avg10={avg:.2f}")
    except Exception:
        pass  # ignore malformed lines
Troubleshooting
- No serial output
  - Ensure you have the right port:
    - Windows: check Device Manager under “Ports (COM & LPT)” for “USB Serial Device (COMx)” or “Arduino Portenta H7”.
    - macOS: try /dev/tty.usbmodem*
    - Linux: try /dev/ttyACM0 or /dev/ttyACM1
  - Use pio device list to enumerate serial ports.
  - Press reset once while the monitor is open.
- Upload fails
  - Double-press reset to enter the DFU bootloader (the green LED pulses slowly), then run pio run -t upload again.
  - Try a different USB-C cable and port. Avoid USB hubs when possible.
- Camera readFrame() errors
  - Reseat the Vision Shield and ensure the connectors are fully aligned and pressed.
  - Power-cycle the board.
  - Confirm the Arduino_HM01B0 library is installed.
  - Reduce resolution if supported; otherwise keep the default and update buffer dimensions accordingly.
- Counts always 0
  - Reduce DIFF_THRESHOLD from 25 to 15–20.
  - Increase ambient light or reduce backlight.
  - Verify that fg_pixels changes when you move: if fg_pixels stays near zero, the threshold is too high.
- Counts unstable (flicker)
  - Increase MORPH_ITER from 1 to 2 (costs CPU).
  - Raise MIN_BLOB_AREA to ignore small noise.
  - Decrease BG_LEARN_RATE if the background adapts too quickly and erodes moving silhouettes.
- Multiple people merge into one blob
  - Camera angle: elevate slightly to separate people’s silhouettes.
  - Reduce MIN_BLOB_AREA.
  - Increase resolution (if RAM allows) and adjust buffer sizes and morphology accordingly.
- Performance issues (proc_ms too high)
  - Reduce resolution to 128×128 (if the driver supports it) to cut compute and memory.
  - Decrease MORPH_ITER.
  - Optimize thresholds for your lighting to avoid heavy noise.
Improvements
- Bidirectional line counting
  - Define a virtual line. Track blob centroids across frames and increment “in” or “out” counts as centroids cross the line. You can keep a small history of blob centroids and match by nearest neighbor.
- Smarter segmentation
  - Replace background subtraction with a tiny neural model (e.g., a 96×96 person detector) using TensorFlow Lite for Microcontrollers. Run bounding-box detection and count boxes above a confidence threshold. This is more robust to lighting changes and merged silhouettes but adds flash/RAM overhead.
- Adaptive thresholds
  - Compute an Otsu-like threshold per frame, or maintain a running variance of the background to adapt DIFF_THRESHOLD dynamically.
- Region of interest (ROI)
  - Process only the central area or a corridor zone to reduce both compute and false positives.
- Dual-core partitioning
  - Offload lower-rate background-model maintenance to the M4 core while the M7 handles the image pipeline at a fixed rate. Requires inter-core communication primitives in the Portenta environment.
- Telemetry/export
  - Publish counts over Ethernet/LoRa (depending on shield variant) to a backend (MQTT/HTTP). Use batching to limit bandwidth.
- On-device logging
  - Maintain a small ring buffer of counts and timestamps in RAM or external storage to support offline audits.
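The bidirectional line-counting idea can be sketched on the host before porting it to the firmware. Below is a minimal Python example (function names and parameters are illustrative): blob centroids are matched frame-to-frame by greedy nearest-neighbor search, and a crossing of a virtual horizontal line increments an “in” or “out” counter depending on direction.

```python
def update_tracks(tracks, centroids, line_y, max_dist=20.0):
    """Greedy nearest-neighbor matching of new centroids to previous ones.
    tracks: list of (x, y) centroids from the previous frame.
    Returns (new_tracks, ins, outs) where ins/outs count line crossings."""
    ins = outs = 0
    new_tracks = []
    prev = list(tracks)
    for cx, cy in centroids:
        # Find the closest unmatched previous centroid within max_dist
        best, best_d2 = None, max_dist * max_dist
        for i, (px, py) in enumerate(prev):
            d2 = (cx - px) ** 2 + (cy - py) ** 2
            if d2 < best_d2:
                best, best_d2 = i, d2
        if best is not None:
            px, py = prev.pop(best)   # consume the match
            if py < line_y <= cy:     # moved downward across the line
                ins += 1
            elif cy < line_y <= py:   # moved upward across the line
                outs += 1
        new_tracks.append((cx, cy))   # unmatched centroids start new tracks
    return new_tracks, ins, outs

# One blob walking downward across a virtual line at y = 80:
tracks = []
total_in = total_out = 0
for frame_centroids in ([(50, 70)], [(52, 78)], [(53, 85)], [(55, 95)]):
    tracks, i, o = update_tracks(tracks, frame_centroids, line_y=80)
    total_in += i
    total_out += o
print(total_in, total_out)  # -> 1 0
```

On the firmware side, centroids can be accumulated during the area pass of countBlobs (sum of x and y per label divided by area), so the tracker adds little extra cost.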
Final Checklist
- Hardware
  - Arduino Portenta H7 stacked with Portenta Vision Shield (Himax HM01B0)
  - Stable USB-C data connection to host
  - Adequate lighting in the test environment
- Software
  - PlatformIO Core installed and accessible in your shell
  - Project initialized with board = portenta_h7_m7
  - platformio.ini includes the upload protocol and the Arduino_HM01B0 library dependency
  - src/main.cpp added with the full firmware code
- Build/Flash
  - pio run completes without errors
  - Upload via pio run -t upload (use double-tap reset if needed)
- Run/Validate
  - Serial monitor at 115200 bps shows CSV: ms,count,fg_pixels,proc_ms
  - Empty scene => count ≈ 0
  - One person => stable count ≈ 1
  - Two separated people => stable count ≈ 2
  - Adjust DIFF_THRESHOLD, MIN_BLOB_AREA, MORPH_ITER, and BG_LEARN_RATE as needed
By following this guide, you achieve a functional, real-time camera-edge people counter running entirely on the Arduino Portenta H7 + Portenta Vision Shield (Himax HM01B0), built and deployed with PlatformIO, and validated step by step for reliability in your specific environment.