Practical case: Recyclables classifier with TensorRT

Objective and use case

What you’ll build: A real-time recyclables classifier on a Jetson Nano 4GB using a CSI Arducam IMX219 8MP and a TensorRT-optimized ONNX model. The app captures live frames, runs accelerated inference, and maps ImageNet predictions to Paper/Plastic/Metal/Glass with on-screen labels and confidence.

Why it matters / Use cases

  • Smart bins: Automatically route items (e.g., plastic bottle vs. metal can) to the correct chute; ~20 FPS with <80 ms latency enables responsive sorting and LED/servo triggers.
  • Assisted recycling education: Live overlay shows “Plastic (92%)” and a green check; fully offline operation on the Nano keeps classroom demos independent of internet access.
  • Factory pre-sorting: Low-cost camera + Nano flags low-confidence items (<0.6) for manual QA on lines moving ~0.5 m/s; 15–20 FPS supports timely highlighting.
  • Edge kiosks: Lobby or cafeteria stations that display Paper/Plastic/Metal/Glass with confidence; runs in 5–10 W power budgets and logs simple stats locally.

Expected outcome

  • Stable live camera feed at 15–30 FPS at 1280×720 capture resolution.
  • End-to-end latency of ~50–90 ms per frame (exposure → overlay) with FP16 TensorRT and 224×224 inference.
  • Resource profile: ~60–80% GPU, <40% total CPU, 1.5–3.0 GB RAM; sustained operation >1 hour without dropped frames.
  • Displayed classification with confidence; top-1 recyclable category accuracy ≥80% on a small validation set (bottle/can/jar/newspaper).
  • Power draw within Nano 10 W mode; device temperature maintained <70°C with a small fan.

Audience: Makers, students, and edge AI/robotics developers; Level: Intermediate (basic Linux + Jetson + Python/C++).

Architecture/flow: CSI camera (IMX219) → GStreamer (nvarguscamerasrc) → zero-copy CUDA buffer → resize/normalize → TensorRT engine (FP16, ONNX) → softmax → map top-1 to Paper/Plastic/Metal/Glass → overlay (text/confidence) → display; optional GPIO/REST hooks.

Prerequisites

  • A microSD-flashed Jetson Nano 4GB developer kit with JetPack (L4T) on Ubuntu (headless or with display).
  • Internet access to download the ONNX model and label file.
  • Basic Linux shell usage and Python 3 familiarity.

Verify JetPack (L4T), kernel, and installed packages:

cat /etc/nv_tegra_release
jetson_release -v   # optional; provided by the jetson-stats package

uname -a
dpkg -l | grep -E 'nvidia|tensorrt'

Expect to see L4T release (e.g., R32.x for JetPack 4.x, or R35.x for JetPack 5.x), aarch64 kernel, and packages such as nvidia-l4t-tensorrt and libnvinfer.

Materials (with exact model)

  • Jetson Nano 4GB Developer Kit (P3450, B01 preferred for dual CSI).
  • Camera: Arducam IMX219 8MP (imx219), CSI-2 ribbon cable.
  • microSD card (≥ 64 GB, UHS-I recommended).
  • 5V/4A DC power supply (barrel jack, recommended) or 5V/2.5A micro-USB (may limit sustained performance).
  • Active cooling (recommended for sustained MAXN): 5V fan or heatsink+fan.
  • Network: Ethernet or USB Wi-Fi for package/model downloads.
  • Optional: USB keyboard/mouse and HDMI display (not required for the tutorial).

Setup/Connection

1) Power and cooling
– Connect the 5V/4A PSU to the Nano. Add a 5V fan to the 5V/GND header to prevent thermal throttling in MAXN.

2) CSI camera installation (Arducam IMX219 8MP)
– Power off the Nano.
– Locate CAM0 (preferred) CSI connector. Lift the locking tab gently.
– Insert the ribbon cable with the blue backing facing the Ethernet/USB ports, so the exposed metal contacts mate with the connector’s pins.
– Fully seat the cable and press down the locking tab.
– Power on the Nano.

3) Confirm the camera works with GStreamer
– Test capture and save one JPEG (no GUI needed):

gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 num-buffers=1 \
  ! nvvidconv ! video/x-raw,format=I420 \
  ! jpegenc ! filesink location=camera_test.jpg
  • If successful, camera_test.jpg should appear and be a non-zero size.

4) Confirm GPU and power settings
– Check current power mode:

sudo nvpmodel -q
  • For performance benchmarking (watch thermals), set MAXN and max clocks:
sudo nvpmodel -m 0
sudo jetson_clocks
  • You can revert to the default power profile later:
sudo nvpmodel -m 1
  • If the camera pipeline misbehaves after changing clocks, restart the Argus camera service:
sudo systemctl restart nvargus-daemon

5) Install needed packages
– TensorRT Python bindings are included with JetPack; add OpenCV, NumPy, and PyCUDA:

sudo apt-get update
sudo apt-get install -y python3-opencv python3-numpy python3-pycuda
  • Confirm imports in Python:
python3 - << 'PY'
import cv2, numpy, tensorrt as trt, pycuda.driver as cuda
print("OK: OpenCV", cv2.__version__)
print("OK: TRT", trt.__version__)
PY

Full Code

We will:
– Download SqueezeNet 1.1 ONNX (fast, good for Nano).
– Build a TensorRT engine (FP16).
– Run a Python app that captures camera frames via GStreamer, performs preprocessing, executes inference with TensorRT, decodes top-5 ImageNet labels, and maps them to recyclables categories using keyword rules.

Directory layout:
– models/: ONNX model, TensorRT engine, labels.
– scripts/: Python runtime.

1) Prepare folders and downloads:

mkdir -p ~/jetson-recyclables/{models,scripts}
cd ~/jetson-recyclables/models

# SqueezeNet 1.1 ONNX (opset 7) from ONNX Model Zoo
wget -O squeezenet1.1-7.onnx \
  https://github.com/onnx/models/raw/main/vision/classification/squeezenet/model/squeezenet1.1-7.onnx

# Simple English labels for ImageNet-1k
wget -O imagenet_labels.json \
  https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json

2) Build the TensorRT engine with trtexec (FP16):

cd ~/jetson-recyclables/models
/usr/src/tensorrt/bin/trtexec \
  --onnx=squeezenet1.1-7.onnx \
  --saveEngine=squeezenet1.1_fp16.engine \
  --fp16 \
  --workspace=1024 \
  --verbose

Note:
– We omit explicit shape flags because this SqueezeNet model has a fixed 1×3×224×224 input (a quick way to confirm this is shown after this note).
– Adjust workspace if you face memory limits.
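
If you want to confirm that fixed input shape before building the engine, an optional check with the onnx Python package (not installed by default; pip3 install onnx) could look like this sketch:

# inspect_onnx.py: optional sanity check; assumes `pip3 install onnx` was run.
import onnx

model = onnx.load("squeezenet1.1-7.onnx")
init_names = {init.name for init in model.graph.initializer}
for inp in model.graph.input:
    if inp.name in init_names:        # skip weights listed as graph inputs in older opsets
        continue
    dims = [d.dim_value for d in inp.type.tensor_type.shape.dim]
    print("input:", inp.name, dims)   # expect a single input with shape [1, 3, 224, 224]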

3) Python runtime (scripts/run_classifier.py)

This script:
– Opens CSI camera with nvarguscamerasrc via OpenCV+GStreamer.
– Loads the TensorRT engine and binds buffers with PyCUDA.
– Preprocesses to 224×224 with ImageNet mean/std normalization.
– Runs inference and maps the top-5 ImageNet labels to recyclables classes with keyword rules.
– Prints per-frame classification and rolling FPS.

# ~/jetson-recyclables/scripts/run_classifier.py
import os
import time
import json
import numpy as np
import cv2

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # Initializes CUDA driver context

MODELS_DIR = os.path.expanduser('~/jetson-recyclables/models')
ENGINE_PATH = os.path.join(MODELS_DIR, 'squeezenet1.1_fp16.engine')
IMAGENET_LABELS_JSON = os.path.join(MODELS_DIR, 'imagenet_labels.json')

# Recyclables mapping via keyword heuristics (basic-level approximation)
MAPPING_RULES = {
    "plastic": ["plastic", "bottle", "water bottle", "shampoo", "detergent", "packet", "cup"],
    "metal":   ["can", "aluminium", "aluminum", "tin", "steel", "iron"],
    "glass":   ["glass", "wine bottle", "beer bottle", "goblet", "vase", "jar"],
    "paper":   ["newspaper", "magazine", "bookshop", "book", "paper", "notebook", "carton", "envelope", "tissue"]
}

# GStreamer pipeline for CSI camera
def gstreamer_pipeline(sensor_id=0, capture_width=1280, capture_height=720,
                       display_width=1280, display_height=720, framerate=30, flip_method=0):
    return (
        "nvarguscamerasrc sensor-id={} ! "
        "video/x-raw(memory:NVMM), width={}, height={}, framerate={}/1, format=NV12 ! "
        "nvvidconv flip-method={} ! "
        "video/x-raw, width={}, height={}, format=BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=BGR ! appsink drop=true max-buffers=1"
    ).format(sensor_id, capture_width, capture_height, framerate,
             flip_method, display_width, display_height)

def load_labels(path):
    with open(path, 'r') as f:
        labels = json.load(f)
    assert len(labels) == 1000, "Expected 1000 ImageNet labels"
    return labels

def build_trt_context(engine_path):
    logger = trt.Logger(trt.Logger.WARNING)
    trt.init_libnvinfer_plugins(logger, '')
    with open(engine_path, 'rb') as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    # Identify I/O bindings
    bindings = []
    host_inputs = []
    cuda_inputs = []
    host_outputs = []
    cuda_outputs = []
    for i in range(engine.num_bindings):
        name = engine.get_binding_name(i)
        dtype = trt.nptype(engine.get_binding_dtype(i))
        is_input = engine.binding_is_input(i)
        shape = context.get_binding_shape(i)
        size = trt.volume(shape)
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(device_mem))
        if is_input:
            host_inputs.append(host_mem)
            cuda_inputs.append(device_mem)
            input_shape = shape
            input_name = name
        else:
            host_outputs.append(host_mem)
            cuda_outputs.append(device_mem)
            output_shape = shape
            output_name = name
    stream = cuda.Stream()
    io = {
        "engine": engine, "context": context, "bindings": bindings, "stream": stream,
        "host_inputs": host_inputs, "cuda_inputs": cuda_inputs,
        "host_outputs": host_outputs, "cuda_outputs": cuda_outputs,
        "input_shape": input_shape, "output_shape": output_shape,
        "input_name": input_name, "output_name": output_name
    }
    return io

def preprocess_bgr_to_nchw_224(bgr):
    # Center-crop shortest side, resize to 224x224, convert BGR->RGB, normalize
    h, w = bgr.shape[:2]
    side = min(h, w)
    y0 = (h - side) // 2
    x0 = (w - side) // 2
    cropped = bgr[y0:y0+side, x0:x0+side]
    resized = cv2.resize(cropped, (224, 224), interpolation=cv2.INTER_LINEAR)
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std  = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    norm = (rgb - mean) / std
    chw = np.transpose(norm, (2, 0, 1))  # C,H,W
    return chw

def softmax(x):
    x = x - np.max(x)
    e = np.exp(x)
    return e / np.sum(e)

def map_to_recyclable(labels_topk):
    # labels_topk: list of (label_str, prob)
    label_texts = [l.lower() for l, _ in labels_topk]
    for category, keys in MAPPING_RULES.items():
        for key in keys:
            if any(key in txt for txt in label_texts):
                return category
    return "unknown"

def infer_trt(io, input_chw):
    # Copy to host (float32) and device
    inp = input_chw.astype(np.float32).ravel()
    np.copyto(io["host_inputs"][0], inp)
    cuda.memcpy_htod_async(io["cuda_inputs"][0], io["host_inputs"][0], io["stream"])
    # Inference
    io["context"].execute_async_v2(bindings=io["bindings"], stream_handle=io["stream"].handle)
    # Copy back
    cuda.memcpy_dtoh_async(io["host_outputs"][0], io["cuda_outputs"][0], io["stream"])
    io["stream"].synchronize()
    out = np.array(io["host_outputs"][0])
    return out  # 1000 logits

def main():
    labels = load_labels(IMAGENET_LABELS_JSON)
    io = build_trt_context(ENGINE_PATH)

    cap = cv2.VideoCapture(gstreamer_pipeline(), cv2.CAP_GSTREAMER)
    if not cap.isOpened():
        raise RuntimeError("Could not open CSI camera via GStreamer")

    print("Starting recyclables classifier. Press Ctrl+C to stop.")
    frame_count = 0
    t0 = time.time()
    rolling = []
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                print("Frame grab failed; retrying...")
                continue

            prep = preprocess_bgr_to_nchw_224(frame)
            t_infer0 = time.time()
            logits = infer_trt(io, prep)
            probs = softmax(logits)
            top5_idx = probs.argsort()[-5:][::-1]
            top5 = [(labels[i], float(probs[i])) for i in top5_idx]
            category = map_to_recyclable(top5)
            t_infer1 = time.time()
            latency_ms = (t_infer1 - t_infer0) * 1000.0

            frame_count += 1
            rolling.append(latency_ms)
            if len(rolling) > 100:
                rolling.pop(0)
            avg_ms = sum(rolling)/len(rolling)
            fps = 1000.0 / avg_ms if avg_ms > 0 else 0.0

            # Log concise output
            print(f"[#{frame_count:04d}] {category.upper()} | "
                  f"Top-1: {top5[0][0]} ({top5[0][1]*100:.1f}%) | latency {latency_ms:.1f} ms | est FPS {fps:.1f}")
    except KeyboardInterrupt:
        pass
    finally:
        cap.release()
        dt = time.time() - t0
        if dt > 0:
            print(f"Frames: {frame_count}, Avg FPS: {frame_count/dt:.2f}")

if __name__ == "__main__":
    main()

Save the script and make it executable:

chmod +x ~/jetson-recyclables/scripts/run_classifier.py

Build/Flash/Run commands

1) Verify platform and packages:

cat /etc/nv_tegra_release
uname -a
dpkg -l | grep -E 'nvidia|tensorrt'

2) Set performance mode (watch thermals):

sudo nvpmodel -m 0
sudo jetson_clocks

3) Camera sanity check:

gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 num-buffers=1 \
  ! nvvidconv ! video/x-raw,format=I420 \
  ! jpegenc ! filesink location=camera_test.jpg

4) Model + engine:

cd ~/jetson-recyclables/models
wget -O squeezenet1.1-7.onnx \
  https://github.com/onnx/models/raw/main/vision/classification/squeezenet/model/squeezenet1.1-7.onnx
wget -O imagenet_labels.json \
  https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json

/usr/src/tensorrt/bin/trtexec \
  --onnx=squeezenet1.1-7.onnx \
  --saveEngine=squeezenet1.1_fp16.engine \
  --fp16 --workspace=1024

5) Start tegrastats in a second terminal to observe performance:

sudo tegrastats

6) Run the recyclables classifier:

python3 ~/jetson-recyclables/scripts/run_classifier.py

Expected console log snippet:
– Lines like:
[#0010] PLASTIC | Top-1: water bottle (65.3%) | latency 23.4 ms | est FPS 42.7
[#0011] METAL | Top-1: can (71.8%) | latency 22.8 ms | est FPS 43.9

Stop with Ctrl+C. The script prints total frames and average FPS on exit.

Step-by-step Validation

1) JetPack and TensorRT presence
– A valid L4T string (e.g., R32.7.4 or R35.x) and dpkg listing libnvinferX confirm TensorRT install.

2) Camera path and exposure
– camera_test.jpg exists and opens (size > 50 KB): confirms nvarguscamerasrc pipeline is working and the imx219 sensor is recognized by the argus stack.
– If not, see Troubleshooting.

3) Engine build validation (trtexec)
– trtexec should end with a summary like:
– Input shape: 1x3x224x224
– Average on 200 runs: XX ms, QPS: YY, Latency: …
– If the reported QPS (roughly 1000 / mean latency in ms) is around 100–200 for SqueezeNet FP16 on the Nano, your GPU acceleration is working.

4) Live inference metrics
– Start tegrastats in one terminal, run the script in another. You should see:
– GPU (GR3D) utilization spikes >50% during inference.
– EMC (memory controller) activity increases.
– RAM stays within 1–2 GB usage (depending on other processes).
– Example tegrastats snippet during run:
– RAM 1800/3964MB (lfb 450x4MB) SWAP 0/1024MB
– GPU GR3D_FREQ 921MHz GR3D% 65
– EMC_FREQ 1600MHz EMC% 45

5) Throughput and latency
– In the script output, confirm:
– latency ~20–30 ms (per-frame inference on 224×224) → 33–50 FPS inference stage.
– End-to-end estimate shows 30–40 FPS if not limited by capture/resize.
– Expected results on Jetson Nano 4GB in MAXN with SqueezeNet FP16:
– Inference latency: 15–35 ms
– Est FPS: 28–50
– If much lower, check power mode and jetson_clocks.

6) Classification sanity checks
– Hold up a plastic bottle: see PLASTIC category frequently (due to keywords “bottle”, “water bottle”).
– Show metal can: METAL category common (keyword “can”).
– Show a glass jar or bottle: GLASS category appears.
– Show paper/cardboard: PAPER category appears (“book”, “notebook”, “envelope”, “carton”).
– Items not mapping: category “unknown” will be printed (expected).

7) Power revert and cleanup
– Stop the script (Ctrl+C).
– Stop tegrastats.
– Revert to a lower power profile if desired:

sudo nvpmodel -m 1

Troubleshooting

  • Camera not detected / pipeline fails
  • Ensure ribbon orientation (blue side outward) and the connector lock is fully engaged.
  • Confirm sensor-id=0 or try sensor-id=1 if you are on B01 and CAM0 is empty.
  • Restart the argus service:
    sudo systemctl restart nvargus-daemon
  • Check dmesg for imx219-related logs:
    dmesg | grep -i imx219

  • nvarguscamerasrc works once, then errors

  • Argus may have a stuck session; restart it:
    sudo systemctl restart nvargus-daemon
  • Avoid running multiple camera apps simultaneously.

  • PyCUDA import error

  • Install via apt:
    sudo apt-get install -y python3-pycuda
  • Or fall back to pip (can be slower to build):
    sudo apt-get install -y python3-pip python3-dev
    pip3 install pycuda

  • trtexec cannot parse ONNX

  • Re-download the ONNX:
    rm -f squeezenet1.1-7.onnx && wget …
  • Ensure TensorRT version supports the opset (7). JetPack 4.x/5.x should handle it.

  • Low FPS

  • Confirm MAXN and max clocks:
    sudo nvpmodel -m 0 && sudo jetson_clocks
  • Reduce the camera/processing resolution (e.g., 640×480) via gstreamer_pipeline to lower preprocessing cost; see the sketch after this list.
  • Keep flip_method=0 (no rotation) and avoid opening GUI windows.
  • Ensure no heavy background processes (close browsers/IDEs).
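
For example, a possible tweak in run_classifier.py’s main() is to hand smaller frames to OpenCV while leaving capture at a native IMX219 mode (values here are illustrative):

# Let nvvidconv downscale before frames reach OpenCV; cuts resize/convert cost.
cap = cv2.VideoCapture(
    gstreamer_pipeline(capture_width=1280, capture_height=720,
                       display_width=640, display_height=480),
    cv2.CAP_GSTREAMER)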

  • Memory errors

  • Ensure workspace=1024 is not too large for engine build; try 512.
  • Close other Python programs and free memory.
  • Avoid keeping large image buffers in Python.

  • Classification mismatches

  • This tutorial uses ImageNet labels + keyword mapping (heuristic). For production-grade recyclables classification, fine-tune on recyclables datasets (e.g., TrashNet-like) and export a 4-class ONNX, then rebuild the TensorRT engine.

Improvements

  • Use a domain-specific ONNX classifier
  • Train/fine-tune MobileNetV2/ResNet18 on recyclables (paper/plastic/metal/glass) and export to ONNX.
  • Replace squeezenet1.1-7.onnx with your trained 4-class model and update the labels file accordingly (a 4-entry JSON list); a minimal export sketch follows this list.
  • Rebuild engine:
    /usr/src/tensorrt/bin/trtexec --onnx=your_model.onnx --saveEngine=recyclables_fp16.engine --fp16 --workspace=1024
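
If you go this route, exporting a fine-tuned 4-class model to ONNX might look like the sketch below; it assumes PyTorch/torchvision on a training machine (not the Nano) and a hypothetical checkpoint file:

# export_recyclables_onnx.py: minimal sketch, not the tutorial's tested path.
# Assumes a MobileNetV2 fine-tuned to 4 classes and saved as recyclables_mnv2.pth (hypothetical).
import torch
import torchvision

model = torchvision.models.mobilenet_v2(num_classes=4)
model.load_state_dict(torch.load("recyclables_mnv2.pth", map_location="cpu"))
model.eval()

dummy = torch.randn(1, 3, 224, 224)                 # same input size the Nano pipeline expects
torch.onnx.export(model, dummy, "your_model.onnx",
                  input_names=["input"], output_names=["logits"],
                  opset_version=11)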

  • INT8 optimization (if you prepare a calibration cache)

  • Collect a small calibration dataset (e.g., 100–500 images).
  • Build with INT8:
    /usr/src/tensorrt/bin/trtexec --onnx=your_model.onnx --int8 --calib=<calibration_cache> --saveEngine=recyclables_int8.engine --workspace=1024
  • INT8 can further reduce latency on Nano when calibrations are correct.

  • DeepStream pipeline (future path B)

  • Convert the model to a DeepStream nvinfer config and build a zero-copy pipeline with nvstreammux → nvinfer → nvdsosd for overlays.
  • This offers multi-stream and low-overhead GStreamer integration.

  • Hardware I/O integration

  • Drive a relay or servomotor to actuate a sorter based on the category.
  • Publish results to MQTT for dashboards.
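
One way to wire this up from the classification loop, sketched under assumptions (a relay on a spare 40-pin header pin, paho-mqtt 1.x installed via pip3, and a made-up broker address):

# hooks.py: illustrative sketch; pin number, relay wiring, and broker address are assumptions.
import json
import time
import Jetson.GPIO as GPIO              # ships with JetPack
import paho.mqtt.client as mqtt         # pip3 install paho-mqtt

RELAY_PIN = 12                          # hypothetical BOARD-numbered pin driving a relay module
GPIO.setmode(GPIO.BOARD)
GPIO.setup(RELAY_PIN, GPIO.OUT, initial=GPIO.LOW)

client = mqtt.Client()
client.connect("192.168.1.50", 1883)    # hypothetical MQTT broker on the LAN
client.loop_start()

def on_classification(category, confidence):
    # Pulse the relay for confident plastic detections; publish every result for dashboards.
    if category == "plastic" and confidence > 0.6:
        GPIO.output(RELAY_PIN, GPIO.HIGH)
        time.sleep(0.2)
        GPIO.output(RELAY_PIN, GPIO.LOW)
    payload = json.dumps({"category": category, "confidence": confidence, "ts": time.time()})
    client.publish("recyclables/classification", payload)

You could call on_classification(category, top5[0][1]) from the main loop in run_classifier.py once the category is computed.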

  • Better mapping logic

  • Use a lightweight rules engine or a small secondary classifier that maps 1k ImageNet embeddings to recyclables classes.
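
As a rough illustration of the second idea, a tiny linear head trained offline on saved ImageNet probability vectors could replace the keyword rules at runtime (the .npz file and its contents are assumptions, e.g., from an offline logistic-regression fit):

# secondary_head.py: rough sketch; W (4x1000) and b (4,) come from an offline fit (hypothetical file).
import numpy as np

data = np.load("recyclables_head.npz")
W, b = data["W"], data["b"]
CLASSES = ["paper", "plastic", "metal", "glass"]

def map_probs_to_recyclable(imagenet_probs):
    # imagenet_probs: the 1000-dim softmax output from infer_trt + softmax
    scores = W @ imagenet_probs + b
    return CLASSES[int(np.argmax(scores))]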

Reference table

  • Check JetPack/L4T: cat /etc/nv_tegra_release (shows the L4T release, R32.x/R35.x)
  • List TensorRT packages: dpkg -l | grep -E 'nvidia|tensorrt' (lists nvidia-l4t-* and libnvinfer packages)
  • Set MAXN: sudo nvpmodel -m 0 (enables the highest power mode)
  • Max clocks: sudo jetson_clocks (locks clocks to maximum; monitor thermals)
  • Live stats: sudo tegrastats (live CPU/GPU/EMC/memory statistics)
  • Test camera: gst-launch-1.0 nvarguscamerasrc … ! jpegenc ! filesink (produces camera_test.jpg)
  • Download ONNX: wget squeezenet1.1-7.onnx (fixed 1×3×224×224 input)
  • Build TRT engine: /usr/src/tensorrt/bin/trtexec --onnx=… --fp16 --saveEngine=… (outputs squeezenet1.1_fp16.engine)
  • Run app: python3 scripts/run_classifier.py (live console classification and FPS)

Improvements to robustness and validation

  • Run timed inference with trtexec (no camera) for a baseline:
cd ~/jetson-recyclables/models
/usr/src/tensorrt/bin/trtexec --loadEngine=squeezenet1.1_fp16.engine --fp16 --duration=30
  • Expect stable latency (e.g., ~5–10 ms per inference on Nano for SqueezeNet). This isolates the model performance from camera+preprocessing pipeline.

  • Capture and replay frames:

  • Record a short burst of frames (e.g., 100 JPEGs) and feed them to a file-based inference script for reproducibility without camera variability.
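
A minimal replay sketch, assuming frames were saved as JPEGs under a hypothetical ~/jetson-recyclables/frames folder and that the file sits next to run_classifier.py so its helpers are importable:

# replay_frames.py: file-based inference for reproducible validation (no camera needed).
import glob
import os
import cv2
from run_classifier import (ENGINE_PATH, IMAGENET_LABELS_JSON, build_trt_context,
                            preprocess_bgr_to_nchw_224, infer_trt, softmax,
                            map_to_recyclable, load_labels)

labels = load_labels(IMAGENET_LABELS_JSON)
io = build_trt_context(ENGINE_PATH)
frames_dir = os.path.expanduser('~/jetson-recyclables/frames')   # hypothetical capture folder

for path in sorted(glob.glob(os.path.join(frames_dir, '*.jpg'))):
    frame = cv2.imread(path)
    probs = softmax(infer_trt(io, preprocess_bgr_to_nchw_224(frame)))
    top = int(probs.argmax())
    category = map_to_recyclable([(labels[top], float(probs[top]))])
    print(f"{os.path.basename(path)}: {category.upper()} ({probs[top]*100:.1f}%)")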

Final Checklist

  • Objective achieved
  • Live camera capture from Arducam IMX219 (imx219) over CSI.
  • TensorRT FP16 engine built from ONNX and executed on GPU.
  • Real-time classification with console output and recyclables category mapping.

  • Quantitative validation

  • Reported per-frame latency and rolling FPS from the script (target ≥ 30 FPS inference).
  • tegrastats shows GPU utilization during run (target ≥ 50% GR3D under load).
  • trtexec baseline latency captured and documented.

  • Commands reproducibility

  • All commands provided for setup, engine build, and runtime.
  • Power mode steps and revert commands included.
  • Camera sanity test given via gst-launch-1.0.

  • Materials and connections

  • Exact model: Jetson Nano 4GB + Arducam IMX219 8MP (imx219).
  • CSI connection instructions detailed (orientation, slot).
  • No circuit drawings; text-only instructions and code.

  • Next steps (optional)

  • Replace ImageNet model with a 4-class recyclables ONNX.
  • Explore INT8 and DeepStream for further gains.
  • Integrate actuator/MQTT for sorter or dashboard.

By following the steps above, you have a working TensorRT recyclables classifier on the Jetson Nano 4GB + Arducam IMX219 8MP (imx219) that demonstrates the complete edge AI loop: acquisition, acceleration, interpretation, and measurable on-device performance.

Quick Quiz

Question 1: What is the primary purpose of the real-time recyclables classifier?
Question 2: Which camera is used in the recyclables classifier?
Question 3: What is the expected frame rate for a stable live camera feed?
Question 4: What type of model is used for inference in the classifier?
Question 5: What is the power consumption range for the device?
Question 6: What is the minimum top-1 recyclable category accuracy expected?
Question 7: Which audience is primarily targeted by this project?
Question 8: What is the role of the TensorRT engine in the architecture?
Question 9: What type of overlay is used to display classifications?
Question 10: What is the expected end-to-end latency per frame?

Carlos Núñez Zorrilla
Electronics & Computer Engineer

Telecommunications Electronics Engineer and Computer Engineer (official degrees in Spain).