Practical case: Recyclables classifier with TensorRT

Objective and use case

What you’ll build: A real-time recyclables classifier on a Jetson Nano 4GB using a CSI Arducam IMX219 8MP and a TensorRT-optimized ONNX model. The app captures live frames, runs accelerated inference, and maps ImageNet predictions to Paper/Plastic/Metal/Glass with on-screen labels and confidence.

Why it matters / Use cases

  • Smart bins: Automatically route items (e.g., plastic bottle vs. metal can) to the correct chute; ~20 FPS with <80 ms latency enables responsive sorting and LED/servo triggers.
  • Assisted recycling education: Live overlay shows “Plastic (92%)” and a green check; fully offline operation on the Nano keeps classroom demos independent of internet access.
  • Factory pre-sorting: Low-cost camera + Nano flags low-confidence items (<0.6) for manual QA on lines moving ~0.5 m/s; 15–20 FPS supports timely highlighting.
  • Edge kiosks: Lobby or cafeteria stations that display Paper/Plastic/Metal/Glass with confidence; runs in 5–10 W power budgets and logs simple stats locally.

Expected outcome

  • Stable live camera feed at 15–30 FPS at 1280×720 capture resolution.
  • End-to-end latency of ~50–90 ms per frame (exposure → overlay) with FP16 TensorRT and 224×224 inference.
  • Resource profile: ~60–80% GPU, <40% total CPU, 1.5–3.0 GB RAM; sustained operation >1 hour without dropped frames.
  • Displayed classification with confidence; top-1 recyclable category accuracy ≥80% on a small validation set (bottle/can/jar/newspaper).
  • Power draw within Nano 10 W mode; device temperature maintained <70°C with a small fan.

Audience: Makers, students, and edge AI/robotics developers; Level: Intermediate (basic Linux + Jetson + Python/C++).

Architecture/flow: CSI camera (IMX219) → GStreamer (nvarguscamerasrc) → zero-copy CUDA buffer → resize/normalize → TensorRT engine (FP16, ONNX) → softmax → map top-1 to Paper/Plastic/Metal/Glass → overlay (text/confidence) → display; optional GPIO/REST hooks.

Prerequisites

  • A microSD-flashed Jetson Nano 4GB developer kit with JetPack (L4T) on Ubuntu (headless or with display).
  • Internet access to download the ONNX model and label file.
  • Basic Linux shell usage and Python 3 familiarity.

Verify JetPack (L4T), kernel, and installed packages:

cat /etc/nv_tegra_release
jetson_release -v   # optional; provided by the jetson-stats package

uname -a
dpkg -l | grep -E 'nvidia|tensorrt'

Expect to see L4T release (e.g., R32.x for JetPack 4.x, or R35.x for JetPack 5.x), aarch64 kernel, and packages such as nvidia-l4t-tensorrt and libnvinfer.

Materials (with exact model)

  • Jetson Nano 4GB Developer Kit (P3450, B01 preferred for dual CSI).
  • Camera: Arducam IMX219 8MP (imx219), CSI-2 ribbon cable.
  • microSD card (≥ 64 GB, UHS-I recommended).
  • 5V/4A DC power supply (barrel jack, recommended) or 5V/2.5A micro-USB (may limit sustained performance).
  • Active cooling (recommended for sustained MAXN): 5V fan or heatsink+fan.
  • Network: Ethernet or USB Wi-Fi for package/model downloads.
  • Optional: USB keyboard/mouse and HDMI display (not required for the tutorial).

Setup/Connection

1) Power and cooling
– Connect the 5V/4A PSU to the Nano. Add a 5V fan to the 5V/GND header to prevent thermal throttling in MAXN.

2) CSI camera installation (Arducam IMX219 8MP)
– Power off the Nano.
– Locate CAM0 (preferred) CSI connector. Lift the locking tab gently.
– Insert the ribbon cable with the blue backing facing the Ethernet/USB ports, so the exposed metal contacts mate with the connector’s pins.
– Fully seat the cable and press down the locking tab.
– Power on the Nano.

3) Confirm the camera works with GStreamer
– Test capture and save one JPEG (no GUI needed):

gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 num-buffers=1 \
  ! nvvidconv ! video/x-raw,format=I420 \
  ! jpegenc ! filesink location=camera_test.jpg
  • If successful, camera_test.jpg should appear and be a non-zero size.

4) Confirm GPU and power settings
– Check current power mode:

sudo nvpmodel -q
  • For performance benchmarking (watch thermals), set MAXN and max clocks:
sudo nvpmodel -m 0
sudo jetson_clocks
  • You can revert to the default power profile later:
sudo nvpmodel -m 1
  • If the camera pipeline misbehaves after changing clocks, restart the Argus camera service:
sudo systemctl restart nvargus-daemon

5) Install needed packages
– TensorRT Python bindings are included with JetPack; add OpenCV, NumPy, and PyCUDA:

sudo apt-get update
sudo apt-get install -y python3-opencv python3-numpy python3-pycuda
  • Confirm imports in Python:
python3 - << 'PY'
import cv2, numpy, tensorrt as trt, pycuda.driver as cuda
print("OK: OpenCV", cv2.__version__)
print("OK: TRT", trt.__version__)
PY

Full Code

We will:
– Download SqueezeNet 1.1 ONNX (fast, good for Nano).
– Build a TensorRT engine (FP16).
– Run a Python app that captures camera frames via GStreamer, performs preprocessing, executes inference with TensorRT, decodes top-5 ImageNet labels, and maps them to recyclables categories using keyword rules.

Directory layout:
– models/: ONNX model, TensorRT engine, labels.
– scripts/: Python runtime.

1) Prepare folders and downloads:

mkdir -p ~/jetson-recyclables/{models,scripts}
cd ~/jetson-recyclables/models

# SqueezeNet 1.1 ONNX (opset 7) from ONNX Model Zoo
wget -O squeezenet1.1-7.onnx \
  https://github.com/onnx/models/raw/main/vision/classification/squeezenet/model/squeezenet1.1-7.onnx

# Simple English labels for ImageNet-1k
wget -O imagenet_labels.json \
  https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json

2) Build the TensorRT engine with trtexec (FP16):

cd ~/jetson-recyclables/models
/usr/src/tensorrt/bin/trtexec \
  --onnx=squeezenet1.1-7.onnx \
  --saveEngine=squeezenet1.1_fp16.engine \
  --fp16 \
  --workspace=1024 \
  --verbose

Note:
– We omit explicit shape flags because this SqueezeNet model has a fixed 1×3×224×224 input (a quick way to confirm this is shown after this note).
– Adjust workspace if you face memory limits.
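
If you want to confirm that fixed input shape before building the engine, an optional check with the onnx Python package (not installed by default; pip3 install onnx) could look like this sketch:

# inspect_onnx.py: optional sanity check; assumes `pip3 install onnx` was run.
import onnx

model = onnx.load("squeezenet1.1-7.onnx")
init_names = {init.name for init in model.graph.initializer}
for inp in model.graph.input:
    if inp.name in init_names:        # skip weights listed as graph inputs in older opsets
        continue
    dims = [d.dim_value for d in inp.type.tensor_type.shape.dim]
    print("input:", inp.name, dims)   # expect a single input with shape [1, 3, 224, 224]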

3) Python runtime (scripts/run_classifier.py)

This script:
– Opens CSI camera with nvarguscamerasrc via OpenCV+GStreamer.
– Loads the TensorRT engine and binds buffers with PyCUDA.
– Preprocesses to 224×224 with ImageNet mean/std normalization.
– Runs inference and maps the top-5 ImageNet labels to recyclables classes with keyword rules.
– Prints per-frame classification and rolling FPS.

# ~/jetson-recyclables/scripts/run_classifier.py
import os
import time
import json
import numpy as np
import cv2

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit  # Initializes CUDA driver context

MODELS_DIR = os.path.expanduser('~/jetson-recyclables/models')
ENGINE_PATH = os.path.join(MODELS_DIR, 'squeezenet1.1_fp16.engine')
IMAGENET_LABELS_JSON = os.path.join(MODELS_DIR, 'imagenet_labels.json')

# Recyclables mapping via keyword heuristics (basic-level approximation)
MAPPING_RULES = {
    "plastic": ["plastic", "bottle", "water bottle", "shampoo", "detergent", "packet", "cup"],
    "metal":   ["can", "aluminium", "aluminum", "tin", "steel", "iron"],
    "glass":   ["glass", "wine bottle", "beer bottle", "goblet", "vase", "jar"],
    "paper":   ["newspaper", "magazine", "bookshop", "book", "paper", "notebook", "carton", "envelope", "tissue"]
}

# GStreamer pipeline for CSI camera
def gstreamer_pipeline(sensor_id=0, capture_width=1280, capture_height=720,
                       display_width=1280, display_height=720, framerate=30, flip_method=0):
    return (
        "nvarguscamerasrc sensor-id={} ! "
        "video/x-raw(memory:NVMM), width={}, height={}, framerate={}/1, format=NV12 ! "
        "nvvidconv flip-method={} ! "
        "video/x-raw, width={}, height={}, format=BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=BGR ! appsink drop=true max-buffers=1"
    ).format(sensor_id, capture_width, capture_height, framerate,
             flip_method, display_width, display_height)

def load_labels(path):
    with open(path, 'r') as f:
        labels = json.load(f)
    assert len(labels) == 1000, "Expected 1000 ImageNet labels"
    return labels

def build_trt_context(engine_path):
    logger = trt.Logger(trt.Logger.WARNING)
    trt.init_libnvinfer_plugins(logger, '')
    with open(engine_path, 'rb') as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    # Identify I/O bindings
    bindings = []
    host_inputs = []
    cuda_inputs = []
    host_outputs = []
    cuda_outputs = []
    for i in range(engine.num_bindings):
        name = engine.get_binding_name(i)
        dtype = trt.nptype(engine.get_binding_dtype(i))
        is_input = engine.binding_is_input(i)
        shape = context.get_binding_shape(i)
        size = trt.volume(shape)
        host_mem = cuda.pagelocked_empty(size, dtype)
        device_mem = cuda.mem_alloc(host_mem.nbytes)
        bindings.append(int(device_mem))
        if is_input:
            host_inputs.append(host_mem)
            cuda_inputs.append(device_mem)
            input_shape = shape
            input_name = name
        else:
            host_outputs.append(host_mem)
            cuda_outputs.append(device_mem)
            output_shape = shape
            output_name = name
    stream = cuda.Stream()
    io = {
        "engine": engine, "context": context, "bindings": bindings, "stream": stream,
        "host_inputs": host_inputs, "cuda_inputs": cuda_inputs,
        "host_outputs": host_outputs, "cuda_outputs": cuda_outputs,
        "input_shape": input_shape, "output_shape": output_shape,
        "input_name": input_name, "output_name": output_name
    }
    return io

def preprocess_bgr_to_nchw_224(bgr):
    # Center-crop shortest side, resize to 224x224, convert BGR->RGB, normalize
    h, w = bgr.shape[:2]
    side = min(h, w)
    y0 = (h - side) // 2
    x0 = (w - side) // 2
    cropped = bgr[y0:y0+side, x0:x0+side]
    resized = cv2.resize(cropped, (224, 224), interpolation=cv2.INTER_LINEAR)
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std  = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    norm = (rgb - mean) / std
    chw = np.transpose(norm, (2, 0, 1))  # C,H,W
    return chw

def softmax(x):
    x = x - np.max(x)
    e = np.exp(x)
    return e / np.sum(e)

def map_to_recyclable(labels_topk):
    # labels_topk: list of (label_str, prob)
    label_texts = [l.lower() for l, _ in labels_topk]
    for category, keys in MAPPING_RULES.items():
        for key in keys:
            if any(key in txt for txt in label_texts):
                return category
    return "unknown"

def infer_trt(io, input_chw):
    # Copy to host (float32) and device
    inp = input_chw.astype(np.float32).ravel()
    np.copyto(io["host_inputs"][0], inp)
    cuda.memcpy_htod_async(io["cuda_inputs"][0], io["host_inputs"][0], io["stream"])
    # Inference
    io["context"].execute_async_v2(bindings=io["bindings"], stream_handle=io["stream"].handle)
    # Copy back
    cuda.memcpy_dtoh_async(io["host_outputs"][0], io["cuda_outputs"][0], io["stream"])
    io["stream"].synchronize()
    out = np.array(io["host_outputs"][0])
    return out  # 1000 logits

def main():
    labels = load_labels(IMAGENET_LABELS_JSON)
    io = build_trt_context(ENGINE_PATH)

    cap = cv2.VideoCapture(gstreamer_pipeline(), cv2.CAP_GSTREAMER)
    if not cap.isOpened():
        raise RuntimeError("Could not open CSI camera via GStreamer")

    print("Starting recyclables classifier. Press Ctrl+C to stop.")
    frame_count = 0
    t0 = time.time()
    rolling = []
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                print("Frame grab failed; retrying...")
                continue

            prep = preprocess_bgr_to_nchw_224(frame)
            t_infer0 = time.time()
            logits = infer_trt(io, prep)
            probs = softmax(logits)
            top5_idx = probs.argsort()[-5:][::-1]
            top5 = [(labels[i], float(probs[i])) for i in top5_idx]
            category = map_to_recyclable(top5)
            t_infer1 = time.time()
            latency_ms = (t_infer1 - t_infer0) * 1000.0

            frame_count += 1
            rolling.append(latency_ms)
            if len(rolling) > 100:
                rolling.pop(0)
            avg_ms = sum(rolling)/len(rolling)
            fps = 1000.0 / avg_ms if avg_ms > 0 else 0.0

            # Log concise output
            print(f"[#{frame_count:04d}] {category.upper()} | "
                  f"Top-1: {top5[0][0]} ({top5[0][1]*100:.1f}%) | latency {latency_ms:.1f} ms | est FPS {fps:.1f}")
    except KeyboardInterrupt:
        pass
    finally:
        cap.release()
        dt = time.time() - t0
        if dt > 0:
            print(f"Frames: {frame_count}, Avg FPS: {frame_count/dt:.2f}")

if __name__ == "__main__":
    main()

Save the script and make it executable:

chmod +x ~/jetson-recyclables/scripts/run_classifier.py

Build/Flash/Run commands

1) Verify platform and packages:

cat /etc/nv_tegra_release
uname -a
dpkg -l | grep -E 'nvidia|tensorrt'

2) Set performance mode (watch thermals):

sudo nvpmodel -m 0
sudo jetson_clocks

3) Camera sanity check:

gst-launch-1.0 -e nvarguscamerasrc sensor-id=0 num-buffers=1 \
  ! nvvidconv ! video/x-raw,format=I420 \
  ! jpegenc ! filesink location=camera_test.jpg

4) Model + engine:

cd ~/jetson-recyclables/models
wget -O squeezenet1.1-7.onnx \
  https://github.com/onnx/models/raw/main/vision/classification/squeezenet/model/squeezenet1.1-7.onnx
wget -O imagenet_labels.json \
  https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json

/usr/src/tensorrt/bin/trtexec \
  --onnx=squeezenet1.1-7.onnx \
  --saveEngine=squeezenet1.1_fp16.engine \
  --fp16 --workspace=1024

5) Start tegrastats in a second terminal to observe performance:

sudo tegrastats

6) Run the recyclables classifier:

python3 ~/jetson-recyclables/scripts/run_classifier.py

Expected console log snippet:
– Lines like:
[#0010] PLASTIC | Top-1: water bottle (65.3%) | latency 23.4 ms | est FPS 42.7
[#0011] METAL | Top-1: can (71.8%) | latency 22.8 ms | est FPS 43.9

Stop with Ctrl+C. The script prints total frames and average FPS on exit.

Step-by-step Validation

1) JetPack and TensorRT presence
– A valid L4T string (e.g., R32.7.4 or R35.x) and dpkg listing libnvinferX confirm TensorRT install.

2) Camera path and exposure
– camera_test.jpg exists and opens (size > 50 KB): confirms nvarguscamerasrc pipeline is working and the imx219 sensor is recognized by the argus stack.
– If not, see Troubleshooting.

3) Engine build validation (trtexec)
– trtexec should end with a summary like:
– Input shape: 1x3x224x224
– Average on 200 runs: XX ms, QPS: YY, Latency: …
– If the reported QPS (roughly 1000 / mean latency in ms) is around 100–200 for SqueezeNet FP16 on the Nano, your GPU acceleration is working.

4) Live inference metrics
– Start tegrastats in one terminal, run the script in another. You should see:
– GPU (GR3D) utilization spikes >50% during inference.
– EMC (memory controller) activity increases.
– RAM stays within 1–2 GB usage (depending on other processes).
– Example tegrastats snippet during run:
– RAM 1800/3964MB (lfb 450x4MB) SWAP 0/1024MB
– GPU GR3D_FREQ 921MHz GR3D% 65
– EMC_FREQ 1600MHz EMC% 45

5) Throughput and latency
– In the script output, confirm:
– latency ~20–30 ms (per-frame inference on 224×224) → 33–50 FPS inference stage.
– End-to-end estimate shows 30–40 FPS if not limited by capture/resize.
– Expected results on Jetson Nano 4GB in MAXN with SqueezeNet FP16:
– Inference latency: 15–35 ms
– Est FPS: 28–50
– If much lower, check power mode and jetson_clocks.

6) Classification sanity checks
– Hold up a plastic bottle: see PLASTIC category frequently (due to keywords “bottle”, “water bottle”).
– Show metal can: METAL category common (keyword “can”).
– Show a glass jar or bottle: GLASS category appears.
– Show paper/cardboard: PAPER category appears (“book”, “notebook”, “envelope”, “carton”).
– Items not mapping: category “unknown” will be printed (expected).

7) Power revert and cleanup
– Stop the script (Ctrl+C).
– Stop tegrastats.
– Revert to a lower power profile if desired:

sudo nvpmodel -m 1

Troubleshooting

  • Camera not detected / pipeline fails
  • Ensure ribbon orientation (blue side outward) and the connector lock is fully engaged.
  • Confirm sensor-id=0 or try sensor-id=1 if you are on B01 and CAM0 is empty.
  • Restart the argus service:
    sudo systemctl restart nvargus-daemon
  • Check dmesg for imx219-related logs:
    dmesg | grep -i imx219

  • nvarguscamerasrc works once, then errors

  • Argus may have a stuck session; restart it:
    sudo systemctl restart nvargus-daemon
  • Avoid running multiple camera apps simultaneously.

  • PyCUDA import error

  • Install via apt:
    sudo apt-get install -y python3-pycuda
  • Or fall back to pip (can be slower to build):
    sudo apt-get install -y python3-pip python3-dev
    pip3 install pycuda

  • trtexec cannot parse ONNX

  • Re-download the ONNX:
    rm -f squeezenet1.1-7.onnx && wget …
  • Ensure TensorRT version supports the opset (7). JetPack 4.x/5.x should handle it.

  • Low FPS

  • Confirm MAXN and max clocks:
    sudo nvpmodel -m 0 && sudo jetson_clocks
  • Reduce the camera/processing resolution (e.g., 640×480) via gstreamer_pipeline to lower preprocessing cost; see the sketch after this list.
  • Keep flip_method=0 (no rotation) and avoid opening GUI windows.
  • Ensure no heavy background processes (close browsers/IDEs).
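
For example, a possible tweak in run_classifier.py’s main() is to hand smaller frames to OpenCV while leaving capture at a native IMX219 mode (values here are illustrative):

# Let nvvidconv downscale before frames reach OpenCV; cuts resize/convert cost.
cap = cv2.VideoCapture(
    gstreamer_pipeline(capture_width=1280, capture_height=720,
                       display_width=640, display_height=480),
    cv2.CAP_GSTREAMER)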

  • Memory errors

  • Ensure workspace=1024 is not too large for engine build; try 512.
  • Close other Python programs and free memory.
  • Avoid keeping large image buffers in Python.

  • Classification mismatches

  • This tutorial uses ImageNet labels + keyword mapping (heuristic). For production-grade recyclables classification, fine-tune on recyclables datasets (e.g., TrashNet-like) and export a 4-class ONNX, then rebuild the TensorRT engine.

Improvements

  • Use a domain-specific ONNX classifier
  • Train/fine-tune MobileNetV2/ResNet18 on recyclables (paper/plastic/metal/glass) and export to ONNX.
  • Replace squeezenet1.1-7.onnx with your trained 4-class model and update the labels file accordingly (a 4-entry JSON list); a minimal export sketch follows this list.
  • Rebuild engine:
    /usr/src/tensorrt/bin/trtexec --onnx=your_model.onnx --saveEngine=recyclables_fp16.engine --fp16 --workspace=1024
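
If you go this route, exporting a fine-tuned 4-class model to ONNX might look like the sketch below; it assumes PyTorch/torchvision on a training machine (not the Nano) and a hypothetical checkpoint file:

# export_recyclables_onnx.py: minimal sketch, not the tutorial's tested path.
# Assumes a MobileNetV2 fine-tuned to 4 classes and saved as recyclables_mnv2.pth (hypothetical).
import torch
import torchvision

model = torchvision.models.mobilenet_v2(num_classes=4)
model.load_state_dict(torch.load("recyclables_mnv2.pth", map_location="cpu"))
model.eval()

dummy = torch.randn(1, 3, 224, 224)                 # same input size the Nano pipeline expects
torch.onnx.export(model, dummy, "your_model.onnx",
                  input_names=["input"], output_names=["logits"],
                  opset_version=11)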

  • INT8 optimization (if you prepare a calibration cache)

  • Collect a small calibration dataset (e.g., 100–500 images).
  • Build with INT8:
    /usr/src/tensorrt/bin/trtexec --onnx=your_model.onnx --int8 --calib=<calibration_cache> --saveEngine=recyclables_int8.engine --workspace=1024
  • INT8 can further reduce latency on Nano when calibrations are correct.

  • DeepStream pipeline (future path B)

  • Convert the model to a DeepStream nvinfer config and build a zero-copy pipeline with nvstreammux → nvinfer → nvdsosd for overlays.
  • This offers multi-stream and low-overhead GStreamer integration.

  • Hardware I/O integration

  • Drive a relay or servomotor to actuate a sorter based on the category.
  • Publish results to MQTT for dashboards.
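
One way to wire this up from the classification loop, sketched under assumptions (a relay on a spare 40-pin header pin, paho-mqtt 1.x installed via pip3, and a made-up broker address):

# hooks.py: illustrative sketch; pin number, relay wiring, and broker address are assumptions.
import json
import time
import Jetson.GPIO as GPIO              # ships with JetPack
import paho.mqtt.client as mqtt         # pip3 install paho-mqtt

RELAY_PIN = 12                          # hypothetical BOARD-numbered pin driving a relay module
GPIO.setmode(GPIO.BOARD)
GPIO.setup(RELAY_PIN, GPIO.OUT, initial=GPIO.LOW)

client = mqtt.Client()
client.connect("192.168.1.50", 1883)    # hypothetical MQTT broker on the LAN
client.loop_start()

def on_classification(category, confidence):
    # Pulse the relay for confident plastic detections; publish every result for dashboards.
    if category == "plastic" and confidence > 0.6:
        GPIO.output(RELAY_PIN, GPIO.HIGH)
        time.sleep(0.2)
        GPIO.output(RELAY_PIN, GPIO.LOW)
    payload = json.dumps({"category": category, "confidence": confidence, "ts": time.time()})
    client.publish("recyclables/classification", payload)

You could call on_classification(category, top5[0][1]) from the main loop in run_classifier.py once the category is computed.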

  • Better mapping logic

  • Use a lightweight rules engine or a small secondary classifier that maps 1k ImageNet embeddings to recyclables classes.
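
As a rough illustration of the second idea, a tiny linear head trained offline on saved ImageNet probability vectors could replace the keyword rules at runtime (the .npz file and its contents are assumptions, e.g., from an offline logistic-regression fit):

# secondary_head.py: rough sketch; W (4x1000) and b (4,) come from an offline fit (hypothetical file).
import numpy as np

data = np.load("recyclables_head.npz")
W, b = data["W"], data["b"]
CLASSES = ["paper", "plastic", "metal", "glass"]

def map_probs_to_recyclable(imagenet_probs):
    # imagenet_probs: the 1000-dim softmax output from infer_trt + softmax
    scores = W @ imagenet_probs + b
    return CLASSES[int(np.argmax(scores))]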

Reference table

  • Check JetPack/L4T: cat /etc/nv_tegra_release (shows the L4T release, R32.x/R35.x)
  • List TensorRT packages: dpkg -l | grep -E 'nvidia|tensorrt' (lists nvidia-l4t-* and libnvinfer packages)
  • Set MAXN: sudo nvpmodel -m 0 (enables the highest power mode)
  • Max clocks: sudo jetson_clocks (locks clocks to maximum; monitor thermals)
  • Live stats: sudo tegrastats (live CPU/GPU/EMC/memory statistics)
  • Test camera: gst-launch-1.0 nvarguscamerasrc … ! jpegenc ! filesink (produces camera_test.jpg)
  • Download ONNX: wget squeezenet1.1-7.onnx (fixed 1×3×224×224 input)
  • Build TRT engine: /usr/src/tensorrt/bin/trtexec --onnx=… --fp16 --saveEngine=… (outputs squeezenet1.1_fp16.engine)
  • Run app: python3 scripts/run_classifier.py (live console classification and FPS)

Improvements to robustness and validation

  • Run timed inference with trtexec (no camera) for a baseline:
cd ~/jetson-recyclables/models
/usr/src/tensorrt/bin/trtexec --loadEngine=squeezenet1.1_fp16.engine --fp16 --duration=30
  • Expect stable latency (e.g., ~5–10 ms per inference on Nano for SqueezeNet). This isolates the model performance from camera+preprocessing pipeline.

  • Capture and replay frames:

  • Record a short burst of frames (e.g., 100 JPEGs) and feed them to a file-based inference script for reproducibility without camera variability.
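
A minimal replay sketch, assuming frames were saved as JPEGs under a hypothetical ~/jetson-recyclables/frames folder and that the file sits next to run_classifier.py so its helpers are importable:

# replay_frames.py: file-based inference for reproducible validation (no camera needed).
import glob
import os
import cv2
from run_classifier import (ENGINE_PATH, IMAGENET_LABELS_JSON, build_trt_context,
                            preprocess_bgr_to_nchw_224, infer_trt, softmax,
                            map_to_recyclable, load_labels)

labels = load_labels(IMAGENET_LABELS_JSON)
io = build_trt_context(ENGINE_PATH)
frames_dir = os.path.expanduser('~/jetson-recyclables/frames')   # hypothetical capture folder

for path in sorted(glob.glob(os.path.join(frames_dir, '*.jpg'))):
    frame = cv2.imread(path)
    probs = softmax(infer_trt(io, preprocess_bgr_to_nchw_224(frame)))
    top = int(probs.argmax())
    category = map_to_recyclable([(labels[top], float(probs[top]))])
    print(f"{os.path.basename(path)}: {category.upper()} ({probs[top]*100:.1f}%)")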

Final Checklist

  • Objective achieved
  • Live camera capture from Arducam IMX219 (imx219) over CSI.
  • TensorRT FP16 engine built from ONNX and executed on GPU.
  • Real-time classification with console output and recyclables category mapping.

  • Quantitative validation

  • Reported per-frame latency and rolling FPS from the script (target ≥ 30 FPS inference).
  • tegrastats shows GPU utilization during run (target ≥ 50% GR3D under load).
  • trtexec baseline latency captured and documented.

  • Commands reproducibility

  • All commands provided for setup, engine build, and runtime.
  • Power mode steps and revert commands included.
  • Camera sanity test given via gst-launch-1.0.

  • Materials and connections

  • Exact model: Jetson Nano 4GB + Arducam IMX219 8MP (imx219).
  • CSI connection instructions detailed (orientation, slot).
  • No circuit drawings; text-only instructions and code.

  • Next steps (optional)

  • Replace ImageNet model with a 4-class recyclables ONNX.
  • Explore INT8 and DeepStream for further gains.
  • Integrate actuator/MQTT for sorter or dashboard.

By following the steps above, you have a working TensorRT recyclables classifier on the Jetson Nano 4GB + Arducam IMX219 8MP (imx219) that demonstrates the complete edge AI loop: acquisition, acceleration, interpretation, and measurable on-device performance.

Quick Quiz

Question 1: What is the primary purpose of the real-time recyclables classifier?
Question 2: Which camera is used in the recyclables classifier?
Question 3: What is the expected frame rate for a stable live camera feed?
Question 4: What type of model is used for inference in the classifier?
Question 5: What is the power consumption range for the device?
Question 6: What is the minimum top-1 recyclable category accuracy expected?
Question 7: Which audience is primarily targeted by this project?
Question 8: What is the role of the TensorRT engine in the architecture?
Question 9: What type of overlay is used to display classifications?
Question 10: What is the expected end-to-end latency per frame?

Carlos Núñez Zorrilla
Electronics & Computer Engineer

Telecommunications Electronics Engineer and Computer Engineer (official degrees in Spain).