Real-time Person Counting on ESP32 HMI with YOLOX Nano

In many security surveillance and smart access control scenarios, the demand for real-time person counting directly on local devices is increasing, especially when low latency and privacy are paramount. Traditional cloud-based solutions often fall short. This article focuses on how to leverage powerful ESP32 HMI modules equipped with robust ESP32-S3 chips and high-quality touch screens (like the Elecrow CrowPanel series), combined with an OV2640 camera, to build a potent Edge AI solution. We will utilize the lightweight object detection model, YOLOX Nano, detailing the entire process from model preparation and model quantization to final deployment and execution within the Arduino environment. The goal is to enable developers to easily master this "local inference, real-time display, low-latency response" Edge AI technique, empowering feature-rich ESP32 HMI devices with powerful visual perception for true real-time person counting.

Why Choose YOLOX Nano + ESP32 HMI for Real-time Person Counting?

Deploying Edge AI models onto resource-constrained microcontrollers makes the choice of model and platform critical for success. The combination of YOLOX Nano and modern ESP32 HMI (especially versions based on the ESP32-S3) offers significant advantages for implementing real-time person counting:

  1. Lightweight & Efficient Edge AI Solution:

    • The YOLOX Nano model itself has a small parameter count, making it highly suitable for embedded deployment on devices like an ESP32 HMI.
    • After INT8 model quantization, its size shrinks dramatically to under 1MB, perfectly fitting within the storage limits of an ESP32 HMI.
    • The built-in Vector Instructions acceleration of the ESP32-S3 chip can boost YOLOX Nano's local inference speed to 4-6 FPS, which is sufficient for basic object detection applications like real-time person counting and makes this an ideal choice for building efficient Edge AI applications.
  2. Focus on Target, Controllable Accuracy:

    • The pre-trained YOLOX Nano model can already recognize the "person" class, allowing for direct use in basic person counting tasks.
    • Through fine-tuning, YOLOX Nano can be specialized for the specific object detection task of identifying humans, further optimizing accuracy and local inference speed on the ESP32 HMI, enhancing the reliability of real-time person counting.
  3. Mature Ecosystem, Easy to Get Started:

    • Espressif provides the ESP-DL library and ESP-PPQ model quantization toolchain for chips like the ESP32-S3, simplifying the process of deploying Edge AI models like YOLOX Nano from formats such as ONNX.
    • Abundant documentation, community support, and good integration with the Arduino IDE and libraries like LVGL make developing Edge AI applications (such as real-time person counting) on ESP32 HMI devices, often featuring an IPS Screen and touch screen, much more accessible.

Model Preparation and ONNX Export (PC Side)

The first step in preparing the YOLOX Nano model for the ESP32 HMI is done on a PC. This is a crucial part of the Edge AI workflow.

  1. Environment Setup:

    (Ensure necessary libraries like torch, torchvision, onnx, and YOLOX are installed as mentioned previously).

    pip install torch torchvision onnx
    git clone https://github.com/Megvii-BaseDetection/YOLOX.git
    cd YOLOX
    pip install -r requirements.txt
    pip install -e .

    This forms the foundation for exporting the YOLOX Nano model and subsequent model quantization, preparing it for eventual execution on the ESP32-S3.

  2. Configure Model (Optional):

    (As mentioned before) If your real-time person counting application requires higher precision, you can fine-tune YOLOX Nano to focus more specifically on detecting the 'person' class.
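
    For reference, a minimal sketch of such a person-only experiment file is shown below, following the pattern of exps/default/yolox_nano.py and YOLOX's custom-data training docs. The file path, dataset paths, and annotation file names are placeholders for illustration; adapt them to your own COCO-format dataset.

    # exps/custom/yolox_nano_person.py (hypothetical path)
    import os
    from yolox.exp import Exp as MyExp

    class Exp(MyExp):
        def __init__(self):
            super(Exp, self).__init__()
            # Nano-scale backbone/head, as in exps/default/yolox_nano.py
            self.depth = 0.33
            self.width = 0.25
            self.input_size = (416, 416)
            self.test_size = (416, 416)
            self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

            # Person-only fine-tuning: 1 class instead of the 80 COCO classes
            self.num_classes = 1
            # Placeholder paths to a COCO-format person dataset
            self.data_dir = "datasets/person_dataset"
            self.train_ann = "instances_train.json"
            self.val_ann = "instances_val.json"

        def get_model(self, sublinear=False):
            # Mirrors the get_model() override in exps/default/yolox_nano.py:
            # the Nano variant uses depthwise-separable convolutions.
            from yolox.models import YOLOX, YOLOPAFPN, YOLOXHead
            if "model" not in self.__dict__:
                in_channels = [256, 512, 1024]
                backbone = YOLOPAFPN(self.depth, self.width,
                                     in_channels=in_channels, depthwise=True)
                head = YOLOXHead(self.num_classes, self.width,
                                 in_channels=in_channels, depthwise=True)
                self.model = YOLOX(backbone, head)
            return self.model

    Training then proceeds through YOLOX's standard tools/train.py, pointing -f at this experiment file and -c at the pre-trained YOLOX Nano weights.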

  3. Export ONNX Model:

    (Use the command provided earlier, ensuring correct paths and parameters).

    python tools/export_onnx.py \
      --output-name yolox_nano_person.onnx \
      -n yolox-nano \
      -f exps/default/yolox_nano.py \
      --batch-size 1 \
      --decode_in_inference
      # Add "-c path/to/your/checkpoint.pth" (and point -f at your custom exp file)
      # if exporting fine-tuned weights

    Exporting the YOLOX Nano model to the ONNX format is a standard step for cross-platform deployment and model quantization, preparing it for efficient local inference on the ESP32 HMI.
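
    Before moving on to quantization, it is worth a quick sanity check of the exported file on the PC. A small optional check with the onnx Python package (assuming the output name used above):

    import onnx

    # Load and structurally validate the exported YOLOX Nano model
    model = onnx.load("yolox_nano_person.onnx")
    onnx.checker.check_model(model)

    # Print the input tensor name and shape; this must be consistent with the
    # input_shape passed to the quantization tool in the next step
    for inp in model.graph.input:
        dims = [d.dim_value for d in inp.type.tensor_type.shape.dim]
        print(inp.name, dims)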

Model Quantization and Conversion (PC Side)

Running a floating-point ONNX model directly on an ESP32 HMI is inefficient and resource-intensive. Model quantization is a key Edge AI optimization technique to enhance local inference speed and reduce memory footprint, crucial for deploying YOLOX Nano on the ESP32-S3 chip.

  1. Install Quantization Tool:

    (Install espressif-esp-ppq as shown before).

    pip install espressif-esp-ppq torch_ppq
  2. Perform Quantization and Conversion:

    (Use the ppq command as shown, providing calibration data and specifying input shape).

    ppq quantize \
      --platform esp-dl \
      --input_model ./yolox_nano_person.onnx \
      --input_shape "[1, 3, 224, 224]" \
      --calib_data ./calib_images/ \
      --quant_format espdl \
      --output_dir ./esp_model_output/ \
      --equalization

    This step uses calibration data to perform INT8 model quantization on YOLOX Nano and generates an .espdl file suitable for the ESP-DL library on the ESP32 HMI. Model quantization significantly improves the local inference performance for real-time person counting and is a core part of a successful Edge AI deployment.
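
    Two practical notes on this step. First, the input_shape given to the quantizer must be consistent with the resolution the ONNX model was actually exported with (YOLOX Nano defaults to 416x416 unless you change the experiment's input/test size) and with the resolution used on the device later. Second, the calibration set in ./calib_images/ should contain a few dozen to a few hundred frames representative of the real deployment scene. A rough Pillow sketch for preparing such a folder is shown below; the exact file layout and preprocessing your version of the quantization tool expects may differ, so treat this as an illustration only.

    import os
    from PIL import Image

    SRC_DIR = "raw_captures"   # hypothetical folder of frames grabbed from the camera
    DST_DIR = "calib_images"   # calibration folder passed via --calib_data
    SIZE = (224, 224)          # keep consistent with the quantization input_shape

    os.makedirs(DST_DIR, exist_ok=True)
    for name in sorted(os.listdir(SRC_DIR)):
        if not name.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        img = Image.open(os.path.join(SRC_DIR, name)).convert("RGB")
        img = img.resize(SIZE)  # simple resize; mirror your on-device preprocessing
        img.save(os.path.join(DST_DIR, os.path.splitext(name)[0] + ".jpg"))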

Device Deployment and Real-time Inference (Arduino Environment)

Now, we deploy the quantized YOLOX Nano model onto the ESP32 HMI module, using Arduino code to implement the real-time person counting functionality and display results on its HMI Display.

  1. Project File Structure (Example):

    (Organize your project using PlatformIO or similar, placing the .espdl file in the data/ directory as described).

    YourProject/
    ├── platformio.ini
    ├── data/
    │   └── model.espdl     # Quantized YOLOX Nano model for ESP32-S3
    ├── include/
    ├── lib/
    ├── src/
    │   ├── main.cpp        # Main logic: person counting & HMI display
    │   ├── camera_utils.cpp/h
    │   ├── model_utils.cpp/h
    │   └── ui_utils.cpp/h  # UI logic (potentially using LVGL)
    └── CMakeLists.txt      # If using ESP-IDF directly

    Ensure the .espdl model file is uploaded to the ESP32 HMI's filesystem (e.g., to SPIFFS via PlatformIO's `pio run --target uploadfs`) so the Arduino code can load it for Edge AI inference.

  2. Core Arduino Code Logic (`main.cpp` Example):

    #include <Arduino.h>
    #include "esp_camera.h"
    #include "esp_dl_model.h"      // ESP-DL for YOLOX Nano loading & inference on ESP32-S3
    #include "dl_lib_matrix3d.h"
    #include "FS.h"
    #include "SPIFFS.h"            // Or LittleFS.h
    // #include "lvgl.h"           // Include LVGL if using it for the ESP32 HMI display
    
    // --- Config for YOLOX Nano on ESP32 HMI (ESP32-S3 based) ---
    #define MODEL_INPUT_WIDTH  224 // Must match quantization input_shape
    #define MODEL_INPUT_HEIGHT 224
    #define MODEL_PATH "/model.espdl" // Path to quantized YOLOX Nano on filesystem
    #define PERSON_CLASS_ID 0      // COCO class ID for person
    #define CONFIDENCE_THRESHOLD 0.5f // Threshold for valid object detection
    
    // Global model handle for YOLOX Nano
    esp_dl_model_handle_t model_handle = {0};
    
    // Function prototypes
    void setupCamera();
    void setupFileSystem();
    void loadModel();
    bool image_preprocess(camera_fb_t *fb, dl_matrix3du_t *out_tensor); // CRITICAL
    void updateDisplayOnHMI(int count); // For updating the HMI Display
    
    void setup() {
      Serial.begin(115200);
      Serial.println("ESP32 HMI Real-time Person Counting with YOLOX Nano (Edge AI Demo)");
    
      setupFileSystem(); // Initialize File System (SPIFFS/LittleFS)
      setupCamera();     // Initialize Camera connected to the ESP32 HMI
      loadModel();       // Load the quantized YOLOX Nano model for local inference
    
      // Initialize the HMI Display, Touch Screen, and LVGL (if used)
      // setupLVGL_On_ESP32_HMI();
      Serial.println("Setup complete. Starting loop for Real-time Person Counting...");
    }
    
    void loop() {
      camera_fb_t *fb = esp_camera_fb_get(); // Get frame from camera
      if (!fb) {
        Serial.println("Camera capture failed");
        delay(1000);
        return;
      }
    
      int personCount = 0; // Initialize count for this frame
    
      if (model_handle) { // Check if YOLOX Nano model is loaded
        dl_matrix3du_t *input_tensor = dl_matrix3du_alloc(1, MODEL_INPUT_WIDTH, MODEL_INPUT_HEIGHT, 3);
        if(!input_tensor){
            Serial.println("Failed to allocate input tensor");
            esp_camera_fb_return(fb);
            return;
        }
    
        // !!! CRITICAL: Implement image_preprocess function for ESP32 HMI !!!
        // Convert camera frame (fb->buf, potentially YUV/JPEG) to RGB format
        // Resize to MODEL_INPUT_WIDTH x MODEL_INPUT_HEIGHT for YOLOX Nano.
        // This preprocessing step is vital for accurate Edge AI results.
        bool success = image_preprocess(fb, input_tensor);
    
        if (success) {
          // --- Perform YOLOX Nano Local Inference for Object Detection ---
          // Leverage ESP32-S3's Vector Instructions for acceleration
          struct timeval start_time, end_time;
          gettimeofday(&start_time, NULL);
    
          esp_dl_model_run(model_handle, (dl_matrix3d_t *)input_tensor); // Run inference
    
          gettimeofday(&end_time, NULL);
          long elapsed_ms = (end_time.tv_sec - start_time.tv_sec) * 1000 + (end_time.tv_usec - start_time.tv_usec) / 1000;
          // Serial.printf("Inference Time: %ld ms\n", elapsed_ms); // Optional: print inference time
    
          // --- Process YOLOX Nano Results for Real-time Person Counting ---
          // NOTE: You need to implement the logic to parse the output tensor from YOLOX Nano.
          // This involves getting bounding boxes, scores, and class IDs.
          // The exact method depends on how ESP-DL exposes results for your model configuration.
          // Apply Non-Maximum Suppression (NMS) if it wasn't part of the exported model.
          // Pseudo-code below assumes a helper `parse_yolox_output` returning a
          // hypothetical Detection struct (bounding box, score, class_id) per result:
          // std::vector<Detection> detections = parse_yolox_output(model_handle);
          // for (const auto& det : detections) {
          //     if (det.class_id == PERSON_CLASS_ID && det.score > CONFIDENCE_THRESHOLD) {
          //         personCount++; // Increment count for each detected person
          //     }
          // }
          // Replace this pseudo-code with actual result parsing!
    
          Serial.printf("Detected Persons: %d\n", personCount); // Print count to Serial
    
          // Update the display on the ESP32 HMI's Touch Screen / IPS Screen
          updateDisplayOnHMI(personCount); // Function to draw count on the HMI Display (e.g., using LVGL)
    
        } else {
            Serial.println("Image preprocessing failed.");
        }
        dl_matrix3du_free(input_tensor); // Free the input tensor memory
      } else {
          // Serial.println("Model not loaded, skipping inference."); // Debug message
      }
    
      esp_camera_fb_return(fb); // Return camera frame buffer IMPORTANT!
    
      // Adjust delay to balance performance and responsiveness for Real-time Person Counting
      delay(100); // Controls processing loop frequency on the ESP32 HMI (adjust as needed)
    }
    
    void loadModel() {
      // Load the quantized YOLOX Nano model prepared for Edge AI on ESP32-S3
      if (!SPIFFS.begin(true)) {
          Serial.println("SPIFFS Mount Failed");
          return;
      }
      if (SPIFFS.exists(MODEL_PATH)) {
          model_handle = esp_dl_model_load(MODEL_PATH);
          if (!model_handle) {
              Serial.printf("Failed to load YOLOX Nano model from %s for ESP32 HMI\n", MODEL_PATH);
          } else {
              Serial.println("YOLOX Nano model loaded successfully for Edge AI inference.");
          }
      } else {
          Serial.printf("Model file not found at %s\n", MODEL_PATH);
      }
    }
    
    // --- Placeholder Function Definitions ---
    void setupCamera() {
        // Implement camera initialization logic for your ESP32 HMI hardware
        Serial.println("Camera Initialized (Placeholder).");
    }
    
    void setupFileSystem() {
        if (!SPIFFS.begin(true)) {
            Serial.println("An Error has occurred while mounting SPIFFS");
            return;
        }
        Serial.println("SPIFFS Mounted successfully.");
    }
    
    void updateDisplayOnHMI(int count) {
        // Implement your display update logic here using LVGL or another library
        // Example: lv_label_set_text_fmt(ui_person_count_label, "Persons: %d", count);
        // Serial.printf("Updating HMI Display with count: %d (Placeholder)\n", count); // Debug print
    }
    
    // !!! CRITICAL IMPLEMENTATION NEEDED !!!
    bool image_preprocess(camera_fb_t *fb, dl_matrix3du_t *out_tensor) {
        // 1. Convert frame buffer (fb->buf, format fb->format) to RGB.
        // 2. Resize the RGB image to MODEL_INPUT_WIDTH x MODEL_INPUT_HEIGHT.
        // 3. Normalize pixel values if required by the YOLOX Nano model.
        // 4. Fill the data into out_tensor->item pointer (check HWC or CHW order).
        // Use libraries like esp_camera utility functions or esp_image_lib if available.
        Serial.println("image_preprocess() needs implementation!");
        return false; // Return true on success
    }
    
    

    This Arduino code demonstrates loading the optimized YOLOX Nano model on the ESP32 HMI, utilizing the ESP32-S3's capabilities for efficient local inference, and implementing the core logic for real-time person counting. The object detection results (person count) can finally be visualized on the ESP32 HMI's IPS Screen / Touch Screen using libraries like LVGL.
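
    Before writing the C++ result-parsing code hinted at in the pseudo-code above, it can help to confirm the exported model's output layout on the PC. The hedged onnxruntime sketch below assumes the ONNX file was exported with --decode_in_inference, so each prediction row is (cx, cy, w, h, objectness, class scores...) in input-image pixels; it deliberately skips letterboxing and NMS for brevity. Verify the layout and preprocessing against your YOLOX version before porting the logic to the ESP32 HMI.

    import numpy as np
    import onnxruntime as ort
    from PIL import Image

    PERSON_CLASS_ID = 0
    CONF_THRESHOLD = 0.5

    sess = ort.InferenceSession("yolox_nano_person.onnx")
    inp = sess.get_inputs()[0]
    _, _, h, w = inp.shape  # e.g. [1, 3, 224, 224]

    # Naive preprocessing: resize only. Recent YOLOX versions feed raw 0-255
    # pixels without mean/std normalization; adjust if your version differs.
    img = Image.open("test_frame.jpg").convert("RGB").resize((w, h))
    x = np.asarray(img, dtype=np.float32).transpose(2, 0, 1)[None]  # NCHW

    out = sess.run(None, {inp.name: x})[0][0]  # (num_predictions, 5 + num_classes)

    person_count = 0
    for row in out:
        objectness = row[4]
        cls_scores = row[5:]
        cls_id = int(np.argmax(cls_scores))
        score = objectness * cls_scores[cls_id]
        if cls_id == PERSON_CLASS_ID and score > CONF_THRESHOLD:
            person_count += 1  # NOTE: without NMS, overlapping boxes inflate this count

    print("Persons detected (pre-NMS):", person_count)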

Performance Optimization Tips (for YOLOX Nano on ESP32 HMI)

  • Resolution Trade-off: Adjusting the input resolution is key to balancing real-time person counting speed and object detection accuracy. Find the sweet spot on your ESP32 HMI.
  • Preprocessing Optimization: Image conversion is often a bottleneck in the local inference pipeline; optimizing this for the ESP32-S3 is crucial.
  • Stable Processing Cycle: Control the frame rate reasonably to ensure ESP32 HMI system stability and touch screen responsiveness.
  • Leverage Vector Instructions: Ensure you are using an ESP32-S3 and have configured your development environment correctly to enable Vector Instructions. This maximizes YOLOX Nano performance in Edge AI scenarios.
  • Model Selection: For extremely demanding real-time person counting tasks, consider exploring even lighter object detection models on the ESP32 HMI.

This tutorial has detailed how to implement a fully local inference-based real-time person counting function on the powerful ESP32 HMI platform (especially suitable for models featuring the ESP32-S3 chip) by deploying the efficient YOLOX Nano object detection model. We covered the workflow from ONNX model preparation, the critical technique of model quantization, to the final Edge AI deployment in the Arduino environment and displaying results on the HMI Display.

This Edge AI solution, based on the ESP32 HMI and YOLOX Nano, leverages the ESP32-S3's Vector Instructions acceleration. It not only enhances response speed and protects data privacy but also endows embedded devices with unprecedented intelligent vision capabilities. You can use this as a starting point to extend the real-time person counting functionality into more complex smart visitor systems or security applications, fully realizing the potential of the ESP32 HMI as an Edge AI terminal. Get hands-on, light up your ESP32 HMI's IPS Screen with YOLOX Nano, and unlock the infinite possibilities of Edge AI!