Real-time Person Counting on ESP32 HMI with YOLOX Nano

In many security surveillance and smart access control scenarios, the demand for real-time person counting directly on local devices is increasing, especially when low latency and privacy are paramount. Traditional cloud-based solutions often fall short. This article focuses on how to leverage powerful ESP32 HMI modules equipped with robust ESP32-S3 chips and high-quality touch screens (like the Elecrow CrowPanel series), combined with an OV2640 camera, to build a potent Edge AI solution. We will utilize the lightweight object detection model, YOLOX Nano, detailing the entire process from model preparation and model quantization to final deployment and execution within the Arduino environment. The goal is to enable developers to easily master this "local inference, real-time display, low-latency response" Edge AI technique, empowering feature-rich ESP32 HMI devices with powerful visual perception for true real-time person counting.

Why Choose YOLOX Nano + ESP32 HMI for Real-time Person Counting?

Deploying Edge AI models onto resource-constrained microcontrollers makes the choice of model and platform critical for success. The combination of YOLOX Nano and modern ESP32 HMI (especially versions based on the ESP32-S3) offers significant advantages for implementing real-time person counting:

  1. Lightweight & Efficient Edge AI Solution:

    • The YOLOX Nano model itself has a small parameter count, making it highly suitable for embedded deployment on devices like an ESP32 HMI.
    • After INT8 model quantization, its size shrinks dramatically to under 1MB, perfectly fitting within the storage limits of an ESP32 HMI.
    • The built-in Vector Instructions acceleration of the ESP32-S3 chip can boost YOLOX Nano's local inference speed to 4-6 FPS, which is sufficient for basic object detection applications like real-time person counting and makes this an ideal choice for building efficient Edge AI applications.
  2. Focus on Target, Controllable Accuracy:

    • The pre-trained YOLOX Nano model can already recognize the "person" class, allowing for direct use in basic person counting tasks.
    • Through fine-tuning, YOLOX Nano can be specialized for the specific object detection task of identifying humans, further optimizing accuracy and local inference speed on the ESP32 HMI, enhancing the reliability of real-time person counting.
  3. Mature Ecosystem, Easy to Get Started:

    • Espressif provides the ESP-DL library and ESP-PPQ model quantization toolchain for chips like the ESP32-S3, simplifying the process of deploying Edge AI models like YOLOX Nano from formats such as ONNX.
    • Abundant documentation, community support, and good integration with the Arduino IDE and libraries like LVGL make developing Edge AI applications (such as real-time person counting) on ESP32 HMI devices, often featuring an IPS Screen and touch screen, much more accessible.

Model Preparation and ONNX Export (PC Side)

The first step in preparing the YOLOX Nano model for the ESP32 HMI is done on a PC. This is a crucial part of the Edge AI workflow.

  1. Environment Setup:

    (Ensure necessary libraries like torch, torchvision, onnx, and YOLOX are installed as mentioned previously).

    pip install torch torchvision onnx
    git clone https://github.com/Megvii-BaseDetection/YOLOX.git
    cd YOLOX
    pip install -r requirements.txt
    pip install -e .

    This forms the foundation for exporting the YOLOX Nano model and subsequent model quantization, preparing it for eventual execution on the ESP32-S3.

  2. Configure Model (Optional):

    (As mentioned before) If your real-time person counting application requires higher precision, you can fine-tune YOLOX Nano to focus more specifically on detecting the 'person' class.
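
    For reference, a minimal sketch of such a person-only experiment file is shown below, following the pattern of exps/default/yolox_nano.py and YOLOX's custom-data training docs. The file path, dataset paths, and annotation file names are placeholders for illustration; adapt them to your own COCO-format dataset.

    # exps/custom/yolox_nano_person.py (hypothetical path)
    import os
    from yolox.exp import Exp as MyExp

    class Exp(MyExp):
        def __init__(self):
            super(Exp, self).__init__()
            # Nano-scale backbone/head, as in exps/default/yolox_nano.py
            self.depth = 0.33
            self.width = 0.25
            self.input_size = (416, 416)
            self.test_size = (416, 416)
            self.exp_name = os.path.split(os.path.realpath(__file__))[1].split(".")[0]

            # Person-only fine-tuning: 1 class instead of the 80 COCO classes
            self.num_classes = 1
            # Placeholder paths to a COCO-format person dataset
            self.data_dir = "datasets/person_dataset"
            self.train_ann = "instances_train.json"
            self.val_ann = "instances_val.json"

        def get_model(self, sublinear=False):
            # Mirrors the get_model() override in exps/default/yolox_nano.py:
            # the Nano variant uses depthwise-separable convolutions.
            from yolox.models import YOLOX, YOLOPAFPN, YOLOXHead
            if "model" not in self.__dict__:
                in_channels = [256, 512, 1024]
                backbone = YOLOPAFPN(self.depth, self.width,
                                     in_channels=in_channels, depthwise=True)
                head = YOLOXHead(self.num_classes, self.width,
                                 in_channels=in_channels, depthwise=True)
                self.model = YOLOX(backbone, head)
            return self.model

    Training then proceeds through YOLOX's standard tools/train.py, pointing -f at this experiment file and -c at the pre-trained YOLOX Nano weights.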

  3. Export ONNX Model:

    (Use the command provided earlier, ensuring correct paths and parameters).

    python tools/export_onnx.py \
      --output-name yolox_nano_person.onnx \
      -n yolox-nano \
      -f exps/default/yolox_nano.py \
      --batch-size 1 \
      --decode_in_inference
      # Add "-c path/to/your/checkpoint.pth" (and point -f at your custom exp file)
      # if exporting fine-tuned weights

    Exporting the YOLOX Nano model to the ONNX format is a standard step for cross-platform deployment and model quantization, preparing it for efficient local inference on the ESP32 HMI.
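
    Before moving on to quantization, it is worth a quick sanity check of the exported file on the PC. A small optional check with the onnx Python package (assuming the output name used above):

    import onnx

    # Load and structurally validate the exported YOLOX Nano model
    model = onnx.load("yolox_nano_person.onnx")
    onnx.checker.check_model(model)

    # Print the input tensor name and shape; this must be consistent with the
    # input_shape passed to the quantization tool in the next step
    for inp in model.graph.input:
        dims = [d.dim_value for d in inp.type.tensor_type.shape.dim]
        print(inp.name, dims)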

Model Quantization and Conversion (PC Side)

Running a floating-point ONNX model directly on an ESP32 HMI is inefficient and resource-intensive. Model quantization is a key Edge AI optimization technique to enhance local inference speed and reduce memory footprint, crucial for deploying YOLOX Nano on the ESP32-S3 chip.

  1. Install Quantization Tool:

    (Install espressif-esp-ppq as shown before).

    pip install espressif-esp-ppq torch_ppq
  2. Perform Quantization and Conversion:

    (Use the ppq command as shown, providing calibration data and specifying input shape).

    ppq quantize \
      --platform esp-dl \
      --input_model ./yolox_nano_person.onnx \
      --input_shape "[1, 3, 224, 224]" \
      --calib_data ./calib_images/ \
      --quant_format espdl \
      --output_dir ./esp_model_output/ \
      --equalization

    This step uses calibration data to perform INT8 model quantization on YOLOX Nano and generates an .espdl file suitable for the ESP-DL library on the ESP32 HMI. Model quantization significantly improves the local inference performance for real-time person counting and is a core part of a successful Edge AI deployment.
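
    Two practical notes on this step. First, the input_shape given to the quantizer must be consistent with the resolution the ONNX model was actually exported with (YOLOX Nano defaults to 416x416 unless you change the experiment's input/test size) and with the resolution used on the device later. Second, the calibration set in ./calib_images/ should contain a few dozen to a few hundred frames representative of the real deployment scene. A rough Pillow sketch for preparing such a folder is shown below; the exact file layout and preprocessing your version of the quantization tool expects may differ, so treat this as an illustration only.

    import os
    from PIL import Image

    SRC_DIR = "raw_captures"   # hypothetical folder of frames grabbed from the camera
    DST_DIR = "calib_images"   # calibration folder passed via --calib_data
    SIZE = (224, 224)          # keep consistent with the quantization input_shape

    os.makedirs(DST_DIR, exist_ok=True)
    for name in sorted(os.listdir(SRC_DIR)):
        if not name.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        img = Image.open(os.path.join(SRC_DIR, name)).convert("RGB")
        img = img.resize(SIZE)  # simple resize; mirror your on-device preprocessing
        img.save(os.path.join(DST_DIR, os.path.splitext(name)[0] + ".jpg"))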

Device Deployment and Real-time Inference (Arduino Environment)

Now, we deploy the quantized YOLOX Nano model onto the ESP32 HMI module, using Arduino code to implement the real-time person counting functionality and display results on its HMI Display.

  1. Project File Structure (Example):

    (Organize your project using PlatformIO or similar, placing the .espdl file in the data/ directory as described).

    YourProject/
    ├── platformio.ini
    ├── data/
    │   └── model.espdl     # Quantized YOLOX Nano model for ESP32-S3
    ├── include/
    ├── lib/
    ├── src/
    │   ├── main.cpp        # Main logic: person counting & HMI display
    │   ├── camera_utils.cpp/h
    │   ├── model_utils.cpp/h
    │   └── ui_utils.cpp/h  # UI logic (potentially using LVGL)
    └── CMakeLists.txt      # If using ESP-IDF directly

    Ensure the .espdl model file is uploaded to the ESP32 HMI's filesystem (e.g., to SPIFFS via PlatformIO's `pio run --target uploadfs`) so the Arduino code can load it for Edge AI inference.

  2. Core Arduino Code Logic (`main.cpp` Example):

    #include <Arduino.h>
    #include "esp_camera.h"
    #include "esp_dl_model.h"      // ESP-DL for YOLOX Nano loading & inference on ESP32-S3
    #include "dl_lib_matrix3d.h"
    #include "FS.h"
    #include "SPIFFS.h"            // Or LittleFS.h
    // #include "lvgl.h"           // Include LVGL if using it for the ESP32 HMI display
    
    // --- Config for YOLOX Nano on ESP32 HMI (ESP32-S3 based) ---
    #define MODEL_INPUT_WIDTH  224 // Must match quantization input_shape
    #define MODEL_INPUT_HEIGHT 224
    #define MODEL_PATH "/model.espdl" // Path to quantized YOLOX Nano on filesystem
    #define PERSON_CLASS_ID 0      // COCO class ID for person
    #define CONFIDENCE_THRESHOLD 0.5f // Threshold for valid object detection
    
    // Global model handle for YOLOX Nano
    esp_dl_model_handle_t model_handle = {0};
    
    // Function prototypes
    void setupCamera();
    void setupFileSystem();
    void loadModel();
    bool image_preprocess(camera_fb_t *fb, dl_matrix3du_t *out_tensor); // CRITICAL
    void updateDisplayOnHMI(int count); // For updating the HMI Display
    
    void setup() {
      Serial.begin(115200);
      Serial.println("ESP32 HMI Real-time Person Counting with YOLOX Nano (Edge AI Demo)");
    
      setupFileSystem(); // Initialize File System (SPIFFS/LittleFS)
      setupCamera();     // Initialize Camera connected to the ESP32 HMI
      loadModel();       // Load the quantized YOLOX Nano model for local inference
    
      // Initialize the HMI Display, Touch Screen, and LVGL (if used)
      // setupLVGL_On_ESP32_HMI();
      Serial.println("Setup complete. Starting loop for Real-time Person Counting...");
    }
    
    void loop() {
      camera_fb_t *fb = esp_camera_fb_get(); // Get frame from camera
      if (!fb) {
        Serial.println("Camera capture failed");
        delay(1000);
        return;
      }
    
      int personCount = 0; // Initialize count for this frame
    
      if (model_handle) { // Check if YOLOX Nano model is loaded
        dl_matrix3du_t *input_tensor = dl_matrix3du_alloc(1, MODEL_INPUT_WIDTH, MODEL_INPUT_HEIGHT, 3);
        if(!input_tensor){
            Serial.println("Failed to allocate input tensor");
            esp_camera_fb_return(fb);
            return;
        }
    
        // !!! CRITICAL: Implement image_preprocess function for ESP32 HMI !!!
        // Convert camera frame (fb->buf, potentially YUV/JPEG) to RGB format
        // Resize to MODEL_INPUT_WIDTH x MODEL_INPUT_HEIGHT for YOLOX Nano.
        // This preprocessing step is vital for accurate Edge AI results.
        bool success = image_preprocess(fb, input_tensor);
    
        if (success) {
          // --- Perform YOLOX Nano Local Inference for Object Detection ---
          // Leverage ESP32-S3's Vector Instructions for acceleration
          struct timeval start_time, end_time;
          gettimeofday(&start_time, NULL);
    
          esp_dl_model_run(model_handle, (dl_matrix3d_t *)input_tensor); // Run inference
    
          gettimeofday(&end_time, NULL);
          long elapsed_ms = (end_time.tv_sec - start_time.tv_sec) * 1000 + (end_time.tv_usec - start_time.tv_usec) / 1000;
          // Serial.printf("Inference Time: %ld ms\n", elapsed_ms); // Optional: print inference time
    
          // --- Process YOLOX Nano Results for Real-time Person Counting ---
          // NOTE: You need to implement the logic to parse the output tensor from YOLOX Nano.
          // This involves getting bounding boxes, scores, and class IDs.
          // The exact method depends on how ESP-DL exposes results for your model configuration.
          // Apply Non-Maximum Suppression (NMS) if it wasn't part of the exported model.
          // Pseudo-code below assumes a helper `parse_yolox_output` returning a
          // hypothetical Detection struct (bounding box, score, class_id) per result:
          // std::vector<Detection> detections = parse_yolox_output(model_handle);
          // for (const auto& det : detections) {
          //     if (det.class_id == PERSON_CLASS_ID && det.score > CONFIDENCE_THRESHOLD) {
          //         personCount++; // Increment count for each detected person
          //     }
          // }
          // Replace this pseudo-code with actual result parsing!
    
          Serial.printf("Detected Persons: %d\n", personCount); // Print count to Serial
    
          // Update the display on the ESP32 HMI's Touch Screen / IPS Screen
          updateDisplayOnHMI(personCount); // Function to draw count on the HMI Display (e.g., using LVGL)
    
        } else {
            Serial.println("Image preprocessing failed.");
        }
        dl_matrix3du_free(input_tensor); // Free the input tensor memory
      } else {
          // Serial.println("Model not loaded, skipping inference."); // Debug message
      }
    
      esp_camera_fb_return(fb); // Return camera frame buffer IMPORTANT!
    
      // Adjust delay to balance performance and responsiveness for Real-time Person Counting
      delay(100); // Controls processing loop frequency on the ESP32 HMI (adjust as needed)
    }
    
    void loadModel() {
      // Load the quantized YOLOX Nano model prepared for Edge AI on ESP32-S3
      if (!SPIFFS.begin(true)) {
          Serial.println("SPIFFS Mount Failed");
          return;
      }
      if (SPIFFS.exists(MODEL_PATH)) {
          model_handle = esp_dl_model_load(MODEL_PATH);
          if (!model_handle) {
              Serial.printf("Failed to load YOLOX Nano model from %s for ESP32 HMI\n", MODEL_PATH);
          } else {
              Serial.println("YOLOX Nano model loaded successfully for Edge AI inference.");
          }
      } else {
          Serial.printf("Model file not found at %s\n", MODEL_PATH);
      }
    }
    
    // --- Placeholder Function Definitions ---
    void setupCamera() {
        // Implement camera initialization logic for your ESP32 HMI hardware
        Serial.println("Camera Initialized (Placeholder).");
    }
    
    void setupFileSystem() {
        if (!SPIFFS.begin(true)) {
            Serial.println("An Error has occurred while mounting SPIFFS");
            return;
        }
        Serial.println("SPIFFS Mounted successfully.");
    }
    
    void updateDisplayOnHMI(int count) {
        // Implement your display update logic here using LVGL or another library
        // Example: lv_label_set_text_fmt(ui_person_count_label, "Persons: %d", count);
        // Serial.printf("Updating HMI Display with count: %d (Placeholder)\n", count); // Debug print
    }
    
    // !!! CRITICAL IMPLEMENTATION NEEDED !!!
    bool image_preprocess(camera_fb_t *fb, dl_matrix3du_t *out_tensor) {
        // 1. Convert frame buffer (fb->buf, format fb->format) to RGB.
        // 2. Resize the RGB image to MODEL_INPUT_WIDTH x MODEL_INPUT_HEIGHT.
        // 3. Normalize pixel values if required by the YOLOX Nano model.
        // 4. Fill the data into out_tensor->item pointer (check HWC or CHW order).
        // Use libraries like esp_camera utility functions or esp_image_lib if available.
        Serial.println("image_preprocess() needs implementation!");
        return false; // Return true on success
    }
    
    

    This Arduino code demonstrates loading the optimized YOLOX Nano model on the ESP32 HMI, utilizing the ESP32-S3's capabilities for efficient local inference, and implementing the core logic for real-time person counting. The object detection results (person count) can finally be visualized on the ESP32 HMI's IPS Screen / Touch Screen using libraries like LVGL.
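
    Before writing the C++ result-parsing code hinted at in the pseudo-code above, it can help to confirm the exported model's output layout on the PC. The hedged onnxruntime sketch below assumes the ONNX file was exported with --decode_in_inference, so each prediction row is (cx, cy, w, h, objectness, class scores...) in input-image pixels; it deliberately skips letterboxing and NMS for brevity. Verify the layout and preprocessing against your YOLOX version before porting the logic to the ESP32 HMI.

    import numpy as np
    import onnxruntime as ort
    from PIL import Image

    PERSON_CLASS_ID = 0
    CONF_THRESHOLD = 0.5

    sess = ort.InferenceSession("yolox_nano_person.onnx")
    inp = sess.get_inputs()[0]
    _, _, h, w = inp.shape  # e.g. [1, 3, 224, 224]

    # Naive preprocessing: resize only. Recent YOLOX versions feed raw 0-255
    # pixels without mean/std normalization; adjust if your version differs.
    img = Image.open("test_frame.jpg").convert("RGB").resize((w, h))
    x = np.asarray(img, dtype=np.float32).transpose(2, 0, 1)[None]  # NCHW

    out = sess.run(None, {inp.name: x})[0][0]  # (num_predictions, 5 + num_classes)

    person_count = 0
    for row in out:
        objectness = row[4]
        cls_scores = row[5:]
        cls_id = int(np.argmax(cls_scores))
        score = objectness * cls_scores[cls_id]
        if cls_id == PERSON_CLASS_ID and score > CONF_THRESHOLD:
            person_count += 1  # NOTE: without NMS, overlapping boxes inflate this count

    print("Persons detected (pre-NMS):", person_count)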

Performance Optimization Tips (for YOLOX Nano on ESP32 HMI)

  • Resolution Trade-off: Adjusting the input resolution is key to balancing real-time person counting speed and object detection accuracy. Find the sweet spot on your ESP32 HMI.
  • Preprocessing Optimization: Image conversion is often a bottleneck in the local inference pipeline; optimizing this for the ESP32-S3 is crucial.
  • Stable Processing Cycle: Control the frame rate reasonably to ensure ESP32 HMI system stability and touch screen responsiveness.
  • Leverage Vector Instructions: Ensure you are using an ESP32-S3 and have configured your development environment correctly to enable Vector Instructions. This maximizes YOLOX Nano performance in Edge AI scenarios.
  • Model Selection: For extremely demanding real-time person counting tasks, consider exploring even lighter object detection models on the ESP32 HMI.

This tutorial has detailed how to implement a fully local inference-based real-time person counting function on the powerful ESP32 HMI platform (especially suitable for models featuring the ESP32-S3 chip) by deploying the efficient YOLOX Nano object detection model. We covered the workflow from ONNX model preparation, the critical technique of model quantization, to the final Edge AI deployment in the Arduino environment and displaying results on the HMI Display.

This Edge AI solution, based on the ESP32 HMI and YOLOX Nano, leverages the ESP32-S3's Vector Instructions acceleration. It not only enhances response speed and protects data privacy but also endows embedded devices with unprecedented intelligent vision capabilities. You can use this as a starting point to extend the real-time person counting functionality into more complex smart visitor systems or security applications, fully realizing the potential of the ESP32 HMI as an Edge AI terminal. Get hands-on, light up your ESP32 HMI's IPS Screen with YOLOX Nano, and unlock the infinite possibilities of Edge AI!