
PART ONE of Sauron - Intelligent Marine surveillance system:

 

PART TWO

Gimbal Firmware

This part of the project took the most effort, and I would still consider it a work in progress. The STM32 program implements a dual-axis gimbal stabilization system running on an STM32G474 MCU. It provides motor control for the pitch and yaw axes using cascaded PID control loops, with real-time IMU-based stabilization (if enabled) and UART communication with a host SBC (Single Board Computer).

Key Features:

  • Dual-Axis Control: Independent pitch and yaw motor control
  • Cascaded PID Control: Position → Velocity → Torque control loops
  • Real-time Stabilization: IMU-based attitude correction when stabilization is enabled; otherwise, direct position commands
  • Robust Communication: Binary UART protocol with CRC error checking for talking to the SBC
  • CAN Bus Motor Control: Asynchronous motor command/response handling
  • Live Debugging: Serial command interface for PID tuning and monitoring
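
To illustrate the kind of binary UART protocol with CRC checking mentioned above, here is a minimal Python sketch. The frame layout (0xAA header, command byte, two float32 angles, CRC-8 with polynomial 0x07) is an assumed example for illustration, not the project's actual wire format, which runs in C on the STM32.

```python
import struct

def crc8(data: bytes, poly: int = 0x07, init: int = 0x00) -> int:
    """Bitwise CRC-8 over the payload (polynomial 0x07 assumed here)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

def build_frame(cmd: int, pitch_deg: float, yaw_deg: float) -> bytes:
    """Pack a command byte plus two float32 angles, then append the CRC byte."""
    payload = struct.pack('<Bff', cmd, pitch_deg, yaw_deg)
    return b'\xAA' + payload + bytes([crc8(payload)])

def parse_frame(frame: bytes):
    """Validate header and CRC; return (cmd, pitch, yaw) or None on error."""
    if len(frame) != 11 or frame[0] != 0xAA:
        return None
    payload, crc = frame[1:-1], frame[-1]
    if crc8(payload) != crc:
        return None  # corrupted frame rejected
    return struct.unpack('<Bff', payload)
```

A corrupted byte anywhere in the payload or CRC makes `parse_frame` return None, which is what lets the receiver silently drop bad frames and rely on the sender's retry logic.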

A high-level flowchart of the program is as follows:

I am using RMD-L-12025 motors from a company called MyActuator. These are high-torque direct-drive motors with a built-in driver and encoder. After changing to the new gimbal mechanics, all the old tunings went out the window, so I had to start by re-tuning the motors with the new mechanics. This took some time, but eventually I managed to get decent, workable tunings.

Pitch motor final tuning:

Yaw motor final tuning:

The pitch axis, being lighter, is easier to tune. The yaw motor has a lot more inertia to deal with, which is why you will notice the very different PID values. I also had to re-zero the motors in the new mechanics so that zero target angles meant pointing straight and level. I used a digital inclinometer for the pitch axis.

Note that in the last few days I had some issues with my asynchronous CAN messaging with the motors (intended to make the messaging faster compared to blocking calls). I couldn’t resolve these in time, so I have temporarily switched to a simpler control loop where the motors run in position mode and follow the position commands sent by the SBC via the UART protocol. At the moment, the control loop is not commanding the motors in torque mode, contrary to my earlier testing. Once I have found the root cause of my async CAN communication issues, I will revert to full stabilization.
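
To make the cascaded position → velocity → torque scheme concrete, here is a minimal Python sketch of the loop structure. The gains, output limits, and loop rate are illustrative placeholders, not the tuned values from the actual firmware (which implements this in C on the STM32).

```python
class PID:
    """Basic PID with clamped output; gains here are placeholders."""
    def __init__(self, kp, ki, kd, out_limit):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit = out_limit
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp so the inner loop never receives an unbounded setpoint
        return max(-self.out_limit, min(self.out_limit, out))

# Outer loop: position error -> velocity setpoint (deg/s limit).
# Inner loop: velocity error -> torque command sent to the motor over CAN.
position_pid = PID(kp=4.0, ki=0.0, kd=0.1, out_limit=90.0)
velocity_pid = PID(kp=0.5, ki=0.2, kd=0.0, out_limit=2.0)

def control_step(target_deg, actual_deg, actual_vel, dt=0.001):
    """One iteration of the cascade for a single axis."""
    vel_setpoint = position_pid.update(target_deg - actual_deg, dt)
    torque_cmd = velocity_pid.update(vel_setpoint - actual_vel, dt)
    return torque_cmd
```

The benefit of the cascade is that each loop can be tuned and limited independently, which is exactly why the pitch and yaw axes end up with very different gain sets.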

The full current code is shared in the project’s GitHub folder for anyone to read, understand, and modify for themselves.

Dataset and AI Model Creation

As mentioned earlier, the task involves detecting big as well as small boats from a few hundred meters away out to ~10 km (limited by visibility). Besides using a relatively long focal length (35mm), I needed to build my own dataset and train my own model for best results.

I found that most models run inference on 640x640 images (bigger input images are scaled down to this size first), which makes them lose detail. Since detail was critical in my case for maximizing detection distance, I wanted the images processed in a way that avoids scaling them down before inference.

For this I decided to use a “tiling” approach: instead of scaling images down before inference, the image is divided into tiles, the tiles are fed to the detection model, and the detections are then combined and mapped back onto the high-resolution image.

A 1920x1280 image divides perfectly into a 3 (columns) x 2 (rows) grid, which gives six 640x640 tiles (a popular inference size) without any loss of detail. I have created a diagram to show this visually:
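
The tile geometry and the mapping back to full-frame coordinates can be sketched in a few lines of Python (function names here are illustrative, not the project's actual code):

```python
TILE = 640  # inference input size

def tile_origins(width=1920, height=1280, tile=TILE):
    """Top-left corner (x, y) of each tile, row-major: 3 columns x 2 rows."""
    assert width % tile == 0 and height % tile == 0, "frame must divide evenly"
    return [(x, y) for y in range(0, height, tile) for x in range(0, width, tile)]

def to_full_frame(box, origin):
    """Map a tile-local box (x1, y1, x2, y2) back to full-image coordinates
    by adding the tile's top-left offset."""
    ox, oy = origin
    x1, y1, x2, y2 = box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)
```

Because 1920 and 1280 are exact multiples of 640, no padding or resizing is needed, so every pixel reaches the detector at native resolution.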
  

Building dataset:

I figured the dataset images and the later live images would both need to be 1920x1280, so I decided to capture 1920x1280 images with the same camera/lens that I was going to use for inference later.

I used a suction cup mount with some accessories so I could get stable images, since the “subjects” were going to be very small and far away. It took me several days to collect the images. I uploaded them to Roboflow and labelled them manually. Since I was later going to run inference on tiled images, I also cut each image into tiles before feeding them in for training (so training and inference both see objects at similar sizes). Luckily, Roboflow made this very easy.

Image preprocessing steps applied in Roboflow:

  • Auto-orient (fix bad orientations in images)
  • Tile in a 2 (rows) x 3 (cols) configuration. For 1920x1280 images, this gives six 640x640 tiles, perfectly sized.
  • Stretch to 640x640. This step handles any original images in the dataset that were taken at a different resolution (e.g., 1920x1080) and brings everything to the same 640x640 output size.

Image augmentations:

  • Convert 15% to grayscale
  • Exposure up/down for 15% of dataset
  • Saturation up/down for 15% of dataset

Overall, this gave me 6 images per original image due to tiling, and then 3 versions of each of those due to the augmentations.

I have made the dataset available, free and open source, on Roboflow Universe for anyone else to replicate my work:

https://universe.roboflow.com/saad-tiwana/marine_surface-c4ssc

Training Detection Model:

To train a detection model, I used Ultralytics HUB to train a YOLO11m model. I trained the model for 200 iterations, and after trying the trained model in the web interface provided by Ultralytics, I was happy with its performance.

You will find my final model files in the project GitHub folder.

Model quantization for Axelera Metis via Degirum:

By this point, I had already decided to use Degirum (https://www.degirum.com/) to deploy my models on the Axelera hardware, due to the extreme ease of use that Degirum’s PySDK provides. The people at Degirum also gave excellent support in getting things going on Axelera, for which I am very grateful.

I used Degirum’s online compiler to compile the output of the Ultralytics HUB model (PyTorch format) into a quantized model compatible with the Axelera hardware. The whole process was super simple, and the online interface on Degirum’s website was very easy to use. One thing I learned is that adding some images (I added ~100) to the compilation job made a big difference in the final model’s accuracy: after trying the original and compiled models side by side on the same images, I saw no difference in detections.

You will find my final (quantized/compiled) model files in the project GitHub folder.

Detection Model deployment on Axelera Metis:

The compiled model could then be downloaded to the RK3588 SBC with the Axelera Metis device and used easily with Degirum’s PySDK. Degirum provides a lot of examples for various use cases, including my tiling use case. The PySDK enables deployment of a model with just a few lines of code, which is very convenient and impressive. Note that the Voyager SDK provided by Axelera also does the job; I simply took the path of least resistance 🙂. Overall, I was very pleased with this whole process, and I was amazed by the detection results I was getting from the model, considering that other public models I had tried earlier did not perform nearly as well on objects as small as in my case. This has opened my eyes to many use cases I want to try by building and training my own datasets and models.

Python application – Maritime Surveillance System

Application Architecture

The Python application is a modular, real-time maritime surveillance system built with Python 3.8+ running on the RK3588 SBC. It orchestrates AI-powered object detection, gimbal control, and intelligent monitoring through a well-structured component architecture. All the program’s configuration settings are gathered in the settings.json file.
Core Python Modules:

  • Main Application (main.py): Serves as the central orchestrator managing system state, alarm persistence, multiple operation modes (surveillance, live video, testing), hardware initialization coordination, and graceful emergency shutdown procedures.
  • Configuration System (config.py): Handles dynamic JSON configuration loading from settings.json with comprehensive validation, provides hardware profiles for camera, UART, gimbal, detection, and audio settings, and implements robust default fallbacks for invalid configurations.
  • Camera Management (camera.py): Implements advanced camera buffer management with 3-frame flushing for fresh captures, supports multiple OpenCV backends (V4L2, ANY) with automatic fallback, includes timeout handling, image validation, and automatic camera warm-up sequences.
  • UART Communication (uart_comm.py): Manages binary UART protocol communication with STM32 gimbal controller using CRC-8 validation, handles command processing for angle control and stabilization, implements retry logic with configurable timeouts, and performs angle range validation.
  • AI Detection (detector.py): Integrates Degirum SDK for high-performance AI inference on Axelera Metis device with tiling support for large images, performs coordinate conversion from normalized bounding boxes to azimuth/elevation angles, applies confidence thresholding, and handles multiple detection result formats.
  • Scan Planning (scan_planner.py): Generates FOV-based scan positions using camera field of view and overlap percentages, implements vertical serpentine scanning patterns with alternating column directions, optimizes scan order for minimal gimbal movement, and ensures complete area coverage.
  • Area Monitoring (monitor.py): Manages polygon-based restricted areas supporting complex shapes, implements class-specific monitoring for targeted object types, tracks dwell times for violation thresholds, and triggers audio alerts through Bluetooth speaker integration.
  • Data Logging (logger.py): Provides comprehensive CSV logging of detection records with timestamps and coordinates, manages annotated image storage with alarm status indicators, tracks system performance statistics, etc.
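
To illustrate the coordinate conversion described for detector.py, here is a hedged sketch of turning a normalized bounding box into azimuth/elevation angles. The function name, the FOV numbers, and the sign convention are my illustrative assumptions, not the project's actual code.

```python
def bbox_to_az_el(box_norm, gimbal_yaw, gimbal_pitch, hfov=10.0, vfov=6.7):
    """box_norm = (x1, y1, x2, y2) normalized to [0, 1]; image y grows downward.
    The box center, as a fraction of the frame, is scaled by the camera FOV
    and offset by the gimbal pose at capture time."""
    x1, y1, x2, y2 = box_norm
    cx = (x1 + x2) / 2.0                  # horizontal center, 0..1
    cy = (y1 + y2) / 2.0                  # vertical center, 0..1
    azimuth = gimbal_yaw + (cx - 0.5) * hfov
    elevation = gimbal_pitch - (cy - 0.5) * vfov  # minus: image y is inverted
    return azimuth, elevation
```

A detection centered in the frame simply inherits the gimbal's current pointing angles; anything off-center gets a proportional FOV offset, which is what lets the system log absolute bearings for each boat.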

Note: All source code is available in the project’s GitHub folder
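
The serpentine pattern described for scan_planner.py above can be sketched as follows. The function signature, step sizes, and overlap handling here are illustrative assumptions; only the idea (FOV-spaced positions, alternating column direction to minimize gimbal travel) comes from the module description.

```python
def serpentine_scan(yaw_min, yaw_max, pitch_min, pitch_max,
                    hfov=10.0, vfov=8.0, overlap=0.5):
    """Generate (yaw, pitch) scan positions covering the given angular window.
    Positions are spaced by FOV minus the overlap fraction; every other
    column is traversed in reverse so the gimbal snakes through the area."""
    yaw_step = hfov * (1 - overlap)
    pitch_step = vfov * (1 - overlap)
    yaws, y = [], yaw_min
    while y <= yaw_max:
        yaws.append(round(y, 3))
        y += yaw_step
    pitches, p = [], pitch_min
    while p <= pitch_max:
        pitches.append(round(p, 3))
        p += pitch_step
    positions = []
    for i, yaw in enumerate(yaws):
        # Alternate direction each column: down-up-down-up ...
        col = pitches if i % 2 == 0 else list(reversed(pitches))
        positions.extend((yaw, pitch) for pitch in col)
    return positions
```

Ordering the columns this way means each move between consecutive positions is a single small step, instead of sweeping back to the start of every column.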

 

Continue to PART THREE of Sauron - Intelligent Marine surveillance system:

 
