
πŸš€ FlowSentry-Wake | Stage 3 Update

February 5, 2026

Taming the GStreamer Beast: From 3 FPS to Real-Time


1️⃣ The Bottleneck

We hit a wall.

Our Python-based torch-aipu pipeline was smart, but slow. It was handling video decoding, preprocessing, and visualization all on the host CPU.

The result? 3.4 FPS on the host. Frames were piling up. Latency was unacceptable.

To get real speed, we had to go lower level. We had to move the entire pipeline to GStreamer.
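To make that concrete, here is the rough shape of the pipeline we moved to, sketched with PyGObject. The custom element names (edgeflow_pre, edgeflow_infer, edgeflow_post) are placeholders, not the real registered names, and the decode chain assumes an H.264 camera:

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# Everything -- decode, preprocess, inference, postprocess -- lives in
# GStreamer now; Python only builds and supervises the pipeline.
pipeline = Gst.parse_launch(
    "rtspsrc location=rtsp://camera/stream ! "
    "rtph264depay ! h264parse ! avdec_h264 ! videoconvert ! "
    "edgeflow_pre ! edgeflow_infer ! edgeflow_post ! "
    "videoconvert ! autovideosink"
)
pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()  # drive the bus/clock until interrupted
```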


2️⃣ Building Custom Operators

Standard GStreamer elements weren't enough. Optical flow is unique. It needs to see "time."

We built a custom C++ Preprocess Operator. It acts as a short-term memory. It caches the previous frame. It stitches the "past" and "present" into a 6-channel tensor (NHWC) for the NPU.
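Conceptually, the caching logic looks like this (a numpy sketch of the idea, not the C++ operator itself):

```python
import numpy as np

class FramePairStacker:
    """Caches the previous frame and stacks past+present into one tensor."""

    def __init__(self):
        self.prev = None

    def push(self, frame):
        # frame: HxWx3 uint8 image
        if self.prev is None:
            self.prev = frame  # first frame: pair it with itself
        # Stack along channels -> HxWx6, then add a batch dim -> 1xHxWx6 (NHWC)
        pair = np.concatenate([self.prev, frame], axis=-1)
        self.prev = frame
        return pair[np.newaxis, ...]
```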

We also built a Postprocess Operator. The model outputs raw flow vectors. This operator converts those vectors into a colorful RGB flow map and pushes it downstream as a FlowImage.
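The usual way to do this is the HSV color-wheel encoding: hue encodes direction, brightness encodes speed. A Python/OpenCV sketch of that idea, assuming a HxWx2 float32 flow field (not our exact C++ implementation):

```python
import cv2
import numpy as np

def flow_to_rgb(flow):
    """Map a HxWx2 float32 flow field to an RGB image."""
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*flow.shape[:2], 3), dtype=np.uint8)
    hsv[..., 0] = (ang * 180 / np.pi / 2).astype(np.uint8)  # hue: direction (OpenCV range 0-179)
    hsv[..., 1] = 255                                       # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)  # value: speed
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
```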


3️⃣ Fighting the Crashes (The Real Work)

Migration was not smooth. We fought issues across the entire stack. Here is the battle log:

πŸ› οΈ The Build Traps

The "No Rule" Error: Ran make operators inside the operators folder. Failed. It must be run from the root.

The Ghost Library: The system couldn't find libedgeflownet_pre.so. The build cache was stale. make clobber-libs was the only way to force a clean rebuild.

πŸ“ The Shape Mismatches

The Padding Double-Dip: We initially output 64 channels from our C++ code, pre-padding the 6 real ones ourselves. Then GStreamer added its own padding on top (another 58 channels, 64 βˆ’ 6). The result? A 122-channel monster that crashed the NPU.

The Fix: We output exactly 6 channels. We let GStreamer handle the padding to 64.

πŸ“Ί The RTSP & Runtime Headaches

The Name Game: rtspsrc is inconsistent. It changes pad names between versions (recv_rtp_src_%u vs _%u_%u). We had to write a fallback mechanism in gst_helper.py to catch them all (sketched below, together with the audio fix).

The Audio Poison: Our camera sends audio. The video pipeline tried to link it and choked. We added a strict filter: media=video only.
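Here is a hedged reconstruction of what that looks like (the helper and variable names are ours, not the actual gst_helper.py code). It covers both the pad-name fallback and the video-only filter:

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

def on_pad_added(rtspsrc, pad, depay_sink):
    """Link rtspsrc's dynamic pads, but only for video, whatever the pad name."""
    name = pad.get_name()
    # GStreamer versions differ: recv_rtp_src_%u vs recv_rtp_src_%u_%u_...
    # so match on the stable prefix instead of one exact pattern.
    if not name.startswith("recv_rtp_src_"):
        return
    caps = pad.get_current_caps() or pad.query_caps(None)
    media = caps.get_structure(0).get_string("media")
    if media != "video":  # the strict filter: drop the audio pad entirely
        return
    if not depay_sink.is_linked():
        pad.link(depay_sink)

# Wire-up: src.connect("pad-added", on_pad_added, depay.get_static_pad("sink"))
```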

The OpenCL Segfault: We wanted to draw flow maps directly on the frame. But libinplace_draw.so clashed with OpenCL memory mapping.

The Fix: We forced the pipeline to use the CPU for drawing (_force_cpu_pipeline = True). Stability returned immediately.
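For context, "drawing on the CPU" here just means blending the flow map into the frame with plain OpenCV instead of the OpenCL in-place path. A minimal sketch of that stable path (the blend weights are arbitrary, and the flag name mirrors _force_cpu_pipeline):

```python
import cv2

def draw_flow_overlay(frame_bgr, flow_rgb, force_cpu=True):
    """Blend the flow map onto the frame on the CPU (the stable path)."""
    if not force_cpu:
        raise NotImplementedError("OpenCL in-place path disabled after the segfault")
    flow_bgr = cv2.cvtColor(flow_rgb, cv2.COLOR_RGB2BGR)
    return cv2.addWeighted(frame_bgr, 0.6, flow_bgr, 0.4, 0)
```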

The OpenGL Failure: The Rockchip GL drivers threw a libGL error. We bypassed them by using OpenCV for the display output.
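The shape of that bypass, again as a sketch rather than our exact code: pull frames from an appsink and hand them to cv2.imshow, so no GL sink is ever created. This assumes the appsink is capped to video/x-raw,format=BGR with emit-signals enabled:

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import numpy as np
import cv2

def on_new_sample(sink):
    """Pull a BGR frame from appsink and display it with OpenCV, no GL needed."""
    sample = sink.emit("pull-sample")
    caps = sample.get_caps().get_structure(0)
    w, h = caps.get_value("width"), caps.get_value("height")
    buf = sample.get_buffer()
    ok, info = buf.map(Gst.MapFlags.READ)
    if ok:
        frame = np.ndarray((h, w, 3), dtype=np.uint8, buffer=info.data)
        cv2.imshow("FlowSentry", frame)
        cv2.waitKey(1)
        buf.unmap(info)
    return Gst.FlowReturn.OK

# Wire-up: sink.connect("new-sample", on_new_sample)
```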


4️⃣ The Result: Stable 20 FPS

The pain was worth it.

We are now seeing a stable ~20 FPS end-to-end. The Metis NPU itself is cruising at 32 FPS.

We validated this with a stream test. No crashes. No leaks.
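One simple way to check a number like that is a buffer-counting pad probe on the final sink pad. A sketch of the idea, not our actual test harness:

```python
import time
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

def attach_fps_probe(pad, label="e2e"):
    """Print buffers-per-second flowing through a pad, once per second."""
    state = {"count": 0, "t0": time.monotonic()}

    def on_buffer(pad, info):
        state["count"] += 1
        elapsed = time.monotonic() - state["t0"]
        if elapsed >= 1.0:
            print(f"[{label}] {state['count'] / elapsed:.1f} FPS")
            state["count"], state["t0"] = 0, time.monotonic()
        return Gst.PadProbeReturn.OK

    pad.add_probe(Gst.PadProbeType.BUFFER, on_buffer)
```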

The system is now fast enough to "watch" a room in real-time.


πŸ”œ What is Next?

We have speed. We have flow. Now we add intelligence.

Next up: The Logic Layer. We will fuse Object Detection with this new high-speed Optical Flow to finally answer: Is that a shadow, or a person hiding in it?

More soon.