Hello,

I am currently running performance tests on the Metis M.2 card with different models (e.g., YOLOv8n).
I observed a behavior that I would like to clarify:

  • When running with 1 AIPU core, I get lower latency and the same FPS as when using multiple cores (4).

  • However, with 4 AIPU cores, the latency per image is higher, even though the FPS does not increase.

  • In addition, CPU utilization is noticeably higher with 4 cores (around 17%) compared to only ~0.8% with 1 core.

I would like to know:

  1. Is this expected behavior (due to synchronization overhead between AIPU cores)?

  2. Is there a recommended configuration to optimize either latency or throughput (FPS) depending on the use case?

  3. Is it normal that CPU usage increases significantly when using 4 cores, given that processing is supposed to be largely handled by the AIPU?

Thank you in advance for your clarifications.

Best regards,

Hi @fatima-zahra.chenani! Great to see you here! Yes, this is expected. Let's look at why it's probably happening:

  • Higher latency with 4 cores: Using multiple AIPU cores adds synchronisation overhead. For single-image inference, this could actually increase latency slightly.

  • No FPS gain: If your pipeline isn’t bottlenecked by AIPU compute time (e.g. single stream), then extra cores don’t boost throughput.

  • Increased CPU usage: Managing more AIPU cores increases host-side coordination, so CPU load naturally goes up a little.
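As a rough way to reason about the first two points, here is a toy model. It is not how the Metis runtime actually schedules work: it simply assumes each core handles one frame at a time, that delivered FPS is capped by the slower of the source and the combined capacity of the cores, and that every extra core adds a fixed synchronisation cost per frame. The function name `estimate`, the `sync_overhead_ms` parameter, and all the numbers are made up for illustration.

```python
def estimate(source_fps, cores, compute_ms, sync_overhead_ms=0.0):
    """Toy estimate of delivered FPS and per-frame latency.

    Assumptions (illustrative only, not measured on Metis):
    - each core handles one frame at a time;
    - every core beyond the first adds `sync_overhead_ms` to each frame.
    """
    latency_ms = compute_ms + sync_overhead_ms * (cores - 1)
    capacity_fps = cores * 1000.0 / latency_ms
    # The pipeline cannot deliver frames faster than the source produces them.
    delivered_fps = min(source_fps, capacity_fps)
    return delivered_fps, latency_ms


if __name__ == "__main__":
    # Hypothetical numbers: a 30 FPS camera and 5 ms of compute per frame.
    for cores in (1, 4):
        fps, latency = estimate(source_fps=30, cores=cores,
                                compute_ms=5.0, sync_overhead_ms=1.0)
        print(f"{cores} core(s): {fps:.1f} FPS, {latency:.1f} ms per frame")
```

With these made-up inputs, both configurations deliver 30 FPS because the source is the limit, while the 4-core case shows higher per-frame latency. With an unthrottled source (for example a video file read as fast as possible), the same model would show the 4-core capacity pulling ahead.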

Based on these points, and on the inference tool guide and deployment options, a reasonable starting configuration would be:

  • Use 1 core for low-latency applications.

  • Use 4 cores for high-throughput, multi-stream workloads.

Let me know if that helps, and we can dig deeper!


Hi, which host are you using, and what is your source? Is it a USB or RTSP source with a fixed frame rate?

As Spanner says, we do currently expect higher latency with more cores involved. We also get better throughput, but if your source is frame-rate limited then you wouldn't see that. I would not expect the CPU usage to go up like that, though.

Could you try with a video file and tell us what the throughput is?
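If it helps to get comparable numbers, a minimal timing harness could look like the sketch below. It is generic Python, not an Axelera SDK API: `infer` is a hypothetical callable standing in for whatever runs one frame through your pipeline, and `frames` is any iterable of decoded frames (for example, read from a video file). It times synchronous calls only, so with a pipelined multi-core runtime the reported latency and FPS will not simply be reciprocals of each other.

```python
import statistics
import time


def benchmark(infer, frames, warmup=5):
    """Run `infer` over `frames`; return (fps, median_ms, p95_ms).

    `infer` is a placeholder for a single-frame inference call.
    """
    frames = list(frames)
    for frame in frames[:warmup]:
        infer(frame)  # warm-up runs are not timed

    latencies_ms = []
    start = time.perf_counter()
    for frame in frames:
        t0 = time.perf_counter()
        infer(frame)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed = time.perf_counter() - start

    latencies_ms.sort()
    p95_index = min(len(latencies_ms) - 1, int(0.95 * len(latencies_ms)))
    return (len(frames) / elapsed,
            statistics.median(latencies_ms),
            latencies_ms[p95_index])


if __name__ == "__main__":
    def fake_infer(frame):
        time.sleep(0.002)  # stand-in for a real inference call

    fps, median_ms, p95_ms = benchmark(fake_infer, range(100))
    print(f"{fps:.1f} FPS, median {median_ms:.2f} ms, p95 {p95_ms:.2f} ms")
```

Running the same harness once per core configuration, on the same video file, gives throughput and latency figures that are not capped by a camera's frame rate.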