Hello,
I am currently running performance tests on the Metis M.2 card with different models (e.g., YOLOv8n).
I have observed some behavior that I would like to understand:
- With 1 AIPU core, I get lower latency and the same FPS as with 4 cores.
- With 4 AIPU cores, the per-image latency is higher, yet the FPS does not increase.
- CPU utilization is also noticeably higher with 4 cores (around 17%) than with 1 core (only ~0.8%).
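For reference, this is how I understand the two metrics to relate. It is only a minimal sketch using synthetic timestamps (not measurements from the card), and `summarize` and the timestamp lists are hypothetical names, not part of any SDK. It assumes Little's law applies: average frames in flight = throughput × latency. If that is right, the same FPS with higher latency would simply mean more frames are queued or in flight at once.

```python
from statistics import mean


def summarize(submit_times, done_times):
    """Return (mean latency in s, throughput in FPS, mean frames in flight).

    Both arguments are hypothetical per-frame timestamps in seconds,
    e.g. collected with time.perf_counter() around submit/receive calls.
    """
    if len(submit_times) != len(done_times) or len(submit_times) < 2:
        raise ValueError("need matching timestamp lists with at least two frames")
    # Latency: time from submitting a frame to receiving its result.
    latency = mean(d - s for s, d in zip(submit_times, done_times))
    # Throughput: completed frames per second between first and last completion.
    fps = (len(done_times) - 1) / (done_times[-1] - done_times[0])
    # Little's law: average frames in flight = throughput * latency.
    return latency, fps, fps * latency


# Synthetic case A: one frame at a time, 10 ms each.
submit = [i * 0.01 for i in range(101)]
serial = summarize(submit, [s + 0.01 for s in submit])

# Synthetic case B: same submission rate, but each frame takes 40 ms
# because four frames are in flight at once.
pipelined = summarize(submit, [s + 0.04 for s in submit])

print("serial   :", serial)
print("pipelined:", pipelined)
```

In this toy example both cases report the same FPS, while case B has four times the latency, which resembles what I see with 4 cores.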
I would like to know:
- Is this expected behavior, for example due to synchronization overhead between AIPU cores?
- Is there a recommended configuration for optimizing either latency or throughput (FPS), depending on the use case?
- Is it normal for CPU usage to increase this much with 4 cores, given that processing is supposed to be handled largely by the AIPU?
Thank you in advance for your clarifications.
Best regards,