Question

Metis PCIe: Voyager SDK Tiling Error on 1504×1504 Input (L1 Constraint Exceeded)

  • October 9, 2025
  • 4 replies
  • 98 views

Hi all,

I am trying to compile a high-resolution model (1504×1504) on a Metis PCIe card, but the compilation fails due to a memory constraint:

ERROR   : RuntimeError: Could not find a tiling factor that fits the memory constraints l1_constraint=4011520 l2_constraint=7894528. After attempt=7 and h_size=1 and adj_factor=1, memory usage still is memory_usage={L1: {190: 141376, 193: 5017600, 191: 143360}} and per-pool memory usage {L1: 5302336}.

The error indicates that the required L1 size (≈5.3 MB) exceeds the per-core constraint (≈4.0 MB).
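As a sanity check, the per-buffer figures in the error message do add up to the reported per-pool total, and one buffer dominates. A quick check using only the numbers from the log above:

```python
# L1 buffer sizes (bytes) reported in the compiler error, keyed by buffer id
l1_buffers = {190: 141376, 193: 5017600, 191: 143360}

l1_constraint = 4011520  # per-core L1 limit from the error message
l1_usage = sum(l1_buffers.values())

print(f"total L1 usage: {l1_usage} bytes")            # 5302336, matching the log
print(f"overflow: {l1_usage - l1_constraint} bytes")  # 1290816 bytes (~1.29 MB) over
print(f"buffer 193 share: {l1_buffers[193] / l1_usage:.0%} of L1 usage")
```

So nearly all of the L1 pressure comes from a single buffer (id 193), which is what the tiler apparently cannot split any further (h_size=1).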

My questions are:

  1. Voyager SDK Solution: Is there a Voyager SDK flag or setting that adjusts the tiling to overcome this ≈4 MB L1 limit?

  2. Hardware Solution: Are there any Metis card variants (or multi-AIPU cards) that feature a larger per-core L1 cache?

Any guidance on compiling this high-resolution model is appreciated. Thank you!

4 replies

Spanner
  • Axelera Team
  • October 9, 2025

Hi @dany! Good question. Looks like you’re reading the compiler output right.

That l1_constraint line refers to the per-core on-chip memory on the Metis AIPU. The compiler automatically tries to fit within that limit, but it looks like the tile still overflows the L1 buffer.

I wonder if a slightly lower input resolution (e.g. 1280×1280) could help with this?
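A rough back-of-envelope check supports that suggestion (this assumes the overflowing L1 tile scales linearly with input area, which is only an approximation of what the tiler actually does):

```python
# Numbers from the compiler error above
l1_constraint = 4011520
l1_usage_1504 = 5302336

# Hypothetical scaling: per-tile L1 usage proportional to input area
scale = (1280 * 1280) / (1504 * 1504)
est_usage_1280 = int(l1_usage_1504 * scale)

# ~3.84 MB estimated, under the ~4.01 MB limit in this rough model
print(est_usage_1280, est_usage_1280 < l1_constraint)
```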


  • Author
  • Cadet
  • October 9, 2025

Hi @Spanner,

Thanks for your reply. Yes, a lower input resolution works fine; I was able to successfully deploy a model with a custom input resolution of 1504×1088.

We anticipate using a higher resolution (e.g., 1504×1504) in the near future, so I am exploring the available options for achieving this on the Metis PCIe card.
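For what it's worth, the pixel areas line up with the L1 overflow reported in the original error, assuming (roughly) that the problematic buffer scales with input area:

```python
l1_constraint = 4011520
l1_usage = 5302336        # from the compiler error at 1504x1504

area_fail = 1504 * 1504   # 2262016 px -> overflows L1
area_ok   = 1504 * 1088   # 1636352 px -> deploys successfully

# Largest input area the rough area-scaling model predicts would still fit
max_area = int(area_fail * l1_constraint / l1_usage)
print(max_area)  # ~1.71 Mpx: 1504x1088 is below this, 1504x1504 is not
```

This is consistent with 1504×1088 compiling while 1504×1504 does not, though the tiler's real behavior depends on layer shapes, not just total area.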


  • Axelera Team
  • October 22, 2025

Hi @dany,

Thank you for reaching out and bringing this issue to our attention.

The error you're encountering is unusual and warrants further investigation. The compiler should compile your model within the L1 memory limit, but something about the model or operator is confusing it. It could also be that the model shapes limit the compiler's ability to apply the transformations required to fit the operator to the hardware constraints.

To help diagnose the problem, would it be possible for you to provide details on the operator (type, input/output shapes) and/or information on the model (ONNX file or architecture details)?

This will enable us to identify the root cause more efficiently.

Best,
Fabian


  • Author
  • Cadet
  • October 22, 2025

Hi Fabian,

Thank you for the follow up.

I used the YOLO11n model architecture and the yolo11n-coco-onnx.yaml file for deployment. The input ONNX model is yolo11n_1504x1504.onnx, and the input tensor shape is [1, 3, 1504, 1504].

The only functional modifications made to this file were setting the ONNX weight path and fixing the input tensor shape, as shown in the snippet below:

models:
  yolo11n-coco-onnx:
    class: AxONNXModel
    class_path: $AXELERA_FRAMEWORK/ax_models/base_onnx.py
    weight_path: $AXELERA_FRAMEWORK/customers/test_coco/yolo11n_1504x1504.onnx
    # weight_url: https://media.axelera.ai/artifacts/model_cards/weights/yolo/object_detection/yolo11n.onnx
    # weight_md5: d00108b1a46170499d67a80825531d26
    task_category: ObjectDetection
    input_tensor_layout: NCHW
    input_tensor_shape: [1, 3, 1504, 1504]
    input_color_format: RGB
    num_classes: 80
    dataset: CocoDataset-COCO2017
    extra_kwargs:
      cal_seed: 129
      num_cal_images: 100
      compilation_config:
        quantization_scheme: per_tensor_min_max
        ignore_weight_buffers: false