Hardware: Axelera Metis PCIe | SDK: Voyager v1.6 | OS: Ubuntu
Context
I am trying to compile Cellpose cyto3 (a cell segmentation model) for the Metis AIPU as part of a benchmarking project comparing CPU and AIPU inference performance on kidney electron microscopy patches (512×512, uint8).
What I have done so far
The ONNX export now works cleanly. I wrap the Cellpose network in a small PyTorch wrapper that replaces the make_style branch with a zeros tensor, because that branch uses Flatten/Pow/ReduceSum/Div, which are unsupported. The export produces:
- Inputs: 1, named input, shape [1, 2, 512, 512]
- Outputs: 1, named output, shape [1, 3, 512, 512]
- Nodes: 215, op types: Add (32), BatchNorm (33), Conv (41), Gemm (12), MaxPool (3), Relu (33), Resize (3), Unsqueeze (24), Constant (18)
- opset: 11 | file size: 25.3 MB
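For anyone who wants to reproduce a summary like the one above on their own export, here is a minimal sketch. The helper names summarize_nodes and summarize_onnx_file are made up for this post. The second helper assumes the onnx Python package is installed and uses only onnx.load plus the standard graph fields.

```python
from collections import Counter


def summarize_nodes(nodes):
    """Count nodes per op type; `nodes` is any iterable of objects with `.op_type`."""
    counts = Counter(node.op_type for node in nodes)
    return sum(counts.values()), dict(sorted(counts.items()))


def summarize_onnx_file(path):
    """Load an ONNX file and summarize it (requires the `onnx` package)."""
    import onnx  # imported lazily so summarize_nodes works without it

    model = onnx.load(path)
    total, per_op = summarize_nodes(model.graph.node)
    return {
        "inputs": [i.name for i in model.graph.input],
        "outputs": [o.name for o in model.graph.output],
        "nodes": total,
        "op_types": per_op,
        "opset": [(imp.domain, imp.version) for imp in model.opset_import],
    }
```

Calling summarize_onnx_file on the exported file and comparing the per-op counts against the node total is a quick way to spot op types missing from a hand-written list.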
The error
When I call compiler.quantize() from the Python API I get:
What the graph cleaner actually does
Using graph_cleaner_dump_core_onnx I can see that the core graph produced by the graph cleaner has 2 inputs instead of 1:
/downsample/res_down_0/proj/proj.0/BatchNormalization_output_0
These are intermediate tensors from inside the first residual block of the downsample path — not real model inputs. The graph cleaner is splitting the network mid-residual-block, putting the first few nodes into the preamble and leaving the rest of the block as the core. This creates two dangling inputs into the core.
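To illustrate why a cut inside a residual block gives the core more than one input, here is a toy sketch. The node and tensor names are hypothetical and the block is a simplified stand-in, not the actual Cellpose graph. The function core_inputs is my own helper, not part of the SDK. It lists the tensors that a chosen subset of nodes consumes but does not produce.

```python
from types import SimpleNamespace as Node


def core_inputs(nodes, core_names, initializers=()):
    """Return tensors the core consumes but does not produce itself.

    nodes: objects with .name, .input, .output (ONNX NodeProto has all three).
    core_names: names of the nodes assigned to the core subgraph.
    initializers: weight tensor names, which do not count as inputs.
    """
    core = [n for n in nodes if n.name in core_names]
    produced = {t for n in core for t in n.output}
    weights = set(initializers)
    needed = []
    for n in core:
        for t in n.input:
            # Empty names mark omitted optional inputs in ONNX; skip them.
            if t and t not in produced and t not in weights and t not in needed:
                needed.append(t)
    return needed


# Toy residual block (hypothetical names): a projection path and a main path
# both read the model input x and meet again at the Add.
block = [
    Node(name="proj_bn", input=["x"], output=["proj_bn_out"]),
    Node(name="proj_conv", input=["proj_bn_out"], output=["proj_out"]),
    Node(name="main_bn", input=["x"], output=["main_bn_out"]),
    Node(name="main_relu", input=["main_bn_out"], output=["main_relu_out"]),
    Node(name="main_conv", input=["main_relu_out"], output=["main_out"]),
    Node(name="add", input=["proj_out", "main_out"], output=["y"]),
]

# Cut at the block boundary: the whole block is core, so it has one input.
whole = core_inputs(block, {n.name for n in block})

# Cut inside the block: both BatchNorm nodes land in the preamble, so one
# tensor per path crosses the cut and the core ends up with two inputs.
split = core_inputs(block, {"proj_conv", "main_relu", "main_conv", "add"})

print(whole)
print(split)
```

In this toy case the whole block needs only x, while the split version needs one BatchNorm output from each path, which matches the pattern I see in the dumped core graph.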
What I have tried
- Removing all Identity nodes from the ONNX graph before compilation
- Removing the Gemm and Unsqueeze nodes left over from the make_style branch
- Changing various graph_cleaner_condition and CompilerConfig settings (most of them are immutable)
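For reference, this is roughly what the Identity removal step can look like. It is a sketch under assumptions, not necessarily the exact script I ran. The helper bypass_identity_nodes is a made-up name, it works on any node objects exposing op_type, input and output (as ONNX NodeProto does), and it ignores nested subgraphs such as If or Loop bodies.

```python
def bypass_identity_nodes(nodes, graph_outputs=()):
    """Drop Identity nodes and rewire their consumers to the Identity's input.

    Identity nodes that feed a graph output directly are kept, so the
    graph's output names stay unchanged.
    """
    keep_outputs = set(graph_outputs)
    rename = {}  # Identity output name -> Identity input name
    kept = []
    for n in nodes:
        if n.op_type == "Identity" and n.output[0] not in keep_outputs:
            rename[n.output[0]] = n.input[0]
        else:
            kept.append(n)

    def resolve(name):
        # Follow chains of removed Identity nodes back to the real producer.
        while name in rename:
            name = rename[name]
        return name

    for n in kept:
        for i, t in enumerate(n.input):
            n.input[i] = resolve(t)
    return kept
```

With the onnx package, the returned list would then replace the contents of model.graph.node, passing the names from model.graph.output as graph_outputs, followed by onnx.checker.check_model to confirm the graph is still valid.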
My questions
- Is there a known issue with the graph cleaner and residual blocks at the start of a network? Is there a configuration option to tell it where to make the cut?
- Would using deploy.py with a YAML file instead of the Python API avoid this issue? If so, could you share a minimal YAML template for a custom ONNX model with tensor input (not image input)?
- Would running onnxsim on the graph before compilation help?
- Has anyone successfully compiled Cellpose cyto3 for the Metis? If so, what ONNX export recipe was used?
Happy to share the ONNX file if that would help.