Question

QuantizationError "core graph has several inputs" when compiling Cellpose cyto3 ONNX - graph cleaner splits mid-residual block

  • June 9, 2026
  • 1 reply
  • 7 views

Hardware: Axelera Metis PCIe  |  SDK: Voyager v1.6  |  OS: Ubuntu

Context

I am trying to compile Cellpose cyto3 (a cell segmentation model) for the Metis AIPU as part of a benchmarking project comparing CPU and AIPU inference performance on kidney electron microscopy patches (512×512, uint8).

What I have done so far

The ONNX export now works cleanly. I wrap the Cellpose network in a small PyTorch wrapper that replaces the make_style branch with a zeros tensor (the make_style branch uses Flatten/Pow/ReduceSum/Div, which are unsupported). Export produces:

  • Inputs: 1 — input, shape [1, 2, 512, 512]
  • Outputs: 1 — output, shape [1, 3, 512, 512]
  • Nodes: 215 — op types: Add (32), BatchNormalization (33), Conv (41), Gemm (12), MaxPool (3), Relu (33), Resize (3), Unsqueeze (24), Constant (18)
  • opset: 11  |  file size: 25.3 MB
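
For anyone who wants to reproduce a summary like the one above, here is a minimal sketch. It only assumes an object with .input, .output, .node and .initializer attributes shaped like an onnx GraphProto; the toy graph is a stand-in so the snippet runs without the onnx package, and is not the Cellpose graph.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_graph(graph):
    """Return runtime input names, output names, and an op-type histogram.

    `graph` is duck-typed: anything with GraphProto-like `.input`, `.output`,
    `.node` and `.initializer` attributes works.
    """
    # Older exports may list weights in graph.input as well; exclude them
    # so that only real runtime inputs are reported.
    initializer_names = {init.name for init in getattr(graph, "initializer", [])}
    inputs = [i.name for i in graph.input if i.name not in initializer_names]
    outputs = [o.name for o in graph.output]
    op_counts = Counter(node.op_type for node in graph.node)
    return inputs, outputs, op_counts


# Toy stand-in graph (illustrative only).
toy = SimpleNamespace(
    input=[SimpleNamespace(name="input")],
    output=[SimpleNamespace(name="output")],
    initializer=[],
    node=[
        SimpleNamespace(op_type=t)
        for t in ("Conv", "BatchNormalization", "Relu", "Conv")
    ],
)

inputs, outputs, op_counts = summarize_graph(toy)
print(inputs, outputs, dict(op_counts))
```

With the onnx package installed, the same function should accept onnx.load("cyto3_wrapped.onnx").graph directly (the file name here is hypothetical).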

The error

When I call compiler.quantize() from the Python API I get:

QuantizationError: Input model cannot be quantized because the core graph has several inputs that are not currently supported by Quantizer.

What the graph cleaner actually does

Using graph_cleaner_dump_core_onnx I can see that the core graph produced by the graph cleaner has 2 inputs instead of 1:

/downsample/res_down_0/conv_0/conv_0.1/Relu_output_0
/downsample/res_down_0/proj/proj.0/BatchNormalization_output_0

These are intermediate tensors from inside the first residual block of the downsample path — not real model inputs. The graph cleaner is splitting the network mid-residual-block, putting the first few nodes into the preamble and leaving the rest of the block as the core. This creates two dangling inputs into the core.
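
To illustrate why a cut inside a residual block yields two core inputs, here is a toy sketch in plain Python. The node and tensor names are made up for illustration (they are not the Cellpose names), weights are omitted, and this is not how the graph cleaner itself is implemented; it only computes which tensors cross the boundary for a given choice of core nodes.

```python
def subgraph_inputs(nodes, core_names):
    """Tensors consumed by core nodes but produced outside the core."""
    produced_in_core = {
        out for name, _, outs in nodes if name in core_names for out in outs
    }
    boundary = []
    for name, ins, _ in nodes:
        if name not in core_names:
            continue
        for tensor in ins:
            if tensor not in produced_in_core and tensor not in boundary:
                boundary.append(tensor)
    return boundary


# Toy residual block: (node name, input tensors, output tensors).
nodes = [
    ("proj_conv", ["input"], ["proj_conv_out"]),      # skip branch
    ("proj_bn", ["proj_conv_out"], ["proj_out"]),     # skip branch
    ("bn0", ["input"], ["bn0_out"]),                  # main branch
    ("relu0", ["bn0_out"], ["relu0_out"]),            # main branch
    ("conv0", ["relu0_out"], ["conv0_out"]),          # main branch
    ("add", ["proj_out", "conv0_out"], ["block_out"]),
]

# Cut before the block: the core sees a single input.
whole_block = subgraph_inputs(nodes, {name for name, _, _ in nodes})
print(whole_block)  # ['input']

# Cut after both branches have started: the core sees two inputs.
mid_block = subgraph_inputs(nodes, {"conv0", "add"})
print(mid_block)  # ['relu0_out', 'proj_out']
```

The second case mirrors the pattern in the dump above: one boundary tensor is a Relu output on the main branch, the other is the BatchNormalization output of the projection (skip) branch.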

What I have tried

  • Removing all Identity nodes from the ONNX graph before compilation
  • Removing the Gemm and Unsqueeze nodes left over from the make_style branch
  • Various graph_cleaner_condition and CompilerConfig settings — most are immutable

My questions

  1. Is there a known issue with the graph cleaner and residual blocks at the start of a network? Is there a configuration option to tell it where to make the cut?
  2. Would using deploy.py with a YAML file instead of the Python API avoid this issue? If so, could you share a minimal YAML template for a custom ONNX model with tensor input (not image input)?
  3. Would running onnxsim on the graph before compilation help?
  4. Has anyone successfully compiled Cellpose cyto3 for the Metis? If so, what ONNX export recipe was used?

Happy to share the ONNX file.

1 reply

  • Author
  • Cadet
  • June 9, 2026

Update — additional things tried:

I have made more progress on the setup side but the core error persists. Here is what I tried since the original post:

1. Switched to deploy.py with a YAML file

I got the full deploy.py workflow working correctly: the YAML loads, the model is found, the ONNX is loaded, and the 20 calibration images are found. However, deploy.py hits the exact same QuantizationError as the Python API, so the issue is in the compiler itself, not in how I was calling it.

2. Ran onnxsim on the ONNX graph

onnxsim successfully simplified the graph and removed all 12 Gemm nodes and all 24 Unsqueeze nodes (these were leftover from the make_style branch). The simplified graph now contains only: Add (32), BatchNormalization (33), Conv (41), MaxPool (3), Relu (33), Resize (3), Constant (212). File size went from 25.3 MB to 23.9 MB. Despite this much cleaner graph, the same QuantizationError occurs.

3. Confirmed the core graph split

Using graph_cleaner_dump_core_onnx on the simplified graph, the core graph still has 2 inputs. The graph cleaner cuts inside the first residual block whether or not the Gemm/Unsqueeze nodes are present.
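
For reference, the before/after op counts quoted in step 2 can be compared with a few lines of Python. This sketch uses only the numbers reported in this thread; the helper name op_diff is mine and is not part of onnxsim or the Voyager SDK.

```python
from collections import Counter

# Op counts before simplification, as reported in the original post.
before = Counter({
    "Add": 32, "BatchNormalization": 33, "Conv": 41, "Gemm": 12,
    "MaxPool": 3, "Relu": 33, "Resize": 3, "Unsqueeze": 24, "Constant": 18,
})

# Op counts after onnxsim, as reported in step 2.
after = Counter({
    "Add": 32, "BatchNormalization": 33, "Conv": 41,
    "MaxPool": 3, "Relu": 33, "Resize": 3, "Constant": 212,
})


def op_diff(before, after):
    """Map op type -> (count before, count after) for every type that changed."""
    changed = {}
    for op in sorted(set(before) | set(after)):
        # Counter returns 0 for op types that are absent.
        if before[op] != after[op]:
            changed[op] = (before[op], after[op])
    return changed


diff = op_diff(before, after)
print(diff)  # {'Constant': (18, 212), 'Gemm': (12, 0), 'Unsqueeze': (24, 0)}
```

Only Gemm, Unsqueeze and Constant change. The Conv, BatchNormalization, Relu and Add counts are identical before and after, which fits the observation that the cut lands on the same two tensors in both graphs.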

Conclusion: The graph cleaner is consistently misidentifying the start of the first residual block as a pre/post-processing boundary and splitting there. This happens with both the original and simplified ONNX, and with both the Python API and deploy.py. The ONNX graph itself appears clean — 1 input, 1 output, standard CNN operations only. The cut point is always the same two tensors from inside res_down_0.

Is there a way to explicitly tell the graph cleaner where to make the cut, or to disable the split entirely for a model that has no pre/post-processing?