Question

Custom model with a CBAM attention block fails AIPU quantization: looks like a MatMul-to-Gemm fusion issue on a dual-branch attention pattern

  • June 27, 2026

Hi,

I'm trying to deploy a custom classification model on the Metis AIPU using the Voyager SDK. The model includes a CBAM attention block, but it fails during quantization with a topological sort error that points to a Gemm node.

To rule out a problem in my model, I inspected the ONNX graph before quantization. The CBAM block contains only MatMul operations and no Gemm nodes. The only Gemm nodes in the model are in the final classifier head, so the Gemm mentioned in the error doesn't exist in the exported ONNX graph. It looks like the compiler creates it internally during a MatMul-to-Gemm fusion pass and places it before the node that produces one of its inputs.
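For anyone who wants to run the same kind of inspection, here is a minimal sketch. It assumes nodes that expose `name`, `op_type`, `input` and `output` the way ONNX `NodeProto` objects do. The helper names (`op_histogram`, `find_order_violations`, `load_graph`) and the demo tensor names are hypothetical and not part of the Voyager SDK. It counts op types and reports any node that is listed before the producer of one of its inputs, which is the condition a topological sort error complains about.

```python
from collections import Counter
from types import SimpleNamespace


def op_histogram(nodes):
    """Count how many nodes of each op_type appear in the node list."""
    return Counter(node.op_type for node in nodes)


def find_order_violations(nodes, available):
    """Return (node_name, tensor_name) pairs for inputs not yet produced.

    `available` holds tensor names that exist before any node runs
    (graph inputs and initializers). An input that is never produced
    at all is reported as well.
    """
    seen = set(available)
    violations = []
    for node in nodes:
        for name in node.input:
            # Empty strings mark omitted optional inputs in ONNX.
            if name and name not in seen:
                violations.append((node.name, name))
        seen.update(node.output)
    return violations


def load_graph(path):
    """Load a real model's nodes and initial tensor names (needs `onnx`)."""
    import onnx  # imported lazily so the demo below runs without it

    graph = onnx.load(path).graph
    available = [i.name for i in graph.input]
    available += [t.name for t in graph.initializer]
    return list(graph.node), available


def make_node(name, op_type, inputs, outputs):
    """Hypothetical stand-in for an ONNX node, used only for the demo."""
    return SimpleNamespace(name=name, op_type=op_type,
                           input=inputs, output=outputs)


# Tiny stand-in graph: two MatMul branches merging into one Add.
demo_available = ["avg", "max", "w"]
demo_nodes = [
    make_node("mm_avg", "MatMul", ["avg", "w"], ["a"]),
    make_node("mm_max", "MatMul", ["max", "w"], ["m"]),
    make_node("merge", "Add", ["a", "m"], ["y"]),
]

print(dict(op_histogram(demo_nodes)))
print(find_order_violations(demo_nodes, demo_available))
```

On a real export, `nodes, available = load_graph("model.onnx")` would feed the same two functions. A clean result there only shows that the exported graph is well ordered; it says nothing about what the compiler does internally afterwards.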

The graph pattern that seems to trigger this consists of two parallel MatMul branches that merge into a shared Add operation. As far as I know, this is a standard CBAM channel attention structure and not an unusual graph layout.
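The pattern can be located programmatically with something like the sketch below. It again assumes nodes that expose `name`, `op_type`, `input` and `output` like ONNX `NodeProto` objects. The function name `find_dual_matmul_adds` and the demo node names are hypothetical. It lists every Add whose two inputs are both produced directly by MatMul nodes.

```python
from types import SimpleNamespace


def find_dual_matmul_adds(nodes):
    """Return (add_name, [producer_names]) for Adds fed by two MatMuls."""
    # Map each tensor name to the node that produces it.
    producer = {out: node for node in nodes for out in node.output}
    hits = []
    for node in nodes:
        if node.op_type != "Add":
            continue
        sources = [producer.get(name) for name in node.input]
        if len(sources) == 2 and all(
            s is not None and s.op_type == "MatMul" for s in sources
        ):
            hits.append((node.name, [s.name for s in sources]))
    return hits


def make_node(name, op_type, inputs, outputs):
    """Hypothetical stand-in for an ONNX node, used only for the demo."""
    return SimpleNamespace(name=name, op_type=op_type,
                           input=inputs, output=outputs)


# Stand-in for the channel-attention merge: the avg-pool and max-pool
# branches each end in a MatMul, and the two results are summed.
pattern_nodes = [
    make_node("mm_avg", "MatMul", ["avg", "w"], ["a"]),
    make_node("mm_max", "MatMul", ["max", "w"], ["m"]),
    make_node("merge", "Add", ["a", "m"], ["y"]),
    # An Add with a bias initializer should not be reported.
    make_node("bias", "Add", ["y", "b"], ["z"]),
]

print(find_dual_matmul_adds(pattern_nodes))
```

With a real model, passing `onnx.load(path).graph.node` instead of `pattern_nodes` should work the same way, since the function only reads those four attributes.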

Is this a known limitation of the compiler when handling this type of dual-branch pattern? Also, is there a way to disable the MatMul-to-Gemm fusion pass, or is there another recommended workaround?