
Hi folks,

After I resolved some of the issues I had with my Metis PCIe device, I stopped working with it because it was so loud that I got a headache after 30 minutes of runtime.

Now I noticed that there is new firmware that silences the fan, so I upgraded the SDK and the firmware:

(venv) root@holodeck7:/voyager-sdk# axdevice -v
INFO: Found PCI device: 01:00.0 Processing accelerators: Axelera AI Metis AIPU (rev 02)
INFO: Found AIPU driver: metis 90112 0
INFO: Firmware version matches: v1.3.1
INFO: Using device metis-0:1:0
Device 0: metis-0:1:0 4GiB pcie flver=1.2.0-rc2 bcver=1.0 clock=800MHz(0-3:800MHz) mvm=0-3:100%
device_runtime_firmware=v1.3.1
board_controller_board_type=matterhorn
sw_throttling: 200°C, hysteresis 5°C, throttle rate:12%
hw_throttling: 105°C, hysteresis 10°C
pvt_warning_threshold: 95°C

BUT the fan keeps spinning at full speed even though the Metis is not being used at all.

I tried to find a way to read out the actual temperature, but that does not seem to work:

(venv) root@holodeck7:/voyager-sdk# triton_multi_ctx --board-temp
blibtriton_linux.c:1082] Device communication timed out: device did not respond within 1 seconds. (705)
Failed to read sensor temperature from the board controller

I also tried to reduce the chip clock to the minimum:

(venv) root@holodeck7:/voyager-sdk# axdevice -v
INFO: Found PCI device: 01:00.0 Processing accelerators: Axelera AI Metis AIPU (rev 02)
INFO: Found AIPU driver: metis 90112 0
INFO: Firmware version matches: v1.3.1
INFO: Using device metis-0:1:0
Device 0: metis-0:1:0 4GiB pcie flver=1.2.0-rc2 bcver=1.0 clock=100MHz(0-3:100MHz) mvm=0-3:100%
device_runtime_firmware=v1.3.1
board_controller_board_type=matterhorn
sw_throttling: 200°C, hysteresis 5°C, throttle rate:12%
hw_throttling: 105°C, hysteresis 10°C
pvt_warning_threshold: 95°C

The Metis is still idle, but the fan is spinning at full speed. I am fine with the fan boosting while the card is in use, but when it is idle I expect the fan to stop. I can't use this PC unless I unplug the Metis PCIe card, which is a PITA and might wear out the PCIe connector over time…


So, a simple question: how can I control the fan? Most importantly, I want the fan to stop when the device is idle.


Thanks

Hi zefir

It looks like your board controller firmware version is 1.0 (bcver=1.0).
Follow these steps from the firmware update tutorial to update the board controller firmware:

$ cd firmware_release_public_v1.3.0
$ ./flash_update.sh --bc-update

NOTICE: This operation is neither failsafe nor reversible. Removing power during this process might brick your device. Confirm with "y" when prompted to proceed. If you run into any issues, contact an FAE or ask the community for support.
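If you want to check the board controller version after flashing without eyeballing the output, the `bcver=` field can be pulled out of the `Device 0:` line that `axdevice -v` prints. This is a minimal sketch, assuming the device line keeps the `key=value` layout shown in this thread. The helper names are hypothetical, and treating `bcver=1.0` as "needs update" only reflects the version flagged in this reply, not an official minimum.

```python
import re

# Sample device line copied from the `axdevice -v` output in this thread.
DEVICE_LINE = (
    "Device 0: metis-0:1:0 4GiB pcie flver=1.2.0-rc2 bcver=1.0 "
    "clock=800MHz(0-3:800MHz) mvm=0-3:100%"
)


def parse_device_fields(line: str) -> dict[str, str]:
    """Collect the key=value tokens (flver, bcver, clock, mvm) from a device line."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '1.0' or '1.2.0-rc2' into a comparable tuple of ints, ignoring any '-suffix'."""
    base = version.split("-", 1)[0]
    return tuple(int(part) for part in base.split("."))


def needs_bc_update(line: str, known_old: str = "1.0") -> bool:
    """Return True if bcver is at or below `known_old`.

    Assumption: "1.0" is only the version called out in this thread,
    not a documented minimum requirement.
    """
    fields = parse_device_fields(line)
    return version_tuple(fields["bcver"]) <= version_tuple(known_old)


if __name__ == "__main__":
    print(parse_device_fields(DEVICE_LINE)["bcver"])  # prints 1.0
    print(needs_bc_update(DEVICE_LINE))  # prints True
```

Feed it the `Device 0:` line from a fresh `axdevice -v` run after the update; if the board controller was flashed, `bcver` should no longer read 1.0.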


This should make working with Metis much easier on your ears 😀


Ah yes, FW 1.3 now includes dynamic fan control, but it's not directly controllable. I also believe the fan won't stop completely; it just slows down when the device is not in use, or once the device has had time to cool down after being used.


I.CAN.HEAR.AGAIN!!

Thank you @emustafa, the fan is now almost silent.

Is the last line something to be concerned about, or does it need to be fixed?

(venv) root@holodeck7:/voyager-sdk/firmware_release_public_v1.3.0# axdevice -v
INFO: Found PCI device: 01:00.0 Processing accelerators: Axelera AI Metis AIPU (rev 02)
INFO: Found AIPU driver: metis 90112 0
INFO: Current firmware version v1.3.1-stage0 != required version v1.3.1


Hi @zefir,

Glad you were able to update your board controller and your firmware. I agree, the low fan speed is amazing.

That last line is not “supposed” to happen, but I've seen it myself and there's no need to worry; generally it causes no issues. If you do run into firmware problems, you can always load the correct firmware into volatile memory with axdevice --refresh.
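If you want a script to tell this kind of message apart from a real version mismatch, you can compare only the numeric part of the two version tags. This is a minimal sketch, assuming the message keeps the exact wording of the `Current firmware version ... != required version ...` line posted in this thread. The helper names are hypothetical, and the sketch makes no claim about what the `-stage0` suffix actually means; it only separates it from the base version.

```python
import re

# Message format copied from the `axdevice -v` output posted in this thread.
MISMATCH = re.compile(
    r"Current firmware version (?P<current>\S+) != required version (?P<required>\S+)"
)


def split_version(tag: str) -> tuple[str, str]:
    """Split 'v1.3.1-stage0' into ('1.3.1', 'stage0'); the suffix is '' if absent."""
    base, _, suffix = tag.lstrip("v").partition("-")
    return base, suffix


def classify(line: str) -> str:
    """Label an axdevice output line by how its firmware versions compare."""
    m = MISMATCH.search(line)
    if m is None:
        return "no mismatch reported"
    current_base, _ = split_version(m.group("current"))
    required_base, _ = split_version(m.group("required"))
    if current_base == required_base:
        # Same numeric version; only the suffix (e.g. '-stage0') differs.
        return "suffix-only mismatch"
    return "version mismatch"


if __name__ == "__main__":
    line = "INFO: Current firmware version v1.3.1-stage0 != required version v1.3.1"
    print(classify(line))  # prints suffix-only mismatch
```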

