Recently I tried to update the firmware of a Mellanox Connect-IB HCA. However, even querying the state of the HCA was not possible:
> mstflint -d 81:00.0 q
-E- Cannot open Device: 81:00.0. No such device MFE_REG_ACCESS_FAILED

dmesg reported:

mlx5_ib: Mellanox Connect-IB Infiniband driver v1.0 (June 2013)
mlx5_ib 0000:81:00.0: setting latency timer to 64
mlx5_ib 0000:81:00.0: firmware version: 10.0.2410
mlx5_ib 0000:81:00.0: Driver cmdif rev(5) differs from firmware's(3)
mlx5_ib 0000:81:00.0: Failed initializing command interface, aborting
mlx5_ib: probe of 0000:81:00.0 failed with error -22
After removing the Mellanox OFED RPMs and installing the distribution's mstflint RPM, I got a more helpful error message:
> mstflint -d 81:00.0 q
-W- Unknown dev id: 0xbadacce5
Warning: memory access to device 81:00.0 failed: No such device or address.
Warning: Fallback on IO: much slower, and unsafe if device in use.
-E- Can not open 81:00.0: Flash cache replacement is active.
-E- Please use the -override_cache_replacement option in order to access the flash directly.
With the option suggested by the error message:
> mstflint -d 81:00.0 -override_cache_replacement q
I was able to query the HCA. The -override_cache_replacement flag also allowed me to flash up-to-date firmware.
The -override_cache_replacement flag also helps when the following error is thrown:
[root@fry21 ~]# mstflint -d 01:00.0 q
-E- Cannot open Device: 01:00.0. No such device. MFE_MAD_SEND_ERR
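
If you hit this regularly, the retry can be scripted. The following is only a minimal sketch: the function names needs_override and query_hca and the MSTFLINT variable are my own, and it assumes that mstflint exits with a non-zero status when the query fails. It retries only for the two messages shown in this post (the hint to use -override_cache_replacement, and MFE_MAD_SEND_ERR).

```shell
#!/bin/sh
# Sketch: query an HCA and retry with -override_cache_replacement
# when the first attempt fails with one of the errors shown above.
# MSTFLINT is a hypothetical override hook, e.g. for a dry run.
MSTFLINT="${MSTFLINT:-mstflint}"

# Return 0 if the given output matches one of the two known failures.
needs_override() {
    printf '%s\n' "$1" | grep -q -e 'override_cache_replacement' -e 'MFE_MAD_SEND_ERR'
}

# Usage: query_hca <pci-address>
query_hca() {
    dev="$1"
    # Assumption: mstflint returns non-zero when the query fails.
    if out=$("$MSTFLINT" -d "$dev" q 2>&1); then
        printf '%s\n' "$out"
        return 0
    fi
    if needs_override "$out"; then
        "$MSTFLINT" -d "$dev" -override_cache_replacement q
        return $?
    fi
    # Unknown failure: pass the original message through.
    printf '%s\n' "$out" >&2
    return 1
}
```

After sourcing the file, calling query_hca 81:00.0 runs the plain query first and only falls back to the override flag when needed. Keep in mind the warning from mstflint itself: direct flash access is unsafe if the device is in use.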