A few points:
* You can do exact int8 arithmetic on fp16 hardware, since fp16 has enough mantissa bits to represent every int8 value exactly; however, keeping the computation bit-exact end to end will be rather slow.
* AFAIK most GPUs can do int8, but tensor core support may require a newer one.
* Lc0 currently supports int8 only inside onnx files, and performance (and whether it works at all) depends heavily on what the specific onnxruntime execution provider does.
* There are tools to convert models to int8, but again performance is not great, even more so if the activation functions are not relu.
* I'm attaching an int8 version of 744204 that will run with the onnx backend if anyone wants to try it; it is known to work with onnx-cpu.
https://discord.com/channels/425419482568196106/425419999096733706/1350863506148294738
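The fp16 point above can be checked directly: fp16 has an 11-bit significand, so every integer in [-2048, 2048] (and therefore every int8 value) is exactly representable, but products of two int8 values can overflow that exact range. A small numpy sketch (illustrative only, nothing Lc0-specific):

```python
import numpy as np

# Every int8 value survives a round trip through fp16, since fp16's
# 11-bit significand represents all integers in [-2048, 2048] exactly.
vals = np.arange(-128, 128, dtype=np.int8)
assert np.array_equal(vals.astype(np.float16).astype(np.int8), vals)

# But a product of two int8 values can reach 127 * 127 = 16129, which is
# outside fp16's exact-integer range (the fp16 spacing near 16129 is 8),
# so a single fp16 multiply already rounds:
prod_fp16 = np.float16(127) * np.float16(127)
print(float(prod_fp16))   # rounded, not 16129
```

This is part of why a bit-exact int8 path on fp16 hardware is slow: exact results need the multiplies/accumulations done in a wider format (or split into pieces), losing the throughput advantage.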
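For anyone curious what the int8 conversion tools actually do to the weights: below is a minimal numpy sketch of symmetric per-tensor quantization, the common scheme, not Lc0's or onnxruntime's actual pipeline. The rounding error is bounded by half the scale step, and activations with wider or asymmetric ranges (i.e. not relu) force a larger scale and hence larger error, which is one reason non-relu nets convert worse.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization (illustrative sketch)."""
    scale = np.abs(w).max() / 127.0          # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)

# Round-to-nearest keeps the per-weight error within half a step.
err = np.abs(dequantize(q, scale) - w).max()
print(f"max abs error: {err:.5f}, bound scale/2: {scale / 2:.5f}")
```

A wider value range (e.g. unbounded activations instead of relu-clipped ones) inflates `scale` and with it the error bound, which is the intuition behind the "more so if the activation functions are not relu" caveat.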