Topic: Main forums / CSS-Forum / Lc0 v0.24 dev DX backend for AMD Radeon GPU
- - By Damir Desevac Date 2020-02-10 11:42 Upvotes 1
Hello,

The official Lc0 release with the DX backend and LogitQ is not available yet, but you can test it with this build:

https://www.chess2u.com/t13501-lc0-v0-24-dev-dx-backend-for-amd-radeon-gpu#92389

https://www.dropbox.com/s/66fojbhpagc5jlk/lc0-test-windows-gpu-dx22.zip?dl=0

Requirements:
Operating system: Windows 10, fully updated
AMD driver: latest (>= 20.1)

DX is 3-4x faster than OpenCL but slower than cudnn-fp16, so it is recommended only for AMD Radeon GPUs. You can also test it on an Nvidia GPU, though.
Here are some benchmarks:
Code:
Nvidia RTX 2060
===============
Network          Net-Size   OpenCL    cudnn-fp32  dx-fp32     cudnn-fp16   dx-fp16
----------------------------------------------------------------------------------
T59  59611       128x10     11624     35470        31088       88009        51326
T30  32390       256x20      2225      4972         6459       15137        17391
T40  42850       256x20      1371      4571          318*      12780        14198
T60  61996       320x24      1555      3421         3317        8418         7669
SV-big-t40-1705  384x30       856      1854          104*       4955         4799
SV-huge-50       512x40       361       802           29*       2241          121*

* - poor performance due to a driver bug (hopefully fixed soon).

Nvidia RTX Titan
================
Network          Net-Size   cudnn-fp32  dx-fp32     cudnn-fp16   dx-fp16
------------------------------------------------------------------------
T59  59611       128x10     70266       50523       123414       73675
T30  32390       256x20     14438       14173        40452       42276
T40  42850       256x20     12932       13113        36753       38388
T60  61996       320x24      8137        7505        17472       18578
SV-big-t40-1705  384x30      4143        3933        11054       13086
SV-huge-50       512x40      1942        1918         4844        7015
Code:
AMD RX 5700XT (Navi)
====================
Network          Net-Size   OpenCL   dx-fp32   dx-fp16
-------------------------------------------------------
T59  59611       128x10      12095   37845      55888
T30  32390       256x20       1505    5814      10198
T40  42850       256x20        900    4666       8041
T60  61996       320x24       1774    2874       5183
SV-big-t40-1705  384x30          .    1479       2801
SV-huge-50       512x40          .       *       1080

AMD RX Vega VII
===============
Network          Net-Size   OpenCL    dx-fp32   dx-fp16
-------------------------------------------------------
T59  59611       128x10     14889     46099     59721
T30  32390       256x20      1525      8055     12821
T40  42850       256x20      2754      6168      9156
T60  61996       320x24      2490      3722      6078
SV-big-t40-1705  384x30      1426      2129      3254
SV-huge-50       512x40         .       852      1323
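To put the "3-4x faster than OpenCL" statement next to the numbers, here is a minimal Python sketch that computes speed-up ratios from the RX 5700XT table above. The figures are copied from the post; the dictionary layout and the function name `speedups` are only illustrative, and the post does not state the unit of the benchmark figures.

```python
# Benchmark figures copied from the RX 5700XT table in the post.
# The unit is not stated there; only the ratios are used here.
rx5700xt = {
    "T59 59611": {"opencl": 12095, "dx-fp32": 37845, "dx-fp16": 55888},
    "T30 32390": {"opencl": 1505, "dx-fp32": 5814, "dx-fp16": 10198},
    "T40 42850": {"opencl": 900, "dx-fp32": 4666, "dx-fp16": 8041},
    "T60 61996": {"opencl": 1774, "dx-fp32": 2874, "dx-fp16": 5183},
}


def speedups(row, baseline="opencl"):
    """Return each backend's figure divided by the baseline backend's figure."""
    base = row[baseline]
    return {name: value / base for name, value in row.items() if name != baseline}


for net, row in rx5700xt.items():
    ratios = {name: round(r, 2) for name, r in speedups(row).items()}
    print(net, ratios)
```

The two largest networks are left out because the table has no OpenCL figure for them on this card.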
Parent - - By Lothar Jung Date 2020-02-10 17:38
And here is the corresponding card that AMD has announced:

https://www.heise.de/newsticker/meldung/AMD-Grafikchip-Arcturus-Compute-Monster-mit-8192-Shader-Kernen-4657336.html

No 3D or texture units, with INT8 support, and very fast.
Parent - - By Peter Martan Date 2020-02-10 18:24
Thanks, Lothar!
Parent - - By Lothar Jung Date 2020-02-10 21:14 Upvotes 1
Here is another new AMD workstation card for the mid-range price segment:

https://www.heise.de/newsticker/meldung/AMD-Radeon-Pro-W5500-Workstation-Grafikkarte-mit-8-GByte-fuer-400-US-Dollar-4656899.html

After Intel, AMD is now taking on Nvidia as well, in both price and performance.

Lothar
Parent - - By Eduard Nemeth Date 2020-02-13 13:05
Here is the thread on the TalkChess forum:

http://talkchess.com/forum3/viewtopic.php?f=2&t=73045

Can anyone here in the forum explain how this dx backend works?

I downloaded and installed the engine. It seems that this engine does not need the CUDA drivers. Even so, the nets run just as fast on my machine. Why is that? I don't have an AMD GPU; I have an Nvidia GPU with CUDA.
Parent - - By Lothar Jung Date 2020-02-13 14:32 Edited 2020-02-13 14:34
dx also works with Nvidia, but it scales somewhat differently:

On Nvidia hardware, the dx backend uses a different convolution algorithm with fp16 (winograd) which scales better with bigger networks. I found it to be slightly less precise than the direct algorithm used by cudnn-fp16 backend. On AMD, we just call Convolution metacommand (so we don't know what algorithm AMD's driver uses), and that is slightly more precise than even cudnn-fp16.

I don't think this slight difference in precision is causing any differences in lc0's playing strength although in my LTC tests (200 games match with a T60 network vs SF11) I found cudnn-fp16 backend to be slightly better than dx-fp16 backend - but the results were within error bars. I also suspect it could be due to better speed of cudnn-fp16 backend for small batch sizes, so I want to also run some fixed-nodes test.
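For readers unfamiliar with the "winograd" versus "direct" distinction in the quote above, here is a minimal Python sketch of the textbook one-dimensional Winograd F(2,3) scheme next to a direct 3-tap convolution. It is an illustration only: the function names are made up, and nothing in the quote says which Winograd variant or tile size the dx backend actually uses.

```python
def direct_conv_1d(d, g):
    """Two outputs of a 3-tap filter g slid over four inputs d (6 multiplications)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3), with 4 multiplications of transformed values."""
    # Filter transform (can be computed once per filter and reused).
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform followed by the four element-wise products.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


d = [1.0, 2.0, 3.0, 4.0]
g = [0.5, -1.0, 0.25]
print(direct_conv_1d(d, g), winograd_f23(d, g))
```

Both functions compute the same result in exact arithmetic. Winograd trades multiplications for extra additions in the transforms, and those extra intermediate values are one place where low-precision rounding can differ from the direct sum.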
Parent - - By Eduard Nemeth Date 2020-02-13 14:56
How do you select the dx-fp16 backend? I only see dx listed.
Parent - - By Lothar Jung Date 2020-02-13 15:20 Edited 2020-02-13 15:24
How should we name our `--backend=` parameter for DirectX 12 backend?
--backend=dx
--backend=dx-fp16
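As a rough illustration of passing one of the two backend names quoted above on the command line, the Python sketch below builds an argument list for an lc0 benchmark run. The helper name `lc0_benchmark_command`, the executable name `lc0` and the weights file name `weights.pb.gz` are placeholders, and whether this test build accepts `dx-fp16` exactly as spelled here is an assumption based only on the quoted text.

```python
import shlex


def lc0_benchmark_command(backend, weights, exe="lc0"):
    """Build an argv list for an lc0 benchmark run with the given backend.

    `exe` and `weights` are placeholders for your local paths.
    """
    return [exe, "benchmark", f"--backend={backend}", f"--weights={weights}"]


# Backend name taken from the quoted text; the weights file name is hypothetical.
cmd = lc0_benchmark_command("dx-fp16", "weights.pb.gz")
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd)` on a machine where the test build is unpacked.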
Parent - - By Eduard Nemeth Date 2020-02-13 15:30
Thanks!
Parent - By Lothar Jung Date 2020-02-14 09:04
Posts from the TalkChess thread:

Very interesting. On my RTX 2070 this backend is only 10% slower with T40 nets than using cudnn-fp16 backend, only 2% slower with T60 nets, and about 15% faster than cudnn-fp16 with the huge SV net 512x40b.

It seems so. Here I checked the strength at 30s + 0.3s in a sanity check with the 384x30b SV net 1538, one of the strongest nets at LTC on a strong GPU.

Score of LargeNet_1538_dx vs LargeNet_1538_cudnn: 115 - 81 - 204 [0.542]
Elo difference: 29.6 +/- 23.8, LOS: 99.2 %, DrawRatio: 51.0 %

400 of 400 games finished.

Normalized Elo (pentanomial): 0.170 +/- 0.050 (1 SD)

DX performs better strength-wise too on these larger nets than cuDNN on an RTX 2070 GPU.
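The Elo difference, LOS and draw ratio in the match output above can be cross-checked from the win/loss/draw counts. The Python sketch below uses the standard logistic Elo formula and the usual normal-approximation LOS formula; the function names are illustrative, and it does not attempt the error margin or the pentanomial normalized Elo.

```python
import math


def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_stats(wins, losses, draws):
    """Return (score, elo, los, draw_ratio) for a finished match."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    elo = elo_from_score(score)
    # Likelihood of superiority; draws do not enter this approximation.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return score, elo, los, draws / games


# Counts from the reported result: 115 - 81 - 204.
score, elo, los, draw_ratio = match_stats(115, 81, 204)
print(f"score={score:.4f} elo={elo:+.1f} los={los:.1%} draws={draw_ratio:.1%}")
```

With 115 wins, 81 losses and 204 draws this should agree with the reported 29.6 Elo, 99.2 % LOS and 51.0 % draw ratio.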