Fedora 40 Looks To Ship AMD ROCm 6 For End-To-End Open-Source GPU Acceleration

ylai@lemmy.ml · 1 year ago

Fedora 40 Looks To Ship AMD ROCm 6 For End-To-End Open-Source GPU Acceleration

AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world · edit-2 1 year ago

HIP is amazing. For everyone saying “nah it can’t be the same, CUDA rulez”, just try it, it works on NVidia GPUs too (there are basically macros and stuff that remap everything to CUDA API calls) so if you code for HIP you’re basically targetting at least two GPU vendors. ROCm is the only framework that allows me to do GPGPU programming in CUDA style on a thin laptop sporting an AMD APU while still enjoying 6 to 8 hours of battery life when I don’t do GPU stuff. With CUDA, in terms of mobility, the only choices you get are a beefy and expensive gaming laptop with a pathetic battery life and heating issues, or a light laptop + SSHing into a server with an NVidia GPU.

Molecular0079@lemmy.world · 1 year ago

The problem with ROCm is that its very unstable and a ton of applications break on it. Darktable only renders half an image on my Radeon 680M laptop. HIP in Blender is also much slower than Optix. We’re still waiting on HIP-RT.

AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world · 1 year ago

ROCm is that its very unstable

That’s true, but ROCm does get better very quickly. Before last summer it was impossible for me to compile and run HIP code on my laptop, and then after one magic update everything worked. I can’t speak for rendering as that’s not my field, but I’ve done plenty of computational code with HIP and the performance was really good.

But my point was more about coding in HIP, not really about using stuff other people made with HIP. If you write your code with HIP in mind from the start, the results are usually good and you get good intuition about the hardware differences (warps for instance are of size 32 on NVidia but can be 32 or 64 on AMD and that makes a difference if your code makes use of warp intrinsics). If however you just use AMD’s CUDA-to-HIP porting tool, then yeah chances are things won’t work on the first run and you need to refine by hand, starting with all the implicit assumptions you made about how the NVidia hardware works.

filister@lemmy.world · 1 year ago

How is the situation with ROCm using consumer GPUs for AI/DL and pytorch? Is it usable or should I stick to NVIDIA? I am planning to buy a GPU in the next 2-3 months and so far I am thinking of getting either 7900XTX or the 4070 Ti Super, and wait to see how the reviews and the AMD pricing will progress.

AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world · edit-2 1 year ago

Works out of the box on my laptop (the export below is to force ROCm to accept my APU since it’s not officially supported yet, but the 7900XTX should have official support):

Last year only compiling and running your own kernels with hipcc worked on this same laptop, the AMD devs are really doing god’s work here.

filister@lemmy.world · edit-2 1 year ago

Anything that is still broken or works better on CUDA? It is really hard to get the whole picture on how things are on ROCm as the majority of people are not using it and in the past I did some tests and it wasn’t working well.

AlmightySnoo 🐢🇮🇱🇺🇦@lemmy.world · edit-2 1 year ago

Hard to tell as it’s really dependent on your use. I’m mostly writing my own kernels (so, as if you’re doing CUDA basically), and doing “scientific ML” (SciML) stuff that doesn’t need anything beyond doing backprop on stuff with matrix multiplications and elementwise nonlinearities and some convolutions, and so far everything works. If you want some specific simple examples from computer vision: ResNet18 and VGG19 work fine.