GPU Shaders: Difference between revisions

Rodrigo (talk | contribs)
Created page with "= Overview = Like most D3D11-era GPUs the Tegra X1 has 5 pipeline shader stages: Vertex, tessellation control, tessellation evaluation, geometry and fragment. Maxwell GPUs hav..."
 
No edit summary
 
(5 intermediate revisions by one other user not shown)
Line 13: Line 13:


== Registers ==
== Registers ==
Maxwell GPUs have 254 type-less general purpose registers and one special register with id 255, ''nvdisasm'' shows it as RZ and ''envydis'' as 0x0. Writing here is a no-op unless there are side effects. Reading from RZ is returns zero. The fewer registers a shader uses, the more it can be parallelized.
Maxwell GPUs have 255 typeless general purpose registers and one special register with id 255, which ''nvdisasm'' shows as RZ and ''envydis'' as 0x0. Writing to RZ is a no-op unless the instruction has side effects. Reading from RZ returns zero. The fewer registers a shader uses, the more it can be parallelized.
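The RZ behaviour described above can be modelled in a few lines. This is only an illustrative sketch: the class and method names are hypothetical, and the 32-bit truncation assumes the register width given in this section.

```python
NUM_GPRS = 255        # general purpose registers, ids 0..254
RZ = 255              # id of the special zero register
MASK32 = 0xFFFFFFFF   # registers are assumed to be 32 bits wide


class RegisterFile:
    """Toy model of the register file (hypothetical helper, not a real API)."""

    def __init__(self):
        self._gprs = [0] * NUM_GPRS

    def read(self, reg_id):
        # Reading RZ always yields zero.
        if reg_id == RZ:
            return 0
        return self._gprs[reg_id]

    def write(self, reg_id, value):
        # Writing RZ discards the value; only side effects of the
        # instruction (not modelled here) would remain.
        if reg_id == RZ:
            return
        self._gprs[reg_id] = value & MASK32


regs = RegisterFile()
regs.write(3, 0x1_2345_6789)   # truncated to the low 32 bits
regs.write(RZ, 42)             # silently dropped
```

In this model, using RZ as a destination is a way to run an instruction only for its side effects, and using it as a source supplies a constant zero without spending a register.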


General purpose registers or GPRs are 32 bits long and are the same for all operations, these are given meaning on the instructions. Half float instructions are SIMD and operate on 16 bit pairs, meanwhile double instructions take two registers to operate. uint64 instructions are read two subsequent registers but operate on individual uint32 values extending their domain through the carry flag.
General purpose registers (GPRs) are 32 bits wide and are the same for all operations; each instruction gives meaning to the bits it reads and writes. Half float instructions are SIMD and operate on pairs of 16-bit values, while double instructions take two registers per operand. uint64 values are emulated using uint32 instructions that extend their domain through condition codes; when an instruction has to read a uint64 value, it reads two consecutive registers.
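As a rough illustration of how a uint64 value can be handled as two uint32 halves, the sketch below adds two 64-bit values with a pair of 32-bit additions linked by a carry bit. The function names are hypothetical, and placing the low word in the first register of the pair is an assumption made for the example, not something stated above.

```python
MASK32 = 0xFFFFFFFF


def split_u64(value):
    """Split a 64-bit value into (low, high) 32-bit halves."""
    return value & MASK32, (value >> 32) & MASK32


def join_u64(lo, hi):
    """Rebuild a 64-bit value from its 32-bit halves."""
    return (hi << 32) | lo


def add_u64(a_lo, a_hi, b_lo, b_hi):
    """Add two uint64 values using only 32-bit additions and a carry."""
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32                  # 1 if the low addition overflowed
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32   # high addition consumes the carry
    return lo, hi


a = 0x0000_0001_FFFF_FFFF
b = 0x0000_0000_0000_0001
result = join_u64(*add_u64(*split_u64(a), *split_u64(b)))
```

The carry variable plays the role that the condition codes play on the hardware: it is the only state passed from the low-word addition to the high-word addition.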


It is a common technique to read or write subsequent registers with a single instruction. For example TEXS (used to sample a texture) reads the texture coordinates from Ra and Ra+1, although it gets more complex with other layouts like 3D textures (it's not Ra+2). The result of the sample is given in an Rd and their subsequent registers.
It is common for a single instruction to read or write consecutive registers. For example, TEXS (used to sample a texture) reads the texture coordinates from Ra and Ra+1, although it gets more complex with other layouts like 3D textures (it's not Ra+2). The result of the sample is written to Rd and the registers that follow it.
TODO Add nvdisasm example


== Predicates ==
== Predicates ==
There are 5 general purpose "1-bit" predicates. These can be used to conditionally execute a given instructions. There are 2 more predicates which evaluate as always true and always false.
There are 6 general purpose "1-bit" predicates. These can be used to conditionally execute a given instruction. There is an extra predicate that always evaluates as true (and as false when negated); writing to it is a no-op unless there are side effects. It is shown as "PT" in ''nvdisasm''.


Most of the time predicates can be negated.
Most of the time predicates can be negated.
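Predicated execution and negation can be sketched as follows. This is a simplified model, not ''nvdisasm'' output: the names P0 and run_if and the dictionary-based storage are hypothetical choices for the example, and only the behaviour of PT is taken from the description above.

```python
PT = "PT"  # the always-true predicate


class PredicateFile:
    """Toy model of the predicate registers (hypothetical helper)."""

    def __init__(self):
        self._bits = {}

    def read(self, name, negated=False):
        # PT always reads as true; other predicates default to false.
        value = True if name == PT else self._bits.get(name, False)
        return (not value) if negated else value

    def write(self, name, value):
        # Writes to PT are discarded.
        if name == PT:
            return
        self._bits[name] = bool(value)


def run_if(preds, guard, negated, action):
    """Execute action only when the (optionally negated) guard holds."""
    if preds.read(guard, negated):
        action()
        return True
    return False


preds = PredicateFile()
preds.write("P0", True)
preds.write(PT, False)   # no effect

log = []
run_if(preds, "P0", False, lambda: log.append("P0"))    # runs
run_if(preds, "P0", True, lambda: log.append("!P0"))    # skipped
run_if(preds, PT, False, lambda: log.append("PT"))      # always runs
run_if(preds, PT, True, lambda: log.append("!PT"))      # never runs
```

Guarding an instruction with PT is equivalent to running it unconditionally, which is why the model treats it as a constant rather than as storage.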
TODO Add nvdisasm example


== Condition codes ==
== Condition codes ==
Line 34: Line 30:


= Analysis Tools =
= Analysis Tools =
== Disassembling ==
== Disassembling ==
To disassemble a Maxwell shader or compute kernel there are two applications:
To disassemble a Maxwell shader or compute kernel there are two applications:
Line 61: Line 56:


Since it's the product of reverse engineering the GPU and ''nvdisasm'' itself, it might contain errors.
Since it's the product of reverse engineering the GPU and ''nvdisasm'' itself, it might contain errors.
== Dumping existing shaders ==
One method to dump shaders from commercial games and homebrew applications is to run them on an emulator.
Ryujinx[https://ryujinx.org/#/] offers an entry in its configuration file to dump the encountered shaders into two directories. ''Code'' contains shader dumps that are compatible with ''nvdisasm'' and ''envydis'' as they are. ''Full'' is the same as ''Code'' but includes the header; it is intended to be decompiled with ''Ryujinx.ShaderTools''.
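Since a ''Full'' dump is described as a ''Code'' dump plus a header, one can be derived from the other by dropping the header bytes. The sketch below assumes the header sits at the start of the file and takes its size as a parameter, because the size is not given here; the function name and the sample bytes are hypothetical.

```python
def strip_header(full_dump: bytes, header_size: int) -> bytes:
    """Return the code portion of a Full dump.

    header_size must be supplied by the caller; it is an assumption of
    this sketch, not a value taken from the emulator.
    """
    if header_size < 0 or header_size > len(full_dump):
        raise ValueError("header_size is outside the dump")
    return full_dump[header_size:]


# Made-up bytes standing in for a real dump: 4 header bytes, 8 code bytes.
fake_full = bytes([0xAA] * 4) + bytes(range(8))
fake_code = strip_header(fake_full, header_size=4)
```

The result could then be written to a file and passed to a disassembler in the same way as a ''Code'' dump.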
yuzu[https://yuzu-emu.org/] can dump shaders from its disk-based shader cache with an external tool. ''maxwell-dump''[https://gist.github.com/ReinUsesLisp/7ba72d3162e60cab283194fcca3474b2] is a small C application for this; it follows the same convention as Ryujinx for ''Code'' and ''Full''. To find the shader file dumped by the emulator, right-click the game in the Qt front-end and select the option to view the transferable cache file. Note that the file format might change in the future, leaving this tool unable to extract the binaries until it is updated.
= Maxwell Instruction Set Architecture =
TODO Define instructions