GPU Shaders: Difference between revisions
Created page with "= Overview = Like most D3D11-era GPUs the Tegra X1 has 5 pipeline shader stages: Vertex, tessellation control, tessellation evaluation, geometry and fragment. Maxwell GPUs hav..."
(5 intermediate revisions by one other user not shown)
Line 13:
== Registers ==
Maxwell GPUs have 255 type-less general purpose registers and one special register with id 255, which ''nvdisasm'' shows as RZ and ''envydis'' as 0x0. Reading from RZ returns zero, and writing to it is a no-op apart from any side effects of the instruction. The fewer registers a shader uses, the more threads the GPU can keep in flight at once.
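The behaviour of RZ can be illustrated with a small software model. This is only a sketch for emulator or tooling authors: the <code>RegisterFile</code> class and its method names are hypothetical and are not taken from any existing tool.

```python
RZ = 255  # register id 255 is the zero register (RZ in nvdisasm)


class RegisterFile:
    """Toy model: ids 0..254 are general purpose, id 255 always reads as zero."""

    def __init__(self):
        self.gprs = [0] * 255  # 255 general purpose 32-bit registers

    def read(self, reg):
        if reg == RZ:
            return 0  # reading RZ always yields zero
        return self.gprs[reg]

    def write(self, reg, value):
        if reg == RZ:
            return  # the value is discarded; instruction side effects are not modelled here
        self.gprs[reg] = value & 0xFFFFFFFF  # registers are 32 bits wide


regs = RegisterFile()
regs.write(3, 0x1234)
regs.write(RZ, 0xDEADBEEF)  # has no visible effect
```

After running this, <code>regs.read(3)</code> returns the stored value while <code>regs.read(RZ)</code> still returns zero.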
General purpose registers (GPRs) are 32 bits wide and untyped: the instruction that reads or writes a register decides how its contents are interpreted. Half float instructions are SIMD and operate on pairs of 16-bit values, while double instructions operate on two registers. uint64 values are emulated using uint32 instructions that extend their domain through condition codes; when an instruction has to read a uint64 value it reads two consecutive registers.
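As a rough illustration of how a 64-bit addition can be built from 32-bit operations, the sketch below models the carry as a plain integer standing in for a condition code. The helper names are hypothetical, and placing the low word in the lower-numbered register is an assumption of this example rather than something stated above.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b, carry_in=0):
    """32-bit add returning (result, carry_out); the carry stands in for a condition code."""
    total = a + b + carry_in
    return total & MASK32, total >> 32


def add64(regs, rd, ra, rb):
    """Add the uint64 held in regs[ra], regs[ra+1] to the one in regs[rb], regs[rb+1].

    Assumption: the low 32 bits live in the lower-numbered register of each pair.
    """
    lo, carry = add32(regs[ra], regs[rb])             # low halves, produces a carry
    hi, _ = add32(regs[ra + 1], regs[rb + 1], carry)  # high halves consume the carry
    regs[rd], regs[rd + 1] = lo, hi


regs = [0] * 8
regs[0], regs[1] = 0xFFFFFFFF, 0x00000001  # 0x1FFFFFFFF
regs[2], regs[3] = 0x00000001, 0x00000000  # 1
add64(regs, 4, 0, 2)
result = (regs[5] << 32) | regs[4]
```

The low-word addition overflows, so the carry propagates into the high word and <code>result</code> equals 0x200000000.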
It is a common technique to read or write consecutive registers with a single instruction. For example TEXS (used to sample a texture) reads the texture coordinates from Ra and Ra+1, although it gets more complex with other layouts like 3D textures (the third coordinate is not in Ra+2). The result of the sample is written to Rd and the registers that follow it.
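The sketch below shows the general idea of fetching an operand vector from consecutive registers and interpreting the raw 32-bit contents as floats. It only covers the simple two-coordinate case described above; <code>read_vector</code> and the other helpers are hypothetical names, and the list stands in for the register file.

```python
import struct


def bits_to_float(bits):
    """Reinterpret a 32-bit register value as an IEEE 754 single precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def float_to_bits(value):
    """Inverse of bits_to_float: the raw 32-bit pattern of a float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def read_vector(regs, base, count):
    """Read `count` consecutive registers starting at `base` and view them as floats."""
    return [bits_to_float(regs[base + i]) for i in range(count)]


regs = [0] * 16
regs[4] = float_to_bits(0.25)  # plays the role of Ra
regs[5] = float_to_bits(0.75)  # plays the role of Ra+1
u, v = read_vector(regs, 4, 2)
```

The registers themselves only hold bit patterns; it is the reading code (here <code>read_vector</code>, on hardware the instruction) that treats them as floating point coordinates.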
== Predicates ==
There are 7 general purpose "1-bit" predicates (P0 to P6). These can be used to conditionally execute a given instruction. There is an extra predicate that always evaluates as true (and false when negated); writing to it is a no-op apart from any side effects of the instruction. It is shown as "PT" in ''nvdisasm''.
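Predicated execution can be modelled in a few lines. This is a minimal sketch with hypothetical names (<code>PredicateFile</code>, <code>run_if</code>); it stores predicates in a dictionary and omits any range checking on predicate indices.

```python
PT = "PT"  # the predicate that always evaluates as true


class PredicateFile:
    """Toy model of 1-bit predicates plus the constant PT."""

    def __init__(self):
        self.bits = {}  # predicate index -> bool, unset predicates read as False

    def read(self, pred, negate=False):
        value = True if pred == PT else self.bits.get(pred, False)
        return (not value) if negate else value

    def write(self, pred, value):
        if pred != PT:  # writes to PT are discarded
            self.bits[pred] = bool(value)


def run_if(preds, pred, negate, action):
    """Run `action` only when the (optionally negated) guard predicate is true."""
    if preds.read(pred, negate):
        action()
        return True
    return False


preds = PredicateFile()
preds.write(0, 5 > 3)   # a comparison result stored in predicate 0
preds.write(PT, False)  # no effect
log = []
run_if(preds, 0, False, lambda: log.append("taken"))     # guard is true
run_if(preds, 0, True, lambda: log.append("skipped"))    # negated guard is false
run_if(preds, PT, False, lambda: log.append("always"))   # PT is always true
```

Only the first and third actions run, so <code>log</code> ends up as <code>["taken", "always"]</code>.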
In most cases a predicate operand can also be negated.
== Condition codes ==
Line 34: | Line 30:
= Analysis Tools =
== Disassembling ==
To disassemble a Maxwell shader or compute kernel there are two applications:
Line 61: | Line 56:
Since it's the product of reverse engineering the GPU and ''nvdisasm'' itself, it might contain errors.