GPU: Difference between revisions
this is good stuff |
No edit summary |
||
Line 1: | Line 1: | ||
==Mapping Memory== | == Mapping Memory == | ||
First, to map a memory region into the GPU Address Space, caching needs to be disabled by using [[SVC#svcSetMemoryAttribute|svcSetMemoryAttribute]]. The Address passed is the Virtual Address of the region that will be mapped, the size is the region size, and State0/1 are both set to 8 to disable caching of the memory region. This ensures that the GPU can actually "see" the data written there, rather than the data remaining in a CPU cache. | First, to map a memory region into the GPU Address Space, caching needs to be disabled by using [[SVC#svcSetMemoryAttribute|svcSetMemoryAttribute]]. The Address passed is the Virtual Address of the region that will be mapped, the size is the region size, and State0/1 are both set to 8 to disable caching of the memory region. This ensures that the GPU can actually "see" the data written there, rather than the data remaining in a CPU cache. | ||
Line 7: | Line 7: | ||
The above process is used to map all data that will be used by the GPU, like Textures, Command Lists (a.k.a. Push Buffers), Vertex/Index buffers and Shaders. They usually have their own mapping, but Command Lists can share the same mapping. | The above process is used to map all data that will be used by the GPU, like Textures, Command Lists (a.k.a. Push Buffers), Vertex/Index buffers and Shaders. They usually have their own mapping, but Command Lists can share the same mapping. | ||
==Commands | == FIFO Commands == | ||
The GPU implements a variation of Tegra's push buffer format for its PFIFO engine. PFIFO is a special engine responsible for receiving user command lists and routing them to the appropriate engines (2D, 3D, DMA). | |||
Commands are submitted to the GPU's PFIFO engine through [[NV_services#NVGPU_IOCTL_CHANNEL_SUBMIT_GPFIFO|NVGPU_IOCTL_CHANNEL_SUBMIT_GPFIFO]]. | |||
This ioctl takes an array of gpfifo entries where each entry points to a FIFO command list. This list is composed of alternating 32-bit words containing FIFO commands and their respective arguments. | |||
=== Command Structure === | |||
{| class="wikitable" | {| class="wikitable" | ||
Line 19: | Line 23: | ||
|- | |- | ||
|12-0 | |12-0 | ||
| | |Method | ||
|- | |- | ||
|15-13 | |15-13 | ||
| | |Subchannel | ||
|- | |- | ||
|28-16 | |28-16 | ||
| | |Argument count (in 32-bit words) or inline data (see below) | ||
|- | |- | ||
|31-29 | |31-29 | ||
| | |[[#Submission_mode|Submission mode]] | ||
|} | |} | ||
==== | Note: Methods are treated as 4-byte addressable locations, and hence their numbers are written down multiplied by 4. | ||
Note: The command's arguments, when present, follow the command word immediately. | |||
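The bit layout in the table above can be sketched as a small packing helper. This is only an illustration based on the field positions listed on this page; the function names <code>pack_command</code> and <code>build_command</code> are made up for the example and do not come from any official code.

```python
# Field widths taken from the Command Structure table on this page.
METHOD_BITS = 13      # bits 12-0
SUBCHANNEL_BITS = 3   # bits 15-13
COUNT_BITS = 13       # bits 28-16 (argument count or inline data)
MODE_BITS = 3         # bits 31-29 (submission mode)


def pack_command(method, subchannel, count_or_data, mode):
    """Pack the four fields into one 32-bit command word."""
    fields = (
        ("method", method, METHOD_BITS),
        ("subchannel", subchannel, SUBCHANNEL_BITS),
        ("count_or_data", count_or_data, COUNT_BITS),
        ("mode", mode, MODE_BITS),
    )
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits: {value:#x}")
    return (mode << 29) | (count_or_data << 16) | (subchannel << 13) | method


def build_command(method, subchannel, mode, args):
    """Return the command word followed immediately by its arguments."""
    words = [pack_command(method, subchannel, len(args), mode)]
    words.extend(arg & 0xFFFFFFFF for arg in args)
    return words
```

For example, <code>pack_command(0xE30, 0, 3, 5)</code> gives 0xA0030E30.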
==== Submission mode ==== | |||
{| class="wikitable" | {| class="wikitable" | ||
Line 38: | Line 46: | ||
! scope="col"| Description | ! scope="col"| Description | ||
! scope="col"| Official name | ! scope="col"| Official name | ||
|- | |||
|0 | |||
|Increasing mode (old) | |||
| | |||
|- | |- | ||
|1 | |1 | ||
| | |Increasing mode - Tells PFIFO to read as many arguments as specified by '''argument count''', while automatically incrementing the '''method''' value. This means that each argument will be written to a different method location. | ||
|INCR | |INCR | ||
|- | |||
|2 | |||
|Non-increasing mode (old) | |||
| | |||
|- | |- | ||
|3 | |3 | ||
| | |Non-increasing mode - Tells PFIFO to read as many arguments as specified by '''argument count'''. However, all arguments will be written to the same method location. | ||
|NONINCR | |NONINCR | ||
|- | |- | ||
|4 | |4 | ||
|Inline | |Inline mode - Tells PFIFO to read '''inline data''' from bits 28-16 of the command word, thus eliminating the need to pass additional words for the arguments. | ||
|IMM | |IMM | ||
|- | |- | ||
|5 | |5 | ||
| | |Increase-once mode - Tells PFIFO to read as many arguments as specified by '''argument count''' and automatically increments the '''method''' value once only. | ||
| | | | ||
|} | |} | ||
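To make the difference between the modes concrete, the sketch below expands a command list into the individual method writes that would result. It is a software model written for this page, not PFIFO's actual implementation. The increase-once handling (first argument to the method, all remaining arguments to method + 1) is an interpretation of the description above, and the old modes 0 and 2 are rejected because their behaviour is not described here.

```python
INCR, NONINCR, IMM, INCR_ONCE = 1, 3, 4, 5


def unpack_command(word):
    """Split a 32-bit command word into (method, subchannel, count, mode)."""
    return (word & 0x1FFF, (word >> 13) & 0x7,
            (word >> 16) & 0x1FFF, (word >> 29) & 0x7)


def expand(words):
    """Return the list of (subchannel, method, value) writes for a command list."""
    writes = []
    i = 0
    while i < len(words):
        method, subchannel, count, mode = unpack_command(words[i])
        i += 1
        if mode == IMM:
            # Inline mode: bits 28-16 carry the data, no argument words follow.
            writes.append((subchannel, method, count))
            continue
        if mode not in (INCR, NONINCR, INCR_ONCE):
            raise ValueError(f"mode {mode} is not modelled in this sketch")
        args = words[i:i + count]
        if len(args) != count:
            raise ValueError("command list ends before all arguments were read")
        i += count
        for n, value in enumerate(args):
            if mode == INCR:
                target = method + n          # one method per argument
            elif mode == NONINCR:
                target = method              # same method every time
            else:
                target = method + min(n, 1)  # increment once, then stay
            writes.append((subchannel, target, value))
    return writes
```

For example, <code>expand([0x200204EC, 10, 20])</code> writes 10 to method 0x4EC and 20 to method 0x4ED.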
=== Command List === | |||
All methods with values < 0x100 are special and executed by the PFIFO's DMA puller. The others are forwarded to the engine object currently bound to a given subchannel. | |||
{| class="wikitable" border="1" | |||
|- | |||
! Command || Method || Subchannel || Arg Count || Mode || Name | |||
|- | |||
| 0x2001?000 || 0x000 || Variable || 1 || 1 || [[#BindObject|BindObject]] | |||
|- | |||
| 0xA0020E00 || 0xE00 || 0 || 2 || 5 || BeginTransformFeedback | |||
|- | |||
| 0xA0030E30 || 0xE30 || 0 || 3 || 5 || DrawArrays | |||
|- | |||
| 0xA0050E36 || 0xE36 || 0 || 5 || 5 || DrawElements | |||
|- | |||
| 0xA0020E2E || 0xE2E || 0 || 2 || 5 || PopDebugGroupId | |||
|- | |||
| 0xA0040E2C || 0xE2C || 0 || 4 || 5 || PushDebugGroup | |||
|- | |||
| 0x2001054C || 0x54C || 0 || 1 || 1 || ResetCounter | |||
|- | |||
| 0x8001047F || 0x47F || 0 || 1 || 4 || ResolveDepthBuffer | |||
|- | |||
| 0x200104C4 || 0x4C4 || 0 || 1 || 1 || SetAlphaRef | |||
|- | |||
| 0x200404C7 || 0x4C7 || 0 || 4 || 1 || SetBlendColor | |||
|- | |||
| 0x2001064F || 0x64F || 0 || 1 || 1 || SetDepthClamp | |||
|- | |||
| 0x200200CD || 0xCD || 0 || 2 || 1 || SetInnerTessellationLevels | |||
|- | |||
| 0x200204EC || 0x4EC || 0 || 2 || 1 || SetLineWidth | |||
|- | |||
| 0x200400C9 || 0xC9 || 0 || 4 || 1 || SetOuterTessellationLevels | |||
|- | |||
| 0x8???0373 || 0x373 || 0 || Variable || 4 || SetPatchSize | |||
|- | |||
| 0x20010546 || 0x546 || 0 || 1 || 1 || SetPointSize | |||
|- | |||
| 0x20030554 || 0x554 || 0 || 3 || 1 || SetRenderEnableConditional | |||
|- | |||
| 0x200403EF || 0x3EF || 0 || 4 || 1 || SetSampleMask | |||
|- | |||
| 0x200103D9 || 0x3D9 || 0 || 1 || 1 || SetTiledCacheTileSize | |||
|} | |||
Note: These still need to be heavily verified and ''could'' be wrong. | |||
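A quick way to catch transcription errors in a table like this is to decode each command word and compare the result with the other columns. The sketch below does that for a handful of rows copied from the table. It only checks that a row is internally consistent; it says nothing about whether the method really does what the name suggests. The names <code>ROWS</code>, <code>decode</code> and <code>mismatches</code> are invented for the example.

```python
# (command word, method, subchannel, arg count, mode, name), copied from the table.
ROWS = [
    (0xA0020E00, 0xE00, 0, 2, 5, "BeginTransformFeedback"),
    (0xA0030E30, 0xE30, 0, 3, 5, "DrawArrays"),
    (0x2001054C, 0x54C, 0, 1, 1, "ResetCounter"),
    (0x8001047F, 0x47F, 0, 1, 4, "ResolveDepthBuffer"),
    (0x200404C7, 0x4C7, 0, 4, 1, "SetBlendColor"),
    (0x200200CD, 0x0CD, 0, 2, 1, "SetInnerTessellationLevels"),
]


def decode(word):
    """Return (method, subchannel, count, mode) for a 32-bit command word."""
    return (word & 0x1FFF, (word >> 13) & 0x7,
            (word >> 16) & 0x1FFF, (word >> 29) & 0x7)


def mismatches(rows):
    """Return the names of rows whose columns disagree with the command word."""
    return [name for word, method, subchannel, count, mode, name in rows
            if decode(word) != (method, subchannel, count, mode)]
```

An empty list from <code>mismatches(ROWS)</code> means every listed row agrees with its command word.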
=== BindObject === | |||
In order to bind an engine object to a specific subchannel, method 0 (BindObject) must be used first. The target subchannel is specified in bits 15-13 of the command word. | |||
After the engine object is bound to the desired subchannel, setting its value in bits 15-13 of any subsequent command word will make PFIFO forward the command to the target engine. | |||
This method only takes one argument, an [[#Engine_IDs|engine ID]]. | |||
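As a rough illustration, a BindObject command is two words: the command word (method 0, increasing mode, one argument, target subchannel in bits 15-13) followed by the engine ID. The helper below is a sketch under those assumptions. The dictionary keys are shorthand chosen for this example, and the values are the engine IDs listed on this page.

```python
# Engine IDs as listed on this page; the keys are shorthand for this example only.
ENGINE_IDS = {
    "2D": 0x902D,
    "3D": 0xB197,
    "COMPUTE": 0xB1C0,
    "INLINE_TO_MEMORY": 0xA140,
    "DMA": 0xB0B5,
}

BIND_OBJECT_METHOD = 0x000
MODE_INCREASING = 1


def bind_object(subchannel, engine_id):
    """Return the two words that bind engine_id to the given subchannel."""
    if not 0 <= subchannel <= 7:
        raise ValueError("subchannel must fit in 3 bits")
    command = ((MODE_INCREASING << 29) | (1 << 16)
               | (subchannel << 13) | BIND_OBJECT_METHOD)
    return [command, engine_id]
```

Binding the 3D engine to subchannel 0 with <code>bind_object(0, ENGINE_IDS["3D"])</code> yields the words 0x20010000 and 0xB197, which matches the 0x2001?000 pattern from the command list.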
====Engine | ==== Engine IDs ==== | ||
{| class="wikitable" | {| class="wikitable" | ||
|- | |- | ||
! scope="col"| | ! scope="col"| ID | ||
! scope="col"| Engine | ! scope="col"| Engine | ||
|- | |- | ||
| | |0x902D | ||
|2D | |FERMI_TWOD_A (2D) | ||
|- | |- | ||
| | |0xB197 | ||
|3D | |MAXWELL_B (3D) | ||
|- | |- | ||
| | |0xB1C0 | ||
| | |MAXWELL_COMPUTE_B | ||
|- | |- | ||
| | |0xA140 | ||
| | |KEPLER_INLINE_TO_MEMORY_B | ||
|- | |- | ||
| | |0xB0B5 | ||
|DMA | |MAXWELL_DMA_COPY_A (DMA) | ||
|} | |} | ||
=== Fences === | |||
==Fences== | |||
Command | Command lists can contain fences to ensure that commands are executed in the correct order and that subsequent commands are only sent once the previously sent commands have been processed by the GPU. Fences use the QUERY_* commands and work as follows: | ||
* First, QUERY_ADDRESS_HIGH and QUERY_ADDRESS_LOW commands are added to the Command List, carrying the high and low 32 bits of the 64-bit GPU Virtual Address where the fence is located. This GPU Virtual Address needs to be mapped to the process Virtual Address beforehand. | * First, QUERY_ADDRESS_HIGH and QUERY_ADDRESS_LOW commands are added to the Command List, carrying the high and low 32 bits of the 64-bit GPU Virtual Address where the fence is located. This GPU Virtual Address needs to be mapped to the process Virtual Address beforehand. | ||
Line 103: | Line 160: | ||
* Finally, QUERY_GET is added and contains the mode and other unknown data. | * Finally, QUERY_GET is added and contains the mode and other unknown data. | ||
The above commands are added using the | The above commands are added using the [[#Submission_mode|increasing mode]], since the IDs of all four registers are sequential. | ||
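Because the four registers are sequential, a fence can be emitted as a single increasing-mode command with four arguments. The sketch below assumes the register order is address high, address low, sequence, then QUERY_GET, and it takes the method number of QUERY_ADDRESS_HIGH as a parameter rather than hard-coding it. The function name and parameter names are hypothetical.

```python
MODE_INCREASING = 1


def fence_commands(query_address_high_method, gpu_va, sequence, query_get,
                   subchannel=0):
    """Return one increasing-mode command that writes the four QUERY_* registers.

    Assumed register order: ADDRESS_HIGH, ADDRESS_LOW, SEQUENCE, GET.
    """
    if not 0 <= gpu_va < (1 << 64):
        raise ValueError("GPU virtual address must fit in 64 bits")
    if not 0 <= query_address_high_method < (1 << 13):
        raise ValueError("method must fit in 13 bits")
    if not 0 <= subchannel <= 7:
        raise ValueError("subchannel must fit in 3 bits")
    header = ((MODE_INCREASING << 29) | (4 << 16)
              | (subchannel << 13) | query_address_high_method)
    return [
        header,
        (gpu_va >> 32) & 0xFFFFFFFF,   # high 32 bits of the fence address
        gpu_va & 0xFFFFFFFF,           # low 32 bits of the fence address
        sequence & 0xFFFFFFFF,         # value the GPU writes at the fence
        query_get & 0xFFFFFFFF,        # mode and other bits
    ]
```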
====QUERY_GET Structure==== | ==== QUERY_GET Structure ==== | ||
{| class="wikitable" | {| class="wikitable" | ||
Line 122: | Line 179: | ||
|} | |} | ||
====QUERY_GET Mode==== | ==== QUERY_GET Mode ==== | ||
{| class="wikitable" | {| class="wikitable" | ||
Line 149: | Line 206: | ||
On the CPU side, the game code should wait until the value at the address pointed to by QUERY_ADDRESS is greater than or equal to the last written SEQUENCE value. Official code waits for this condition in a loop and does not send any further commands until it holds. | On the CPU side, the game code should wait until the value at the address pointed to by QUERY_ADDRESS is greater than or equal to the last written SEQUENCE value. Official code waits for this condition in a loop and does not send any further commands until it holds. | ||
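The CPU-side wait can be modelled as a simple polling loop. In the sketch below, <code>read_fence</code> is a hypothetical callable standing in for a read of the mapped fence location. The timeout is an addition for safety in this example and is not something the official code is known to do. Sequence wrap-around is not handled.

```python
import time


def wait_for_fence(read_fence, sequence, timeout_s=1.0, poll_interval_s=0.0):
    """Poll until read_fence() >= sequence; return False if the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if read_fence() >= sequence:
            return True
        if time.monotonic() >= deadline:
            return False
        if poll_interval_s:
            time.sleep(poll_interval_s)
```

A caller would only submit the next command list after this returns <code>True</code>.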
==Vertex Data Submission== | == Vertex Data Submission == | ||
Note: This is an observation on how the game Puyo Puyo Tetris sends textured squares to the GPU. | Note: This is an observation on how the game Puyo Puyo Tetris sends textured squares to the GPU. | ||
Line 162: | Line 219: | ||
# VERTEX_END_GL is used with value 0 (currently unknown what this value means). | # VERTEX_END_GL is used with value 0 (currently unknown what this value means). | ||
== References == | |||
==References== | |||
FIFO engine overview: | |||
[https://envytools.readthedocs.io/en/latest/hw/fifo/intro.html] | |||
Method values from the Fermi family GPU (a bit older than the Tegra X1, but values seem to be mostly the same): | |||
[https://github.com/envytools/envytools/blob/master/rnndb/graph/gf100_3d.xml] | [https://github.com/envytools/envytools/blob/master/rnndb/graph/gf100_3d.xml] | ||
Line 216: | Line 230: | ||
[https://github.com/envytools/envytools/blob/master/rnndb/graph/nv_3ddefs.xml] | [https://github.com/envytools/envytools/blob/master/rnndb/graph/nv_3ddefs.xml] | ||
Command | Command word packing code used in Mesa3D: | ||
[https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_winsys.h] | [https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/nvc0/nvc0_winsys.h] |