Line 1: |
Line 1: |
− | == Mapping Memory == | + | = Classes = |
| + | See [[GPU_Classes|GPU Classes]]. |
| | | |
| + | = Mapping Memory = |
| First, to map a memory region on the GPU Address Space, caching needs to be disabled by using [[SVC#svcSetMemoryAttribute|svcSetMemoryAttribute]]. The Address passed is the Virtual Address of the region that will be mapped, the size is the region size, and State0/1 are both set to 8 to disable caching of the memory region. This ensures that the GPU can actually "see" the data written there, rather than the data staying behind in a cache. | | First, to map a memory region on the GPU Address Space, caching needs to be disabled by using [[SVC#svcSetMemoryAttribute|svcSetMemoryAttribute]]. The Address passed is the Virtual Address of the region that will be mapped, the size is the region size, and State0/1 are both set to 8 to disable caching of the memory region. This ensures that the GPU can actually "see" the data written there, rather than the data staying behind in a cache. |
| | | |
Line 7: |
Line 9: |
| The above process is used to map all data that will be used by the GPU, like Textures, Command Lists (a.k.a. Push Buffers), Vertex/Index buffers and Shaders. They usually have their own mapping, but Command Lists can share the same mapping. | | The above process is used to map all data that will be used by the GPU, like Textures, Command Lists (a.k.a. Push Buffers), Vertex/Index buffers and Shaders. They usually have their own mapping, but Command Lists can share the same mapping. |
| | | |
− | == FIFO Commands ==
| + | = FIFO Commands = |
− | + | The GPU uses Nvidia's push buffer format for its PFIFO engine. PFIFO is a special engine responsible for receiving user command lists and routing them to the appropriate engines (2D, 3D, DMA). |
− | The GPU implements a variation of Tegra's push buffer format for its PFIFO engine. PFIFO is a special engine responsible for receiving user command lists and routing them to the appropriate engines (2D, 3D, DMA). | |
| | | |
| Commands are submitted to the GPU's PFIFO engine through [[NV_services#NVGPU_IOCTL_CHANNEL_SUBMIT_GPFIFO|NVGPU_IOCTL_CHANNEL_SUBMIT_GPFIFO]]. | | Commands are submitted to the GPU's PFIFO engine through [[NV_services#NVGPU_IOCTL_CHANNEL_SUBMIT_GPFIFO|NVGPU_IOCTL_CHANNEL_SUBMIT_GPFIFO]]. |
Line 15: |
Line 16: |
| This ioctl takes an array of gpfifo entries where each entry points to a FIFO command list. This list is composed of alternating 32-bit words containing FIFO commands and their respective arguments. | | This ioctl takes an array of gpfifo entries where each entry points to a FIFO command list. This list is composed of alternating 32-bit words containing FIFO commands and their respective arguments. |
| | | |
− | See the [[GPU|GPU]] page for a list of commands, with the register addresses and their descriptions.
| + | == Command Structure == |
− | | |
− | === Command Structure ===
| |
− | | |
| {| class="wikitable" | | {| class="wikitable" |
| |- | | |- |
Line 44: |
Line 42: |
| Note: The command's arguments, when present, follow the command word immediately. | | Note: The command's arguments, when present, follow the command word immediately. |
| | | |
− | ==== Submission mode ====
| + | === Submission mode === |
− | | |
| {| class="wikitable" | | {| class="wikitable" |
| |- | | |- |
Line 77: |
Line 74: |
| |} | | |} |
| | | |
− | === SetObject ===
| + | == SetObject == |
− | | |
| In order to bind an engine object to a specific subchannel, method 0 (SetObject) must be used first. The target subchannel is specified in bits 15-13 of the command word. | | In order to bind an engine object to a specific subchannel, method 0 (SetObject) must be used first. The target subchannel is specified in bits 15-13 of the command word. |
| | | |
| After the engine object is bound to the desired subchannel, setting its value in bits 15-13 of any subsequent command word will make PFIFO forward the command to the target engine. | | After the engine object is bound to the desired subchannel, setting its value in bits 15-13 of any subsequent command word will make PFIFO forward the command to the target engine. |
| | | |
− | This method only takes one argument, an [[#Engine_IDs|Engine ID]]. | + | This method only takes one argument, a [[#GPU_Classes|GPU Class ID]]. |
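The SetObject binding described above can be sketched as follows. Only the subchannel position (bits 15-13) and the use of method 0 come from the text; the positions of the method, argument count and submission mode fields, and the default mode value, are assumptions made for illustration. The class ID 0xB197 is the value the Engine IDs table in this diff lists for MAXWELL_B (3D).

```python
# Subchannel position (bits 15-13) is taken from the text above.
SUBCHANNEL_SHIFT = 13
SUBCHANNEL_MASK = 0x7

# ASSUMPTIONS for illustration only: field positions not stated in this section.
METHOD_MASK = 0x1FFF       # assumed: method in bits 12-0
ARG_COUNT_SHIFT = 16       # assumed: argument count in bits 28-16
ARG_COUNT_MASK = 0x1FFF
MODE_SHIFT = 29            # assumed: submission mode in bits 31-29
MODE_MASK = 0x7

SET_OBJECT_METHOD = 0      # from the text: method 0 is SetObject


def make_command_word(method, subchannel, arg_count, mode):
    """Pack one 32-bit command word from its fields."""
    if not 0 <= method <= METHOD_MASK:
        raise ValueError("method out of range")
    if not 0 <= subchannel <= SUBCHANNEL_MASK:
        raise ValueError("subchannel out of range")
    if not 0 <= arg_count <= ARG_COUNT_MASK:
        raise ValueError("argument count out of range")
    if not 0 <= mode <= MODE_MASK:
        raise ValueError("submission mode out of range")
    return ((mode << MODE_SHIFT)
            | (arg_count << ARG_COUNT_SHIFT)
            | (subchannel << SUBCHANNEL_SHIFT)
            | method)


def bind_object(subchannel, class_id, mode=1):
    """Return the two words that bind class_id to subchannel.

    The command word is followed immediately by its single argument.
    The default mode value of 1 is an assumption.
    """
    return [make_command_word(SET_OBJECT_METHOD, subchannel, 1, mode), class_id]


def subchannel_of(command_word):
    """Extract the subchannel (bits 15-13) from a command word."""
    return (command_word >> SUBCHANNEL_SHIFT) & SUBCHANNEL_MASK
```

For example, `bind_object(0, 0xB197)` yields a command word followed by the class ID, and later command words built with `subchannel=0` would then be routed to that engine.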
− | | |
− | ==== Engine IDs ====
| |
− | | |
− | {| class="wikitable"
| |
− | |-
| |
− | ! scope="col"| ID
| |
− | ! scope="col"| Engine
| |
− | |-
| |
− | |0x902D
| |
− | |FERMI_TWOD_A (2D)
| |
− | |-
| |
− | |0xB197
| |
− | |MAXWELL_B (3D)
| |
− | |-
| |
− | |0xB1C0
| |
− | |MAXWELL_COMPUTE_B
| |
− | |-
| |
− | |0xA140
| |
− | |KEPLER_INLINE_TO_MEMORY_B
| |
− | |-
| |
− | |0xB0B5
| |
− | |MAXWELL_DMA_COPY_A (DMA)
| |
− | |}
| |
− | | |
− | === Macro ===
| |
| | | |
| + | == Macro == |
| Macros are small programs that can be uploaded to the GPU and are capable of reading and writing the 3D engine registers. Macros also accept parameters, which are stored in a FIFO. Macros are called using methods starting at 0xe00, where the first method triggers the macro execution and the second one pushes parameters to the FIFO. The macro program reads them with an instruction called ''parm'', which pops the FIFO and returns the next parameter, allowing programs to use a variable number of parameters if desired. | | Macros are small programs that can be uploaded to the GPU and are capable of reading and writing the 3D engine registers. Macros also accept parameters, which are stored in a FIFO. Macros are called using methods starting at 0xe00, where the first method triggers the macro execution and the second one pushes parameters to the FIFO. The macro program reads them with an instruction called ''parm'', which pops the FIFO and returns the next parameter, allowing programs to use a variable number of parameters if desired. |
| | | |
Line 116: |
Line 88: |
| Official games use those macros to conditionally write registers. One example is the macro at 0xe24, which is used to set shader registers (including the shader address and binding the c1 Constant Buffer to the shader). In some cases, macros are also used to set registers unconditionally. | | Official games use those macros to conditionally write registers. One example is the macro at 0xe24, which is used to set shader registers (including the shader address and binding the c1 Constant Buffer to the shader). In some cases, macros are also used to set registers unconditionally. |
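The method numbering for macros can be sketched as below. The base of 0xe00 and the call/parameter pairing come from the text; treating every macro slot as exactly two consecutive methods (so that 0xe24 is the call method of slot 18) is an assumption, and the helper names are hypothetical.

```python
MACRO_METHODS_BASE = 0xE00  # from the text: macro methods start here


def macro_methods(macro_index):
    """Return (call_method, param_method) for a macro slot.

    ASSUMPTION: each slot owns two consecutive methods, the first
    triggering execution and the second pushing a parameter.
    """
    if macro_index < 0:
        raise ValueError("macro index must be non-negative")
    call_method = MACRO_METHODS_BASE + 2 * macro_index
    return call_method, call_method + 1


def decode_macro_method(method):
    """Return (macro_index, is_param_push) for a macro method number."""
    offset = method - MACRO_METHODS_BASE
    if offset < 0:
        raise ValueError("not a macro method")
    return offset >> 1, bool(offset & 1)
```

Under that assumption, `decode_macro_method(0xE24)` reports slot 18 with the parameter flag unset, which matches the text's description of 0xe24 as a macro entry point.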
| | | |
− | === Fences ===
| + | == Fences == |
− | | |
| Command lists can contain fences to ensure that commands are executed in the correct order, and that subsequent commands are only sent once the previously sent commands have been processed by the GPU. Fences use the ReportSemaphore* registers, and work like this: | | Command lists can contain fences to ensure that commands are executed in the correct order, and that subsequent commands are only sent once the previously sent commands have been processed by the GPU. Fences use the ReportSemaphore* registers, and work like this: |
| | | |
Line 129: |
Line 100: |
| On the CPU side, the game code should wait until the value at the address pointed to by ReportSemaphoreOffset is greater than or equal to the last written value. Official code waits for this condition in a loop, and won't send any further commands until it holds. | | On the CPU side, the game code should wait until the value at the address pointed to by ReportSemaphoreOffset is greater than or equal to the last written value. Official code waits for this condition in a loop, and won't send any further commands until it holds. |
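The CPU-side wait can be sketched as a simple polling loop. This is only an illustration: `read_semaphore` is a hypothetical callable standing in for a read of the memory pointed to by ReportSemaphoreOffset, the timeout is an addition that the text does not describe, and counter wraparound is not handled.

```python
import time


def wait_for_fence(read_semaphore, expected, timeout=1.0, poll_interval=0.0):
    """Poll until the fence value is >= expected.

    read_semaphore: hypothetical callable returning the current value
        stored at the fence address.
    expected: the last value that the command list asked the GPU to write.
    Returns True once the condition holds, False if the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        if read_semaphore() >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        if poll_interval > 0:
            time.sleep(poll_interval)
```

A caller would submit a command list ending in a fence write, then call `wait_for_fence` with the same value before submitting anything further.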
| | | |
− | == Vertex Data Submission ==
| + | = Vertex Data Submission = |
− | | |
| Note: This is an observation of how the game Puyo Puyo Tetris sends textured squares to the GPU. | | Note: This is an observation of how the game Puyo Puyo Tetris sends textured squares to the GPU. |
| | | |
Line 142: |
Line 112: |
| # VERTEX_END_GL is used with value 0 (currently unknown what this value means). | | # VERTEX_END_GL is used with value 0 (currently unknown what this value means). |
| | | |
− | == Texture View ==
| + | = Texture View = |
− | | |
| Texture information such as address, format and size is sent to the GPU through a structure known as Texture View (a.k.a. Texture Image Control, or TIC). Each texture that the game uses needs a separate TIC, and those TICs are written to a table, one after the other. Each [[#TIC_Structure|TIC entry]] is 0x20 bytes long, and is composed of eight 32-bit words where the texture information is packed. | | Texture information such as address, format and size is sent to the GPU through a structure known as Texture View (a.k.a. Texture Image Control, or TIC). Each texture that the game uses needs a separate TIC, and those TICs are written to a table, one after the other. Each [[#TIC_Structure|TIC entry]] is 0x20 bytes long, and is composed of eight 32-bit words where the texture information is packed. |
| | | |
Line 160: |
Line 129: |
| The texture is accessed in the shader using one of the texture sampling instructions (usually the TEXS instruction). One of the parameters for this instruction is the ''Handle'' index. This index starts at 8, so index 8 will access the handle at 8 * 4 = 0x20 in the ''Texture Constant Buffer''. Each shader stage has a separate Constant Buffer, so for fragment shaders, the handle is located at CB_ADDRESS + 4 * CB_SIZE + TEXS_index * 4 (where the first 4 is the index of the fragment shader stage, and the second 4 is the size of a word, 4 bytes). | | The texture is accessed in the shader using one of the texture sampling instructions (usually the TEXS instruction). One of the parameters for this instruction is the ''Handle'' index. This index starts at 8, so index 8 will access the handle at 8 * 4 = 0x20 in the ''Texture Constant Buffer''. Each shader stage has a separate Constant Buffer, so for fragment shaders, the handle is located at CB_ADDRESS + 4 * CB_SIZE + TEXS_index * 4 (where the first 4 is the index of the fragment shader stage, and the second 4 is the size of a word, 4 bytes). |
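The address arithmetic above can be written out as a short sketch. The formula, the fragment stage index of 4 and the starting handle index of 8 come from the text; the function names and the sample values in the usage note are hypothetical.

```python
WORD_SIZE = 4             # from the text: a handle is one 4-byte word
FRAGMENT_STAGE_INDEX = 4  # from the text: index of the fragment shader stage
FIRST_HANDLE_INDEX = 8    # from the text: handle indices start at 8


def handle_offset(texs_index):
    """Byte offset of a handle inside one stage's Texture Constant Buffer."""
    if texs_index < FIRST_HANDLE_INDEX:
        raise ValueError("handle indices start at 8")
    return texs_index * WORD_SIZE


def handle_address(cb_address, cb_size, texs_index,
                   stage_index=FRAGMENT_STAGE_INDEX):
    """CB_ADDRESS + stage_index * CB_SIZE + TEXS_index * 4."""
    return cb_address + stage_index * cb_size + handle_offset(texs_index)
```

With made-up values `cb_address=0x10000` and `cb_size=0x1000`, the handle for index 8 in the fragment stage would be at 0x10000 + 4 * 0x1000 + 0x20.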
| | | |
− | === TIC Structure ===
| + | == TIC Structure == |
− | | |
| {| class="wikitable" border="1" | | {| class="wikitable" border="1" |
| |- | | |- |
Line 185: |
Line 153: |
| |} | | |} |
| | | |
− | ==== Channel Data Type ====
| + | === Channel Data Type === |
− | | |
| {| class="wikitable" border="1" | | {| class="wikitable" border="1" |
| |- | | |- |
Line 206: |
Line 173: |
| |} | | |} |
| | | |
− | == References ==
| + | = References = |
− | | |
| FIFO engine overview: | | FIFO engine overview: |
| [https://envytools.readthedocs.io/en/latest/hw/fifo/intro.html] | | [https://envytools.readthedocs.io/en/latest/hw/fifo/intro.html] |