= Classes =

See [[GPU_Classes|GPU Classes]].

= Mapping Memory =

First, to map a memory region into the GPU address space, caching must be disabled for it using [[SVC#svcSetMemoryAttribute|svcSetMemoryAttribute]]. The address passed is the virtual address of the region to be mapped, the size is the region size, and State0/State1 are both set to 8 to disable caching of the region. This ensures that the GPU actually "sees" the data written there, instead of that data staying stuck in a CPU cache.
 
The above process is used to map all data the GPU will use, such as textures, command lists (a.k.a. push buffers), vertex/index buffers and shaders. Each of these usually gets its own mapping, although command lists can share a single mapping.
 
= FIFO Commands =

The GPU uses Nvidia's push buffer format for its PFIFO engine. PFIFO is a special engine responsible for receiving user command lists and routing them to the appropriate engines (2D, 3D, DMA).
      
Commands are submitted to the GPU's PFIFO engine through [[NV_services#NVGPU_IOCTL_CHANNEL_SUBMIT_GPFIFO|NVGPU_IOCTL_CHANNEL_SUBMIT_GPFIFO]].
 
This ioctl takes an array of gpfifo entries, where each entry points to a FIFO command list. A command list is a sequence of 32-bit words in which each command word is followed by its arguments, if it has any.
 
== Command Structure ==

{| class="wikitable" border="1"
|-
! Bits || Description
|-
| 0-1 || Reserved
|-
| 2-12 || Method address
|-
| 13-15 || Method subchannel
|-
| 16-28 || Method count, immediate-data or [[#Tertiary opcode|tertiary opcode]]
|-
| 29-31 || [[#Secondary opcode|Secondary opcode]]
|}

Methods are treated as 4-byte addressable locations, so their numbers are written multiplied by 4. The command's arguments, when present, immediately follow the command word.
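
The following is a minimal C sketch for packing and unpacking a command word. It follows the bit layout in the table above. The names pb_encode, pb_decode and pb_cmd are hypothetical helpers and are not part of any SDK. The method address stays at its bit position (bits 2-12), so it is always a multiple of 4, which matches the 4-byte addressing described above.

```c
#include <stdint.h>

/* Decoded fields of a push buffer command word (layout from the table above). */
typedef struct {
    uint32_t method_addr; /* bits 2-12, kept in place (always a multiple of 4) */
    uint32_t subchannel;  /* bits 13-15 */
    uint32_t count;       /* bits 16-28: method count or immediate-data */
    uint32_t tert_op;     /* bits 16-17: only meaningful for GRP0/GRP2_USE_TERT */
    uint32_t sec_op;      /* bits 29-31 */
} pb_cmd;

/* Packs a command word; out-of-range inputs are masked to their field width. */
static uint32_t pb_encode(uint32_t sec_op, uint32_t subchannel,
                          uint32_t method_addr, uint32_t count)
{
    return ((sec_op & 0x7u) << 29) |
           ((count & 0x1FFFu) << 16) |
           ((subchannel & 0x7u) << 13) |
           (method_addr & 0x1FFCu);
}

/* Splits a command word back into its fields. */
static pb_cmd pb_decode(uint32_t word)
{
    pb_cmd c;
    c.method_addr = word & 0x1FFCu;
    c.subchannel  = (word >> 13) & 0x7u;
    c.count       = (word >> 16) & 0x1FFFu;
    c.tert_op     = (word >> 16) & 0x3u;
    c.sec_op      = (word >> 29) & 0x7u;
    return c;
}
```

For example, pb_encode(1, 2, 0x100, 3) builds an INC_METHOD command for method 0x100 on subchannel 2 with three arguments. The resulting word is 0x20034100.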

=== Secondary opcode ===
{| class="wikitable" border="1"
|-
! Mode || Description
|-
| 0 || [[#GRP0_USE_TERT|GRP0_USE_TERT]]
|-
| 1 || [[#INC_METHOD|INC_METHOD]]
|-
| 2 || [[#GRP2_USE_TERT|GRP2_USE_TERT]]
|-
| 3 || [[#NON_INC_METHOD|NON_INC_METHOD]]
|-
| 4 || [[#IMMD_DATA_METHOD|IMMD_DATA_METHOD]]
|-
| 5 || [[#ONE_INC|ONE_INC]]
|-
| 6 || Reserved
|-
| 7 || [[#END_PB_SEGMENT|END_PB_SEGMENT]]
|}

==== GRP0_USE_TERT ====
Tells PFIFO to read the [[#Tertiary opcode|tertiary opcode]] from bits 16-17 of the command word.

==== INC_METHOD ====
Tells PFIFO to read as many arguments as specified by '''method count''', automatically incrementing the '''method address''' after each one. This means that each argument is written to a different method location.

==== GRP2_USE_TERT ====
Tells PFIFO to read the [[#Tertiary opcode|tertiary opcode]] from bits 16-17 of the command word.

==== NON_INC_METHOD ====
Tells PFIFO to read as many arguments as specified by '''method count'''. However, all arguments are written to the same method location.

==== IMMD_DATA_METHOD ====
Tells PFIFO to read '''immediate-data''' from bits 16-28 of the command word. This removes the need for additional argument words.

==== ONE_INC ====
Tells PFIFO to read as many arguments as specified by '''method count''' and to increment the '''method address''' only once. The first argument is written to the method address, and all remaining arguments are written to the next method.

==== END_PB_SEGMENT ====
Tells PFIFO to stop processing any further methods.
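
The following C model shows how the count-based modes differ. It expands a command word and its arguments into individual method writes. This is a sketch based on the descriptions above, not a description of PFIFO internals. The helper names are hypothetical, and the method address is taken from bits 2-12 as in the command table. Incrementing a method is assumed to advance its address by 4, since methods are 4-byte addressable. GRP0_USE_TERT, GRP2_USE_TERT and END_PB_SEGMENT are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Secondary opcode values from the table above. */
enum {
    SEC_OP_INC_METHOD       = 1,
    SEC_OP_NON_INC_METHOD   = 3,
    SEC_OP_IMMD_DATA_METHOD = 4,
    SEC_OP_ONE_INC          = 5,
};

typedef struct {
    uint32_t method_addr;
    uint32_t value;
} method_write;

/* Expands one command word into method writes. `args` must hold `count`
 * words for the count-based modes. Returns the number of writes stored in
 * `out` (at most `max_out`), or 0 for opcodes this sketch does not handle. */
static size_t pb_expand(uint32_t word, const uint32_t *args,
                        method_write *out, size_t max_out)
{
    uint32_t method = word & 0x1FFCu;         /* bits 2-12 */
    uint32_t count  = (word >> 16) & 0x1FFFu; /* bits 16-28 */
    uint32_t sec_op = (word >> 29) & 0x7u;    /* bits 29-31 */
    size_t n = 0;

    if (sec_op == SEC_OP_IMMD_DATA_METHOD) {
        if (max_out == 0) return 0;
        out[0].method_addr = method;
        out[0].value = count; /* bits 16-28 carry the immediate data */
        return 1;
    }
    for (uint32_t i = 0; i < count && n < max_out; i++) {
        uint32_t addr;
        switch (sec_op) {
        case SEC_OP_INC_METHOD:     addr = method + 4u * i; break;
        case SEC_OP_NON_INC_METHOD: addr = method; break;
        case SEC_OP_ONE_INC:        addr = (i == 0) ? method : method + 4u; break;
        default:                    return 0; /* not modelled here */
        }
        out[n].method_addr = addr;
        out[n].value = args[i];
        n++;
    }
    return n;
}
```

Consider three arguments at method 0x100. INC_METHOD writes to 0x100, 0x104 and 0x108. NON_INC_METHOD writes to 0x100 three times. ONE_INC writes to 0x100 and then twice to 0x104.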

=== Tertiary opcode ===
{| class="wikitable" border="1"
|-
! Mode || Description
|-
| 0 || GRP0_INC_METHOD or GRP2_NON_INC_METHOD
|-
| 1 || GRP0_SET_SUB_DEV_MASK
|-
| 2 || GRP0_STORE_SUB_DEV_MASK
|-
| 3 || GRP0_USE_SUB_DEV_MASK
|}

== SetObject ==
To bind an engine object to a specific subchannel, method 0 (SetObject) must be used first. The target subchannel is specified in bits 13-15 of the command word.

Once the engine object is bound to the desired subchannel, putting that subchannel number in bits 13-15 of any subsequent command word makes PFIFO forward the command to the bound engine.

This method takes only one argument, a [[GPU_Classes|GPU Class ID]].
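
The following is a sketch of emitting a SetObject command in C. It assumes that INC_METHOD with a method count of 1 is used to deliver the single argument. IMMD_DATA_METHOD cannot carry the argument, because class IDs such as 0xB197 (MAXWELL_B, the 3D class) do not fit in the 13-bit immediate field. The function and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SEC_OP_INC_METHOD 1u   /* secondary opcode from the table above */
#define METHOD_SET_OBJECT 0x0u /* SetObject is method 0 */

/* Writes a SetObject command binding `class_id` to `subchannel` into `cmds`.
 * Returns the number of words written (always 2). */
static size_t pb_set_object(uint32_t *cmds, uint32_t subchannel, uint32_t class_id)
{
    cmds[0] = (SEC_OP_INC_METHOD << 29) |      /* secondary opcode */
              (1u << 16) |                     /* method count: one argument */
              ((subchannel & 0x7u) << 13) |    /* target subchannel */
              METHOD_SET_OBJECT;               /* method address 0 */
    cmds[1] = class_id;                        /* the single argument */
    return 2;
}
```

With this sketch, binding MAXWELL_B to subchannel 1 produces the words 0x20012000 and 0x0000B197.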

== Macro ==

Macros are small programs that can be uploaded to the GPU and can read and write the 3D engine registers. Macros also accept parameters, which are stored in a FIFO. Macros are called using methods starting at 0xe00: the first method triggers macro execution, and the second is used to push parameters to the FIFO. The macro program reads parameters with an instruction called ''parm'', which pops the next parameter from the FIFO. This also lets a program take a variable number of parameters.

Official games use these macros to write registers conditionally. One example is the macro at 0xe24, which sets shader registers (including the shader address and the binding of the c1 constant buffer to the shader). In some cases, macros are also used to set registers unconditionally.
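
The parameter mechanism described above behaves like a queue. The C sketch below is a conceptual model only. The 64-entry capacity and the helper names are assumptions. Returning false on an empty FIFO is a simplification of this model, not documented hardware behavior.

```c
#include <stdbool.h>
#include <stdint.h>

#define MACRO_FIFO_CAP 64u /* arbitrary capacity for this model */

typedef struct {
    uint32_t data[MACRO_FIFO_CAP];
    uint32_t head; /* index of the next value ''parm'' will pop */
    uint32_t tail; /* index of the next free slot */
} macro_fifo;

/* Models a write to the parameter method: appends one value. */
static bool macro_fifo_push(macro_fifo *f, uint32_t value)
{
    if (f->tail - f->head == MACRO_FIFO_CAP) return false; /* full */
    f->data[f->tail % MACRO_FIFO_CAP] = value;
    f->tail++;
    return true;
}

/* Models the ''parm'' instruction: pops the next parameter in order. */
static bool macro_parm(macro_fifo *f, uint32_t *out)
{
    if (f->head == f->tail) return false; /* no parameters left */
    *out = f->data[f->head % MACRO_FIFO_CAP];
    f->head++;
    return true;
}
```

Suppose 7 and then 9 are pushed. The first ''parm'' returns 7 and the second returns 9. This is how a macro can consume however many parameters were pushed.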
 
== Fences ==

Command lists can contain fences to ensure that commands are executed in the correct order, and that subsequent commands are only sent once the previously sent commands have been processed by the GPU. Fences use the ReportSemaphore* registers and work like this:
 
The above commands are added using [[#INC_METHOD|INC_METHOD]] (increasing mode), since all four registers are sequential.
 
Official games set Operation to 0 (Release), bit 4 to 1, bits 12-15 (Unit) to 0xF, and bit 28 to 1 (OneWord). The GPU then writes the ReportSemaphorePayload value to the address pointed to by ReportSemaphoreOffset.
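
The control word described above can be assembled as follows. This C sketch only sets the bits listed above. Operation is 0 (Release), so its bit position does not affect the result. The function name is hypothetical.

```c
#include <stdint.h>

/* Builds the ReportSemaphore control word with the values official games use. */
static uint32_t fence_release_control(void)
{
    uint32_t word = 0u;   /* Operation = 0 (Release) */
    word |= 1u << 4;      /* bit 4 set to 1 */
    word |= 0xFu << 12;   /* bits 12-15: Unit = 0xF */
    word |= 1u << 28;     /* bit 28: OneWord */
    return word;
}
```

The resulting word is 0x1000F010.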
 
On the CPU side, game code should wait until the value at the address pointed to by ReportSemaphoreOffset is greater than or equal to the last written payload. Official code polls this condition in a loop and sends no further commands until it holds.
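
The following is a minimal C sketch of the CPU-side wait. It assumes that sem is a CPU pointer to the uncached memory that ReportSemaphoreOffset points to (mapped as described in Mapping Memory), and that target is the last payload written. Like the official code described above, it spins with no timeout. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True once the GPU has written a payload >= `target` at the semaphore. */
static bool fence_reached(const volatile uint32_t *sem, uint32_t target)
{
    return *sem >= target;
}

/* Busy-waits until the fence is reached; there is no timeout in this sketch. */
static void fence_wait(const volatile uint32_t *sem, uint32_t target)
{
    while (!fence_reached(sem, target)) {
        /* spin: `sem` is volatile, so it is re-read on every iteration */
    }
}
```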
 
= Vertex Data Submission =

Note: This is an observation of how the game Puyo Puyo Tetris sends textured squares to the GPU.
 
# VERTEX_END_GL is used with value 0 (the meaning of this value is currently unknown).
 
= Texture View =

Texture information such as address, format and size is sent to the GPU through a structure known as Texture View (a.k.a. Texture Image Control, or TIC). Each texture the game uses needs its own TIC, and the TICs are written one after the other into a table. Each [[#TIC_Structure|TIC entry]] is 0x20 bytes long and consists of eight 32-bit words into which the texture information is packed.
 
The texture is accessed in the shader using one of the texture sampling instructions (usually TEXS). One of this instruction's parameters is the ''Handle'' index. This index starts at 8, so index 8 accesses the handle at 8 * 4 = 0x20 in the ''Texture Constant Buffer''. Each shader stage has a separate constant buffer, so for fragment shaders the handle is located at CB_ADDRESS + 4 * CB_SIZE + TEXS_index * 4. Here the first 4 is the index of the fragment shader stage, and the second 4 is the size of a word in bytes.
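
The address computation above can be written as a small C helper (hypothetical name). As the formula does, it assumes that the per-stage constant buffers are laid out back to back, each CB_SIZE bytes long, with 4-byte handles.

```c
#include <stdint.h>

/* CB_ADDRESS + stage * CB_SIZE + handle_index * 4, per the formula above. */
static uint64_t tex_handle_addr(uint64_t cb_address, uint64_t cb_size,
                                uint32_t stage, uint32_t handle_index)
{
    return cb_address + (uint64_t)stage * cb_size + (uint64_t)handle_index * 4u;
}
```

For the fragment stage (4) and handle index 8, this yields CB_ADDRESS + 4 * CB_SIZE + 0x20.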
 
== TIC Structure ==

{| class="wikitable" border="1"
|-
! Word || Bits || Description
|-
| 0 || 0-6 || [[GPU_Texture_Formats#Texture_Formats|Texture Format]]
|-
| 0 || 7-9 || [[#Channel_Data_Type|R Channel Data Type]]
|-
| 0 || 10-12 || [[#Channel_Data_Type|G Channel Data Type]]
|-
| 0 || 13-15 || [[#Channel_Data_Type|B Channel Data Type]]
|-
| 0 || 16-18 || [[#Channel_Data_Type|A Channel Data Type]]
|-
| 1 || 0-31 || Lower 32 bits of the Texture GPU Virtual Address
|-
| 2 || 0-15 || Higher 16 bits of the Texture GPU Virtual Address
|-
| 4 || 0-15 || Texture Width minus 1
|-
| 5 || 0-15 || Texture Height minus 1
|}
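
The following C sketch packs the documented fields into the eight words of a TIC entry. Only the fields listed in the table are filled in. Leaving all other words and bits as zero is an assumption of this sketch; it does not mean that zero is a valid value for them. The struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The TIC fields documented in the table above. */
typedef struct {
    uint32_t format;                          /* 7 bits */
    uint32_t r_type, g_type, b_type, a_type;  /* 3 bits each */
    uint64_t gpu_va;                          /* 48-bit GPU virtual address */
    uint32_t width, height;                   /* must be at least 1 */
} tic_fields;

/* Packs one 0x20-byte TIC entry; undocumented words/bits are zeroed. */
static void tic_pack(uint32_t tic[8], const tic_fields *f)
{
    memset(tic, 0, 8 * sizeof(uint32_t));
    tic[0] = (f->format & 0x7Fu) |
             ((f->r_type & 0x7u) << 7) |
             ((f->g_type & 0x7u) << 10) |
             ((f->b_type & 0x7u) << 13) |
             ((f->a_type & 0x7u) << 16);
    tic[1] = (uint32_t)(f->gpu_va & 0xFFFFFFFFu);     /* lower 32 bits */
    tic[2] = (uint32_t)((f->gpu_va >> 32) & 0xFFFFu); /* higher 16 bits */
    tic[4] = (f->width - 1u) & 0xFFFFu;               /* width minus 1 */
    tic[5] = (f->height - 1u) & 0xFFFFu;              /* height minus 1 */
}
```

Consider a texture at GPU address 0x123456789A. Word 1 receives 0x3456789A and word 2 receives 0x12.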

=== Channel Data Type ===

{| class="wikitable" border="1"
|-
|}

= Shaders =

See [[GPU_Shaders|GPU Shaders]].

= References =

FIFO engine overview:

[https://envytools.readthedocs.io/en/latest/hw/fifo/intro.html]