This page is quoted from http://www.vgleaks.com/durango-memory-system-overview/
Work in progress…
We have read multiple replies and discussions around Durango’s memory system throughout the internet, so we would like to share this information with all of you. In this article we describe the different types of memory that Durango has and how these memories work together with the rest of the system.
The central elements of the Durango memory system are the north bridge and the GPU memory system. The memory system supports multiple clients (for example, the CPU and the GPU), coherent and non-coherent memory access, and two types of memory (DRAM and ESRAM).
The following diagram shows you the Durango memory clients with the maximum available bandwidth in every path.
As you can see on the right side of the diagram, the Durango console has:

- 8 GB of DRAM
- 32 MB of ESRAM
The maximum combined read and write bandwidth to DRAM is 68 GB/s (gigabytes per second). In other words, the sum of read and write bandwidth to DRAM cannot exceed 68 GB/s. You can realistically expect that about 80 – 85% of that bandwidth will be achievable (54.4 GB/s – 57.8 GB/s).
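As a quick sanity check, the achievable range quoted above is simply 80 – 85% of the 68 GB/s peak. A minimal sketch, using only the figures from the article:

```python
PEAK_DRAM_BW_GBPS = 68.0  # peak combined read+write DRAM bandwidth

# Realistic efficiency range quoted in the article
low, high = 0.80, 0.85
achievable = (PEAK_DRAM_BW_GBPS * low, PEAK_DRAM_BW_GBPS * high)
print(f"Achievable DRAM bandwidth: {achievable[0]:.1f} - {achievable[1]:.1f} GB/s")
# → Achievable DRAM bandwidth: 54.4 - 57.8 GB/s
```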
DRAM bandwidth is shared between the following components:

- CPU
- GPU
- Display scan out
- Move engines
- Audio system
The maximum combined ESRAM read and write bandwidth is 102 GB/s. Its high bandwidth and lower latency (relative to DRAM) make ESRAM a really valuable memory resource for the GPU.
ESRAM bandwidth is shared between the following components:

- GPU
- Move engines
There are two types of coherency in the Durango memory system:

- Fully hardware coherent
- I/O coherent
The two CPU modules are fully coherent. The term fully coherent means that the CPUs do not need to explicitly flush in order for the latest copy of modified data to be available (except when using Write Combined access).
The rest of the Durango infrastructure (the GPU and I/O devices such as Audio and the Kinect Sensor) is I/O coherent. The term I/O coherent means that those clients can access data in the CPU caches, but that their own caches cannot be probed.
When the CPU produces data, other system clients can choose to consume that data without any extra synchronization work from the CPU.
The total coherent bandwidth through the north bridge is limited to about 30 GB/s.
The CPU requests do not probe any other non-CPU clients, even if the clients have caches. (For example, the GPU has its own cache hierarchy, but the GPU is not probed by the CPU requests.) Therefore, I/O coherent clients must explicitly flush modified data for any latest-modified copy to become visible to the CPUs and to the other I/O coherent clients.
The GPU can perform both coherent and non-coherent memory access. Coherent read-bandwidth of the GPU is limited to 30 GB/s when there is a cache miss, and it’s limited to 10 – 15 GB/s when there is a hit. A GPU memory page attribute determines the coherency of memory access.
The Durango console has two CPU modules, and each module has its own 2 MB L2 cache. Each module has four cores, and each of the four cores in each module also has its own 32 KB L1 cache.
When a local L2 miss occurs, the Durango console probes the adjacent L2 cache via the north bridge. Since there is no fast path between the two L2 caches, to avoid cache thrashing, it’s important that you maximize the sharing of data between cores in a module, and that you minimize the sharing between the two CPU modules.
Typical latencies for local and remote cache hits are shown in this table.

- Local L1 hit: 3 cycles for 64-bit values, 5 cycles for 128-bit values
- Local L2 hit: approximately 30 cycles
- Remote L2 hit: approximately 100 cycles
- Remote L1 hit: approximately 120 cycles
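To see why cross-module sharing hurts, here is a rough weighted-average latency estimate built from the cycle counts above. The hit-rate mixes are made-up illustrations, not Durango measurements, and DRAM misses are ignored for simplicity:

```python
# Latencies in cycles, from the table above (64-bit local L1 case)
LOCAL_L1, LOCAL_L2, REMOTE_L2 = 3, 30, 100

def avg_latency(l1_rate, l2_rate, remote_rate):
    """Weighted average access latency for a given hit-rate mix.
    Rates are assumed to sum to 1.0; misses to DRAM are ignored."""
    return l1_rate * LOCAL_L1 + l2_rate * LOCAL_L2 + remote_rate * REMOTE_L2

# Hypothetical mixes: data kept within one module vs. shared across modules
within_module = avg_latency(0.90, 0.10, 0.00)   # ≈ 5.7 cycles
across_modules = avg_latency(0.90, 0.05, 0.05)  # ≈ 9.2 cycles
print(within_module, across_modules)
```

Even a small fraction of remote-L2 probes noticeably raises the average, which is why the article recommends maximizing sharing within a module.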
Each of the two CPU modules connects to the north bridge by a bus that can carry up to 20.8 GB/s in each direction.
From a program standpoint, normal x86 ordering applies to both reads and writes. Stores are strongly ordered (they become visible in program order with no explicit memory barriers), while reads may complete out of order.
Keep in mind that if the CPU uses Write Combined memory writes, then a memory synchronization instruction (SFENCE) must follow to ensure that the writes are visible to the other client devices.
The GPU can read at 170 GB/s and write at 102 GB/s through multiple combinations of its clients. Examples of GPU clients are the Color/Depth Blocks and the GPU L2 cache.
The GPU has a direct non-coherent connection to the DRAM memory controller and to ESRAM. The GPU also has a coherent read/write path to the CPU’s L2 caches and to DRAM.
For each read and write request from the GPU, the request uses one path depending on whether the accessed resource is located in “coherent” or “non-coherent” memory.
Some GPU functions share a lower-bandwidth (25.6 GB/s), bidirectional read/write path. Those GPU functions include:

- Command buffer and vertex index fetch
- Move engines
- Video encoding/decoding engines
- Front buffer scan out
As the GPU is I/O coherent, data in the GPU caches must be flushed before that data is visible to other components of the system.
The available bandwidth and requirements of other memory clients limit the total read and write bandwidth of the GPU.
This table shows an example of the maximum memory bandwidths that the GPU can attain with different types of memory transfers.

Source memory   Destination memory   Max read (GB/s)   Max write (GB/s)   Max total (GB/s)
ESRAM           ESRAM                51.2              51.2               102.4
ESRAM           DRAM                 68.2*             68.2               136.4
DRAM            ESRAM                68.2              68.2*              136.4
DRAM            DRAM                 34.1              34.1               68.2
Although ESRAM has 102.4 GB/s of bandwidth available, in a transfer case, the DRAM bandwidth limits the speed of the transfer.
ESRAM-to-DRAM and DRAM-to-ESRAM scenarios are symmetrical.
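The table rows follow from a simple model: a copy running at rate r consumes r of read bandwidth at the source and r of write bandwidth at the destination, and each memory's combined read+write budget caps the total. A sketch of that reasoning, using the capacities from the article (this is our model of the numbers, not an official formula):

```python
# Combined read+write bandwidth budgets in GB/s (from the article)
CAPACITY = {"DRAM": 68.2, "ESRAM": 102.4}

def max_copy_rate(src, dst):
    """Largest copy rate r such that no memory's budget is exceeded.
    Same memory: reads and writes draw on one budget, so 2r <= capacity.
    Different memories: each side independently caps r."""
    if src == dst:
        return CAPACITY[src] / 2
    return min(CAPACITY[src], CAPACITY[dst])

print(max_copy_rate("ESRAM", "ESRAM"))  # 51.2
print(max_copy_rate("ESRAM", "DRAM"))   # 68.2
print(max_copy_rate("DRAM", "DRAM"))    # 34.1
```

The model reproduces every row of the table, including the symmetry between the ESRAM-to-DRAM and DRAM-to-ESRAM cases.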
The Durango console has 25.6 GB/s of read and 25.6 GB/s of write bandwidth shared between:

- Four move engines
- Display scan out and write-back
- Video encoding and decoding
The display scan out consumes a maximum of 3.9 GB/s of read bandwidth (3 display planes × 4 bytes per pixel × HDMI limit of 300 megapixels per second), and display write-back consumes a maximum of 1.1 GB/s of write bandwidth (30 bits per pixel × 300 megapixels per second).
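Working the write-back figure out explicitly is straightforward; note that the plain scan-out product actually comes to 3.6 GB/s, so the 3.9 GB/s quoted above presumably includes some extra margin:

```python
MPIX_PER_SEC = 300e6  # HDMI pixel-rate limit from the article

# Display write-back: 30 bits per pixel
write_back_gbps = 30 / 8 * MPIX_PER_SEC / 1e9
print(write_back_gbps)  # 1.125, i.e. the ~1.1 GB/s quoted

# Display scan out: 3 planes x 4 bytes per pixel
scan_out_gbps = 3 * 4 * MPIX_PER_SEC / 1e9
print(scan_out_gbps)  # 3.6
```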
You may wonder what happens when the GPU is busy copying data and a move engine is told to copy data from one type of memory to another. In this situation, the memory system of the GPU shares bandwidth fairly between source and destination clients. The maximum bandwidth can be calculated by using the peak-bandwidth diagram at the start of this article.
If you want to see how all of this works, just read the example we’ve written for all of you.