There are a number of issues being discussed. I will attempt to clarify in no particular order:

1. LockRect() will most likely cause a Flush of the DP2 command stream to the driver. Depending on what is contained in that DP2 command stream, the driver will take some time to parse/render before returning the Flush. This is probably where the 4ms is from. Since the LockRect is on a POOL_SYSTEMMEM surface, there is no swizzling or deswizzling at LockRect time, so very few cycles are actually spent inside D3D.

2. Let me repeat: please use UpdateTexture as far as possible. CopyRects is a slow legacy method that exists just to enable certain things such as copying from vidmem to sysmem (which is slow anyway). It is not meant to be used in performance-sensitive cases.

3. The reason UpdateTexture forces use of system mem source and default dest is because this is the only guaranteed high performance code path possible with the majority of current hardware architectures that rely on swizzling. The "AGP source fast DMA" that Tom alludes to does not work that way on most drivers. So even if you "think" it works using DX7, you would be horrified to know what the driver is really doing inside.

4. You can use sub-rect updates with UpdateTexture. LockRect automatically marks regions dirty and UpdateTexture updates only the dirty regions.

5. John Howson mentions *incorrectly* that CopyRects is the same as UpdateTexture to drivers. CopyRects uses the slow (and legacy) Blt DDI while UpdateTexture uses a DP2 TEXBLT token. That said, John is probably referring to the internal implementation of a texture blt in the driver... this code is probably shared between legacy Blt and DP2 TEXBLT, but there are two different code paths to *get to* that shared code, with massively different overhead (and other issues) in each of them.

6. There are absolutely no hidden copies or strangeness with UpdateTexture as far as *D3D* is concerned. All UpdateTexture does is put a DP2 token in the command stream. Period. This is a few hundred cycles at the most. It is then up to the driver to come up with the best way to upload the texture. Some drivers do a swizzle using the host CPU while copying the output to a staging area for later DMA, some drivers inline the sysmem contents into the hardware command buffer and let the hardware manage the swizzle, and some drivers do entirely different things. But, yes, all these things do involve at least one hidden "copy". This is where dynamic textures come in.

7. DX 8.1 dynamic textures are an attempt to fix the unbounded texture Blt/Lock possibilities that the DX7 API offered, which caused serious performance issues because hardware is not general enough to support such unbounded usage. Dynamic textures in DX 8.1 work very much like dynamic VBs: they are used in the same way and are subject to the same restricted usage. DX 8.1 class drivers can attempt to support dynamic textures using an "alias" AGP unswizzled surface and doing a hw-assisted swizzle from this alias AGP to local video. Or even some other way... basically, offering this as a new restricted feature with strict rules and WHQL tests gives IHVs a chance to rethink/reimplement upload code for performance-sensitive ISV usage.

8. You can directly LockRect POOL_DEFAULT textures if they are created using private IHV swizzled formats even in DX 8.0! Then, there is no swizzling or deswizzling or intermediate copies involved. However, this is not a good alternative to dynamic textures since LockRect in this case will **stall** the GPU.

9. Please talk to your friendly IHVs if you want to see 8.1 dynamic texture support. We know that a more efficient implementation than what UpdateTexture allows is possible even with current hardware... it is a question of IHVs putting in the development resources for this.

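In conclusion: when a texture changes every frame, update it with UpdateTexture, and issue the UpdateTexture call before drawing the polygons so that stalls are kept to a minimum. Keep this in mind if your backgrounds use a lot of lightmaps, your lightmaps change, or you draw a lot of 2D.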
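
To make the sub-rect behaviour in points 4 and 6 a bit more concrete, here is a toy C++ model of it. This is not Direct3D code and does not claim to show how the runtime or any driver is implemented: `ToyTexture`, `Rect`, `lockRect`, `updateTexture` and `demoFrame` are all made-up names, and the "upload" is just a memory copy. It only sketches the idea that locking a sub-rect records a dirty region, and that the later update touches only what was recorded.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: none of these are Direct3D types or calls.
struct Rect { int left, top, right, bottom; };  // right/bottom are exclusive

struct ToyTexture {
    int width, height;
    std::vector<std::uint32_t> pixels;  // one 32-bit texel per entry
    std::vector<Rect> dirty;            // rects locked since the last upload

    ToyTexture(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0u) {}

    // Stand-in for locking a sub-rect: remembers the rect as dirty and
    // returns a pointer to its top-left texel. The pitch is `width` texels.
    std::uint32_t* lockRect(const Rect& r) {
        assert(0 <= r.left && r.left < r.right && r.right <= width);
        assert(0 <= r.top && r.top < r.bottom && r.bottom <= height);
        dirty.push_back(r);
        return &pixels[static_cast<std::size_t>(r.top) * width + r.left];
    }
};

// Stand-in for the update step: copies only the dirty rects from src to dst,
// clears the dirty list, and returns the number of texels copied.
// (Overlapping rects are simply copied twice in this toy.)
std::size_t updateTexture(ToyTexture& src, ToyTexture& dst) {
    assert(src.width == dst.width && src.height == dst.height);
    std::size_t copied = 0;
    for (const Rect& r : src.dirty) {
        for (int y = r.top; y < r.bottom; ++y) {
            const std::size_t row = static_cast<std::size_t>(y) * src.width;
            std::copy(src.pixels.begin() + row + r.left,
                      src.pixels.begin() + row + r.right,
                      dst.pixels.begin() + row + r.left);
            copied += static_cast<std::size_t>(r.right - r.left);
        }
    }
    src.dirty.clear();
    return copied;
}

// One "frame": write a 4x2 block into the sysmem copy, then upload it
// before any drawing would use the destination texture.
std::size_t demoFrame(ToyTexture& sys, ToyTexture& vid) {
    const Rect r{8, 4, 12, 6};
    std::uint32_t* p = sys.lockRect(r);
    for (int y = 0; y < r.bottom - r.top; ++y)
        for (int x = 0; x < r.right - r.left; ++x)
            p[static_cast<std::size_t>(y) * sys.width + x] = 0xFFFFFFFFu;
    return updateTexture(sys, vid);
}
```

With two 64x64 `ToyTexture` objects, `demoFrame(sys, vid)` copies 8 texels rather than all 4096, and a second `updateTexture(sys, vid)` with nothing locked in between copies none. In real code the equivalent pattern would be a sub-rect LockRect on the system-memory texture followed by UpdateTexture to the default-pool texture, with the actual upload left to the driver as described in point 6.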

This is Kim Hakkyu of imcgames.