(PSO) How to use the shader pipeline cache effectively.

jp.lee / Technology-center . tech art department leader / lizhengbiao@xd.com

Let’s see how to use the shader pipeline cache.

I will talk about the above table of contents.
First, we’ll look at pipeline state objects and how Unreal Engine uses them, and then at the cache built on top of PSOs and how to use it.

Let’s look at the PSO cache first.

Let’s move on, since you already know all of this.

It’s a graphics pipeline … you know all this, so let’s move on.

Compute shaders are now universally used in the latest mobile games.

Pipeline?
Graphics hardware support

  • Optimized hardware unit allocation for each stage.
  • Because the work is divided into stages, stages can overlap: maximum efficiency.

When you run the pipeline, what if you want to do a slightly different action?

  • Example) Suppose you have a pipeline with stages ABCDE.
    • 1: AB--E only (C and D skipped) / 2: A--DE only (B and C skipped) / 3: AB’C-E (B replaced by a variant B’, D skipped)

“What is hardware support in the graphics pipeline?”
Each stage is assigned to a hardware unit optimized for that stage.
On the right is how a CPU handles instructions.
Processed serially, two instructions take 8 cycles.
With a pipeline, the instructions overlap, so the same two instructions finish in 5 cycles.
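
The cycle counts above follow from simple arithmetic. Here is a minimal sketch, assuming a 4-stage pipeline with one cycle per stage and no stalls. The stage count is not stated in the talk; it is inferred from the 8-cycle figure for 2 instructions. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Serial processing: each instruction runs through every stage
// before the next instruction starts.
std::uint64_t SerialCycles(std::uint64_t instructions, std::uint64_t stages) {
    assert(stages > 0);
    return instructions * stages;
}

// Idealized pipelining: a new instruction enters the pipeline every cycle,
// so after the first instruction fills the pipeline (stages cycles),
// each further instruction adds one cycle.
std::uint64_t PipelinedCycles(std::uint64_t instructions, std::uint64_t stages) {
    assert(stages > 0);
    if (instructions == 0) return 0;
    return stages + (instructions - 1);
}
```

With 2 instructions and 4 stages, SerialCycles gives 8 and PipelinedCycles gives 5, which matches the figures in the slide.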

The pipeline consists of several stages.

  • Each stage behaves differently depending on preset state information.

State.

  • Ex) Blend State, DepthStencil State, Rasterizer State, Sampler State, …
  • State must be set whenever a pipeline is used.
    • A state change itself is expensive.
    • What if the state does not change? The previous state is reused as it is, which can be exploited for performance optimization.

This state information is simply called State.
State has to be set every time the pipeline is used.
Before we run the pipeline, we have to specify in advance which stage should work in which way.
Changing state is itself a heavy operation.
It is best not to change the state at all.
Therefore, one way to optimize is to group work with similar states together.
Unreal Engine itself is structured with this in mind.
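
To make the idea of grouping similar states concrete, here is a small sketch. It is not Unreal Engine code: `DrawCall`, `StateKey` and both functions are hypothetical, and `StateKey` simply stands in for "all render state of this draw, hashed into one number".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw record; StateKey stands in for a hash of all render state.
struct DrawCall {
    std::uint32_t StateKey;
    std::uint32_t MeshId;
};

// Counts how often state must be set when draws are submitted in order.
// The first draw always needs one state set.
std::size_t CountStateChanges(const std::vector<DrawCall>& draws) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < draws.size(); ++i) {
        if (i == 0 || draws[i].StateKey != draws[i - 1].StateKey) {
            ++changes;
        }
    }
    return changes;
}

// Places draws with identical state next to each other. stable_sort keeps
// the original relative order of draws that share a state.
void SortByState(std::vector<DrawCall>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
        [](const DrawCall& a, const DrawCall& b) { return a.StateKey < b.StateKey; });
}
```

For five draws whose states alternate 1, 2, 1, 2, 1, submission in the original order needs five state sets; after sorting it needs two. A real renderer also has to respect ordering constraints (translucency, for example), which this sketch ignores.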

In the past, all of these states were handled individually.
In DX9, for example, the states for alpha blending were set one by one.

In the past: every state was set one by one.

  • Ex) D3D9 Render State: Alpha Blending State, Texture Stage State

Improvement: related settings can be applied at once.

  • Ex) ID3D11BlendState: alpha-to-coverage, independent blending, render targets.
  • The goal is to reduce the overhead of state changes by setting related settings together.
  • Can be created and set at render time.

Latest hardware.

  • Dependencies between hardware units exist.
  • Ex) When setting up hardware blending, the Raster State affects the result as well as the Blend State.

After that, a slight improvement was made by bundling related states and processing them at once.
For example, in DX11 the Blend State holds the alpha-to-coverage value, the information that determines how each render target is blended when multiple render targets (MRT) are used, and whether those render targets share one blend setting or are blended individually; all of this is processed together.

Let’s set the state at a pipeline level.
Pipeline State

  • Hardware configuration of how the input data will be interpreted and drawn.
  • Shaders, render states (Blend, Depth Stencil, Rasterizer, …) and others.
  • Pipeline state is managed through a Pipeline State Object (PSO).


Pipeline state is the idea of configuring the whole pipeline at once.
It refers to the configuration of the entire hardware.
It is controlled through an object called a PSO:
an object that contains pipeline state information.

Pipeline state object.
An object containing pipeline state information. Pipeline State Object == PSO

  • Supported Graphics API: D3D12 / Vulkan / Metal
  • Used for pipeline state management.
  • The state is evaluated and validated in advance.
  • Allows pipeline states to be replaced more quickly at render time.

Most pipeline state is set through the PSO.

  • Set in the PSO.
    • All shader bytecodes, Blend State, Rasterizer, DepthStencil State, Multi-Sampling information, and more.
  • Set by command list.
    • Resource binding, viewport information, Blend Factor, Scissor rects, DepthStencil Reference Value, etc.


The purpose is to manage pipeline state.
Whether the pipeline will work without problems in a given pipeline state is determined in advance.
At the point of actual use, the state can simply be trusted and used.
This lets us change the entire state much faster.

Some things are not well suited to being changed per pipeline.
Viewport information, scissor rects, etc. are handled at the command list level instead.

Low level cache.

Because the PSO was designed from the start to be reused, caching is already provided at the graphics API level.
D3D12 / Vulkan / Metal

  • Cache support for runtime generated PSO.


D3D12 / Vulkan

  • PSOs can be written out to a file on disk and used to create PSOs at load time.


OpenGL

  • Devices that support ProgramBinary (OpenGL ES 3.0 or later): programs can be written out to a file on disk and created from it at load time.


However, OpenGL is not actually an API that supports PSOs. Instead, on hardware that supports a feature called ProgramBinary, compiled programs can be saved and read back later, which works much like a PSO cache.

RHI
A thin layer on top of the platform-specific graphics API, so that all operations can be handled by platform-independent code.
A PSO generated at the low level is stored as a render resource.
The cache is then searched through a Map container that holds these stored PSOs.

Low level Cache – D3D12
Simultaneous use of runtime cache and cache loaded from file.

  • Runtime cache = searched with the “GraphicsPipelineStateInitializer” received from the RHI.
  • Loaded cache = searched using low level description information.
    • Low level description: platform-dependent pipeline state descriptor.
    • Ex) ShaderByteCodeHash, D3D12_SHADER_BYTECODE, D3D12_BLEND_DESC, D3D12_RASTERIZER_DESC, …
  • The faster runtime cache is searched first.
    • Only platform-independent pipeline information (GraphicsPipelineStateInitializer) is received from the RHI.
    • The search is attempted immediately, without translating it into the platform-dependent form (low level description).


Why this way? GraphicsPipelineStateInitializer is a platform-independent class.
The lookup is fast because it needs no conversion.

Low level cache – Vulkan
Same as D3D12 + Supports Pipeline LRU Cache.

LRU Cache?

  • LRU = Least Recently Used
  • Removes the least recently used data from the cache, freeing up cache space for new data.

Pipeline LRU Cache

  • LRU support for low level cache.
  • Very useful for Android Vulkan platforms with insufficient shader memory.

Related settings.

  • #define VULKAN_ENABLE_LRU_CACHE 1
  • r.Vulkan.EnablePipelineLRUCache = 1
  • r.Vulkan.PipelineLRUSize = 10 * 1024 * 1024 / r.Vulkan.PipelineLRUCacheEvictBinary = 1


By applying LRU to PSO objects, memory can be secured flexibly, even if that causes a hitch. This exists mainly because of the Android platform.
It was applied during development of the Android version of Fortnite Mobile to solve the platform’s memory problem.
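
The LRU behavior described above can be sketched as follows. This is not the engine's implementation: `PipelineLruCache` and its members are hypothetical, and the byte budget only plays a role similar in spirit to r.Vulkan.PipelineLRUSize. The sketch tracks sizes only; a real cache would also own the pipeline objects and would have to recreate an evicted pipeline on its next use, which is where a hitch can come from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Hypothetical LRU bookkeeping for compiled pipelines, keyed by a 64-bit hash.
class PipelineLruCache {
public:
    explicit PipelineLruCache(std::size_t budgetBytes) : Budget(budgetBytes) {}

    // Marks an entry as most recently used. Returns false on a miss.
    bool Touch(std::uint64_t key) {
        auto it = Index.find(key);
        if (it == Index.end()) return false;
        Order.splice(Order.begin(), Order, it->second);
        return true;
    }

    // Adds an entry, then evicts least recently used entries until the total
    // fits the budget. The newest entry is never evicted, so a single entry
    // larger than the budget stays cached.
    void Add(std::uint64_t key, std::size_t sizeBytes) {
        if (Touch(key)) return;  // already cached
        Order.push_front(Entry{key, sizeBytes});
        Index[key] = Order.begin();
        Used += sizeBytes;
        while (Used > Budget && Order.size() > 1) {
            const Entry& victim = Order.back();
            Used -= victim.SizeBytes;
            Index.erase(victim.Key);
            Order.pop_back();
        }
    }

    bool Contains(std::uint64_t key) const { return Index.count(key) != 0; }
    std::size_t UsedBytes() const { return Used; }

private:
    struct Entry { std::uint64_t Key; std::size_t SizeBytes; };
    std::list<Entry> Order;  // front = most recently used
    std::unordered_map<std::uint64_t, std::list<Entry>::iterator> Index;
    std::size_t Budget;
    std::size_t Used = 0;
};
```

With a budget of 100 bytes and three 40-byte pipelines, adding the third one evicts whichever of the first two was used least recently.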

Low Level Cache – OpenGL.
Does not support PSO.

  • State is not changed in bulk through a pipeline state…
  • Shader State and Render State are updated separately.
    • Low-level cache support for bound shader states (BoundShaderState, BSS).


The BSS cache makes a batch change possible for the shader states only, not for the whole pipeline state.

Low Level Cache – OpenGL.
Program Binary Cache

  • OpenGL compiles individual shaders and links them into Program Objects.
  • Program objects can be written to a file, then loaded and reused later without being recompiled.
  • Separate Shader Object support.

LRU algorithm support.

  • Very useful for OpenGL ES platforms that lack shader memory.
  • For Mali GPU, the maximum shader memory heap size allowed by the driver is small.

Related settings.

  • r.ProgramBinaryCache.Enable=1
  • r.OpenGL.EnableProgramLRUCache=1
  • r.OpenGL.ProgramLRUCount=700/r.OpenGL.ProgramLRUBinarySize=35*1024*1024

Shader Pipe-line cache.

  • An object that uses the RHI-level API to support the low level cache.
  • Focuses on when PSOs are generated/created.
  • It is called the PSO cache.
  • Replaces the Shader Cache that existed in the past.
  • Purpose: to let users who run the app for the first time play without runtime hitches.
    • The data needed for PSO creation is prepared in advance and included in the distribution build.
    • When the distribution build runs, PSOs are compiled from that data at a moment the user does not notice.

PSO cache action flow.

  • Create a PSO at runtime with a test build.
    • Convert the generated PSO to binary PSO and save the file.
  • To accumulate the results of multiple play sessions, merge the binary PSOs into pipeline metadata.
  • Convert the pipeline metadata back to binary PSO when cooking a deployment build.
  • In the deployment build, create PSOs from the binary PSOs at initialization time and register them in the low level cache.
  • Use the PSOs in the deployment build.

Use PSO cache.

  • r.ShaderPipelineCache.Enabled = 1 / command line “-psocache”
  • ShareMaterialShaderCode (shader code library) enabled.

Shader Pipeline Cache-Action.
Test build

  • Cooking test builds.
  • PSO generated during play is saved as binary PSO.
  • Merge binary PSO into pipeline metadata.

Deployment build.

  • Convert pipeline metadata to binary PSO when cooking.
  • Create PSO at initialization time, register in low level cache.
  • Using PSO at runtime.

Shader Pipeline Cache-Test build.
Cooking the test build.

  • Stable shader information for all materials in the content is generated and saved to scl.csv.
    • Stored information: ClassNameAndObjectPath, ShaderType, ShaderClass, MaterialDomain, FeatureLevel, QualityLevel, TargetFrequency, TargetPlatform, VFType, Permutation, OutputHash.
    • Why OutputHash is needed: Share Material Shader Code.
  • This information is later used by the “pipeline metadata” generator.
  • Which information is stored is covered on a later page.

Shader Pipeline Cache-Test build.
PSO generated during play is saved as binary PSO.

  • Low level cache files are not distributed directly.
  • Platform-independent information (GraphicsPipelineInitializer) is used instead = ushaderpipeline creation.
    • Bound shaders (BoundShaderState): VertexDeclarationRHI, VertexShaderRHI, PixelShaderRHI, GeometryShaderRHI, DomainShaderRHI, HullShaderRHI.
    • Render states: BlendState, RasterizerState, DepthStencilState, ImmutableSamplerState.
    • DepthStencil related: DepthStencilTargetFormat, DepthStencilTargetFlag, DepthTargetLoadAction, DepthTargetStoreAction, StencilTargetLoadAction, StencilTargetStoreAction, DepthStencilAccess.
    • Etc.: bDepthBounds, PrimitiveType, RenderTargetsEnabled, RenderTargetFormats, RenderTargetFlags.
    • Multi-sampling information: NumSamples.
  • Binary PSO storage.
    • r.ShaderPipelineCache.LogPSO = 1 / command line “-logPSO”

Shader Pipeline Cache-Test build.
Merge binary PSO into pipeline metadata

  • It is important to run and render as much of the content as possible in the test build so that no pipeline information is left out.
  • Binary PSO and stable shader information are combined to generate pipeline metadata, which is saved to stablepc.csv.

Shader Pipeline Cache-Action.
Test build

  • Test Build Cooking
  • PSO generated during play is saved as binary PSO.
  • Merging binary PSOs into pipeline metadata.

Distribution Build

  • Convert pipeline metadata to binary PSO when cooking.
  • PSO creation at the time of initialization, registration in low level cache.
  • Using PSO at runtime.

Shader Pipeline Cache-Deployment build.
PSO creation at initialization time, registered in low level cache.

  • PSOs are created in advance, at engine initialization or at another chosen time, so that they can be reused later.
  • The PSOs that will actually be used are created through the PSO cache’s precompile process.
    • Create a GraphicsPipelineInitializer for every binary PSO.


-> Call SetGraphicsPipelineState(…)
-> PipelineStateCache::GetAndOrCreateGraphicsPipelineState(…)
-> GGraphicsPipelineCache.Find(…) search fails
-> RHICreateGraphicsPipelineState(…)

Shader Pipeline Cache-Deployment build.
Using PSO at runtime.

  • Create GraphicsPipelineInitializer for each draw call


-> Call SetGraphicsPipelineState(…)
-> PipelineStateCache::GetAndOrCreateGraphicsPipelineState(…)
-> GGraphicsPipelineCache.Find(…) search succeeds
-> RHISetGraphicsPipelineState(…)
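
The precompile path and the per-draw path go through the same find-or-create step; the only difference is whether the search misses or hits. Here is a minimal sketch of that pattern, with hypothetical stand-ins throughout: `InitializerHash` represents a hashed pipeline state initializer, `PipelineState` a compiled PSO, and the creation callback takes the place of the expensive RHI-level creation. None of these are the engine's actual types.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for a hashed initializer and a compiled PSO.
using InitializerHash = std::uint64_t;
struct PipelineState { InitializerHash Source; };

class PipelineStateCacheSketch {
public:
    // createFn plays the role of the expensive low-level PSO creation.
    explicit PipelineStateCacheSketch(std::function<PipelineState(InitializerHash)> createFn)
        : Create(std::move(createFn)) {}

    // Search first; create and register only when the search fails.
    const PipelineState& GetOrCreate(InitializerHash key) {
        auto it = Cache.find(key);
        if (it != Cache.end()) {
            ++Hits;
            return it->second;
        }
        ++Misses;
        return Cache.emplace(key, Create(key)).first->second;
    }

    int Hits = 0;
    int Misses = 0;

private:
    std::function<PipelineState(InitializerHash)> Create;
    std::unordered_map<InitializerHash, PipelineState> Cache;
};
```

Calling GetOrCreate once per recorded PSO at initialization takes the miss path and pays the creation cost up front. The same key requested later by a draw call takes the hit path and creates nothing.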

Shader Pipeline Cache-Derived data.
[Test build] = -logPSO

  • [Cooking output]: scl.csv / ushaderbytecode
  • [Execution output]: rec.upipelinecache
  • [Merge output]: stablepc.csv


[Deploy build]

  • [Cooking input]:stablepc.csv
  • [Cooking Output]:stable.upipelinecache / ushaderbytecode

PSO Cache Usage Guide.

Precautions.
These suggestions may not fit every project.
Use the concepts outlined here to choose the approach that fits your project.

  • Assumptions used throughout.
  • Android OpenGL ES 3.1
  • Content distribution method: Minimal APK + DLC w/ HttpChunkInstallData
    • Minimal APK: Android ETC
    • DLC w/ HttpChunkInstallData: Android ASTC
  • Share Material Shader Code = True

Binary PSO storage.
r.ShaderPipelineCache.Save
For cases where, e.g., the -logPSO autosave does not work as desired.

  • PSO logging requirements.
  • r.ShaderPipelineCache.Enabled=1 / r.ShaderPipelineCache.LogPSO=1 / r.ShaderPipelineCache.SaveBoundPSOLog=1
  • Depending on the project, these can be set via device profile, command line, or console command.
  • Running r.ShaderPipelineCache.Save directly: {ProjectDir}\Saved\CollectedPSOs

Control when PSO is produced.
Engine default settings: Engine Preinit
Delay the PSO cache precompile process.

  • Binary PSOs are read and compiled in batches.
  • The PSO cache processes its batches on each tick.
  • Pause and resume with PauseBatching() / ResumeBatching().

PSO generation rate control.
r.ShaderPipelineCache.SetBatchMode [Pause|Fast|Background]

  • Batch mode
    • During precompile, each batch is processed within a render thread time slice.
    • Sets the batch size and the maximum time that may be spent in one frame.
    • Engine default: Fast mode = 50 PSOs + 16 ms / Background mode = 1 PSO + no time limit.
  • Compiling during the loading screen? Fast mode!
  • Compiling during gameplay? Background mode!
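
The batch modes above combine a per-frame item limit with a per-frame time limit. Here is a rough sketch of such a time-sliced loop, with illustrative names only: `BatchSettings` and `ProcessBatch` are hypothetical, the convention that 0 ms means "no time limit" is an assumption of this sketch, and the compile callback returns the time it took so that the example stays deterministic (a real implementation would measure elapsed time with a clock).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>

// Hypothetical per-frame batch settings.
struct BatchSettings {
    std::size_t MaxPerFrame;  // maximum items compiled in one frame
    double MaxMilliseconds;   // time budget per frame; 0 = no limit (sketch convention)
};

// Compiles queued items for one frame and returns how many were processed.
// compileFn compiles one item and returns the milliseconds it took.
std::size_t ProcessBatch(std::deque<int>& pending, const BatchSettings& settings,
                         const std::function<double(int)>& compileFn) {
    std::size_t done = 0;
    double elapsedMs = 0.0;
    while (!pending.empty() && done < settings.MaxPerFrame) {
        // Stop once the time budget for this frame is used up.
        if (settings.MaxMilliseconds > 0.0 && elapsedMs >= settings.MaxMilliseconds) {
            break;
        }
        elapsedMs += compileFn(pending.front());
        pending.pop_front();
        ++done;
    }
    return done;
}
```

With the Fast defaults from the slide (50 items, 16 ms) and items that each take 4 ms, one frame processes 4 items before the budget runs out. With the Background defaults (1 item, no time limit), exactly one item is processed per frame, however long it takes.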

DLC + Shader Code Library
Troubleshooting

  • [Empty DLC plugin only] Cooking failure
  • Crash when enabling the Project Launcher’s Cook > Build DLC option.
  • SaveShaderCodeLibrary(…) in CookOnTheFlyServer.cpp, line 6163

DLC + Shader Code Library
Runtime crash

  • Crash when DLC content is accessed after the Pak is mounted but its Shader Code Library is not ready.
  • Cause: at engine initialization, FShaderCodeLibrary::InitForRuntime(…) opens the library for each plugin, but a DLC plugin that only appears after its content has been downloaded and its Pak mounted is left out of this step.
  • The simplest solution is to open the plugin’s Shader Code Library directly after mounting the Pak.

DLC + Shader Code Library
Reopen PSO cache?

  • Basic engine operation.
    • At engine PreInit, the Shader Code Library is opened by project name (Global, Game) or plugin name (excluding DLC).
    • At engine PreInit, the Shader Pipeline Cache is also opened by project name.
      • The Program Binary Cache is created/loaded using the same GUID as the Shader Pipeline Cache.
  • In general, proceed as follows.
    • [Engine initialization] Run the engine from the APK -> open the ShaderCodeLibrary in the APK -> open the PSO cache -> PSO cache precompile.
    • [Patching level] Mount the Pak -> open the ShaderCodeLibrary in the DLC to avoid the crash -> reopen the PSO cache?

DLC + Shader Code Library
PSO cache deployment strategy.

  • What you should know as of 4.22…
    • Shader Code Library supports DLC (plug-in).
    • Shader Pipeline Cache does not support DLC (plug-in).
    • It is important to make sure stable.upipelinecache covers all content, including DLC.
  • Prepare as much information as possible in advance, even if it takes a long time.
    • Damage to user experience: time spent when the game is opened < hitches that occur during gameplay.
    • Not very different from the method used by Fortnite.

Fortnite case.
Not very different from the method used by Fortnite.

  • On iOS: IPA = 166.3 MB / DLC download = 4.11 GB
  • The PSO cache for the DLC is created directly in the patch level, after the DLC is downloaded and installed.
  • Once all caches are created, the game is restarted or the play level is loaded.

End of Contents.

Categories: tutorials
