(WIP) Stylized Toon ver.0.7

Written by JP.Lee
心动 Technical Art team leader.
Former Allegorithmic (Adobe) Lead Technical Artist.
Former NetEase Technical Art Director at HZ.
Former Samsung Electronics R&D Center Technical Artist and Researcher.

Most of my time since I left the company has been devoted to stylized-shading research.

This is because most of the company's projects use either cartoon rendering or an unusual stylized look.

Since I worked in live-action (photorealistic) styles for many years, it took only about two months to adapt to this style.

Because the outline processing has not been completed yet, the smoothing groups for the outline cannot be packed.


It was called Light space Toon.


The vertex color should be used as well, but the company is busy, so…


The Definition of a Technical Artist

JP.Lee

心动 Tech Art team leader.

leegoonz@163.com

The definition of a technical artist:
An artist with technical knowledge.
I interpret this as half artist, half programmer.
It can also be defined as someone who can blend the logical and the non-logical parts of the work well.

The smaller roles that can be carried out within the technical domain.

LIGHTING STAGE SUPPORT.

Recent game lighting technology has evolved to the point where it can produce a visually natural, film-like look.

Fundamentally, however, the lighting stage is part of production design: it paces the game, supports the narrative of certain scenes, and is the most important tool when establishing the new style an art director is pursuing.

A technical artist must therefore understand the lighting details of the latest rendering engines in depth and communicate with the art director or lighting artists, ultimately playing the key role of driving toward the best lighting result.

First of all, this requires a large amount of rendering theory, at a level where lighting-related code (shaders, post-processing, and so on) can be rewritten with ease.

In addition, a solid grounding in the art theory of cinematic imagery is needed.

Theoretical study is required all the way from attempts to apply classic film production-design techniques (the Bioshock series is a good example) up to modern ones.

All of these film techniques are applied in animation and in recent game development.

As in animation design, many of the experiences a player can have in a game also come from the atmosphere the lighting creates.

Likewise, in FPS or TPS games, lighting design itself becomes part of level design.

ART CONTENTS CREATION SUPPORT.

Create and apply assets that use new techniques to raise content quality, or help programmers by building the test assets needed to realize a feature.

The art styles of the games most studios are developing differ from one another.

Games pursuing an original rendering style, in particular, differ in how their art assets are produced.

The scripts for that part of the pipeline, or the method for outputting textures suited to a newly developed shader, also need to be defined, and all of it must be produced as documentation plus video, with an example provided for easy understanding.

Especially on projects that need close to two years of development, artists are frequently replaced, so you need to set up an environment that newly joining artists can understand easily and adapt to quickly.

ANIMATION TECHNICAL RIGGING AND MOVEMENT STRUCTURE DESIGN.

This is an area that requires frequent communication with the designers who plan the gameplay.

There are many kinds of new gameplay experiences, and motion design accounts for a large share of them.

The goal is to provide technical support so that the motion designers (animators) we work with can produce a wide variety of motions faster and more easily.

The character's skeletal structure will differ depending on whether you are making Disney-style animated action or designing realistic-style action.

For realistic animation, technical artists get involved in monsters, quadrupeds, multi-legged creatures, vehicles, and the natural handling of a character's shoulder joints and chest bones, using technology to make the results more natural and the production simpler.

At a basic level you should be able to work with expressions (scripts), modify and write MEL Script or MaxScript, and have some understanding of linear algebra.

You also need a basic understanding of the bones and joints of living creatures, and of the difference between pulling and pushing muscles.

Insight into anatomy and muscles is required, along with some understanding of geometric deformation.

In some cases you need the ability to drive data at the per-vertex level, making flexible use of Maya's driven controllers, Max's driven controllers, and the Modifier API.

SEMI TOOLS DESIGN AND SCRIPTING.

Compared with the other duties, this area requires a highly structured understanding of programming.

The more complex the program being developed, the more attention must be paid to maintenance and extensibility during development.

You therefore need a precise grasp of basic OOP concepts, and this is also an area that requires a full understanding of polymorphism.

Use MaxScript, MEL Script, Python, and similar languages to help artists use the development tools more conveniently.

Make flexible use of C++, C#, and so on to build the standalone applications artists need, or to add new features to the engine/editor in use.

There is an important part of this process that we must never forget.

You must design the metrics by which the usability of the scripts you produce will be evaluated.

You should also work out which mistakes developers make repeatedly, and store that data in a database.

The stored information can be analyzed monthly; the software can be revised based on the analysis, and developers can be told exactly which areas produce recurring errors.

OPTIMIZING ART ASSETS AND AUTOMATION WORKFLOW.

Check whether the game client crashes, slows down, or grows too large because of art data, then solve the problem by changing the art assets or modifying the pipeline.

Automation has its limits.

So content whose reliability can reach 100% should be selected and automated first; for anything with reliability below about 80%, rather than automating it, design a whole workflow that filters it out on the local system.

For example, for 100%-reliable content, take a simple export error: when an option is set incorrectly in the export plugin, the automated system can either repair it or simply run a check that tells the artist where the problem is.
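A check like that can be a few lines of script. Below is a minimal Python sketch of such a 100%-reliable validation step; the specific rules here (a `T_` name prefix and power-of-two texture sizes) are hypothetical examples, not the actual pipeline described in these notes:

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (sizes that can be safely mip-mapped)."""
    return n > 0 and (n & (n - 1)) == 0

def validate_texture(name, width, height):
    """Return a list of human-readable problems; an empty list means the asset passes."""
    problems = []
    if not name.startswith("T_"):  # hypothetical naming convention
        problems.append(f"{name}: name should start with 'T_'")
    if not (is_power_of_two(width) and is_power_of_two(height)):
        problems.append(f"{name}: size {width}x{height} is not power-of-two")
    return problems
```

A check like this either reports the exact problem back to the artist or, for trivially fixable cases, repairs the asset automatically.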

WRITING PROPOSAL DOCUMENTS.

When asking programmers and game designers to collaborate, create proposals that are as realistic and concrete as possible, grounded in technical knowledge, so that a clear consensus is reached and implementation is fast.

PRODUCTION GUIDELINE DECISION AND GUIDANCE

To connect art assets cleanly to the game systems, and to manage those assets well, various rules and conventions must be decided and enforced.

This is also called the asset pipeline.

It refers to the entire automated process of verifying that freshly produced mesh data, texture data, asset-bundle data, and so on are stored correctly according to the specification.

Cooperation from every side is needed here.

COMMON PROBLEM SOLVING SUPPORT.

Based on combined art and programming knowledge, quickly grasp the various problems and bugs that arise during development and propose solutions.

For example, suppose a technical artist has knowledge of bent normals.

They know what the term originally means, and to some extent they also recognize it as information that can be applied somewhere.

Clearly it is information, so there are many ways it could be applied.

What matters, though, is making sure the information is used as intended, which plays an important role in resolving errors.

TASK DRIVING.

In development work that requires multiple teams to collaborate, coordinate the opinions and schedules of the team leads, verify progress, and drive the work forward.

COMMON TECHNICAL ISSUES SUPPORT.

Collect the various problems artists run into while using the game engine and the other development tools, and propose solutions for those problems.

Fundamental

The role and mindset of a technical artist.

You must maintain a firmly neutral attitude.

Be devoted to your own work, but recognize clearly that your work is not just staring at a computer.

Beyond work attitude, you must also build your abilities in the following two areas, which are the core of how the work actually gets done.

1. Know precisely how to double productivity.

2. Know precisely how to improve visual quality.

To be exact, these two are directly related to each other; please do not treat them separately.

Understand the generations of computer games and the changes in hardware architecture.

If you cultivate insight into these generational changes, you can know in advance what the next generation of artists will ask for.

So when you take part in next-generation game development, you can judge quickly and clearly whether an artist's request is appropriate, and prepare the relevant knowledge in advance.

Understand the company's target goals first.

While driving or carrying out work, understand the short-term and long-term goals your current company has set.

Establish communication principles.

If you adopt Richard Feynman's principle of "gauge the level of the questioner's question, and answer at that level," you will, from a communication standpoint, build a much more solid relationship of mutual understanding.

Resolving disagreements with individual studio teams.

Understand, in general terms, how people fundamentally respond.

If you feel that everyone in the world wants to take something from you, nothing can move forward.

When exchanging opinions with someone for the first time, do not leave them with the impression that their opinion is wrong.

The key factors at the starting point of an ideal exchange of views:

1. Make the other person strongly aware that the two of you are on equal footing.

2. Make them aware that you are not trying to erase what is theirs.

3. Avoid, as far as possible, conversations that create the perception of "my idea is right and yours is wrong."

These three points are very important, but actually living up to them is hard, because controlling your own emotions is difficult.

On that understanding, first build rapport with the key people of the studio department you are responsible for.

You can even think about how your point of view might help that key person's KPIs.

Our department will raise the weight of OKRs rather than KPIs. (Introduce this gradually, and establish an OKR style that suits our department.)

Ways to promote the growth of team roles.

Split members into those with more than five years of experience and those with less.

Introduce a concept similar to agile pair programming.

Form two-person pairs by focus area.

After committing to their own focus area, the senior member (five-plus years) acts as an advisor who facilitates the partner's work: they do not share the workload, but only verify the partner's accuracy and guide the choice of approach.

Re-assign the pairs every month, and have the two members of each pair evaluate each other.

In the review, evaluate how the person being assessed advanced the macro goals the team initially set, what they contributed at the midpoint toward company or project goals, which items were changed, and why.

Avoid weekly meetings as much as possible; hold only the meetings needed to drive ideas forward.

Information-sharing meetings are low-productivity meetings, so record and share information through channels such as the team BBS instead.

For information sharing and communication, steer staff toward digital media (Spaces, Confluence).

Make routine decisions digitally.

Idea-driving discussions, however, are the one thing that has to be resolved face to face, so always meet for those.

To let imagination run free in those idea meetings, find a space with less pressure than the company meeting room, such as a café, and discuss freely.

Promote collaboration with the engine team and the tools team.

Grow together.

Each two-person virtual team should have at least one engine-team member appointed as a technical advisor.

The TA team consolidates the requirements, draws up an execution plan, and selects engine-team collaborators to carry out each task.

A simple example:

The flow above is not fixed.

So the leader of the virtual team should know at which point in time to communicate closely with which department.

A senior TA cannot stop at merely proposing ideas.

They must understand each project team's composition, attitudes, and feedback style, configure a flow that fits each project team well, and be able to lead the virtual team accordingly.

End of Contents.

Extra technique for a skin shader in URP

Dual-Lobe Beckmann

half3 DirectBDRFXD(BRDFData brdfData, BRDFDataXD brdfDataXD, half3 normalWS, half3 lightDirectionWS, half3 viewDirectionWS, half mask)
{
#ifndef _SPECULARHIGHLIGHTS_OFF
    float3 halfDir = SafeNormalize(float3(lightDirectionWS) + float3(viewDirectionWS));

    float NoH = saturate(dot(normalWS, halfDir));
    half LoH = saturate(dot(lightDirectionWS, halfDir));

    float d = NoH * NoH * brdfData.roughness2MinusOne + 1.00001f;
    half NoL = saturate(dot(normalWS, lightDirectionWS));
    half LoH2 = LoH * LoH;

    // Cheap specular occlusion derived from N.L
    float sAO = saturate(-0.3f + NoL * NoL);
    sAO = lerp(pow(0.75, 8.00f), 1.0f, sAO);
    half specularOcclusion = sAO;

    half specularTermGGX = brdfData.roughness2 / ((d * d) * max(0.1h, LoH2) * brdfData.normalizationTerm);

    #if _USESKIN
    // Second, wider Beckmann-style lobe, faded in by the skin mask
    half specularTermBeckmann = (2.0 * brdfData.roughness2 / ((d * d) * max(0.1h, LoH2) * brdfData.normalizationTerm)) * _LobeWeight * mask;
    half specularTerm = (specularTermGGX / 2 + specularTermBeckmann) * specularOcclusion;
    #else
    half specularTerm = specularTermGGX * specularOcclusion;
    #endif

    // On platforms where half actually means something, the denominator has a risk of overflow
    // clamp below was added specifically to "fix" that, but dx compiler (we convert bytecode to metal/gles)
    // sees that specularTerm has only non-negative terms, so it skips max(0,..) in clamp (leaving only min(100,...))
    #if defined (SHADER_API_MOBILE) || defined (SHADER_API_SWITCH)
    specularTerm = specularTerm - HALF_MIN;
    specularTerm = clamp(specularTerm, 0.0, 100.0); // Prevent FP16 overflow on mobiles
    #endif

    return specularTerm * brdfData.specular + brdfData.diffuse;
#else
    return brdfData.diffuse;
#endif
}
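As a plain-math illustration of the dual-lobe idea (this is not the URP shader above: the roughness values and the 0.85 lobe weight are illustrative, and both NDFs here take the remapped alpha directly):

```python
import math

def ggx_ndf(alpha, n_dot_h):
    """Trowbridge-Reitz (GGX) normal distribution function; the softer lobe."""
    a2 = alpha * alpha
    d = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (math.pi * d * d)

def beckmann_ndf(alpha, n_dot_h):
    """Beckmann normal distribution function; the tighter lobe."""
    a2 = alpha * alpha
    nh2 = n_dot_h * n_dot_h
    return math.exp((nh2 - 1.0) / (a2 * nh2)) / (math.pi * a2 * nh2 * nh2)

def dual_lobe_specular(soft_alpha, tight_alpha, n_dot_h, lobe_weight=0.85):
    """Blend a softer GGX lobe with a tighter Beckmann lobe into one specular term."""
    soft = ggx_ndf(soft_alpha, n_dot_h)
    tight = beckmann_ndf(tight_alpha, n_dot_h)
    return lobe_weight * soft + (1.0 - lobe_weight) * tight
```

The wide lobe carries the broad sheen, the narrow lobe carries the tight hotspot, and the weight decides how much each contributes to the final highlight.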

Dual-Lobe Beckmann


Custom HDR Reflection for a More Metallic Look.

half3 GlossyEnvironmentReflectionSkin(half3 reflectVector, half perceptualRoughness, half occlusion, half mask)
{
#if !defined(_ENVIRONMENTREFLECTIONS_OFF)
    half mip = PerceptualRoughnessToMipmapLevel(perceptualRoughness);
    half4 customIrradiance = SAMPLE_TEXTURECUBE_LOD(_cubemap, sampler_cubemap, reflectVector, mip);
    half4 encodedIrradiance = SAMPLE_TEXTURECUBE_LOD(unity_SpecCube0, samplerunity_SpecCube0, reflectVector, mip);
    // Blend the scene probe toward the custom HDR cubemap by the mask
    half4 blendIrradiance = lerp(encodedIrradiance, customIrradiance, mask);

    #if !defined(UNITY_USE_NATIVE_HDR)
    half3 irradiance = DecodeHDREnvironment(blendIrradiance, unity_SpecCube0_HDR);
    #else
    half3 irradiance = blendIrradiance.rgb;
    #endif
    return irradiance * occlusion;
#endif // _ENVIRONMENTREFLECTIONS_OFF

    return _GlossyEnvironmentColor.rgb * occlusion;
}
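The helper `PerceptualRoughnessToMipmapLevel` just remaps roughness onto the prefiltered cubemap's mip chain. A Python sketch of that remap and of the mask-driven blend, assuming Unity's usual 6 reflection LOD steps and its quadratic remap curve (check your engine version for the exact constants):

```python
UNITY_SPECCUBE_LOD_STEPS = 6  # assumed number of prefiltered mip levels

def perceptual_roughness_to_mip(perceptual_roughness):
    """Remap perceptual roughness to a cubemap mip level (Unity-style approximation)."""
    return perceptual_roughness * (1.7 - 0.7 * perceptual_roughness) * UNITY_SPECCUBE_LOD_STEPS

def lerp(a, b, t):
    return a + (b - a) * t

def blend_reflection(scene_probe_rgb, custom_cube_rgb, mask):
    """Per-channel lerp from the scene probe toward the custom cubemap by the skin mask."""
    return tuple(lerp(s, c, mask) for s, c in zip(scene_probe_rgb, custom_cube_rgb))
```

Rough surfaces land on small, blurry mips; a mask of 1 replaces the scene probe entirely with the custom HDR cubemap.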

Custom HDR Reflection for a More Metallic Look.

Notes on developing a skin shader, 2019 edition.

Abstract

Objective
Various considerations for the development of skin shaders for OpenGL 3.2-or-higher devices released in the post-2019 generation.

Subject
Using a dual-lobe skin shader and blurred normals.
Hardware test phases by context.

What is dual-lobe specular? Dual-lobe specular provides roughness values for two separate specular lobes, which are combined into the final result. Together they deliver high-quality sub-pixel micro-frequencies for excellent skin rendering, producing a natural look. <- as Epic Games puts it.

In fact this has been in use and well known since 2013; it was simply given more emphasis when Epic Games presented it in 2018.

Softer Lobe

Tighter Lobe

Combined Lobes

Simply put, driving the rich, natural specular of a skin surface with only a single specular lobe is a stretch. Although this is a different layer from clear coat, the variation between skin bumps and the distribution of sebum and oils is extremely diverse, so in recent years using two specular lobes to express a more realistic, more natural skin sheen has become the dominant approach. Depending on preference, the softer lobe can use ordinary GGX or the similar Trowbridge-Reitz, and the tighter lobe can use Beckmann. If optimization is a bigger concern, you can use Beckmann for both and evaluate the specular model only once: before the branch in the softer lobe, remap the branching and narrowing values, then simply merge the two at the end.

Let's modify the mechanics of the existing PBR shader. There are three major additions and changes: the geometry shadow model needs to be modified, curvature and blurry scatter need to be added, and back scatter needs to be added as well. Everything is implemented against forward rendering.

The example above could not be obtained, so let's take a quick look at the Mike sample first. It is similar to the basic dual-lobe concept, as follows: this part uses two roughness values…

The overall average surface roughness is 0.95 and the second roughness value is 1.05, interpolated via a texture sampler. That value is multiplied by another, finer micro-roughness and added to the Fresnel result of the outer surface below.

Let's debug each part of the result.

Looking at the sample overall, having to recompile the shader whenever a master material is modified is a pain. Do I have to set up a proper version-control system at home too?

The shading models used on the character's upper body, classified:
Hair / Skin / Clothing

Albedo

Combined Roughness Buffer

Specular Occlusion buffer

Metallic Buffer

SS AO buffer (cannot be used on mobile devices)

Opacity area buffer

Pre HDR Tone Mapping result.

Post HDR Tone Mapping result.

Return Final Image Buffer.

All combined final result with color grading, etc. The bloom pass in the final render is a convolution type that cannot be used in a mobile game.

Report on a full shading analysis using NetEase's 楚留香 model.

Special note: the game itself showed no "Tone Mapping" information during frame debugging, so here is my guess about that part. Whether tone mapping is used in post-processing cannot be determined precisely, but it did not appear in the GPU debugger. The NetEase TA on 楚留香 joined Tencent's 王者荣耀 team after finishing this project, and the 王者荣耀 (Chengdu studio) team likewise does not use tone mapping in post-processing. Instead, the tone-mapping formula is computed and applied inside the character shader. Recently the Chinese domestic version of 王者荣耀 has also been moving to PBR, and at that point tone mapping becomes an important part. Since tone mapping affects the whole render and also affects FX, my guess is that only the character rendering and certain specific parts use a modified ACES formula (which is actually not complicated) inside the shader. Without tone mapping, getting the skin-tone or specular results above would be difficult or impossible.

Apply artistic IBL lighting and modify it. Change the IBL ambient lighting values and observe the trend of the changes. This stage is important. Lighting is the flower of rendering, so even with a basic shader, quality can be raised to a similar level through art effort alone.

The quality gap between the 楚留香 character-creation window and the actual in-game character is severe. The analysis shows that the facial features were authored to the standard of the zoomable character-creation window; the eyelash density (sharpness, distribution, and so on) and the eyebrows were all built to that standard. So in the actual game the resolution of the facial features drops, and the distinguishing features of the face fade. The analysis is based on the October 2019 build of 楚留香.
The original 楚留香 assets were obtained from a NetEase colleague.

Shading analysis report using the 楚留香 model. The 楚留香 engine (Messiah) and Unity implement their features differently, so the analysis is based on personal interpretation. Things examined closely; things to look at in order to achieve soft shadow falloff.

Confirm the modeling with a white (clay) render. Test by changing only the ambient intensity, the shadow intensity, and the angle of the character's portrait lighting.

Change test using white ambient lighting.

Using an artificially created white ambient light, check how the reading of the surface curvature changes.

Apply roughness and keep checking the overall form. Because the highlight and ambient response determine how the dimensionality of the face itself reads, this is a very important test stage.

IBL: create your own lights. Create your own IBL lights using Substance Designer.

Reference SD file:

HDR_Custom.sbs (8.85 KB)

Applying artistic IBL lighting. Modify the IBL ambient lighting values and observe the changes. This stage is important. Lighting is the flower of rendering, so it is best to raise quality as far as possible through the artist's effort even while using the basic shader.

This screenshot is the result of Unity's basic PBR features, using IBL lighting and one directional light; no skin effect is applied.

Adding AO (AO value: 0.75) + skin detail normal (normal value: 0.14, tile count: X7 Y7)

Adding fake SSS (temporary test using thickness data with the SSS emissive color)

Correct the camera angle and viewpoint while refining the jawline to a slimmer angle.

Added a feature that uses the AO information to control the fake shadow area of the pupils and eyelashes in the shader.

Default.

Remapped color using the channel mixer. Blue-channel mix used to increase red from 0 to 10.

Shader Test result.

Reference list of specular BRDF candidates below. Beckmann NDF function:

float BeckmannNormalDistribution(float roughness, float NdotH)
{
    float roughnessSqr = roughness * roughness;
    float NdotHSqr = NdotH*NdotH;
    return max(0.000001,(1.0 / (3.1415926535*roughnessSqr*NdotHSqr*NdotHSqr)) 
    * exp((NdotHSqr-1)/(roughnessSqr*NdotHSqr)));
}
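A direct Python port of the HLSL above, kept term for term identical so the curve can be inspected numerically:

```python
import math

def beckmann_ndf(roughness, n_dot_h):
    """Beckmann normal distribution, matching the HLSL above term for term."""
    roughness_sqr = roughness * roughness
    n_dot_h_sqr = n_dot_h * n_dot_h
    # max() guards against a zero denominator, as in the shader's max(0.000001, ...)
    return max(1e-6, (1.0 / (math.pi * roughness_sqr * n_dot_h_sqr * n_dot_h_sqr))
               * math.exp((n_dot_h_sqr - 1.0) / (roughness_sqr * n_dot_h_sqr)))
```

At N·H = 1 the exponential becomes 1 and the value peaks at 1 / (π · roughness²), falling off as the half vector tilts away from the normal.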

Blend Normals (currently just blending with the vertex normal)

float3 BlendNormals(float lightDiffusion, float vertexNdotL, float bumpNdotL )
{
   
   float redIntensity  = lerp(0.0f, 0.6f, lightDiffusion);
   float greenBlueIntensity = 1;
   float red = lerp(vertexNdotL, bumpNdotL, redIntensity);
   float greenBlue = lerp(vertexNdotL, bumpNdotL, greenBlueIntensity);
   greenBlue = min(red, greenBlue);
   return saturate(float3(red, greenBlue, greenBlue));
}

Simple Back Lighting

inline float3 BackLighting(float3 lightColor , float NdotL, float shadowMap, float AO, float transTex, float SssScale, float3 backScatterColor)
{   
   float backLight = lerp(NdotL , 1, transTex) - lerp(NdotL, 1.0, 0.4);
   float3 result = saturate(backLight) * lightColor * (shadowMap + AO)  * (backScatterColor * SssScale);
   return result;
   
}
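The back-lighting term above is wrap-lighting arithmetic: the translucency map widens the lobe past a fixed 0.4 base wrap. A scalar Python sketch of the same math (per color channel, mirroring the HLSL):

```python
def lerp(a, b, t):
    return a + (b - a) * t

def saturate(x):
    return max(0.0, min(1.0, x))

def back_lighting(light_color, n_dot_l, shadow, ao, trans_tex, sss_scale, back_scatter_color):
    """Scalar sketch of the BackLighting HLSL above (light_color and
    back_scatter_color are RGB tuples)."""
    # How much the translucency map opens the lobe beyond the 0.4 base wrap
    back_light = lerp(n_dot_l, 1.0, trans_tex) - lerp(n_dot_l, 1.0, 0.4)
    gain = saturate(back_light) * (shadow + ao) * sss_scale
    return tuple(lc * bc * gain for lc, bc in zip(light_color, back_scatter_color))
```

With a translucency value at or below the base wrap of 0.4 the term vanishes; thin areas (high translucency) pick up the scatter color even where N·L is zero.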

Anti-tiling technique using stochastic sampling

Solution researcher : JP

Explore ways to reduce the visible repetition of distant ground textures as much as possible. A technique usable on top-tier mobile hardware, or primarily for the PC version.

Stochastic Method
PS ALU : 204

Applied this method below. Evaluate the instruction count.

Standard Method
PS ALU : 172

Conclusion.

2 or 3 extra texture samplers.
It is supported on most mobile hardware released after the first half of 2018.
When using it for terrain, it is recommended to apply it only to the layer with the highest weight.
Done that way, the performance cost is not a big problem.


Reference

GPU ZEN 2

ACES MATCHING SHADER FOR SUBSTANCE DESIGNER

IDT ODT Simulated ACES UE4 log Tone-Mapped

Older versions of SD did not have an ACES color-profile feature, so I built my own.

Afterwards we communicated with the head-office development team; OpenColorIO was officially added to SD, and ACES tone mapping was added to the Render node.

Customized Unity Standard mobile PBR from 2018

Changes issued.

Environment reflection method used by UE4.

Some reflection approximation methods were optimized to be performance-friendly on mobile.

Large reduction in texture sampler counts.

AO map generated from height data with simple math.

Matcap rim lighting for mobile, dating from 2013.

SBSAR with matcap rim lighting for mobile hardware.

I will keep working with Substance and Unity to create some shaders beyond the existing B2M.

Since we do not yet do any computation from the light direction, the ambient lighting was replaced with a mask based on view space.

Frankly, complex structures are difficult, because this method has to run on older mobile hardware.

So I decided to build a Substance mobile version alongside it…

We just add a few parameters.

To reduce computation without significantly increasing instructions and parameters, we implemented rim lighting from a lookup texture, then built various circular gradient maps based on the rim color.

We exposed a range for Invert.Y to handle the Y-axis flip of existing normal maps.

-1 to 1… If you create a custom material inspector and attach the component again, you can split it into true/false. But that approach is cumbersome, and either way only the values -1 and 1 are ever passed to normal.y.

I first developed and used this in 2007, and since then it has been adapted for mobile game environments.

That approach has been widely used in China ever since.

(PSO) How to use the shader pipeline cache effectively.

Let’s see how to use the shader pipeline cache.

I will talk about the above table of contents.
First, we’ll talk about pipeline state objects, how to use them in Unreal Engine, and also add a cache using PSO to see how to use them.

Let’s look at the PSO cache first.

Let’s move on because we know everything.

It’s a graphics pipeline … you know all this, so let’s move on.

Compute shaders are now universally used in the latest mobile games.

Pipeline?
Graphics hardware support

  • Optimized hardware unit allocation for each stage.
  • Stages can overlap because the work is divided into stages: maximum efficiency.

When you run the pipeline, what if you want to do a slightly different action?

  • Example) If you have the ABCDE pipeline.
    • 1:AB–E Only / 2:A–DE only / 3:AB’C-E

"What is hardware support in the graphics pipeline?"
Each step is assigned to an optimized hardware unit so that each piece of hardware can be used optimally.
On the right is how the CPU handles instructions.
If you look at the process, two instructions take 8 cycles because they are processed serially.
Using a pipeline overlaps the instructions so that they can be processed in 5 cycles.

The pipeline consists of several stages.

  • Each stage has different actions based on the preset state information.

State.

  • Ex) Blend State, DepthStencil State, Rasterizer State, Sampler State, …
  • State setting required when using a pipeline.
    • State changes themselves carry overhead.
    • What if the state is not changed? Use the previous state as it is and use it for performance optimization.

State information is called State.
State setting is required for each pipeline use.
When we run this pipeline, we have to arrange in advance which stages run and how they should behave.
The state itself is heavy to change.
It is best not to change the state.
Therefore, there is a way to optimize by listing similar states.
Unreal Engine itself is structured in that context.

In the past, all of these states were handled individually.
In the case of DX9, for example, the state for alpha blending was used one by one.

In the past, all states one by one.

  • Ex) D3D9 Render State: Alpha Blending State, Texture Stage State

Improvement: so that we can do some related settings at once.

  • Ex) ID3D11BlendState: alpha-to-coverage, independent blending, render targets.
  • The goal is to reduce the overhead of state changes by applying the related settings together.
  • Can be created and set at render time.

Latest hardware.

  • Dependencies between hardware units exist.
  • Ex) When setting up hardware blending, the Raster State also affects the Blend State.

After that, a slight improvement was made by bundling related states and processing them at once.
For example, in DX11 the Blend State handles the alpha-to-coverage value, the information that determines how each render target is blended when MRT is supported, and whether those MRT blend settings are treated as one shared setting or individually.

Let’s set the state at a pipeline level.
Pipeline State

  • Hardware configuration of how the input data will be interpreted and drawn.
  • Shaders and render states (Blend, Depth Stencil,Rasterizer,…) and others.
  • Pipeline State Objects Manage pipeline state through PSO.


The concept of letting the pipeline work at once is the pipeline state.
It refers to the configuration for the entire hardware.
It is controlled through an object called PSO.
An object that contains pipeline state information.

Pipeline state object.
An object containing pipeline state information. Pipeline State Object == PSO

  • Supported Graphics API: D3D12 / Vulkan / Metal
  • Used for pipeline state management.
  • Judging and validating the state in advance.
  • Allows pipeline states to be replaced more quickly at render time.

Pipeline State Objects Sets most pipeline states through PSO.

  • Set to PSO.
    • All shader bytecodes, Blend State, Rasterizer, DepthStencil State, Multi-Sampling information, and more.
  • Set by command list.
    • Resource binding, viewport information, Blend Factor, Scissor rects, DepthStencil Reference Value, etc.


The purpose itself is intended to manage pipeline states.
It is to determine whether the pipeline works without problems with the pipeline state in advance.
In actual use, it is the level to believe and use.
We can change the entire state much faster.

Things that don’t change well on a pipeline basis.
Viewport settings, scissor testing, etc. are supposed to be handled at the command-list level.

Low level cache.

Low level cache
Since the PSO itself was already created with the assumption of recycling, it has already been arranged in the graphics API stage.
D3D12 / Vulkan / Metal

  • Cache support for runtime generated PSO.


D3D12 / Vulkan

  • Create a load-time PSO by writing the PSO out to disk as a file.


OpenGL

  • On devices that support ProgramBinary (OpenGL ES 3.0 or later), create a load-time PSO by writing the PSO out to disk as a file.


However, OpenGL is not actually an API that supports PSO. Instead, it works like a PSO on hardware that supports a feature called ProgramBinary, and reads it later.

RHI
A thin layer on the platform-specific graphics API. Platform-independent code that handles all operations.
PSO generated at the low level is stored as a render resource.
Utilizing this archived information, the Map container containing the PSO is used to search the cache.

Low level Cache – D3D12
Simultaneous use of runtime cache and cache loaded from file.

  • Runtime cache = searched with the "GraphicsPipelineStateInitializer" received from the RHI.
  • Loaded cache = searched using the low-level description information.
    • Low-level description: the platform-dependent pipeline state descriptor.
    • Ex> ShaderByteCodeHash, D3D12_SHADER_BYTECODE, D3D12_BLEND_DESC, D3D12_RASTERIZER_DESC, …
  • Search the faster runtime cache first.
    • Only platform-independent pipeline information (GraphicsPipelineStateInitializer) is received from the RHI.
    • The loaded cache is in a platform-dependent form (low-level description), and the search is attempted immediately without translation.


Why this way? GraphicsPipelineStateInitializer is a platform-independent class.
The process is fast because it does not undergo conversion.
In this process, it is fast that it does not undergo conversion.

Low level cache – Vulkan
Same as D3D12 + Supports Pipeline LRU Cache.

LRU Cache?

  • LRU = Least Recently Used
  • Evicts the least recently used data from the cache, freeing cache space for new data.

Pipeline LRU Cache

  • LRU support for low level cache.
  • Very useful for Android Vulkan platforms with insufficient shader memory.

Related settings.

  • #define VULKAN_ENABLE_LRU_CACHE 1
  • r.Vulkan.EnablePipelineLRUCache = 1
  • r.Vulkan.PipelineLRUSize = 10 * 1024 * 1024 / r.Vulkan.PipelineLRUCacheEvictBinary = 1
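The LRU policy itself is the textbook one: when the cache is full, evict the pipeline that has gone unused longest. A minimal Python sketch of the idea (not the engine's implementation):

```python
from collections import OrderedDict

class PipelineLRUCache:
    """Evicts the least-recently-used entry once capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # insertion order doubles as recency order

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, pipeline):
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = pipeline
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used
```

The engine additionally evicts by total binary size rather than entry count (the `PipelineLRUSize` budget above), but the recency bookkeeping is the same.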


By assigning LRU behavior to the PSO objects, memory space can be flexibly reclaimed even if hitching occurs. This is mainly because of the Android platform.
When developing the Android version of Fortnite Mobile, it was applied to solve the Android platform memory problem.

Low Level Cache – OpenGL.
Does not support PSO.

  • Not a bulk change through the pipeline state…
  • Shader State + Render State updated respectively.
    • Low-level cache support for bound shader states (BoundShaderState, BSS).


Rather than bulk changes through a pipeline state, this helps make batch changes for the shader states only: the BSS cache.

Low Level Cache – OpenGL.
Program Binary Cache

  • OpenGL compiles and links individual shaders and creates them as Program Objects.
  • The ability to write program binaries to file so that program objects are not recompiled, and can instead be loaded and reused later.
  • Separate Shader Object support.

LRU algorithm support.

  • Very useful for OpenGL ES platforms that lack shader memory.
  • For Mali GPU, the maximum shader memory heap size allowed by the driver is small.

Related settings.

  • r.ProgramBinaryCache.Enable=1
  • r.OpenGL.EnableProgramLRUCache=1
  • r.OpenGL.ProgramLRUCount=700/r.OpenGL.ProgramLRUBinarySize=35*1024*1024

Shader Pipe-line cache.

Shader Pipe-line cache.

  • An object that uses the RHI-level API to help fill the low-level cache.
  • Focused on when to generate/create PSOs.
  • It is called the PSO cache.
  • Replaces the Shader Cache that existed in the past.
  • Purpose: to let users who run the app for the first time play without runtime hitches.
    • In the build for distribution, the data needed for PSO creation is prepared in advance.
    • When the distribution build runs, the PSOs are compiled through that data without the user noticing.

PSO cache action flow.

  • Create a PSO at runtime with a test build.
    • Convert the generated PSO to binary PSO and save the file.
  • To accumulate multiple play results, merge binary PSO to pipeline metadata.
  • Convert pipeline metadata back to binary PSO when cooking a deployment build.
  • Create binary PSOs at initialization time using binary PSOs in your deployment builds and register them in a low level cache.
  • Using PSO in a deployment build.

Use PSO cache.

  • r.ShaderPipelineCache.Enable =1/Command line”-psocache”
  • ShareMaterialShaderCode(Shader code library) enabled.

Shader Pipeline Cache-Action.
Test build

  • Cooking test builds.
  • PSO generated during play is saved as binary PSO.
  • Merge binary PSO into pipeline metadata.

Deployment build.

  • Convert pipeline metadata to binary PSO when cooking.
  • Create PSO at initialization time, register in low level cache.
  • Using PSO at runtime.

Shader Pipeline Cache-Test build.
Cooking a test build.

  • Stable shader information for all materials in the content is generated and saved to scl.csv.
    • Stored information: ClassNameAndObjectPath, ShaderType, ShaderClass, MaterialDomain, FeatureLevel, QualityLevel, TargetFrequency, TargetPlatform, VFType, Permutation, OutputHash.
    • Why you need Output Hash:Share Material Shader Code.
  • This information is later used by the “pipeline metadata” generator.
  • Let’s talk about what information is stored on the back page.

Shader Pipeline Cache-Test build.

Shader Pipeline Cache-Test build.
PSO generated during play is saved as binary PSO.

  • The files in the low-level cache are not distributed directly.
  • Platform-independent information (GraphicsPipelineInitializer) is used = a ushaderpipeline is created.
    • Shaders bound via BoundShaderState: VertexDeclarationRHI, VertexShaderRHI, PixelShaderRHI, GeometryShaderRHI, DomainShaderRHI, HullShaderRHI.
    • Render states: BlendState, RasterizerState, DepthStencilState, ImmutableSamplerState.
    • DepthStencil-related: DepthStencilTargetFormat, DepthStencilTargetFlag, DepthTargetLoadAction, DepthTargetStoreAction, StencilTargetLoadAction, StencilTargetStoreAction, DepthStencilAccess.
    • Etc: bDepthBounds, PrimitiveType, RenderTargetsEnabled, RenderTargetFormats, RenderTargetFlags.
    • Multi-sampling information: NumSamples.
  • Binary PSO storage.
    • r.ShaderPipelineCache.LogPSO = 1/Commandline”-logPSO”

Shader Pipeline Cache-Test build.
Merge binary PSO into pipeline metadata

  • It is important to run/render all content as much as possible in a test build so that there is no pipeline information being excluded.
  • Binary PSO and Stable Shader information are combined to generate pipeline metadata = stablepc.csv storage.

Shader Pipeline Cache-Action.
Test build

  • Test Build Cooking
  • PSO generated during play is saved as binary PSO.
  • Merging binary PSOs into pipeline data.

Distribution Build

  • Convert pipeline metadata to binary PSO when cooking.
  • PSO creation at the time of initialization, registration in low level cache.
  • Using PSO at runtime.

Shader Pipeline Cache-Deployment build.

Shader Pipeline Cache-Deployment build.
PSO creation at initialization time, registered in low level cache.

  • The PSO is generated in advance at an engine initial time or at an arbitrary time, so that the PSO is recycled.
  • Creating PSO to be actually used through the precompile process of the PSO cache.
    • Create GraphicsPipelineInitializer for every binary PSO.


-> Call SetGraphicsPipelineState(…)
-> PipelineStateCache::GetAndOrCreateGraphicsPipelineState(…)
-> GGraphicsPipelineCache.Find(…) search fails
-> RHICreateGraphicsPipelineState(…)

Shader Pipeline Cache-Deployment build.

Shader Pipeline Cache-Deployment build.
Using PSO at runtime.

  • Create GraphicsPipelineInitializer for each draw call


-> Call SetGraphicsPipelineState(…)
-> Goes through PipelineStateCache::GetAndOrCreateGraphicsPipelineState(…)
-> GGraphicsPipelineCache.Find(…) search succeeds.
-> RHISetGraphicsPipelineState(…)

Shader Pipeline Cache-Derived data.
[Test build] =-logPSO

  • [Cooking output]:scl.csv / ushaderbytecode
  • [Execution output]: rec.upipelinecache
  • [Merge output]:stablepc.csv


[Deploy build]

  • [Cooking input]:stablepc.csv
  • [Cooking Output]:stable.upipelinecache / ushaderbytecode

PSO Cache Usage Guide.

Precautions.
#The suggestions may not fit all projects.
Choose the way that fits your project with the concepts outlined.

  • Assumptions often used.
  • Android OpenGL ES 3.1
  • Content distribution method:Minimal APK + DLC w/HttpChunkInstallData
    • Minimal APK:Android ETC
    • DLC w/ HttpChunkInstallData:Android ASTC
  • Share Material Shader Code = True

Binary PSO storage.
r.ShaderPipelineCache.Save
Ex. -logPSO autosave does not work as desired.

  • PSO logging requirements.
  • r.ShaderPipelineCache.Enabled=1/r.ShaderPipelineCache.LogPSO=1/r.ShaderPipelineCache.SaveBoundPSOLog=1
  • Depending on the project, it can be set via device profile or command line or console command.
  • r.ShaderPipelineCache.Save Direct execution:{ProjectDir}\Saved\CollectedPSOs

Control when PSO is produced.
Engine default settings: Engine Preinit
Slow the PSO cache precompile process.

  • Read binary PSO and proceed to compile with Batch.
  • The PSO cache behaves differently for each tick.
  • Set Pause status with Pause Batching() / Resume Batching().

PSO generation rate control.
r.ShaderPipelineCache.SetBatchMode[Pause|Fast|Background]

  • Batch mode
    • When precompile, process the batch with a renderthread time slice.
    • Set batch amount and maximum allocation time information to be processed in one frame.
    • Engine default: Fast mode = 50 PSOs+16ms / Background mode = 1 PSO + no time limit.
  • Try compiling during the loading screen? Fast Mode!
  • Try compiling during gameplay?Background Mode!

DLC + Shader Code Library
Trouble shooting

  • [Empty DLC plugin only] Cooking failure
  • Crash when activating project launcher/cooking/build DLC.
  • SaveShaderCodeLibrary(…)in CookOnTheFlyServer.cpp line6163

DLC + Shader Code Library
Runtime crash

  • Crash when Shader Code Library is not ready when trying to access DLC content after pak mount.
  • Cause: When the engine is initialized, FShaderCodeLibrary::InitForRuntime(…) does an open operation for the plugin, but the DLC plugin that appears after downloading the content and mounting the Pak will be excluded from this operation.
  • The simplest solution is to open the plugin Shader Code Library directly after mounting Pak.

DLC + Shader Code Library
Reopen PSO cache?

  • Basic engine operation.
    • At engine PreInit, the Shader Code Library is opened with the project name (Global, Game) or plugin name (excluding DLC).
    • At engine PreInit, the Shader Pipeline Cache is also opened under the project name.
      • The Program Binary Cache is created/loaded using the same GUID as the Shader Pipeline Cache.
  • In general, proceed as follows.
    • [Engine initialization] Run engine from APK -> Open ShaderCodeLibrary in APK -> Open PSO cache -> PSO cache precompile.
    • [Level for patching] Pak mount -> Open the ShaderCodeLibrary in the DLC to remove the crash -> Reopen the PSO cache?

DLC + Shader Code Library
PSO cache deployment strategy.

  • You should know that from 4.22…
    • Shader Code Library supports DLC (plug-in).
    • Shader Pipeline Cache does not support DLC (plug-in).
    • It is important to make sure stable.upipelinecache contains all content including DLC.
  • Prepare in advance even if it takes a long time with as much information as possible.
    • Damage to user experience: extra time when the game opens < a hitch that occurs during gameplay.
    • Not very different from the method used by Fortnite.

Fortnite case.
Not very different from the method used by Fortnite.

  • IPA based on iOS = 166.3 MB / DLC download 4.11 GB
  • Create PSO cache for DLC directly at the patch level after downloading and installing DLC.
  • Restart the game after all caches are created or load play levels.

End of Contents.

Tile Based Deferred Rendering

Written by JP.Lee
心动的 Technical Art team leader.

Immediate Mode Rendering (IMR)

The traditional approach: rasterize triangles and shade per pixel. Triangles move through the pipeline immediately. It costs power and wastes memory bandwidth. (Early visibility test: for this feature to work, triangles must arrive in sequence, so they should be sorted on the application side.)

Tile Based Rendering (TBR)

In embedded systems, the highest cost is memory reads and writes, which are closely tied to power draw and memory bandwidth. So one of the most effective optimizations in embedded environments is reducing the number of memory accesses. TBR was developed to replace the Z-buffer (depth buffer) that IMR used to pick out visible triangles. The screen to be rendered is divided into many tiles, and rasterization is done tile by tile. (Depending on the system, some tiles may be in flight at the same time, but not every triangle can move through the pipeline immediately.) (Each time the tiles are binned, the hardware builds pointers to the geometry lists belonging to each tile, so that each tile can fetch the relevant geometry buffers when it is rendered.) Handled this way with tiles instead of a Z-buffer, an intermediate buffer in system memory is all that is needed.

Tile Based Deferred Rendering (TBDR)

An improved version of TBR. In TBR, surfaces that are ultimately invisible can still end up shaded or texture-mapped. In TBDR, the GPU sorts the triangles to find their ordering and removes the invisible parts; this method is called Hidden Surface Removal (HSR). Because meaningless triangles are removed, memory bandwidth is used even more effectively than in TBR. TBDR's advantages: 1. With no Z-buffer, memory utilization is higher. 2. No Z-buffer accesses, which reduces overhead. 3. Translucency (blending) can be handled correctly without worrying about render order.

Alpha Blending 和 Alpha Test

Alpha blending vs. alpha test. In traditional desktop games, alpha test is recommended over alpha blending for cutout materials: for opaque cutouts such as wire fences or torn cloth, alpha test avoids expensive blending and can also skip pixel work. In mobile games it is exactly the opposite: alpha blending is recommended over alpha test. Alpha test uses dynamic branching (an if statement) in the pixel shader. Desktop GPUs handle dynamic branches in shaders quickly, but mobile GPUs do not, so alpha test becomes a cause of degraded shader performance. Moreover, TBDR cannot process pixel rejection quickly.

As mentioned earlier, TBDR collects the vertex shader results of the draw calls, performs hidden surface removal, and then processes only the pixels that are actually visible, but this only works for fully opaque meshes that do not use alpha test. With alpha test, it cannot be determined at the vertex stage whether a polygon will be rejected, so deferred processing is impossible. To prevent this, Unity3D renders the fully opaque objects first and renders the alpha-tested objects afterward, so using alpha test sparingly does not bring severe consequences. Still, a TBDR chipset's architecture was never suited to alpha-test workloads in the first place, so it is best not to use alpha test. This is why Unity3D's built-in Mobile shader category contains no alpha-test shader.

Alpha blending, by contrast, runs faster than on a desktop PC. It performs the blend by reading and writing the target buffer internally; on desktop that means accessing the frame buffer in DRAM, consuming a large amount of bandwidth, but in TBDR the process happens in on-chip memory, tile by tile, so it is fast. One more note: look at the mesh-plan image. When the alpha-blended area grows large, it is better for overall performance to add mesh vertices and cut the shape down so that the transparent area shrinks.

Render Texture

If you use a Render Texture, Unity3D switches the render target internally. Changing the render target hurts performance even on desktop, because the CPU waits for the GPU's response, breaking the parallelism between the two. On TBDR an even more serious problem arises: when the render target changes, all the data in the parameter buffer must be processed and written out to the frame buffer to free the parameter buffer for the new target. Working this way, every render-target change adds another deferred pass to process. So using a Render Texture as a camera's Target Texture in Unity3D reduces TBDR's efficiency.

图片后期处理效果(Image post process Effect)

Recent devices have improved enough to use image post-processing such as color grading and bloom. These effects must not be overused, however; apply them selectively. First, post-processing changes the render target internally. But the biggest problem is bandwidth. Pixel throughput matters too, but bandwidth matters most. Post-processing turns the render target into an input texture for the pixel shader; at that point the input texture is not an on-chip tile but a render target occupying shared memory, consuming a large amount of bandwidth (e.g. at 1080p). So apply image post-processing with care.

Camera Clear(Clear)

On older desktop graphics cards, there were cases of deliberately not clearing the previous contents at the start of each frame, i.e. drawing each frame over the previous one. Modern desktop GPUs, however, must clear in order to run at full speed. The same holds for mobile TBR: only clearing frees the on-chip buffers and accelerates rendering. So on both desktop and mobile, choosing the Don't Clear option for the camera's Clear Flags property is not recommended.

MSAA

On desktop, MSAA is expensive. As above, bandwidth is the biggest problem. For example, processing a 1080p image with 2x MSAA needs roughly 2160p worth of bandwidth; paying that much DRAM bandwidth alone is a large cost. With TBR, though, the processing completes inside the on-chip tile: doing MSAA within a 16x16 or 32x32 tile does not create much overhead. In Unity3D, set Anti Aliasing in Quality Settings to 2 or 4 to use MSAA.

Profiling. Unity3D 5 added the Frame Debugger, which lets you debug the rendering process frame by frame; with it you can easily see each object's rendering order and the batching situation. Image source: http://docs.unity3d.com/Manual/FrameDebugger.html. Unfortunately, Unity3D's Frame Debugger cannot show detailed GPU performance for each draw call (dp). Fortunately, every chipset vendor provides a tool that can analyze the rendering process in detail: Adreno chips via the Adreno Profiler, Mali chips via the Mali Profiler or DS-5, and iPhone via Xcode. Image source: http://www.slideshare.net/ozlael/graphics-opt-ndc. There is roughly one problem: although TBR-style chips let you confirm performance once per frame, you cannot directly see the performance of each individual call. As mentioned before, a draw call (dp) does not process the pixel shader immediately but uses the retained parameter-buffer results; real rendering begins only after all draw calls have finished, so the actual cost of each call cannot be seen. When analyzing iPhone rendering in Xcode, the performance column therefore shows the number 0. Sometimes a non-zero number appears, but it is not trustworthy. In those cases you can only estimate from the frame's performance or from information such as the textures the call used. As mentioned above, because the performance column shows 0, iPhone analysis is somewhat awkward. Photo source: http://www.slideshare.net/ozlael/graphics-opt-ndc.

Frame change rate. With TBR, rendering happens in on-chip tiles and does not consume bandwidth; but writing the tiles out to the frame buffer in DRAM inevitably does. To save that bandwidth cost, Mali uses a technique called Transaction Elimination: when a tile's result is identical to the previous frame's, the tile is not updated and the previous tile is reused, reducing the amount copied from chip memory to system memory. In the figure, the parts marked in green are the reused portion. Image source: http://community.arm.com/. So when using a fixed camera, minimizing change in backgrounds such as skyboxes is a good approach. In practice this sees little use in 3D games and is applied more often in 2D games.
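The bandwidth claim above can be sanity-checked with simple arithmetic. A small Python sketch, assuming a 32-bit color buffer and counting only color traffic for one full-screen pass (illustrative numbers, not a measurement):

```python
def framebuffer_bytes(width, height, bytes_per_pixel=4, samples=1):
    """Bytes touched by one full-screen color pass at the given MSAA sample count."""
    return width * height * bytes_per_pixel * samples

def msaa_traffic_ratio(samples):
    """Color traffic at 1080p with MSAA versus without: scales linearly with samples."""
    return framebuffer_bytes(1920, 1080, samples=samples) / framebuffer_bytes(1920, 1080)
```

A 1080p RGBA8 pass is about 7.9 MB; on an IMR that resolves through DRAM, 2x MSAA doubles the raw color traffic, while a TBR resolves the samples inside the on-chip tile and writes out only the final pixels.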
