Converting RGB to LogLuv in a fragment shader

There are a few places on the net that I visit pretty regularly. One such place is the forums at Beyond3D where several cool developers hang around and post good stuff. Some of my favourite Beyond3D contributors are DeanA (Dean Ashton, a colleague over at SCEE ATG, with a never-updated blog), and DeanoC (Dean Calver) and nAo (Marco Salvi) who happen to be lead and graphics programmers, respectively, on Heavenly Sword over at Ninja Theory. All fellow PS3 programmers in other words.

Of note in recent times is Dean C talking about the Atomic Cache facility of the SPEs both on his blog and on the forums. Though here I’m going to talk about something both Dean C and Marco have talked a lot about in the past, namely how they use LogLuv encoding as part of their HDR solution. (Lots of funny speculation ensued on the forums as they didn’t quite give enough info for people to connect the dots, even to the extent where websites felt they needed to conduct interviews on the topic.)

When respected people talk about a piece of their tech that you aren't currently using, you investigate; that's what most (good) developers do. So, back then, I decided to look into (amongst other things) what it would take to encode RGB into LogLuv in a pixel shader. Here’s what I found.

The canonical reference for LogLuv is Greg Ward’s paper The LogLuv Encoding for Full Gamut, High Dynamic Range Images (see also the book High Dynamic Range Imaging which Greg coauthored). His paper talks about both 24-bit and 32-bit LogLuv encodings, but here we’re only interested in the 32-bit one, which uses 16 bits for luminance information and 16 bits for chrominance. Figure 1 of Ward’s paper gives the representation as:

bit     31: flag negative luminances (1 bit)
bit 30..16: log encoding of luminance (15 bits)
bit  15..8: u coordinate (8 bits)
bit   7..0: v coordinate (8 bits)

The pertinent information on converting from [R,G,B] to [Le,Ue,Ve] (i.e. LogLuv) is spread over multiple sections of Ward’s paper, so to save you some work, I’ll summarize. The conversion is done as follows:

[X,Y,Z] = [R,G,B]*M
x = X/(X+Y+Z)
y = Y/(X+Y+Z)
u'=4*x/(-2*x+12*y+3)
v'=9*y/(-2*x+12*y+3)
Ue = floor(410*u')
Ve = floor(410*v')
Le = floor(256*(log2(Y)+64))

where M is the 3×3 matrix [0.497,0.339,0.164; 0.256,0.678,0.066; 0.023,0.113,0.864].

To explain the “magic” constants in Ward’s math, note that we support Y in the range (5.4×10^-20, 1.8×10^19), because log2() of these values gives the range (-64.0, 64.0), which for Ward’s Le calculation brings Le into the desired integer range [0, 2^15 - 1] (a 15-bit integer).
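Plugging the endpoints into Ward's formula makes this concrete. The last line gives the resulting relative luminance step between adjacent codes (1/256 of a stop):

```latex
\begin{aligned}
L_e &= \left\lfloor 256\,(\log_2 Y + 64) \right\rfloor \\
Y = 2^{-64} \approx 5.4\times10^{-20}:&\quad L_e = \lfloor 256 \cdot 0 \rfloor = 0 \\
Y \to 2^{64} \approx 1.8\times10^{19}:&\quad 256\,(64 + 64) = 32768 = 2^{15},\ \text{so } L_e \le 2^{15}-1 \text{ for } Y < 2^{64} \\
\text{step between codes:}&\quad 2^{1/256} - 1 \approx 0.27\%
\end{aligned}
```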

Ward states that the gamut of perceivable u’ and v’ values lies in the range [0, 0.62], and he therefore scales u’ and v’ by 410 to get an integer in [0, 255]. For a fragment shader we instead need Le, Ue, and Ve to lie in the [0, 1] range, as the hardware will automatically turn floats in that range into a [0, 255] integer (clamped) when writing to an 8-bit-per-channel render target. However, since we will in the end be splitting Le over two such integers, we’ll turn Le into a float in the range [0, 256). Making the appropriate changes turns the math into:

[X,Y,Z] = [R,G,B]*M
x = X/(X+Y+Z)
y = Y/(X+Y+Z)
u'= 4*x/(-2*x+12*y+3)
v'= 9*y/(-2*x+12*y+3)
Ue = (1/0.62)*u'
Ve = (1/0.62)*v'
Le = 2*(log2(Y)+64)

There are quite a few optimizations we can do at this point. In an attempt at being educational, I’ll apply them one by one. First, substitute the expressions for x and y in the expressions for u’ and v’ and simplify, to obtain this calculation:

[X,Y,Z] = [R,G,B]*M
u' = 4*X/(X+15*Y+3*Z)
v' = 9*Y/(X+15*Y+3*Z)
Ue = (1/0.62)*u'
Ve = (1/0.62)*v'
Le = 2*(log2(Y)+64)
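The substitution works by multiplying the numerator and denominator by X+Y+Z. Spelled out:

```latex
\begin{aligned}
-2x + 12y + 3 &= \frac{-2X + 12Y + 3(X+Y+Z)}{X+Y+Z} = \frac{X + 15Y + 3Z}{X+Y+Z} \\
u' &= \frac{4X/(X+Y+Z)}{(X+15Y+3Z)/(X+Y+Z)} = \frac{4X}{X+15Y+3Z} \\
v' &= \frac{9Y/(X+Y+Z)}{(X+15Y+3Z)/(X+Y+Z)} = \frac{9Y}{X+15Y+3Z}
\end{aligned}
```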

Next we fold the computations for u’, v’, Ue, and Ve:

[X,Y,Z] = [R,G,B]*M
Ue = (4/0.62)*X/(X+15*Y+3*Z)
Ve = (9/0.62)*Y/(X+15*Y+3*Z)
Le = 2*(log2(Y)+64)

Here we note that it is possible to fold the dot product dot([1,15,3], [X,Y,Z]) into the vector-matrix multiplication so that it ends up in the Z component of the result (which I’ll call XYZ). The new math is then

[X,Y,XYZ] = [R,G,B]*M'
Ue = (4/0.62)*X/XYZ
Ve = (9/0.62)*Y/XYZ
Le = 2*(log2(Y)+64)

where M’ = M * [1,0,1; 0,1,15; 0,0,3]. We can now also fold the (4/0.62) and (9/0.62) constants into the matrix multiply:

[X',Y,XYZ'] = [R,G,B]*M'
Ue = X'/XYZ'
Ve = Y /XYZ'
Le = 2*(log2(Y)+64)
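Written out for a single row vector, the two folds are a right-multiplication by the matrix that appends the dot product, followed by a diagonal scaling. The scaling is chosen so that the 4/0.62 and 9/0.62 factors come out of the division:

```latex
\begin{aligned}
[X,\,Y,\,Z]
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 15 \\ 0 & 0 & 3 \end{bmatrix}
\begin{bmatrix} 4/9 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0.62/9 \end{bmatrix}
&= \left[\tfrac{4}{9}X,\; Y,\; \tfrac{0.62}{9}(X + 15Y + 3Z)\right] = [X',\, Y,\, XYZ'] \\
\frac{X'}{XYZ'} &= \frac{(4/9)\,X}{(0.62/9)(X+15Y+3Z)} = \frac{4}{0.62}\,\frac{X}{X+15Y+3Z} = U_e \\
\frac{Y}{XYZ'} &= \frac{9}{0.62}\,\frac{Y}{X+15Y+3Z} = V_e
\end{aligned}
```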

The new matrix is M’ = M * [1,0,1; 0,1,15; 0,0,3] * [4/9,0,0; 0,1,0; 0,0,0.62/9]. At this point there’s hardly any math left and no(?) optimizations left to apply, so it’s time to code. However, when turning this into production code, there are two potential sources of problems:

  1. Division by zero.
  2. log2() arguments less than or equal to zero.

To avoid visible glitches, both issues must be handled. We can do this by clamping values to a small epsilon, so that they are strictly positive where it matters. With that done, we get the following code (Cg code, of course):

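// m = M' = M * [1,0,1; 0,1,15; 0,0,3] * [4/9,0,0; 0,1,0; 0,0,0.62/9]
const static float3x3 m = float3x3(
    0.2209, 0.3390, 0.4184,
    0.1138, 0.6780, 0.7319,
    0.0102, 0.1130, 0.2969);

inline float4 PS3_LogLuv_Encode(in float3 rgb) {
    float4 res; // float4(Ue, Ve, LeHigh, LeLow)
    float3 Xp_Y_XYZp = mul(rgb,m);                         // [X', Y, XYZ']
    Xp_Y_XYZp = max(Xp_Y_XYZp, float3(1e-6, 1e-6, 1e-6));  // avoid log2(0) and division by zero
    res.xy = Xp_Y_XYZp.xy / Xp_Y_XYZp.z;                   // Ue = X'/XYZ', Ve = Y/XYZ'
    float Le = 2 * log2(Xp_Y_XYZp.y) + 128;                // Le = 2*(log2(Y)+64), in (0,256)
    res.z = Le / 256;                                      // high part of Le
    res.w = frac(Le);                                      // low part of Le
    return res;
}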

Running this code through NVShaderPerf gives (from memory) 5 cycles for 9 instructions. When inserted at the end of a longer shader where there is plenty of room for instruction pairing, the total overhead for the LogLuv conversion will be less than this, perhaps around 3 cycles. I haven’t checked with Marco to see how this compares to what he’s doing, but it matches the cycle numbers he mentioned in various posts so it’ll be pretty close.

As Marco discusses on, e.g., Dean’s blog, you might want to adjust this representation a little to avoid carry problems during interpolation, which I haven’t done here but leave as an exercise to the reader. Another exercise is to do the conversion from LogLuv back to RGB. Enjoy!

12 thoughts on “Converting RGB to LogLuv in a fragment shader”

  1. Well done Christer!
    Your implementation is actually faster than mine, as I did not fold the dot product into the matrix multiplication, and I also had to split the log luminance in a more convoluted way: using the same transformation you used, I was not able to go back to an RGB colour perfectly without losing a tiny bit of intensity.
    BTW..do you know any game that is making use of the same base technique?
    The only one I’m aware of is Heavenly Sword..

  2. Hi Marco, good to see you here! Back when I looked at this I only worked out how to optimize the RGB->LogLuv code as per above, not the other way around, so I never looked into the precision issues but I did scribble in my notes that splitting the luminance value into two bytes could be an issue and that you might have to do something else. A possible option could be to write it this way:

    float4 res;
    float3 v = mul(rgb, m);
    v = max(v, float3(1e-6, 1e-6, 1e-6));
    float k = 2*log2(v.y) + 128;
    res.xy = unpack_4ubyte(k).xy;
    res.zw = v.xy / v.z;
    return res;
    

    Although this might not work so well either; I can’t remember if the pack/unpack instructions were hosed or not. (I’m not the one doing the shader coding.) I recall the unpack approach produced worse code too, but this was with a pretty old Cg compiler, so who knows.

    Feel free to steal the dot product folding trick! (If you haven’t already.)

    I don’t know of any other games using LogLuv at this point. I considered it for our engine based on your posts about your approach, but we haven’t committed to how to deal with HDR yet so the ball is still in the air. (As you know quite well, there are several possible options and which is best depends a lot on what other choices you’ve made – and we haven’t made all our choices yet.)

    BTW, I don’t think it’s a secret to mention publicly that I’ve seen builds of Heavenly Sword and the graphics are absolutely gorgeous. Kudos to you and the team!

  3. Hi Christer,

    You’re right, there are many different options when it comes to rendering HDR images.
    Since we can’t really do alpha blending using this color space, if a game really needs to blend in an HDR color space, then this technique only makes sense if used in conjunction with multisampling, writing a custom AA resolve filter that downsamples a LogLuv image to an FP16 image where we can do HDR blending.
    (Though I think that HDR blending is overrated: we can live without it by just blending on an RGBA8 render target and tone mapping in the pixel shaders of our alpha-blending pass, even better if we do it using exposure computed in the previous frame and read back with the CPU, so that we can avoid a texture read in our pixel shaders and set exposure as a pixel shader constant.)
    It’s also worth noting that the vast majority of games probably don’t use the full FP16 range but rather a narrower one; in this case we can drop the logarithm and just store a linear luminance scaled to fit the luminance range we want to support. Even then I doubt we could tell the difference, could we? :)
    BTW, the code I used to encode luminance is really ugly, but I did not have much time to spend on it and it was the only code that did the job (perfect LogLuv -> RGB/FP64 conversion). When I (sneakily!) introduced it, the game already had a ton of content developed using FP16, and I did not want the artists/art director to scream in pain because our images were slightly darker (!!):

    // pack the luminance into 2 channels..
    float Le_LSBs = frac(Le);                                      // fractional part
    float Le_MSBs = (Le - (floor(Le_LSBs*255.0f))/255.0f)/255.0f;  // integer part / 255
    
    // unpack log luminance (inverse of the packing above)
    float Le = Le_MSBs * 255.0f + Le_LSBs;
    

    Thanks for your compliments Christer, I’m looking forward to what you and your team can do on PS3!

    Marco

  4. Just to round out the above blog post and its comments, I thought I’d mention that users MJP and remigius over at gamedev.net incorporated Marco’s packing code into my code snippet and also worked out the details of the matching LogLuvDecode() function. So, for completeness, and with credits to MJP and remigius, here’s the full encode/decode pair (in HLSL):

    // M matrix, for encoding
    const static float3x3 M = float3x3(
        0.2209, 0.3390, 0.4184,
        0.1138, 0.6780, 0.7319,
        0.0102, 0.1130, 0.2969);
    
    // Inverse M matrix, for decoding
    const static float3x3 InverseM = float3x3(
        6.0014, -2.7008, -1.7996,
       -1.3320,  3.1029, -5.7721,
        0.3008, -1.0882,  5.6268);
    
    float4 LogLuvEncode(in float3 vRGB) {
        float4 vResult;
        float3 Xp_Y_XYZp = mul(vRGB, M);                            // [X', Y, XYZ']
        Xp_Y_XYZp = max(Xp_Y_XYZp, float3(1e-6, 1e-6, 1e-6));       // keep log2() and the division well defined
        vResult.xy = Xp_Y_XYZp.xy / Xp_Y_XYZp.z;                    // Ue, Ve
        float Le = 2 * log2(Xp_Y_XYZp.y) + 127;
        vResult.w = frac(Le);                                       // low part (Marco's packing)
        vResult.z = (Le - (floor(vResult.w*255.0f))/255.0f)/255.0f; // high part
        return vResult;
    }
    
    float3 LogLuvDecode(in float4 vLogLuv) {
        float Le = vLogLuv.z * 255 + vLogLuv.w;                     // reassemble log luminance
        float3 Xp_Y_XYZp;
        Xp_Y_XYZp.y = exp2((Le - 127) / 2);                         // Y
        Xp_Y_XYZp.z = Xp_Y_XYZp.y / vLogLuv.y;                      // XYZ' = Y / Ve
        Xp_Y_XYZp.x = vLogLuv.x * Xp_Y_XYZp.z;                      // X' = Ue * XYZ'
        float3 vRGB = mul(Xp_Y_XYZp, InverseM);
        return max(vRGB, 0);
    }
    

    I hope people who visit this post (and according to the stats, it’s a fairly popular post) will find this info useful. Make sure to visit the gamedev.net thread too (as linked above). Rim van Wersch (remigius) also posted a simple test project that you might want to check out.

  5. I am not very familiar with display formats, so feel free to correct me. The CCIR became the ITU-R, so I presume that “CCIR 709” is the same as “ITU-R Recommendation BT.709” (a.k.a. “Rec. 709” or “BT.709”)? If this is the case, is the RGB-to-XYZ conversion used here (cf. the paper by Gregory Ward) different from the one used in Real-Time Rendering (3rd edition), pbrt-v2, pbrt-v3 and Mitsuba (which all use the same coefficients)?

    pbrt-v2’s RGB to XYZ conversion (“column-major” matrix):

    inline void RGBToXYZ(const float rgb[3], float xyz[3]) {
        xyz[0] = 0.412453f*rgb[0] + 0.357580f*rgb[1] + 0.180423f*rgb[2];
        xyz[1] = 0.212671f*rgb[0] + 0.715160f*rgb[1] + 0.072169f*rgb[2];
        xyz[2] = 0.019334f*rgb[0] + 0.119193f*rgb[1] + 0.950227f*rgb[2];
    }

    pbrt-v3’s RGB to XYZ conversion (“column-major” matrix):

    inline void RGBToXYZ(const Float rgb[3], Float xyz[3]) {
        xyz[0] = 0.412453f * rgb[0] + 0.357580f * rgb[1] + 0.180423f * rgb[2];
        xyz[1] = 0.212671f * rgb[0] + 0.715160f * rgb[1] + 0.072169f * rgb[2];
        xyz[2] = 0.019334f * rgb[0] + 0.119193f * rgb[1] + 0.950227f * rgb[2];
    }

    Mitsuba’s RGB to XYZ conversion (“column-major” matrix):

    void Spectrum::toXYZ(Float &x, Float &y, Float &z) const {
        /* Convert ITU-R Rec. BT.709 linear RGB to XYZ tristimulus values */
        x = s[0] * 0.412453f + s[1] * 0.357580f + s[2] * 0.180423f;
        y = s[0] * 0.212671f + s[1] * 0.715160f + s[2] * 0.072169f;
        z = s[0] * 0.019334f + s[1] * 0.119193f + s[2] * 0.950227f;
    }

    “Gregory Ward Larson: The LogLuv Encoding for Full Gamut, High Dynamic Range Images”
    Using the standard CCIR 709 RGB primaries for computer displays and a neutral white point
    RGB to XYZ conversion (“row-major” matrix):

    M = [0.497,0.339,0.164;
         0.256,0.678,0.066;
         0.023,0.113,0.864]

    [X,Y,Z] = [R,G,B] * M

    Thanks in advance!
