Converting RGB to LogLuv in a fragment shader

There are a few places on the net that I visit pretty regularly. One such place is the forums at Beyond3D, where several cool developers hang around and post good stuff. Some of my favourite Beyond3D contributors are DeanA (Dean Ashton, a colleague over at SCEE ATG, with a never-updated blog), DeanoC (Dean Calver) and nAo (Marco Salvi), the latter two being lead and graphics programmers, respectively, on Heavenly Sword over at Ninja Theory. All fellow PS3 programmers, in other words.

Of note in recent times is Dean C talking about the Atomic Cache facility of the SPEs, both on his blog and on the forums. Here, though, I’m going to talk about something both Dean C and Marco have talked a lot about in the past, namely how they use LogLuv encoding as part of their HDR solution. (Lots of funny speculation ensued on the forums, as they didn’t quite give enough info for people to connect the dots, to the extent that websites felt they needed to conduct interviews on the topic.)

When respected people talk about a piece of their tech that you are not currently using, you investigate; most (good) developers do. So, back then, I decided to look into (amongst other things) what it would take to encode RGB into LogLuv in a pixel shader. Here’s what I found.

The canonical reference for LogLuv is Greg Ward’s paper “The LogLuv Encoding for Full Gamut, High Dynamic Range Images” (see also the book High Dynamic Range Imaging, which Greg coauthored). His paper describes both 24-bit and 32-bit LogLuv encodings, but here we’re only interested in the 32-bit one, which uses 16 bits for luminance information and 16 bits for chrominance. Figure 1 of Ward’s paper gives the representation as:

bit     31: flag negative luminances (1 bit)
bit 30..16: log encoding of luminance (15 bits)
bit  15..8: u coordinate (8 bits)
bit   7..0: v coordinate (8 bits)
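The same layout expressed as a small C sketch (C rather than Cg so it can run on the CPU; logluv32_pack and the field accessors are hypothetical helper names, not taken from Ward’s paper):

```c
#include <assert.h>
#include <stdint.h>

/* Pack the four fields of a 32-bit LogLuv word, following the layout above.
 * sign: 0 or 1; le: 15-bit log-luminance code; ue, ve: 8-bit chroma codes. */
static inline uint32_t logluv32_pack(unsigned sign, unsigned le,
                                     unsigned ue, unsigned ve)
{
    assert(sign <= 1u && le <= 0x7FFFu && ue <= 0xFFu && ve <= 0xFFu);
    return ((uint32_t)sign << 31)    /* bit 31      */
         | ((uint32_t)le   << 16)    /* bits 30..16 */
         | ((uint32_t)ue   <<  8)    /* bits 15..8  */
         |  (uint32_t)ve;            /* bits 7..0   */
}

/* Extract the individual fields again. */
static inline unsigned logluv32_sign(uint32_t w) { return (unsigned)(w >> 31) & 0x1u; }
static inline unsigned logluv32_le(uint32_t w)   { return (unsigned)(w >> 16) & 0x7FFFu; }
static inline unsigned logluv32_ue(uint32_t w)   { return (unsigned)(w >> 8) & 0xFFu; }
static inline unsigned logluv32_ve(uint32_t w)   { return (unsigned)w & 0xFFu; }
```

For example, logluv32_pack(0, 16384, 86, 194) yields 0x400056C2; with Ward’s formula for Le, the code 16384 corresponds to Y = 1.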

The pertinent information on converting from [R,G,B] to [Le,Ue,Ve] (i.e. LogLuv) is spread over multiple sections of Ward’s paper, so to save you some work, I’ll summarize. The conversion is done as follows:

[X,Y,Z] = [R,G,B]*M
x = X/(X+Y+Z)
y = Y/(X+Y+Z)
u'=4*x/(-2*x+12*y+3)
v'=9*y/(-2*x+12*y+3)
Ue = floor(410*u')
Ve = floor(410*v')
Le = floor(256*(log2(Y)+64))

where M is the 3×3 matrix [0.497,0.339,0.164; 0.256,0.678,0.066; 0.023,0.113,0.864].

To explain the “magic” constants in Ward’s math, we note that we support Y in the range (5.4*10^-20, 1.8*10^19), i.e. roughly (2^-64, 2^64), because log2() of these values gives the range (-64.0, 64.0), which Ward’s Le calculation maps into the desired integer range [0, 2^15-1] (a 15-bit integer).

Ward states that the gamut of perceivable u and v values lies in the range [0, 0.62], and he therefore scales the u and v values by 410 to get an integer in [0, 255] (410*0.62 ≈ 254). For a fragment shader we need Le, Ue, and Ve to lie in the [0, 1] range, as the hardware will automatically turn floats in that range into a [0, 255] integer (clamped). However, we will in the end be splitting Le over two such integers, so we’ll turn Le into a float in the range [0, 256) by dividing Ward’s expression by 128. Making the appropriate changes turns the math into:

[X,Y,Z] = [R,G,B]*M
x = X/(X+Y+Z)
y = Y/(X+Y+Z)
u'= 4*x/(-2*x+12*y+3)
v'= 9*y/(-2*x+12*y+3)
Ue = (1/0.62)*u'
Ve = (1/0.62)*v'
Le = 2*(log2(Y)+64)

There are quite a few optimizations we can do at this point. In an attempt at being educational, I’ll apply them one by one. First, substitute the expressions for x and y in the expressions for u’ and v’ and simplify, to obtain this calculation:

[X,Y,Z] = [R,G,B]*M
u' = 4*X/(X+15*Y+3*Z)
v' = 9*Y/(X+15*Y+3*Z)
Ue = (1/0.62)*u'
Ve = (1/0.62)*v'
Le = 2*(log2(Y)+64)
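Spelled out, the substitution multiplies numerator and denominator by X+Y+Z and then collects terms:

```latex
\begin{aligned}
u' &= \frac{4x}{-2x + 12y + 3}
    = \frac{4X}{-2X + 12Y + 3(X + Y + Z)}
    = \frac{4X}{X + 15Y + 3Z}, \\
v' &= \frac{9y}{-2x + 12y + 3}
    = \frac{9Y}{-2X + 12Y + 3(X + Y + Z)}
    = \frac{9Y}{X + 15Y + 3Z},
\end{aligned}
\qquad \text{with } x = \frac{X}{X+Y+Z},\; y = \frac{Y}{X+Y+Z}.
```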

Next, we fold the computations for u’, v’, Ue, and Ve:

[X,Y,Z] = [R,G,B]*M
Ue = (4/0.62)*X/(X+15*Y+3*Z)
Ve = (9/0.62)*Y/(X+15*Y+3*Z)
Le = 2*(log2(Y)+64)

Here we note that it is possible to fold the dot product dot([1,15,3], [X,Y,Z]) into the vector-matrix multiplication so that it ends up in the Z component of the result (which I’ll call XYZ). The new math is then

[X,Y,XYZ] = [R,G,B]*M'
Ue = (4/0.62)*X/XYZ
Ve = (9/0.62)*Y/XYZ
Le = 2*(log2(Y)+64)

where M’ = M * [1,0,1; 0,1,15; 0,0,3]. We can now also fold the (4/0.62) and (9/0.62) constants into the matrix multiply:

[X',Y,XYZ'] = [R,G,B]*M'
Ue = X'/XYZ'
Ve = Y /XYZ'
Le = 2*(log2(Y)+64)

The new matrix is M’ = M * [1,0,1; 0,1,15; 0,0,3] * [4/9,0,0; 0,1,0; 0,0,0.62/9]. At this point there’s hardly any math left and no(?) optimizations left to apply, so now it’s time to code. However, in turning this into production code we face two potential sources of problems:

  1. Division by zero.
  2. log2() arguments less than or equal to zero.

To avoid visible glitches both issues must be handled, which we can do by strategically clamping values to a small epsilon so that they are strictly positive where it matters. When all that is done, we get the following code (Cg code, of course) as a result:

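// m = M * [1,0,1; 0,1,15; 0,0,3] * [4/9,0,0; 0,1,0; 0,0,0.62/9], rounded
const static float3x3 m = float3x3(
    0.2209, 0.3390, 0.4184,
    0.1138, 0.6780, 0.7319,
    0.0102, 0.1130, 0.2969);

inline float4 PS3_LogLuv_Encode(in float3 rgb) {
    float4 res; // float4(Ue, Ve, LeHigh, LeLow)
    float3 Xp_Y_XYZp = mul(rgb,m);                  // (X', Y, XYZ')
    // Clamp to a small epsilon: avoids division by zero and log2() of values <= 0
    Xp_Y_XYZp = max(Xp_Y_XYZp, float3(1e-6, 1e-6, 1e-6));
    res.xy = Xp_Y_XYZp.xy / Xp_Y_XYZp.z;            // Ue = X'/XYZ', Ve = Y/XYZ'
    float Le = 2 * log2(Xp_Y_XYZp.y) + 128;         // Le = 2*(log2(Y)+64), in [0,256)
    res.z = Le / 256;                               // high part of Le
    res.w = frac(Le);                               // low part: fractional bits of Le
    return res;
}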

Running this code through NVShaderPerf gives (from memory) 5 cycles for 9 instructions. When inserted at the end of a longer shader where there is plenty of room for instruction pairing, the total overhead for the LogLuv conversion will be less than this, perhaps around 3 cycles. I haven’t checked with Marco to see how this compares to what he’s doing, but it matches the cycle numbers he mentioned in various posts so it’ll be pretty close.

As Marco discusses on, e.g., Dean’s blog, you might want to adjust this representation a little to avoid carry problems during interpolation; I haven’t done that here but leave it as an exercise for the reader. Another exercise is the conversion from LogLuv back to RGB. Enjoy!


11 Comments

  1. Deano’s Home From Home » Almost there said,

    July 10, 2007 @ 6:31 am

    […] Ericson has made a cool post describing LogLuv HDR (which Marco did for HS and we christened NAO32) on PS3. So go read if you […]

  2. nAo said,

    July 13, 2007 @ 11:35 pm

    Well done Christer!
    Your implementation is actually faster than mine, as I did not fold the dot product into the matrix multiplication. I also had to split the log luminance in a more convoluted way, as with the same transformation you used I was not able to go back perfectly to an RGB colour without losing a tiny bit of intensity.
    BTW, do you know of any game that is making use of the same base technique?
    The only one I’m aware of is Heavenly Sword.

  3. christer said,

    July 14, 2007 @ 12:10 am

    Hi Marco, good to see you here! Back when I looked at this I only worked out how to optimize the RGB->LogLuv code as per above, not the other way around, so I never looked into the precision issues. I did, however, scribble in my notes that splitting the luminance value into two bytes could be an issue and that you might have to do something else. A possible option could be to write it this way:

    float4 res;
    float3 v = mul(rgb, m);
    v = max(v, float3(1e-6, 1e-6, 1e-6));
    float k = 2*log2(v.y) + 128;
    res.xy = unpack_4ubyte(k).xy;
    res.zw = v.xy / v.z;
    return res;
    

    Although this might not work so well either; I can’t remember if the pack/unpack instructions were hosed or not. (I’m not the one doing the shader coding.) I recall the unpack approach produced worse code too, but this was with a pretty old Cg compiler, so who knows.

    Feel free to steal the dot product folding trick! (If you haven’t already.)

    I don’t know of any other games using LogLuv at this point. I considered it for our engine based on your posts about your approach, but we haven’t committed to how to deal with HDR yet so the ball is still in the air. (As you know quite well, there are several possible options and which is best depends a lot on what other choices you’ve made - and we haven’t made all our choices yet.)

    BTW, I don’t think it’s a secret to mention publicly that I’ve seen builds of Heavenly Sword and the graphics are absolutely gorgeous. Kudos to you and the team!

  4. nAo said,

    July 14, 2007 @ 10:50 am

    Hi Christer,

    You’re right, there are many different options when it comes to rendering HDR images.
    Since we can’t really do alpha blending in this color space, if a game really needs to blend in an HDR color space then this technique only makes sense if used in conjunction with multisampling, writing a custom AA resolve filter that downsamples a LogLuv image to an FP16 image where we can do HDR blending.
    (Though I think that HDR blending is overrated: we can live without it by just blending on an RGBA8 render target, tone mapping in our alpha-blending pass pixel shaders. Even better if we do it using exposure computed in the previous frame and read back with the CPU, so that we can avoid a texture read in our pixel shaders and set exposure as a pixel shader constant.)
    It’s also worth noting that the vast majority of games probably don’t use the full FP16 range but rather a narrower range; in this case we can drop the logarithm and just store a linear luminance scaled to fit the luminance range we want to support. Even in this case I doubt we can tell the difference, can we? :)
    BTW, the code I used to encode luminance is really ugly, but I did not have much time to spend on it and it was the only code that did the job (perfect LogLuv -> RGB/FP64 conversion). When I (sneakily!) introduced it, the game already had a ton of content developed using FP16, and I did not want the artists/art director to scream in pain because our images were slightly darker (!!):

    // pack the luminance into 2 channels..
    float Le_LSBs = frac(Le);
    float Le_MSBs = (Le - (floor(Le_LSBs*255.0f))/255.0f)/255.0f;
    
    // unpack log luminance
    float Le = Le_MSBs + Le_LSBs / 255.0f;
    

    Thanks for your compliments Christer, I’m looking forward to what you and your team can do on PS3!

    Marco

  5. realtimecollisiondetection.net - the blog » I like spilled beans! said,

    September 2, 2007 @ 8:33 pm

    […] the RGB values in the fragment shader. I guess it would also be possible to encode textures using LogLuv, although that seems a bit […]

  6. Another day, another HDR rendering trick and some hope for the future. « Pixels, Too Many.. said,

    July 5, 2008 @ 9:43 am

    […] but even without re-introducing a floating point buffer (or some funky color space technique, see Christer Ericson’s blog entry about some of the work I did on Heavenly Sword and his very clever take on it) we can still […]

  7. christer said,

    July 12, 2008 @ 12:25 am

    Just to round out the above blog post and its comments, I thought I’d mention that users MJP and remigius over at gamedev.net incorporated Marco’s packing code into my code snippet and also worked out the details of the matching LogLuvDecode() function. So, for completeness, and with credits to MJP and remigius, here’s the full Encode/Decode pair (in HLSL):

    // M matrix, for encoding
    const static float3x3 M = float3x3(
        0.2209, 0.3390, 0.4184,
        0.1138, 0.6780, 0.7319,
        0.0102, 0.1130, 0.2969);
    
    // Inverse M matrix, for decoding
    const static float3x3 InverseM = float3x3(
        6.0014, -2.7008, -1.7996,
       -1.3320,  3.1029, -5.7721,
        0.3008, -1.0882,  5.6268);
    
    float4 LogLuvEncode(in float3 vRGB)  {
        float4 vResult;
        float3 Xp_Y_XYZp = mul(vRGB, M);
        Xp_Y_XYZp = max(Xp_Y_XYZp, float3(1e-6, 1e-6, 1e-6));
        vResult.xy = Xp_Y_XYZp.xy / Xp_Y_XYZp.z;
        float Le = 2 * log2(Xp_Y_XYZp.y) + 127;
        vResult.w = frac(Le);
        vResult.z = (Le - (floor(vResult.w*255.0f))/255.0f)/255.0f;
        return vResult;
    }
    
    float3 LogLuvDecode(in float4 vLogLuv) {
        float Le = vLogLuv.z * 255 + vLogLuv.w;
        float3 Xp_Y_XYZp;
        Xp_Y_XYZp.y = exp2((Le - 127) / 2);
        Xp_Y_XYZp.z = Xp_Y_XYZp.y / vLogLuv.y;
        Xp_Y_XYZp.x = vLogLuv.x * Xp_Y_XYZp.z;
        float3 vRGB = mul(Xp_Y_XYZp, InverseM);
        return max(vRGB, 0);
    }
    

    I hope people who visit this post (and according to the stats, it’s a fairly popular post) will find this info useful. Make sure to visit the gamedev.net thread too (as linked above). Rim van Wersch (remigius) also posted a simple test project that you might want to check out.

  8. XNA On The 360, Part 2: HDR said,

    August 14, 2008 @ 1:31 pm

    […] in many other PS3 games, as well. My actual shader implementation was helped along quite a bit by Christer Ericson's blog post, which described how to derive optimized shader code for encoding RGB into the LogLuv format.  […]

  9. realtimecollisiondetection.net - the blog » Catching up (part 2) said,

    June 8, 2009 @ 1:51 am

    […] Karis links to my LogLUV post while pointing out that there’s another kid in town: RGBM color encoding. In fact, we are […]

  10. Article: HDR Rendering with XNA « Sgt. Conker said,

    January 1, 2010 @ 10:59 am

    […] NOTE: credit for the optimized encoding function goes to Christer Ericson, who posted it on his blog. […]

  11. Gamma correct and HDR rendering in a 32 bits buffer | Light is beautiful said,

    May 26, 2013 @ 11:42 pm

    […] Since we really want a wide range of light intensity, a different approach is to use a different color space. Several people mentioned LogLUV, which I hear gives good results, at the expense of a high instruction cost for both packing and unpacking. Here is a detailed explanation. […]
