Springs on the GPU

May 12th, 2009


A few months ago, deciding I wanted to get to know my GPU a bit better, I took the plunge into the strange and wonderful world of GPGPU. My initial goal was to implement a GPU-based simulation of a spring mesh, trying to improve the performance of the 2D pressure-based soft body system I’ve been working on. Having suffered through this process once, I wanted to provide a description of the implementation and thought process that goes into general GPGPU programming.

The original (CPU-based) implementation I wrote was based on the paper “How To Implement a Pressure Soft Body Model” by Maciej Matyka (link). I had wanted to transfer this implementation to the GPU from the beginning, but never felt entirely comfortable with the GPGPU concepts. I had read short papers describing general GPU-based mass-spring systems (e.g. here), but the addition of pressure to the simulation required a different approach.

So, how do we implement a 2D pressure-based soft body simulation on the GPU? Time for a little setup.

To start, I’d recommend the FrameBufferObject (FBO) class and other easily digestible GPGPU goodies (like a simple CUDA example) on the “GPGPU Programming Resources” page on sourceforge. There’s even a simple GLSL-based edge detection shader hiding in there as well. Basically, what we need is a way to get our data (forces, spring lengths, etc.) into a form that can be used by the GPU. In addition, we want to keep the data on the GPU for as long as we possibly can, because getting data back from the GPU is expensive. This is where the FrameBufferObject class comes in. It’s a nice wrapper around the OpenGL EXT_framebuffer_object extension to help us keep our data on the GPU (as OpenGL textures).

From the paper, the sequence of calculations we need to do (per-frame) is as follows:

  1. From the current positions of each spring node, calculate the forces exerted upon each node.
  2. Using the Gauss theorem, calculate the area of the region based on the node positions.
  3. Update the forces with the pressure values.
  4. Calculate the updated positions.

What we need to do is figure out how to get our data into the GPU so that we can leverage its parallelism. The calculations mentioned in (1-4) above will be done through a combination of shaders. Conceptually, what we do is to put some set of data into a texture, draw it, then figure out what to do with the results. The act of drawing is what will get our shader to kick in and do its job. (Note: I would have preferred to do the implementation using Cg as opposed to GLSL, but apparently the current OS X implementation of Cg doesn’t support geometry shaders on my card (8600M GT)).

The first thing we need to do is to create a FrameBufferObject:

myFBO = [[FrameBufferObject alloc] init];

The nifty thing about these things is that they allow us to draw to places that aren’t the screen. For example, if we want to draw a bunch of things directly into a texture, we can do that with a FBO. When we draw things into a texture, the results reside on the GPU. This data can then be used by other shaders to perform additional calculations without having to read the data back from the GPU (via glReadPixels, for example).

Now that we’ve created our FBO, we need to “attach” things to it. In this case, we’ll attach a bunch of textures that we’ll want to use as render targets for each step of the simulation. We first bind the FBO and call “AttachTexture” for each texture we’d like to add. In addition to the texture, we must also specify the “attachment” to which each texture is bound. Since we’ll be using color textures for our calculations, we’ll be using the GL_COLOR_ATTACHMENT* types. First, we’ll need to create a bunch of textures. We’ll need five textures, one for the force calculation, one for the area calculation, and three for the node position calculations.


GLuint iSize = 512;
GLuint forceTex;
glGenTextures(1, &forceTex);
glBindTexture(GL_TEXTURE_2D, forceTex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F_ARB, iSize, iSize, 0, GL_RGBA, GL_FLOAT, 0);

The force texture doesn’t need to be filled with anything, as we’ll fill it at the start of each frame based on the current node positions. To seed the initial node position array, let’s fill the texture containing the node positions with random values, and leave the rest empty.


-(GLuint) create_empty_texture:(GLenum)format withWidth:(int)w withHeight:(int)h
{
    float *data = (float*)malloc(sizeof(float) * w * h * 4);
    float *ptr = data;
    for(int i=0; i<w*h; i++) {
        *ptr++ = 0.0;
        *ptr++ = 0.0;
        *ptr++ = 0.0;
        *ptr++ = 0.0;
    }
    GLuint texid;
    glGenTextures(1, &texid);
    GLenum target = GL_TEXTURE_2D;
    glBindTexture(target, texid);
    glTexParameteri(target, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(target, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(target, GL_TEXTURE_WRAP_S, GL_REPEAT);
    glTexParameteri(target, GL_TEXTURE_WRAP_T, GL_REPEAT);
    glTexParameteri(target, GL_TEXTURE_WRAP_R, GL_REPEAT);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
    glTexImage2D(target, 0, format, w, h, 0, GL_RGBA, GL_FLOAT, data);
    free(data);

    return texid;
}

-(GLuint) create_noise_texture:(GLenum)format withWidth:(int)w withHeight:(int)h
{
    float *data = (float*)malloc(sizeof(float) * w * h * 4);
    float *ptr = data;
    for(int i=0; i<w*h; i++) {
        *ptr++ = (rand() / (float) RAND_MAX) * 2.0-1.0;
        *ptr++ = (rand() / (float) RAND_MAX) * 2.0-1.0;
        *ptr++ = 0.0;
        *ptr++ = 0.0;
    }

    GLuint texid;
    glGenTextures(1, &texid);
    GLenum target = GL_TEXTURE_2D;
    glBindTexture(target, texid);
    glTexParameteri(target, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(target, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(target, GL_TEXTURE_WRAP_S, GL_REPEAT);
    glTexParameteri(target, GL_TEXTURE_WRAP_T, GL_REPEAT);
    glTexParameteri(target, GL_TEXTURE_WRAP_R, GL_REPEAT);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
    glTexImage2D(target, 0, format, w, h, 0, GL_RGBA, GL_FLOAT, data);
    free(data);

    return texid;
}

GLuint posTex[3];

posTex[0] = [self create_noise_texture:GL_RGBA32F_ARB withWidth:512 withHeight:512];
posTex[1] = [self create_empty_texture:GL_RGBA32F_ARB withWidth:512 withHeight:512];
posTex[2] = [self create_empty_texture:GL_RGBA32F_ARB withWidth:512 withHeight:512];

Now that the textures are created, we can attach them to the appropriate positions on our FBO:


GLint attachments[3];

attachments[0] = GL_COLOR_ATTACHMENT2_EXT;
attachments[1] = GL_COLOR_ATTACHMENT3_EXT;
attachments[2] = GL_COLOR_ATTACHMENT4_EXT;

[myFBO Bind];
[myFBO AttachTexture:GL_TEXTURE_2D _texId:forceTex _attachment:GL_COLOR_ATTACHMENT0_EXT _mipLevel:0 _zSlice:0];
[myFBO AttachTexture:GL_TEXTURE_2D _texId:areaTex _attachment:GL_COLOR_ATTACHMENT1_EXT _mipLevel:0 _zSlice:0];
[myFBO AttachTexture:GL_TEXTURE_2D _texId:posTex[0] _attachment:attachments[0] _mipLevel:0 _zSlice:0];
[myFBO AttachTexture:GL_TEXTURE_2D _texId:posTex[1] _attachment:attachments[1] _mipLevel:0 _zSlice:0];
[myFBO AttachTexture:GL_TEXTURE_2D _texId:posTex[2] _attachment:attachments[2] _mipLevel:0 _zSlice:0];
[myFBO IsValid];
[myFBO Disable];

The reason for maintaining a separate attachment array for attachments 2-4 (and an array for the position textures) will become clear soon.

Now that our FBO is initialized, we need to compile all of our shaders. For brevity, I’ll leave out most of the details, and show how to compile our first force calculation shader:


//Shader manager singleton. Get the instance.
ShaderManagerGLSL *sm = [ShaderManagerGLSL sharedManager];

GLhandleARB forceProgramObject;
GLhandleARB edgeUpdateShader;
GLint posSamplerLoc;
GLint defaultLenLoc;
GLint springConstLoc;

//Setup the force calculation shader.
forceProgramObject = [sm createProgram];
edgeUpdateShader = [sm createFragmentShader];
[sm loadFragmentShader:@"forceCalculation" withShader:edgeUpdateShader];
[sm attachShader:edgeUpdateShader toProgram:forceProgramObject];
[sm linkProgram:forceProgramObject];
posSamplerLoc = [sm getParamLocation:forceProgramObject withName:@"TexUnit"];
springConstLoc  = [sm getParamLocation:forceProgramObject withName:@"springConst"];
defaultLenLoc  = [sm getParamLocation:forceProgramObject withName:@"defaultLength"];

The rest of the shaders are similar, but before I continue, I should point out one additional thing we need to do for our geometry shaders before we can use them later:


GLint temp;

glProgramParameteriEXT((GLuint)topLineNormalProgram, GL_GEOMETRY_INPUT_TYPE_EXT , GL_LINES);
glProgramParameteriEXT((GLuint)topLineNormalProgram, GL_GEOMETRY_OUTPUT_TYPE_EXT , GL_POINTS);
glProgramParameteriEXT((GLuint)topLineNormalProgram, GL_GEOMETRY_VERTICES_OUT_EXT, 128);
//Query the hardware limit on emitted vertices (handy for checking that 128 is in range).
glGetIntegerv(GL_MAX_GEOMETRY_OUTPUT_VERTICES_EXT, &temp);

We need to specify what types of input, output, and number of vertices our geometry shader will use. A little irritating to have to do it this way, but that’s how it goes. Above, I’m using a little Cocoa-based GLSL shader manager class that I wrote. You can get it here if you’d like to use it. The remainder of the shaders can be compiled and linked in a similar manner. Of course, I haven’t yet shown you what the shaders do, but that’s coming next.

Before we can continue on to the render loop, we need to set up the projection:


-(void) reshape
{
    NSSize bound = [self bounds].size;
    glViewport(0, 0, (GLsizei)bound.width, (GLsizei)bound.height);

    // Create a 1:1 pixel to texel mapping.
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluOrtho2D(-1, 1, -1, 1);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();
}

We create an orthographic projection so that we can read pixel values from bound textures in our shader from known locations. Again, I hope this will become more clear when we start applying shaders.
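
To make the "known locations" idea concrete, here is a minimal C sketch of where the center of a given texel lands under the gluOrtho2D(-1, 1, -1, 1) projection. It assumes the viewport is exactly as many pixels wide as the texture, and the helper name texel_center_ndc is hypothetical (it is not part of the original code).

```c
/* Hypothetical helper: coordinate of the center of texel `i` in a row of `n`
 * texels, under an orthographic projection spanning [-1, 1].
 * Each texel is 2/n units wide, so its center sits half a texel (1/n) in
 * from its left edge at -1 + 2*i/n. */
double texel_center_ndc(int i, int n)
{
    return -1.0 + (2.0 * i + 1.0) / n;
}
```

For a 512-wide texture, texel 0 sits at -1 + 1/512, which is why an offset of 1.0/512.0 from the edge shows up in the drawing code later on.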

Now we’re ready to see how the main render loop works. The very first thing we need to do is to bind our FBO:


[myFBO Bind];

This allows us to draw directly into the textures we’ve attached to it. The first thing we want to do is to read the current positions of the nodes and calculate forces based on those node positions. Because we would like these force calculations to be used by another shader later on, we want to put these force calculations into a texture, which is why we’re using a FBO. Because we’ve attached the “force” texture to the first color attachment slot (GL_COLOR_ATTACHMENT0_EXT) of the FBO, here’s how we specify that whatever we render will go into that force texture:


glDrawBuffer(GL_COLOR_ATTACHMENT0_EXT);

//Wipe the texture clean before rendering.
glClearColor(0.0, 0.0, 0.0, 0.0);
glClear(GL_COLOR_BUFFER_BIT);

//The first element of the position texture array contains the current point set.
//Here, we tell the shader how to access the position values from the texture
// (e.g. set the active texture to GL_TEXTURE0, bind the texture, and tell the shader
//         that the sampler location is '0').
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, posTex[0]);

glUseProgramObjectARB(forceProgramObject);

//Set the uniforms.
glUniform1iARB(posSamplerLoc, 0);
glUniform1fARB(defaultLenLoc, 10.5);
glUniform1fARB(springConstLoc, 100.0);

//Draw a textured quad to kick in the shader.
glBegin(GL_QUADS);
{
    glTexCoord2f(0, 0); glVertex3f(-1, -1, -0.5f);
    glTexCoord2f(1, 0); glVertex3f( 1, -1, -0.5f);
    glTexCoord2f(1, 1); glVertex3f( 1,  1, -0.5f);
    glTexCoord2f(0, 1); glVertex3f(-1,  1, -0.5f);
}
glEnd();

glUseProgramObjectARB(0);

The force calculation shader can be found here. All this shader does is read values from the position texture to find the x,y positions of each node’s 8 neighbors. The farther the distance, the larger the resulting force. These forces are summed up and a new pixel value is emitted (i.e. drawn into the force texture we’ve bound to the FBO we’re using) containing the force values in the x and y directions in the red and green pixel positions. Now that we have the forces, we need to calculate the pressures. Before we can do that, we need to calculate the area.

In order to calculate the area, we need to take the position of each node along the boundary of our mesh, grab its neighbor, and perform the area calculation based on these two points. The total area is the sum of all of these values. The easiest way to do this is by using a technique described in chapter 41 (Using the Geometry Shader for Compact and Variable-Length GPU Feedback) of GPU Gems 3 that shows how to perform calculations on the geometry shader where the output length is not known in advance. While I don’t need exactly this much freedom, the general approach works well in this case. Because we need to perform calculations on each side of the mesh, what we’ll do is to render a few primitives (lines, in this case) along each edge (top, left, bottom, right). The shaders will read the relevant pixel values from the bound position texture, perform the calculation, and draw each result to the same location. So, we draw one line, and our geometry shader takes this line and emits a collection of points (all at the same location, with the area calculation encoded in the color value). By appropriately blending the results, we can calculate the area without having to bring things back to the CPU.
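
For reference, the same "sum over neighboring boundary pairs" can be written on the CPU with the shoelace formula, which is the discrete form of the Gauss (Green) theorem in 2D. This is a sketch for sanity-checking GPU results, not the geometry shader itself; the function name polygon_area is hypothetical, and the exact per-edge term used in the shader may be written differently.

```c
/* Signed area of a closed polygon whose boundary nodes are given in order.
 * Each (node, next node) pair contributes one term, and the terms are
 * summed, much like the additive blending described above.
 * The result is positive for counter-clockwise ordering. */
double polygon_area(const double *xs, const double *ys, int n)
{
    double twice_area = 0.0;
    for (int i = 0; i < n; i++) {
        int j = (i + 1) % n;  /* neighbor of node i, wrapping at the end */
        twice_area += xs[i] * ys[j] - xs[j] * ys[i];
    }
    return 0.5 * twice_area;
}
```

Feeding the boundary positions read back from the position texture through a function like this gives a number to compare against the blended GPU result.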

In the current example, we’re simulating a mesh of 512×512 elements, so each side is 512 pixels. We want to render a single line (2 points) along each edge, and allow the geometry shader to emit 512 new points from these pieces of geometry at the same location, encoding the area calculation in the pixel color value. First, some setup (remember that we created a texture called “areaTex” previously, and bound it to GL_COLOR_ATTACHMENT1_EXT):


//The area texture is bound to color attachment 1.
//We want to write into that now.
glDrawBuffer(GL_COLOR_ATTACHMENT1_EXT);

//When we sample neighboring locations, we use this offset.
GLfloat pixelVertexStep = 1.0/512.0;

//Calculate the area, but first, wipe it clean.
glClearColor(0.0, 0.0, 0.0, 0.0);
glClear(GL_COLOR_BUFFER_BIT);

glEnable(GL_BLEND);
glBlendFunc(GL_ONE, GL_ONE);

//Read from the current position texture.
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, posTex[0]);

glUseProgramObjectARB(bottomLineNormalProgram);
glUniform1iARB(bottomSamplerLoc, 0);

Then, we render our lines (only the bottom line case is shown here), using the geometry shader found here.

glBegin(GL_LINES);
{
    //Bottom Line.
    //Do it in 128 step chunks. Better for parallelization
    glVertex3f(-1.0 + pixelVertexStep,  -1 + pixelVertexStep, 0.00f);
    glVertex3f(-0.5,                    -1 + pixelVertexStep, 0.25f);

    glVertex3f(-0.5 + pixelVertexStep , -1 + pixelVertexStep, 0.25f);
    glVertex3f( 0.0,                    -1 + pixelVertexStep, 0.50f);

    glVertex3f( 0.0 + pixelVertexStep,  -1 + pixelVertexStep, 0.50f);
    glVertex3f( 0.5,                    -1 + pixelVertexStep, 0.75f);

    glVertex3f( 0.5 + pixelVertexStep,  -1 + pixelVertexStep, 0.75f);
    glVertex3f( 1.0,                    -1 + pixelVertexStep, 1.00f);
}
glEnd();

Instead of drawing a single line and emitting 512 points, I’ve split each side into four pieces (128 pixel steps). By doing things this way, we can better leverage the parallelization of the GPU so that each of these four segments can be rendered concurrently. (Note that in order to tell the geometry shader which segment it is currently drawing, I’ve provided an additional offset hidden in the z value of each vertex.)

Once we’re done calculating the area (the result was rendered to a single location in the target texture, with each side rendering one color value, e.g. bottom = red, top = blue), we can do a bit of debugging by doing a glReadPixels to make sure everything looks okay:


float areaPixel[4];

//Read from the texture we just wrote to.
glReadBuffer(GL_COLOR_ATTACHMENT1_EXT);

//Read the area pixel.
glReadPixels(0, 0, 1, 1, GL_RGBA, GL_FLOAT, areaPixel);
float area = areaPixel[0] + areaPixel[1] + areaPixel[2] + areaPixel[3];

I found these types of sanity checks to be particularly useful when debugging these sorts of things.

Now that we have the area, we need to update our existing force texture with the pressure calculation that we can now perform (only the bottom line shown) using this fragment shader.

//Update the existing force texture.
glDrawBuffer(GL_COLOR_ATTACHMENT0_EXT);

glEnable(GL_BLEND);
glBlendFunc(GL_ONE, GL_ONE);

glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, posTex[0]);

glUseProgramObjectARB(bottomNormalProgram);
glUniform1iARB(bottomNormalSamplerLoc, 0);
glUniform1fARB(bottomAreaLoc, area);

glBegin(GL_LINES);
{
    //Bottom Line.
    glTexCoord2f(0, 0); glVertex3f(-1 + pixelVertexStep, -1 + pixelVertexStep, -0.5f);
    glTexCoord2f(1, 0); glVertex3f( 1 - pixelVertexStep, -1 + pixelVertexStep, -0.5f);
}
glEnd();

Ok, so we’ve got an updated force texture (if you draw this texture, you should see something resembling the image at the beginning of this post). All we need to do now is generate the new node positions by integrating. Here’s the integration fragment shader. For the integration, we’ve used Verlet integration, which requires that we keep not only the previous position around, but the one before that as well. (A nice overview of possible integration strategies can be found here.)
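
As a reference for what the integration step computes, here is a minimal C sketch of a position Verlet update for a single node. The mass and dt parameters and the names Vec2 and verlet_step are assumptions for illustration; the linked shader may fold constants together differently.

```c
typedef struct { float x, y; } Vec2;

/* One position Verlet step:
 *     next = 2 * cur - prev + (force / mass) * dt^2
 * No explicit velocity is stored. It is implied by the difference between
 * the current and previous positions, which is why two earlier position
 * textures have to be kept around. */
Vec2 verlet_step(Vec2 cur, Vec2 prev, Vec2 force, float mass, float dt)
{
    Vec2 next;
    next.x = 2.0f * cur.x - prev.x + (force.x / mass) * dt * dt;
    next.y = 2.0f * cur.y - prev.y + (force.y / mass) * dt * dt;
    return next;
}
```

With zero force, a node simply keeps moving by the same offset it moved in the last step, which makes for an easy first test of the shader.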

The final step is to play a bit of FBO “ping pong” by swapping the attachment indices around so that the current position texture becomes the previous, and the oldest becomes the slot that the next step will render into.


GLint attachpt = attachments[0];
attachments[0] = attachments[1];
attachments[1] = attachments[2];
attachments[2] = attachpt;

GLuint curTexNum = posTex[0];
posTex[0] = posTex[1];
posTex[1] = posTex[2];
posTex[2] = curTexNum;
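
One way to reason about the rotation above is with plain index arithmetic. The C sketch below (the function name original_index_in_slot is hypothetical) reports which of the original three textures occupies a given slot after some number of frames.

```c
#define NUM_POS_TEXTURES 3

/* Index (into the original posTex array) of the texture that sits in `slot`
 * after `frames` rotations of the kind shown above. Each rotation moves
 * every entry one slot toward the front, and the old front entry wraps
 * around to the back. */
int original_index_in_slot(int slot, int frames)
{
    return (slot + frames) % NUM_POS_TEXTURES;
}
```

Because the attachments array is rotated in exactly the same way, each texture stays paired with the attachment point it was originally attached to, and the whole arrangement repeats every three frames.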

Sigh. When I initially set out to write up some of this stuff I had thought I would be able to make it clear enough that those individuals without any GPGPU experience could perhaps become more comfortable venturing off on their own. I’m uncertain whether or not I’ve succeeded in any way, but I hope that helped clarify at least something.

Delicious Visualization

May 6th, 2009

Continuing my experiments using Sunflow for web data-based visualizations, I recently finished one using data from my delicious bookmark collection. By popular demand, here’s a brief run-down of the process I used to make it.

  • Collect a set of delicious bookmark URLs with pydelicious.
  • Using PyObjC (should already be installed on OS X), load these URLs into WebViews (one at a time) and save the resulting view content as a jpeg. (This can take ~20s/URL, so you might be waiting a while).
  • From the resulting collection of images, generate a header file to be used by ODE that describes the number and sizes of these images.
  • Run an ODE gravity simulation using the previously specified sizes for the falling objects.
  • When the simulation is complete, dump the positions and rotations of the objects to an external file.
  • Using another Python script, read the image files and corresponding position and rotation information generated from ODE to procedurally generate a Sunflow scene file containing the object positions, shaders, and camera information.
  • Open the scene file in Sunflow, and render!

None of the above steps are prohibitively difficult in and of themselves; it’s really all about creating a streamlined workflow.

Twitter + Sunflow

April 11th, 2009

My time over the past few weekends has mostly been occupied with Sunflow, and getting to know it better. One thing I wanted to figure out was how to better incorporate certain types of web data into my renderings. Twitter, being a common visualization data source, seemed like a good place to start. That, and I’m particularly fond of the consistent size of the user profile images.

I initially thought that Twitter visualizations should be trivially easy. Surely there should be some nice Python bindings to help me get the data I need. I was only half right. I found a few Python modules for twitter, and not a single one of them worked as advertised.  The most common module, python-twitter, well, sucks. I do feel a little guilty saying that, as half of the reason that python-twitter sucks is that Twitter itself sucks (I’ll get to that later). My first experiments with the module went fine, allowing me to do my first twitter visualization, but once I needed to collect more than a handful of data, it didn’t work at all. Frustrated with that approach, I tried to get the Ruby version working, which I couldn’t even get to install. The only version that worked consistently was a php version that I found. After spending about 15 minutes with that, however, it dawned on me that I was using php, and I quickly abandoned that direction. Again faced with a broken Python version, I decided my only option was to write everything myself, stripping the necessary pieces directly out of python-twitter, and creating my own single purpose collection script. Running my script worked where python-twitter had failed, so I was certainly on the right track. Unfortunately, I was then confronted with a second problem — that Twitter sucks.

Most people that use Twitter regularly know that it can be flaky — I know from personal experience that it will decide to not post updates, change my background design randomly, and not update my profile image. I guess that can be traced back to the unanticipated amount of growth it experienced in the absence of any real architectural design. The best part of Twitter being flaky like this is that when you use their API to collect some information for the purpose of visualization, the servers will respond with bad requests whenever they feel like it. I know that there are limits on the number of requests users can make per minute, but this was clearly not the issue in my case. So, to collect a set of data that may take a few minutes to obtain, if the site gags on 1 out of every 10 requests, the data you’ll retrieve is useless. So, I had to run my collection script in phases to ensure each ran to completion without errors. I don’t know if this sounds at all annoying, but trust me, it is.

The above image is a visualization of the people followed by the people I follow in Twitter. I created a large grid with each cell containing a single person I follow, as well as the people they follow. I then use ODE to simulate stacking of these pieces within the grid arrangement. So, large piles represent people I follow with large numbers of friends.

In case the above image is too small, I’ve put up a much larger version here (~3MB). Maybe you can find yourself in there somewhere.

Process

April 1st, 2009

I’ve finally gotten around to learning how to use Sunflow, the Java-based global illumination rendering system. I have to admit that much of my motivation came from seeing the great work Toxi has been doing with it for a while now. It’s actually a lot less intimidating than I had originally anticipated, although it can be tricky to find the necessary documentation when you’re stuck with something.

I’ve spent my first few days with Sunflow almost entirely outside the context of any formal rendering engine like Maya, focusing on how to procedurally generate the data that Sunflow uses for rendering. Generating scenes like the one above has turned into a multi-step process spanning multiple applications, libraries, and programming languages. The complexity of this process has reminded me of my old professor Gerald Jay Sussman discussing recent changes to the introductory MIT computer science course to focus less on the minutiae of building a Scheme interpreter in Scheme (aka the metacircular evaluator), and more on dealing with the “Conquest of Abundance” (Paul Feyerabend’s phrase, not mine) that is present in modern day programming. The challenges we face as programmers are increasingly becoming ones of integration, and how to work across library and application boundaries.

The above image, for example, starts with Maya. I’ve written a Python script in Maya to generate and extrude a mesh from a specified character and font. From there, a second Python script is run to extract the mesh polygons, indices, and vertices and generate a C header file containing this data. This generated header file is then used by a program I wrote using the Open Dynamics Engine (ODE) for simulating rigid body dynamics. Primitive shapes are emitted in an ODE simulation with gravity to fill the extruded font mesh, and when that process is complete, the positions of the points are written to a data file. I then take this data file and chop it up with another Python program to randomize the point sizes and shader color values. The final Sunflow scene file is then rendered in about 15 minutes. At the end of the day, I find myself wondering if the cost is worth the payoff. Surely this could all be done by a dedicated Maya programmer, so why beat myself over the head with all of this complexity?

One of the dilemmas of the procedurally oriented graphics programmer, I suppose.

Cocoa + CgFX

March 15th, 2009

I’ve spent an increasing amount of time using Cg for my shaders lately, and I only just now stumbled onto CgFX. CgFX is for so-called “effect scripts”, and is one of a set of related effect techniques like the Microsoft FX and COLLADA FX  file formats. Excited by the collection of examples on the CgFX site, I wanted to experience them first hand.

I’m not really sure why there appears to be a lack of significant documentation on these types of scripts, or why they don’t appear to be widely adopted. The NVIDIA CgFX site, for example, contains examples targeted at sixth generation (2004 era) GeForce hardware. The Cg toolkit download contains a handful of working CgFX examples running under GLUT, but I was really interested in running the examples on the CgFX page. Unfortunately, the examples on that page appear to be both DirectX-centric (using .DDS files), and contain no OpenGL binding code. CgFX appears to be typically used for integration into environments like XSI as opposed to OpenGL.

I spent a few days poking around some of these scripts, trying to figure out a set of rules one can use to get them running under OpenGL on OS X. Basically, it was an exercise in trying to figure out functionality from variable names, amidst poorly documented and messy code. I thought I would document a bit of the process here, in case anyone finds themselves in need of getting a CgFX script running under Cocoa + OpenGL.

The example I’ll discuss is the Perlin noise based “vbomb” example. I’ve copied the CgFX script and placed it here for reference. Basically, that’s all we get from the CgFX site (plus the referenced includes). The question is, how can we get this thing (whatever it does) to work with OpenGL? In the absence of sufficient documentation, we can only discern so much from the picture and script contents.

CgFX scripts have the ability to specify multiple shading techniques (providing fallback mechanisms on systems that don’t support certain versions) and rendering passes. This example is simple in that there’s only one technique and one pass. All the script specifies is a single vertex and pixel shader. Two questions remain. The first is how we get the shader itself to run (i.e. what do we need to draw in OpenGL in order for that shader to do its job?). Shaders run when we render things. They take inputs (“uniforms”) that can be specified from OpenGL, and produce some final output based on what we rendered from OpenGL and the specified uniform arguments.

First things first. Shaders are run when we render things. Specific things. Sometimes a shader expects us to simply draw a textured quad, other times, a collection of points. At the surface, I don’t know what this shader expects us to draw for it to work. As this is a Perlin noise based example, I suspect it operates on vertices only, but it’s hard to tell.

My first attempt to get this thing working was to render a set of points. First, I generated a collection of points on a sphere, and rendered them as follows:

glBegin(GL_POINTS);
    for(int i=0; i <NUMPTS; i++) {
        cvector3f vect = points[i];
        glVertex3f(vect.x, vect.y, vect.z);
    }
glEnd();

Nothing.

Assuming that rendering a set of points is what the shader expects, the next thing to do is look at the shader uniforms to see what values are expected to be set. Looking at the script, the “TWEAKABLES” comment header looks like it may be what we’re looking for. Let’s look at the first entry:

float gDisplacement <
    string UIWidget = "slider";
    float UIMin = 0.0;
    float UIMax = 2.0;
    float UIStep = 0.01;
> = 1.6f;

It looks like it’s got a default value of 1.6f, however, as do the others, so it looks like it’s not expecting any uniforms to be set, necessarily. Hmm. (Note: the values inside the angle brackets above are called Cg annotations, and I believe they are for utilities like XSI to create fancy GUI components to manipulate the values. They aren’t of any use to us, however.)

As it turns out, where I should have been looking is in the section (inappropriately titled) “UNTWEAKABLES”, containing a world view projection matrix and a timer. Looking at the vertex shader itself (mainVS), we can see these values are used, but they contain no default values. A bit of trial and error might help. Sketching up some timer modification code in my draw loop:

static float timer = 0.0;
timer += 0.003;
cgSetParameter1f(myTimer, timer);
cgGLSetStateMatrixParameter(mygWvpXf,
                            CG_GL_MODELVIEW_PROJECTION_MATRIX,
                            CG_GL_MATRIX_IDENTITY);

Still nothing.

This is the point where it’s good to give up tweaking the OpenGL side of things and try and get the shader to spit out something. The first step is to change the last line of the vertex shader to the following:

OUT.HPosition = IN.Position;

So, just emit the vertex that was drawn in OpenGL.

This does the trick, and we start to see vertices. Now, we just step through each line of the shader and try to emit a vertex position or color based on that step. That way, we can see what’s working and what’s not as the shader performs its calculations.

As it turns out, the offending line was the next to last line:

OUT.HPosition = mul(WvpXf,NewPos);

This line multiplies the resulting point by the specified world view projection matrix. For whatever reason, this ends up making the vertex position zero. I’m really not sure why this is the case, as there really isn’t any ambiguity with respect to how I’ve set up the corresponding uniform. Replacing this line with the following does the trick:

OUT.HPosition = NewPos;

After seeing that this setup appears to work, I replaced my simple point rendering approach with a nice GLU sphere, producing the image above and the following video:


Perlin noise CgFX from Kyle Buza on Vimeo.

Cg is really starting to grow on me, particularly with respect to OpenGL integration. I feel like I spend only a fraction of the time setting up a Cg shader that I do with GLSL. I’m still unclear why it’s not widely used, but suspect that realization may come shortly.

Blobs++

March 2nd, 2009


I recently made some changes to the blobby image simulation I wrote about last time, integrating CVOCV for kicks. I’ve gotten some requests for a version that people can play with, so I’m bundling it up here (OS X 10.5 only). For those that haven’t seen the video, it’s over here on vimeo.

While there may seem to be a number of additional things going on in this demo that weren’t in the previous (image-based) example, the two approaches are in reality very much the same. The difference, of course, comes by leveraging CVOCV to get an OpenGL texture from CoreVideo, as well as having the OpenCV component of CVOCV give up some additional data about pixel movement around the edges of the video frames. The magic comes from the opticalFlowPyrLK method from CVOCV that calculates the optical flow of pixels in the video frames. I’m still new to OpenCV proper, but having implemented the Lucas & Kanade technique (used by opticalFlowPyrLK) by hand back in the day, I’m quite impressed at how easily accessible these things have now become. Anyway, basically what’s going on in this application is that the pixels along the borders of the video frame are monitored by opticalFlowPyrLK to determine how much they change per frame. Optical flow gives us a 2D vector representing how much (and in what direction) the pixels are changing between subsequent video frames. These points are shown below:

After obtaining these movement vectors, we just use them to increment the spring forces on the edges of the underlying mesh. The only nonstandard modification I had to make to opticalFlowPyrLK was to add a bit of low-pass filtering, as many vision-based tasks like optical flow can be very twitchy. The easiest way to add this is to implement the pseudocode from the low-pass filter Wikipedia entry. That’s what I did, at least.
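For the curious, that kind of smoothing can be sketched in a few lines of Python. This is only an illustration of the recurrence from the Wikipedia pseudocode, y[i] = y[i-1] + alpha * (x[i] - y[i-1]), applied to each component of a 2D flow vector. The function name, the alpha value, and the sample vectors are all made up for the example, and none of it is taken from CVOCV itself.

```python
def low_pass(samples, alpha):
    """Smooth a sequence of 2D flow vectors with a simple exponential filter.

    Implements y[i] = y[i-1] + alpha * (x[i] - y[i-1]) per component,
    with the filter state starting at zero (so y[0] = alpha * x[0]).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    prev = (0.0, 0.0)  # filter state starts at rest
    for x, y in samples:
        prev = (prev[0] + alpha * (x - prev[0]),
                prev[1] + alpha * (y - prev[1]))
        smoothed.append(prev)
    return smoothed


# Hypothetical per-frame flow vectors for one border pixel,
# including a single twitchy spike in the third frame.
raw_flow = [(1.0, 0.0), (1.0, 0.0), (9.0, -4.0), (1.0, 0.0), (1.0, 0.0)]
filtered = low_pass(raw_flow, alpha=0.25)
```

A smaller alpha smooths harder but makes the mesh slower to react to real movement, so the value is something to tune by eye.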

I’m really pleased with how easy CVOCV has made it to add video processing functionality to my OS X applications. I recently needed to throw it into an Ogre application as well, and it worked like a charm.

Three cheers for CVOCV.

Blob Programming 1

February 9th, 2009


Even though I had an unhealthy childhood obsession with green slime, Silly Putty, and the original Stretch Armstrong, I’ve never tried to implement any soft body simulations in OpenGL. I’ve recently addressed this deficiency by implementing a 2D soft body simulation to produce stretchy images. Currently, it feels more like a rubber sheet than a balloon containing a highly viscous liquid, but I suspect I can achieve the latter through some additional parameter tweaking.

The implementation is based on a technique that incorporates the notion of pressure into the simulation as originally described by Maciej Matyka. I modified the approach he has provided here to create an entire mesh of springs (as opposed to just springs at points along the edges), and use Leapfrog Verlet integration instead of Euler. The following effect is produced after applying an image texture over this set of points. It definitely needs tweaking, but I’m pleased with it as a first pass.
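To give a feel for why the integrator matters, here is a rough Python sketch comparing explicit Euler with a kick-drift-kick leapfrog step on a single 1D spring. It is not the simulation code: the spring constant, mass, time step, and all function names are assumptions chosen for the example, and the leapfrog variant shown may differ from the one in the actual mesh.

```python
def spring_accel(x, k=10.0, mass=1.0):
    """Hooke's law acceleration for a 1D spring with rest position 0."""
    return -k * x / mass


def euler_step(x, v, dt):
    """Explicit Euler: position and velocity both use the old state."""
    return x + dt * v, v + dt * spring_accel(x)


def leapfrog_step(x, v, dt):
    """Kick-drift-kick leapfrog (equivalent to velocity Verlet)."""
    v_half = v + 0.5 * dt * spring_accel(x)          # half kick
    x_new = x + dt * v_half                          # drift
    v_new = v_half + 0.5 * dt * spring_accel(x_new)  # half kick
    return x_new, v_new


def energy(x, v, k=10.0, mass=1.0):
    """Total energy: kinetic plus spring potential."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def simulate(step, steps=1000, dt=0.01):
    """Run one integrator from x=1, v=0 and return the final energy."""
    x, v = 1.0, 0.0
    for _ in range(steps):
        x, v = step(x, v, dt)
    return energy(x, v)


e_start = energy(1.0, 0.0)
e_euler = simulate(euler_step)
e_leapfrog = simulate(leapfrog_step)
```

With these made-up parameters, the Euler run steadily gains energy (the spring swings wider and wider), while the leapfrog run stays close to the starting energy. That drift is the sort of thing that makes a spring mesh blow up.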

CVOCV

December 26th, 2008


Following the interest I’ve been receiving in the simple CoreVideo + OpenCV example project I wrote a few months back, I decided to put a bit more time into it to make it more approachable and generally useful. Now, it’s becoming a flexible project that can be used for more structured OpenCV experimentation on OS X using CoreVideo. For now, I’m going to call it ‘CVOCV’, or ‘Core Video OpenCV’.

Before this version, the project only displayed the CoreVideo camera input, and the OpenCV results were printed to the terminal in text form. Now, both the unprocessed and processed frames are displayed, the latter being rendered by OpenGL.

I’m really pleased with the way this project is turning out. It performs extremely well, and contains some really nice demonstrations of sometimes tricky OpenGL and OpenCV tasks, from image conversion to the use of OpenGL pixel buffer objects to display frames.

We’re going to keep a simple wiki-style project page for CVOCV over on the buzamoto wiki for now. Enjoy.

E15:oGFx gets some polish

December 10th, 2008

Luis and I spent some time over the course of the past few weeks putting together a short book on E15:oGFx. It gave us a chance to reflect on its architecture and overall concept.

Good times.

E15 on the iPhone

November 2nd, 2008


E15 on the iPhone from Kyle Buza on Vimeo.
 

Well, not exactly. I simply ported the “WWW data access” technique that E15 uses to get web content.

E15 was always intended to be an application for the re-contextualization of web content: all of those pieces of text, data, and images that we see when we use the browser. A lot of work went into figuring out the best way to go about grabbing all of this data and placing it within an environment that is potentially more sloppy and playful than the browser. Over the course of the past few months, I’ve found myself faced with the same task on two separate occasions, for applications other than E15. A major revelation came over the summer when I was working with Jamie Zigelbaum in the Tangible Media Group. During that time, we were trying to figure out a way to obtain all of this WWW data and place it into a different software architecture that wasn’t E15. It was then that I proposed an alternative to the existing E15 approach, one that relied much more heavily on custom web services, deployed to provide extremely elegant and simple interfaces to the content. This approach worked so well that I replaced the existing E15 mechanism with it. What once took 30-40 lines of Python can now be done in 3. What I’m most fond of is the resulting elegance and readability.

I’ve recently started another project that needs to grab WWW data in this way, with one piece running on the iPhone. I stripped out the implementation from E15 and stuffed it into a simple iPhone application. I’d like to say that it worked straight away, but the iPhone appears to have difficulties (only randomly, of course) with my use of NSMachPort to initiate the asynchronous download and web service access. The application is simple: it makes a request to my custom web service to find out where to find some Flickr images, then downloads them. While waiting for the iPhone to get the image data, I render small circles that show pending requests.