
Imagine you are drawing a picture of a baseball game. You start with a batter hitting a ball, and the swing is so dynamic that he has clearly just hit a home run. Great job! This is the most amazing drawing that I’ve ever seen befo...
   
Wait a second. The background is missing. I can't see where he is. This home run should be in a huge baseball stadium with thousands of people. Draw them now. Of course, you should draw them with the same drawing style as the rest.
Are you reluctant to draw the audience individually? Okay, then let’s learn the "instanced rendering technique." To get ready, please forget about the baseball game. Today's focus is on a GPU technique for rendering on the screen.
Instanced rendering is a computer graphics technique for efficiently drawing a large number of objects. We can draw hundreds or even thousands of objects by reusing a single instance. This article goes over the basics of instanced rendering in OpenGL and shares some example use cases that I developed at Cepton. I provide sample programs at the end of each section.
Let's consider the simplest possible shape that a GPU can process: a triangle. A triangle consists of three vertices, and each vertex is represented by a 3D vector of float values. To draw a triangle, we pass the float array to the GPU. However, since the GPU is fully detached from the CPU, we need to specify every detail of even this simple data transfer. The OpenGL APIs help us write those specifications. The following is the essential summary of how we use them to send the data for three vertices.
// Vertex attribute location that the shader reads the vertices from
#define VERTEX_LOCATION 0
// Handle of the vertex buffer object
GLuint VBO;
// Generate a buffer on GPU to store data
glGenBuffers(1, &VBO);
// Bind the buffer as the current vertex attribute buffer
glBindBuffer(GL_ARRAY_BUFFER, VBO);
// Vertices data
float vertices[9] = {-0.2f, -0.3f, 0.0f,
                      0.2f, -0.3f, 0.0f,
                      0.0f,  0.3f, 0.0f };
// Specify the pointer to the data array and its size 
glBufferData(GL_ARRAY_BUFFER,
             9*sizeof(float),
             vertices,
             GL_STATIC_DRAW);
// Tell GPU how to read the data
// I.e., the array is supposed to be interpreted as a series of 3D vectors
glVertexAttribPointer(VERTEX_LOCATION,
                      3,                            // Components per vertex
                      GL_FLOAT,                     // Value type in the buffer
                      GL_FALSE,                     // Do not normalize the values
                      (GLsizei)(3 * sizeof(float)), // Byte stride between vertices
                      NULL);                        // Where to start reading
// Enable the data location in GPU before drawing
glEnableVertexAttribArray(VERTEX_LOCATION);
Now the vertex data is ready. Let's tell the GPU to draw it on the screen, or more formally, let's execute a "draw call." Since we want to draw a triangle, we use the GL_TRIANGLES mode so that the GPU treats each group of three vectors as the vertices of a triangle and fills its interior.
glDrawArrays(GL_TRIANGLES, 0, 3);
Here is what GPU draws.
   
Now we know how to draw a triangle on the screen. Given that any shape can be represented by a bunch of triangles (i.e., a "mesh" representation), we can practically draw anything. However, it's worth noting that the number of draw calls per frame has to stay limited in order to keep a reasonable refresh rate.
As a quick experiment, let's see what happens if we replace the single draw call with 1,000,000 draw calls.
// glDrawArrays(GL_TRIANGLES, 0, 3);
for (int i=0; i< 1000000; i++) glDrawArrays(GL_TRIANGLES, 0, 3);
You will notice a significant delay before the triangle shows up, and the window becomes hard to manipulate (below).
   
OpenGL has various tricks to reduce the number of draw calls, one of which is instanced rendering. If we want to draw 1000 objects, the instanced rendering technique packs all the instructions into a single draw call.
Again, let's use a triangle as an example. As we have already learned how to send the shape information of a triangle to the GPU, let's go over how to specify the position information of 1000 triangle objects.
In computer graphics, position information is typically represented by a 4x4 transformation matrix (in so-called "homogeneous coordinates"). The following code fills one matrix per triangle and sends them all to the GPU.
// First vertex attribute location of the transformation matrices
#define MATRICES_LOCATION 1
// Number of instances
#define NUM_INSTANCES 1000
// Handle of the matrix buffer object
GLuint MBO;
// Generate a buffer to store transformation matrices
glGenBuffers(1, &MBO);
// Bind the buffer as the current vertex attribute buffer
glBindBuffer(GL_ARRAY_BUFFER, MBO);
// Matrices data
float mat[NUM_INSTANCES * 16];
// Positions of triangles 
for (int matrix_id=0; matrix_id < NUM_INSTANCES; matrix_id++) {
    // Spiral arrangement of triangles
    float pos_x = 0.002f *  matrix_id * cos(40*M_PI*matrix_id / NUM_INSTANCES);
    float pos_y = 0.002f *  matrix_id * sin(40*M_PI*matrix_id / NUM_INSTANCES);
    float scale = 0.0004f * matrix_id;
    int i = 16 * matrix_id;
    mat[i+0]  = scale; mat[i+4] = 0.0f;  mat[i+8]  = 0.0f;  mat[i+12] = pos_x; 
    mat[i+1]  = 0.0f;  mat[i+5] = scale; mat[i+9]  = 0.0f;  mat[i+13] = pos_y; 
    mat[i+2]  = 0.0f;  mat[i+6] = 0.0f;  mat[i+10] = scale; mat[i+14] = 0.0f; 
    mat[i+3] =  0.0f;  mat[i+7] = 0.0f;  mat[i+11] = 0.0f;  mat[i+15] = 1.0f; 
}
// Specify the pointer to the array and its size in bytes
glBufferData(GL_ARRAY_BUFFER, NUM_INSTANCES * 16 * sizeof(float),
             mat, GL_DYNAMIC_DRAW);
// Tell GPU how to read the data
// The array is supposed to be interpreted as a series of 4x4 matrices.
// One attribute location holds at most a vec4, so each matrix column
// gets its own location (MATRICES_LOCATION ... MATRICES_LOCATION + 3).
for (unsigned int i = 0; i < 4; i++) {
  glEnableVertexAttribArray(MATRICES_LOCATION + i);
  glVertexAttribPointer(MATRICES_LOCATION + i, 4, GL_FLOAT, GL_FALSE,
                        (GLsizei)(16 * sizeof(float)),
                        (const GLvoid *)(sizeof(GLfloat) * i * 4));
  // Advance this attribute once per instance instead of once per vertex
  glVertexAttribDivisor(MATRICES_LOCATION + i, 1);
}
Now the GPU has the 1000 positions as well as the triangle shape. I always wish that the GPU could automatically place the triangles using those matrices, but it needs another layer of instruction: we have to specify how to use the matrices in a "shader program." Although shader programming is the very essence of GPU rendering, this article won’t dig into it for simplicity. Let's instead regard it as extra code that tells the GPU how to use the transformation matrices. The following C-like code is the shader program.
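// Assumed GLSL version; explicit attribute locations need 3.30 or later
#version 330 core
layout (location = 0) in vec3 Vertex;
// A mat4 input occupies four consecutive locations (1 through 4)
layout (location = 1) in mat4 Matrix;
void main()
{
  gl_Position = Matrix*vec4(Vertex.x, Vertex.y, Vertex.z, 1.0);
}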
This shader program specifies how to process each transformation matrix and each vertex: it takes their matrix product. Notice that Vertex is placed at location 0 (= VERTEX_LOCATION) and Matrix at location 1 (= MATRICES_LOCATION). This shows that the GPU receives the data at the locations that we specified.
Now that the GPU finally knows everything, let's issue the draw call. For instanced rendering, we use the following API to draw multiple instances. It still counts as a single draw call and is therefore efficient.
glDrawArraysInstanced(GL_TRIANGLES, 0, 3, (GLsizei)NUM_INSTANCES);
Here is the result of instanced rendering.
   
You can confirm that the triangles appear without any noticeable delay.
Here, I share some application examples at Cepton. In recent years, Cepton's development has been dedicated to object detection using data captured by Cepton's Lidar sensor. During my internship, I introduced the instanced rendering technique to visualize the results of object detection. Since the technique extends easily to more complicated shapes, we used it to visualize objects commonly found while driving, such as cars and trees.

The results were quite successful. Although the models contain a lot of triangles (car: ~1000 triangles, tree: ~30000 triangles), they can be visualized in real time with the instanced rendering technique (see the demo below).


Thank you for reading this far; I hope this article helps your development as well.
Finally, I would like to thank the Cepton Perception team members for the great internship experience. I appreciated how they let me try various approaches while always being willing to help. Without their help, I wouldn't have even hit upon instanced rendering.