Vertex Rendering
- This page is about the drawing functions for vertices. If you're looking for info on how to define where this vertex data comes from, that is on Vertex Specification.
Vertex Rendering is the process of taking vertex data specified in arrays and rendering one or more Primitives with this vertex data.
Prerequisites
In order to successfully issue a drawing command, the currently bound Vertex Array Object must have been properly set up with vertex attribute arrays, as described in the Vertex Specification article. If indexed rendering is to be used, the GL_ELEMENT_ARRAY_BUFFER binding in the VAO must have a Buffer Object bound to it as well.
Causes of rendering failure
The GL_INVALID_OPERATION error can happen when issuing any drawing command for many reasons, most of which have little to do with the actual drawing command itself. The following represent conditions you must ensure are valid when issuing a drawing command.
- A non-zero Vertex Array Object must be bound (though no arrays have to be enabled, so it can be a freshly-created vertex array object).
- The current framebuffer must be complete. The Default Framebuffer (if present) is always complete, so this usually happens with Framebuffer Objects.
- The current program object or program pipeline object must be successfully linked and valid for the current state. This includes:
  - If there is an active Tessellation Evaluation Shader or Geometry Shader, then there must also be an active Vertex Shader present.
  - Any active programs cannot have two or more sampler variables of different types which are associated with the same texture image unit.
  - Any active programs cannot have two or more image variables of different types which are associated with the same image unit.
  - Program pipeline objects have additional rules. In the case of a unified program, these would all be linker errors, but program pipelines have to check at render-time:
    - Stages from one program object cannot come between two shader stages that are defined by a different program object. So if you have program A which has a vertex and fragment shader, you cannot put program B into the pipeline with a geometry shader between them.
    - The number of active samplers, images, uniform buffers, and shader storage buffers (where applicable) across all active programs must not exceed the combined implementation-defined limits. This is statically checked by the linker for non-separate programs, but must be render-time checked with program pipelines.
    - If a Geometry Shader and a Tessellation Evaluation Shader are present and not linked into the same program, the GS's input primitive type must match the primitive generated by the TES's patch output type.
  - If a Tessellation Control Shader is active, a Tessellation Evaluation Shader must also be active.
- Textures used by the current programs' samplers and/or images must be complete.
- If a Geometry Shader is present, the primitive type fed to the GS must be compatible with the primitive input for the GS. If no TES is active, then the mode primitive type from the drawing command must match.
- If Transform Feedback is active, the transform feedback mode must match the applicable primitive mode. That mode is determined as follows:
  - If a Geometry Shader is active, then the applicable primitive mode is the GS's output primitive type.
  - If no GS is active but a Tessellation Evaluation Shader is active, then the applicable primitive mode is the TES's output primitive type.
  - Otherwise, the applicable primitive type is the primitive mode provided to the drawing command.
- mode may be GL_PATCHES if and only if a Tessellation Evaluation Shader is active: patches require an active TES, and an active TES requires GL_PATCHES.
- Buffer objects being read from or written by an OpenGL rendering call must not be mapped in a non-persistent fashion. This includes, but is not limited to:
  - A buffer bound for attribute or index data.
  - A buffer bound for Transform Feedback when that is active.
  - A buffer bound for uniforms, shader storage, or Atomic Counters when any shader reads from (or writes to) that buffer.
  - A buffer bound as a texture or Image Load Store image.
This list is not comprehensive.
Common
Rendering can take place as non-indexed rendering or indexed rendering. Indexed rendering uses an element buffer to decide which index in the vertex arrays values are pulled from. This is explained in more detail in the Vertex Specification article.
All non-indexed drawing commands are of the form gl*Draw*Arrays*, where each * can be filled in with different words. All indexed drawing commands are of the form gl*Draw*Elements*.
Primitive Restart
Primitive restart functionality allows you to tell OpenGL that a particular index value means, not to source a vertex at that index, but to begin a new Primitive of the same type with the next vertex. In essence, it is an alternative to glMultiDrawElements (see below). This allows you to have an element buffer that contains multiple triangle strips or fans (or similar primitives where the start/end of a primitive has special behavior).
The way it works is with the function glPrimitiveRestartIndex. This function takes an index value. If this index is found in the index array, the system will not source a vertex; instead, it will start the primitive processing again as though a second drawing command had been issued. If you use a BaseVertex drawing function, this test is done before the base vertex is added to the index. Using this feature also requires using glEnable(GL_PRIMITIVE_RESTART) to activate it, and the corresponding glDisable to turn it off.
Here is an example. Let's say you have an index array as follows:
{ 0 1 2 3 65535 2 3 4 5 }
If you render this as a triangle strip normally, you get 7 triangles. If you render it with glPrimitiveRestartIndex(65535) and GL_PRIMITIVE_RESTART enabled, then you will get 4 triangles:
{0 1 2}, {1 2 3}, {2 3 4}, {3 4 5}
Primitive restart works with any indexed rendering function, including the indirect ones.
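The triangle counts in the example above can be checked with a small CPU-side model. This is only a sketch of the counting rule for triangle strips, not of how an implementation processes indices; the function name count_strip_triangles is hypothetical, and uint16_t stands in for GL_UNSIGNED_SHORT indices.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the triangles a triangle strip produces from `indices`.
 * If `restart_enabled` is nonzero, every occurrence of `restart_index`
 * ends the current strip and starts a new one instead of sourcing a vertex.
 * (Hypothetical helper, for illustration only.) */
size_t count_strip_triangles(const uint16_t *indices, size_t n,
                             int restart_enabled, uint16_t restart_index)
{
    size_t triangles = 0;
    size_t run = 0; /* vertices collected in the current strip */

    for (size_t i = 0; i < n; i++) {
        if (restart_enabled && indices[i] == restart_index) {
            run = 0; /* begin a new strip with the next index */
            continue;
        }
        run++;
        if (run >= 3)
            triangles++; /* each vertex after the second adds one triangle */
    }
    return triangles;
}
```

With the nine-element array from the example, the function reports 7 triangles when restart is disabled and 4 when 65535 is the restart index.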
Fixed index restart
Core in version | 4.6
---|---
Core since version | 4.3
Core ARB extension | ARB_ES3_compatibility
For compatibility with OpenGL ES 3.0, OpenGL 4.3 allows the use of GL_PRIMITIVE_RESTART_FIXED_INDEX. This enumerator can be enabled and disabled, just like GL_PRIMITIVE_RESTART. If they are both enabled, then fixed-index restarting takes precedence over the user-specified index.
Unlike regular restarting, the fixed-index version uses a specific index: the largest index possible for the type of the indexed drawing command. So if the type is GL_UNSIGNED_SHORT, then the restart index will be 65535 or 0xFFFF.
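As a rough illustration, the fixed restart index for each index type could be tabulated as follows. The helper name fixed_restart_index is hypothetical, and the enumerator values are defined locally only so the sketch compiles without the OpenGL headers.

```c
#include <stdint.h>

/* Enumerator values as found in the OpenGL headers; defined here only
 * when those headers have not been included. */
#ifndef GL_UNSIGNED_BYTE
#define GL_UNSIGNED_BYTE  0x1401
#endif
#ifndef GL_UNSIGNED_SHORT
#define GL_UNSIGNED_SHORT 0x1403
#endif
#ifndef GL_UNSIGNED_INT
#define GL_UNSIGNED_INT   0x1405
#endif

/* Returns the restart index implied by GL_PRIMITIVE_RESTART_FIXED_INDEX
 * for an index type. Returning 0 for an unknown type is a choice made
 * for this sketch, not GL behavior. */
uint32_t fixed_restart_index(unsigned int type)
{
    switch (type) {
    case GL_UNSIGNED_BYTE:  return 0xFFu;
    case GL_UNSIGNED_SHORT: return 0xFFFFu;
    case GL_UNSIGNED_INT:   return 0xFFFFFFFFu;
    default:                return 0u;
    }
}
```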
Direct rendering
These vertex drawing commands provide the various rendering parameters directly as parameters passed to the functions. This contrasts with other drawing commands (see later sections), where some parameters are pulled from OpenGL object sources.
Basic Drawing
The basic drawing functions are these:
void glDrawArrays( GLenum mode, GLint first, GLsizei count );
void glDrawElements( GLenum mode, GLsizei count, GLenum type, void * indices );
where, for glDrawArrays:
- mode parameter is the Primitive type.
- first and count values define the range of elements to be pulled from the buffer.
and for glDrawElements:
- count and indices parameters define the range of indices.
- count defines how many indices to use.
- indices defines the offset into the index buffer object (bound to GL_ELEMENT_ARRAY_BUFFER, stored in the VAO) to begin reading data.
- type parameter specifies the type of the indices:
- GL_UNSIGNED_BYTE: index range: [0, 255]
- GL_UNSIGNED_SHORT: index range: [0, 65535]
- GL_UNSIGNED_INT: index range: [0, 2^32 - 1].
The indices parameter is odd. Much like old-style vertex attributes, it is not a pointer at all. It is in fact a byte offset, which is disguised as a pointer. So you need to take your byte offset into the index buffer and cast it into a void* (with reinterpret_cast<void*> or just (void*)).
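A minimal sketch of that cast in C follows. The helper name index_byte_offset is hypothetical; it only turns "start at the Nth index" into the byte offset, disguised as a pointer, that the indices parameter expects.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts a starting index into the "pointer" expected by glDrawElements:
 * a byte offset into the buffer bound to GL_ELEMENT_ARRAY_BUFFER.
 * index_size is the size in bytes of one index (1, 2 or 4).
 * (Hypothetical helper, for illustration only.) */
const void *index_byte_offset(size_t first_index, size_t index_size)
{
    return (const void *)(uintptr_t)(first_index * index_size);
}
```

A call such as glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_SHORT, index_byte_offset(6, sizeof(uint16_t))) would then begin reading at byte 12 of the index buffer.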
Multi-Draw
The basic drawing functions are all you really need in order to send vertices for rendering. However, there are a number of ways to draw that optimize certain rendering cases.
Rendering with a different VAO from the last drawing command is usually a relatively expensive operation. So many of the optimization mechanisms are based on you storing the data for several meshes in the same buffer objects with the same vertex formats and other VAO data.
There are also many cases where you want to render a number of distinct meshes with a single draw call. All of the meshes must be in the same VAO (and therefore the same vertex and index buffer objects). Also, of course, they must use the same shader program with the same uniform values.
To render multiple primitives from a VAO at once, use this:
void glMultiDrawArrays( GLenum mode, GLint *first, GLsizei *count, GLsizei primcount);
This function is conceptually implemented as:
void glMultiDrawArrays( GLenum mode, GLint *first, GLsizei *count, GLsizei primcount )
{
    for (int i = 0; i < primcount; i++)
    {
        if (count[i] > 0)
            glDrawArrays(mode, first[i], count[i]);
    }
}
Of course, you could write this function yourself. However, because it all happens in a single OpenGL call, the implementation has the opportunity to optimize this beyond what you could write.
There is an indexed form as well:
void glMultiDrawElements( GLenum mode, GLsizei *count, GLenum type, void **indices, GLsizei primcount );
Similarly, this is implemented conceptually as:
void glMultiDrawElements( GLenum mode, GLsizei *count, GLenum type, void **indices, GLsizei primcount )
{
    for (int i = 0; i < primcount; i++)
    {
        if (count[i] > 0)
            glDrawElements(mode, count[i], type, indices[i]);
    }
}
Multi-draw is useful for circumstances where you know that you are going to draw a lot of separate primitives of the same kind that all use the same shader. Typically, this would be a single conceptual object that you would always draw together in the same way. You simply pack all of the vertex data into the same VAO and buffer objects, using the various offsets to pick and choose between them.
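As a sketch of how the first and count arrays might be produced, assume several meshes are stored back to back in one set of vertex arrays. The function name build_multidraw_ranges is hypothetical, and plain int stands in for GLint and GLsizei.

```c
#include <stddef.h>

/* Fills first[] and count[] for meshes stored one after another in the
 * same vertex arrays. vertex_counts[i] is the number of vertices in
 * mesh i. Returns the total number of vertices.
 * (Hypothetical helper, for illustration only.) */
int build_multidraw_ranges(const int *vertex_counts, size_t mesh_count,
                           int *first, int *count)
{
    int next = 0; /* first vertex of the mesh being laid out */
    for (size_t i = 0; i < mesh_count; i++) {
        first[i] = next;
        count[i] = vertex_counts[i];
        next += vertex_counts[i];
    }
    return next;
}
```

The resulting arrays can be passed directly to glMultiDrawArrays, with mesh_count as primcount.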
Base Index
All of the glVertexAttribPointer calls define the format of the vertices. That is, the way the vertex data is stored in the buffer objects. Changing this format is somewhat expensive in terms of performance.
If you have a number of meshes that all share the same vertex format, it would be useful to be able to put them all in a single set of buffer objects, one after the other. If we have two meshes, A and B, then their data would look like this:
[A00 A01 A02 A03 A04... Ann B00 B01 B02... Bmm]
B's mesh data immediately follows A's mesh data, with no breaks in between.
The glDrawArrays call takes a start index. If we are using non-indexed rendering, then this is all we need. Taking nn and mm to be the vertex counts of A and B, we call glDrawArrays once with 0 as the start index and nn as the array count. Then we call it again with nn as the start index and mm as the array count.
Indexed rendering is often very useful, both for saving memory and for performance. So it would be great if we could preserve these benefits when several meshes share buffers.
In indexed rendering, each mesh also has its own index data. glDrawElements takes an offset into the index buffer, so we can use the same mechanism to select which set of indices to use.
The problem is the contents of these indices. The third vertex of mesh B is technically index 02. However, the actual index is determined by the location of that vertex relative to where the format was defined. And since we're trying to avoid redefining the format, the format still points to the start of the buffer. So the third vertex of mesh B is actually at index 02 + nn.
We could in fact store these indices in the index buffer that way. We could go through all of mesh B's indices and add nn to them. But we don't have to.
Instead, we can use this function:
void glDrawElementsBaseVertex( GLenum mode, GLsizei count, GLenum type, void *indices, GLint basevertex);
This works as glDrawElements does, except that basevertex is added to each index before pulling from the vertex data. So for mesh A, we pass a base vertex of 0 (or just use glDrawElements), and for mesh B, we pass a base vertex of nn.
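The bookkeeping this implies can be sketched as follows, assuming each mesh keeps its own zero-based indices and all meshes are packed one after another into shared vertex and index buffers. The MeshDraw struct and layout_meshes function are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Parameters for one glDrawElementsBaseVertex call (hypothetical struct). */
typedef struct {
    int    base_vertex;   /* value for the basevertex parameter */
    size_t index_offset;  /* byte offset for the indices parameter */
    int    index_count;   /* value for the count parameter */
} MeshDraw;

/* Computes draw parameters for meshes packed back to back.
 * index_size is the size in bytes of one index. */
void layout_meshes(const int *vertex_counts, const int *index_counts,
                   size_t mesh_count, size_t index_size, MeshDraw *out)
{
    int vertices_before = 0;    /* vertices stored ahead of this mesh */
    size_t indices_before = 0;  /* indices stored ahead of this mesh */
    for (size_t i = 0; i < mesh_count; i++) {
        out[i].base_vertex  = vertices_before;
        out[i].index_offset = indices_before * index_size;
        out[i].index_count  = index_counts[i];
        vertices_before += vertex_counts[i];
        indices_before  += (size_t)index_counts[i];
    }
}
```

For mesh i, the draw would then be glDrawElementsBaseVertex(mode, out[i].index_count, type, (void *)out[i].index_offset, out[i].base_vertex), with no need to rewrite any index data.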
Instancing
It is often useful to be able to render multiple copies of the same mesh in different locations. If you're doing this with small numbers, like 5-20 or so, issuing multiple draw commands with shader uniform changes between them (to place each copy) is reasonably fast. However, if you're doing this with large numbers of meshes, like 5,000+ or so, then it can be a performance problem.
Instancing is a way to get around this. The idea is that your vertex shader has some internal mechanism for deciding where each instance of the rendered mesh goes based on a single number. Perhaps it has a table (stored in a Buffer Texture or Uniform Buffer Object) that it indexes with the current vertex's instance number to get the per-instance data it needs. Perhaps it uses an attribute divisor for certain attributes, which provides a different value for each instance. Or perhaps it has a simple algorithm for computing the location of an instance based on its instance number.
Regardless of the mechanism, if you want to do instanced rendering, you call:
void glDrawArraysInstanced( GLenum mode, GLint first, GLsizei count, GLsizei instancecount );
void glDrawElementsInstanced( GLenum mode, GLsizei count, GLenum type, const void *indices, GLsizei instancecount );
It will send the same vertices instancecount number of times, as though you called glDrawArrays/Elements in a loop of instancecount length. However, the vertex shader is given a special input value: gl_InstanceID. It will receive a value on the half-open range [0, instancecount) based on which instance of the mesh is being rendered. gl_InstanceID and using instanced attribute arrays are the only mechanisms for being able to differentiate between instances.
In OpenGL 4.2 or with ARB_base_instance, the starting instance can be specified with "BaseInstance" commands, as follows:
void glDrawArraysInstancedBaseInstance( GLenum mode, GLint first, GLsizei count, GLsizei instancecount, GLuint baseinstance );
void glDrawElementsInstancedBaseInstance( GLenum mode, GLsizei count, GLenum type, const void *indices, GLsizei instancecount, GLuint baseinstance );
The baseinstance specifies the first instance. The instancecount still represents the number of instances. The instance used by the attribute divisor is biased by the base instance. That is, it starts at the base instance and is incremented by 1 for each instance. So the base instance affects the attribute divisor.
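The effect on instanced attributes can be modeled with one line of arithmetic. This sketch assumes the fetch rule of dividing the instance number by the divisor and then adding the base instance; the function name instanced_element is hypothetical.

```c
#include <stdint.h>

/* Element of an instanced attribute array read by instance `instance`
 * (counted from 0 within the draw call). `divisor` must be nonzero;
 * a divisor of 0 means the attribute advances per vertex instead.
 * (Hypothetical helper, for illustration only.) */
uint32_t instanced_element(uint32_t instance, uint32_t divisor,
                           uint32_t base_instance)
{
    return instance / divisor + base_instance;
}
```

So with a divisor of 1 and a base instance of 10, the sixth instance (instance 5) reads element 15 of the attribute array.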
Range
Implementations of OpenGL can often find it useful to know how much vertex data is being used in a buffer object. For non-indexed rendering, this is pretty easy to determine: the first and count parameters of the Arrays functions give you the appropriate information. For indexed rendering, this is more difficult, as the index buffer can potentially use any index up to its size.
Still, for optimization purposes, it is useful for implementations to know the range of indexed rendering data. Implementations may even read index data manually to determine this.
The "Range" series of glDrawElements commands allows the user to specify that this indexed rendering call will never cause indices outside of the given range of values to be sourced. The call works as follows:
void glDrawRangeElements( GLenum mode, GLuint start, GLuint end, GLsizei count, GLenum type, void *indices );
Unlike the "Arrays" functions, the start and end parameters specify the minimum and maximum index values (from the element buffer) that this draw call will use (rather than a first and count-style). If you violate this restriction, you will get implementation-dependent behavior (i.e. rendering may work fine or you may get garbage).
There is one index that is allowed outside of the area bound by start and end: the primitive restart index. If primitive restart is set and enabled, it does not have to be within the given boundary.
Implementations may have a specific "sweet spot" for the range of indices, such that using indices within this range will have better performance. They expose such values with a pair of glGetIntegerv enumerators. To get the best performance, end - start should be less than or equal to GL_MAX_ELEMENTS_VERTICES, and count (the number of indices to be rendered) should be less than or equal to GL_MAX_ELEMENTS_INDICES.
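If the index data is still available on the CPU, the start and end values could be computed with a scan like the one below. This is a sketch with a hypothetical function name, index_range; it assumes 16-bit indices and skips the restart index, since that index is allowed to fall outside the range.

```c
#include <stddef.h>
#include <stdint.h>

/* Finds the smallest and largest index values in `indices`, ignoring
 * `restart_index` when `restart_enabled` is nonzero.
 * Returns 1 and writes *start and *end if any ordinary index was found,
 * otherwise returns 0 and leaves them untouched.
 * (Hypothetical helper, for illustration only.) */
int index_range(const uint16_t *indices, size_t n,
                int restart_enabled, uint16_t restart_index,
                uint32_t *start, uint32_t *end)
{
    int found = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t v = indices[i];
        if (restart_enabled && v == restart_index)
            continue; /* the restart index does not count toward the range */
        if (!found) {
            *start = *end = v;
            found = 1;
        } else {
            if (v < *start) *start = v;
            if (v > *end)   *end = v;
        }
    }
    return found;
}
```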
Combinations
It is often useful to combine these optimization techniques. Primitive restart can be combined with any of them, so long as they are using indexed rendering. The primitive restart comparison test, in the case of BaseVertex calls, is done before the base index is added to the index from the mesh.
Base vertex can be combined with any one of MultiDraw, Range, or Instancing. These functions are:
void glMultiDrawElementsBaseVertex( GLenum mode, GLsizei *count, GLenum type, void **indices, GLsizei primcount, GLint *basevertex );
void glDrawRangeElementsBaseVertex( GLenum mode, GLuint start, GLuint end, GLsizei count, GLenum type, void *indices, GLint basevertex );
void glDrawElementsInstancedBaseVertex( GLenum mode, GLsizei count, GLenum type, const void *indices, GLsizei instancecount, GLint basevertex );
In the case of MultiDraw, the basevertex parameter is an array, so each primitive can have its own base index.
BaseVertex and instancing can also be combined with BaseInstance in OpenGL 4.2 or ARB_base_instance, thus yielding the massively named:
void glDrawElementsInstancedBaseVertexBaseInstance( GLenum mode, GLsizei count, GLenum type, const void *indices, GLsizei instancecount, GLint basevertex, GLuint baseinstance);
None of the other features can be combined with one another. So Range does not combine with MultiDraw.
Transform feedback rendering
Core in version | 4.6
---|---
Core since version | 4.0
Core ARB extension | ARB_transform_feedback2, ARB_transform_feedback3, ARB_transform_feedback_instanced
When using Transform Feedback to generate vertices for rendering, you often use an asynchronous query object to get the number of primitives, and then use this number to compute the number of vertices for a glDrawArrays or glDrawArraysInstanced call, where appropriate.
However, using a query object for this requires a GPU->CPU->GPU transfer of information. You have to read from the query object on the CPU, then transfer that information to your draw call.
This feature provides a way to bypass that round-trip. These functions draw everything that was captured during a transform feedback operation, without the CPU having to explicitly read the count back.
To perform non-instanced rendering from a transform feedback object, these functions are used:
void glDrawTransformFeedback(GLenum mode, GLuint id);
void glDrawTransformFeedbackStream(GLenum mode, GLuint id, GLuint stream);
mode is the usual Primitive type. The id is the transform feedback object to draw from. The stream is the stream in the feedback object to get the vertex count from. Note that glDrawTransformFeedback is equivalent to calling glDrawTransformFeedbackStream with a stream of zero.
If GL 4.2 or ARB_transform_feedback_instanced is available, then the instanced version of these functions can be used:
void glDrawTransformFeedbackInstanced(GLenum mode, GLuint id, GLsizei instancecount);
void glDrawTransformFeedbackStreamInstanced(GLenum mode, GLuint id, GLuint stream, GLsizei instancecount);
These behave like glDrawArraysInstanced. There are no BaseInstance versions of these.
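For comparison, the query-based path described at the start of this section needs a conversion from the primitive count to a vertex count on the CPU. A sketch of that conversion is shown below, assuming the feedback operation recorded independent points, lines or triangles; the function name feedback_vertex_count is hypothetical, and the enumerator values are defined locally only so the sketch compiles without the OpenGL headers.

```c
/* Enumerator values as found in the OpenGL headers; defined here only
 * when those headers have not been included. */
#ifndef GL_POINTS
#define GL_POINTS    0x0000
#endif
#ifndef GL_LINES
#define GL_LINES     0x0001
#endif
#ifndef GL_TRIANGLES
#define GL_TRIANGLES 0x0004
#endif

/* Vertex count for a glDrawArrays call, derived from the number of
 * primitives reported by a query. Returning 0 for other modes is a
 * choice made for this sketch. */
unsigned int feedback_vertex_count(unsigned int mode, unsigned int primitives)
{
    switch (mode) {
    case GL_POINTS:    return primitives;
    case GL_LINES:     return primitives * 2u;
    case GL_TRIANGLES: return primitives * 3u;
    default:           return 0u;
    }
}
```

glDrawTransformFeedback makes both the readback and this computation unnecessary.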
Indirect rendering
Core in version | 4.6
---|---
Core since version | 4.0
Core ARB extension | ARB_draw_indirect, ARB_multi_draw_indirect, ARB_base_instance
Indirect rendering is the process of issuing a drawing command to OpenGL, except that most of the parameters to that command come from GPU storage provided by a Buffer Object. For example, glDrawArrays takes a primitive type, the number of vertices, and the starting vertex. When using the indirect drawing command glDrawArraysIndirect, the starting vertex and number of vertices to render would instead be stored in a buffer object.
The purpose of this is to allow GPU processes to fill these values in. This could be a Compute Shader, a specially designed Geometry Shader coupled with Transform Feedback, or an OpenCL/CUDA process. The idea is to avoid the GPU->CPU->GPU round-trip; the GPU decides what range of vertices to render with. All the CPU does is decide when to issue the drawing command, as well as which Primitive is used with that command.
The indirect rendering functions take their data from the buffer currently bound to the GL_DRAW_INDIRECT_BUFFER binding. Thus, any of these functions will fail if no buffer is bound to that binding.
All of the indirect rendering functions allow the following features:
- Indexed rendering
- Base vertex (for indexed rendering)
- Instanced rendering
- Base instance (if OpenGL 4.2 or ARB_base_instance is available). Does not require rendering more than one instance
Thus, they act as the most feature-complete combination of drawing features that the implementation supports.
For non-indexed rendering, the indirect equivalent to glDrawArraysInstancedBaseInstance is this:
void glDrawArraysIndirect(GLenum mode, const void *indirect);
The mode is the usual primitive type. indirect is the offset into the GL_DRAW_INDIRECT_BUFFER to find the beginning of the data.
The data is provided as if in a C struct of the following definition:
typedef struct {
    GLuint count;
    GLuint instanceCount;
    GLuint first;
    GLuint baseInstance;
} DrawArraysIndirectCommand;
This represents a draw call equivalent to:
glDrawArraysInstancedBaseInstance(mode, cmd->first, cmd->count, cmd->instanceCount, cmd->baseInstance);
If OpenGL 4.3 or ARB_multi_draw_indirect is available, then multiple indirect array drawing commands can be issued in one call with this:
void glMultiDrawArraysIndirect(GLenum mode, const void *indirect, GLsizei drawcount, GLsizei stride);
The drawcount is the number of indirect drawing commands to issue; the stride is the byte offset from one drawing command to the next. It can be set to zero; if so, then the array of indirect commands is assumed to be tightly packed (i.e. 16-byte stride). The stride must be a multiple of 4.
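The stride rule can be sketched as follows. The struct mirrors the definition above, with uint32_t standing in for GLuint so the example compiles without the OpenGL headers; the function name arrays_command_offset is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the struct above, with uint32_t in place of GLuint. */
typedef struct {
    uint32_t count;
    uint32_t instanceCount;
    uint32_t first;
    uint32_t baseInstance;
} DrawArraysIndirectCommand;

/* Byte offset of command `i` within the indirect buffer. `indirect` is
 * the offset of the first command; a stride of 0 means the commands are
 * tightly packed. (Hypothetical helper, for illustration only.) */
size_t arrays_command_offset(size_t indirect, size_t i, size_t stride)
{
    if (stride == 0)
        stride = sizeof(DrawArraysIndirectCommand);
    return indirect + i * stride;
}
```

A nonzero stride larger than 16 leaves room for application data between commands, for example when each command is embedded in a larger per-object record.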
For indexed rendering, the indirect equivalent to glDrawElementsInstancedBaseVertexBaseInstance is this:
void glDrawElementsIndirect(GLenum mode, GLenum type, const void *indirect);
The mode and type parameters work as they do in regular glDrawElements-style functions. As with other indirect functions, the indirect is the byte-offset into the GL_DRAW_INDIRECT_BUFFER to find the indirect data structure.
In indexed rendering, the structure is defined as follows:
typedef struct {
    GLuint count;
    GLuint instanceCount;
    GLuint firstIndex;
    GLint  baseVertex;
    GLuint baseInstance;
} DrawElementsIndirectCommand;
This represents a draw call equivalent to:
glDrawElementsInstancedBaseVertexBaseInstance(mode, cmd->count, type,
cmd->firstIndex * size-of-type, cmd->instanceCount, cmd->baseVertex, cmd->baseInstance);
Where size-of-type is the size in bytes of type.
If OpenGL 4.3 or ARB_multi_draw_indirect is available, then multiple indirect indexed drawing commands can be issued in one call with this:
void glMultiDrawElementsIndirect( GLenum mode, GLenum type, const void *indirect, GLsizei drawcount, GLsizei stride );
The drawcount is the number of indirect drawing commands to issue; the stride is the byte offset from one drawing command to the next. It can be set to zero; if so, then the array of indirect commands is assumed to be tightly packed (i.e. 20-byte stride). The stride must be a multiple of 4.
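A sketch of filling one indexed command on the CPU is shown below. Fixed-width integer types stand in for the GL types so the example compiles without the OpenGL headers, and baseVertex is declared signed here on the assumption that it matches the signed basevertex parameter of the BaseVertex functions. The helper names make_elements_command and elements_command_index_offset are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Five 32-bit fields, hence the 20-byte tightly packed stride. */
typedef struct {
    uint32_t count;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  baseVertex;   /* assumed signed, like the basevertex parameter */
    uint32_t baseInstance;
} DrawElementsIndirectCommand;

/* Builds a command that draws a single instance of one mesh. */
DrawElementsIndirectCommand make_elements_command(uint32_t index_count,
                                                  uint32_t first_index,
                                                  int32_t base_vertex)
{
    DrawElementsIndirectCommand cmd;
    cmd.count = index_count;
    cmd.instanceCount = 1;
    cmd.firstIndex = first_index;
    cmd.baseVertex = base_vertex;
    cmd.baseInstance = 0;
    return cmd;
}

/* Byte offset into the index buffer implied by the command:
 * firstIndex multiplied by the size in bytes of the index type. */
size_t elements_command_index_offset(const DrawElementsIndirectCommand *cmd,
                                     size_t index_size)
{
    return (size_t)cmd->firstIndex * index_size;
}
```

Note that firstIndex counts indices, not bytes, unlike the indices parameter of the direct drawing functions.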
Indirect count
Core in version | 4.6
---|---
Core since version | 4.6
Core ARB extension | ARB_indirect_parameters
OpenGL 4.6 introduced a small extension to the Multi Draw Indirect APIs which allows supplying drawcount from a Buffer Object that is bound to the GL_PARAMETER_BUFFER binding. This works much like GL_DRAW_INDIRECT_BUFFER and allows the GPU to specify not just the parameters for each individual draw, but also how many draws to issue - all without requiring a readback to the CPU.
void glMultiDrawArraysIndirectCount( GLenum mode, const void *indirect, GLintptr drawcount, GLsizei maxdrawcount, GLsizei stride );
void glMultiDrawElementsIndirectCount( GLenum mode, GLenum type, const void *indirect, GLintptr drawcount, GLsizei maxdrawcount, GLsizei stride );
The above two functions are identical to their non-indirect-count counterparts except that:
- drawcount defines a byte offset into the GL_PARAMETER_BUFFER - which must be a multiple of 4, and
- maxdrawcount defines an expected upper limit to the number of draw commands in GL_DRAW_INDIRECT_BUFFER.
The draw count is read from GL_PARAMETER_BUFFER at offset drawcount and must be a single GLsizei. If the value read from the parameter buffer exceeds maxdrawcount then it will be clamped to the specified maximum value.
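The two rules above amount to a small amount of arithmetic, modeled here on the CPU. Both function names are hypothetical, and the sketch only restates the clamping and alignment rules; it does not read any buffer.

```c
#include <stdint.h>

/* Number of draws actually issued: the value read from the parameter
 * buffer, limited to maxdrawcount.
 * (Hypothetical helper, for illustration only.) */
uint32_t effective_draw_count(uint32_t count_in_buffer, uint32_t maxdrawcount)
{
    return count_in_buffer < maxdrawcount ? count_in_buffer : maxdrawcount;
}

/* Checks the alignment rule for the drawcount byte offset. */
int is_aligned_count_offset(intptr_t drawcount)
{
    return (drawcount % 4) == 0;
}
```

Choosing maxdrawcount as the number of commands the indirect buffer can hold keeps a GPU-written count from reading past the end of the command array.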
Conditional rendering
Core in version | 4.6
---|---
Core since version | 3.0
Vendor extension | NV_conditional_render
Conditional rendering is a mechanism for making the execution of one or more rendering commands conditional on the result of an Occlusion Query operation. This feature allows you to render some cheap object, then use an occlusion query to see if any of it is visible. If it is, then you can render the expensive object; if it isn't, then you skip the expensive object and save the rendering time.
Note that conditional rendering allows any Rendering Command to be executed conditionally, not just the drawing commands listed on this page.
This is done with the following functions:
void glBeginConditionalRender(GLuint id, GLenum mode);
void glEndConditionalRender(void);
All rendering commands issued within the boundaries of these two functions will only execute if the occlusion condition specified by id is tested to be true. For GL_SAMPLES_PASSED queries, it is considered true (and thus rendering commands are executed) if the number of samples is not zero.
The commands that can be conditioned are:
- Every function previously mentioned, i.e. all functions of the form glDraw* or glMultiDraw*.
- glClear and glClearBuffer.
- glDispatchCompute and glDispatchComputeIndirect.
The mode parameter determines how the discarding of the rendering functions is performed. It can be one of the following:
- GL_QUERY_WAIT: OpenGL will wait until the query result is returned, then decide whether to execute the rendering command. This ensures that the rendering commands will only be discarded if the query fails. Note that it is OpenGL that's waiting, not (necessarily) the CPU.
- GL_QUERY_NO_WAIT: OpenGL may execute the rendering commands anyway. It will not wait to see if the query test is true or not. This is used to prevent pipeline stalls if the time between the query test and the execution of the rendering commands is too short.
- GL_QUERY_BY_REGION_WAIT: OpenGL will wait until the query result is returned, then decide whether to execute the rendering command. However, the rendered results will be clipped to the samples that were actually rasterized in the occlusion query. Thus, the rendered result can never appear outside of the occlusion query area.
- GL_QUERY_BY_REGION_NO_WAIT: As above, except that it may not wait until the occlusion query is finished. The region clipping still holds.
Note that "wait" in this case does not mean that glEndConditionalRender itself will stall on the CPU. It means that the first command within the conditional rendering scope will not be executed by the GPU until the query has returned. So the CPU will continue processing, but the GPU itself may have a pipeline stall.
Reference
- Category:Core API Ref Vertex Rendering: Reference documentation for vertex array rendering functions.