
Exploring the glTF External Reference Format
The Khronos 3D Formats Working Group is developing a new addition to the glTF ecosystem that is intended to change how complex 3D scenes are built on the Web. While glTF, the widely adopted open standard for 3D web assets, typically handles one asset at a time, this addition will enable creators to efficiently combine multiple glTF assets into sophisticated scenes. The working group is still weighing whether this project will ultimately be released as a glTF extension or as an entirely new format. For the purposes of this post, we will discuss it as the latter: the “glTF External Reference Format.”
At its core, the format consists of a lightweight JSON file that acts as an assembly, pointing to various glTF files and other reference files while defining their arrangement in the scene. By keeping the reference file free of binary data - all geometry, animations, and skinning information remains in the referenced glTF assets - this approach offers two key advantages:
- Commonly used assets can be cached and reused across multiple scenes, significantly improving delivery efficiency.
- It enables "lazy loading," where only the immediately necessary parts of a scene are loaded, whether based on spatial proximity or temporal relevance.
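To make the caching benefit concrete, here is a minimal sketch of how a loader might read such a reference file and work out which glTF files actually need to be fetched. The field names (`assets`, `uri`, `translation`) and the file names are illustrative assumptions, not taken from the draft specification.

```python
import json
from collections import Counter

# Hypothetical reference file. The schema shown here is an assumption made
# for illustration only; the draft specification may look quite different.
REFERENCE_FILE = """
{
  "assets": [
    {"name": "chair_a", "uri": "chair.glb", "translation": [0.0, 0.0, 0.0]},
    {"name": "chair_b", "uri": "chair.glb", "translation": [1.5, 0.0, 0.0]},
    {"name": "table",   "uri": "table.glb", "translation": [0.75, 0.0, 1.0]}
  ]
}
"""


def instances_per_uri(reference_json: str) -> Counter:
    """Count how many times each referenced glTF file is placed in the scene."""
    doc = json.loads(reference_json)
    return Counter(asset["uri"] for asset in doc["assets"])


def uris_to_fetch(reference_json: str) -> list:
    """Return each referenced URI once, in first-seen order."""
    return list(instances_per_uri(reference_json))
```

In this sketch the chair is placed twice but only needs to be downloaded once, and the same cached file could serve any other scene that references it.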
The Khronos Group worked with UX3D to thoroughly exercise the emerging capabilities of the glTF External Reference Format against real-world use cases. This investigation explored crucial functionality including dynamic asset swapping, independent lighting control for individual assets, and on-demand loading. The research findings are helping guide the specification's final development, ensuring it will meet the complex needs of modern 3D applications.
In this post, we'll explore key insights from UX3D’s work with the glTF External Reference Format. We'll examine practical applications including level-of-detail management, asset-specific lighting systems, and billboard implementations, and demonstrate these capabilities through a scene arrangement demo.
Level of Detail
Disclaimer: The video and demo display the transition between different detail levels. While, in practice, this should not be noticeable, for demonstration purposes we exaggerated the differences between the levels.
A common use of external references is level-of-detail (LoD) management, which is needed for several reasons:
- Network bandwidth limits how quickly large models can load, making lower-resolution alternatives essential.
- Memory constraints often prevent loading all resources at maximum detail.
- Computing power varies across devices, affecting their ability to render high-quality assets smoothly.
- Hardware capabilities vary across multiple platforms, particularly in acceleration support, sometimes necessitating fallback options.
- Display diversity, from small mobile screens to large monitors, requires flexible asset detail levels.
To accommodate these limitations, developers can implement several optimization strategies:
- Geometric complexity reduction: Mesh decimation techniques reduce vertex count while preserving the overall shape, significantly decreasing both memory requirements and GPU load. This approach is particularly effective for distant or less prominent objects.
- Texture optimization: By implementing lower resolution textures or employing lossy compression techniques, developers can dramatically reduce memory usage and loading times without severely impacting visual quality at appropriate viewing distances.
- Shader simplification: Switching from computationally intensive effects like transmission to lighter alternatives such as alpha blending can substantially improve performance. This adaptive approach helps maintain smooth rendering across different device capabilities.
- Animation optimization: By selectively reducing animation complexity through limiting animated properties or implementing simplified motion paths, developers can significantly lower computational overhead while preserving essential movement characteristics.
To ensure the glTF External Reference Format supports flexible LoD strategies across different runtime environments, we identified several essential attributes that could be incorporated into the specification:
- Asset extents: Axis-aligned bounding boxes with defined minimum and maximum coordinates provide precise tracking of asset positioning and scale. This spatial awareness enables real-time decisions about viewport visibility and viewer distance.
- Byte size analysis: By incorporating file size metadata, applications can make informed decisions about resource allocation and loading priorities. This allows for intelligent quality-versus-bandwidth tradeoffs based on network conditions and device capabilities.
- Screen coverage calculation: The percentage of screen occupied by an asset's bounding box allows determination of appropriate detail levels for a given scene and camera setting.
- Distance metrics: Distances between the camera and an asset may be used to describe detail levels for a scene independently of display resolution.
- Quality pixels measurement: A metric of expected rendering quality, based on the projected diagonal pixel count for each asset detail level. This helps establish consistent visual quality across different viewing conditions.
UX3D prototyped a flexible LoD management system by introducing an array of alternative assets into the glTF External Reference file description, each accompanied by the properties above. The prototype illustrates that applications will be able to make intelligent, context-aware decisions about which version of an asset to load.
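As a rough illustration of how a viewer could use these properties, the sketch below estimates the projected diagonal of an asset's bounding box in pixels and picks an alternative from it. The `Alternative` class, its `min_pixels` threshold, and the selection rule are assumptions made for this example; they are not the prototype's actual data model.

```python
import math
from dataclasses import dataclass


@dataclass
class Alternative:
    uri: str
    byte_size: int
    min_pixels: float  # hypothetical: smallest projected diagonal this level is meant for


def aabb_diagonal(minimum, maximum):
    """Length of the diagonal of an axis-aligned bounding box."""
    return math.dist(minimum, maximum)


def projected_pixels(diagonal, distance, fov_y_rad, viewport_height_px):
    """Approximate on-screen size of the diagonal for a perspective camera."""
    # Height of the view frustum at the asset's distance.
    visible_height = 2.0 * distance * math.tan(fov_y_rad / 2.0)
    return diagonal / visible_height * viewport_height_px


def select_alternative(alternatives, pixels):
    """Pick the most detailed level whose threshold is met, else the coarsest."""
    eligible = [a for a in alternatives if a.min_pixels <= pixels]
    if not eligible:
        return min(alternatives, key=lambda a: a.min_pixels)
    return max(eligible, key=lambda a: a.min_pixels)
```

A real application would likely also weigh `byte_size` against available bandwidth and memory before committing to the level this function returns.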
Asset-Based Lighting Management
UX3D investigated the complexities of lighting when composing 3D scenes and proposed enhancements to the glTF External Reference Format to address these challenges.
Modern 3D scenes often combine multiple independently-created assets, each designed with its own lighting setup. For example, a building model might include both exterior details and illuminated interiors. When multiple such assets are combined in a single scene, their individual lighting can interact in unexpected ways, potentially creating unwanted illumination overlaps.
When combining multiple glTF files into a unified scene, creators need precise control over how light from one model affects others. This control needs to address two key scenarios:
- Local-only lighting: Each model maintains its own isolated lighting environment, preventing light from affecting other models. This preserves the original artistic intent for each asset.
- Scene-wide illumination: Light from any model can affect nearby models, creating naturalistic interactions between assets while potentially increasing computational complexity.
To manage these scenarios without modifying the base glTF specification, we introduced a new lightSource property for assets within the glTF External Reference file. This property enables creators to specify whether an asset should consider only its internal lights or all lights in the combined scene. This control helps optimize performance by preventing unnecessary light calculations between assets.
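The effect of such a property can be sketched as a simple filter over the lights in the combined scene. Only the property name lightSource comes from the proposal; the values `"local"` and `"scene"`, the `Asset` class, and the function name are hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    lights: list = field(default_factory=list)
    # Hypothetical values: "local" = only this asset's own lights,
    # "scene" = all lights in the combined scene.
    light_source: str = "scene"


def lights_affecting(asset, scene_assets):
    """Return the lights a renderer would evaluate when shading this asset."""
    if asset.light_source == "local":
        return list(asset.lights)
    return [light for other in scene_assets for light in other.lights]
```

Assets marked as local in this sketch never iterate over lights owned by other assets, which is where the performance saving described above would come from.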
While glTF files traditionally don't specify environmental lighting directly, combining assets with varying lighting conditions (such as sunlit exteriors and shadowed interiors) creates new challenges. Our proposed solution introduces direct HDR environment definitions in glTF External Reference files. Rather than including prefiltered mipmaps as proposed in the EXT_lights_image_based extension, we delegate prefiltering to the viewer. This approach simplifies the specification while giving viewers more flexibility in HDR map interpretation.
For fine-grained control, creators can define multiple HDR environments with individual intensity multipliers. This allows precise lighting management for each asset in the scene, ensuring that both artistic vision and technical performance requirements are met.
Billboards
Billboarding is a technique where a mesh (typically a 2D sprite) automatically orients itself to face the camera. This widely requested feature enables several important use cases:
- Text labels that maintain readability from any viewing angle
- Efficient representation of environmental elements like foliage, trees, and clouds using 2D sprites
- Dynamic visual effects, including particle systems and fade transitions
- Performance-optimized distant object representations using low-detail sprites
While our initial investigation focused on implementing billboarding within the glTF External Reference Format, we determined that creating a dedicated glTF node extension would provide greater flexibility, ensuring billboard functionality is available in both standalone glTF files and the External Reference Format.
The extension focuses specifically on node transformations rather than mesh data manipulation. This scope decision means the billboard functionality can be applied to any mesh, not just 2D sprites. Future extensions could address complementary features, such as text-to-sprite generation with formatting options.
The billboard extension offers several powerful customization options:
- Distance-Based Scaling: Billboards can maintain consistent screen size regardless of camera position or zoom level.
- Custom Orientation Control: Creators can override the default forward (+Z) and up vectors of the node.
- Depth Priority: Optional rendering in front of other meshes, as demonstrated by the George Church label in our demo asset. By default, billboards respect normal depth occlusion rules.
- Axis-Constrained Rotation: Creators can limit rotation to specific axes. For example, tree sprites can be configured to rotate only around their vertical axis, preserving natural movement while avoiding unrealistic camera-based rotation.
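Two of these options reduce to a few lines of vector math. The sketch below shows one way a viewer might compute a rotation constrained to the vertical (Y) axis, and a scale factor that keeps a billboard at a constant on-screen size under a perspective camera. The function names and the `reference_distance` parameter are assumptions made for this example, not part of the proposed extension.

```python
import math


def yaw_to_face_camera(billboard_pos, camera_pos):
    """Angle in radians about +Y that turns the +Z forward vector toward the camera."""
    dx = camera_pos[0] - billboard_pos[0]
    dz = camera_pos[2] - billboard_pos[2]
    # Height difference is ignored, so the billboard stays upright.
    return math.atan2(dx, dz)


def rotate_y(vec, angle):
    """Rotate a 3D vector about the +Y axis (right-handed)."""
    x, y, z = vec
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def constant_screen_scale(billboard_pos, camera_pos, reference_distance):
    """Scale that keeps apparent size equal to what it is at reference_distance."""
    # Projected size shrinks linearly with distance, so scale grows linearly.
    return math.dist(billboard_pos, camera_pos) / reference_distance
```

For a tree sprite, applying `rotate_y` with the angle from `yaw_to_face_camera` turns the sprite toward the camera horizontally without tilting it when the camera looks down from above.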
Interactive Scene Arrangement Demo
UX3D has created an interactive scene composition demo that showcases a practical application of the glTF External Reference Format. The demo allows users to manipulate various assets defined in the glTF External Reference Format by stacking and arranging them within a 3D environment, similar to how a room planning application might enable furniture arrangement.
The interaction system works through a straightforward selection and placement mechanism:
Asset Selection
- Each asset has a unique color identifier in a hidden picking layer
- Double-clicking an asset highlights both the selected item and its child elements
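Color-based picking is commonly implemented by rendering each object in a flat color that encodes its ID, then reading back the pixel under the cursor. The sketch below shows one possible 24-bit encoding; whether the demo uses this exact scheme is an assumption.

```python
def id_to_rgb(asset_id):
    """Encode an asset ID as an 8-bit-per-channel RGB color (assumed scheme)."""
    if not 0 <= asset_id < (1 << 24):
        raise ValueError("asset_id must fit in 24 bits")
    return ((asset_id >> 16) & 0xFF, (asset_id >> 8) & 0xFF, asset_id & 0xFF)


def rgb_to_id(rgb):
    """Decode the color read back from the picking layer into an asset ID."""
    r, g, b = rgb
    return (r << 16) | (g << 8) | b
```

The picking layer has to be rendered without lighting, blending, or antialiasing, since any of these would alter the colors and corrupt the decoded IDs.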
Asset Placement
- Hold CTRL and click another asset to place the selected item on its surface
- Alternatively, hold CTRL and drag to smoothly move the selected asset
- Objects automatically align their up vectors with surface normals, enabling natural placement on walls and curved surfaces
- The demo prioritizes ease of use over physical accuracy - there's no gravity simulation or collision detection, and objects can intersect freely. When an asset is moved onto another object, it's automatically reparented in the scene hierarchy.
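Aligning an object's up vector with a surface normal amounts to finding the rotation that maps one unit vector onto another. The sketch below uses Rodrigues' rotation formula for this; it is one standard way to do it, not necessarily how the demo is implemented, and all function names are chosen for this example.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def rotation_aligning(up, normal):
    """Return a 3x3 rotation matrix (row-major tuples) that maps up onto normal."""
    u, n = normalize(up), normalize(normal)
    axis = cross(u, n)
    s = math.sqrt(dot(axis, axis))  # sine of the angle between u and n
    c = dot(u, n)                   # cosine of the angle between u and n
    if s < 1e-9:
        if c > 0:  # already aligned: identity
            return ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
        # Opposite directions: rotate 180 degrees about any axis perpendicular to u.
        helper = (1.0, 0.0, 0.0) if abs(u[0]) < 0.9 else (0.0, 1.0, 0.0)
        k = normalize(cross(u, helper))
        return tuple(tuple(2.0 * k[i] * k[j] - (1.0 if i == j else 0.0)
                           for j in range(3)) for i in range(3))
    k = tuple(x / s for x in axis)
    # Rodrigues: R = c*I + s*[k]x + (1 - c) * k k^T
    kx = ((0.0, -k[2], k[1]), (k[2], 0.0, -k[0]), (-k[1], k[0], 0.0))
    return tuple(tuple(c * (1.0 if i == j else 0.0) + s * kx[i][j] + (1.0 - c) * k[i] * k[j]
                       for j in range(3)) for i in range(3))


def apply(matrix, v):
    """Multiply a 3x3 matrix by a 3D vector."""
    return tuple(sum(matrix[i][j] * v[j] for j in range(3)) for i in range(3))
```

For a wall whose normal is +X, this rotation tips an object with a +Y up vector onto its side so that it stands out from the wall.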
This prototype could evolve into a comprehensive room planning tool with additional features:
- Drag-and-drop functionality for importing new assets
- Asset swapping capabilities for replacing existing objects
- Integration with a library of glTF assets for easy scene composition
Together, these enhancements would yield a powerful tool for authoring and modifying glTF External Reference files through an intuitive visual interface.
Learn More and Get Involved
You can explore UX3D’s full findings on the Khronos GitHub, along with a live billboarding demo using glTF External Reference Format functionality proposed as a result of this research. You can also hear 3D Formats WG co-chairs Alexey Medvedev of Meta and Dan Frith of London Dynamics explore these use cases as well as potential commercial applications and the broader glTF roadmap on the Voices of VR podcast.
The glTF External Reference Format is currently under active development. If you have ideas or use cases, or want to provide feedback on the draft specification, please reach out to us or raise an issue in the glTF External Reference repository.