Don't use Data URLs in 3D models

glTF provides an efficient, last-mile transmission format for 3D models and scenes. Resources meant for GPU upload, like geometry and textures, are stored in GPU-ready binary buffers.[1] Runtimes upload these buffers to the GPU directly — without iterating over each vertex, and often without any intermediate copies of the buffer at all. Among standard 3D formats today, glTF is unique in these optimizations for last-mile transmission to realtime applications.
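
To make that concrete, here's a minimal sketch of the GPU upload step in WebGL. The `binaryChunk` and `bufferView` names are placeholders for data a glTF loader would provide, not any particular library's API:

// Minimal sketch: uploading a glTF buffer view to WebGL.
// `binaryChunk` (an ArrayBuffer) and `bufferView` are assumed inputs
// supplied by a glTF loader, not a specific library's API.
function uploadBufferView(gl, binaryChunk, bufferView) {
  // A typed-array *view* over the existing bytes: no copy, and no
  // per-vertex iteration on the CPU.
  const bytes = new Uint8Array(
    binaryChunk,
    bufferView.byteOffset || 0,
    bufferView.byteLength
  );

  const buffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  gl.bufferData(gl.ARRAY_BUFFER, bytes, gl.STATIC_DRAW); // direct upload
  return buffer;
}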

However, there's more than one way to package a glTF file. One of those ways, often called “glTF Embedded,” uses Data URLs and loses many of those benefits. This might not matter for someone using glTF as an interchange format between two desktop applications, or loading 3D models into a game engine that compiles to engine-specific formats. But for anyone using glTF for last-mile transmission to a realtime application, the “glTF Embedded” packaging method is cursed.

Regularly I've seen developers exporting models to “glTF Embedded,” understandably not knowing why it's any different, and then needing support to diagnose and fix performance issues. This wastes time for everyone — for developers trying to build great 3D experiences, for volunteers providing support on community forums, and for end-users waiting on slow 3D experiences.

Other ways of packaging glTF files are unambiguously better. The “glTF Embedded” method should be avoided except for debugging, if it's supported at all.

Packaging glTF models

.glb models are binary, standalone files. By convention, GLB files are self-contained. Textures and other resources are embedded in the file, as binary data, following a short JSON header. This is the most common glTF packaging method.
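
For a sense of how simple the container is, here's a rough sketch of reading the GLB header and its two chunks from an ArrayBuffer (error handling and chunk-type checks mostly omitted):

// Minimal sketch: reading the GLB container layout, not a full loader.
// Header: magic 'glTF', version, total length (all little-endian uint32).
function parseGLB(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  if (view.getUint32(0, true) !== 0x46546c67) throw new Error('Not a GLB file');
  const version = view.getUint32(4, true);
  const totalLength = view.getUint32(8, true);

  // Chunk 0: JSON. Each chunk is prefixed by an 8-byte length + type header.
  const jsonLength = view.getUint32(12, true);
  const jsonText = new TextDecoder().decode(new Uint8Array(arrayBuffer, 20, jsonLength));
  const json = JSON.parse(jsonText);

  // Chunk 1 (optional): binary data for buffers, already GPU-ready.
  let binaryChunk = null;
  if (20 + jsonLength < totalLength) {
    const binLength = view.getUint32(20 + jsonLength, true);
    binaryChunk = arrayBuffer.slice(28 + jsonLength, 28 + jsonLength + binLength);
  }

  return { version, json, binaryChunk };
}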

.gltf models are human-readable JSON, and should reference external files for binary data and textures. Open a .gltf file in any text editor, and you'll see something like this:

{
    "asset": {
        "version": "2.0",
        "generator": "Khronos glTF Blender I/O v4.0.44",
    },
    "extensionsUsed": ["KHR_materials_unlit"],
    "scenes": [ ... ],
    "nodes": [ ... ],
    "meshes": [ ... ],
    "materials": [{
        "pbrMetallicRoughness": {
            "baseColorFactor": [ 1, 1, 1, 1 ],
            "baseColorTexture": { "index": 0 }
        }
    }],
    "textures": [ ... ],
    "images": [ { "uri": "base_color.png" }, { "uri": "normal.png" } ],
    "buffers": [ { "uri": "geometry.bin", "byteLength": 3240 } ],
    ...
}

JSON is convenient. With JSON, you can edit materials and other non-binary definitions in a text editor.[2] And notice the URLs for images and buffers above: these are external files with .png, .jpg, .webp, or .ktx2 extensions for textures, or .bin extensions for geometry and animation data.
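
When a runtime loads a .gltf file, each of those references is resolved relative to the JSON file and fetched separately. A minimal sketch of that step, assuming the .gltf was loaded from an absolute URL:

// Minimal sketch: fetching an external resource referenced by a .gltf file.
// `gltfUrl` is the absolute URL the .gltf JSON itself was loaded from.
async function loadExternalBuffer(gltfUrl, bufferDef) {
  const resolvedUrl = new URL(bufferDef.uri, gltfUrl); // e.g. 'geometry.bin'
  const response = await fetch(resolvedUrl);
  return response.arrayBuffer(); // binary data, ready for GPU upload
}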

What if we wanted to embed these images and buffers in the JSON? In a .glb, appending binary chunks to a binary format is no problem. But JSON can't store binary data directly, and we instead need to use Data URLs and Base64:

{
    ...
    "images": [ { "uri": "data:image/png;base64,iVBORw0K..." }],
    "buffers": [ { "uri": "data:application/octet-stream;base64,iVBORw0K..." }],
}

Combining these choices, we get three common presets for packaging glTF files:[3]

preset             extensions                buffers     human-readable   self-contained
GLB                .glb                      binary      no               yes
glTF “Separate”    .gltf+.bin+.jpg+.png...   binary      yes              no
glTF “Embedded”    .gltf                     Data URLs   yes              yes

The first two presets are great! Use GLB if you want a self-contained file. Use “glTF Separate” if you want human-readable JSON or external textures. But “glTF Embedded” encodes buffers as Data URLs, taking a heavy performance penalty.

The problem with Data URLs

Data URLs are common enough in web development. HTTP requests cost some overhead, and webpages can have many small resources like icons, stylesheets, and scripts. The overhead adds up, so we might embed the smallest resources — below some size threshold — into the initial HTML page request. Data URLs are fine for that.

But this scenario doesn't translate well to 3D scenes. Performance requires batching resources on the GPU. We only get so many textures per material, we can't have fewer draw calls than we have materials, and so lots of tiny images aren't practical. Instead we have a smaller number of larger 2K-4K images, and heavy binary resources like geometry and animation keyframes. Even properly optimized, these can be much larger than the resources on a traditional webpage.

So here's the first problem: Data URLs increase data size by about 33%.

// create binary 'geometry'.
const bytes = new Uint8Array(1024).map(() => Math.random() * 255);
bytes.byteLength; // 1024 bytes

// convert to base64 Data URL.
const base64 = btoa(String.fromCharCode.apply(null, bytes));
const bytesBase64 = new TextEncoder().encode(base64);
bytesBase64.byteLength; // 1368 bytes

Worse, the data is no longer GPU-ready — it's encoded as ASCII characters! On each device that renders the 3D scene, we're going to have to decode every byte on the CPU up front. Peter McLachlan ran the numbers on mobile devices loading images from Data URLs:[4]

“... you can imagine my surprise to discover, when measuring the performance of hundreds of thousands of mobile page views, that loading images using a data URI is on average 6x slower than using a binary source link such as an img tag with an src attribute!”

And here's the second problem: That's entirely unnecessary work. We took binary GPU-ready data, encoded it in a larger plaintext format, sent that larger data over the network, and then made each viewer's device spend time translating it back to something the GPU understands.
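
Here's roughly what that decoding step looks like when a loader encounters a Data URL buffer; a sketch, not any particular library's implementation:

// Minimal sketch: the extra work a Data URL forces on every client.
// With an external .bin or a GLB chunk, the bytes arrive ready to use;
// here we strip the prefix and decode Base64 back to binary, byte by byte.
function decodeDataUrl(uri) {
  const base64 = uri.slice(uri.indexOf(',') + 1); // drop 'data:...;base64,'
  const binaryString = atob(base64);              // decode Base64 to a binary string
  const bytes = new Uint8Array(binaryString.length);
  for (let i = 0; i < binaryString.length; i++) {
    bytes[i] = binaryString.charCodeAt(i);        // copy every byte on the CPU
  }
  return bytes.buffer;
}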

Conclusion

Arguably, Data URLs should not have been included in the glTF specification. I don't recall the reasons for allowing them, but judging by the feedback when Blender recently removed the “glTF Embedded” export option (and later reverted that change), the convenience of human-readable, self-contained JSON is a big factor.

Unfortunately, I see a lot of people in the WebGL and WebGPU communities taken by surprise by performance problems caused by Data URLs in their applications, understandably not knowing what's wrong. In some cases Data URLs have led to negative perceptions of the entire glTF format, and glTF isn't the only place this happens: popular JavaScript bundlers with the wrong configuration can inline binary resources of any format into large Data URLs, too. It's a footgun, and explaining details of file formats to users doesn't scale. Tooling needs better guard rails, and warnings when Data URLs exceed some size threshold.
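
As a sketch of what such a guard rail might look like, here's a quick check over a parsed .gltf document; the 8 KB threshold is an arbitrary example, not a recommendation from any spec or tool:

// Minimal sketch of a tooling guard rail: warn when a .gltf document
// embeds large Data URLs. The threshold here is an arbitrary example.
const MAX_DATA_URL_LENGTH = 8 * 1024;

function warnOnLargeDataUrls(json) {
  const resources = [...(json.images || []), ...(json.buffers || [])];
  for (const resource of resources) {
    const uri = resource.uri || '';
    if (uri.startsWith('data:') && uri.length > MAX_DATA_URL_LENGTH) {
      console.warn(
        `Resource embeds a ${(uri.length / 1024).toFixed(1)} KiB Data URL; ` +
          'consider GLB or external files instead.'
      );
    }
  }
}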

As general rules for performance-sensitive applications: prefer GLB or “glTF Separate” packaging, and avoid “glTF Embedded” and other uses of Data URLs for large binary resources.

If you have glTF files already using “glTF Embedded” and want to losslessly convert to another glTF packaging method, you can. Install Node.js, and then open a terminal.

# one-time installation
npm install --global @gltf-transform/cli

# convert to GLB
gltf-transform cp input.gltf output.glb

# convert to glTF Separate (.gltf + external resources)
gltf-transform cp input.gltf output.gltf
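
The same conversion can also be scripted against the @gltf-transform/core API. A minimal sketch, assuming a recent version of the library where NodeIO's read and write methods are async:

// Minimal sketch: converting “glTF Embedded” to GLB from a Node.js script.
import { NodeIO } from '@gltf-transform/core';

const io = new NodeIO();
const document = await io.read('input.gltf'); // Data URLs are decoded on read
await io.write('output.glb', document);       // written as self-contained GLB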

[1] I'm using the term “GPU-ready” somewhat optimistically when referring to images. Some image formats are GPU-compatible, others are not. See my previous post, Choosing texture formats for WebGL and WebGPU applications, for more on image formats in 3D models.

[2] Optionally, there's a great VSCode plugin.

[3] Technically, exporters could mix and match. A binary .glb could contain Data URLs; a .gltf could contain some Data URLs but also have external binary resources. But these presets cover 99% of use cases.

[4] On Mobile, Data URIs are 6x Slower than Source Linking, July 2013.