FrameDiffuser: G-Buffer-Conditioned Diffusion for Neural Forward Frame Rendering
Abstract
FrameDiffuser is an autoregressive neural rendering framework that generates temporally consistent photorealistic frames using G-buffer data and previous frame outputs, leveraging ControlNet and ControlLoRA for structural and temporal coherence.
Neural rendering for interactive applications requires translating geometric and material properties (G-buffer) to photorealistic images with realistic lighting on a frame-by-frame basis. While recent diffusion-based approaches show promise for G-buffer-conditioned image synthesis, they face critical limitations: single-image models like RGBX generate frames independently without temporal consistency, while video models like DiffusionRenderer are too computationally expensive for most consumer gaming setups and require complete sequences upfront, making them unsuitable for interactive applications where future frames depend on user input. We introduce FrameDiffuser, an autoregressive neural rendering framework that generates temporally consistent, photorealistic frames by conditioning on G-buffer data and the model's own previous output. After an initial frame, FrameDiffuser operates purely on incoming G-buffer data, comprising geometry, materials, and surface properties, while using its previously generated frame for temporal guidance, maintaining stable, temporally consistent generation over hundreds to thousands of frames. Our dual-conditioning architecture combines ControlNet for structural guidance with ControlLoRA for temporal coherence. A three-stage training strategy enables stable autoregressive generation. We specialize our model to individual environments, prioritizing consistency and inference speed over broad generalization, demonstrating that environment-specific training achieves superior photorealistic quality with accurate lighting, shadows, and reflections compared to generalized approaches.
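To make the autoregressive setup concrete, the sketch below shows one plausible inference loop implied by the abstract: each frame is denoised from noise while a structural branch consumes the current G-buffer and a temporal branch consumes the previously generated frame, which is then fed back for the next frame. All names here (`model.unet`, `controlnet_cond`, `temporal_cond`, and so on) are illustrative assumptions, not the authors' actual API.

```python
# Minimal sketch of the autoregressive inference loop described in the abstract.
# Every class/attribute name below is a hypothetical stand-in for illustration.
import torch

@torch.no_grad()
def render_sequence(model, gbuffers, first_frame, num_steps=20):
    """Autoregressively render frames from a stream of G-buffers.

    gbuffers    : iterable of per-frame G-buffer tensors (albedo, normals,
                  depth, roughness, ... stacked along the channel axis)
    first_frame : initial RGB frame used to bootstrap temporal conditioning
    """
    prev_frame = first_frame
    outputs = []
    for g in gbuffers:
        latent = torch.randn_like(model.latent_template)  # start from noise
        for t in model.scheduler_timesteps(num_steps):
            # Structural guidance (ControlNet branch): current G-buffer.
            # Temporal guidance (ControlLoRA branch): previous generated frame.
            eps = model.unet(
                latent, t,
                controlnet_cond=g,
                temporal_cond=prev_frame,
            )
            latent = model.scheduler_step(eps, t, latent)
        frame = model.decode(latent)   # VAE decode back to RGB
        outputs.append(frame)
        prev_frame = frame             # feed back for the next frame
    return outputs
```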
Community
Let any interactive scene be illuminated by an image diffusion model. We show that Stable Diffusion 1.5 can be adapted to autoregressively predict global illumination from a stream of G-buffer data. We overfit SD on individual scenes and show that it learns each scene's illumination setting and can transfer it to out-of-distribution (OOD) views of the same scene.
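A minimal sketch of what this per-scene specialization could look like, assuming the frozen-backbone recipe typical of ControlNet/LoRA fine-tuning: the pretrained SD 1.5 UNet is kept fixed and only the structural and temporal adapters are optimized on frames from a single environment. Module names are hypothetical, not the authors' code.

```python
# Hedged sketch of environment-specific (per-scene) training:
# freeze the SD 1.5 backbone, train only the ControlNet and ControlLoRA adapters.
import torch

def build_optimizer(base_unet, controlnet, control_lora, lr=1e-4):
    # Freeze the pretrained Stable Diffusion 1.5 UNet weights.
    for p in base_unet.parameters():
        p.requires_grad_(False)
    # Only the structural (ControlNet) and temporal (ControlLoRA)
    # adapters receive gradients during per-scene overfitting.
    trainable = list(controlnet.parameters()) + list(control_lora.parameters())
    return torch.optim.AdamW(trainable, lr=lr)
```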