NeuROK: Generative 4D Neural Object Kinematics

Problem Setting

Generating simulative 4D dynamics from a static shape

Given a shape-only static 3D object and an initial physical condition, we study generating its simulative 4D dynamics: plausible temporal deformations of static objects under the specified input physical conditions.

Application

Scan your room and make it interactable!

Our method is robust and can be applied to turn real scanned 3D objects into interactive 4D objects.

Interacting with real objects in a Stanford office

3D Scan · Smartphone

Generated Output · Stanford Office

We scan each scene and perform post-processing, including object segmentation, to obtain the model inputs.

Interacting with real objects in a Stanford office kitchen

3D Scan · Smartphone

Generated Output · Stanford Office Kitchen

We scan each scene and perform post-processing, including object segmentation, to obtain the model inputs.

Interacting with real objects in an apartment kitchen

3D Scan · Smartphone

Generated Output · Apartment Kitchen

We scan each scene and perform post-processing, including object segmentation, to obtain the model inputs.

Interacting with real objects in a Cornell office

3D Scan · Smartphone

Generated Output · Cornell Office

We scan each scene and perform post-processing, including object segmentation, to obtain the model inputs.

Model Prediction

Unified model for diverse phenomena

From shape-only static 3D assets without any dynamic annotations, our pipeline can build a 3D world that supports diverse user interactions.

Our method uses only the minimal inductive bias of Lagrangian mechanics and assumes no object category or dynamic structure, so the same unified model applies to a diverse range of objects.

Headphones, flowers, Newton's cradle & more in your office

Input: 3D shapes of static objects and initial physical conditions

Generated Output

We insert the generated 4D objects into a 3D office to form an interactive 3D world.

Curtains, oral rinse, faucets & more in your bathroom

Input: 3D shapes of static objects and initial physical conditions

Generated Output

We insert the generated 4D objects into a 3D bathroom to form an interactive 3D world.

Sponges, kettles, microwaves & more in your kitchen

Input: 3D shapes of static objects and initial physical conditions

Generated Output

We insert the generated 4D objects into a 3D kitchen to form an interactive 3D world.

Method

Simpler coordinates, simpler dynamics

The key idea of NeuROK is to simulate the object's dynamics inside a learned latent state space.
It draws on Lagrangian mechanics: with the right choice of coordinates, a hard dynamics problem becomes a simple one.
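A textbook illustration of this point (a standard example, not taken from the paper): a planar pendulum of mass m on a rigid rod of length ℓ, with the pivot at the origin and y pointing up. In Cartesian coordinates the motion needs two coupled equations plus a constraint enforced by a multiplier λ. In the single generalized coordinate θ, with x = ℓ sin θ and y = −ℓ cos θ, the Euler–Lagrange equation gives one unconstrained ODE.

```latex
% Cartesian coordinates: constrained dynamics (lambda = rod tension / l)
\begin{aligned}
m\ddot{x} &= -\lambda x, \qquad
m\ddot{y} = -\lambda y - m g, \qquad
x^2 + y^2 = \ell^2 .
\end{aligned}

% Generalized coordinate theta: L = T - V
L(\theta, \dot{\theta}) = \tfrac{1}{2} m \ell^2 \dot{\theta}^2 + m g \ell \cos\theta ,
\qquad
\frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}} - \frac{\partial L}{\partial \theta}
  = m \ell^2 \ddot{\theta} + m g \ell \sin\theta = 0
\;\;\Longrightarrow\;\;
\ddot{\theta} = -\frac{g}{\ell}\sin\theta .
```

The constraint is absorbed into the choice of coordinate. A learned latent state space plays the analogous role for a deformable object.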

Your browser can't run WebGL, so these interactive 3D views are unavailable.

Encoding kinematics

NeuROK learns this latent space from data: the space captures the object's possible states, and a decoder maps any latent vector to a valid deformation. Below, we make this tangible: sweeping across a 2D slice of the latent space for an eyeglass, we decode each latent vector into a 3D shape on the fly.
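The sketch below makes the decode-and-sweep idea concrete in plain NumPy, under loudly stated assumptions. It decodes a 2D latent vector into a deformed point cloud, then sweeps a grid over the 2D slice in the same spirit as the interactive view. Every name here is hypothetical (`rest_points`, `basis`, `decode`, `sweep_latent_slice`), and the geometry is random. The linear blend-shape decoder is a deliberately simple stand-in for NeuROK's learned neural decoder, whose architecture this page does not describe.

```python
import numpy as np

# Hypothetical stand-ins: random "rest shape" points and a random linear
# deformation basis. NeuROK's actual decoder is a learned network.
rng = np.random.default_rng(0)
N_POINTS = 500
LATENT_DIM = 2
rest_points = rng.normal(size=(N_POINTS, 3))               # (N, 3) rest geometry
basis = 0.1 * rng.normal(size=(LATENT_DIM, N_POINTS, 3))   # one displacement field per latent axis


def decode(z) -> np.ndarray:
    """Map a latent vector z of shape (LATENT_DIM,) to deformed points of shape (N, 3)."""
    z = np.asarray(z, dtype=float)
    displacement = np.tensordot(z, basis, axes=1)  # weighted sum of basis fields -> (N, 3)
    return rest_points + displacement


def sweep_latent_slice(lo: float = -1.0, hi: float = 1.0, steps: int = 5) -> np.ndarray:
    """Decode every vertex of a steps x steps grid over the 2D latent slice."""
    grid = np.linspace(lo, hi, steps)
    shapes = np.empty((steps, steps, N_POINTS, 3))
    for i, z0 in enumerate(grid):
        for j, z1 in enumerate(grid):
            shapes[i, j] = decode([z0, z1])
    return shapes
```

With a linear decoder, the origin decodes to the rest shape, and moving along one latent axis adds one displacement field. A learned nonlinear decoder gives up this simplicity in exchange for expressiveness.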

Latent space

drag to explore

latent = (0.0, 0.0)

Decoded 3D shape

Solving dynamics on a latent state space

Simulation is then straightforward: a single equation of motion (the Euler–Lagrange equation) handles every kind of object. As the eyeglass is dropped, its latent vector traces a path over time; decoding that path frame by frame yields the full 3D motion.
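The Python sketch below illustrates latent-space simulation under heavily simplified assumptions. It uses a hypothetical 2D latent coordinate q with Lagrangian L = ½ q̇ᵀ M q̇ − V(q), where M is a constant mass matrix. V is a quadratic elastic term plus a linear gravity-like term, and a viscous damping force −D q̇ is the generalized force. Under these assumptions the Euler–Lagrange equation reduces to M q̈ = −∂V/∂q − D q̇, which is integrated with semi-implicit Euler. Each latent frame is then decoded by a stand-in linear decoder. `M`, `K`, `G`, `DAMPING`, `simulate`, and `decode_trajectory` are all hypothetical; NeuROK's learned Lagrangian and decoder are not specified on this page.

```python
import numpy as np

# --- Hypothetical latent-space model (stand-ins, not NeuROK's learned quantities) ---
LATENT_DIM = 2
M = np.diag([1.0, 2.0])        # constant latent mass matrix: T = 0.5 qdot^T M qdot
K = np.diag([4.0, 9.0])        # latent stiffness: elastic energy 0.5 q^T K q
G = np.array([0.0, 1.0])       # linear, gravity-like potential term G . q
DAMPING = 0.1                  # viscous generalized force Q = -DAMPING * qdot

rng = np.random.default_rng(0)
rest_points = rng.normal(size=(200, 3))               # stand-in rest geometry
basis = 0.1 * rng.normal(size=(LATENT_DIM, 200, 3))   # stand-in linear decoder


def grad_potential(q: np.ndarray) -> np.ndarray:
    """dV/dq for V(q) = 0.5 q^T K q + G . q."""
    return K @ q + G


def energy(q: np.ndarray, qdot: np.ndarray) -> float:
    """Total energy T + V, handy for sanity-checking the integrator."""
    return float(0.5 * qdot @ M @ qdot + 0.5 * q @ K @ q + G @ q)


def simulate(q0, qdot0, dt: float = 0.01, steps: int = 500):
    """Integrate M qddot = -dV/dq - DAMPING * qdot with semi-implicit Euler.

    Returns latent positions and velocities, each of shape (steps + 1, LATENT_DIM).
    """
    M_inv = np.linalg.inv(M)
    q = np.asarray(q0, dtype=float).copy()
    qdot = np.asarray(qdot0, dtype=float).copy()
    qs, qdots = [q.copy()], [qdot.copy()]
    for _ in range(steps):
        qddot = M_inv @ (-grad_potential(q) - DAMPING * qdot)
        qdot = qdot + dt * qddot   # update velocity first ...
        q = q + dt * qdot          # ... then position with the new velocity
        qs.append(q.copy())
        qdots.append(qdot.copy())
    return np.stack(qs), np.stack(qdots)


def decode_trajectory(qs: np.ndarray) -> np.ndarray:
    """Decode each latent frame into 3D points, giving shape (frames, N, 3)."""
    return rest_points[None] + np.einsum("fd,dnc->fnc", qs, basis)
```

Updating the velocity before the position makes the undamped part of this integrator symplectic, which keeps energy drift small over long rollouts. The choice of integrator here is illustrative only.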

Latent trajectory

Simulated drop

Comparisons

We show video results comparing our method with baselines on physically inspired 4D generation.

Select a dynamic object

Input

Input

Input

Input

Input

Input

Input

Input

Note that the goal of this paper is to generate one plausible 4D sequence that satisfies one valid physical configuration and conforms to human physical intuition.

BibTeX

@InProceedings{Geng_2026_CVPR,
    author    = {Geng, Chen and He, Guangzhao and Gao, Yue and Zhang, Yunzhi and Wu, Shangzhe and Wu, Jiajun},
    title     = {{NeuROK}: Generative 4D Neural Object Kinematics},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {39239--39251}
}

Acknowledgements

This work is in part supported by NSF RI #2211258 and #2338203, ONR YIP N00014-24-12117, ONR MURI N00014-22-1-2740, the Stanford Institute for Human-Centered AI (HAI), and the Magic Grant from the Brown Institute for Media Innovation.

We acknowledge the compute support from the NSF ACCESS program #CIS250696, Stanford Data Science and Marlowe Computing Platform, and the AMD University Program for AI & HPC Cluster.

We thank Robyn Lockwood (Stanford Language Center) for editorial and writing suggestions that improved the clarity of the manuscript. We thank Chong Zeng and Ruocheng Wang for early feedback on the manuscript and members of Stanford Vision and Learning Lab and Stanford Graphics Lab for fruitful discussion.

NeuROK: Generative 4D Neural Object Kinematics

Generating simulative 4D dynamics from a static shape

Scan your room and make it interactable!

Interacting with real objects in a Stanford office

Interacting with real objects in a Stanford office kitchen

Interacting with real objects in an apartment kitchen

Interacting with real objects in a Cornell office

Unified model for diverse phenomena

Headphones, flowers, Newton's cradle & more in your office

Curtains, oral rinse, faucets & more in your bathroom

Sponges, kettles, microwaves & more in your kitchen

Simpler coordinates, simpler dynamics

Encoding kinematics

Solving dynamics on a latent state space

Comparisons

Select a dynamic object

Related Projects on 4D Generation

Choreographing a World of Dynamic Objects

Category-Agnostic Neural Object Rigging

Birth and Death of a Rose

BibTeX

Acknowledgements