Gaussian Point Splatting
12 points by Yogthos
What sort of process produces gaussian splats instead of something more traditional (e.g. triangles or voxels)?
I’m no graphics expert, but IIRC it’s a process where multiple 2D photos of a scene from many angles are combined, and the resulting splats form a 3d scene.
What do you mean? E.g. are you asking about the original data source, or how you end up with "splat" information for rendering vs. triangles?
Yes. How do you end up with gaussians to render in the first place?
That was an or - you can't answer with yes :p
Very big caveat: this is just my high-level understanding from years of sporadic articles appearing on lobsters :D
The first bit is to have lots and lots of overlapping photos (otherwise known as film :D). From that you use (or work out? I'm not sure about this) camera position and direction information, and create a point cloud where every point has the color of the associated image data; a single point cloud contains all of the information for every image. Then you do <maths> and end up with a gaussian thing? for each point. Then you render that cloud from the camera settings for every original picture, then you do maths again to modify the data for the gaussian maths thing for each point to try to get the rendering results closer to the original photos. You then just repeat until you believe the final renderings are close enough to the originals.
At that point, the idea is that because the single combined model of 3D space is simultaneously correct for every individual photo, it is in principle accurate for the entire world as captured by those photos. That means that if you were to render an image from a different location or direction, as long as there were photos covering the area, the rendered image should approximately match the photo that would have been taken had that location actually been one of the original sources.
https://en.wikipedia.org/wiki/Gaussian_splatting#Method
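To make the "render that cloud from the camera settings" step a bit more concrete, here is a minimal sketch of projecting a point cloud into one camera with a plain pinhole model. It only covers where each point lands in the image, not the splatting itself. The function name `project_points` and the parameters `focal`, `cx`, `cy` are my own for illustration and are not taken from any particular splatting codebase.

```python
import numpy as np

def project_points(points_world, R, t, focal, cx, cy):
    """Project Nx3 world-space points into pixel coordinates.

    R (3x3) and t (3,) map world coordinates to camera coordinates;
    focal, cx, cy are assumed pinhole intrinsics in pixels.
    """
    cam = points_world @ R.T + t          # world -> camera space
    in_front = cam[:, 2] > 1e-6           # drop points behind the camera
    cam = cam[in_front]
    u = focal * cam[:, 0] / cam[:, 2] + cx
    v = focal * cam[:, 1] / cam[:, 2] + cy
    return np.stack([u, v], axis=1), cam[:, 2], in_front

# Toy usage: identity camera at the origin looking down +z.
points = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0], [0.0, 0.0, -1.0]])
pixels, depth, visible = project_points(
    points, np.eye(3), np.zeros(3), focal=100.0, cx=50.0, cy=50.0
)
print(pixels, depth, visible)
```

Doing this once per original photo, with that photo's camera pose, gives you the rendering that gets compared against the photo.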
I think you start with photos from multiple angles like @snej said, and then find a “best fit” set of splats, a lot like other kinds of curve fitting or model training.
Not an expert, but it makes sense! Gaussians seem like a very good shape for this because they’re continuous and differentiable (so, easy to nudge), and localized in space (each one touches relatively few pixels).
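As a toy illustration of the "easy to nudge" point, here is a sketch that fits a single 1D Gaussian to a target signal with plain gradient descent. This is a stand-in for the idea, not how real splatting code works: real splats are 3D, there are a great many of them, and the loss is computed on rendered images. All names, the learning rate, and the starting values are my own assumptions.

```python
import numpy as np

def gaussian(x, mu, sigma, amp):
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def loss_and_grads(x, target, mu, sigma, amp):
    """Mean squared error and its analytic gradients w.r.t. mu, sigma, amp."""
    g = np.exp(-0.5 * ((x - mu) / sigma) ** 2)
    pred = amp * g
    residual = pred - target
    loss = np.mean(residual ** 2)
    d_pred = 2.0 * residual / x.size
    d_mu = np.sum(d_pred * pred * (x - mu) / sigma ** 2)
    d_sigma = np.sum(d_pred * pred * (x - mu) ** 2 / sigma ** 3)
    d_amp = np.sum(d_pred * g)
    return loss, d_mu, d_sigma, d_amp

def fit(x, target, mu=0.0, sigma=1.0, amp=0.5, lr=0.5, steps=2000):
    history = []
    for _ in range(steps):
        loss, d_mu, d_sigma, d_amp = loss_and_grads(x, target, mu, sigma, amp)
        history.append(loss)
        mu -= lr * d_mu
        sigma = max(sigma - lr * d_sigma, 1e-3)  # keep the width positive
        amp -= lr * d_amp
    return mu, sigma, amp, history

x = np.linspace(-5.0, 5.0, 200)
target = gaussian(x, mu=1.0, sigma=0.7, amp=1.0)
mu, sigma, amp, history = fit(x, target)
print(mu, sigma, amp, history[0], history[-1])
```

The locality shows up in the gradients: samples far from `mu` have `pred` close to zero, so they contribute almost nothing to the update.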
My information may be a few years out of date at this point, but the first step is to feed your photos to COLMAP to receive a point cloud with inferred camera locations. Can't recall if it was the "sparse" or "dense" cloud it outputs. Then you initialize the actual optimization with these points.
The target is to minimize image-space error between the original photos and the rendered splats from each camera location. This is where the differentiable rendering stuff happens. But there's a kludge: the numerical optimization is periodically halted, splats in too-dense areas are pruned, and correspondingly points in sparse areas are split in two. This is a heuristic and not differentiable; you could say it's one step of some alternating error minimization scheme. There have been attempts to formulate this in a more mathematical way, but it's unclear to me if they worked any better in practice.
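Here is a rough sketch of what such a non-differentiable "halt, prune, split" step could look like, in 1D with NumPy. The criteria are assumptions on my part: it prunes splats whose amplitude has become negligible and splits splats whose position gradient is large, which is one plausible choice and not necessarily what any given implementation does. The thresholds `amp_min` and `grad_max` and the `shrink` factor are made-up values.

```python
import numpy as np

def prune_and_split(mu, sigma, amp, grad_mu,
                    amp_min=0.01, grad_max=0.1, shrink=1.6, rng=None):
    """One heuristic step run between batches of gradient updates."""
    rng = np.random.default_rng(0) if rng is None else rng

    # Prune: drop splats that contribute almost nothing.
    keep = amp > amp_min
    mu, sigma, amp, grad_mu = mu[keep], sigma[keep], amp[keep], grad_mu[keep]

    # Split: splats the optimizer keeps pushing around get replaced by
    # two smaller children placed on either side of the parent.
    split = np.abs(grad_mu) > grad_max
    offsets = rng.normal(0.0, sigma[split])
    child_mu = np.concatenate([mu[split] + offsets, mu[split] - offsets])
    child_sigma = np.tile(sigma[split] / shrink, 2)
    child_amp = np.tile(amp[split], 2)

    new_mu = np.concatenate([mu[~split], child_mu])
    new_sigma = np.concatenate([sigma[~split], child_sigma])
    new_amp = np.concatenate([amp[~split], child_amp])
    return new_mu, new_sigma, new_amp

# Toy usage: the second splat is nearly invisible, the third has a large gradient.
mu0 = np.array([0.0, 1.0, 2.0])
sigma0 = np.array([1.0, 1.0, 1.0])
amp0 = np.array([0.5, 0.001, 0.5])
grad0 = np.array([0.0, 0.0, 1.0])
new_mu, new_sigma, new_amp = prune_and_split(mu0, sigma0, amp0, grad0)
print(new_mu, new_sigma, new_amp)
```

Because the number of splats changes here, this step can't be expressed as a gradient update, which is why it sits outside the optimizer and alternates with it.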