Here's a video that's played back in the space where it was captured
It's interactive: drag the viewer to adjust the viewpoint
Locating frames
I was hoping to use the telemetry data from my drone; it produces a text file with its location as it captures video. However, this doesn't include orientation or camera gimbal info, so I wasn't able to map it into a full pose.
So I decided to use COLMAP, a Structure-from-Motion tool that reconstructs a scene from a series of images. COLMAP stores the pose from which each image was captured, which I was able to use to align the video frames. As a bonus, this works for other video sources, not just drone footage.
I wrote some slightly scrappy code to extract and serialise the poses and points into a PLY file that I could load into a WebGL component. You can read some of the process (and see some Gaussian splats) on this Bluesky thread.
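COLMAP's text export (images.txt) lists a world-to-camera quaternion and translation for each registered image, which needs inverting to get the camera's pose in world space. A rough TypeScript sketch of that step (reconstructed for illustration, not my actual code):

import * as THREE from 'three';

interface ImagePose { name: string; position: THREE.Vector3; quat: THREE.Quaternion; }

// Each registered image in images.txt occupies two lines:
// "IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME", then a line of its 2D points.
function parseImagesTxt(text: string): ImagePose[] {
  const lines = text.split('\n').filter(l => l.trim() && !l.startsWith('#'));
  const poses: ImagePose[] = [];
  for (let i = 0; i < lines.length; i += 2) { // step by 2 to skip the 2D-point lines
    const [, qw, qx, qy, qz, tx, ty, tz, , name] = lines[i].trim().split(/\s+/);
    // COLMAP stores world-to-camera; invert to get the camera pose in world space.
    const q = new THREE.Quaternion(+qx, +qy, +qz, +qw);
    const t = new THREE.Vector3(+tx, +ty, +tz);
    const camQ = q.clone().invert();
    const camPos = t.clone().negate().applyQuaternion(camQ); // C = -R^T t
    poses.push({ name, position: camPos, quat: camQ });
  }
  return poses;
}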
Implementation
I'm pretty happy with how this is structured: it's a web component that wraps a <video /> element with links to the COLMAP data:
<pose-tracker poses="poses.ply" points="points.ply">
<video src="motocamp.mp4"></video>
</pose-tracker>
Internally the video element is hidden but still drives the playback of the component, which is some HTML controls and a WebGL canvas element.
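The wiring for that is roughly as follows (a minimal sketch, not the actual source; initViewer is a stand-in for the real rendering setup):

// Hypothetical sketch of the custom element.
declare function initViewer(canvas: HTMLCanvasElement, video: HTMLVideoElement,
                            posesUrl: string | null, pointsUrl: string | null): void;

class PoseTracker extends HTMLElement {
  connectedCallback() {
    const video = this.querySelector('video');
    if (!video) return;
    video.style.display = 'none'; // hidden, but still drives playback

    const canvas = document.createElement('canvas');
    this.appendChild(canvas);

    // Hand the video and the linked COLMAP exports over to the renderer.
    initViewer(canvas, video, this.getAttribute('poses'), this.getAttribute('points'));
  }
}
customElements.define('pose-tracker', PoseTracker);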
The canvas is rendered by three.js. The key trick is using a single 2D texture array to stash the historical video frames, with an instanced mesh that allows everything to be drawn together. My original approach for pushing frames was using a 2D canvas to write the pixels into an array buffer, but then I found WebGLArrayRenderTarget, which lets you populate texture arrays directly!
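Capturing a frame into a layer of that array looks roughly like this (a sketch with made-up sizes):

import * as THREE from 'three';

const SIZE = 512, MAX_FRAMES = 256; // made-up frame budget
const frames = new THREE.WebGLArrayRenderTarget(SIZE, SIZE, MAX_FRAMES);

// A fullscreen quad that blits the current video frame.
const videoEl = document.querySelector('video')!;
const blitScene = new THREE.Scene();
const blitCamera = new THREE.OrthographicCamera(-1, 1, 1, -1, 0, 1);
blitScene.add(new THREE.Mesh(
  new THREE.PlaneGeometry(2, 2),
  new THREE.MeshBasicMaterial({ map: new THREE.VideoTexture(videoEl) }),
));

function captureFrame(renderer: THREE.WebGLRenderer, layer: number) {
  renderer.setRenderTarget(frames, layer); // select which layer to write into
  renderer.render(blitScene, blitCamera);
  renderer.setRenderTarget(null);
}

The instanced mesh can then sample frames.texture as a sampler2DArray in its shader, with a per-instance attribute picking the layer.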
I didn't want/need every frame of the video, so I sampled it down (from 60 Hz to 2-5 Hz) and interpolated to find the position at a given timestamp. Orientation is quite straightforward in three.js, but for translation I was really happy when I found curve-interpolator.
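Put together, looking up a pose for a timestamp is something like this (a sketch; the time-to-arc-length mapping is hand-waved):

import * as THREE from 'three';
import { CurveInterpolator } from 'curve-interpolator';

interface PoseSample { t: number; pos: [number, number, number]; quat: THREE.Quaternion; }

// Built once from all sampled positions:
// const curve = new CurveInterpolator(samples.map(s => s.pos), { tension: 0.2 });

function poseAt(samples: PoseSample[], curve: CurveInterpolator, t: number) {
  // Find the samples bracketing the timestamp.
  let i = samples.findIndex(s => s.t >= t);
  if (i < 0) i = samples.length - 1; // past the end: clamp to the last segment
  if (i === 0) i = 1;
  const a = samples[i - 1], b = samples[i];
  const alpha = (t - a.t) / (b.t - a.t);

  // Orientation: slerp between the bracketing quaternions.
  const quat = a.quat.clone().slerp(b.quat, alpha);

  // Translation: evaluate the spline at the matching fraction of the path
  // (assumes roughly even spacing between samples).
  const [x, y, z] = curve.getPointAt((i - 1 + alpha) / (samples.length - 1));

  return { position: new THREE.Vector3(x, y, z), quat };
}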
Data sources & formats
Using structure from motion is cool, but you can potentially get richer data from sensors on the capture device.
For drones, the UZH-FPV Drone Racing Dataset is a great example of the sort of data that's available.
For capturing from a mobile device, WebXR Raw Camera Access could be an option for pose-aligned video.
GoPro cameras have a telemetry format (GPMF) which looks like it captures a bunch of metadata.
And for output formats, I enjoyed using PLY because it's so lightweight and flexible (it can be just a text file!). But if I was doing this properly I'd probably use something like MCAP to link everything together.
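For a flavour of that, a pose file can be as simple as this (an illustrative layout, not the exact schema I used):

ply
format ascii 1.0
element vertex 2
property float t
property float x
property float y
property float z
property float qw
property float qx
property float qy
property float qz
end_header
0.0 1.20 0.35 -3.40 0.98 0.0 0.17 0.0
0.5 1.22 0.36 -3.31 0.97 0.0 0.22 0.0

PLY lets you declare arbitrary named properties in the header, so timestamps and quaternions fit right in alongside positions.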
I've got a fairly shonky pipeline for processing videos now, so if there's something that you think would be interesting give me a shout and I'll run it through!