CSC 461 - VR Project

Patrick Holland, Nick Karamanian

University of Victoria - Fall 2019

Presentation & Report

Our presentation can be found here and our report can be seen here.

Project Updates

Popular Codecs: The popular codecs for virtual reality video are currently the same as those used for conventional video: H.264, MPEG-4, and H.265 are the most common choices. These codecs are well known, well tested, and widely supported. They provide an assortment of profiles, which allows the encoding and compression to be tailored to the devices the video will be played on.
H.264 Profile: Chris Milk of VRSE was one of the more well-established VR filmmakers in 2015, and inspecting some of his videos reveals the H.264 Baseline profile at level 4.2. This supports a 3840x2160 display resolution at 30 frames per second, which comes out to 20-30 Mbps. Each device supports up to a certain H.264 level, with newer devices supporting higher levels. This example was from 2015, so today's devices should be able to support much higher levels and even H.265.
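As a quick sanity check on the figures above, the bitrate can be expressed as average compressed bits per pixel per frame. This is our own illustrative calculation, not something from the inspected videos:

```python
# Hypothetical sanity check of the numbers above: bits-per-pixel for a
# 3840x2160 stream at 30 fps encoded at 20-30 Mbps.

def bits_per_pixel(bitrate_bps, width, height, fps):
    """Average number of compressed bits spent on each pixel of each frame."""
    return bitrate_bps / (width * height * fps)

low = bits_per_pixel(20_000_000, 3840, 2160, 30)
high = bits_per_pixel(30_000_000, 3840, 2160, 30)
print(f"{low:.3f} - {high:.3f} bits/pixel")  # -> 0.080 - 0.121 bits/pixel
```

Roughly a tenth of a bit per pixel is in line with typical H.264 compression for high-resolution content.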
H.265: This codec is the successor to H.264 and offers similar quality at around half the bitrate or double the quality at the same bitrate. It's newer than H.264 so it doesn't have as much support and hasn't been tested as thoroughly, but both of these factors are becoming less true each year.

Optimizations

Still Image Trick: One very clever and powerful trick shown in this blog post is to have the top and bottom of the video be still images. VR demands much higher resolutions than traditional video playback, but devices such as smartphones don't necessarily have the computing power available to handle those resolutions. Imagine a video that has a resolution of 3840x3840. In most cases, the objects most important to the scene will be found at mid-height (for example, the 3840x1536 middle portion of the scene). This means that instead of using video, we can use a still image for the top and bottom of the scene (both 3840x1152). This trick will allow the user to experience a 3840x3840 resolution scene while their device only needs to decode a 3840x1536 video.
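The decode savings from this trick are easy to quantify. A short sketch using the example's numbers (our own arithmetic, not from the blog post):

```python
# Decode savings from the still-image trick: the top and bottom bands
# (3840x1152 each) are still images, so only the 3840x1536 middle band
# has to be decoded as video each frame.

def decode_savings(full_width, full_height, video_height):
    full_pixels = full_width * full_height    # pixels the user perceives
    video_pixels = full_width * video_height  # pixels decoded per frame
    return full_pixels / video_pixels

print(decode_savings(3840, 3840, 1536))  # -> 2.5
```

The device decodes 2.5x fewer pixels per frame while the viewer still sees the full 3840x3840 scene.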
Spherical Wavelets: The majority of video encoding algorithms operate on 2D planes in order to compress and produce a 2D picture. VR, however, takes interactive 3D input from the user (such as head orientation), which must be compressed and then projected back onto a simulated spherical video surface for viewing. For this information to be efficiently compressed, the most commonly used method converts the input data into 2D rectangular wavelets and projects those wavelets onto a 3D sphere. This is known as a reverse equirectangular projection.
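The core of the equirectangular mapping mentioned above is that every 3D view direction on the sphere corresponds to one pixel in a flat 2D frame, which a conventional 2D codec can then compress. A minimal sketch of that mapping; the function name and coordinate conventions are our own illustration, not from any particular codec:

```python
import math

def direction_to_equirect(x, y, z, width, height):
    """Map a unit 3D view direction to pixel coordinates in a WxH frame."""
    lon = math.atan2(x, z)   # longitude in [-pi, pi]
    lat = math.asin(y)       # latitude in [-pi/2, pi/2]
    u = (lon / (2 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v

# Looking straight ahead (+z) lands at the centre of the frame:
print(direction_to_equirect(0, 0, 1, 3840, 1920))  # -> (1920.0, 960.0)
```

Playback runs this mapping in reverse: each pixel of the decoded 2D frame is projected back onto a sphere surrounding the viewer.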
Asynchronous Video Loading: In VR, the most important requirement for a good user experience is that the video feed matches the user's head movements. Typically, VR systems compress and project video as soon as the head-movement sensor calculations complete. This is done in parallel with all other sensor processing, such as hand movements and button input. Overall, this decreases the compression ratio, but it also decreases perceived latency.
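A simplified sketch of this idea (our own illustration, not any specific VR runtime): head-pose updates run on their own thread, so the render path never blocks on sensor processing and always reads the most recent pose.

```python
import threading
import time

# Shared head pose, updated by the tracking thread and read by the renderer.
latest_pose = {"yaw": 0.0, "pitch": 0.0}
pose_lock = threading.Lock()

def head_tracking_loop(samples):
    # Stand-in for real IMU sampling; each sample updates the shared pose.
    for yaw, pitch in samples:
        with pose_lock:
            latest_pose["yaw"], latest_pose["pitch"] = yaw, pitch
        time.sleep(0.001)  # simulate the sensor sampling interval

def render_frame():
    # The render path never waits on the sensor loop; it just reads the
    # latest available pose, which keeps perceived head-motion latency low.
    with pose_lock:
        return dict(latest_pose)

tracker = threading.Thread(target=head_tracking_loop,
                           args=([(1.0, 0.5), (2.0, 1.0)],))
tracker.start()
tracker.join()
print(render_frame())  # -> {'yaw': 2.0, 'pitch': 1.0}
```

Hand tracking and button input would each get a similar independent loop, so no single sensor can stall the video feed.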

Schedule & Challenges

Schedule: We are on schedule. Our plan moving forward is still to continue finding more resources on the topic of VR coding and compression while reading about the various methods used in industry.
Technical Challenges: The main challenge we've had so far is finding papers and articles pertaining to what we are looking to learn. The original plan was to learn about the compression and coding schemes used between rendering/generating a scene on a desktop PC and transmitting that information to the tethered headset. Most of the papers and information we've found while searching for variations of "compression in VR" relate instead to 360-degree VR video being streamed to a smartphone. This is still a fascinating topic, but it seems that most video in this category uses fairly standard H.264 or H.265 codecs and profiles. The papers then go into detail about complex theories that could be applied to reduce the bitrate while keeping the quality sufficiently high. A lot of the techniques discussed in these papers are currently above our level of understanding, so it's hard to articulate some of the methods we've seen thus far.

Project Proposal

Team name: Mr. Oculus, I don't feel so good..
Project Topic: Our topic is to explore the coding and compression techniques used for Virtual Reality (VR), specifically for VR headsets connected to a GPU. Data compression and delivery are very important for VR because inconsistencies or unexpected outcomes in the video output while users move their heads can lead to a negative user experience and extreme nausea.
What's been done already: VR is an emerging industry, so there are many coding and compression techniques currently in development. Very few are open source, so it will be difficult for us to find specific details, but two standards that look interesting are JPEG XS and MPEG-OMAF. JPEG XS is a visually lossless image compression standard that focuses on low latency and minimal complexity. MPEG-OMAF is a standard for the storage and transmission of 360-degree video with motivations similar to JPEG XS. These compression techniques are relatively new, so exploring and perhaps testing them may produce unexpected results.
Expected deliverables: There will be a report. We likely will not be able to have a demo since these standards are still in development and the technology is not easy to get our hands on. This may change as we read more about it.
Schedule: At this point, we are continuing to search for more info on the topic each day and will read more about VR while keeping notes of what we think is important to write about in our final report. We will keep the website updated with our findings.