Apple releases Depth Pro, an open source monocular depth estimation AI model

Apple has released several open source artificial intelligence (AI) models this year, mostly small language models designed for specific tasks. Adding to the list, the Cupertino-based tech giant has now released a new AI model called Depth Pro. It is a vision model that can generate a monocular depth map from any single image. This technology is useful for creating 3D textures, augmented reality (AR) experiences and more. The researchers behind the project claim that the depth maps generated by the AI model rival or surpass those produced with multi-camera setups.

Apple releases Depth Pro AI model

Depth estimation is an important process in 3D modelling, as well as in various other technologies such as AR, autonomous driving systems, robotics, and more. The human visual system can judge the depth of objects accurately even from a single vantage point, relying on cues such as perspective and relative size. Cameras are not nearly as good at this: an image captured by a single camera is inherently two-dimensional, removing depth from the equation.

So, for technologies where the depth of an object plays an important role, multiple cameras are typically used. However, modelling objects this way can require a lot of time and resources. Instead, in a research paper titled “Depth Pro: Sharp Monocular Metric Depth in Less Than a Second,” Apple describes a vision-based AI model that generates metric depth maps from a single (monocular) image, with no multi-camera setup required.
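To see what a metric depth map enables, each pixel can be back-projected into a 3D point cloud with the standard pinhole camera model. The sketch below is not Apple's code; the focal length, principal point, and the tiny constant depth map are assumed values for illustration only.

```python
import numpy as np

# Hypothetical example (not from Depth Pro): back-projecting a metric
# depth map into a 3D point cloud using the pinhole camera model.
H, W = 4, 4                   # tiny depth map, for illustration
f = 2.0                       # focal length in pixels (assumed)
cx, cy = W / 2, H / 2         # principal point (assumed)

depth = np.full((H, W), 5.0)  # pretend every pixel is 5 metres away

# Pixel coordinate grids: u runs along columns, v along rows.
u, v = np.meshgrid(np.arange(W), np.arange(H))

# Pinhole back-projection: X = (u - cx) * Z / f, Y = (v - cy) * Z / f.
x = (u - cx) * depth / f
y = (v - cy) * depth / f
points = np.stack([x, y, depth], axis=-1)  # shape (H, W, 3)

print(points.shape)  # (4, 4, 3)
```

Because Depth Pro produces *metric* depth (in real-world units) rather than relative depth, the resulting points have physically meaningful coordinates, which is what makes the output directly usable for AR placement and 3D reconstruction.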


[Figure: How the Depth Pro AI model generates depth maps. Photo credit: Apple]

To develop the AI model, the researchers used an architecture based on the Vision Transformer (ViT). The ViT backbone operates at a resolution of 384 x 384, but the input and processing resolution was kept at 1536 x 1536, giving the AI model more room to capture fine detail.
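One way to reconcile the two resolutions is to split the high-resolution input into tiles that match the ViT's native size. The sketch below is a rough illustration of that idea, not Apple's implementation; the non-overlapping tiling scheme is an assumption made for simplicity.

```python
import numpy as np

# Hypothetical illustration (not Apple's code): splitting a
# 1536 x 1536 input into non-overlapping 384 x 384 tiles, the
# resolution a ViT backbone pretrained at 384 x 384 expects.
INPUT_RES, PATCH_RES = 1536, 384

image = np.zeros((INPUT_RES, INPUT_RES, 3), dtype=np.float32)

tiles = [
    image[r:r + PATCH_RES, c:c + PATCH_RES]
    for r in range(0, INPUT_RES, PATCH_RES)
    for c in range(0, INPUT_RES, PATCH_RES)
]

print(len(tiles))      # 16 tiles cover the full-resolution input
print(tiles[0].shape)  # (384, 384, 3)
```

Processing at 1536 x 1536 means the model sees 16 times as many pixels as a single 384 x 384 crop would provide, which is what allows it to preserve fine structures such as fur and whiskers in the predicted depth map.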

In a preprint version of the paper, currently available on arXiv, the researchers claim that the AI model can accurately generate depth maps of visually complex subjects such as a cage, or a furry cat’s body and whiskers. Generating a depth map is said to take less than a second. The open source model weights are currently hosted on GitHub, and interested individuals can run the model on a single GPU.
