Top 5 This Week

Related Posts

Apple releases Depth Pro, an AI model that rewrites the rules of 3D vision


Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More


Appleโ€™s AI research team has developed a new model that could significantly advance how machines perceive depth, potentially transforming industries ranging from augmented reality to autonomous vehicles.

The system, calledย Depth Pro, is able to generate detailed 3D depth maps from single 2D images in a fraction of a secondโ€”without relying on the camera data traditionally needed to make such predictions.

The technology, detailed in a research paper titledย โ€œDepth Pro: Sharp Monocular Metric Depth in Less Than a Second,โ€ย is a major leap forward in the field of monocular depth estimation, a process that uses just one image to infer depth.

This could have far-reaching applications across sectors where real-time spatial awareness is key. The modelโ€™s creators, led by Aleksei Bochkovskii and Vladlen Koltun, describeย Depth Proย as one of the fastest and most accurate systems of its kind.

A comparison of depth maps from Appleโ€™s Depth Pro, Marigold, Depth Anything v2, and Metric3D v2. Depth Pro excels in capturing fine details like fur and birdcage wires, producing sharp, high-resolution depth maps in just 0.3 seconds, outperforming other models in accuracy and detail. (credit: arxiv.org)

Monocular depth estimation has long been a challenging task, requiring either multiple images or metadata like focal lengths to accurately gauge depth.

Butย Depth Proย bypasses these requirements, producing high-resolution depth maps in just 0.3 seconds on a standard GPU. The model can create 2.25-megapixel maps with exceptional sharpness, capturing even minute details like hair and vegetation that are often overlooked by other methods.

โ€œThese characteristics are enabled by a number of technical contributions, including an efficient multi-scale vision transformer for dense prediction,โ€ the researchers explain in their paper. This architecture allows the model to process both the overall context of an image and its finer details simultaneouslyโ€”an enormous leap from slower, less precise models that came before it.

A comparison of depth maps from Appleโ€™s Depth Pro, Depth Anything v2, Marigold, and Metric3D v2. Depth Pro excels in capturing fine details like the deerโ€™s fur, windmill blades, and zebraโ€™s stripes, delivering sharp, high-resolution depth maps in 0.3 seconds. (credit: arxiv.org)

Metric depth, zero-shot learning

What truly setsย Depth Proย apart is its ability to estimate both relative and absolute depth, a capability called โ€œmetric depth.โ€

This means that the model can provide real-world measurements, which is essential for applications like augmented reality (AR), where virtual objects need to be placed in precise locations within physical spaces.

Andย Depth Proย doesnโ€™t require extensive training on domain-specific datasets to make accurate predictionsโ€”a feature known as โ€œzero-shot learning.โ€ This makes the model highly versatile. It can be applied to a wide range of images, without the need for the camera-specific data usually required in depth estimation models.

โ€œDepth Pro produces metric depth maps with absolute scale on arbitrary images โ€˜in the wildโ€™ without requiring metadata such as camera intrinsics,โ€ the authors explain. This flexibility opens up a world of possibilities, from enhancing AR experiences to improving autonomous vehiclesโ€™ ability to detect and navigate obstacles.

For those curious to experience Depth Pro firsthand, a live demo is available on the Hugging Face platform.

A comparison of depth estimation models across multiple datasets. Appleโ€™s Depth Pro ranks highest overall with an average rank of 2.5, outperforming models like Depth Anything v2 and Metric3D in accuracy across diverse scenarios. (credit: arxiv.org)

Real-world applications: From e-commerce to autonomous vehicles

This versatility has significant implications for various industries. In e-commerce, for example,ย Depth Proย could allow consumers to see how furniture fits in their home by simply pointing their phoneโ€™s camera at the room. In the automotive industry, the ability to generate real-time, high-resolution depth maps from a single camera could improve how self-driving cars perceive their environment, boosting navigation and safety.

โ€œThe method should ideally produce metric depth maps in this zero-shot regime to accurately reproduce object shapes, scene layouts, and absolute scales,โ€ the researchers write, emphasizing the modelโ€™s potential to reduce the time and cost associated with training more conventional AI models.

Tackling the challenges of depth estimation

One of the toughest challenges in depth estimation is handling what are known as โ€œflying pixelsโ€โ€”pixels that appear to float in mid-air due to errors in depth mapping.ย Depth Proย tackles this issue head-on, making it particularly effective for applications like 3D reconstruction and virtual environments, where accuracy is paramount.

Additionally,ย Depth Proย excels in boundary tracing, outperforming previous models in sharply delineating objects and their edges. The researchers claim it surpasses other systems โ€œby a multiplicative factor in boundary accuracy,โ€ which is key for applications that require precise object segmentation, such as image matting and medical imaging.

Open-source and ready to scale

In a move that could accelerate its adoption, Apple has madeย Depth Proย open-source. The code, along with pre-trained model weights, is available on GitHub, allowing developers and researchers to experiment with and further refine the technology. The repository includes everything from the modelโ€™s architecture to pretrained checkpoints, making it easy for others to build on Appleโ€™s work.

The research team is also encouraging further exploration ofย Depth Proโ€™s potential in fields like robotics, manufacturing, and healthcare. โ€œWe release code and weights atย https://github.com/apple/ml-depth-pro,โ€ย the authors write, signaling this as just the beginning for the model.

Whatโ€™s next for AI depth perception

As artificial intelligence continues to push the boundaries of whatโ€™s possible,ย Depth Proย sets a new standard in speed and accuracy for monocular depth estimation. Its ability to generate high-quality, real-time depth maps from a single image could have wide-ranging effects across industries that rely on spatial awareness.

In a world where AI is increasingly central to decision-making and product development,ย Depth Proย exemplifies how cutting-edge research can translate into practical, real-world solutions. Whether itโ€™s improving how machines perceive their surroundings or enhancing consumer experiences, the potential uses forย Depth Proย are broad and varied.

As the researchers conclude, โ€œDepth Pro dramatically outperforms all prior work in sharp delineation of object boundaries, including fine structures such as hair, fur, and vegetation.โ€ With its open-source release,ย Depth Proย could soon become integral to industries ranging from autonomous driving to augmented realityโ€”transforming how machines and people interact with 3D environments.

#Apple #releases #Depth #Pro #model #rewrites #rules #vision
source: https://venturebeat.com/ai/apple-releases-depth-pro-an-ai-model-that-rewrites-the-rules-of-3d-vision/

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Popular Articles