Zoo3D: Zero-Shot 3D Object Detection at Scene Level

arXiv — cs.CV • Wednesday, November 26, 2025 at 5:00:00 AM

Positive • Artificial Intelligence

Zoo3D is presented as the first training-free 3D object detection framework. It constructs 3D bounding boxes by graph clustering of 2D instance masks. Because it requires no training, it can recognize previously unseen objects.
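As a rough illustration of what graph clustering of instance masks can look like, the sketch below assumes each 2D mask has already been back-projected into a set of 3D points. It links masks whose voxelized points overlap, takes connected components as objects, and fits an axis-aligned box to each. The function names, the voxel-overlap edge criterion, and the threshold values are all hypothetical choices made for this example; they are not taken from the Zoo3D paper and should not be read as its method.

```python
import numpy as np


def voxel_keys(points, voxel_size):
    """Quantize an (N, 3) array of 3D points into a set of integer voxel coordinates."""
    return set(map(tuple, np.floor(points / voxel_size).astype(int)))


def overlap_ratio(a, b):
    """Number of shared voxels divided by the size of the smaller voxel set."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def cluster_masks(mask_points, voxel_size=0.5, threshold=0.5):
    """Group masks into objects.

    Masks are graph nodes; an edge joins two masks whose voxel overlap
    reaches `threshold` (an assumed criterion). Connected components are
    found with union-find and returned as lists of mask indices.
    """
    voxels = [voxel_keys(p, voxel_size) for p in mask_points]
    parent = list(range(len(voxels)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(voxels)):
        for j in range(i + 1, len(voxels)):
            if overlap_ratio(voxels[i], voxels[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(voxels)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def boxes_from_clusters(mask_points, clusters):
    """Return one axis-aligned (min_corner, max_corner) box per cluster."""
    boxes = []
    for members in clusters:
        pts = np.concatenate([mask_points[i] for i in members], axis=0)
        boxes.append((pts.min(axis=0), pts.max(axis=0)))
    return boxes


# Toy input: masks 0 and 1 see the same object from two views, mask 2 sees another.
demo_masks = [
    np.array([[0.1, 0.1, 0.1], [0.6, 0.1, 0.1], [0.6, 0.6, 0.1]]),
    np.array([[0.2, 0.2, 0.2], [0.7, 0.2, 0.2], [1.2, 0.2, 0.2]]),
    np.array([[5.1, 5.1, 5.1], [5.6, 5.1, 5.1]]),
]
demo_clusters = cluster_masks(demo_masks)
demo_boxes = boxes_from_clusters(demo_masks, demo_clusters)
```

In this toy input the first two masks share two of their three voxels, so they merge into one object, while the third mask stays separate. A real pipeline would also need camera poses and depth to lift the masks into 3D, which this sketch takes as given.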
Zoo3D addresses a limitation of existing closed-set methods, which can only detect the object classes they were trained on. Removing that constraint makes detection more practical in real-world environments, where diverse objects absent from any training set are common.
The work fits a broader effort in the AI community to improve object detection and segmentation, particularly in dynamic environments. Related frameworks tackle challenges such as out-of-distribution detection and class imbalance, pointing to a trend toward more adaptable and robust systems for complex real-world scenes.

— via World Pulse Now AI Editorial System

Continue Reading

arXiv — cs.CV • a day ago

Proxy-Free Gaussian Splats Deformation with Splat-Based Surface Estimation

Positive • Artificial Intelligence

A new method called SpLap has been introduced for proxy-free deformation of Gaussian splats, utilizing a surface-aware splat graph to enhance the quality of deformations while minimizing computational overhead. This approach overcomes limitations of traditional methods that rely on proxies, which can be of varying quality and add complexity to the deformation process.

via arXiv — cs.CV

arXiv — cs.LG • a day ago

Actionable and diverse counterfactual explanations incorporating domain knowledge and causal constraints

Positive • Artificial Intelligence

A new method for generating Diverse, Actionable, and kNowledge-Constrained Explanations (DANCE) has been proposed to enhance the interpretability of machine learning models by identifying minimal changes needed to achieve desired outcomes. This method addresses the limitations of existing approaches by incorporating feature dependencies and causal constraints, ensuring that the generated counterfactuals are both plausible and actionable.

via arXiv — cs.LG

arXiv — cs.LG • a day ago

A Comprehensive Survey on Long Context Language Modeling

Positive • Artificial Intelligence

A comprehensive survey on Long Context Language Models (LCLMs) has been published, highlighting the importance of efficiently processing long textual inputs in Natural Language Processing. The survey covers effective strategies for obtaining, training, and evaluating LCLMs, addressing the increasing demand for handling extensive documents and dialogues in various applications.

via arXiv — cs.LG

arXiv — cs.CV • a day ago

From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction

Positive • Artificial Intelligence

A new paradigm called Policy World Model (PWM) has been introduced, integrating world modeling and trajectory planning into a unified architecture. This model enhances the planning capabilities of autonomous systems by utilizing learned world knowledge through an action-free future state forecasting scheme, enabling more reliable planning performance through collaborative state-action prediction.

via arXiv — cs.CV

arXiv — cs.CV • a day ago

OceanGym: A Benchmark Environment for Underwater Embodied Agents

Positive • Artificial Intelligence

OceanGym has been introduced as the first comprehensive benchmark for underwater embodied agents, aimed at enhancing AI capabilities in challenging oceanic environments characterized by low visibility and dynamic currents. This benchmark includes eight realistic task domains and utilizes Multi-modal Large Language Models (MLLMs) to integrate perception, memory, and decision-making processes.

via arXiv — cs.CV

arXiv — cs.LG • a day ago

MXtalTools: A Toolkit for Machine Learning on Molecular Crystals

Positive • Artificial Intelligence

MXtalTools has been introduced as a flexible Python package designed for data-driven modeling of molecular crystals, enhancing machine learning applications in the molecular solid state. The toolkit includes utilities for dataset curation, model training, crystal parameterization, and high-throughput modeling using CUDA acceleration.

via arXiv — cs.LG

arXiv — cs.LG • a day ago

FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement

Positive • Artificial Intelligence

FunReason has been introduced as a novel framework aimed at enhancing the function calling capabilities of large language models (LLMs) through an automated data refinement strategy and a Self-Refinement Multiscale Loss (SRML) approach. This development addresses the challenges of integrating reasoning processes with accurate function execution, which has been a significant hurdle in optimizing LLM performance in real-world applications.

via arXiv — cs.LG

arXiv — cs.LG • a day ago

SafeFix: Targeted Model Repair via Controlled Image Generation

Positive • Artificial Intelligence

A new model repair module named SafeFix has been introduced to address systematic errors in deep learning models for visual recognition, particularly those stemming from underrepresented semantic subpopulations. This module utilizes a conditional text-to-image model to generate targeted images for failure cases, enhancing the model's performance by ensuring semantic consistency with the original data distribution.

via arXiv — cs.LG