Controllable Long-term Motion Generation with Extended Joint Targets

arXiv (cs.CV) • Friday, December 5, 2025 at 5:00:00 AM

Positive • Artificial Intelligence

COMET is a new framework for generating stable, controllable character motion in real time. It targets two persistent problems in computer animation: fine-grained control and motion degradation over long sequences. The autoregressive model uses a Transformer-based conditional VAE to give precise control over user-specified joints, which supports tasks such as goal-reaching and in-betweening.
COMET enables robust long-horizon synthesis and real-time style transfer, which could benefit interactive applications in gaming and virtual environments. Its reference-guided feedback mechanism prevents error accumulation, keeping the generated motion temporally stable over long sequences.
The work fits a broader trend in which Transformer-based frameworks are applied across domains, including computer-aided design and multimodal understanding. Models such as COMET also reflect a growing emphasis on real-time processing and on operating effectively in resource-constrained environments.

— via World Pulse Now AI Editorial System

Continue Reading

LaFiTe: A Generative Latent Field for 3D Native Texturing

arXiv (cs.CV) • 18 hours ago

Positive • Artificial Intelligence

LaFiTe is a new framework for generating high-fidelity, seamless textures directly on 3D surfaces, addressing the limitations of traditional UV-based and multi-view projection methods. It uses a variational autoencoder (VAE) to encode complex surface appearances into a structured latent space, which is then decoded into a continuous color field.

Self-Paced and Self-Corrective Masked Prediction for Movie Trailer Generation

arXiv (cs.CV) • 18 hours ago

Positive • Artificial Intelligence

SSMP is a new method for movie trailer generation that uses self-paced and self-corrective masked prediction with bi-directional contextual modeling to improve trailer quality. It addresses a limitation of traditional selection-then-ranking methods, which often propagate errors during trailer creation.

Tokenizing Buildings: A Transformer for Layout Synthesis

arXiv (cs.CV) • 18 hours ago

Positive • Artificial Intelligence

Small Building Model (SBM) is a new Transformer-based architecture for layout synthesis in Building Information Modeling (BIM) scenes. It addresses the challenge of tokenizing buildings by integrating diverse architectural features into sequences while preserving their compositional structure, using a sparse attribute-feature matrix to represent room properties.

Sliding-Window Merging for Compacting Patch-Redundant Layers in LLMs

arXiv (cs.CV) • 18 hours ago

Positive • Artificial Intelligence

Sliding-Window Merging (SWM) is a new method for making large language models (LLMs) more efficient by compacting patch-redundant layers. It identifies consecutive layers with similar function and merges them, maintaining performance while simplifying the model architecture. Extensive experiments indicate that SWM outperforms traditional pruning methods in zero-shot inference performance.

Reconstructing KV Caches with Cross-layer Fusion For Enhanced Transformers

arXiv (cs.CL) • 2 days ago

Positive • Artificial Intelligence

Researchers have introduced FusedKV, an approach that reconstructs key-value (KV) caches in transformer models by fusing information from bottom and middle layers. The method addresses the large memory demands of KV caches during long-sequence processing, a bottleneck in transformer performance. Preliminary findings indicate that this fusion retains essential positional information without the computational burden of rotary embeddings.

SimFlow: Simplified and End-to-End Training of Latent Normalizing Flows

arXiv (cs.CV) • 2 days ago

Positive • Artificial Intelligence

SimFlow introduces a simplified, end-to-end training method for latent normalizing flows (NFs), addressing limitations of previous models that relied on complex noise addition and frozen VAE encoders. Fixing the variance to a constant improves the encoder's output distribution and stabilizes training, leading to better image reconstruction and generation quality.

DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes

arXiv (cs.CV) • 2 days ago

Positive • Artificial Intelligence

DynamicCity is a 4D occupancy generation framework for urban scenes that models the dynamic nature of real-world driving environments. It uses a VAE and a novel Projection Module to create high-quality dynamic 4D scenes, significantly improving fitting quality and reconstruction accuracy.

Density-Informed VAE (DiVAE): Reliable Log-Prior Probability via Density Alignment Regularization

arXiv (cs.LG) • 2 days ago

Positive • Artificial Intelligence

Density-Informed VAE (DiVAE) is a new method that extends the variational autoencoder (VAE) framework by aligning the log-prior probability with data-derived log-density estimates. This allows better allocation of posterior mass relative to data-space density and improves prior coverage, particularly on synthetic datasets and MNIST.
