Towards Synthesizing High-Dimensional Tabular Data with Limited Samples
Positive | Artificial Intelligence
CtrTab is a new approach to synthesizing high-dimensional tabular data that targets a weakness of existing diffusion-based models. As data dimensionality increases, particularly when few training samples are available, these models tend to degrade and can perform worse than simpler, non-diffusion-based approaches. CtrTab addresses this by feeding perturbed ground-truth samples to the model as auxiliary inputs during training, which stabilizes learning and makes the model more sensitive to control signals. According to the reported experiments, CtrTab outperforms state-of-the-art models by a wide margin, achieving over 90% accuracy on average. The result matters for applications in artificial intelligence and data science, where high-dimensional data is common but difficult to synthesize well.
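To make the idea of "perturbed ground-truth samples as auxiliary inputs" concrete, here is a minimal sketch of how such training pairs might be built. It is an illustration under stated assumptions, not CtrTab's implementation: the function names `perturb_rows` and `make_training_pairs` are hypothetical, and the Gaussian noise and its scale are placeholders, since the summary does not say which noise distribution or magnitude is used.

```python
import random


def perturb_rows(rows, noise_scale=0.1, seed=0):
    """Return a noisy copy of each numeric row, to act as an auxiliary (control) input.

    Assumption: Gaussian noise with a fixed scale. The summary does not
    specify the noise distribution or scale that CtrTab uses.
    """
    rng = random.Random(seed)
    return [[value + rng.gauss(0.0, noise_scale) for value in row] for row in rows]


def make_training_pairs(rows, noise_scale=0.1, seed=0):
    """Pair each ground-truth row (the target) with its perturbed copy (the control)."""
    controls = perturb_rows(rows, noise_scale, seed)
    return list(zip(rows, controls))


# Two toy rows standing in for a (much wider) numeric table.
rows = [[0.2, 1.5, -0.7], [1.1, 0.0, 0.4]]
pairs = make_training_pairs(rows)
```

Each pair holds the original row as the target and its noisy copy as the extra input a generative model could be conditioned on during training. The model itself and its training loop are out of scope for this sketch.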
— via World Pulse Now AI Editorial System