
Abstract
Reconstructing 3D garments from images is emerging as a practical approach for creating high-quality digital clothing assets. Prior approaches either fine-tune large vision–language models (VLMs) on synthetic garment datasets to recover garments expressed in a domain-specific language (DSL), or learn predictors that regress panel geometry and stitch structure from images or 3D scans. However, these methods often struggle to generalize to in-the-wild images, fail to capture the full expressivity of modern parametric garment representations such as GarmentCode, and are typically restricted to single-layer outfits or predefined templates. To address these challenges, we introduce NGL-Prompter, a training-free pipeline that reconstructs a valid GarmentCode sewing pattern from a single image. We observe that, while VLMs are effective at describing garments in natural language, directly prompting them to estimate GarmentCode parameters yields poor results. To bridge this gap, we propose NGL (Natural Garment Language), a novel intermediate DSL that simplifies GarmentCode into a representation that language models can interpret reliably while preserving its expressive power. Leveraging this language, NGL-Prompter queries large VLMs to extract accurate, structured garment parameters, which are then deterministically mapped to valid GarmentCode. We evaluate our method on the Dress4D and CloSe datasets, as well as on a newly collected dataset of approximately 5,000 in-the-wild fashion images. Our approach achieves state-of-the-art performance on standard geometry metrics and is strongly preferred over existing baselines in both human and GPT-based perceptual evaluations. Furthermore, NGL-Prompter recovers multi-layer outfits from a single image, whereas most competing methods are limited to single-layer garments. These results demonstrate that accurate sewing pattern reconstruction is possible without costly model training.