Abstract Typical AI downscaling models ingest large data sets, obscuring physical insight and straining computational resources. Guided by meteorological theory, a compact yet physically informative input set alleviates these limitations. Analyses of TaiwanVVM simulations reveal that the synoptic‐scale upstream sounding largely controls the evolution of meso‐γ‐scale circulations over complex terrain. Accordingly, AI models are trained to reconstruct 2 km‐resolution near‐surface winds from a single upstream sounding. A sounding‐to‐image variational autoencoder reproduces the diurnal circulation with one‐third the mean square error (MSE) of previous architectures. Jacobian‐based sensitivity analysis quantifies the influence of each sounding element: when estimating winds at 100 m, the model depends primarily on wind speed near 560 m, lower‐level wind direction, and time of day, confirming physically consistent learning. The lean input also enables a transformer‐based architecture, which further reduces MSE and improves generalization, especially over rugged mountainous terrain. These results demonstrate that compact, physically grounded inputs yield interpretable AI downscaling.
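As a rough illustration of what a Jacobian‐based sensitivity analysis computes, the sketch below differentiates a model's outputs with respect to each input element and ranks the inputs by the mean absolute derivative. It is not the study's implementation: `toy_downscaler`, its coefficients, the three input names, and the central‐difference approximation are all assumptions made for this example, standing in for a trained network and automatic differentiation.

```python
import math

def toy_downscaler(sounding):
    """Hypothetical stand-in for a trained model (not the paper's network).

    Maps [wind speed (m/s), wind direction (deg), hour of day] to two
    near-surface wind components at a single grid point.
    """
    speed, direction_deg, hour = sounding
    theta = math.radians(direction_deg)
    diurnal = math.sin(2.0 * math.pi * hour / 24.0)  # assumed daily cycle
    u = 0.8 * speed * math.cos(theta) + 0.5 * diurnal
    v = 0.8 * speed * math.sin(theta) + 0.1 * diurnal
    return [u, v]

def jacobian(f, x, eps=1e-5):
    """Central-difference estimate of J[i][j] = d f_i / d x_j at x."""
    n_out = len(f(x))
    J = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        hi, lo = list(x), list(x)
        hi[j] += eps
        lo[j] -= eps
        f_hi, f_lo = f(hi), f(lo)
        for i in range(n_out):
            J[i][j] = (f_hi[i] - f_lo[i]) / (2.0 * eps)
    return J

def input_sensitivity(J):
    """Mean absolute Jacobian entry per input, averaged over all outputs."""
    n_out = len(J)
    return [sum(abs(J[i][j]) for i in range(n_out)) / n_out
            for j in range(len(J[0]))]

# Illustrative input names and values; none of these come from the study.
names = ["wind_speed", "wind_direction_deg", "hour"]
x0 = [5.0, 90.0, 0.0]

J = jacobian(toy_downscaler, x0)
scores = dict(zip(names, input_sensitivity(J)))
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.4f}")
```

Raw Jacobian entries carry the units of each input, so scores for a speed in m/s and a direction in degrees are not directly comparable. A real analysis would normally standardize the inputs or scale each derivative by a typical input range before ranking.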