Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Abstract: Solving complex visual tasks such as “Who invented the musical instrument on the right?” involves a composition of skills: understanding space, recognizing instruments, and also retrieving ...