Warning: This post contains spoilers for all of the Marvel Cinematic Universe films. Captain Marvel chases after the Tesseract, the Guardians of the Galaxy steal an orb, Thor tries to capture the Aether.
Referring Image Segmentation (RIS) is a fundamental vision-language task that outputs object masks based on text descriptions. Many works have made considerable progress on RIS, including ...