Abstract: To address the challenge of feature fusion in multimodal audio-visual question answering tasks, a question-guided dual-path cross-attention enhancement module is proposed. The main ...