Current Large Audio Language Models largely transcribe rather than listen

6 points by crmne