(RTTNews) - Chinese tech giant Alibaba Cloud on Wednesday unveiled its latest vision-language model, Qwen2.5-VL, which it says delivers a significant improvement over its predecessor, Qwen2-VL.
The open-source, multimodal model is offered in three sizes (3 billion, 7 billion and 72 billion parameters) and includes both base and instruction-tuned versions.
"Qwen 2.5-Max outperforms ... almost across the board GPT-4o, DeepSeek-V3 and Llama-3.1-405B," Alibaba's cloud unit said in an announcement posted on its official WeChat account, referring to OpenAI and Meta's most advanced open-source AI models.
The flagship model, Qwen2.5-VL-72B-Instruct, is now accessible through the Qwen Chat platform, while the entire Qwen2.5-VL series is available on Hugging Face and Alibaba's open-source community ModelScope.
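For developers, the released checkpoints can be pulled like any other Hugging Face model. Below is a minimal sketch, assuming the transformers library's Qwen2_5_VLForConditionalGeneration class and the qwen-vl-utils helper package referenced in the public model cards; the image URL and question are placeholders, not part of Alibaba's announcement.

```python
# Minimal sketch: load Qwen2.5-VL from Hugging Face and ask about an image.
# Class and package names follow the public model cards (recent transformers
# plus the qwen-vl-utils helper); illustrative, not an official example.
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"  # 3B and 72B variants follow the same pattern
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "https://example.com/chart.png"},  # placeholder URL
        {"type": "text", "text": "Summarize what this chart shows."},
    ],
}]

# Render the chat template, extract the visual inputs, and generate an answer.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```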
Alibaba claims that Qwen2.5-VL demonstrates strong multimodal capabilities, excelling at visual comprehension of text, charts, diagrams, graphics and layouts within images. The model can also understand videos longer than an hour, answer questions about their content, and pinpoint relevant segments down to the second.
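Under the same assumptions as the sketch above, the model cards suggest video is handled through the same chat interface. A hypothetical timestamp query, reusing the model and processor already loaded, might look like this; the file path and question are placeholders.

```python
# Sketch of a video query, reusing `model` and `processor` from the snippet
# above. The "video" content type follows the public model cards; the file
# path and question are placeholders, not from Alibaba's announcement.
from qwen_vl_utils import process_vision_info

messages = [{
    "role": "user",
    "content": [
        {"type": "video", "video": "file:///path/to/lecture.mp4"},  # placeholder path
        {"type": "text",
         "text": "When does the speaker introduce the main topic? Answer with a timestamp."},
    ],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```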