Heaptalk, Jakarta — DeepSeek, a Chinese company, announced its latest multimodal large language model (MLLM), Janus Pro, to challenge OpenAI’s DALL-E 3 (01/27). Multimodal LLMs are trained to understand and generate content across various formats, including text, images, audio, and video.
The model is available at scales ranging from 1 billion to 7 billion parameters. Parameter count roughly corresponds to a model’s problem-solving ability, and models with more parameters generally perform better than those with fewer. Janus Pro is licensed under the MIT License, meaning it can be used commercially without restriction.
“Janus Pro is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate pathways, while still utilizing a single, unified transformer architecture for processing,” DeepSeek stated on Hugging Face (01/27).
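To make the “decoupled visual encoding, single transformer” idea from the quote concrete, here is a minimal PyTorch sketch. All module names, layer sizes, and vocabulary sizes below are hypothetical illustrations, not DeepSeek’s actual implementation: it only shows the shape of the design, with two separate visual pathways feeding one shared transformer.

```python
import torch
import torch.nn as nn

DIM = 512  # hypothetical hidden size, chosen for illustration only

class JanusStyleSketch(nn.Module):
    """Conceptual sketch of decoupled visual encoding with a shared transformer."""

    def __init__(self):
        super().__init__()
        # Decoupled pathways: each task gets its own visual encoder.
        self.understanding_encoder = nn.Linear(768, DIM)    # e.g. continuous vision features in
        self.generation_encoder = nn.Embedding(16384, DIM)  # e.g. discrete image-token ids in
        # One unified transformer processes the output of both pathways.
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=8, batch_first=True)
        self.shared_transformer = nn.TransformerEncoder(layer, num_layers=2)

    def understand(self, vision_features):
        # Image-understanding path: encode features, then run the shared backbone.
        return self.shared_transformer(self.understanding_encoder(vision_features))

    def generate(self, image_token_ids):
        # Image-generation path: embed token ids, then run the same backbone.
        return self.shared_transformer(self.generation_encoder(image_token_ids))

model = JanusStyleSketch()
print(model.understand(torch.randn(1, 576, 768)).shape)         # torch.Size([1, 576, 512])
print(model.generate(torch.randint(0, 16384, (1, 576))).shape)  # torch.Size([1, 576, 512])
```

Decoupling the encoders lets each pathway specialize for its task, while the shared transformer keeps a single set of weights serving both understanding and generation.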
Analyzing & creating new images

The company claimed that this AI model surpasses previous unified models and matches or exceeds the performance of task-specific models. Its simplicity, flexibility, and effectiveness make it a strong candidate for next-generation unified multimodal models.
Furthermore, Janus Pro can both analyze and create new images. It is built on DeepSeek-LLM-1.5b-base and DeepSeek-LLM-7b-base, and uses SigLIP-L as the vision encoder for multimodal understanding, supporting 384 x 384 image input. For image generation, Janus Pro uses a dedicated image tokenizer with a downsample rate of 16.
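As a back-of-the-envelope check of those numbers, a 384 x 384 input with a downsample rate of 16 yields a 24 x 24 grid of discrete image tokens, i.e. 576 tokens per image:

```python
# Derived only from the figures reported above (384 x 384 input, downsample rate 16).
image_size = 384
downsample_rate = 16

grid = image_size // downsample_rate  # 24 token positions per side
tokens_per_image = grid * grid        # 576 discrete image tokens

print(f"{grid} x {grid} grid -> {tokens_per_image} tokens per image")
```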
DeepSeek, a Chinese AI lab primarily funded by the quantitative trading firm High-Flyer Capital Management, broke into the mainstream consciousness this week after its chatbot app rose to the top of the Apple App Store charts. Founded in 2023, the company aims to make artificial general intelligence (AGI) a reality.