🏆 FysicsWorld Leaderboard
Evaluation results for OmniLLM / MLLM models.
Qwen3-Omni-30B-A3B | MLLM | 62.64 | 69.52 | 63.11 | 73.78 | 68.37 | 68.13 | 55.48 | 72.15 | 51.69 | 44.55 | 58.53 | 45.71 | 61.74 | 58.88 |
📊 Overall Score Definition
To facilitate clearer and more consistent comparison across models, we introduce an Overall score for each leaderboard track.
1. OmniLLM / MLLM
The Overall score is computed as the arithmetic mean of all reported task-specific scores.
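A minimal sketch of this rule. It assumes the task scores are already on a common scale and are available as a plain mapping. The task names and the helper `omni_overall` are hypothetical and are not part of the leaderboard's code.

```python
from statistics import fmean


def omni_overall(task_scores: dict[str, float]) -> float:
    """Hypothetical helper: the OmniLLM / MLLM Overall score is the
    arithmetic mean of all reported task-specific scores."""
    if not task_scores:
        raise ValueError("at least one task score is required")
    return fmean(task_scores.values())


# Hypothetical task names and scores, for illustration only.
scores = {"task_a": 60.0, "task_b": 70.0, "task_c": 80.0}
print(omni_overall(scores))  # 70.0
```

Every task has equal weight, so a track that reports more tasks in one category will weight that category more heavily in the Overall score.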
2. Image Generation
The evaluation involves metrics defined on different numerical scales. WIScore is used for image generation, while VIEScore (averaged over three dimensions) is used for image editing.
The Overall score is defined as:
$$ \text{Overall}=\frac{(\text{WIScore}\times 10)+\frac{1}{3}\sum_{i=1}^{3}\text{VIEScore}_i}{2} $$
Multiplying WIScore by 10 places it on the same numerical scale as the averaged VIEScore. Dividing the sum by 2 then gives image generation and image editing equal weight in the Overall score.
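The formula can be sketched as follows. The ×10 factor suggests WIScore is reported on a scale one tenth that of the averaged VIEScore (for example 0 to 1 versus 0 to 10), but this is an assumption. The helper name `image_overall` and the input values are hypothetical.

```python
def image_overall(wiscore: float, viescores: tuple[float, float, float]) -> float:
    """Hypothetical helper: Overall = ((WIScore * 10) + mean of the three
    VIEScore dimensions) / 2.

    Assumes WIScore * 10 and the averaged VIEScore share one scale."""
    if len(viescores) != 3:
        raise ValueError("expected exactly three VIEScore dimensions")
    vie_mean = sum(viescores) / 3          # average over the three dimensions
    return (wiscore * 10 + vie_mean) / 2   # equal weight: generation vs. editing


# Hypothetical scores, for illustration only.
print(image_overall(0.5, (7.0, 8.0, 6.0)))  # 6.0
```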
3. Video Generation
The Overall score is calculated as the arithmetic mean of all evaluated dimensions, including imaging quality, aesthetics, motion, and temporal consistency.
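A minimal sketch of the video rule follows. The text says the evaluated dimensions *include* the four listed, so the full set may be larger. The helper therefore accepts any dimensions and does not hard-code a list. The dimension keys, score values, and the name `video_overall` are hypothetical.

```python
from statistics import fmean


def video_overall(dimension_scores: dict[str, float]) -> float:
    """Hypothetical helper: the Video Generation Overall score is the
    arithmetic mean of all evaluated dimensions."""
    if not dimension_scores:
        raise ValueError("at least one dimension score is required")
    return fmean(dimension_scores.values())


# Hypothetical dimension scores, for illustration only.
dims = {
    "imaging_quality": 70.0,
    "aesthetics": 60.0,
    "motion": 90.0,
    "temporal_consistency": 80.0,
}
print(video_overall(dims))  # 75.0
```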