🏆 FysicsWorld Leaderboard

🏠 Project Page 📖 Paper 🤗 Dataset 👾 ModelScope

We introduce FysicsWorld, the first unified full-modality benchmark that supports bidirectional input-output across image, video, audio, and text, enabling comprehensive any-to-any evaluation across understanding, generation, and reasoning. Its tasks range from uni-modal perception to fusion-dependent reasoning under strong cross-modal coupling, which lets us diagnose both the limitations and the emerging strengths of modern multimodal and omni-modal architectures.

Evaluation results for OmniLLM / MLLM models.


Gemini-2.5-Pro	MLLM	62.64	69.52	63.11	73.78	68.37	-	-	72.15	51.69	44.55	-	-	61.74	58.88
GPT-5	MLLM	61.88	75.71	68.89	68.89	70.23	68.13	55.48	70.47	47.83	47.73	58.53	45.71	65.44	61.42
Qwen3-Omni-30B-A3B	Omni	61.52	72.86	67.56	72.89	66.05	66.57	58.84	70.47	53.62	45.45	57.73	41.90	65.44	60.41
Ming-lite-Omni-1.5	Omni	55.02	60.95	60.44	67.56	62.33	61.15	52.70	60.40	44.44	37.73	59.26	43.81	52.68	51.78
Stream-Omni	Omni	54.00	53.33	-	58.22	65.12	52.82	40.15	54.36	-	-	-	-	-	-
Qwen2.5-Omni-7B	Omni	53.26	60.95	61.78	66.67	60.47	59.80	37.05	60.40	45.89	38.64	50.38	39.05	60.00	51.27
VITA-1.5	Omni	47.81	52.86	49.33	55.56	55.81	52.30	28.91	54.36	45.41	35.45	55.62	36.67	51.01	48.22
Baichuan-Omni-1.5	Omni	47.53	56.19	57.33	60.00	54.88	55.46	39.23	52.35	37.69	30.00	50.68	32.38	49.00	42.64
MiniCPM-o 2.6	Omni	46.46	57.62	48.00	60.00	57.21	54.70	32.50	53.02	35.27	32.73	53.41	35.81	43.62	40.10

📊 Overall Score Definition

To facilitate clearer and more consistent comparison across models, we introduce an Overall score for each leaderboard track.

1. OmniLLM / MLLM
The Overall score is computed as the arithmetic mean of all reported task-specific scores. Entries marked "-" are not reported and are excluded from the mean.
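
As a quick check, the mean can be recomputed from the table rows. The sketch below is illustrative only: the helper name `overall_score` is hypothetical, and it assumes the first numeric column of the table is the Overall score and that "-" entries are simply left out of the mean.

```python
def overall_score(task_scores):
    """Arithmetic mean of the reported scores; entries marked "-" are skipped."""
    reported = [float(s) for s in task_scores if s != "-"]
    if not reported:
        raise ValueError("no reported scores")
    return sum(reported) / len(reported)


# Task-specific scores copied from the GPT-5 and Gemini-2.5-Pro rows of the table
# (every column after the Overall column).
gpt5 = "75.71 68.89 68.89 70.23 68.13 55.48 70.47 47.83 47.73 58.53 45.71 65.44 61.42".split()
gemini = "69.52 63.11 73.78 68.37 - - 72.15 51.69 44.55 - - 61.74 58.88".split()

print(f"{overall_score(gpt5):.2f}")    # 61.88
print(f"{overall_score(gemini):.2f}")  # 62.64
```

Both values match the Overall column for these two rows, which suggests that unreported tasks do not count as zeros.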

2. Image Generation
The evaluation involves metrics defined on different numerical scales. WIScore is used for image generation, while VIEScore (averaged over three dimensions) is used for image editing.
The Overall score is defined as:

$$ \text{Overall}=\frac{(\text{WIScore}\times 10)+\left(\frac{\sum \text{VIEScore}}{3}\right)}{2} $$

This normalization (scaling WIScore by 10 before averaging it with the mean VIEScore) ensures a balanced contribution from both image generation and image editing performance.
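
The formula can be sketched in a few lines of Python. The function name `image_overall` and the input values are hypothetical and chosen only to show the arithmetic. The sketch also assumes that the three VIEScore dimensions are passed as a sequence of three numbers.

```python
def image_overall(wiscore, viescores):
    """Overall = ((WIScore * 10) + mean of the three VIEScore dimensions) / 2."""
    if len(viescores) != 3:
        raise ValueError("expected exactly three VIEScore dimensions")
    vie_mean = sum(viescores) / 3
    return (wiscore * 10 + vie_mean) / 2


# Hypothetical inputs, not taken from the leaderboard:
# WIScore = 0.5, VIEScore dimensions = 6.0, 7.0, 8.0
# (0.5 * 10 + 7.0) / 2 = 6.0
print(image_overall(0.5, (6.0, 7.0, 8.0)))  # 6.0
```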

3. Video Generation
The Overall score is calculated as the arithmetic mean of all evaluated dimensions, including imaging quality, aesthetics, motion, and temporal consistency.