Model Performance Benchmarking in LLM

Xiaomi stuns with new MiMo-V2-Pro LLM nearing GPT-5.2, Opus 4.6 performance at a fraction of the cost

MiMo-V2-Pro utilizes a 7:1 hybrid ratio (increased from 5:1 in the Flash version) to manage its massive 1M-token context window.

TMCnet

Fractal Introduces LLM Studio to Bring Enterprise-Grade GenAI Customization with NVIDIA NeMo and NVIDIA NIM Microservices

New enterprise workbench helps organizations design, build, evaluate, and operate domain-specific language models using ...

Microsoft

CTI-REALM: A new benchmark for end-to-end detection rule generation with AI agents

CTI-REALM is Microsoft’s open-source benchmark that evaluates AI agents on real-world detection engineering. It measures whether an agent can take cyber threat intelligence (CTI) and produce validated ...

SDxCentral

Nvidia sets benchmarking performance records with its H200 and TensorRT-LLM software

Nvidia has set new MLPerf performance benchmarking records on its H200 Tensor Core GPU and TensorRT-LLM software. MLPerf Inference is a benchmarking suite that measures inference performance across ...

New MiniMax M2.7 proprietary AI model is 'self-evolving' and can perform 30-50% of reinforcement learning research workflow

In the last few years, Chinese AI startup MiniMax has become one of the most exciting in the crowded global AI marketplace, ...

Yahoo

Show inaccessible results

Xiaomi stuns with new MiMo-V2-Pro LLM nearing GPT-5.2, Opus 4.6 performance at a fraction of the cost

Fractal Introduces LLM Studio to Bring Enterprise-Grade GenAI Customization with NVIDIA NeMo and NVIDIA NIM Microservices

CTI-REALM: A new benchmark for end-to-end detection rule generation with AI agents

Nvidia sets benchmarking performance records with its H200 and TensorRT-LLM software

New MiniMax M2.7 proprietary AI model is 'self-evolving' and can perform 30-50% of reinforcement learning research workflow

AI benchmarking platform is helping top companies rig their model performances, study claims

Benchmark and Evaluation Framework For Characterizing LLM Performance In Formal Verification (UC Berkeley, Nvidia)

Google intros benchmark of AI models for Android development

Fractal launches LLM Studio powered by NVIDIA AI infrastructure

Top AI models underperform in languages other than English