MiMo-V2-Pro uses a 7:1 hybrid ratio (up from 5:1 in the Flash version) to manage its 1M-token context window.
A new enterprise workbench helps organizations design, build, evaluate, and operate domain-specific language models using ...
CTI-REALM is Microsoft’s open-source benchmark that evaluates AI agents on real-world detection engineering. It measures whether an agent can take cyber threat intelligence (CTI) and produce validated ...
Nvidia has set new MLPerf performance benchmarking records on its H200 Tensor Core GPU and TensorRT-LLM software. MLPerf Inference is a benchmarking suite that measures inference performance across ...
In the last few years, Chinese AI startup MiniMax has become one of the most exciting players in the crowded global AI marketplace, ...
The go-to benchmark for artificial intelligence (AI) chatbots is facing scrutiny from researchers who claim that its tests favor proprietary AI models from big tech companies. LM Arena effectively ...
A new technical paper titled “FVEval: Understanding Language Model Capabilities in Formal Verification of Digital Hardware” was published by researchers at UC Berkeley and NVIDIA. “The remarkable ...
Google has introduced a leaderboard that benchmarks how well AI models handle Android mobile development tasks.
Fractal Analytics announced the launch of LLM Studio, an enterprise platform that helps organizations build and run language models tailored to their business. It is designed for teams that want more ...
This illustrates a widespread problem affecting large language models (LLMs): even when an English-language version passes a safety test, it can still hallucinate dangerous misinformation in other ...