Why Your Accuracy Score Differs Across Platforms

How chess platforms calculate accuracy and why the numbers rarely agree

ChessOnyx · 6 min read

Accuracy · Platforms · Analysis

You play a game on Chess.com, and it tells you your accuracy was 87%. You import the same game to Lichess, and the analysis suggests different inaccuracies. You run it through Stockfish on your computer, and the results differ again. Which one is right?

The answer is: none of them is objectively "right." Each platform uses its own methodology, engine version, analysis depth, and classification criteria. Understanding these differences is crucial for using accuracy metrics productively rather than obsessing over numbers that were never designed to be precise.

How Accuracy Scores Are Calculated

The basic idea behind accuracy scores is simple: compare each of your moves to the engine's best move and measure the difference. But the details of how this comparison is done vary significantly across platforms.

Some platforms use a centipawn loss model — they measure how much evaluation you "lost" with each move compared to the best move, then convert this into a percentage. Others use a win-probability model — they convert evaluations into winning chances and measure how much winning probability you lost. These two approaches can produce meaningfully different results for the same game.

The centipawn loss model tends to be more punishing in already-decisive positions. If you are winning by +5.0 and play a move that drops the evaluation to +3.0, that is a 200-centipawn loss — significant by this metric — even though you are still completely winning. The win-probability model is more forgiving here, since the winning probability barely changes between +5.0 and +3.0.

Conversely, the win-probability model is more sensitive to changes around equality. Dropping from +0.3 to -0.3 might be a small centipawn loss (60 centipawns) but represents a significant shift in winning probability.
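To see the difference in actual numbers, here is a small Python sketch comparing the two models on the examples above. The logistic curve that maps centipawns to winning probability uses an illustrative constant chosen purely to make the contrast visible; platforms that use a win-probability model fit their own curve to large samples of real games.

```python
import math

def win_probability(cp: float) -> float:
    """Map a centipawn evaluation to an approximate winning probability.

    The 0.01 slope is an illustrative constant, not any platform's actual
    formula; real systems fit this curve to observed game outcomes.
    """
    return 1.0 / (1.0 + math.exp(-0.01 * cp))

def compare_models(cp_before: float, cp_after: float) -> None:
    cp_loss = cp_before - cp_after
    wp_loss = win_probability(cp_before) - win_probability(cp_after)
    print(f"{cp_before:+.0f} -> {cp_after:+.0f} cp: "
          f"centipawn loss = {cp_loss:.0f}, "
          f"win-probability loss = {wp_loss:.1%}")

# Already-winning position (+5.0 to +3.0): large centipawn loss,
# small change in winning probability.
compare_models(500, 300)
# Near equality (+0.3 to -0.3): small centipawn loss,
# much larger change in winning probability.
compare_models(30, -30)
```

Running this prints a 200-centipawn loss that costs only a few percentage points of winning probability, and a 60-centipawn loss near equality that costs roughly three times as much — the same asymmetry described above.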

The Depth Problem

Perhaps the most significant factor in accuracy score variation is analysis depth. As discussed in our article on engine evaluations, the engine's assessment of a position can change substantially between depth 18 and depth 28.

Most platforms analyze games at moderate depths — typically depth 18-22 — because analyzing millions of games daily at depth 30+ would require enormous computational resources. But at these moderate depths, the engine's "best move" is not always actually the best move. Your supposedly "inaccurate" move might turn out to be superior at higher depths.
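You can check this yourself. The sketch below uses the python-chess library to ask a local Stockfish binary for its best move and evaluation at two different depths; the binary path, the position, and the choice of depths are assumptions you would adjust for your own setup. Deeper searches take noticeably longer, which is exactly why platforms stop early.

```python
# A minimal sketch, assuming python-chess is installed and a Stockfish
# binary is available on your PATH (adjust STOCKFISH_PATH otherwise).
import chess
import chess.engine

STOCKFISH_PATH = "stockfish"  # hypothetical path; point this at your binary

board = chess.Board()  # replace with board.set_fen(...) for the position you care about

with chess.engine.SimpleEngine.popen_uci(STOCKFISH_PATH) as engine:
    for depth in (18, 28):
        # Ask for a fixed-depth search; the deeper pass may take a while.
        info = engine.analyse(board, chess.engine.Limit(depth=depth))
        best_move = info["pv"][0]
        score = info["score"].white()
        print(f"depth {depth}: best move {board.san(best_move)}, eval {score}")
```

In sharp middlegame positions it is not unusual for the recommended move or the evaluation to shift between the two passes, which is precisely the gap between the standard your game was graded against and the standard a deeper analysis would apply.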

This creates an uncomfortable reality: accuracy scores are measuring your play against an imperfect standard. The engine at depth 20 is not an oracle — it is a very strong but imperfect analyst. Treating its assessments as ground truth introduces systematic errors into accuracy calculations.

Classification Thresholds

Different platforms use different thresholds for classifying moves as "best," "excellent," "good," "inaccuracy," "mistake," or "blunder." These thresholds are somewhat arbitrary and significantly affect the narrative of your game review.

One platform might call a 50-centipawn loss a "mistake" while another requires 100 centipawns for that label. One platform might label only the single top engine move as "best" while another considers any move within 10 centipawns of the top move as "best."
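Here is a deliberately simplified sketch of how such thresholds might be applied. The two threshold sets are hypothetical, not the values any specific platform uses, but they show how the same centipawn losses produce two different game reviews.

```python
# Hypothetical threshold sets: a move is labeled by how many centipawns
# it lost compared with the engine's top choice.
STRICT = {"best": 10, "good": 50, "inaccuracy": 100, "mistake": 200}    # "platform A"
LENIENT = {"best": 20, "good": 80, "inaccuracy": 150, "mistake": 300}   # "platform B"

def classify(cp_loss: float, thresholds: dict) -> str:
    """Return a move label for a given centipawn loss under one rule set."""
    if cp_loss <= thresholds["best"]:
        return "best"
    if cp_loss <= thresholds["good"]:
        return "good"
    if cp_loss <= thresholds["inaccuracy"]:
        return "inaccuracy"
    if cp_loss <= thresholds["mistake"]:
        return "mistake"
    return "blunder"

# The same moves, two different narratives.
for loss in (0, 12, 45, 60, 110, 250):
    print(f"{loss:>4} cp lost -> A: {classify(loss, STRICT):<10} B: {classify(loss, LENIENT)}")
```

Under one rule set the 60-centipawn slip is an "inaccuracy" and the 250-centipawn slip is a "blunder"; under the other they are merely "good" and a "mistake." Same moves, different story.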

These classification choices are design decisions, not chess truths. They affect how many "mistakes" appear in your game review, which moves get highlighted, and ultimately how you perceive your own play. A game that looks clean on one platform might appear error-filled on another, even though the underlying chess was identical.

Engine Versions and Networks

Even when two platforms both use "Stockfish," they may be running different versions with different NNUE networks. Stockfish 15 evaluates positions differently from Stockfish 17, which evaluates differently from Stockfish 18. These differences are usually small but can be significant in specific types of positions.

Some platforms use proprietary engines or modified versions of open-source engines. These may have different evaluation characteristics, strengths, and weaknesses compared to standard Stockfish. When a platform does not disclose which engine and version it uses, accuracy comparisons become even less meaningful.

At ChessOnyx, we run Stockfish 18 with NNUE — the latest stable version — and we are transparent about this. When we release analysis features, you will always know exactly what engine and settings are producing the evaluations.

The Gamification Factor

It is worth acknowledging that accuracy scores serve a dual purpose. They are partly analytical tools and partly engagement features. Platforms have a financial incentive to make analysis feel rewarding and keep users coming back.

This can manifest in subtle ways: generous accuracy scores that make players feel good, dramatic labels that create memorable moments, or performance ratings that suggest players are stronger than they might be. None of this is necessarily dishonest, but it is important to recognize the motivations at play.

The most valuable analysis is not the one that makes you feel best — it is the one that shows you where you can improve. Sometimes the most useful feedback is uncomfortable, pointing out patterns of errors you did not realize you were making.

How to Use Accuracy Scores Productively

Despite their limitations, accuracy scores can be useful when approached correctly:

Track your accuracy on a single platform over time. While the absolute number might not mean much, trends within the same system are meaningful. If your average accuracy improves from 75% to 82% over several months on the same platform, that represents genuine progress — even if the exact numbers are somewhat arbitrary.

Do not compare accuracy scores across platforms. An 85% on one platform is not equivalent to an 85% on another. Compare only within the same system.

Focus on the analysis, not the score. The real value of post-game analysis is understanding specific positions where you went wrong and learning from them. The accuracy percentage is a summary statistic that hides more than it reveals.

Be skeptical of very high accuracy scores. If a platform consistently tells you that you played with 95% accuracy, it might be using generous thresholds rather than reflecting genuinely near-perfect play. Use this as motivation to seek out more rigorous analysis, not as confirmation that your play is flawless.

Remember that chess is played by humans, not engines. A move that is "second best" by engine standards might be the most practical choice in a real game, especially under time pressure. Accuracy scores cannot capture the human elements of chess — time management, psychological pressure, practical complications — that are just as important as pure chess quality.