The Runs to Wins Bridge
- biljames
- Sep 5, 2025
Updated: Sep 5, 2025
The purpose of this article is to explain how Cramer Levels can be translated into expected winning percentages—how, and to an extent why it is done that way. A Cramer Level of 1.0 is a separation between two teams of 1.0 runs per game, such that if the two teams were to play a sufficient number of games in an offensive context of 4.50 runs per game, it would be expected that the higher-rated team would outscore the other by an average of 1.0 runs per game.
In the simplest example, a separation of 1.0 runs per game in an offensive context of 4.50 runs per game would be 5 runs per game against 4: +.5 against -.5. That approach does not reliably work here, however, because the system anticipates comparisons between teams across a very wide gulf of talent (college ball against the majors, for example). If they played head to head and the results mattered, the major league team might win by 20 runs a game. A 20-run separation at a 4.5 base level, by that approach, would be 14.5 runs against negative 5.5. You can’t score less than zero, so that doesn’t work.
What we have to do is find the numbers which are separated by 20, but which bear the same relative relationship to 4.5. Let’s assume A is the higher-ranked team, B is the lower-ranked, and the margin between them in terms of runs per game is 20 in this case. Then
A/4.5 = 4.5/B and
A = B+20
Solving for A is already done; A = B + 20. So then
(B + 20) / 4.5 = 4.5 / B
Cross-multiplying (4.5 times 4.5 is 20.25) gives us a quadratic equation:
B² + 20B - 20.25 = 0
You remember how to solve a quadratic equation? I didn’t either, actually, but I was able to force it out of the depths of my memory after some struggle. But not to worry; this chart presents the expected winning percentage resulting from a known difference in levels of competition. For example, if League A has a Cramer Level 16.5 points higher than League B and an average team from League A plays an average team from League B, then the expected winning percentage of the team from the stronger league is .9958.
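For anyone who does want to grind through the quadratic, here is a minimal Python sketch. The function names (`split_margin`, `expected_win_pct`) are invented for illustration. The last step assumes the familiar Pythagorean formula with an exponent of 2; the article does not say how the chart was built, but that assumption does reproduce the .9958 figure for a 16.5-level gap.

```python
import math

BASE_RPG = 4.50  # the article's standard offensive context


def split_margin(margin, base=BASE_RPG):
    """Return (A, B) with A - B = margin and A / base = base / B."""
    # B solves B**2 + margin*B - base**2 = 0; keep the positive root.
    b = (-margin + math.sqrt(margin ** 2 + 4 * base ** 2)) / 2
    return b + margin, b


def expected_win_pct(margin, base=BASE_RPG, exponent=2.0):
    """ASSUMPTION: Pythagorean conversion; the exponent is not given in the article."""
    a, b = split_margin(margin, base)
    return a ** exponent / (a ** exponent + b ** exponent)


a, b = split_margin(20)
print(round(a, 3), round(b, 3))          # 20.966 0.966
print(round(expected_win_pct(16.5), 4))  # 0.9958
```

So the 20-run example works out to roughly 21 runs a game against 1, rather than 14.5 against negative 5.5.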

The goal here is to get to Winning Percentages. Run Levels are merely the pathway toward winning percentages, at the end of which we must have a bridge to cross to reach the Emerald City. Why, then, don’t we just measure Cramer Levels in Winning Percentage?
Because that isn’t workable. There are several obstacles. Runs per game are unbounded. One team could, in theory, beat another by 100 runs, although our system would not accommodate a number over 50. (A team more than 50 runs per game worse than the major league average—a competitive team of 75-year-olds, perhaps—would be off the grid, not part of the organized structure.)
Winning percentages, on the other hand, have very tight limits relative to the differences between levels of competition. This causes the winning percentage curve to flatten out almost completely once it moves well away from .500. The first run that separates two teams has a winning percentage impact of .109, which we will call 109 points. The 20th run has a winning percentage impact of .0005, or one-half of one point.
But when we are trying to mark teams and leagues along a continuous scale, the 20th run is every bit as important as the first, since the 20th run makes that team one run worse than a team which is 19 runs below the major league standard.
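The flattening of the curve can be checked with a short sketch. It rests on the same assumptions as the bridge itself (a 4.50 context, A/4.5 = 4.5/B) plus one the article does not state: a Pythagorean conversion with exponent 2. The helper names `win_pct` and `marginal_points` are hypothetical.

```python
import math


def win_pct(margin, base=4.50, exponent=2.0):
    # Positive root of B**2 + margin*B - base**2 = 0, then A = B + margin.
    b = (-margin + math.sqrt(margin ** 2 + 4 * base ** 2)) / 2
    a = b + margin
    # ASSUMPTION: Pythagorean form with exponent 2 (not stated in the article).
    return a ** exponent / (a ** exponent + b ** exponent)


def marginal_points(n):
    """Winning-percentage points (thousandths) added by the nth run of separation."""
    return 1000 * (win_pct(n) - win_pct(n - 1))


# Points gained from the 1st, 5th, 10th and 20th run of separation.
for n in (1, 5, 10, 20):
    print(n, round(marginal_points(n), 1))
```

Under these assumptions the first run is worth about 109 points and the 20th run about half a point or a little less, which is the squeeze described above.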
A related problem is that winning percentages are far more vulnerable to flukes than are run separations. Suppose that two leagues separated by 10 Cramer Levels or more play 120 games over the course of ten years, which could very reasonably happen with, for example, SEC teams playing January games against small colleges in the South. The SEC teams might outscore the GIC (Grossly Inferior Conference) teams 1300 to 200 over those 120 games, which would indicate that the GIC teams should win 2 or 3 games. By simple chance, they might win 7. Based on the winning percentage, you would conclude that the strength of the SEC vs. the GIC was .942-.058. This would lead to a mis-estimate of the relative strength of the two conferences of 3.0 runs.
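The SEC/GIC arithmetic can be sketched as follows. This is an illustration under stated assumptions, not the article's own procedure. It assumes a Pythagorean relationship with exponent 2, and it assumes that raw run totals are moved into the 4.50 context by keeping the run ratio fixed. The function names are hypothetical.

```python
import math

BASE_RPG = 4.50


def margin_from_ratio(ratio, base=BASE_RPG):
    """Separation A - B when A / B = ratio and A * B = base**2."""
    b = base / math.sqrt(ratio)
    return b * ratio - b


def margin_from_runs(runs_for, runs_against):
    # ASSUMPTION: the run ratio carries over unchanged into the 4.50 context.
    return margin_from_ratio(runs_for / runs_against)


def margin_from_record(wins, losses, exponent=2.0):
    # Invert the assumed Pythagorean form: W / L = (A / B) ** exponent.
    return margin_from_ratio((wins / losses) ** (1 / exponent))


expected_gic_wins = 120 * 200 ** 2 / (1300 ** 2 + 200 ** 2)
by_runs = margin_from_runs(1300, 200)
by_record = margin_from_record(113, 7)
print(round(expected_gic_wins, 1))    # 2.8
print(round(by_runs - by_record, 1))  # 2.9
```

The run totals point to a separation of nearly 10 runs and the 113-7 record to a little under 7, a gap of about 3 runs that comes from four or five fluke wins.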
The vast, vast majority of baseball is, of course, far below major league standard. Using winning percentages and using major league baseball as the standard, almost all of organized baseball would be squeezed into a tiny space of winning percentage. Within that tiny space, small errors would be exaggerated by unexpected won-loss outcomes.
Wins result from runs. Runs are cause; wins are effect. The data for runs is much more stable than the data for wins, which is why run ratios predict future winning percentages better than winning percentages predict future winning percentages.
Of course, we don’t actually know whether my initial concept of Cramer Levels will hold up in practice. The initial assumption is that if Team A is 1.0 runs better than Team B, Team B is 1.0 runs better than Team C, and Team C is 1.0 runs better than Team D, then Team A must be 3.0 runs better than Team D. At this point we don’t actually know that this is true. It may be that, in that situation, Team A would be four runs better than Team D, or two runs. It may be that we are scaling arithmetically things that ultimately will have to be scaled geometrically.
My assumption is that as we go forward with this work, we will be able to see that. We should be able to see how the conceptual model isn’t working, and we should be able to see how to correct it.
A related question: why do we use 4.50 RPG as a universal standard? Many leagues, college leagues among them, tend to score 6 or 7 runs a game. Lower-level leagues tend to score more because the number of runs scored in a game is unbounded on one side. The process I have outlined will, in practice, force the researcher through an adjustment of real-life RPG to the 4.50 standard. Why not just use “natural runs”?
My assumption is that the process needs a universal standard, a normative standard. For the major leagues, it is obvious from the data that the normative standard over time is 4.50 runs per game. That is the starting point of the discussion. I don’t know what other number we would use. I don’t think the process will work without one.
I am not under any delusion that the concept I have developed to approach this field of research will work without fail. Some parts of it almost certainly WILL fail. The men who built a rocket to the moon had an idea of what would work. Some of that worked; some of it didn’t. Eventually they got to the moon.
Thanks for reading.

