
            There is a great deal of active discussion about. . .well, how Babe Ruth would do in today’s game, or about the quality of play in the Negro Leagues, or in general about the quality of play over time.  Ratings like WAR and Win Shares rate players from the past, and rate players from one league against players from another, but on the assumption that the leagues are equal and that the quality of play is constant over time.  Of course, the quality of play is not constant, and the leagues are not equal.  

People talk about it a lot, but they talk about it as if the issue could never be resolved, as if the truth about it was unknowable.  But if a team from 2025 and a team from 1950 were put onto the same field, they could very easily play one another.  The game has not changed so much that some non-translatable skill would be involved, so that the result would be like a baseball team playing hockey or something.  It’s essentially the same game.  If two teams remote in time played one another, one or the other would win most of the games.  There is no actual barrier that prevents us from understanding how the teams compare.   

            The quality of play in baseball over time can be converted from an opinion-based debate that makes no progress into a subject about which “we” have objective knowledge.  I know how that can be done.  The purpose of this article is to explain that.   I say “we” because I personally will not be around long enough to see the outcome of this research, but you know. . .that’s science; it requires many people participating in the effort, and all of us own a share of the result.

            I was asked in October, 2023, to speak to a National Association of Engineers, so I did a little bit of research into the history of engineering to try to find some point of contact that would help me relate to the audience. Surveying is one of the oldest forms of engineering, dating back thousands of years before Christ.

            About the year 330 BCE, a man named Dicaearchus was ordered by his king to estimate the height of the mountain Pelion.  This is the first known effort to estimate the height of a mountain.  Dicaearchus, a pupil of Aristotle and a renowned scholar of his era, estimated that Pelion was about 6,000 feet high at the peak, an estimate that is wrong, but not TERRIBLY wrong.  No one knows how Dicaearchus made this estimate.  He was a contemporary of Euclid, and it is likely that he used some form of primitive geometry, plus he probably used some surveying tools, the Egyptians having developed surveying tools many years earlier. 

            Several hundred years before Dicaearchus, a man named Thales had estimated the height of the pyramid of Giza—which was thousands of years old even then—at 481 feet.  Thales estimated the height of the pyramid, accurately, by finding a stick which was the same height as Thales himself, and then finding the time at which the stick’s shadow was the same length as the stick itself.  He reasoned that at that moment the shadow of the Pyramid would also be the same length as the height of the pyramid, so he marked the end of the pyramid’s shadow and measured that distance on the ground, adjusting of course for the size of the base of the pyramid. 

            Following this subject forward in time, literally hundreds of different methods were used to try to measure the height of mountains.  Once the circumference of the earth was established, the height of a mountain could be estimated by how long it took for the top of the mountain to disappear from the view of a ship moving out to sea.  Of course, if you do that 20 different times you will get 20 different answers, but if you do that 100 times you can pin down the range of estimates so that you have a degree of confidence in your answer. 

            In the years 1710 to 1770, the French were either the world’s leading scientists or among the leaders.  Among the prominent interests of the French was creating topographical maps.  In the early part of that era, one common way to estimate altitude was to boil water.  In the 1630s Galileo and others developed the concept of barometric pressure, and the barometer was invented in 1643.  In 1717 Daniel Fahrenheit began commercially marketing the mercury thermometer, which was vastly superior to the alcohol-based thermometers used before then.  Water boils at a slightly lower temperature at higher altitudes. When thermometers became available, people making topographical maps would climb mountains and estimate their altitude by the temperature at which water boiled.   

About 1731 two people—one American and one English—developed the octant, also described in notes left earlier by Sir Isaac Newton.  The octant developed into the sextant, a navigational and surveying tool still in use today, although of course it has evolved.  The sextant measures angles from sighted objects some distance away.  By 1750 this was widely used in the construction of topographical maps, and by 1770 there were topographical maps of Europe, giving the altitude not only of mountains but of any place in Western Europe. The process of refining those topographical maps continues to the present day. Occasionally you will still see notes in the news saying that this mountain or that one is about 12 feet taller than we thought it was the last time we estimated it. 

            As I was following this history, it occurred to me that this is essentially the same problem as the quality of play in baseball over time.  The question of how Babe Ruth would perform in today’s baseball HAS an objective answer; we merely don’t know what it is.  We can assume that he would not hit .342 and slug .690 now, but would he hit .322 or .270 or .225 or .170?  There IS an answer to the question; we merely don’t know what it is.  Why?  Why don’t we know?

            Dick Cramer published an extremely interesting study of this issue in the mid-1970s.  In my view, Cramer’s study was outstanding, groundbreaking research, decades ahead of its time. I actually HAVE studied the quality of play over time, many times, but no work that I have done in that area is as good as Cramer’s work from the 1970s.  Why is it, then, that we have never followed through on that research, so that we know no more about the subject now than we did then? 

            Putting together Cramer’s concepts with the history I was then seeing, the answer was immediately obvious.  We were missing three things.  First, we were missing Sea Level.  No one knows for sure how accurate Dicaearchus’ estimate was, because we don’t know for sure where he was measuring from.  The concept of “Sea Level” had not yet evolved.  We were missing a foundation, a floor, an agreed-upon zero point from which other things could be measured.

            Second, we were missing a unit of measurement, equivalent to feet or meters, in which the height of the mountain could be stated.  Cramer had postulated/demonstrated that the quality of play was better in 1958 than in 1935, but stated in relative terms, 1958 versus 1935.  What we need is ABSOLUTE terms.  If a team from 1958 played a team from 1935, the team from 1958 would win X percent of the games, and would win by an average margin of Y runs.  What are X and Y?

            And third, Cramer—like Dicaearchus—had addressed a difficult, complex issue by aiming at its most difficult target.  He was measuring the mountain tops, but with no supporting topographical data from the many levels of play BELOW the major league level.  Lacking these fundamental tools, working toward an understanding of the subject is like climbing a ladder which has no rungs.

            I propose to remedy the first problem—the lack of Sea Level—by referencing as the center or baseline the quality of play of an average major league player or team in 1920, and I propose to define that level of play as 50.000.  These numbers (1920 and 50.000) are not arbitrarily selected; I will explain the reasons later.

            For the unit of measurement, I propose that we use runs per team per game, and I propose that one run per game be referred to as a Cramer Level.  In other words, what I am saying is that if a given team would beat an average 1920 major league team by an average of 8 runs per game, then we should refer to that as a Cramer Level of 58.000.  I will venture the guess at this point that the current major league Cramer Level is a little bit short of 52.000—that is, 51.8 or 51.9, perhaps.  But that is just a guess, no better at this point than anyone else’s guess.  Again, I will explain later why I believe this is the appropriate measurement standard.  In the article “The Quality of Competition”, I assumed that the 2025 level is 53.15, which is not actually what I believe, but that should not matter.

            And third, I need to convince people that there is real, tangible value in comparing the quality of play in different times and places.  The average or standard quality of play cannot be measured in isolation from lower levels of play.  There is also a quality of play—a Cramer Level—for college baseball.  We will not truly understand this issue until we understand ALL of it, or at least more of it: until we have a more complete topographical map of the quality of play which includes 2020 and 1958 and 1935 and 1911 and 1892; which includes minors and majors and distinguishes accurately, based on real data, between the Eastern League and the International League and the Pacific Coast League and the Appalachian League; which includes amateur baseball and distinguishes accurately between the Big 12 and the Big 10 and the SEC and the Ohio Valley League; and which encompasses summer leagues and high school baseball and the Japanese Leagues and the Negro Leagues.  Only by studying ALL of it can we hope to understand how it all fits together.

          The reason that is true is Triangulation.  Dick Cramer’s effort was based on AL-to-NL comparisons.  That’s essentially surveying a not-quite-level street.  What will provide vastly better estimates is triangulation from a hundred different lower leagues. 

            In other words, we have a massive amount of research ahead of us.  I see that as a good thing.  When the French were creating topographical maps in the 18th century, there were plenty of people who were willing to walk up the side of a mountain and boil some water to see at what temperature it boiled.  In 1983, when we wanted the public to have access to the records of all major league baseball games, there were hundreds of volunteers who were ready to score games for us.  It is my experience that there are many thousands of people who like to do baseball research.  All we need is a group of researchers willing to take on the problem, rather than standing and looking at it and shrugging their shoulders.  My job now is to convince a few people that this is worthwhile research.  I guess we could refer to this as the Baseball Topography Project, or as the Cramer Levels Effort.

 

            Let me double back now to the problems involved in structuring the research.  I suggested that the unit of measurement by which we structure our analysis was one run per team per game.  The Cleveland Indians in 1920 outscored their opponents by 214 runs in 154 games, or 1.39 runs per game.  The Cramer Level of the 1920 Cleveland Indians, then, would be 51.39, setting aside for the moment the issue of the comparative strength of the two leagues in 1920.

            There are several features that recommend one run per game as the unit of measurement.  One is that runs per game can relatively easily be translated into a winning percentage.  Baseball teams over time score and allow an average of 4.50 runs per team per game, so a 214-run advantage in 154 games is equivalent to a winning percentage of .651 in a league at that Cramer Level.  Their winning percentage was actually .636, and we can negotiate the details later, but the point is that we can easily calculate that if the Cramer Level of the league was 51.00, rather than 50.00, the 1920 Indians would have an expected winning percentage of .543. If competing in a league with a Cramer Level of 52, they would have an expected winning percentage of .432; at a level of 53, .327.  I have dodged around one of the internal problems here, and I’ll come back to that in a second, but it’s a relatively straightforward calculation, once you have an estimate of the Cramer Level of the league.
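The arithmetic in the paragraph above can be sketched in a few lines.  This is a minimal illustration, assuming (as the article’s figures imply) a 4.50-run league environment, the run advantage split evenly between offense and defense, and the Pythagorean formula with exponent 2; the function names are mine, not the article’s.

```python
# Cramer Level of a team from its run differential, and the expected
# winning percentage of that team in leagues of varying strength.

def cramer_level(run_diff, games, base=50.0):
    """Base level plus run differential per game."""
    return base + run_diff / games

def expected_wpct(team_level, league_level, avg_runs=4.5):
    """Expected winning percentage of a team at team_level playing
    in a league at league_level."""
    d = team_level - league_level
    rs, ra = avg_runs + d / 2, avg_runs - d / 2
    return rs ** 2 / (rs ** 2 + ra ** 2)

indians = cramer_level(214, 154)        # 1920 Indians: 51.39
for lg in (50.0, 51.0, 52.0, 53.0):
    print(lg, round(expected_wpct(indians, lg), 3))
# 50.0 -> .651 | 51.0 -> .543 | 52.0 -> .432 | 53.0 -> .327
```

This reproduces the article’s figures: .651 at a 50.00 league, .543 at 51.00, .432 at 52.00, and .327 at 53.00.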

            It is also a relatively straightforward calculation to apply this to an individual hitter or an individual pitcher.  Cleveland’s best hitter in 1920, Tris Speaker, created about 146 runs, which was about 10.33 runs per 27 outs, which was about 5.50 runs per 27 outs better than the league average, park-adjusted.  Speaker thus has a Cramer Level, as an offensive player, of about 55.50.  You are free to agree with the math or substitute your own, of course, but the point is that it is a relatively straightforward calculation.

            The standard performance variations. . .standard deviations, that is.  The standard performance variations for TEAMS are much less than those for PLAYERS, and the standard performance variations for LEAGUES, for the same reason, are much less than for teams.  We can see, then, that if we assume that the Cramer Level of major league baseball NOW is about 53.00, then that is an enormous change over time on the scale of a LEAGUE, but a much less impressive change over time on the scale of a PLAYER.  On the scale of a team, it would mean that Cleveland, the best team in baseball in 1920, would have a won-lost record of about 53-109 in 2025 baseball.  But Tris Speaker, the best player on the team in 1920, would still be much better than an average player in 2025 (55.50 to 53.00).   His effective winning percentage as a hitter in 1920 was about .821.  If the Cramer Level increased to 53.00, his effective winning percentage would still be about .697.
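The Speaker figures can be reproduced the same way.  A sketch, assuming the article’s method of treating a hitter’s runs created per 27 outs as “runs scored” against the league context as “runs allowed”; the 4.83 league context is inferred from the article’s own numbers (10.33 minus 5.50).

```python
# Effective offensive winning percentage for an individual hitter,
# per the article's Tris Speaker example.

def offensive_wpct(player_rc27, league_rc27):
    """Pythagorean winning percentage of a lineup of this hitter
    against the league-average run context."""
    return player_rc27 ** 2 / (player_rc27 ** 2 + league_rc27 ** 2)

league = 4.83                 # inferred 1920 park-adjusted context
print(round(offensive_wpct(league + 5.50, league), 3))  # in 1920: 0.821
print(round(offensive_wpct(league + 2.50, league), 3))  # vs. a 53.00 league: 0.697
```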

            That work contrasts the scale of the PLAYER (Speaker) with the scale that applies to his TEAM (Cleveland.)  The difference is very large on the scale of the team, but not nearly as large on the scale that applies to the PLAYER.  But what we are talking about for purposes of these studies is the scale for a LEAGUE.  The compression of talent for a league as opposed to a team is more or less the same as the compression for a team as compared to a player.  What I am trying to demonstrate is that an improvement of 3 runs per team per game, while not overpowering on the scale by which we evaluate Tris Speaker or Babe Ruth or Ronald Acuna, is gigantic on the scale by which we would compare leagues.  That’s why I believe that the Cramer Level of major league baseball has probably edged upward by only about 2.00 in the last 100 years—because 2.00 is a monumental improvement on the scale of a league, although not that large on the scale of an individual player.

            Of course, this is all theoretical, and the underlying theory could be completely wrong.  It could be that I am making an arithmetical adjustment where I should be making a percentage adjustment, and this could lead to dramatically different conclusions.  But let’s assume for the sake of argument that I have it more or less right.

            When I was with the Red Sox, I was convinced that the place where we might be missing players in the amateur draft was at the lower levels of college ball. I believed that for exactly the reason I have explained here. In the draft there would be an occasional player who looked to have some defensive ability and who hit .415 with power in college, but did that in the Coastal Athletic Association or the Horizon League or someplace, rather than doing that in the SEC or the ACC.  The scouts would have zero interest in him, because, they would say, “we don’t know what kind of pitching he was seeing in that league.”

            Well, yes, but it is unlikely that the difference in the quality of play in the leagues was large enough to entirely invalidate the performance numbers of an individual.  Players and leagues do not vary on the same scale.  The scouts and their front office supervisors thought of the differences between LEAGUES as being on the same scale as the differences between PLAYERS.  I never made any progress in convincing anyone that the quality-of-play difference between leagues was a small fraction of what they thought it was. I had zero sales in this category.  But I still think that I might possibly have been right.

            Another place in which this matters is in comparing Mantle vs. Mays. Analysis based on the assumption of level competition, one league being as good as the other, tells us that Mantle at his peak (1955-1961) was a significantly better hitter than Mays. In recent years, however, it has become common to hear that that is without adjusting for the quality of the league. The National League in that era was significantly stronger than the American League. If you adjust for the quality of the league, Mays’ performance is as good as Mantle’s.

            Well, no, it isn’t.  Why not?

            Certainly the National League in that era was stronger than the American League.  The question is, how much stronger? If the National League had been as much as one full Cramer Level stronger than the American, it would have been very difficult for the American League team to have won a World Series.  In effect, the American League would have been starting every game behind 1-0, which would lead to a winning percentage of around .400.  In the years 1947 to 1962 the American League won 11 out of 16 World Series, won 55 World Series games and lost 41, and outscored the National League in World Series games, 424 to 339.
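Both claims in the paragraph above (that a one-run-per-game deficit implies roughly a .400 winning percentage, and that a true .400 team would be very unlikely to go 55-41) can be checked directly.  A rough sketch; the binomial calculation treats the 96 games as independent coin flips at a fixed probability, which World Series games of course are not.

```python
from math import comb

# Pythagorean check: spotting the opponent one run per game,
# 4.5 runs scored against 5.5 allowed, is roughly a .400 team.
print(round(4.5 ** 2 / (4.5 ** 2 + 5.5 ** 2), 3))   # 0.401

def prob_at_least(wins, games, p):
    """P(at least `wins` wins in `games` games at per-game probability p),
    by direct binomial summation."""
    return sum(comb(games, k) * p ** k * (1 - p) ** (games - k)
               for k in range(wins, games + 1))

# Chance that a true-.400 team wins 55 or more of 96 games:
print(prob_at_least(55, 96, 0.400))   # well under one in a thousand
```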

            If the National League in that era was stronger than the American League by .1 Cramer Level or .2, you can work with that data and get back to the conclusion that the National League was the stronger league at least in the latter portion of that time frame.  But if the difference between the leagues was a full Cramer Level, you can’t.  It is impossible to explain how that could have happened if the American League was working against a weight disadvantage of a full run a game.  And if you don’t give Mays an extra run a game, you can’t make him a better hitter than Mantle, I don’t think.

            Before I move on entirely there is another little mathematical wrinkle that I need to straighten out, which has to do with translating run advantages for teams into expected winning percentages.

            Ordinarily, when a team has a run advantage (R – OR) of two runs a game, we would state the expected winning percentage corresponding to that by representing this as a one-run advantage on offense and a one-run advantage on defense—thus, 5.50 to 3.50 against a league norm of 4.50, which is an expected winning percentage of .712.  But in this case, we can’t do that.  We can’t do it because we’re stretching the canvas, so to speak.  We’re creating a way to theoretically compare a major league team in 2025 to a team from decades ago or to a minor league team or a college team.  The advantage might be 11 or 12 runs.  This would create a negative defensive number. . .10.50 runs scored against negative 1.50 runs allowed.  We can’t have that.
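To make the problem concrete, here is the additive split breaking down, along with one hypothetical multiplicative workaround.  The workaround and the function names are mine, offered only as illustration; the article’s own treatment of this is in “The Runs to Wins Bridge”.

```python
from math import sqrt

def naive_split(d, avg=4.5):
    """Additive split: half the per-game advantage d to offense, half
    to defense. Produces negative runs allowed once d exceeds 9.0."""
    return avg + d / 2, avg - d / 2

def ratio_split(d, avg=4.5):
    """One hypothetical workaround: keep runs scored and runs allowed
    in a ratio with geometric mean avg, chosen so their difference is
    exactly d. Both numbers stay positive for any d."""
    s = (d / avg + sqrt((d / avg) ** 2 + 4)) / 2
    return avg * s, avg / s

print(naive_split(2.0))    # (5.5, 3.5): fine
print(naive_split(12.0))   # (10.5, -1.5): negative runs allowed
print(ratio_split(12.0))   # (13.5, 1.5): still positive
```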

          There is a much longer discussion of this issue and other methodological quagmires in the article “The Runs to Wins Bridge”, also available.  That’s a small theoretical detail in a system yet to be created. . .I was just trying to avoid getting off on the wrong foot.  

 

            OK, so why 1920?

            1920 is the most obvious pivot point in the history of baseball.  The game changed tremendously between 1918 and 1922, with most of that change occurring in 1920.

            “Major League” baseball in 1876 or in 1871 is barely recognizable as baseball.  Pitchers were prohibited from throwing overhand.   Batters requested the height at which they wanted the pitch delivered, fielders did not wear gloves, umpires were often amateurs or volunteers and had little control over the game, and basic rules changed frequently.  There was no meaningful distinction between major leagues and minor leagues, and college baseball was just beginning to get organized.  While I am quite confident that we can eventually compare accurately the quality of play in 2025 to that in 1920, it will be difficult to extend that back before 1890, because at some point you are no longer dealing with what we think of as baseball.

            Baseball evolved and changed rapidly in the years 1871-1890, less rapidly from 1891 to 1920, but it was still evolving.  By 1920 the game was more or less what it is today. The rules changes of the last few years are not innovations or changes in the game; what they mostly do is GET RID OF some of the garbage that infected the game in the years 1970 to 2020, and force the game to go back a little bit toward what it was from 1920 to 2010.  

 

            And why 50?

            50 has the appearance of a midpoint, which was actually not my intention.  We need a “stable center point” which has space around it on both sides.  If an average major league team played a good college team from one of the best leagues, if they played them regularly and the games were taken seriously and played competitively, the major league team would win almost all of the games, and would win by an average margin of. . .what?  My guess is about 8 to 10 runs, but I could be wrong, and it could be more than that. It might be 20 runs.

            So if the Cramer Level of an average major league team now is 52, the Cramer Level of a good college league might be 30 or 32.   But a good college team would also regularly beat a weak, low-level college team by some margin.  What is that margin?

            I THINK it is about 2-3 runs, but I could be wrong.  It could be 8 runs.  So if the Cramer Level of the SEC turns out to be 30, then the Cramer Level of a weak college team might be 22. 

            But the weak college team would still be expected to beat a high school team—a good high school team—regularly and by some margin.  So what is the Cramer Level of an average high school team?  15, perhaps, or 12?  A major league team, playing regularly against a high school team, might beat them regularly by 35 or 40 runs.  Or more. . .who knows?

            I started out thinking of the base level as about 5, then realized that to make room for the many other levels of competition I would have to make it 10, then I went to 20, 30, on up.  When I got to 50 I realized that I could keep moving it up indefinitely, to 150 or 500.  But if you can keep moving it up indefinitely, then what is the point of moving it up at all?

            I decided on 50; others can revise it if they want to.  It’s arbitrary to a certain extent, in the same sense that Sea Level is an arbitrary construct, or a “foot” or an “inch” or a “meter” or one degree Celsius or one degree Fahrenheit.  All units of measurement are ultimately arbitrary; all measurement scales are ultimately arbitrary.  I think this one will work. 

 

            Let me turn now to the question of how the quality of play can be measured: how one league can be compared to another.  In the last 20 years I have done many different studies of the quality of play over time, of the relative quality of play of one major league to the other, and of different levels of the minors to the majors. Most of those studies have never been published. I have learned, however, that there are many, many different ways in which the issue can be studied.

            One approach, and I suspect the best approach, is what we might refer to as the COM, or Cramer Original Method.  What Dick Cramer did in his seminal 1970s study was to look at players who played in two leagues, and compare their performance in the two leagues.  Frank Robinson played in the National League in 1965 and in the American League in 1966, so we can compare the relative strength of the American League in 1966 to the National League in 1965 by comparing how Robinson played in one league versus the other. 

          Well, not just Robinson.  He actually compared ALL of the players who played in those two leagues.  I don’t understand all of the details, like when in the process he mashed them all together and whether he removed park effects, etc.  Maybe we can persuade Dick to come on here and explain to us all of the stuff the SABR Journal of the 1970s would never have found space for. 

            I suspect that the Cramer Original Method (COM) will be the most commonly used method in the Baseball Topography Project (BTP).   It is equivalent to surveying, equivalent to saying that the mountain is X feet high by measuring 10-foot increments on the side of the mountain and adding them together.  Jimmy Rollins played in the Florida State League in 1998 and in the Eastern League in 1999.  Over the years, there are probably hundreds of players who played in the Florida State League in one year and the Eastern League the next.  There are probably hundreds of players who played in both leagues in the same year.  By comparing Jimmy Rollins in 1998 in the Florida State League to Jimmy Rollins in the Eastern League in 1999, you can make an estimate of the relative strength of the two leagues.  By studying 200 such players, you can make that a clear and convincing estimate.  Having done the work, you could then publish your results as The Relative Competitive Quality of the Florida State League to the Eastern League in the 1990s (Baseball Topography Project.)
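A bare-bones sketch of what one of those paired-player studies computes.  All the numbers below are invented for illustration; a real study would weight by playing time and adjust for age and park, as discussed later in the article.

```python
# Core of a Cramer Original Method comparison: players who appeared
# in both leagues, with their rates in each.

def league_gap(paired_rates):
    """Given (rate_in_league_A, rate_in_league_B) pairs of runs created
    per 27 outs for players who appeared in both leagues, return the
    average drop from A to B. A positive result suggests league B is
    the tougher context by roughly that many runs per game."""
    drops = [a - b for a, b in paired_rates]
    return sum(drops) / len(drops)

# Hypothetical Florida State League -> Eastern League pairs:
pairs = [(5.2, 4.8), (6.1, 5.9), (4.0, 3.6), (5.5, 5.3)]
print(round(league_gap(pairs), 2))   # 0.3
```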

            What we need, in the Baseball Topography Project, is to do as many of those studies as is possible.  We need thousands of those kinds of studies.  We need studies comparing the Pacific Coast League in the 1930s to the American Association. We need studies comparing the West Texas-New Mexico League in 1947 to the Central League.  We need studies comparing the Piedmont League to the Three-I League.  We need studies comparing the Pacific Coast League in the 1920s to the Pacific Coast League in the 1930s.  We need studies comparing the Nippon League to the American League.  We need studies comparing the West Coast Athletic Conference (college) to the Pioneer League (minors.)  We need studies comparing the Negro National League to the white National League. We need all of those studies to make estimates of the relative strength of the leagues, stated in runs per team per game (Cramer Levels). 

            Once we have a sufficient number of those studies, we will be on solid ground comparing the American Association to the American League in 1939, and once we reach that point, the ground will begin to solidify in comparing the National League in 1935 to the National League in 1955.  Once THAT ground is solid, the ground will begin to solidify in comparing the Honus Wagner era to the Mike Schmidt era. 

            If your calculation is in error and you say that the American Association in 1965 was 3 Cramer Levels stronger than the Eastern League when it was really only .28 Cramer Levels, that matters, of course.  Accuracy is always helpful.  It matters, but it doesn’t matter all that much.  Do you imagine that the 17th century engineers who estimated the height of steep mountains by throwing rocks off the sides of them and recording how long it took for the rocks to reach the ground below got everything right?  They contributed to the process, right or wrong.  Accurate estimates are built out of the inaccurate estimates made in the previous generation.  What is important is that the process begins, that the work starts.  If the work starts, it will develop its own momentum, its own methods, its own processes of confirmation.

            Perhaps the major hurdle in using the Cramer Original Method to work on this project is the need for an age-adjustment chart.  There are at least 26 players who played in the American League in 1921 and in 1931, 17 position players and nine pitchers, which is probably enough to make a reasonably solid comparison of the American League in 1921 to the American League in 1931, except that they are all ten years older in 1931 than they were in 1921.  You can’t assume that their performance level did not change.

            We need some sort of chart that says that, in comparing a 29-year-old hitter to the same hitter at age 31, we have to assume that he lost about 2% of his run-producing ability between ages 29 and 31, or that his ability to create runs has probably slipped by about .13 runs per 27 outs, or something.  If you have a chart of that nature, then you can apply the age-correction to each of the 26 American Leaguers from 1921 and 1931, and re-center the results so that the aging curve does not destroy your calculations. If you are interested in working on this project and you have the relevant skills, one of the most valuable things you can do would be to get to work on the age-adjustment chart.
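A chart of that nature might be applied like this.  Every factor in the table below is invented for illustration; producing defensible factors is exactly the research task described above.

```python
# Applying a hypothetical age-adjustment chart: a multiplier on runs
# created per 27 outs, relative to the age-27 peak.

AGE_FACTOR = {25: 0.98, 27: 1.00, 29: 0.98, 31: 0.96, 33: 0.92, 35: 0.87}

def age_neutral(rc27, age):
    """Re-center runs created per 27 outs to the age-27 expectation."""
    return rc27 / AGE_FACTOR[age]

# Same hitter at 29 and at 31: the raw drop from 5.00 to 4.80 mixes
# aging with any league-strength change; dividing out the age factor
# isolates what is left over for the league comparison.
print(round(age_neutral(5.00, 29), 2))   # 5.1
print(round(age_neutral(4.80, 31), 2))   # 5.0
```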

            A major headache in that process will be the self-selection issues.  Suppose that you have one hundred 25-year-old players.  Half of them will age well, compared to the norm, and half will age poorly. Of those who age well, many will still be in the majors when they are 35.  Of those who age poorly, almost all will no longer be in the majors.

            That means that when you compare major league hitters at age 25 and at age 35 and measure the aging rate between them, the measured aging rate could be significantly lower than the ACTUAL aging rate.  But there are ways to deal with that problem, too.  What we need, actually, is for a dozen or more people to draw up a dozen or more age-adjustment charts, and then hammer out the discrepancies between them until we have a chart that we have confidence in.
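The self-selection effect is easy to demonstrate by simulation.  All the distributions and the cutoff below are invented; the point is only that the decline measured among survivors understates the true average decline.

```python
import random

# Simulate the survivorship problem in aging curves: players whose
# decline is steep drop out of the sample, so the decline observed
# among players still active at 35 is smaller than the true average.

random.seed(1)
N, CUTOFF = 10_000, 3.5
true_declines, survivor_declines = [], []
for _ in range(N):
    talent_25 = random.gauss(5.0, 1.0)   # runs per 27 outs at age 25
    decline = random.gauss(1.5, 0.8)     # true loss in that rate by 35
    true_declines.append(decline)
    if talent_25 - decline >= CUTOFF:    # still in the majors at 35
        survivor_declines.append(decline)

true_avg = sum(true_declines) / len(true_declines)
meas_avg = sum(survivor_declines) / len(survivor_declines)
print(round(true_avg, 2))   # close to 1.5
print(round(meas_avg, 2))   # noticeably smaller
```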

            This age-adjustment issue is especially critical in dealing with comparisons like Jimmy Rollins, Florida State League, 1998, versus Jimmy Rollins, Eastern League, 1999.  There will be hundreds of players who played in the Florida State League one season and in the Eastern League the next season, but the vast majority of them will be (a) less than 25 years old, and (b) one year older when they are in the Eastern League than when they were in the Florida State League.  23-year-old professional players are just better than 22-year-old players, overall. If you don’t implement an age adjustment, you will wind up concluding that the Florida State League is stronger, relative to the Eastern League, than it actually is. 

            But there are also many ways to study the issue which are not dependent on aging adjustments. On a brief personal note, I got interested in the question of the relative quality of the two major leagues in my early years with the Boston Red Sox. In the 1980s and 1990s, when asked about the relative quality of the leagues, I would always say that I didn’t see how there could be a significant difference between the leagues. The players compete against one another as amateurs and through college. They are drafted from a common pool.  They compete against one another in the minor leagues. What is going to happen, when they get to the major leagues, to make one group stronger than the other?

            In the early 2000s, working for the Red Sox, it became obvious that this position could no longer be defended. From 2004 to 2017, the American League had a better record than the National in inter-league play every year, and many of the margins were huge—154-98 (2006), 149-103 (2008) and 142-112 (2012). The difference was just obvious. I realized that I had to be wrong about this issue, so where did I go wrong?

            Basically, where I went wrong is that I failed to allow for the strength of organizations within the league. I was thinking of leagues as composed of collections of players, 100+ players in a league, so we might presume that the levelling effects of chance would keep the two leagues relatively even. That fails to allow for the existence of strong and weak organizations.  With 15 organizations in a league, or eight in the 1950s, you may have two strong organizations in a league, or you may have ten. Different organizations have different amounts of resources, different fan bases and different degrees of judgment.

            But there was a second flaw in my logic, not exactly a flaw but a limitation that I did not see until I was about 50 years old. My logic about this, early in my career, was that within a league, every success for one player or one team is always recorded as a failure for another player or another team, so the sum total of success and failure within a league is always the same, always .500.

            Well, yes, but what that misses is that strong leagues still have many characteristics that distinguish them from weak leagues. For example, the age distribution. Strong leagues tend to have players clustered around the ages of 26 and 27.  Think about it: majors, high minors, low minors, college, high school, youth baseball. The stronger the league, the more of its players are near the peak age for a player. The further the players in a league are from ages 26-28, the weaker the league.

            In World War II the quality of play in the major leagues took a step backward. What happened?  More teenagers in the majors, more 40-year-olds.  When the major leagues expanded in 1961, 1962, 1969 and a little bit since then, what did that do?  It brought into the majors a certain number of very young players, and kept in the majors for a year or two some older players.  The concentration of players near the prime age is a fairly reliable indicator of the standard of competition in the league, and it can be used to make an estimate of the strength of the league.

            Looking at it in that way, I was able to identify 25 to 30 characteristics of leagues which are indicative of the quality of play. There are more errors in low-quality leagues. Follow it down: majors, high minors, low minors, college ball, high school, youth baseball.  The lower you go down the ladder, the more errors there are.  The same is true of Wild Pitches and Passed Balls.  When baseball expanded for the first time, in 1961-62, the league fielding percentages dropped by a couple of points. When baseball expanded again in 1969, they dropped again.  

            Actually, the MOST reliable indicator I could find of the quality of play in a league was the percentage of runs scored in a league which were driven in. This was 20 years ago, but my studies showed that at that time this percentage was higher in the majors than in AAA, higher in AAA than in AA, higher in AA than in high-A, higher in high-A than in low-A, and higher in low-A than in rookie ball. As you got closer to the majors, the percentage of runs scored that were driven in progressed with metronomic regularity.
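The RBI-percentage indicator is just a ratio of two league totals. The totals below are invented; the intuition is that in stronger leagues fewer runs score on errors, wild pitches, passed balls and the like, so more of the scoring is accounted for by an RBI.

```python
# Sketch: percentage of a league's runs that were driven in (RBI / R).
# League totals are hypothetical.
def rbi_share(total_rbi: int, total_runs: int) -> float:
    return total_rbi / total_runs

print(rbi_share(10_450, 11_000))  # 0.95, a majors-like figure
print(rbi_share(9_000, 11_000))   # lower, a low-minors-like figure
```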

            Competitive balance is an indicator of league quality.  This operates on multiple levels.  If one team goes 106-34 and another goes 45-95, that’s probably not a strong league. That’s the team level.  In 1938 Virgil Trucks had a 1.25 ERA and struck out 418 batters in 263 innings.  Probably not a strong league.  That’s the individual level.  If you have a lot of games decided 9-1, 11-0, 17-2 and 23-1, probably not a strong league.  That’s the game level.
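At the team level, one simple way to quantify competitive balance is the spread of winning percentages. This is a sketch with invented standings; the unbalanced league includes the 106-34 (.757) and 45-95 (.321) teams from the example above.

```python
# Sketch: standard deviation of winning percentages as a
# competitive-balance indicator.  A wider spread suggests a weaker
# league.  Standings are hypothetical.
from statistics import pstdev

def balance(win_pcts: list[float]) -> float:
    return pstdev(win_pcts)

balanced   = [0.550, 0.530, 0.510, 0.490, 0.470, 0.450]
unbalanced = [0.757, 0.620, 0.500, 0.480, 0.400, 0.321]

print(round(balance(balanced), 3))    # small spread
print(round(balance(unbalanced), 3))  # large spread
```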

            The experience of the players is an indicator of the quality of play in the league. The experience of the managers and coaches is an indicator of the quality of play in the league.  The talent turnover is an indicator (strong leagues tend to have less turnover of players from year to year, as do strong teams.) The number of double plays turned is an indicator of the quality of play in the league, although a very weak one.  Fastball velocity may be an indicator of quality.  The offensive production of the pitchers is an indicator.  (In a weak league, the pitchers hit as well as any other position.)   The offensive production of the shortstops as opposed to the first basemen is an indicator. Attendance is an indicator of the quality of the league, and an indicator of the quality of the team.  For low level teams from the past, the number of players who later played in the majors (or played in higher leagues) is an indicator of the quality of play in the league.

            When I was working on this problem 20 years ago, I tried to combine a large number of small indicators into one more reliable indicator.  You could do that, but the indicators can also be studied in isolation, e.g. Competitive Balance as an Indicator of League Quality.  That indicator doesn’t work very well in the low minor leagues, because if a player is significantly better than the league at one level of play, he will very soon be promoted to a higher level. But one way that we can be relatively confident that the quality of major league play is higher now than it was in 1920 is that the best players today do not dominate their competition to the extent that Babe Ruth, Honus Wagner, Jimmie Foxx and Tris Speaker did.
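One simple way to combine several small indicators into one composite, standardizing each indicator across the leagues and averaging the results, can be sketched as follows. The league names, the indicator values, and the equal weighting are all assumptions for illustration.

```python
# Sketch: combining several weak indicators into one composite score.
# Each indicator is standardized (z-scored) across the leagues,
# oriented so that higher = stronger, then averaged with equal
# weights.  All numbers are hypothetical.
from statistics import mean, pstdev

def zscores(values: list[float]) -> list[float]:
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

leagues = ["AAA", "AA", "High-A"]
indicators = [
    [0.55, 0.45, 0.35],    # peak-age share
    [0.93, 0.90, 0.87],    # RBI / runs
    [0.981, 0.975, 0.968], # fielding percentage
]

# Standardize each indicator, then average down the columns
# (one column per league).
standardized = [zscores(row) for row in indicators]
composite = [mean(col) for col in zip(*standardized)]

for name, score in zip(leagues, composite):
    print(name, round(score, 2))
```

With these invented inputs the composite ranks AAA above AA above High-A, which is the point: many weak signals, pointed in the same direction, add up to a usable one.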

            One thing you could do would be to use the data not to estimate the exact difference in the quality of play between two leagues, but to establish, say, that League X must be at least 0.3 Cramer Levels above League Y, and could not be more than 0.6 Cramer Levels above League Y.  If you can’t draw a firm conclusion from the data, you can at least establish parameters within which the answer must fall.

            You could study the problem by creating models. For example, in regard to the effect of expansion in decreasing the quality of play in 1960-1962, you could create a model of major and minor league baseball in 1960, consistent with the known facts, and state that under these conditions, the addition of two teams to the major leagues should result in a backward step of approximately 0.4 Cramer Levels. 
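A toy version of such a model, with every parameter invented: talent is drawn from a normal distribution, the majors take the top players, and expansion pushes the cutoff deeper into the pool.

```python
# Toy expansion model.  A talent pool is drawn from a normal
# distribution; the major leagues employ the top N players.  Adding
# two teams forces the leagues deeper into the pool, so the average
# talent of a major leaguer drops.  Pool size, roster size, and the
# talent distribution are all assumptions.
import random
from statistics import mean

random.seed(1960)
pool = sorted((random.gauss(0, 1) for _ in range(5000)), reverse=True)

ROSTER = 25
before = pool[:16 * ROSTER]   # 16-team majors
after  = pool[:18 * ROSTER]   # 18 teams after expansion

print(round(mean(before), 3), round(mean(after), 3))
```

A real model would have to be calibrated against the known facts of 1960 (minor-league populations, roster rules, the draft), but even the toy version shows the direction and rough shape of the effect.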

            The point is that just as there were many, many different ways to estimate the height of a mountain, which eventually coalesced into solid measurements, there are many different ways to estimate the quality of play in a league.  You don’t have to be right in order to contribute.  If you have five ideas about how to compare the Eastern League in 1984 to the NYP in 1982 and two of those ideas are good ones and three are not, a community of researchers can take the two ideas that are good and figure out how to move forward.  If the results are published and reported in consistent form, so that one study can be integrated with another, then each study contributes to the process.  If the process is healthy, it will create results which will be widely accepted in time. Thank you for considering my ideas. 
