Back of the Envelope

A Replacement Level Baseball Blog

Friday, February 1, 2008

Clay Versus Joba

I've got two friends, A and B, who love to gamble. Well, actually, friend A loves to gamble. Friend B loves to see friend A lose.

Their latest wager involves speculating on the futures of the Yankees' and Red Sox's best pitching prospects--Joba Chamberlain and Clay Buchholz. Friend A bet friend B a large (enough) sum of money that Chamberlain will have a better career than Buchholz. A let B pick the metric of record: VORP.

Now, let's leave aside for the moment the sheer idiocy of making an even money bet on the long-term future of a couple of 22-year-old pitchers (I bet friend B that Ian Kennedy would win 150 games, but I got odds). Suppose they both beat attrition rates and have 10- or 12-year careers. Where is the smart money?



Stuff



Well, Buchholz has a shitload of talent. He's that rare big-framed right-hander with both power and control. In addition to a 92-94 mph four-seam fastball, he's got a goodish changeup and what some have called the best 12-6 curveball in baseball. Period.

But Chamberlain has a shitload of talent too. He's that rare big-framed right-hander with both power and control. In addition to a 96-98 mph four-seam fastball, he's got a goodish changeup and what some have called the best power slider in baseball. Period.

Since most scouts have them as 1 and 1A among pitching prospects, accounting for differences in "tools" becomes a matter of nitpicking. Joba's got a slightly better fastball--how many pitchers in the last ten years can you think of that can regularly paint the black at 100 mph?--but Clay has two fastballs, offsetting an excellent four-seamer with a pretty good sinker.

The argument runs the other way for secondary offerings. Buchholz's curveball is triply devastating: he can throw it over the plate for called strikes, as an off-speed pitch, and as a pure downer. Chamberlain's slider, disgusting as it is, is mostly a one-speed, swing and miss pitch that he buries down-and-in on lefties and low-and-away on righties (though, in 2007 Chamberlain was able to backdoor a slider here and there; if this proves consistently possible...My God). And while Chamberlain's slow(er) curveball is only mostly deadly--think Pedro's circa 2006 instead of 1999--it gives a hitter two different plus-plus breaking balls to worry about. Lastly, both Buchholz and Chamberlain feature decent changeups, but neither is a finished product.

I should also note that there isn't enough PITCHf/x data (that I could find) to compare them that way, yet. We'll have to wait for '08. What little there is I found here.


Track Record

The 2007 major league numbers look like this:

Buchholz: 22.7 IP, 3-1, 1.59 ERA, 1.059 WHIP, 22 K/10 BB

Chamberlain: 24.0 IP, 2-0, 0.38 ERA, 0.750 WHIP, 34 K/6 BB


Neither of these guys is going to be standing on a bread line any time soon. Nevertheless, Chamberlain holds a slight edge (incidentally, the difference is even more pronounced in the Davenport Translations). You could argue that Joba didn't have to turn over the lineup three times--like Buchholz did in his no-hitter--but then one also has to remember Joba gave up exactly one earned run (on a patented Mike Lowell solo shot at Fenway) with half his pitching arsenal tied behind his back. Surely, Chamberlain is going to start hitting a few more bats when he moves into the starting rotation, but you're not going to see that 6-to-1 K/BB ratio dissolve overnight. Rather, Chamberlain is likely to settle in at a level comfortably above Buchholz. Just look at the minor league numbers, which are basically identical but for the K/BB ratio (though Buchholz has the bigger sample, as Joba came out of the college ranks).

Buchholz: 285.2 IP, 2.46 ERA, 1.00 WHIP, 356 K/77 BB (4.6/1)

Chamberlain: 88.1 IP, 2.45 ERA, 1.01 WHIP, 135 K/27 BB (5/1)

Projection:

Buchholz's no-hitter and subsequent disappearance from the greater Boston area seems to have added a rock-star mystique to his status among prospectors. Thankfully, computers don't care about Houdini acts. Here, you can find the 2008 projections for Buchholz and Chamberlain from a popular system that rhymes with SCHMECOTA.

Highlights include the preservation of Chamberlain's lead in K/BB, his far greater "upside", and substantially higher VORP, WARP, and WXRL numbers despite the fact that he is projected as a swing man and not a starter. Also interesting: despite having accumulated fewer professional innings than Buchholz, Chamberlain's "Beta" number--a measure of the volatility of a projection based on comparable players--is lower than Buchholz's.

Monday, January 28, 2008

Lowered Expectations, Part II

In my last post, I looked at eBABIP (LD% + .117) as a predictor of actual BABIP, and my main findings were:

-There is some year-to-year stability in the size of the residual between eBABIP and BABIP for a given player.

-eBABIP is slightly more effective on good hitters with relatively more playing time, and slightly less effective on below average hitters with fewer PAs. Specifically:

Best prediction quartile (means): .269/.336/.429, 430 plate appearances
Worst prediction quartile (means): .262/.326/.411, 304 plate appearances
League 2005-2007 (min. 100PA): .267/.333/.421, 391 plate appearances

Today, I said I'd run some regressions to get a more precise idea where eBABIP does its best predictive work. So I looked at playing time by 50 plate appearance intervals, and I looked at offensive production by OPS in .20 intervals (I'd use EqA, but I couldn't find numbers for 2005-2006). Here's how it went.



As you can see, there is a big jump in predictive power around the .800 OPS line, with the highest R-square values coming from players with at least a .900 OPS (rare enough territory). In terms of plate appearances, there is a more steady increase in predictability starting around the 300 PA threshold and peaking at 500 PA.

From this, one would figure that the best test group for eBABIP would be .900 OPS guys with at least 500 PAs. Turns out that's true. The R-square for the eBABIP/BABIP regression in those cases is .255, which is the highest value I found using these two parameters.
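For concreteness, the subgroup-regression idea looks roughly like this in Python. This is a sketch, not my actual spreadsheet work: the player dicts and thresholds are illustrative, and I'm using the squared Pearson correlation as the R-square of a simple one-variable linear fit.

```python
def r_squared(xs, ys):
    """R-square of a simple one-variable linear fit, i.e. the squared
    Pearson correlation between predictor and outcome."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)

def r2_above_thresholds(players, min_ops, min_pa):
    """eBABIP-vs-BABIP R-square restricted to players over both cutoffs."""
    grp = [p for p in players if p['ops'] >= min_ops and p['pa'] >= min_pa]
    return r_squared([p['ebabip'] for p in grp], [p['babip'] for p in grp])
```

Run that over a grid of OPS and PA cutoffs and you get the sweep described above.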

This certainly confirms the quick and dirty hypothesizing we did using the means of the upper and lower quartiles by residual. eBABIP works best on good hitters with a lot of playing time. Still, when you regress the residual with PA and OPS, the latter account for only about 14% of the variance in the former.

Over/Underestimated Players by BABIP

In my last post I also said that I'd eventually look at the real value as opposed to the absolute value of the residual. This number tells you not only how far off eBABIP is, but whether it over or underestimates a player's production. Think of it as giving a vector instead of a magnitude.

So, I ran Pearson correlations between the residual and about 20 different rate and count stats. I found statistically significant relations with many--the highest was with batting average, which came in at -.490 (meaning that players with higher batting averages tend to have lower residuals and vice versa). OBP and SLG had similar negative relationships at -.358 and -.260 respectively. There was also a slight negative relationship with SO Rate (-.180). But the correlations that really caught my eye were three "speed" numbers: specifically SB (-.226), 3B (-.224) and GB% (-.355). If you don't see my leap of logic in including ground ball percentage as a speed-related number, read on.
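For the record, there's nothing exotic about the correlation statistic itself. Here's a minimal Python sketch (the toy numbers below are made up for illustration, not from my dataset):

```python
def pearson_r(xs, ys):
    """Signed Pearson correlation. The sign gives the direction of the
    relationship: negative here means higher stat => lower residual."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Toy illustration: as stolen bases rise, the residual (eBABIP - BABIP) falls
sb = [2, 10, 25, 40]
residual = [0.020, 0.005, -0.010, -0.030]
# pearson_r(sb, residual) comes out strongly negative
```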

In interpreting these relationships, remember that I'm using the actual value of the residual. So, for instance, the residual's negative relationship with stolen bases, triples, and ground ball percentage tells you that guys with higher rates tend to be underestimated by eBABIP--that is, they tend to have negative residuals. Conversely, guys with lower SB, 3B, and GB% numbers tend to be overestimated by eBABIP--they tend to have positive residuals. What this tells me is that eBABIP underestimates fleet-footed, Punch-and-Judy types whose ability to beat out ground balls and take extra bases accounts for a large part of their offensive value. Likewise, eBABIP overrates line-drive hitters who are slow on their feet and don't get their share of ground ball hits. I could get a much better idea of the correlation here by computing Bill James' speed score and pitting it against the residuals, but they don't pay me enough (i.e., anything at all) to do it right now.

So instead, I'll look at a few extreme case studies, namely guys who beat their eBABIP by 50 points or more, and see informally if a pattern emerges. First, let's look at guys who were grossly underestimated by eBABIP--guys who had residuals of -.050 or less. Since my data is arranged by player-team-year and covers 2005-2007, a player can appear up to three times, and wouldn't you know it, three guys do:

Willy Taveras, Joey Gathright, and Luis Castillo all hit at least 50 points better than their line drive rates would suggest in each year from 2005-2007. As predicted, all three are slap-happy singles hitters with above average speed. In fact, they are more or less each other's closest active comparables. The guys who make the list twice mostly fit the bill as well:

Derek Jeter
Howie Kendrick
Kelly Shoppach
Pablo Ozuna
Preston Wilson
Ryan Shealy
Tadahito Iguchi
Victor Diaz
Willy Aybar

Though, don't ask me to explain Kelly Shoppach beyond the small sample size: as a backup catcher he never got more than 200 PAs.

On the other extreme, the list of guys who came up short of their expected BABIP by .050 points or more is topped by lead-footed St. Louis backup catcher Gary Bennett, who appears three times. Bennett has a slightly above average LD%, but does absolutely nothing with it. In 2007, he managed just 54 total bases on 44 hits, good for a .221/.298/.271 line. Still, there is no shortage of good hitters who make the list, including Eric Byrnes, Barry Bonds, and Frank Thomas, all of whom appear twice. Like these three, many on the list are slow sluggers or, like Bennett, catchers. The complete stats for all members of the "50 Point Club" can be found here.

The next step would be to get a better handle on the shape of the relationship between these speed factors and BABIP, and to come up with a more nuanced version of eBABIP that reflects them. Fat chance getting that out of me today.

Sunday, January 27, 2008

Lowered Expectations

How to beat the odds and hit above your line drive rate.

One of the more interesting insights to come out of Clay Davenport's marriage of baseball analysis with computer science is this: take one hundred major leaguers who are "true" .270 hitters and give them all 200 simulated at-bats, and by luck alone at least one of them will hit .350 and another .190.

There is a lot of random variance in baseball, enough for all-stars to look like Texas Leaguers and triple-A rejects to look like home run kings given a short enough timeline. And enough that finding ways to separate luck from talent in a given season can be tremendously profitable for general managers.

One of the more popular ways of doing this is by comparing actual to expected batting average on balls in play (BABIP to eBABIP). The former, of course, is calculated by removing home runs and strikeouts as outcomes of at-bats. So you take (H-HR)/(AB-HR-SO). The latter is estimated by taking a hitter's line drive rate (expressed as a decimal like .182) and adding to it a constant--usually .120--to get a number like .302.

So far as I can tell, the constant is what it is because it's equal to league average BABIP minus league average LD% over some significant stretch of time (over the last three years this number is .117). If there is a more complex way of deriving it, I'm not aware of it. The idea that eBABIP is intended to capture is that most of a player's hits are going to come off line drives, and that the number of hits a player gets off fly balls and grounders is fairly stable (since defensive efficiency for those outcomes is fairly stable).
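Both stats are one-liners. Here's a sketch in Python, using the simplified BABIP denominator from above (no sacrifice-fly adjustment) and a made-up hitter's line for illustration:

```python
def babip(h, hr, ab, so):
    """Batting average on balls in play, with the post's simplified
    denominator: (H - HR) / (AB - HR - SO)."""
    return (h - hr) / (ab - hr - so)

def ebabip(ld_rate, constant=0.117):
    """Expected BABIP: line-drive rate plus a league-derived constant
    (.117 over 2005-2007; .120 is the common default)."""
    return ld_rate + constant

# Hypothetical hitter: 150 H, 20 HR, 500 AB, 90 SO, 18.2% line drives
actual = babip(150, 20, 500, 90)   # 130 hits on 390 balls in play
expected = ebabip(0.182)
residual = expected - actual       # negative => eBABIP underestimates him
```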

But eBABIP can also be used as an excuse for using generalities to draw fallacious conclusions about individual players, since there will always be some who consistently beat their BABIP expectations by substantive margins.

Well, so I claimed anyway. I figure it's only fair to run some numbers and see whether I'm right or stupid. So what I did is this: I took all major leaguers with at least 100 plate appearances in each year from 2005 to 2007 and I calculated BABIP and eBABIP for all of them. I also collected about 100 other rate and count stats for these guys. Then I put the whole mess of it in a big-ass spreadsheet and ran a bunch of statistical tests.

First off, I wondered if there was any year-to-year stability in the size of the residual (the difference between a hitter's expected and actual BABIP, expressed as a decimal). If there was, it would indicate that how well or poorly eBABIP predicts BABIP for a given player is not random, but a function of some disposition or capacity that player has. I say "disposition" or "capacity" instead of "skill" because I don't care whether the thing that determines a player's residual is useful, only whether it's tangible.

OK, so I ran three-year intraclass correlations on BABIP residuals and got an "R" number of .276. The R-number gives you something like the ratio of signal-to-noise in the sample. In this case it's just about 3-to-1 noise. Not great, but not terrible considering that the R-number for BABIP itself over the same stretch is just .369 (as a point of comparison, the number for Home Runs is .675). Moreover, the residual is actually more stable than eBABIP itself, which came in at .267. Since I'm not a statistician, I'm not sure what eBABIP's instability really means besides the fact that LD% is itself unstable (and that's a freebie given that eBABIP just is LD% plus a constant).
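For anyone who wants to replicate this, here's roughly what a three-year intraclass correlation computes, sketched in plain Python. Caveat: this is the textbook one-way ICC(1) on a balanced panel (every player has a value in all three seasons), which may differ in detail from the exact variant my stats package used.

```python
def icc_oneway(rows):
    """One-way intraclass correlation, ICC(1): the share of total variance
    attributable to between-player differences (signal vs. noise).
    `rows` is a list of per-player lists, one residual per season."""
    k = len(rows[0])    # seasons per player
    n = len(rows)       # players
    grand = sum(v for r in rows for v in r) / (n * k)
    # between-player and within-player sums of squares
    ss_between = k * sum((sum(r) / k - grand) ** 2 for r in rows)
    ss_within = sum((v - sum(r) / k) ** 2 for r in rows for v in r)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

A player whose residual never moves from season to season pushes the number toward 1; pure noise pushes it toward 0.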

Anyway, the upshot is there is at least something systematic in how well or poorly eBABIP predicts BABIP for a given player. And if that's so, it's then reasonable to infer that there are certain "types" of players for whom the eBABIP method is particularly accurate, and others for whom it is not.

So I took the absolute value of the residuals--the distance from the value zero, a perfect prediction--and ranked these by percentile. I then ran the means, std. deviations, skews, etc. on the lower quartile (the most accurate eBABIP predictions) and the upper quartile (the least accurate eBABIP predictions). The descriptive statistics are here and here. Two things stood out.

1) The "good predictor" group are better hitters than the "bad predictor" group. The AVG/OBP/SLG for the former is .269/.336/.429, compared to .262/.326/.411 for the latter. The good predictors also walked more (BB/PA) and struck out less (SO/PA).

2) The good predictor group played more than the bad predictor group, averaging more games (111 vs. 84), plate appearances (430 vs. 304) and at-bats (383 vs. 271).
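The quartile comparison is simple enough to sketch in a few lines of Python. This is illustrative only; the player dicts are hypothetical stand-ins for the spreadsheet rows:

```python
def quartile_means(players, stat):
    """Mean of `stat` in the best-predicted quartile (smallest |residual|)
    versus the worst-predicted quartile (largest |residual|)."""
    ranked = sorted(players, key=lambda p: abs(p['residual']))
    q = max(1, len(ranked) // 4)
    best, worst = ranked[:q], ranked[-q:]
    avg = lambda grp: sum(p[stat] for p in grp) / len(grp)
    return avg(best), avg(worst)
```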

Surely (1) and (2) are related. Good hitters get more playing time than bad ones, and the smart money says that, for what we're interested in, the causal chain goes from (2) --> (1). In other words, it isn't that eBABIP fails for poor hitters, but that it fails for smaller sample sizes. (Though, when you take into account that the good predictors are almost a year older on average, it becomes marginally trickier to guess which variable is doing most of the work, since older players tend to get more PAs than younger ones, regardless of talent). I should also say that the good predictors averaged higher VORP (15.0) than the bad predictors (8.92), which nicely parallels (1) and (2) since VORP takes both production and playing time into account.

So eBABIP can't get a handle on young, part-time and/or poor hitters, kinda sorta. None of this is really revelatory, nor even that helpful, in part because there is more work to do. Tomorrow, or when I get around to it, I'll run regressions between eBABIP and BABIP at different OPS and PA thresholds just to get a better idea of where on the output/playing time spectrums eBABIP does its best and worst work. I'll also run some numbers on players whose BABIPs were substantially over- and under-estimated by eBABIP. Basically, instead of worrying about the absolute accuracy of eBABIP (as measured by the absolute value of the residual) I'll be looking at whether there is any pattern to the way eBABIP shortchanges some players' output and overblows others'.

But that's tomorrow.

Tuesday, January 22, 2008

Snake Oil Salesmen







Last year, the Arizona Diamondbacks won 90 games and the NL West with a mix of young talent (Conor Jackson, Mark Reynolds, Chris Young) and peak-aged regulars having career years (Eric Byrnes, Orlando Hudson), plus another solid season from Brandon Webb (good for an ERA+ of 156) and innings ably eaten by Doug Davis and Livan Hernandez. In the postseason, the D-Backs looked pretty good running over the Cubs in the division series, and not terrible having the favor returned by the pick-of-destiny Rockies in the NLCS.


But to the statheads, none of this mattered as much as how definitively they beat their Pythagorean Projection. By +11 games, to be precise.
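For those keeping score at home, the classic Pythagorean expectation is just runs scored squared over the sum of both squares (Pythagenport and Pythagenpat refine the exponent, but 2 is the textbook version). A quick sketch, plugged with the Snakes' 2007 run totals (712 scored, 732 allowed, per Baseball-Reference):

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2):
    """Pythagorean expected wins: RS^x / (RS^x + RA^x) times games played."""
    pct = runs_scored ** exponent / (
        runs_scored ** exponent + runs_allowed ** exponent)
    return pct * games

# 2007 Diamondbacks: outscored by 20 runs, yet 90 actual wins
expected = pythagorean_wins(712, 732)   # rounds to 79; 90 - 79 = +11
```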


As the Snakes put up a 46-35 record to open the season (while being badly outscored) baseball analysts were falling all over themselves to explain, or explain away, this beating of the odds. Sal Baxamusa (whose real name used to be Sal Baxamusamusabaxa) at Hardball Times took this as an opportunity to rant.


When reading Sally's beef, I couldn't help but think about Dustin Hoffman's tantrums in Rain Man, especially the scene in which he mumbles "397 toothpicks, defnly defnly 397 toothpicks" over and over while gently rocking himself. Nothing upsets a left-brainer more than an outlier.




Baxamusa picks on guys like Jim Rosenthal, who made the seemingly coherent observation that the Diamondbacks tend to win their close games and lose in blowouts. Others were so audacious as to suggest that the relevant factor might be conscientious leveraging of a quality bullpen.


Mere superstition, cried the grad student. Cast it off like Biblical Creationism, the Female Orgasm, and so many other untenable myths. You see, for the sabermetrically devout, there are only numbers and noise. Any variance in a dependent variable X not explained by the best predictor variable(s) Y must be random. Never mind that Pizza Cutter's study at MVN (which the guy cites!) shows that the Pythagorean residuals--the difference between the winning percentage predicted by the model and the actual winning percentage, basically a measure of over/under performance--are correlated rather strongly with several of the factors Baxamusa calls "rationalizations": winning% in one-run games, offensive consistency and inconsistency in run prevention.

Now, maybe Sal's point is just that these factors aren't representative of particular athletic or managerial "skills". But it seems that the first and last (performance in close games and inconsistency in run prevention) are rather closely related to bullpen leverage. An optimally levered bullpen is one in which the best arms pitch in the situations that have the highest impact on the outcome of the game. This is fairly obvious. What maybe isn't so obvious is that the opposite holds as well: an optimally levered bullpen is one in which the worst arms pitch in the situations with the least effect on the outcome of the game. In other words, mopup duty in blowouts.


And then there's offensive consistency. All offensive consistency means in this case is that variance in runs scored per game is clustered tightly around the mean. There are infinitely many ways for this to happen, most of which are either random or at least unintentional and thus don't speak to any discernible "skill". Some ways of producing consistency--like managers using consistent lineups and batting orders, or GMs putting a premium on non-streaky hitters--are intentional but still don't necessarily reflect a skill (the idea of lineup/batting order optimization is to score as many runs as possible given a certain group of hitters, not to score the same number of runs as often as possible).

In the present case, it doesn't matter to me whether the D-Backs' offensive consistency was reflective of a skill; I'm only interested in whether it was reflective of something besides randomness. That is, whether the outcomes that account for the error between the D-Backs' projected and actual win totals were more the result of choices or chance. Even the best play-by-play data can't successfully quantify all the choices made within an organization over 162 games. But statistical analysts often make the mistake of confusing randomness with unmeasurability.

That's all for now.

Monday, January 21, 2008

Exit Sandman?

Is Mariano Rivera Slipping?


Mariano Rivera's late inning dominance has been as crucial to the Yankees' success over the last ten years as any single factor. But perhaps more impressive still is the consistency with which he has gone about it. In the four rate statistics believed to be more or less defense-independent—strikeout rate, walk rate, isolated slugging, and ground ball to fly ball ratio—and even in the notoriously defense-dependent or “luck” driven batting average on balls in play (BABIP), Rivera has been remarkably dependable. Most of the variation clusters closely around the mean: Rivera usually strikes out between 18 and 25 of every 100 batters he faces, walks between 4 and 7, and gives up 2 to 3 times as many ground balls as fly balls. Even in the metrics where there is a noticeable skew, it tends to be toward superior performances.
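For reference, the four defense-independent rates named above fall straight out of basic counting stats. A minimal sketch (the example line is a hypothetical reliever, not Rivera's actual numbers):

```python
def rate_profile(bf, so, bb, gb, fb, avg, slg):
    """The four defense-independent rates, from counting stats."""
    return {
        'K%':    so / bf,    # strikeouts per batter faced
        'BB%':   bb / bf,    # walks per batter faced
        'ISO':   slg - avg,  # isolated slugging: extra bases per at-bat
        'GB/FB': gb / fb,    # ground balls per fly ball
    }

# Hypothetical reliever: 100 BF, 22 K, 5 BB, 30 GB, 12 FB, .210 AVG / .285 SLG
profile = rate_profile(100, 22, 5, 30, 12, 0.210, 0.285)
```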

But in 2007, Rivera had what many regarded as an “off year”. Indeed, across both conventional and unconventional stats, he ranked as no more than an average closer. Of the thirty-six pitchers in baseball last year with at least 10 saves, Rivera fared thusly:

He faced more batters and gave up more hits (many of them line drives, more on this later) than almost any other closer in baseball. When the same thirty-six closers are given standardized "scores" that measure their value over or under the positional average, Rivera comes in at a measly .40 above the mean (by comparison Takashi Saito and J.J. Putz topped the list at 10.69 and 7.28 above the mean, respectively). Unsurprisingly, Rivera also saw his ERA balloon to 3.15, the highest since his rookie season (and the first time it's been north of 2.00 since 2002).


The most popular explanations of Rivera's down performance focused on two factors: age and misuse. The age argument seems unlikely on the face of it. While it's true that at 37, Rivera was the fourth oldest closer in baseball last season—behind Todd Jones, Trevor Hoffman, and Bob Wickman—it's also true that six of the top ten closers last season were 30 or over (including Hoffman at 40, and Saito at 37). When power pitchers slip, the signs usually start to appear around the age of 32. But it was at that age that Rivera's most dominant five-year stretch began (between 2002 and 2006, opponents hit just .213 and slugged just .285 off Rivera, while he averaged over 40 saves per year with a 1.83 ERA and a WHIP of .98). While age eventually becomes a factor for any major league pitcher, starters tend to collapse more rapidly than relievers, and Rivera's overall stability and dominance through his mid-thirties provides little reason to think his decline phase will be anything more than gradual.

The misuse argument is a bit harder to assess. Was Rivera left in too long in some games? Perhaps. When pitching in the eighth or ninth inning, opponents hit just .250/.291/.337 off Rivera. In extra innings of work the numbers went to .227/.320/.500. But these increases are normal signs of fatigue for any pitcher, and the only 2007 figure substantially disparate from Rivera's career splits in those situations is in extra-inning slugging percentage (Rivera's career SLG-against in those situations is .340). And this is hardly unusual, as over a short period of time even one extra-inning home run can drastically inflate SLG (especially since closers tend to leave and/or games tend to end after extra-inning shots).

Was Rivera not used regularly enough? Was he over- and/or under-rested for significant stretches, and did this affect his performance? Again, there is no compelling pattern here. In 2007 Rivera pitched his poorest on zero days rest (.294/.333/.451 against, with an ERA of 3.38), which is normal enough, but his ERA ballooned to 5.27 on three days of rest, before going back down with each subsequent day. His career splits are for the most part tightly clustered, and where there is substantial difference it appears basically random: Rivera posts his best career ERAs on two, three and five days of rest, and his worst on one, four, and six-plus days of rest. None of this supports the case for improper usage.

Did Rivera under-perform in non-save situations, as the common bit of closer psychology predicts? Somewhat. Rivera's ERA for the 30 appearances in which he received 'no decisions' in 2007 was 3.30, higher than his career mark of 2.24. But Rivera's ERA was up from career averages across the board—from a ridiculous 0.68 in saves to a slightly less ridiculous 1.06, and from 16.17 to 24.30 in losses—so it is hard to pin the change to pitching to the score. It is also important to note that earned run average is a flawed measure of the underlying quality of a pitcher's performance, especially for relievers, for whom one or two bad outings can be unfairly distorting.

So if there is no smoking gun that points to age or misuse, what exactly caused Rivera's poor showing in 2007? The likeliest answer is luck. Pure, dumb luck. More on why next post.

Sunday, January 20, 2008

Bang for the Buck

Doug Pappas, the late, great former chair of SABR's "Business of Baseball" committee, created a formula for determining the marginal cost of a win, in terms of payroll dollars spent, for a given MLB team. He started by determining the minimum payroll required to field a team of "replacement level" players (essentially, what it costs to fill a 40-man roster at league minimum) and estimated the number of games that such a team could expect to win (Pappas set this at 49 games). The marginal cost per win was then the ratio of payroll dollars above minimum to wins above minimum.

Using a minimum payroll of $15.2 million (literally just the league minimum of $380,000*40) and Pappas' benchmark of 49 wins, here is how the numbers break down for 2007.
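Pappas' formula is simple enough to code up directly. A sketch using the figures from above (the $100M/90-win team at the bottom is a hypothetical, not any particular club):

```python
LEAGUE_MINIMUM = 380_000                    # 2007 league-minimum salary
REPLACEMENT_PAYROLL = LEAGUE_MINIMUM * 40   # $15.2M to fill a 40-man roster
REPLACEMENT_WINS = 49                       # Pappas' benchmark

def marginal_cost_per_win(payroll, wins):
    """Pappas' marginal cost per win: dollars above the replacement-level
    payroll divided by wins above the replacement-level baseline."""
    return (payroll - REPLACEMENT_PAYROLL) / (wins - REPLACEMENT_WINS)

# Hypothetical: a $100M payroll, 90-win team pays about $2.07M per marginal win
cost = marginal_cost_per_win(100_000_000, 90)
```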

As you can tell, I've highlighted the playoff teams and made some other modifications to the raw numbers. For one thing, I adjusted the Red Sox and Yankees numbers to reflect the Competitive Balance or "Luxury" tax, and just for grins I estimated the Yankees payroll minus Roger Clemens (assuming Clemens' starts would have been given to a "replacement level" player like Ian Kennedy; replacement level is in scare-quotes for obvious reasons).

A few things are readily apparent here. The Yankees spent an awful lot of money on 94 wins and a first-round playoff elimination--a fate made more embarrassing by the fact that the team that knocked the Yanks out shared the best record in baseball with the Red Sox at a third of what it cost John Henry.

Even though the Yankees look like the biggest wastrels in baseball, they at least got something for the money. Not so for the Orioles, White Sox and especially the Giants. On the bright side, Bonds' home run chase easily brought in enough additional revenue to offset his $15.5 million salary and 5.8 WARP1 performance, especially when you consider that Bonds only charged the Giants about $3 million for each of the wins he contributed, well under their regular marginal costs.

Things are even more interesting at the top of the chart. The Florida franchises once again justify their continued existence with efficiency instead of wins. The Marlins in particular demonstrate their excellent understanding of both the Wins Curve and the Success Cycle, and one begins to understand how a team could collect two World Championships in ten years despite an organizational ethos that kills a little part of baseball's soul each year.

The efficiency of the Indians and Diamondbacks is just one of the many reasons I think each is the best run franchise in its league (more on that in a future post).

The biggest macro lesson to be learned is that the market for wins is still incredibly inefficient, though some teams are getting better at exploiting this inefficiency. Here's the scatter plot and regression of average wins versus payroll from 2002-2006, courtesy of Dan Fox at BP:



So over the five years preceding last season, payroll explained about 43% of a team's success. Compare that to the regression I did with this year's data:

Last year, only 21% of a team's fate was determined by its financial resources. Good time to be a Rays fan, huh?

Friday, January 18, 2008

A Man with a Plan

How Mike Lowell pulled his way into a big fat contract.

I was looking at some Balls in Play numbers for Mike Lowell, courtesy of Dan Fox's BIPChart 2.5.


In my first post, I talked about the intimate link between Lowell's 2007 success and the Green Monster. Here's what Lowell did on balls in the air to left versus the right-handed league average for 2003-2007:

On fly balls to left (PERCENT/AVG/SLG):

League: 28%/.443/1.455

Lowell: 43%/.425/1.238



On line drives to left:

League: 42.7%/.725/1.100

Lowell: 54.3%/.807/1.105


The numbers make it look like the Monster helped a couple of line drives turn into hits for Lowell, but that it actually hurt him a little on fly balls. What gives?

Well, these numbers don't give the whole picture. For one thing, it's highly likely that the home/road splits would look a lot different from these aggregates, based on nothing more than the visual evidence of the spray charts (all but two of Lowell's extra-base hits came in left and left-center). Too bad there's no data on this that I can find (any volunteers?).

But way more importantly, one has to remember that Lowell is basically a league average hitter (.280 career AVG including '07), and that he didn't see dramatic improvements in his LD% from '05 to '07 (18.3%, 21.1%, and 20.0%, respectively).

The thing that did change dramatically was both the raw number and the percentage of balls he put in the air to left. His worst year, '05 in Florida, he hit flyballs to left at a league average rate and was rewarded with a miserable .255 cBA (contact batting average) there. He hit slightly more LDs than average to left, but again, was rewarded with a below average .700 cBA. Now that was an unlucky year no doubt, but still.

Fast forward to '07. If Lowell had hit the same total number of fly balls and line drives, but with the spray chart of a league average righty, you'd expect him to hit 52 flyballs to left; he actually hit 80. You'd expect him to hit 44 line drives to left; he actually hit 57. In other words, Lowell hit a net total of 41 more balls to left than average. The two years before that?


2006
FB(expected/actual): 45/60, LD(exp./act.): 50/54, Net: +19

2005
FB: 48/51, LD: 39/40, Net: +4
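The expected/actual arithmetic behind these nets can be sketched as follows. The league spray rates in the test are the 28% fly ball and 42.7% line drive figures quoted earlier; the totals are round illustrative numbers, not Lowell's actual ball-in-play counts:

```python
def net_pulled(total_fb, total_ld, lg_fb_left, lg_ld_left,
               actual_fb_left, actual_ld_left):
    """Actual minus expected balls in the air to left, where 'expected'
    applies league-average right-handed spray rates to the hitter's
    own FB/LD totals."""
    exp_fb = total_fb * lg_fb_left
    exp_ld = total_ld * lg_ld_left
    return (actual_fb_left - exp_fb) + (actual_ld_left - exp_ld)
```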



It's true that Lowell has always been a pull hitter (numbers courtesy of Sean Campion).



The question is, is Lowell's natural swing just a perfect fit for Fenway? Or is this a man with a plan?