Expected no-hitters

When Matt Cain threw a Perfect Game for the San Francisco Giants on Thursday, he became the fifth pitcher in the last four years to do so (no, Galarraga’s game doesn’t count). Perfect Games are also No-Hitters, and there have been a startling 22 no-hitters in the past six seasons (here I am including Halladay’s post-season no-hitter two years ago).

Since the end of the Steroid Era in baseball, pitching has undergone a resurgence. Last year was called The Year Of The Strikeout by some, and this year is, so far, exceeding last year’s number. In addition, runs per game and hits per inning have been in decline for the past decade. But this isn’t just because batters aren’t hitting as hard or fielding has improved: walks per inning, too, are at their lowest point in 20 years.

Improved pitching means a better chance of No-Hitters and Perfect Games. Does that explain it completely? Is the recent surge in pitching gems a coincidence – in which case we can expect the frequency to revert to the mean – or a result of improving pitching? I started collecting data to answer this question myself (which you can see after the break), but during the course of my research I found an article by Rebecca Sichel, Uri Carl and Bruce Bukiet titled Modeling Perfect Games and No-Hitters in Baseball.

Expected No-Hitters on Google Docs

NFL records - 2011 first two weeks

The following records have been set in the first two weeks of the 2011 NFL season:

  • Most total yards passing in a game by both teams - Brady and Henne, week 1 (906)
  • Most yards passing by a rookie - Newton tied it week 1 (422) and then broke it week 2 (432)
  • Most yards passing in the first two weeks of a season - Newton (854), and then a couple hours later Brady broke it (940)
  • Most total touchdowns by all teams in the first two weeks of a season (172)
  • Most total passing yards by all teams in the first two weeks of a season (15,771)
  • Most consecutive 400+ yard passing games - Newton and Brady tied it at 2
  • Most QBs with 300+ yard passing games in one weekend - week 1 (14)
  • Longest field goal - tied by Janikowski (63 yards) week 1
  • Most penalties by both teams in a single game - Raiders vs Broncos (25)

There’s more if you count near-records. Brady’s week 1 performance of 516 passing yards was a team record and a Monday Night record, but only the fifth-best ever. And his 940 passing yards over two games was the most in the first two weeks of a season, but was five yards short of the record for any two consecutive games.

So far, a weird high-scoring season. It remains to be seen whether this is just a result of the shortened offseason (thanks to the lockout) or of bigger strategic changes across the league.

Expected ERA

Imagine two hypothetical pitchers. Their ERAs are very close together and both pretty average: 3.40 and 3.41. They’ve both pitched just over 200 innings in 30 starts with just a couple weeks of the season remaining. But one pitcher has had some pretty advantageous matchups: he’s played the Padres four times, the Reds and Rockies twice each, the Pirates, the Mets, the Royals – all teams with records under .500. The other pitcher, on the other hand, has had a harder schedule: four games each against the Yankees and the Red Sox, two against the Rangers, and one in Detroit. Are these equivalent pitchers?

I set out to determine if pitchers we accept as “elite” are truly that great, or if some might have an unfair advantage due to schedules. I downloaded all of MLB’s gamelogs for the 2011 regular season up through yesterday, and then I parsed them, tracking a few key pieces of information. First, the number of innings each pitcher pitched against each team. Second, the average number of earned runs each team scores per inning. Then, for each pitcher, I calculated what their ERA would be if they allowed exactly their opponent’s average for each appearance. Here are the results (for pitchers who have enough innings pitched to qualify for the ERA title):

Expected ERA - after 2011-09-12
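The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script that produced the spreadsheet: the function names, the team labels, and every number below are made up. Innings are assumed to be true decimals (6⅓ innings is 6.333, not the box-score 6.1).

```python
def expected_era(innings_vs_team, team_er_per_inning):
    """ERA a pitcher would have if he allowed each opponent's average
    earned runs per inning over the innings he threw against them."""
    total_innings = sum(innings_vs_team.values())
    if total_innings == 0:
        raise ValueError("pitcher has no innings pitched")
    expected_earned_runs = sum(
        innings * team_er_per_inning[team]
        for team, innings in innings_vs_team.items()
    )
    return 9 * expected_earned_runs / total_innings


def era_differential(actual_era, innings_vs_team, team_er_per_inning):
    """Actual ERA minus expected ERA; negative means the pitcher did
    better than his schedule alone would suggest."""
    return actual_era - expected_era(innings_vs_team, team_er_per_inning)


# Hypothetical inputs: earned runs scored per inning by two invented
# opponents, and the innings one invented pitcher threw against each.
team_er_per_inning = {"STRONG": 0.55, "WEAK": 0.38}
innings_vs_team = {"STRONG": 14.0, "WEAK": 7.0}

print(round(expected_era(innings_vs_team, team_er_per_inning), 2))  # 4.44
```

Weighting by innings rather than by appearances means that one long start against a strong lineup counts for more than a short relief outing against the same team.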

One interesting takeaway is that there aren’t any huge surprises. Of the pitchers with the ten best actual ERAs, only one of them (Cole Hamels) isn’t in the top ten for the best differential. Another interesting fact is how much a pitcher’s expected ERA is affected by simple rotation timing, and not just the team’s schedule. The Phillies have played against a lot of sub-.500 teams this year, and Cole Hamels has one of the lowest expected ERAs. But his teammate Cliff Lee hasn’t been so lucky – his expected ERA is higher than most.

But there are some people who get a nudge from good to great with this analysis. Oh, and those two hypothetical pitchers I mentioned? They aren’t hypothetical. They’re Daniel Hudson of the Diamondbacks and David Price of the Rays, respectively. This puts a little context on the fact that Price’s record is 12-12 and Hudson’s is 16-9.

Why We Watch, and NERD

While looking for a little preview on tonight’s Red Sox - Angels game (which I’ll be heading to), I happened upon these blog posts about Why We Watch baseball and a neat statistic to help find the great matchups. I read them in reverse, but here they are in chronological order:

Tonight’s pitchers are Clay Buchholz (with a bad year so far and a NERD of 0) and Jared Weaver (7), both of whom were scheduled to start Sunday but were pushed back a day for (unrelated?) illnesses. Both teams have a NERD of 4, which gives the game a NERD of 4. I’m hoping that Clay pitches like he did last year, and it could turn out to be a more exciting game than the stat predicts.

My Fantasy Baseball Team

I just finished my Fantasy Baseball league draft. Every time it was my turn to pick, I picked the highest remaining guy from my scientific list. (Except I didn’t pick anyone for my bench until I had a full roster.) Here’s my resulting team:

Name               Position      WOA
Joe Mauer          C             14.69
Pablo Sandoval     1B,3B         3.47
Aaron Hill         2B            2.38
Mark Reynolds      3B            6.02
Jason Bartlett     SS            1.63
Jayson Werth       OF            11.03
Bobby Abreu        OF            7.14
Torii Hunter       OF            6.55
Rajai Davis        OF            5.26
Denard Span        OF            5.21
Roy Halladay       SP            21.03
Felix Hernandez    SP            20.46
Chris Carpenter    SP            11.96
Matt Cain          SP            9.95
Mat Latos          SP            8.6
Ted Lilly          SP            6.39
Rafael Soriano     RP            5.41
Brian Fuentes      RP            0.06
Adam Jones         Bench (OF)    3.59
Johnny Damon       Bench (OF)    2.54
Raul Ibanez        Bench (OF)    2.43
Miguel Olivo       Bench (C)     2.34
Marlon Byrd        Bench (OF)    2.32

My co-workers think I’ll be crawling to them for offense in a few weeks, but I think they’ll be crawling to me for pitching.

Two Hundred Thousand Miles

My 2003 Nissan Altima passed 200,000 miles this morning. I bought it new on April 30, 2003. Not only was it the first new car I ever bought, but it’s survived much longer with far fewer major repairs than any other car I’ve ever owned. It needed a new axle at about 160k, and the O2 sensor is currently complaining about the catalytic converter. But that’s really it.

  • I bought it 2,865 days ago. That's an average of 69.8 miles per day.
  • I've tracked the last 50k miles on Fuelly. I've averaged 26 mpg, so I've used about 7,800 gallons of gasoline.
  • It costs me about 10 cents of gas to drive this car one mile. That's just under $7 a day.
  • The good news is I haven't spent as much on gas as I spent to buy the car. Yet.
  • The most surprising thing to me is that other than Parking By Braille, this car has never been in an accident, despite my less-than-defensive driving.
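The arithmetic behind those bullets is easy to re-run. Here is a minimal Python sketch using only the figures quoted above; the one assumption is that the 26 mpg measured over the last 50k miles holds for all 200k, which is why the gallon count lands a little under the rounded 7,800.

```python
MILES = 200_000
DAYS_OWNED = 2_865
MPG = 26                    # measured over the last 50k miles only
GAS_COST_PER_MILE = 0.10    # dollars, the rough figure from the list

miles_per_day = MILES / DAYS_OWNED
gallons_used = MILES / MPG
gas_cost_per_day = miles_per_day * GAS_COST_PER_MILE
total_gas_cost = MILES * GAS_COST_PER_MILE

print(f"{miles_per_day:.1f} miles per day")       # 69.8 miles per day
print(f"{gallons_used:,.0f} gallons")             # 7,692 gallons
print(f"${gas_cost_per_day:.2f} of gas per day")  # $6.98 of gas per day
print(f"${total_gas_cost:,.0f} of gas in total")  # $20,000 of gas in total
```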

M and I are discussing buying a new car this spring. Anyone want an aged workhorse?

Cowboy Stadium vs. NASCAR

NASCAR uses about 6,000 gallons (of 110-octane E15 fuel) per race week. The United States averages 386 million gallons of gasoline per day. NASCAR runs 39 races per year (including the pre-season and All-Star races), so their usage only accounts for 0.000166% of fuel consumption in the United States (or about one out of every six-hundred-thousand gallons used).

Cowboy Stadium spends about $200,000 per month on utilities. If we assume that’s mostly electricity, that’s about 25,000 MWh, or the equivalent of an 80,000 person city.

A gallon of gas contains about 35-40 kWh of energy. So Cowboy Stadium consumes 32 times as much energy as all of the cars in NASCAR.
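For anyone checking the math, here is a short Python sketch that reproduces the numbers. It rests on assumptions the text does not spell out: a 365-day year, the 25,000 MWh figure read as monthly stadium consumption, and 40 kWh per gallon (the top of the 35-40 range). With 35 kWh per gallon the final ratio comes out closer to 37.

```python
GALLONS_PER_RACE_WEEK = 6_000
RACES_PER_YEAR = 39
US_GALLONS_PER_DAY = 386_000_000
DAYS_PER_YEAR = 365               # assumption

STADIUM_MWH_PER_MONTH = 25_000    # assumption: the 25,000 MWh is per month
KWH_PER_GALLON = 40               # assumption: top of the 35-40 kWh range

nascar_gallons = GALLONS_PER_RACE_WEEK * RACES_PER_YEAR
us_gallons = US_GALLONS_PER_DAY * DAYS_PER_YEAR
share = nascar_gallons / us_gallons

nascar_kwh = nascar_gallons * KWH_PER_GALLON
stadium_kwh = STADIUM_MWH_PER_MONTH * 12 * 1_000
ratio = stadium_kwh / nascar_kwh

print(f"{nascar_gallons:,} gallons per NASCAR season")      # 234,000
print(f"{share:.6%} of US gasoline use")                    # 0.000166%
print(f"one gallon in every {us_gallons / nascar_gallons:,.0f}")  # 602,094
print(f"stadium uses about {ratio:.0f}x NASCAR's energy")   # 32x
```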

Fantasy Baseball Nerd Overload

This year will be the fourth season that I’ve played Fantasy Baseball. I’m a baseball fan, and I have a pretty good knowledge of the most prominent players of the game. But it’s far from encyclopedic. I have at least a couple of co-workers who can walk into draft night with no real preparation at all, and end up schooling me. What I do have, though, is an analytic mind and a desire for rigor (albeit half-assed rigor). The last couple of years, I’ve been pulling down players’ stats from the season before, doing a little magic, and coming up with some sort of an order. But I’ve finished in sixth place and then in seventh (out of 12).

There are some complexities in fantasy baseball that make an accurate analysis difficult. As a manager, you need to draft at least one player at each position, a couple general “hitters”, a couple starting pitchers, and a couple relief pitchers (the exact numbers vary from league to league). But there are some positions (catcher and second base in particular) that have very concentrated hitting skills. There are probably three catchers who could be considered “great” hitters. Does the scarcity increase the value of those catchers? How much?

Not all of the stats that are tracked by fantasy leagues can be considered equal. Compare, for instance, RBIs and stolen bases. Just about any player that gets drafted will have a minimum of 40 RBIs, but only about half a dozen players will finish with that many stolen bases and at least half will probably have fewer than 10. So any specific stolen base is more likely to make the difference, since teams' numbers in that stat are more likely to be low. Is a base-stealing average-hitter worth more than a slow heavy-hitter? How much?

And every time it’s your turn to draft a player, you can choose any available player at any position. If you take one of the great catchers early, you might not get a premier starting pitcher. How can you be sure the tradeoff you made was wise? Hitters and pitchers are scored on completely different criteria, making comparing them even harder.

So how can we possibly take this kind of system and predict how much a player is worth? At the end of the day, what you want in a head-to-head league is “wins”, so the most useful end statistic would be one that you can use to determine “If I drafted Player A instead of Player B, I’d end up with (on average) this many more wins”. After a few abortive attempts at coming up with an algorithm, I realized it didn’t have to be that hard. I could throw computing power at it and run a Monte Carlo simulation.

So that’s what I did. I started with someone else’s player-by-player predictions for 2011 stats (the Marcel the Monkey Forecasting System, specifically). Then I wrote a script that would simulate a ten-team league over and over. I had to simplify the rules a little bit – figuring out what to do about benching players and trades and pitcher off-days was just too much. After each simulated season, I determined how many wins over exactly average each team was, and I credited every player on that team with that many wins. Over time, good players would be more likely to be on good teams than bad ones, and their averages would be high. Less-good players' averages would be low.
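The simulation loop described above can be sketched in Python roughly as follows. This is a toy version under loud assumptions, not the actual script: the players and their projections are invented, every category is treated as a counting stat where higher is better, rosters are dealt at random with no positions, and a matchup goes to whichever team wins more categories.

```python
import random
from collections import defaultdict
from itertools import combinations


def simulate_woa(players, n_teams, roster_size, n_seasons, rng):
    """Estimate each player's average wins over average (WOA).

    players maps a name to a tuple of projected category totals,
    where a higher number is better in every category.
    """
    names = list(players)
    credit = defaultdict(float)   # summed wins over average per player
    seasons = defaultdict(int)    # seasons in which the player was drafted
    for _ in range(n_seasons):
        # Random "draft": deal shuffled players out to the teams.
        drafted = rng.sample(names, n_teams * roster_size)
        rosters = [drafted[i::n_teams] for i in range(n_teams)]
        # Team total in each category.
        totals = [
            tuple(sum(column) for column in zip(*(players[p] for p in roster)))
            for roster in rosters
        ]
        # Each team plays every other team once; most categories wins.
        wins = [0.0] * n_teams
        for a, b in combinations(range(n_teams), 2):
            cats_a = sum(x > y for x, y in zip(totals[a], totals[b]))
            cats_b = sum(y > x for x, y in zip(totals[a], totals[b]))
            if cats_a > cats_b:
                wins[a] += 1
            elif cats_b > cats_a:
                wins[b] += 1
            else:  # split a tied matchup
                wins[a] += 0.5
                wins[b] += 0.5
        average_wins = sum(wins) / n_teams
        # Credit every player with his team's wins over average.
        for roster, team_wins in zip(rosters, wins):
            for p in roster:
                credit[p] += team_wins - average_wins
                seasons[p] += 1
    return {p: credit[p] / seasons[p] for p in credit}


# Made-up projections: (home runs, stolen bases) for eight invented players.
players = {"Star": (100, 100)}
players.update({f"Player{i}": (10 + i, 10 + i) for i in range(1, 8)})

woa = simulate_woa(players, n_teams=4, roster_size=2, n_seasons=200,
                   rng=random.Random(2011))
for name, value in sorted(woa.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {value:+.2f}")
```

In this toy league the one dominant player carries his team to a sweep every season, so his WOA settles at exactly +1.5 (three wins against a league average of 1.5), while everyone else's value depends on how often they happened to land next to him. Real projections are far closer together, which is why it takes many more seasons for the averages to firm up.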

After one million (see update below) seasons, I had some pretty firm numbers, and some surprising results.

See the full results

Keep in mind the caveats: This was a ten-team five-by-five head-to-head league. The WOA column (“Wins Over Average”) is the most useful number, and it’s what the spreadsheet is sorted by. It’s scaled for a nine-week season (each team played each other team once), which obviously is not standard, but it’s directly proportional to a real 24-week season.

Multiply by 2.66, and you see that drafting Albert Pujols is worth almost 26 wins over the average player and 2.44 wins over the second-most-valuable player, Hanley Ramirez. After those first two infielders, there’s a lot of pitchers, which goes counter to a lot of the common advice out there. Which I like. Joe Mauer is the first catcher, at 27th (third round in a ten-team league).

Here’s the code and source files that I used, if you’re interested in giving it some tweaks of your own. I used the standard 5x5 stats, but you can change those pretty easily. It took about 90 minutes to run a million seasons on my not-state-of-the-art computer.

Update 11 Feb: This morning, I realized that these numbers were off, in particular those of pitchers. I had been miscalculating team ERA: just averaging all of the players together. The problem is that the true team ERA value is IP-weighted: Total team ER divided by total team IP. So it was over-valuing players with good ERA and few IP, like relief pitchers. The same was similarly true about WHIP and BA, so everyone who didn't play much was being given too much credit. I've updated the code and re-run a million seasons. There are a lot of starting pitchers rated highly now, 8 of the top 10 and thirteen of the top twenty. But actually, the top hundred players are exactly 50/50 batters and pitchers.
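The difference between the two ways of computing team ERA is easy to see in a small Python sketch. The two pitchers below are invented, and the function names are only for illustration.

```python
def naive_team_era(pitchers):
    """The buggy version: a plain average of each pitcher's own ERA."""
    return sum(9 * er / ip for er, ip in pitchers) / len(pitchers)


def weighted_team_era(pitchers):
    """The IP-weighted version: total team ER over total team IP."""
    total_er = sum(er for er, _ in pitchers)
    total_ip = sum(ip for _, ip in pitchers)
    return 9 * total_er / total_ip


# Hypothetical staff as (earned runs, innings pitched):
# a starter with a 3.60 ERA and a reliever with a 1.80 ERA.
staff = [(80, 200.0), (12, 60.0)]

print(round(naive_team_era(staff), 2))     # 2.7
print(round(weighted_team_era(staff), 2))  # 3.18
```

The plain average gives the reliever's 60 innings as much say as the starter's 200, which is how low-inning pitchers ended up with too much credit. WHIP and batting average need the same treatment, weighted by innings and at-bats respectively.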

Update 14 Feb: I changed the spreadsheet so it includes Marcel's predicted stats for each player. This should make it easier to see why each player might be ranked where he is, plus make it easier to use on Draft Day. I updated the code linked to above, as well.

Update 22 Mar: A couple of last minute tweaks before my draft tonight. A couple weeks ago I added code to remove the "chaff" players (really, only the top couple hundred players will be drafted in a fantasy league, so I shouldn't be comparing Pujols against the 1000th-best batter). And a co-worker noticed David Ortiz missing from the list, which revealed a bug in the way I was dealing with players who play no positions. Also, I've removed hands-down first-round pitcher Adam Wainwright from the stats list.

Patriots late game problems

Yesterday, the Patriots went into halftime with a 14-10 lead. They didn’t score once in the second half, and went home on the wrong side of a 14-28 game. I said at the time that it reminded me a lot of last year. I pulled up some numbers to be sure. (The following stats are for all of 2009 and 2010, including both regular and post-season games. That’s a total of 19 games.)

  • In regulation over that period, we were 11-7-1. The one "tie" went into overtime and the Broncos won on a field goal. (Don't get me started on sudden death overtime.)
  • If the games had ended after one half, we would have been 14-3-2 over that period. If they'd ended after three quarters, we'd have been 14-5-0.
  • Our total point differential has been 154 points in the first half, and -28 in the second half.
  • The point differentials have been 38 in the first quarter, 116 (!!) in the second, 3 points in the third, and -31 in the fourth quarter.

Maybe we can lobby the league to switch to a 45-minute clock when they go to a 20-week/18-game schedule?

The Red Sox haven't been bad

Since April 19, the Yankees are 71-47, Rays are 72-47, and the Red Sox are 70-48. It’s not that the Sox have been bad. In fact, in 2009 between games 13 and 131 they were 70-48, and in 2008 they were 69-49. They have been just as good as in previous years. The problem is that the team fell to a quick deficit and has been trying to catch two other extremely good teams. There’s a good chance the AL East will finish the year with two teams with 100+ wins, and Boston with 90+ wins. I’m almost certain this hasn’t happened as long as there have been divisions (1969). The closest I can find is 1977, when the top three teams in the AL East finished with 100, 97, and 97 wins respectively. But remember, that was a six-team division, not four.

To look at this season and draw the conclusion that we’ve had a bad year is to ignore the actual facts. We just haven’t had as good of a year as we needed to have in order to make up our losses in the first couple of weeks.

Solitaire probabilities

For some reason, I’ve been thinking about Klondike solitaire probabilities a lot lately. Primarily, I’m wondering what the likelihood is that a game will have zero legal plays. I’m certain it happens, but it’s got to be pretty rare. It’s a complex game, though, so here’s my plan towards solving it:

  1. Given two non-ace cards, what is the chance that one cannot be placed on the other, using standard Klondike rules?
  2. Given three non-ace cards, what is the chance that none can be played on any other?
  3. Given seven...?
  4. Given two (three, seven) cards, what is the chance that there is no legal play (placing one on another or moving an ace to the foundation)?
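Steps 1 and 2 are small enough to count exactly by brute force. Here is a minimal Python sketch, assuming the cards are drawn without replacement from the 48 non-ace cards of one deck and that a card can go on another only if it is one rank lower and the opposite color. The function names are made up for this example.

```python
from fractions import Fraction
from itertools import combinations, product

RED_SUITS = {"hearts", "diamonds"}
SUITS = ("clubs", "diamonds", "hearts", "spades")
# Ranks 2 through 13 (king); aces are left out, as in steps 1 to 3.
NON_ACE_CARDS = list(product(range(2, 14), SUITS))


def can_place(card, target):
    """True if card may go on target: one rank lower, opposite color."""
    (rank, suit), (target_rank, target_suit) = card, target
    opposite_color = (suit in RED_SUITS) != (target_suit in RED_SUITS)
    return rank + 1 == target_rank and opposite_color


def has_play(cards):
    """True if any card in the group can be placed on any other."""
    return any(can_place(a, b) or can_place(b, a)
               for a, b in combinations(cards, 2))


def no_play_probability(k):
    """Exact chance that k distinct non-ace cards allow no move."""
    blocked = total = 0
    for hand in combinations(NON_ACE_CARDS, k):
        total += 1
        blocked += not has_play(hand)
    return Fraction(blocked, total)


print(no_play_probability(2))         # 130/141, about 92.2%
print(float(no_play_probability(3)))
```

The same loop works in principle for seven cards, but that means walking through more than 73 million hands, so random sampling is the more practical route there. Step 4 also needs the aces put back into the deck.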

There are more steps after that involving the eight deck cards, but it gets pretty complicated pretty quickly. I’ll be happy just getting this far.

Vive la difference

Anyone with more than a passing familiarity with baseball knows that there’s at least one important rule difference between the American League and the National League: in the AL, the pitcher never bats, and is instead represented by the Designated Hitter, a player who never takes the field. Interestingly, the rule that governs this is 6.10, which begins by stating “Any League may elect to use the Designated Hitter Rule.” Apparently the NL has simply elected not to use it. More interestingly, there are a few minor official rules that specifically apply to only NL or AL teams:

  • 1.16(b) - All NL players have to wear a double ear-flap helmet while at bat. According to 1.16(c), almost all other players are simply required to wear one with at least one ear flap. (Aside: Rule 1.16(c) actually grandfathers in players who chose in 1982 to not wear a helmet with ear flaps. Tim Raines was the last player to wear a helmet without ear flaps. He retired after the 2002 season. Julio Franco is the only still-active player who would qualify under this rule. Unfortunately, he chose to wear one-flap helmets, even before they were required.)
  • 4.10(a) - The National League can adopt a rule changing one or both double-header games to be seven innings long. The AL does not have that right. As far as I know, such a rule has never been adopted.
  • 4.12(a)(7), 4.12(a)(8), and 4.12(a)(9) - The NL can adopt a rule making games that have been stopped before regulation (for instance, because of rain) a "suspended game" instead of "no game".
  • 6.02(d) - The NL had to follow this experimental rule in 2006, essentially saying that the batter could not leave the batter's box unless either team was making a substitution or calling a conference. I have no idea if they're planning to make it permanent.
  • 10.23(b) - In the AL, the league pitching champion must have pitched at least as many innings as the number of games each team played that season (162 this year). In the NL, the champion only needed to have pitched 80% that many innings (129.6). As far as I can tell, this rule rarely, if ever, actually matters. The top pitchers in both leagues usually pitch at least 190 innings a year.

Unanimous Justices

Did you know that 4 of the 9 Supreme Court Justices now serving were confirmed by unanimous votes? And another 3 had fewer than ten opposing votes each. Only Clarence Thomas's confirmation was even close.

Justice                Appointed by    Vote
John Paul Stevens      Ford            98-0
Sandra Day O'Connor    Reagan          99-0
Antonin Scalia         Reagan          98-0
Anthony Kennedy        Reagan          97-0
David Souter           G.H.W. Bush     90-9
Clarence Thomas        G.H.W. Bush     52-48
Ruth Bader Ginsburg    Clinton         97-3
Stephen Breyer         Clinton         87-9
John Roberts           G. W. Bush      78-22

Peyton Manning's statistics

You know I love statistics/numbers/trivia. Last night in the gym, the UConn football game was on, and it’s got me in the mood for Super Bowl XXXIX. Clicking around, reading about some football stuff, led me to a fantastic list of stats from Peyton Manning’s amazing season. He broke the record for most touchdown passes in a season; but read the article. He’s broken all kinds of records in stupendous fashion, and has been an amazing Quarterback his whole career.

"So far this year, Manning has more touchdown passes than the Giants, Ravens, Bears, and Cardinals combined (44). He has more TD passes in 15 games this season than the Giants have in their last 50 games (48). If you split Manning up into two quarterbacks, he would rank third and fourth in touchdown passes among AFC quarterbacks with 25 and 24."

Hypothetical optimal imperfect election

Remember the 2000 election (sure you do), when Gore won the popular vote but lost the electoral vote? Well, it’s happened before, and it’s one of the strongest arguments that people use for abolishing the electoral college (an argument that I haven’t chosen a side on, to be honest). It got me thinking: how much could a candidate lose the popular vote by and still win the electoral vote? The candidate would have to get half (plus one) of the votes in “strong states”, and zero votes in “weak states”. In this case, a state’s strength is defined as electoral votes per voter.

A candidate could win the election with only 21.7% of the national popular vote by just squeaking by in AK, AL, AR, AZ, CO, CT, DC, DE, GA, HI, IA, ID, IN, KS, KY, LA, MD, ME, MS, MT, NC, ND, NE, NH, NM, NV, NY, OK, RI, SC, SD, TN, TX, UT, VA, VT, WV, and WY and getting no votes in any other states. You’ll note that CT, MD, and VA are “weaker” states than California, but because California has such a large population, including it instead of those three states would increase the final total quite a bit.

Note that I had to ignore Maine and Nebraska’s congressional district method, since it would add too many extra variables.
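One way to sketch this search in Python is as a small knapsack problem: each state "costs" half its turnout plus one vote and "pays" its electoral votes, and the goal is the cheapest set of states that reaches a majority. This is not necessarily how the list above was produced. The function name is made up, and the four states and their numbers below are invented toy data, not the 2000 turnout figures.

```python
def cheapest_winning_coalition(states, needed):
    """Fewest popular votes that still reach `needed` electoral votes.

    states maps a name to (electoral_votes, turnout). Winning a state
    costs half its turnout plus one vote; every other state gets zero.
    Returns (popular_votes, states_won), or None if it cannot be done.
    """
    # best[ev] = cheapest (votes, states) reaching ev electoral votes,
    # with ev capped at `needed` so that overshooting still counts.
    best = {0: (0, ())}
    for name, (electoral_votes, turnout) in states.items():
        cost = turnout // 2 + 1
        # Snapshot the table so each state is used at most once.
        for have, (votes, chosen) in list(best.items()):
            reach = min(needed, have + electoral_votes)
            candidate = (votes + cost, chosen + (name,))
            if reach not in best or candidate[0] < best[reach][0]:
                best[reach] = candidate
    return best.get(needed)


# Invented toy country: 13 electoral votes in total, 7 needed to win.
toy_states = {
    "Big": (6, 1000),
    "Mid1": (3, 300),
    "Mid2": (3, 320),
    "Small": (1, 50),
}
votes, won = cheapest_winning_coalition(toy_states, needed=7)
total_turnout = sum(turnout for _, turnout in toy_states.values())
print(won, votes, f"{votes / total_turnout:.1%}")
# ('Mid1', 'Mid2', 'Small') 338 20.2%
```

Simply sorting the states by electoral votes per voter and taking them from the top is a good first guess, but it is only a heuristic: the last state or two needed to cross the line can make a different combination cheaper. An exhaustive table like this one checks every combination of totals instead.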

Update 10-15: I should probably point out that I used the turnout numbers for the 2000 election, since I assumed that was a good enough estimate, but the new electoral vote counts (i.e. 7 for CT instead of 8). You can check out my actual data.

Ken Jennings, Day 39

Jeopardy is showing new episodes again, and Ken Jennings has won show number 39 in a row. My favorite among some KenJen statistics: The last time that Final Jeopardy was mathematically necessary (i.e. the last time Ken did not have more than double the score of the next-best contestant) was June 29.

Update, 09-08: Huge KenJen spoiler.