Wednesday, June 21, 2006

This year's NBA championship series is now over, with the Miami Heat defeating the Dallas Mavericks 4 games to 2. There were several instances of streakiness in the series, not least Miami's coming back from 2-0 down (and in great danger in Game 3) to take four straight. Each of the teams, as well as individual players, also went through periods of hotness and coldness, of course. Once the Heat began to turn the series around, Dwyane Wade went through stretches where it looked like he couldn't miss (and rarely did). At the other end of the spectrum, the Mavs' outside shooting during the second half of Game 6 seemed to disappear.

What I'd like to focus on here, though, is the dreadful free throw shooting of Miami center Shaquille O'Neal, whose statistics are available here. As all NBA fans know, even under the best of circumstances, Shaq is terrible from the stripe, making only 52.8% of free throws for his career (based on nearly 10,000 attempts!).

This past regular season, O'Neal slipped to 46.9% on free throws, then to 37.4% for the play-offs (68 of 182). In the finals against Dallas, Shaq's FT shooting was particularly hideous, 29.2% (14 of 48). In three of the games against the Mavs, he shot 1 of 9, 1 of 7, and 2 of 12.

Before possibly examining the depths of O'Neal's woes vs. Dallas, I think it's worth testing initially whether the drop of roughly 10 percentage points in his FT percentage from the regular season to the play-offs overall is statistically significant. With a dichotomous outcome such as hit or miss on a free throw, a statistical technique known as the binomial probability (for which there's an online calculator in my links section, to the right) is very useful. It answers the question of how likely a given pattern is (i.e., a certain number of hits within some number of attempts), given some prior baseline percentage of success.

In Shaq's case, how likely is it that he would have made 68 (or fewer) free throws out of 182, assuming a base rate of .469 (corresponding to his FT percentage in the regular season)? Using the aforementioned calculator, this probability is .006, sufficiently small to be considered statistically significant (cut-offs of .05 or .01 are commonly used).
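For readers who would rather compute this than use the online calculator, here is a minimal sketch of the same cumulative binomial probability in Python. The function name `binom_cdf` is my own illustrative choice (it is not the calculator's method), and the only inputs are the figures quoted above: 68 makes, 182 attempts, and a base rate of .469. The printed value can be compared with the .006 reported above.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Figures from the post: 68 (or fewer) makes in 182 play-off attempts,
# with the regular-season FT percentage (.469) as the base rate.
p_value = binom_cdf(68, 182, 0.469)
print(f"P(68 or fewer makes in 182 attempts) = {p_value:.4f}")
```

The same function can be reused for other players by swapping in their makes, attempts, and regular-season percentage.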

Thus, even when we take Shaq's play-off FT performance as a whole (not focusing merely on his horrible time in the final round), his fall-off from the regular season is more than would have been expected from ordinary fluctuation. Fatigue is a possibility, especially since his worst round in the play-offs was the last one. However, Shaq and the Heat had a six-day rest from the end of the Detroit series (June 2) to the start of the Dallas series (June 8), and he still went 1 for 9 from the line in the opener against the Mavs.

If anyone would like to conduct statistical analyses of other players in the Miami-Dallas series, please do so. You can provide a brief write-up of what you found in the comments section below.

Monday, June 12, 2006

Leading up to this past weekend, I had been planning to write something about how the men's French Open tennis final would pit against each other two players who each came in with a phenomenal streak. That indeed happened and I will still write about it, but something else happened over the weekend in college baseball, which I think tops the tennis match.

The University of South Carolina hit a mind-boggling five consecutive home runs against the University of Georgia, en route to a 15-6 win and 1-0 lead in the teams' two-out-of-three super-regional series (final qualifying round before the College World Series).

A simple way to estimate the probability of five homers in five at bats is to start with the Gamecocks' baseline probability of hitting a home run in any single at bat. This Southeastern Conference (SEC) baseball statistics page (updated through June 6, as I'm looking at it) tells us that, out of 2,215 at bats this season, South Carolina had hit 82 homers (.037).

Alternatively, we could increase the denominator by adding in plate appearances that are not counted as official at bats. The main source of such extra appearances is walks, however, and one could argue that many walks represent instances where the pitcher does not want to give the hitter the opportunity to swing the bat (explicitly, when there's an intentional walk, but also when a team "pitches around" a hitter). Also, using only official at bats as the denominator (and thus keeping the home run ratio a little higher) makes my upcoming calculation a little more conservative (i.e., it helps to avoid overstating the rarity of the occurrence).

We then simply raise the Gamecocks' probability of a home run on a single at bat (.037) to the fifth power (representing the five homers), which yields .00000007 (7 X 10 to the minus eighth power, or 7 in 100 million). This type of calculation is analogous to determining that the probability of rolling double sixes on two dice is 1/36, by raising the probability of a six on a single die (1/6) to the second power.
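The arithmetic above can be checked in a few lines of Python. This is only a sketch under the same independence assumption as the dice example; the variable names are mine, and the inputs are the 82 homers and 2,215 at bats quoted from the SEC statistics page.

```python
from fractions import Fraction

# Season totals quoted in the post
homers, at_bats = 82, 2215

p_single = homers / at_bats             # chance of a homer in one at bat
p_streak = p_single ** 5                # five in a row, assuming independence
p_streak_rounded = round(p_single, 3) ** 5  # same calculation using the rounded .037

print(f"single at bat: {p_single:.3f}")
print(f"five straight: {p_streak:.2e} (unrounded), {p_streak_rounded:.2e} (using .037)")

# The dice analogy: two independent sixes
double_sixes = Fraction(1, 6) ** 2
print(f"double sixes:  {double_sixes}")
```

Whether the rate is rounded to .037 first makes little difference; both versions land at roughly 7 in 100 million.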

In the dice example, it is assumed that the outcomes of the roll of two dice are independent (i.e., the number that comes up on one die does not affect the number that comes up on the other). One may question whether the independence assumption holds up in this home run-hitting scenario. Many of you are probably thinking that the same Georgia pitcher was throwing to these batters and just kept "grooving" the ball to the hitters, based on loss of speed and/or movement on the pitches. That may be true to some extent, but it must be noted that after the first three homers of the streak, Georgia changed pitchers and the new guy gave up two more homers!

Another consideration is that I was drawn to analyze the South Carolina streak by its spectacular nature. If we were to ask instead, in all the countless college baseball games played over a period of years, how likely is it that we would find such a streak at some point, the streak would not seem so unlikely.
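To illustrate this point, here is a rough sketch of how the chance of seeing at least one such streak grows with the number of opportunities. The numbers of opportunities below are purely hypothetical round figures that I made up for illustration (I have not counted actual college at bats). The sketch also treats every five-at-bat stretch as independent and as having South Carolina's home run rate, neither of which is strictly true.

```python
from math import expm1, log1p

def prob_at_least_once(p, n_chances):
    """P(at least one success in n_chances independent tries) = 1 - (1 - p)**n_chances."""
    # log1p/expm1 keep precision when p is tiny
    return -expm1(n_chances * log1p(-p))

p_streak = 0.037 ** 5  # one specific five-at-bat stretch, from the post

# HYPOTHETICAL numbers of five-at-bat stretches; illustrative only, not real data
for n_chances in (1, 1_000_000, 10_000_000, 100_000_000):
    prob = prob_at_least_once(p_streak, n_chances)
    print(f"{n_chances:>11,} chances: {prob:.4f}")
```

The exact values matter less than the pattern: an event that is wildly improbable in any one spot becomes far less surprising once enough spots are considered.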

Here is a passage from the textbook I use in teaching statistics (King & Minium, 2003, Statistical Reasoning in Psychology and Education, p. 205):

Let us consider again the case of Evelyn Adams... who won the New Jersey Lottery twice in a 4-month time span in 1986. The probability of Ms. Adams doing this was 1 in 17 trillion... If there were 4,123,000 lottery tickets sold for each lottery, and Ms. Adams had purchased 1 ticket for each, the probability of her winning both was (1 / 4,123,000) (1 / 4,123,000), the same as for any other specific person who purchased 1 ticket in each lottery.

But the probability of someone, somewhere winning two lotteries in 4 months is a different matter altogether. Professors Diaconis and Mosteller (1989) calculated the chance of this happening to be only 1 in 30.

The citation for the original Diaconis and Mosteller article is:

Diaconis, P., & Mosteller, F. (1989). Methods for studying coincidences. Journal of the American Statistical Association, 84, 853-861.

In fact, as the above-linked article about the South Carolina homer barrage notes, the five "dingers" merely tied the NCAA record (set in 1998), rather than breaking it.

What about the tennis match that I started this write-up with? I've gone on too long for a detailed statistical analysis, so I'll just note that Rafael Nadal came into the French Open final having won 59 straight matches on clay (the surface at the French), whereas his opponent, Roger Federer, had won 27 consecutive matches in major (Grand Slam) tournaments. Federer had captured Wimbledon, the U.S. Open, and the Australian Open, none of which is played on clay, before advancing to the final in Paris. Nadal beat Federer, and I'll leave you to read about it here.

Monday, June 05, 2006

Welcome to the relaunching of the Hot Hand in Sports website. After a little over four years with the old look, I thought something new was in order. This new format should also provide several advantages over the old one:

*The URL is now much simpler (be sure to notice, however, that it's; "hothand" without the "the" will lead to another, unrelated site).

*Readers can now comment on my entries (I've put in some steps, however, in an attempt to prevent spam).

*Over the years, my write-ups have been shifting away from long, detailed analytic pieces to brief summaries, always with a link to an article about the sports performance in question, and sometimes with statistical analyses of my own. The format on this new hosting site should fit well with my trend toward succinctness.

Another nice thing is that Blogspot has now made it much easier than before to post visual images. Though perhaps not as frequently as before, I still occasionally may want to post charts, graphs, and the like.

In the coming days and weeks, I will be inserting links on this new page, attempting to preserve as much of the information on the old page as possible. If there's something on the old page that you don't see here, please don't hesitate to inquire by e-mail (via the link to my faculty webpage in the upper-right portion of the page).


One recent, substantive hot streak that I wanted to mention is that the Angels' Vladimir Guerrero got a hit in all three late-May games against the Texas Rangers, meaning that he has now gotten at least one hit in all 42 games he's ever played against them. To quote the headline I came up with and was using on my old site, "Texas Can't Be Glad to See Vlad." The teams now don't play each other again until August.