Thursday, November 30, 2006

Being a faculty member at Texas Tech University, I periodically check out the Internet discussion boards related to the school's sports teams. It was there that I learned a few hours ago that the Red Raider men's basketball team is, at the moment, leading the NCAA in three-point shooting percentage.

Texas Tech has made 58 of 115 attempts from behind the arc (50.4%). While I was looking at the team statistics, I decided to peruse the individual three-point shooting statistics, as well.

Excluding three players who are each 2-for-2 (100%) on three-pointers, and thus have too few attempts to qualify, the current national leader among individuals is BYU's Austin Ainge, who has hit 12 of 17 (70.6%). (For those who are wondering, Austin is indeed the son of former NBA guard Danny.) I guess you can say the young Ainge has the range!

Neither Texas Tech's 50% success rate as a team, nor Ainge's 70% rate, is likely to hold up for the season. Last year's three-point percentage leaders at the end of the season were Southern Utah (team) at 42.9% and Northern Arizona's Stephen Sir (individual) at 48.9%.

The current season is about one-fourth of the way through. What we're likely seeing, therefore, is the extremity of results associated with small numbers of observations. This concept was first brought to my attention by Geoff Fong in the spring of 1984, when he was on the faculty at Northwestern and I was visiting during my tour of prospective graduate schools (I ultimately chose Michigan).

Geoff was telling me about his research on statistical reasoning, and he pointed out how, early in every Major League Baseball season, the list of batting leaders tends to include several players hitting above .400, yet there is virtually no chance of any player ending the season at that level (the last player to hit .400 or better for a season was, of course, Ted Williams in 1941).

This statistical document describes the small-numbers phenomenon a bit more technically:

"...all other things being equal, variation is more pronounced with small samples than with large ones. The larger your sample, the more stable your results will be. They will be less subject to the possibility that another study would produce greatly different results. A corollary is that large samples are less likely to produce extreme results. For example, assuming that you have a fair coin, it's much more difficult to get all heads when you toss a coin 50 times than when you toss it only two or three times."
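
To put rough numbers on the coin example in that passage, here is a minimal Python sketch. The function names are my own, not anything from the quoted document, and both formulas assume independent tosses with a fixed probability of heads.

```python
import math


def prob_all_heads(n_tosses: int, p_heads: float = 0.5) -> float:
    """Probability that every one of n_tosses independent tosses lands heads."""
    return p_heads ** n_tosses


def standard_error(p: float, n: int) -> float:
    """Standard deviation of a sample proportion based on n independent trials."""
    return math.sqrt(p * (1 - p) / n)


# Chance of an "extreme" all-heads result at each sample size
for n in (2, 3, 50):
    print(f"{n:>2} tosses: P(all heads) = {prob_all_heads(n):.3g}")

# Typical spread of the observed proportion of heads around the true 0.5
for n in (2, 3, 50):
    print(f"{n:>2} tosses: standard error = {standard_error(0.5, n):.3f}")
```

The all-heads probability is 1-in-4 for two tosses and 1-in-8 for three, but vanishingly small for 50, and the standard error shrinks in proportion to the square root of the sample size.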

Let's use last year's Texas Tech three-point success rate of .390 as a baseline for this year's squad (though there has been some change in personnel, most of the Red Raiders' outside shooters are still on the team, including offensive stalwart Jarrius [Jay] Jackson).

Using an online calculator for what is known as a binomial probability, we can ask how likely it is that a .390 three-point shooting team (which is what this year's Red Raiders are assumed to be, based on last year) could make 58 (or more) treys in 115 attempts. The answer is .008, a little less than 1-in-100, so what the Red Raiders are doing is already very rare statistically. Eventually, we may have to reject our "null hypothesis" that Texas Tech really has an underlying .390 probability of making threes.
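
For readers who would rather check the binomial figure themselves than rely on an online calculator, here is a minimal Python sketch. The function name `binomial_upper_tail` is my own, and the calculation assumes every three-point attempt is independent with the same .390 chance of going in, which is a simplification of real basketball.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 58 or more makes in 115 attempts for an assumed true .390 shooting team
tail = binomial_upper_tail(58, 115, 0.390)
print(f"P(at least 58 of 115 | p = .390) = {tail:.4f}")
```

The result should land in the neighborhood of the .008 quoted above; small differences can come from rounding in whichever calculator is used.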

As noted above, however, the larger the sample, the less susceptible it is to unusually high or low success rates. To approximate a full season's worth of shots (i.e., a larger sample) instead of just a quarter season, I multiplied by four both Texas Tech's current number of made threes (58 X 4 = 232) and its number of attempts (115 X 4 = 460). The ratio of 232/460 is the same as the Raiders' current three-point percentage of 50.4, but would be a much longer-term accomplishment. Again, using .390 as a baseline, the team's probability of hitting at least 50.4% of 460 three-point attempts (232 or more makes) is much tinier than before: .0000004, about 4 in 10 million.
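
The effect of stretching the same 50.4% over more shots can be sketched in Python as follows. The helper name and the intermediate half-season step (a factor of two) are my own additions for illustration; as before, shots are assumed to be independent with a fixed .390 probability.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


made, attempts, baseline = 58, 115, 0.390

# Same shooting percentage, progressively larger samples
tails = {}
for factor in (1, 2, 4):
    k, n = made * factor, attempts * factor
    tails[factor] = binomial_upper_tail(k, n, baseline)
    print(f"{k:>3} of {n:>3}: P(at least this many | p = .390) = {tails[factor]:.2e}")
```

Each doubling of the sample makes the same percentage far less probable under the .390 baseline, which is the small-numbers point in miniature.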

Another potentially relevant concept that I'd like to mention briefly is regression toward the mean, which Lady Raider basketball announcer Ryan Hyatt sometimes invokes in his radio broadcasts. Regression toward the mean simply refers to the tendency for extreme values in the early rounds of performance -- either extremely high or extremely low -- to be followed by values more in the center of the distribution.
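
Regression toward the mean can also be seen in a small simulation. The following Python sketch is purely illustrative: it invents 5,000 hypothetical teams that all have the same true .390 rate, picks out those that happen to start hot, and looks at how the same teams shoot afterward. The .45 cutoff for a "hot" start, the 115/345 split of attempts, and the seed are all my own assumptions, and no real team data are involved.

```python
import random

rng = random.Random(2006)  # fixed seed so the run is repeatable

TRUE_RATE = 0.390        # assumed underlying three-point probability
EARLY, LATE = 115, 345   # hypothetical split: first quarter vs. rest of season
HOT_START = 0.45         # hypothetical cutoff for calling an early stretch "hot"
N_TEAMS = 5000           # number of simulated teams


def shooting_pct(n_shots: int) -> float:
    """Simulate n_shots independent attempts and return the fraction made."""
    made = sum(rng.random() < TRUE_RATE for _ in range(n_shots))
    return made / n_shots


early_pcts, late_pcts = [], []
for _ in range(N_TEAMS):
    early = shooting_pct(EARLY)
    late = shooting_pct(LATE)
    if early >= HOT_START:  # keep only the teams with a hot start
        early_pcts.append(early)
        late_pcts.append(late)

mean_early = sum(early_pcts) / len(early_pcts)
mean_late = sum(late_pcts) / len(late_pcts)
print(f"hot starters: {len(early_pcts)} of {N_TEAMS}")
print(f"early average: {mean_early:.3f}, rest-of-season average: {mean_late:.3f}")
```

Because the hot starters were selected on a lucky early stretch rather than on any real difference in ability, their later shooting should fall back toward .390.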

In conclusion, the statistical phenomena of small samples and regression toward the mean both suggest that the Texas Tech men will suffer some drop-off from their current 50.4% three-point shooting percentage. You probably don't need to have a statistics teacher tell you a 50% three-point shooting clip is unlikely to be maintained for a full season, any more than you need one to tell you that baseball players batting over .400 for the first month of the season will almost certainly fall off in their averages. If, however, you have some interest in the statistical concepts associated with teams' and players' fall-off after hot starts, you've visited the right place!
