Field Sobriety Test Studies Found to Be Flawed
Posted by Lawrence Taylor on June 18th, 2012
Proponents of the so-called “standardized” field sobriety tests (SFSTs) have long pointed to federally-funded field studies which indicate a high correlation between performance on the tests and actual blood alcohol concentrations (BAC).
Subsequent studies, however, have called those conclusions into question.
Originally, the National Highway Traffic Safety Administration (NHTSA) paid a private group, the Southern California Research Institute, to conduct studies to find which among the various field sobriety tests used by police were most effective and to develop a standardized 3-test battery. SCRI subsequently reported to NHTSA that a battery of walk-and-turn, one-leg-stand and nystagmus provided a strong correlation with breath test results.
Confronted with questions about those conclusions, NHTSA later commissioned the same researcher who had conducted the original studies, Marcelline Burns, to corroborate the accuracy of her own tests of the SFSTs – rather than commission an independent source.
Burns accompanied a small number of San Diego officers conducting actual DUI investigations in the field. After administering the SFSTs, the officers were asked to guess whether suspects had blood alcohol concentrations (BAC) over or under .08%. Burns reported a 91% correlation between SFSTs and BAC over-under estimates, thereby validating the battery of tests she had helped create.
A subsequent scientific article challenged Burns’ corroboration of her own research. In Hlastala, Polissar and Oberman, “Statistical Evaluation of Standardized Field Sobriety Tests”, 50(3) Journal of Forensic Sciences 1 (May 2005), the raw data used in the validation study were obtained from NHTSA through the Freedom of Information Act. The methodology used was then reviewed and the data subjected to statistical analysis.
The methodology was found to be seriously flawed in a number of respects. For one thing, many of the suspects had very high BACs, making estimates of whether a suspect was over .08% obvious regardless of SFST performance. For another, there was no attempt to isolate the influence of SFST performance from other factors: officers estimated BACs after the field sobriety tests, but they also took into account earlier observations, such as erratic driving, slurred speech, odor of alcohol, flushed face, admissions as to amount of alcohol consumed, etc.
The most glaring defect in Burns’ corroborative study was that “all police officers participating in the study were equipped with NHTSA-approved portable breath testing devices”. In other words, the San Diego officers already had the results of portable breath tests before they were asked to estimate the BACs later obtained at the station!
After reviewing the flawed methodology, the authors statistically analyzed the raw data. Their conclusions:
“If we consider three ranges of MBAC [measured blood alcohol content], 0.00% to 0.04%, 0.04% to 0.08%, and 0.08% to 0.12%, the officers’ EBAC [estimated blood alcohol content] overestimated the MBAC 76%, 67% and 48% of the time, and underestimated it 14%, 26% and 28% of the time.”
In other words, officers relying upon field sobriety tests were far more likely to overestimate BACs than underestimate — particularly with those suspects having very low BACs.
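To make the tallying concrete, here is a minimal Python sketch of how over- and underestimation rates could be computed for each range of measured BAC. The (estimated, measured) pairs are invented purely for illustration and are not the study's data; the function name `over_under_rates` and the half-open range convention are likewise assumptions.

```python
def over_under_rates(pairs, low, high):
    """Return (share overestimated, share underestimated) for
    measured BACs in the half-open range [low, high)."""
    in_range = [(est, meas) for est, meas in pairs if low <= meas < high]
    if not in_range:
        return 0.0, 0.0
    over = sum(1 for est, meas in in_range if est > meas)
    under = sum(1 for est, meas in in_range if est < meas)
    n = len(in_range)
    return over / n, under / n


# Hypothetical (estimated BAC, measured BAC) pairs, NOT taken from the study.
sample = [
    (0.06, 0.02), (0.05, 0.03), (0.03, 0.03), (0.01, 0.02),
    (0.09, 0.05), (0.10, 0.07), (0.05, 0.06),
    (0.12, 0.10), (0.09, 0.11),
]

for low, high in [(0.00, 0.04), (0.04, 0.08), (0.08, 0.12)]:
    over, under = over_under_rates(sample, low, high)
    print(f"{low:.2f}-{high:.2f}: over {over:.0%}, under {under:.0%}")
```

The two shares need not add up to 100%, since an estimate that exactly matches the measured value counts as neither an overestimate nor an underestimate.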
“(T)he utility of the SFST depends very much on how intoxicated an individual is. Accuracy (and specificity) are low when individuals are close to 0.08% MBAC, but if the individuals are quite intoxicated, such as above 0.12%, then accuracy is high.”
In borderline cases involving persons at or under the legal limit, then, officers were very poor at estimating blood-alcohol levels based upon SFSTs. And it is these cases, of course, that are critical. Suspects with high BACs are relatively easy to single out without the help of field tests; it is for the closer cases, particularly those who are innocent (below .08%), that the SFSTs are designed. And it is with these very cases that the tests apparently fail.
Put another way, accuracy in using field sobriety tests is high when they are not needed — and low when they are.
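The arithmetic behind that point can be sketched in a few lines of Python. The counts below are hypothetical, chosen only to show how a sample dominated by clearly intoxicated drivers can yield a high overall accuracy figure even when most under-the-limit drivers are misjudged; they are not the study's numbers, and the function name is an assumption.

```python
def accuracy_and_specificity(tp, fp, tn, fn):
    """tp/fn: drivers actually at or over .08 judged over/under.
    tn/fp: drivers actually under .08 judged under/over."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    specificity = tn / (tn + fp)  # share of under-.08 drivers judged correctly
    return accuracy, specificity


# Hypothetical counts, NOT from the study: 200 stops, 170 over .08, 30 under.
tp, fn = 165, 5   # over-the-limit drivers: almost all judged "over"
tn, fp = 12, 18   # under-the-limit drivers: most wrongly judged "over"

acc, spec = accuracy_and_specificity(tp, fp, tn, fn)
print(f"accuracy {acc:.1%}, specificity {spec:.1%}")
```

With these made-up counts, overall accuracy comes out near 88% while only 40% of the under-the-limit drivers are classified correctly, which is the pattern the article's authors describe.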
For another independent study conducted by Professor Spurgeon Cole of Clemson University, in which he found field sobriety tests to be worthless, see “Are Field Sobriety Tests Designed for Failure?”



I disagree that the SFSTs are worthless. They are, of course, scientifically unreliable and inaccurate, but that was never the reason for their use. They provide law enforcement with a valuable tool that is subjective, not objective, used to manufacture probable cause to put someone in cuffs and take them to the LEC for a breath test. The SFSTs have made it exponentially easier for police and prosecutors to arrest and convict low (or .00) BAC drivers of DUI. That was the ONLY goal of their development. In that sense, SFSTs have proven to be invaluable.
Great article. After 15 years of practice I am convinced that SFSTs are based on junk science, and your blog post shows why the science is flawed. It is not hard to tell that someone is impaired at over a .1 or .15 just by observation. SFSTs are used to snare people who are around a .08 or below. Once these non-impaired people are in the system, many end up pleading to avoid the “jury trial tax” of jail. I am always amazed at the high blows on people who perform the SFSTs well. I think the BAC machines are equally based on junk science and hope you write about that soon.
I have seen people well above a .200 perform the exercises very well, and people below a .050 unable to perform them at all. Those are the extremes of cases. The SFSTs were established as a norm, and to be applied accordingly. Assuming there is no combination of drug influence, some people’s bodies react differently to an equivalent amount of alcohol in their system.
But remember, an officer is not making the decision to arrest based solely on a subject’s performance during these exercises. It’s the totality of the circumstances. The SFSTs are just one piece of the puzzle, a specific color to a painting. I support the research behind both sides, but will lean toward effective and reliable every time.