Improved methods for studying hard-to-reach populations published in PNAS

Respondent-driven sampling is a popular network-based approach to sampling hard-to-reach populations, in which participants recruit their contacts into the sample through a coupon system. It has been particularly useful in HIV research, where the individuals most at risk (e.g., people who inject drugs) are unlikely to participate in conventional sampling schemes. Many major health organizations, including the Centers for Disease Control and Prevention and the World Health Organization, use this approach to estimate the prevalence of HIV in these at-risk groups. Unfortunately, this type of network sampling has a significant drawback: because referred contacts often share similar characteristics, observations within a sample are highly correlated, which can lead to exceedingly variable estimates.

In work that just appeared in the Proceedings of the National Academy of Sciences, IFDS members Sebastien Roch and Karl Rohe introduce a new estimation technique for respondent-driven sampling whose variability is substantially reduced and, surprisingly, comparable to that expected under more conventional sampling. The technique is based on the classical statistical idea of generalized least squares and points the way to entirely different classes of estimators that account for network structure.
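The article does not give the estimator's formulas, so the sketch below only illustrates the general idea of generalized least squares (GLS) for a mean. It is not the authors' estimator. It assumes a toy model: participants form a complete binary referral tree, and the covariance between two participants' responses is sigma2 * rho**d, where d is the number of referral steps separating them. Under that assumed covariance Sigma, GLS weights the responses by w = Sigma^-1 1 / (1' Sigma^-1 1), with variance 1 / (1' Sigma^-1 1), whereas the plain sample mean has variance 1' Sigma 1 / n^2. All function names and parameter values (tree_distance, tree_covariance, gls_weights, rho = 0.8, n = 63) are hypothetical.

```python
# Toy illustration of generalized least squares (GLS) for estimating a mean
# from correlated network samples. ASSUMPTIONS (not from the article):
#   - participants form a complete binary referral tree (node k's recruiter
#     is node (k - 1) // 2; node 0 is the seed);
#   - Cov(y_i, y_j) = sigma2 * rho**d(i, j), with d the referral distance.
# All function names here are hypothetical.


def tree_distance(i, j):
    """Number of referral steps between nodes i and j in the binary tree."""
    d = 0
    while i != j:
        # In heap-style indexing a larger index is never shallower, so
        # stepping up from the larger index walks toward the common ancestor.
        if i > j:
            i = (i - 1) // 2
        else:
            j = (j - 1) // 2
        d += 1
    return d


def tree_covariance(n, rho, sigma2=1.0):
    """Assumed covariance matrix: Sigma[i][j] = sigma2 * rho**d(i, j)."""
    return [[sigma2 * rho ** tree_distance(i, j) for j in range(n)] for i in range(n)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def gls_weights(Sigma):
    """GLS weights Sigma^-1 1 / (1' Sigma^-1 1) and the estimator's variance."""
    v = solve(Sigma, [1.0] * len(Sigma))  # v = Sigma^-1 1
    total = sum(v)                         # 1' Sigma^-1 1
    return [vi / total for vi in v], 1.0 / total


def gls_mean(y, weights):
    """GLS estimate of the mean: a weighted average of the responses."""
    return sum(w * yi for w, yi in zip(weights, y))


def sample_mean_variance(Sigma):
    """Variance of the unweighted sample mean: 1' Sigma 1 / n^2."""
    n = len(Sigma)
    return sum(sum(row) for row in Sigma) / n ** 2


if __name__ == "__main__":
    n, rho = 63, 0.8  # a 6-level binary tree; rho chosen arbitrarily
    Sigma = tree_covariance(n, rho)
    _, var_gls = gls_weights(Sigma)
    print("independent sampling:", 1.0 / n)
    print("sample mean:         ", sample_mean_variance(Sigma))
    print("GLS:                 ", var_gls)
```

Setting rho = 0 makes both variances equal to sigma2 / n. For rho > 0 on a tree, the row sums of Sigma differ between participants (the seed and the leaves sit in different positions), so the equal weights of the sample mean are no longer optimal and GLS does strictly better under this toy model. How much better depends heavily on the assumed covariance, which in practice would itself have to be modeled or estimated.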