9 Keys To Successful Randomized Impact Evaluations
By Emmanuel Ekuri, Sabbatical Fellow
June 8, 2012
The APHRC Education team is currently designing a ‘policy pilot’ intervention involving a randomized experiment to improve time-on-task, pupil engagement, academic learning time, and learning outcomes at the basic education level. The intervention will seek to strengthen capacity at this level, resulting in improved teacher/pupil interactions, improved pupil attendance, improved children's participation in class activities, and teachers' use of learning materials, new practices, and appropriate assessment approaches; the long-term outcomes are improved test scores in literacy and numeracy. Recently, I participated in an Executive Education Course on conducting high-quality, policy-relevant, rigorous impact evaluations for education programs. Here are 9 key considerations when undertaking a randomized impact evaluation.
- There are many ways to estimate a program's impact, and different methods can generate different impact estimates. Randomized experiments can be highly complex and challenging, but, if properly designed and conducted, they provide the most credible method for estimating a program's impact: they remove bias and increase credibility and transparency.
- To credibly detect a given effect size in a randomized experiment, a sufficiently large sample is essential. Randomization makes our estimates accurate (unbiased), while sample size makes them precise (gives us confidence in our estimates).
- Cluster randomization minimizes contamination across individuals, and it is very important to randomize an adequate number of groups. Very often, the number of individuals within clusters matters less than the total number of clusters.
- Stratification reduces the sample size needed to achieve a given power, because it reduces the variance of the outcome of interest within each stratum as well as the correlation of units within clusters.
- Sample size calculations are a craft: they depend on parameters whose true values are unknown and will vary, so they involve some guesswork and benefit from pilot testing.
- Ensuring reliability in survey data collection is very important and time-consuming. The most important part is probably piloting. It is often good to start with a very basic set of questions asked in an open-ended way, more in a qualitative or focus-group style; over time, the lessons from this can be refined into a survey.
- Attrition and spillovers constitute serious threats to the validity of experiments. Measuring programme impact in the presence of spillovers is best achieved by choosing a unit of randomization that encompasses the spillover. For instance, if we expect externalities that operate entirely within a school, randomization at the level of the school allows estimation of the overall effect.
- The more outcomes you examine, the higher the chance that at least one appears significantly affected by the program purely by chance, so report results on all measured outcomes. Include multiple outcomes, especially those that occur with enough frequency to detect an impact given your sample size.
- As with every other type of study, there are many threats to the internal and external validity of randomized evaluations. However, randomized trials facilitate simple and transparent analysis, since they leave few "degrees of freedom" in data analysis, and they allow clear tests of the validity of the experiment.
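To make the sample-size point above concrete, here is a minimal sketch (not from the course; the 0.2 and 0.4 standard-deviation effect sizes are illustrative assumptions) using the standard normal-approximation formula for a two-arm comparison, n per arm = 2(z_alpha/2 + z_beta)^2 / delta^2:

```python
import math

def n_per_arm(mde_sd, alpha_z=1.96, power_z=0.84):
    """Sample size per arm to detect a standardized effect of `mde_sd`
    standard deviations at 5% significance and 80% power
    (normal approximation, equal variances, individual randomization)."""
    return math.ceil(2 * (alpha_z + power_z) ** 2 / mde_sd ** 2)

# Halving the detectable effect quadruples the required sample:
print(n_per_arm(0.4))  # 98
print(n_per_arm(0.2))  # 392
```

This is why power calculations deserve attention early in the design: the minimum detectable effect you choose drives the budget of the whole study.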
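The cluster-randomization point can be illustrated with the standard design-effect formula, DEFF = 1 + (m − 1)ρ; the intra-cluster correlation of 0.1 and the school counts below are assumed values for illustration:

```python
def design_effect(cluster_size, icc):
    """Variance inflation from cluster randomization:
    DEFF = 1 + (m - 1) * rho, where m is the cluster size and
    rho is the intra-cluster correlation of the outcome."""
    return 1 + (cluster_size - 1) * icc

def effective_n(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)

# With rho = 0.1, 20 schools of 100 pupils each (2,000 pupils) are worth
# far fewer 'independent' pupils than 100 schools of 20 pupils each:
print(round(effective_n(20, 100, 0.1)))   # 183
print(round(effective_n(100, 20, 0.1)))   # 690
```

Same total sample, very different power, which is why the number of clusters usually matters more than the number of pupils per cluster.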
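As a sketch of how the stratified random assignment mentioned above works in practice (the school names and the urban/rural stratifier are hypothetical):

```python
import random
from collections import defaultdict

def stratified_assign(units, stratum_of, seed=0):
    """Randomly assign half of each stratum to treatment.
    `units` is a list of unit ids; `stratum_of` maps an id to its stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        half = len(members) // 2
        for u in members[:half]:
            assignment[u] = "treatment"
        for u in members[half:]:
            assignment[u] = "control"
    return assignment

# Example: stratify 40 hypothetical schools by region before randomizing.
schools = [f"school_{i}" for i in range(40)]
region = lambda s: "urban" if int(s.split("_")[1]) % 2 == 0 else "rural"
assignment = stratified_assign(schools, region)
urban_treated = sum(1 for s in schools
                    if region(s) == "urban" and assignment[s] == "treatment")
print(urban_treated)  # exactly half of the 20 urban schools: 10
```

Balance on the stratifying variable is guaranteed by construction, rather than left to chance as in a simple coin flip.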
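The multiple-outcomes warning can be quantified with a quick sketch of the familywise false-positive rate when testing k independent outcomes at the 5% level (independence is a simplifying assumption):

```python
def prob_any_false_positive(k, alpha=0.05):
    """Chance that at least one of k independent null outcomes
    crosses the alpha significance threshold by luck alone."""
    return 1 - (1 - alpha) ** k

print(round(prob_any_false_positive(1), 3))   # 0.05
print(round(prob_any_false_positive(10), 3))  # 0.401

# A Bonferroni-style fix tests each outcome at alpha / k instead:
print(round(prob_any_false_positive(10, alpha=0.05 / 10), 3))  # 0.049
```

With ten outcomes there is roughly a 40% chance of at least one spurious "significant" result, which is why all measured outcomes should be reported.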