The role of RCTs within the police professionalization agenda
The Randomised Control Trial (sometimes ‘Randomised Controlled Trial’ and occasionally ‘Randomised Clinical Trial’), or ‘RCT’, is a scientifically conducted experiment, considered the ‘gold standard’ for some forms of empirical research. The RCT is used extensively in clinical and drug trials, particularly for assessing the effect of a new treatment or medicine. The reference to a ‘gold standard’ probably reflects the widely held acceptance of the RCT as having the greatest internal validity when testing for cause and effect relationships, compared, for example, with more qualitative approaches (which presumably are at best of only ‘silver’ or ‘bronze’ standard). RCTs have been used within medical research since at least the 1950s (although randomised experimental methods were used in agricultural research in the 1920s and 30s) and by 2015 the Cochrane Central Register of Controlled Trials (CENTRAL) contained entries for many hundreds of RCTs used in diabetes research alone (authors’ calculations derived from Cochrane Community, 2015).
It is, however, only more recently that Evidence-Based Policing (EBP) has employed RCTs, perhaps drawing inspiration from Evidence-Based Medicine and from parallel developments in experimental criminology. Certainly, RCTs are an important method in the ‘testing’ phase of the EBP edict to ‘target, test and track’ the use of police resources (Sherman, 2014). More generally, the RCT is widely held to be an important measure of the scientific quality of research into ‘what works’ in policing. For example, the Center for Evidence-Based Crime Policy (based at George Mason University) uses an ‘Evidence-Based Policing Matrix’ to assess the literature on ‘What Works in Policing’ (George Mason University, 2013a). For a study to be included in the Matrix it must ‘either be a randomised controlled experiments (sic) or quasi-experiments using matched comparison groups or multivariate controls’ (George Mason University, 2013b). In the UK, the College of Policing’s ‘What Works for Crime Reduction Centre’ produces regular briefings on ‘What works in policing to reduce crime’ that utilise the Campbell Collaboration Systematic Review approach (College of Policing, 2015). The Campbell Review methodology emphasises that ‘With rare exceptions, the best evidence […] is provided by randomized controlled trials (RCTs)’ (The Campbell Collaboration, 2014, p.9).
Usually an RCT is employed by researchers in crime and policing in an attempt to gain knowledge about an underlying ‘population’. This is a statistical term that refers to the complete set of ‘objects’ (such as people) that share a common quality or characteristic (usually referred to as a ‘variable’). The quality or characteristic is normally something that can be measured in a quantitative way. For example, we might be interested in the population of all victims of domestic abuse in the city of Manchester in 2015. One characteristic all of the subjects (the victims) will have in common is their likelihood of being a victim of domestic abuse in the future, measured by the number of times this occurs (the rate of repeat victimisation).
An RCT might be carried out to test what effect a particular police action has on this characteristic: that is, does the particular police intervention decrease, increase or leave unchanged the likelihood of further victimisation? In this case the population of all victims of domestic abuse in Manchester would be both too large and too difficult to identify completely to form the basis of the experiment, and so instead a ‘sample’ would be used. This is the first scientific compromise in the experiment, as any effects we might discern for the sample do not necessarily hold true for the population, especially if based on one RCT alone. A particular issue in sampling is that the sample might be in part ‘self-selecting’. In the domestic abuse RCT the sample is likely to be predominantly made up of those people who have themselves reported allegations of domestic abuse to the authorities (as distinct from, say, a report from a neighbour). This self-selecting sample is probably not entirely representative of the population.
After sampling, criteria for inclusion in the RCT are applied to members of the sample. For example, it might be determined that every member of the sample should be a subject of the experiment. However, there may be theoretical reasons why we need to apply criteria for inclusion and exclusion – for example, based on the likelihood of a potential subject in the experiment being able to give informed consent. The application of the inclusion criteria might unwittingly introduce some form of bias into the outcome, but in any case the criteria need to be made explicit.
After application of the criteria, members of the sample (for example, victims of domestic abuse) are then allocated to one of two groups: one the ‘intervention group’, the other the ‘control group’. With RCTs, the method of selection is random allocation using a ‘double blind’ approach. This means that each member of the sample group (the ‘subject’) has an equal chance of being selected, and neither the subject nor the researchers involved (the ‘experimenters’) nor any other participants know to which group the subject has been allocated. Often this double blind random allocation is determined by random number generation (to avoid human intervention and reduce unconscious bias) and occurs at a pre-determined time within the RCT (for example, when the police respond to the report of domestic violence in Manchester in 2015).
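The random allocation step can be sketched in a few lines of code. This is an illustrative sketch only, not drawn from any published trial protocol: the subject identifiers and the 50/50 split are assumptions, and a real trial would keep the allocation list hidden from subjects and experimenters while logging the seed for audit.

```python
import random

def allocate(subject_ids, seed=None):
    """Randomly allocate subjects to 'intervention' or 'control'.

    A seeded pseudo-random shuffle is used so the allocation can be
    reproduced and audited later.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "intervention": shuffled[:half],
        "control": shuffled[half:],
    }

# Hypothetical subject identifiers (e.g. anonymised case numbers).
groups = allocate(range(1, 101), seed=42)
print(len(groups["intervention"]), len(groups["control"]))  # 50 50
```

Because every subject has the same chance of landing in either group, any pre-existing differences between subjects should, on average, be balanced across the two groups.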
The ‘intervention group’ receives the new ‘treatment’ whereas the ‘control group’ does not. In medical drug trials the control group instead receives either an inert placebo or the current best treatment. (With an RCT as part of EBP, any human subjects within the control group are likely to experience current best practice rather than a form of non-effective ‘placebo’ intervention – largely on ethical grounds.) With a rigorously conducted RCT it is important that both the subjects and the experimenters (which, in terms of an RCT within EBP, is likely to include the police officers implementing the new approach with the treatment group) are ‘blind’ to whether the ‘treatment’ or the ‘placebo’ is being administered.
After implementation, the outcome(s) of the trial are then measured for both the intervention group and the control group. Often this means measuring the change in a numerical quantity, such as the number of times an event has occurred for the variable being researched, or the average rate at which it occurred. So, for example, researchers would measure rates of re-victimisation for both the intervention group and the control group for the sample of domestic abuse victims in Manchester. In many RCTs conducted within healthcare the outcomes for each group are also measured on a number of further occasions, and even the whole RCT might be repeated. This repetition is undertaken to check that the effect, if any, of the new treatment does not simply fade away after a period of time, and was not the product of chance selection from the outset.
In experimental terms, the conclusion of an RCT is the comparison between the outcome for the intervention group and the outcome for the control group for the particular variable being investigated. Usually this means comparing two numerical values (such as 16% and 22%) in a statistically robust fashion, together with a theoretical understanding of the likely underlying form that the data takes. By chance alone, measurements of a particular variable for the intervention and control groups are likely to differ; the key question is whether the difference is large enough for us to conclude that the outcomes are significantly different. A decision about whether a difference is significant is not simply a matter of a researcher’s judgement, however experienced and qualified the individual, but is instead the consequence of a pre-determined statistical test. The particular test employed depends on the nature of the data collected, the design of the RCT and other factors, but commonly used tests in RCTs include the so-called t-tests and chi-squared tests. Most statistical tests used to assess the difference in outcomes of an RCT come with a level of significance (often cited as ‘p’; values of 0.05 and 0.01 are often employed) – a way of judging how likely it is that an observed difference could have arisen by chance rather than being a ‘genuine’ effect.
RCTs as part of EBP have led to some notable insights into ‘what works’ in policing and crime reduction. Examples include evaluating the effects of CCTV on crime, the initial police response to abuse within the family and street-level drug enforcement (College of Policing, 2015). These successes may have contributed to what Greene (2014, p.193) calls the narrowing of the ‘cognitive lens’ through which policy makers and others view policing research. However, there is a danger that this narrowing means that we both undervalue the alternatives to the RCT (as not meeting the ‘gold standard’) and ignore the inherent problems in applying RCTs within a crime and policing context. Putting aside a possible philosophical critique of RCTs (e.g. epistemological questions and opposition on grounds of positivism) and practical limitations within policing itself (e.g. the lack of qualified and motivated researchers), we can instead highlight some likely problems inherent in employing a rigorous RCT approach as a means of researching policing.
The first and possibly most fundamental question that a police researcher needs to ask before undertaking an RCT is whether or not there can be any genuine uncertainty about the effect of the intervention (see Lilford and Jackson, 1995 and the concept of ‘equipoise’ in medical research – is a trial ethical if there is no preference between ‘competing’ treatments?). If there is already good reason to judge that the intervention group is to receive the better treatment, is it morally acceptable to allocate subjects to the control group? In many of the published RCTs within EBP there is a lack of pilot studies, and insufficient review of the evidence before hypotheses are established for testing, both of which would help mitigate this problem. One could argue that RCTs should be used only where there is some well-founded doubt concerning the effect of the intervention. It is hardly surprising that an RCT-based experiment in Queensland, Australia (involving 2,762 drivers) found that if drivers subjected to a breath test for alcohol were treated by police officers fairly, politely, respectfully and with apparent interest in the views of the driver, then the drivers tended to have more trust and confidence in the police immediately after the event (Murphy et al., 2014).
A further problem is that with RCTs used within EBP the inclusion and exclusion criteria (for sampling from the population) are often missing from the published description of the RCT. In healthcare clinical RCTs, by contrast, ‘co-morbidity’ (where a subject of an RCT has two or more illnesses) is often an exclusion criterion when sampling, given its tendency to be a confounding factor. The criminological and policing literature is replete with examples of analogous co-morbidity.
Further, the use of randomisation methods and double blind implementation are key features of RCTs. Indeed, Oakley et al. (2003, p.171) argued that the only ‘special claim’ to be made for RCTs is the use of random allocation to minimise bias in creating intervention and control groups. However, in terms of implementing a rigorous ‘gold standard’ RCT within EBP, it is often almost impossible for police experimenters not to be aware that they are employing the ‘treatment’ and not the ‘placebo’. Indeed, both experimenters and subjects will often know whether they have been allocated to the intervention or to the control group, as this is beyond the control of the researchers. (A police officer will be aware that they have a new ‘script’ to use when conducting a random breath test in Australia, or that they are wearing body-worn video to a domestic incident in the UK.)
It is also the case that response bias, particularly arising through self-selection or non-response, can be a serious potential problem for RCTs within medicine and healthcare (Antrobus et al., 2013), but it is perhaps even more acute in policing and crime reduction, where the subjects are often offenders or victims, not ‘patients’.
Finally, there are also significant challenges encountered at the summative phase of an RCT-based EBP experiment – that of drawing valid and reliable conclusions. These challenges include premature (often implicit) generalisation from the outcome found for the sample to the underlying population without first repeating the RCT or triangulating the results with other forms of experimental research or complementary qualitative methods (see below). A statistically significant result concerning an intervention with a self-selected sample of domestic abuse victims in Manchester in 2015, with ill-defined inclusion/exclusion criteria and quasi-random allocation to groups, can easily ‘slip’ into becoming a confident observation about a particular form of intervention as a policing strategy for all domestic abuse victims in the city, or even the nation. Even if we assume that the RCT has been conducted in a rigorous scientific fashion, meeting the ‘gold standard’ on the way, and that the difference between outcomes for intervention and control groups is statistically significant, this alone does not guarantee a ‘successful’ EBP outcome. After all, a difference between average police patrol response rates of 23.8 minutes and 22.6 minutes for control and intervention groups could well be statistically significant at the 5% or even 1% level, but at what cost is the reduction of 1.2 minutes? In the context of policing, other factors are invariably taken into consideration, such as the resource allocation costs involved in implementing alternative interventions for a perhaps relatively modest (although statistically significant) gain.
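The response-time example can be checked numerically. In the sketch below the standard deviation of 10 minutes and the group sizes of 5,000 incidents are invented for illustration; the point is that with large samples even a 1.2-minute difference yields an emphatic p-value, while saying nothing about whether the gain justifies its cost.

```python
import math

def z_test_means(mean1, mean2, sd, n):
    """Two-sample z-test for a difference in means, assuming a known
    common standard deviation and equal group sizes n."""
    se = sd * math.sqrt(2 / n)
    z = (mean1 - mean2) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# 23.8 vs 22.6 minute average responses; sd and n are assumptions.
z, p = z_test_means(23.8, 22.6, sd=10.0, n=5000)
print(round(z, 1), p < 0.01)  # a tiny difference, yet highly significant
```

Statistical significance here is largely a function of sample size; the separate question of practical or cost-effective significance is not answered by the test at all.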
Police research beyond RCTs
RCTs are undoubtedly a valid and reliable means of testing the effect of interventions in policing or within crime reduction. However, they are not the only, nor necessarily always the best, way of determining ‘what works’ in policing and crime reduction. (In fairness to those promoting RCTs as part of EBP, it is rarely claimed that they are.) Firstly, RCTs are not the only experimental method that might provide evidence of a causal relationship between variables and permit generalisations from the sample to the population. Secondly, and perhaps more fundamentally, RCTs alone (and experimental methods more generally) are unable to answer some of the higher-level questions of ‘what works’ in policing and crime reduction. Indeed, as Hough (2010, p.11) notes (referring to RCTs and reducing re-offending), even for the middle-level questions ‘the right strategy for getting closer to answers is not to invest in a huge programme of randomised controlled trials, but to construct and test middle-level theories about how to change people’s behaviour’.
One of the attractions of RCTs to the police community must surely be their apparent scientific nature (as exemplified by statistical testing) and their pedigree in the healthcare sciences and elsewhere. In this sense, the RCT is implicitly seen as standing in contrast to the ‘less scientific’ and more qualitative methods traditionally used within the social sciences. However, we would argue that it is a false dichotomy to cast ‘qualitative’ and ‘quantitative’ research of policing as somehow being in opposition. An analogy is the similar caricaturing of investigative reasoning within criminal investigation as either inductive or deductive, whereas reasoning in reality is often abductive in nature, which subsumes aspects of both. Combining methods (so-called ‘mixed methods’ research) to test hypotheses enables a research question to be addressed from different perspectives and acknowledges the practical reality that RCTs in the context of policing can rarely, if ever, be conducted in the same way, and meet the same ‘gold standard’, as, say, RCTs used for pharmaceutical research. Evidence-Based Policing owes some of its history to Evidence-Based Medicine, and it is telling that in recent years within healthcare mixed methods (rather than RCTs alone) have become the ‘dominant paradigm’ in research (Doyle et al., 2009, p.175).
Antrobus, E., Elffers, H., White, G. & Mazerolle, L. (2013) Nonresponse Bias in Randomized Controlled Experiments in Criminology: Putting the Queensland Community Engagement Trial (QCET) Under a Microscope. Evaluation Review, 37, pp. 197-212.
Oakley, A., Strange, V., Toroyan, T., Wiggins, M., Roberts, I. & Stephenson, J. (2003) Using Random Allocation to Evaluate Social Interventions: Three Recent U.K. Examples. The Annals of the American Academy of Political and Social Science, 589, p. 170.
Doyle, L., Brady, A-M. & Byrne, G. (2009) An overview of mixed methods research. Journal of Research in Nursing, 14(2), pp. 175-185.
Sherman, L. W. (2014) The Future of Policing Research, Statement to the Division on Policing, American Society of Criminology, San Francisco, November 20, 2014 Available at: http://www.crim.cam.ac.uk/courses/police/prospective/ASC%20Sherman%202014%20Policing%20Division%20The%20Future%20of%20Policing%20Research%20final.pdf
Cochrane Community (2015) Cochrane Central Register of Controlled Trials (CENTRAL) Available at: http://community.cochrane.org/editorial-and-publishing-policy-resource/cochrane-central-register-controlled-trials-central
George Mason University (2013a) Center for Evidence-Based Crime Policy, What Works in Policing?. Available at: http://cebcp.org/evidence-based-policing/what-works-in-policing/
George Mason University (2013b) Center for Evidence-Based Crime Policy, Inclusion Criteria & Methods Key Available at: http://cebcp.org/evidence-based-policing/the-matrix/inclusion-criteria-methods-key/
College of Policing (2015) What Works Briefings Available at: http://whatworks.college.police.uk/Research/Briefings/Pages/default.aspx
The Campbell Collaboration (2014) Campbell Collaboration Systematic Reviews: Policies and Guidelines Version 1.0
Greene, J. (2014) New Directions in Policing: Balancing Prediction and Meaning in Police Research Justice Quarterly Vol. 31, No. 2, 193-228.
Lilford, R. & Jackson, J. (1995) Equipoise and the ethics of randomisation Journal of the Royal Society of Medicine Vol. 88 October 1995 pp. 552-559.
Murphy, K., Mazerolle, L. & Bennett, S. (2014) Promoting trust in police: findings from a randomised experimental field trial of procedural justice policing. Policing and Society: An International Journal of Research and Policy, 24(4), pp. 405-424.
Hough, M. (2010) Gold standard or fool’s gold: the pursuit of certainty in experimental criminology. Criminology & Criminal Justice, 10(1), pp. 11-22.
Anderson, C. (2008) The End of Theory: The Data Deluge Makes the Scientific Method Obsolete Wired Magazine Available at: http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
West, G. (2013) Big Data Needs a Big Theory to Go With It Scientific American, Vol. 308, Issue 5 Available at: http://www.scientificamerican.com/article/big-data-needs-big-theory/