Program Review Criteria

The term “effective program” can be ambiguous. A program may be very effective in addressing one problem or outcome (such as reducing alcohol use) but not another (such as decreasing anxiety). To help practitioners find programs that address problems relevant to their work, NREPP rates programs at the outcome level. This makes it clearer which specific outcomes a program effectively addresses.

NREPP-certified reviewers use the NREPP Outcome Rating Instrument to review each eligible outcome in the evidence base of evaluation studies for a program. Program outcomes are reviewed on the following four dimensions:

1. Rigor. Rigor assesses the strength of the study methodology. It is composed of the following elements:

Design/Assignment. Reviewers look at how the treatment and comparison groups were formed. The best designs try to ensure the two groups are the same or equivalent at the beginning of the intervention. If the two groups are different, an evaluation cannot tell us if changes over time are due to the program’s impact or due to preexisting differences between the two groups.
Intent-to-Treat – Original Group Assignment (ITT-OGA). For this element, reviewers examine the degree to which the analysis preserves assignment of individuals/clusters to their original groups, regardless of the intervention they actually received or of how much of the intervention was received. For instance, even though some of the treatment group may fail to start the intervention, they still need to be included as part of the treatment group.
Statistical Precision. This element requires reviewers to think about the size of study groups. With larger sample sizes, there is a greater chance the evaluation will be able to detect the impact of a program.
Pretest Equivalence. This element requires reviewers to use information from a study to determine whether there were significant differences between study groups on observed variables at pretest.
Pretest Adjustment. When groups are very similar, but not equivalent, researchers need to take preexisting differences into account in their analysis of the data by adjusting for pretest scores. This approach helps us determine if any changes in outcomes are due to the program or due to preexisting differences in the groups.
Analysis Method. Researchers have many ways to analyze data, so it is important for them to select the method for analysis that best suits the data.
Other Threats to Internal Validity. For this element, reviewers consider whether factors other than the program’s impact could account for changes in outcomes. For instance, if one group experiences an event unrelated to the intervention that the other group does not experience, the changes in outcomes for that group may be related to the program or to the event, and the evaluation cannot tell the two apart.
Measurement Reliability. This element assesses the consistency of a measure—for example, whether the same results are observed under similar conditions. (This element is combined with validity to produce a score for measurement quality.)
Measurement Validity. This element assesses whether the measurement appears to describe or quantify what it says it will measure or what the researchers say they want to measure. (This element is combined with reliability to produce a score for measurement quality.)
Attrition. This element considers how many participants drop out of one or both groups, relative to how many were originally assigned and how many are included in the analysis. When participants drop out of a study, it can affect the equivalence of the groups.
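
The ITT-OGA and Attrition elements above can be illustrated with a minimal sketch. It is not part of the NREPP Outcome Rating Instrument: the `Participant` record, the function names, and the sample data are hypothetical, and reporting both overall and differential attrition is an assumption about how the figures might be summarized.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Participant:
    assigned: str             # original group: "treatment" or "comparison"
    started: bool             # whether the person actually began the intervention
    outcome: Optional[float]  # posttest score, or None if lost to follow-up


def itt_group_means(people):
    """Mean outcome by ORIGINAL assignment, regardless of services received."""
    scores = {}
    for p in people:
        if p.outcome is not None:
            scores.setdefault(p.assigned, []).append(p.outcome)
    return {group: mean(values) for group, values in scores.items()}


def attrition(people):
    """Return (overall attrition, differential attrition between groups)."""
    def rate(members):
        return sum(p.outcome is None for p in members) / len(members)

    treatment = [p for p in people if p.assigned == "treatment"]
    comparison = [p for p in people if p.assigned == "comparison"]
    return rate(people), abs(rate(treatment) - rate(comparison))


# Hypothetical data: one treatment member never started the intervention
# but is still analyzed in the treatment group; another dropped out.
people = [
    Participant("treatment", True, 10.0),
    Participant("treatment", True, 12.0),
    Participant("treatment", False, 20.0),
    Participant("treatment", True, None),
    Participant("comparison", False, 18.0),
    Participant("comparison", False, 22.0),
    Participant("comparison", False, 20.0),
    Participant("comparison", False, 16.0),
]

print(itt_group_means(people))
print(attrition(people))
```

In this made-up sample, the participant who never started the intervention still counts toward the treatment group mean, and the single dropout appears only in the attrition figures.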

2. Effect Size. An effect size is a way to measure whether a program had an impact, how big that impact was, and whether it helped or hurt the treatment group. Effect sizes are calculated when evaluation studies provide the data needed to do so.
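
As an illustration only, one common effect size for comparing two groups is the standardized mean difference. The text above does not say which statistic NREPP calculates, so the choice of Cohen's d with a pooled standard deviation, the small-sample Hedges' g correction, the function names, and the sample scores are all assumptions.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, comparison):
    """Standardized mean difference using the pooled sample variance."""
    n1, n2 = len(treatment), len(comparison)
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(comparison)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(comparison)) / sqrt(pooled_var)


def hedges_g(treatment, comparison):
    """Cohen's d with the usual approximate small-sample bias correction."""
    n = len(treatment) + len(comparison)
    return cohens_d(treatment, comparison) * (1 - 3 / (4 * n - 9))


# Hypothetical posttest scores for two small groups.
treatment_scores = [2.0, 4.0, 6.0]
comparison_scores = [1.0, 3.0, 5.0]

d = cohens_d(treatment_scores, comparison_scores)
g = hedges_g(treatment_scores, comparison_scores)
print(d, g)
```

The sign shows which group scored higher, and the magnitude shows how large the difference is relative to the spread of scores. Whether a positive value means the program helped or hurt depends on the outcome: a higher score is desirable for coping skills but undesirable for alcohol use.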

3. Program Fidelity. Reviewers examine the evaluation studies to determine if the program was delivered as intended and to the target population. This dimension is composed of the following elements:

Service Utilization. This element examines the extent to which the intended population received the intended services. In other words, did the program reach the appropriate target population?
Service Delivery. This element assesses whether the participants received the proper amount, type, and/or quality of service or treatment. In other words, it is a measure of the degree to which the core program services or components are implemented as depicted in the conceptual framework.

4. Conceptual Framework. This dimension is concerned with how clearly the components of a program are articulated. It is composed of the following elements:

Program Goals. This element assesses the degree to which the program goals are clearly defined. A program goal describes the change the program aims to accomplish. Moreover, to be meaningful, the program goals must be realistically attainable as the result of the program services. In other words, there should be a reasonable connection between what the program does and what it intends to accomplish.
Program Components. This element assesses the degree to which the program activities or components are sufficient to attain the intended goal. Programs are generally composed of many different components or activities, but core components refer to the essential functions and activities that are judged necessary to produce the desired outcomes and directly related to a program’s theory of change.
Theory of Change. This element rates the plausibility of the program impact theory. It should explain why the program effects change. The presumption that a program will create some benefit for the intended target population depends on the occurrence of some causal mechanism that begins with the target's interaction with the program and ends with improved circumstances in the target population. For instance, a program may provide psychotherapy, which improves coping skills, which in turn decreases stress and anxiety.

By rating the elements within these dimensions, reviewers determine the strength of the methodology (rigor), effect size, program fidelity, and conceptual framework. Numerical values are assigned to each element in the NREPP Outcome Rating Instrument (with the exception of effect size). To ensure consistency across reviews, the dimensions include definitions and other guidance that reviewers consider when rating the elements. Reviewers also note any other information of particular importance. See Review Process for more details on how these dimensions are used to form the outcome rating.

For more information on the terms used in the dimensions of the NREPP Outcome Rating Instrument, see the Glossary.

Information about the ratings previously used for assessing programs is available separately.