DEVELOPING AND PRESENTING PERFORMANCE MEASURES FOR RESEARCH PROGRAMS

The implementation of the Government Performance and Results Act (GPRA) of 1993 will present many challenges to managers across the Federal Government. Adapting GPRA requirements to the Federal research environment will be especially challenging.[1] Applied constructively, GPRA can have a positive effect on the quality and innovativeness of our scientific endeavors. Conversely, serious detrimental effects can occur if it is applied incorrectly. Federal researchers and managers representing a cross-section of departments and agencies have met over the last six months in a roundtable forum to discuss the unique circumstances surrounding the development of performance measures for research programs. The following observations and model for research performance measures are the result of those discussions.

Purpose:

This paper articulates an approach that Government research organizations can use in applying the principles of GPRA to a wide range of federally supported research activities.

Background:

In 1993, Congress passed and the President signed into law P.L. 103-62, the Government Performance and Results Act (GPRA). The intent of the statute was to increase the effectiveness, efficiency, and accountability of the Federal Government. GPRA requires each agency to:

* develop a five-year strategic plan;
* establish national goals;
* identify outcome and output performance measures, with an emphasis on outcome measures;
* develop performance indicators;
* beginning with the FY 1999 budget submission, include a one-year performance plan that defines what progress will be made in that fiscal year toward achieving the goals in the strategic plan; and
* each year, beginning at the end of FY 1999, prepare a report that assesses actual performance against planned performance.

Observations from the forum:

1. The results of research program performance can be measured. The indicators that can be used will vary between basic and applied research programs.

2. The Federal research community recognizes the importance and desirability of measuring performance and results and reporting them to the executive and legislative branches of Government and to the public. Such measures are also useful in the internal management of these programs.

3. Measures can be developed proactively by research organizations in consultation with their customers and stakeholders. Careful identification of the full range of customers, stakeholders, and partners will aid the selection of appropriate performance measures.

4. It is appropriate and in the public interest that the Federal research community define how its achievements will be measured and begin using the agreed-upon measures as soon as possible.

5. The cause-effect relationships between research outputs and their eventual outcomes are complex. Often it is extremely difficult to quantify these relationships empirically, even though obvious logical relationships exist between the outputs and outcomes. The difficulties arise from (1) the long time delays that often occur between research results and their eventual impacts, (2) the fact that a specific outcome is usually the result of many factors, not just a particular research program or project, and (3) the fact that a single research output often has several outcomes, often unforeseen, rather than a single unique outcome (see Attachment 1 for examples).
Consequently, the cause-effect relationship between research outputs and their resultant outcomes should be described in terms of logical causality. Quantitative empirical demonstrations should not be required and are often not even possible.

6. As envisioned in GPRA, strategic planning is a prerequisite to performance measure development. Performance measures should be derived directly from a research program's goals and objectives. They should measure the extent to which specific goals and/or objectives are being accomplished.

7. Performance measures should have value to the program measured. In fact, measurements currently made for internal program management will frequently provide key data for performance measures suitable for GPRA.

A Performance Measurement Model for Research:

The following model, initially formulated by the Army Research Laboratory and expanded by the government-wide research roundtable, describes an approach that addresses GPRA requirements and improves the management of performance. It presents a method of evaluation that is both equitable and informative for research and development programs. It applies to all types of research, which can be arrayed on a continuum extending from the most basic research through specific applied research. Depending on where a program falls on this continuum, certain types of evaluation methods may be more pertinent than others. As research moves toward the applied end of the continuum, more specific outcome measures can be identified. No single measure can be used to assess the success of research.

1. Research can be evaluated using the following matrix. It arrays dimensions of performance (relevance, productivity, quality) against assessment methods (peer review, metrics, customer evaluation):

                            ----- Dimensions of Performance -----
    Assessment Methods      Relevance    Productivity    Quality
    Peer Review                XX             XX            XX
    Metrics                    XX             XX            XX
    Customer Evaluation        XX             XX            XX

    XX to be entered as: ++ = Very Useful; + = Somewhat Useful; o = Less Useful.

Because of unique circumstances and interpretation, each agency may fill in the table and apply the model differently. One of the assessment methods (peer review, metrics, customer evaluation) may be more or less valuable depending on the nature of the work being done. (An illustrative sketch of one way such a filled-in matrix might be recorded appears after the definitions below.)

2. Definitions:

Relevance: The degree to which the program (or project) adds value and is responsive, timely, and pertinent to customers' needs.

Productivity: The degree to which work yields useful results.

Quality: The degree to which work is considered to be scientifically excellent.

Peer Review: There are three types of peer review that address different aspects of performance. Prospective peer review generally addresses the relevance of proposed research and can be used to ensure the relevance of the research to the agency mission. Prospective review can also be an indicator of the quality of the research hypothesis, especially in the context of competition for awards. In-process peer review examines ongoing research. It can serve as a quality check and a relevance check of projects and programs while they are under way. It has particular usefulness for assessment of the scientific quality and performance of intramural or Federal laboratory research that may not have undergone peer review for project selection. Retrospective peer review generally addresses the scientific quality of research that has been conducted.

Metrics: Standards of measurement that rely on counts of discrete entities to infer levels of accomplishment, e.g., improved health status, increased production, bibliometrics (publications and references), or degrees awarded.

Customer Evaluation: Customers are any individuals who directly or indirectly use the products of research. Customer evaluation is the opinion of one or more customers about either (1) the extent to which a research program directly benefits the customer or (2) the extent to which the research is perceived as beneficial to the public.
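As an illustration only (the roundtable prescribes no particular implementation), the following sketch shows one way an agency might record a filled-in matrix in software. The rating values and the example program are hypothetical assumptions, not part of the model itself.

    # Illustrative sketch only: one way to record the evaluation matrix for a
    # single research program. The ratings below are hypothetical; each agency
    # fills in its own matrix (++ / + / o) according to the model.
    DIMENSIONS = ("Relevance", "Productivity", "Quality")
    METHODS = ("Peer Review", "Metrics", "Customer Evaluation")
    SCALE = {"++": "Very Useful", "+": "Somewhat Useful", "o": "Less Useful"}

    def make_matrix(ratings):
        """Validate a complete method-by-dimension table of usefulness ratings."""
        matrix = {}
        for method in METHODS:
            for dimension in DIMENSIONS:
                rating = ratings[(method, dimension)]
                if rating not in SCALE:
                    raise ValueError(f"Unknown rating {rating!r} for {method}/{dimension}")
                matrix[(method, dimension)] = rating
        return matrix

    # Hypothetical basic-research program: no specific customer is identified,
    # so customer evaluation is rated less useful (see point 3 below).
    basic_research = make_matrix({
        ("Peer Review", "Relevance"): "+",
        ("Peer Review", "Productivity"): "+",
        ("Peer Review", "Quality"): "++",
        ("Metrics", "Relevance"): "o",
        ("Metrics", "Productivity"): "+",
        ("Metrics", "Quality"): "+",
        ("Customer Evaluation", "Relevance"): "o",
        ("Customer Evaluation", "Productivity"): "o",
        ("Customer Evaluation", "Quality"): "o",
    })

    for (method, dimension), rating in basic_research.items():
        print(f"{method:20} x {dimension:12}: {rating} ({SCALE[rating]})")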
3. The degree of usefulness of the information that each of the three assessment methods provides with respect to the dimensions of relevance, productivity, and quality depends on the particular research work being conducted. For example, in basic research there may be no specific identified customer, since the purpose of much of this work is to add to the body of knowledge in science; in that case, customer evaluation would be very difficult to obtain. In applied research, a specific customer is more likely to exist, so the information about relevance and productivity is more useful. The table needs to be filled in (++, +, o) for each particular research program being evaluated. Attached are some examples of the usefulness of different types of measures for specific basic and applied research programs.

4. The assessment methods in the model (peer review, metrics, and customer evaluation) will often be used together in a performance evaluation process.

5. Research outcomes are often not quantifiable. Therefore, research measures should always be accompanied by narrative in order to provide full opportunity for explanation, presentation of anecdotal evidence of success, and discussion of the nature of nonmetric peer review and customer evaluation measures.

6. Dogmatic tracking of metrics should be avoided when experience shows they are not useful. Although it is important to be consistent in the types of metrics and goals that are tracked, flexibility in dropping or adding metrics can be very beneficial in arriving at the most useful set of metrics.

7. Measures of individual research projects can be aggregated to the level of an overall program if consistent peer review and customer evaluation protocols, as well as metrics, are used across projects and time. The amount of aggregation needed depends on the audience for the measure: high-level, external reporting demands greater aggregation, while internal program management needs little if any aggregation. In any case, the amount of aggregation should relate to how one describes progress toward achieving goals. (A minimal sketch of such a roll-up follows this list.)

8. The information from this model should be reported in a form understandable to lay as well as scientific audiences.
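To make point 7 concrete, the sketch below rolls hypothetical project-level metric counts up to the program level, assuming the same metrics are collected consistently across projects. The project names and figures are invented for illustration; the roundtable does not prescribe any particular aggregation scheme.

    # Illustrative sketch only: aggregating consistent per-project metric counts
    # into a program-level total (point 7). All names and figures are hypothetical.
    from collections import Counter

    projects = {
        "Project A": {"publications": 12, "degrees_awarded": 2, "patents": 1},
        "Project B": {"publications": 7, "degrees_awarded": 1, "patents": 0},
        "Project C": {"publications": 15, "degrees_awarded": 3, "patents": 2},
    }

    def aggregate(project_metrics):
        """Sum identically defined per-project counts into one program summary."""
        program_total = Counter()
        for metrics in project_metrics.values():
            program_total.update(metrics)  # Counter.update adds the counts
        return dict(program_total)

    # High-level external reporting uses the aggregated totals; internal program
    # management would work from the per-project figures directly.
    print(aggregate(projects))
    # {'publications': 34, 'degrees_awarded': 6, 'patents': 3}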
Conclusion:

The introduction and application of meaningful and accurate performance measures into Federal agency research programs represents both a significant opportunity and a challenge. Performance measures can become a powerful tool to assist in the management of these programs and to help meet the objectives of GPRA. Accordingly, the research roundtable offers its full support to the development of such measures for use in reporting to the executive and legislative branches of Government and to the public, as well as for internal management. At the same time, it is important to recognize the complexity of the cause-effect relationship between research outputs and their eventual outcomes. These complexities make it difficult to establish quantifiable measures that consistently measure program performance, and they create a potential for incorrect application, with a subsequent detrimental effect on the quality and innovativeness of our scientific endeavors.

As a starting point for developing performance measures, the research roundtable offers the model outlined in this paper. The model identifies dimensions of measurement and methods for obtaining the necessary inputs, and it stresses the value of both quantifiable data and narrative statements. The roundtable participants regard the recommended approach as an evolving process for measuring performance, one that takes advantage of experimentation and innovation, encourages sharing of successful efforts, and allows mistakes to be made and new directions taken.

PARTICIPANT LIST

HHS, AHCPR, FDA/CBER, FDA/CDER, FDA/CDRH, FDA/CFSAN, FDA/CVM, FDA/NCTR, FDA/OPE, FDA/ORA
NIH/NIA, NIH/NIAAA, NIH/NIAMS, NIH/NIDCD, NIH/NCI, NIH/NHLBI, NIH/OD, NIH/OSPE, OASH, OS
USDA, ARS, CSREES, ERS, FS, NASS, Army, ARL, COE, EPA
DOE, NRC, DOI, IO, NBS, NOAA, USGS, USBM, NASA, DOJ
DOT, DOEd, NIST, LOC, NSA
Other: FedFocus, MSI, NAPA

Attachment 1

EXAMPLES OF UNFORESEEN RESEARCH OUTCOMES

1. Research done on rat brain tumors was later shown to have an important role in human breast cancer. In fact, of the several genes now known to be involved in human breast cancer, all but one were identified while researchers were working on something other than breast cancer.

2. Fundamental agricultural research on Agrobacterium, a common soil bacterium that causes crown gall disease in plants, led to the discovery that the tumor-like growth occurs because the bacterium transfers some of its genetic material (DNA) to the host plant. This discovery led to a new genetic tool that was instrumental in making bioengineering of improved crop plants possible.

3. AIDS research also contributes to knowledge in other fields, including virology, immunology, microbiology, and molecular biology. The research has led to a better understanding of the immune system, new approaches to vaccine development, novel diagnostic techniques, and new methods for evaluating drug treatments.

4. The basic research conducted before the AIDS epidemic allowed researchers to establish more quickly the link between the human immunodeficiency virus and AIDS, to develop a blood test for the virus, and to develop treatments, such as AZT, for those suffering from the disease.

5. The Michelson-Morley experiments on the speed of light in different directions provide a spectacular example of extremely important unforeseen outcomes, leading as they did to Einstein's formulation of the theory of relativity.

6. The technology developed for recycling cobalt from scrap jet engines, using double-membrane cells for electrorefining, is now to be used to upgrade the national cobalt stockpile, saving taxpayers millions of dollars.

7. The 1960s breakthrough of deciphering the genetic code has led to the identification of genes linked to illnesses such as breast and colon cancer, Huntington's disease, and Alzheimer's disease, and to the inception of gene therapy treatments.

[1] The roundtable discussed issues relating to both basic and applied research. It also concluded that most, if not all, of its findings apply to classic "development" activities in the research and development environment. But since development activity was not fully explored with respect to performance measurement, this paper addresses research alone.