Professionals in learning and development, performance consulting, quality improvement, leadership, management and coaching often try to measure the impact of what they do to develop or improve performance in their organizations. Organizations decide on key performance indicators (KPIs), typically holding individuals and teams accountable for achieving goals measured with those indicators. Everyone wants good measurement, but it is frankly quite rare in many organizations.
In the learning and development space, the levels-of-evaluation model originated by Dr. Donald Kirkpatrick has long been in use. It holds that measurement is possible at four levels: Reaction, Learning, Behavior, and Results. Others have suggested adding a fifth level, Return on Investment (ROI), relevant to efforts designed to develop or improve performance.
In the world of business strategy, the Balanced Scorecard model, from business strategy thought leaders Kaplan and Norton, offers a framework for what to measure. The underlying notion is that one cannot ultimately define or measure business strategy with an exclusive focus on financial results. Instead, they argue that we should use a balanced scorecard of measures that include multiple perspectives, specifically Financial, Customer, Internal Process, and Learning and Growth. Including this broader range of measures enables one to look more holistically at the performance of an organization and its people, and can offer important insights leading to better decisions.
Looking a little more deeply, we see that these frameworks do not specify exactly what to measure or how. They provide a conceptual framework, or heuristic, that one can use when selecting specific measures. And, honestly, people often get it wrong: either they do not measure in ways that support good decisions, or they end up with too much data and few informative insights.
Some commonly used measures do not measure what they claim to measure, or are open to wide interpretation. So-called smile sheets, on which participants in programs or experiences rate their satisfaction on a Likert scale (what Kirkpatrick would call a reaction measure), do not provide the kind of measurement a natural scientist would accept. As one of my late mentors, Eric Haughton, often said, rating scales are refined opinion. Moreover, it is widely known that how much people enjoy or appreciate a learning experience does not predict whether they have learned anything or will perform well. At a minimum, if you are going to use rating scales, we suggest displaying how many people gave each level of rating, as in the customer reviews on Amazon's web site. That gives us something to count and analyze, rather than adding up the rating values and dividing by the number of ratings to obtain a meaningless number. (We call this voodoo math, because rating levels are categories, not numbers that can be meaningfully added, subtracted, multiplied, or divided.)
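To make the difference concrete, here is a minimal Python sketch that tallies ratings per level, Amazon-style, for two hypothetical sessions. The session data, the 1-5 scale, and the function name `rating_distribution` are assumptions for illustration only, not real results.

```python
from collections import Counter

LEVELS = range(1, 6)  # assumed 1-5 rating scale

def rating_distribution(ratings):
    """Count how many people chose each rating level (a category, not a number)."""
    counts = Counter(ratings)
    return {level: counts[level] for level in LEVELS}

# Hypothetical ratings from two sessions; illustrative only, not real data.
session_a = [3, 3, 3, 3, 3, 3]
session_b = [1, 1, 1, 5, 5, 5]

for name, ratings in (("A", session_a), ("B", session_b)):
    print(f"Session {name}")
    for level, count in sorted(rating_distribution(ratings).items(), reverse=True):
        print(f"  {level} stars: {'#' * count} ({count})")
    # The single "average" below hides the difference the tallies reveal.
    print(f"  averaged rating: {sum(ratings) / len(ratings):.1f}")
```

The two sessions produce very different tallies, yet both average to 3.0, which is exactly the information the averaged number throws away.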
Percent correct is perhaps the most damaging measure of all in education and training. Accuracy, even at the 100% correct level, does not predict whether a person can recall what they learned, apply it, or work efficiently in distracting environments. (There is plenty of related research on the web site www.Fluency.org.) For example, a competent adult can write answers to simple addition problems at 100-150 digits per minute, while a typical second grader might perform accurately at only 20 or 30 digits per minute, too slowly to be useful in mental math or “story problems.” The time dimension makes all the difference when we measure performance, and once we calculate a percentage, percent correct is “blind” both to time and to the actual count.
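The following minimal Python sketch contrasts the two measures for two hypothetical one-minute timings, using rates picked from the ranges quoted above (120 and 25 correct digits per minute). The `Timing` class and its field names are illustrative assumptions, not a tool from Fluency.org.

```python
from dataclasses import dataclass

@dataclass
class Timing:
    correct: int     # digits written correctly
    incorrect: int   # digits written incorrectly
    minutes: float   # length of the timed sample

    def percent_correct(self) -> float:
        # Ignores how long the sample took and how many digits were written.
        return 100 * self.correct / (self.correct + self.incorrect)

    def correct_per_minute(self) -> float:
        # Keeps both the count and the time dimension.
        return self.correct / self.minutes

# Hypothetical one-minute samples, with rates taken from the ranges in the text.
adult = Timing(correct=120, incorrect=0, minutes=1)
second_grader = Timing(correct=25, incorrect=0, minutes=1)

for label, t in (("adult", adult), ("second grader", second_grader)):
    print(f"{label}: {t.percent_correct():.0f}% correct, "
          f"{t.correct_per_minute():.0f} correct digits/min")
```

Both performers score 100% correct, while the per-minute rate shows a nearly fivefold difference.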
In many organizations, KPIs are constructed with formulas that some employees may not understand, yet those employees are held accountable for improving them. Often, KPIs are not counts of things at all, but formulas of some kind.
At The Performance Thinking Network, we stay with measures that would be accepted in the natural sciences, such as physics, chemistry, biology, or B.F. Skinner’s experimental analysis of behavior, from which our work evolved. That means we prefer to count things over time; count and time are two standard, objective dimensions of measurement that you will find in any natural science. We can count and time the production of work outputs, instances of behavior, or units of business results.
As a framework for measurement, we use the Performance Chain. That is, we can measure organization-level business results, work outputs that meet criteria for “good,” and behavior.
Organization-level Business Results: We measure the business results that owners and investors use to assess the health of the organization as a whole. This is important for anyone working on a project, or monitoring an initiative or training program, that is expected to contribute to the organization’s success. However, most organization-level results are lagging indicators, reported infrequently (monthly or quarterly in many cases). That means we do not accumulate data points fast enough to make frequent, reliable decisions based on them. We should still measure business results when we can, while recognizing that we typically need at least 5 to 7 data points to identify trends or to understand how much variability there is in a given measure.
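As a rough illustration of the data-point guideline, here is a minimal Python sketch that declines to report a trend until at least 5 points are available (the lower end of the 5-to-7 range above). It fits a plain least-squares slope and reports the range; that choice of statistic is an assumption made for simplicity, not necessarily how the author's team analyzes trends, and `trend_summary` and the sample values are hypothetical.

```python
from typing import Optional, Sequence

MIN_POINTS = 5  # lower end of the "5 to 7 data points" guideline

def trend_summary(values: Sequence[float]) -> Optional[dict]:
    """Return a least-squares slope per period plus min/max, or None if too few points."""
    n = len(values)
    if n < MIN_POINTS:
        return None  # too few points to judge trend or variability
    x_mean = (n - 1) / 2  # periods are numbered 0, 1, ..., n-1
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), values))
    return {"slope_per_period": sxy / sxx, "min": min(values), "max": max(values)}

# Hypothetical quarterly and monthly results (illustrative values only).
quarterly = [4.1, 4.3, 4.0]
monthly = [10, 12, 11, 14, 15, 17]

print(trend_summary(quarterly))  # None: only 3 points so far
print(trend_summary(monthly))
```

With quarterly reporting, reaching even the minimum of 5 points takes more than a year, which is why these measures support only infrequent decisions.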
Work Outputs (accomplishments): These are the countable products of individuals, teams, or processes. They are the valuable contributions of human performance that help to achieve business results. Often they are permanent products (e.g., widgets, successful proposals, good treatment session notes), and thus a bit easier to count than behavior. Even when they are less tangible (e.g., decisions, relationships, people who can demonstrate the ability to do X), they are countable and usually occur with relatively high frequency, so we can get more frequent data points (e.g., hourly, daily, weekly). Thus, we can make decisions using these data more frequently. In other words, counts of work outputs can be leading indicators.
Behavior: We can use checklists to monitor the occurrence of different forms of behavior (e.g., as we listen to recordings of customer service representatives, or observe safety practices in dangerous environments). We can also count behavior (e.g., the number of times per day a manager provides positive feedback, or the number of phone calls a salesperson makes per week). Measures of behavior can be very useful for feedback, and for diagnosing why individuals or teams are not producing work outputs as expected, but monitoring behavior can be relatively expensive and time-consuming. Behavioral measures can be helpful leading indicators if the behavior occurs fairly often, and measuring behavior can often help to improve performance. However, when there is no specific need to measure behavior, a better choice for leading indicators is to count work outputs that do and do not meet criteria.
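Here is a minimal Python sketch of counting work outputs that do and do not meet criteria, day by day. The outputs (treatment session notes, one of the examples mentioned earlier), the criteria (completed within 24 hours with no missing fields), and the names `daily_output_counts` and `note_meets_criteria` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def note_meets_criteria(note: dict) -> bool:
    """Hypothetical criteria for a "good" session note."""
    return note["hours_to_complete"] <= 24 and note["fields_missing"] == 0

def daily_output_counts(outputs, meets_criteria):
    """Tally, per day, how many outputs meet and miss the criteria."""
    counts = defaultdict(lambda: {"meets": 0, "misses": 0})
    for day, output in outputs:
        counts[day]["meets" if meets_criteria(output) else "misses"] += 1
    return dict(counts)

# Hypothetical notes completed over two days (illustrative only).
notes = [
    ("Mon", {"hours_to_complete": 6, "fields_missing": 0}),
    ("Mon", {"hours_to_complete": 30, "fields_missing": 0}),
    ("Tue", {"hours_to_complete": 4, "fields_missing": 1}),
    ("Tue", {"hours_to_complete": 2, "fields_missing": 0}),
    ("Tue", {"hours_to_complete": 8, "fields_missing": 0}),
]

print(daily_output_counts(notes, note_meets_criteria))
# {'Mon': {'meets': 1, 'misses': 1}, 'Tue': {'meets': 2, 'misses': 1}}
```

Daily counts like these accumulate data points far faster than monthly or quarterly business results, which is what makes them usable as leading indicators.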
When we advise participants in our certification programs about how to measure impact and make data-based decisions, we generally suggest that they first break the performance of interest down into its components: work outputs, the behavior for producing them, and the business results to which the work outputs are expected to contribute. We then suggest they create a short list of measures, guided by the Performance Chain, that are easy and inexpensive to obtain, most indicative of successful performance, and useful for making frequent decisions for continuous improvement. After trying out a set of measures, we can sometimes calibrate or adjust what we measure and how often, to provide a good foundation for evaluation and decision-making.
- Carl Binder, CEO