Designing Rubrics for Evaluation

Last week, I attended an Australasian Evaluation Society (AES) workshop on “Foundations of Rubric Design”. It was a thought-provoking session. Kystin Martens, the presenter, explained rubric design, challenged our understanding of it, and offered practical tips for developing and using rubrics properly in evaluation. Here are my key takeaways from the workshop:

Why do we use rubrics in evaluation?

A rubric is a tool, matrix or guide that outlines specific criteria and standards for judging different levels of performance. More and more evaluators are using rubrics to guide their judgements on program performance. Rubrics enable evaluators to transform data from one form to another, for example from qualitative evidence into quantitative data. Rubrics also provide an opportunity to analyse and synthesise evidence into an overall evaluative judgement, transparently, throughout the evaluation process.

Three systematic steps to create a rubric for evaluation

There are three logical steps to develop a rubric:

  • Establish criteria: criteria are the dimensions or essential elements of quality for a given type of performance. For example, criteria for a good presentation might include the content and creativity of the presentation; the coherence and organisation of the materials; speaking skills; and participation/interaction with the audience.
  • Construct standards: standards are scaled levels of performance, gradations of quality, or performance ratings, for example a scale from poor through adequate to excellent, or a scale from novice through apprentice and proficient to distinguished.
  • Build descriptors for each criterion and standard: a descriptor is a narrative or detailed description that articulates what performance at each level of the standard looks like. For example, a descriptor for poor speaking skills might state that the presenters were often inaudible and/or hesitant, relied heavily on notes, and went over the required time.
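To make the three components concrete, here is a minimal sketch of a rubric as a small data structure. It is illustrative only and not something presented at the workshop: the names `STANDARDS`, `Criterion`, `to_score` and `describe` are hypothetical, and only the "poor" descriptor paraphrases the example above (the "adequate" and "excellent" wording is invented for illustration). The `to_score` helper shows one simple way a qualitative rating might be turned into a number.

```python
from dataclasses import dataclass

# Standards: an ordered scale of performance, lowest first.
STANDARDS = ["poor", "adequate", "excellent"]


@dataclass
class Criterion:
    """One dimension of quality, with a descriptor for each standard."""
    name: str
    descriptors: dict[str, str]  # standard -> what performance at that level looks like

    def __post_init__(self):
        # Every level on the scale needs its own descriptor.
        missing = [s for s in STANDARDS if s not in self.descriptors]
        if missing:
            raise ValueError(f"{self.name}: no descriptor for {missing}")


# Hypothetical criterion for judging a presentation. Only the 'poor'
# descriptor paraphrases the text; the other wording is illustrative.
speaking = Criterion(
    name="speaking skills",
    descriptors={
        "poor": "Often inaudible or hesitant; relied heavily on notes; ran over time.",
        "adequate": "Mostly audible; occasional reliance on notes; kept roughly to time.",
        "excellent": "Clear and confident; spoke without notes; kept to time.",
    },
)


def to_score(standard: str) -> int:
    """Convert a qualitative rating into a number (poor=1 ... excellent=3)."""
    return STANDARDS.index(standard) + 1


def describe(criterion: Criterion, standard: str) -> str:
    """Return the descriptor that justifies a given rating."""
    return criterion.descriptors[standard]
```

Keeping the descriptor next to each rating means a score can always be traced back to the wording that justified it, which supports the transparency that rubrics are meant to provide.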

Ensuring reliability of judgement and creating gold standards in evaluation

A calibration process is required so that all evaluators assess program performance consistently and in alignment with the scoring rubric. This process ensures that evaluators produce similar scores when assessing the same program performance. It is critical for creating gold standards for assessment and for increasing the reliability of the assessment data. For example, when we evaluate a multi-country development program and deploy more than one evaluator, we need to make sure that all evaluators agree on the rubric and understand the performance expectations it expresses, so that they can interpret and apply it consistently.
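One way to check whether calibration has worked is to have two evaluators rate the same items and measure how often they agree. The sketch below is an assumption on my part rather than something prescribed at the workshop: it computes simple percent agreement and Cohen's kappa (agreement corrected for chance) for two raters. The function names and the ratings are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Share of items on which two evaluators gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both evaluators must rate the same, non-empty set of items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both pick the same level independently.
    expected = sum(counts_a[level] * counts_b[level] for level in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical level throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of the same five program sites by two evaluators.
evaluator_1 = ["poor", "adequate", "adequate", "excellent", "excellent"]
evaluator_2 = ["poor", "adequate", "excellent", "excellent", "excellent"]

agreement = percent_agreement(evaluator_1, evaluator_2)  # 4 of 5 ratings match
kappa = cohens_kappa(evaluator_1, evaluator_2)
```

Low agreement after a first round of scoring suggests that the descriptors are being read differently and that the team should discuss and refine them before scoring the full program.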

If you have any experience using rubrics in evaluation, please share and tweet your experience and thoughts with us @ClearHorizonAU.

Design & Evaluation – We’re Better Together

Last month Clear Horizon and The Australian Centre for Social Innovation (TACSI) had a great time delivering our new, sold-out course on reconciling the worlds of Human Centred Design (HCD) and evaluation. Hot off the press, we’re proud to introduce the Integrated Design, Evaluation and Engagement with Purpose (InDEEP) Framework (Figure 1), which underpinned the course.

(Figure 1)

The InDEEP Framework

The Integrated Design, Evaluation and Engagement with Purpose (InDEEP) Framework (Figure 1) has been developed through reflection on two years of collaboration between TACSI (the designers) and Clear Horizon (the evaluators). At a high level, the InDEEP Framework conceptualises the relationships between design and potential evaluation inputs. Put simply, the top half of the diagram sets out the design cycle in five phases (discover, define, prototyping, piloting and scaling). The bottom half lists potential evaluative inputs that can be useful at each design phase (design research, developmental testing, pilot evaluation and broader impacts).

The journey starts by setting up the design and evaluation relationship for success, by carefully thinking through governance structures, role clarity and scope. In relation to scope, consideration should first be given to which design phase the design project is currently in, or where it is expected to be when the evaluation occurs. Once the design phases of interest have been diagnosed, the different types of evaluation needed can be thought through. It’s definitely a menu of evaluation types, and if you chose them all you’d be pretty full!

If the design project is in the discover to early prototyping phases, developmental evaluation approaches are likely to be appropriate; in these phases evaluation should support learning and the development of ideas. If the design project has moved to the late prototyping through to broader impacts phases, then more traditional formative and summative evaluation may be more appropriate; in these phases there is more accountability for achieving outcomes and impacts.

The InDEEP Framework also acknowledges that some evaluative tools are useful at all phases of the design cycle. In the diagram these are described as golden threads and include both modelling (e.g. Theory of Change) and facilitating learning (e.g. having a critical friend). If needed, process and/or capability evaluation can also be applied at any design phase.

Key insights from the course

Course participants came from different sectors and levels of government. They represented a cohort of people applying HCD approaches to solve complex problems (from increasing student engagement in education through to the digital transformation of government services). At some level, everyone came to the course looking for tools to help them demonstrate the outcomes and impact of their HCD work. Some key reflections from participants included:

  • Theory of Change is a golden thread
  • Evaluation can add rigour and de-risk design
  • It’s important to quarantine some space for a helicopter view (developmental evaluation)
  • Be ready to capture your outcomes and impact

Theory of Change is a golden thread

Theory of Change proves to be one of our most versatile and flexible tools in design and evaluation. In the design process it can provide direction in the scoping phase (broader goals), absorb learnings and insights in the discover phase (intermediate outcomes), and test out theories of action during prototyping. When you’re ready for a meatier evaluation, Theory of Change provides you with a solid evaluand to refine, test and/or prove in the piloting and scaling phases. At all stages it surfaces assumptions and is a useful communication device.

Evaluation can add rigour and de-risk design

HCD is a relatively new mechanism being applied to policy development and social programming. It can take time for HCD processes to move through the design cycle to scaling, and it is also assumed that some interventions may fail; this has the potential to make some funders nervous. Participants confirmed that developmental approaches which document key learnings and pivot points in design can help to communicate to funders what has been done. More judgemental process evaluation and capability building evaluation can also help demonstrate to funders that innovation is on track.

It’s important to quarantine some space for a helicopter view (developmental evaluation)

The discover, define and early prototyping phases are the realm of designers, who primarily need space to be creative and ideate (be in the washing machine). In these early phases, developmental evaluation can provide pause points for the design team to zoom out, take stock of assumptions, and make useful adjustments to the design (a helicopter view of the washing machine). Although participants found the distinction between design and developmental evaluation useful, they took away the challenge that design teams often do not distinguish between the two roles. One solution to this challenge was to rotate the developmental evaluation role within design teams and/or, resources permitting, bring an external developmental evaluator onto the design team.

Be ready to capture outcomes and impact

As designs move into late prototyping, piloting and scaling, teams come under increased pressure to document outcomes and impact. In many instances this is partly to show funders what has been achieved through the innovation process. The key message for participants was that it is important to plan for capturing outcomes early on. One way to be ready is to have a Theory of Change. If you are expecting to have a population-level impact, it may also be important to set up a baseline early on. Equally, if you are chasing more intangible outcomes like policy change, then you should think through techniques like Outcomes Harvesting and SIPSI so that you are ready to systematically make a case for causation.

We would love your feedback on our InDEEP Framework. To join the conversation, tweet us at @ClearHorizonAU. If you are interested in learning more, we are running our InDEEP training again early next year; see our public training calendar for more details.

Tom Hannon and Jess Dart

Powerful insights and stories of MSC taking place across Africa!

Since joining Clear Horizon earlier this year, I’ve really enjoyed engaging with clients on different ways to elicit and present program outcomes. An interesting and compelling way to collect program outcomes is through personal stories of change using the Most Significant Change (MSC) technique.

Globally, there is increasing interest and application of the MSC technique from a range of organisations and funders, who recognise the value of narrative-based outcomes. MSC is a relatively versatile technique that can be used in many different contexts and sectors including international development, health, education and agriculture.

In May this year, I was privileged to facilitate three MSC workshops in Ghana, Zambia and Kenya with 60 alumni from the Australia Awards Africa Program, which provides post-graduate scholarships to Australian universities for up to two years. During the workshops, we reflected on and analysed the stories of significant change that had come about for the alumni since completing their studies and returning to Africa. The technique was chosen as it is a participatory form of monitoring and evaluation and does not start with any pre-defined indicators of success and so allows for unexpected (and even unintended) outcomes to be expressed.

The key elements of MSC used during these workshops were the collection, review and selection of alumni stories. In each location, four stories from 20 participants were selected as those describing the most significant changes. Together we analysed the 12 selected stories and found that increased confidence, critical thinking skills and new opportunities were common themes. The ability to have impact or influence within and beyond the alumni’s immediate workplace, extending to their communities, countries and even globally, was also a strong theme and aligns well with the Australia Awards program aims. Finally, documenting why a story was chosen over others (i.e. why it was considered to be the most significant) is a key component of the technique, as it elicits the underlying values that are represented in the stories of change.

Written by Marty Pritchard

What is Collaborative Outcomes Reporting?

Collaborative Outcomes Reporting (COR) is a participatory approach to impact evaluation. It centres on a performance story that presents evidence of how a program has contributed to outcomes and impacts. This performance story is then reviewed by both technical experts and program stakeholders, which may include community members.

Developed by Jess Dart of Clear Horizon, COR combines contribution analysis and Multiple Lines and Levels of Evidence (MLLE), mapping existing and additional data against the program logic to produce a performance story. Performance story reports are essentially short reports about how a program contributed to outcomes. Although they may vary in content and format, most describe the program context and aims, relate to a plausible results chain, and are backed by empirical evidence. The aim is to tell the ‘story’ of a program’s performance using multiple lines of evidence.

COR adds processes of review, such as an expert panel or a summit process where stakeholders in the intervention, for example, community members, check for the credibility of the evidence about what impacts have occurred and the extent to which these can be credibly attributed to the intervention. It is these components of expert panel review (outcomes panel) and a collaborative approach to developing outcomes (through summit workshops) that differentiate COR from other approaches to outcome and impact evaluation.

Find out more about COR in a short paper by Dr Jess Dart here.

Can MSC play a role in program design?

In my most recent blog I explored a light bulb moment around Developmental Evaluation. Since then I have been musing about the role of the Most Significant Change Technique (MSC) in design and Developmental Evaluation. Although MSC certainly wouldn’t be the only tool you would use, I think it could be an exciting part of a Developmental Evaluator’s tool kit.

Developmental Evaluation is appropriate in innovative, complex and adaptive environments. It allows evaluation to occur even when the end point (or the path to get there) isn’t known. Developmental Evaluation sees the evaluator working collaboratively with the social entrepreneur (or other program designers) in the design phase of a new program. It is a way to get rapid feedback on the design and approach, and on how the program can be improved.

MSC can be an insightful tool for capturing emergent or unknown outcomes and helps us to make sense of impact and causality. A strength of MSC is that it enables the perspective of the user to be known and understood. Three ways we think MSC could assist to provide user feedback as part of a developmental evaluation include:

  • Using MSC for historical analysis
  • Using MSC to test social innovations
  • Using MSC to envisage alternative solutions

MSC for historical analysis

Using MSC in planning helps examine the historical context and how program participants have experienced and valued past interventions. In the past we have used MSC to inform the development of a community action strategy. The MSC question was broad, for example: “From your point of view, what is the most significant change that has resulted from any intervention in this community?” After trained volunteers collected the MSC stories, the stories were selected in large group settings. The outputs from this process were used to inform the situation analysis, in a similar way to the Appreciative Inquiry technique. One difference between this use of MSC in planning and Appreciative Inquiry is that MSC does not necessarily seek only positive stories; it can also reveal the most significant negative changes. This information can be an insightful input into the design of a new program.

MSC for testing social innovations

When piloting a new social innovation, MSC can be used to help users to articulate the impact of the pilot on their lives and their communities. In this way it can be used to rapidly test possible innovations that are being piloted, particularly the immediate outcomes. The key here is allowing the users to interpret the benefits and negative impacts of the innovation in their words.

MSC for envisaging solutions

MSC has also been used in a future-orientated manner to help develop future vision and goals. In this context instead of collecting stories about the past, participants are invited to write a story about their ‘desired future’. The process follows six steps:

  • Set a future point in time – for example 5 years.
  • In a group setting, brainstorm a range of possible future scenarios that might arise from this program if it is successful (or unsuccessful).
  • Individually or in sub-groups, choose one scenario from this list that represents what you would most like to see happen.
  • Flesh this scenario out into a story – with a beginning, middle and end – as if it had already happened. For example, describe the changes that happen in a participant’s life, and what difference it made to them.
  • End with why you chose that particular scenario to write a story about.
  • You can then share stories and select the most significant one, but in so doing also develop a set of future outcomes you wish to see, and a set of values.

This technique is akin to scenario planning – or visioning. This is an accessible way to develop a future vision that is grounded in how people may see a new program.

MSC is a versatile M&E tool, which as well as being used to uncover the impacts of current or completed programs, can also be used to evaluate programs in the design phase – either by drawing out historical lessons, providing feedback on pilot interventions or through envisaging the desired future.

Clear Horizon is running public training on MSC in Melbourne next week (May 26th & 27th) and again in Perth on August 25th. For further information and to register head to the Clear Horizon Website.

Are you excited about using MSC as a tool to inform program design? To join the conversation please tweet your thoughts and tag us (@ClearHorizonAU).

Top 5 tips for evaluation capacity building in your organisation

In this post our Senior Consultant and evaluation theory whiz, Caitlin Barry, shares her research into making evaluation capacity building work.

Over the past decade there has been a growing emphasis worldwide on evaluation capacity building (ECB). The increasing need for organisations to develop their evaluation capacity is being driven by the need to demonstrate accountability and improve program performance, among other factors.

I have an enormous appreciation of internal staff required to build the monitoring and evaluation skills of their organisation. Having held a similar role in a large and busy government agency, I know all too well what a challenging task this can be. I investigated the literature on building evaluation capacity in an organisation, and here are my top 5 tips to consider:

  • Have a clear purpose for undertaking ECB in your organisation
  • Familiarise yourself with the various ECB frameworks available
  • Take the ECB readiness test
  • Don’t assume everyone needs to be trained to the same level
  • Evaluate your ECB efforts

1. Have a clear purpose for undertaking ECB in your organisation

The literature reinforces the need for organisations to be clear on what they want to achieve through ECB. Preskill and Boyle (2008) strongly recommend examining and communicating an organisation’s motivations, assumptions and expectations of any ECB efforts. As ECB methods can vary widely depending on purpose, this crucial step informs the ECB design. Examples of different ECB purposes include:

  • to build an evaluation culture and practice within a broader learning organisation
  • to increase the use of evaluation results by staff
  • for internal staff to be able to commission and manage high-quality evaluations
  • for internal staff to be able to conduct high-quality program evaluations themselves
  • a combination of any, or all of the above

2. Familiarise yourself with the various ECB frameworks available

There is no single agreed definition of ECB. Some definitions focus on the individual’s “ability to conduct an effective evaluation” (Milstein and Cotton, 2000), while other definitions are broader, encompassing the capacity to not only “do” evaluation, but also to “use” evaluation results at an organisational level.

Just as there is no single agreed definition of ECB, there is no one agreed framework to guide how ECB should be designed and implemented. However, methods usually involve either internal evaluation units or external evaluation contractors providing evaluation expertise, training and support to staff within an organisation. I found that a really great place to start for an organisation-wide ECB framework is the Multidisciplinary Model of Evaluation Capacity Building (Preskill and Boyle, 2008), which focuses on an organisation’s capacity to sustain and embed evaluation practices. In a nutshell, the Multidisciplinary Model is designed on the premise that an organisation’s ability to embed evaluation is inextricably linked to the organisation’s culture and approach to organisational learning.

3. Take the ECB readiness test

Organisation-level ECB involves issues of individual learning and organisational change. As such, many of the ECB frameworks place enormous emphasis on the presence of organisational factors (such as leadership, learning culture, communication systems and structures) for ECB efforts to be sustainable. Taylor-Ritzler et al. (2013) demonstrated that even where staff build their evaluation knowledge and skills, they are less likely to use or sustain these skills if their organisation does not provide the leadership, support, resources and necessary learning climate.

To determine whether your organisation has the organisational learning conditions necessary to support and sustain ECB, Preskill and Boyle (2008) suggest using the “Readiness for Organisational Learning and Evaluation (ROLE)” tool, developed by Preskill and Torres (2000). However, the literature is divided on who is best placed to help organisations address a lack of prerequisite factors. Preskill (1994; 2008) promotes the role of the evaluator in facilitating the development of an organisation’s culture of learning, believing that a “transfer of learning” can start by introducing evaluation to an organisation and communicating the results. Alternatively, Williams (2001) argues that addressing organisational learning capacity, leadership and culture often requires specialist skills that evaluation experts don’t necessarily possess. Williams (2001) and Stevenson et al. (2002) conclude that evaluators should collaborate with organisational development experts, rather than being expected to address these issues themselves.

4. Don’t assume everyone has to be trained to the same level

Since program staff are often busy, it is important to ask whether it is feasible for all staff to be trained to a level where they are able to undertake rigorous evaluations (Wehipeihana, 2010). Another issue is the rapid turnover of staff in large government organisations and the continual need to provide ECB training to new starters. Stevenson et al. (2002) reported a lack of organisational stability as the greatest barrier to building evaluation capacity. They found that fifty percent of the staff they had worked with in providing ECB in the first year had left by the third year (Stevenson et al., 2002).

There is increasing recognition that senior leaders play a central role in sustaining a culture of evaluation and learning in organisations (Preskill, 2014; Cousins et al., 2014b; Labin et al., 2012). Several authors propose that it is beneficial to first focus ECB training at the management level rather than on program staff (Cousins et al., 2014b; Preskill, 2014). While senior leaders will likely require higher levels of specific evaluation skills and knowledge, training for program staff should focus on foundational knowledge such as understanding the benefits of evaluation and the use of evaluation findings (Preskill, 2014).

Rather than placing too much emphasis on individual skills, Cousins et al. (2014b) state that learning organisations emphasise the development of general behaviours in staff such as critical thinking, communication and collective problem solving (and this might require organisational development expertise).

5. Evaluate your ECB efforts

All too often, ECB efforts aren’t evaluated – so organisations remain unsure of whether the purpose and expected outcomes of ECB efforts were achieved. Preskill and Boyle strongly recommend examining and communicating the motivations, assumptions and expectations of any ECB effort. They caution that the absence of agreed assumptions and expectations by key leaders can undermine the success and effectiveness of any ECB efforts. Their Multidisciplinary Model also lists potential ECB objectives, against which Preskill and Boyle recommend evaluating ECB efforts to measure progress and impact. The 36 potential ECB objectives are divided into three themes:

  • improving the beliefs that staff have about evaluation
  • increasing staff’s knowledge and understanding about evaluation
  • staff developing a set of evaluation-related skills

Photo credit: Clear Horizon, Building Program Logic training 2015

Do any of these tips resonate with any ECB efforts undertaken in your organisation? To join the conversation please tweet your thoughts and tag us (@ClearHorizonAU).

References

Cousins, J.B., Goh, S.C., Elliot, C., Aubry, T., and Gilbert, N. (2014b). Government and Voluntary Sector Differences in Organizational Capacity to Do and Use Evaluation. Evaluation and Program Planning, 44, 1-13.
Labin, S.N., Duffy, J.L., Meyers, D.C., Wandersman, A. and Lesesne, C.A. (2012). A Research Synthesis of the Evaluation Capacity Building Literature, American Journal of Evaluation, 33(3), 307-338.
Milstein, B., & Cotton, D. (2000). Defining concepts for the presidential strand on building evaluation capacity. Paper presented at the 2000 meeting of the American Evaluation Association, Honolulu, Hawaii.
Preskill, H. (1994). Evaluation’s Role in Enhancing Organizational Learning – A Model for Practice, Evaluation and Program Planning, 17(3), 291-297.
Preskill, H. (2008). Evaluation’s Second Act – A Spotlight on Learning, American Journal of Evaluation, 29(2), 127-138.
Preskill, H. (2014). Now for the Hard Stuff: Next Steps in ECB Research and Practice, American Journal of Evaluation, 35(1), 116-119.
Preskill, H., and Boyle, S. (2008). A Multidisciplinary Model of Evaluation Capacity Building. American Journal of Evaluation, 29(4), 443-459.
Preskill, H. & Torres, R.T. (2000). Readiness for Organizational Learning and Evaluation instrument.
Stevenson, J.F., Florin, P., Scott Mills, D., and Andrade, M. (2002). Building Evaluation Capacity in Human Service Organizations: a case study. Evaluation and Program Planning, 25, 233-243.
Taylor-Ritzler, T., Suarez-Balcazar, Y., Garcia-Iriarte, E., Henry, D.B. and Balcazar, F.E. (2013). Understanding and Measuring Evaluation Capacity: A Model and Instrument Validation Study, American Journal of Evaluation, 34(2), 190-206.
Wehipeihana, N. (2010). How much is enough evaluation capacity building in communities and not-for-profit organizations? Sourced from: http://genuineevaluation.com/how-much-is-enough-evaluation-capacity-building/
Williams, B. (2001). Evaluation Capability. Sourced from: http://www.bobwilliams.co.nz/Works_in_Progress_files/capability%233.pdf