3.5 Module 3 · Thinking With AI

Critical Evaluation of AI Output

AI outputs look polished, but are they accurate? Build a systematic evaluation rubric and practise rating real AI-generated content for accuracy, bias, completeness, and hallucination risk.

Output Evaluation Rubric

Rate any AI output against five criteria. Use this rubric every time you receive an AI-generated response to build your critical evaluation instincts.

The five ratings add up to an overall score out of 25, which maps to a confidence band.

The rubric is a thinking tool, not a score sheet. The act of systematically evaluating each criterion trains you to spot problems that casual reading misses. Over time, this evaluation becomes instinctive.
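For readers who want to record their ratings, the scoring can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the first four criterion names come from the module introduction, the fifth ("clarity") is a hypothetical placeholder, each criterion is assumed to be rated from 1 to 5 with higher meaning better, and the band thresholds are invented for the example rather than taken from the course.

```python
# Assumption: five criteria, each rated 1-5 (higher is better), total out of 25.
MAX_PER_CRITERION = 5

# The first four names appear in the module introduction.
# "clarity" is a hypothetical stand-in for the fifth criterion.
CRITERIA = ("accuracy", "bias", "completeness", "hallucination_risk", "clarity")


def overall_score(ratings: dict[str, int]) -> int:
    """Sum the five ratings after checking that every criterion was rated."""
    missing = [name for name in CRITERIA if name not in ratings]
    if missing:
        raise ValueError(f"Unrated criteria: {missing}")
    for name in CRITERIA:
        if not 1 <= ratings[name] <= MAX_PER_CRITERION:
            raise ValueError(f"{name} must be between 1 and {MAX_PER_CRITERION}")
    return sum(ratings[name] for name in CRITERIA)


def confidence_band(score: int) -> str:
    """Map a total score to a band. The thresholds are hypothetical."""
    if score >= 20:
        return "high"
    if score >= 13:
        return "moderate"
    return "low"


ratings = {
    "accuracy": 4,
    "bias": 3,
    "completeness": 2,
    "hallucination_risk": 5,
    "clarity": 3,
}
total = overall_score(ratings)
print(total, confidence_band(total))  # prints: 17 moderate
```

The refusal to score a partially rated output is deliberate: rating every criterion is the point of the exercise, whatever the final number turns out to be.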

Rate the Output Exercise

Practise your evaluation skills on four AI-generated outputs. Rate each on the five criteria, then reveal the expert assessment to compare your judgement.

Notice the patterns. Hallucinated statistics often sound impressively precise. Biased outputs use superlative language and dismiss alternatives without evidence. Incomplete outputs address the question but skip important context. Training yourself to recognise these patterns is among the most valuable AI skills you can develop.

Quick Reference: Red Flags in AI Output

Hallucination Red Flags

  • Suspiciously precise statistics (e.g., "47.3% of agencies")
  • Named reports or publications you cannot independently verify
  • Direct quotes attributed to specific officials
  • Specific dates for future events presented as fact
  • URLs or hyperlinks that look plausible but may not exist
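Some of the red flags above are regular enough to be surfaced automatically before a human checks them. The sketch below is a rough heuristic, not a hallucination detector: the pattern names, the regular expressions, and the `find_hallucination_flags` helper are all assumptions made for illustration. A match only means "verify this", and named reports or future dates would still need manual review.

```python
import re

# Hypothetical patterns for three of the red flags listed above.
RED_FLAG_PATTERNS = {
    # Percentages with a decimal point, e.g. "47.3%"
    "precise_statistic": re.compile(r"\b\d+\.\d+\s?%"),
    # Anything that looks like a web address
    "url": re.compile(r"https?://\S+"),
    # Quoted passages of 20 or more characters (straight or curly quotes)
    "direct_quote": re.compile(r"[\"“][^\"”]{20,}[\"”]"),
}


def find_hallucination_flags(text: str) -> dict[str, list[str]]:
    """Return the matched snippets for each pattern that fires."""
    hits = {}
    for name, pattern in RED_FLAG_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits


sample = (
    "A recent survey found 47.3% of agencies use AI; "
    "see https://example.gov.au/report for details."
)
print(find_hallucination_flags(sample))
# prints: {'precise_statistic': ['47.3%'], 'url': ['https://example.gov.au/report']}
```

Every hit still needs a person to check the claim against an independent source; the script only narrows down where to look first.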

Bias Red Flags

  • Superlative language: "best", "only", "unmatched", "clear leader"
  • One option presented significantly more favourably than others
  • Cherry-picked features that favour a particular conclusion
  • Dismissal of alternatives without substantive analysis
  • Language that mirrors marketing copy rather than analysis
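The first bias red flag can be checked mechanically. The sketch below scans a passage for the four superlatives quoted above; the `find_superlatives` helper is a hypothetical name, and the word list is only a starting point that would need extending for real use. It cannot judge whether a superlative is justified by evidence, so it complements the rubric and does not replace it.

```python
import re

# The four terms quoted in the list above; extend as needed.
SUPERLATIVES = ("best", "only", "unmatched", "clear leader")


def find_superlatives(text: str) -> list[str]:
    """Return the listed superlatives that appear as whole words or phrases."""
    lowered = text.lower()
    found = []
    for term in SUPERLATIVES:
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            found.append(term)
    return found


print(find_superlatives("Vendor A is the clear leader and the best choice."))
# prints: ['best', 'clear leader']
print(find_superlatives("Each option has trade-offs worth weighing."))
# prints: []
```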

Completeness Red Flags

  • Key stakeholder perspectives missing from the analysis
  • Relevant policy frameworks not mentioned, such as the Information Security Manual (ISM) or the Protective Security Policy Framework (PSPF)
  • Risk analysis that only covers one category of risk
  • No consideration of implementation challenges or costs
  • Conclusions that do not follow from the evidence presented

Quality Indicators

  • Claims qualified with appropriate uncertainty language
  • Multiple perspectives acknowledged and weighed
  • Recommendations tied to specific evidence or reasoning
  • Limitations of the analysis explicitly stated
  • Actionable next steps with clear owners and timelines