Critical Evaluation of AI Output
AI outputs look polished, but are they accurate? Build a systematic evaluation rubric and practise rating real AI-generated content for accuracy, bias, completeness, and hallucination risk.
Output Evaluation Rubric
Rate any AI output against five criteria. Use this rubric every time you receive an AI-generated response to build your critical evaluation instincts.
The rubric is a thinking tool, not a score sheet. The act of systematically evaluating each criterion trains you to spot problems that casual reading misses. Over time, this evaluation becomes instinctive.
Rate the Output Exercise
Practise your evaluation skills on four AI-generated outputs. Rate each on the five criteria, then reveal the expert assessment to compare your judgement.
Notice the patterns. Hallucinated statistics often sound impressively precise. Biased outputs use superlative language and dismiss alternatives without evidence. Incomplete outputs address the question but skip important context. Training yourself to recognise these patterns is one of the most valuable AI skills you can develop.
Quick Reference: Red Flags in AI Output
Hallucination Red Flags
- Suspiciously precise statistics (e.g., "47.3% of agencies")
- Named reports or publications you cannot independently verify
- Direct quotes attributed to specific officials
- Specific dates for future events presented as fact
- URLs or hyperlinks that look plausible but may not exist
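Some of these red flags can be surfaced mechanically before a human review. The following is a minimal sketch, assuming simple regular expressions are good enough for a first pass; the pattern names and the function `flag_hallucination_risks` are hypothetical, chosen for illustration. A match is a prompt to verify the claim, not proof that it is hallucinated.

```python
import re

# Hypothetical first-pass patterns, one per red flag that is easy to spot by form.
# These are illustrative assumptions, not an exhaustive or authoritative detector.
HALLUCINATION_PATTERNS = {
    # Percentages with a decimal part, e.g. "47.3%"
    "precise_statistic": re.compile(r"\b\d+\.\d+\s?%"),
    # Quoted passages of 20+ characters, which may be attributed direct quotes
    "direct_quote": re.compile(r'["“][^"”]{20,}["”]'),
    # URLs, which may look plausible but not exist
    "url": re.compile(r"https?://[^\s)\"']+"),
}


def flag_hallucination_risks(text):
    """Return {label: [matched snippets]} for every pattern that matches."""
    hits = {}
    for label, pattern in HALLUCINATION_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits
```

Calling `flag_hallucination_risks` on a draft returns the snippets to check against an independent source. Named reports and dates for future events are harder to catch by form alone and still need a careful read.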
Bias Red Flags
- Superlative language: "best", "only", "unmatched", "clear leader"
- One option presented significantly more favourably than others
- Cherry-picked features that favour a particular conclusion
- Dismissal of alternatives without substantive analysis
- Language that mirrors marketing copy rather than analysis
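The superlative check is the easiest of these to automate. Below is a rough sketch, assuming a fixed word list is an acceptable starting point; the four terms are the examples quoted in the list, while the names `SUPERLATIVES` and `find_superlatives` are hypothetical. Words such as "only" also appear in neutral sentences, so treat each hit as a cue to reread the passage rather than a verdict.

```python
import re

# The four example terms quoted in the list; extend for your own context.
SUPERLATIVES = ["best", "only", "unmatched", "clear leader"]

# Match whole words or phrases only, ignoring case.
_SUPERLATIVE_RE = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in SUPERLATIVES) + r")\b",
    re.IGNORECASE,
)


def find_superlatives(text):
    """Return the superlative terms found in text, lowercased, in order of appearance."""
    return [match.group(1).lower() for match in _SUPERLATIVE_RE.finditer(text)]
```

The other bias red flags (unbalanced treatment, cherry-picked features, unsupported dismissals) depend on meaning rather than wording and are better judged by reading the output against the rubric.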
Completeness Red Flags
- Key stakeholder perspectives missing from the analysis
- Relevant policy frameworks not mentioned (ISM, PSPF, etc.)
- Risk analysis that only covers one category of risk
- No consideration of implementation challenges or costs
- Conclusions that do not follow from the evidence presented
Quality Indicators
- Claims qualified with appropriate uncertainty language
- Multiple perspectives acknowledged and weighed
- Recommendations tied to specific evidence or reasoning
- Limitations of the analysis explicitly stated
- Actionable next steps with clear owners and timelines