Assessment Validity Checker
What This Skill Does
Evaluates a proposed assessment against three dimensions: validity (does it measure what it claims to measure?), reliability (would different markers agree on the score?), and authenticity (is the task meaningful, and does it require genuine demonstration of the intended learning?). The output identifies specific threats to validity:
- Construct-irrelevant variance: the assessment measures something other than what it claims.
- Construct underrepresentation: the assessment doesn't cover enough of what it claims to measure.
- Consequential validity problems: the assessment has unintended negative effects on learning.
For each threat found, the output provides an actionable recommendation. AI is particularly valuable here because most teacher-designed assessments contain validity threats that are invisible without an explicit analytical framework: a teacher designing a "reading comprehension" test may inadvertently create a writing test, and a "science understanding" assessment may actually test literacy.
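A minimal sketch of the report this skill might produce, assuming a structured output. The field names and shape below are illustrative assumptions, not a published schema:

```typescript
// Sketch of a possible output structure for the validity check.
// All names here are assumptions for illustration only.

type ThreatType =
  | "construct-irrelevant variance"
  | "construct underrepresentation"
  | "consequential validity";

interface ValidityThreat {
  type: ThreatType;
  finding: string;         // what the assessment actually measures, or misses
  recommendation: string;  // a specific, actionable fix
}

interface ValidityReport {
  validityThreats: ValidityThreat[];
  reliability: string[];   // marker-agreement concerns
  authenticity: string[];  // meaningfulness of the task
}

// Hypothetical report for the "poster about healthy eating" example
// given in the Input Schema below.
const posterReport: ValidityReport = {
  validityThreats: [
    {
      type: "construct-irrelevant variance",
      finding: "Scores are likely driven by design skill and presentation confidence",
      recommendation: "Add a short written explanation marked only on nutrition content",
    },
  ],
  reliability: ["No rubric specified, so markers may weight visual appeal differently"],
  authenticity: ["Task connects to students' own eating choices, supporting authenticity"],
};
```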
Evidence Foundation
Messick (1989) unified the concept of validity into a single framework: validity is not a property of a test but of the interpretation and use of test scores. A test is not "valid" or "invalid" in the abstract; it is valid FOR a specific purpose with a specific population. Every assessment must therefore be evaluated against its intended use.

Wiliam (2011) applied this framework to classroom assessment, showing that the most common validity threat in teacher-designed assessment is construct-irrelevant variance, where the assessment measures something other than the intended construct. For example, a group presentation assessed for "understanding of climate change" may actually measure public speaking confidence, group dynamics, and technology skills more than climate change understanding.

Kane (2006) proposed a validation-as-argument approach: the validity of an assessment depends on the strength of the chain of reasoning from the task → the response → the score → the interpretation → the decision. Any weak link in this chain is a validity threat.

Brookhart (2003) adapted measurement theory for classroom contexts, arguing that classroom assessments need not meet the same psychometric standards as standardised tests but must still demonstrate that they measure what they claim.

Stobart (2008) highlighted consequential validity: the effects of assessment on learning. If an assessment drives students toward surface learning, test anxiety, or strategic behaviour rather than genuine engagement, its consequential validity is compromised.
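A toy sketch of Kane's chain as a checkable structure, assuming an invented 0-to-1 "strength" scale for how well evidence supports each inference. The scale and link names are illustrative assumptions, not part of Kane's framework:

```typescript
// Kane (2006): the argument is only as strong as its weakest inference.
// The numeric strengths below are hypothetical judgments, not measurements.

interface ChainLink {
  name: string;
  strength: number; // 0 = no support, 1 = well supported (assumed scale)
}

function weakestLink(chain: ChainLink[]): ChainLink {
  return chain.reduce((weakest, link) =>
    link.strength < weakest.strength ? link : weakest
  );
}

// Hypothetical judgments for the WW1 essay example below.
const essayChain: ChainLink[] = [
  { name: "task → response", strength: 0.8 },         // prompt elicits argumentation
  { name: "response → score", strength: 0.4 },        // single marker, broad rubric
  { name: "score → interpretation", strength: 0.7 },
  { name: "interpretation → decision", strength: 0.9 },
];

console.log(weakestLink(essayChain).name); // "response → score": the validity threat
```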
Input Schema
The teacher must provide:
- Assessment description: What students do and how it's marked. e.g. "Students write a 500-word essay on the causes of WW1, marked against a rubric with four criteria: historical knowledge, analytical argument, use of evidence, and written communication" / "Students complete a 30-question multiple choice test on photosynthesis" / "Students create a poster about healthy eating and present it to the class"
- Intended learning: What the assessment claims to measure. e.g. "Understanding of the causes of WW1 and ability to construct a historical argument" / "Knowledge and understanding of photosynthesis" / "Understanding of nutrition and healthy eating"
- Student level: Year group. e.g. "Year 10"
Optional (injected by context engine if available):
- Subject area: The curriculum subject
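A hedged example of the input a teacher might supply, shaped after the fields listed above. The property names are illustrative assumptions, not a published schema:

```typescript
// Possible input shape for the checker, mirroring the schema above.
// Field names are assumptions for illustration only.

interface AssessmentInput {
  assessmentDescription: string; // what students do and how it's marked
  intendedLearning: string;      // what the assessment claims to measure
  studentLevel: string;          // year group
  subjectArea?: string;          // optional, injected by the context engine
}

const example: AssessmentInput = {
  assessmentDescription:
    "Students write a 500-word essay on the causes of WW1, marked against " +
    "a rubric with four criteria: historical knowledge, analytical argument, " +
    "use of evidence, and written communication",
  intendedLearning:
    "Understanding of the causes of WW1 and ability to construct a historical argument",
  studentLevel: "Year 10",
  subjectArea: "History",
};
```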