Assessment Validity Checker
What This Skill Does
Evaluates a proposed assessment against three dimensions: validity (does it measure what it claims to measure?), reliability (would different markers agree on the score?), and authenticity (is the task meaningful, and does it require genuine demonstration of the intended learning?). The output identifies specific threats to validity:
- Construct-irrelevant variance: the assessment measures something other than what it claims.
- Construct underrepresentation: the assessment doesn't cover enough of what it claims to measure.
- Consequential validity problems: the assessment has unintended negative effects on learning.
For each threat found, the output provides an actionable recommendation. AI is particularly valuable here because most teacher-designed assessments contain validity threats that are invisible without an explicit analytical framework: a teacher designing a "reading comprehension" test may inadvertently create a writing test, and a "science understanding" assessment may actually test literacy.
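A minimal sketch of the report this skill might produce, assuming a structured output. The field names and shape below are illustrative assumptions, not a published schema:

```typescript
// Sketch of a possible output structure for the validity check.
// All names here are assumptions for illustration only.

type ThreatType =
  | "construct-irrelevant variance"
  | "construct underrepresentation"
  | "consequential validity";

interface ValidityThreat {
  type: ThreatType;
  finding: string;         // what the assessment actually measures, or misses
  recommendation: string;  // a specific, actionable fix
}

interface ValidityReport {
  validityThreats: ValidityThreat[];
  reliability: string[];   // marker-agreement concerns
  authenticity: string[];  // meaningfulness of the task
}

// Hypothetical report for the "poster about healthy eating" example
// given in the Input Schema below.
const posterReport: ValidityReport = {
  validityThreats: [
    {
      type: "construct-irrelevant variance",
      finding: "Scores are likely driven by design skill and presentation confidence",
      recommendation: "Add a short written explanation marked only on nutrition content",
    },
  ],
  reliability: ["No rubric specified, so markers may weight visual appeal differently"],
  authenticity: ["Task connects to students' own eating choices, supporting authenticity"],
};
```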
Evidence Foundation
Messick (1989) unified the concept of validity into a single framework: validity is not a property of a test but of the interpretation and use of test scores. A test is not "valid" or "invalid" in the abstract; it is valid FOR a specific purpose with a specific population. Every assessment must therefore be evaluated against its intended use.

Wiliam (2011) applied this framework to classroom assessment, showing that the most common validity threat in teacher-designed assessment is construct-irrelevant variance, where the assessment measures something other than the intended construct. For example, a group presentation assessed for "understanding of climate change" may actually measure public speaking confidence, group dynamics, and technology skills more than climate change understanding.

Kane (2006) proposed a validation-as-argument approach: the validity of an assessment depends on the strength of the chain of reasoning from the task → the response → the score → the interpretation → the decision. Any weak link in this chain is a validity threat.

Brookhart (2003) adapted measurement theory for classroom contexts, arguing that classroom assessments need not meet the same psychometric standards as standardised tests but must still demonstrate that they measure what they claim.

Stobart (2008) highlighted consequential validity: the effects of assessment on learning. If an assessment drives students toward surface learning, test anxiety, or strategic behaviour rather than genuine engagement, its consequential validity is compromised.
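A toy sketch of Kane's chain as a checkable structure, assuming an invented 0-to-1 "strength" scale for how well evidence supports each inference. The scale and link names are illustrative assumptions, not part of Kane's framework:

```typescript
// Kane (2006): the argument is only as strong as its weakest inference.
// The numeric strengths below are hypothetical judgments, not measurements.

interface ChainLink {
  name: string;
  strength: number; // 0 = no support, 1 = well supported (assumed scale)
}

function weakestLink(chain: ChainLink[]): ChainLink {
  return chain.reduce((weakest, link) =>
    link.strength < weakest.strength ? link : weakest
  );
}

// Hypothetical judgments for the WW1 essay example below.
const essayChain: ChainLink[] = [
  { name: "task → response", strength: 0.8 },         // prompt elicits argumentation
  { name: "response → score", strength: 0.4 },        // single marker, broad rubric
  { name: "score → interpretation", strength: 0.7 },
  { name: "interpretation → decision", strength: 0.9 },
];

console.log(weakestLink(essayChain).name); // "response → score": the validity threat
```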
Input Schema
The teacher must provide:
- Assessment description: What students do and how it's marked. e.g. "Students write a 500-word essay on the causes of WW1, marked against a rubric with four criteria: historical knowledge, analytical argument, use of evidence, and written communication" / "Students complete a 30-question multiple choice test on photosynthesis" / "Students create a poster about healthy eating and present it to the class"
- Intended learning: What the assessment claims to measure. e.g. "Understanding of the causes of WW1 and ability to construct a historical argument" / "Knowledge and understanding of photosynthesis" / "Understanding of nutrition and healthy eating"
- Student level: Year group. e.g. "Year 10"
Optional (injected by context engine if available):
- Subject area: The curriculum subject
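A hedged example of the input a teacher might supply, shaped after the fields listed above. The property names are illustrative assumptions, not a published schema:

```typescript
// Possible input shape for the checker, mirroring the schema above.
// Field names are assumptions for illustration only.

interface AssessmentInput {
  assessmentDescription: string; // what students do and how it's marked
  intendedLearning: string;      // what the assessment claims to measure
  studentLevel: string;          // year group
  subjectArea?: string;          // optional, injected by the context engine
}

const example: AssessmentInput = {
  assessmentDescription:
    "Students write a 500-word essay on the causes of WW1, marked against " +
    "a rubric with four criteria: historical knowledge, analytical argument, " +
    "use of evidence, and written communication",
  intendedLearning:
    "Understanding of the causes of WW1 and ability to construct a historical argument",
  studentLevel: "Year 10",
  subjectArea: "History",
};
```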