questionnaire-reading
Questionnaire Reading Skill
This skill converts a raw questionnaire design document into a structured
questionnaire-design.md file. The output becomes the authoritative reference
for survey structure, routing logic, and metadata — enabling an AI agent to
understand the data before writing any survy code.
1. Input Formats
Accept any of the following:
| Format | How to read |
|---|---|
| `.docx` | Use `python-docx` (`pip install python-docx`) to extract paragraphs and tables |
| `.xlsx` / `.xls` | Use `openpyxl` (`pip install openpyxl`) or `polars.read_excel()` to iterate rows; `openpyxl` reads `.xlsx` only, not legacy `.xls` |
| `.pdf` | Use `pdfplumber` (`pip install pdfplumber`) to extract text page by page |
| `.txt` / `.md` | Read directly: plain text, no library needed |
Always inspect the raw content before deciding on a parsing strategy. Questionnaire documents vary widely; read several rows/paragraphs to detect the layout pattern before extracting.
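A minimal sketch of how the raw content might be loaded for inspection, assuming the libraries from the table above are installed. The function name `read_raw_lines` is hypothetical, and the third-party imports are deferred so that plain-text inputs need no extra packages.

```python
from pathlib import Path


def read_raw_lines(path: str) -> list[str]:
    """Return the document's raw content as a flat list of text lines."""
    suffix = Path(path).suffix.lower()
    if suffix in {".txt", ".md"}:
        return Path(path).read_text(encoding="utf-8").splitlines()
    if suffix == ".docx":
        from docx import Document  # provided by python-docx

        doc = Document(path)
        lines = [p.text for p in doc.paragraphs]
        for table in doc.tables:
            for row in table.rows:
                lines.append(" | ".join(cell.text for cell in row.cells))
        return lines
    if suffix == ".xlsx":
        from openpyxl import load_workbook

        sheet = load_workbook(path, read_only=True, data_only=True).active
        return [
            " | ".join("" if value is None else str(value) for value in row)
            for row in sheet.iter_rows(values_only=True)
        ]
    if suffix == ".pdf":
        import pdfplumber

        with pdfplumber.open(path) as pdf:
            return [
                line
                for page in pdf.pages
                for line in (page.extract_text() or "").splitlines()
            ]
    raise ValueError(f"Unsupported input format: {suffix}")
```

Printing the first few entries of the returned list is usually enough to tell whether the document is table-based, numbered, or free-form.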
2. Concepts to Extract
For each question, extract:
| Field | Description |
|---|---|
| Question ID | Short code used in data (e.g. Q1, S2, D3). If not present, assign sequentially. |
| Label / Text | The full question wording shown to respondents. |
| Type | Single (one answer), Multi (multiple answers), Open (free text), Number, Grid. |
| Options | Answer choices with their codes/numbers. Capture exactly as designed. |
| Logic | Routing instruction — who sees this question. See Section 3. |
| Terminate | Flag if any option ends the interview (-> Terminate respondent). |
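The fields above could be held in a small in-memory structure while parsing. This is only a sketch: the class and attribute names (`Question`, `Option`, `qid`, `qtype`) are hypothetical and not part of this skill or of survy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Option:
    code: int
    label: str
    terminate: bool = False  # True when choosing this option ends the interview


@dataclass
class Question:
    qid: str    # Question ID, e.g. "Q1"
    text: str   # Label / Text, never truncated
    qtype: str  # Single, Multi, Open, Number or Grid
    options: list[Option] = field(default_factory=list)
    logic: str = "All respondents"
    note: Optional[str] = None  # raw wording, set only when logic is ambiguous

    @property
    def terminates(self) -> bool:
        """Whether any option of this question ends the interview."""
        return any(option.terminate for option in self.options)
```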
3. Logic / Routing Rules
Questionnaire logic is the most important thing to capture accurately. Common patterns and how to write them in the output:
| Design wording | Output phrasing |
|---|---|
| "Ask all" / "All respondents" | Logic: All respondents |
| "Ask if Q2 = Yes" / "If Q2 = 1" | Logic: Ask if Q2 == 1 (Yes) |
| "Ask if Q3 = 1 or 2" | Logic: Ask if Q3 == 1 (Cat) OR Q3 == 2 (Dog) |
| "Skip to Q5 if Q4 = No" | Logic: Ask if Q4 != 2 (No), written on each question that the skip jumps over |
| "Ask for all non-terminated" | Logic: Ask for all respondents (who are not terminated) |
| "Ask if Q1 is answered" | Logic: Ask if Q1 is not empty |
| Grid sub-questions | Logic: Same as parent grid question |
When logic is ambiguous, capture the raw wording in a Note: line beneath.
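As an illustration, the two simplest patterns in the table ("Ask all" and "Ask if Qn = value") could be normalised as below. The function `normalize_logic` and its `options` argument (question ID mapped to `{code: label}`) are hypothetical; anything the sketch does not recognise returns `None`, which signals that the raw wording should go into a `Note:` line.

```python
import re
from typing import Optional

ASK_IF = re.compile(r"^(?:ask )?if (\w+)\s*=\s*(.+)$", re.IGNORECASE)


def normalize_logic(raw: str, options: dict[str, dict[int, str]]) -> Optional[str]:
    """Map simple design wording to the standard phrasing, or None if unclear."""
    text = raw.strip()
    if text.lower() in {"ask all", "all respondents"}:
        return "All respondents"
    match = ASK_IF.match(text)
    if not match:
        return None
    qid = match.group(1).upper()
    known = options.get(qid, {})
    by_label = {label.lower(): code for code, label in known.items()}
    parts = []
    for token in re.split(r"\s+or\s+", match.group(2), flags=re.IGNORECASE):
        token = token.strip()
        # Accept either a numeric code ("1") or an option label ("Yes").
        code = int(token) if token.isdigit() else by_label.get(token.lower())
        if code not in known:
            return None  # unknown code or label: treat as ambiguous
        parts.append(f"{qid} == {code} ({known[code]})")
    return "Ask if " + " OR ".join(parts)
```

Skip instructions, "is answered" conditions, and grid inheritance are deliberately left out of this sketch and would need their own handling.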
4. Output Format
Always write the output as a .md file named questionnaire-design.md (or
the name the user specifies). Use this exact structure:
# QUESTIONNAIRE DESIGN
---
Question: Q1
Type: Single
Label / Text: Please indicate your gender
Options:
1. Male
2. Female
Logic: All respondents
---
Question: Q2
Type: Single
Label / Text: Do you nurture a pet?
Options:
1. Yes
2. No
Logic: All respondents
---
Question: Q3
Type: Multi
Label / Text: Which type of pet do you nurture?
Options:
1. Cat
2. Dog
3. Other -> Terminate respondent
Logic: Ask if Q2 == 1 (Yes)
---
Question: Q4
Type: Single
Label / Text: Which brand do you choose for pet food?
Options:
1. Brand 1
2. Brand 2
3. Brand 3
Logic: Ask for all respondents (who are not terminated)
Rules for the output:
- One `---` separator before every question block (including the first).
- `Question:` use the ID exactly as it appears in the data file (or the questionnaire code).
- `Type:` one of `Single`, `Multi`, `Open`, `Number`, `Grid`.
- `Label / Text:` full question wording; do NOT truncate.
- `Options:` list every answer choice with its numeric code.
  - Append `-> Terminate respondent` for terminating options.
  - Append `-> Skip to Q{n}` for options that jump forward.
- `Logic:` concise routing rule using the patterns in Section 3.
- Add `Note:` only when the original wording is ambiguous or non-standard.
- Omit empty fields (e.g. the `Options:` block for Open/Number questions unless codes exist).
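The rules above can be sketched as a small renderer. The input shape is an assumption made for this example: a dict with hypothetical keys `id`, `type`, `text`, `options` (a list of `(code, label, action)` tuples, where `action` is `None`, `"Terminate respondent"`, or `"Skip to Q5"`), `logic`, and an optional `note`.

```python
def render_question(question: dict) -> str:
    """Render one question block in the questionnaire-design.md layout."""
    lines = [
        "---",
        f"Question: {question['id']}",
        f"Type: {question['type']}",
        f"Label / Text: {question['text']}",
    ]
    options = question.get("options") or []
    if options:  # omit the Options: block entirely when there are no codes
        lines.append("Options:")
        for code, label, action in options:
            suffix = f" -> {action}" if action else ""
            lines.append(f"{code}. {label}{suffix}")
    lines.append(f"Logic: {question['logic']}")
    if question.get("note"):
        lines.append(f"Note: {question['note']}")
    return "\n".join(lines)


def render_design(questions: list[dict]) -> str:
    """Render the whole file: title line, then one block per question."""
    blocks = [render_question(q) for q in questions]
    return "# QUESTIONNAIRE DESIGN\n" + "\n".join(blocks) + "\n"
```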
5. Step-by-Step Process
- Receive the file path from the user.
- Read the raw content using the appropriate library (see Section 1).
- Identify the layout: Is it a table? Numbered list? Free-form paragraphs?
- Extract each question in order, filling in every field from Section 2.
- Resolve logic: Map routing instructions to the standard phrasing from Section 3.
- Write `questionnaire-design.md` to the same directory as the input file (or to a path the user specifies).
- Print a short summary: total questions found, any questions where logic was unclear (flagged with `Note:`).
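The final summary step might look like the following sketch. It assumes each extracted question is a dict with a hypothetical `id` key and an optional `note` key that is set only when the logic was unclear.

```python
def summarize(questions: list[dict]) -> str:
    """Build the one-line summary printed after the file is written."""
    unclear = [q["id"] for q in questions if q.get("note")]
    summary = f"{len(questions)} questions found"
    if unclear:
        summary += f"; logic unclear for: {', '.join(unclear)}"
    return summary
```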
6. Parsing Tips by Format
Word (.docx)
- Questions are usually in tables (one row per question) or numbered paragraphs.
- Use `doc.tables` first; if empty, fall back to `doc.paragraphs`.
- Bold text often marks question IDs or labels.
- Italics or parenthetical text often marks routing instructions.
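The tables-first fallback could be sketched as below. The helper name `docx_text_rows` is hypothetical; it takes an already opened python-docx document (for example `Document("questionnaire.docx")` from `docx`) and relies only on its `tables` and `paragraphs` attributes.

```python
def docx_text_rows(doc) -> list[list[str]]:
    """Return table rows as cell texts, or non-empty paragraphs if no tables."""
    rows = [
        [cell.text.strip() for cell in row.cells]
        for table in doc.tables
        for row in table.rows
    ]
    if rows:
        return rows
    # No tables found: treat each non-empty paragraph as a single-cell row.
    return [[p.text.strip()] for p in doc.paragraphs if p.text.strip()]
```

Returning the same row shape in both cases keeps the downstream extraction code independent of which layout the document used.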
Excel (.xlsx)
- Look for a header row containing keywords like "Question", "Code", "Label", "Logic", "Routing".
- Each subsequent row is typically one question or one answer option.
- If rows alternate between question and option levels, detect the pattern by column indentation or a "Type" column value.
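One way to locate the header row is to scan for the first row that contains several of the keywords listed above. In this sketch the function name `find_header_row` and the `min_hits=2` threshold are assumptions; `rows` is any sequence of cell tuples, such as the output of openpyxl's `iter_rows(values_only=True)`.

```python
from typing import Optional

HEADER_KEYWORDS = ("question", "code", "label", "logic", "routing")


def find_header_row(rows, min_hits: int = 2) -> Optional[int]:
    """Return the index of the first row that looks like a header, else None."""
    for index, row in enumerate(rows):
        cells = [str(value).strip().lower() for value in row if value is not None]
        # Count cells that mention at least one header keyword.
        hits = sum(any(keyword in cell for keyword in HEADER_KEYWORDS) for cell in cells)
        if hits >= min_hits:
            return index
    return None
```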
PDF (.pdf)
- Extract text page by page with `pdfplumber`.
- Questionnaires in PDF often use numbering (Q1, Q2, …) as anchors; split on these.
- Tables in PDF may be detected via `page.extract_tables()`.
- Watch for headers/footers repeating on every page and strip them.
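A rough sketch of the header/footer stripping and anchor splitting described above, operating on page texts that have already been extracted. Both function names are hypothetical, and the heuristic "a line is boilerplate if it appears on every page" is an assumption: it will miss footers that contain a changing page number, and it could drop a genuine line that happens to repeat on all pages.

```python
import re
from collections import Counter

ANCHOR = re.compile(r"^(Q\d+)\b")


def strip_repeated_lines(pages: list[str]) -> list[str]:
    """Flatten pages to lines, dropping lines that occur on every page."""
    page_lines = [[line.strip() for line in page.splitlines()] for page in pages]
    counts = Counter(line for lines in page_lines for line in set(lines) if line)
    repeated = {
        line for line, n in counts.items() if len(pages) > 1 and n == len(pages)
    }
    return [
        line for lines in page_lines for line in lines if line and line not in repeated
    ]


def split_on_anchors(lines: list[str]) -> dict[str, list[str]]:
    """Group lines into blocks, starting a new block at each Q-number anchor."""
    blocks: dict[str, list[str]] = {}
    current = None
    for line in lines:
        match = ANCHOR.match(line)
        if match:
            current = match.group(1)
            blocks[current] = []
        if current is not None:
            blocks[current].append(line)
    return blocks
```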
Mixed / Unknown
- If the format is unclear, print the first 20 lines/rows and ask the user to confirm the layout before proceeding.
7. Example Script
See scripts/parse_questionnaire.py for a ready-to-run parser that handles
.docx, .xlsx, .pdf, and .txt inputs and writes questionnaire-design.md.
8. Integration with survy
Once questionnaire-design.md exists, an AI agent can:
- Read the `Question` IDs and map them to `survey["Q1"]` variable IDs.
- Use `Label / Text` to populate `v.label` via `survey.update(...)`.
- Use `Options` to build correct `value_indices` dicts.
- Use `Logic` to understand which respondents answered each question, and apply `survey.filter(...)` correctly in analysis.
- Use `-> Terminate respondent` flags to identify and exclude screened-out rows.
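Before any survy call is made, the design file has to be read back into a structure the agent can work with. The sketch below parses the Section 4 layout into plain dicts. The function name `parse_design` and the returned keys are hypothetical, and the label-to-code shape of `options` is an assumption about what a `value_indices` dict needs; check it against the survy documentation before use. No survy API is called here.

```python
import re

OPTION = re.compile(r"^(\d+)\.\s+(.*?)(?:\s*->\s*(.+))?$")


def parse_design(markdown: str) -> dict[str, dict]:
    """Parse questionnaire-design.md text into {question_id: details}."""
    questions: dict[str, dict] = {}
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("Question:"):
            current = {"options": {}, "terminate_codes": [], "logic": None}
            questions[line.split(":", 1)[1].strip()] = current
        elif current is None:
            continue  # title line or separator before the first question
        elif line.startswith("Logic:"):
            current["logic"] = line.split(":", 1)[1].strip()
        elif (match := OPTION.match(line)):
            code = int(match.group(1))
            current["options"][match.group(2)] = code  # assumed label -> code shape
            if match.group(3) == "Terminate respondent":
                current["terminate_codes"].append(code)
    return questions
```

The `terminate_codes` list gives the answer codes that mark a respondent as screened out, which is the information needed to exclude those rows later.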
9. Reference Files
- `scripts/parse_questionnaire.py`: ready-to-run parser for docx/xlsx/pdf/txt
- `assets/sample_questionnaire.docx`: example Word questionnaire design
- `assets/sample_questionnaire_design.md`: expected output for the sample