phenosnap-phenotype-extractor
When to use
Use this skill when the user provides their own:
- clinical phenotypes / symptoms / diagnoses (free text, bullet lists, clinical note-like text), and/or
- drugs/medications (names, dosages, frequencies).
Examples that should trigger:
- “Symptoms: ataxia, seizures, developmental delay. Meds: levetiracetam 500 mg BID.”
- “I’m taking metformin 500mg daily and have fatigue, polyuria, blurry vision.”
When NOT to use
Do not use this skill when:
- The user asks general questions (e.g., “What is HPO?”, “What is a phenotype?”, “What is GLP-1?”).
- The user provides text that is not personal clinical information (news articles, academic paragraphs, code, etc.).
- The user asks you to interpret someone else’s private clinical record (PHI) without clear permission.
Safety & privacy
- Treat user input as potentially sensitive clinical information.
- Do not upload user text or extracted results anywhere (this skill is local-only).
- Before writing any input to disk, redact obvious identifiers:
- emails, phone numbers, street addresses
- MRN-like long numeric identifiers (e.g., 8+ digits)
- names when clearly presented as “Name: …”
- If the message appears to include highly identifying PHI (e.g., name + DOB + address, or name + MRN), pause and ask for confirmation to proceed, recommending the user remove identifiers first.
Requirements / setup
- Python:
python3available on PATH. - Network access (only for initial PhenoSnap download if missing).
- HPO OBO file:
- Default expected path:
{baseDir}/resources/hp.obo - Override path via environment variable:
HPO_OBO_PATH - This skill does not auto-download
hp.obo. You must supply it.
- Default expected path:
Recommended (best practice):
- Use a virtual environment (venv/conda) before running this skill, because it may install Python packages via
pip.
Inputs & outputs
- Input text file (redacted):
{baseDir}/artifacts/phenosnap_inputs/input_<YYYYMMDD_HHMMSS>.txt
- Output JSON file (timestamped):
{baseDir}/artifacts/phenosnap_outputs/phenotypes_<YYYYMMDD_HHMMSS>.json
- Third-party download cache:
{baseDir}/third_party/phenosnap_main.zip{baseDir}/third_party/get-pip.py
Detection heuristic (activation check)
Trigger if the user message contains any of:
- phenotype cues:
symptom(s),phenotype(s),Dx,diagnosis,PMH,Hx,history of, or a symptom-like list - medication cues:
meds,medications,taking,prescribed, plus patterns like:- dosages:
\b\d+(\.\d+)?\s?(mg|mcg|g|ml|units)\b - frequencies:
qd,q.d.,bid,b.i.d.,tid,t.i.d.,qhs,qAM,qPM,daily,weekly
- dosages:
Do not trigger for purely informational questions without user-provided phenotype/medication content.
Procedure
0) Create required directories
Create if missing:
{baseDir}/PhenoSnap/{baseDir}/artifacts/phenosnap_inputs/{baseDir}/artifacts/phenosnap_outputs/{baseDir}/resources/{baseDir}/third_party/
1) Confirm / redact sensitive identifiers
- Scan the user message for identifiers (email/phone/address/MRN/name fields).
- If strongly identifying PHI is present (name + DOB/address/MRN):
- Ask the user to confirm proceeding and recommend removing identifiers.
- Produce a redacted version of the user text:
- Replace emails with
[REDACTED_EMAIL] - Replace phone numbers with
[REDACTED_PHONE] - Replace long numeric IDs with
[REDACTED_ID] - Replace address-like patterns with
[REDACTED_ADDRESS] - Replace explicit “Name: …” fields with
Name: [REDACTED_NAME]
- Replace emails with
2) Ensure PhenoSnap exists locally
Target location: {baseDir}/PhenoSnap/
2A) If {baseDir}/PhenoSnap/ exists and contains extract_phenotypes.py
Proceed to dependency self-test.
2B) If {baseDir}/PhenoSnap/ does NOT exist (or missing extract_phenotypes.py)
Prefer git; fallback to zip.
If git is available
Run:
git clone https://github.com/WGLab/PhenoSnap.git "{baseDir}/PhenoSnap"
If git is NOT available
Download zip:
- URL:
https://github.com/WGLab/PhenoSnap/archive/refs/heads/main.zip - Destination:
{baseDir}/third_party/phenosnap_main.zip
Download method (pick first available):
- If
curlexists:curl -L "https://github.com/WGLab/PhenoSnap/archive/refs/heads/main.zip" -o "{baseDir}/third_party/phenosnap_main.zip"
- Else on Windows PowerShell:
Invoke-WebRequest -Uri "https://github.com/WGLab/PhenoSnap/archive/refs/heads/main.zip" -OutFile "{baseDir}/third_party/phenosnap_main.zip"
Unzip (choose by OS/tools):
- Windows PowerShell:
Expand-Archive -Path "{baseDir}/third_party/phenosnap_main.zip" -DestinationPath "{baseDir}/third_party/phenosnap_unzip" -Force
- macOS/Linux with unzip:
unzip -o "{baseDir}/third_party/phenosnap_main.zip" -d "{baseDir}/third_party/phenosnap_unzip"
If neither unzip nor Expand-Archive is available, use Python:
python3 -c "import zipfile; z=zipfile.ZipFile(r'{baseDir}/third_party/phenosnap_main.zip'); z.extractall(r'{baseDir}/third_party/phenosnap_unzip')"
Then rename/move the extracted folder to {baseDir}/PhenoSnap/:
- The extracted folder is typically
{baseDir}/third_party/phenosnap_unzip/PhenoSnap-main - Move/rename to
{baseDir}/PhenoSnap/
Final check:
- Verify
{baseDir}/PhenoSnap/extract_phenotypes.pyexists. - If not found, stop and report the directory listing of
{baseDir}/PhenoSnap/and{baseDir}/third_party/phenosnap_unzip/.
3) Dependency self-test and auto-install
Run from inside {baseDir}/PhenoSnap/.
3A) Smoke test importability
Run:
python3 -c "import importlib.util; spec=importlib.util.spec_from_file_location('extract_phenotypes','extract_phenotypes.py'); m=importlib.util.module_from_spec(spec); spec.loader.exec_module(m); print('ok')"
If it prints ok, proceed.
3B) If smoke test fails with missing module (ModuleNotFoundError / ImportError)
Step 1: Ensure pip exists for python3
Check:
python3 -m pip --version
If that fails, try:
python3 -m ensurepip --upgrade
Check again:
python3 -m pip --version
If still failing, bootstrap pip via get-pip.py:
- Download
https://bootstrap.pypa.io/get-pip.pyto{baseDir}/third_party/get-pip.py- with curl:
curl -L "https://bootstrap.pypa.io/get-pip.py" -o "{baseDir}/third_party/get-pip.py"
- or PowerShell:
Invoke-WebRequest -Uri "https://bootstrap.pypa.io/get-pip.py" -OutFile "{baseDir}/third_party/get-pip.py"
- with curl:
- Install:
python3 "{baseDir}/third_party/get-pip.py"
- Verify:
python3 -m pip --version
If pip still cannot be used, stop and report the error output.
Step 2: Install PhenoSnap dependencies
From {baseDir}/PhenoSnap/, run:
python3 -m pip install -r requirements.txt
Step 3: Re-run smoke test once
Re-run:
python3 -c "import importlib.util; spec=importlib.util.spec_from_file_location('extract_phenotypes','extract_phenotypes.py'); m=importlib.util.module_from_spec(spec); spec.loader.exec_module(m); print('ok')"
If still failing, stop and return:
- the missing module name (from error)
- the command output
- recommended fix (use a venv/conda env; verify python3/pip; rerun pip install)
4) Prepare HPO OBO path
Resolve HPO OBO path in this order:
- If env var
HPO_OBO_PATHis set and file exists, use that. - Else use
{baseDir}/resources/hp.oboif it exists.
If the resolved file does not exist:
- Stop and tell the user to place
hp.oboat{baseDir}/resources/hp.oboor setHPO_OBO_PATHto its full path.
5) Write input file (redacted)
- Timestamp format:
YYYYMMDD_HHMMSS(local time). - Write redacted user text to:
{baseDir}/artifacts/phenosnap_inputs/input_<TS>.txt
6) Run extraction
From {baseDir}/PhenoSnap/, run:
python3 extract_phenotypes.py --input-file "{baseDir}/artifacts/phenosnap_inputs/input_<TS>.txt" --hpo-obo "<HPO_OBO_PATH>" --output "{baseDir}/artifacts/phenosnap_outputs/phenotypes_<TS>.json" --format json
Validate:
- Output file exists
- Output file is non-empty
If validation fails:
- Return stderr/stdout
- Provide troubleshooting steps (missing hp.obo, permission issues, dependency issues)
7) Respond to user
Return a concise confirmation:
- Detected content: “phenotypes” and/or “medications”
- Input file path (redacted)
- Output file path (timestamped JSON)
- Note: no data uploaded; local-only
- Any warnings (e.g., missing hp.obo, PHI redaction/confirmation)
Troubleshooting
- PhenoSnap folder exists but script missing: confirm
{baseDir}/PhenoSnap/extract_phenotypes.pyexists. - No git: zip fallback should run; ensure curl/PowerShell is available for download.
- Unzip fails: use Python zipfile fallback.
- pip missing: ensurepip then get-pip.py steps above; consider installing Python with “pip” included.
- Permission errors installing packages: use a virtual environment:
python3 -m venv .venvthen activate and rerun skill.
- hp.obo missing: place file at
{baseDir}/resources/hp.oboor setHPO_OBO_PATH.
Examples
Example 1 (phenotypes + meds)
User:
- “Symptoms: developmental delay, seizures, ataxia. Meds: valproate 250 mg BID.”
Action:
- Write redacted input → run PhenoSnap → output
phenotypes_<TS>.json
Example 2 (meds only)
User:
- “Current meds: metformin 500mg daily, atorvastatin 20 mg qhs.”
Action:
- Extract medication entities/phenotype-related terms supported by PhenoSnap → output JSON
Example 3 (should NOT trigger)
User:
- “What is the Human Phenotype Ontology and how is it used?”
Action:
- Do not run extraction; answer informationally outside this skill.