skills/cookjohn/wos-skills/wos-parse-results

WoS Parse Results

Internal skill for extracting structured record data from Web of Science (WoS) search results. It has two modes: API response parsing (preferred) and DOM scraping (fallback).

Mode A: API Response Parsing (preferred)

When using the runQuerySearch API, the response is NDJSON (one JSON object per line). Find the line whose key is records and map its payload to flat record objects:

// Parse NDJSON response text
const lines = text.trim().split('\n').map(l => { try { return JSON.parse(l); } catch(e) { return null; } }).filter(Boolean);
const searchInfo = lines.find(l => l.key === 'searchInfo')?.payload;
const recordsData = lines.find(l => l.key === 'records')?.payload;

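// Fall back to an empty object so a response without a records line yields [] instead of throwing
const records = Object.entries(recordsData ?? {}).map(([idx, rec]) => ({
  idx: parseInt(idx, 10),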
  wosId: rec.colluid,
  title: rec.titles?.item?.en?.[0]?.title || '',
  authors: rec.names?.author?.en?.filter(Boolean).map(a => a.wos_standard).join('; ') || '',
  source: rec.titles?.source?.en?.[0]?.title || '',
  year: rec.pub_info?.pubyear || '',
  vol: rec.pub_info?.vol || '',
  issue: rec.pub_info?.issue || '',
  pages: rec.pub_info?.page_no || '',
  doi: rec.doi || '',
  citations: rec.citation_related?.counts?.WOSCC || 0,
  citationsAll: rec.citation_related?.counts?.ALLDB || 0,
  refCount: rec.ref_count || 0,
  abstract: rec.abstract?.basic?.en?.abstract?.replace(/<[^>]*>/g, '') || '',
  docType: rec.doctypes?.[0] || '',
  oa: rec.oa || false
}));
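
To make the parsing steps above easier to try in isolation, here is a minimal sketch that wraps them in one function and runs it on a made-up NDJSON string. The function name parseWosNdjson, the sample payloads, and the sample WoS ID are assumptions for illustration only; the sketch maps just a subset of the fields listed above and is not a description of the real API response.

```javascript
// Hypothetical helper (not part of the skill): wraps the Mode A steps in one function.
function parseWosNdjson(text) {
  // One JSON object per line; lines that fail to parse are dropped.
  const lines = text
    .trim()
    .split('\n')
    .map(l => { try { return JSON.parse(l); } catch (e) { return null; } })
    .filter(Boolean);

  const searchInfo = lines.find(l => l.key === 'searchInfo')?.payload ?? null;
  // Default to {} so a response without a records line yields an empty array.
  const recordsData = lines.find(l => l.key === 'records')?.payload ?? {};

  const records = Object.entries(recordsData).map(([idx, rec]) => ({
    idx: parseInt(idx, 10),
    wosId: rec.colluid,
    title: rec.titles?.item?.en?.[0]?.title || '',
    year: rec.pub_info?.pubyear || '',
    doi: rec.doi || '',
    citations: rec.citation_related?.counts?.WOSCC || 0
  }));

  return { searchInfo, records };
}

// Made-up sample input: two valid lines and one broken line.
const sampleText = [
  JSON.stringify({ key: 'searchInfo', payload: { note: 'made-up payload' } }),
  'this line is not JSON',
  JSON.stringify({
    key: 'records',
    payload: {
      '1': {
        colluid: 'WOS:000000000000001',
        titles: { item: { en: [{ title: 'Example title' }] } },
        pub_info: { pubyear: 2020 },
        citation_related: { counts: { WOSCC: 3 } }
      }
    }
  })
].join('\n');

const sampleResult = parseWosNdjson(sampleText);
console.log(sampleResult.records);
```

The broken line is skipped rather than aborting the parse, and missing optional fields such as doi fall back to an empty string, which mirrors the defaults used in the mapping above.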

Mode B: DOM Scraping (fallback)

When the browser is already on a results page and the API is not available:

async () => {
  // Poll for up to 10 s (20 × 500 ms) until the first record element appears
  for (let i = 0; i < 20; i++) {
    if (document.querySelector('app-record')) break;
    await new Promise(r => setTimeout(r, 500));
  }

  const records = [...document.querySelectorAll('app-record')].map((rec, idx) => {
    const titleEl = rec.querySelector('a[data-ta="summary-record-title-link"]');
    const title = titleEl?.textContent?.trim() || '';
    const href = titleEl?.href || '';
    const wosId = href.match(/WOS:\w+/)?.[0] || '';
    const authorEls = rec.querySelectorAll('a[data-ta*="DisplayName-author"]');
    const authors = [...authorEls].map(a => a.textContent?.trim()).join('; ');
    const sourceEl = rec.querySelector('a[data-ta="jcr-link-menu"]');
    // Strip the dropdown icon text first, then trim, so no trailing space is left behind
    const source = sourceEl?.textContent?.replace('arrow_drop_down', '')?.trim() || '';
    const citedEl = rec.querySelector('a[data-ta="stat-number-citation-related-count"]');
    const citations = citedEl?.textContent?.trim() || '0';
    return { idx: idx + 1, title, wosId, authors, source, citations };
  });

  return { status: 'ok', recordCount: records.length, records };
}
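
Mode B returns citations as the text shown on the page, while Mode A returns a number. If both modes feed the same downstream code, a small normalization step can align them. The following is a sketch only: parseCount and normalizeDomRecord are hypothetical names, and it assumes the page shows plain integer counts, possibly with thousands separators (abbreviated forms such as "1.2k" would not be handled correctly).

```javascript
// Hypothetical helper: converts displayed count text such as "1,234" to a number.
// Assumes plain integers, optionally with thousands separators.
function parseCount(text) {
  const digits = String(text ?? '').replace(/[^\d]/g, '');
  return digits ? parseInt(digits, 10) : 0;
}

// Returns a copy of a Mode B record with citations as a number, as in Mode A.
function normalizeDomRecord(rec) {
  return { ...rec, citations: parseCount(rec.citations) };
}

// Made-up Mode B record for illustration.
const domSample = {
  idx: 1,
  title: 'Example title',
  wosId: 'WOS:000000000000001',
  authors: 'Doe, J',
  source: 'Example Journal',
  citations: '1,234'
};

const normalized = normalizeDomRecord(domSample);
console.log(normalized.citations); // 1234
```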

Notes

  • Mode A is always preferred: it yields structured JSON and does not depend on fragile DOM selectors
  • Mode B is the fallback when the browser is already on a results page and the API is not available
  • In Mode B, DOM selectors may miss records that have not rendered yet because of lazy loading or dynamic rendering
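
The preference order in these notes can be written down as a small decision function. This is a sketch under assumptions: pickMode, apiText, and onResultsPage are hypothetical names, and how the caller obtains the API response text or detects a results page is outside its scope.

```javascript
// Hypothetical helper: picks a parsing mode following the notes above.
// apiText: NDJSON response text from the API, if any was obtained.
// onResultsPage: true when the browser is already showing a results page.
function pickMode({ apiText, onResultsPage }) {
  if (typeof apiText === 'string' && apiText.trim() !== '') return 'A';
  if (onResultsPage) return 'B';
  return 'none';
}

console.log(pickMode({ apiText: '{"key":"records","payload":{}}', onResultsPage: true })); // A
console.log(pickMode({ apiText: null, onResultsPage: true })); // B
console.log(pickMode({ apiText: '', onResultsPage: false })); // none
```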