CNKI Paper Detail Extraction

Extract complete metadata from a CNKI paper detail page.

Arguments

$ARGUMENTS is an optional CNKI paper detail URL (one containing kcms2/article/abstract). If it is not provided, the skill assumes the current page is already a paper detail page.
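The argument check can be sketched as a small predicate. This is only an illustration: isCnkiDetailUrl is a hypothetical helper name, and the sketch checks just the path fragment named above, not the host.

```javascript
// Hypothetical helper (not part of the skill): decide whether the
// argument looks like a CNKI paper detail URL.
function isCnkiDetailUrl(arg) {
  if (typeof arg !== 'string' || arg.trim() === '') return false;
  let url;
  try {
    url = new URL(arg.trim());
  } catch {
    return false; // not a parseable absolute URL
  }
  // Only the path fragment from the Arguments section is checked;
  // the host is deliberately left unchecked in this sketch.
  return url.pathname.includes('kcms2/article/abstract');
}
```

When the predicate returns false, the skill would skip navigation and work on the current page.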

Steps

1. Navigate to the paper page (if URL provided)

If $ARGUMENTS contains a URL:

  • Use mcp__chrome-devtools__navigate_page with the URL.
  • Use mcp__chrome-devtools__wait_for with text ["摘要"] and timeout 15000.

2. Check for captcha

Use mcp__chrome-devtools__take_snapshot. If the snapshot contains the text "拖动下方拼图完成验证" (the slider-captcha prompt), stop and notify the user:

CNKI is showing a slider captcha. Please complete the puzzle verification manually in the Chrome browser, then tell me to continue.
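A minimal sketch of the captcha check, assuming the snapshot is available as a plain string; hasSliderCaptcha and CAPTCHA_PROMPT are hypothetical names introduced for illustration.

```javascript
// Prompt text that CNKI shows with its slider captcha (taken from step 2).
const CAPTCHA_PROMPT = '拖动下方拼图完成验证';

// Hypothetical helper: true when the snapshot text contains the prompt.
function hasSliderCaptcha(snapshotText) {
  return typeof snapshotText === 'string' && snapshotText.includes(CAPTCHA_PROMPT);
}
```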

3. Extract paper metadata via JavaScript

Use mcp__chrome-devtools__evaluate_script with this function:

() => {
  const brief = document.querySelector('.brief');
  if (!brief) return { error: 'Paper detail section (.brief) not found' };

  // Title
  const title = brief.querySelector('h1')?.innerText?.trim()
    ?.replace(/\s*附视频\s*$/, '')  // remove "附视频" suffix
    ?.replace(/\s*网络首发\s*$/, ''); // remove "网络首发" suffix

  // Authors - first h3.author contains author links with sup tags
  const authorH3s = brief.querySelectorAll('h3.author');
  const authorSection = authorH3s[0];
  const authors = [];
  if (authorSection) {
    const authorLinks = authorSection.querySelectorAll('a');
    authorLinks.forEach(a => {
      const name = a.innerText?.replace(/\d+$/, '').trim();
      const supMatch = a.innerText?.match(/(\d+)$/);
      const affiliationNum = supMatch ? supMatch[1] : '';
      authors.push({ name, affiliationNum });
    });
  }

  // Affiliations - second h3.author contains org links
  const affiliations = [];
  if (authorH3s.length > 1) {
    const orgLinks = authorH3s[1].querySelectorAll('a');
    orgLinks.forEach(a => {
      affiliations.push(a.innerText?.trim());
    });
  }

  // Abstract
  const abstractEl = document.querySelector('.abstract-text');
  const abstract = abstractEl?.innerText?.trim() || '';

  // Keywords
  const keywordsP = document.querySelector('p.keywords');
  const keywords = keywordsP
    ? Array.from(keywordsP.querySelectorAll('a')).map(a => a.innerText?.replace(/;$/, '').trim())
    : [];

  // Fund
  const fundsP = document.querySelector('p.funds');
  const fund = fundsP?.innerText?.trim() || '';

  // Classification code
  const clcCode = document.querySelector('.clc-code');
  const classification = clcCode?.innerText?.trim() || '';

  // Journal/source
  const docTop = document.querySelector('.doc-top');
  const journal = docTop?.querySelector('a')?.innerText?.trim() || '';

  // Online first / publication info
  const headTime = document.querySelector('.head-time');
  const pubInfo = headTime?.innerText?.trim() || '';

  // Is online first?
  const isOnlineFirst = !!brief.querySelector('.icon-shoufa');

  // Article outline/TOC
  const catalogList = document.querySelector('.catalog-list, .catalog-listDiv');
  const toc = catalogList?.innerText?.trim() || '';

  // Citation network counts
  const citationTabs = document.querySelectorAll('ul.module-tab.tpl_lieteratures li');
  const citationInfo = {};
  citationTabs.forEach(li => {
    const id = li.getAttribute('data-id');
    // Fall back to an empty string so the match/replace calls below never throw
    const text = li.innerText?.trim() || '';
    const countMatch = text.match(/(\d+)/);
    if (id) {
      citationInfo[id] = {
        label: text.replace(/\d+/, '').trim(),
        count: countMatch ? parseInt(countMatch[1], 10) : 0
      };
    }
  });

  return {
    title,
    authors,
    affiliations,
    abstract,
    keywords,
    fund,
    classification,
    journal,
    pubInfo,
    isOnlineFirst,
    toc,
    citationInfo
  };
}
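The function above strips a single trailing run of digits from each author link. If a link's text ever carries several markers, such as "张三1,2" (an assumed format, not verified against CNKI), a slightly wider pattern could be used. The sketch below is a hypothetical variant, and parseAuthorText is not a name used by the skill.

```javascript
// Hypothetical variant of the author parsing: accepts several trailing
// affiliation markers separated by ASCII or full-width commas.
function parseAuthorText(text) {
  const raw = (text || '').trim();
  const match = raw.match(/(\d+(?:\s*[,，]\s*\d+)*)$/);
  if (!match) return { name: raw, affiliationNums: [] };
  return {
    name: raw.slice(0, match.index).trim(),
    affiliationNums: match[1].split(/\s*[,，]\s*/)
  };
}
```

Unlike the original, this returns an array of marker strings (affiliationNums), so a caller would need to adapt accordingly.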

4. Format and present the output

## {title} {isOnlineFirst ? "[网络首发]" : ""}

**Authors:**
{For each author: "- {name} ({affiliation whose leading number matches affiliationNum})"}

**Affiliations:**
{For each affiliation: "- {affiliation}"}

**Journal:** {journal}
**Publication Info:** {pubInfo}

**Abstract:**
{abstract}

**Keywords:** {keywords joined by ", "}

**Fund:** {fund}
**Classification:** {classification}

**Citation Network:**
{For each citation type: "- {label}: {count}"}
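The template can be rendered mechanically from the object returned in step 3. The following is a sketch under stated assumptions: formatPaper and resolveAffiliation are hypothetical helpers, and affiliation strings are assumed to start with "N." (for example "1.北京大学"), as listed under Verified DOM Selectors.

```javascript
// Hypothetical helper: find the affiliation whose "N." prefix matches
// the author's affiliationNum and return it without the prefix.
function resolveAffiliation(affiliationNum, affiliations) {
  if (!affiliationNum) return '';
  const prefix = affiliationNum + '.';
  const hit = affiliations.find(a => a.startsWith(prefix));
  return hit ? hit.slice(prefix.length).trim() : '';
}

// Hypothetical helper: render the step 3 result with the template above.
function formatPaper(paper) {
  const affiliations = paper.affiliations || [];
  const authorLines = (paper.authors || []).map(a => {
    const org = resolveAffiliation(a.affiliationNum, affiliations);
    return org ? `- ${a.name} (${org})` : `- ${a.name}`;
  });
  const citationLines = Object.values(paper.citationInfo || {})
    .map(c => `- ${c.label}: ${c.count}`);
  return [
    `## ${paper.title}${paper.isOnlineFirst ? ' [网络首发]' : ''}`,
    '',
    '**Authors:**',
    ...authorLines,
    '',
    '**Affiliations:**',
    ...affiliations.map(a => `- ${a}`),
    '',
    `**Journal:** ${paper.journal}`,
    `**Publication Info:** ${paper.pubInfo}`,
    '',
    '**Abstract:**',
    paper.abstract,
    '',
    `**Keywords:** ${(paper.keywords || []).join(', ')}`,
    '',
    `**Fund:** ${paper.fund}`,
    `**Classification:** ${paper.classification}`,
    '',
    '**Citation Network:**',
    ...citationLines
  ].join('\n');
}
```

Authors without a matching affiliation are printed without the parenthesised part rather than with an empty pair of brackets.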

5. Fallback: snapshot-based parsing

If the JavaScript extraction fails (for example, it returns the error object because .brief was not found), use mcp__chrome-devtools__take_snapshot and parse the accessibility tree:

  • Title: heading level 1 element
  • Authors: link elements whose URLs contain kcms2/author/detail
  • Affiliations: link elements whose URLs contain kcms2/organ/detail
  • Abstract: StaticText following "摘要:"
  • Keywords: link elements whose URLs contain kcms2/keyword/detail
  • Fund: link elements following "基金资助:"
  • Classification: StaticText following "分类号:"
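The URL-based rules in this list can be sketched as a small classifier. The input shape ({ text, url } objects collected from the snapshot) is an assumption made for illustration, as are the names classifyLink and groupLinks; the actual snapshot format is not shown here.

```javascript
// URL fragments from the fallback rules above, keyed by metadata field.
const LINK_KINDS = [
  ['author', 'kcms2/author/detail'],
  ['affiliation', 'kcms2/organ/detail'],
  ['keyword', 'kcms2/keyword/detail']
];

// Hypothetical helper: map a link URL to a field name, or null.
function classifyLink(url) {
  const hit = LINK_KINDS.find(([, fragment]) => String(url).includes(fragment));
  return hit ? hit[0] : null;
}

// Hypothetical helper: group link texts by field.
// `links` is assumed to be an array of { text, url } objects.
function groupLinks(links) {
  const groups = { author: [], affiliation: [], keyword: [] };
  for (const { text, url } of links) {
    const kind = classifyLink(url);
    if (kind) groups[kind].push(text);
  }
  return groups;
}
```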

Verified DOM Selectors

| Data | Selector | Notes |
| --- | --- | --- |
| Paper section | .brief | Main paper info container |
| Title | .brief h1 | May contain icons, clean text needed |
| Authors | .brief h3.author:first-of-type a | Text has superscript numbers (e.g., "张三1") |
| Affiliations | .brief h3.author:nth-of-type(2) a | Text starts with "N." (e.g., "1.北京大学") |
| Abstract | .abstract-text | Full abstract text |
| Keywords | p.keywords a | Semicolon-separated keyword links |
| Fund | p.funds | Fund information text |
| Classification | .clc-code | CLC classification codes |
| Journal | .doc-top a | Source journal link |
| Online first | .brief .icon-shoufa | Present if paper is online first |
| Citation tabs | ul.module-tab.tpl_lieteratures li | data-id attr identifies type |