sd-download

SKILL.md

ScienceDirect PDF Download

Download PDF files from ScienceDirect articles to the user's local disk.

Prerequisites

  • The user must have access to the article (institutional subscription, open access, or personal subscription).
  • If the article is behind a paywall and the user has no access, the download will fail.

Single Article Download

Step 1: Navigate to article page and extract PDF link

If already on the article page, skip navigation. Otherwise use navigate_page with initScript:

navigate_page({
  url: "{BASE_URL}/science/article/pii/{PII}",
  initScript: "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
})

Then extract the PDF URL with evaluate_script:

async () => {
  // Wait for page to load
  for (let i = 0; i < 20; i++) {
    if (document.querySelector('a[href*="pdfft"]') || document.querySelector('.access-options')) break;
    await new Promise(r => setTimeout(r, 500));
  }

  const pdfLink = document.querySelector('a[href*="pdfft"][class*="accessbar"]')
               || document.querySelector('a[href*="pdfft"]');
  if (pdfLink) {
    return { pdfUrl: pdfLink.href };
  }

  const noAccess = document.querySelector('.access-options, .get-access, [class*="GetAccess"]');
  if (noAccess) {
    return { error: 'No access. User needs institutional or subscriber access.' };
  }
  return { error: 'PDF link not found on this page.' };
}

Step 2: Navigate to PDF URL

Open the PDF URL with initScript:

navigate_page({
  url: "{pdfUrl}",
  initScript: "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
})

Step 3: Handle Cloudflare verification

After navigation, wait 5s then check the page state with evaluate_script:

async () => {
  await new Promise(r => setTimeout(r, 5000));
  return {
    contentType: document.contentType,
    title: document.title,
    url: window.location.href.substring(0, 80)
  };
}

Three possible outcomes:

A) contentType === 'application/pdf' → PDF loaded directly. Skip to Step 4.

B) Title is "Security verification" or body contains "Are you a robot" → Turnstile checkbox appeared. Auto-click:

  1. Use take_snapshot to get the a11y tree. Find the checkbox inside the Cloudflare iframe:
    Iframe "包含 Cloudflare 安全质询的小组件"
      checkbox "确认您是真人"   ← target uid
    
  2. Use click(uid) on the checkbox element. (Chrome DevTools MCP works at CDP protocol level and can interact with cross-origin iframes.)
  3. Wait 5s, then re-check document.contentType:
    • If application/pdf → success, proceed to Step 4.
    • If still on verification → retry click once. After 2 failed attempts, tell user: "请在浏览器中完成验证后告知我。"

C) Title is "请稍候…" → JS challenge still running. Wait another 5s and re-check. Usually resolves to A or B.

Step 4: Trigger download to local disk

Once contentType === 'application/pdf', the PDF is displayed in browser but NOT saved to disk yet. Trigger the actual download with evaluate_script:

(pii) => {
  const a = document.createElement('a');
  a.href = window.location.href;
  a.download = pii + '-main.pdf';
  document.body.appendChild(a);
  a.click();
  document.body.removeChild(a);
  return { downloaded: true, filename: pii + '-main.pdf' };
}

Pass the PII as an argument so the filename is meaningful (e.g. S0022169426003136-main.pdf).

Tell the user the PDF has been downloaded and the filename.

Batch Download from Search Results

Note: Batch download requires authentication. Checkboxes and download buttons are only visible to logged-in users.

Step 1: Select articles

On the search results page, select articles using checkboxes. Each checkbox id is the PII.

Step 2: Submit download form

Use evaluate_script:

(piis) => {
  piis.forEach(pii => {
    const cb = document.getElementById(pii);
    if (cb && !cb.checked) cb.click();
  });

  const downloadBtn = document.querySelector('.download-all-link-button');
  if (downloadBtn) {
    downloadBtn.click();
    return { success: true, message: 'Download initiated for selected articles.' };
  }

  const form = document.querySelector('form[action*="pdf/download"]');
  if (!form) return { error: 'Download form not found. User may need to log in.' };
  return { error: 'Download button not found.' };
}

PDF URL Pattern

{BASE_URL}/science/article/pii/{PII}/pdfft?md5={HASH}&pid=1-s2.0-{PII}-main.pdf

The md5 hash is unique per article and session. It cannot be constructed — it must be extracted from the page.

Notes

  • This skill is set to disable-model-invocation: true — it must be explicitly invoked with /sd-download.
  • Open access articles can be downloaded without authentication.
  • For subscription articles, the user must be authenticated (institutional access, VPN, or personal subscription).
  • After solving Cloudflare once, subsequent PDF downloads in the same session work without interruption.
  • If download fails, suggest the user check their access status or try logging in.
Weekly Installs
6
GitHub Stars
12
First Seen
11 days ago
Installed on
gemini-cli6
github-copilot6
codex6
kimi-cli6
cursor6
amp6