nature-citation

Nature Citation

Use this skill to turn manuscript text into a defensible citation export:

  • segmented text with citation candidates for each segment
  • a reference-manager import file in .enw, .ris, or Zotero .rdf
  • conservative evidence notes explaining whether each candidate truly supports the segment

Chinese-user operating mode

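When the user writes in Chinese, provides Chinese manuscript text, or asks for any of "Nature系列" (Nature series), "CNS及其子刊" (CNS and sister journals), "支撑文献" (supporting literature), "补引用" (add missing citations), "自动给出引用" (provide citations automatically), "分段引用" (segment-by-segment citation), "导出EndNote" (export to EndNote), "RIS", "Zotero", or "RDF":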

  • Accept the text in Chinese, but search using English concept queries unless the topic is explicitly China-specific or Chinese-language scholarship.
  • Return segment notes and evidence notes in Chinese by default.
  • Preserve the exact source segment and translate it into one or more English search claims.
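  • Flag overclaiming clearly in Chinese: 强支撑 (strong support), 部分支撑 (partial support), 背景支撑 (background support), 不建议引用为该句支撑 (not recommended as support for this sentence).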
  • Do not present a paper as supporting the claim merely because its title is related.

Default scope

Interpret journal scope from the user's wording, but keep the filter strict:

  • Nature系列 (Nature series): search Nature Portfolio first. Include Nature, Nature [field], Nature Communications, Communications [field], Scientific Reports, and npj journals.
  • CNS: search Cell, Nature, and Science plus their major sister journals.
  • CNS及其子刊 (CNS and sister journals) or CNS/sister journals: search only accepted flagship and subjournal titles in Nature Portfolio, the AAAS Science family, and Cell Press.
  • 只要Nature/Science/Cell正刊 (Nature/Science/Cell flagship journals only): restrict to the flagship journals Nature, Science, and Cell.

Do not treat merely related journals as in-scope. A title is valid only if it is in the accepted publisher-family whitelist or clearly matches the official naming pattern for that family. If the user needs an exhaustive or submission-critical boundary, verify current official journal pages before finalizing because journal portfolios change.
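
The whitelist rule above can be sketched as a plain set lookup. This is a minimal illustration, not the skill's actual filter: the sets below are deliberately short and incomplete, the function name in_scope is hypothetical, and only the scope keys (nature, cns, flagship) are borrowed from the script's --scope option. The authoritative boundary is references/journal-scope.md plus the official journal pages.

```python
# Illustrative, deliberately incomplete whitelists (assumption: the real
# lists live in references/journal-scope.md and on publisher pages).
NATURE_PORTFOLIO = {
    "nature", "nature communications", "nature medicine",
    "nature genetics", "scientific reports",
}
SCIENCE_FAMILY = {
    "science", "science advances", "science immunology",
    "science translational medicine",
}
CELL_PRESS = {
    "cell", "cell reports", "molecular cell", "cancer cell",
    "neuron", "immunity",
}

SCOPES = {
    "nature": NATURE_PORTFOLIO,
    "cns": NATURE_PORTFOLIO | SCIENCE_FAMILY | CELL_PRESS,
    "flagship": {"nature", "science", "cell"},
}


def in_scope(journal_title: str, scope: str) -> bool:
    """Return True only for an exact whitelist match (hypothetical helper)."""
    # Normalize case and whitespace; never match on substrings, because many
    # unrelated journals contain the word "nature".
    normalized = " ".join(journal_title.lower().split())
    return normalized in SCOPES[scope]
```

Exact matching is the point of the design: a title such as "Nature Environment and Pollution Technology" contains "Nature" but fails the lookup.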

Source hierarchy

Use sources in this order:

  1. Structured bibliographic metadata: Crossref, PubMed/NCBI E-utilities, DOI metadata.
  2. Publisher pages: nature.com, science.org, cell.com, and official journal pages.
  3. Full text or abstract pages, if accessible.
  4. Secondary databases such as Google Scholar, Semantic Scholar, Web of Science, or Scopus only as discovery aids, not as the sole support basis.

Prefer structured APIs for metadata and publisher pages for claim verification. If metadata and publisher page disagree, preserve the DOI and journal-page facts and flag the discrepancy.
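
One possible reading of the discrepancy rule, sketched under stated assumptions: records are plain dicts, the field names (doi, journal, volume) are hypothetical, publisher-page values win for every field except the DOI, and a conflicting DOI is flagged but never overwritten. The function name reconcile is illustrative only.

```python
def reconcile(api_record: dict, page_record: dict) -> tuple[dict, list[str]]:
    """Merge API metadata with publisher-page facts and list disagreements."""
    merged = dict(api_record)
    discrepancies = []
    for field, page_value in page_record.items():
        if page_value in (None, ""):
            continue  # the page is silent on this field, keep the API value
        api_value = api_record.get(field)
        if api_value not in (None, "") and api_value != page_value:
            discrepancies.append(
                f"{field}: metadata={api_value!r}, publisher page={page_value!r}"
            )
            if field == "doi":
                continue  # assumption: flag a DOI conflict, never overwrite it
        merged[field] = page_value
    return merged, discrepancies
```

The returned discrepancy strings are meant to be carried into the evidence notes rather than silently resolved.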

Workflow

1. Segment the text

For each input text:

  • Split long text into citable segments. Prefer paragraph boundaries first, then sentence boundaries.
  • Keep each segment focused on one citable idea when possible.
  • Preserve original order and stable segment IDs such as S001, S002, S003.
  • Skip obvious non-citable connective sentences unless the user asks to cite every sentence.
  • For very long text, process in batches but keep a single final mapping table.

Default segmentation rules:

  • Use blank lines as paragraph boundaries.
  • If a paragraph is longer than about 700 characters or contains multiple claims, split into sentences.
  • Merge very short fragments into neighboring text unless they contain a distinct claim.
  • Keep section headings as labels, not as citable segments.
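
The default rules above can be sketched as follows. This is a rough approximation, not the logic in scripts/nature_citation.py: the 40-character threshold for a "very short fragment" is an assumption, the sentence splitter is naive (it will break on abbreviations such as "e.g."), and detecting multiple claims, section headings, or connective sentences is left out.

```python
import re

MAX_PARAGRAPH_CHARS = 700   # "about 700 characters" from the rules above
MIN_FRAGMENT_CHARS = 40     # assumed cutoff for a "very short fragment"

# Split after Western sentence punctuation followed by whitespace,
# or directly after Chinese sentence punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")


def split_paragraph(paragraph: str) -> list[str]:
    """Keep short paragraphs whole; split long ones into sentences."""
    if len(paragraph) <= MAX_PARAGRAPH_CHARS:
        return [paragraph]
    pieces: list[str] = []
    for sentence in _SENTENCE_END.split(paragraph):
        sentence = sentence.strip()
        if not sentence:
            continue
        if pieces and len(sentence) < MIN_FRAGMENT_CHARS:
            pieces[-1] = f"{pieces[-1]} {sentence}"  # merge short fragment
        else:
            pieces.append(sentence)
    return pieces


def segment(text: str) -> list[tuple[str, str]]:
    """Return (segment_id, text) pairs in original order, IDs S001, S002, ..."""
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text.strip())]
    pieces = [piece for p in paragraphs if p for piece in split_paragraph(p)]
    return [(f"S{i:03d}", piece) for i, piece in enumerate(pieces, start=1)]
```

Because IDs are assigned after all splitting and merging, they stay stable for a given input and follow the original order.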

2. Parse each segment

For each citable segment:

  • Extract the core claim in one sentence.
  • Identify claim type: mechanism, association, method, clinical, epidemiology, background, definition, or review-context.
  • Identify entities, intervention/exposure, outcome, population/model, directionality, and boundary.
  • Convert the claim into 2-4 English search queries:
    • one precise query with all key terms
    • one synonym query
    • one broader background query
    • one methods or model query if relevant

If the claim is too broad, split it into citable subclaims rather than searching the whole sentence.
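
A parsed segment can be held in a small record that enforces the claim types and the 2-4 query rule. The sketch below is illustrative: the class name ParsedSegment and its field names are hypothetical, and the example sentence is an invented manuscript line used only to show the shape of the data.

```python
from dataclasses import dataclass, field

CLAIM_TYPES = {
    "mechanism", "association", "method", "clinical",
    "epidemiology", "background", "definition", "review-context",
}


@dataclass
class ParsedSegment:
    segment_id: str
    source_text: str          # exact source segment, never rewritten
    core_claim: str           # one-sentence English search claim
    claim_type: str
    queries: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.claim_type not in CLAIM_TYPES:
            raise ValueError(f"unknown claim type: {self.claim_type}")
        if not 2 <= len(self.queries) <= 4:
            raise ValueError("expected 2-4 search queries per segment")


# Invented example segment (not taken from any real manuscript).
example = ParsedSegment(
    segment_id="S001",
    source_text="肠道菌群组成与黑色素瘤患者的抗PD-1免疫治疗应答相关。",
    core_claim="Gut microbiome composition is associated with anti-PD-1 "
               "response in melanoma patients.",
    claim_type="association",
    queries=[
        "gut microbiome anti-PD-1 response melanoma patients",      # precise
        "intestinal microbiota immune checkpoint inhibitor efficacy",  # synonym
        "microbiome cancer immunotherapy",                           # broader
    ],
)
```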

3. Search candidate papers

Start with scripts/nature_citation.py when internet access is available:

python scripts/nature_citation.py \
  --text "PASTE MANUSCRIPT TEXT HERE" \
  --scope cns \
  --outdir /tmp/nature-citation \
  --format enw \
  --with-artifacts

Useful options:

  • --text-file manuscript.txt: read long text from a file.
  • --claim "CLAIM TEXT" or --claim-file claims.txt: treat each claim as a segment.
  • --doi 10.xxxx/xxxxx or --doi-file dois.txt: export known DOI records after screening.
  • --scope nature: Nature Portfolio-style journals only.
  • --scope flagship: Nature, Science, and Cell only.
  • --from-year 2018 --to-year 2026: constrain publication dates.
  • --rows 40: raise for broad searches; keep top candidates manageable.
  • --per-segment 3: number of citation candidates to keep per segment.
  • --format enw|ris|zotero-rdf: export format. If omitted and --output-file is set, infer from suffix.
  • --mailto you@example.com: use Crossref's polite pool.
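
The suffix rule for --format can be pictured as below. This is a guess at the behavior described in the option list, not the script's code: resolve_format is a hypothetical name, and falling back to enw when neither option is given is an assumption based on references.enw being the default file.

```python
from pathlib import Path
from typing import Optional

SUFFIX_TO_FORMAT = {".enw": "enw", ".ris": "ris", ".rdf": "zotero-rdf"}


def resolve_format(fmt: Optional[str], output_file: Optional[str]) -> str:
    """An explicit --format wins; otherwise infer from the output suffix."""
    if fmt:
        return fmt
    if output_file:
        suffix = Path(output_file).suffix.lower()
        if suffix not in SUFFIX_TO_FORMAT:
            raise ValueError(f"cannot infer export format from {suffix!r}")
        return SUFFIX_TO_FORMAT[suffix]
    return "enw"  # assumed default, matching references.enw
```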

When the topic is biomedical or PubMed-indexed, also search PubMed with journal filters and compare results against Crossref. Respect the NCBI E-utilities rate limits (3 requests per second without an API key, 10 with one) and include the tool and email parameters when running repeated searches.
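
For orientation, the two lookups might be built as URLs like this. The sketch only constructs request URLs and makes no network calls; the function names are hypothetical, and whether scripts/nature_citation.py uses these exact parameters is not confirmed by this document. The parameter names themselves (query.bibliographic, filter, rows, mailto for Crossref; db, term, retmode, retmax, tool, email for esearch) are standard for the two services.

```python
from typing import Optional
from urllib.parse import urlencode


def crossref_url(query: str, from_year: int, to_year: int,
                 rows: int = 40, mailto: Optional[str] = None) -> str:
    """Build a Crossref /works search URL limited to journal articles."""
    params = {
        "query.bibliographic": query,
        "filter": (
            f"type:journal-article,"
            f"from-pub-date:{from_year},until-pub-date:{to_year}"
        ),
        "rows": rows,
    }
    if mailto:
        params["mailto"] = mailto  # identifies the caller for the polite pool
    return "https://api.crossref.org/works?" + urlencode(params)


def pubmed_esearch_url(term: str, journal: str, tool: str, email: str,
                       retmax: int = 40) -> str:
    """Build an E-utilities esearch URL with a journal field filter."""
    params = {
        "db": "pubmed",
        "term": f'({term}) AND "{journal}"[Journal]',
        "retmode": "json",
        "retmax": retmax,
        "tool": tool,
        "email": email,
    }
    return ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
            + urlencode(params))
```

Journal scope still has to be checked on the returned records, since neither URL restricts results to the accepted whitelist by itself.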

4. Evaluate whether each paper supports the segment

Use a conservative support scale:

  • strong support: the paper directly tests the same relationship/mechanism/method and the result supports the segment.
  • partial support: the paper supports part of the segment, a related model, or a narrower condition.
  • background support: the paper supports field context, not the specific claim.
  • contradictory/limiting: the paper conflicts with or narrows the claim.
  • metadata-only candidate: title/metadata suggest relevance, but abstract/full text has not been checked.

Never cite a metadata-only candidate as support without checking the abstract or publisher page. If a paper is a review, label it as review/context and avoid using it as primary evidence for an experimental claim when primary articles are available.
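
The gate described above can be expressed as a small guard. In this sketch the enum values mirror the support scale, while the function names and the abstract_checked flag are hypothetical; treating background support as citable (as context) is an assumption.

```python
from enum import Enum


class Support(Enum):
    STRONG = "strong"
    PARTIAL = "partial"
    BACKGROUND = "background"
    LIMITING = "limiting"
    METADATA_ONLY = "metadata-only"


def effective_level(proposed: Support, abstract_checked: bool) -> Support:
    """Downgrade any grade to metadata-only until the abstract or
    publisher page has actually been read."""
    return proposed if abstract_checked else Support.METADATA_ONLY


def can_cite_as_support(level: Support) -> bool:
    """Only checked, non-conflicting grades may be cited as support."""
    return level in {Support.STRONG, Support.PARTIAL, Support.BACKGROUND}
```

A title that looks like a strong match therefore stays uncitable until it has been checked: effective_level(Support.STRONG, abstract_checked=False) yields Support.METADATA_ONLY.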

5. Export reference-manager file

Default behavior:

  • write one reference-manager file
  • always generate the review artifacts (HTML/TSV/JSON/report) as well by passing --with-artifacts
  • support publication time filters with --from-year and --to-year

Default file:

  • references.enw: EndNote tagged export

Optional:

  • references.ris: if the user requests RIS instead of ENW
  • references.rdf: if the user requests Zotero RDF

If the user asks to choose the download format, treat ENW, RIS, and Zotero RDF as the supported options and return only one export file unless they explicitly ask for multiple formats.

Do not invent missing fields. If DOI, pages, volume, or issue are missing, leave them absent rather than fabricating them.
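
The "leave absent" rule is easy to honor when the exporter emits a tag only for fields that exist. The following is a minimal sketch of an EndNote tagged (.enw) writer, not the skill's exporter: the record keys are hypothetical, the DOI is written to %R (Electronic Resource Number), and the sample record is a placeholder rather than a real paper.

```python
# (record key, EndNote tag) pairs, written in this order when present.
FIELD_TAGS = [
    ("title", "%T"),
    ("journal", "%J"),
    ("year", "%D"),
    ("volume", "%V"),
    ("issue", "%N"),
    ("pages", "%P"),
    ("doi", "%R"),
]


def to_enw(record: dict) -> str:
    """Serialize one record; missing or empty fields produce no line."""
    lines = ["%0 Journal Article"]
    for author in record.get("authors", []):
        lines.append(f"%A {author}")
    for key, tag in FIELD_TAGS:
        value = record.get(key)
        if value:  # never emit a placeholder for an absent field
            lines.append(f"{tag} {value}")
    return "\n".join(lines) + "\n"


def to_enw_file(records: list[dict]) -> str:
    """Separate records with a blank line."""
    return "\n".join(to_enw(r) for r in records)


# Placeholder record with no volume, issue, or pages.
sample = {
    "authors": ["Doe, Jane"],
    "title": "Example title",
    "journal": "Nature",
    "year": 2020,
    "doi": "10.xxxx/xxxxx",
}
```

For the sample record, to_enw(sample) contains no %V, %N, or %P lines at all, which is the intended outcome.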

6. Review artifacts

Generate review artifacts (HTML/TSV/JSON/report) for every run; they are the primary way the user browses, filters, and selects candidates:

  • ALWAYS use --with-artifacts when running the script. The HTML browser is the most useful output for the user to inspect and curate citations.
  • Always report the HTML visualization path prominently in your final answer (section 7).
  • Generate TSV/JSON/report alongside the HTML so the user has multiple views.

7. Report results

Unless the user asks for a different format, return:

交互式引用浏览器
- [absolute path to citation_visualization.html]  ← 在浏览器中打开此文件,可筛选/选择/下载引用

检索范围
- [Nature Portfolio / Science family / Cell Press / flagship only, plus date limits]

分段引用对应关系
S001: [source segment]
  - [Author, year, title, journal, DOI]
  - 支撑等级: [strong/partial/background/limiting/metadata-only]
  - 插入建议: [e.g. after sentence / after clause]

导出文件
- [absolute path to references.enw / references.ris / references.rdf]

风险和缺口
- [missing full-text check, contradictory evidence, no direct CNS literature, etc.]

Put the HTML browser path FIRST in the report, above everything else, so the user can immediately open and browse candidates. If no suitable CNS/Nature-series paper exists, say so plainly and suggest the best nearby options from non-CNS literature only if the user wants broader coverage.
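
A renderer for the default report could look like the sketch below, which keeps the HTML browser path as the first entry. The function name and the dictionary keys (id, text, candidates, citation, support, placement) are hypothetical; the section headings are copied from the template above.

```python
import os


def render_report(html_path: str, scope: str, segments: list[dict],
                  export_path: str, risks: list[str]) -> str:
    """Render the default report with the HTML browser path first."""
    lines = [
        "交互式引用浏览器",
        f"- {os.path.abspath(html_path)}",
        "",
        "检索范围",
        f"- {scope}",
        "",
        "分段引用对应关系",
    ]
    for seg in segments:
        lines.append(f"{seg['id']}: {seg['text']}")
        for cand in seg["candidates"]:
            lines.append(f"  - {cand['citation']}")
            lines.append(f"  - 支撑等级: {cand['support']}")
            lines.append(f"  - 插入建议: {cand['placement']}")
    lines += ["", "导出文件", f"- {os.path.abspath(export_path)}", ""]
    lines.append("风险和缺口")
    lines += [f"- {risk}" for risk in risks]
    return "\n".join(lines)
```

os.path.abspath is used so that relative output paths are still reported as absolute paths, as the template requires.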

Search quality rules

  • Prefer precision over volume. A useful answer is usually 3-8 candidates, not 50 loosely related papers.
  • Use exact phrase searches only for distinctive terms; otherwise use concept terms and synonyms.
  • Check journal identity. Many journals contain the word "nature" but are not Nature Portfolio journals.
  • Treat citation count as a tie-breaker, not evidence of support.
  • Capture retractions, corrections, and expressions of concern when visible in Crossref or publisher metadata.
  • Date-sensitive topics require current searching and explicit search date.
  • For medical, clinical, or safety claims, search current literature and state that citations do not replace clinical guidance or systematic review.
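
The tie-breaker rule can be made concrete with a two-part sort key. This is an illustrative sketch: the candidate keys (support, citations) are hypothetical, the ordering of the lower grades is an assumption, and the default of three kept candidates echoes the --per-segment 3 example.

```python
SUPPORT_RANK = {
    "strong": 0,
    "partial": 1,
    "background": 2,
    "limiting": 3,        # assumed position; still worth reporting
    "metadata-only": 4,
}


def rank_candidates(candidates: list[dict], keep: int = 3) -> list[dict]:
    """Order by support grade first; citation count only breaks ties."""
    ordered = sorted(
        candidates,
        key=lambda c: (SUPPORT_RANK[c["support"]], -c.get("citations", 0)),
    )
    return ordered[:keep]
```

A heavily cited background paper therefore never outranks a lightly cited paper that directly tests the claim.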

Related files

| File | Open when |
| --- | --- |
| references/search-strategy.md | You need help translating a manuscript claim into search queries and support grades |
| references/journal-scope.md | You need the default Nature/CNS journal-family boundary and official source notes |
| references/ris-endnote.md | You need RIS, EndNote, or Zotero RDF export guidance |
| scripts/nature_citation.py | You need to segment text, search Crossref, export ENW/RIS/RDF, and generate HTML |

Source notes

This skill is based on public bibliographic APIs and official publisher/import documentation: Crossref REST API and filters, NCBI E-utilities, EndNote RIS import options, Nature Portfolio, AAAS Science journals, and Cell Press portfolio descriptions. Verify pages at use time when exact journal coverage or current import behavior matters.
