LangCell

Use This Skill When

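Use this skill for the local LangCell project at /DATA/disk0/zhaosy/home/LangCell. It is the right choice when the task involves LangCell zero-shot annotation, few-shot annotation, cell-encoder-only finetuning, or the tokenization that these paths require.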

Do not use this skill for ordinary Scanpy analysis that does not depend on LangCell.

Confirm whether the user wants zero-shot annotation, few-shot annotation, or cell-encoder-only finetuning.
Check that the input is already tokenized, or route through preprocessing first.
Check whether the required external assets exist: checkpoints, tokenized dataset, ontology / text-description JSON.
Prefer the zero-shot path first if the user is exploring LangCell rather than benchmarking a supervised baseline.
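
The asset check in the list above can be sketched as a small pre-flight helper. This is only an illustration: the function name missing_assets and every path in the example dictionary are hypothetical placeholders, not locations defined by the LangCell repo.

```python
from pathlib import Path


def missing_assets(assets):
    """Return the names of assets whose paths do not exist on disk."""
    return [name for name, path in assets.items() if not Path(path).exists()]


# Hypothetical locations; replace with wherever the downloads actually live.
assets = {
    "checkpoint": "ckpt/",
    "tokenized_dataset": "data/tokenized/",
    "text_descriptions": "data/type2text.json",
}

missing = missing_assets(assets)
if missing:
    print("Missing before LangCell can run:", ", ".join(missing))
```

Running a check like this before any LangCell script makes a missing download fail early, with a readable message, instead of deep inside model loading.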

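Start with zero-shot annotation for most LangCell usage. Its defining behavior is that predictions combine cell-text similarity with cell-text matching, rather than relying on nearest-neighbor lookup over cell embeddings.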

Use LangCell-annotation-zeroshot/zero-shot.ipynb as the primary reference path.
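
To make concrete what combining a similarity score with a cell-text matching score can look like, here is a minimal sketch. The mixing rule (softmax each score vector, then take a weighted average) and the weight alpha are assumptions made for illustration. The notebook is the authority on how LangCell actually computes and weights these scores.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_scores(sim_scores, ctm_scores, alpha=0.5):
    """Mix per-class similarity and matching scores for one cell.

    alpha is a hypothetical weight: 1.0 uses similarity only,
    0.0 uses cell-text matching only.
    """
    if len(sim_scores) != len(ctm_scores):
        raise ValueError("score vectors must cover the same classes")
    sim_p = softmax(sim_scores)
    ctm_p = softmax(ctm_scores)
    return [alpha * s + (1 - alpha) * c for s, c in zip(sim_p, ctm_p)]


def predict(sim_scores, ctm_scores, alpha=0.5):
    """Return the index of the class with the highest combined probability."""
    probs = combine_scores(sim_scores, ctm_scores, alpha)
    return max(range(len(probs)), key=probs.__getitem__)
```

With alpha set to 1.0 the prediction follows the similarity scores alone, and with 0.0 it follows the matching scores alone, which makes the difference from plain nearest-neighbor classification easy to see.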

Use LangCell-annotation-fewshot/fewshot.py when only a tiny labeled support set is available and the user still wants the multimodal LangCell path.

Use LangCell-CE-annotation/finetune.py when the user wants a standard supervised classifier on top of the pretrained cell encoder.
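
The three entry points above can be summarized as a lookup from the user's intent to the file to start from. The mode names ("zero-shot", "few-shot", "ce-finetune") and the entry_point helper are hypothetical. Only the relative paths come from the repo layout described here.

```python
ENTRY_POINTS = {
    "zero-shot": "LangCell-annotation-zeroshot/zero-shot.ipynb",
    "few-shot": "LangCell-annotation-fewshot/fewshot.py",
    "ce-finetune": "LangCell-CE-annotation/finetune.py",
}


def entry_point(mode=None):
    """Map a requested mode to its reference file; default to zero-shot."""
    if mode is None:
        # Exploration without a stated goal starts on the zero-shot path.
        mode = "zero-shot"
    try:
        return ENTRY_POINTS[mode]
    except KeyError:
        valid = ", ".join(sorted(ENTRY_POINTS))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {valid}")
```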

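LangCell does not take raw .h5ad directly for these downstream scripts. Run the data through the preprocessing step to tokenize it first, then pass the tokenized dataset to the chosen script.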

Do not claim raw .h5ad can be passed directly into LangCell inference; tokenization is required first.
Do not assume checkpoints live in the repo. The official repo expects external downloads.
Do not treat zero-shot prediction as plain nearest-neighbor on cell embeddings; the project combines similarity and cell-text matching.
Do not assume label columns are named consistently. The repo checks several alternatives such as celltype, cell_type, str_labels, and labels.
If new cell types are needed, prepare textual descriptions carefully instead of inventing bare labels with no ontology context.
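
The label-column caveat above can be handled with a small lookup instead of a hard-coded name. This is a sketch under assumptions: find_label_column is a hypothetical helper, and the priority order of the candidates is chosen here for illustration and may differ from the order the repo checks.

```python
LABEL_CANDIDATES = ("celltype", "cell_type", "str_labels", "labels")


def find_label_column(column_names, candidates=LABEL_CANDIDATES):
    """Return the first candidate label column present in column_names."""
    present = set(column_names)
    for name in candidates:
        if name in present:
            return name
    raise KeyError(
        "no label column found; looked for: " + ", ".join(candidates)
    )
```

For example, find_label_column(["input_ids", "length", "cell_type"]) returns "cell_type", while a dataset with none of the candidate columns raises a KeyError that lists the names that were tried.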