The Agent Skills Directory

Indirect Prompt Injection Surface: The skill demonstrates how to ingest data from external sources such as web pages (via WebBaseLoader) and PDF files (via PyPDFLoader). When this untrusted content is retrieved and inserted directly into the LLM's system prompt or used as a tool for an agent, there is a possibility that the external content could contain instructions designed to influence the agent's behavior.
Ingestion points: WebBaseLoader, PyPDFLoader, and DirectoryLoader in SKILL.md.
Boundary markers: The prompt template uses basic delimiters (Use this context:\n\n{context}), but does not explicitly instruct the model to ignore potential instructions embedded within the context.
Capability inventory: The skill demonstrates using retrieved data within a standard LLM invocation and as a tool for a LangChain agent.
Sanitization: No explicit sanitization or filtering of the retrieved document content is shown before it is passed to the LLM.
Insecure Deserialization Consideration: In the FAISS vector store example, the skill uses the allow_dangerous_deserialization=True parameter when calling FAISS.load_local. This is a requirement for loading FAISS indexes in certain environments because they rely on Python's pickle module. Users should ensure that FAISS index files are only loaded from trusted sources, as malicious pickle files can execute arbitrary code during the loading process.
External Data Retrieval: The WebBaseLoader and CheerioWebBaseLoader components fetch content from remote URLs. This is the intended behavior for building a RAG system that uses web documentation as a knowledge base. In this skill, the examples target official documentation sites.

langchain-rag