llm-evaluation

Pass

Audited by Gen Agent Trust Hub on Apr 17, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill provides educational content and Python code snippets for calculating common NLP metrics such as BLEU, ROUGE, and BERTScore using established libraries.
  • [SAFE]: Dependencies referenced in code snippets, including nltk, scikit-learn, scipy, transformers, and detoxify, are standard, well-known libraries in the data science and machine learning ecosystem.
  • [SAFE]: All network-related patterns (e.g., using the OpenAI API or Hugging Face transformers) represent standard integration practices for LLM development and evaluation.
  • [SAFE]: No patterns indicative of prompt injection, data exfiltration, credential harvesting, or malicious persistence were found in the skill's instructions or implementation examples.
  • [SAFE]: The skill uses well-known Microsoft models (DeBERTa) as scoring backbones for evaluation tasks, a standard industry practice.
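To illustrate the kind of metric code the audited skill contains, the following is a minimal, hand-rolled sketch of sentence-level BLEU using only the Python standard library. This is an illustration written for this report, not code taken from the skill: the skill's own snippets rely on established libraries such as nltk, whose `sentence_bleu` also provides smoothing for short or low-overlap hypotheses.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty.
    No smoothing in this sketch: any zero precision yields 0.0."""
    if not hypothesis:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(hypothesis))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "the cat is on the mat".split()
hypothesis = "the cat is on the mat".split()
print(sentence_bleu(reference, hypothesis))  # identical sentences score 1.0
```

In practice the skill's approach of delegating to nltk (`nltk.translate.bleu_score.sentence_bleu`) is preferable, since the library handles smoothing, multiple references, and configurable n-gram weights.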
Audit Metadata
  • Risk Level: SAFE
  • Analyzed: Apr 17, 2026, 01:49 PM