# Testing & Quality Assurance

## Overview
The test infrastructure is built on pytest. The `tests/` directory contains a mix of exploratory utility scripts and real `test_*.py` files; `tests/conftest.py` and `pytest.ini` are the canonical test configuration.
## Running Tests
```bash
# From project root with .venv active
.venv/bin/python -m pytest                    # run all tests
.venv/bin/python -m pytest -m unit            # unit tests only
.venv/bin/python -m pytest -m "not slow"      # skip slow ML tests
.venv/bin/python -m pytest tests/test_basic.py -v   # single file, verbose
```
## `pytest.ini` Markers
```ini
[pytest]
testpaths = tests
addopts = -v --tb=short -q
markers =
    unit: Pure unit tests — no I/O or network
    integration: Tests that require a real database or network
    translation: Tests that exercise the Helsinki-NLP pipeline
    slow: Tests expected to take > 5 seconds
```
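Markers are applied with decorators, and a test may carry several at once. A minimal sketch (the test names and bodies here are invented for illustration):

```python
import pytest

@pytest.mark.unit
def test_tokenize():
    # Pure function, no I/O, so it qualifies for the "unit" marker.
    assert "ran".split() == ["ran"]

@pytest.mark.slow
@pytest.mark.translation
def test_full_pipeline_runs():
    # Would exercise the Helsinki-NLP pipeline; stacked markers let you
    # select it with `-m translation` or exclude it with `-m "not slow"`.
    pass
```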
## `conftest.py` Fixtures
`tests/conftest.py` provides:

| Fixture | Scope | Purpose |
|---|---|---|
| `mock_db` | session | MagicMock of `DictionaryDB` — no real DB required |
| `flask_app` | session | Flask test app with DB and translator mocked |
| `client` | function | Flask test client |
| `auth_headers` | function | Pre-authenticated session |
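These fixtures follow the usual pytest shapes. A sketch of how `mock_db` and `client` are likely defined (the return values and wiring below are assumptions, not the real `conftest.py`):

```python
import pytest
from unittest.mock import MagicMock

@pytest.fixture(scope="session")
def mock_db():
    # Stand-in for DictionaryDB: any attribute access yields a MagicMock,
    # so each test stubs only the methods it asserts on.
    db = MagicMock()
    db.search_entries.return_value = []
    return db

@pytest.fixture()
def client(flask_app):
    # Function scope: every test gets a fresh Flask test client.
    return flask_app.test_client()
```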
## Critical Patch Targets
The app uses module-level globals — patch them correctly:
```python
# CORRECT — the module-level variable is dict_db, not app.db
from unittest.mock import MagicMock, patch

def test_search(client):
    mock = MagicMock()
    mock.search_entries.return_value = [
        {'chuukese_word': 'ran', 'english_translation': 'water'}
    ]
    with patch('app.dict_db', mock):
        resp = client.get('/api/dictionary/search?q=ran')
        assert resp.status_code == 200
```
```python
# CORRECT — the collection attribute is dictionary_collection
mock_db.dictionary_collection.find.return_value = iter([...])

# CORRECT — the method is bulk_insert_entries, not bulk_insert
mock_db.bulk_insert_entries.return_value = {'inserted': 5}

# CORRECT — the translator is a module global, not a function
with patch('app.helsinki_translator') as mock_translator:
    mock_translator.translate.return_value = 'water'
    ...
```
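Why the patch target must match the module-level name can be shown without the real app. Here `app` is a throwaway module built on the fly, standing in for the project's (everything below is illustrative):

```python
import sys
import types
from unittest.mock import MagicMock, patch

# Fake 'app' module: the real one assigns its DB to a module-level
# global at import time, which is why tests patch 'app.dict_db'.
app = types.ModuleType("app")
app.dict_db = "real-db-connection"
sys.modules["app"] = app

mock = MagicMock()
mock.search_entries.return_value = [{"chuukese_word": "ran"}]

with patch("app.dict_db", mock):
    # Inside the context, code reading app.dict_db sees the mock.
    assert sys.modules["app"].dict_db is mock

# On exit, patch restores the original global automatically.
assert sys.modules["app"].dict_db == "real-db-connection"
```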
## API Test Pattern
```python
import pytest

@pytest.mark.unit
def test_dictionary_search_empty_query(client, auth_headers):
    resp = client.get('/api/dictionary/search?q=')
    assert resp.status_code in (200, 400)

@pytest.mark.integration
def test_bible_coverage(client, auth_headers):
    resp = client.get('/api/database/bible-coverage')
    data = resp.get_json()
    assert 'books' in data
```
## Translation Accuracy Testing
```python
from sacrebleu.metrics import BLEU

# hypotheses: list of model output strings
# references: list of gold translations, wrapped in a list (one reference set)
bleu = BLEU()
result = bleu.corpus_score(hypotheses, [references])
assert result.score > 15.0, f"BLEU too low: {result.score}"
```
## Test File Inventory
Files in `tests/` prefixed `test_` are runnable with pytest:

- `test_basic.py` — publication manager, JW.org lookup
- `test_collections.py` — database collection ops
- `test_helsinki_trainer.py` — Helsinki-NLP training helpers
- `test_translation.py` — translation pipeline
- `test_scripture_parsing.py` — scripture parsing logic
- `test_word_families.py` — word family grouping
Files without the `test_` prefix are exploratory scripts and are not collected by pytest.
## Source Files

- `tests/conftest.py` — shared fixtures
- `pytest.ini` — test runner configuration
- `tests/test_basic.py` — canonical example tests