genai-services
OCI Generative AI Services
NEVER Do This
❌ NEVER send PHI/PII identifiers to GenAI APIs
# WRONG - patient identifiers in external service logs
prompt = f"Transcribe note for patient {patient_name}, MRN {mrn}, SSN {ssn}: {note}"
# RIGHT - redact first, keep mapping in secure DB
prompt = f"Transcribe this medical note: {redacted_note}"
# phi_mapping stored locally: temp_id → real_id
GenAI service logs may retain data. Treat any unredacted PHI in a prompt as a HIPAA/GDPR exposure, regardless of Oracle BAA status.
❌ NEVER trust GenAI output without validation in critical systems
- Hallucination rate: 5-15% for factual queries, higher for medical/legal
- Always route AI-suggested content to human review queue before acting on it
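One way to picture the human review step is a queue that holds every generated draft until a reviewer signs off. The sketch below is illustrative only: `ReviewItem`, `review_queue`, `submit_for_review`, and `approve_next` are hypothetical names, and the in-memory queue stands in for whatever durable store a real deployment would use.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from queue import Queue


@dataclass
class ReviewItem:
    """One AI-generated draft waiting for a human decision."""
    request_id: str
    draft_text: str
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    approved: bool = False


# Hypothetical in-memory queue; a production system would need durable storage.
review_queue: "Queue[ReviewItem]" = Queue()


def submit_for_review(request_id: str, draft_text: str) -> ReviewItem:
    """Park the draft for review instead of acting on it directly."""
    item = ReviewItem(request_id=request_id, draft_text=draft_text)
    review_queue.put(item)
    return item


def approve_next(reviewer_decision: bool) -> ReviewItem:
    """Pop the oldest draft and record the reviewer's decision."""
    item = review_queue.get_nowait()
    item.approved = reviewer_decision
    return item
```

Downstream code would then act only on items whose `approved` flag is true.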
❌ NEVER exceed token limits silently
- command-r-plus: 128k context (input + output combined)
- command-r: 4k context
- Exceeding limit: request truncated silently OR fails with 400 error
❌ NEVER call GenAI without rate limit handling — 429s are common and predictable; see backoff pattern below
❌ NEVER use GenAI for deterministic tasks
- Wrong: "Extract invoice total from OCR text" → use regex/structured parsing
- Wrong: "Validate email format" → use validation library
- Right: "Summarize patient history", "Generate narrative report"
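As a rough illustration of the deterministic alternative for the invoice case, the sketch below pulls a total out of OCR text with a regular expression. The invoice layout (a line such as `Total: $1,234.56`) and the names `TOTAL_PATTERN` and `extract_invoice_total` are assumptions made for the example; real invoices would likely need more patterns.

```python
import re
from decimal import Decimal
from typing import Optional

# Assumed layout: the word "total", an optional colon and dollar sign, then an amount.
# \btotal\b does not match inside "Subtotal", so subtotal lines are skipped.
TOTAL_PATTERN = re.compile(
    r"\btotal\b\s*:?\s*\$?\s*(\d[\d,]*(?:\.\d{2})?)",
    re.IGNORECASE,
)


def extract_invoice_total(ocr_text: str) -> Optional[Decimal]:
    """Return the first total found in the text, or None if no line matches."""
    match = TOTAL_PATTERN.search(ocr_text)
    if match is None:
        return None
    # Strip thousands separators before converting to an exact decimal
    return Decimal(match.group(1).replace(",", ""))
```

The same input always yields the same answer, and a miss is an explicit `None` rather than a plausible-looking guess.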
Model Selection
| Model | Context | Input Cost/1M | Output Cost/1M | Use For |
|---|---|---|---|---|
| command-r-plus | 128k | ~$15 | ~$75 | Complex reasoning, long docs, RAG |
| command-r | 4k | ~$1.50 | ~$7.50 | Chat, short prompts, high volume |
| embed-english-v3 | 512 | ~$0.10 | N/A | Semantic search (roughly 15x to 150x cheaper input than the generation models) |
| llama-2-70b | 4k | ~$2 | ~$10 | Cost-effective, open weights |
Decision rule: Start with command-r for everything. Upgrade to command-r-plus only when reasoning quality is demonstrably insufficient.
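Cost optimization: Use embeddings for retrieval/search before invoking generation. At the prices above, embedding input costs roughly 15x to 150x less than generation input, so reserve generation for the few passages that retrieval has already narrowed down.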
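A minimal sketch of the retrieval step, assuming the embedding vectors have already been computed and stored: it ranks documents by cosine similarity so that only the best matches are passed to a generation model. `cosine_similarity` and `top_k_documents` are hypothetical helpers, not OCI SDK calls.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_documents(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[str]:
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]
```

Only the text of the returned ids then needs to go into the generation prompt.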
OCI GenAI Rate Limits (Per Compartment)
| Model | Requests/Min | Requests/Day |
|---|---|---|
| command-r-plus | 20 | 1,000 |
| command-r | 60 | 3,000 |
| Embeddings | 100 | 10,000 |
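Per-minute limits like these can also be respected on the client side, so that most 429s never happen. The following is a sketch of a sliding-window limiter; the class name `SlidingWindowLimiter` and its methods are hypothetical, and it tracks only one process, so it would not coordinate several workers sharing a compartment.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Tracks send times and reports how long to wait before the next call."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent: Deque[float] = deque()

    def seconds_until_allowed(self) -> float:
        """Return 0.0 if a request may go now, else the remaining wait."""
        now = self._clock()
        # Drop timestamps that have aged out of the window
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self._sent.append(self._clock())
```

For command-r-plus, for instance, a caller might build `SlidingWindowLimiter(20)`, sleep for `seconds_until_allowed()` before each request, and call `record()` after sending.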
Rate Limit Backoff Pattern
import random
import time

from oci.exceptions import ServiceError


def generate_with_backoff(genai_client, request, max_retries=5):
    """Call chat(), retrying 429s with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            response = genai_client.chat(request)
            return response.data.chat_response.text
        except ServiceError as e:
            if e.status == 429 and attempt < max_retries - 1:
                # Base waits of 1s, 2s, 4s, 8s, each plus up to 1s of jitter
                wait = (2 ** attempt) + random.uniform(0, 1)
                time.sleep(wait)
            elif e.status == 400 and "token" in (e.message or "").lower():
                raise ValueError("Token limit exceeded: truncate input") from e
            else:
                # Any other error, or a 429 on the final attempt, propagates
                raise
Token Truncation
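def truncate_for_model(text: str, model: str = "command-r-plus", max_output: int = 2000) -> str:
    """Keep the most recent text that fits the model's input budget."""
    limits = {"command-r-plus": 128000, "command-r": 4000}
    max_input_tokens = limits.get(model, 2000) - max_output
    if max_input_tokens <= 0:
        # Also avoids text[-0:], which would return the whole string untruncated
        raise ValueError(f"max_output={max_output} leaves no input budget for {model}")
    max_chars = max_input_tokens * 4  # rough estimate: ~4 chars per token
    if len(text) <= max_chars:
        return text
    # Keep the tail, on the assumption that the latest content matters most
    return "...[earlier content truncated]...\n" + text[-max_chars:]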
Prompt token savings: Verbose system prompts waste tokens at scale. Replacing a 50-word instruction with "Summarize: diagnoses, meds, allergies, treatment plan." saves roughly 50 tokens per request. At 1,000 req/day that is about 1.5M input tokens per month, or roughly $22.50/month at the ~$15/1M command-r-plus input rate.
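The arithmetic behind an estimate like this is easy to check with a small helper. The sketch below assumes that prompt tokens are billed at the input rate from the Model Selection table and that a month has 30 days; `monthly_prompt_savings` is a hypothetical name.

```python
def monthly_prompt_savings(
    tokens_saved_per_request: int,
    requests_per_day: int,
    input_cost_per_million: float,
    days_per_month: int = 30,
) -> float:
    """Dollars saved per month by trimming the prompt on every request."""
    tokens_per_month = tokens_saved_per_request * requests_per_day * days_per_month
    return tokens_per_month / 1_000_000 * input_cost_per_million


# 50 tokens saved per request, 1,000 requests/day, ~$15 per 1M input tokens
savings = monthly_prompt_savings(50, 1000, 15.0)
print(f"${savings:.2f}/month")
```

Swapping in another model's input rate shows how much less the same trimming is worth on cheaper models.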
PHI Redaction Pattern
import re


def redact_phi(text: str) -> tuple[str, dict]:
    """Remove PHI and return (redacted_text, mapping).

    MRNs and SSNs are redacted irreversibly. The mapping only holds
    placeholder-to-name entries once the name redaction below is enabled.
    """
    mapping = {}
    redacted = text
    # MRNs: keep the label, drop the identifier
    mrn_pattern = r'\b(MRN|Medical Record):?\s*([A-Z0-9]{6,10})\b'
    redacted = re.sub(mrn_pattern, r'\1: [REDACTED]', redacted)
    # SSNs in NNN-NN-NNNN form
    redacted = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '[SSN_REDACTED]', redacted)
    # Names: use NER library (spacy or similar) for accuracy
    # names = extract_names(text)
    # for i, name in enumerate(names):
    #     placeholder = f"[PATIENT_{i}]"
    #     mapping[placeholder] = name
    #     redacted = redacted.replace(name, placeholder)
    return redacted, mapping
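To show how a placeholder mapping might be used in both directions, here is a small round-trip sketch. It takes an explicit list of names in place of NER output, and both `pseudonymize_names` and `restore_phi` are hypothetical helpers written for this example.

```python
from typing import Dict, List, Tuple


def pseudonymize_names(text: str, names: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each known name with a placeholder and remember the mapping."""
    mapping: Dict[str, str] = {}
    redacted = text
    # Longest names first, so "Ann Lee" is replaced before a bare "Ann"
    ordered = sorted(set(names), key=lambda n: (-len(n), n))
    for i, name in enumerate(ordered):
        placeholder = f"[PATIENT_{i}]"
        mapping[placeholder] = name
        redacted = redacted.replace(name, placeholder)
    return redacted, mapping


def restore_phi(text: str, mapping: Dict[str, str]) -> str:
    """Swap placeholders back to the original values after the API call."""
    restored = text
    for placeholder, original in mapping.items():
        restored = restored.replace(placeholder, original)
    return restored
```

The mapping stays in local secure storage; only the redacted text is sent, and `restore_phi` runs on the response once it is back inside the trusted boundary.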
Response Validation (Healthcare)
import re


def validate_medical_response(response: str) -> tuple[bool, list[str]]:
    """Return (is_valid, issues) for a generated clinical note."""
    issues = []
    response = response or ""  # tolerate None so the later checks cannot crash
    if len(response.strip()) < 10:
        issues.append("Response too short or empty")
    lowered = response.lower()
    # Hallucination markers (refusals and unfilled placeholders)
    for marker in ["I don't have access", "I cannot", "As an AI", "[INSERT", "TODO"]:
        if marker.lower() in lowered:
            issues.append(f"Hallucination marker: {marker}")
    # Expected structure (customize per use case)
    for section in ["Chief Complaint", "Assessment", "Plan"]:
        if section.lower() not in lowered:
            issues.append(f"Missing section: {section}")
    # PII leak detection (if input was redacted)
    for pattern in [r'\b\d{3}-\d{2}-\d{4}\b', r'\b[A-Z]{2}\d{6,8}\b']:
        if re.search(pattern, response):
            issues.append(f"Potential PII in response: {pattern}")
    return len(issues) == 0, issues
HIPAA Compliance Checklist
Before going live with PHI-adjacent GenAI:
- Business Associate Agreement (BAA) signed with Oracle — verify explicitly; don't assume
- PHI redacted before every API call
- Audit logging of all GenAI calls (who called, when, which model)
- Data retention policy for prompts/responses defined
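For the audit logging item, a record can capture who called, when, and which model without storing any prompt text. The sketch below is one possible shape; the logger name `genai.audit`, the field names, and the helpers `build_audit_record` and `log_genai_call` are assumptions, and a real deployment would route these records to its own audit store.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name; attach a handler that writes to your audit store.
audit_logger = logging.getLogger("genai.audit")


def build_audit_record(user_id: str, model: str, request_id: str) -> dict:
    """Metadata only: never include prompt or response text here."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "model": model,
        "request_id": request_id,
    }


def log_genai_call(user_id: str, model: str, request_id: str) -> dict:
    """Emit one JSON audit line per GenAI call and return the record."""
    record = build_audit_record(user_id, model, request_id)
    audit_logger.info(json.dumps(record, sort_keys=True))
    return record
```

Keeping the record to metadata means the audit trail itself does not become another place where PHI can leak.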
Reference Files
Load references/oci-genai-reference.md when you need:
- Comprehensive GenAI API and SDK documentation
- RAG implementation with OCI
- GenAI Agents setup
- Fine-tuning and custom model deployment