OCI Generative AI Services

NEVER Do This

❌ NEVER send PHI/PII identifiers to GenAI APIs

# WRONG - patient identifiers in external service logs
prompt = f"Transcribe note for patient {patient_name}, MRN {mrn}, SSN {ssn}: {note}"

# RIGHT - redact first, keep mapping in secure DB
prompt = f"Transcribe this medical note: {redacted_note}"
# phi_mapping stored locally: temp_id → real_id

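GenAI service logs may retain prompt data. Sending raw identifiers risks a HIPAA/GDPR violation, and a signed Oracle BAA does not remove the obligation to minimize the PHI you disclose, so redact even when a BAA is in place.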

❌ NEVER trust GenAI output without validation in critical systems

Hallucination rate: roughly 5-15% for factual queries and higher for medical/legal content; treat these as rough planning figures, since rates vary by model and prompt
Always route AI-suggested content to human review queue before acting on it
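
One way to enforce the review step is to make an approval gate the only path by which generated text can leave the system. The sketch below is a minimal in-memory version under that assumption; `ReviewItem`, `ReviewQueue`, and the status names are hypothetical, and a real deployment would persist the queue rather than hold it in memory.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReviewItem:
    """AI-suggested content awaiting a human decision (hypothetical structure)."""
    item_id: int
    content: str
    status: str = "pending"  # pending -> approved | rejected
    reviewer: Optional[str] = None


@dataclass
class ReviewQueue:
    """In-memory queue; a real deployment would back this with a database."""
    items: dict = field(default_factory=dict)
    next_id: int = 1

    def submit(self, content: str) -> int:
        """Store generated content as pending and return its id."""
        item = ReviewItem(item_id=self.next_id, content=content)
        self.items[item.item_id] = item
        self.next_id += 1
        return item.item_id

    def decide(self, item_id: int, reviewer: str, approve: bool) -> None:
        """Record a human reviewer's decision."""
        item = self.items[item_id]
        item.status = "approved" if approve else "rejected"
        item.reviewer = reviewer

    def approved_content(self, item_id: int) -> str:
        """Release content only after a human has approved it."""
        item = self.items[item_id]
        if item.status != "approved":
            raise PermissionError(f"Item {item_id} is {item.status}, not approved")
        return item.content
```

Downstream code calls `approved_content()` instead of reading the model response directly, so unreviewed text cannot be acted on by accident.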

❌ NEVER exceed token limits silently

command-r-plus: 128k context (input + output combined)
command-r: 4k context
Exceeding limit: request truncated silently OR fails with 400 error
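
A pre-flight check can turn a silent overflow into an explicit error. The following is a rough sketch: the limits are copied from the list above, the 4-characters-per-token estimate is only a heuristic, and `estimate_tokens` and `check_token_budget` are hypothetical helper names rather than part of any SDK.

```python
# Context windows in tokens, taken from the list above
CONTEXT_LIMITS = {"command-r-plus": 128_000, "command-r": 4_000}


def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token), rounded up;
    # a real tokenizer would be more accurate
    return (len(text) + 3) // 4


def check_token_budget(prompt: str, model: str, max_output_tokens: int) -> int:
    """Return the estimated prompt tokens, or raise if input + output will not fit."""
    if model not in CONTEXT_LIMITS:
        raise KeyError(f"Unknown model: {model}")
    prompt_tokens = estimate_tokens(prompt)
    needed = prompt_tokens + max_output_tokens
    if needed > CONTEXT_LIMITS[model]:
        raise ValueError(
            f"{model}: need ~{needed} tokens, limit is {CONTEXT_LIMITS[model]}"
        )
    return prompt_tokens
```

Calling this before every request makes the failure loud and local, instead of relying on the service to reject or trim the prompt.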

❌ NEVER call GenAI without rate limit handling — 429s are common and predictable; see backoff pattern below

❌ NEVER use GenAI for deterministic tasks

Wrong: "Extract invoice total from OCR text" → use regex/structured parsing
Wrong: "Validate email format" → use validation library
Right: "Summarize patient history", "Generate narrative report"
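
For comparison, the two "Wrong" cases above can be handled deterministically in a few lines. This is a sketch only: the invoice layouts the pattern accepts are assumptions, and the email check is a deliberately simple shape test standing in for a proper validation library.

```python
import re
from decimal import Decimal
from typing import Optional

# Matches "Total: $1,234.56", "TOTAL 1234.56", "Total Due: 99.00" (assumed layouts).
# The leading \b keeps "Subtotal" from matching.
TOTAL_RE = re.compile(
    r'\btotal(?:\s+due)?\s*:?\s*\$?\s*([\d,]+\.\d{2})\b', re.IGNORECASE
)

# Simple shape check: something@something.something, no whitespace
EMAIL_RE = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]+$')


def extract_invoice_total(ocr_text: str) -> Optional[Decimal]:
    """Return the first invoice total found, or None if no total is present."""
    match = TOTAL_RE.search(ocr_text)
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


def looks_like_email(value: str) -> bool:
    """Return True if value has the basic shape of an email address."""
    return EMAIL_RE.match(value) is not None
```

Both functions return the same answer for the same input every time, cost nothing per call, and fail visibly (`None` or `False`) instead of guessing.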

Model Selection

Model	Context	Input Cost/1M	Output Cost/1M	Use For
command-r-plus	128k	~$15	~$75	Complex reasoning, long docs, RAG
command-r	4k	~$1.50	~$7.50	Chat, short prompts, high volume
embed-english-v3	512	~$0.10	N/A	Semantic search (~15x to 150x cheaper per input token than generation)
llama-2-70b	4k	~$2	~$10	Cost-effective, open weights

Decision rule: Start with command-r for everything. Upgrade to command-r-plus only when reasoning quality is demonstrably insufficient.

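Cost optimization: Use embeddings for retrieval/search before invoking generation, so that only the relevant passages reach the generation model. By the table above, embedding input costs roughly 15x less than command-r input and 150x less than command-r-plus input.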
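
The retrieve-then-generate idea can be sketched as a ranking step over precomputed embedding vectors. The example assumes the vectors have already been obtained from an embedding model; `cosine_similarity` and `top_k_passages` are hypothetical helper names, and only the selected passages would then be placed in the generation prompt.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_passages(query_vec: list, passages: list, k: int = 2) -> list:
    """passages is a list of (text, vector) pairs.

    Returns the k texts whose vectors are most similar to query_vec.
    """
    ranked = sorted(
        passages,
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

With a few thousand passages this brute-force ranking is usually adequate; larger corpora would typically use a vector index instead.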

OCI GenAI Rate Limits (Per Compartment)

Model	Requests/Min	Requests/Day
command-r-plus	20	1,000
command-r	60	3,000
Embeddings	100	10,000
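
Staying under a per-minute limit is cheaper than recovering from 429s. The sketch below is a client-side sliding-window limiter; `SlidingWindowLimiter` is a hypothetical class, it only tracks calls made by the current process, and several processes sharing one compartment would need shared state instead.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls in any window of `period` seconds."""

    def __init__(self, max_calls: int, period: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable for testing
        self.calls = deque()    # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def acquire(self, sleep=time.sleep) -> None:
        """Block until a slot is free, then record the call."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            sleep(delay)
        self.calls.append(self.clock())
```

For example, `SlidingWindowLimiter(max_calls=20)` matches the command-r-plus per-minute figure in the table; call `acquire()` immediately before each request.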

Rate Limit Backoff Pattern

import random
import time

from oci.exceptions import ServiceError

def generate_with_backoff(genai_client, request, max_retries=5):
    """Call chat(), retrying 429s with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            response = genai_client.chat(request)
            return response.data.chat_response.text
        except ServiceError as e:
            if e.status == 429 and attempt < max_retries - 1:
                # Waits ~1s, 2s, 4s, 8s (plus up to 1s of jitter); the final attempt re-raises
                wait = (2 ** attempt) + random.uniform(0, 1)
                time.sleep(wait)
            elif e.status == 400 and "token" in (e.message or "").lower():
                raise ValueError("Token limit exceeded; truncate input") from e
            else:
                raise

Token Truncation

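def truncate_for_model(text: str, model: str = "command-r-plus", max_output: int = 2000) -> str:
    """Keep the most recent text that fits the model's input budget."""
    limits = {"command-r-plus": 128000, "command-r": 4000}  # context window in tokens
    marker = "...[earlier content truncated]...\n"
    if model not in limits:
        raise ValueError(f"Unknown model: {model}")
    max_input_tokens = limits[model] - max_output
    if max_input_tokens <= 0:
        raise ValueError("max_output leaves no room for input")
    max_chars = max_input_tokens * 4  # ~4 chars per token (rough heuristic)

    if len(text) <= max_chars:
        return text
    # Reserve room for the marker; guard keep == 0, since text[-0:] is the whole string
    keep = max(max_chars - len(marker), 0)
    return marker + (text[-keep:] if keep else "")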

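Prompt token savings: Verbose system prompts waste tokens at scale. Replacing a 50-word instruction with "Summarize: diagnoses, meds, allergies, treatment plan." saves roughly 50 tokens per request. At 1,000 req/day that is 50 × 1,000 × 30 = 1.5M input tokens/month, or about $22.50/month at the ~$15/1M command-r-plus input rate.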
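
A savings estimate of this kind is just tokens saved × request volume × per-token price, so it is easy to recompute when any input changes. The helper below is a minimal sketch; `monthly_prompt_cost` is a hypothetical name, the 30-day month is an assumption, and the price argument should come from current pricing rather than the approximate figures in the model table.

```python
def monthly_prompt_cost(
    tokens_per_request: int,
    requests_per_day: int,
    price_per_million: float,
    days: int = 30,
) -> float:
    """Cost in dollars of sending tokens_per_request tokens on every request for `days` days."""
    total_tokens = tokens_per_request * requests_per_day * days
    return total_tokens / 1_000_000 * price_per_million


# Example: 50 tokens trimmed from each of 1,000 daily requests at $15 per 1M input tokens
saving = monthly_prompt_cost(50, 1000, 15.0)
print(f"Estimated monthly saving: ${saving:.2f}")
```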

PHI Redaction Pattern

import re

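def redact_phi(text: str) -> tuple[str, dict[str, str]]:
    """Replace PHI with placeholders; return (redacted_text, placeholder -> original)."""
    mapping: dict[str, str] = {}

    def placeholder_for(kind: str, value: str) -> str:
        # Record the original value so it can be restored locally later
        key = f"[{kind}_{len(mapping)}]"
        mapping[key] = value
        return key

    # MRNs: keep the label, replace the identifier
    mrn_pattern = r'\b(MRN|Medical Record):?\s*([A-Z0-9]{6,10})\b'
    redacted = re.sub(
        mrn_pattern,
        lambda m: f"{m.group(1)}: {placeholder_for('MRN', m.group(2))}",
        text,
    )

    # SSNs
    ssn_pattern = r'\b\d{3}-\d{2}-\d{4}\b'
    redacted = re.sub(ssn_pattern, lambda m: placeholder_for('SSN', m.group(0)), redacted)

    # Names: use NER library (spacy or similar) for accuracy
    # names = extract_names(text)
    # for i, name in enumerate(names):
    #     placeholder = f"[PATIENT_{i}]"
    #     mapping[placeholder] = name
    #     redacted = redacted.replace(name, placeholder)

    return redacted, mapping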
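
The reverse step, applied locally after the model responds, might look like the sketch below. `restore_phi` is a hypothetical helper that assumes a mapping of placeholder to original value, such as `{"[PATIENT_0]": "..."}`; it substitutes in a single pass so that restored values are never re-scanned for placeholders.

```python
import re


def restore_phi(text: str, mapping: dict) -> str:
    """Replace each placeholder in text with its original value."""
    if not mapping:
        return text
    # One alternation of all placeholders, escaped so "[" and "]" are literal
    pattern = re.compile("|".join(re.escape(key) for key in mapping))
    return pattern.sub(lambda m: mapping[m.group(0)], text)
```

The mapping itself should stay in the local secure store and never be sent with the prompt.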

Response Validation (Healthcare)

import re

def validate_medical_response(response: str) -> tuple[bool, list[str]]:
    """Return (is_valid, issues) for a generated clinical note."""
    if not response or len(response.strip()) < 10:
        return False, ["Response too short or empty"]

    issues = []
    lowered = response.lower()

    # Refusal or unfilled-template markers: never acceptable in a finished note
    for marker in ["I don't have access", "I cannot", "As an AI", "[INSERT", "TODO"]:
        if marker.lower() in lowered:
            issues.append(f"Refusal/placeholder marker: {marker}")

    # Expected structure (customize per use case)
    for section in ["Chief Complaint", "Assessment", "Plan"]:
        if section.lower() not in lowered:
            issues.append(f"Missing section: {section}")

    # PII leak detection (if input was redacted): SSN-shaped and ID-shaped strings
    for pattern in [r'\b\d{3}-\d{2}-\d{4}\b', r'\b[A-Z]{2}\d{6,8}\b']:
        if re.search(pattern, response):
            issues.append(f"Potential PII in response: {pattern}")

    return len(issues) == 0, issues

HIPAA Compliance Checklist

Before going live with PHI-adjacent GenAI:

Business Associate Agreement (BAA) signed with Oracle — verify explicitly; don't assume
PHI redacted before every API call
Audit logging of all GenAI calls (who called, when, which model)
Data retention policy for prompts/responses defined
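
For the audit-logging item, one possible record shape is sketched below. The field names and the `audit_record` helper are assumptions; the sketch stores a SHA-256 digest of the prompt instead of the prompt text, so the audit log does not become a second copy of sensitive content.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(user_id: str, model: str, prompt: str) -> str:
    """Build one JSON audit line: who called, when, and which model."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "model": model,
        # Digest and length only; the prompt text itself is not logged
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt_chars": len(prompt),
    }
    return json.dumps(record, sort_keys=True)
```

Each returned line can be appended to whatever append-only log store the deployment already uses.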

Reference Files

Load references/oci-genai-reference.md when you need:

Comprehensive GenAI API and SDK documentation
RAG implementation with OCI
GenAI Agents setup
Fine-tuning and custom model deployment
