iris-architecture
When to Use
- Adding new features to voice or vision modules
- Understanding how modules interact (voice ↔ vision)
- Creating new adapters or use cases
- Working with domain entities or services
- Troubleshooting dependency issues
- Adding UI components following Atomic Design
Project Structure
mobile/src/
├── voice/ # Voice recognition & commands module
├── vision/ # Camera & AI vision module
└── shared/ # Reusable components & utilities
Each module follows Clean Architecture with 4 layers:
- Domain: Pure business logic (entities, services, value objects)
- Application: Use cases + ports (interfaces)
- Infrastructure: External implementations (adapters)
- Presentation: React components, hooks, XState machines
Voice Module
Structure
voice/
├── domain/
│ ├── entities/VoiceCommand.ts # Voice command entity
│ ├── services/WakeWordParser.ts # Parse wake word
│ └── value-objects/CommandIntent.ts # Intent classification
├── application/
│ ├── use-cases/ProcessCommand.ts # Execute commands
│ └── ports/
│ ├── SpeechSynthesizer.ts # TTS interface
│ ├── VisionService.ts # Vision module interface
│ └── DescriptionRepository.ts # Storage interface
├── infrastructure/
│ ├── adapters/
│ │ ├── expo/ExpoSpeechRecognitionAdapter.ts
│ │ ├── cloud/ReactNativeVoiceAdapter.ts
│ │ ├── whisper/WhisperAdapter.ts
│ │ ├── mock/MockVoiceAdapter.ts
│ │ └── simple/SimpleSpeechAdapter.ts
│ └── services/
│ ├── ContinuousWakeWordService.ts
│ └── PorcupineWakeWordService.ts
├── machines/
│ └── voiceMachine.ts # XState state machine
└── presentation/
└── hooks/
├── useVoiceCommands.ts # Main orchestrator
├── useVoiceRecognition.ts
└── useContinuousWakeWord.ts
Key Files
| File | Purpose |
|---|---|
| ProcessCommand.ts | Main use case: executes DESCRIBE, REPEAT, HELP, STOP, GOODBYE |
| CommandIntent.ts | Classifies intent with bilingual regex patterns |
| voiceMachine.ts | XState machine: idle → listening → processing → speaking |
| useVoiceCommands.ts | Main hook: integrates machine + recognition + use case |
Command Intents
DESCRIBE // "describe", "what do you see", "qué ves"
REPEAT // "repeat", "again", "repite"
HELP // "help", "ayuda"
STOP // "stop", "para"
GOODBYE // "goodbye", "adiós"
UNKNOWN // Fallback
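As a rough sketch, the bilingual regex classification could look like this (the exact patterns, function name, and enum shape are illustrative assumptions, not the contents of CommandIntent.ts):

// Hypothetical sketch of bilingual intent classification.
export enum CommandIntent {
  DESCRIBE = 'DESCRIBE',
  REPEAT = 'REPEAT',
  HELP = 'HELP',
  STOP = 'STOP',
  GOODBYE = 'GOODBYE',
  UNKNOWN = 'UNKNOWN',
}

// Example patterns only; the real value object may use different keywords.
const INTENT_PATTERNS: Array<[CommandIntent, RegExp]> = [
  [CommandIntent.DESCRIBE, /\b(describe|what do you see|qué ves)\b/i],
  [CommandIntent.REPEAT, /\b(repeat|again|repite)\b/i],
  [CommandIntent.HELP, /\b(help|ayuda)\b/i],
  [CommandIntent.STOP, /\b(stop|para)\b/i],
  [CommandIntent.GOODBYE, /\b(goodbye|adiós)\b/i],
];

export function classifyIntent(text: string): CommandIntent {
  const match = INTENT_PATTERNS.find(([, pattern]) => pattern.test(text));
  return match ? match[0] : CommandIntent.UNKNOWN;
}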
Vision Module
Structure
vision/
├── domain/
│ ├── entities/
│ │ ├── DetectedObject.ts # Object with bbox, position, size
│ │ └── SceneDescription.ts # Scene with objects & description
│ ├── services/
│ │ └── SceneDescriptionGenerator.ts # Natural language generator
│ └── value-objects/
│ └── LabelTranslations.ts # COCO labels EN→ES
├── application/
│ ├── use-cases/
│ │ └── AnalyzeSceneUseCase.ts # Capture + analyze
│ └── ports/
│ ├── IVisionService.ts # Vision AI interface
│ └── ICameraService.ts # Camera interface
├── infrastructure/
│ └── adapters/
│ ├── tflite/TFLiteVisionAdapter.ts # TensorFlow Lite
│ ├── expo/ExpoCameraAdapter.ts # Expo Camera
│ └── voice/VisionServiceBridge.ts # Bridge to voice module
└── presentation/
├── components/
│ └── CameraCapture.tsx # Hidden camera (1x1px)
└── hooks/
└── useVisionService.ts # Create vision service
Key Files
| File | Purpose |
|---|---|
| AnalyzeSceneUseCase.ts | Orchestrates: permissions → capture → analyze → description |
| SceneDescriptionGenerator.ts | Converts detections to Spanish natural language |
| TFLiteVisionAdapter.ts | COCO-SSD MobileNet V1 (80 object categories) |
| VisionServiceBridge.ts | Connects vision to voice module |
| LabelTranslations.ts | Translates COCO labels (person → persona, chair → silla) |
Object Detection
TensorFlow Lite COCO-SSD model detects:
- 80 object categories (person, car, chair, laptop, etc.)
- Bounding boxes with confidence scores
- Position classification (left/center/right, top/middle/bottom)
- Size classification (small/medium/large)
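As a sketch, those detections could map to entities shaped roughly like this (field names are assumptions; see DetectedObject.ts for the real entity):

// Hypothetical shape of a detection result.
interface DetectedObject {
  label: string;                                   // COCO category, e.g. 'person'
  confidence: number;                              // 0..1 score from the model
  boundingBox: { x: number; y: number; width: number; height: number };
  horizontalPosition: 'left' | 'center' | 'right';
  verticalPosition: 'top' | 'middle' | 'bottom';
  size: 'small' | 'medium' | 'large';
}

// Example: one detection from a frame containing a person.
const example: DetectedObject[] = [
  {
    label: 'person',
    confidence: 0.92,
    boundingBox: { x: 0.1, y: 0.2, width: 0.3, height: 0.6 },
    horizontalPosition: 'left',
    verticalPosition: 'middle',
    size: 'large',
  },
];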
Shared Module
Structure
shared/
└── presentation/
└── components/
├── atoms/ # Basic components
│ ├── Button.tsx
│ ├── Icon.tsx
│ ├── Typography.tsx
│ ├── PulsingCircle.tsx
│ └── WakeWordStatusBar.tsx
├── molecules/ # Composite components
│ └── VoiceCommandPanel.tsx # Main UI control
└── pages/
└── HomeScreen.tsx # Main app screen
Follows Atomic Design:
- Atoms: Button, Icon, Typography, PulsingCircle
- Molecules: VoiceCommandPanel
- Pages: HomeScreen
Module Communication
Voice → Vision Dependency
Voice (ProcessCommandUseCase)
→ VisionService port (interface)
→ VisionServiceBridge (infrastructure)
→ AnalyzeSceneUseCase (vision module)
→ TFLiteVisionAdapter + ExpoCameraAdapter
Why?
- Voice module doesn't know about TFLite or cameras
- Vision module is replaceable (GPT-4 Vision, Gemini, etc.)
- Bridge pattern maintains independence
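As a sketch, the chain could be composed like this at startup (constructor signatures are assumptions; the real wiring lives in useVisionService / useVoiceCommands):

// Hypothetical wiring of the voice → vision dependency chain.
function composeDescribeFeature(
  speechSynthesizer: SpeechSynthesizer,          // voice port (TTS)
  descriptionRepository: DescriptionRepository,  // voice port (storage)
) {
  const analyzeScene = new AnalyzeSceneUseCase(
    new ExpoCameraAdapter(),     // ICameraService adapter
    new TFLiteVisionAdapter(),   // IVisionService adapter
  );
  // Bridge adapts vision's use case to the voice module's VisionService port.
  const visionService = new VisionServiceBridge(analyzeScene);
  // The voice use case only sees ports, never TFLite or the camera.
  return new ProcessCommandUseCase(visionService, speechSynthesizer, descriptionRepository);
}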
Naming Conventions
| Type | Pattern | Example |
|---|---|---|
| Entity | Noun | VoiceCommand, DetectedObject |
| Value Object | Descriptive noun | CommandIntent, LabelTranslations |
| Service | Verb + noun | WakeWordParser, SceneDescriptionGenerator |
| Use Case | Verb + noun + UseCase | AnalyzeSceneUseCase, ProcessCommandUseCase |
| Port | I + PascalCase | IVisionService, ICameraService |
| Adapter | Tech + purpose + Adapter | TFLiteVisionAdapter, ExpoCameraAdapter |
| Hook | use + PascalCase | useVoiceCommands, useVisionService |
Critical Patterns
1. Dependency Rule
- Outer layers depend on inner layers
- Layer order, innermost to outermost: Domain → Application → Infrastructure → Presentation
- Never reverse the direction (e.g., Domain never imports Infrastructure, and Infrastructure never imports Presentation)
2. Ports & Adapters
- Application layer defines ports (interfaces)
- Infrastructure layer provides adapters (implementations)
- Multiple adapters per port (5 speech recognition options!)
Example:
// Application layer (port)
interface IVisionService {
  analyzeImage(uri: string): Promise<SceneDescription>;
  preloadModels(): Promise<void>;
  isReady(): boolean;
}

// Infrastructure layer (adapter)
class TFLiteVisionAdapter implements IVisionService {
  async analyzeImage(uri: string) { /* TFLite implementation */ }
  async preloadModels() { /* Load COCO-SSD */ }
  isReady() { /* Check model loaded */ }
}
3. Use Cases
- One use case = one application feature
- Orchestrates domain services and entities
- Returns domain entities or DTOs
Example:
class AnalyzeSceneUseCase {
  constructor(
    private cameraService: ICameraService,
    private visionService: IVisionService
  ) {}

  async execute(): Promise<SceneDescription> {
    const photo = await this.cameraService.capturePhoto();
    return this.visionService.analyzeImage(photo.uri);
  }
}
4. State Machines (XState)
Voice flow managed by voiceMachine:
idle → listening → processing → speaking → idle
↓ ↓ ↓
error ←──────────────┘ └→ retry
Guards: hasWakeWord, isHighConfidence, isSuccess
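A minimal sketch of this flow in XState (state and guard names come from this document; the event names are assumptions, and depending on your XState version the transition key is cond (v4) rather than guard (v5)):

import { createMachine } from 'xstate';

// Sketch only: event names are illustrative, not taken from voiceMachine.ts.
export const voiceMachineSketch = createMachine({
  id: 'voice',
  initial: 'idle',
  states: {
    idle: {
      on: { WAKE_WORD_DETECTED: { target: 'listening', guard: 'hasWakeWord' } },
    },
    listening: {
      on: {
        COMMAND_RECOGNIZED: { target: 'processing', guard: 'isHighConfidence' },
        RECOGNITION_FAILED: 'error',
      },
    },
    processing: {
      on: {
        COMMAND_DONE: { target: 'speaking', guard: 'isSuccess' },
        COMMAND_FAILED: 'error',
        RETRY: 'listening',
      },
    },
    speaking: {
      on: { SPEECH_ENDED: 'idle', SPEECH_FAILED: 'error' },
    },
    error: {
      on: { RESET: 'idle' },
    },
  },
});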
5. Bridge Pattern
VisionServiceBridge connects modules:
- Implements voice's VisionService port
- Uses vision's AnalyzeSceneUseCase internally
- Maintains module independence
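A minimal sketch of the bridge, assuming the voice port exposes an analyzeScene() method (the name comes from the data flow example below; the real VisionServiceBridge.ts may differ):

// Hypothetical sketch: adapts vision's AnalyzeSceneUseCase to the voice port.
class VisionServiceBridge implements VisionService {
  constructor(private readonly analyzeSceneUseCase: AnalyzeSceneUseCase) {}

  // The voice module asks for a scene; the bridge delegates to the vision module.
  async analyzeScene(): Promise<SceneDescription> {
    return this.analyzeSceneUseCase.execute();
  }
}

Because only the bridge touches vision code, the voice module keeps compiling even if the vision stack behind it is swapped out.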
Adding New Features
New Voice Adapter
- Create infrastructure/adapters/<tech>/<Tech>Adapter.ts
- Implement the SpeechSynthesizer port
- Export from infrastructure/adapters/index.ts
- Use in useVoiceCommands or useVoiceRecognition
New Vision Adapter
- Create infrastructure/adapters/<tech>/<Tech>VisionAdapter.ts
- Implement the IVisionService port
- Swap in the useVisionService hook or VisionServiceBridge
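For example, a hypothetical cloud-based adapter only needs to satisfy the port (the CloudVisionAdapter name and its internals are made up for illustration):

// Hypothetical adapter: same IVisionService port, different backend.
class CloudVisionAdapter implements IVisionService {
  private ready = false;

  async preloadModels(): Promise<void> {
    // Nothing to load locally for a remote service.
    this.ready = true;
  }

  isReady(): boolean {
    return this.ready;
  }

  async analyzeImage(uri: string): Promise<SceneDescription> {
    // Upload the photo and map the provider's response to the domain entity.
    throw new Error('Illustrative sketch: implement against your provider');
  }
}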
New Command Intent
- Add pattern to CommandIntent.ts
- Handle in ProcessCommandUseCase.execute()
- Add tests in ProcessCommand.test.ts
New UI Component
- Determine atomic level (atom/molecule/organism/page)
- Create in shared/presentation/components/<level>/<Name>.tsx
- Add tests in __tests__/<Name>.test.tsx
- Export from index if needed
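A minimal atom sketch following the existing naming (the component itself is hypothetical, not part of the codebase):

// shared/presentation/components/atoms/StatusDot.tsx (hypothetical example)
import React from 'react';
import { View, StyleSheet } from 'react-native';

interface StatusDotProps {
  active: boolean;
}

// Small presentational atom: no business logic, just props in, UI out.
export const StatusDot: React.FC<StatusDotProps> = ({ active }) => (
  <View style={[styles.dot, { backgroundColor: active ? '#4caf50' : '#9e9e9e' }]} />
);

const styles = StyleSheet.create({
  dot: { width: 12, height: 12, borderRadius: 6 },
});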
Data Flow Example
User says "Iris, describe":
1. useVoiceCommands → ExpoSpeechRecognitionAdapter
Transcript: "Iris, describe"
2. WakeWordParser.parse("Iris, describe", 0.95)
→ ParsedCommand { intent: DESCRIBE, commandText: "describe" }
3. ProcessCommandUseCase.execute(parsedCommand)
→ visionService.analyzeScene()
4. VisionServiceBridge → AnalyzeSceneUseCase.execute()
→ cameraService.capturePhoto()
→ visionService.analyzeImage(photo)
5. TFLiteVisionAdapter runs COCO-SSD
→ Detections: [person: 0.92, chair: 0.85, laptop: 0.78]
6. SceneDescriptionGenerator.generate(detections)
→ "Veo una persona, una silla y un portátil"
7. speechSynthesizer.speak("Veo una persona...")
repository.saveLastDescription()
8. voiceMachine: processing → speaking → listening
VoiceCommandPanel updates UI
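In code, steps 3–7 could look roughly like this inside ProcessCommandUseCase (an excerpt-style sketch assuming the port methods named above; the real use case also handles REPEAT, HELP, STOP and GOODBYE):

// Hypothetical excerpt of ProcessCommandUseCase.execute(), DESCRIBE branch only.
async execute(command: ParsedCommand): Promise<void> {
  switch (command.intent) {
    case CommandIntent.DESCRIBE: {
      const scene = await this.visionService.analyzeScene();                    // bridge → vision module
      await this.speechSynthesizer.speak(scene.description);                    // TTS port
      await this.descriptionRepository.saveLastDescription(scene.description);  // stored for REPEAT
      break;
    }
    // Other intents handled in the real implementation...
  }
}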
Technology Stack
| Layer | Voice | Vision |
|---|---|---|
| Domain | Pure JS/TS | Pure JS/TS |
| Application | Pure JS/TS | Pure JS/TS |
| Infrastructure | Expo Speech, Porcupine, Whisper | TensorFlow Lite, Expo Camera |
| Presentation | React, XState | React |
Testing Strategy
- Domain: Unit tests (entities, services, value objects)
- Application: Use case tests with mocked ports
- Presentation: Component/hook tests with Testing Library
- State Machines: XState machine tests
Example test locations:
- voice/domain/services/__tests__/WakeWordParser.test.ts
- voice/application/use-cases/__tests__/ProcessCommand.test.ts
- shared/presentation/components/atoms/__tests__/Button.test.tsx
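A use-case test with mocked ports could look like this (a sketch; the mock shapes and assertions are assumptions, not copied from ProcessCommand.test.ts):

// Hypothetical Jest test: ProcessCommandUseCase with all ports mocked.
describe('ProcessCommandUseCase', () => {
  it('describes the scene and speaks the result', async () => {
    const visionService = { analyzeScene: jest.fn().mockResolvedValue({ description: 'Veo una silla' }) };
    const speechSynthesizer = { speak: jest.fn().mockResolvedValue(undefined) };
    const repository = { saveLastDescription: jest.fn().mockResolvedValue(undefined) };

    const useCase = new ProcessCommandUseCase(visionService as any, speechSynthesizer as any, repository as any);
    await useCase.execute({ intent: CommandIntent.DESCRIBE, commandText: 'describe' } as any);

    expect(speechSynthesizer.speak).toHaveBeenCalledWith('Veo una silla');
    expect(repository.saveLastDescription).toHaveBeenCalled();
  });
});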
Common Gotchas
- Don't import presentation from infrastructure
  - ❌ import { useVisionService } from '../presentation'
  - ✅ Use dependency injection or hooks
- Don't put business logic in hooks
  - ❌ Intent classification in useVoiceCommands
  - ✅ Use domain services (WakeWordParser, CommandIntent)
- Don't skip the bridge
  - ❌ Voice directly imports vision's AnalyzeSceneUseCase
  - ✅ Use VisionServiceBridge to maintain independence
- Don't hardcode strings
  - ❌ speak("I see a person")
  - ✅ Use SceneDescriptionGenerator or translation files
Resources
- Architecture Diagram: See assets/architecture-diagram.md
- Feature Checklist: See assets/new-feature-checklist.md