iris-architecture
When to Use
- Adding new features to voice or vision modules
- Understanding how modules interact (voice ↔ vision)
- Creating new adapters or use cases
- Working with domain entities or services
- Troubleshooting dependency issues
- Adding UI components following Atomic Design
Project Structure
mobile/src/
├── voice/ # Voice recognition & commands module
├── vision/ # Camera & AI vision module
└── shared/ # Reusable components & utilities
Each module follows Clean Architecture with 4 layers:
- Domain: Pure business logic (entities, services, value objects)
- Application: Use cases + ports (interfaces)
- Infrastructure: External implementations (adapters)
- Presentation: React components, hooks, XState machines
Voice Module
Structure
voice/
├── domain/
│ ├── entities/VoiceCommand.ts # Voice command entity
│ ├── services/WakeWordParser.ts # Parse wake word
│ └── value-objects/CommandIntent.ts # Intent classification
├── application/
│ ├── use-cases/ProcessCommand.ts # Execute commands
│ └── ports/
│ ├── SpeechSynthesizer.ts # TTS interface
│ ├── VisionService.ts # Vision module interface
│ └── DescriptionRepository.ts # Storage interface
├── infrastructure/
│ ├── adapters/
│ │ ├── expo/ExpoSpeechRecognitionAdapter.ts
│ │ ├── cloud/ReactNativeVoiceAdapter.ts
│ │ ├── whisper/WhisperAdapter.ts
│ │ ├── mock/MockVoiceAdapter.ts
│ │ └── simple/SimpleSpeechAdapter.ts
│ └── services/
│ ├── ContinuousWakeWordService.ts
│ └── PorcupineWakeWordService.ts
├── machines/
│ └── voiceMachine.ts # XState state machine
└── presentation/
└── hooks/
├── useVoiceCommands.ts # Main orchestrator
├── useVoiceRecognition.ts
└── useContinuousWakeWord.ts
Key Files
| File | Purpose |
|---|---|
| ProcessCommand.ts | Main use case: executes DESCRIBE, REPEAT, HELP, STOP, GOODBYE |
| CommandIntent.ts | Classifies intent with bilingual regex patterns |
| voiceMachine.ts | XState machine: idle → listening → processing → speaking |
| useVoiceCommands.ts | Main hook: integrates machine + recognition + use case |
Command Intents
DESCRIBE // "describe", "what do you see", "qué ves"
REPEAT // "repeat", "again", "repite"
HELP // "help", "ayuda"
STOP // "stop", "para"
GOODBYE // "goodbye", "adiós"
UNKNOWN // Fallback
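As a rough sketch, the bilingual regex classification could look like this (the exact patterns, function name, and enum shape are illustrative assumptions, not the contents of CommandIntent.ts):

// Hypothetical sketch of bilingual intent classification.
export enum CommandIntent {
  DESCRIBE = 'DESCRIBE',
  REPEAT = 'REPEAT',
  HELP = 'HELP',
  STOP = 'STOP',
  GOODBYE = 'GOODBYE',
  UNKNOWN = 'UNKNOWN',
}

// Example patterns only; the real value object may use different keywords.
const INTENT_PATTERNS: Array<[CommandIntent, RegExp]> = [
  [CommandIntent.DESCRIBE, /\b(describe|what do you see|qué ves)\b/i],
  [CommandIntent.REPEAT, /\b(repeat|again|repite)\b/i],
  [CommandIntent.HELP, /\b(help|ayuda)\b/i],
  [CommandIntent.STOP, /\b(stop|para)\b/i],
  [CommandIntent.GOODBYE, /\b(goodbye|adiós)\b/i],
];

export function classifyIntent(text: string): CommandIntent {
  const match = INTENT_PATTERNS.find(([, pattern]) => pattern.test(text));
  return match ? match[0] : CommandIntent.UNKNOWN;
}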
Vision Module
Structure
vision/
├── domain/
│ ├── entities/
│ │ ├── DetectedObject.ts # Object with bbox, position, size
│ │ └── SceneDescription.ts # Scene with objects & description
│ ├── services/
│ │ └── SceneDescriptionGenerator.ts # Natural language generator
│ └── value-objects/
│ └── LabelTranslations.ts # COCO labels EN→ES
├── application/
│ ├── use-cases/
│ │ └── AnalyzeSceneUseCase.ts # Capture + analyze
│ └── ports/
│ ├── IVisionService.ts # Vision AI interface
│ └── ICameraService.ts # Camera interface
├── infrastructure/
│ └── adapters/
│ ├── tflite/TFLiteVisionAdapter.ts # TensorFlow Lite
│ ├── expo/ExpoCameraAdapter.ts # Expo Camera
│ └── voice/VisionServiceBridge.ts # Bridge to voice module
└── presentation/
├── components/
│ └── CameraCapture.tsx # Hidden camera (1x1px)
└── hooks/
└── useVisionService.ts # Create vision service
Key Files
| File | Purpose |
|---|---|
| AnalyzeSceneUseCase.ts | Orchestrates: permissions → capture → analyze → description |
| SceneDescriptionGenerator.ts | Converts detections to Spanish natural language |
| TFLiteVisionAdapter.ts | COCO-SSD MobileNet V1 (80 object categories) |
| VisionServiceBridge.ts | Connects vision to voice module |
| LabelTranslations.ts | Translates COCO labels (person → persona, chair → silla) |
Object Detection
TensorFlow Lite COCO-SSD model detects:
- 80 object categories (person, car, chair, laptop, etc.)
- Bounding boxes with confidence scores
- Position classification (left/center/right, top/middle/bottom)
- Size classification (small/medium/large)
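As a sketch, those detections could map to entities shaped roughly like this (field names are assumptions; see DetectedObject.ts for the real entity):

// Hypothetical shape of a detection result.
interface DetectedObject {
  label: string;                                   // COCO category, e.g. 'person'
  confidence: number;                              // 0..1 score from the model
  boundingBox: { x: number; y: number; width: number; height: number };
  horizontalPosition: 'left' | 'center' | 'right';
  verticalPosition: 'top' | 'middle' | 'bottom';
  size: 'small' | 'medium' | 'large';
}

// Example: one detection from a frame containing a person.
const example: DetectedObject[] = [
  {
    label: 'person',
    confidence: 0.92,
    boundingBox: { x: 0.1, y: 0.2, width: 0.3, height: 0.6 },
    horizontalPosition: 'left',
    verticalPosition: 'middle',
    size: 'large',
  },
];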
Shared Module
Structure
shared/
└── presentation/
└── components/
├── atoms/ # Basic components
│ ├── Button.tsx
│ ├── Icon.tsx
│ ├── Typography.tsx
│ ├── PulsingCircle.tsx
│ └── WakeWordStatusBar.tsx
├── molecules/ # Composite components
│ └── VoiceCommandPanel.tsx # Main UI control
└── pages/
└── HomeScreen.tsx # Main app screen
Follows Atomic Design:
- Atoms: Button, Icon, Typography, PulsingCircle
- Molecules: VoiceCommandPanel
- Pages: HomeScreen
Module Communication
Voice → Vision Dependency
Voice (ProcessCommandUseCase)
→ VisionService port (interface)
→ VisionServiceBridge (infrastructure)
→ AnalyzeSceneUseCase (vision module)
→ TFLiteVisionAdapter + ExpoCameraAdapter
Why?
- Voice module doesn't know about TFLite or cameras
- Vision module is replaceable (GPT-4 Vision, Gemini, etc.)
- Bridge pattern maintains independence
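As a sketch, the chain could be composed like this at startup (constructor signatures are assumptions; the real wiring lives in useVisionService / useVoiceCommands):

// Hypothetical wiring of the voice → vision dependency chain.
function composeDescribeFeature(
  speechSynthesizer: SpeechSynthesizer,          // voice port (TTS)
  descriptionRepository: DescriptionRepository,  // voice port (storage)
) {
  const analyzeScene = new AnalyzeSceneUseCase(
    new ExpoCameraAdapter(),     // ICameraService adapter
    new TFLiteVisionAdapter(),   // IVisionService adapter
  );
  // Bridge adapts vision's use case to the voice module's VisionService port.
  const visionService = new VisionServiceBridge(analyzeScene);
  // The voice use case only sees ports, never TFLite or the camera.
  return new ProcessCommandUseCase(visionService, speechSynthesizer, descriptionRepository);
}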
Naming Conventions
| Type | Pattern | Example |
|---|---|---|
| Entity | Noun | VoiceCommand, DetectedObject |
| Value Object | Descriptive noun | CommandIntent, LabelTranslations |
| Service | Verb + noun | WakeWordParser, SceneDescriptionGenerator |
| Use Case | Verb + noun + UseCase | AnalyzeSceneUseCase, ProcessCommandUseCase |
| Port | I + PascalCase | IVisionService, ICameraService |
| Adapter | Tech + purpose + Adapter | TFLiteVisionAdapter, ExpoCameraAdapter |
| Hook | use + PascalCase | useVoiceCommands, useVisionService |
Critical Patterns
1. Dependency Rule
- Outer layers depend on inner layers
- Layer order, innermost to outermost: Domain → Application → Infrastructure → Presentation
- Never reverse the direction (e.g., Domain never imports Infrastructure, and Infrastructure never imports Presentation)
2. Ports & Adapters
- Application layer defines ports (interfaces)
- Infrastructure layer provides adapters (implementations)
- Multiple adapters per port (5 speech recognition options!)
Example:
// Application layer (port)
interface IVisionService {
  analyzeImage(uri: string): Promise<SceneDescription>;
  preloadModels(): Promise<void>;
  isReady(): boolean;
}

// Infrastructure layer (adapter)
class TFLiteVisionAdapter implements IVisionService {
  async analyzeImage(uri: string) { /* TFLite implementation */ }
  async preloadModels() { /* Load COCO-SSD */ }
  isReady() { /* Check model loaded */ }
}
3. Use Cases
- One use case = one application feature
- Orchestrates domain services and entities
- Returns domain entities or DTOs
Example:
class AnalyzeSceneUseCase {
  constructor(
    private cameraService: ICameraService,
    private visionService: IVisionService
  ) {}

  async execute(): Promise<SceneDescription> {
    const photo = await this.cameraService.capturePhoto();
    return this.visionService.analyzeImage(photo.uri);
  }
}
4. State Machines (XState)
Voice flow managed by voiceMachine:
idle → listening → processing → speaking → idle
↓ ↓ ↓
error ←──────────────┘ └→ retry
Guards: hasWakeWord, isHighConfidence, isSuccess
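A minimal sketch of this flow in XState (state and guard names come from this document; the event names are assumptions, and depending on your XState version the transition key is cond (v4) rather than guard (v5)):

import { createMachine } from 'xstate';

// Sketch only: event names are illustrative, not taken from voiceMachine.ts.
export const voiceMachineSketch = createMachine({
  id: 'voice',
  initial: 'idle',
  states: {
    idle: {
      on: { WAKE_WORD_DETECTED: { target: 'listening', guard: 'hasWakeWord' } },
    },
    listening: {
      on: {
        COMMAND_RECOGNIZED: { target: 'processing', guard: 'isHighConfidence' },
        RECOGNITION_FAILED: 'error',
      },
    },
    processing: {
      on: {
        COMMAND_DONE: { target: 'speaking', guard: 'isSuccess' },
        COMMAND_FAILED: 'error',
        RETRY: 'listening',
      },
    },
    speaking: {
      on: { SPEECH_ENDED: 'idle', SPEECH_FAILED: 'error' },
    },
    error: {
      on: { RESET: 'idle' },
    },
  },
});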
5. Bridge Pattern
VisionServiceBridge connects modules:
- Implements voice's VisionService port
- Uses vision's AnalyzeSceneUseCase internally
- Maintains module independence
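A minimal sketch of the bridge, assuming the voice port exposes an analyzeScene() method (the name comes from the data flow example below; the real VisionServiceBridge.ts may differ):

// Hypothetical sketch: adapts vision's AnalyzeSceneUseCase to the voice port.
class VisionServiceBridge implements VisionService {
  constructor(private readonly analyzeSceneUseCase: AnalyzeSceneUseCase) {}

  // The voice module asks for a scene; the bridge delegates to the vision module.
  async analyzeScene(): Promise<SceneDescription> {
    return this.analyzeSceneUseCase.execute();
  }
}

Because only the bridge touches vision code, the voice module keeps compiling even if the vision stack behind it is swapped out.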
Adding New Features
New Voice Adapter
- Create infrastructure/adapters/<tech>/<Tech>Adapter.ts
- Implement the SpeechSynthesizer port
- Export from infrastructure/adapters/index.ts
- Use in useVoiceCommands or useVoiceRecognition
New Vision Adapter
- Create infrastructure/adapters/<tech>/<Tech>VisionAdapter.ts
- Implement the IVisionService port
- Swap in the useVisionService hook or VisionServiceBridge
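For example, a hypothetical cloud-based adapter only needs to satisfy the port (the CloudVisionAdapter name and its internals are made up for illustration):

// Hypothetical adapter: same IVisionService port, different backend.
class CloudVisionAdapter implements IVisionService {
  private ready = false;

  async preloadModels(): Promise<void> {
    // Nothing to load locally for a remote service.
    this.ready = true;
  }

  isReady(): boolean {
    return this.ready;
  }

  async analyzeImage(uri: string): Promise<SceneDescription> {
    // Upload the photo and map the provider's response to the domain entity.
    throw new Error('Illustrative sketch: implement against your provider');
  }
}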
New Command Intent
- Add pattern to CommandIntent.ts
- Handle in ProcessCommandUseCase.execute()
- Add tests in ProcessCommand.test.ts
New UI Component
- Determine atomic level (atom/molecule/organism/page)
- Create in shared/presentation/components/<level>/<Name>.tsx
- Add tests in __tests__/<Name>.test.tsx
- Export from index if needed
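A minimal atom sketch following the existing naming (the component itself is hypothetical, not part of the codebase):

// shared/presentation/components/atoms/StatusDot.tsx (hypothetical example)
import React from 'react';
import { View, StyleSheet } from 'react-native';

interface StatusDotProps {
  active: boolean;
}

// Small presentational atom: no business logic, just props in, UI out.
export const StatusDot: React.FC<StatusDotProps> = ({ active }) => (
  <View style={[styles.dot, { backgroundColor: active ? '#4caf50' : '#9e9e9e' }]} />
);

const styles = StyleSheet.create({
  dot: { width: 12, height: 12, borderRadius: 6 },
});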
Data Flow Example
User says "Iris, describe":
1. useVoiceCommands → ExpoSpeechRecognitionAdapter
Transcript: "Iris, describe"
2. WakeWordParser.parse("Iris, describe", 0.95)
→ ParsedCommand { intent: DESCRIBE, commandText: "describe" }
3. ProcessCommandUseCase.execute(parsedCommand)
→ visionService.analyzeScene()
4. VisionServiceBridge → AnalyzeSceneUseCase.execute()
→ cameraService.capturePhoto()
→ visionService.analyzeImage(photo)
5. TFLiteVisionAdapter runs COCO-SSD
→ Detections: [person: 0.92, chair: 0.85, laptop: 0.78]
6. SceneDescriptionGenerator.generate(detections)
→ "Veo una persona, una silla y un portátil"
7. speechSynthesizer.speak("Veo una persona...")
repository.saveLastDescription()
8. voiceMachine: processing → speaking → listening
VoiceCommandPanel updates UI
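In code, steps 3–7 could look roughly like this inside ProcessCommandUseCase (an excerpt-style sketch assuming the port methods named above; the real use case also handles REPEAT, HELP, STOP and GOODBYE):

// Hypothetical excerpt of ProcessCommandUseCase.execute(), DESCRIBE branch only.
async execute(command: ParsedCommand): Promise<void> {
  switch (command.intent) {
    case CommandIntent.DESCRIBE: {
      const scene = await this.visionService.analyzeScene();                    // bridge → vision module
      await this.speechSynthesizer.speak(scene.description);                    // TTS port
      await this.descriptionRepository.saveLastDescription(scene.description);  // stored for REPEAT
      break;
    }
    // Other intents handled in the real implementation...
  }
}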
Technology Stack
| Layer | Voice | Vision |
|---|---|---|
| Domain | Pure JS/TS | Pure JS/TS |
| Application | Pure JS/TS | Pure JS/TS |
| Infrastructure | Expo Speech, Porcupine, Whisper | TensorFlow Lite, Expo Camera |
| Presentation | React, XState | React |
Testing Strategy
- Domain: Unit tests (entities, services, value objects)
- Application: Use case tests with mocked ports
- Presentation: Component/hook tests with Testing Library
- State Machines: XState machine tests
Example test locations:
- voice/domain/services/__tests__/WakeWordParser.test.ts
- voice/application/use-cases/__tests__/ProcessCommand.test.ts
- shared/presentation/components/atoms/__tests__/Button.test.tsx
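A use-case test with mocked ports could look like this (a sketch; the mock shapes and assertions are assumptions, not copied from ProcessCommand.test.ts):

// Hypothetical Jest test: ProcessCommandUseCase with all ports mocked.
describe('ProcessCommandUseCase', () => {
  it('describes the scene and speaks the result', async () => {
    const visionService = { analyzeScene: jest.fn().mockResolvedValue({ description: 'Veo una silla' }) };
    const speechSynthesizer = { speak: jest.fn().mockResolvedValue(undefined) };
    const repository = { saveLastDescription: jest.fn().mockResolvedValue(undefined) };

    const useCase = new ProcessCommandUseCase(visionService as any, speechSynthesizer as any, repository as any);
    await useCase.execute({ intent: CommandIntent.DESCRIBE, commandText: 'describe' } as any);

    expect(speechSynthesizer.speak).toHaveBeenCalledWith('Veo una silla');
    expect(repository.saveLastDescription).toHaveBeenCalled();
  });
});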
Common Gotchas
- Don't import presentation from infrastructure
  - ❌ import { useVisionService } from '../presentation'
  - ✅ Use dependency injection or hooks
- Don't put business logic in hooks
  - ❌ Intent classification in useVoiceCommands
  - ✅ Use domain services (WakeWordParser, CommandIntent)
- Don't skip the bridge
  - ❌ Voice directly imports vision's AnalyzeSceneUseCase
  - ✅ Use VisionServiceBridge to maintain independence
- Don't hardcode strings
  - ❌ speak("I see a person")
  - ✅ Use SceneDescriptionGenerator or translation files
Resources
- Architecture Diagram: See assets/architecture-diagram.md
- Feature Checklist: See assets/new-feature-checklist.md