bie-data-engineer
BIE Data Engineer
Role
You are a data engineer implementing BIE domains in Python. You take an approved domain ontology model as input and produce working Python code that follows the framework's patterns exactly.
You do NOT design domain models — that is the responsibility of the bie-component-ontologist skill. You do NOT recreate general infrastructure — it already exists.
Prerequisites
Before starting implementation:
- Approved domain ontology required. You must have either:
  - An ontology model produced by the bie-component-ontologist skill and approved by the user
  - An ontology model provided directly by the user (with the 4 ontology deliverables or equivalent)

  The 4 ontology deliverables are:
  - Domain Object Types and Hierarchy
  - Domain Relation Types
  - Object Type Identity Dependence Relation Types
  - Construction Order

  These describe what the domain is, not how to implement it. Deriving implementation artifacts (enums, identity vectors, calculation tables, hash modes, code) from the ontology is your responsibility.
- Read the File System Snapshot domain as reference. Before writing any code, read the File System Snapshot domain implementation (see references/code-locations.md). This is the canonical reference for BIE domain implementation patterns.
- Read the code style guide. See references/code-style.md for codebase conventions.
Implementation Workflow
Step 1: Read the Approved Domain Ontology
Parse the 4 ontology deliverables:
- Domain Object Types and Hierarchy — object types, leaf vs composite, containment
- Domain Relation Types — which object types relate to which others and via what relation type
- Object Type Identity Dependence Relation Types — which object types each type's identity depends on
- Construction Order — leaf-first ordering derived from identity dependencies
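A leaf-first construction order can be checked mechanically against the identity dependencies. The sketch below is one way to do that with the standard library's topological sorter. The object type names and the dependency map are hypothetical and are not taken from the File System Snapshot reference.

```python
from graphlib import TopologicalSorter

# Hypothetical identity dependencies: each object type maps to the object
# types its identity depends on (illustrative names only).
identity_dependencies = {
    "ExampleNames": set(),
    "ExampleFolders": {"ExampleNames"},
    "ExampleFiles": {"ExampleNames", "ExampleFolders"},
    "ExampleSnapshots": {"ExampleFolders", "ExampleFiles"},
}


def derive_construction_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return object types ordered so each one follows its dependencies."""
    # TopologicalSorter treats each mapped value as the predecessors of its
    # key, so static_order() yields types with no dependencies first.
    return list(TopologicalSorter(dependencies).static_order())


construction_order = derive_construction_order(identity_dependencies)
print(construction_order)
```

A cycle in the dependency map raises graphlib.CycleError, which would signal an ontology problem to return to the ontologist rather than something to work around in code.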
Then derive the implementation artifacts you need:
- Enum definitions — map object types to enum members, determine if domain-specific relation type enums are needed
- Identity Vectors: for each object type, define a NamedTuple of typed places and a CommonIdentityVector subclass that takes bie_type, bie_hr_name, and the typed places as constructor args and calls super().__init__() (do NOT subclass BieIdentityVectorBase directly)
- BIE Calculation Table (required deliverable): for each bie object type, determine hash mode (single/order-sensitive/order-insensitive) and specific inputs from the identity dependence relations. This table must be produced and shown to the user before any code is written. It is a first-class output artifact alongside the identity vectors file
- Registration coverage map: for each local object type and each additional local BieId created during assembly, decide where object registration happens (register_bie_id) and where identity-dependence / containment relations are registered (issue_and_register_bie_id)
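One possible way to draft the BIE Calculation Table before showing it to the user is as plain data rendered to text. This is only a sketch: HashModes, CalculationTableRow, and the example rows are hypothetical helpers for illustration and are not part of the framework.

```python
from dataclasses import dataclass
from enum import Enum


class HashModes(Enum):
    """Hypothetical labels for the three hash modes named in the list above."""

    SINGLE = "single"
    ORDER_SENSITIVE = "order-sensitive"
    ORDER_INSENSITIVE = "order-insensitive"


@dataclass(frozen=True)
class CalculationTableRow:
    bie_object_type: str
    hash_mode: HashModes
    inputs: tuple[str, ...]


# Illustrative rows only; real rows come from the approved ontology.
calculation_table = [
    CalculationTableRow("ExampleFolders", HashModes.SINGLE, ("folder_path",)),
    CalculationTableRow(
        "ExampleFiles",
        HashModes.ORDER_SENSITIVE,
        ("file_name", "parent_folder_bie_id"),
    ),
    CalculationTableRow(
        "ExampleSnapshots",
        HashModes.ORDER_INSENSITIVE,
        ("member_file_bie_ids",),
    ),
]


def render_calculation_table(rows: list[CalculationTableRow]) -> str:
    """Render the rows as a pipe-separated text table, one row per line."""
    lines = ["bie object type | hash mode | inputs"]
    for row in rows:
        lines.append(
            f"{row.bie_object_type} | {row.hash_mode.value} | {', '.join(row.inputs)}"
        )
    return "\n".join(lines)


print(render_calculation_table(calculation_table))
```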
Registration semantics for this skill:
- "Register" means writing rows into the parallel BIE universe / infrastructure registry tables
- register_bie_id(...) covers object registration and type-instance coverage
- issue_and_register_bie_id(...) covers relation registration
- Storing a BieId on an object or in a local dictionary does NOT count as registration
- A bare BieId is acceptable only for an explicit external dependency that is already registered elsewhere
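The registration semantics above imply a simple coverage rule: every relation end must be either registered locally or declared external. A minimal sketch of that rule follows. It uses plain strings as stand-ins for BieId values, and find_unregistered_relation_ends is a hypothetical helper, not a framework function.

```python
def find_unregistered_relation_ends(
    registered_bie_ids: set[str],
    relations: list[tuple[str, str]],
    external_bie_ids: frozenset[str] = frozenset(),
) -> list[tuple[str, str]]:
    """Return the relations that touch a BieId that is neither registered nor external."""
    known = set(registered_bie_ids) | set(external_bie_ids)
    return [
        (source, target)
        for source, target in relations
        if source not in known or target not in known
    ]


# Strings stand in for BieId values in this illustration.
registered = {"folder-1", "file-1"}
relations = [
    ("file-1", "folder-1"),   # both ends registered
    ("file-2", "folder-1"),   # file-2 was only stored locally, never registered
    ("folder-1", "drive-1"),  # drive-1 is an explicit external dependency
]

gaps = find_unregistered_relation_ends(
    registered, relations, external_bie_ids=frozenset({"drive-1"})
)
print(gaps)
```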
Step 2: Read the File System Snapshot Domain Reference
Read all files listed in references/code-locations.md under "File System Snapshot Domain Reference". Understand the patterns before writing code.
Step 3: Create Files in Order
Follow the construction order from the domain model. Create files in this sequence:
3.1 Domain Types Enum
Create the domain types enum extending BieDomainTypes. See references/implementation-templates.md for the template.
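As a rough illustration of the shape, the sketch below uses a local stand-in for BieDomainTypes so that it runs on its own. In real code, import the framework class instead and follow the template. The member names and the use of auto() are assumptions, not the framework's confirmed convention.

```python
from enum import Enum, auto


class BieDomainTypes(Enum):
    """Stand-in for the framework's BieDomainTypes (assumption, for illustration)."""


class ExampleLibraryDomainTypes(BieDomainTypes):
    # One member per object type in the (hypothetical) ontology.
    EXAMPLE_SHELVES = auto()
    EXAMPLE_BOOKS = auto()
    EXAMPLE_CHAPTERS = auto()


# Coverage check: every ontology object type should have an enum member.
ontology_object_types = {"EXAMPLE_SHELVES", "EXAMPLE_BOOKS", "EXAMPLE_CHAPTERS"}
missing_members = ontology_object_types - {
    member.name for member in ExampleLibraryDomainTypes
}
print(missing_members)
```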
3.2 Domain Relation Types Enum (if needed)
Only create if the domain model specifies relation types beyond the 7 core types.
3.3 Identity Vectors
For each object type, create:
- A NamedTuple subclass defining the typed places (identity inputs)
- A CommonIdentityVector subclass whose __init__ takes bie_type, bie_hr_name, and the typed places, then calls super().__init__() with bie_domain_type, bie_hr_name, places, and bie_vector_structure_type
Do NOT subclass BieIdentityVectorBase directly — always subclass CommonIdentityVector.
Group related identity vectors in a single _identity_vectors.py file per domain. See references/implementation-templates.md for templates.
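The two pieces per object type might look roughly like the sketch below. CommonIdentityVector and BieVectorStructureTypes are replaced by local stand-ins so the example is runnable. Only the four keyword names passed to super().__init__() come from the text above. The ORDER_SENSITIVE member, the place names, and the use of str in place of BieIds are assumptions.

```python
from enum import Enum, auto
from typing import NamedTuple


class BieVectorStructureTypes(Enum):
    """Stand-in enum; the member name is hypothetical."""

    ORDER_SENSITIVE = auto()


class CommonIdentityVector:
    """Stand-in for the framework base class (illustration only)."""

    def __init__(self, *, bie_domain_type, bie_hr_name, places, bie_vector_structure_type):
        self.bie_domain_type = bie_domain_type
        self.bie_hr_name = bie_hr_name
        self.places = places
        self.bie_vector_structure_type = bie_vector_structure_type


class ExampleFilePlaces(NamedTuple):
    # Raw identity inputs only; str stands in for BieIds here.
    file_name: str
    parent_folder_bie_id: str


class ExampleFileIdentityVector(CommonIdentityVector):
    def __init__(self, bie_type, bie_hr_name, file_name, parent_folder_bie_id):
        super().__init__(
            bie_domain_type=bie_type,
            bie_hr_name=bie_hr_name,
            places=ExampleFilePlaces(
                file_name=file_name,
                parent_folder_bie_id=parent_folder_bie_id,
            ),
            bie_vector_structure_type=BieVectorStructureTypes.ORDER_SENSITIVE,
        )


example_vector = ExampleFileIdentityVector(
    bie_type="EXAMPLE_FILES",
    bie_hr_name="report.txt",
    file_name="report.txt",
    parent_folder_bie_id="folder-bie-id-1",
)
print(example_vector.places)
```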
3.4 BIE ID Creator Functions
Create one creator module per object type, following the BIE Calculation Table. Each module provides up to three functions in the three-tier pattern:
- create_*_bie_id(...): Public entry point; delegates to calculate
- calculate_*_bie_id(...): Constructs the identity vector and calls BieIdCreationFacade.create_bie_id_from_identity_vector()
- issue_*_bie_id(...): Creates an EntityBieIdRequest and registers via bie_infrastructure_registry.create_and_register_bie_id()
See references/implementation-templates.md for templates.
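To show how the three tiers relate, here is a self-contained sketch. Every framework name in it is a local stand-in. The SHA-256 hashing, the identity_vector field on EntityBieIdRequest, and the registry class are assumptions made only so the example runs. The real signatures should be taken from the templates and the File System Snapshot reference.

```python
import hashlib
from dataclasses import dataclass
from typing import NamedTuple


class ExampleFilePlaces(NamedTuple):
    file_name: str
    parent_folder_bie_id: str


class ExampleFileIdentityVector:
    """Stand-in for a CommonIdentityVector subclass."""

    def __init__(self, bie_hr_name: str, places: ExampleFilePlaces):
        self.bie_hr_name = bie_hr_name
        self.places = places


class BieIdCreationFacade:
    """Stand-in; the real facade's hashing is not reproduced here."""

    @staticmethod
    def create_bie_id_from_identity_vector(identity_vector) -> str:
        joined = "|".join(str(place) for place in identity_vector.places)
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class EntityBieIdRequest:
    identity_vector: ExampleFileIdentityVector  # hypothetical field


class StandInInfrastructureRegistry:
    def __init__(self):
        self.registered_bie_ids = []

    def create_and_register_bie_id(self, request: EntityBieIdRequest) -> str:
        bie_id = BieIdCreationFacade.create_bie_id_from_identity_vector(
            identity_vector=request.identity_vector
        )
        self.registered_bie_ids.append(bie_id)
        return bie_id


def _build_vector(file_name: str, parent_folder_bie_id: str) -> ExampleFileIdentityVector:
    places = ExampleFilePlaces(file_name, parent_folder_bie_id)
    return ExampleFileIdentityVector(bie_hr_name=file_name, places=places)


def calculate_example_file_bie_id(file_name: str, parent_folder_bie_id: str) -> str:
    # Tier 2: build the identity vector and hand it to the facade.
    return BieIdCreationFacade.create_bie_id_from_identity_vector(
        identity_vector=_build_vector(file_name, parent_folder_bie_id)
    )


def create_example_file_bie_id(file_name: str, parent_folder_bie_id: str) -> str:
    # Tier 1: public entry point; delegates to calculate.
    return calculate_example_file_bie_id(file_name, parent_folder_bie_id)


def issue_example_file_bie_id(bie_infrastructure_registry, file_name, parent_folder_bie_id) -> str:
    # Tier 3: wrap the vector in a request and register it.
    request = EntityBieIdRequest(
        identity_vector=_build_vector(file_name, parent_folder_bie_id)
    )
    return bie_infrastructure_registry.create_and_register_bie_id(request=request)


registry = StandInInfrastructureRegistry()
issued_bie_id = issue_example_file_bie_id(registry, "report.txt", "folder-bie-id-1")
print(issued_bie_id == create_example_file_bie_id("report.txt", "folder-bie-id-1"))
```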
3.5 Domain Object Classes
Create classes extending BieDomainObjects (or a domain-specific base class). Each class:
- Stores all domain-specific attributes as instance variables
- Receives a pre-computed bie_base_identity: BieBaseIdentities from the factory
- Calls super().__init__(bie_base_identity=bie_base_identity). BieObjects.__init__ extracts bie_hr_name, bie_type, and bie_id from it
Domain objects do NOT compute their identity — that is the factory's responsibility. There is no _create_vector() method.
See references/implementation-templates.md for the template.
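A domain object class following these rules could be sketched as below. BieBaseIdentities, BieObjects, and BieDomainObjects are local stand-ins that mirror only what the text above states (a frozen bundle of bie_id, bie_type, bie_hr_name, unpacked by the base __init__). The attribute types and the ExampleFiles class are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BieBaseIdentities:
    """Stand-in: frozen bundle of the three identity attributes."""

    bie_id: str
    bie_type: str
    bie_hr_name: str


class BieObjects:
    """Stand-in base class that unpacks the pre-computed identity."""

    def __init__(self, bie_base_identity: BieBaseIdentities):
        self.bie_hr_name = bie_base_identity.bie_hr_name
        self.bie_type = bie_base_identity.bie_type
        self.bie_id = bie_base_identity.bie_id


class BieDomainObjects(BieObjects):
    """Stand-in for the framework's domain object base class."""


class ExampleFiles(BieDomainObjects):
    def __init__(
        self,
        bie_base_identity: BieBaseIdentities,
        file_name: str,
        length_in_bytes: int,
    ):
        # Identity arrives pre-computed from the factory; nothing is hashed here.
        super().__init__(bie_base_identity=bie_base_identity)
        # Domain-specific attributes are stored as instance variables.
        self.file_name = file_name
        self.length_in_bytes = length_in_bytes


example_file = ExampleFiles(
    bie_base_identity=BieBaseIdentities(
        bie_id="example-bie-id",
        bie_type="EXAMPLE_FILES",
        bie_hr_name="report.txt",
    ),
    file_name="report.txt",
    length_in_bytes=1024,
)
print(example_file.bie_id, example_file.file_name)
```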
3.6 Factory Functions
For each domain object type, create a create_* factory function in a sibling factories/ sub-package. Each factory:
- Builds the identity vector places (NamedTuple)
- Constructs the CommonIdentityVector subclass
- Calls create_bie_base_identity_from_bie_identity_vector(identity_vector=...) to get a BieBaseIdentities
- Constructs the domain object, passing bie_base_identity
- Registers via bie_id_registerer.register_bie_id(bie_base_identity=bie_base_identity)
- Registers relations via bie_id_registerer.issue_and_register_bie_id(request=RelationBieIdRequest(...))
- If additional local-domain BieIds are created during assembly, materialises and registers those objects before using them as relation targets
The bie_id_registerer parameter is of type BieIdRegisterer (from bclearer_core.infrastructure.session.bie_id_registerers.bie_id_registerer). Use NoOpBieIdRegisterer in unit tests.
See references/implementation-templates.md for the template.
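The factory sequence can be traced end to end in a small runnable sketch. All framework names below are local stand-ins. The RelationBieIdRequest fields, the relation type string, the SHA-256 hashing, and RecordingBieIdRegisterer (a test double that records calls, unlike NoOpBieIdRegisterer) are assumptions for illustration only.

```python
import hashlib
from dataclasses import dataclass
from typing import NamedTuple


@dataclass(frozen=True)
class BieBaseIdentities:  # stand-in
    bie_id: str
    bie_type: str
    bie_hr_name: str


class ExampleFilePlaces(NamedTuple):
    file_name: str
    parent_folder_bie_id: str


class ExampleFileIdentityVector:  # stand-in for a CommonIdentityVector subclass
    def __init__(self, bie_type, bie_hr_name, places):
        self.bie_type = bie_type
        self.bie_hr_name = bie_hr_name
        self.places = places


def create_bie_base_identity_from_bie_identity_vector(identity_vector):
    # Stand-in helper; the real hashing lives in the framework.
    joined = "|".join(str(place) for place in identity_vector.places)
    return BieBaseIdentities(
        bie_id=hashlib.sha256(joined.encode("utf-8")).hexdigest(),
        bie_type=identity_vector.bie_type,
        bie_hr_name=identity_vector.bie_hr_name,
    )


@dataclass(frozen=True)
class RelationBieIdRequest:  # hypothetical fields
    place_1_bie_id: str
    place_2_bie_id: str
    relation_type: str


class RecordingBieIdRegisterer:
    """Test double exposing the two registerer methods named above."""

    def __init__(self):
        self.objects = []
        self.relations = []

    def register_bie_id(self, bie_base_identity):
        self.objects.append(bie_base_identity)

    def issue_and_register_bie_id(self, request):
        self.relations.append(request)


class ExampleFiles:  # stand-in domain object
    def __init__(self, bie_base_identity, file_name):
        self.bie_id = bie_base_identity.bie_id
        self.file_name = file_name


def create_example_file(file_name, parent_folder_bie_id, bie_id_registerer):
    places = ExampleFilePlaces(file_name, parent_folder_bie_id)
    identity_vector = ExampleFileIdentityVector("EXAMPLE_FILES", file_name, places)
    bie_base_identity = create_bie_base_identity_from_bie_identity_vector(
        identity_vector=identity_vector
    )
    example_file = ExampleFiles(bie_base_identity=bie_base_identity, file_name=file_name)
    # Object registration first, then the identity-dependence relation.
    bie_id_registerer.register_bie_id(bie_base_identity=bie_base_identity)
    bie_id_registerer.issue_and_register_bie_id(
        request=RelationBieIdRequest(
            place_1_bie_id=bie_base_identity.bie_id,
            place_2_bie_id=parent_folder_bie_id,
            relation_type="EXAMPLE_CONTAINED_IN",
        )
    )
    return example_file


registerer = RecordingBieIdRegisterer()
created_file = create_example_file("report.txt", "folder-bie-id-1", registerer)
print(len(registerer.objects), len(registerer.relations))
```

The parent folder's BieId is assumed to be registered already (parts before wholes), so the factory only uses it as a relation target.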
3.7 Universe/Orchestration Integration
Create universe classes and orchestration functions that wire everything together.
Step 4: Run Tests
After implementation, run any available tests to verify correctness.
Review Mode
Use Review Mode when auditing existing domain code against BIE patterns. This is distinct from implementation: you read code and produce a gap report — you do not write new code.
Prerequisites for Review
- Read the File System Snapshot domain as reference. Before reviewing any domain code, read the File System Snapshot reference to calibrate your expectations. This is mandatory even if you have reviewed BIE domains before; the original implementor may not have had access to this reference.
- Read the code being reviewed. Read all files in the domain under review before running any checks.
Review Steps
- Read the File System Snapshot reference (see references/code-locations.md)
- Read all files in the domain under review
- Run every item in the Verification Checklist against the existing code, noting specific file and line references for each gap
- Produce a gap report
Gap Report Format
For each failed check, report:
- Check: Which verification checklist item failed
- File: File path
- Line: Line number(s)
- Issue: Specific description of the gap
- Fix: Suggested remediation
List gaps in checklist order. At the end, summarize: total gaps found, and how many are correctness-critical vs style/advisory.
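If it helps to assemble the report programmatically, the format above maps onto a small record type. The sketch below is one possible shape. GapReportEntry, its check_number and correctness_critical fields, and the sample entries (including the file paths) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GapReportEntry:
    check_number: int  # position of the failed item in the Verification Checklist
    check: str
    file: str
    line: str
    issue: str
    fix: str
    correctness_critical: bool


def render_gap_report(entries: list[GapReportEntry]) -> str:
    """Render entries in checklist order, followed by the summary line."""
    ordered = sorted(entries, key=lambda entry: entry.check_number)
    lines = []
    for entry in ordered:
        lines += [
            f"Check: {entry.check}",
            f"File: {entry.file}",
            f"Line: {entry.line}",
            f"Issue: {entry.issue}",
            f"Fix: {entry.fix}",
            "",
        ]
    critical = sum(1 for entry in ordered if entry.correctness_critical)
    advisory = len(ordered) - critical
    lines.append(
        f"Total gaps: {len(ordered)} "
        f"({critical} correctness-critical, {advisory} style/advisory)"
    )
    return "\n".join(lines)


sample_entries = [
    GapReportEntry(
        20, "All imports use full package paths", "example_domain/objects.py",
        "3", "Relative import used", "Use the full package path", False,
    ),
    GapReportEntry(
        14, "No local relation points at an unregistered BieId",
        "example_domain/factories/example_file_factory.py", "41-47",
        "Relation target is never registered", "Register the target first", True,
    ),
]
gap_report = render_gap_report(sample_entries)
print(gap_report)
```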
What NOT to Create
These already exist in the foundation layer. Do NOT recreate them:
- BieBaseIdentities: Frozen dataclass bundling bie_id, bie_type, bie_hr_name
- create_bie_base_identity_from_bie_identity_vector(): Factory helper that computes bie_id from an identity vector and returns a BieBaseIdentities
- BieIdRegisterer / NoOpBieIdRegisterer: Registration wrapper (register_bie_id, issue_and_register_bie_id); NoOpBieIdRegisterer is for unit tests
- BieIdCreationFacade: Identity creation API
- BieIdRegistries / BieInfrastructureRegistries: Registration infrastructure
- BieIdUniverses: Universe base class
- BieObjects / BieDomainObjects: Base object classes
- BieEnums / BieDomainTypes / BieCoreRelationTypes: Core enums and type hierarchy
- BieInfrastructureOrchestrator: Infrastructure initialization
- BieIds: Identity value type
- BSequenceNames: Naming service
- BieIdentityVectorBase: Identity vector abstract base class
- CommonIdentityVector: Reusable identity vector base (subclass it per object type, don't recreate it)
- BieVectorStructureTypes: Vector structure type enum
- EntityBieIdRequest / RelationBieIdRequest: Registration request dataclasses
- BieIdIssueScopes / BieIdIssueResult: Registration scope and result types
Only create domain-specific extensions of these classes and new domain-specific code.
Verification Checklist
After implementation, verify:
- Domain enum extends BieDomainTypes with a member for every object type in the ontology
- Each object type has a NamedTuple places definition and a CommonIdentityVector subclass (not a direct BieIdentityVectorBase subclass)
- Each CommonIdentityVector subclass calls super().__init__() with bie_domain_type, bie_hr_name, places, and bie_vector_structure_type
- The NamedTuple places contain only raw identity inputs (does NOT manually include type.item_bie_identity)
- Each creator module implements the three-tier pattern (create/calculate/issue)
- Each creator has a public issue_* function that creates EntityBieIdRequest and calls create_and_register_bie_id(). The issue tier must exist, not just create/calculate
- Creator functions use BieIdCreationFacade.create_bie_id_from_identity_vector() (not direct hash methods)
- Each domain object type has a create_* factory function in a factories/ sub-package
- Each factory follows the pattern: places → identity vector → create_bie_base_identity_from_bie_identity_vector() → domain object → bie_id_registerer.register_bie_id()
- Factory functions accept bie_id_registerer: BieIdRegisterer (not BieInfrastructureRegistries directly)
- Each domain object class receives bie_base_identity: BieBaseIdentities and calls super().__init__(bie_base_identity=bie_base_identity). It does NOT implement _create_vector() and does NOT pass bie_id, base_hr_name, or bie_type separately
- Registration uses bie_id_registerer.register_bie_id(bie_base_identity=...) for objects and bie_id_registerer.issue_and_register_bie_id(request=RelationBieIdRequest(...)) for relations. The older register_bie_object_and_type_instance(), register_bie_relation(), and direct create_and_register_bie_id() APIs are NOT the canonical pattern
- Every local object BieId created or retained by the domain is registered in the parallel BIE universe, or is explicitly identified as an external dependency already registered elsewhere
- No local relation points at an unregistered BieId
- Every composite identity dependency from the ontology and creator code is mirrored by registered relations unless it is explicitly external
- Parts are constructed before wholes (matches construction order from ontology)
- Construction order is documented in the identity vectors module or domain module docstring
- All relation types from the ontology are registered via RelationBieIdRequest
- Code style matches references/code-style.md
- All imports use full package paths
Feedback
If the user corrects this skill's output due to a misinterpretation or missing rule in the skill itself (not a one-off preference), invoke skill-feedback to capture structured feedback and optionally post a GitHub issue.
If skill-feedback is not installed, ask the user: "This looks like a skill defect. Would you like to install the skill-feedback skill to report it?" If the user declines, continue without feedback capture.