parser-development
Purpose
Use this skill when creating or modifying Biome's parsers. Covers grammar authoring with ungrammar, lexer implementation, error recovery strategies, and list parsing patterns.
Prerequisites
- Install required tools: just install-tools
- Understand the language syntax you're implementing
- Read crates/biome_parser/CONTRIBUTING.md for detailed concepts
Common Workflows
Create Grammar for New Language
Create a .ungram file in xtask/codegen/ (e.g., html.ungram):
// html.ungram
// Legend:
// Name = -- non-terminal definition
// 'ident' -- token (terminal)
// A B -- sequence
// A | B -- alternation
// A* -- zero or more repetition
// (A (',' A)* ','?) -- repetition with separator and optional trailing comma
// A? -- zero or one repetition
// label:A -- suggested name for field
HtmlRoot = HtmlElementList

HtmlElementList = HtmlElement*

HtmlElement =
    '<'
    tag_name: HtmlName
    attributes: HtmlAttributeList
    '>'
    children: HtmlElementList
    '<' '/' close_tag_name: HtmlName '>'

HtmlAttributeList = AnyHtmlAttribute*

AnyHtmlAttribute =
    HtmlSimpleAttribute
    | HtmlBogusAttribute

HtmlSimpleAttribute =
    name: HtmlName
    '='
    value: HtmlString

// Error recovery node
HtmlBogusAttribute = SyntaxElement*
Naming conventions:
- Prefix all nodes with the language name: HtmlElement, CssRule
- Unions start with Any: AnyHtmlAttribute
- Error recovery nodes use Bogus: HtmlBogusAttribute
- Lists end with List: HtmlAttributeList
- Lists are mandatory (never optional), empty by default
Generate Parser from Grammar
# Generate for specific language
just gen-grammar html
# Generate for multiple languages
just gen-grammar html css
# Generate all grammars
just gen-grammar
This creates:
- biome_html_syntax/src/generated/ - Node definitions
- biome_html_factory/src/generated/ - Node construction helpers
- Parser skeleton files (you'll implement the actual parsing logic)
Implement a Lexer
Create lexer/mod.rs in your parser crate:
use biome_html_syntax::HtmlSyntaxKind;
use biome_parser::{lexer::Lexer, ParseDiagnostic};
pub(crate) struct HtmlLexer<'source> {
    source: &'source str,
    position: usize,
    current_kind: HtmlSyntaxKind,
    diagnostics: Vec<ParseDiagnostic>,
}

impl<'source> Lexer<'source> for HtmlLexer<'source> {
    const NEWLINE: Self::Kind = HtmlSyntaxKind::NEWLINE;
    const WHITESPACE: Self::Kind = HtmlSyntaxKind::WHITESPACE;

    type Kind = HtmlSyntaxKind;
    type LexContext = ();
    type ReLexContext = ();

    fn source(&self) -> &'source str {
        self.source
    }

    fn current(&self) -> Self::Kind {
        self.current_kind
    }

    fn position(&self) -> usize {
        self.position
    }

    fn advance(&mut self, _context: Self::LexContext) -> Self::Kind {
        // Dispatch on the current byte and scan one token;
        // `read_next_token` stands in for your scanning logic.
        let kind = self.read_next_token();
        self.current_kind = kind;
        kind
    }

    // Implement other required methods...
}
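To experiment with scanning logic outside of Biome, the same shape can be sketched as a self-contained toy lexer. Everything below (the token kinds, `ToyLexer`, the helper methods) is invented for illustration and does not use Biome's real `Lexer` trait:

```rust
// Toy byte-dispatch lexer: each call to `advance` scans exactly one token.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    LAngle,
    RAngle,
    Slash,
    Eq,
    Name,
    Whitespace,
    Eof,
    Error,
}

struct ToyLexer<'source> {
    source: &'source str,
    position: usize,
}

impl<'source> ToyLexer<'source> {
    fn new(source: &'source str) -> Self {
        Self { source, position: 0 }
    }

    /// Scans the next token, advancing `position` past it.
    fn advance(&mut self) -> TokenKind {
        let Some(byte) = self.source.as_bytes().get(self.position).copied() else {
            return TokenKind::Eof;
        };
        match byte {
            b'<' => self.eat_single(TokenKind::LAngle),
            b'>' => self.eat_single(TokenKind::RAngle),
            b'/' => self.eat_single(TokenKind::Slash),
            b'=' => self.eat_single(TokenKind::Eq),
            b if b.is_ascii_whitespace() => {
                self.eat_while(TokenKind::Whitespace, |b| b.is_ascii_whitespace())
            }
            b if b.is_ascii_alphabetic() => {
                self.eat_while(TokenKind::Name, |b| b.is_ascii_alphanumeric() || b == b'-')
            }
            // Always consume at least one byte so the lexer can't loop forever.
            _ => self.eat_single(TokenKind::Error),
        }
    }

    fn eat_single(&mut self, kind: TokenKind) -> TokenKind {
        self.position += 1;
        kind
    }

    fn eat_while(&mut self, kind: TokenKind, pred: impl Fn(u8) -> bool) -> TokenKind {
        while self.source.as_bytes().get(self.position).copied().is_some_and(&pred) {
            self.position += 1;
        }
        kind
    }
}

fn lex(source: &str) -> Vec<TokenKind> {
    let mut lexer = ToyLexer::new(source);
    let mut tokens = Vec::new();
    loop {
        let kind = lexer.advance();
        if kind == TokenKind::Eof {
            break;
        }
        tokens.push(kind);
    }
    tokens
}
```

The important property mirrored from the real trait: `advance` always consumes at least one byte, even for unrecognized input, so lexing always terminates.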
Implement Token Source
use biome_parser::lexer::BufferedLexer;
use biome_html_syntax::HtmlSyntaxKind;
use crate::lexer::HtmlLexer;

pub(crate) struct HtmlTokenSource<'source> {
    lexer: BufferedLexer<HtmlSyntaxKind, HtmlLexer<'source>>,
}

impl<'source> TokenSourceWithBufferedLexer<HtmlLexer<'source>> for HtmlTokenSource<'source> {
    fn lexer(&mut self) -> &mut BufferedLexer<HtmlSyntaxKind, HtmlLexer<'source>> {
        &mut self.lexer
    }
}
Write Parse Rules
Example: Parsing an if statement:
use biome_parser::prelude::*;
use biome_js_syntax::JsSyntaxKind::*;
fn parse_if_statement(p: &mut JsParser) -> ParsedSyntax {
    // Presence test - return Absent if not at 'if'
    if !p.at(T![if]) {
        return Absent;
    }
    let m = p.start();
    // Parse required tokens
    p.expect(T![if]);
    p.expect(T!['(']);
    // Parse required nodes with error recovery
    parse_any_expression(p).or_add_diagnostic(p, expected_expression);
    p.expect(T![')']);
    parse_block_statement(p).or_add_diagnostic(p, expected_block);
    // Parse optional else clause
    if p.at(T![else]) {
        parse_else_clause(p).ok();
    }
    Present(m.complete(p, JS_IF_STATEMENT))
}
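The presence-test discipline can be illustrated with a self-contained toy. All names below (`ToyParser`, the string-based tokens, the local `ParsedSyntax`) are invented for the example; Biome's real `ParsedSyntax` and parser live in biome_parser:

```rust
// Toy illustration of the Present/Absent contract: a rule either consumes
// nothing and returns Absent, or consumes tokens and returns Present.
#[derive(Debug, PartialEq)]
enum ParsedSyntax {
    Present(&'static str), // kind of the completed node
    Absent,
}

struct ToyParser {
    tokens: Vec<&'static str>,
    pos: usize,
    errors: Vec<String>,
}

impl ToyParser {
    fn at(&self, kind: &str) -> bool {
        self.tokens.get(self.pos).copied() == Some(kind)
    }

    fn bump(&mut self) {
        self.pos += 1;
    }

    /// Like p.expect(): consume the token if present, otherwise record an error.
    fn expect(&mut self, kind: &str) -> bool {
        if self.at(kind) {
            self.bump();
            true
        } else {
            self.errors.push(format!("expected `{kind}`"));
            false
        }
    }
}

fn parse_if_statement(p: &mut ToyParser) -> ParsedSyntax {
    // Presence test: bail out WITHOUT consuming anything if we're not at `if`.
    if !p.at("if") {
        return ParsedSyntax::Absent;
    }
    p.bump(); // `if`
    p.expect("(");
    p.expect("expr"); // stand-in for parse_any_expression
    p.expect(")");
    p.expect("block"); // stand-in for parse_block_statement
    ParsedSyntax::Present("IF_STATEMENT")
}
```

Note that the Absent path leaves `pos` untouched, which is exactly what lets the caller try a different rule next.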
Parse Lists with Error Recovery
Use ParseSeparatedList for comma-separated lists:
struct ArrayElementsList;

impl ParseSeparatedList for ArrayElementsList {
    type ParsedElement = CompletedMarker;

    fn parse_element(&mut self, p: &mut Parser) -> ParsedSyntax<Self::ParsedElement> {
        parse_array_element(p)
    }

    fn is_at_list_end(&self, p: &mut Parser) -> bool {
        // Stop at array closing bracket or file end
        p.at(T![']']) || p.at(EOF)
    }

    fn recover(
        &mut self,
        p: &mut Parser,
        parsed_element: ParsedSyntax<Self::ParsedElement>,
    ) -> RecoveryResult {
        parsed_element.or_recover(
            p,
            &ParseRecoveryTokenSet::new(
                JS_BOGUS_EXPRESSION,
                token_set![T![']'], T![,]],
            ),
            expected_array_element,
        )
    }

    fn separating_element_kind(&mut self) -> JsSyntaxKind {
        T![,]
    }
}

// Use the list parser
fn parse_array_elements(p: &mut Parser) -> CompletedMarker {
    let m = p.start();
    ArrayElementsList.parse_list(p);
    m.complete(p, JS_ARRAY_ELEMENT_LIST)
}
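The loop that `ParseSeparatedList` drives for you can be sketched in isolation. This toy version (invented names, strings instead of syntax kinds) shows the element/separator alternation, bogus recovery for a missing element, and termination at the end token:

```rust
// Toy separated-list loop: returns (parsed elements, recorded errors).
// "]" terminates the list, "," separates elements, everything else is an element.
fn parse_separated_list(tokens: &[&str]) -> (Vec<String>, Vec<String>) {
    let mut elements = Vec::new();
    let mut errors = Vec::new();
    let mut pos = 0;
    let at_end = |pos: usize| pos >= tokens.len() || tokens[pos] == "]";

    while !at_end(pos) {
        if tokens[pos] == "," {
            // Missing element before a separator: recover with a bogus node
            // instead of aborting the whole list.
            errors.push("expected array element".to_string());
            elements.push("BOGUS".to_string());
        } else {
            elements.push(tokens[pos].to_string());
            pos += 1;
        }
        if at_end(pos) {
            break;
        }
        if tokens[pos] == "," {
            pos += 1; // consume the separator
        } else {
            // Missing separator: report it but keep going, so one typo
            // doesn't cascade into errors for the rest of the list.
            errors.push("expected `,`".to_string());
        }
    }
    (elements, errors)
}
```

Either branch of each iteration makes progress (consumes a token or reaches the separator consumption), which is what guarantees the loop terminates even on malformed input.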
Implement Error Recovery
Error recovery wraps invalid tokens in BOGUS nodes:
// Recovery set includes:
// - List terminator tokens (e.g., ']', '}')
// - Statement terminators (e.g., ';')
// - List separators (e.g., ',')
let recovery_set = token_set![T![']'], T![,], T![;]];

parsed_element.or_recover(
    p,
    &ParseRecoveryTokenSet::new(JS_BOGUS_EXPRESSION, recovery_set),
    expected_expression_error,
)
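The skip-until-recovery-set behavior behind or_recover can be sketched as a standalone function. The names here are invented; Biome's real implementation additionally builds the bogus node from the skipped tokens:

```rust
// Toy token-set recovery: skip tokens (collecting them for a BOGUS node)
// until we reach a token in the recovery set or run out of input.
fn recover_until<'a>(
    tokens: &[&'a str],
    mut pos: usize,
    recovery_set: &[&str],
) -> (Vec<&'a str>, usize) {
    let mut skipped = Vec::new(); // these become children of the BOGUS node
    while pos < tokens.len() && !recovery_set.contains(&tokens[pos]) {
        skipped.push(tokens[pos]);
        pos += 1;
    }
    (skipped, pos)
}
```

This is why the recovery set must contain the list terminator and separator: without them, recovery would swallow the rest of the list into one bogus node.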
Handle Conditional Syntax
For syntax only valid in certain contexts (e.g., strict mode):
fn parse_with_statement(p: &mut Parser) -> ParsedSyntax {
    if !p.at(T![with]) {
        return Absent;
    }
    let m = p.start();
    p.bump(T![with]);
    parenthesized_expression(p).or_add_diagnostic(p, expected_expression);
    parse_statement(p).or_add_diagnostic(p, expected_statement);
    let with_stmt = m.complete(p, JS_WITH_STATEMENT);
    // Mark as invalid in strict mode
    let conditional = StrictMode.excluding_syntax(p, with_stmt, |p, marker| {
        p.err_builder(
            "`with` statements are not allowed in strict mode",
            marker.range(p),
        )
    });
    Present(conditional.or_invalid_to_bogus(p))
}
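The key idea is that the node is parsed the same way regardless of context; only its validity is decided afterwards. A minimal standalone sketch of that separation (all names invented):

```rust
// Toy conditional-syntax check: parsing is unconditional, validity is not.
struct Context {
    strict_mode: bool,
}

#[derive(Debug)]
enum Validity {
    Valid(&'static str),
    Invalid { kind: &'static str, error: String },
}

fn check_with_statement(ctx: &Context) -> Validity {
    // The `with` statement was already parsed into a node either way;
    // strict mode only changes how that node is classified.
    if ctx.strict_mode {
        Validity::Invalid {
            kind: "JS_BOGUS_STATEMENT",
            error: "`with` statements are not allowed in strict mode".to_string(),
        }
    } else {
        Validity::Valid("JS_WITH_STATEMENT")
    }
}
```

Parsing unconditionally and validating afterwards keeps the syntax tree shape stable and produces a targeted diagnostic instead of a cascade of generic parse errors.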
Test Parser
Create test files in tests/:
crates/biome_html_parser/tests/
├── html_specs/
│ ├── ok/
│ │ ├── simple_element.html
│ │ └── nested_elements.html
│ └── error/
│ ├── unclosed_tag.html
│ └── invalid_syntax.html
└── html_test.rs
Run tests:
cd crates/biome_html_parser
cargo test
Tips
- Presence test: Always return Absent if the first token doesn't match; never progress parsing before returning Absent
- Required vs optional: Use p.expect() for required tokens, p.eat() for optional ones
- Missing markers: Use .or_add_diagnostic() for required nodes to add missing markers and errors
- Error recovery: Include list terminators, separators, and statement boundaries in recovery sets
- Bogus nodes: Check the grammar for which BOGUS_* node types are valid in your context
- Checkpoints: Use p.checkpoint() to save state and p.rewind() if parsing fails
- Lookahead: Use p.at() to check the current token, p.nth_at() for lookahead beyond it
- Lists are mandatory: Always create list nodes even if empty; use parse_list(), not parse_list().ok()
Common Patterns
// Optional token
if p.eat(T![async]) {
    // handle async
}

// Required token with error
p.expect(T!['{']);

// Optional node
parse_type_annotation(p).ok();

// Required node with error
parse_expression(p).or_add_diagnostic(p, expected_expression);

// Lookahead
if p.at(T![if]) || p.at(T![for]) {
    // handle control flow
}

// Checkpoint for backtracking
let checkpoint = p.checkpoint();
if parse_something(p).is_absent() {
    p.rewind(checkpoint);
    parse_something_else(p);
}
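The checkpoint pattern is what enables speculative parsing when two constructs share a prefix, such as `(x) => ...` versus `(x)`. A self-contained toy (all names invented, strings instead of real tokens):

```rust
// Toy checkpoint/rewind: try one interpretation, rewind on failure, try the other.
#[derive(Clone, Copy)]
struct Checkpoint(usize);

struct SpeculativeParser<'t> {
    tokens: &'t [&'t str],
    pos: usize,
}

impl<'t> SpeculativeParser<'t> {
    fn checkpoint(&self) -> Checkpoint {
        Checkpoint(self.pos)
    }

    fn rewind(&mut self, checkpoint: Checkpoint) {
        self.pos = checkpoint.0;
    }

    fn eat(&mut self, kind: &str) -> bool {
        if self.tokens.get(self.pos).copied() == Some(kind) {
            self.pos += 1;
            true
        } else {
            false
        }
    }
}

fn parse_paren_or_arrow(p: &mut SpeculativeParser) -> &'static str {
    let checkpoint = p.checkpoint();
    // Speculatively parse `( name ) =>` as arrow-function parameters.
    if p.eat("(") && p.eat("name") && p.eat(")") && p.eat("=>") {
        return "ARROW_FUNCTION";
    }
    // Speculation failed: rewind so the fallback sees the same tokens.
    p.rewind(checkpoint);
    if p.eat("(") && p.eat("name") && p.eat(")") {
        return "PARENTHESIZED_EXPRESSION";
    }
    "ERROR"
}
```

Without the rewind, the failed speculation would leave the parser mid-stream and the fallback rule would see only the leftover tokens.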
References
- Full guide:
crates/biome_parser/CONTRIBUTING.md - Grammar examples:
xtask/codegen/*.ungram - Parser examples:
crates/biome_js_parser/src/syntax/ - Error recovery: Search for
ParseRecoveryTokenSetin existing parsers