# parser-development
## Purpose

Use this skill when creating or modifying Biome's parsers. Covers grammar authoring with ungrammar, lexer implementation, error recovery strategies, and list parsing patterns.
## Prerequisites

- Install the required tools: `just install-tools`
- Understand the syntax of the language you're implementing
- Read `crates/biome_parser/CONTRIBUTING.md` for detailed concepts
## Common Workflows

### Create Grammar for New Language
Create a `.ungram` file in `xtask/codegen/` (e.g., `html.ungram`):

```ungram
// html.ungram
// Legend:
//   Name =              -- non-terminal definition
//   'ident'             -- token (terminal)
//   A B                 -- sequence
//   A | B               -- alternation
//   A*                  -- zero or more repetition
//   A?                  -- zero or one repetition
//   (A (',' A)* ','?)   -- repetition with separator and optional trailing comma
//   label:A             -- suggested name for field

HtmlRoot = element*

HtmlElement =
    '<'
    tag_name: HtmlName
    attributes: HtmlAttributeList
    '>'
    children: HtmlElementList
    '<' '/' close_tag_name: HtmlName '>'

HtmlAttributeList = HtmlAttribute*

HtmlAttribute =
    HtmlSimpleAttribute
    | HtmlBogusAttribute

HtmlSimpleAttribute =
    name: HtmlName
    '='
    value: HtmlString

HtmlBogusAttribute = /* error recovery node */
```
Naming conventions:

- Prefix all nodes with the language name: `HtmlElement`, `CssRule`
- Unions start with `Any`: `AnyHtmlAttribute`
- Error recovery nodes use `Bogus`: `HtmlBogusAttribute`
- Lists end with `List`: `HtmlAttributeList`
- Lists are mandatory (never optional) and empty by default
### Generate Parser from Grammar

```shell
# Generate for a specific language
just gen-grammar html

# Generate for multiple languages
just gen-grammar html css

# Generate all grammars
just gen-grammar
```
This creates:

- `biome_html_syntax/src/generated/` - node definitions
- `biome_html_factory/src/generated/` - node construction helpers
- Parser skeleton files (you'll implement the actual parsing logic)
### Implement a Lexer

Create `lexer/mod.rs` in your parser crate:

```rust
use biome_html_syntax::HtmlSyntaxKind;
use biome_parser::{lexer::Lexer, ParseDiagnostic};

pub(crate) struct HtmlLexer<'source> {
    source: &'source str,
    position: usize,
    current_kind: HtmlSyntaxKind,
    diagnostics: Vec<ParseDiagnostic>,
}

impl<'source> Lexer<'source> for HtmlLexer<'source> {
    const NEWLINE: Self::Kind = HtmlSyntaxKind::NEWLINE;
    const WHITESPACE: Self::Kind = HtmlSyntaxKind::WHITESPACE;

    type Kind = HtmlSyntaxKind;
    type LexContext = ();
    type ReLexContext = ();

    fn source(&self) -> &'source str {
        self.source
    }

    fn current(&self) -> Self::Kind {
        self.current_kind
    }

    fn position(&self) -> usize {
        self.position
    }

    fn advance(&mut self, _context: Self::LexContext) -> Self::Kind {
        // Implement the token scanning logic
        let kind = self.read_next_token();
        self.current_kind = kind;
        kind
    }

    // Implement other required methods...
}
```
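The heavy lifting happens in a helper such as `read_next_token` (a name assumed here, not part of the `Lexer` trait), which typically dispatches on the current byte. A minimal, self-contained sketch with stand-in types — `TokenKind` and `MiniLexer` instead of `HtmlSyntaxKind` and `HtmlLexer`; a real lexer also tracks trivia and diagnostics:

```rust
// Hypothetical sketch of byte-dispatch token scanning.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    LAngle,
    RAngle,
    Slash,
    Eq,
    Whitespace,
    Name,
    Error,
    Eof,
}

struct MiniLexer<'source> {
    source: &'source [u8],
    position: usize,
}

impl<'source> MiniLexer<'source> {
    fn new(source: &'source str) -> Self {
        Self { source: source.as_bytes(), position: 0 }
    }

    /// Consume bytes while `pred` holds.
    fn advance_while(&mut self, pred: impl Fn(u8) -> bool) {
        while let Some(&b) = self.source.get(self.position) {
            if !pred(b) {
                break;
            }
            self.position += 1;
        }
    }

    /// Scan a single token, advancing `position` past it.
    fn read_next_token(&mut self) -> TokenKind {
        let Some(&byte) = self.source.get(self.position) else {
            return TokenKind::Eof;
        };
        match byte {
            b'<' => { self.position += 1; TokenKind::LAngle }
            b'>' => { self.position += 1; TokenKind::RAngle }
            b'/' => { self.position += 1; TokenKind::Slash }
            b'=' => { self.position += 1; TokenKind::Eq }
            b if b.is_ascii_whitespace() => {
                self.advance_while(|b| b.is_ascii_whitespace());
                TokenKind::Whitespace
            }
            b if b.is_ascii_alphabetic() => {
                self.advance_while(|b| b.is_ascii_alphanumeric() || b == b'-');
                TokenKind::Name
            }
            _ => { self.position += 1; TokenKind::Error }
        }
    }
}
```

Note that multi-byte tokens (whitespace runs, names) loop until the token ends, so each `read_next_token` call yields exactly one token and leaves `position` at the start of the next.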
### Implement a Token Source

```rust
use biome_html_syntax::HtmlSyntaxKind;
use biome_parser::lexer::BufferedLexer;
// Import path for the trait may differ between biome_parser versions.
use biome_parser::token_source::TokenSourceWithBufferedLexer;

use crate::lexer::HtmlLexer;

pub(crate) struct HtmlTokenSource<'source> {
    lexer: BufferedLexer<HtmlSyntaxKind, HtmlLexer<'source>>,
}

impl<'source> TokenSourceWithBufferedLexer<HtmlLexer<'source>> for HtmlTokenSource<'source> {
    fn lexer(&mut self) -> &mut BufferedLexer<HtmlSyntaxKind, HtmlLexer<'source>> {
        &mut self.lexer
    }
}
```
### Write Parse Rules

Example: parsing an `if` statement:

```rust
use biome_js_syntax::JsSyntaxKind::*;
use biome_parser::prelude::*;

fn parse_if_statement(p: &mut JsParser) -> ParsedSyntax {
    // Presence test - return Absent if not at 'if'
    if !p.at(T![if]) {
        return Absent;
    }
    let m = p.start();

    // Parse required tokens
    p.expect(T![if]);
    p.expect(T!['(']);

    // Parse required nodes with error recovery
    parse_any_expression(p).or_add_diagnostic(p, expected_expression);
    p.expect(T![')']);
    parse_block_statement(p).or_add_diagnostic(p, expected_block);

    // Parse the optional else clause
    if p.at(T![else]) {
        parse_else_clause(p).ok();
    }

    Present(m.complete(p, JS_IF_STATEMENT))
}
```
### Parse Lists with Error Recovery

Use `ParseSeparatedList` for comma-separated lists:

```rust
struct ArrayElementsList;

impl ParseSeparatedList for ArrayElementsList {
    type ParsedElement = CompletedMarker;

    fn parse_element(&mut self, p: &mut Parser) -> ParsedSyntax<Self::ParsedElement> {
        parse_array_element(p)
    }

    fn is_at_list_end(&self, p: &mut Parser) -> bool {
        // Stop at the array's closing bracket or the end of the file
        p.at(T![']']) || p.at(EOF)
    }

    fn recover(
        &mut self,
        p: &mut Parser,
        parsed_element: ParsedSyntax<Self::ParsedElement>,
    ) -> RecoveryResult {
        parsed_element.or_recover(
            p,
            &ParseRecoveryTokenSet::new(
                JS_BOGUS_EXPRESSION,
                token_set![T![']'], T![,]],
            ),
            expected_array_element,
        )
    }

    fn separating_element_kind(&mut self) -> JsSyntaxKind {
        T![,]
    }
}

// Use the list parser
fn parse_array_elements(p: &mut Parser) -> CompletedMarker {
    let m = p.start();
    ArrayElementsList.parse_list(p);
    m.complete(p, JS_ARRAY_ELEMENT_LIST)
}
```
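Under the hood, `parse_list` drives a loop: check for the list end, parse one element (recovering into a bogus node on failure), then eat the separator. A toy, self-contained version of that control flow over string tokens — the names and types here are illustrative, not Biome's API:

```rust
// Hypothetical stand-in for a parsed element: either a real node
// or a bogus node produced by error recovery.
#[derive(Debug, PartialEq)]
enum Element {
    Ok(i64),
    Bogus(String),
}

/// Parse a ']'-terminated, comma-separated list of integers,
/// wrapping unparseable tokens in Bogus instead of aborting.
fn parse_list(tokens: &[&str]) -> Vec<Element> {
    let mut out = Vec::new();
    let mut i = 0;
    loop {
        // is_at_list_end: stop at the terminator or end of input
        if i >= tokens.len() || tokens[i] == "]" {
            break;
        }
        // parse_element, with recover() turning failures into bogus nodes
        match tokens[i].parse::<i64>() {
            Ok(n) => out.push(Element::Ok(n)),
            Err(_) => out.push(Element::Bogus(tokens[i].to_string())),
        }
        i += 1;
        // separating_element_kind: eat the ',' between elements
        if i < tokens.len() && tokens[i] == "," {
            i += 1;
        }
    }
    out
}
```

Because bad elements become bogus nodes rather than fatal errors, one typo inside a list doesn't prevent the rest of the list from parsing, which is exactly why recovery sets include the separator and terminator tokens.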
### Implement Error Recovery

Error recovery wraps invalid tokens in `Bogus` nodes:

```rust
// Recovery set includes:
// - List terminator tokens (e.g., ']', '}')
// - Statement terminators (e.g., ';')
// - List separators (e.g., ',')
let recovery_set = token_set![T![']'], T![,], T![;]];

parsed_element.or_recover(
    p,
    &ParseRecoveryTokenSet::new(JS_BOGUS_EXPRESSION, recovery_set),
    expected_expression_error,
)
```
### Handle Conditional Syntax

For syntax that is only valid in certain contexts (e.g., strict mode):

```rust
fn parse_with_statement(p: &mut Parser) -> ParsedSyntax {
    if !p.at(T![with]) {
        return Absent;
    }
    let m = p.start();
    p.bump(T![with]);
    parenthesized_expression(p).or_add_diagnostic(p, expected_expression);
    parse_statement(p).or_add_diagnostic(p, expected_statement);
    let with_stmt = m.complete(p, JS_WITH_STATEMENT);

    // Mark as invalid in strict mode
    let conditional = StrictMode.excluding_syntax(p, with_stmt, |p, marker| {
        p.err_builder(
            "`with` statements are not allowed in strict mode",
            marker.range(p),
        )
    });

    Present(conditional.or_invalid_to_bogus(p))
}
```
### Test Parser

Create test files in `tests/`:

```
crates/biome_html_parser/tests/
├── html_specs/
│   ├── ok/
│   │   ├── simple_element.html
│   │   └── nested_elements.html
│   └── error/
│       ├── unclosed_tag.html
│       └── invalid_syntax.html
└── html_test.rs
```

Run the tests:

```shell
cd crates/biome_html_parser
cargo test
```
## Tips

- **Presence test**: Always return `Absent` if the first token doesn't match - never progress parsing before returning `Absent`
- **Required vs optional**: Use `p.expect()` for required tokens, `p.eat()` for optional ones
- **Missing markers**: Use `.or_add_diagnostic()` for required nodes to add missing markers and errors
- **Error recovery**: Include list terminators, separators, and statement boundaries in recovery sets
- **Bogus nodes**: Check the grammar for which `BOGUS_*` node types are valid in your context
- **Checkpoints**: Use `p.checkpoint()` to save state and `p.rewind()` if parsing fails
- **Lookahead**: Use `p.at()` to check tokens, `p.nth_at()` for lookahead beyond the current token
- **Lists are mandatory**: Always create list nodes even if empty - use `parse_list()`, not `parse_list().ok()`
## Common Patterns

```rust
// Optional token
if p.eat(T![async]) {
    // handle async
}

// Required token with error
p.expect(T!['{']);

// Optional node
parse_type_annotation(p).ok();

// Required node with error
parse_expression(p).or_add_diagnostic(p, expected_expression);

// Lookahead
if p.at(T![if]) || p.at(T![for]) {
    // handle control flow
}

// Checkpoint for backtracking
let checkpoint = p.checkpoint();
if parse_something(p).is_absent() {
    p.rewind(checkpoint);
    parse_something_else(p);
}
```
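The checkpoint pattern is worth internalizing: save the cursor before a speculative parse and restore it on failure, so a failed attempt leaves no trace. A stripped-down illustration with a toy parser (Biome's real `checkpoint()` also snapshots emitted events and diagnostics, not just the position; all names here are hypothetical):

```rust
// Minimal sketch of checkpoint/rewind over a token cursor.
struct MiniParser<'t> {
    tokens: &'t [&'t str],
    position: usize,
}

#[derive(Clone, Copy)]
struct Checkpoint(usize);

impl<'t> MiniParser<'t> {
    fn checkpoint(&self) -> Checkpoint {
        Checkpoint(self.position)
    }

    fn rewind(&mut self, c: Checkpoint) {
        self.position = c.0;
    }

    /// Consume `token` if it is next; return whether it was consumed.
    fn eat(&mut self, token: &str) -> bool {
        if self.tokens.get(self.position) == Some(&token) {
            self.position += 1;
            true
        } else {
            false
        }
    }

    /// Speculatively parse "let name"; on failure rewind so the
    /// caller can try another rule from the same position.
    fn try_parse_let(&mut self) -> bool {
        let cp = self.checkpoint();
        if self.eat("let") && self.eat("name") {
            true
        } else {
            self.rewind(cp);
            false
        }
    }
}
```

Without the rewind, a failed speculative parse would leave the cursor mid-rule and corrupt whatever the caller tries next.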
## References

- Full guide: `crates/biome_parser/CONTRIBUTING.md`
- Grammar examples: `xtask/codegen/*.ungram`
- Parser examples: `crates/biome_js_parser/src/syntax/`
- Error recovery: Search for `ParseRecoveryTokenSet` in existing parsers
gemini-cli55