data
Trigger Intents
Use this skill when user asks to:
- search strings/bytes/patterns quickly
- map string evidence to code usage
- investigate raw data-level indicators (IOCs, constants, signatures)
Route to:
xrefsfor relationship expansion from matched addressesanalysisfor triage synthesis from data signalsdebuggerwhen findings should drive patch/breakpoint actions
Do This First (Warm-Start Sequence)
-- 1) Validate string availability
SELECT string_count();
-- 2) Sample high-value strings
SELECT content, printf('0x%X', address) AS addr
FROM strings
WHERE length >= 8
ORDER BY length DESC
LIMIT 40;
-- 3) If expected strings are missing, rebuild once
SELECT rebuild_strings();
Interpretation guidance:
- Strings are often quickest behavioral clues; pivot to
xrefsimmediately for execution context. - For opcode/pattern hunts, prefer
search_bytes()over full instruction scans.
Failure and Recovery
- No strings or unexpectedly low count:
- Run
rebuild_strings()and validate withstring_count().
- Run
- Too many false positives:
- Increase specificity (
LIKE, regex-like pattern narrowing, module/function join filters).
- Increase specificity (
- Byte pattern search too broad:
- Restrict by range or post-filter via
func_start(...).
- Restrict by range or post-filter via
- Need named functions, labels, structs, or members instead of string contents:
- Use
grep, notstringsorsearch_bytes().
- Use
Handoff Patterns
data->xrefsto convert data hits into code paths/functions.data->analysisfor risk scoring and campaign-level insight.data->debuggerfor actionable patch/watchpoint setup.
strings
String literals found in the binary. IDA maintains a cached string list that can be configured.
| Column | Type | Description |
|---|---|---|
address |
INT | String address |
length |
INT | String length |
type |
INT | String type (raw encoding bits) |
type_name |
TEXT | Type name: ascii, utf16, utf32 |
width |
INT | Char width (0=1-byte, 1=2-byte, 2=4-byte) |
width_name |
TEXT | Width name: 1-byte, 2-byte, 4-byte |
layout |
INT | String layout (0=null-terminated, 1-3=pascal) |
layout_name |
TEXT | Layout name: termchr, pascal1, pascal2, pascal4 |
encoding |
INT | Encoding index (0=default) |
content |
TEXT | String content (the actual text — not value or text) |
String Type Encoding: IDA stores string type as a 32-bit value:
- Bits 0-1: Width (0=1B/ASCII, 1=2B/UTF-16, 2=4B/UTF-32)
- Bits 2-7: Layout (0=TERMCHR, 1=PASCAL1, 2=PASCAL2, 3=PASCAL4)
- Bits 8-15: term1 (first termination character)
- Bits 16-23: term2 (second termination character)
- Bits 24-31: encoding index
-- Find error messages
SELECT content, printf('0x%X', address) as addr FROM strings WHERE content LIKE '%error%';
-- ASCII strings only
SELECT * FROM strings WHERE type_name = 'ascii';
-- UTF-16 strings (common in Windows)
SELECT * FROM strings WHERE type_name = 'utf16';
-- Count strings by type
SELECT type_name, layout_name, COUNT(*) as count
FROM strings GROUP BY type_name, layout_name ORDER BY count DESC;
Important: For new analysis (exe/dll), strings are auto-built. For existing databases (i64/idb), strings are already saved. If you see 0 strings unexpectedly, run SELECT rebuild_strings() once to rebuild the list. See String List Functions section below.
String References (explicit join pattern)
Use strings + xrefs + funcs directly. This is the canonical pattern.
-- Find call sites/functions referencing error-like strings
SELECT
s.content as string_value,
printf('0x%X', x.from_ea) as ref_addr,
func_at(x.from_ea) as func_name
FROM strings s
JOIN xrefs x ON x.to_ea = s.address
WHERE s.content LIKE '%error%' OR s.content LIKE '%fail%'
ORDER BY func_name, ref_addr;
-- Functions with most string references
SELECT
func_at(x.from_ea) as func_name,
COUNT(*) as string_refs
FROM strings s
JOIN xrefs x ON x.to_ea = s.address
GROUP BY func_name
ORDER BY string_refs DESC
LIMIT 10;
SQL Functions — Byte Access (Read-Only)
| Function | Description |
|---|---|
bytes(addr, n) |
Read n bytes as hex string |
bytes_raw(addr, n) |
Read n bytes as BLOB |
SQL Functions — Binary Search
| Function | Description |
|---|---|
search_bytes(pattern) |
Find all matches, returns JSON array |
search_bytes(pattern, start, end) |
Search within address range |
search_first(pattern) |
First match address (or NULL) |
search_first(pattern, start, end) |
First match in range |
Pattern syntax (IDA native):
"48 8B 05"- Exact bytes (hex, space-separated)"48 ? 05"or"48 ?? 05"-?= any byte wildcard (whole byte only)"(01 02 03)"- Alternatives (match any of these bytes)
Note: Nibble wildcards and regex are not supported in byte patterns.
Example:
-- Find all matches for a pattern
SELECT search_bytes('48 8B ? 00');
-- Parse JSON results
SELECT json_extract(value, '$.address') as addr
FROM json_each(search_bytes('48 89 ?'))
LIMIT 10;
-- First match only
SELECT printf('0x%llX', search_first('CC CC CC'));
-- Search with alternatives
SELECT search_bytes('E8 (01 02 03 04)');
Optimization Pattern: Find functions using specific instruction
To answer "How many functions use RDTSC instruction?" efficiently:
-- Count unique functions containing RDTSC (opcode: 0F 31)
SELECT COUNT(DISTINCT func_start(json_extract(value, '$.address'))) as count
FROM json_each(search_bytes('0F 31'))
WHERE func_start(json_extract(value, '$.address')) IS NOT NULL;
-- List those functions with names
SELECT DISTINCT
func_start(json_extract(value, '$.address')) as func_ea,
name_at(func_start(json_extract(value, '$.address'))) as func_name
FROM json_each(search_bytes('0F 31'))
WHERE func_start(json_extract(value, '$.address')) IS NOT NULL;
This is much faster than scanning all disassembly lines because:
search_bytes()uses native binary searchfunc_start()is O(1) lookup in IDA's function index
Choose the Right Search Surface
- Use
grepfor named entities such as functions, labels, structs, enums, and members. - Use
stringswhen the user is searching literal string contents inside the binary. - Use
search_bytes()when the target is a raw byte or opcode pattern.
SQL Functions — String List Functions
IDA maintains a cached list of strings. Use rebuild_strings() to detect and cache strings.
| Function | Description |
|---|---|
rebuild_strings() |
Rebuild with ASCII + UTF-16, minlen 5 (default) |
rebuild_strings(minlen) |
Rebuild with custom minimum length |
rebuild_strings(minlen, types) |
Rebuild with custom length and type mask |
string_count() |
Get current string count (no rebuild) |
Type mask values:
1= ASCII only (STRTYPE_C)2= UTF-16 only (STRTYPE_C_16)4= UTF-32 only (STRTYPE_C_32)3= ASCII + UTF-16 (default)7= All types
-- Check current string count
SELECT string_count();
-- Rebuild with defaults (ASCII + UTF-16, minlen 5)
SELECT rebuild_strings();
-- Rebuild with shorter minimum length
SELECT rebuild_strings(4);
-- Rebuild with specific types
SELECT rebuild_strings(5, 1); -- ASCII only
SELECT rebuild_strings(5, 7); -- All types (ASCII + UTF-16 + UTF-32)
-- Typical workflow: rebuild then query
SELECT rebuild_strings();
SELECT * FROM strings WHERE content LIKE '%error%';
IMPORTANT - Agent Behavior for String Queries: When the user asks about strings (e.g., "show me the strings", "what strings are in this binary"):
- First run
SELECT rebuild_strings()to ensure strings are detected - Then query the
stringstable
The rebuild_strings() function configures IDA's string detection with sensible defaults (ASCII + UTF-16, minimum length 5) and rebuilds the string list. This ensures the user gets results even if the database had no prior string analysis.
Performance Rules
| Table/Function | Architecture | Notes |
|---|---|---|
strings |
Cached | Rebuilt on demand via rebuild_strings(); fast once cached |
search_bytes() |
Native binary search | Much faster than iterating instructions table |
bytes() |
Direct read | O(1) per address, no table overhead |
Key rules:
- Always call
rebuild_strings()before the first string query on a new database or after making code/data changes that may create new strings. search_bytes()uses IDA's native binary search engine — it scans raw bytes directly, bypassing the instruction decoder. For "find functions containing opcode X",search_bytes()+func_start()is orders of magnitude faster than scanning theinstructionstable.stringstable is a snapshot of the cached string list. If IDA's analysis creates new strings after your initial query, callrebuild_strings()again.- For cross-referencing strings with functions, the
strings + xrefs + funcsJOIN pattern is canonical — IDA's xref index makes the JOIN fast.
Advanced Data Patterns (CTEs)
Security-relevant string triage
Categorize strings by security relevance for rapid threat assessment:
-- Categorize strings by security relevance
SELECT rebuild_strings();
WITH categorized AS (
SELECT address, content,
CASE
WHEN content LIKE '%password%' OR content LIKE '%passwd%' OR content LIKE '%secret%'
THEN 'credential'
WHEN content LIKE '%http://%' OR content LIKE '%https://%' OR content LIKE '%ftp://%'
THEN 'url'
WHEN content LIKE '%error%' OR content LIKE '%fail%' OR content LIKE '%exception%'
THEN 'error'
WHEN content LIKE '%debug%' OR content LIKE '%trace%' OR content LIKE '%assert%'
THEN 'debug'
WHEN content LIKE '%.exe%' OR content LIKE '%.dll%' OR content LIKE '%.sys%'
THEN 'file_path'
WHEN content LIKE '%HKEY_%' OR content LIKE '%SOFTWARE\\%'
THEN 'registry'
ELSE 'other'
END AS category
FROM strings
WHERE length >= 5
)
SELECT category, COUNT(*) AS count,
GROUP_CONCAT(SUBSTR(content, 1, 60), ' | ') AS samples
FROM categorized
WHERE category != 'other'
GROUP BY category
ORDER BY count DESC;
Combine string references with function size for suspicion scoring
Functions referencing security-relevant strings AND having significant size are high-priority targets:
-- Suspicious functions: reference interesting strings AND are non-trivial
WITH interesting_strings AS (
SELECT address, content FROM strings
WHERE content LIKE '%password%' OR content LIKE '%encrypt%'
OR content LIKE '%decrypt%' OR content LIKE '%http%'
OR content LIKE '%socket%' OR content LIKE '%connect%'
),
string_funcs AS (
SELECT DISTINCT func_start(x.from_ea) AS func_addr,
s.content AS matched_string
FROM interesting_strings s
JOIN xrefs x ON x.to_ea = s.address
WHERE func_start(x.from_ea) IS NOT NULL
)
SELECT func_at(sf.func_addr) AS func_name,
printf('0x%X', sf.func_addr) AS addr,
f.size AS func_size,
GROUP_CONCAT(sf.matched_string, ' | ') AS strings_referenced
FROM string_funcs sf
JOIN funcs f ON f.address = sf.func_addr
GROUP BY sf.func_addr
ORDER BY f.size DESC
LIMIT 20;
Find functions referencing both crypto-related and network-related strings
Cross-category correlation for identifying data exfiltration or C2 communication:
-- Functions touching both crypto and network strings
WITH crypto_refs AS (
SELECT DISTINCT func_start(x.from_ea) AS func_addr
FROM strings s JOIN xrefs x ON x.to_ea = s.address
WHERE s.content LIKE '%crypt%' OR s.content LIKE '%aes%'
OR s.content LIKE '%cipher%' OR s.content LIKE '%hash%'
),
network_refs AS (
SELECT DISTINCT func_start(x.from_ea) AS func_addr
FROM strings s JOIN xrefs x ON x.to_ea = s.address
WHERE s.content LIKE '%socket%' OR s.content LIKE '%connect%'
OR s.content LIKE '%send%' OR s.content LIKE '%recv%'
OR s.content LIKE '%http%'
)
SELECT func_at(c.func_addr) AS func_name,
printf('0x%X', c.func_addr) AS addr
FROM crypto_refs c
JOIN network_refs n ON n.func_addr = c.func_addr;
More from allthingsida/idasql-skills
disassembly
Query IDA disassembly. Use when asked about functions, segments, instructions, blocks, operands, control flow, or raw code structure.
11xrefs
Analyze IDA cross-references. Use when asked about callers, callees, imports, data refs, call graphs, or dependency chains.
9debugger
IDA debugger operations. Use when asked to set breakpoints, patch bytes, add conditions, or manage a patch inventory.
9storage
Persistent key-value storage in IDA databases. Use when asked to store metadata, track progress, or persist session state via netnode_kv.
9grep
Search named IDA entities by pattern. Use when asked to find functions, labels, types, or members by name, or to seed xref/decompiler workflows from a name lookup.
8connect
Connect to IDA databases and bootstrap sessions. Use when starting analysis, routing to other skills, or setting up CLI/HTTP/MCP connections.
8