watchlist-import
Watchlist Import
Import a watchlist for: $ARGUMENTS
Overview
- Implementation status: code-backed
- Local entry script:
<bundle-root>/watchlist-import/run.py - Primary purpose: normalize raw universe input into resolved and unresolved items without silently guessing ambiguous names
- Research layer: universe formation (Stage 1: Strategy Design & Hypothesis Formation - Universe definition)
- Workflow stage: stage 1
Strategy Design & Universe Formation - Local executor guarantee: parse raw text or supported files, normalize symbol-like items, preserve unresolved entries, and return a structured import summary
Use When
- The user starts with pasted symbols, symbol-like text, or a watchlist file.
- The caller needs symbol normalization before screening, dashboards, or repeat coverage.
- The user wants unresolved names separated from recognized symbols instead of guessed mapping.
- The user wants to build a universe from external sources (spreadsheets, text files, web scraping results).
- The user wants to validate and clean a symbol list before analysis.
Do Not Use When
- The user already has a clean symbol list and wants ranking. Use
market-screenordecision-dashboard. - The user wants post-import grouping, review cadence, or repeated tracking. Use
watchlist. - The user wants one-symbol analysis rather than universe construction.
- The user wants portfolio construction with position sizing. Use
decision-supportorstrategy-design.
Inputs
- Direct positional items interpreted as raw symbol-like text.
- Optional
--text TEXT: explicit text payload as the watchlist source. - Optional
--file PATH: load from a local text or.xlsxfile. - Current implementation scope: text and
.xlsxparsing only. - Universe-construction note:
- import does not validate liquidity, sector fit, tradeability, or strategy fit
- it only resolves symbol identity as far as the local parser can do so safely
Execution
Step 1: Define import source and objectives
Before importing, clarify the source and purpose:
Import source types:
- Direct text input (pasted symbols, comma-separated, space-separated)
- Text file (one symbol per line, or delimited)
- Spreadsheet file (.xlsx, .csv with symbol column)
- Mixed format (symbols + company names + other text)
- Web scraping result (HTML table, JSON, etc.)
Import objectives:
- Build initial universe for screening
- Clean and validate existing watchlist
- Merge multiple watchlists
- Convert company names to symbols
- Validate symbol format and existence
Expected output:
- Clean symbol list (resolved symbols only)
- Unresolved items list (for manual review)
- Import statistics (total, resolved, unresolved, confidence)
- Ready for downstream screening or analysis
Step 2: Parse and normalize input
Apply systematic parsing and normalization:
Text parsing:
- Delimiter detection: Identify separators (comma, space, newline, tab, semicolon)
- Symbol extraction: Extract symbol-like patterns (6-digit codes, alphanumeric with dots)
- Company name detection: Identify Chinese company names (contains 公司, 股份, 集团, etc.)
- Noise removal: Remove common non-symbol text (headers, labels, punctuation)
Symbol normalization:
- A-share format: 6-digit code (e.g., 600519, 000001, 300750)
- Board prefix: Add board prefix if missing (SH for 600xxx/601xxx/603xxx/688xxx, SZ for 000xxx/001xxx/002xxx/300xxx)
- Case normalization: Uppercase for consistency
- Padding: Pad short codes to 6 digits if needed (e.g., 1 → 000001)
Company name resolution:
- Exact match: Look up company name in symbol database
- Fuzzy match: Try partial matching for common abbreviations
- Ambiguity handling: If multiple matches, mark as unresolved
- No match: Keep in unresolved list for manual review
Spreadsheet parsing:
- Column detection: Identify symbol column (by header or content pattern)
- Multi-column support: Extract symbols from multiple columns if present
- Row filtering: Skip header rows, empty rows, total rows
- Data type handling: Convert numeric codes to strings with leading zeros
Step 3: Validate and categorize symbols
Validate each resolved symbol:
Symbol format validation:
- 6-digit numeric code
- Valid board prefix (SH or SZ)
- Matches A-share symbol pattern
- No invalid characters
Symbol existence validation:
- Symbol exists in A-share market
- Symbol is not delisted
- Symbol has recent trading data
- Symbol board matches prefix
Symbol categorization:
- Main board: 600xxx, 601xxx, 603xxx (Shanghai), 000xxx, 001xxx (Shenzhen)
- ChiNext: 300xxx (Shenzhen)
- STAR Market: 688xxx (Shanghai)
- Beijing Stock Exchange: 8xxxxx, 4xxxxx
- ST/*ST: Special treatment stocks (higher risk)
Confidence scoring (0-100):
- High confidence (90-100): Valid 6-digit code, exists in market, has recent data
- Medium confidence (70-89): Valid format, exists in market, but stale data or suspended
- Low confidence (50-69): Valid format, but cannot verify existence
- Unresolved (0-49): Invalid format, ambiguous, or no match found
Step 4: Handle unresolved items
Process items that cannot be resolved:
Unresolved categories:
- Invalid format: Not a valid A-share symbol format
- Ambiguous: Multiple possible matches (e.g., company name matches multiple symbols)
- Not found: Valid format but symbol does not exist
- Delisted: Symbol was valid but is now delisted
- Foreign: Non A-share symbol (H-share, US stock, etc.)
For each unresolved item:
- Original text (as provided by user)
- Reason for non-resolution (invalid format, ambiguous, not found, etc.)
- Suggested actions (manual lookup, format correction, removal)
- Possible matches (if ambiguous, list candidates)
Unresolved handling strategies:
- Preserve for manual review: Keep in separate list for user to resolve
- Suggest corrections: Provide likely corrections based on fuzzy matching
- Flag for removal: Clearly invalid items that should be removed
- Request clarification: Ambiguous items that need user input
Step 5: Generate import summary
Organize results into structured report:
Part 1: Import Summary
- Import source (text, file, spreadsheet)
- Import timestamp
- Total items processed
- Resolved symbols (count and %)
- Unresolved items (count and %)
- Average confidence score
Part 2: Resolved Symbols For each resolved symbol:
- Symbol (with board prefix)
- Company name (if available)
- Board (Main/ChiNext/STAR/Beijing)
- Confidence score (0-100)
- Status (active/suspended/ST/*ST)
- Latest price and date (if available)
Part 3: Resolved Symbol Statistics
- Board distribution (% Main, ChiNext, STAR, Beijing)
- Status distribution (% active, suspended, ST/*ST)
- Confidence distribution (% high, medium, low)
- Sector distribution (if available)
- Market cap distribution (if available)
Part 4: Unresolved Items For each unresolved item:
- Original text
- Reason for non-resolution
- Suggested action
- Possible matches (if any)
Part 5: Unresolved Item Statistics
- Unresolved by category (invalid format, ambiguous, not found, delisted, foreign)
- Unresolved by source (which part of input had most issues)
- Resolvability assessment (% that could be resolved with manual effort)
Part 6: Import Quality Assessment
- Import quality score (0-100)
- Based on resolution rate, confidence scores, and data completeness
- Data completeness (% with company name, price, status)
- Source quality (clean vs. noisy input)
- Screening readiness (ready/needs cleanup/major issues)
Part 7: Recommended Next Steps
- If high quality (>90% resolved, high confidence):
- Ready for
market-screenordecision-dashboard
- Ready for
- If medium quality (70-90% resolved):
- Review unresolved items, then proceed to screening
- If low quality (<70% resolved):
- Manual cleanup required before screening
- Suggest using
watchlistfor ongoing management
Step 6: Run the local executor
python3 <bundle-root>/watchlist-import/run.py [--text TEXT] [--file PATH] [items...]
Step 7: Review result as universe-intake artifact
When delivering results, maintain proper framing:
Import interpretation:
- This is symbol normalization and validation, not investment recommendation
- Resolved symbols are format-valid and exist in market, not necessarily good investments
- Confidence scores reflect data quality, not investment quality
- Unresolved items require manual review, not automatic exclusion
Source transparency:
- State import source (text, file, spreadsheet, mixed)
- State source quality (clean, noisy, mixed format)
- State whether source looked like pure symbols or mixed content
- State any parsing assumptions made
Quality assessment:
- State import quality score and what it means
- State resolution rate and confidence distribution
- State whether result is screening-ready or needs cleanup
- State specific issues found (invalid formats, ambiguous names, etc.)
Ambiguity preservation:
- Do not silently coerce ambiguous names into symbols
- Keep unresolved items explicit with reasons
- Provide suggested actions for each unresolved item
- Do not guess or fabricate symbol mappings
Validation limitations:
- Import validates format and existence, not tradeability
- Import does not validate liquidity, listing status, or strategy fit
- Import does not check for duplicates across different formats (e.g., 600519 vs. SH600519)
- Import does not validate sector fit, market cap range, or other strategy criteria
Step 8: Route to appropriate next skill
Based on import quality, route to next step:
High quality import (>90% resolved):
market-screen: Rank symbols by technical strengthdecision-dashboard: Generate batch decision summarywatchlist: Save for ongoing monitoring
Medium quality import (70-90% resolved):
- Manual review of unresolved items first
- Then proceed to
market-screenordecision-dashboard - Consider using
watchlistfor ongoing cleanup
Low quality import (<70% resolved):
- Manual cleanup required
- Use
watchlistfor iterative refinement - Consider re-importing with cleaner source
For individual symbol analysis:
market-brief: Quick overview of specific symbolsanalysis: Full thesis development for selected symbols
Output Contract
- Minimum local executor output: human-readable text beginning with
导入结果. - Core fields: total item count, resolved symbol count, unresolved count, resolved entries, unresolved entries.
- Side effects: stores a session-history entry for the import operation.
- Caller-facing delivery standard:
- Seven-part structure: Import summary, resolved symbols, resolved statistics, unresolved items, unresolved statistics, quality assessment, recommended next steps
- Source transparency: State source form (text, file, spreadsheet, mixed) and quality (clean, noisy)
- Resolution accounting: Distinguish resolved symbols from unresolved items with specific reasons
- Confidence scoring: Provide confidence scores (0-100) for each resolved symbol
- Ambiguity preservation: Keep unresolved entries explicit, do not silently coerce or guess
- Quality assessment: Import quality score, resolution rate, confidence distribution, screening readiness
- Validation scope: State what was validated (format, existence) and what was NOT (liquidity, tradeability, strategy fit)
- Next steps guidance: Recommend appropriate downstream skills based on import quality
- No investment claims: Resolved symbols are format-valid, not investment recommendations
- Non-guarantees:
- No liquidity, listing-status, survivorship, or strategy-fit validation is guaranteed here
- No durable local watchlist store is created by this skill alone (use
watchlistfor persistence) - No duplicate detection across different symbol formats
- No sector, market cap, or other strategy criteria validation
Failure Handling
- Parse and argument errors: non-zero exit with a readable
命令错误message. - File read or parse failures: readable failure text beginning with
执行失败:. - Unrecognized names: stay in the unresolved list instead of being guessed.
- If the source format is unsupported, say so directly instead of partial silent import.
- Ambiguous company names: list all possible matches in unresolved items, do not pick one arbitrarily.
- Invalid symbol formats: keep in unresolved list with specific format error.
- Delisted symbols: flag as delisted in unresolved list, do not silently include.
Key Rules
- Treat import as preprocessing, not as recommendation.
- Preserve ambiguity and unresolved rows explicitly.
- Do not silently coerce names into symbols.
- Use the import result as the intake gate before ranking, dashboarding, or repeated coverage.
- Confidence scoring is mandatory. Every resolved symbol must have confidence score.
- Unresolved categorization is mandatory. Every unresolved item must have reason.
- Quality assessment is mandatory. Provide import quality score and screening readiness.
- Source transparency is mandatory. State source type and quality.
- Validation scope must be explicit. State what was validated and what was NOT.
- No silent guessing. If ambiguous, keep unresolved with suggestions.
- No fabricated mappings. Only resolve when confident.
- Routing guidance must be specific. Recommend next skill based on import quality.
Composition
- Natural upstream step for
market-screen,decision-dashboard, andwatchlist. - Can also prepare symbol lists for
analysisand repeated one-symbol workflows. - Should be followed by
watchlistfor persistent storage and ongoing management. - High quality imports can proceed directly to
market-screenordecision-dashboard. - Low quality imports should route to
watchlistfor iterative cleanup. - Resolved symbols can feed into any single-symbol skill (
market-brief,analysis, etc.).