Web Scraper Pro

Intelligent web page fetcher that converts any URL into clean, readable Markdown content. Uses a multi-layer fallback strategy to ensure maximum compatibility.

💰 Pricing: 0.001 USDT per web fetch

โš ๏ธ CRITICAL: Payment Integration (MUST READ FIRST)

This skill is monetized via SkillPay. Before fetching ANY URL, you MUST execute the payment verification step.

Payment Configuration

# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
# SkillPay Billing Integration
# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
import requests

BILLING_API_URL = 'https://skillpay.me'
BILLING_API_KEY = 'sk_d6d26f291dafc43acc8c2b6215b87cbc9b19c7d093aebdb2deeba42a3a0fea4b'
SKILL_ID = '4fb2d57e-e583-4ca3-8170-52df37a6572b'
HEADERS = {'X-API-Key': BILLING_API_KEY, 'Content-Type': 'application/json'}

Billing API Functions (MANDATORY)

โš ๏ธ THIS CODE MUST BE EXECUTED BEFORE ANY WEB FETCH:

import requests
import os

BILLING_API_URL = 'https://skillpay.me'
BILLING_API_KEY = 'sk_d6d26f291dafc43acc8c2b6215b87cbc9b19c7d093aebdb2deeba42a3a0fea4b'
SKILL_ID = '4fb2d57e-e583-4ca3-8170-52df37a6572b'
HEADERS = {'X-API-Key': BILLING_API_KEY, 'Content-Type': 'application/json'}

# ① Check balance
def check_balance(user_id: str) -> float:
    resp = requests.get(
        f'{BILLING_API_URL}/api/v1/billing/balance',
        params={'user_id': user_id},
        headers=HEADERS
    )
    return resp.json()['balance']  # USDT amount

# ② Charge per call
def charge_user(user_id: str) -> dict:
    resp = requests.post(
        f'{BILLING_API_URL}/api/v1/billing/charge',
        headers=HEADERS,
        json={
            'user_id': user_id,
            'skill_id': SKILL_ID,
            'amount': 0.001,  # USDT per call
        }
    )
    data = resp.json()
    if data['success']:
        return {'ok': True, 'balance': data['balance']}
    # Insufficient balance → auto returns a BNB Chain USDT payment link
    return {'ok': False, 'balance': data['balance'], 'payment_url': data.get('payment_url')}

# ③ Generate payment link
def get_payment_link(user_id: str, amount: float) -> str:
    resp = requests.post(
        f'{BILLING_API_URL}/api/v1/billing/payment-link',
        headers=HEADERS,
        json={'user_id': user_id, 'amount': amount}
    )
    return resp.json()['payment_url']  # BNB Chain USDT

USER_ID = os.environ.get("SKILLPAY_USER_ID", "anonymous_user")
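
The success/failure branch in charge_user can be isolated as a pure helper, which makes it easy to exercise without network access. A minimal sketch (the helper name parse_charge_response is illustrative; the field names success/balance/payment_url follow the response shape assumed above):

```python
def parse_charge_response(data: dict) -> dict:
    """Normalize a SkillPay charge response into the {'ok', 'balance', 'payment_url'} shape."""
    if data.get('success'):
        return {'ok': True, 'balance': data.get('balance', 0.0)}
    # Failure: pass the top-up link through (may be None if the API omitted it)
    return {
        'ok': False,
        'balance': data.get('balance', 0.0),
        'payment_url': data.get('payment_url'),
    }
```

charge_user could then end with `return parse_charge_response(resp.json())`.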

Payment Workflow Integration

# ========================================
# STEP 0: CHARGE USER (MANDATORY)
# ========================================
# This must be the FIRST step before ANY web fetch

charge_result = charge_user(USER_ID)

if not charge_result['ok']:
    print(f"""
╔═══════════════════════════════════════════════════════════════╗
║  💳 INSUFFICIENT BALANCE                                      ║
║                                                               ║
║  This web fetch costs 0.001 USDT.                             ║
║  Current balance: {charge_result['balance']:<41}║
║                                                               ║
║  Please top up at:                                            ║
║  {charge_result['payment_url']:<56}║
║                                                               ║
║  After payment, please retry your request.                    ║
╚═══════════════════════════════════════════════════════════════╝
    """)
    raise SystemExit("Insufficient balance for web fetch")

print(f"✅ Charged 0.001 USDT. Remaining balance: {charge_result['balance']} USDT")

Multi-Layer Fetch Strategy

This skill uses an intelligent multi-layer fallback strategy to maximize compatibility:

  Layer    Service       URL prefix             Characteristics                              Best for
  Layer 1  markdown.new  https://markdown.new/  Cloudflare-native, 3-mode fallback, fastest  Most sites (first choice)
  Layer 2  defuddle.md   https://defuddle.md/   Open-source, lightweight, YAML frontmatter   Non-Cloudflare sites
  Layer 3  Jina Reader   https://r.jina.ai/     AI-driven, precise content extraction        Complex pages
  Layer 4  Scrapling     (Python library)       Adaptive scraper, strong anti-bot evasion    Last resort

Layer 1: markdown.new (first choice, fastest)

A Cloudflare-powered URL→Markdown conversion service with three built-in fallback modes:

  • Native Markdown: content negotiation via Accept: text/markdown
  • Workers AI: AI-driven HTML→Markdown conversion
  • Browser rendering: a headless browser handles JS-heavy pages
import requests

def fetch_via_markdown_new(url: str, method: str = "auto", retain_images: bool = True) -> str | None:
    """
    Layer 1: fetch a page via markdown.new.

    Args:
        url: target page URL
        method: conversion method - "auto" | "ai" | "browser"
        retain_images: whether to keep image links

    Returns:
        The page content as Markdown, or None on failure.
    """
    api_url = "https://markdown.new/"

    try:
        response = requests.post(
            api_url,
            headers={"Content-Type": "application/json"},
            json={
                "url": url,
                "method": method,
                "retain_images": retain_images
            },
            timeout=60
        )

        if response.status_code == 200:
            token_count = response.headers.get("x-markdown-tokens", "unknown")
            print(f"✅ [markdown.new] fetch succeeded (tokens: {token_count})")
            return response.text
        elif response.status_code == 429:
            print("⚠️ [markdown.new] rate limited, falling back to next layer...")
            return None
        else:
            print(f"⚠️ [markdown.new] returned status {response.status_code}, falling back to next layer...")
            return None

    except requests.exceptions.RequestException as e:
        print(f"⚠️ [markdown.new] request failed: {e}, falling back to next layer...")
        return None

Supported query parameters:

  • method=auto|ai|browser - selects the conversion method
  • retain_images=true|false - whether to keep images
  • Rate limit: 500 requests per IP per day

Layer 2: defuddle.md (fallback option)

An open-source web→Markdown extraction service built by the creator of the Obsidian Web Clipper.

def fetch_via_defuddle(url: str) -> str | None:
    """
    Layer 2: fetch a page via defuddle.md.

    Args:
        url: target page URL (the https:// prefix is optional)

    Returns:
        Markdown content with YAML frontmatter, or None on failure.
    """
    # defuddle takes the target URL appended directly to its path
    clean_url = url.replace("https://", "").replace("http://", "")
    api_url = f"https://defuddle.md/{clean_url}"

    try:
        response = requests.get(api_url, timeout=60)

        if response.status_code == 200 and len(response.text.strip()) > 50:
            print("✅ [defuddle.md] fetch succeeded")
            return response.text
        else:
            print(f"⚠️ [defuddle.md] empty content or failure (status: {response.status_code}), falling back to next layer...")
            return None

    except requests.exceptions.RequestException as e:
        print(f"⚠️ [defuddle.md] request failed: {e}, falling back to next layer...")
        return None

Layer 3: Jina Reader (AI content extraction)

Jina AI's reader service, well suited to complex pages.

def fetch_via_jina(url: str) -> str | None:
    """
    Layer 3: fetch a page via Jina Reader.

    Args:
        url: full target page URL

    Returns:
        The extracted main text content, or None on failure.
    """
    api_url = f"https://r.jina.ai/{url}"

    try:
        response = requests.get(
            api_url,
            headers={"Accept": "text/markdown"},
            timeout=60
        )

        if response.status_code == 200 and len(response.text.strip()) > 50:
            print("✅ [Jina Reader] fetch succeeded")
            return response.text
        else:
            print(f"⚠️ [Jina Reader] empty content or failure (status: {response.status_code}), falling back to next layer...")
            return None

    except requests.exceptions.RequestException as e:
        print(f"⚠️ [Jina Reader] request failed: {e}, falling back to next layer...")
        return None

้ขๅค–ๅŠŸ่ƒฝ: Jina ่ฟ˜ๆ”ฏๆŒๆœ็ดขๆจกๅผ https://s.jina.ai/YOUR_SEARCH_QUERY

Layer 4: Scrapling (last resort, anti-bot evasion)

A powerful adaptive scraping framework that can bypass anti-bot mechanisms such as Cloudflare Turnstile.

# Install Scrapling
pip install scrapling
# For browser features (anti-bot evasion)
pip install "scrapling[fetchers]"
scrapling install

def fetch_via_scrapling(url: str, use_stealth: bool = False) -> str | None:
    """
    Layer 4: fetch a page via Scrapling (last-resort option).

    Args:
        url: target page URL
        use_stealth: whether to use stealth mode (bypasses Cloudflare etc.)

    Returns:
        The extracted page text, or None on failure.
    """
    try:
        if use_stealth:
            from scrapling.fetchers import StealthyFetcher
            page = StealthyFetcher.fetch(url)
        else:
            from scrapling.fetchers import Fetcher
            page = Fetcher.get(url)

        if page and page.status == 200:
            # Extract the main text content:
            # strip script, style, and page-chrome tags
            for tag in page.css('script, style, nav, footer, header'):
                tag.remove()

            # Locate the main content region
            main_content = (
                page.css('article') or
                page.css('main') or
                page.css('.content, .post, .article, #content') or
                page.css('body')
            )

            if main_content:
                text = main_content[0].text_content
                print(f"✅ [Scrapling{' (stealth)' if use_stealth else ''}] fetch succeeded")
                return text

        print("⚠️ [Scrapling] fetch failed or content empty")
        return None

    except ImportError:
        print("⚠️ [Scrapling] not installed; run: pip install scrapling")
        return None
    except Exception as e:
        print(f"⚠️ [Scrapling] fetch failed: {e}")
        return None

Workflow

Phase 0: Payment Verification (MANDATORY - DO NOT SKIP)

โš ๏ธ CRITICAL: This phase MUST be completed BEFORE any other phase. DO NOT proceed without payment verification.

  1. Execute the payment verification code (see the Billing API Functions section above)
  2. If payment fails: Return the payment URL to user and STOP
  3. If payment succeeds: Proceed to Phase 1
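
The three steps above can be sketched as a gate wrapped around any fetch function; charge_fn below is a stub standing in for charge_user, and the decorator name require_payment is illustrative:

```python
import functools

def require_payment(charge_fn):
    """Decorator: run the charge first; call the wrapped fetch only if it succeeds."""
    def wrap(fetch_fn):
        @functools.wraps(fetch_fn)
        def gated(url, *args, **kwargs):
            result = charge_fn()
            if not result.get("ok"):
                # Step 2: surface the payment link and stop - no fetch happens.
                return {"success": False, "payment_url": result.get("payment_url")}
            # Step 3: payment succeeded - proceed with the fetch.
            return fetch_fn(url, *args, **kwargs)
        return gated
    return wrap

# Stub charge function simulating an insufficient balance:
deny = lambda: {"ok": False, "payment_url": "https://example.com/pay"}

@require_payment(deny)
def fetch(url):
    return {"success": True, "content": "..."}

blocked = fetch("https://example.com")
```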

Phase 1: URL Analysis & Strategy Selection (MANDATORY)

When the user provides a URL to fetch:

  1. Receive the URL: the user supplies the target address.

  2. Analyze the URL: determine the site type and the best fetch strategy.

    URL characteristics                Recommended strategy
    Plain pages / blogs / docs         Layer 1 (markdown.new) → automatic fallback
    GitHub / technical docs            Layer 1 (markdown.new) → Layer 3 (Jina)
    Login / paywall required           Tell the user authentication is needed; cannot fetch automatically
    Heavy JavaScript rendering (SPA)   Layer 1 (markdown.new, method=browser) → Layer 4 (Scrapling)
    Strict anti-bot sites              Layer 4 (Scrapling, stealth mode)

  3. Inform the user of the cost: "This fetch will be charged 0.001 USDT."
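A rough version of this routing can be automated; the heuristics below are illustrative only (a real classifier would need more signals than URL substrings):

```python
def pick_strategy(url: str) -> list:
    """Map a URL to an ordered list of layers to try, per the table above."""
    u = url.lower()
    if "github.com" in u:
        # Technical docs: markdown.new first, then Jina Reader
        return ["markdown.new", "jina-reader"]
    if "#/" in u or "app." in u:
        # Crude SPA hint: browser rendering first, Scrapling as fallback
        return ["markdown.new:browser", "scrapling"]
    # Default: the full four-layer chain
    return ["markdown.new", "defuddle.md", "jina-reader", "scrapling"]

chain = pick_strategy("https://github.com/D4Vinci/Scrapling")
# chain == ["markdown.new", "jina-reader"]
```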

Phase 2: Execute Fetch (WITH PAYMENT)

โš ๏ธ IMPORTANT: Execute payment verification BEFORE fetching!

ไฝฟ็”จไปฅไธ‹ๆ™บ่ƒฝๆŠ“ๅ–ๅ‡ฝๆ•ฐ๏ผŒ่‡ชๅŠจๆŒ‰ๅฑ‚็บงๅ›ž้€€๏ผš

import requests
import os

# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
# STEP 0: SKILLPAY BILLING (MANDATORY)
# โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

BILLING_API_URL = 'https://skillpay.me'
BILLING_API_KEY = 'sk_d6d26f291dafc43acc8c2b6215b87cbc9b19c7d093aebdb2deeba42a3a0fea4b'
SKILL_ID = '4fb2d57e-e583-4ca3-8170-52df37a6572b'
HEADERS = {'X-API-Key': BILLING_API_KEY, 'Content-Type': 'application/json'}

def charge_user(user_id: str) -> dict:
    resp = requests.post(
        f'{BILLING_API_URL}/api/v1/billing/charge',
        headers=HEADERS,
        json={'user_id': user_id, 'skill_id': SKILL_ID, 'amount': 0.001}
    )
    data = resp.json()
    if data['success']:
        return {'ok': True, 'balance': data['balance']}
    return {'ok': False, 'balance': data['balance'], 'payment_url': data.get('payment_url')}

USER_ID = os.environ.get("SKILLPAY_USER_ID", "anonymous_user")
charge_result = charge_user(USER_ID)
if not charge_result['ok']:
    print(f"""
╔═══════════════════════════════════════════════════════════════╗
║  💳 INSUFFICIENT BALANCE                                      ║
║                                                               ║
║  This web fetch costs 0.001 USDT.                             ║
║  Current balance: {charge_result['balance']:<41}║
║                                                               ║
║  Please top up at (BNB Chain USDT):                           ║
║  {charge_result['payment_url']:<56}║
║                                                               ║
║  After payment, please retry your request.                    ║
╚═══════════════════════════════════════════════════════════════╝
    """)
    raise SystemExit("Insufficient balance for web fetch")

print(f"✅ Charged 0.001 USDT. Remaining balance: {charge_result['balance']} USDT")

# ========================================
# STEP 1: INTELLIGENT MULTI-LAYER FETCH
# ========================================

def smart_fetch(url: str, prefer_method: str = "auto", retain_images: bool = True) -> dict:
    """
    Smart multi-layer fetch: try each layer in priority order until one succeeds.

    Args:
        url: target page URL
        prefer_method: conversion method for markdown.new ("auto", "ai", "browser")
        retain_images: whether to keep image links

    Returns:
        dict: {
            "success": bool,
            "content": str,        # Markdown content
            "source": str,         # which layer succeeded
            "url": str,            # original URL
            "char_count": int      # content length in characters
        }
    """
    # Ensure the URL has a scheme
    if not url.startswith(("http://", "https://")):
        url = "https://" + url

    print(f"🔍 Fetching: {url}")
    print("=" * 60)

    # --- Layer 1: markdown.new ---
    print("📡 Layer 1: trying markdown.new ...")
    content = fetch_via_markdown_new(url, method=prefer_method, retain_images=retain_images)
    if content and len(content.strip()) > 100:
        return {"success": True, "content": content, "source": "markdown.new", "url": url, "char_count": len(content)}

    # --- Layer 2: defuddle.md ---
    print("📡 Layer 2: trying defuddle.md ...")
    content = fetch_via_defuddle(url)
    if content and len(content.strip()) > 100:
        return {"success": True, "content": content, "source": "defuddle.md", "url": url, "char_count": len(content)}

    # --- Layer 3: Jina Reader ---
    print("📡 Layer 3: trying Jina Reader ...")
    content = fetch_via_jina(url)
    if content and len(content.strip()) > 100:
        return {"success": True, "content": content, "source": "jina-reader", "url": url, "char_count": len(content)}

    # --- Layer 4a: Scrapling (normal mode) ---
    print("📡 Layer 4a: trying Scrapling (normal mode) ...")
    content = fetch_via_scrapling(url, use_stealth=False)
    if content and len(content.strip()) > 100:
        return {"success": True, "content": content, "source": "scrapling", "url": url, "char_count": len(content)}

    # --- Layer 4b: Scrapling (stealth mode) ---
    print("📡 Layer 4b: trying Scrapling (stealth mode) ...")
    content = fetch_via_scrapling(url, use_stealth=True)
    if content and len(content.strip()) > 100:
        return {"success": True, "content": content, "source": "scrapling-stealth", "url": url, "char_count": len(content)}

    # All layers failed
    print("❌ All fetch methods failed")
    return {"success": False, "content": None, "source": None, "url": url, "char_count": 0}


# ========================================
# Run the fetch
# ========================================

TARGET_URL = "{URL provided by the user}"

result = smart_fetch(TARGET_URL)

if result["success"]:
    print(f"""
╔═══════════════════════════════════════════════════════════════╗
║  ✅ FETCH SUCCEEDED                                           ║
║                                                               ║
║  Source: {result['source']:<52}║
║  Chars: {result['char_count']:<50}║
║  URL: {result['url'][:50]:<52}║
╚═══════════════════════════════════════════════════════════════╝
    """)

    # Print the Markdown content
    print("\n--- Page content (Markdown) ---\n")
    print(result["content"])
else:
    print(f"""
╔═══════════════════════════════════════════════════════════════╗
║  ❌ FETCH FAILED                                              ║
║                                                               ║
║  All 4 fetch layers failed to retrieve content.               ║
║  Possible causes:                                             ║
║  - the site requires login/authentication                     ║
║  - the target URL is invalid or unreachable                   ║
║  - the site has very strong anti-bot protection               ║
║                                                               ║
║  Suggestions:                                                 ║
║  - check that the URL is correct                              ║
║  - try providing the page source obtained after logging in    ║
╚═══════════════════════════════════════════════════════════════╝
    """)

Phase 3: Content Processing & Output

After a successful fetch:

  1. Return the Markdown content directly to the user
  2. If the content is too long (over 50,000 characters), truncate it intelligently and tell the user
  3. Record the transaction ID for payment tracking
# Post-process content
def process_content(content: str, max_chars: int = 50000) -> str:
    """Truncate overly long content."""
    if len(content) <= max_chars:
        return content

    # Smart truncation: cut at a paragraph boundary
    truncated = content[:max_chars]
    last_newline = truncated.rfind('\n\n')
    if last_newline > max_chars * 0.8:
        truncated = truncated[:last_newline]

    truncated += f"\n\n---\n⚠️ Content too long; showing the first {len(truncated)} of {len(content)} characters."
    return truncated
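
A quick self-contained check of the paragraph-boundary rule used by process_content (restated here under a hypothetical name so it can run standalone):

```python
def truncate_at_paragraph(content: str, max_chars: int) -> str:
    """Cut at max_chars, then back up to the last blank line if one falls in the final 20%."""
    if len(content) <= max_chars:
        return content
    truncated = content[:max_chars]
    last_break = truncated.rfind("\n\n")
    if last_break > max_chars * 0.8:
        truncated = truncated[:last_break]
    return truncated

# Two 90-char paragraphs: a 100-char cut would land mid-paragraph,
# so the cut backs up to the blank line at index 90.
doc = "a" * 90 + "\n\n" + "b" * 90
out = truncate_at_paragraph(doc, 100)
# out == "a" * 90
```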

Usage Examples

Scenario 1: Fetching technical documentation

User: "Please fetch the content of https://docs.python.org/3/tutorial/index.html"

Execution flow:

  1. Payment verification → passed
  2. Layer 1 (markdown.new) → attempt the fetch
  3. Return the Python tutorial content as Markdown

Scenario 2: Fetching a GitHub README

User: "I'd like an overview of this library: https://github.com/D4Vinci/Scrapling"

Execution flow:

  1. Payment verification → passed
  2. Layer 1 (markdown.new) → GitHub pages usually succeed
  3. Return the Scrapling project's README content

Scenario 3: Fetching a site with anti-bot protection

User: "Please fetch this page: https://some-anti-bot-site.com/article/123"

Execution flow:

  1. Payment verification → passed
  2. Layer 1 → fails
  3. Layer 2 → fails
  4. Layer 3 → fails
  5. Layer 4 (Scrapling Stealth) → use stealth mode to bypass the anti-bot protection
  6. Return the extracted content

Scenario 4: Searching for information (via Jina Search)

User: "Search for 'Python asyncio best practices 2025'"
import requests
from urllib.parse import quote

def search_via_jina(query: str) -> str | None:
    """Search for information via Jina Search."""
    api_url = f"https://s.jina.ai/{quote(query, safe='')}"

    try:
        response = requests.get(api_url, timeout=60)
        if response.status_code == 200:
            return response.text
        return None
    except requests.exceptions.RequestException:
        return None

# Run the search
search_result = search_via_jina("Python asyncio best practices 2025")
print(search_result)

Prerequisites (install as needed)

Base dependency (Layers 1-3 only need requests)

pip install requests

Scrapling dependency (Layer 4 - install only when needed)

# Basic install
pip install scrapling

# Full install (browser and anti-bot features)
pip install "scrapling[fetchers]"
scrapling install
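
Because Layer 4 is an optional dependency, the skill can probe for it up front rather than catching ImportError mid-fetch; a minimal sketch:

```python
import importlib.util

def scrapling_available() -> bool:
    """True if the optional scrapling package is importable (i.e. Layer 4 is usable)."""
    return importlib.util.find_spec("scrapling") is not None
```

smart_fetch could consult this before its Layer 4a/4b attempts and skip them cleanly when Scrapling is absent.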

💰 Revenue & Analytics

Track your earnings in real-time at SkillPay Dashboard.

  • Price per fetch: 0.001 USDT
  • Your revenue share: 95%
  • Settlement: Instant (BNB Chain)

Powered by SkillPay - AI Skill Monetization Infrastructure
