WordPress Migration Best Practices
Comprehensive guide for extracting and migrating content from WordPress sites. Covers multiple extraction strategies, their trade-offs, and best practices for handling WordPress-specific content like custom plugins, WooCommerce, forms, and media.
When to Apply
Reference these guidelines when:
- Planning a migration away from WordPress
- Exporting content from WordPress for any target platform
- Dealing with complex WordPress sites (WooCommerce, custom plugins, page builders)
- Evaluating which extraction method to use
- Handling media, images, and uploads during migration
Extraction Strategies
Strategy 1: WordPress XML Export (WXR)
The built-in WordPress export tool. Go to WP Admin → Tools → Export to generate an XML file containing posts, pages, comments, categories, tags, and custom post types.
What you get:
- All posts and pages with full content (HTML)
- Categories, tags, and custom taxonomies
- Comments and comment metadata
- Custom post types and their content
- Author information
- Featured image references (URLs, not files)
- Custom field / post meta data
What you don't get:
- Actual media files (only URLs/references)
- Theme settings and customizations
- Widget configurations
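- Plugin data held in custom tables or options (form entries, plugin settings, etc.)
- Menu assignments to theme locations (the menu items themselves are included in an "All content" export)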
- Site options and settings
- Page builder layouts (stored as shortcodes or serialized data in post content)
Best for: Simple blogs and content-focused sites with standard posts/pages.
Usage:
# Parse the WXR XML export with a script
python3 scripts/parse-wxr.py wordpress-export.xml --output ./content/
# Or use wp-cli if you have server access
wp export --dir=./exports/ --post_type=post,page
Key considerations:
- Export in smaller batches if the site has 1000+ posts (the export can time out)
- Custom fields are included but may contain serialized PHP data that needs parsing
- Gutenberg blocks are delimited by HTML comments (for example <!-- wp:paragraph -->) within the post content
- Shortcodes from plugins appear as raw [shortcode] text in the export; they only render on a site where the plugin is active
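As a rough illustration of the parsing step, the sketch below reads a WXR file with Python's standard library and returns one dict per post or page. It is not the parse-wxr.py script referenced above; the function name parse_wxr_items and the output keys are illustrative, and the namespace URIs assume a WXR 1.2 export, so check the xmlns attributes at the top of your own file.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in a WXR 1.2 export (assumption: verify against
# the xmlns attributes on the <rss> element of your own file).
NS = {
    "wp": "http://wordpress.org/export/1.2/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}


def parse_wxr_items(xml_text, post_types=("post", "page")):
    """Return one dict per <item> whose wp:post_type is in post_types."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        post_type = item.findtext("wp:post_type", default="", namespaces=NS)
        if post_type not in post_types:
            continue
        # Collect custom fields (wp:postmeta) into a plain dict.
        meta = {}
        for pm in item.findall("wp:postmeta", NS):
            key = pm.findtext("wp:meta_key", default="", namespaces=NS)
            meta[key] = pm.findtext("wp:meta_value", default="", namespaces=NS)
        items.append({
            "title": item.findtext("title", default=""),
            "slug": item.findtext("wp:post_name", default="", namespaces=NS),
            "date": item.findtext("wp:post_date", default="", namespaces=NS),
            "status": item.findtext("wp:status", default="", namespaces=NS),
            "type": post_type,
            "html": item.findtext("content:encoded", default="", namespaces=NS),
            "meta": meta,
        })
    return items
```

Meta values come back as raw strings, so serialized PHP data still needs a separate decoding step. If a key repeats within one post, this sketch keeps only the last value.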
Strategy 2: Site Mirroring (HTTrack / wget)
Clone the entire rendered site as static HTML files. This captures what visitors see at crawl time, including rendered plugin output and theme styling; dynamic content is frozen as it appeared during the crawl.
What you get:
- Complete rendered HTML of every public page
- All publicly accessible media files (images, PDFs, etc.)
- CSS and JavaScript files
- The exact visual output of the site
What you don't get:
- Content structure (no frontmatter, no metadata separation)
- Draft or private content
- Admin-only pages
- Dynamic functionality (forms, search, e-commerce carts)
- Server-side logic (PHP functions, plugin behavior)
- Database content not rendered on public pages
Best for: Sites with heavy theme customization or page builder content where the rendered output is more reliable than the raw database content.
Usage:
# HTTrack — full mirror
httrack "https://example.com" -O ./mirror \
--mirror --robots=0 --depth=10
# wget — alternative approach
wget --mirror --convert-links --adjust-extension \
--page-requisites --no-parent https://example.com
Key considerations:
- Password-protected pages won't be captured (server-side auth)
- JavaScript-rendered content may not be captured (use a headless browser for SPAs)
- Look for duplicate pages: /embed/ variants, print versions, AMP pages
- Check robots.txt: some important pages may be blocked from crawlers
- Media files in wp-content/uploads/ are organized by YYYY/MM/; preserve this structure
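One way to act on the duplicate-page note is to split the mirrored file list before converting anything. The sketch below is a minimal example: the pattern list and the function name split_duplicates are assumptions covering only the /embed/, AMP, and print cases mentioned above, and should be extended after inspecting your own mirror.

```python
import re

# Path patterns that usually indicate a duplicate rendering of another page.
# Assumption: these cover common cases only (oEmbed views, AMP, print views).
DUPLICATE_PATTERNS = [
    re.compile(r"/embed/(index\.html)?$"),
    re.compile(r"/amp/(index\.html)?$"),
    re.compile(r"/print/(index\.html)?$"),
]


def split_duplicates(paths):
    """Split mirrored file paths into (keep, duplicates)."""
    keep, duplicates = [], []
    for path in paths:
        if any(p.search(path) for p in DUPLICATE_PATTERNS):
            duplicates.append(path)
        else:
            keep.append(path)
    return keep, duplicates
```

Review the duplicates list by hand before deleting anything, since a real page could legitimately end in /print/ or /amp/.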
Strategy 3: Combined Approach (Recommended)
Use both XML export AND site mirroring together. The XML export gives you structured content with metadata; the mirror gives you rendered output and media files.
Workflow:
- Export XML from WordPress admin for structured content and metadata
- Mirror the site with HTTrack/wget for rendered HTML and all media
- Cross-reference: use XML metadata (dates, tags, categories) with mirror content (clean rendered HTML)
- Use the mirror's media files as the authoritative source for images and uploads
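The cross-reference step needs a way to find the mirrored file for each permalink in the XML. A minimal sketch, assuming the mirror is laid out as root/host/path with directory-style URLs saved as index.html (wget's default layout); the function name mirror_path_for and the default root are illustrative:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def mirror_path_for(permalink, mirror_root="mirror"):
    """Guess where a mirroring tool saved the page for a given permalink.

    Assumption: files live under <root>/<host>/<path>, and URLs ending in
    a slash were saved as index.html inside that directory.
    """
    parsed = urlparse(permalink)
    path = parsed.path
    if path == "" or path.endswith("/"):
        path += "index.html"
    return str(PurePosixPath(mirror_root) / parsed.netloc / path.lstrip("/"))
```

Treat the result as a candidate path and check that the file exists. Mirroring tools rename files in some cases (query strings, added extensions), so a lookup miss does not necessarily mean the page was not captured.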
Strategy 4: Direct Database Access
If you have server/hosting access, query the WordPress MySQL database directly.
Key tables:
| Table | Content |
|---|---|
| wp_posts | All content (posts, pages, revisions, attachments) |
| wp_postmeta | Custom fields, featured images, plugin data |
| wp_terms / wp_term_taxonomy | Categories, tags, custom taxonomies |
| wp_comments | Comments with metadata |
| wp_options | Site settings, widget configs, plugin settings |
| wp_usermeta | User profile data |
Usage:
-- Export all published posts
SELECT ID, post_title, post_content, post_date, post_name, post_type
FROM wp_posts
WHERE post_status = 'publish'
AND post_type IN ('post', 'page')
ORDER BY post_date DESC;
-- Get post metadata (featured images, custom fields)
SELECT p.post_title, pm.meta_key, pm.meta_value
FROM wp_posts p
JOIN wp_postmeta pm ON p.ID = pm.post_id
WHERE p.post_status = 'publish';
Best for: Large sites where the XML export times out, or when you need access to plugin-specific database tables.
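A metadata query returns one row per meta entry, so the rows usually need regrouping per post before they are useful. The sketch below assumes the query was adjusted to select the post ID as its first column (post_id, meta_key, meta_value); the function name group_meta is illustrative.

```python
from collections import defaultdict


def group_meta(rows):
    """Group (post_id, meta_key, meta_value) rows into {post_id: {key: [values]}}.

    Values are kept as lists because wp_postmeta allows the same key to
    appear more than once for a single post.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for post_id, meta_key, meta_value in rows:
        grouped[post_id][meta_key].append(meta_value)
    # Convert nested defaultdicts to plain dicts for easier serialization.
    return {pid: dict(meta) for pid, meta in grouped.items()}
```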
Handling Plugin-Specific Content
WooCommerce
WooCommerce stores products, orders, and customer data in custom post types and meta tables.
Content to extract:
- Products: wp_posts where post_type = 'product'
- Product meta: price, SKU, stock, dimensions in wp_postmeta
- Product categories: custom taxonomy product_cat
- Product images: gallery images stored as a comma-separated list of attachment IDs in _product_image_gallery meta
- Variations: post_type = 'product_variation'
Not typically migrated:
- Orders and order history (platform-specific)
- Customer accounts (usually recreated on new platform)
- Coupons and promotions (recreated manually)
Best practice: Export products to CSV using WooCommerce's built-in exporter (WP Admin → Products → Export), then transform to the target platform's format.
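If you read product data from the database instead of the CSV exporter, gallery images have to be resolved from attachment IDs. The sketch below assumes the _product_image_gallery value is a comma-separated list of attachment IDs, which is how current WooCommerce versions store it. The attachment_urls lookup is hypothetical and could be built from wp_posts rows with post_type = 'attachment'.

```python
def parse_gallery_ids(meta_value):
    """Parse a _product_image_gallery value into a list of attachment IDs."""
    if not meta_value:
        return []
    # Skip empty or non-numeric fragments rather than failing on them.
    return [int(part) for part in meta_value.split(",") if part.strip().isdigit()]


def gallery_urls(meta_value, attachment_urls):
    """Resolve gallery IDs to URLs using an {attachment_id: url} lookup."""
    ids = parse_gallery_ids(meta_value)
    return [attachment_urls[i] for i in ids if i in attachment_urls]
```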
Contact Forms (Contact Form 7, Gravity Forms, WPForms)
Form plugins store form definitions and submissions differently:
- Contact Form 7: Forms stored as wpcf7_contact_form post type. Submissions are emailed, not stored in the DB by default (unless using the Flamingo plugin).
- Gravity Forms: Forms in wp_gf_form table, entries in wp_gf_entry. Export entries from Forms → Import/Export.
- WPForms: Forms stored as wpforms post type (serialized data). Entries in wp_wpforms_entries.
Best practice: Export form submissions as CSV. Recreate form structures manually on the new platform — form definitions rarely migrate cleanly between systems.
Page Builders (Elementor, WPBakery/Visual Composer, Divi)
Page builder content is the hardest to migrate because it's stored as:
- Shortcodes in post content (WPBakery): [vc_row][vc_column][vc_column_text]Content[/vc_column_text][/vc_column][/vc_row]
- Serialized JSON in postmeta (Elementor): _elementor_data field
- Shortcodes with custom syntax (Divi): [et_pb_section][et_pb_row]...
Best practice: Use site mirroring to capture the rendered output. The raw shortcode/JSON data is only useful if migrating to another WordPress site with the same page builder installed.
Cleanup required (from mirrored HTML):
- Strip nested wrapper divs (5+ levels deep for Visual Composer)
- Remove decorative elements and spacer divs
- Strip inline styles that override new theme styling
- Convert page builder grid layouts to semantic HTML or your new CSS framework's grid
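The inline-style step of this cleanup can be sketched with Python's standard HTML parser, which re-emits the markup while dropping every style attribute. The class and function names are illustrative. The sketch covers only that one step (wrapper and spacer removal need site-specific rules) and is meant for HTML fragments, not for normalizing whole documents.

```python
from html import escape
from html.parser import HTMLParser


class StyleStripper(HTMLParser):
    """Re-emit HTML with style attributes removed; all else passes through."""

    def __init__(self):
        # Keep entities as written so text is not re-encoded on output.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _open(self, tag, attrs, closing):
        kept = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs if k != "style"
        )
        self.out.append(f"<{tag}{kept}{closing}")

    def handle_starttag(self, tag, attrs):
        self._open(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_inline_styles(html):
    """Return html with every style="..." attribute removed."""
    parser = StyleStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

The parser lowercases tag and attribute names, so the output can differ from the input in case even where no style attribute was present.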
SEO Plugins (Yoast, Rank Math, All in One SEO)
SEO plugins store metadata in wp_postmeta:
| Meta Key (Yoast) | Meta Key (Rank Math) | Content |
|---|---|---|
| _yoast_wpseo_title | rank_math_title | Custom page title |
| _yoast_wpseo_metadesc | rank_math_description | Meta description |
| _yoast_wpseo_focuskw | rank_math_focus_keyword | Focus keyword |
| _yoast_wpseo_canonical | rank_math_canonical_url | Canonical URL |
Best practice: Export SEO metadata early in the migration. Losing meta descriptions and custom titles impacts search rankings. Map these to your new platform's SEO fields.
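The mapping can be sketched as a lookup from the plugin-specific keys listed above to neutral field names. The neutral names (seo_title, seo_description, and so on) and the function name extract_seo_fields are placeholders for whatever your target platform expects.

```python
# Plugin-specific meta keys mapped to neutral field names. The neutral names
# are placeholders (assumption); rename them to match your target schema.
SEO_KEY_MAP = {
    "_yoast_wpseo_title": "seo_title",
    "rank_math_title": "seo_title",
    "_yoast_wpseo_metadesc": "seo_description",
    "rank_math_description": "seo_description",
    "_yoast_wpseo_focuskw": "focus_keyword",
    "rank_math_focus_keyword": "focus_keyword",
    "_yoast_wpseo_canonical": "canonical_url",
    "rank_math_canonical_url": "canonical_url",
}


def extract_seo_fields(post_meta):
    """Pick SEO values out of a {meta_key: value} dict for one post."""
    fields = {}
    for key, value in post_meta.items():
        target = SEO_KEY_MAP.get(key)
        if target and value:
            # If both plugins left data behind, keep the first value seen.
            fields.setdefault(target, value)
    return fields
```

Stored titles and descriptions may contain plugin template variables and not final text, so spot-check a few values against the rendered page before relying on them.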
Custom Post Types and Advanced Custom Fields (ACF)
- Custom post types appear in wp_posts with their registered post_type slug
- ACF fields are stored in wp_postmeta with field names as meta_key
- ACF field group definitions are stored as acf-field-group post type
- Repeater fields use numbered meta keys: field_name_0_subfield, field_name_1_subfield
Best practice: Map ACF fields to your new platform's equivalent (frontmatter fields, structured content types, etc.). Document the field mapping before starting migration.
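The numbered repeater keys can be regrouped into rows with a small helper. This is a minimal sketch for single-level repeaters; the function name collect_repeater is illustrative, and nested repeaters would need a recursive variant.

```python
import re


def collect_repeater(post_meta, field_name):
    """Rebuild an ACF repeater as a list of row dicts from flat meta keys.

    Matches keys shaped like <field_name>_<index>_<subfield>. Keys with a
    leading underscore do not match because the pattern is anchored.
    """
    pattern = re.compile(rf"^{re.escape(field_name)}_(\d+)_(.+)$")
    rows = {}
    for key, value in post_meta.items():
        match = pattern.match(key)
        if match:
            index, subfield = int(match.group(1)), match.group(2)
            rows.setdefault(index, {})[subfield] = value
    # Return rows in index order.
    return [rows[i] for i in sorted(rows)]
```

For example, a post with team_0_name and team_1_name keys yields a two-element list that maps directly onto a frontmatter array.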
Media Migration Best Practices
Image Handling
- Download all media locally: Don't rely on hotlinking to the old WordPress URLs
- Preserve directory structure: WordPress organizes uploads by YYYY/MM/. Keep this or create a clear mapping
- Strip query parameters: image CDN plugins such as Jetpack Photon append parameters like ?resize=800,600&ssl=1 to image URLs. Remove these for static hosting
- Handle lazy loading: WordPress lazy-loading plugins store real URLs in data-src, not src. Always check data-src first
- Check for CDN URLs: If the site uses a CDN (Cloudflare, Jetpack Photon), images may be served from a different domain. Map these back to original paths
- Regenerate responsive sizes: WordPress generates multiple sizes per image (-150x150, -300x200, etc.). Decide whether to keep these or regenerate for your new platform
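Several of these rules can be applied in one URL-normalization pass. The sketch below strips the query string, optionally removes the generated-size suffix, and prefers data-src over src. The function names are illustrative, and the size-suffix pattern assumes the -WIDTHxHEIGHT naming shown above.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches a trailing "-WIDTHxHEIGHT" directly before the file extension.
SIZE_SUFFIX = re.compile(r"-\d+x\d+(?=\.[A-Za-z0-9]+$)")


def normalize_image_url(url, strip_size=True):
    """Drop the query and fragment and, optionally, the size suffix."""
    parts = urlsplit(url)
    path = parts.path
    if strip_size:
        path = SIZE_SUFFIX.sub("", path)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def pick_image_url(attrs):
    """Prefer data-src over src when an <img> carries both attributes."""
    return attrs.get("data-src") or attrs.get("src")
```

Stripping the size suffix assumes the full-size original exists at the shortened path. An upload whose own file name ends in something like -1920x1080 would be rewritten incorrectly, so verify the resulting URLs before deleting anything.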
Media Inventory Script
# Find all media references in exported content
grep -roh 'wp-content/uploads/[^"'"'"' ]*' content/ | sort -u > media-inventory.txt
# Download all referenced media
while IFS= read -r url; do
wget -x -nH "https://example.com/$url" -P ./media/
done < media-inventory.txt
URL Structure Preservation
Maintaining existing URLs prevents broken links from search engines and external sites.
Common WordPress permalink structures:
| WordPress Setting | URL Pattern | Example |
|---|---|---|
| Day and name | /:year/:month/:day/:title/ | /2024/01/15/my-post/ |
| Month and name | /:year/:month/:title/ | /2024/01/my-post/ |
| Post name | /:title/ | /my-post/ |
| Custom | varies | /blog/:title/ |
Best practices:
- Check the WordPress permalink setting before migration (Settings → Permalinks)
- Configure your new platform to match the exact URL structure
- Create 301 redirects for any URLs that must change
- Test with a crawl tool (Screaming Frog, wget) to verify no 404s
- Don't forget to redirect /feed/ to your new RSS feed URL
- Redirect /wp-content/uploads/ paths if you reorganize media
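When the URL structure must change, a redirect map can be generated from each post's slug and publish date. The sketch below uses the pattern notation from the table above (:year, :month, :day, :title); the function names and the (slug, datetime) input shape are assumptions for illustration.

```python
from datetime import datetime


def build_permalink(pattern, slug, published):
    """Fill a pattern such as '/:year/:month/:title/' for one post."""
    replacements = {
        ":year": f"{published.year:04d}",
        ":month": f"{published.month:02d}",
        ":day": f"{published.day:02d}",
        ":title": slug,
    }
    for token, value in replacements.items():
        pattern = pattern.replace(token, value)
    return pattern


def redirect_map(posts, old_pattern, new_pattern):
    """Return {old_path: new_path} for posts whose path changes."""
    mapping = {}
    for slug, published in posts:
        old = build_permalink(old_pattern, slug, published)
        new = build_permalink(new_pattern, slug, published)
        if old != new:
            mapping[old] = new
    return mapping
```

The resulting mapping can be written out in whatever redirect format the new host uses. Pages, category archives, and feeds follow different URL rules and are not covered by this sketch.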
Migration Checklist
Pre-Migration
- Audit the site: count posts, pages, custom post types, media files
- Identify all active plugins and what content they manage
- Document the current URL/permalink structure
- Export XML backup from WordPress admin
- Mirror the site with HTTrack or wget
- Export SEO metadata (titles, descriptions, canonical URLs)
- Export form submissions as CSV
- Download all media files locally
- Note any password-protected or members-only content
During Migration
- Map WordPress content types to target platform equivalents
- Extract and transform content (HTML cleanup, frontmatter generation)
- Migrate media files with correct paths
- Set up URL redirects for any changed paths
- Preserve SEO metadata on each page
- Handle page builder content (render from mirror, not from shortcodes)
- Test internal links
Post-Migration
- Crawl the new site for 404 errors
- Verify all redirects work correctly
- Check that images load on every page
- Validate RSS feeds
- Submit updated sitemap to search engines
- Monitor search console for crawl errors over 2-4 weeks
- Keep the old WordPress site accessible (read-only) for at least 30 days