Cannibalization Analyzer
Detect competing URLs with Quick metrics or Deep AI content similarity
Cannibalization Analyzer Tutorial
What This Tool Does:
The Cannibalization Analyzer detects when multiple URLs on your site compete for the same keyword. It provides:
- Quick Mode (Metrics-Based): Fast detection using GSC click patterns and ranking overlaps
- Deep Mode (AI Content): Advanced semantic similarity analysis using OpenAI embeddings to find topically overlapping pages
- Action Planning: Generates prioritized recommendations for consolidating or differentiating competing URLs
- GSC Integration: Direct OAuth connection to pull real-time data without manual CSV exports
How to Use:
- Choose Input Mode: Select CSV upload or OAuth (GSC API)
- Select Analysis Mode: Quick for fast metrics-based detection, Deep for AI content similarity
- Upload Data or Connect:
  - CSV: Upload GSC export with Query, Page, Clicks, Impressions, CTR, Position columns
  - OAuth: Click "Connect to GSC", select property, choose date range
- Configure Thresholds: Adjust detection sensitivity in sidebar settings
- Run Analysis: Click "Analyze Cannibalization"
- Review Results: Check stats, cannibalization instances, and action plan
- Download Reports: Export full data for implementation tracking
Analysis Modes Compared:
| Feature | Quick Mode | Deep Mode |
|---|---|---|
| Detection Method | GSC metrics (clicks, impressions, rankings) | AI content similarity (OpenAI embeddings) |
| Speed | Very fast (<1 minute) | Slower (depends on URL count) |
| Cost | Free | ~$0.01-$0.10 per 100 URLs (OpenAI API) |
| Accuracy | Good for obvious cases | Excellent for subtle content overlaps |
| Best For | Initial audits, large sites, budget-conscious | Deep analysis, content strategy, AI-powered insights |
| Requirements | GSC data only | GSC data + OpenAI API key |
Quick Mode Thresholds:
- URL Overlap Threshold (2-5): Minimum number of competing URLs to flag a query. Lower = more sensitive.
- Click Concentration (40-70%): Maximum share of a query's clicks that one URL can hold before it is considered dominant, in which case the query is not flagged as cannibalized. Higher = more queries flagged.
- Min Impressions (50-200): Minimum impressions to consider a query significant enough to analyze.
- Min Clicks (5-20): Minimum clicks to filter out low-traffic noise.
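
The way these four thresholds interact can be sketched in plain Python. This is a minimal illustration, not the tool's actual implementation: the function name, parameter names, default values, and the assumption that the input holds one row per query/page pair are all hypothetical.

```python
from collections import defaultdict

def find_cannibalized_queries(rows, min_urls=2, max_click_share=0.6,
                              min_impressions=100, min_clicks=10):
    """Flag queries whose clicks are split across several URLs.

    rows: iterable of dicts with keys query, page, clicks, impressions,
    assumed to contain one row per (query, page) pair.
    All names and defaults here are illustrative assumptions.
    """
    by_query = defaultdict(list)
    for row in rows:
        by_query[row["query"]].append(row)

    flagged = []
    for query, pages in by_query.items():
        total_clicks = sum(p["clicks"] for p in pages)
        total_impressions = sum(p["impressions"] for p in pages)
        # Min Impressions / Min Clicks: skip low-traffic noise.
        if total_impressions < min_impressions or total_clicks < min_clicks:
            continue
        if total_clicks == 0:
            continue  # avoid dividing by zero when min_clicks is set to 0
        # URL Overlap Threshold: need enough competing URLs.
        if len(pages) < min_urls:
            continue
        # Click Concentration: a dominant URL means no cannibalization.
        top_share = max(p["clicks"] for p in pages) / total_clicks
        if top_share > max_click_share:
            continue
        flagged.append({
            "query": query,
            "urls": sorted(p["page"] for p in pages),
            "top_share": round(top_share, 2),
        })
    return flagged
```

With a 60% click concentration setting, a query where two URLs hold 30 and 25 clicks would be flagged, while one where a single URL holds 45 of 50 clicks would not.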
Deep Mode Settings:
- Similarity Threshold (0.80-0.95): Minimum cosine similarity score to flag pages as similar. Higher = stricter (only very similar pages).
- Max URLs (50-500): Maximum number of URLs to analyze (to control API costs).
- OpenAI API Key: Required for embeddings generation. Get key at platform.openai.com/api-keys.
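
To show what the Similarity Threshold and Max URLs settings control, here is a minimal sketch of pairwise cosine similarity over page embeddings. It assumes the embeddings have already been generated and are passed in as a dict mapping URL to a list of floats; the function names and defaults are hypothetical, and the tool's own comparison logic may differ.

```python
import math
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def similar_pairs(embeddings, threshold=0.85, max_urls=500):
    """Return (url_a, url_b, score) for pairs at or above the threshold.

    embeddings: dict of URL -> vector (assumed precomputed).
    max_urls caps how many URLs are compared, mirroring the Max URLs setting.
    """
    urls = list(embeddings)[:max_urls]
    pairs = []
    for u, v in combinations(urls, 2):
        score = cosine_similarity(embeddings[u], embeddings[v])
        if score >= threshold:
            pairs.append((u, v, score))
    # Most similar pairs first.
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

The number of comparisons grows with the square of the URL count, which is one reason a cap on URLs is useful in addition to limiting API costs.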
Common Cannibalization Fixes:
| Issue Type | Recommended Action | Implementation |
|---|---|---|
| Duplicate/near-duplicate content | Consolidate pages with 301 redirect | Merge content into strongest URL, redirect others |
| Similar but distinct content | Differentiate with unique angles | Rewrite to target different subtopics or intents |
| Internal linking issues | Canonicalize primary URL | Update internal links to consistently point to main URL |
| Product/category overlap | Use canonical tags | Set rel="canonical" from similar pages to primary version |
| Blog post series overlap | Create hub/pillar page | Build comprehensive pillar page, link individual posts as spokes |
Best Practices:
- Start with Quick Mode: Run initial audit to identify obvious issues before investing in Deep Mode
- Use 3-6 Month Date Range: Sufficient data to detect patterns without seasonal noise
- Filter Brand Queries: Exclude branded terms that intentionally have multiple landing pages (products, categories)
- Prioritize High-Traffic Queries: Focus on queries with significant impressions/clicks for maximum impact
- Test Before Consolidating: Use Google Search Console's URL inspection tool to verify canonical relationships
- Monitor After Changes: Re-run analysis monthly to track improvements and catch new cannibalization
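
As one possible way to apply the brand-query advice before uploading a CSV, the sketch below drops rows whose query contains a brand term. The function name and the simple case-insensitive substring match are assumptions for illustration; substring matching can over-match short brand names, so review the result.

```python
def drop_brand_queries(rows, brand_terms):
    """Return only the rows whose query contains none of the brand terms.

    rows: list of dicts with a "query" key.
    brand_terms: list of brand names to exclude (matched case-insensitively).
    """
    terms = [t.lower() for t in brand_terms if t]
    return [
        row for row in rows
        if not any(term in row["query"].lower() for term in terms)
    ]
```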
CSV Format Requirements (CSV Mode):
Your GSC export must include these columns (case-insensitive):
- Query: The search keyword
- Page: The landing page URL
- Clicks: Number of clicks
- Impressions: Number of impressions
- CTR: Click-through rate (percentage)
- Position: Average ranking position
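
A quick pre-upload check of the required columns could look like the following. This is a sketch using only the Python standard library; the function name is hypothetical and the tool may validate the file differently.

```python
import csv
import io

# Required column names, compared case-insensitively.
REQUIRED_COLUMNS = {"query", "page", "clicks", "impressions", "ctr", "position"}

def missing_columns(csv_text):
    """Return the required columns absent from the CSV header, sorted."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    # Strip a leading byte-order mark and surrounding whitespace.
    present = {name.lstrip("\ufeff").strip().lower() for name in header}
    return sorted(REQUIRED_COLUMNS - present)
```

An empty list means the header has every required column; otherwise the returned names are the ones to add before uploading.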