This document explains the conceptual process of handling XML sitemaps from WordPress (or similar content management systems) and storing their contents in a database. The goal is to provide a clear framework for differentiating between sitemap indexes and URL sitemaps, and to outline how a system can navigate from the top-level sitemap down to the actual content URLs.
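There are two kinds of sitemap files to distinguish:
1. Sitemap index: the root element is <sitemapindex>. It contains multiple <sitemap> entries, each pointing to another sitemap file.
2. URL sitemap: the root element is <urlset>. It contains multiple <url> entries, each pointing directly to a content page.
The system starts at the sitemap index, recursively follows each <loc> link, and continues until only <url> entries remain. Only the final content URLs are stored in the database, together with metadata such as <lastmod>, rather than every intermediate sitemap file.
The processing workflow is:
1. Start from the top-level sitemap (e.g., /wp-sitemap.xml).
2. Check the root element:
   - <sitemapindex> → treat as a directory of other sitemaps.
   - <urlset> → treat as a list of content URLs.
3. For <sitemap> entries, follow each <loc> link to another XML file.
4. For <url> entries, collect the <loc> values as actual content URLs.
5. Repeat recursively through <sitemap> entries until only <url> entries remain.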
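The recursive workflow described above can be sketched in Python with the standard library's xml.etree.ElementTree. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name collect_content_urls, the injected fetch callable (a stand-in for whatever HTTP client the system actually uses), and the max_depth limit are all hypothetical. It also assumes the sitemaps use the sitemaps.org namespace, as WordPress core sitemaps do.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional

# Namespace defined by the sitemaps.org protocol (used by WordPress core sitemaps).
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def collect_content_urls(
    start_url: str,
    fetch: Callable[[str], str],
    max_depth: int = 5,
) -> List[Dict[str, Optional[str]]]:
    """Follow <sitemap> entries recursively and return every <url> entry found.

    `fetch` is a hypothetical callable returning the XML text for a URL
    (for example a thin wrapper around an HTTP client). Injecting it keeps
    the traversal logic testable without network access.
    """
    results: List[Dict[str, Optional[str]]] = []
    seen = set()  # URLs already visited; guards against reference cycles

    def visit(url: str, depth: int) -> None:
        if url in seen or depth > max_depth:
            return
        seen.add(url)
        root = ET.fromstring(fetch(url))
        if root.tag == NS + "sitemapindex":
            # Directory of other sitemaps: follow each <sitemap><loc>.
            for sitemap in root.findall(NS + "sitemap"):
                loc = sitemap.findtext(NS + "loc")
                if loc:
                    visit(loc.strip(), depth + 1)
        elif root.tag == NS + "urlset":
            # List of content URLs: keep <loc>, the optional <lastmod>,
            # and the child sitemap the entry came from.
            for entry in root.findall(NS + "url"):
                loc = entry.findtext(NS + "loc")
                if loc:
                    results.append({
                        "loc": loc.strip(),
                        "lastmod": entry.findtext(NS + "lastmod"),
                        "source": url,
                    })
        # Any other root element is skipped in this sketch.

    visit(start_url, 0)
    return results
```

Injecting fetch keeps the recursion testable offline; a production version would likely also need error handling for unreachable or malformed sitemap files.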
wp-sitemap.xml
├── wp-sitemap-posts-post-1.xml
│ ├── URL 1
│ ├── URL 2
│ └── ...
├── wp-sitemap-posts-page-1.xml
│ ├── URL A
│ ├── URL B
│ └── ...
└── wp-sitemap-taxonomies-category-1.xml
  ├── Category URL X
  └── Category URL Y
The simplest way to differentiate between sitemap types is by checking the elements inside:
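- <sitemap> → leads to further sitemaps.
- <url> → leads directly to content pages.
In practice:
- Do not rely on filename patterns (such as -post-1.xml) to identify the sitemap type, as custom structures may vary.
- Always check the root element (<sitemapindex> vs. <urlset>) to determine behavior.
- Store <lastmod> for freshness tracking.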
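A root-element check, plus tolerant parsing of <lastmod> for freshness tracking, might look like the following sketch. The helper names local_name, classify_sitemap, and parse_lastmod are hypothetical. The namespace is stripped so that sitemaps omitting the sitemaps.org xmlns are still recognized, and any <lastmod> value that datetime.fromisoformat() rejects is treated as missing; both choices are assumptions about how the system should behave, not part of the original design.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Optional


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix: '{...sitemap/0.9}urlset' -> 'urlset'."""
    return tag.rsplit("}", 1)[-1]


def classify_sitemap(xml_text: str) -> str:
    """Return 'index', 'urlset', or 'unknown' from the root element alone.

    The filename is deliberately ignored, since custom sitemap generators
    do not necessarily follow WordPress's wp-sitemap-*-N.xml naming.
    """
    name = local_name(ET.fromstring(xml_text).tag)
    if name == "sitemapindex":
        return "index"
    if name == "urlset":
        return "urlset"
    return "unknown"


def parse_lastmod(value: Optional[str]) -> Optional[datetime]:
    """Parse a <lastmod> value; return None if absent or unparseable."""
    if not value:
        return None
    text = value.strip()
    if text.endswith("Z"):
        # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11.
        text = text[:-1] + "+00:00"
    try:
        # Date-only values yield a naive datetime; values with an offset are aware.
        return datetime.fromisoformat(text)
    except ValueError:
        # Values fromisoformat() rejects (e.g. reduced-precision forms such
        # as '2024-01') are treated as missing in this sketch.
        return None
```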
Think of the sitemap index as a table of contents. Each child sitemap is a chapter, and each
<url> entry is a page. You don’t need to store every chapter file separately; you
just need to know how to navigate from the table of contents down to the pages.
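By differentiating between <sitemap> and <url> elements, a system can reliably traverse from the top-level sitemap index down to the actual content URLs. Because only the final content URLs and their <lastmod> values are stored, the database stays compact; and because the traversal relies on the element structure defined by the Sitemaps protocol rather than on WordPress-specific filenames, the system reads sitemaps the same way search engines do.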
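On the storage side, keeping only the final content URLs can be as simple as an upsert keyed on the URL. The sketch below uses Python's built-in sqlite3 module; the function store_content_urls, the table name content_urls, and its columns are hypothetical, and the ON CONFLICT ... DO UPDATE clause requires SQLite 3.24 or later.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def store_content_urls(
    conn: sqlite3.Connection,
    entries: Iterable[Tuple[str, Optional[str]]],
) -> None:
    """Upsert (loc, lastmod) pairs; intermediate sitemap files are never stored."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS content_urls (
               loc     TEXT PRIMARY KEY,  -- the content URL from <url><loc>
               lastmod TEXT               -- raw <lastmod> value, may be NULL
           )"""
    )
    # Re-crawling the same URL updates its lastmod instead of adding a row.
    conn.executemany(
        """INSERT INTO content_urls (loc, lastmod) VALUES (?, ?)
           ON CONFLICT(loc) DO UPDATE SET lastmod = excluded.lastmod""",
        entries,
    )
    conn.commit()
```

Keying on loc means repeated crawls of the same sitemaps refresh lastmod in place rather than creating duplicate rows.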
The effectiveness of our sitemap ingestion and keyword mapping process can be demonstrated directly in the Domain Map.
The Domain Map is the visible outcome of the entire pipeline:
By visiting the Domain Map linked to a partner domain, one can verify:
The Domain Map acts as the “proof of the pudding” — a direct, navigable representation of how sitemaps, content URLs, and keyword indexing converge into a coherent partner domain listing. It is both a diagnostic tool and a demonstration of system integrity.