In a relational database, an index is a specialized data structure that maps
KEY → ROW LOCATION to speed up reads. Most engines store indexes as
B/B+‑Trees made of pages (root, branch, leaf). Clustered indexes keep the
data rows in the leaf pages; non‑clustered indexes keep the key plus a row locator.
Pages (e.g., 8 KB in SQL Server, 16 KB in InnoDB) live in their own index allocation, are cached in the buffer pool, and leaf pages are doubly-linked to support efficient range scans. Maintenance operations (rebuild/reorganize), statistics, and page splits keep the tree balanced.
Root → Branch → Leaf (sorted keys) Leaf (clustered): [key | full row] Leaf (non-clustered): [key | row locator]
The following diagram illustrates B+Tree storage with both clustered and non‑clustered leaf pages.
A page split occurs when inserting into a full leaf page. The engine allocates a new page, moves about half the rows, inserts the new key, and updates parent pointers so the B‑Tree remains balanced.
Before: [10, 20, 30] (insert 25 → no space) Split : allocate new page After : [10, 20] | [25, 30] Parent updated with new child pointer
Splits cause extra I/O and can lead to logical fragmentation if the new page
isn't allocated near the original. Use sequential keys, appropriate FILLFACTOR,
and periodic maintenance to reduce impact.
If the extent containing the original page has no free space, the engine allocates the new page from another extent (possibly far away). Logical adjacency is preserved via B‑Tree links, but physical locality is lost, increasing fragmentation and hurting range‑scan performance. Rebuild/reorganize and lower fill factor can help.
Old extent (full) → allocate new page in another extent Logical leaf order remains; physical order diverges (fragmentation)
The following diagram shows page split with remote allocation, illustrating the contrast between logical links and physical separation.