
Most people never knew that the Pentium's original design
included 36-bit addressing, and the capability to access 2M page
sizes. These extensions were known as Page Address Extensions
(PAE), and were to be enabled in CR4. When CR4.PAE=1 (CR4[5]=1),
page address extensions were enabled. When CR4.PAE=0, A[35..32]
were forced to 0, regardless of what addresses could be generated
in protected mode with a descriptor pointing near 4G, and an
offset pointing above the 4G address space. Even when CR4.PAE=1,
addresses above 4G would not be generated unless they were the
result of a paging translation. The only means to
access memory above 4G was through these extensions to page mode.
This document will describe PAE based on what little I know from
the Pentium, and from preliminary P6 literature. This document
will also include extensions to PAE that are exclusive to P6.
Whether or not PAE was ever implemented in the Pentium beyond
the conceptual stage is not known. But vestiges of its existence
are visible throughout the Pentium documentation and
architecture. There are at least four references to 2M pages in
the various Pentium manuals[1,2,3,4]. In addition to these
documentation references, CR4[5] is marked reserved, and was to
enable PAE; CPUID.flags[6] is marked reserved and was to indicate
the existence of PAE; the MSR TR8 is marked reserved, and
contained the upper 4 address bits used for TLB testability. Now
it appears that the P6 is going to implement 36-bit addressing
and 2M page sizes.
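
The bit positions mentioned above can be written down as a small sketch. The names `CR4_PAE`, `CPUID_FLAG_PAE`, `cpu_reports_pae`, `pae_enabled`, and `physical_address` are hypothetical and chosen only for illustration. The function `physical_address` models nothing more than the forcing of A[35..32] to 0 when CR4.PAE=0, applied to an address that has already been through paging translation.

```python
CR4_PAE = 1 << 5          # CR4[5], the PAE enable bit described in the text
CPUID_FLAG_PAE = 1 << 6   # CPUID.flags[6], the PAE feature flag described in the text

ADDR_MASK_32 = (1 << 32) - 1   # A[31..0]
ADDR_MASK_36 = (1 << 36) - 1   # A[35..0]


def cpu_reports_pae(cpuid_flags: int) -> bool:
    """True if the CPUID feature flags advertise PAE."""
    return bool(cpuid_flags & CPUID_FLAG_PAE)


def pae_enabled(cr4: int) -> bool:
    """True if CR4.PAE is set."""
    return bool(cr4 & CR4_PAE)


def physical_address(translated: int, cr4: int) -> int:
    """Model of the address put on the bus after paging translation.

    With CR4.PAE=0 the upper four address bits are forced to 0;
    with CR4.PAE=1 all 36 bits are kept.
    """
    mask = ADDR_MASK_36 if pae_enabled(cr4) else ADDR_MASK_32
    return translated & mask
```

For example, `physical_address(0x1_0000_1000, 0)` drops bit 32 and yields `0x1000`, while the same call with `CR4_PAE` set keeps the full 36-bit value.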
To support 36-bit addressing, it is necessary to make
substantial changes to the paging mechanism. 32-bit linear
addresses are still used, but they are translated to 36-bit
physical addresses. Intel chose to use a three-tier paging
mechanism to support PAE for 4K pages, and a two-tier mechanism
for 2M pages. When CR4.PAE=1, CR3 points to a small table of Page
Directory Pointers (PDPs). Each PDP entry references a
separate page directory. Each page directory points to a page
table, for 4K pages, or directly to the page frame, for 2M pages.
Figure 1 gives a detailed description of all
of the CPU structures associated with page translations while PAE
is enabled. For comparative purposes, Figure 2
gives a detailed description of all of the CPU structures
associated with page translations while Page Size Extensions
(PSE) is enabled (4-Mbyte pages).
Figure 1 -- Paging Structures for PAE

Figure 2 -- Paging Structures for PSE

In addition to CR4.PAE, which enables Page Address Extensions,
CR4 contains another addition to enhance page mode performance.
CR4.PGE (bit 7) enables Paging Global Extensions (PGE).
PGE determines whether moves to CR3 flush all of the PTEs from
the TLB, or only those whose G-bit (global bit) is not set.
Likewise, for task switches which implicitly set CR3, CR4.PGE
controls TLB flushing in the same manner.
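
The flushing rule described above can be modeled in a few lines. This is only a sketch of the stated behavior, not a description of the real TLB. The `TlbEntry` class and the function `flush_on_cr3_write` are hypothetical names, and the TLB is represented as a plain list.

```python
from dataclasses import dataclass

CR4_PGE = 1 << 7  # CR4 bit 7, per the text


@dataclass(frozen=True)
class TlbEntry:
    linear_page: int   # linear page number cached by this entry
    frame: int         # physical frame it maps to
    is_global: bool    # copy of the G-bit from the PTE


def flush_on_cr3_write(tlb, cr4):
    """Return the TLB contents that survive a move to CR3.

    With CR4.PGE=0 everything is flushed; with CR4.PGE=1 only
    entries whose G-bit is clear are flushed.
    """
    if cr4 & CR4_PGE:
        return [entry for entry in tlb if entry.is_global]
    return []
```

Under this model a task switch that reloads CR3 would go through the same function, since the text says it is controlled in the same manner.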
As shown in Figure 1, CR3 is still a 32-bit register, and
therefore the PDP table must reside within the first 4G of the address space.
Each PDP entry is selected by the upper 2 bits of the linear address --
A[31..30]. Therefore the PDP table contains only 4 entries. Each PDP
entry points to the physical address of a page directory, and is
64 bits wide, though only 36 bits are used. Therefore, each PDP entry
can reference a page directory anywhere in the 64G address space.
The index into the Page Directory (PDE) is determined by the
linear address bits -- A[29..21]. The Page Directory is therefore
limited to 512 entries (2^9) of 8 bytes each. Even though the
Page Directory has been reduced to 512 entries, it occupies the same
amount of memory as when CR4.PAE=0 (4096 bytes), because of
the increase in its element size (to 8 bytes). For 4K pages, each
8-byte PDE points to the physical address of the Page Table. For
2M pages, each 8-byte PDE points to the physical address of the
page frame, itself. For 4K pages, the index of the Page Table
Entry (PTE) is determined by the linear address bits --
A[20..12]. Similar to the PDE, each Page Table is limited to 512
entries of 8-bytes each; each 8-byte entry pointing to the
physical Page Frame Address (PFA). Figure
3 shows the page translation for 4K pages while CR4.PAE=1.
Figure 3 -- Page Translation for 4K Page Address Extensions
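
The three-tier walk for 4K pages can be sketched as follows. Physical memory is modeled as a dictionary from physical address to 8-byte entry value, which is purely an illustration device. The names `split_linear_4k`, `translate_4k`, and `pdp_base` are hypothetical. The sketch also assumes that the low 12 bits of each entry hold flags rather than address bits, which the text above does not spell out.

```python
ENTRY_SIZE = 8                          # every entry is 8 bytes when CR4.PAE=1
BASE_MASK = ((1 << 36) - 1) & ~0xFFF    # bits 35..12; low 12 bits assumed to be flags


def split_linear_4k(linear):
    """Split a 32-bit linear address into its PAE 4K-page fields."""
    pdp_index = (linear >> 30) & 0x3     # A[31..30], 4 PDP entries
    pde_index = (linear >> 21) & 0x1FF   # A[29..21], 512 directory entries
    pte_index = (linear >> 12) & 0x1FF   # A[20..12], 512 table entries
    offset = linear & 0xFFF              # A[11..0], byte within the 4K page
    return pdp_index, pde_index, pte_index, offset


def translate_4k(mem, pdp_base, linear):
    """Walk PDP -> page directory -> page table and return a 36-bit address."""
    pdp_index, pde_index, pte_index, offset = split_linear_4k(linear)
    directory = mem[pdp_base + ENTRY_SIZE * pdp_index] & BASE_MASK
    table = mem[directory + ENTRY_SIZE * pde_index] & BASE_MASK
    frame = mem[table + ENTRY_SIZE * pte_index] & BASE_MASK
    return frame | offset
```

The field widths account for the whole linear address space: 4 PDP entries times 512 directory entries times 512 table entries times 4096 bytes per page is exactly 2^32 bytes, and 512 entries of 8 bytes fill one 4096-byte structure.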

Page translation for 2M pages is virtually identical to 4M
page translation. The main differences between the two
translation mechanisms are the addition of the PDP reference and
the number of index bits in the PDE. Like 4K page translations
with PAE enabled, each PDP entry points to the physical address
of a page directory. The index into the Page Directory (PDE) is
determined by linear address bits -- A[29..21]. The remaining
address bits in the linear address, A[20..00], are used to
directly index into the page frame. Since the offset is 21-bits
wide, the page size is 2M (2^21). Figure 4
shows a diagram of page translations for 2M pages.
Figure 4 -- Page Translation for 2M Page Address Extensions
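
The two-tier walk for 2M pages can be sketched the same way. As before, memory is modeled as a dictionary of 8-byte entries, and the names `split_linear_2m`, `translate_2m`, and `pdp_base` are hypothetical. Treating the low 12 bits of a PDP entry and the low 21 bits of a 2M PDE as non-address bits is an assumption made for this sketch.

```python
ENTRY_SIZE = 8                                   # 8-byte entries when CR4.PAE=1
DIR_BASE_MASK = ((1 << 36) - 1) & ~0xFFF         # directory base, bits 35..12 (assumed)
FRAME_MASK_2M = ((1 << 36) - 1) & ~0x1FFFFF      # 2M frame base, bits 35..21 (assumed)


def split_linear_2m(linear):
    """Split a 32-bit linear address into its PAE 2M-page fields."""
    pdp_index = (linear >> 30) & 0x3     # A[31..30]
    pde_index = (linear >> 21) & 0x1FF   # A[29..21]
    offset = linear & 0x1FFFFF           # A[20..0], 21-bit offset into the page
    return pdp_index, pde_index, offset


def translate_2m(mem, pdp_base, linear):
    """Walk PDP -> page directory; the PDE points directly at the page frame."""
    pdp_index, pde_index, offset = split_linear_2m(linear)
    directory = mem[pdp_base + ENTRY_SIZE * pdp_index] & DIR_BASE_MASK
    frame = mem[directory + ENTRY_SIZE * pde_index] & FRAME_MASK_2M
    return frame | offset
```

Compared with the 4K case, the only change is that the page-table lookup disappears and its 9 index bits are absorbed into the offset, giving 12 + 9 = 21 offset bits.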

Some distinction needs to be made as to whether PAE and PSE
are mutually exclusive, and which has the higher precedence.
Likewise, what is the role of the PDE.PS bit when page
address extensions are enabled? I will assume the two features
are mutually exclusive, and that PAE has higher precedence than
PSE. Under that assumption, Table 1 describes the
possible combinations of PAE, PSE, and PDE.PS.
Table 1 -- Control bits for Paging Extensions
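
The assumption stated above, that PAE takes precedence over PSE, can be expressed as a small decision function. This is a sketch of that assumption only, not a confirmed description of the hardware. The function name `page_size` is hypothetical, and using PDE.PS to choose between 4K and 2M pages when PAE is enabled is a further assumption of this sketch.

```python
KIB = 1024
MIB = 1024 * KIB


def page_size(pae: bool, pse: bool, pde_ps: bool) -> int:
    """Return the page size in bytes implied by CR4.PAE, CR4.PSE and PDE.PS.

    Assumes PAE overrides PSE, and that PDE.PS is ignored when
    neither extension is enabled.
    """
    if pae:
        # PAE wins regardless of CR4.PSE (assumed precedence).
        return 2 * MIB if pde_ps else 4 * KIB
    if pse and pde_ps:
        return 4 * MIB
    return 4 * KIB
```

Enumerating all eight combinations of the three bits with this function gives one candidate row set for a table of this kind, under the stated assumptions.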

Definition of fields in paging structure
figures:
Endnotes:
[1] The Pentium™ Processor User's Manual, Volume 1 (241428-001), Chapter 2 Overview, paragraph 8.
[2] The Pentium™ Processor User's Manual, Volume 1 (241428 all revisions), Figure 3-4.
[3] The Pentium™ Processor at iCOMP™ Index 735\90 MHz (241997 all revisions), Section 1.1, paragraph 5.
[4] The Pentium™ Processor at iCOMP™ Index 610\75 MHz (242323-001), Section 2.1, paragraph 7.