The LOADALL structure, as described by Intel, is this:
Physical Address (Hex)   Associated CPU Register
800-805                  None
806-807                  MSW
808-815                  None
816-817                  TR
818-819                  Flag word
81A-81B                  IP
81C-81D                  LDT
81E-81F                  DS
820-821                  SS
822-823                  CS
824-825                  ES
826-827                  DI
828-829                  SI
82A-82B                  BP
82C-82D                  SP
82E-82F                  BX
830-831                  DX
832-833                  CX
834-835                  AX
836-83B                  ES descriptor cache
83C-841                  CS descriptor cache
842-847                  SS descriptor cache
848-84D                  DS descriptor cache
84E-853                  GDTR
854-859                  LDT descriptor cache
85A-85F                  IDTR
860-865                  TSS descriptor cache
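To make the layout easier to work with, here is a minimal Python sketch that splits a 102-byte dump of the area at 800h into named fields. The offsets are taken straight from the table above; the function name `parse_loadall`, the dictionary layout and the little-endian word order are illustrative assumptions, not anything Intel defines.

```python
import struct

LOADALL_BASE = 0x800   # physical address of the table
LOADALL_SIZE = 0x66    # 800h..865h inclusive = 102 bytes

# 16-bit registers, keyed by offset from 800h (from the table above)
WORD_FIELDS = {
    0x06: "MSW", 0x16: "TR", 0x18: "FLAGS", 0x1A: "IP", 0x1C: "LDT",
    0x1E: "DS", 0x20: "SS", 0x22: "CS", 0x24: "ES",
    0x26: "DI", 0x28: "SI", 0x2A: "BP", 0x2C: "SP",
    0x2E: "BX", 0x30: "DX", 0x32: "CX", 0x34: "AX",
}

# 6-byte descriptor cache entries, keyed by offset from 800h
CACHE_FIELDS = {
    0x36: "ES", 0x3C: "CS", 0x42: "SS", 0x48: "DS",
    0x4E: "GDTR", 0x54: "LDT", 0x5A: "IDTR", 0x60: "TSS",
}

# The ten words marked "None" in the table (800-805 and 808-815)
GAP_OFFSETS = [0x00, 0x02, 0x04] + list(range(0x08, 0x16, 2))


def parse_loadall(buf: bytes) -> dict:
    """Split a raw LOADALL/STOREALL image into registers, caches and gap words."""
    if len(buf) != LOADALL_SIZE:
        raise ValueError(f"expected {LOADALL_SIZE} bytes, got {len(buf)}")
    # Assumption: words are stored little-endian, as everywhere else on x86
    regs = {name: struct.unpack_from("<H", buf, off)[0]
            for off, name in WORD_FIELDS.items()}
    caches = {name: bytes(buf[off:off + 6])
              for off, name in CACHE_FIELDS.items()}
    gaps = [struct.unpack_from("<H", buf, off)[0] for off in GAP_OFFSETS]
    return {"regs": regs, "caches": caches, "gaps": gaps}
```

The three groups together account for every byte of the image: 17 words, 8 six-byte entries and 10 gap words add up to 102 bytes.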
The normally visible registers aren't of much interest. That includes the MSW and flags:
LOADALL can't change any of the reserved bits, and can't clear the protected mode bit once it has been set. There was even a bug in early steppings of the chip, where the word preceding the MSW (at 804h) would be mistakenly loaded into that register during a memory wait state. And if bit 0 happened to be set by this, it could not be cleared again!
What remains are the descriptor caches and those mysterious gaps. All of these were previously write-only, but with STOREALL we can look at what gets loaded into them under various conditions.
The term "descriptor cache" is somewhat misleading from a modern perspective, but this is what Intel called them.
They can be better understood as being the part of the segment registers which actually matters for the addressing and protection logic. As far as that unit is concerned, the programmer-visible segment values might as well contain any random 16 bits.
It is only when a segment register gets loaded that the value (and operating mode) make any difference. There are not that many opcodes that do this, and they each have two entries in the instruction decoder PLA¹, so that they can be directed to different microcode entry points depending on the mode².
The layout of the 8 internal segment registers (including GDTR and IDTR) is the same:
3 BYTE  base address
1 BYTE  access rights
1 WORD  limit
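As a small illustration of this layout, the following Python sketch packs and unpacks one 6-byte entry. The function names are made up for the example, and the little-endian byte order inside the base and limit fields is an assumption based on the usual x86 convention.

```python
def unpack_cache(entry: bytes) -> tuple:
    """Split a 6-byte descriptor cache entry into (base, access, limit)."""
    if len(entry) != 6:
        raise ValueError("a descriptor cache entry is 6 bytes long")
    base = int.from_bytes(entry[0:3], "little")    # 24-bit base address
    access = entry[3]                              # access rights byte
    limit = int.from_bytes(entry[4:6], "little")   # 16-bit limit
    return base, access, limit


def pack_cache(base: int, access: int, limit: int) -> bytes:
    """Build a 6-byte entry; raises OverflowError if a field is too large."""
    return (base.to_bytes(3, "little")
            + bytes([access])
            + limit.to_bytes(2, "little"))
```

For example, `pack_cache(0xFF0000, 0x82, 0xFFFF)` produces the bytes `00 00 FF 82 FF FF`.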
By making every segment load cause a protection fault, and using LOADALL to update the descriptor caches, an operating system could in theory emulate the real mode behaviour. But performance would be bad, since "large model" programs typically load segment registers every time they dereference a pointer.
Also, there was no paging on the 286, so the only address translation possible would be to move the base of the emulated address space. Every "virtual machine" would have to be in its own contiguous memory block.
More useful is the ability to load any arbitrary base address for the segment registers without entering protected mode. Some versions of Microsoft's HIMEM.SYS did this to copy data between real and extended memory.
This new segment base would only be in effect until the next time that segment register is reloaded. That could happen unexpectedly if your code got interrupted, but there was a clever trick to detect this situation: also set a non-standard base of CS, so that when the interrupt handler returned it would go to somewhere else in your code (since CS would have been reloaded to its normal base). That way, a REP MOVS instruction could run with interrupts enabled and be restarted.
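The effect can be modelled with a toy segment register that keeps the visible selector separate from the hidden base. This is only a sketch of the idea: the class `SegReg`, the selectors and the shifted code base are hypothetical values chosen for illustration, and limits and access rights are left out entirely.

```python
class SegReg:
    """Toy model: visible selector plus the hidden base used for addressing."""

    def __init__(self, selector: int):
        self.load_real_mode(selector)

    def load_real_mode(self, selector: int) -> None:
        # A normal real-mode load derives the base from the selector
        self.selector = selector & 0xFFFF
        self.base = self.selector << 4

    def loadall(self, selector: int, base: int) -> None:
        # LOADALL sets selector and 24-bit base independently of each other
        self.selector = selector & 0xFFFF
        self.base = base & 0xFFFFFF

    def phys(self, offset: int) -> int:
        # Only the hidden base takes part in address generation
        return (self.base + (offset & 0xFFFF)) & 0xFFFFFF


# ES is pointed at extended memory, CS at a shifted copy of the code
es = SegReg(0x1000)
es.loadall(0x1000, 0x100000)
cs = SegReg(0x2000)
cs.loadall(0x2000, 0x020100)          # hypothetical non-standard code base

in_extended = es.phys(0x0010)         # above 1 MB while the base survives
copy_loop = cs.phys(0x0040)           # where the copy loop is fetched from

# An interrupt: IRET reloads CS, and a handler that saves and restores ES
# reloads that too, so both bases fall back to selector * 16
cs.load_real_mode(cs.selector)
es.load_real_mode(es.selector)

restart_stub = cs.phys(0x0040)        # same IP now lands on different code
after_reload = es.phys(0x0010)        # back below 1 MB
```

Because the same IP resolves to a different physical address after the reload, code placed at that second address can notice the interruption and redo the LOADALL before resuming the copy.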
¹ basically a ROM, addressed by the opcode and mode bits, with some of them being ignored. More on this at the end!
² because this decoding can happen while a previous instruction is still executing, the LMSW instruction used to enter protected mode should be followed by a (near) jump so that the decoded instruction queue is flushed.
The access rights byte
bit 7   : valid
    6-5 : DPL
    4   : ignored?
    3   : code segment
    2   : expand-down / conforming
    1   : writable / readable
    0   : accessed (only set on pm descriptor load)
In real mode, every segment register load sets this to 82h. This is the value for a writable ring 0 data segment, except with bit 4 cleared (which seems to have no effect). The value for CS is the same, so it is also writable - using LOADALL, it can be made expand-down as well.
When loaded in protected mode, the byte will match the descriptor table entry, and bit 0 (accessed) will always be set.
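The bit assignments can be checked with a short decoder. This is a sketch only: the function name and the dictionary keys are invented for the example, and bits 2 and 1 are named according to whether bit 3 marks the segment as code or data.

```python
def decode_access(ar: int) -> dict:
    """Decode a descriptor cache access rights byte into named fields."""
    code = bool(ar & 0x08)
    return {
        "valid": bool(ar & 0x80),
        "dpl": (ar >> 5) & 0x03,
        "bit4": bool(ar & 0x10),          # appears to be ignored
        "code": code,
        # Bit 2 means "conforming" for code and "expand-down" for data
        ("conforming" if code else "expand_down"): bool(ar & 0x04),
        # Bit 1 means "readable" for code and "writable" for data
        ("readable" if code else "writable"): bool(ar & 0x02),
        "accessed": bool(ar & 0x01),
    }
```

Decoding 82h gives a valid, DPL 0, writable data segment with the accessed bit clear, matching the real mode value described above. 86h is the same segment made expand-down.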
The Current Privilege Level (CPL) is always determined by the DPL field of the stack segment.
If the valid bit on a normal segment (or LDTR) is clear, any access causes a protection fault. For GDTR, IDTR and TSS the access rights byte does not exist, and reads as FFh. LDTR only has bit 7, with the others reading as set.
While TSS can't be marked as invalid, the limit is checked like for every other segment.
There are 10 registers which haven't been described so far. The values loaded into them don't have any effect, but they are used by the microcode as places for temporary data.
Some documents about early SMM on the 386 give the names "tmpa", "tmpb" ... "tmph", as well as "tst" and "idx". Not very meaningful, and the order in which these registers would appear on the 286 isn't clear since it could be the reverse. A diagram in the 286 patent shows similar names.
I will just refer to them as X0 through X9, in the order that they appear in memory.
One of the first things I tried is to test if all of these can in fact be loaded with arbitrary values. Only two couldn't, and that is because they are used by the LOADALL instruction itself (but interestingly not by STOREALL): X1, for some reason, gets the access rights of one of the segment descriptors (usually ES? dependent on some random timing?), and X8 is used as an address register. When LOADALL is finished, it will always have the value 864h, pointing to the last word loaded.
X1 generally seems to be used for protection checks: it gets loaded with either the word containing a descriptor's access rights or with the MSW (by floating point opcodes). It may be the one shown in the patent as connected to the "TEST PLA".
These don't use any of the temporaries:
- MOV (except to segment register in protected mode)
- DEC, shift/rotate and other ALU operations (except for immediate operands)
- conditional jumps
- other jumps (except inter-segment in prot. mode)
If there is an immediate operand to SUB, etc., it will be loaded into X9. That register is also used for memory-to-memory compares (CMPS): it is loaded with the byte/word at
Unlike the 8086 (and 186), other operands don't have to first be loaded into temporary ALU registers.
Remember to back up your data
Some instructions normally don't do much, but may cause exceptions in which case all of their effects will need to be reverted. So they will either save the old register values in temporaries, or hold updated ones there until they can be sure to complete successfully:
- POP saves the previous stack pointer in X3.
- REPeated string instructions do this too for some reason?
- LOOP puts the decremented CX value in X2 (while unusual, like any conditional jump it can also go forward, potentially causing a CS limit violation)
- RET puts the new IP into X2 (near) or X9 (far), and SP into X8
- probably many similar ones
X0, X5 and X6 seem to be only used in protected mode, which I didn't test extensively. Task switching probably uses all 10, given how complicated an operation it is.
X4 is also mostly for protected mode and a specific purpose: it contains a copy of the error code pushed on some exceptions.
X7 gets the start offset of a floating point operand, with the segment limit in X8. This must be passed to the FPU interface, which acts somewhat like a DMA controller and contains its own base and limit registers.
State after reset
MSW    FFF0
FLAGS  0002   all defined bits are cleared
X2     002A   answer to life, the universe & everything?
ES     0000   base=000000 limit=FFFF attr=82
CS     F000   base=FF0000 limit=FFFF attr=82
SS     0000   base=000000 limit=FFFF attr=82
DS     0000   base=000000 limit=FFFF attr=82
IDTR          base=000000 limit=FFFF
TR     0000   -- -- -- descriptor cache preserved
LDTR   0000   -- -- -- descriptor cache preserved
All other registers keep the value they had before reset. The value of X2 is the most unusual thing here, and might even be an easter egg?
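The CS line is the one place in this table where the cached base does not equal selector × 16. A short calculation shows the consequence. It assumes the documented reset IP of FFF0h, which is not part of the table above, and the helper name is made up for the example.

```python
def real_mode_base(selector: int) -> int:
    """Base that a normal real-mode segment load would produce."""
    return (selector & 0xFFFF) << 4


RESET_CS_SELECTOR = 0xF000
RESET_CS_BASE = 0xFF0000      # from the reset table
RESET_IP = 0xFFF0             # assumption: documented reset value of IP

# First instruction fetch uses the cached base, not the selector
first_fetch = RESET_CS_BASE + RESET_IP

# After any reload of CS with the same selector, the base is recomputed
reloaded_base = real_mode_base(RESET_CS_SELECTOR)
same_ip_after_reload = reloaded_base + RESET_IP
```

Here `first_fetch` is FFFFF0h, at the very top of the 16 MB address space, while the same selector and IP after a reload of CS would address 0FFFF0h, just below 1 MB.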
Reading out all of the values immediately after power-up would require a custom ROM, instead of the cheap trick I used (overwriting the BIOS entry point in shadow RAM).
A normal BIOS will overwrite most of the registers during a cold boot, except for X4, X6 and X7. There are some differences between chips that can be observed in these:
X4    X6    X7    chip
FFFF  FFFF  FFFF  N80L286-10/S 003D42S
0000  0000  0000  N80C286-12 ET 037E6KX
5EEC  CD8D  8BFC  HARRIS CS80C286-16 F3360
The last one could be some kind of early CPUID?
After "triple-fault" shutdown
After shutdown & reset from protected mode, some information about what caused the exception is preserved:
X0   0040
X1   source of limit violation on first exception
       6CFF  GDT
       6DFF  LDT
       6EFF  IDT
       6FFF  TSS
       70FF  ES
       71FF  should be CS? never seen
       72FF  should be SS? never seen
       73FF  DS
X4   0000 (error code from double fault)
X8   selector if exception was from segment load, else zero
This might be a side effect of the exception handling rather than something intentional, because if the source is CS or SS, X1 gets loaded with the access rights for CS instead.
In real mode, only X4 will be set to zero.
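For anyone collecting such dumps, the X1 values listed above can be turned into a simple lookup. This is just a convenience sketch with made-up names. The entries for CS and SS are the expected but never observed codes, and any other value is reported as unknown rather than guessed at.

```python
# X1 values seen after shutdown, as listed above
SHUTDOWN_SOURCES = {
    0x6CFF: "GDT",
    0x6DFF: "LDT",
    0x6EFF: "IDT",
    0x6FFF: "TSS",
    0x70FF: "ES",
    0x71FF: "CS (expected, never observed)",
    0x72FF: "SS (expected, never observed)",
    0x73FF: "DS",
}


def describe_x1(x1: int) -> str:
    """Name the source of the first limit violation, if X1 holds a known code."""
    return SHUTDOWN_SOURCES.get(
        x1 & 0xFFFF,
        "unknown (possibly CS access rights, if the source was CS or SS)",
    )
```

For example, `describe_x1(0x6EFF)` returns `"IDT"`.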
What more is there?
Honestly, most of this is not all that useful, but it offers some insights into how the chip works.
Using STOREALL to single-step through code is only possible in ring 0, since it is privileged. It also takes 3 bytes, which have to be inserted after every instruction to be stepped through. I don't think there is any equivalent of the trap flag to trigger it automatically, since the ICE hardware wouldn't have needed it.
With custom hardware, it might be possible to blindly put opcodes on the bus and have them executed in ICE mode. Even if it works, it is somewhat unlikely that there are any extra registers that can be accessed this way. Another idea is to run one instruction again and again, starting from a defined state and each time resetting it at a different clock cycle to observe what happens when.
Getting the actual microcode bits certainly isn't possible without some very high-res photos of the die, and possibly delayering.
The best I could find is on visual6502.org. You can see how the ROM is split in two, with the upper half providing control lines and 3 x 6 bits in the lower half selecting which registers are put on the internal buses. Unfortunately, it seems too dense to get anything out of it, but I would be happy to be proven wrong by some deep learning or signal processing wizardry!
The entry point PLA is in the two blocks below, and similar to the decoder in the 8086 and 186. Staring at this for a long time, I could make out some of it, enough to confirm that there are no more hidden opcodes: all of them have matching patterns that go to the same entry point, which must be the one that generates the "invalid opcode" exception.