Learning the Intel Assembly Instruction Format - Illustrated (2)

2020-04-21 07:21

C The reg field of the ModR/M byte selects a control register (for example, MOV (0F20, 0F22)).

D The reg field of the ModR/M byte selects a debug register (for example, MOV (0F21, 0F23)).

E A ModR/M byte follows the opcode and specifies the operand. The operand is either a general-purpose register or a memory address. If it is a memory address, the address is computed from a segment register and any of the following values: a base register, an index register, a scaling factor, a displacement.

F EFLAGS Register.

G The reg field of the ModR/M byte selects a general register (for example, AX (000)).

I Immediate data. The operand value is encoded in subsequent bytes of the instruction.

J The instruction contains a relative offset to be added to the instruction pointer register (for example, JMP (0E9), LOOP).

M The ModR/M byte may refer only to memory (for example, BOUND, LES, LDS, LSS, LFS, LGS, CMPXCHG8B).

O The instruction has no ModR/M byte; the offset of the operand is coded as a word or double word (depending on address size attribute) in the instruction. No base register, index register, or scaling factor can be applied (for example, MOV (A0–A3)).

P The reg field of the ModR/M byte selects a packed quadword MMX™ technology register.

Q A ModR/M byte follows the opcode and specifies the operand. The operand is either an MMX™ technology register or a memory address. If it is a memory address, the address is computed from a segment register and any of the following values: a base register, an index register, a scaling factor, and a displacement.

R The mod field of the ModR/M byte may refer only to a general register (for example, MOV (0F20-0F24, 0F26)).

S The reg field of the ModR/M byte selects a segment register (for example, MOV (8C, 8E)).

T The reg field of the ModR/M byte selects a test register (for example, MOV (0F24,0F26)).

V The reg field of the ModR/M byte selects a packed SIMD floating-point register.

W A ModR/M byte follows the opcode and specifies the operand. The operand is either a SIMD floating-point register or a memory address. If it is a memory address, the address is computed from a segment register and any of the following values: a base register, an index register, a scaling factor, and a displacement.

X Memory addressed by the DS:SI register pair (for example, MOVS, CMPS, OUTS, or LODS).

Y Memory addressed by the ES:DI register pair (for example, MOVS, CMPS, INS, STOS, or SCAS).

(2) Operand type codes:

a Two one-word operands in memory or two double-word operands in memory, depending on operand-size attribute (used only by the BOUND instruction).

b Byte, regardless of operand-size attribute.

c Byte or word, depending on operand-size attribute.

d Doubleword, regardless of operand-size attribute.

dq Double-quadword, regardless of operand-size attribute.

p 32-bit or 48-bit pointer, depending on operand-size attribute.

pi Quadword MMX™ technology register (e.g., mm0).

ps 128-bit packed FP single-precision data.

q Quadword, regardless of operand-size attribute.

s 6-byte pseudo-descriptor.

ss Scalar element of a 128-bit packed FP single-precision data.

si Doubleword integer register (e.g., eax)

v Word or doubleword, depending on operand-size attribute.

w Word, regardless of operand-size attribute.

Those are the codes used to decode the operands of an opcode. Here is an example.

Take the machine code C7 again and look it up in the one-byte opcode map.

The entry shows that it belongs to Grp 11 (1A) and carries the operand codes Ev and Iv. From the code descriptions above:

E A ModR/M byte follows the opcode and specifies the operand. The operand is either a general-purpose register or a memory address. If it is a memory address, the address is computed from a segment register and any of the following values: a base register, an index register, a scaling factor, a displacement.

I Immediate data. The operand value is encoded in subsequent bytes of the instruction.

v Word or doubleword, depending on operand-size attribute.

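Put in my own words:

Ev: a ModR/M byte follows the opcode and specifies an operand that is a word or a doubleword, depending on the operand-size attribute. The operand is either a general-purpose register or a memory address. If it is a memory address, the address is computed from a segment register plus any of a base register, an index register, a scaling factor, and a displacement.

Iv: immediate data whose size is a word or a doubleword, depending on the operand-size attribute.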
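The way an operand code such as Ev splits into an addressing-method letter and an operand-type suffix can be sketched in a few lines of Python. This is only an illustration: the dictionaries hold a small subset of the codes listed above, and the names `ADDRESSING`, `OPERAND_TYPE` and `describe_operand` are made up for this sketch.

```python
# Subset of the addressing-method codes (upper-case letter) listed above.
ADDRESSING = {
    "E": "ModR/M: general-purpose register or memory",
    "G": "ModR/M reg field: general register",
    "I": "immediate data",
    "J": "relative offset",
    "M": "ModR/M: memory only",
    "O": "direct offset, no ModR/M byte",
}

# Subset of the operand-type codes (lower-case suffix) listed above.
OPERAND_TYPE = {
    "b": "byte",
    "w": "word",
    "d": "doubleword",
    "q": "quadword",
    "v": "word or doubleword (operand-size attribute)",
}


def describe_operand(code):
    """Split a code such as 'Ev' into (addressing method, operand type)."""
    method, op_type = code[0], code[1:]
    return ADDRESSING[method], OPERAND_TYPE[op_type]


for code in ("Ev", "Iv"):
    print(code, "->", describe_operand(code))
```

A real decoder would carry the full lists and treat two-letter types such as dq or ps the same way, since the suffix is simply everything after the first letter.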

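In other words, opcode C7 corresponds to the assembly instruction mov ModR/M, Immediate. Starting from the opcode, let us continue with the example machine code 26 c7 84 c8 44 33 22 11 78 56 34 12. My own breakdown is:

Prefix 26: ES segment-override prefix (1 byte)

Primary opcode C7: mov ModR/M, Immediate (1 byte)

ModR/M: 84, i.e. 10-000-100 (1 byte, split 2-3-3)

SIB: C8, i.e. 11-001-000 (1 byte, split 2-3-3)

Displacement (can be thought of as an offset, at most 4 bytes): 44 33 22 11 (4 bytes, stored low byte first), i.e. 11223344

Immediate: 78 56 34 12 (4 bytes, stored low byte first), i.e. 12345678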
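The 2-3-3 split and the low-byte-first ordering are easy to check by hand or with a short script. The following is a minimal sketch; the helper name `split_233` is invented here, and the byte offsets assume exactly the layout of this one example (one prefix, one opcode byte, ModR/M, SIB, disp32, imm32).

```python
def split_233(byte):
    """Split a ModR/M or SIB byte into its 2-bit, 3-bit and 3-bit fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


code = bytes.fromhex("26c784c84433221178563412")

mod, reg, rm = split_233(code[2])        # ModR/M byte 0x84
scale, index, base = split_233(code[3])  # SIB byte 0xC8

# Multi-byte values are stored little-endian (lowest byte first).
disp = int.from_bytes(code[4:8], "little")
imm = int.from_bytes(code[8:12], "little")

print(f"ModR/M: {mod:02b}-{reg:03b}-{rm:03b}")
print(f"SIB:    {scale:02b}-{index:03b}-{base:03b}")
print(f"disp = 0x{disp:08x}, imm = 0x{imm:08x}")
```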

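Since the operand codes of the opcode fix the addressing method, the next steps are:

Look up the ModR/M byte in the table "32-bit addressing forms with the ModR/M byte": mod = 10 with R/M = 100 selects the form [--][--]+disp32, which means that a SIB byte follows, and after it a 32-bit displacement. The reg field (000) does not name a register here; because C7 is a Grp 11 opcode, it is the opcode extension that selects MOV.

Look up the SIB byte in the table "32-bit addressing forms with the SIB byte": base = 000 gives EAX, and index = 001 with scale = 11 gives ECX*8.

Because the destination operand is a memory address, the opcode determines that the address is computed from the segment register plus the base register, the scaled index register and the displacement, that is:

EAX+ECX*8+11223344

Last comes the immediate, whose length is set by the operand size; here it is the 32-bit value 12345678.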

Putting the analysis together, the machine code 26 c7 84 c8 44 33 22 11 78 56 34 12 translates into the following assembly:

mov es:[eax + ecx * 8 + 0x11223344], 0x12345678
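The whole walkthrough can be replayed as a tiny decoder. The sketch below is deliberately narrow: it assumes 32-bit mode and handles only an optional segment prefix followed by C7 /0 with mod = 10 and R/M = 100 (SIB plus disp32) and a 32-bit immediate. The function name `decode_c7_sib_disp32` is hypothetical and nothing here is a general disassembler.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
SEG_PREFIX = {0x26: "es", 0x2E: "cs", 0x36: "ss", 0x3E: "ds", 0x64: "fs", 0x65: "gs"}


def decode_c7_sib_disp32(code):
    """Decode [segment prefix] C7 /0 with mod=10, R/M=100, then disp32 and imm32."""
    pos = 0
    seg = None
    if code[pos] in SEG_PREFIX:          # optional segment-override prefix
        seg = SEG_PREFIX[code[pos]]
        pos += 1
    if code[pos] != 0xC7:
        raise ValueError("only opcode C7 is handled in this sketch")

    modrm, sib = code[pos + 1], code[pos + 2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if (mod, reg, rm) != (0b10, 0b000, 0b100):
        raise ValueError("only mod=10, reg=/0, R/M=100 is handled")

    scale = 1 << (sib >> 6)              # 00,01,10,11 -> 1,2,4,8
    index, base = (sib >> 3) & 7, sib & 7
    if index == 0b100:
        raise ValueError("index=100 (no index register) is not handled")

    disp = int.from_bytes(code[pos + 3:pos + 7], "little")
    imm = int.from_bytes(code[pos + 7:pos + 11], "little")

    addr = f"[{REG32[base]} + {REG32[index]} * {scale} + 0x{disp:x}]"
    if seg:
        addr = f"{seg}:{addr}"
    return f"mov {addr}, 0x{imm:x}"


print(decode_c7_sib_disp32(bytes.fromhex("26c784c84433221178563412")))
```

Every branch that raises `ValueError` marks a case the tables do define but this sketch leaves out, which gives a rough idea of how much a complete decoder has to cover.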

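This raises another question: the SIB byte is not mandatory, so what causes it to appear? The ModR/M table answers it: with 32-bit addressing, a SIB byte follows whenever mod is not 11 and R/M is 100.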

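To summarize:

First, starting from the machine code of one line of assembly, we obtained the encoded byte sequence and divided it into parts, identifying the six components of a general instruction (not all of which are always present);

Then we explained each component. The key to that explanation lies in three tables from the Intel documentation: the one-byte opcode map, "32-bit addressing forms with the ModR/M byte" and "32-bit addressing forms with the SIB byte". Looking things up in these tables determines exactly what an opcode means;

Finally, with the tables at hand, the corresponding assembly code can be written out!

Exciting, isn't it? I am finally on the road to learning the Intel instruction format, although writing my own disassembly engine is still a long way off! Written on the evening of 2011-01-24, 23:16.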

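Section 3

Since the two sections above show that the key step is looking things up in tables, today let us see how to read the tables Intel gives us in full.

First, here is how the official documentation describes the opcode tables: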

Use the opcode tables in this chapter to interpret IA-32 and Intel 64 architecture object code. Instructions are divided into encoding groups:

• 1-byte, 2-byte and 3-byte opcode encodings are used to encode integer, system, MMX technology, SSE/SSE2/SSE3/SSSE3/SSE4, and VMX instructions. Maps for these instructions are given in Table A-2 through Table A-6.

• Escape opcodes (in the format: ESC character, opcode, ModR/M byte) are used for floating-point instructions. The maps for these instructions are provided in Table A-7 through Table A-22.

A simple paraphrase:

Instructions are divided into groups according to their encoding:

(1) 1-, 2- and 3-byte opcodes encode integer, system, MMX technology, SSE/SSE2/SSE3/SSSE3/SSE4, and VMX (virtual machine extensions) instructions. Their maps are in Table A-2 through Table A-6.

(2) Escape opcodes are used for floating-point instructions; their maps are in Table A-7 through Table A-22.

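As you can see, the tables are fairly complicated. How do we find a way into them starting from the key information?

(1) In each table, the row index is the high 4 bits of the opcode byte that remains after the escape bytes of a 2- or 3-byte opcode are removed, and the column index is its low 4 bits.

(2) Some 1-byte and 2-byte opcodes use bits 3-5 of the ModR/M byte as an extension of the opcode.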
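Both lookups come down to a little bit arithmetic. The sketch below shows how an opcode byte maps to a row and column of an opcode map, and how bits 3-5 of a ModR/M byte are extracted. The function names are made up for illustration.

```python
def map_position(opcode_byte):
    """Return (row, column) of an opcode byte in a 16x16 opcode map."""
    row = opcode_byte >> 4        # high 4 bits select the row
    column = opcode_byte & 0x0F   # low 4 bits select the column
    return row, column


def opcode_extension(modrm):
    """Return bits 3-5 of the ModR/M byte (the reg/opcode field)."""
    return (modrm >> 3) & 0b111


row, column = map_position(0xC7)
print(f"C7 -> row {row:X}, column {column:X}")
print(f"ModR/M 84 -> opcode extension /{opcode_extension(0x84)}")
```

For the running example, C7 sits in row C, column 7 of the one-byte map, and the ModR/M byte 84 carries extension /0.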

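Here is a short digression on what the ModR/M byte does.

The ModR/M byte is divided into three parts: MOD (bits 7-6), REG/OPCODE (bits 5-3) and R/M (bits 2-0).

1) MOD is the addressing mode. It is two bits wide, so it has four values, but broadly there are only two kinds of addressing: memory addressing and register addressing:

<1> mod = 11: register addressing. ModR/M describes the addressing of two-operand forms, so when an instruction has only a single register operand, the register is often encoded directly in the opcode instead (for example, PUSH 50-57);

<2> mod = 00: register-indirect addressing [register], with no displacement (except R/M = 101, which encodes a bare disp32);

<3> mod = 01: [register + disp8], with an 8-bit displacement;

<4> mod = 10: [register + disp32], with a 32-bit displacement.

That is the overall picture of the addressing modes.

The R/M field then selects the actual operand, as listed in the table "32-bit addressing forms with the ModR/M byte".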
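The mod and R/M rules for 32-bit addressing can be condensed into one small function that tells a decoder what follows the ModR/M byte. This is a sketch under the assumption of 32-bit address size; `modrm_layout` is an invented name, and the one case it does not model is a SIB byte with base = 101 under mod = 00, which adds a disp32 of its own.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def modrm_layout(modrm):
    """Return (operand form, needs_sib, displacement bytes) for 32-bit addressing."""
    mod, rm = modrm >> 6, modrm & 0b111
    if mod == 0b11:                      # register operand, nothing follows
        return REG32[rm], False, 0
    if mod == 0b00 and rm == 0b101:      # special case: bare 32-bit displacement
        return "[disp32]", False, 4
    needs_sib = rm == 0b100              # R/M = 100 means a SIB byte follows
    base = "SIB" if needs_sib else REG32[rm]
    disp_bytes = {0b00: 0, 0b01: 1, 0b10: 4}[mod]
    suffix = {0: "", 1: " + disp8", 4: " + disp32"}[disp_bytes]
    return f"[{base}{suffix}]", needs_sib, disp_bytes


for byte in (0x84, 0xC0, 0x05, 0x45):
    print(f"{byte:02X} ->", modrm_layout(byte))
```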

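2) REG/Opcode: three bits wide:

<1> As a reg value: 000-111 correspond to RAX, RCX, RDX, RBX, RSP, RBP, RSI and RDI, where the R stands for whichever width applies, for example EAX (32-bit), AX (16-bit) or AL (8-bit). In the 8-bit case, 100-111 select AH, CH, DH and BH. Whether the 32-bit or the 16-bit register is actually used is determined by the opcode byte and the operand-size attribute. The discussion of instruction prefixes already covered how Intel distinguishes 8-bit, 16-bit and 32-bit operands, and registers are identified in the same way.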
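As a small illustration of how the same three bits name different registers depending on operand size, here is a lookup sketch. The table and the function `reg_name` are written for this example and cover only the legacy 8-, 16- and 32-bit registers.

```python
REGISTERS = {
    8:  ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"],
    16: ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"],
    32: ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"],
}


def reg_name(modrm, operand_bits):
    """Name the register selected by the reg field (bits 3-5) of a ModR/M byte."""
    reg = (modrm >> 3) & 0b111
    return REGISTERS[operand_bits][reg]


# ModR/M byte D8 is 11-011-000, so its reg field is 011.
for bits in (8, 16, 32):
    print(bits, "->", reg_name(0xD8, bits))
```

The operand width passed in as `operand_bits` is exactly what the opcode byte and the operand-size prefix decide in a real decoder.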

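<2> As an opcode extension: the other meaning of the reg field is to supplement the opcode by selecting one instruction from a set of opcodes that share an encoding (this is what the Group attribute marks). There are 16 such groups, and this is exactly the case noted in point (2) above:

Bits 3-5 of the ModR/M byte are needed only when the opcode's entry has the group attribute! That is:

(a) the instruction takes no second register operand, only an r/m operand and possibly an immediate; (b) the register among the instruction's operands is implied and already known. Only in these two cases are the three extension bits used to supplement the opcode!

Therefore, from the point of view of implementing a decoder: after reading the opcode byte, determining the actual operation still requires reading the opcode-extension part of the ModR/M byte, and only the two combined give the correct instruction. This is also where different disassembly engines mainly differ!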
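In code, the two-step lookup might look like the sketch below. It models only two groups: Grp 1 (opcodes 80, 81 and 83) and the /0 entry of Grp 11 (opcodes C6 and C7). The table layout and the name `resolve_mnemonic` are assumptions made for this illustration, not how any particular engine does it.

```python
# Mnemonics indexed by the opcode extension (bits 3-5 of ModR/M).
GROUPS = {
    "Grp 1": ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"],
    # Only /0 is modeled for Grp 11; the other slots are left out of this sketch.
    "Grp 11": ["mov", None, None, None, None, None, None, None],
}

# Opcode bytes whose map entry carries a group attribute (partial list).
OPCODE_GROUP = {0x80: "Grp 1", 0x81: "Grp 1", 0x83: "Grp 1",
                0xC6: "Grp 11", 0xC7: "Grp 11"}


def resolve_mnemonic(opcode, modrm):
    """Combine the opcode byte with the ModR/M opcode extension."""
    group = OPCODE_GROUP[opcode]
    extension = (modrm >> 3) & 0b111
    mnemonic = GROUPS[group][extension]
    if mnemonic is None:
        raise ValueError(f"{group} /{extension} is not modeled in this sketch")
    return mnemonic


print(resolve_mnemonic(0xC7, 0x84))  # the running example
print(resolve_mnemonic(0x81, 0xE8))  # 11-101-000 -> /5
print(resolve_mnemonic(0x83, 0xF8))  # 11-111-000 -> /7
```

The same opcode byte 81 therefore yields eight different instructions, which is why the opcode byte alone is not enough to name the operation.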
