Total thesis length: 24,128 characters

Abstract (Chinese version)

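Simulation is one of the main means of design space exploration in processor architecture design. Commonly used approaches include RTL simulation and high-level cycle-accurate simulation, but both are slow and inefficient. One of the more widely adopted alternatives today is therefore performance evaluation through analytical modeling. The input to an analytical model is generally feature information obtained from a software trace, and the efficiency of feature extraction largely determines the evaluation efficiency of the analytical model. Existing trace-based feature extraction techniques, however, have many problems: some support only the X86 platform; some support multiple instruction set architectures but are very inefficient; and others only perform simulation without any statistical analysis of instruction information.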

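To obtain the statistics required for analytical modeling quickly, and thereby speed up the whole modeling process, we modified the source code of the QEMU emulator to build a QEMU-based tool that supports multiple instruction set architectures and performs tracing and analysis quickly. It analyzes the binary code of the target architecture directly, collects and tallies memory-access trace information while QEMU translates its intermediate code into host code, and finally outputs the program's Reuse Distance Distribution (RDD) and Stack Distance Distribution (SDD).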

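Compared with existing trace-based feature extraction techniques, the tool not only supports multiple instruction set architectures but is also fast and accurate. Tests show that its time overhead for extracting memory-access information is only 1.24 times that of PIN and 12.3% of that of GEM5. With sampling enabled, its speed rises to 5.2 times the unsampled speed. Compared with GEM5's results, the average errors of its RDD and SDD statistics are 1.531% and 1.265% respectively; even with sampling (sampling rate 1/1000), the average errors are only 1.910% and 1.904%.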

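We therefore conclude that the tool is accurate and fast enough for extracting memory-access information, and that it can greatly improve the evaluation efficiency of analytical modeling.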

Keywords: QEMU, memory access address, reuse distance distribution, stack distance distribution

Abstract

Designing a processor architecture requires evaluating its performance, and analytical modeling is one way to do so. Building an analytical model requires statistics on the memory reference addresses of a program, so the speed at which these statistics are gathered directly affects the speed of modeling.

However, existing trace tools have many problems. Some support only the X86 platform, while others support multiple instruction set architectures but are very inefficient.

To solve these problems, we built a tool based on QEMU that gathers these statistics quickly and precisely in order to speed up analytical modeling. Unlike existing tools, it supports multiple instruction set architectures and traces and analyzes memory reference addresses quickly. The tool analyzes the binary code of the target architecture directly and extracts the address stream while QEMU translates intermediate code into host code. After processing the address stream, it outputs the Reuse Distance Distribution (RDD) and the Stack Distance Distribution (SDD).

Compared with GEM5, the deviation is 1.531% for RDD statistics and 1.265% for SDD statistics. With sampling enabled (sample rate = 1/1000), the deviations are 1.910% (RDD) and 1.904% (SDD). The tool is also fast: its time cost is 1.24 times that of PIN and 12.3% of that of GEM5, and enabling sampling raises its speed to 5.2 times the unsampled speed.

We therefore conclude that our tool can take the place of existing trace tools when used for analytical modeling.

KEY WORDS: QEMU, Memory Reference Address, Reuse Distance Distribution, Stack Distance Distribution

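Abstract (Chinese version)

Abstract

Chapter 1 Introduction

1.1 Research Background and Significance

1.2 Research Status at Home and Abroad

1.2.1 QTrace

1.2.2 Other Related Work

Chapter 2 Foundations of the Trace Tool Implementation

2.1 QEMU

2.1.1 QEMU Basics

2.1.2 Dynamic Translation

2.1.3 QEMU Implementation Flow

2.2 Reuse Distance Distribution and Stack Distance Distribution

Chapter 3 Design

3.1 Obtaining Memory Access Addresses

3.2 Red-Black Tree^[15] (R-B Tree)

3.2.1 Binary Tree

3.2.2 Binary Search Tree

3.2.3 Red-Black Tree

3.3 RDD/SDD Statistics

3.4 Sampling

Chapter 4 Experimental Results

4.1 Comparison of Experimental Results

4.1.1 SPEC CPU2006

4.2 Accuracy on the X86 Platform

4.2.1 Accuracy of Load/Store Instruction Counts

4.2.2 Accuracy of RDD/SDD Statistics

4.3 Speed Comparison

Chapter 5 Conclusion

Acknowledgments

References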

Chapter 1 Introduction

1.1 Research Background and Significance

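A key problem in modern processor design is the design of the memory hierarchy. When the cache in a memory hierarchy is modeled analytically, one commonly used kind of model is the statistical model. The inputs to such a model are generally the reuse distance distribution and the stack distance distribution, which measure data locality and play a very important role in analytical cache modeling. Kecheng Ji presented a framework that takes the stack distance distribution as input to evaluate cache performance on out-of-order processors^[1]. Based on analysis of a program's trace, it builds an analytical model to evaluate cache performance, and it can quickly estimate memory-level parallelism (MLP) and the effective cache miss service time without simulation. Another analytical model that takes the SDD as input, an Artificial Neural Network (ANN) model^[2], can quickly predict the behavior of a private LRU cache on an out-of-order processor, such as the number of memory access misses of a program.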
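To illustrate why a stack distance distribution is a useful model input, the sketch below applies the classic stack-distance property of a fully associative LRU cache: an access hits only if fewer distinct lines than the cache capacity were touched since the previous access to the same line. This is not the framework of ^[1] or the ANN model of ^[2]; the function name `lru_misses`, its parameters, and the histogram values are hypothetical and chosen only for illustration.

```python
def lru_misses(sdd, cold, capacity):
    """Estimate the misses of a fully associative LRU cache.

    sdd:      dict mapping stack distance -> number of accesses
    cold:     number of first-time accesses (these always miss)
    capacity: cache size in lines

    Assumption: stack distance counts the distinct lines touched
    since the previous access to the same line, so an access
    hits exactly when its distance is smaller than the capacity.
    """
    return cold + sum(n for d, n in sdd.items() if d >= capacity)


# Hypothetical histogram, for illustration only.
sdd = {0: 50, 1: 30, 4: 15, 64: 5}
print(lru_misses(sdd, cold=10, capacity=4))   # 30
```

A single histogram can be reused to estimate misses for any cache capacity, which is what makes the distribution attractive compared with rerunning a simulation for each configuration.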

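Obtaining these two sets of statistics requires extracting the memory accesses made while a program runs and computing statistics over them. This process often takes a great deal of time, and its speed affects, to some extent, the speed of the whole analytical modeling process. A tool is therefore needed that supports multiple instruction set architectures, traces and tallies memory access information, and offers both high speed and high statistical accuracy.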
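As a rough illustration of the statistics involved, the sketch below computes both histograms from a short address trace. The definitions used are an assumption, since the visible text does not state them: reuse distance is taken as the number of accesses between two consecutive accesses to the same address, and stack distance as the number of distinct addresses among them. The function name `distance_histograms` and the sample trace are hypothetical.

```python
from collections import Counter


def distance_histograms(trace):
    """Return (rdd, sdd) histograms for a list of addresses.

    Assumed definitions (not taken from the thesis text):
      reuse distance: accesses between two consecutive accesses
                      to the same address
      stack distance: distinct addresses among those accesses
    First-time accesses are counted under the key None.
    """
    last_seen = {}                 # address -> index of previous access
    rdd, sdd = Counter(), Counter()
    for i, addr in enumerate(trace):
        prev = last_seen.get(addr)
        if prev is None:
            rdd[None] += 1
            sdd[None] += 1
        else:
            between = trace[prev + 1:i]        # accesses in between
            rdd[len(between)] += 1
            sdd[len(set(between))] += 1
        last_seen[addr] = i
    return rdd, sdd


trace = ["a", "b", "c", "b", "a", "a"]
rdd, sdd = distance_histograms(trace)
print(dict(rdd))
print(dict(sdd))
```

The scan over `between` makes this sketch quadratic in the worst case, which is one reason the statistics step can dominate run time on long traces. The outline lists a red-black tree section ahead of the RDD/SDD statistics section, which suggests the tool relies on a faster structure; this sketch does not attempt to reproduce it.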

Design of a Program Trace Analysis Tool Based on the QEMU Emulator
