Hello and welcome to my blog. This is my first blog posting.
My name is Mark Charney and I work at Intel in Hudson, Massachusetts. Intel has just made available some software that I've been working on for emulation of new instructions: Intel® Software Development Emulator, or Intel® SDE for short. Intel SDE emulates instructions in the SSE4, the AES and PCLMULQDQ instruction set extensions, and also the Intel® AVX instructions. Intel SDE runs on Windows* and Linux* and supports IA-32 architecture and Intel® 64 architecture programs.
Intel SDE is a functional emulator, meaning that it is useful for testing software that uses these new instructions to make sure it computes the right answers. Testing software that uses instructions that do not exist in hardware yet requires an emulator. Intel SDE is not meant for predicting performance.
Intel SDE is actually a "Pintool" built upon the Pin dynamic binary instrumentation system.. The Pin that comes with Intel SDE uses a special version of the software encoder decoder called XED that I also develop. While Intel SDE is primarily useful for learning about the new instructions, it also has some features for doing simple workload analysis. The "mix" tool compute static and dynamic histograms. It can compute histograms by the type of the instruction (ADD, SUB, MUL, etc.) or by "iforms" which are XED classifications of instructions that include the operands, or by instruction length.
Intel SDE is fairly speedy. I actually haven't measured it because it was so much faster than the other emulator we have been using (over 100x faster) that I'm not getting any complaints internally. We routinely run SPEC2006 using Intel SDE using the reference inputs. Most of the inputs can run in several hours while a few of the longer running inputs take about a day. Emulation performance is tricky to measure as each instruction requires a different amount of work and each application is different. I could probably take the slow down relative to a version of SPEC2006 that only used native instructions. The reason that Intel SDE is faster than our previous "trap-and-emulate" emulator basically has to do with the fact that we do not rely on the illegal-instruction exception saving 1000s of cycles dispatching and returning from the emulation routines. Because Intel SDE is built upon Pin, we can JIT-translate the original program and branch to the emulation routines, saving that exception overhead.
Right now, there are several ways that I know about to write programs using the new instructions. If you want to use the SSE4, AES and PCLMULQDQ instructions, then you can use the Intel® Compiler. The Intel Compiler supporting Intel AVX is expected to be released in the first quarter of 2009. GCC4.3 supports SSE4. There is also a version of GCC that supports AES and PCLMULQDQ available in the svn (subversion) respository svn://gcc.gnu.org/svn/gcc/branches/ix86/gcc-4_3-branch . GCC for Intel AVX is under development as well: svn://gcc.gnu.org/svn/gcc/branches/ix86/avx. GNU binutils which includes the "gas" assembler is available for AES, PCLMULQDQ and Intel AVX. Also available are the YASM and NASM assemblers.
If anyone has questions about this or suggestions for something they'd like me to write about, please post a comment. I'd like to hear about what is important to you. There are so many aspects of this that I'd like to describe in future postings:
- How it works
- Isolation issues
- Debugging
- Advanced use options
- Program checkers
Also if you have software questions you can post them to the Intel® AVX and CPU forum at:
/en-us/forums/intel-avx-and-cpu-instructions/