libSIMD
Last updated 17th June 2002
Copyright (C) 2001-2002 by Iain Nicholson
Mathematical library utilising SIMD features of common processors to accelerate many commonly-used algorithms where compilers fear to tread.
Most modern processors possess Single Instruction Multiple Data (SIMD) instructions, which are used mainly in games and multimedia applications such as 3D graphics and image processing. Simply put, these instructions operate on pairs or quadruplets of floating-point (or integer) numbers simultaneously, often in a single clock cycle. Code written to use such instructions can run many times faster than conventional code using legacy floating-point instructions, which generally operate on one value at a time. libSIMD aims to be a cross-platform, Free (as in speech) library of common mathematical operations using SIMD code as much as possible, distributed under the GNU Lesser General Public License (LGPL).
The freely available GNU Compiler Collection (GCC) has been ported to many processors and does a reasonable job of generating efficient code. However, it does not know how to "vectorise", i.e. use SIMD instructions on multiple data where available. Proprietary SIMD libraries exist for many processors, but they often require a commercial C compiler or assembler. In general, they are also not freely redistributable or modifiable in source code form, and their APIs vary from platform to platform, so they are not portable.
libSIMD seeks to address these issues by providing a unified API to scalar, vector, matrix etc. primitives (and eventually higher-level functions) tuned to use these facilities on architectures which support them, and a conventional implementation on all others, such that code can be written to one API and compiled on all architectures on which GCC is implemented.
libSIMD will be in the form of a dynamic library for Unix-like operating systems (and maybe even Windows), compilable with GCC (which is available, Free (GPL) and stable on most platforms). The programmer will be able to code to an API which will be implemented in optimised SIMD code on many platforms and in conventional floating-point code on others, so that their program can take advantage of the performance gains available on those platforms whilst still running on the others.
The main goal is to provide scalar, vector, matrix, trigonometric, complex number, quaternion and FFT operations in the form of a dynamic library which can be compiled with GCC. An API and a basic set of functions will be defined; each function will be implemented both in inline assembler using SIMD instructions and in C, for portability to machines lacking these instructions. Initially, development will be on Linux using AMD 3DNow! instructions (because my own machine is a K6-2). If the concept proves to be sound and the library is found to be useful, there could be implementations of the library functions using other SIMD instruction sets, such as Intel SSE, Sun VIS and Motorola/Apple AltiVec/Velocity Engine. There could also be ports to other operating systems and interfaces to other popular languages such as C++, Delphi, Java and Visual Basic, but these are all a long way off in the future :)
The initial implementation will contain functions in the following categories:
Function calls will be along the lines of simd_foo_bar(a, b, c) where a and b are pointers to the source data and c is a pointer to the destination or result data structure (usually an array).
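As a rough illustration of that calling convention, here is a minimal sketch of what a portable C fallback might look like. The name simd_example_add4f and the four-float vector layout are assumptions made for this example only; they are not taken from simd.h.

```c
#include <stddef.h>

/* Hypothetical example only: the function name and the 4-float vector
   layout are assumptions, not part of the real simd.h. */
void simd_example_add4f(const float *a, const float *b, float *c)
{
    size_t i;

    /* Portable fallback: add the four elements one at a time.
       a and b are the sources, c receives the result. */
    for (i = 0; i < 4; i++) {
        c[i] = a[i] + b[i];
    }
}
```

A caller would pass three arrays of four floats, for example simd_example_add4f(a, b, c), and read the sums back from c.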
The interface to the library in its natural language, i.e. C, will be through the header file simd.h and the library will be compiled as a standard ELF dynamic library.
The ABI will conform to the standard (default) C method of parameter passing and data structure alignment on the machine on which the library is compiled, e.g. 32-bit little-endian, 4-byte aligned on AMD and Intel x86, or 64-bit big-endian, 8-byte aligned on UltraSPARC. This will be a natural consequence of GCC output on the target architecture.
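To see which of these properties a given GCC target actually has, a few small helpers along the following lines could be used. This is only a sketch; the example_ names are hypothetical and are not part of libSIMD.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper type: used only to measure the default alignment
   the compiler gives a float member. */
struct example_float_probe {
    char  pad;
    float value;
};

/* Returns 1 on a little-endian machine and 0 on a big-endian one,
   by inspecting the first byte of an integer holding the value 1. */
int example_is_little_endian(void)
{
    const unsigned int probe = 1u;
    return *(const unsigned char *)&probe == 1;
}

/* Width of a pointer in bits, e.g. 32 or 64. */
size_t example_pointer_bits(void)
{
    return sizeof(void *) * CHAR_BIT;
}

/* Default alignment, in bytes, of a float inside a struct. */
size_t example_float_alignment(void)
{
    return offsetof(struct example_float_probe, value);
}
```

Printing these three values on each target is a quick way to confirm what the compiler's defaults are before relying on them.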
Note: Under Construction. Here will be listed the basic set of function primitives to be implemented. Please look in CVS at the header (.h) files for current status. (Hint : click on the version number, not the file name :->)
When a reasonable set of primitives has been defined and implemented, I'd like to think about higher-level functions, starting with operations on arrays of objects.
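One possible shape for such an array-level function is sketched below: it simply applies a primitive to each element of a contiguous array. Both function names and the packed 4-float layout are assumptions for illustration, not a description of the planned API.

```c
#include <stddef.h>

/* Hypothetical primitive: adds two 4-float vectors, c = a + b. */
static void example_add4f(const float *a, const float *b, float *c)
{
    size_t i;

    for (i = 0; i < 4; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Hypothetical array-level function: a, b and c each hold `count`
   4-float vectors stored back to back. */
void example_add4f_array(const float *a, const float *b, float *c,
                         size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        /* Step through the arrays one 4-float vector at a time. */
        example_add4f(a + 4 * i, b + 4 * i, c + 4 * i);
    }
}
```

Keeping the loop inside the library would give an optimised implementation room to amortise call overhead across the whole array.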
Implementation is underway. There is not much optimisation yet, i.e. no instruction reordering (except in limited cases), no cache prefetching etc. I have decided to include operations on double-precision floating-point numbers in the library, exactly mirroring the single-precision functions, so for each single-precision operation there will be an equivalent double-precision operation. 3DNow! and SSE do not implement double-precision operations, so these will be implemented using conventional floating-point code.
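The pairing of single- and double-precision functions might look something like the sketch below, which uses a four-element dot product as the example. The names and the f/d suffixes are assumptions for illustration; the actual naming in the header files may differ.

```c
#include <stddef.h>

/* Hypothetical single-precision operation: *c = a . b (4 elements). */
void example_dot4f(const float *a, const float *b, float *c)
{
    float sum = 0.0f;
    size_t i;

    for (i = 0; i < 4; i++) {
        sum += a[i] * b[i];
    }
    *c = sum;
}

/* Hypothetical double-precision twin: same interface, same algorithm,
   written in conventional floating-point C. */
void example_dot4d(const double *a, const double *b, double *c)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < 4; i++) {
        sum += a[i] * b[i];
    }
    *c = sum;
}
```

Because the two functions differ only in element type, code written against one can be switched to the other by changing the type and the suffix.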
At the moment, two very simple Makefiles are provided which compile the library and two trivial test programs (one each for single and double precision operations).
A small test program, libsimd_test, is provided. It can be compiled to use the C or 3DNow! implementations of the library functions. Please compare the results for consistency.
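A consistency check of this kind could be automated with a helper like the one sketched here, which compares two result arrays element by element. The function name and the relative-tolerance rule are assumptions for illustration and are not taken from libsimd_test; a tolerance is used rather than exact equality because a SIMD implementation may round slightly differently from plain C.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every element of `got` matches the
   reference within a relative tolerance, 0 otherwise. */
int example_results_agree(const float *ref, const float *got,
                          size_t n, float tol)
{
    size_t i;

    for (i = 0; i < n; i++) {
        float diff  = ref[i] - got[i];
        float scale = ref[i];

        if (diff < 0.0f) {
            diff = -diff;       /* absolute difference */
        }
        if (scale < 0.0f) {
            scale = -scale;     /* magnitude of the reference value */
        }
        if (scale < 1.0f) {
            scale = 1.0f;       /* avoid a vanishing tolerance near zero */
        }
        if (diff > tol * scale) {
            return 0;
        }
    }
    return 1;
}
```

The reference array would come from the C build and the other from the 3DNow! build of the same function.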
As of this writing, I have begun to implement basic functionality, the API and test code simultaneously. I don't like sitting around. :-)
Go here to browse the CVS archive. Alternatively, you can download a source tarball by following the links here. Note that the CVS tree will probably be more up-to-date, but the source tarball should compile and run the test out of the box.
Some time ago I wrote this article explaining briefly how to use the inline assembler in GCC to implement functions in assembly language. There are references to more detailed explanations, the purpose of extended inline assembly syntax, what it means and why it is useful, including how to pass parameters (from C functions) to the assembly language code.
Robin Miyagi has written a much better and more comprehensive tutorial which is available here.
Email me (Iain Nicholson) as ijnicholson at users dot sourceforge dot net. I would be pleased to receive constructive feedback, especially ideas for extra functions to implement, or what you would find (or have found) useful.
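For a flavour of what GCC extended inline assembly looks like, here is a minimal sketch that adds two 4-float vectors. It is not code from libSIMD: the function name is hypothetical, and SSE is used in place of 3DNow! only because SSE is available on every x86-64 machine. On any other target the same function falls back to plain C.

```c
#include <stddef.h>

/* Hypothetical example: c = a + b for 4-float vectors. */
void example_asm_add4f(const float *a, const float *b, float *c)
{
#if defined(__GNUC__) && defined(__x86_64__)
    /* Extended inline assembly (AT&T syntax). %0, %1 and %2 are the
       input operands listed after the second colon; each is a pointer
       held in a general-purpose register ("r"). */
    __asm__ __volatile__ (
        "movups (%0), %%xmm0\n\t"   /* load 4 floats from a          */
        "movups (%1), %%xmm1\n\t"   /* load 4 floats from b          */
        "addps  %%xmm1, %%xmm0\n\t" /* xmm0 = xmm0 + xmm1 (4 adds)   */
        "movups %%xmm0, (%2)\n\t"   /* store 4 floats to c           */
        :                           /* no output operands            */
        : "r"(a), "r"(b), "r"(c)    /* inputs                        */
        : "xmm0", "xmm1", "memory"  /* registers and memory touched  */
    );
#else
    /* Portable fallback for machines without these instructions. */
    size_t i;

    for (i = 0; i < 4; i++) {
        c[i] = a[i] + b[i];
    }
#endif
}
```

The clobber list tells GCC which registers the assembly overwrites, and "memory" tells it that the code reads and writes memory through the pointers, so it must not cache those values across the statement.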
3DNow!, SSE, SSE2, AltiVec, Velocity Engine and VIS are all registered trademarks.
Developed on Slackware 8.0 using Anjuta.