Mac OS X Numerics Benchmarks

AltiVec

27 Jun 2003
Now that the new G5 Macs have been announced, I have become more interested in AltiVec, which is also found in the G4 chips. I wrote a simple program that sums square roots using vecLib.

My current daily machine is a Power Macintosh G4 (Mirrored Drive Doors) with twin 1 GHz CPUs, 1 GB of RAM, and 360 GB of drive space. This machine can sum 100,000,000 single-precision square roots in less than 5 seconds!

Here's the program, which will build on Mac OS X using GCC or under MPW with MRC:


/*
 *	altivec.c - AltiVec benchmark program.
 *	altivec is Copyright Daniel K. Allen, 2003. (This program, not the Velocity Engine.)
 *	All rights reserved.
 *
 *	26 Jun 2003 - Created by Dan Allen in MPW & Terminal simultaneously.
 *
 *	Dual 1 GHz G4 Power Mac running Mac OS X 10.2.6 times:
 *
 *		cc altivec.c -o altivec -framework vecLib -faltivec -O3 -mdynamic-no-pic
 *
 *			100,000,000 square roots in 4.69 seconds
 *	  		1,000,000 square roots in  .05 seconds
 *
 *		MRC altivec.c -opt speed,unroll -vector on 
 *
 *			100,000,000 square roots in 5.03 seconds
 *	  		1,000,000 square roots in  .05 seconds
 *
 *
 */


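/* vecLib declares vsqrtf(): flat header under MPW, framework header on Mac OS X */
#ifdef powerc
#include <vecLib.h>
#else
#include <vecLib/vecLib.h>
#endif
#include <stdio.h>
#include <stdlib.h>
#include <time.h>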


typedef union {
	vector float v;
	float f[4];
} vf;


int main(int argc,char *argv[])
{
	vf a,b;
	int i = 0,n;
	double sum = 0;
	clock_t t = clock();

	n = (argc == 2) ? atoi(argv[1]) : 1000000;
	while (i < n) {
		a.f[0] = i++;		/* load four consecutive integers into the vector lanes */
		a.f[1] = i++;
		a.f[2] = i++;
		a.f[3] = i++;
		b.v = vsqrtf(a.v);	/* four single-precision square roots at once */
		sum += b.f[0];		/* accumulate in double precision */
		sum += b.f[1];
		sum += b.f[2];
		sum += b.f[3];
	}
	t = clock() - t;
	printf("Time: %.2f sec\n Sum: %d sqrts = %.8f\n",t/(float)CLOCKS_PER_SEC,i,sum);
	return 0;
}


/*


MRC altivec.c -o altivec.o -opt speed,unroll -vector on 
PPCLink -o altivec altivec.o "{PPCLibraries}InterfaceLib" "{PPCLibraries}MathLib" "{PPCLibraries}StdCLib" "{PPCLibraries}StdCRuntime.o" "{PPCLibraries}PPCCRuntime.o" "{PPCLibraries}PPCToolLibs.o"  "{PPCLibraries}vecLib"
SetFile altivec -d . -m . -t MPST -c 'MPS '


*/



Floating Point

Jan 2002

A quiet improvement in OS X 10.1.2 appears to be faster math library routines, which have greatly improved numerics benchmark scores. At first glance GCC 2.95.2 now appears to generate code competitive with MRC, but in this case it was the improved math libraries in 10.1.2 that made the difference.

Once again this proves that benchmarking rarely measures an individual component but reflects an entire system: CPU, memory, buses, disks, I/O, an operating system, the compiler, and libraries all contribute to the final result.

Bench Scores

OS Version               iTunes   Compiler            Integer Addition  Sum of Sqrts  Simulation  Most Remote (Trig)
Mac OS 9.2.2             -        MRC -opt speed      1.17              0.23          0.07        0.30
Mac OS X 10.1            -        gcc -O3             1.09              1.15          0.11        0.49
Mac OS X 10.1.1          -        gcc -O3             1.08              1.29          0.11        0.49
Mac OS X 10.1.2          -        gcc -O3             1.09              0.25          0.07        0.27
Mac OS X 10.1.2 & 9.2.2  -        MRC -opt speed      1.15              0.23          0.07        0.30
Windows 2000 SP2         -        Visual C++ 6.0 -Ox  1.12              0.13          0.09        0.28
Mac OS X 10.1            playing  gcc -O3             1.11              1.35          0.12        0.51
Mac OS X 10.1.2          playing  gcc -O3             1.14              0.27          0.07        0.28
Mac OS X 10.1.2 & 9.2.2  playing  MRC -opt speed      1.33              0.27          0.10        0.35
Mac OS 9.2.2             playing  MRC -opt speed      1.25              0.25          0.08        0.33

Mac OS X is now faster than 9.x!

Jun 2003
GCC 3.3 is now out, and at some point I'll update these numbers for my faster machine and the new compiler.

The bench tool is a collection of small numeric loops that are common in scientific and engineering programming. This tool is written in C by Dan Allen, with contributions by Dr. Paul A. Finlayson of JPL. The benchmarks used to take a long time to run. Faster machines will require us to update the benchmarks so that the differences are more apparent and are not lost in the noise.
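
One common way to keep short loops out of the timer noise is to repeat a kernel until a timed batch lasts long enough, then report the average time per call. The sketch below illustrates that idea only; it is not code from the bench tool, and time_kernel, add_kernel, and sink are hypothetical names chosen for this example.

```c
#include <time.h>

/*
 * Run kernel() in batches, doubling the batch size until one timed batch
 * takes at least min_seconds of CPU time. Returns the average seconds per
 * call and stores the final batch size in *reps_out.
 */
double time_kernel(void (*kernel)(void), double min_seconds, long *reps_out)
{
	long reps = 1;

	for (;;) {
		long r;
		double elapsed;
		clock_t t = clock();

		for (r = 0; r < reps; r++)
			kernel();
		elapsed = (double)(clock() - t) / CLOCKS_PER_SEC;
		if (elapsed >= min_seconds) {
			*reps_out = reps;
			return elapsed / reps;
		}
		reps *= 2;	/* batch too short to time reliably; try twice as many calls */
	}
}

/* Example kernel: a small integer addition loop. The volatile sink keeps
 * the compiler from optimizing the additions away. */
static volatile long sink;

void add_kernel(void)
{
	long i;

	sink = 0;
	for (i = 0; i < 10000; i++)
		sink += i;
}
```

A call such as time_kernel(add_kernel, 0.5, &reps) then yields a per-call time that stays meaningful as machines get faster, because the batch size grows to compensate.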


Created:  22 Dec 2001
Modified: 27 Jun 2003