From: Dan Allen
Sent: Saturday, July 21, 2001 12:08 PM
To: MPW
Subject: Unix / Win / Mac comparison
Awk Benchmark
20 Jul 2001
By Dan Allen
I wrote an interesting Awk script that greps through a list of over
150,000 US cities and does some calculations.  This script runs extremely
slowly under Mac OS 10.0.4, hereafter referred to as Unix.  Here are the
details:
UNIX
====
On a Macintosh G4 Cube with 450 MHz G4 processor, 320 MB of RAM, and Mac
OS 10.0.4, I ran the following from Terminal:
	awk -f cities.awk USCities.txt
This took 1,800 seconds (30 minutes!!) to run on 10.0.4, and it used over
300 MB of virtual RAM.
Here are a few snapshots from the "top" command while it ran:
  PID COMMAND   %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
  497 awk      69.7% 13:23.95   1    16    21   245M+  100K-  245M+  247M+
  497 awk      80.9% 19:01.76   1    16    22   231M+  220K   161M+  282M+
  497 awk      83.6% 20:57.76   1    16    22   233M   220K   180M+  300M+
  497 awk      87.9% 22:58.63   1    16    22   233M-  220K   193M+  314M
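(In top's display, RPRVT, RSHRD, RSIZE, and VSIZE are the resident private,
resident shared, total resident, and total virtual memory sizes; a trailing
+ or - marks a value that grew or shrank since the previous sample.)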
I built the awk I used myself from the sources on Brian Kernighan's website,
using the GCC that comes with Apple's Developer CD at the -O3 level of
optimization, which results in an executable 136,476 bytes in size.
Small awk scripts run in an acceptable time, but this large script exposes
a major problem in memory allocation.
WINDOWS 2000
============
If I run the same command on a 450 MHz Pentium III Dell computer using
Windows 2000 and 256 MB of RAM, the job runs in 50 seconds and uses 32 MB
of RAM.  (The Mem Usage is 32 MB and the VM Size is 32 MB, as measured by
the Windows Task Manager.  This does not mean 64 MB of RAM--it means that
all of the RAM that was virtually allocated also was resident in physical
RAM.)  For Windows I built awk using Visual C++ 6.0 -Ox from the exact same
sources as for Unix and Mac.  The resulting awk binary is 139,264 bytes
in size.
MAC OS 9.1
==========
If I run the same script on the same G4 Cube but running under Mac OS 9.1
and MPW, the job runs in 269 seconds and uses a maximum of 26 MB of memory.
This version of awk is 101,938 bytes in size and was built using MPW's
MRC 4.1 with the -opt speed level of optimization, thus all 3 builds of
awk were optimized for speed.  (Note: The sort was actually done by piping
the output of the cities.awk script into the sort tool under MPW, as my port
of Awk under MPW does not support piping or the system command from an MPW
tool.  The sort, however, happens nearly instantly on all systems, so it is
in the noise.)
SUMMARY
=======
On all three platforms (Mac, Unix, Windows) the same results are returned,
a file about 679 KB in size.  (It is a list of cities showing the city names
that are the most commonly used and a list of the states that each city is
found in.  It also computes the geographic center of the cities that share
a common name.)
Comparative Stats:
Windows is 36 times faster than Unix.
Windows is 5.4 times faster than Mac OS.
Mac OS is 6.7 times faster than Unix.
Mac OS uses 26 MB of memory, about 81% of Windows.
Windows uses 32 MB of memory, about 10% of Unix.
Unix uses 314 MB of memory, about 12 times Mac OS.
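(These ratios follow directly from the numbers above: 1800/50 = 36,
269/50 = about 5.4, 1800/269 = about 6.7; likewise 26/32 = about 81%,
32/314 = about 10%, and 314/26 = about 12.)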
This explains why Apple's new Unix is incredibly slow compared to the Mac OS:
memory allocation has major problems!  Why is over 314 MB allocated when the
other operating systems use 25-32 MB of RAM?  It seems to me that solving this
problem would solve a lot of Mac OS X sluggishness problems, such as those in
the Finder and menus.
APPENDIX 1 - Cities.awk script used in this test
==========
#!/Users/danallen/bin/awk -f
# cities.awk  - city name frequencies from USCities.txt
# 17 Jul 2001 - Created by Dan Allen.
# 19 Jul 2001 - Average location computed.
# 20 Jul 2001 - Ported to OS X.
BEGIN { FS = OFS = "\t" }
$3 == "Puerto Rico" || $3 == "Guam" { next }
{
	name = $1
	lat = DMS2Dec($4)
	lon = DMS2Dec($5)
	if (lat ~ "NAN" || lon ~ "NAN") next
	sub(/,.*/,"",name)
	sub(/ ?\(.*\)/,"",name)
	sub(/St\.? /,"Saint ",name)
	if (!index(a[name],$3)) {
		a[name] = a[name] ", " $3
		avgLat[name] += DMS2Dec($4)
		avgLon[name] += DMS2Dec($5)
		numPts[name]++
	}
}
END {
  for (i in a) {
    n = split(a[i],t,",") - 1
    sub(/^, /,": ",a[i])
    if (n > 3)
      print n, avgLat[i] / numPts[i],avgLon[i] / numPts[i],i a[i] | "sort -n"
    sum += n
  }
  print sum,"total cities"
}
function DMS2Dec(dms,  neg) { # ddmmssx --> dd.ddddd
  CONVFMT = "%.5f"
  if (length(dms) == 7) { # latitude
    neg = (substr(dms,7,1) == "S") ? -1 : 1
    return neg * (substr(dms,1,2) + substr(dms,3,2)/60 + substr(dms,5,2)/3600)
  }
  else if (length(dms) == 8) { # longitude
    neg = (substr(dms,8,1) == "E") ? -1 : 1 # direction is the 8th character
    return neg * (substr(dms,1,3) + substr(dms,4,2)/60 + substr(dms,6,2)/3600)
  }
}
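As a worked example of DMS2Dec, using the first data line from Appendix 2:
the latitude field 330603N means 33 degrees, 06 minutes, 03 seconds North,
so the function returns 33 + 6/60 + 3/3600 = about 33.10083; the longitude
field 0853147W means 85 degrees, 31 minutes, 47 seconds West and yields
85 + 31/60 + 47/3600 = about 85.52972.  (In this script only southern
latitudes and eastern longitudes get a negative sign.)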
APPENDIX 2 - Sample from USCities.txt
==========
City		County		State		Latitude	Longitude
Abanda		Chambers	Alabama	330603N	0853147W
Abbeville	Henry		Alabama	313418N	0851502W
Abbot Springs	Shelby		Alabama	332139N	0862854W
Abel		Cleburne	Alabama	333255N	0854245W
.
.
(153,000+ cities omitted)
Daniel K. Allen
Measure | Analyze | Communicate
Perl Update - 27 Jan 2002
I used Perl's a2p tool to create a Perl script that does exactly the
same thing as my original Awk script.  On Mac OS X 10.1.2 the Perl script
runs in... drumroll please...
38 seconds and uses 24.5 MB of memory!
This does not make any sense, of course.  Awk scripts generally run faster
than Perl scripts, yet the Awk version takes 30 minutes and uses 300+ MB of
memory on OS X, which led to the bug report below.  As to why the Perl
version runs so fast, I do not know.  Note that the Awk version runs very
quickly on Mac OS 9.2 and on Windows 2000.
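(For reference, a2p ships with Perl itself: it reads an awk program and
writes an equivalent Perl program to standard output, so the conversion was
presumably just a2p cities.awk > cities.pl.)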
Cities script - Perl version
Just for completeness, here is the Perl script that runs quickly in OS X:
#!/usr/bin/perl
#  cities.pl  - city name frequencies from USCities.txt
# 17 Jul 2001 - Created by Dan Allen. (Paradise, CA)
# 19 Jul 2001 - Average location computed. (Paradise, CA, 3:00 AM)
# 27 Jan 2002 - Converted from Awk to Perl via a2p.
$[ = 1;			# set array base to 1
$\ = "\n";		# set output record separator
$FS = $, = "\t";
open(SORT__N, '|sort -n') || die 'Cannot pipe to "sort -n".';
line: while (<>) {
	chomp;	# strip record separator
	($Fld1,$Fld2,$Fld3,$Fld4,$Fld5) = split($FS, $_, 9999);
	if ($Fld3 eq 'Puerto Rico') { next line; }
	if ($Fld3 eq 'Guam') { next line; }
	$name = $Fld1;
	$lat = &DMS2Dec($Fld4);
	$lon = &DMS2Dec($Fld5);
	if ($lat =~ 'NAN' || $lon =~ 'NAN') { next line; }
	$name =~ s/,.*//;
	$name =~ s/ ?\(.*\)//;
	$name =~ s/St\.? /Saint /;
	if (!index($a{$name}, $Fld3)) {
		$a{$name} = $a{$name} . ', ' . $Fld3;
		$avgLat{$name} += &DMS2Dec($Fld4);
		$avgLon{$name} += &DMS2Dec($Fld5);
		$numPts{$name}++;
	}
}
foreach $i (keys %a) {
	$n = (@t = split(/,/, $a{$i}, 9999)) - 1;
	$a{$i} =~ s/^, /: /;
	print SORT__N $n, $avgLat{$i} / $numPts{$i},$avgLon{$i} / $numPts{$i}, $i . $a{$i} if $n > 3;
	$sum += $n;
}
print $sum, 'total cities';
sub DMS2Dec {
	local($dms, $neg) = @_;
	# ddmmssx --> dd.ddddd
	$CONVFMT = '%.5f';
	if (length($dms) == 7) {
		# latitude 472945N
		$neg = (substr($dms, 7, 1) eq 'S') ? -1 : 1;
		return $neg * (substr($dms, 1, 2) + substr($dms,3,2) / 60 + substr($dms, 5, 2) / 3600);
	}
	elsif (length($dms) == 8) {
		# longitude 1214708W
		$neg = (substr($dms, 8, 1) eq 'E') ? -1 : 1;	# direction is the 8th character
		return $neg * (substr($dms, 1, 3) + substr($dms,4,2) / 60 + substr($dms, 6, 2) / 3600);
	}
}
Timing Comparison
Using the Unix time command shows that awk is running 35 times slower than Perl on
Mac OS X 10.1.2 (comparing user CPU time), as measured on 18 Feb 2002 on a 450 MHz
G4 Cube with 512 MB of RAM:

[~/AwkTrue] % time awk -f cities.awk cities.txt > foo
1472.970u 22.420s 36:08.55 68.9% 0+0k 0+16io 0pf+0w
[~/AwkTrue] % time perl cities.pl cities.txt > foopl
41.540u 0.940s 1:46.48 39.8% 0+0k 3+5io 0pf+0w

(The fields are user CPU seconds, system CPU seconds, elapsed time, and CPU
utilization, followed by memory and I/O counters.)  36 minutes versus less than
2 minutes!
Problem Identified
4 Sept 2002
The problem has finally been identified!
Using the sample command-line profiler I have learned that the problem lies in
deallocating memory.  The system's szone_malloc routine is where the time is spent.
As memory is deallocated by calling free, the allocator tries to coalesce free
blocks, and I believe the bug lies therein, but I am not completely sure yet.
The source for this is available from
http://www.opensource.apple.com/projects/darwin/1.4/projects.html (the Libc-186.tgz
component), in the file Libc/gen.subproj/scalable_malloc.c.  I am studying it now
to see what the problem is.
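The allocation pattern is easy to reproduce outside of awk.  Here is a minimal
C sketch (an illustration only, not code from awk or from Libc) that churns
through millions of small, short-lived allocations much as awk's tempcell
strings do; on an affected system, most of its time should show up under
szone_malloc and free in the sample output:

#include <stdlib.h>
#include <string.h>

/* Illustration only: mimic awk's per-record churn of small string cells. */
int main(void)
{
	enum { ROUNDS = 200, LIVE = 50000 };
	static char *p[LIVE];
	int r, i;
	for (r = 0; r < ROUNDS; r++) {
		for (i = 0; i < LIVE; i++) {
			p[i] = malloc(16 + i % 48);	/* small, varied sizes */
			if (p[i])
				memset(p[i], 'x', 8);	/* touch the block */
		}
		for (i = 0; i < LIVE; i++)
			free(p[i]);	/* each free may trigger coalescing work */
	}
	return 0;
}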
On a new dual 1 GHz G4 tower I was able to get much better performance by changing the routine tfree in Awk's run.c source as follows:
void tfree(Cell *a)	/* free a tempcell */
{
	if (freeable(a)) {
		dprintf( ("freeing %s %s %o\n", NN(a->nval), NN(a->sval), a->tval) );
		#ifdef __APPLE__ /* DKA 4 Sep 2002 */
		a->sval = NULL;	/* drop the pointer instead of calling free */
		#else
		xfree(a->sval);
		#endif
	}
	if (a == tmps)
		FATAL("tempcell list is curdled");
	a->cnext = tmps;
	tmps = a;
}
Note that xfree is a macro that wraps free.
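From memory, its definition in awk.h is approximately this (a sketch, not an
exact quote):

#define xfree(a) { if ((a) != NULL) { free((void *)(a)); (a) = NULL; } }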
My change was to not call free at all but simply to set the pointer to NULL.
This change reduces RSIZE (resident RAM) usage from 320 MB to 83 MB!  Interesting
how NOT freeing memory ends up using much less memory.  Memory management is
definitely screwed up.
Here are the times without the fix and then with the fix on a dual 1 GHz G4 tower
running 10.2:

[~/AwkTrue] % time ./awk -f cities.awk < cities.txt > foo
1079.520u 9.030s 18:21.95 98.7% 0+0k 0+10io 0pf+0w
[~/AwkTrue] % time ./awk -f cities.awk < cities.txt > foo
18.300u 0.830s 0:19.27 99.2% 0+0k 0+12io 0pf+0w

These results show that execution time has gone from 18 minutes down to 19 seconds.
More work is needed to determine why the OS X malloc routines are behaving the way
they are.
Problem Still There
12 Nov 2002
The 10.2.2 updater I installed this morning has not yet fixed the bug.
I'm Getting Upset
27 Jun 2003
Well, we're now up to Mac OS X 10.2.6.  These updates still have not fixed the bug.
Many hangs occur on my system, and the sample tool reveals that most of them occur
in threading and in something related to szone_malloc.  This is the worst part of
Mac OS X!
Fixed!
27 Nov 2003
The bug existed up through 10.2.8.  In Panther, Mac OS X 10.3, the bug finally has been fixed.
That's the good news.  The bad news is that I thought it was the same bug that caused
a lot of the hangs in the system, but it wasn't: I've had 10.3 hang on me in the
Finder just like before.
Many other problems came with 10.3 (it broke printing entirely on my main Mac, Norton
Utilities for OS X no longer works, network volumes often do not show up in Classic,
Favorites are gone, etc.), so I have gone back to 10.2.8.  Sigh.  One step forward,
several steps back.
The Bug Explained
12 Dec 2003
The bug was in libc, in the szone_malloc routines.  I tried doing a diff between Darwin 6.8 (10.2.8) and
Darwin 7.0 (10.3), hoping to see a small fix, but they changed a lot of code in scalable_malloc.c - the
diffs were voluminous.  It all has to do with coalescing small blocks of freed-up
memory, etc.  It sure looks more like a garbage collector for Lisp or Java than a
nice straightforward memory allocator.
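To make the coalescing cost concrete, here is a toy model (a sketch of my own,
not Apple's scalable_malloc.c): it keeps freed ranges in a sorted list and tries
to merge neighbors on every free.  With many small, scattered frees the list
grows, and the per-free scan grows with it, which is the same shape of slowdown
the benchmark exhibits:

#include <stdio.h>

/* Toy coalescing free list (illustration only). */
#define MAXRANGES 20000
typedef struct { long start, end; } Range;
static Range freelist[MAXRANGES];
static int nfree = 0;

static void toy_free(long start, long len)
{
	long end = start + len;
	int i = 0, j;
	while (i < nfree && freelist[i].end < start)	/* linear scan */
		i++;
	if (i < nfree && freelist[i].start <= end) {	/* adjacent: coalesce */
		if (start < freelist[i].start) freelist[i].start = start;
		if (end > freelist[i].end) freelist[i].end = end;
		return;
	}
	for (j = nfree; j > i; j--)	/* not adjacent: insert a new range */
		freelist[j] = freelist[j - 1];
	freelist[i].start = start;
	freelist[i].end = end;
	nfree++;
}

int main(void)
{
	long k;
	/* Non-adjacent blocks never merge, so every free scans a longer list. */
	for (k = 0; k < MAXRANGES; k++)
		toy_free(k * 64, 32);
	printf("%d ranges on the toy free list\n", nfree);
	return 0;
}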
Created: 24 Jul 2001 Modified: 11 Oct 2025