Mac OS X Memory Deallocation Bug

This page discusses a script that exposed an interesting memory deallocation bug in Mac OS X. The script scans a large (9 MB) text file that lists 25,000+ cities of the United States and tries to determine which city names are found most commonly across the 50 states.

Topics include:

  1. The original bug described - 20 Jul 2001
  2. The Awk script
  3. The bug disappears with Perl - 27 Jan 2002
  4. The Perl script
  5. Timings show it is still there - 18 Feb 2002
  6. Bug located and a workaround - 4 Sep 2002
  7. 10.2.2: Problem Still There - 12 Nov 2002
  8. 10.2.6: I'm Getting Upset - 27 Jun 2003
  9. 10.3: Fixed! - 27 Nov 2003
  10. The Bug Explained - 12 Dec 2003
If you want a copy of the script & file of cities, email me.


Mac OS X Bug

From: Dan Allen
Sent: Saturday, July 21, 2001 12:08 PM
To: MPW
Subject: Unix / Win / Mac comparison


Awk Benchmark
20 Jul 2001
By Dan Allen




I wrote an interesting Awk script that greps through a list of over
150,000 US cities and does some calculations.  This script runs extremely
slowly under Mac OS 10.0.4, hereafter referred to as Unix.  Here are the
details:




UNIX
====
On a Macintosh G4 Cube with 450 MHz G4 processor, 320 MB of RAM, and Mac
OS 10.0.4, I ran the following from Terminal:


	awk -f cities.awk USCities.txt


This took 1,800 seconds (30 minutes!!) to run on 10.0.4, and it used over
300 MB of virtual RAM.


Here are a few snapshots from the "top" command while it ran:


  PID COMMAND   %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
  497 awk      69.7% 13:23.95   1    16    21   245M+  100K-  245M+  247M+
  497 awk      80.9% 19:01.76   1    16    22   231M+  220K   161M+  282M+
  497 awk      83.6% 20:57.76   1    16    22   233M   220K   180M+  300M+
  497 awk      87.9% 22:58.63   1    16    22   233M-  220K   193M+  314M


The awk I used I built myself from the sources on Brian Kernighan's website,
using the GCC that comes with Apple's Developer CD at the -O3 level of
optimization, which results in an executable 136,476 bytes in size.
Small awk scripts run in an acceptable time, but this large script exposes
a major problem in memory allocation.




WINDOWS 2000
============
If I run the same command on a 450 MHz Pentium III Dell computer using
Windows 2000 and 256 MB of RAM, the job runs in 50 seconds and uses 32 MB
of RAM.  (The Mem Usage is 32 MB and the VM Size is 32 MB, as measured by
the Windows Task Manager.  This does not mean 64 MB of RAM--it means that
all of the RAM that was virtually allocated also was resident in physical
RAM.)  For Windows I built awk using Visual C++ 6.0 -Ox from the exact same
sources as for Unix and Mac.  The resulting awk binary is 139,264 bytes
in size.




MAC OS 9.1
==========
If I run the same script on the same G4 Cube but running under Mac OS 9.1
and MPW, the job runs in 269 seconds and uses a maximum of 26 MB of memory.
This version of awk is 101,938 bytes in size and was built using MPW's
MRC 4.1 with the -opt speed level of optimization, thus all 3 builds of
awk were optimized for speed.  (Note: The sort was actually done by piping
the output of the cities.awk script into the sort tool under MPW, as my
port of Awk under MPW does not support piping or the system command from
an MPW tool.  The sort, however, happens near instantly on all systems,
so it is in the noise.)




SUMMARY
=======
On all three platforms (Mac, Unix, Windows) the same results are returned,
a file about 679 KB in size.  (It is a list of cities showing the city names
that are the most commonly used and a list of the states that each city is
found in.  It also computes the geographic center of the cities that share
a common name.)


Comparative Stats:


Windows is 36 times faster than Unix.
Windows is 5.4 times faster than Mac OS.
Mac OS is 6.7 times faster than Unix.


Mac OS uses 26 MB of memory, about 81% of Windows.
Windows uses 32 MB of memory, about 10% of Unix.
Unix uses 314 MB of memory, about 12 times Mac OS.


This explains why Apple's new Unix is incredibly slow compared to the Mac OS:
memory allocation has major problems!  Why is over 314 MB allocated when the
other operating systems use 25-32 MB of RAM?  It seems to me that solving this
problem would solve a lot of Mac OS X sluggishness problems, such as those in
the Finder and menus.



APPENDIX 1 - Cities.awk script used in this test
==========


#!/Users/danallen/bin/awk -f
# cities.awk  - city name frequencies from USCities.txt
# 17 Jul 2001 - Created by Dan Allen.
# 19 Jul 2001 - Average location computed.
# 20 Jul 2001 - Ported to OS X.


BEGIN { FS = OFS = "\t" }


$3 == "Puerto Rico" || $3 == "Guam" { next }


{
	name = $1
	lat = DMS2Dec($4)
	lon = DMS2Dec($5)
	if (lat ~ "NAN" || lon ~ "NAN") next
	sub(/,.*/,"",name)
	sub(/ ?\(.*\)/,"",name)
	sub(/St\.? /,"Saint ",name)
	if (!index(a[name],$3)) {
		a[name] = a[name] ", " $3
		avgLat[name] += DMS2Dec($4)
		avgLon[name] += DMS2Dec($5)
		numPts[name]++
	}
}


END {
  for (i in a) {
    n = split(a[i],t,",") - 1
    sub(/^, /,": ",a[i])
    if (n > 3)
      print n, avgLat[i] / numPts[i],avgLon[i] / numPts[i],i a[i] | "sort -n"
    sum += n
  }
  print sum,"total cities"
}


function DMS2Dec(dms,  neg) { # ddmmssx --> dd.ddddd
  CONVFMT = "%.5f"
  if (length(dms) == 7) { # latitude
    neg = (substr(dms,7,1) == "S") ? -1 : 1
    return neg * (substr(dms,1,2) + substr(dms,3,2)/60 + substr(dms,5,2)/3600)
  }
  else if (length(dms) == 8) { # longitude
    neg = (substr(dms,8,1) == "E") ? -1 : 1  # direction letter is the 8th char
    return neg * (substr(dms,1,3) + substr(dms,4,2)/60 + substr(dms,6,2)/3600)
  }
}
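
For reference, a worked example of the conversion: the latitude 330603N
is 33 degrees, 06 minutes, 03 seconds North, so DMS2Dec returns
33 + 6/60 + 3/3600 = 33.10083.  Longitudes such as 0853147W come back
positive, since in this scheme West is positive and East is negative.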







APPENDIX 2 - Sample from USCities.txt
==========


City		County		State		Latitude	Longitude
Abanda		Chambers	Alabama	330603N	0853147W
Abbeville	Henry		Alabama	313418N	0851502W
Abbot Springs	Shelby		Alabama	332139N	0862854W
Abel		Cleburne	Alabama	333255N	0854245W
.
.
(153,000+ cities omitted)




Daniel K. Allen
Measure | Analyze | Communicate


Perl Update - 27 Jan 2002

I used Perl's a2p tool to create a Perl script that does exactly the same thing as my original Awk script. On Mac OS X 10.1.2 the Perl script runs in... drumroll please...

38 seconds and uses 24.5 MB of memory!

This makes no sense, of course. Awk scripts generally run faster than Perl scripts, yet the Awk version takes 30 minutes and uses 300+ MB of memory on OS X, which led to the bug report above. As to why the Perl version runs so fast, I do not know. Note that the Awk version runs very quickly on Mac OS 9.2 or Windows 2000.

Cities script - Perl version

Just for completeness, here is the Perl script that runs quickly in OS X:
#!/usr/bin/perl
#  cities.pl  - city name frequencies from USCities.txt
# 17 Jul 2001 - Created by Dan Allen. (Paradise, CA)
# 19 Jul 2001 - Average location computed. (Paradise, CA, 3:00 AM)
# 27 Jan 2002 - Converted from Awk to Perl via a2p.


$[ = 1;			# set array base to 1
$\ = "\n";		# set output record separator
$FS = $, = "\t";
open(SORT__N, '|sort -n') || die 'Cannot pipe to "sort -n".';


line: while (<>) {
	chomp;	# strip record separator
	($Fld1,$Fld2,$Fld3,$Fld4,$Fld5) = split($FS, $_, 9999);
	if ($Fld3 eq 'Puerto Rico') { next line; }
	if ($Fld3 eq 'Guam') { next line; }
	$name = $Fld1;
	$lat = &DMS2Dec($Fld4);
	$lon = &DMS2Dec($Fld5);
	if ($lat =~ 'NAN' || $lon =~ 'NAN') { next line; }
	$name =~ s/,.*//;
	$name =~ s/ ?\(.*\)//;
	$name =~ s/St\.? /Saint /;
	if (!index($a{$name}, $Fld3)) {
		$a{$name} = $a{$name} . ', ' . $Fld3;
		$avgLat{$name} += &DMS2Dec($Fld4);
		$avgLon{$name} += &DMS2Dec($Fld5);
		$numPts{$name}++;
	}
}


foreach $i (keys %a) {
	$n = (@t = split(/,/, $a{$i}, 9999)) - 1;
	$a{$i} =~ s/^, /: /;
	print SORT__N $n, $avgLat{$i} / $numPts{$i},$avgLon{$i} / $numPts{$i}, $i . $a{$i} if $n > 3;
	$sum += $n;
}
print $sum, 'total cities';


sub DMS2Dec {
	local($dms, $neg) = @_;
	# ddmmssx --> dd.ddddd
	$CONVFMT = '%.5f';
	if (length($dms) == 7) {
		# latitude 472945N
		$neg = (substr($dms, 7, 1) eq 'S') ? -1 : 1;
		return $neg * (substr($dms, 1, 2) + substr($dms,3,2) / 60 + substr($dms, 5, 2) / 3600);
	}
	elsif (length($dms) == 8) {
		# longitude 1214708W
		$neg = (substr($dms, 8, 1) eq 'E') ? -1 : 1;	# direction letter is the 8th char
		return $neg * (substr($dms, 1, 3) + substr($dms,4,2) / 60 + substr($dms, 6, 2) / 3600);
	}
}


Timing Comparison

Timing with the Unix time command shows awk running about 35 times slower than Perl on Mac OS X 10.1.2 (comparing user time), as measured on 18 Feb 2002 on a 450 MHz G4 Cube with 512 MB of RAM.

[~/AwkTrue] % time awk -f cities.awk cities.txt > foo
 1472.970u 22.420s  36:08.55   68.9%    0+0k 0+16io 0pf+0w
[~/AwkTrue] % time perl cities.pl cities.txt > foopl
   41.540u  0.940s   1:46.48   39.8%    0+0k 3+5io 0pf+0w
36 minutes versus less than 2 minutes!


Problem Identified

4 Sep 2002

The problem has finally been identified! Using the sample command I have learned that the problem lies in deallocating memory. The system's szone_malloc routine is where the time is spent. As memory is deallocated by calling free, the allocator tries to coalesce free blocks, and I believe the bug lies therein, but I am not completely sure yet.
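
For reference, collecting such a profile is just a matter of running something like:

	sample awk 10

while the script grinds away; as I recall, sample attaches to the named process, samples its call stacks for the given number of seconds, and writes out a report showing where the time goes (here, deep inside szone_malloc and friends).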

The source for this is available from http://www.opensource.apple.com/projects/darwin/1.4/projects.html, the Libc-186.tgz component, in the file Libc/gen.subproj/scalable_malloc.c. I am studying it now to see what the problem is.
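
I have not yet boiled the awk behavior down to a standalone test case, but a small C program along the following lines exercises the same pattern of many short-lived small allocations (this is my own sketch; the sizes and pass counts are arbitrary):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define N 200000	/* live blocks per pass; arbitrary */

int main(void)
{
	static char *p[N];
	clock_t t0 = clock();
	int i, pass;

	for (pass = 0; pass < 20; pass++) {
		for (i = 0; i < N; i++) {	/* small, varied sizes, like awk's cells */
			p[i] = malloc(16 + (i % 48));
			memset(p[i], 'x', 8);
		}
		for (i = 0; i < N; i += 2)	/* free every other block first... */
			free(p[i]);
		for (i = 1; i < N; i += 2)	/* ...then the rest, so the allocator */
			free(p[i]);		/* must coalesce many small neighbors */
	}
	printf("%.2f CPU seconds\n", (double)(clock() - t0) / CLOCKS_PER_SEC);
	return 0;
}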

On a new dual 1 GHz G4 tower I was able to get much better performance by changing the routine tfree in Awk's run.c source as follows:

void tfree(Cell *a)	/* free a tempcell */
{
	if (freeable(a)) {
		   dprintf( ("freeing %s %s %o\n", NN(a->nval), NN(a->sval), a->tval) );
		#ifdef __APPLE__ /* DKA 4 Sep 2002 */
		a->sval = NULL;
		#else
		xfree(a->sval);
		#endif
	}
	if (a == tmps)
		FATAL("tempcell list is curdled");
	a->cnext = tmps;
	tmps = a;
}
Note that xfree is a macro that wraps free.
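
From memory its definition in awk.h looks something like this (check the actual sources for the exact form):

	/* free a pointer if it is non-NULL, then clear it */
	#define xfree(a) { if ((a) != NULL) { free((void *)(a)); (a) = NULL; } }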

My change was to not call free at all, but simply to set the pointer to NULL. This change cuts RSIZE memory usage from 320 MB to 83 MB! Interesting how NOT freeing memory ends up using much less memory. Memory management is definitely screwed up.

Here are the times without the fix and then with the fix on a dual 1 GHz G4 tower running 10.2:

[~/AwkTrue] % time ./awk -f cities.awk < cities.txt > foo
1079.520u   9.030s   18:21.95   98.7%    0+0k 0+10io 0pf+0w
[~/AwkTrue] % time ./awk -f cities.awk < cities.txt > foo
  18.300u   0.830s    0:19.27   99.2%    0+0k 0+12io 0pf+0w
These results show that execution time has gone from 18 minutes down to 19 seconds. More work is needed to determine why the OS X malloc routines are behaving the way they are.


Problem Still There

12 Nov 2002

The 10.2.2 updater I installed this morning has not yet fixed the bug.


I'm Getting Upset

27 Jun 2003

Well, we're now up to Mac OS X 10.2.6, and these updates still have not fixed the bug. Many hangs occur on my system, and the sample tool reveals that most of them are in threading and in something related to szone_malloc. This is the worst part of Mac OS X!


Fixed!

27 Nov 2003

The bug existed up through 10.2.8. In Panther, Mac OS X 10.3, the bug has finally been fixed. That's the good news. The bad news is that I thought it was the same bug that caused a lot of the hangs in the system, but it wasn't: I've had 10.3 hang on me in the Finder just like before. Many other problems came with 10.3 (it broke printing entirely on my main Mac, Norton Utilities for OS X no longer works, network volumes often do not show up in Classic, Favorites are gone, etc.), so I have gone back to 10.2.8. Sigh. One step forward, several steps back.


The Bug Explained

12 Dec 2003

The bug was in libc, in the szone_malloc routines. I tried doing a diff between Darwin 6.8 (10.2.8) and Darwin 7.0 (10.3), hoping to see a small fix, but they changed a lot of code in scalable_malloc.c; the diffs were voluminous. It all has to do with coalescing small blocks of freed memory. It sure looks more like a garbage collector for Lisp or Java than a nice, straightforward memory allocator.
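
To make "coalescing" concrete: when a block is freed, the allocator checks whether the blocks next to it are also free and merges adjacent free blocks into one larger block, so that later, larger requests can be satisfied. Here is a toy sketch of the idea (my own illustration; the real scalable_malloc.c tracks regions, quanta, and free lists with far more machinery):

/* coalesce.c - toy illustration of free-block coalescing */
#include <stdio.h>
#include <stddef.h>

typedef struct Block {
	size_t size;		/* payload size in bytes */
	int isfree;		/* nonzero if available */
	struct Block *next;	/* next block in address order */
} Block;

/* Merge every run of adjacent free blocks into a single block. */
static void coalesce(Block *head)
{
	Block *b = head;
	while (b != NULL && b->next != NULL) {
		if (b->isfree && b->next->isfree) {
			b->size += sizeof(Block) + b->next->size;	/* absorb neighbor */
			b->next = b->next->next;
		} else
			b = b->next;
	}
}

int main(void)
{
	/* three neighboring blocks; sizes are illustrative */
	Block c = { 64, 1, NULL };
	Block b = { 32, 0, &c };
	Block a = { 16, 1, &b };

	b.isfree = 1;	/* "free" the middle block */
	coalesce(&a);	/* all three merge into one */
	printf("head block is now %lu bytes\n", (unsigned long)a.size);
	return 0;
}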


Created:  27 Feb 2001
Modified: 12 Dec 2003