From: Dan Allen
Sent: Saturday, July 21, 2001 12:08 PM
To: MPW
Subject: Unix / Win / Mac comparison

Awk Benchmark
20 Jul 2001
By Dan Allen

I wrote an interesting Awk script that greps through a list of over 150,000 US cities and does some calculations. This script runs extremely slowly under Mac OS 10.0.4, hereafter referred to as Unix. Here are the details:

UNIX
====

On a Macintosh G4 Cube with a 450 MHz G4 processor, 320 MB of RAM, and Mac OS 10.0.4, I ran the following from Terminal:

  awk -f cities.awk USCities.txt

This took 1,800 seconds (30 minutes!) to run on 10.0.4, and it used over 300 MB of virtual memory. Here are a few snapshots from the "top" command while it ran:

  PID COMMAND %CPU  TIME     #TH #PRTS #MREGS RPRVT RSHRD RSIZE VSIZE
  497 awk     69.7% 13:23.95  1   16    21    245M+ 100K- 245M+ 247M+
  497 awk     80.9% 19:01.76  1   16    22    231M+ 220K  161M+ 282M+
  497 awk     83.6% 20:57.76  1   16    22    233M  220K  180M+ 300M+
  497 awk     87.9% 22:58.63  1   16    22    233M- 220K  193M+ 314M

The awk I used I built myself from the sources on Brian Kernighan's website, compiled with the GCC that comes with Apple's Developer CD at the -O3 optimization level, which yields an executable 136,476 bytes in size. Small awk scripts run in an acceptable time, but this large script exposes a major problem in memory allocation.

WINDOWS 2000
============

If I run the same command on a 450 MHz Pentium III Dell computer using Windows 2000 and 256 MB of RAM, the job runs in 50 seconds and uses 32 MB of RAM. (The Mem Usage is 32 MB and the VM Size is 32 MB, as measured by the Windows Task Manager. This does not mean 64 MB of RAM--it means that all of the RAM that was virtually allocated was also resident in physical RAM.) For Windows I built awk using Visual C++ 6.0 at -Ox from the exact same sources as for Unix and Mac. The resulting awk binary is 139,264 bytes in size.

MAC OS 9.1
==========

If I run the same script on the same G4 Cube but running under Mac OS 9.1 and MPW, the job runs in 269 seconds and uses a maximum of 26 MB of memory. This version of awk is 101,938 bytes in size and was built using MPW's MRC 4.1 compiler with the -opt speed level of optimization, so all three builds of awk were optimized for speed. (Note: the sort was actually done by piping the output of the cities.awk script into the sort tool under MPW, as my port of Awk under MPW does not support piping or the system command from an MPW tool. The sort, however, finishes near-instantly on all systems, so it is in the noise.)

SUMMARY
=======

On all three platforms (Mac, Unix, Windows) the same results are returned: a file about 679 KB in size. (It is a list of the most commonly used city names, the states each such city is found in, and the geographic center of the cities that share a common name.)

Comparative Stats:

  Windows is 36 times faster than Unix.
  Windows is 5.4 times faster than Mac OS.
  Mac OS is 6.7 times faster than Unix.
  Mac OS uses 26 MB of memory, about 81% of Windows.
  Windows uses 32 MB of memory, about 10% of Unix.
  Unix uses 314 MB of memory, about 12 times Mac OS.

This explains why Apple's new Unix is incredibly slow compared to the Mac OS: memory allocation has major problems. Why is over 314 MB allocated when the other operating systems use 26-32 MB of RAM? It seems to me that solving this problem would solve a lot of Mac OS X sluggishness problems, such as those in the Finder and menus.
APPENDIX 1 - Cities.awk script used in this test
==========

#!/Users/danallen/bin/awk -f
# cities.awk - city name frequencies from USCities.txt
# 17 Jul 2001 - Created by Dan Allen.
# 19 Jul 2001 - Average location computed.
# 20 Jul 2001 - Ported to OS X.

BEGIN { FS = OFS = "\t" }

$3 == "Puerto Rico" || $3 == "Guam" { next }

{
	name = $1
	lat = DMS2Dec($4)
	lon = DMS2Dec($5)
	if (lat ~ "NAN" || lon ~ "NAN") next
	sub(/,.*/,"",name)
	sub(/ ?\(.*\)/,"",name)
	sub(/St\.? /,"Saint ",name)
	if (!index(a[name],$3)) {
		a[name] = a[name] ", " $3
		avgLat[name] += DMS2Dec($4)
		avgLon[name] += DMS2Dec($5)
		numPts[name]++
	}
}

END {
	for (i in a) {
		n = split(a[i],t,",") - 1
		sub(/^, /,": ",a[i])
		if (n > 3)
			print n, avgLat[i] / numPts[i], avgLon[i] / numPts[i], i a[i] | "sort -n"
		sum += n
	}
	print sum, "total cities"
}

function DMS2Dec(dms, neg) { # ddmmssx --> dd.ddddd
	CONVFMT = "%.5f"
	if (length(dms) == 7) { # latitude
		neg = (substr(dms,7,1) == "S") ? -1 : 1
		return neg * (substr(dms,1,2) + substr(dms,3,2)/60 + substr(dms,5,2)/3600)
	}
	else if (length(dms) == 8) { # longitude
		neg = (substr(dms,8,1) == "E") ? -1 : 1
		return neg * (substr(dms,1,3) + substr(dms,4,2)/60 + substr(dms,6,2)/3600)
	}
}

APPENDIX 2 - Sample from USCities.txt
==========

City           County    State    Latitude  Longitude
Abanda         Chambers  Alabama  330603N   0853147W
Abbeville      Henry     Alabama  313418N   0851502W
Abbot Springs  Shelby    Alabama  332139N   0862854W
Abel           Cleburne  Alabama  333255N   0854245W
.
.
(153,000+ cities omitted)

Daniel K. Allen
Measure | Analyze | Communicate
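An aside on the coordinate math in Appendix 1: DMS2Dec turns a packed degrees-minutes-seconds string into decimal degrees, so the latitude 330603N in the first row of Appendix 2 becomes 33 + 6/60 + 3/3600 = 33.10083. Here is a hedged C transcription of the function (my own sketch for illustration, not part of any of the three awk builds above); it follows the same sign convention as the script, where south and east are negative:

#include <stdio.h>
#include <string.h>

/* dms2dec - convert a ddmmss[NS] or dddmmss[EW] string such as
   "330603N" or "0853147W" to decimal degrees, mirroring the awk
   DMS2Dec in Appendix 1. */
static double dms2dec(const char *dms)
{
    size_t len = strlen(dms);
    if (len == 7) {                       /* latitude: ddmmssN/S */
        double deg = (dms[0]-'0')*10 + (dms[1]-'0');
        double min = (dms[2]-'0')*10 + (dms[3]-'0');
        double sec = (dms[4]-'0')*10 + (dms[5]-'0');
        double neg = (dms[6] == 'S') ? -1.0 : 1.0;
        return neg * (deg + min/60.0 + sec/3600.0);
    } else if (len == 8) {                /* longitude: dddmmssE/W */
        double deg = (dms[0]-'0')*100 + (dms[1]-'0')*10 + (dms[2]-'0');
        double min = (dms[3]-'0')*10 + (dms[4]-'0');
        double sec = (dms[5]-'0')*10 + (dms[6]-'0');
        double neg = (dms[7] == 'E') ? -1.0 : 1.0;
        return neg * (deg + min/60.0 + sec/3600.0);
    }
    return 0.0;                           /* unexpected format */
}

int main(void)
{
    /* The first row of Appendix 2: Abanda, Alabama. */
    printf("%.5f %.5f\n", dms2dec("330603N"), dms2dec("0853147W"));
    /* Prints: 33.10083 85.52972 */
    return 0;
}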
Perl Update - 27 Jan 2002
I used Perl's a2p tool (roughly, a2p cities.awk > cities.pl) to create a Perl script that does exactly the
same thing as my original Awk script. On Mac OS X 10.1.2 ... drumroll
please ... the Perl script runs in:
38 seconds and uses 24.5 MB of memory!
This makes no sense, of course. Awk scripts generally run faster than Perl scripts, yet the Awk version takes 30 minutes and uses 300+ MB of memory on OS X, which led to the bug report below. As to why the Perl version runs so fast, I do not know. Note that the Awk version runs very quickly on Mac OS 9.2 and Windows 2000.
Cities script - Perl version
Just for completeness, here is the Perl script that runs quickly in OS X:
#!/usr/bin/perl
# cities.pl - city name frequencies from USCities.txt
# 17 Jul 2001 - Created by Dan Allen. (Paradise, CA)
# 19 Jul 2001 - Average location computed. (Paradise, CA, 3:00 AM)
# 27 Jan 2002 - Converted from Awk to Perl via a2p.

$[ = 1;			# set array base to 1
$\ = "\n";		# set output record separator
$FS = $, = "\t";

open(SORT__N, '|sort -n') || die 'Cannot pipe to "sort -n".';

line: while (<>) {
    chomp;	# strip record separator
    ($Fld1,$Fld2,$Fld3,$Fld4,$Fld5) = split($FS, $_, 9999);
    if ($Fld3 eq 'Puerto Rico') { next line; }
    if ($Fld3 eq 'Guam') { next line; }
    $name = $Fld1;
    $lat = &DMS2Dec($Fld4);
    $lon = &DMS2Dec($Fld5);
    if ($lat =~ 'NAN' || $lon =~ 'NAN') { next line; }
    $name =~ s/,.*//;
    $name =~ s/ ?\(.*\)//;
    $name =~ s/St\.? /Saint /;
    if (!index($a{$name}, $Fld3)) {
        $a{$name} = $a{$name} . ', ' . $Fld3;
        $avgLat{$name} += &DMS2Dec($Fld4);
        $avgLon{$name} += &DMS2Dec($Fld5);
        $numPts{$name}++;
    }
}

foreach $i (keys %a) {
    $n = (@t = split(/,/, $a{$i}, 9999)) - 1;
    $a{$i} =~ s/^, /: /;
    print SORT__N $n, $avgLat{$i} / $numPts{$i}, $avgLon{$i} / $numPts{$i}, $i . $a{$i} if $n > 3;
    $sum += $n;
}
print $sum, 'total cities';

sub DMS2Dec {
    local($dms, $neg) = @_;	# ddmmssx --> dd.ddddd
    $CONVFMT = '%.5f';
    if (length($dms) == 7) {	# latitude 472945N
        $neg = (substr($dms, 7, 1) eq 'S') ? -1 : 1;
        return $neg * (substr($dms, 1, 2) + substr($dms, 3, 2) / 60 + substr($dms, 5, 2) / 3600);
    }
    elsif (length($dms) == 8) {	# longitude 1214708W
        $neg = (substr($dms, 8, 1) eq 'E') ? -1 : 1;
        return $neg * (substr($dms, 1, 3) + substr($dms, 4, 2) / 60 + substr($dms, 6, 2) / 3600);
    }
}
Timing Comparison
The Unix time command shows awk running 35 times slower than perl (by user CPU time) on Mac OS X 10.1.2,
as measured on 18 Feb 2002 on a 450 MHz G4 Cube with 512 MB of RAM.
[~/AwkTrue] % time awk -f cities.awk cities.txt > foo
1472.970u 22.420s 36:08.55 68.9%  0+0k 0+16io 0pf+0w
[~/AwkTrue] % time perl cities.pl cities.txt > foopl
41.540u 0.940s 1:46.48 39.8%  0+0k 3+5io 0pf+0w

(The first two figures are user and system CPU seconds; the third is elapsed time.) 36 minutes versus less than 2 minutes!
Problem Identified
4 Sept 2002
The problem has finally been identified!
Using the sample command-line tool, I have learned that the problem lies in deallocating memory.
The system's szone_malloc routines are where the time is spent. As memory is deallocated
by calling free, the allocator tries to coalesce free blocks, and I believe the bug lies therein,
but I am not completely sure yet.
The source for this allocator is available from http://www.opensource.apple.com/projects/darwin/1.4/projects.html in the Libc-186.tgz component, in the file Libc/gen.subproj/scalable_malloc.c. I am studying it now to see what the problem is.
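For anyone who wants to poke at this without building awk, here is a minimal reproduction sketch of my own, under the assumption (suggested by the sample output) that the time goes into freeing many small, variably sized blocks:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Allocate and then free many small, variably sized blocks, roughly
   the pattern awk's tfree/xfree produces while processing 150,000+
   input lines. If free() is the bottleneck, the second phase should
   dominate the reported time. */
#define NBLOCKS 2000000

int main(void)
{
    static char *blocks[NBLOCKS];
    clock_t t0, t1, t2;
    long i;

    t0 = clock();
    for (i = 0; i < NBLOCKS; i++)
        blocks[i] = malloc(16 + (i % 48));   /* 16..63 byte strings */
    t1 = clock();
    for (i = 0; i < NBLOCKS; i++)
        free(blocks[i]);
    t2 = clock();

    printf("malloc: %.2f s  free: %.2f s\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}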
On a new dual 1 GHz G4 tower I was able to get much better performance by changing the routine tfree in Awk's run.c source as follows:
void tfree(Cell *a)	/* free a tempcell */
{
	if (freeable(a)) {
		dprintf( ("freeing %s %s %o\n", NN(a->nval), NN(a->sval), a->tval) );
#ifdef __APPLE__	/* DKA 4 Sep 2002 */
		a->sval = NULL;
#else
		xfree(a->sval);
#endif
	}
	if (a == tmps)
		FATAL("tempcell list is curdled");
	a->cnext = tmps;
	tmps = a;
}

Note that xfree is a macro that wraps free.
My change was not to call free at all but simply to set the pointer to NULL. This one change drops the RSIZE memory usage reported by top from 320 MB to 83 MB! It is interesting that NOT freeing memory ends up using far less memory. Memory management is definitely screwed up.
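Leaking every string is a blunt instrument. A more conventional way to sidestep a slow per-block free() for this kind of workload, sketched here on my own (this is not anything in the awk sources), is to carve short-lived strings out of a big arena and release everything with a handful of free() calls at the end of the run:

#include <stdlib.h>
#include <string.h>

/* A trivial bump-pointer arena: individual strings are never freed,
   only whole 64 KB chunks at the end, so the allocator's per-block
   free path (and its coalescing) is never exercised. Hypothetical
   helper names, not part of awk. */
typedef struct Arena {
    struct Arena *prev;   /* previously filled chunk */
    size_t used;          /* bytes handed out from this chunk */
    char data[64 * 1024]; /* chunk payload */
} Arena;

static Arena *arena_new(Arena *prev)
{
    Arena *a = malloc(sizeof *a);
    if (a) { a->prev = prev; a->used = 0; }
    return a;
}

static char *arena_strdup(Arena **ap, const char *s)
{
    size_t n = strlen(s) + 1;
    if (n > sizeof((*ap)->data))
        return NULL;                   /* too big for one chunk */
    if ((*ap)->used + n > sizeof((*ap)->data)) {
        Arena *fresh = arena_new(*ap); /* start a fresh chunk */
        if (!fresh) return NULL;
        *ap = fresh;
    }
    char *p = (*ap)->data + (*ap)->used;
    (*ap)->used += n;
    return memcpy(p, s, n);
}

static void arena_free_all(Arena *a)
{
    while (a) {                        /* one free() per 64 KB chunk */
        Arena *prev = a->prev;
        free(a);
        a = prev;
    }
}

For a batch job like cities.awk, which runs and exits, the arena's "live until the end" lifetime is exactly what the temporary cell strings need.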
Here are the times without the fix and then with the fix on a Dual 1GHz G4 Tower running 10.2:
[~/AwkTrue] % time ./awk -f cities.awk < cities.txt > foo
1079.520u 9.030s 18:21.95 98.7%  0+0k 0+10io 0pf+0w
[~/AwkTrue] % time ./awk -f cities.awk < cities.txt > foo
18.300u 0.830s 0:19.27 99.2%  0+0k 0+12io 0pf+0w

These results show that execution time has gone from 18 minutes down to 19 seconds. More work is needed to determine why the OS X malloc routines behave the way they do.
Problem Still There
12 Nov 2002
The 10.2.2 updater I installed this morning has not yet fixed the bug.
I'm Getting Upset
27 Jun 2003
Well, we're now up to Mac OS X 10.2.6. These updates still have not fixed the bug. Many hangs occur on my system, and the
sample tool reveals that most of them occur in threading and in something related
to szone_malloc. This is the worst part of Mac OS X!
Fixed!
27 Nov 2003
The bug existed up through 10.2.8. In Panther, Mac OS X 10.3, the bug has finally been fixed.
That's the good news. The bad news is that I thought it was the same bug that caused a lot
of the hangs in the system, but it wasn't: I've had 10.3 hang on me in the Finder just like before.
Many other problems came with 10.3 (it broke printing entirely on my main Mac, Norton Utilities
for OS X no longer works, network volumes often do not show up in Classic, Favorites are gone, etc.),
so I have gone back to 10.2.8. Sigh. One step forward, several steps back.
The Bug Explained
12 Dec 2003
The bug was in libc, in the szone_malloc routines. I tried doing a diff between Darwin 6.8 (10.2.8) and
Darwin 7.0 (10.3), hoping to see a small fix, but they changed a lot of code in scalable_malloc.c - the
diffs were voluminous. It all has to do with coalescing small blocks of freed-up memory and the like. It
sure looks more like a garbage collector for Lisp or Java than a nice, straightforward memory
allocator.
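For readers who have not been inside an allocator: "coalescing" means merging a just-freed block with any free neighbors so the heap does not splinter into unusably small fragments. A toy illustration of the idea of my own devising (a doubly linked list of address-ordered blocks, nothing like Apple's actual szone code):

#include <stddef.h>

/* Toy heap: every block knows its neighbors, and freeing a block
   merges it with any adjacent free block. Real allocators such as
   scalable_malloc.c do this with block headers and size classes;
   this shows only the concept (header overhead is ignored). */
typedef struct Block {
    struct Block *prev, *next;  /* address-ordered neighbors */
    size_t size;                /* payload bytes */
    int free;                   /* 1 if available */
} Block;

static void coalesce(Block *b)
{
    /* Merge the next block into b if it is free. */
    if (b->next && b->next->free) {
        Block *n = b->next;
        b->size += n->size;
        b->next = n->next;
        if (n->next) n->next->prev = b;
    }
    /* Merge b into the previous block if it is free. */
    if (b->prev && b->prev->free) {
        Block *p = b->prev;
        p->size += b->size;
        p->next = b->next;
        if (b->next) b->next->prev = p;
    }
}

static void toy_free(Block *b)
{
    b->free = 1;
    coalesce(b);   /* this walk-and-merge is work paid on every free() */
}

Done once per free() across millions of temporary awk cells, bookkeeping like this is consistent with what I saw: skipping free() in tfree took the run from 18 minutes to 19 seconds.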
Created: 27 Feb 2001
Modified: 12 Dec 2003