Distribution comparisons
With Debian and Ubuntu so closely related, it's often interesting to see just how close the results are in technical terms. One of the easiest ways to do that is a raw package-by-package comparison. I've done that a few times now, including when warty was released, when hoary was released, and more recently when reviewing Debian and Ubuntu security support.
Comparing a release of Debian with a release of Ubuntu in this manner isn't too hard: it's a matter of getting the Sources (or Packages) files for the releases, possibly merging main and universe for Ubuntu, or skipping the universe packages in Debian, and then looking at the version strings for the corresponding packages. It's usually interesting to check three different components of the version strings -- the upstream component gives you some idea how up to date the package is, the maintainer revision hints at how many patches have been applied, and if there's an "ubuntuNN" at the end you know you've got some Ubuntu patches in the package.
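To make the three components concrete, here's a minimal sketch of pulling a version string apart. The helper name split_version and the sample version strings are made up for illustration, and epochs are simply left attached to the upstream part.

```python
import re

# Hypothetical helper: split a version string into
# (upstream, revision, ubuntu_suffix). Missing parts come back as None.
_VERSION_RE = re.compile(r"^(.*?)(?:-([^-]*?))?(ubuntu[0-9]+)?$")

def split_version(version):
    upstream, revision, ubuntu = _VERSION_RE.match(version).groups()
    return upstream, revision, ubuntu

# Made-up version strings, purely for illustration.
for v in ["2.6.1-3", "2.6.1-3ubuntu2", "1.0", "1.0ubuntu1"]:
    print("%-16s %r" % (v, split_version(v)))
```

The non-greedy upstream group means the last hyphen decides where the revision starts, so an upstream version that itself contains hyphens still splits sensibly.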
So here we go. Boilerplate to start:
#!/usr/bin/env python
# Copyright (C) 2009 Anthony Towns <aj@erisian.com.au>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
For parsing purposes we need a few external libraries -- regexps to pull version strings apart, compression libraries to get at the actual package info, and apt_pkg to do accurate comparisons of Debian version strings. Unfortunately apt_pkg needs to be initialised before it'll do anything useful, so we'll get that out of the way here too.
import re, bz2, gzip, apt_pkg
apt_pkg.init()
So the first bit of work we'll do is to parse the Packages/Sources files. We'll make a class that does that for us, and only keep the things we're really interested in -- package name, version, and section (so we can pull out main-v-universe info). We'll also allow ourselves to specify a couple of files at once, so that merging of main+universe in Ubuntu (or even main+contrib+non-free in Debian) is trivial.
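class Packages(object):
    def __init__(self, filenames):
        # Accept either a single filename or a list of them.
        if not isinstance(filenames, list):
            filenames = [filenames]
        self.packages = {}
        self.section = {}
        for f in filenames:
            self.read(f)

    def read(self, filename):
        if filename.endswith(".gz"):
            f = gzip.GzipFile(filename)
        elif filename.endswith(".bz2"):
            f = bz2.BZ2File(filename)
        else:
            f = open(filename)
        p, v, s = None, None, None
        for l in f:
            # Compressed files yield bytes on Python 3.
            if not isinstance(l, str):
                l = l.decode("utf-8", "replace")
            if l.startswith("Package: "):
                p = l[9:].strip()
            elif l.startswith("Version: "):
                v = l[9:].strip()
            elif l.startswith("Section: "):
                s = l[9:].strip()
            elif l == "\n":
                # A blank line ends the stanza: record it and start afresh.
                if p is not None and v is not None: self.packages[p] = v
                if p is not None and s is not None: self.section[p] = s
                p, v, s = None, None, None
        f.close()
        # Record the last stanza if the file didn't end with a blank line.
        if p is not None and v is not None: self.packages[p] = v
        if p is not None and s is not None: self.section[p] = s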
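If you want to see the stanza handling without downloading anything, here's a rough standalone sketch of the same idea working on in-memory text. The function name parse_stanzas and the two sample stanzas are invented for illustration; real Sources files carry many more fields, which this simply ignores.

```python
import io

def parse_stanzas(lines):
    """Return ({package: version}, {package: section}) from control-file lines."""
    versions, sections = {}, {}
    fields = {}

    def flush():
        # Record the stanza collected so far, then start a new one.
        pkg = fields.get("Package")
        if pkg is not None:
            if "Version" in fields:
                versions[pkg] = fields["Version"]
            if "Section" in fields:
                sections[pkg] = fields["Section"]
        fields.clear()

    for line in lines:
        if not line.strip():
            flush()                      # blank line: end of stanza
        elif line[0] not in " \t" and ":" in line:
            key, value = line.split(":", 1)
            fields[key] = value.strip()  # continuation lines are skipped
    flush()                              # file may not end with a blank line
    return versions, sections

# Made-up sample data, not taken from any real archive.
SAMPLE = u"""Package: hello
Version: 2.2-2
Section: devel

Package: zsh
Version: 4.3.6-7ubuntu1
Section: universe/shells
"""

versions, sections = parse_stanzas(io.StringIO(SAMPLE))
print(versions)
print(sections)
```

Splitting on the first colon only keeps versions with an epoch (such as "1:2.0-1") intact.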
So having gotten that far the next interesting thing to do is the actual comparisons. For probably no good reason we combine the packages objects into a hash -- the "d" entry is the Debian instance, the "u" entry is the Ubuntu one. The regular files are called "norm" (as in normal), the security ones "sec".
We handle the basic comparison by iterating through all packages and assigning a two-letter code to each package: "==" if the versions are the same; otherwise the first letter is "U" or "D" for whichever distro has the newer version, and the second letter is "*" if the package isn't in the other distro, "u" if it's a newer upstream version, "m" if it's a newer Debian revision, or "p" if it's just an ubuntuN patch.
def compare_debian_ubuntu(norm):
    d = norm["d"].packages
    u = norm["u"].packages
    pkgs = sorted(set(d.keys()) | set(u.keys()))
    # Split versions into upstream, Debian revision and (for Ubuntu) ubuntuN.
    re_deb = re.compile("^(.*?)(-([^-]*))?$")
    re_ubu = re.compile("^(.*?)(-([^-]*?))?(ubuntu[0-9]+)?$")

    def vercmp(x, y, char):
        # Current python-apt calls this version_compare; old releases
        # spelled it VersionCompare.
        x = apt_pkg.version_compare(x, y)
        if x < 0:
            return "U" + char
        elif x > 0:
            return "D" + char
        else:
            return "=" + char

    def full(up, de):
        # Native packages have no Debian revision to append.
        if de is None:
            return up
        return "%s-%s" % (up, de)

    res = {}
    for p in pkgs:
        if p not in d:
            res[p] = "U*"
        elif p not in u:
            res[p] = "D*"
        elif d[p] == u[p]:
            res[p] = "=="
        else:
            (d_up, _, d_de) = re_deb.match(d[p]).groups()
            (u_up, _, u_de, u_ub) = re_ubu.match(u[p]).groups()
            if d_up != u_up:
                res[p] = vercmp(d_up, u_up, "u")
            elif d_de != u_de:
                res[p] = vercmp(full(d_up, d_de), full(u_up, u_de), "m")
            elif u_ub is not None:
                res[p] = "Up"
            else:
                res[p] = "??"
    return res
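To get a feel for the codes without installing python-apt, here's a rough sketch of the same decision tree for a single package. Everything in it is an assumption made for illustration: classify and toy_vercmp are invented names, one regex is used for both distros for brevity, and toy_vercmp only compares the runs of digits, so it is not Debian's real ordering (no epochs, letters or "~").

```python
import re

def toy_vercmp(a, b):
    """Toy stand-in for a real Debian version comparison: digits only."""
    ka = [int(n) for n in re.findall(r"[0-9]+", a)]
    kb = [int(n) for n in re.findall(r"[0-9]+", b)]
    return (ka > kb) - (ka < kb)

VERSION_RE = re.compile(r"^(.*?)(?:-([^-]*?))?(ubuntu[0-9]+)?$")

def classify(deb, ubu, vercmp=toy_vercmp):
    """Two-letter code for one package; deb/ubu are version strings or None."""
    if deb is None:
        return "U*"
    if ubu is None:
        return "D*"
    if deb == ubu:
        return "=="
    d_up, d_rev, _ = VERSION_RE.match(deb).groups()
    u_up, u_rev, u_ub = VERSION_RE.match(ubu).groups()
    if d_up != u_up:
        kind, c = "u", vercmp(d_up, u_up)
    elif d_rev != u_rev:
        kind, c = "m", vercmp(d_rev or "", u_rev or "")
    elif u_ub is not None:
        return "Up"
    else:
        return "??"
    return ("U" if c < 0 else "D" if c > 0 else "=") + kind

# Made-up version pairs covering each interesting branch.
for deb, ubu in [("1.0-1", None), (None, "1.0-1"), ("1.0-1", "1.0-1"),
                 ("1.0-1", "1.0-1ubuntu2"), ("1.2-1", "1.0-3"),
                 ("1.0-1", "1.0-2ubuntu1")]:
    print("%-8s %-14s %s" % (deb, ubu, classify(deb, ubu)))
```

Passing the comparison in as a parameter means the toy function can be swapped for a real one when it's available.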
We also supply a function to summarise those results, and expand the meaning of the two letter codes.
def print_cnts(res):
    cnts = {}
    out = ""
    for p in res.keys():
        x = res[p]
        cnts[x] = cnts.get(x, 0) + 1
    codes = ["==", "D*", "Du", "Dm", "U*", "Uu", "Um", "Up"]
    what = ["same version", "only in Debian",
            "Debian has newer upstream", "Debian has newer patches",
            "only in Ubuntu", "Ubuntu has newer upstream",
            "Ubuntu has newer Debian patches",
            "Ubuntu has ubuntuX patches"]
    total = sum(cnts.values())
    for c, w in zip(codes, what):
        n = cnts.get(c, 0)  # a code may not occur at all
        if n == 0: continue
        out += "%7d (%2.0f%%) %s\n" % (n, n * 100.0 / total, w)
    return out
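As an aside, the counting step could also be done with collections.Counter from the standard library. This is just an alternative sketch, not what the script uses; the function name tally and the sample results are made up.

```python
from collections import Counter

def tally(res):
    """Return [(code, count, percent)], most frequent code first."""
    counts = Counter(res.values())
    total = sum(counts.values())
    return [(code, n, 100.0 * n / total) for code, n in counts.most_common()]

# Made-up results, shaped like the output of the comparison function.
res = {"bash": "==", "zsh": "Up", "hello": "==", "exim4": "Du"}
for code, n, pct in tally(res):
    print("%7d (%2.0f%%) %s" % (n, pct, code))
```

Since only codes that actually occur end up in the Counter, there's no need to special-case zero counts.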
Comparing security responses builds on the earlier comparison -- with the added conundrum that it's occasionally possible for packages to be introduced in security updates, even though they weren't in the original release. But otherwise it's pretty much the same -- we're mostly about building a short string for each package: a "D" or a "U" if there's only an update in one distro, or an "=" if both included an update, followed by a separating space and the two characters we already worked out. We count how often each string turns up, and along the way note the packages that only exist in security updates, and the packages that are identical in both releases but were only updated by one distro.
def compare_debian_ubuntu_security(norm, sec, res):
    deb_nonsec, deb = norm["d"], sec["d"]
    ubu_nonsec, ubu = norm["u"], sec["u"]
    d, u = deb.packages, ubu.packages
    pkgs = sorted(set(d.keys()) | set(u.keys()))
    not_present = {"d": [], "u": []}
    pull_for = {"d": [], "u": []}
    counts = {}
    for p in pkgs:
        # Updated packages that weren't in the original release.
        if p in d and p not in deb_nonsec.packages:
            not_present["d"].append(p)
        if p in u and p not in ubu_nonsec.packages:
            not_present["u"].append(p)
        if p not in res:
            continue
        resp = res[p]
        if p not in d:
            l = "U"
            if resp == "==": pull_for["d"].append(p)
        elif p not in u:
            l = "D"
            if resp == "==": pull_for["u"].append(p)
        else:
            l = "="
        which = "%s %s" % (l, resp)
        counts[which] = counts.get(which, 0) + 1
    return (counts, not_present, pull_for)
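Here's a small sketch of how the label for a single package comes together, with plain sets standing in for the security Sources files. The function name security_label and the package names are hypothetical.

```python
def security_label(pkg, deb_sec, ubu_sec, res):
    """Return e.g. "D ==" for pkg, or None if there's no release comparison.

    Assumes pkg has a security update in at least one of the two distros.
    """
    if pkg not in res:
        return None
    if pkg not in deb_sec:
        who = "U"      # only Ubuntu issued an update
    elif pkg not in ubu_sec:
        who = "D"      # only Debian issued an update
    else:
        who = "="      # both did
    return "%s %s" % (who, res[pkg])

# Made-up inputs: release comparison codes and sets of updated packages.
res = {"openssl": "Up", "hello": "==", "zsh": "Du"}
deb_sec = set(["openssl", "hello"])
ubu_sec = set(["openssl", "zsh", "newpkg"])

for pkg in sorted(deb_sec | ubu_sec):
    print("%-8s %s" % (pkg, security_label(pkg, deb_sec, ubu_sec, res)))
```

A result like "D ==" is the interesting one: the two releases shipped the exact same source, but only one distro has published an update for it.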
Because of all the special cases, summarising the results isn't quite as easy. We simplify it a bit by having a sub-function to collapse the results that are all essentially the same, but otherwise it's a matter of just looking at every possibility, and adding an explanation of what it means.
def print_sec_cmp(x, norm, secs):
    counts, not_present, pull_for = x
    # Every expected result, in the order the report below slices them.
    eq = ["= ==", "= Du", "= Dm", "= Uu", "= Um", "= Up",
          "D D*", "U U*",
          "D ==", "U ==",
          "D Du", "D Dm", "D Uu", "D Um", "D Up",
          "U Du", "U Dm", "U Uu", "U Um", "U Up"
          ]
    counts_o = [counts.get(e, 0) for e in eq]
    for c in counts:
        if c in eq: continue
        print("Weird result: %s %d" % (c, counts[c]))
    what = ["same version",
            "Debian has newer upstream", "Debian has newer patches",
            "Ubuntu has newer upstream", "Ubuntu has newer Debian patches",
            "Ubuntu has ubuntuX patches"]

    def report_subset(fmt, cmps):
        print(fmt % sum(cmps))
        for n, w in zip(cmps, what):
            if n == 0: continue
            print(" %3d %s" % (n, w))

    report_subset("%d packages with security updates in both Debian and Ubuntu",
                  counts_o[0:6])
    print("\n%d updates in Debian to Debian only packages" % (counts_o[6]))
    print("%d updates in Ubuntu to Ubuntu only packages" % (counts_o[7]))
    print("\n%d updates in Debian to packages with the exact same source in Ubuntu" % (counts_o[8]))
    print("%d updates in Ubuntu to packages with the exact same source in Debian" % (counts_o[9]))
    # The leading 0 stands in for "same version", which has its own line above.
    report_subset("\n%d packages updated in Debian but not Ubuntu",
                  [0] + counts_o[10:15])
    report_subset("\n%d packages updated in Ubuntu but not Debian",
                  [0] + counts_o[15:20])
    for distro in ["Debian", "Ubuntu"]:
        l = distro[0].lower()
        if not_present[l]:
            print("\nUpdates in %s for packages not in the original %s release:" % (distro, distro))
            for np in not_present[l]:
                print(" %s %s %s" % (np, secs[l].packages[np], secs[l].section[np]))
        if pull_for[l]:
            print("\nPackages %s should pull advisories for:" % (distro))
            for pf in pull_for[l]:
                print(" %s %s %s" % (pf, norm[l].packages[pf], norm[l].section[pf]))
    print("")
With all that worked out, we can do the actual deed! This is all hardcoded -- I pulled the various Sources files from the Debian and Ubuntu archives and snapshot.debian.net, and put them in the current working directory, and if you want the same results, you'll have to do that too... Anyway, the idea is to populate a dictionary with two Packages instances for each release -- the original release, and the latest security updates.
suites = {
    "etch": (Packages("Sources_etch_20070408.bz2"),
             Packages("Sources_etch_security.bz2")),
    "lenny": (Packages("Sources_lenny_20090215.bz2"),
              Packages("Sources_lenny_security.bz2")),
    "feisty": (Packages(["Sources_feisty_main_20070417.bz2",
                         "Sources_feisty_universe_20070417.bz2"]),
               Packages(["Sources_feisty_main_security.bz2",
                         "Sources_feisty_universe_security.bz2"])),
    "hardy": (Packages(["Sources_hardy_main_release.bz2",
                        "Sources_hardy_universe_release.bz2"]),
              Packages(["Sources_hardy_main_security.bz2",
                        "Sources_hardy_universe_security.bz2"])),
    "intrepid": (Packages(["Sources_intrepid_main_20081120.bz2",
                           "Sources_intrepid_universe_20081120.bz2"]),
                 Packages(["Sources_intrepid_main_security.bz2",
                           "Sources_intrepid_universe_security.bz2"])),
    "jaunty": (Packages(["Sources_jaunty_main_20090423.bz2",
                         "Sources_jaunty_universe_20090423.bz2"]),
               Packages(["Sources_jaunty_main_security.bz2",
                         "Sources_jaunty_universe_security.bz2"]))
}
And having done that, we add a little function that does the full comparison between two distros (using the right input format for the previously defined functions)...
def pretty_compare(dname, uname):
    name = "%s vs %s" % (dname, uname)
    norm = {"d": suites[dname][0], "u": suites[uname][0]}
    sec = {"d": suites[dname][1], "u": suites[uname][1]}
    res = compare_debian_ubuntu(norm)
    print("%s\n%s\n\n%s" % (name, "=" * (len(name)), print_cnts(res)))
    x = compare_debian_ubuntu_security(norm, sec, res)
    print_sec_cmp(x, norm, sec)
And then compare the distros that might be interesting:
pretty_compare("etch","feisty")
pretty_compare("lenny","intrepid")
pretty_compare("lenny","jaunty")
pretty_compare("etch","hardy")
pretty_compare("lenny","hardy")
And finally make sure you get linked on LWN, and you're done!