Tuesday, May 30, 2006

Sandboxes everywhere

Jon Udell's most recent InfoWorld column, "Easing app deployment with an open source sandbox", describes a hosting service that offers an automated sandbox installer, which considerably eases the deployment and testing of packages. This resonates with one of the main goals of the Cheesecake Summer of Code project: to offer a sandbox environment where Python packages can be uploaded and inspected dynamically, by running their unit tests, getting code coverage numbers, etc.

To me, 2006 seems to be the year of the virtual machine. Here's the equation:

commodity hardware + solid open-source virtualization technologies such as Xen = great opportunities for testers

Many companies have already started to capitalize on this equation (Autoriginate/HostedQA is just one example). The beauty of open source, however, is that it makes this opportunity available to anybody with a medium-to-high amount of Linux hacking skillz :-).

I'll post more about this topic as work on Cheesecake/SoC progresses. The way I see it, we'll offer a way for people to post their packages to one of our servers, and we'll compute all the dynamic Cheesecake scores (such as code coverage obtained by running unit tests) in a dedicated virtual machine. These dynamic scores will also be computed when a request comes from the PyPI interface. This is all on the drawing board right now, but that's the general idea.

Another related project that hasn't been started yet, based on an idea that Titus had last year, would be to automatically apply patches to Python core, compile, build and run all unit tests, all of this in a safe sandbox environment. This will hopefully lower the barrier to accepting patches into Python core.

Michael Feathers on refactoring and continuous integration

From the author of "Working effectively with legacy code", Michael Feathers, a blog post on "Refactoring needs more than tests". Interesting point of view: sometimes unit tests are not enough when it comes to refactoring.

M.F. discusses the situation of a project with a large code base, where several teams work on different releases/branches of the code at the same time. How do you refactor with confidence in this case? Another scenario discussed in the post is a project with dependencies on 3rd party code. How do you refactor with confidence when you know that the 3rd party code keeps changing and needs to be patched every time it gets updated?

Michael's answer is: integrate frequently. I say: buildbot to the rescue! :-)

Wednesday, May 24, 2006

Cheesecake and the Summer of Code

I'm very pleased to announce that Michał Kwiatkowski's project "Cheesecake enhancements and its integration with PyPI" was accepted as a Google Summer of Code project under the Python Software Foundation umbrella. Here's a summary of Michał's application:

Cheesecake is an application designed to evaluate and estimate the overall quality (or so called 'kwalitee') of a given software package written in Python. It emphasizes a need for well-written documentation and unit tests, encouraging good programming practices and penalizing sloppy design and careless distribution. Using Cheesecake to check your code gives you confidence that your software doesn't merely run, but is usable and easy to test and modify as well.

Because Python is very easy to learn and use there exists a vast variety of software written in it, most of which was scattered until PyPI was created. Now, when new packages are being indexed on Cheese Shop every day, an effort can be made to spread the spirit of good software design and code reuse among the Python community. This can be achieved by combining the power of Cheesecake and Cheese Shop. Every time a new version of a package is uploaded to Cheese Shop, its cheesecake index will be calculated and published on the web. Having a way to measure the quality of a package relative to other existing packages will be of invaluable help for all developers. It will promote well built packages and in the long run raise the overall quality of Python software.

Adding Cheesecake functionality to PyPI has been already mentioned by Phillip J. Eby on the catalog-sig mailing list. Together with Cheesecake maintainer Grig Gheorghiu we've discussed modifications needed to be done to Cheesecake code to be reliable enough so it could be incorporated into PyPI service. A working copy of our ideas is accessible on the project wiki. It includes enhancing Cheesecake code scoring techniques to take into account unit tests of a package, running tests in secure environment, extending supported archive formats and fixing all known bugs. Development of Cheesecake will adhere to best practices such as unit testing, continuous integration (via buildbot), pylint verification, etc.

The next part of this project will include collaboration with Richard Jones, PyPI maintainer, and merging Cheesecake into PyPI service. Upon completion all PyPI uploads will be automatically scored by Cheesecake. It will be possible to browse packages archive by cheesecake index, sorting results by installability, documentation and code kwalitee index. Statistics in numeric and graphical form will also be made available. This part of a project will involve writing server-side code, with emphasis on security and robustness.

The remaining time will be spent on resolving all problems that would occur during usage of Cheesecake and PyPI. Along with fixing bugs, I will develop a simple Hello world package that can be taken as an example of good development practices for all Python developers. It should also score 100% in the Cheesecake test of course. ;-) It will be what hello is for GNU Project.

If you're interested in details, this Cheesecake wiki page contains a lot of ideas which will start being turned into reality as of today :-) Please feel free to edit the page and add your own wishlist-type items.

Here are a few thoughts I had regarding the value of this project:

This project will make two very important contributions. First of all, it will integrate with PyPI and help rank the Cheeseshop packages according to various quality criteria. People learn better by example -- and what better examples than tools that score high on a scale that looks at different quality indicators such as documentation, installability, and code 'kwalitee'? Cheesecake will provide a way to identify the best-of-breed packages in those areas.

Second, the project will investigate ways to dynamically assess packages by executing their code in a sandbox environment. This will help mainly with getting code coverage numbers by running a project's unit tests, but one can easily envision many other applications -- one idea that Titus Brown had was to automatically apply and verify patches to Python core, without the fear that the host machine will crash and burn. This will hopefully streamline the process of accepting patches into Python core (a famously complicated process currently).

Michał and I will use Trac to manage this project. The idea is to have short iterations represented as milestones in Trac, with tickets of type 'enhancement' that represent the stories to be done in each iteration. Each story will be split into short tasks that can be accomplished in a matter of hours, and each task will be represented as a ticket of type... 'task', what else? This will give us a nice way of watching the progress of the project over the summer. Of course, the criterion for the completion of a given story is: all unit/acceptance/functional tests should pass for that story.

I'm very excited to have Michał work on this project and I'm very hopeful that at the end of this summer we'll have a solid application that will benefit the Python community.

Here is the list of the 25 applications accepted to the Summer of Code under the PSF umbrella.

Wednesday, May 10, 2006

Dynamically updating buildbot status text

Let's assume you want to update the build step status text displayed in the buildbot HTML status page, based on some information that only the build slave knows -- such as a version number that is computed by the slave during the build step for example.

Note that if you just want to customize the build step status with some text that is known in advance by the master (e.g. "client install" or "twill functional tests"), all you need to do is to subclass from ShellCommand and set the descriptionDone class variable to the desired custom text. See this post for more details on how to do this.

For dynamically updating the status text, the solution I found was to override some of the methods in the ShellCommand class.

My particular scenario is this: the build slave installs some package and identifies its version number. I want to be able to display that version number in the status for that build step.

I defined the following subclass of ShellCommand:

import re

# Assumed import paths for this buildbot version: ShellCommand is defined in
# buildbot/process/step.py, the result constants in buildbot/status/builder.py.
from buildbot.process.step import ShellCommand
from buildbot.status.builder import WARNINGS, FAILURE

class ClientInstall(ShellCommand):
    name = "client install"
    description = ["running %s" % name]
    descriptionDone = [name]

    def __init__(self, **kwargs):
        ShellCommand.__init__(self, **kwargs)
        self.version = None

    def createSummary(self, log):
        # Pick up the version number that the slave wrote to its log.
        log_text = log.getText()
        s = re.search("--version=(.*)", log_text)
        if s:
            self.version = s.group(1)

    def getText(self, cmd, results):
        # Work on a copy: describe() returns a class-wide list.
        text = self.describe(True)[:]
        if results == WARNINGS:
            text.append("warnings")
        if results == FAILURE:
            text.append("failed")
        if self.version:
            text.append("version=" + self.version)
        return text

The two most important methods in this case are createSummary and getText. I chose createSummary for overriding because it has access to the slave's log. In my case, that log contained the version number computed by the slave, so I just introduced a new variable, self.version, and set it to the result of a regular expression search for "--version=(.*)".
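
The regular expression search can be tried on its own, outside buildbot. The snippet below is only a sketch: the log excerpt and the extract_version helper are made up for illustration, and the strip() call is an addition that removes a trailing carriage return if the log uses Windows line endings.

```python
import re

# Hypothetical log excerpt; a real slave log will look different.
SAMPLE_LOG = """\
installing client package
--version=1.2.3
install finished
"""

def extract_version(log_text):
    """Return the text following '--version=' on the first matching line, or None."""
    # '.' does not match a newline, so (.*) stops at the end of the line.
    match = re.search(r"--version=(.*)", log_text)
    if match:
        return match.group(1).strip()
    return None

print(extract_version(SAMPLE_LOG))
print(extract_version("no version in this log"))
```

Because (.*) captures everything up to the end of the line, any extra text that follows the version number on the same line would be captured too; a tighter pattern may be needed for noisier logs.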

The getText method is called by ShellCommand inside the setStatus method, like this (the ShellCommand class lives in the process/step.py file installed under the buildbot root installation directory, in my case /usr/local/lib/python2.4/site-packages/buildbot):

def setStatus(self, cmd, results):
    # this is good enough for most steps, but it can be overridden to
    # get more control over the displayed text
    self.step_status.setColor(self.getColor(cmd, results))
    self.step_status.setText(self.getText(cmd, results))
    self.step_status.setText2(self.maybeGetText2(cmd, results))

My overridden version of getText (shown above) checks to see if self.version is non-empty, and if this is the case, it appends it to the variable text, which is a copy of the list returned by self.describe(True). Copying the list into a variable instead of modifying it in place is very important. Initially, I did something like:

text = self.describe(True)
if results == WARNINGS:
    text += ["warnings"]

The net effect of this was that each of my build slaves was updating this particular build step status with version information from all the other build slaves. It felt like a global or class-wide variable was being trampled underfoot by all the build slaves, and indeed this was the case, as I found out when I sent a message to the buildbot-devel list and Brian Warner explained what was going on: the list of strings returned by self.describe() is a class-wide value that's not supposed to be mutated. Brian suggested modifying the above snippet of code to:

text = self.describe(True)
if results == WARNINGS:
    text = text + ["warnings"]

Neal Norwitz suggested the solution I finally adopted, which is to first make a copy of the list returned by self.describe, then append to it. This is more efficient, because it only allocates the list one time, then resizes it if necessary:

text = self.describe(True)[:]
if results == WARNINGS:
    text.append("warnings")

Once again, buildbot proved to be very flexible and customizable -- but not without jumping through some hoops in this particular scenario. In any case, I hope this post will be useful for buildbot users out there who want to display more customized information in their build steps.
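
The shared-list pitfall described in this post can be reproduced without buildbot. The sketch below uses a made-up Step class as a stand-in for ShellCommand; its names (description_done, get_text_buggy, get_text_fixed) are hypothetical and only illustrate the difference between extending a class-wide list in place and appending to a copy.

```python
class Step:
    # Class-wide list shared by every instance (stand-in for descriptionDone).
    description_done = ["client install"]

    def describe(self):
        # Returns the shared list itself, not a copy.
        return self.description_done

    def get_text_buggy(self, version):
        text = self.describe()
        text += ["version=" + version]  # in-place extend: mutates the shared list
        return text

    def get_text_fixed(self, version):
        text = self.describe()[:]  # shallow copy first, then append
        text.append("version=" + version)
        return text


a, b = Step(), Step()

# With the copy, each step only sees its own version.
fixed = [a.get_text_fixed("1.0"), b.get_text_fixed("2.0")]
print(fixed)

# With the in-place extend, b also reports a's version.
a.get_text_buggy("1.0")
leaked = b.get_text_buggy("2.0")
print(leaked)
```

For lists, += is an in-place extend rather than a rebinding, which is why the first version of getText modified the class-wide value, while text = text + [...] and the slice copy both leave it alone.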

Tuesday, May 09, 2006

SSH tunnelling with Putty

Courtesy of David Hancock, here's a mini-howto on configuring Putty for SSH tunnelling. Let's say you have an account on a Linux box (with an IP address of 192.168.2.100) that you can SSH into. Let's say you want to connect to a Trac instance running on port 8000 on a different box (with IP 192.168.2.200), and you can't get directly to port 8000 on the second IP. You can still use your account on the first box and create an ssh tunnel that will allow you to get to port 8000 on IP #2.

Here's David's howto, almost verbatim:

What we'll do is forward port 9080 on the PC to 8000 on 192.168.2.200 (the host/port for Trac). I'm using Putty version 0.54.

1. Start Putty (so you're looking at the PuTTY Configuration screen.)
2. Enter 192.168.2.100 (the IP of the box you can ssh into) in the Host name / IP address box.
3. Check SSH as the protocol (port number should change to 22.)
4. Enter 'trac-tunnel' as the Saved Sessions name, and click Save.
5. Open the Connection list in the left pane.
6. Open the SSH list in the left pane, Click Tunnels.
7. Check X11 Forwarding (in case you need to run X-based applications.)
8. Back on the right side, at the bottom, enter 9080 for source port (there's nothing special about port 9080, it can be any non-used port on your local machine.)
9. Enter 192.168.2.200:8000 as the Destination, leave Local checked.
10. Click Add.
11. Important, easy to forget: Click Session on the left pane, Click Save.

Now your 'trac-tunnel' session will not only connect you to the .100 box, but when you're logged into the .100, it will mediate a tunnel between your PC's port 9080 and port 8000 on 192.168.2.200.

So, let's try it out:

1. Use Putty to open the 'trac-tunnel' connection, and log in as yourself
2. Point your browser to http://127.0.0.1:9080/ and you'll get right in.

You'd repeat Steps 8-10 to add more local port forwardings. Step 11 is easy to forget, so be warned...
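
For scripting, it can be handy to check that the tunnel is actually up before pointing a browser at it. The sketch below is not part of David's howto: port_is_open and openssh_forward_args are hypothetical helper names. The first simply tries a TCP connection to the forwarded local port, and the second builds what I'd expect to be the equivalent OpenSSH command line (ssh -L source_port:destination gateway) for machines where the ssh client is available instead of Putty.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def openssh_forward_args(local_port, dest_host, dest_port, gateway):
    """Build an OpenSSH command line with the same local port forwarding."""
    spec = "%d:%s:%d" % (local_port, dest_host, dest_port)
    return ["ssh", "-L", spec, gateway]

if __name__ == "__main__":
    # Same addresses and ports as in the howto above.
    print(" ".join(openssh_forward_args(9080, "192.168.2.200", 8000, "192.168.2.100")))
    print("tunnel up:", port_is_open("127.0.0.1", 9080))
```

A True result only means that something is listening on the local port; it does not prove that Trac is answering at the other end of the tunnel.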

Zen of Unicode

I attended David Goodger's Unicode talk at PyCon earlier this year and thought I was well on my way to Unicode enlightenment. It turns out I still need to chop a lot of wood and carry a lot of water before I attain this particular Zen... In the hope that other people will find it useful, here's a mini-tutorial on Unicode in the form of an email message from David, who responded in excruciating detail to some Unicode-related questions I sent him. I tried to copy and paste the text into the Blogger editor, only to get all sorts of markup-related errors, so I just put it on a Trac wiki. Hopefully David will soon publish his Unicode tutorial on the Web. Until then, happy Unicode hacking!
