Recomputing With Microsoft Azure

An important part of making recomputable experiments easy to access is offering them in a variety of ways. This is why we applied for an Azure for Research grant in 2013, and we were delighted to receive one: in fact, we got more cloud resources than we had requested! We are also very grateful to have been given a generous extension to our grant when Ian Gent's illness caused problems for the project.

Given that we have made experiments available as Vagrant boxes using Oracle VirtualBox, why would we want to focus on a cloud solution as well? A key issue, often raised by people who discuss recomputation with us, is that running an experiment requires a major download. Even the simplest experiment might need a half-gigabyte download, and for complex experiments with a lot of data the download size is essentially prohibitive.

A second major problem is our focus on providing VirtualBox virtual machines. This software is freely licensed and free as in beer. However, this doesn't mean it is universal. One obvious place it cannot be used is a locked-down environment where a user cannot install extra software. This is perhaps not critical for our key target market of researchers, who typically have installation rights on their experimental machines. A further issue, however, is critical: somebody might already have a competing virtualisation product installed, such as VMware, and two competing virtualisation suites typically cannot be used on the same machine because they compete catastrophically for low-level resources. One of the fastest crashes I have ever seen came when I fired up two different virtual machine executables at the same time. (For full disclosure, I did this deliberately, for fun, to see whether the dire warnings each product gave were correct: they were!)

So this is why our work with Microsoft Azure is really important to us. It gives us the ability to make experiments available in a way that requires neither software installation nor major downloads for users. Users will need an Azure account and will have to pay to rerun experiments, but for small experiments this cost can be on the order of pennies. For users who can use Azure, this can be a much nicer alternative to downloading full virtual machines.

Another important use case for Azure is to take experiments already prepared in Azure and make them available via recomputation.org. We expect to see more experiments being performed in Azure, so we would like to make it easy for experimenters to give us their experiments. After all, a cloud experiment has necessarily been virtualised; indeed, it never existed in pure hardware form at all. So it should be easier to bring it into recomputation.org than an experiment that existed only on a physical machine in a lab somewhere.

A service like Microsoft Azure cannot be the only way of providing recomputable experiments. The main reason lies in our mission: we want experiments to remain recomputable for 20 years. With a standalone, freely licensed, and free tool like VirtualBox, we have a chance of keeping experiments runnable, albeit with difficulty, even if Oracle stops supporting and distributing the product. We can keep old versions of the software, and if necessary even old hardware to run it on. We cannot do anything similar for commercial cloud services. Even so, it is very important to offer users different options for rerunning experiments, including cloud platforms such as Microsoft Azure, for as long as they are available.

Here at recomputation.org, we are continually exploring ways to make things easy for those who wish to make their experiments recomputable. To this end, we currently support Microsoft Azure in two distinct ways:

  1. Providing pre-configured Ubuntu images in VM Depot for easy creation of recomputation.org-compatible VMs. These images have already been "Vagrantised", meaning they have been set up with the Vagrant software required to automate provisioning and launching VM boxes. Users creating Azure VMs from these images only need to worry about setting up their experiments. They can then give us the downloaded virtual hard disk (VHD) of their VM to be published on recomputation.org. Another benefit of using recomputation images is that these custom images are much smaller: currently 2 GB, compared with 20 GB for the standard images. Users may therefore find VMs made from these images more convenient to download.

  2. Creating VMs of recomputable experiments in the Azure cloud through the Vagrant Azure interface. This approach is useful for users who wish to execute experiments from recomputation.org but do not have sufficient computing resources. In this case, a user only needs to install Vagrant on their local computer and instruct Vagrant to instantiate a VM in the Azure cloud from an experiment hosted on recomputation.org. The user, of course, needs an Azure account to make use of this capability.

Once Vagrant and the necessary configuration file are in place on the local computer, the following commands install the Azure plug-in and instantiate a VM in the Azure cloud from an experiment on recomputation.org.

$ vagrant plugin install vagrant-azure
$ vagrant box add azure http://recomputation.org/cp2013/experiment1/recomputation-QueensPuzzle-b.box
$ vagrant up --provider=azure
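For repeated use, the three commands can be wrapped in a small script. This is a minimal sketch, not an official recomputation.org or vagrant-azure tool. It assumes a POSIX shell and a Vagrantfile with your Azure settings already present in the current directory, and the BOX_NAME, BOX_URL and DRY_RUN variables are hypothetical names introduced here for illustration. It skips installing the plug-in or adding the box when Vagrant already lists them, and by default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical wrapper around the three Vagrant commands for Azure.
# Assumes a Vagrantfile with Azure settings already exists in this directory.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
set -eu

BOX_NAME="${BOX_NAME:-azure}"
BOX_URL="${BOX_URL:-http://recomputation.org/cp2013/experiment1/recomputation-QueensPuzzle-b.box}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# True only if Vagrant is installed and already lists the vagrant-azure plug-in.
has_azure_plugin() {
  command -v vagrant >/dev/null 2>&1 &&
    vagrant plugin list | grep -q '^vagrant-azure '
}

# True only if Vagrant is installed and a box named $BOX_NAME has been added.
has_box() {
  command -v vagrant >/dev/null 2>&1 &&
    vagrant box list | grep -q "^$BOX_NAME "
}

deploy_experiment() {
  has_azure_plugin || run vagrant plugin install vagrant-azure
  has_box || run vagrant box add "$BOX_NAME" "$BOX_URL"
  run vagrant up --provider=azure
}

deploy_experiment
```

Running it as DRY_RUN=0 sh deploy-azure.sh (a hypothetical file name) executes the commands for real. Printing by default is a deliberate choice: vagrant up starts a billable Azure VM, so it seems safer to opt in explicitly.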

Detailed instructions for deploying experiments on Azure with this approach can be found at this GitHub repository.

Recomputation.org has always been a project that cries out for cloud deployment, because of the scalability the cloud inherently provides.

We are very grateful to Microsoft for the support they have shown us, to individuals at Microsoft such as Kenji Takeda of Microsoft Research, and to the Software Sustainability Institute for help in working with Microsoft Azure.