Wednesday, August 31, 2011

Linux Runs Computational Neuroscience



What OS is actually used in computational neuroscience? A recent paper in Frontiers in Neuroinformatics (let's hear it for Open Access!) has looked at this.

There's plenty of data there, but the main finding is that Linux is the most used system: more than two-thirds of respondents use it, with Windows in second place at half and OSX third at a quarter (#1). Some people use Linux as their primary OS, while others run it in a virtual machine or log in to a remote machine somewhere else. The numbers add up to more than 100% because many, perhaps most, researchers use more than one system.

One reason for the popularity of Linux is that many computational research tools are developed primarily for Linux and Unix; another is that clusters and supercomputers mostly run Linux today. If you need to run your model or computation on a larger cluster you will need to use Linux in one form or another. But it's not simply a matter of necessity; the paper finds that satisfaction is also highest for Linux. Windows is most likely to be used to access Word, Outlook or other software that only runs on Windows.

Now, the paper is based on an online survey and a self-selected sample of respondents; this is problematic at best. But it does fit with my own anecdotal experience. At OCNC I saw some people who primarily used Linux, but many more dual-booted Linux and another OS, or combined more than one OS using virtual machines. I often see similar setups at conferences and meetings as well.

Virtual machines have long been used on big servers and mainframes, but are fairly recent in the desktop world. A virtual machine — a VM — is a piece of software that emulates a real computer. You can install an operating system and applications in it and the OS will think it has direct access to the real hardware. In reality the virtual machine runs as an application in a host operating system, and tightly controls the access to the real system.

A VM is extremely convenient. You can start and stop the system it hosts at any time; you can save the entire system state into a (large) file on disk and go back to that saved image whenever you want, or copy that image to other computers to run there. It lets you create a specific software environment guaranteed to be the same every time you use it. Modern PCs let a VM give out controlled access to the real hardware, so there is not much speed reduction.
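As a rough illustration of the "go back to a saved image" idea, here is a minimal Python sketch. It assumes VirtualBox as the VM software, with its VBoxManage command-line tool on the PATH; nothing here depends on that particular product, it is just one option. The VM name "sim-env" and the snapshot name "clean-install" are made up for the example. By default the script only prints the commands instead of running them.

```python
import subprocess

# Assumption: VirtualBox is installed and VBoxManage is on the PATH.
# "sim-env" and "clean-install" are hypothetical names for illustration.


def take_snapshot_cmd(vm_name, snapshot_name):
    """Command that saves the VM's current state as a named snapshot."""
    return ["VBoxManage", "snapshot", vm_name, "take", snapshot_name]


def restore_snapshot_cmd(vm_name, snapshot_name):
    """Command that rolls a stopped VM back to a named snapshot."""
    return ["VBoxManage", "snapshot", vm_name, "restore", snapshot_name]


def start_cmd(vm_name):
    """Command that boots the VM without opening a window."""
    return ["VBoxManage", "startvm", vm_name, "--type", "headless"]


def run_from_known_state(vm_name, snapshot_name, dry_run=True):
    """Restore a snapshot, then start the VM from that known state.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = [
        restore_snapshot_cmd(vm_name, snapshot_name),
        start_cmd(vm_name),
    ]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_from_known_state("sim-env", "clean-install")
```

Restoring the same snapshot before every simulation run is one way to get the "same every time" environment described above; other VM software offers similar snapshot features under different commands.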

You can use a hosted Ubuntu Linux system for your software development and data analysis, use a remote cluster for your actual simulations, and the desktop OS you're already familiar with for email and web surfing. Or run Linux or OSX as your primary system, then a copy of Windows in a VM to access legacy Windows-only applications. Or run a second copy of your system in a VM, to make sure the environment is identical every time you run a simulation.
 
The major drawback of the virtual machine approach is that each hosted OS needs as much memory and disk space as if it were the only system on the computer. But modern laptops tend to have plenty of both, and for large simulations you're likely to use a remote cluster anyhow.

--
#1 This seemed a bit low to me at first. But this survey counts desktops and clusters as well, not just laptops, and OSX isn't nearly as prevalent in those areas as in portable computing. Also, Apple laptops have a very distinctive, uniform design; you end up with a positive bias where you remember seeing them but forget about all the anonymous, generic laptops that were really the majority at the meeting or the conference.


