Behind the Scenes at OST

These past several months have been exciting for the O’Reilly School of Technology. Our course offerings have expanded, the OST student body has increased, we’ve added new staff, and we officially outgrew our datacenter. As the lead system administrator for OST, I’m part of the team charged with addressing the challenges of datacenter expansion. Our plan has been to make this big move without our students ever noticing, so I hope this is the first you’re hearing about it.

Initially we considered expanding within our original server location in Champaign, Illinois, but the costs for additional cooling and electrical capacity were prohibitive, compelling us to explore other options. We came up with three possible solutions to our dilemma: building a whole new datacenter at a new location, outsourcing all of our services to a cloud provider, or renting rack space in an existing datacenter.

Since we’re already pretty good at running our own datacenter, building at a new site was the first option we considered. But as we added up the costs for this type of project, we quickly realized that with only three racks of equipment, we lacked the economies of scale to justify the expense. The cost of installing fully redundant cooling, internet, and power outweighed the benefits of building our own site from scratch. Additionally, managing a project like this would consume a lot of staff time, taking away from the time we normally spend developing learning solutions for our students.

With a brand new datacenter off the table, moving our services to a cloud provider such as Amazon Web Services or Rackspace looked like the more realistic and cost-effective route, but several factors kept us from taking it. Cloud services turned out to be more expensive than we first thought. We learned that most providers base their fees on the amount of RAM used, and at around 250 GB of RAM, cloud computing becomes more expensive than traditional options. In our original system, we used virtualization and memory overcommitment, which let us allocate more RAM to students than we physically had on site. Had we switched to a cloud provider, that efficiency would have been lost and turned into an added expense. In addition, we rely on specialized storage technology that is not currently available from any cloud provider. We also discovered that most cloud providers lack the data archival standards or processes we need.
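To make the overcommitment point concrete, here is a small Python calculation using hypothetical figures (the 256 GB and 512 GB values are illustrative assumptions, not OST's actual numbers). It assumes a per-GB cloud bill charges for every GB allocated to instances, whereas an overcommitted virtualization host only needs physical RAM for what guests actually use at once.

```python
def overcommit_ratio(allocated_gb: float, physical_gb: float) -> float:
    """Ratio of RAM promised to guest VMs versus RAM physically installed."""
    if physical_gb <= 0:
        raise ValueError("physical_gb must be positive")
    return allocated_gb / physical_gb


def extra_billed_gb(allocated_gb: float, physical_gb: float) -> float:
    """GB a per-GB cloud bill would charge beyond what the hardware holds.

    Assumption: the provider bills every allocated GB, because the
    customer cannot overcommit memory across cloud instances.
    """
    return max(0.0, allocated_gb - physical_gb)


# Hypothetical figures for illustration only; not OST's real numbers.
physical = 256.0   # GB installed across the virtualization hosts
allocated = 512.0  # GB promised to student VMs (2:1 overcommit)

print(f"overcommit ratio: {overcommit_ratio(allocated, physical):.1f}:1")
# overcommit ratio: 2.0:1
print(f"GB billed beyond installed RAM: {extra_billed_gb(allocated, physical):.0f}")
# GB billed beyond installed RAM: 256
```

Under these assumptions, every GB of overcommitment that costs nothing extra on owned hardware shows up as a billed GB in the cloud.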

The last option we looked at, and the one we ultimately chose, was to move our equipment to an existing datacenter (Prominic.net Inc.) and rent colocation space there. This let us keep full control of our hosted learning environment while shifting the burden of managing the physical site and infrastructure to a company with experience and a track record of reliability in the industry. It also allowed us to expand our capacity with minimal disruption to services while keeping expenses in check. The transition was a real challenge, and I’m proud of our success in meeting it.

Throughout this process, we were inundated with buzz about the cloud, and it got me thinking about our work and the path we’ve taken to get here: we were actually in the cloud services business before the term even existed. We’ve provided hosted learning services for over 10 years, with students accessing files and applications on our servers over the internet. These hosted services let our students start building programming skills as soon as they begin a course, rather than having to manage an operating system and application stack first. And now, by moving our physical equipment to Prominic, our team at OST is free to focus even more fully on our core competency: delivering a comprehensive learning platform to our students.

  • http://www.hourback.info/ Ali

    Great job, Trent! Thanks for the info.

    We both picked the same option, so I know it’s the best choice! ;-)

  • J. Altman

    Trent wrote:

    “We also came to discover that most cloud providers do not have standards or processes for data archival that would meet our needs.”

    You mean data archiving, right? As in, backups? Or something else?

    “Throughout this process, we were inundated with the buzz about the cloud, and it got me thinking about our work and the path we’ve taken to get here-we were actually working in the cloud services business before the term ever existed.”

    I wonder if it’s the case that the only cloud is the rhetoric about the cloud. See: The Intercloud.

    I’m sure the move will go fine.

  • http://oreillyschool.com/ Trent Johnson

    Thanks for your comment.

    J. Altman wrote:

    “You mean data archiving, right? As in, backups? Or something else?”

    Both: backups, and the ability for students or mentors to perform their own restores. Our storage systems provide hourly and nightly backups that users can access and restore from themselves through a normal Unix filesystem. Cd to your .snapshot directory in unix mode to see what I mean.

    Additionally, we run traditional nightly, weekly, monthly, and offsite backups to tape.

    We couldn’t find a cloud provider that could deliver both, particularly the .snapshots.

    I hadn’t heard the term Intercloud. That does seem like where the Internet is heading, but at the end of the day, all the clouds run on real hardware in a data center somewhere.
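    For readers unfamiliar with .snapshot, here is a minimal Python sketch of that self-service restore. It assumes a NetApp-style layout in which each snapshot appears as a read-only subdirectory (for example .snapshot/hourly.0/) mirroring the live directory; the snapshot names and the helper functions list_snapshots and restore_file are hypothetical, not OST's actual tooling. The demo builds a fake layout in a temporary directory, since on a real filer the storage system provides .snapshot itself.

```python
import shutil
import tempfile
from pathlib import Path


def list_snapshots(directory: Path) -> list[str]:
    """Return the snapshot names visible under directory/.snapshot, sorted."""
    snap_root = directory / ".snapshot"
    if not snap_root.is_dir():
        return []
    return sorted(p.name for p in snap_root.iterdir() if p.is_dir())


def restore_file(directory: Path, snapshot: str, filename: str, dest: Path) -> Path:
    """Copy filename as it existed in the given snapshot to dest."""
    source = directory / ".snapshot" / snapshot / filename
    if not source.is_file():
        raise FileNotFoundError(source)
    shutil.copy2(source, dest)  # copy2 also preserves timestamps
    return dest


# Demo against a simulated layout; hypothetical snapshot names.
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    (home / ".snapshot" / "hourly.0").mkdir(parents=True)
    (home / ".snapshot" / "nightly.0").mkdir(parents=True)
    (home / ".snapshot" / "hourly.0" / "notes.txt").write_text("draft v1\n")
    print(list_snapshots(home))  # ['hourly.0', 'nightly.0']
    restored = restore_file(home, "hourly.0", "notes.txt", home / "notes.restored.txt")
    print(restored.read_text(), end="")  # draft v1
```

    The restore is just a file copy out of a read-only directory, which is why students and mentors can do it themselves without an administrator.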

  • J. Altman

    Trent wrote:

    “Cd to your .snapshot directory in unix mode to see what I mean.”

    Oh, nice; I’m familiar with .snapshot. It’s good to have it. Thanks.

  • http://trip.invisibledog.net Trip

    Thanks. It would be great to see your detailed plan, if you could share it without compromising security.

  • Trent

    Trip, we don’t have the detailed plan in an easy-to-distribute form; it is spread across task lists, spreadsheets, notebooks, and whiteboards.

    On move day, we tried to minimize configuration changes and pretested any changes that did need to be made.

    We also pre-wired all the racks at the new site, so the necessary cables were already waiting at the back of each server once it was slid into its assigned location.

  • Amit Mathur

    Great article Trent.

    As one of the contributors to the “cloud hype,” it is great to see how your infrastructure adopted some of the key tenets of the cloud [elasticity & virtualization] before they were in vogue.

    BTW, if you haven’t already read it, I encourage you to read this paper: http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf
    It is by far the best single paper written on the cloud to date IMO.

  • Piero Brewer

    Illuminating article. Interesting choice. I’ve seen the same occur locally.

    Using virtualization to achieve scalability, economies of scale, et cetera, et cetera, as the King of Siam would say, leads to an “old school” observation:

    Too often, when those VMware hosts are swapped in and out of blades, things slow down to the point where it is like the olden days of computing: 1000+ DECwriters hooked up to an ancient Univac 1170 or 1140 series mainframe, with thousands of students typing into a machine and OS that barely had 32 KB of RAM. Lots of swapping. Type a key and wait 20 minutes. I’ve actually experienced that … both then and today.

    I was wondering what your take is on the true performance of virtualization, and whether you plan to add a course or two on it to your sysadmin certificate program.

    Thank you in advance,
    Piero Brewer

  • Trent

    Thanks for the comments, Amit. Good to hear from you. That is a great paper; it’s interesting to think of compute power as a commodity good.

  • Trent

    Hi Piero,

    We’ve found that the small performance hit from virtualization is usually offset by the convenience of hardware independence and a reduced maintenance burden.

    Where performance really matters, though, such as on database and storage servers, we still run directly on the hardware.