It’s that time of year again – module picking season. You’ve conspired for months prior with your mates to take music production in hopes of dropping out of college and living your dreams as the next Swedish House Mafia. The calendar, clock, and reminders apps on your phone are all set to make sure there is no possible way you miss the window to choose your elective. You log into the my.tcd.ie portal 10 minutes to the hour, anxiously refreshing the page every minute or so, and then that time finally hits . . . Nothing. That blank white screen seals your fate yet again to a seminar module on ethical basket-weaving.
Designing web-based applications and the surrounding architecture to be fast, reliable, and scalable is the nightmare that keeps every developer in the web space up at 3am. Even the biggest companies, with all their resources, deal with downtime now and again. While the code behind a page like my.tcd.ie certainly matters for overall responsiveness, this article will focus on the architecture and technologies that get those pages to your computer.
The first piece of terminology that is essential to understanding web architecture is the server. When you make a request by typing a website’s URL into the browser bar, a global registry system known as the Domain Name System (DNS) matches your URL to an IP address, a unique identifier that locates the computer running the server code. The server then runs background logic (such as checking that you’re logged in, that the page you want exists, or that you’re on the right device), and if everything checks out, it returns an HTML file, which is essentially a text file containing information on how to format the page. Your browser reads that file and finally displays the output to you.
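To make that journey a little more concrete, here is a minimal sketch of the same steps using only Python’s standard library: resolve a hostname to an IP address, send a request, and read back the HTML. The hostname and the assumption that a page is served at the root path are purely illustrative.

```python
# A minimal sketch of the request journey described above, using only
# Python's standard library. The hostname is illustrative.
import socket
import http.client

host = "my.tcd.ie"  # illustrative hostname

# Step 1: DNS resolves the hostname to an IP address.
ip_address = socket.gethostbyname(host)
print(f"{host} resolves to {ip_address}")

# Step 2: the browser (here, our script) sends an HTTP request to that server.
connection = http.client.HTTPSConnection(host, timeout=10)
connection.request("GET", "/")
response = connection.getresponse()

# Step 3: the server replies with an HTML document describing the page.
html = response.read().decode(errors="replace")
print(response.status, response.reason)
print(html[:200])  # first few hundred characters of the returned HTML
connection.close()
```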
“The type of web outage that Trinity students saw can be equated to our analogous restaurant getting hit with an unexpected dinner rush without enough wait staff to cover everyone.”
Most of the technicalities can be glossed over; the main point is that a web server functions much like a server at a restaurant. It takes a request from you and returns a response, and much like food servers, web servers are often required to serve many people at once. The more traffic you get, the faster your server needs to be, but eventually the single server itself becomes the bottleneck. Using the restaurant analogy again, the solution for a busy restaurant would be to hire more servers and appoint a host/hostess to evenly distribute patrons across each server’s section. In web architecture, the host’s role is played by what we call a ‘load balancer’. The user makes a request, the load balancer passes it to one of a cluster of servers, and that server handles the request and returns a response to the user. This system works well because if you need to accommodate more traffic, you can deploy new servers without having to upgrade the machines themselves. Adding more servers as a means of scaling is known as horizontal scaling, while switching to faster machines is known as vertical scaling.
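As an illustration, here is a minimal sketch of one common strategy a load balancer might use, round-robin: each incoming request simply goes to the next server in the rotation. The server addresses and request paths are made up for the example.

```python
# A minimal sketch of round-robin load balancing, one common strategy a
# load balancer might use. The server addresses are purely illustrative.
from itertools import cycle

# A pool of identical web servers sitting behind the load balancer.
servers = cycle([
    "10.0.0.1:8080",
    "10.0.0.2:8080",
    "10.0.0.3:8080",
])

def route_request(request_path: str) -> str:
    """Pick the next server in the rotation and 'forward' the request to it."""
    server = next(servers)
    return f"forwarding GET {request_path} to {server}"

# Each incoming request goes to the next server in the pool,
# spreading the load evenly across all three.
for path in ["/modules", "/timetable", "/modules", "/fees"]:
    print(route_request(path))
```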
So, with this system in place to evenly distribute requests, why couldn’t the Trinity portal handle everyone’s requests? The type of outage that Trinity students saw can be equated to our analogous restaurant getting hit with an unexpected dinner rush without enough wait staff to cover everyone. This leads us to another essential concept in web architecture – auto-scaling. Auto-scaling is a feature offered by some cloud providers that dynamically adds and removes servers based on demand, much like a manager might call in extra staff for a particularly busy evening.
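In spirit, an auto-scaling rule can be as simple as a threshold check run every minute or so. The thresholds, server limits, and load figures below are purely illustrative, not any particular provider’s defaults.

```python
# A minimal sketch of a threshold-based auto-scaling rule, similar in
# spirit to what cloud providers offer. All numbers are illustrative.

MIN_SERVERS = 2
MAX_SERVERS = 20
SCALE_UP_LOAD = 0.75    # add a server when average load exceeds 75%
SCALE_DOWN_LOAD = 0.25  # remove a server when average load drops below 25%

def desired_server_count(current_servers: int, average_load: float) -> int:
    """Decide how many servers should be running given current demand."""
    if average_load > SCALE_UP_LOAD and current_servers < MAX_SERVERS:
        return current_servers + 1
    if average_load < SCALE_DOWN_LOAD and current_servers > MIN_SERVERS:
        return current_servers - 1
    return current_servers

# Module-picking hour: load stays high, so the rule keeps adding servers.
servers, load = 3, 0.95
for minute in range(5):
    servers = desired_server_count(servers, load)
    print(f"minute {minute}: running {servers} servers at load {load:.0%}")
```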
“As this elective picking happens yearly at a set time, one possible solution is to proactively allocate more servers to handle student requests.”
It is tricky to say for certain what caused the outages this past July, but the most likely explanation is that the portal and the architecture behind it were not prepared to scale quickly enough for the influx of users logging in at the exact same time. Booting extra servers, even when done automatically, can take a good few minutes before they are able to process web requests. As this elective picking happens yearly at a set time, one possible solution is to proactively allocate more servers to handle student requests. This would be reasonably cost-effective if the portal runs in the cloud on a platform like Amazon Web Services, since the extra servers are rented from Amazon rather than bought outright. However, if these are physical servers being run on campus, spending a couple of thousand euros on new hardware to smooth over a once-a-year traffic spike could very well be deemed overkill.
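Because the spike is predictable, that pre-allocation could be expressed as a simple schedule rather than a reaction to load. The dates, times, and server counts in this sketch are hypothetical.

```python
# A minimal sketch of proactive, scheduled scaling for a known yearly
# spike. The date, time, and server counts are purely hypothetical.
from datetime import datetime, timedelta

BASELINE_SERVERS = 3
SPIKE_SERVERS = 15  # illustrative capacity for module-picking hour

# Hypothetical window around the module-selection opening time.
SPIKE_START = datetime(2024, 7, 15, 9, 30)
SPIKE_END = SPIKE_START + timedelta(hours=3)

def scheduled_capacity(now: datetime) -> int:
    """Return how many servers should be running at a given moment."""
    if SPIKE_START <= now <= SPIKE_END:
        return SPIKE_SERVERS
    return BASELINE_SERVERS

print(scheduled_capacity(datetime(2024, 7, 15, 10, 0)))  # 15 during the rush
print(scheduled_capacity(datetime(2024, 7, 16, 10, 0)))  # 3 the next day
```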
So if you find yourself endlessly waiting for the module selection page to load again next year, take a minute to appreciate the decades of web architecture design that make a web portal like Trinity’s even possible; after all, you’ll need to pass the time somehow.