Perils of a part time web server admin

Not being “in it” all the time can make simple things hard.

Recently one of the domain names I’ve held for a while expired. Or actually, I let it expire. It was hosted on this same web server along with several other websites and had a secure connection using a Lets Encrypt SSL certificate. All good.

The domain name expired, I disabled the website, and all the other websites on the server continued to be available. Until they weren’t! When I first noticed I just tried restarting the web server. No joy, that didn’t get the other sites back up.

And here’s the perils of part time admin. Where to start with the troubleshooting? For all my sites and the hosting server I really don’t do much except keep the patches current and occasionally post content using WordPress CMS. Not much troubleshooting, monitoring logs, etc. because there isn’t much going on. And, though some might say otherwise, I don’t spend all my time at the computer dissecting how it operates.

I put off troubleshooting for a while. This web server’s experimental, not production, so sometimes I cut some slack and don’t dive right in when things aren’t working. Had other things pending that required more attention.

When I did start I was very much at a loss where to start because, as noted, I disabled a web site and everything continued to work for a while. When it stopped working I hadn’t made any additional changes.

Logs are always a good place to look, yes? This web server is set up to create separate logs for most of the sites it’s hosting. Two types of logs are created, access logs and error logs. Access logs showed what was expected, no more access to that site after I disabled it.

Error logs confused me though. The websites use Lets Encrypt SSL certificates. And they use Certbot to set up the https on the Apache http server. A very common setup. The confusing thing about the error log was it showed the SSL configuration for the expired web site failing to load. Why was the site trying to load at all??? I had disabled the site using the a2dissite program provided by the server distribution. The thing I hadn’t thought about is the Certbot script for Apache sets up the SSL by modifying the <site_name>.conf file AND creating a <site_name>-le-ssl.conf file.

So even though the site had been disabled by a2dissite <site_name>.conf I hadn’t thought to a2dissite <site_name>-le-ssl.conf. Once I recognized that issue and ran the second a2dissite command the web server again started right up. No more failing to load SSL for the expired site. And, surprising, failing to load the SSL for the one site prevented the server from starting rather than disabling the one site and loading the others that didn’t have configuration issues.

Something for another time… I expect there must be a way for the server to start and serve correctly configured sites while not loading incorrectly configured sites and not allowing presence of an incorrectly configured site to prevent all sites from loading. It just does not seem likely that such a widely used web server would fail to serve correctly configured sites when only one or some of multiple hosted sites is misconfigured.

The perils of part-time admin, or jack of all trades and master of none, is that these sort of gotcha’s pop up all the time because of limited exposure to the full breath of dependencies for a program to perform in a particular way. It isn’t a bad thing. Just something to be aware of so rather than blame the software for not doing something, need to be aware that there are often additional settings to make to achieve the desired effect.

Be patient. Expect to need to continue learning. And always, always, RTFM and any other supporting documents.