Up again, but not public yet

Well, except, you’re reading this so it is public.

Lost interest in maintaining this server and website when I lost my job and couldn’t get another. The server’s Ubuntu, the web server is Apache, and the CMS is WordPress. It’s been running for a number of years without issue. I wouldn’t call it production because I don’t rely on it for anything. It’s just a test bed to familiarize myself with the software stack and gain some understanding of its setup and administration. I’m self-hosting; it’s an old computer repurposed as a server.

One other thing I experimented with is DNS. I wanted to be able to get to my server using wp.boba.org whether I was on the public Internet or my home network. That worked fine for years with BIND9 and isc-dhcp.

I developed the habit of running upgrades periodically without testing. If there was a problem, no big deal: it’s not production, so figure out the issue, repair, and proceed. Problems happened a few times with that approach and were always easily rectified.

DNS on the server stopped working after an upgrade. I tried many things and couldn’t figure out why. Rather than roll back the upgrade or restore the system from a backup I kept mucking with it, with no success. Eventually I just lost interest and let the server go dark. I wasn’t working, so I had no one to talk tech with about my server project, and there seemed no point in fixing it.

I did want to dip my toe in the water again after a while, so I decided to rebuild the server and bring all components up to the latest release. I still couldn’t get BIND9 DNS to work, and searching BIND9 issues I found other Ubuntu users were having problems with it too. After searching for alternative DNS servers I decided to try dnsmasq. That got me a working DNS on my home network, and that got me to the point of having the server up and publicly available again.
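For what it’s worth, the split-horizon behavior I wanted takes only a few lines of dnsmasq configuration. A minimal sketch, with the upstream resolver and the server’s LAN address as placeholder values:

# /etc/dnsmasq.conf (sketch)
domain-needed                        # don't forward plain, dotless names upstream
bogus-priv                           # don't forward reverse lookups for private ranges
address=/wp.boba.org/192.168.1.10    # LAN clients resolve the site to the server's private IP
server=1.1.1.1                       # everything else goes to a public resolver

Clients on the public Internet never see this; they resolve wp.boba.org through the public DNS records as usual.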

All development of the server configuration and settings was done on a virtual machine (VM) in a virtual network with virtual clients, with VirtualBox as the hypervisor. Once everything worked as expected I migrated the server VM to a physical host. That took surprisingly little tweaking. Network addresses had to be changed from the virtual network settings to the home network settings, and a different Ethernet device name entered where needed. That was about it to migrate from a virtual to a physical server.

For all the world to see, in all its underwhelming glory, wp.boba.org is back. Enjoy.

Perils of a part-time web server admin

Not being “in it” all the time can make simple things hard.

Recently one of the domain names I’ve held for a while expired. Or actually, I let it expire. It was hosted on this same web server along with several other websites and had a secure connection using a Let’s Encrypt SSL certificate. All good.

The domain name expired, I disabled the website, and all the other websites on the server continued to be available. Until they weren’t! When I first noticed I just tried restarting the web server. No joy, that didn’t get the other sites back up.

And here are the perils of part-time admin. Where to start with the troubleshooting? For all my sites and the hosting server I really don’t do much except keep the patches current and occasionally post content using the WordPress CMS. Not much troubleshooting, monitoring logs, etc., because there isn’t much going on. And, though some might say otherwise, I don’t spend all my time at the computer dissecting how it operates.

I put off troubleshooting for a while. This web server’s experimental, not production, so sometimes I cut myself some slack and don’t dive right in when things aren’t working. I had other things pending that required more attention.

When I did start I was very much at a loss because, as noted, I had disabled a website and everything continued to work for a while. When it stopped working I hadn’t made any additional changes.

Logs are always a good place to look, yes? This web server is set up to create separate logs for most of the sites it’s hosting: access logs and error logs. The access logs showed what was expected, no more access to that site after I disabled it.
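Per-site logging like that is just a pair of directives in each virtual host. A sketch, with example.com standing in for a real site:

<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/example.com/public_html
    ErrorLog ${APACHE_LOG_DIR}/example.com-error.log
    CustomLog ${APACHE_LOG_DIR}/example.com-access.log combined
</VirtualHost>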

The error logs confused me though. The websites use Let’s Encrypt SSL certificates, with Certbot setting up HTTPS on the Apache web server. A very common setup. The confusing thing about the error log was that it showed the SSL configuration for the expired website failing to load. Why was the site trying to load at all??? I had disabled the site using the a2dissite program provided by the server distribution. The thing I hadn’t thought about is that the Certbot script for Apache sets up SSL by modifying the <site_name>.conf file AND creating a <site_name>-le-ssl.conf file.

So even though the site had been disabled by a2dissite <site_name>.conf I hadn’t thought to a2dissite <site_name>-le-ssl.conf. Once I recognized that and ran the second a2dissite command the web server again started right up. No more failing to load SSL for the expired site. And, surprisingly, failing to load the SSL for the one site prevented the server from starting, rather than disabling the one site and loading the others that didn’t have configuration issues.
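In other words, retiring a Certbot-managed site takes two disables, not one. A sketch, again with example.com as a placeholder:

sudo a2dissite example.com.conf          # the original HTTP vhost
sudo a2dissite example.com-le-ssl.conf   # the HTTPS vhost Certbot created
sudo systemctl reload apache2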

Something for another time… I expect there must be a way for the server to start and serve correctly configured sites while skipping incorrectly configured ones, rather than letting the presence of one misconfigured site prevent all sites from loading. It just does not seem likely that such a widely used web server would fail to serve correctly configured sites when only one of multiple hosted sites is misconfigured.
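One partial safeguard I do know of is checking the configuration before restarting, so a broken vhost gets caught while the old instance is still serving:

sudo apache2ctl configtest   # prints "Syntax OK" or names the offending .conf

It wouldn’t have made the server skip the broken site, but it would have pointed at the culprit immediately.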

The perils of part-time admin, or jack of all trades and master of none, are that these sorts of gotchas pop up all the time because of limited exposure to the full breadth of dependencies for a program to perform in a particular way. It isn’t a bad thing. Just something to be aware of, so rather than blame the software for not doing something, be aware that there are often additional settings needed to achieve the desired effect.

Be patient. Expect to need to continue learning. And always, always, RTFM and any other supporting documents.

Server upgrade

…and I’m publishing again.

Well, this was a big publishing gap. Four months. I hope not to have such a long one again. Anyway, there are a number of drafts in the wings, but I decided to publish about this most recent change because it’s what I wanted to get done before publishing again.

The server is now at Ubuntu 20.04, 64‑bit of course. It started out at 16.04 32‑bit and got upgraded to 18.04 i686. Then I attempted the 20.04 upgrade and couldn’t, because I’d forgotten it was legacy 32‑bit and 20.04 is only available in 64‑bit. So, on to other things while planning a different upgrade solution. When I got back to it I thought I should upgrade to 22.04 since that had been released. As I was going through the upgrade requirements I discovered that several needed applications didn’t have 22.04 packages yet, particularly Certbot and MySQL. So back to 20.04 to complete the upgrade.

The MySQL upgrade wasn’t too bad. There was a failure, but it was a common one and a usable fix for the column-statistics issue was found quickly: disable column statistics during mysqldump (mysqldump -u root -p --all-databases --column-statistics=0 -r dump_file_name.sql).

I also switched to the Community Edition rather than the Ubuntu packages, because of recommendations on the MySQL site that the Ubuntu package isn’t kept very up to date.

Fortunately I’m dealing with small databases with few transactions, so mysqldump was my upgrade solution. Dump the databases from v5.x 32-bit, load them into v8.x 64-bit. But wait, not all the user accounts are there!!
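The whole transfer amounts to two commands, give or take connection details. A sketch, with the dump file name as a placeholder:

# dump everything from the 5.x server, column statistics disabled as noted above
mysqldump -u root -p --all-databases --column-statistics=0 -r dump_file_name.sql

# load the dump into the new 8.x server
mysql -u root -p < dump_file_name.sql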

select * from INFORMATION_SCHEMA.SCHEMA_PRIVILEGES; showed only two grantees, 'mysql.sys'@'localhost' and 'mysql.session'@'localhost'. There should be about 20. The solution was simple: add upgrade = force to mysql.cfg and restart the server. After that, the same query shows all the expected accounts AND the logins function and the correct databases are accessible to the accounts.
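For reference, a sketch of the option file change, assuming the setting goes in the [mysqld] section and the server is MySQL 8.0.16 or later:

[mysqld]
upgrade = FORCE   # re-run the system table upgrade/repair at startup even if versions match

Once the accounts appear, the setting can go back to the default, AUTO.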

All the other applications upgraded successfully: DNS, ddclient, Apache2, and so on. It was an interesting exercise to complete, and it moved the server onto newer, smaller hardware and updated the OS to 64-bit Ubuntu 20.04.

I’ll monitor for 22.04 packages for Certbot and MySQL and, once I see them, update the OS again to get to 22.04. Always better to have more time before needing (being forced) to upgrade. 20.04 is already about halfway through its supported life. Better to be on 22.04 and have almost five years until the next upgrade is needed.

Doing all this in a virtual environment is a great time saver and trouble spotter. Gotchas and conflicts can be resolved so the actual activation, virtual or physical, goes about as smoothly as could be hoped with so many dependencies and layers of architecture. Really engrossing stuff if you’re so inclined.

DHCP on the server was new. The router doing DHCP only allowed my internal DNS as secondary, and that seemed to cause issues reaching local hosts; sometimes a name would resolve to the public rather than the private IP. Switching to DHCP on the server lets the server be specified as THE DNS authority on the network.
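The relevant bit of DHCP server configuration is small. A sketch for isc-dhcp-server with placeholder addresses, where .10 is the server providing DNS:

# /etc/dhcp/dhcpd.conf (sketch)
subnet 192.168.1.0 netmask 255.255.255.0 {
  range 192.168.1.100 192.168.1.199;
  option routers 192.168.1.1;               # the router still forwards traffic
  option domain-name-servers 192.168.1.10;  # but the server is the only DNS offered
}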

Watching the messages in syslog, the utility of having addressable names for all hosts was obvious. The next virtual project: update DNS from DHCP.

Ubuntu server upgrade 16.04 to 18.04 (20.04 pending)

Virtualize, document, and test. The surest way to upgrade success.

For years my server has been running my personal websites and other services without a hitch. It was Ubuntu 16.04, more than four years old at this point, with only a year left on the 16.04 support schedule. Plus 20.04 is out. Time to move to the latest platform without rushing, rather than make the transition with support ended or time running out.

With the above in mind I decided to upgrade my 16.04.6 server to 20.04 and get another five years of support on deck. I’m halfway there, at 18.04.4, and hovering for the next little while before the bump up to 20.04. The pause is because of a behavior of do-release-upgrade that I learned about while planning and testing the upgrade.

It turns out that do-release-upgrade won’t actually run the upgrade until a version’s first point release is out; a switch, -d, must be used to override that. Right now 20.04 is just that, 20.04. Once it’s 20.04.1 the upgrade will run without the switch. Per “How to upgrade from Ubuntu 18.04 LTS to 20.04 LTS today”, the switch, which is intended to enable upgrading to a development release, does the upgrade to 20.04 because it is released.
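So the two options from 18.04 look like this:

sudo do-release-upgrade      # refuses until 20.04.1 is published
sudo do-release-upgrade -d   # upgrades to 20.04 now; the flag is meant for development releases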

I’m interested to try out the VPN that is in 20.04, WireGuard, so may try the -d before 20.04.1 gets here. In the meantime let me tell you about the fun I had with the upgrade.

First, as you should always see in any story about an upgrade: backup! I did, several different ways, mostly as experiments to see if I want to change how I’m doing it (currently rsync). An optional feature of 20.04 that looks to make backup simpler and more comprehensive is ZFS. It’s newly integrated into Ubuntu and I want to try it for backups.

I got my backups, then took the server offline to get a system image with Clonezilla. Then I used VBoxManage convertfromraw to turn the Clonezilla disk image into a VDI file. That gave me a clone of the server in VirtualBox to practice upgrading and work out any kinks.
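The conversion itself is a one-liner; the file names here are placeholders, and the input needs to be a raw image of the whole disk, partition table included, for the result to boot:

VBoxManage convertfromraw server-disk.img server-disk.vdi --format VDI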

The server runs several websites, a MySQL server for the websites and other things, an SSH server for remote access, NFS, phpMyAdmin, DNS, and more. They are accessed either remotely or from a LAN client. Testing those functions required connecting a client to the server, and VirtualBox made that a simple trick.

In the end my lab setup was two virtual machines, my cloned server and a client, on a virtual network. DHCP for the client was provided by the VirtualBox internal network, the server had a fixed IP on the same subnet, and the server provided DNS for the network.
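A sketch of that wiring with VBoxManage, with the VM and network names as placeholders:

VBoxManage modifyvm "server-clone" --nic1 intnet --intnet1 "labnet"
VBoxManage modifyvm "client" --nic1 intnet --intnet1 "labnet"
VBoxManage dhcpserver add --netname "labnet" --ip 192.168.56.2 --netmask 255.255.255.0 \
    --lowerip 192.168.56.100 --upperip 192.168.56.199 --enable

The server VM ignores the DHCP pool and keeps its fixed IP on the same subnet.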

I ran the 16.04 to 18.04 upgrade on the server numerous times, taking snapshots to roll back as I made tweaks to the process to confirm each feature worked. Once I had a final process I did the upgrade on the virtual machine three times to see if I could find anything I might have missed or some clarification to make to the document. Success x3 with no changes to the document!

Finally I ran the upgrade on the production hardware. It went exactly as per the document, which of course is a good thing. Uneventful, but slower than doing it on the virtual machine, which was expected; the virtual machine host is at least five years newer than the server hardware and has an SSD too.

I’ll continue running on 18.04 for a while and monitor logs for things I might have missed. Once I’m convinced everything is good then I’ll either use -d to get to 20.04 or wait until 20.04.1 is out and do it then.

Chasing my tail and finding something new to learn

Experience and keeping notes help limit chasing your tail.

In my last post, Help people get the job done, I wrote about disappointment with how a change was made in the end user’s environment at my office. The change required they do something different to accommodate a purely technical change in systems. Once connected their work was no different than it had been.

Why we didn’t build in the logic to connect them to the new resource and make it transparent for the user seemed to me like a failure on our part. Simplify the user experience so people can focus on the work they do; IT should use our skills to make the computers work for people rather than the other way around.

I made some changes to personal websites to demonstrate that redirection could be used to point at the correct work websites. It was meant to illustrate the analogous idea that one work website could be pointed at the other. Going to my websites, train.boba.org and sclc.boba.org, immediately sent a browser to the intended work website. Success!

After demonstrating the capability I disabled it so my URLs go to their originally intended websites.
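For the curious, the redirect itself was nothing exotic. A sketch of the kind of vhost change involved, with the work site’s URL as a placeholder:

<VirtualHost *:80>
    ServerName train.boba.org
    # send every request on to the other site; "temp" keeps browsers from caching it permanently
    Redirect temp / https://work-site.example.com/
</VirtualHost>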

So where’s chasing my tail come in?

While experimenting with the redirect I modified the boba.org configuration. For a while it wasn’t possible to get to that site at all. Then, depending on the URL, I got to it or to andrewboba.com. Putting boba.org in the browser’s address bar ended up at andrewboba.com, but not correctly displayed. Putting http://boba.org went to the correct site but didn’t rewrite the link as secure, https://.

To stop being distracted by that issue and continue testing the redirect I disabled the boba.org website.

Worked more with the redirect over a few days. Got to the point I felt I understood it well and tried boba.org again.

It wouldn’t come up no matter what I tried. Everything went to a proper display of andrewboba.com.

I increased the logging level. I created a log specifically for boba.org (it didn’t show up, which was my first clue). Not seeing the log, I went through other site configurations to see how their custom logs were set up. They appeared to be the same.

Finally I decided to try boba.org without a secure connection. I wasn’t sure of the name of the .conf file for secure connections and decided to look in Apache’s ../sites-enabled directory to see if there were separate .conf files for https connections.

And guess what I found? There are separate .confs for https, yes. But there were no .confs of any kind for boba.org! Then it hit me. There had been no log files for boba.org because there were no ../sites-enabled .conf files for boba.org.

And then I finally remembered I had disabled the site myself to focus on the redirect. Chasing my tail because I’m very new at Apache web server administration. I disabled a feature to focus on making something happen, then forgot the change I made once I resolved the first challenge.

Better notes, and more experience, would have helped me remember sooner.

And I also found something new to learn. While boba.org was disabled, andrewboba.com was being displayed. I’d prefer “not found” or something similar to show up rather than a different website on the server.

New challenge: figure out how to serve a desired site/page-not-available message when a site on this server is down.

One of the reasons I like information technology: always something new to learn at every turn.

Certbot headaches!

Modifying certificates with certbot. It works and it was a long journey to get it done.

If anyone’s reading this you may have noticed the URL is wp.boba.org. Possibly you entered www.wp.boba.org to get here and saw it change to wp.boba.org. Whatever. Until this evening (15 Jan 2020) the URL’s protocol would have been http://; https:// wouldn’t even have connected. Now, even if http:// is entered it changes to https://. Hooray!!

The SSL certificate for this website is now part of the alanboba.net certificate. But, until tonight, I was unable to expand the domains in the alanboba.net certificate to include wp.boba.org and www.wp.boba.org.

My attempts to expand the alanboba.net certificate began nearly a month ago. Everything I tried failed. In desperation I posted on the Let’s Encrypt community forum a little over three weeks ago, Apache certificate modification not successful, hoping someone would quickly recognize the problem and suggest a solution.

That didn’t pan out. There weren’t a lot of respondents. The fix suggested didn’t address the issue the error message presented, that the request was “unauthorized”, or say whether the message might be misleading.

Domain: wp.boba.org
Type: unauthorized
Detail: Invalid response from http://wp.boba.org/.well-known/acme-challenge/QVV-1Skk-Xvrr6QAL-IvvDZuMGnhr2mNOfoAWbkYCnw [67.86.147.116]: "\n\n404 Not Found\n\n

More reading. More checking settings on this server. Some experimental configuration changes to see if the issue would resolve and the command certbot --expand … would succeed in adding two additional domains to the existing certificate. None of the changes worked.

Finally I came across a different command and decided to try it. As I understood it, it is meant to renew existing certificates, not add domains to them. However it does include a “webroot” parameter, and some of the documents I’d read suggested the webroot location might not be correctly interpreted by the command I had been using.

The documentation I found doesn’t say anything to suggest the command can be used to expand the domain names covered by a certificate. I just had an inspiration and decided that if webroot might be the problem, then explicitly specifying the webroot and adding domain names at the same time might turn the trick.

Tonight I tried the command with the webroot parameter and my additional domains appended to the list of domains already on the certificate. Surprise and delight! The domains were added to the certificate AND the protocol is now changed to https:// even if http:// is used in the URL!

The following command…

sudo certbot run -a webroot -i apache -w /var/www/wp.boba.org/public_html -d alanboba.net,boba.org,sclc.boba.org,train.boba.org,training.boba.org,www.alanboba.net,www.boba.org,wp.boba.org,www.wp.boba.org

Produced the output below. Plus it added my two additional domains to the existing certificate and modified Apache’s config for the website so http:// requests are rewritten as https://. Like I said at the beginning, hooray!!

Saving debug log to /var/log/letsencrypt/letsencrypt.log
Plugins selected: Authenticator webroot, Installer apache
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
You have an existing certificate that contains a portion of the domains you requested (ref: /etc/letsencrypt/renewal/alanboba.net.conf)
It contains these names: alanboba.net, boba.org, sclc.boba.org, train.boba.org, training.boba.org, www.alanboba.net, www.boba.org
You requested these names for the new certificate: alanboba.net, boba.org, sclc.boba.org, train.boba.org, training.boba.org, www.alanboba.net, www.boba.org, wp.boba.org, www.wp.boba.org.
Do you want to expand and replace this existing certificate with the new certificate?
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(E)xpand/(C)ancel: E
Renewing an existing certificate
Performing the following challenges:
http-01 challenge for wp.boba.org
http-01 challenge for www.wp.boba.org
Using the webroot path /var/www/wp.boba.org/public_html for all unmatched domains.
Waiting for verification...
Cleaning up challenges
Deploying Certificate to VirtualHost /etc/apache2/sites-enabled/boba.org-le-ssl.conf
Deploying Certificate to VirtualHost /etc/apache2/sites-enabled/boba.org-le-ssl.conf
Deploying Certificate to VirtualHost /etc/apache2/sites-enabled/sclc.boba.org-le-ssl.conf
Deploying Certificate to VirtualHost /etc/apache2/sites-enabled/train.boba.org-le-ssl.conf
Deploying Certificate to VirtualHost /etc/apache2/sites-enabled/train.boba.org-le-ssl.conf
Deploying Certificate to VirtualHost /etc/apache2/sites-enabled/boba.org-le-ssl.conf
Deploying Certificate to VirtualHost /etc/apache2/sites-enabled/boba.org-le-ssl.conf
Created an SSL vhost at /etc/apache2/sites-available/wp.boba.org-le-ssl.conf
Deploying Certificate to VirtualHost /etc/apache2/sites-available/wp.boba.org-le-ssl.conf
Enabling available site: /etc/apache2/sites-available/wp.boba.org-le-ssl.conf
Deploying Certificate to VirtualHost /etc/apache2/sites-available/wp.boba.org-le-ssl.conf
Please choose whether or not to redirect HTTP traffic to HTTPS, removing HTTP access.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1: No redirect - Make no further changes to the webserver configuration.
2: Redirect - Make all requests redirect to secure HTTPS access. Choose this for new sites, or if you're confident your site works on HTTPS.
You can undo this change by editing your web server's configuration.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Select the appropriate number [1-2] then [enter] (press 'c' to cancel): 2
Enhancement redirect was already set.
Enhancement redirect was already set.
Enhancement redirect was already set.
Enhancement redirect was already set.
Enhancement redirect was already set.
Enhancement redirect was already set.
Enhancement redirect was already set.
Redirecting vhost in /etc/apache2/sites-enabled/wp.boba.org.conf to ssl vhost in /etc/apache2/sites-available/wp.boba.org-le-ssl.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Your existing certificate has been successfully renewed, and the new certificate has been installed.
The new certificate covers the following domains: https://alanboba.net, https://boba.org, https://sclc.boba.org, https://train.boba.org, https://training.boba.org, https://www.alanboba.net, https://www.boba.org, https://wp.boba.org, and https://www.wp.boba.org

Certbot automatic authentication

Enable certificate auto renew after a manual renew.

I have a number of websites run from my own web server, like this one. It’s something I set up to experiment with web technologies and gain some insight into how things work.

One of the things I did was set up HTTPS for the websites once I found out about the Let’s Encrypt service and EFF’s Certbot client. I wanted to see if I could provide secure connections to my sites even if they’re only for browsing.

I was able to get HTTPS working for my sites and have the certificates renew automatically. Then I changed ISPs. With TWC, now Spectrum, there was never a problem with the automated renewals. With Optimum the renewals didn’t work.

Emails alerting me to certificate expiration were my first indication there was a problem.

The logs indicated that files on my server couldn’t be manipulated to confirm my control of the website. Plus, entering the website address as boba.org or http://boba.org no longer connected to the website (externally, that is; on the local network it still worked). Connecting to any of my hosted sites now required prefixing https:// to the name. Automatic translation from http to https no longer worked.

After talking, chatting online actually, with Optimum they told me yup, that’s just the way it works. “We block port 80 to protect you” and “you can’t unblock it”.

Panic! How to maintain my certificates so https keeps working? Fortunately certbot offers a manual option that requires updating DNS TXT records. It’s slow and cumbersome and NOT suitable for long-term maintenance of even one certificate containing one domain, but it works.

Sixty days pass and the certificate expiration emails start again. This time I determined I’d speak to a person at Optimum and not use the chat. After some time with my Optimum support tech, and after she escalated to a supervisor, I was told there is in fact a way to open port 80, and it’s a setting available to me via my account login. So I opened port 80 and thought: all set now, renewals will happen automatically.

Not so. I got more certificate expiration warning emails. What to do? All the automated renewal tests I tried indicated a problem with a plugin. I read the certbot documentation, searched for the error, and tried to find a solution that applied to the problem I had. I didn’t find it. But I did get a clue from a post that said once a manual certification has been done, that setting needs to be removed before automated renewal will work again.

After more digging I discovered the certificate config files in /etc/letsencrypt/renewal. In them were two variables that seemed likely to be related to the auto-renew problem: authenticator = and pref_challs =. The settings were manual and dns-01 respectively.

I never touched these files. It turns out doing a manual renewal with DNS TXT records using the command sudo certbot certonly --manual --preferred-challenges dns --cert-name <name> -d <name1>,<name2>,etc just changes the config files in the background. Attempting auto-renew later doesn’t work because the settings in the config files have now been changed to authenticator = manual and pref_challs = dns-01.

There was no help I could find that explicitly listed the acceptable values for these variables. And I didn’t have copies of these files from before the changes. After digging around in the help for a while I decided it was likely they should be authenticator = apache and pref_challs = http-01.
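So the fix amounted to editing each renewal file. A before/after sketch, with the file name as a placeholder:

# /etc/letsencrypt/renewal/example.com.conf
# left behind by the manual DNS renewals:
authenticator = manual
pref_challs = dns-01

# changed back for automated renewal through the Apache plugin:
authenticator = apache
pref_challs = http-01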

I made the change for one certificate and tested auto renew. Eureka, it worked!!

Next I changed the config files for all the certificates and did a test to see if it worked.

$ sudo certbot renew --dry-run
** DRY RUN: simulating 'certbot renew' close to cert expiry
** (The test certificates below have not been saved.)
Congratulations, all renewals succeeded. The following certs have been renewed:
/etc/letsencrypt/live/alanboba.net/fullchain.pem (success)
/etc/letsencrypt/live/andrewboba.org/fullchain.pem (success)
/etc/letsencrypt/live/danielboba.org/fullchain.pem (success)
/etc/letsencrypt/live/kevinkellypouredfoundations.com/fullchain.pem (success)
/etc/letsencrypt/live/www.anhnguyen.org/fullchain.pem (success)
/etc/letsencrypt/live/www.conorboba.org/fullchain.pem (success)
/etc/letsencrypt/live/www.mainguyen.org/fullchain.pem (success)
** DRY RUN: simulating 'certbot renew' close to cert expiry
** (The test certificates above have not been saved.)

It worked. All my certificates will again auto renew.

This website was created after the problems began, so I didn’t even attempt to make it https. Now that I’ve figured out how to have my certs auto-renew again, I’ll be converting this site over to https too.



Virtual Host??

Setting up Apache to support multiple websites on one host. My server already does that for my public websites.

However, I want to control what is returned to the browser if a site isn’t available for some reason. So I’ve set up a virtual server with multiple sites. Each site works when enabled. However, if a site is made unavailable (disabled, no index file, etc.) the default page returned to the browser is not what I’d like.

I need to identify a few fail conditions, see what the server returns when each condition exists, see if what’s returned for a given condition is the same regardless of which site generates the failure, then figure out why the web server is sending back the page it does.

Reasons not available:

  • site not being served, e.g. not enabled on server
  • site setting wrong, e.g. DocumentRoot invalid
  • site content wrong, no index file

Answers that might be returned:

  • site not available
  • forbidden
  • …others I’ve seen but don’t remember now

From what I’ve read it seems whatever’s in 000-default.conf should control which page/site loads when a site isn’t available. That’s not the result I’m getting.

Either I’m doing it wrong or I’m just not understanding what’s supposed to happen and how to make it happen.
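For reference, here is my current understanding sketched out, with paths and page names as placeholders; this is the theory I’m testing, not a confirmed fix. Because 000-default.conf sorts first, its vhost should become the catch-all for any request that matches no other ServerName:

<VirtualHost *:80>
    # first vhost loaded = default vhost for requests matching no ServerName
    DocumentRoot /var/www/unavailable
    ErrorDocument 404 /unavailable.html   # missing content gets the same page
</VirtualHost>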

More digging…