MySQL export to .csv

The elusive MySQL export to CSV *WITH* headers.

It took a while, and a review of a number of weblogs, support sites, and error message causes, but exporting from MySQL to .csv with the column titles is finally working!

SELECT convert('Title' using utf8mb4) 'Title',
	convert('Year' using utf8mb4) 'Year',
	convert('Time' using utf8mb4) 'Time',
	convert('Rating' using utf8mb4) 'Rating',
	convert('Genre' using utf8mb4) 'Genre'  
UNION ALL
SELECT * from 
	(SELECT * from movies_i_want order by Title) a
INTO OUTFILE '/var/lib/mysql-files/movies_i_want.csv' 
FIELDS TERMINATED BY ',' 
ENCLOSED BY '"';

and the file it produces…

"Title","Year","Time","Rating","Genre"
"Amazon Women on the Moon","1987","85","R","Comedy,Sci-Fi"
"Blade of the Immortal","2017","140","R","Action,Drama"
"Bound","1996","108","NR","Crime,Romance,Thriller"
 ...
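
One behavior to be aware of when re-running the export: SELECT ... INTO OUTFILE will not overwrite an existing file, so the previous movies_i_want.csv has to be removed or renamed before the statement is run again.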

The first SELECT above produces a single-row table with the column titles as the field values in that row. The table is not named.

mysql> SELECT 'Title', 'Year', 'Time', 'Rating', 'Genre';
+-------+------+------+--------+-------+
| Title | Year | Time | Rating | Genre |
+-------+------+------+--------+-------+
| Title | Year | Time | Rating | Genre |
+-------+------+------+--------+-------+

The full SELECT expression uses convert('Title' using utf8mb4) 'Title'… for each field because the UNION failed due to incompatible collations between the first SELECT’s row and the “SELECT ... movies_i_want” statement.

A collation is associated with exactly one character set, while a character set can be associated with many collations. For two collations to be compatible, their character sets must match. The purpose of the first SELECT is to emit the column titles for the .csv export; it produces only one row, the column titles. Converting that SELECT’s character set to match the character set used in the live data table allowed the UNION to reconcile the collations.
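
If you hit the same error, the character set and collation the live table or view actually uses can be looked up in information_schema so the literals can be converted to match; a minimal query, assuming the view is named movies_i_want as above:

SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_NAME = 'movies_i_want';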

Once the collations were compatible it was necessary to get each column’s title into the output as its first row. The view used by the second SELECT has the same column titles in the same order as the first SELECT. Its data needs to be the second and subsequent rows.

After many attempts, several ways to state the query were found. Unfortunately none of them included the column titles in the output. The column titles are a must-have for free wpDataTables to show headings on the tables it displays from reading the .csv files.

Ignoring the collation conversion discussed above to simplify the statements, the following both work except that the column titles are missing from the output.

SELECT 'Title', 'Year', 'Time', 'Rating', 'Genre'  
UNION ALL
SELECT * 
FROM movies_i_want order by Title;

SELECT 'Title', 'Year', 'Time', 'Rating', 'Genre'  
UNION ALL
SELECT 'Title', 'Year', 'Time', 'Rating', 'Genre' 
FROM movies_i_want order by Title;

To include the column titles in the output three SELECTs are needed. Wrapping the data SELECT in a derived table keeps its ORDER BY from being applied to the whole UNION, so the title row stays first. And be sure to include an alias for the ‘derived table’, the parenthesized SELECT; without one MySQL rejects the statement with “Every derived table must have its own alias”.

SELECT 'Title', 'Year', 'Time', 'Rating', 'Genre'  
UNION ALL
SELECT * FROM (SELECT * FROM movies_i_want order by Title) a;

Now the output can be created in the necessary format using native MySQL. The method is a bit dense but I think simpler than the PHP approach in Use PHP to export MySQL to .csv.

With both this and the PHP method of exporting the data, a trigger is needed to cause the export to happen; alternatively, either method can run on demand or on a schedule.

A trigger, in other words “only when needed”, is my preferred approach. MySQL triggers can fire on INSERT, UPDATE, and DELETE.
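
For reference, the basic shape of such a trigger is below; this is a minimal sketch only, assuming a base table named movies and a stored procedure export_movies() that would do the actual export (both names are hypothetical, and similar triggers would be needed for UPDATE and DELETE):

DELIMITER //
CREATE TRIGGER movies_after_insert
AFTER INSERT ON movies
FOR EACH ROW
BEGIN
	-- hypothetical procedure that would regenerate the .csv export
	CALL export_movies();
END//
DELIMITER ;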


Solutions:

A MySQL trigger runs a SQL user procedure to export the data.
Problem: export destinations are limited to a single location with very restrictive access, and that is not the location where either .csv file for the web tables lives.
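
That single allowed location is governed by the secure_file_priv server setting; the /var/lib/mysql-files directory used in the export above appears to be the usual default for MySQL packages on Ubuntu. The current value can be checked with:

SHOW VARIABLES LIKE 'secure_file_priv';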

A MySQL trigger runs an external program, the PHP, to export the data.
Problem: can a trigger launch an external program?

Use PHP to export MySQL to .csv

The wpDataTables tables update when the source .csv files change. Next, update the source .csv files when the movies database is updated.

To update the Movies tables, the underlying .csv source files must be updated and the wpDataTables tables’ links to those files must be refreshed. Without the link refresh the table doesn’t update even if the webpage is refreshed.

The .csv source files for the “Movies I Have” and “Movies I Want” tables are monitored for changes. The wpDataTables links are refreshed when the files have changed. The table display will update the next time the page is refreshed or landed on.

Another step in the chain is a database update (INSERT, UPDATE, or DELETE) triggering the .csv files to be updated. To make that happen, I need to be able to do several things:

  • Figure out how to create and trigger an event
    • Trigger the event after INSERT, UPDATE, or DELETE on the movies database
  • Write trigger(s) to a) export the new data and hopefully b) update the .csv files
  • Figure out HOW to update the .csv files
    • Stored-procedure (SQL) export, then use a non-SQL program to update the files with the export
    • PHP program that extracts the data and overwrites the .csv files
    • A program (PHP or a stored procedure) extracts the data to monitored files; another program updates the .csv files with the data when the monitored files change.

I’ve put together the PHP to extract the data. I also need to see what can be done with SELECT * INTO OUTFILE. PHP may be able to do the whole process, extracting the records and updating the .csv files. MySQL may be able to do it with INTO OUTFILE, but I think that’s very limited in the destinations it can write to.

<?php
// Export each movie view to a .csv file with the column names as the first row.
$server = "myserver.boba.org";
$username = "fan";
$password = "xx";             // placeholder, not the real password
$database_name = "the_movies";

$link_myMovies = mysqli_connect($server, $username, $password, $database_name)
	or die("Unable to connect to $server");

$Views = array("movies_i_want", "movies_i_have");
$out_path = "/var/tmp/";

foreach ($Views as $view)
{
	// Pull every row from the view.
	$str_query = "SELECT * FROM $view";
	$query_result = mysqli_query($link_myMovies, $str_query)
		or die("Query failed: " . mysqli_error($link_myMovies));
	$columns_total = mysqli_num_fields($query_result);

	// Collect the column names; they become the header row of the .csv file.
	$col_names = array();
	for ($i = 0; $i < $columns_total; $i++)
	{
		$Heading = mysqli_fetch_field_direct($query_result, $i);
		array_push($col_names, $Heading->name);
	}

	$fileOut = fopen("$out_path$view.csv", 'w')
		or die("Unable to open $out_path$view.csv");
	fputcsv($fileOut, $col_names);

	// Write the data rows after the header row.
	while ($row = mysqli_fetch_array($query_result, MYSQLI_NUM))
	{
		fputcsv($fileOut, $row);
	}

	fclose($fileOut) or die("Unable to close $out_path$view.csv");
}
?>

Server upgrade

…and I’m publishing again.

Well, this was a big publishing gap. Four months. Hope not to have such a long one again. Anyway, there are a number of drafts in the wings but I decided to publish about this most recent change because it is what I wanted to get done before publishing again.

The server is now at Ubuntu 20.04, 64-bit of course. It started out at 16.04 32-bit and was upgraded to 18.04 i686. Then I attempted the 20.04 upgrade and couldn’t, because I had forgotten the install was legacy 32-bit and 20.04 is only available in 64-bit. So it was on to other things while I planned a different upgrade approach. When I got back to it I thought I should upgrade to 22.04 since that had been released. As I was going through the upgrade requirements I discovered that several needed applications didn’t have 22.04 packages yet, particularly Certbot and MySQL. So back to 20.04 to complete the upgrade.

The MySQL upgrade wasn’t too bad. There was a failure, but it was a common one and a usable fix for the column-statistics issue was found quickly: the failure typically happens because the 8.0 mysqldump client requests column statistics that older servers don’t have, so disable column statistics during mysqldump (mysqldump -u root -p --all-databases --column-statistics=0 -r dump_file_name.sql).

Also, I switched to the MySQL Community Edition packages rather than the Ubuntu packages because of recommendations on the MySQL site that the Ubuntu package is not kept as up to date.

Fortunately I’m dealing with small databases with few transactions so mysqldump was my upgrade solution. Dump the databases from v 5.x 32-bit. Load them into v 8.x 64-bit. But wait, not all the user accounts are there!!

select * from INFORMATION_SCHEMA.SCHEMA_PRIVILEGES; showed only two grantees, 'mysql.sys'@'localhost' and 'mysql.session'@'localhost'. There should be about 20. The solution was simple: add upgrade = force to the MySQL server configuration file ([mysqld] section) and restart the server. After that, select * from INFORMATION_SCHEMA.SCHEMA_PRIVILEGES; shows all the expected accounts AND the logins work and the correct databases are accessible to the accounts.

All the other applications upgraded successfully: DNS, ddclient, Apache2, etc. It was an interesting exercise to complete, and it moved the server onto newer, smaller hardware and updated the OS to 64-bit Ubuntu 20.04.

I’ll monitor for 22.04 packages for Certbot and MySQL and once I see them, update the OS again to get it to 22.04. Always better to have more time before needing (being forced) to upgrade. 20.04 is already about halfway through its supported life. Better to be on 22.04 and have almost five years until needing to do the next upgrade.

Doing all this in a virtual environment is a great time saver and trouble spotter. Gotchas and conflicts can be resolved so the actual activation, virtual or physical, goes about as smoothly as could be hoped with so many dependencies and layers of architecture. Really engrossing stuff if you’re so inclined.

DHCP on the server was new. The router doing DHCP only allowed my internal DNS as a secondary. That seemed to cause issues reaching local hosts; sometimes a name would resolve to the public rather than the private IP. Switching to DHCP on the server lets it be specified as THE DNS authority on the network.

Watching the messages in syslog, the utility of having addressable names for all hosts seemed obvious. A next virtual project: update DNS from DHCP.

MySQL backup and restore

Dig in and do it, and repeat. Get the desired result faster by combining research and testing.

Maintenance is important. A car needs oil changes or eventually the engine will be damaged by regular operation. A server needs software updates to fix bugs and protect against threats. Even when those things are done failures can happen that create problems returning to normal operation.

For a car there needs to be a spare ready to go in case of a flat. If there’s not a spare ready for use it will take longer to get the car back in operation when a flat happens. For a computer, programs and data need to be backed up. If a disk drive crashes the information stored there may be lost or too expensive to recover, so just as good as lost.

This website has not been well protected for too long and I knew that needed to change. There’s a server operating system, a web server, WordPress software, and a MySQL database that all operate interdependently to make it work. As the amount of content slowly continues to grow my manual system to back everything up has become too cumbersome and is not done frequently enough to ensure minimal to no loss of data.

That needed to change.

Step one – automate the MySQL backups. Documentation states the “logical” backup method is slow and not recommended for large databases. The alternative “physical” backup entails stopping the database server and copying the files. The licensed MySQL Enterprise Backup performs physical backups and, from what I’m able to tell, runs clone databases so one can be stopped and its files backed up while the clone continues to run and is available for use.

This is a hobby operation and has limited resources so purchasing a license for Enterprise Backup is out of the question. Taking the whole thing offline to backup probably doesn’t bother anyone except me. Still, I did want to be able to continue to run the server while the databases are being backed up. Enter logical backup.

It didn’t take long to find the command, mysqldump. Confirming that it would back up everything, including user names and passwords, so all the accounts got restored along with all the data took longer.

Despite my best search-fu I was unable to find any documentation that explicitly says “do this” to back up user accounts in addition to system databases and other databases. Let me fill that gap by saying “do this to back up user accounts, system databases, and other databases”: mysqldump -u root -p -h server.yourdomain.org --all-databases -r backup_file.sql. I did find that command documented as the backup command, but nothing I could find said it backs up user accounts and system databases. I tested it. It does.
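
A note on the options: --all-databases dumps every database on the server, including the mysql system schema where the accounts live, and -r writes the dump to the named file instead of standard output.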

With the backup done, the next step is restore. And confirming the restore works as expected. Another case of things that otherwise seem obvious not being declared in the documentation.

Restore from the command line looks like this: mysql -u root -p database < backup_file.sql. But wait, I want to restore all databases. Search-fu failed again to find any explicit instruction on how to restore all databases and what database to name on the command line.

Try the command without naming a database to see if all are restored. No, that fails. Then a flash of insight. Create an empty database, name that on the command line, and try the restore again. It works!

$ mysql -u root -p
> create database scratch;
> exit
$ mysql -u root -p scratch < backup_file.sql

Did this a few times and then restored the tables. As far as I’ve been able to determine the restore is an exact replica of the backed up data.
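
For a quick spot check after a restore, one option is to compare the account list and a table checksum or two against the original server; a minimal sketch, where movies stands in for any real table (hypothetical name):

SELECT User, Host FROM mysql.user ORDER BY User, Host;
CHECKSUM TABLE the_movies.movies;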

It seems odd that important use cases, a complete backup of the database server and a complete restore of it, aren’t clearly documented. The information is there but important nuggets are left out. The only way to be sure you’ll get what you need is to experiment until you’re able to produce the results you need.

So yes, do the research but also just do the work and inspect the results. When research doesn’t clearly answer the questions, back it up with experimentation. Do both and get a result faster.