Site down

Philip Gladstone

Mar 17, 2022, 7:55:56 AM
to psk-re...@googlegroups.com
PSKReporter went down at around 10PM EST last night. I have no idea why -- well, it *appears* that the main database table that holds all the data has vanished. This is serious / catastrophic.

It may take a while to recover.

Philip 

Philip Gladstone

Mar 17, 2022, 8:22:00 AM
to psk-re...@googlegroups.com
It turns out that the disk filled up and the database (MariaDB) got confused during a table-modification DDL statement and has now lost (it seems) the .frm file.

If there are any MySQL experts out there who could assist me, I'd appreciate an off-list note. I haven't done anything yet because I don't want to make things worse.

Philip

Philip

Mar 17, 2022, 9:45:54 AM
to PSK Reporter
For the gory details:

* The main disk of the server ran out of free space at some point. The database (MariaDB 10) detected this and was, I think, reasonably happy, but it just wouldn't update any data.
* Early today (UTC), two events happened. First, a new partition was added to the 'report' table (this is the big table with one row per reported spot). This failed (no space), but judging by the timestamps it appears to have taken 6 hours to figure that out. Normally, this is essentially instantaneous.
* Then a second script fires up (5 minutes after the first one starts) that dumps all data in the 'report' table into a file and then sends that file to S3 for archiving. Those partitions are then dropped. The partition drop is normally quick. However, today the dump failed, and it certainly didn't drop any partitions.
* The database is not currently executing any queries.
* I have innodb_file_per_table set to ON.
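The archive job above is only safe if a partition is dropped strictly after its dump has been written and its upload confirmed. A minimal sketch of that ordering in Python, assuming hypothetical `dump`, `upload` and `drop` callables (the real scripts are not shown in the thread) and assuming `upload` returns a SHA-256 of what was actually stored:

```python
import hashlib
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_partition(
    partition: str,
    dump: Callable[[str], Path],    # hypothetical: writes the partition's rows to a file
    upload: Callable[[Path], str],  # hypothetical: stores the file, returns SHA-256 of the stored copy
    drop: Callable[[str], None],    # hypothetical: runs the DROP PARTITION statement
) -> bool:
    """Drop a partition only after its dump exists and its upload checksum matches."""
    try:
        dump_file = dump(partition)
    except Exception as exc:  # e.g. OSError "No space left on device"
        print(f"dump of {partition} failed ({exc}); keeping partition")
        return False
    if not dump_file.is_file() or dump_file.stat().st_size == 0:
        print(f"dump of {partition} is missing or empty; keeping partition")
        return False
    local = sha256_of(dump_file)
    try:
        stored = upload(dump_file)
    except Exception as exc:
        print(f"upload of {partition} failed ({exc}); keeping partition")
        return False
    if stored != local:
        print(f"checksum mismatch for {partition}; keeping partition")
        return False
    drop(partition)  # the only destructive step, reached last
    return True
```

Any failure leaves the partition in place, so a failed run costs disk space but never data.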

My guess is that the recovery from the failed 'add partition' DDL did not go well, and that there is a window in which running out of disk space is fatal.
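One way to keep jobs out of that window is a preflight check that refuses to start DDL or dump jobs unless the data directory has headroom. A sketch using Python's `shutil.disk_usage`; the `/var/lib/mysql` path (a common default datadir on Linux), the 50 GiB requirement and the 10% margin are assumptions, not values from the thread:

```python
import shutil


def has_headroom(total: int, free: int, required: int, min_free_fraction: float = 0.10) -> bool:
    """True if, after using `required` bytes, at least `min_free_fraction` of the disk stays free."""
    return free - required >= total * min_free_fraction


def preflight(datadir: str = "/var/lib/mysql", required: int = 50 * 1024**3) -> bool:
    """Check the filesystem holding `datadir` before starting DDL or a dump job.

    The default path and the 50 GiB requirement are placeholders.
    """
    usage = shutil.disk_usage(datadir)
    ok = has_headroom(usage.total, usage.free, required)
    if not ok:
        print(f"refusing to run: {usage.free / 1024**3:.1f} GiB free on {datadir}")
    return ok
```

A cron wrapper could call `preflight()` and exit non-zero before the ALTER TABLE or dump script ever starts.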

I think that I now know what I should do:

0) move the partition .ibd files somewhere safe.
1) restart the db server
2) Recreate the 'report' table schema using a reasonably recent definition that I have.
3) add/remove partitions from this new empty table until the partitions match the files that I have saved
4) shut down the database
5) move the .ibd files back (replacing the new empty files)
6) start the database again.
7) see if the table has re-appeared
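Step 3 amounts to a set comparison between the partition files that were saved and the partitions defined in the recreated schema. A sketch, assuming the saved files follow the `<table>#P#<partition>.ibd` naming MariaDB uses for file-per-table partitions (MySQL 8.0 uses a lowercase `#p#`), and that the schema's partition names are fetched separately, for example from `INFORMATION_SCHEMA.PARTITIONS`:

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List, Set

# Assumed naming: <table>#P#<partition>.ibd; MySQL 8.0 writes "#p#".
# Subpartition files (<table>#P#<p>#SP#<sp>.ibd) deliberately do not match.
PARTITION_FILE = re.compile(r"^(?P<table>.+?)#[Pp]#(?P<part>[^#]+)\.ibd$")


def partitions_from_files(names: Iterable[str], table: str) -> Set[str]:
    """Extract the partition names of `table` from .ibd file names."""
    parts = set()
    for name in names:
        m = PARTITION_FILE.match(name)
        if m and m.group("table") == table:
            parts.add(m.group("part"))
    return parts


def scan_saved(directory: str, table: str) -> Set[str]:
    """Partition names for `table` among the .ibd files moved aside in step 0."""
    return partitions_from_files((p.name for p in Path(directory).glob("*.ibd")), table)


def partition_diff(saved: Set[str], schema: Set[str]) -> Dict[str, List[str]]:
    """Partitions to add to / drop from the new empty table so it matches the saved files."""
    return {"add": sorted(saved - schema), "drop": sorted(schema - saved)}
```

This only covers the bookkeeping in step 3; it says nothing about whether InnoDB will accept the swapped-in files in steps 5-7.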

However, I really don't want to make this worse by doing the wrong things (or even the right things in the wrong order).

I would really appreciate a MySQL expert helping me validate this approach. Please contact me directly.

Philip

Alexander Horner

Mar 17, 2022, 12:39:47 PM
to PSK Reporter
Hi Philip,

Spotted this from another group.

My first port of call here would be to take a copy of the entire data directory in its current state, before even shutting down the MariaDB service. I would then shut it down and take another copy. This protects against any data loss caused by the shutdown itself.

After this shutdown, attempt to start MariaDB again and observe log output.

I am no expert, but I use MySQL in production and have played with MariaDB in Docker containers, and it is pretty well contained in its own directory. Taking those initial backups is super important so that your options stay open for retries.
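Taking those copies, and being able to show later that a copy is intact, can be scripted. A minimal sketch in Python; the datadir and backup paths are passed in and are hypothetical, and since the main disk is full, `backup_root` would have to live on a different filesystem. It writes a SHA-256 manifest beside each copy in the two-space layout `sha256sum` uses. A copy taken while the server is running may still be internally inconsistent, which is why a second copy after shutdown matters.

```python
import hashlib
import shutil
import time
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(datadir: str, backup_root: str, label: str) -> Path:
    """Copy `datadir` to <backup_root>/<label>-<timestamp> and write a .sha256 manifest beside it."""
    stamp = time.strftime("%Y%m%dT%H%M%S")
    dest = Path(backup_root) / f"{label}-{stamp}"
    shutil.copytree(datadir, dest, symlinks=True)  # raises if dest already exists
    lines = [
        f"{file_digest(f)}  {f.relative_to(dest).as_posix()}"
        for f in sorted(dest.rglob("*"))
        if f.is_file() and not f.is_symlink()
    ]
    manifest = Path(backup_root) / f"{label}-{stamp}.sha256"
    manifest.write_text("".join(line + "\n" for line in lines))
    return dest


def verify(copy: Path, manifest: Path) -> List[str]:
    """Return relative paths that are missing from `copy` or whose hash has changed."""
    bad = []
    for line in manifest.read_text().splitlines():
        digest, rel = line.split("  ", 1)
        target = copy / rel
        if not target.is_file() or file_digest(target) != digest:
            bad.append(rel)
    return bad
```

Usage would be one `snapshot(datadir, backup_root, "before-shutdown")` call while MariaDB is still up and a second with `"after-shutdown"` once it has stopped.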

Happy to help in any other way I can, do let me know if I can be of use!

Tristan Greaves

Mar 17, 2022, 12:39:48 PM
to PSK Reporter
Hi Philip,

My honest opinion here is that you are best off restoring the database from a backup at this point. 

If you are using AWS (you mention S3...), then I'd also recommend moving the database to RDS, as it will help reduce administrative work such as this.

Good luck!

Tristan.

Dan Negreanu VE3DUN

Mar 20, 2022, 10:34:44 AM
to PSK Reporter
Good morning! Is the site down again? It behaves like it did 3 days ago... I hope I'm wrong.

Philip Gladstone

Mar 20, 2022, 11:21:06 AM
to psk-re...@googlegroups.com
No -- it isn't down. If you look at the bottom of the map page, it says that there is a processing delay of (ATM) 125 minutes. I suspect that the load is higher than normal today, and there were a couple of IPs that were really pummeling the website, so I blocked those. It looks to me as though the people who hurt the site the most are those running scripts to get data -- and they don't bother to check that their script is working (or they have forgotten that they are running the script in the first place).

I have some protection in place against this type of usage, but clearly it isn't sufficient. I need to figure out an approach that does this blocking automatically so that I don't have to deal with it on a beautiful Sunday morning....
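The automatic blocking described above could start as a per-IP sliding-window counter. A sketch in Python; the thresholds are hypothetical, and the state is in memory for a single process, so a real deployment would need shared state or would push the block list to a firewall or proxy instead:

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class AutoBlocker:
    """Block a client IP for `block_seconds` once it exceeds `max_requests` in `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float, block_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.block_seconds = block_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)  # ip -> recent request times
        self._blocked_until: Dict[str, float] = {}

    def allow(self, ip: str) -> bool:
        now = self._clock()
        until = self._blocked_until.get(ip)
        if until is not None:
            if now < until:
                return False
            del self._blocked_until[ip]  # block has expired
        hits = self._hits[ip]
        while hits and hits[0] <= now - self.window:  # forget requests outside the window
            hits.popleft()
        hits.append(now)
        if len(hits) > self.max_requests:
            self._blocked_until[ip] = now + self.block_seconds
            hits.clear()
            return False
        return True

    def blocked(self) -> Dict[str, float]:
        """Currently blocked IPs and when each block ends (e.g. to feed a firewall rule)."""
        now = self._clock()
        return {ip: t for ip, t in self._blocked_until.items() if t > now}
```

Blocks expire on their own, which suits forgotten scripts: the owner may never notice, but the load stops.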

Philip

Dan Negreanu VE3DUN

Mar 20, 2022, 11:28:18 AM
to PSK Reporter
Thanks Philip. Enjoy the day! 

Michael Barnes

Mar 20, 2022, 5:01:56 PM
to PSK Reporter
The site may not be down, but it isn't doing anything useful, either. It says "Active Monitors:<none>" and is showing no reception reports for the last 12 hours. The Reception Records counter is climbing by several thousand per minute. Processing delay is 138 minutes.

Todd Little

Mar 20, 2022, 8:29:04 PM
to PSK Reporter
Philip, what tech stack are you using? If you have something like Nginx as a load balancer, it can be configured to rate-limit requests.
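Nginx's `limit_req_zone` / `limit_req` directives document their limiting as a "leaky bucket" with a `rate` and an optional `burst`. A simplified Python model of that idea, for intuition only: it accepts or rejects (roughly the `nodelay` behaviour) rather than delaying requests, and the rate and burst values are illustrative:

```python
import time
from typing import Callable, Dict, Tuple


class LeakyBucket:
    """Per-key leaky bucket: excess drains at `rate` per second; more than `burst` excess is rejected."""

    def __init__(self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.burst = burst
        self._clock = clock
        self._state: Dict[str, Tuple[float, float]] = {}  # key -> (excess, last accepted time)

    def allow(self, key: str) -> bool:
        now = self._clock()
        if key not in self._state:
            self._state[key] = (0.0, now)
            return True
        excess, last = self._state[key]
        # Drain what leaked out since the last accepted request, then add this one.
        excess = max(0.0, excess - self.rate * (now - last) + 1.0)
        if excess > self.burst:
            return False  # rejected; state unchanged (limit_req answers 503 by default)
        self._state[key] = (excess, now)
        return True
```

Keying on the client IP gives each scripted client its own bucket, so one runaway script cannot starve the map page for everyone else.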

Todd, N9MWB
