Update: Improved chat server restarts

I still don’t get it. You’re saying you don’t want to reconnect all clients immediately on a restart, to avoid a thundering herd, but there will be an IRC event that instead makes the clients open another connection to rejoin the room? Wouldn’t that create the same thundering herd, only triggered by an IRC event instead of by the disconnection?
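For context, a common way to avoid a thundering herd is to have each client wait a random delay before reconnecting, so the load is spread over a window instead of arriving all at once. Below is a minimal sketch of that idea. The function name, the 30-second window, and the client count are assumptions for illustration only, not something the chat servers are known to do:

```python
import random
from collections import Counter
from typing import Optional


def reconnect_delay(window_seconds: float,
                    rng: Optional[random.Random] = None) -> float:
    """Pick a uniformly random delay in [0, window_seconds] (hypothetical helper)."""
    source = rng if rng is not None else random
    return source.uniform(0.0, window_seconds)


# Simulate 10,000 clients that all receive the same "reconnect" signal at t=0.
rng = random.Random(42)  # seeded only so the simulation is repeatable
delays = [reconnect_delay(30.0, rng) for _ in range(10_000)]

# Count how many clients reconnect during each one-second slot.
per_second = Counter(int(d) for d in delays)
peak = max(per_second.values())
print(f"peak reconnects in any one second: {peak} of {len(delays)}")
```

Without the random delay, all 10,000 clients would hit the server in the same instant, whether the trigger is a disconnect or an IRC event; with it, each one-second slot sees only a fraction of them.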

Also, 30 seconds to rejoin rooms? Since you can join at most 5 rooms per second (in the best case, and not counting the authentication rate limit), that’s a maximum of 150 rooms in that window, which is basically nothing. It currently takes me more than 3 hours to rejoin rooms on a full restart, and that’s not counting the time it takes to get moderator status, which often arrives more than 10 minutes after a join. So we’re talking about a significant amount of downtime here.
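The arithmetic behind those numbers, as a quick back-of-the-envelope sketch. It assumes the best-case rate of 5 joins per second quoted above and ignores the authentication rate limit; the helper names are made up for illustration, and the 54,000-room figure is just the room count that would correspond to exactly 3 hours at that rate, not a measured value:

```python
JOINS_PER_SECOND = 5  # best-case join rate quoted above; auth limit ignored


def max_rooms(window_seconds: float, rate: float = JOINS_PER_SECOND) -> int:
    """Upper bound on rooms that can be joined within the window."""
    return int(window_seconds * rate)


def seconds_to_join(rooms: int, rate: float = JOINS_PER_SECOND) -> float:
    """Lower bound on the time needed to join the given number of rooms."""
    return rooms / rate


print(max_rooms(30))                    # 150
print(seconds_to_join(54_000) / 3600)   # 3.0 (hours)
```

In other words, a 30-second window only covers 150 rooms at the best-case rate, while a rejoin that takes 3 hours corresponds to tens of thousands of rooms.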

EDIT: If you’re making lots of changes to the edge servers, wouldn’t it be better to have a dedicated edge server for this (I assume they run independently of each other), or to use hot swapping? Don’t you even have dark launching for chat?