
UESPWiki:Administrator Noticeboard/Archives/Server Upgrade

This is an archive of past UESPWiki:Administrator Noticeboard discussions. Do not edit the contents of this page, except for maintenance such as updating links.

Server Upgrade

I've been thinking of upgrading the server for a bit now and it may happen in the near future...either within the next week or at the end of September (trying to time it so I'm around for a while after it happens).

So what does this mean for the regulars and the rest of the site's users? The majority of the upgrade will be transparent to anyone using the server as it mainly involves setting up the new server and transferring everything from the old to the new. The main 'hiccup' will occur when actually switching from the old to the new server. It will involve locking the old server from edits shortly before the change (1 hour or so), updating the site's DNS entry, and then making the old site redirect to the new site. If everything goes well the new site can be unlocked after a few hours.

There may be issues during the switch-over which may cause the site to be unavailable in part or in whole or prevent it from being unlocked for a day. Basically this comes down to how well the site handles redirecting to the new site when the domain still points to the old site. Unfortunately there is not a whole lot I can do to test this beforehand.

Thoughts and comments welcome as usual... -- Daveh 18:00, 13 August 2007 (EDT)

Yay! --Wrye 21:13, 13 August 2007 (EDT)
Quick Update -- I've placed the order for the server and am just waiting for the payment to go through and the server to be set up. The current and new server are compared below:
Old Server: Celeron 2.4 GHz, 2x 80GB IDE, 1000GB Monthly Bandwidth, 115.54 $CAD/month
New Server: Intel P4 2.4 GHz, 1500GB Monthly Bandwidth, 115.54 $CAD/month

So a small increase in CPU, twice as much disk space, and more bandwidth (we'll probably break 1000GB this month). The big difference is the x4 increase in RAM. This will let me do a bunch of stuff, like install MemCache, which should noticeably increase the site's performance and allow it to serve more content. With only 512MB on the current server there isn't really too much I could do. Of course the biggest thing might be that we may be getting all this for the same price as the old server (still waiting to confirm exact price).
Once I actually get the server I'll post a small blurb on the front page but it will likely be a few more days past that when the actual switch occurs. -- Daveh 17:40, 16 August 2007 (EDT)
Update -- The server was released to me this afternoon and I confirmed that it's the same price as the previous server (which is a great deal). The server is 'naked' so it will take a few days to get set up properly before I make the actual switch. -- Daveh 17:32, 17 August 2007 (EDT)
Good grief! You mean this whole site runs off half a gig of RAM at the moment? That should definitely make a huge impact. Thanks, Daveh. --RpehTalk 00:52, 17 August 2007 (EDT)
Yah, it did very well despite being about the lowest-end dedicated server you can get. -- Daveh 17:32, 17 August 2007 (EDT)
Yay, we're finally getting away from the Celeron processor that had horrible performance even when it was introduced! Now we've just got a P4 that has horrible performance next to today's new processors! --Ratwar 19:57, 17 August 2007 (EDT)
Unfortunately to get a similar server with a better processor the prices start at around 250$ a month (since the new server was a special). I'm hoping that with the extra RAM and installing MemCached we won't have to worry about the server being CPU limited. -- Daveh 20:44, 17 August 2007 (EDT)
Update -- As mentioned on the front page the switch-over to the new server will happen Tuesday morning, August 21st. Everything has been set up on the new server and it is currently replicating the content on the main site. Hopefully there will be no issues tomorrow morning. -- Daveh 21:19, 20 August 2007 (EDT)

Upgrade Done

The switch-over this morning went pretty well and the Wiki/forums were locked for a little over an hour. It would have been much shorter but the database replication had some errors so I did a complete DB backup/restore from the old to the new server just to make sure everything was synced correctly.

Note that the old server is temporarily redirecting to the new server. Once the DNS changes apply in a day or two then everything will go back to using as usual.
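Because DNS caching means each visitor switches over at a different time, it can help to check which server a given machine currently resolves the site to. The following is only a minimal sketch: the function name, the hostname and the IP address are all hypothetical placeholders, and the injectable `resolve` argument is there so the check can be exercised without network access.

```python
import socket


def points_to_new_server(hostname, new_ip, resolve=socket.gethostbyname):
    """Return True if this machine currently resolves hostname to new_ip.

    hostname and new_ip are placeholders supplied by the caller; nothing
    here is specific to the actual site or its addresses.
    """
    try:
        return resolve(hostname) == new_ip
    except OSError:
        # Resolution failed entirely (socket.gaierror is an OSError);
        # treat that as "not switched yet".
        return False
```

Run on different machines, a check like this would be expected to flip from False to True at different times, depending on how long each resolver caches the old entry.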

Let me know of any issues that pop up. There are a few things left to do that I know of:

  • Oblivion map is not working. I think the problem will resolve itself once the DNS change propagates. I think if I try to fix it now it will just break again in a day or two.
  • E-mail server not tested (doesn't affect the Wiki).
  • FTP server not installed.
  • Server settings not tweaked (still using the same settings as the old server for now).
  • Make sure everything is copied from the old server before letting it go (a few misc/backup files are left to copy).
  • Set up automated backups and live-replication on the new server.

As I mentioned on the main page news, expect the server to be restarted now and then (downtime of 1 minute or less) and the occasional slow-down over the next few days. -- Daveh 09:58, 21 August 2007 (EDT)

I confirmed that the Oblivion map works fine once points to the new server (due to caching this will happen at different times for everyone). -- Daveh 12:41, 21 August 2007 (EDT)
Thanks Daveh! Of course now it's time to start listing all the little problems that come up :) So far, just one. Trying to add a new spam site to UESPWiki:Spam Blacklist returned a database error:
A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was:
   (SQL query hidden)
from within function "SearchMySQL4::update". MySQL returned error "1034: Incorrect key file for table: 'searchindex'. Try to repair it (localhost)".
It looks like the change was made to the page, but I'm guessing that perhaps internally the blacklist regexp wasn't changed. --NepheleTalk 13:06, 21 August 2007 (EDT)
I got this error too on an edit I made around the same time as yours. I repaired the searchindex table and subsequent edits seemed to save fine. You might try doing another edit/save to make sure. -- Daveh 13:46, 21 August 2007 (EDT)
The server seems to be experiencing occasional hiccups where it decides to just reject all of your connections. This happened to me once yesterday then again just now, and based on IRC I may not be the only one who's noticed it.
What just happened to me is that I made an edit to a page, posted it, and then immediately went to do a few other things (mark some edits as patrolled, refresh recent changes, etc.) in other tabs. All of the requests failed. A few timed out after 2-3 minutes; the others were returned with 503 status (service temporarily unavailable). Attempts to refresh those pages or other pages all failed, for a duration of about 5 minutes.
While my other connections were all timing out/failing, I was able to get a server status, which showed that the server definitely was not overloaded (10 active requests, 13 idle servers). The odd thing is that one of those ten active requests was the POST request from the page that I had edited. It was still showing up as an active connection (status W) in the server status, even after the request had already timed out at my end. Furthermore, the edit request had been accepted by the wiki [1] at 20:22, but at 20:27 the POST request was still active in the server logs.
So based on this one case it seems as if occasionally POST requests are just hanging for some unknown reason. And while that request is in limbo, you're prevented from doing anything else until the server releases the request. I'll keep paying attention to see whether the problem continues to occur and, if so, whether the pattern seems to be the same. But I figured I'd post this now to document it (instead of relying on being able to decipher my scribbled scraps of paper later!). --NepheleTalk 20:53, 22 August 2007 (EDT)
The 503 error is mod_limitipconn limiting you to 3 concurrent connections at a time. I haven't seen any problems with POSTing myself but will poke around and see if I can find anything. -- Daveh 21:03, 22 August 2007 (EDT)
Well, it just happened to me again :| It's definitely not every time I post... it's only happened 4 times since you switched to the new server, and I've made a lot more than just 4 edits. But it does seem like it's somehow the post that is triggering it. When I can finally get a server status, I can see the 3 connections from my IP to the server, and the oldest of the three is the post.
The strange things seem to be that (a) it's keeping a post connection active after it's obviously fully processed the info (it's updated the wiki page already) and (b) it's keeping multiple other connections active even after they've timed out on my browser's side. So I'm getting 503 messages even when from my side it looks like I only have one or two active requests (unlike most of the time when I see 503's, when it's quite obvious that it's because I'm requesting way too many pages at the same time). But the 503's look like they're really just a symptom/side effect of the main problem, namely the hung connections. I'll keep looking for patterns.... --NepheleTalk 22:29, 22 August 2007 (EDT)
A quick note that I'm on the road again for the next week or so and will only have intermittent Internet access. I will try and keep an eye on the site as I can. -- Daveh 19:19, 24 August 2007 (EDT)

Email and Editing Problems (aka 5 minute bug)

Mail services (i.e., sendmail or equivalent) on the new server are evidently not set up properly. Although it was mentioned above that mail would not affect the wiki, it in fact seems to be triggering multiple problems on both the wiki and the forums.

The reason why mail services affect the rest of the site is that both the wiki and the forums allow email notifications for a range of events. Right now, none of these email notifications are getting delivered. Even more problematic is that any action that triggers a notification is getting locked up at the step where email should be sent, and freezing the associated http request. On the user side, the connection times out after a couple minutes and returns an error message. On the server side, it becomes a frozen connection until apache gives up on it and forces a "graceful termination" after 5 minutes. In the meantime, other requests from the same IP also get frozen.
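To make the interaction between a frozen request and the per-IP limit concrete, here is a toy model. It is a sketch under stated assumptions, not mod_limitipconn's actual code: the class and function names are hypothetical, the cap of 3 is the figure Daveh gave earlier, and it assumes that follow-up requests from the same IP also hang (as described above) until the cap is reached.

```python
from collections import defaultdict

MAX_CONN_PER_IP = 3  # the per-IP limit mentioned earlier on this page


class PerIpLimiter:
    """Toy per-IP concurrent-connection cap (illustrative only)."""

    def __init__(self, limit=MAX_CONN_PER_IP):
        self.limit = limit
        self.active = defaultdict(int)

    def open(self, ip):
        """Start a request: 200 if a slot is free, 503 if the IP is at its cap."""
        if self.active[ip] >= self.limit:
            return 503
        self.active[ip] += 1
        return 200

    def close(self, ip):
        """Release a slot once the server finishes or gives up on a request."""
        if self.active[ip] > 0:
            self.active[ip] -= 1


def simulate_hung_post(limiter, ip, follow_up_requests):
    """Model a POST stuck at the mail step plus later requests from the same IP.

    None of the accepted requests are closed, because in this model the
    server is still holding them; rejected ones never took a slot.
    """
    statuses = [limiter.open(ip)]  # the edit POST, frozen while sending mail
    for _ in range(follow_up_requests):
        statuses.append(limiter.open(ip))
    return statuses
```

With one hung POST and four follow-ups, the model accepts the first three connections and rejects the rest with 503, which matches the pattern of three active connections per IP seen in the server status. Once the server releases the slots (the 5 minute termination), new requests are accepted again.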

The specific symptoms of this problem include:

  • On the wiki, no email notifications are being delivered (for talk page changes or for changed pages in watchlists)
  • On the wiki, the "email this user" function cannot be used (using it triggers the 5 minute bug, see next)
  • Editing the wiki has a very annoying and seemingly unpredictable bug where posting an edit completely locks you out of UESP for 5 minutes -- what I've nicknamed the 5 minute bug. This is the same problem I reported on this page a few days ago, and has since been noticed repeatedly by multiple editors. It's unpredictable, because you don't know ahead of time whether any other editor has requested an email notification for modifications to that page (and it's impossible to know whether the editor has visited the page since the last previous edit, which is also necessary for an email to get sent).
  • On the forums, no email notifications are being delivered (for private messages or for new posts to watched threads)
  • On the forums, posting a message or post with an email notification triggers the 5 minute bug.

I'm very confident at this point that all of these problems are related and tied to the mail service. It's quite clear that no emails are being delivered (thanks, Bear, for pointing this out!). And it's also easy to confirm that the "email this user" function is broken. As for the connection to the 5 minute bug, I've done tests today that confirm that when an edit should cause a notification to be emailed, the edit locks up; the same edit without an email notification has no problems. (For those curious about all my posts this afternoon, I was changing NepheleBot's notification status for various pages. Turning it on and off in various combinations allowed me to test what would or would not hang, and I was able to reproducibly trigger the bug.) Furthermore, this explains why the 5 minute bug first appeared after moving to the new server. Finally, checking the wiki code confirms that after an edit is added to the database, but before it is auto-marked as patrolled, the wiki takes care of email notifications. And the 5 minute bug is clearly happening in the interval in between adding an edit to the database and patrolling the edit.

Hopefully now that the source of all these problems has been identified, it won't be too difficult to fix them! :) --NepheleTalk 18:14, 29 August 2007 (EDT)

Ahhh, nice work tracking this down. I just got back from my latest trip and should have time to look into this in the next few days. -- Daveh 11:08, 30 August 2007 (EDT)
Do you think this should be mentioned on the main page, at Latest News/Server Upgrade? --GuildKnight (Talk) contribs 02:02, 31 August 2007 (EDT)
Although very annoying, it is still a problem that only affects editors, not wiki readers. In general, main page news updates have only been used for items that will be of interest to readers as well as editors. --NepheleTalk 02:27, 31 August 2007 (EDT)
Ah, OK. I was just thinking that because, being only semi-active, I use only emails to stay up-to-date on the changes of my watched pages. So I don't check my watchlist or the recent changes pages often. When I didn't get any emails for a while and opened my UESP link to check to see if it was down, I saw that it was working properly and thought for a while there that everyone had abandoned the wiki :( --GuildKnight (Talk) contribs 02:40, 31 August 2007 (EDT)
Good point :) I hadn't been thinking of that aspect of the problem... although it's still mainly going to affect editors. This probably needs to be highlighted on the Community Portal, since more editors check that page than the Admin Noticeboard. Ideally it would be good to somehow send out an update to anyone who uses email notifications once the system has been fixed, but given how the system works that's pretty difficult to do (in particular, an email is only sent out the first time a page is updated after you visit it. So any followup edits to pages won't trigger new emails at this point).
I'll try to remember to follow up on this tomorrow morning... at the moment it's getting late for me, and I know when I hit submit on this post I'll trigger the bug ;) --NepheleTalk 03:40, 31 August 2007 (EDT)
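The notification rule described above (an email goes out only for the first change after the watcher's last visit) can be sketched as a small state machine. This is a hypothetical model for illustration, with made-up class and method names; it is not the wiki software's actual implementation.

```python
class WatchedPage:
    """One watcher's notification state for one page (hypothetical model)."""

    def __init__(self):
        # False means the watcher has seen the latest state of the page.
        self.notified_since_visit = False

    def on_edit(self):
        """Record an edit; return True if it should trigger an email."""
        if self.notified_since_visit:
            # Already notified about an earlier edit and not visited since.
            return False
        self.notified_since_visit = True
        return True

    def on_visit(self):
        """The watcher views the page, re-arming notifications."""
        self.notified_since_visit = False
```

Under this model, only edits for which `on_edit()` returns True would reach the mail step, which is consistent with the bug appearing on some edits and not others.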
I've added a warning which appears when you edit a page - not sure if there's a better place to put this, but it serves for now. If you've got a better idea, feel free to move it somewhere more appropriate. --TheRealLurlock Talk 11:23, 31 August 2007 (EDT)
Update -- It seems the issue was just a bad sendmail link (pointed to the old mta instead of the new qmail). I've tested on the Wiki and forums and it seems to be able to send mail fine now. There is still an issue of mailboxes on the server not working (i.e., but this should only affect me. Just let me know if there are other issues or if this issue is still present. -- Daveh 13:54, 1 September 2007 (EDT)
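For anyone wanting to verify this kind of fix, a check along the following lines could confirm where a link ends up. It is a sketch that assumes "link" here means a filesystem symlink; the function name is hypothetical and the paths are whatever the caller supplies, since the actual locations on the server are not given on this page.

```python
import os


def link_target_ok(link_path, expected_target):
    """Return True if link_path resolves to expected_target.

    Both paths are resolved with os.path.realpath, so chains of symlinks
    are followed before comparing.
    """
    return os.path.realpath(link_path) == os.path.realpath(expected_target)
```

A False result for the sendmail path against the new mail program would indicate the link still points at the old one.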
Fantastic! A couple of quick tests later, it looks like that's solved the problem. You've got me intrigued about the possibility of having an account now, though! Thanks, Daveh. --RpehTalk 14:03, 1 September 2007 (EDT)