September 15, 2005
Web programmer stabbing
Timeline:
Late June, Early August: We notice that at certain times of the day, websites are outright inaccessible. It seems unrelated to a previous problem with a stray WebDAV semaphore file, and more likely tied to the log analysis job that runs every four hours. Over the next week or so I index the MySQL tables used for storing Apache logs.
Aug 22-23: Serious user complaints are brought to my attention. I cut the log analysis job back to one run a day (during the early morning, when it should not affect usage). There is still some slowness, so I put up a Zope 2.8.1 test server (for eventual migration).
Aug 31: After a week of little testing by the web programmer, it's requested that I instead make the NetApp NetCache box we have sitting on a rack (that was NEVER POWERED ON, and about which I HAD NO DOCUMENTATION) work to cache pages.
Sep 02: I get the NetCache configured for a test site, and notify my boss and the programmer.
Sep 04: I'm told to go ahead and start moving live sites to the cache for testing. I submit a request to get its port opened through the TAMU firewall.
Sep 06: I re-request that the port be opened. Urgently.
Sep 08: CIS helpfully closes the ports to the production webserver. Stabbings ensue.
Sep 13: I'm told that www.isc.tamu.edu and some other site I set up for testing around the 6th are fine, and to start moving more live production sites over. I move about 15 hosts over. I request that the programmer do even further testing on the 14th on the new site.
Sep 14: A fairly important site is found to be down. The web programmer apparently tried to enable page caching using plone.org documentation.
Sep 15: My boss relays frantic emails from users. The NetCache is delivering pages of logged-in users to everyone. Login credentials themselves don't seem to be affected, but this issue is considered to be huge and cross-browser.
Now, we got the problem fixed eventually (the web programmer had to add a no-cache pragma for things to work again), but I still gotta wonder: WHERE WAS THE TESTING? Look, I'm lazy, I admit it. But seriously, testing the functionality of websites I shouldn't even have logins to is not my responsibility. And if it's going to be my responsibility, I need to know ahead of time, so that customer-visible changes get tested and less shit gets broke.
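For the curious, here is roughly the idea behind the no-cache fix, as a minimal Python sketch and not the actual change the programmer made. The function name cache_headers, the cookie name in AUTH_COOKIE, and the 300-second lifetime are all my own assumptions for illustration. The point is just that any response built for a logged-in user has to tell a shared cache to keep its hands off.

```python
from http.cookies import SimpleCookie

# Assumed name of the login cookie; check what your site's login
# mechanism actually sets before relying on this.
AUTH_COOKIE = "__ac"


def cache_headers(request_headers, public_max_age=300):
    """Pick response headers that tell a shared cache whether it may store the page.

    request_headers is a plain dict; this sketch assumes the keys are
    spelled exactly "Cookie" and "Authorization".
    """
    cookie = SimpleCookie()
    cookie.load(request_headers.get("Cookie", ""))

    logged_in = AUTH_COOKIE in cookie or "Authorization" in request_headers
    if logged_in:
        # Personalized page: a shared cache must never hand it to someone else.
        return {
            "Cache-Control": "private, no-cache, no-store",
            "Pragma": "no-cache",  # for older HTTP/1.0-era caches
        }

    # Anonymous page: safe to share for a short while.
    return {"Cache-Control": f"public, max-age={public_max_age}"}
```

Calling cache_headers({"Cookie": "__ac=abc123"}) returns the no-cache set, while cache_headers({}) returns the public one. A check like this is also the sort of thing that is cheap to test before pointing 15 hosts at a cache.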
Oh well.
Posted by jeff at September 15, 2005 04:07 PM