Monday, May 11, 2009

Developers: What your System Admins want you to know

Okay, I gave the designers their say: now it's time for the system admins. I wandered into the gritty depths of NASA HQ, seeking out the agency's most battle-scarred warriors: the Unix team. These fearless men are often all that stands between the chaos of the development environment and the unspoiled lands of production. At least once a week, missives of frustration reach me about what one of 'my people' has done this time. I would offer to smite the skulls of their sworn enemies, but my steel is at home rather than by my desk. Instead, I offered them the chance to tell me what they would have developers know, upon threat of death.

I also have the feeling they're Conan fans, so I wrote the above just for them.

So what do your system admins want you to know?

Communication is king

One of the biggest complaints is that no one communicates with the guys running the servers. Don't underestimate how much trouble keeping them in the loop can save you. They know stuff. Lots of it: what's already on the servers, what's a bad idea, and what's crap and what isn't. If you don't know something, they can fill you in.

No, you may NOT have a compiler on production!

Compilers on production are BAD. Stop asking for them. The proper thing is to compile elsewhere, then upload everything as a package. It's one extra step! Considering all the steps you people will take to edit something in vi or Emacs, it can't be that big a deal.

Instructions must be uber-complete

Ideally, when you hand over directions to deploy your application, they should be complete. Really complete. Ops should be able to pull a random admin off the street, hand them the directions, shout 'Go!', and have the application install perfectly.

Don't script your installs

Which brings us to the next point: scripting your installs. Hey, why not write a script to do everything? Then your instructions would be super short, easy to follow, and fail-proof?

Wait, did we say fail-proof?

One problem with scripted installs is that they sometimes fail. Production should mirror development, but almost always it doesn't, so a script that ran cleanly on your box can break on the real servers. If an admin is handed one line to run with no clue what the hell it's doing, he can't help when it goes tits up. Sys admins are good at troubleshooting, but no one can troubleshoot a black box.
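If you do script an install, it doesn't have to be a black box. Below is a minimal sketch, assuming the install can be broken into plain commands; the step list and the `run_steps` helper are hypothetical, not any particular tool. It announces each step before running it and stops at the first failure, reporting which step broke, so the admin knows exactly where to start digging.

```python
import subprocess
import sys

# Hypothetical install steps as (description, command) pairs. Replace with your own.
STEPS = [
    ("Check Python version", [sys.executable, "--version"]),
    ("Run a trivial command", [sys.executable, "-c", "print('step 2 ran')"]),
]


def run_steps(steps):
    """Run each step in order, printing what it is about to do first.

    Returns the 1-based number of the first step that failed,
    or None if every step succeeded.
    """
    for i, (description, command) in enumerate(steps, start=1):
        print(f"[{i}/{len(steps)}] {description}: {' '.join(command)}", flush=True)
        try:
            returncode = subprocess.run(command).returncode
        except OSError as exc:  # e.g. the program isn't installed on this box
            print(f"  could not start command: {exc}", file=sys.stderr)
            returncode = -1
        if returncode != 0:
            print(f"FAILED at step {i} ({description}), exit code {returncode}",
                  file=sys.stderr)
            return i
    print("All steps completed.")
    return None


if __name__ == "__main__":
    run_steps(STEPS)
```

Printing the exact command alongside a plain-English description is the point here: it doubles as the "uber-complete" instructions, so an admin can rerun or fix a single step by hand.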

If you DO script your installs, TEST TEST TEST

If you're still dead set on scripted installs, test the holy hell out of them. Do it on fresh systems. Do it on that polluted system you have lying around. Have someone else run through it without you tossing helpful hints at them.

ASK FOR DIRECTIONS

Played out jokes about men asking for directions aside, sys admins really are there to help. They know the systems you try to break inside and out. They've seen all sorts of things go awry and know how to fix them. Sure, they're not always up on the latest and greatest toys, but you'd be surprised how many times your issues with your new-fangled device are not new at all.

Do try to show up

If you're doing a deployment, try to show up. Where I am, developers are required to attend every deployment, just in case. If your workplace doesn't require this, try to go anyway. You don't want the admins shrugging and restoring from backup because of something you could have troubleshot right there.

Don't be a single point of failure

Single points of failure kill sys admins. If something goes wrong with your app, they should have more options than hunting you down. Make sure another developer has at least some knowledge of your system. Document the holy hell out of it. Educate them about it! Because if they do have to hunt you down, it won't be pretty.

DON'T PUT YOUR DOCS THERE

For the love of all that is good and right, put your docs where the sys admin can find them. Not in the code. Not in some weird sub-directory buried three levels deep. Not scattered amongst the ruins of mistreated wikis and ticketing systems. Top level. With nice names like README and INSTALL.
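One cheap way to hold yourself to this is a check you run before handing a release to ops. This is a minimal sketch; the `missing_docs` helper is hypothetical, and it assumes names like README.txt or readme.md should count as "README". It looks only at the top level of the project, so a README buried in a sub-directory does not count.

```python
import sys
from pathlib import Path

# The doc names the admins asked for, at the top level of the project.
REQUIRED_DOCS = ("README", "INSTALL")


def missing_docs(project_root, required=REQUIRED_DOCS):
    """Return the required doc names with no matching file at the top level.

    A file matches if its name minus the last extension equals the required
    name, ignoring case, so README, README.txt and readme.md all satisfy README.
    Sub-directories are deliberately not searched.
    """
    root = Path(project_root)
    present = {p.stem.upper() for p in root.iterdir() if p.is_file()}
    return [name for name in required if name.upper() not in present]


if __name__ == "__main__":
    missing = missing_docs(sys.argv[1] if len(sys.argv) > 1 else ".")
    if missing:
        print("Missing top-level docs:", ", ".join(missing))
    else:
        print("All required docs found.")
```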

Sudo is a privilege, not a right

Just because you can do something on a production system does not mean you should.

The world is bigger than you and your program

Developers get to see a very small slice of the universe. They don't usually have to think about things like "Do we actually have a server to put this on?" or "Where are we going to plug this in?" or "How will a grunt restart this if everything goes down?"

Talk to your sys admin to see how your app fits into the larger ecosystem of the operations center.

Do time

The developers the sys admins have the best relationships with have themselves spent several years as sys admins. They understand the complications that can happen, and they have been through the fire of a server going down and the apps on it refusing to come back up. They don't do stupid stuff that's easily avoided.

7 comments:

encolpe said...

All these requirements can be met by Python applications now:
- virtualenv to set up a repeatable environment
- zc.buildout + collective releaser to build a binary release (it still lacks a packaging system)
- fabric to deploy your application across the network
- fabric to deploy your application across the network

All these tools are testable, if developers want to...

Chris Adams said...

I'd shift direction slightly: don't script your installs by hand, do script them using the package feature of your operating system. Being able to push out packages with dependencies, upgrade handling, etc. saves the sysadmins a ton of time but not if it involves someone's attempt to reinvent dpkg/rpm by hand in 30 lines of bash.

pyDanny said...

encolpe: You list some good tools; however, I disagree, because Katie's post had a lot more than technical issues involved. While your set of tools is really useful, it doesn't address the issue of developers like me dumping off a set of packages and commands and then running to the bar for happy hour.

Not that I've ever done that. Really. Not me. Not at all. ;)

Steve said...

I can only add one thing: remember that the admins are human beings as well! Too often they are treated as a separate race who live apart from normal human frustrations. So if you *haven't* done time as a sys admin, at least remember they are (despite their God-like qualities and feeble propensity to consume fantasy literature) pretty much like you and me. Except they enjoy dicking round with computers even more than we do!

~K said...

Can I hug you???!!! Oh, and here is one more...don't wait until the last minute and then foist your heaping pile of shit on the QA people, then leave for a convention with your fellow developers while everyone in the office goes code red because customers' servers are now crashing across the country! :-) QA should be validation, not a festering pool of trouble tickets.

Chris Adams said...

~K: at times it felt like I needed to remind certain developers that QA was supposed to mean "We think it's ready to release", not "It's been 6 months - let's get someone to try running this!".

Licio Fernando said...

Hi!
I really liked this post! With your permission, I'd like to translate it and post it on my blog (with full credit, of course). Is there any problem with that?

Thanks!