Bespoke, Hand-Crafted Engineer
I have a problem. Well, several. But specifically, I’m not trained as an engineer. That’s a bad thing, because the further I go in my career in IT Operations and software development, the more like an engineer I’m expected to be.
To be honest, I’m not actually trained as an IT Ops person, either. Like most people, I sort of backed into the job. I was a tech support agent who knew Linux, got taken under an admin’s wing, brought onto the team, learned and grew, and gradually got more responsibility.
Throughout my career, I’ve been able to rely on my ability to teach myself whenever I’ve encountered new technology and new experiences. Often, I’ve had to. I made poor decisions early on and unknowingly went to a diploma mill college. Maybe I can be forgiven: I was the first person in my family to go to college, and none of us really knew better.
But now, I find myself almost 15 years into my career, reasonably good at what I’ve been tasked with, and I can see the direction things are going. I know that I’m not reaching the end of what I can do; more likely, I’m reaching the end of what I can teach myself. And honestly, probably the end of what I should teach myself.
The most important infrastructure I’ve ever run was a risk analytics service that had tens of billions of dollars under management. To be up front, we didn’t hold those tens of billions of dollars; we held the asset positions for those funds. Our company knew what assets our clients held, and we ran analytics to create reports and provide scenario outcomes so that our clients could make investment decisions. One question I dwelled on was how important my infrastructure was to the people who used our service.
It’s tempting to shrug and observe that I wasn’t actually dealing with the finances themselves, so if our service was down, the loss was roughly equivalent to what our clients would have paid for our service during that time. But that’s a naive perspective. The real loss is the difference our service made to our clients’ decision-making. We were influencing billions of dollars of decisions. If my infrastructure was unavailable, it’s conceivable that millions of dollars could be lost to a lack of information, the “fog of war”.
I don’t want to be responsible for a financial loss of millions of dollars because I didn’t do capacity planning correctly. In my current position as an administrator at a university in Boston, I don’t want to be responsible for a researcher losing a grant because I miscalculated the rebuild time of an array and someone didn’t get something published by a deadline.
There are ramifications to my job not being done correctly, and there are ramifications to yours, as well. They’re probably not always (or even usually) dire, but how many times do they have to be?
The point is that I don’t feel I have learned enough to be responsible for what might very well be critical infrastructure. It may not sound critical, but I don’t know everything my infrastructure is used for, or what relies on it. And I just run a university network. I know people who run hospital networks, and in a lot of cases, they have less training than I do.
The larger infrastructures get, the more we need to treat them as true systems, in the sense of classical systems engineering methodology, not in the sense of the throwaway job title we sometimes hand out, like “tech support engineer”. No, I mean Systems Engineering.
But I don’t have an engineering degree. And I have never heard of an engineering degree with a specialization in IT Operations. But I feel like that’s what I need to really be responsible in my job. If I were going to design an intersection in a major city, I would need to have a degree in civil engineering, but if I want to run the IT infrastructure that the intersection relies on, all I need is a high school diploma. In what world does that make sense?
Until there’s an option for people like me, or at least until I find it, I’m going to keep reaching out and trying to learn new things. Sometimes that means something like Khan Academy or Pluralsight, but often I feel like I’m on my own, figuring out how to get relevant training and apply it to my career. And I feel like I’m fighting an uphill battle just to get people to see that these skills aren’t missing only from my career; most of us lack them.
The pervasive attitude of “why would we care about (for example) statistics?” is something I’m always mystified by, so I was overjoyed to see that LISA14, the conference whose invited talks track I had been helping find content for, was going to feature courses on Statistics for Ops and R for SysAdmins.
Both of these classes told me that the metrics I’m recording need to be much more granular. I’m getting a 1,000-foot view, and I need to get down to floor level. Instead of free disk space, I need to watch IO. It’s impossible to correlate the behaviors of my systems without this low-level information. I can’t divine the true nature of the world from shadows. I need to turn around and watch the dancers in front of the fire.
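Why granularity matters for correlation can be shown with a small sketch. This is a hypothetical example in plain Python, not material from either course: the IOPS and latency numbers are invented, and `pearson` and `downsample` are illustrative helper names. It computes the Pearson correlation between two per-second metric series, then averages both into 10-second buckets, roughly what a coarse dashboard does, and checks whether the relationship survives.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length metric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A flat series carries no variation to correlate against.
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def downsample(series, window):
    """Average consecutive samples into buckets of `window` samples each."""
    buckets = [series[i:i + window] for i in range(0, len(series), window)]
    return [sum(b) / len(b) for b in buckets]


# Hypothetical per-second samples: bursty disk IOPS and request latency (ms).
iops = [100, 900] * 30
latency_ms = [5, 40] * 30

print(f"per-second correlation: {pearson(iops, latency_ms):.3f}")
# prints: per-second correlation: 1.000

# The same data averaged into 10-second buckets: every burst is smoothed away.
coarse_iops = downsample(iops, 10)
coarse_latency = downsample(latency_ms, 10)
try:
    pearson(coarse_iops, coarse_latency)
except ValueError as err:
    print("10-second averages:", err)
```

At one-second resolution, the IO bursts and the latency spikes line up perfectly; after averaging, both series are flat and the relationship is invisible. Real metrics are noisier than this toy data, and correlation alone does not establish cause, but no analysis can recover a relationship the sampling has already erased.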
Classes like these aren’t going to make me an engineer. That’s well beyond their scope and goal. But they can increase my capabilities, and by learning, I can improve myself. I’m going to be a better administrator than I was before because I’m getting the tools I need to analyze my infrastructure critically.
There’s no end to what I have left to learn, but all I can do is keep going, keep adding to my knowledge and skills, and try to be better tomorrow than I was yesterday.