TL;DR version:
My main concerns are:
1. This would be moving out of the control of the lab by going to a personal cloud account
2. The increased cost of maintaining the hosting
3. It was handled without proper governance by the OPs team. After all, we are the ones who maintain the infrastructure at the lab.
In my opinion, this proposal should be pulled back so we can discuss it further. Perhaps it is the same path that Bob is proposing,
perhaps not, but I would like to see us come up with a more solid plan to move forward that does not cost the lab any additional
financial burden at this time.
Unfortunately, I will be out of town for the May 23rd HYH. I would like to see this postponed until the following HYH (June 13th) so
I can be in attendance and available to discuss/answer any questions people may have.
Long version…
I would like to re-state the following top concerns/observations I have and the “guidelines” I follow when implementing solutions at
the lab that relate to Operations.
1. From an Operations perspective, we need to make sure we are implementing some form of governance around how we host software and implement solutions for the lab.
2. We have had issues in the past that have caused problems with hosted solutions, so we need to make sure that we do our due diligence to prevent those scenarios from happening in the future.
3. We need to be cognizant of how HSL spends its finances and take advantage of the resources we have available, or that are provided to us, without additional financial hardship.
Now, as to addressing Bob’s responses:
> Q: Why do you think you're better at operations than Jeff?
A: What does this have to do with the Wiki? Why are you singling me out as the bad guy for having an opinion and voicing concerns that I
have about it being hosted under a personal account and the cost of operating it?
> A: MediaWiki happens to be in my specialized wheelhouse. I've been
> adminning production LAMP stacks for 28 years. My largest fleet of
> unmanaged hosts was 120 boxes at Apple. My largest fleet including
> managed hosts and serverless was at Amazon where I was tech lead then
> tech manager on a team with $6.5m of annual AWS spend. I mostly do data
> pipelines and ML; but for 23 years I've always had at least one wiki
> running, usually a few.
As I mentioned above, we need to make sure we are implementing solutions that are properly governed by the lab. Based on Bob’s answer,
it sounds like he has experience in the software field and should understand the need for governance with situations like this compared to
just going out and doing things on one’s own under the “do-ocracy” banner. While I am a fan of going and doing things to make progress,
doing so in a way that is potentially detrimental to HSL is not the correct way to do things.
> 1. Recent version of MW, regularly updated.
This can be handled by us, and in fact the new platform has already been set up to handle upgrades as needed.
> 2. Nightly tarball of the recovery content, publicly accessible. (this
> is a "run over by a bus" solution for non-hierarchical orgs - anyone can
> pull nightly and clone the server at their pleasure)
Again, the Database is already performing nightly backups.
> 3. IPv6
Why do we need IPv6? What benefit is there from it? While it is in use, it has not been widely adopted in the Internet community
and just adds additional complexity.
Now, I would like to address some other comments from the Wiki Upgrade thread.
> We rely upon outside services. We have dependencies on GitHub, Google
> Groups, Google Calendar, Slack, DropBox, PayPal and more. Most of those
> have lock-in problems.
Yes, we do rely on well-established cloud solutions that are experts in their field and cost HSL nothing, or only a very small fee for
something we cannot do ourselves (PayPal). Google, GitHub, Slack, Dropbox, etc. provide free services to us. I know I do not want to
be on the hook for handling credit card payments, I’d rather leave that to the banks/companies that specialize in that.
I am not against using services in the cloud when they benefit us and do not cause extra drain on our finances and are under HSL’s
control. We have had issues in the past where a cloud service is under someone’s personal account (with good intentions) and
then something happens and we lose access to whatever they were doing and things break. I would like to prevent that from happening
again.
> This does not have lock-in. This is hosting, only. The server is vanilla
> Deb 12, and the backups are published nightly, ready to be restored by
> anyone who can do the work. Lift and shift is a snap.
While this might not be “vendor lock-in” in the strict sense, it is hosted on AWS. What is the process for moving it to another provider if
we so choose? Could a non-technical person do it?
> The in-house network is flawed. We do not have the resilience of a
> hosted solution. When I tried to work with you on the upgrade, you said
> we don't have remote access. We don't have IPv6. It's fun to run our own
> machines, and they should be used for fun. They are not production-grade.
>
> They should not be used for mission critical infrastructure.
I disagree with the above statements, and in fact I feel they are a slap in the face to the team(s) that have kept things running for 10+ years (I know
it is longer, I’m rounding down). It has taken them a lot of sweat and time to keep the lab up and running and to make sure we are able to do
what we can. Now, is the infrastructure outdated and in need of some re-architecting? Yes, but that does not mean we do not have the talent to
do it. Whether it is on-prem or in the cloud, this is something that should be discussed by the OPs team to find a path forward that benefits everyone
while making things easier to maintain. As for the remote access, it is locked down for security’s sake. Proper protocol is to not allow remote
access directly to servers.
> I started asking for a Wiki upgrade 3.5 years ago. I've documented the
> process multiple times. I started discussing it with you 16 months ago.
> HSL has had the chance to do this, and has not.
I agree that I have dropped the ball on getting the Wiki upgraded in a timely manner and that is on me. More discussion below.
> This past September, you were amenable to my approach. When I arrived,
> you said I could not have access to do the work. Then you said you
> didn't want to do it the way I do it. Then you asked me to walk you
> through it the way you want to do it.
1. You were wanting remote access to the server. We do not just give anyone remote access to the servers.
2. You were wanting access to the ROOT user. We do not just give anyone ROOT access to the servers. That is a huge security risk.
As for “doing it my way”, perhaps there was some kind of disconnect, but I am not the Wiki expert and I had no “way” to do it. I was
deferring to your expertise. As I told you, I had the Wiki upgraded and ready for the data to be migrated over, and wanted your help with the
best path to do that. Unfortunately we seemed to be in disagreement and did not get the data migrated.
> HSL owns wiki.heatsynclabs.org (http://wiki.heatsynclabs.org/), which should be pointed at the new host,
> ASAP. heatsynclabs.wiki is a placeholder. If we can find a good path
> forward, I'm happy to transfer it.
Having a “new domain” again introduces another cost of at least $10/year that the lab had not been made aware of, along with the
approximately $7/mo expense of hosting the Wiki. While they are small, all these small expenses add up. Again, while I appreciate
you initially covering these small expenses, by your own admission of “I'm happy to pay for it for the foreseeable future” how long is that
future? What happens when that is no longer the case? HSL will be on the hook for yet another nickel and dime expense that “we” did
not sign up for. Also, who is going to maintain it if/when you lose interest in maintaining it, or it becomes a bigger job than you want?
> The obvious reason that the wiki needed to be upgraded is that the current wiki is jacked, and has been as long as I've been here.
Aside from being outdated, how is the wiki “jacked”? As I understand it, several years ago it was hacked and spam was added
to the wiki, but that has since been remedied.
> nightly backups and a disaster recovery plan that gets tested regularly. I want to invest time and
> energy into making pages on the wiki, but I'm not comfortable doing that without a solid recovery plan.
>
> Second would be regular upgrades for security. Bugfixes for user-facing
> issues are nice; the security ones are important.
I agree with this as well, and it is something that should be handled accordingly.
> Bob is creating a new server because just installing updates on the old one is not feasible.
Actually it is very feasible and steps have already been performed, as mentioned in my response above. The last steps are:
1. migrate the data to the updated version of MediaWiki
2. point wiki.heatsynclabs.org from the old server to the new server (that again, is already up and running, just needs #1 above done)
> That is a legitimate concern. I would like the end result of this
> process to be a single wiki, administered by HSL.
This would still be the case no matter where it is hosted, in the cloud or on-prem.
> A hosted server gets us some resilience and network features that would
> be difficult or costly to replicate internally.
We already have the infrastructure in place for this and have been doing it for 10+ years.
> HSL should own and control the outside wiki as soon as it is able. I
> will be publishing a disaster recovery doc which will work as a
> migration doc. Deploying a replacement server is essentially the same as
> recovering from an HDD failure. As soon as the HSL server meets our
> needs, I'll kill the one hosted on my account.
Since the server hosted at the lab meets our needs, I would rather see effort put into getting this “ask” over the hill and coordinating getting
the data migrated than to spend time working on a cloud instance and then moving it back to the lab.
> I agree with running it by HYH. That has been and remains my intent.
Was this discussed at a HYH? I did not see it on any HYH agenda for discussion. Perhaps I missed it.
My apologies for the long response; I wanted to address some of the more pressing comments I saw.
My main concerns are:
1. This would be moving out of the control of the lab by going to a personal cloud account
2. The increased cost of maintaining the hosting
3. It was handled without proper governance by the OPs team. After all, we are the ones who maintain the infrastructure at the lab.
In my opinion, this should be pulled back so we can discuss it further. Perhaps it is the same path that Bob is proposing, perhaps not,
but I would like to see us come up with a more solid plan to move forward that does not cost the lab any additional financial burden
at this time.
-Jeff