On 2024-07-20 at 22:07, Jeffrey Walton wrote:
On Sat, Jul 20, 2024 at 9:46 PM The Wanderer <[email protected]>
wrote:
On 2024-07-20 at 09:19, jeremy ardley wrote:
The problem is the Windows Systems Administrators who contracted
for / allowed unattended remote updates of kernel drivers on
live hardware systems. This is the height of folly and there is
no recovery if it causes a BSOD.
All the sysadmins involved did was agree to let an
antivirus-equivalent utility update itself, and its definitions. I
would be surprised if this could not have easily happened with
*any* antivirus-type utility which has self-update capability; I'm
fairly sure all modern broad-spectrum antivirus-etc. suites on
Windows use kernel-level access in a similar fashion. CrowdStrike just
happens to be the company involved when it *did* happen.
I was around when Symantec Antivirus did much the same to roughly
half the workstations at the Social Security Administration. A
definition file update blue-screened about half the Windows NT 4.0
and Windows 2000 hosts. That was about 50,000 machines, if I recall
correctly.
There *is* a difference between this incident and that one, in the form
of the *scale* of the issue. But otherwise, yes, I've seen less-severe
breakages of this sort occur in the past as well.
That the sysadmins decided to deploy CrowdStrike does not make it
reasonable to fault them for this consequence, any more than it
would be reasonable to fault a gamer who installed a game, if the
game then required a patch to let them keep playing, and that patch
silently included new/updated DRM which installed a driver which
broke the system (as I recall some past DRM implementations have
reportedly done). In neither case was the consequence foreseeable
from the decision.
Sysadmins don't make that decision in the Enterprise. That decision
was made above the lowly sysadmin's pay grade.
It does depend on the enterprise. In my organization, I'm fairly sure
the people who made the decision at least did so with informed input
from the sysadmins, including specifically the people who were
administering the existing antivirus solution (McAfee).
The situation is recoverable if all the Windows machines are
virtual with a good backup/restore plan. The situation is not
recoverable if the kernel updates are on raw iron running
Windows.
The situation is trivially recoverable if you can get access to
the machine in a way which lets you either boot to safe mode and
get local-administrator access, or lets you boot an alternative
environment (e.g. live-boot media) from which you can read and
write to the hard drive.
I don't think it's trivial for some enterprises due to the sheer
number of machines and the remote workforce.
Yeah - after the fact it occurred to me that I hadn't specified that
what this is *not* is *automatable*, which has inevitable consequences
for the difficulty of scaling the solution out.
At most you could provide bootable media which would, when booted to,
fix the issue and reboot. (If you could set things up for that to be
available by PXE boot, and if you have everything configured to try PXE
booting first before booting locally, then maybe you could automate it
with nothing more than telling people to reboot any computer they see
affected? But even that type of solution has its limits.)
I'm guessing the company I work for will spend the next week or month
sorting things out. And the company is a medium-sized enterprise with
about 30,000 employees. Imagine how bad it's going to be for an
enterprise with 100,000 employees.
Oh, I can.
I've spent a fair chunk of my workday today going around to
affected computers and performing a variant of the latter process.
Once you've done that, the fix is simple: delete, or move out of
the way, a single file whose name claims that it's a driver. With
that file gone, you can reboot, and Windows will come up normally
without the bluescreen.
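
For illustration, here is a minimal Python sketch of the "move out of the
way" step, assuming you have already booted an environment from which the
affected drive is readable and writable. The message above does not name
the file or its directory, so the function name, the directory, and the
glob pattern are all hypothetical and passed in as parameters. Moving
rather than deleting keeps a copy in case it needs to be inspected later.

```python
from pathlib import Path
import shutil


def quarantine_matching_files(driver_dir, pattern, quarantine_dir):
    """Move every regular file in driver_dir matching pattern into quarantine_dir.

    All three arguments are supplied by the caller; nothing here knows the
    real file name or location (those are assumptions left to the operator).
    Returns the list of destination paths, in name order.
    """
    driver_dir = Path(driver_dir)
    quarantine_dir = Path(quarantine_dir)
    quarantine_dir.mkdir(parents=True, exist_ok=True)

    moved = []
    for src in sorted(driver_dir.glob(pattern)):
        if not src.is_file():
            continue  # skip directories that happen to match the pattern
        dest = quarantine_dir / src.name
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

A call might look like
quarantine_matching_files("/mnt/windows/drivers", "example-*.sys", "/mnt/windows/quarantine"),
where every path and the pattern are placeholders. This only covers the
file operation itself; getting to the point where you can run it on each
machine is the part that does not scale.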
Unfortunately, I don't see this as scalable. It works fine for a
small business with 100 employees, but not an enterprise.
My own organization has thousands of computers, something like 1000-3000
of which have CrowdStrike Falcon as their antimalware solution. The part
of our IT department which would typically be expected to handle the
client-side remediation of something like this (including making and
keeping appointments with remote workers who were impacted) is a maximum
of 16 people, and I believe we're currently working with two positions
empty.
That said, a *lot* of our CrowdStrike-using computers seem to have not
been affected; as far as I can tell, most of them were *off* for the
entire active-issue period, and so never received the problematic
update. Someone has estimated that only 8% of our total computers are
affected. (I don't know where they got the figure from, but I do know
that "our total computers" includes another 3000-5000 units which use a
different antimalware solution, so that's going to skew the percentage.)
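
To make the skew concrete, here is a rough Python sketch of the
arithmetic. It assumes that "total computers" is simply the
CrowdStrike-using machines plus the machines on the other solution, and
that only CrowdStrike-using machines can be affected; neither assumption
is confirmed by the estimate itself. The function name is made up for
illustration, and the counts are just the endpoints of the ranges given
above.

```python
def affected_share_of_crowdstrike(cs_machines, other_machines, overall_rate=0.08):
    """Convert an overall affected rate into a rate among CrowdStrike machines.

    Assumes total = cs_machines + other_machines and that every affected
    machine is a CrowdStrike machine (both are assumptions, see above).
    """
    total = cs_machines + other_machines
    affected = overall_rate * total
    return affected / cs_machines


# Try each combination of the range endpoints quoted in the message.
for cs in (1000, 3000):
    for other in (3000, 5000):
        share = affected_share_of_crowdstrike(cs, other)
        print(f"{cs} CrowdStrike + {other} other: {share:.0%} of CrowdStrike machines")
```

Under those assumptions, 8% of everything works out to somewhere between
roughly 16% and 48% of the CrowdStrike-using machines, depending on which
ends of the ranges are closer to the truth.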
It's still likely to take us weeks, if not months, to get everything
affected by this back into working order.
Heads should roll but obviously won't.
What good would decapitation do, here?
I think it's a figure of speech, not meant literally.
Indeed. I was simply extending the metaphor.
At most, CrowdStrike's people are guilty of rolling out an
insufficiently-tested update, or of designing a system such that
it's too easy for an update to break things in this way, or that
it's possible to break things in this way not with an actual new
client version (which goes through a release cascade, with each
organization deciding which of the most recent three versions each
of their computers will get) but just with a data-files update
(which, as we have seen here, appears to go out to all clients
regardless of version).
At minimum, it is negligence.
Agreed.
The first would be poor institutional practice; the others would
be potentially-questionable software design, although it's hard to
know without seeing the internal architecture of the software in
question and understanding *why* it's designed that way.
In either case, it's not obvious to me why decapitating a few
scapegoats would *improve* the situation going forward, unless it
can be determined that specific people were actually negligent.
The incident affected the company's share price. Shares were down
$10 or $15.
I was watching this over the course of the day, and saw it quoted
starting at "down nearly 15%" before the start of trading, and "down 9%"
after trading had closed for the day. I'm not sure what that reflects in
real-world practice, and I didn't see dollar prices quoted.
If the potential issues were not detailed in company literature and
prospectus, then the Securities and Exchange Commission might get
involved for misrepresenting risk and liabilities. There could be big
fines, and that will cost the shareholders more money.
All this points to an incompetent board. If someone's head is going
to be taken (figuratively), then it should start with the CEO and
other executives.
I could see an argument for that, although I'm not convinced 100% based
on what I've seen to date. I'd need more information and details, and am
unlikely to get them.
--
The Wanderer
The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man. -- George Bernard Shaw