Microsoft’s Pink/Danger backup problem blamed on Roz Ho
October 15th, 2009
Daniel Eran Dilger
A fourth source has added new insight into Microsoft’s Danger datacenter disaster, pinning the root cause directly on Pink Project leader Roz Ho rather than Microsoft’s third party vendors, which the company is itself seeking to scapegoat.
Microsoft. Your potential problem: our third party vendors’ lack of passion.
Meanwhile, according to a new report by Mary Jo Foley, Microsoft has been (understandably) scrambling to distance itself from the debacle, publicly stating that the Danger datacenter operations were built using a Sun SAN solution and running an Oracle database. (A Danger source earlier casually described the SAN upgrade as involving EMC; the company was quick to insist that it was not involved, even though the SAN’s vendor, as you’ll soon see, was not really the reason for Microsoft’s datacenter failure at all.)
Foley’s article also reported having heard that it was Hitachi Data Systems that performed the update for Microsoft acting as its contractor, apparently because nobody at Microsoft knows how to eat dogfood that hasn’t churned out of the Visual Studio grinder. Well, that and the fact that Microsoft doesn’t sell SAN firmware, which some pedantic critics latched onto to suggest that my source’s use of the term “dogfooding” was such a problem for them that they couldn’t finish considering the rest of the article.
To clarify, my source used “dogfooding” in the context of ridiculing the phrase, specifically writing: “keep in mind that Microsoft has a real hard-on for running the latest and greatest ‘dogfood’ (God, I loathe that word, especially the ‘dogfooding’ verb form, and I refused to use it without sarcasm in my tenure there).”
Apart from casually referencing EMC, the source also noted that Danger’s data center ran using a backend built on Linux, Sun hardware and Oracle RAC, as I reported. So please, united band of twittering pedants, keep spinning your fan blades in your vacuum of rage if that’s the best you got.
They also laughed derisively at the suggestion of sabotage.
Foley also wrote, “I’ve also heard that foul play has not been ruled out because the failure was so catastrophic and seemingly deliberate. Microsoft is supposedly continuing to do a full investigation.” She had earlier linked to my article outlining insider reports that indicated the issue might be related to either overly ambitious upgrades or intentional damage set up by a disgruntled employee.
On an unrelated side note: Foley is a good sport considering that a couple of years ago I mercilessly lampooned her for writing that Apple had “licensed the Exchange ActiveSync licensing protocol” in order to gain compatibility with corporate email systems on the iPhone. Her article was singled out for my criticism despite being merely one bucket in the torrent of naysaying ZDNet had unleashed in its efforts to drown Apple’s new phone in muddy FUD waters.
I should also note that while I was correct in pointing out at the time that “Apple would still have to write its own implementation of ActiveSync for the iPhone,” I also overstated my case in portraying Foley as Gollum, blindly focused on Windows as ‘the precious’ (an unflattering graphic which has subsequently become a top image hit in Google when you search for her name).
I’m glad she apparently didn’t take it personally, and I apologize if you did, Mary Jo. I also have to humbly admit that, under all the layers of Apple-ostracism that she is apparently contractually obligated to provide for ZDNet, Foley scooped the world on announcing that Apple was in fact licensing EAS for the iPhone, and did so almost a year before Apple publicly announced it. I didn’t directly contradict her scoop back then, but some readers who only took a cursory look at what I wrote were left with the idea that I had, so I might as well apologize for that too while I’m off on this wild Lord of the Rings tangent.
Microsoft’s Sidekick/Pink problems blamed on dogfooding and sabotage
Anyways… here’s what you came here to read
With Microsoft frantically trying to associate its Danger fiasco with everyone else possible in the industry, and particularly with its direct competitors Sun and Oracle, it failed to point out that an awful lot of enterprise datacenters are running Sun and Oracle, and yet they don’t suffer regular outages that last for weeks and end with an announcement that all their users’ data has been lost. In fact, the losses Microsoft experienced (and its shoulder shrug response to T-Mobile’s million Sidekick users) are virtually unprecedented in the industry.
Brief service availability problems and isolated occurrences of data loss have hit Gmail and MobileMe, and generate big headlines each time. The difference is that data related to Gmail and MobileMe are designed to be backed up by the user; Danger’s Sidekick was designed to use a mission critical service operating under the assumption that users did not need to act to secure their data themselves. In fact, users had no capability to back up their devices’ data. Even the Outlook desktop sync tool (which Danger provided for an extra fee) synced with the cloud service, not the local device.
This means the Danger disaster is a lot closer to Microsoft’s previous fiascos in trying to migrate Sun-based services like HoTMaiL and WebTV to Windows than to the occasional service interruptions that impact other cloud services. RIM’s BlackBerry service goes off at regular intervals, but users don’t typically have their devices completely wiped when that happens.
The buck gets passed when the bucks stop.
Despite all of its attempts to pin blame elsewhere, Microsoft was running the Danger service not just as a convenience to users but under contractual obligation to T-Mobile.
This wasn’t just a case of Microsoft deciding that it was tired of serving customers who weren’t paying it enough, like the times it has pulled the plug on its DRM authentication servers, stranding users of Yahoo Music, MSN Music, Major League Baseball, and everyone else who had invested in a Microsoft PlaysForSure partnership that the company decided it no longer needed to honor.
Microsoft had big money on the line with Danger: a major partner joined at the Hiptop, a million paying customers, an SLA contract to service, and top-tier enterprise grade equipment and software running the thing. The problem wasn’t Sun equipment or Oracle software, it was something unique to the way Microsoft was running its very serious obligation.
It was Microsoft management.
According to the source, the real problem was that a Microsoft manager directed the technicians performing scheduled maintenance to work without a safety net in order to save time and money. The insider reported:
“In preparation for this [SAN] upgrade, they were performing a backup, but it was 2 days into a 6 day backup procedure (it’s a lot of data). Someone from Microsoft (Roz Ho) told them to stop the backup procedure and proceed with the upgrade after assurances from Hitachi that a backup wasn’t necessary. This was done against the objections of Danger engineers.
“Now, they had a backup from a couple of months ago, but they only had the SAN space for a single backup. Because they started a new backup, they had to remove the old one. If they hadn’t done a backup at all, they’d still have the previous backup to fall back on.
“Anyway, after the SAN upgrade, disks started ‘disappearing.’ Logically, Oracle [software] freaked out and started trying to recover, which just made the damage worse.”
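The trap the source describes, storage with room for only one backup, can be shown with a toy model. This is only a sketch under stated assumptions: the `BackupStore` class, its methods, and the backup labels are hypothetical names invented for illustration, and nothing here reflects how Danger's actual systems were built. It simply shows why starting a new backup and then abandoning it is worse than not starting one at all when there is a single slot.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BackupStore:
    """Hypothetical model of backup storage with room for `capacity` copies."""
    capacity: int
    complete: List[str] = field(default_factory=list)  # finished backups
    in_progress: Optional[str] = None                   # backup being written

    def start(self, label: str) -> None:
        # If there is no spare room, the oldest finished backup is deleted
        # to make space before the new one can begin.
        if len(self.complete) >= self.capacity:
            self.complete.pop(0)
        self.in_progress = label

    def finish(self) -> None:
        # Only a backup that runs to completion becomes restorable.
        self.complete.append(self.in_progress)
        self.in_progress = None

    def abort(self) -> None:
        # A partial backup cannot be restored from, so it is discarded.
        self.in_progress = None

    def restorable(self) -> List[str]:
        return list(self.complete)


def scenario(capacity: int) -> List[str]:
    """Start a new backup, halt it partway, and report what is left."""
    store = BackupStore(capacity=capacity, complete=["two-months-old"])
    store.start("pre-upgrade")
    store.abort()  # backup stopped early so the upgrade can proceed
    return store.restorable()


print(scenario(1))  # single slot: the old backup was already deleted
print(scenario(2))  # one spare slot: the old backup survives the abort
```

With a single slot, the aborted run leaves nothing to restore from; with even one spare slot, the two-month-old copy would still be there.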
The problem with this report is that it places the blame, not on a complex Oracle deployment, not on bad SAN hardware or a firmware glitch, not on a disgruntled employee with inappropriate levels of access to a mission critical service, but squarely upon Microsoft management.
This management decision was (allegedly) made by the same group within Microsoft that authorized spending $500 million to acquire Danger and take on accountability for its SLA with T-Mobile, botched the development of Pink, spent three years and untold sums developing the Zune brand so that users could sit through TV-style ads before launching Chess on a handheld, lost billions on Xbox and set a new ‘low-water mark’ in consumer device reliability, boondoggled Windows Mobile to the point where even Gartner can’t say nice things about it, and which has responded to the criticism of Apple’s App Store by launching its own software store with far more rules, significant new fees, and far fewer desirable offerings.
This latest report does not exactly fail to fit in with the general incompetence that emanates from Microsoft’s Entertainment and Devices Division. Rather, it seems entirely credible given the increasingly toxic relationship that has been brewing between Microsoft’s reality-challenged managers and its often frustrated engineers.
Why Can’t Microsoft Develop Software for Zune HD?
Microsoft uses adware model to pay for Zune HD apps
Gartner declares Android a second place winner in 2012. Why?
Microsoft sells restrictive new WiMo Marketplace via iPhone ads
Lance Ulanoff and Robbie Bach explain why library size doesn’t matter
A miraculous resurrection of data
How does Microsoft back itself out of this crisis? How about denial. Foley also reported that Microsoft has now announced (but not yet delivered) a reversal of its earlier summation that all of Danger’s data was lost.
In a website message signed by Roz Ho, the company stated, “We are pleased to report that we have recovered most, if not all, customer data for those Sidekick customers whose data was affected by the recent outage. We plan to begin restoring users’ personal data as soon as possible, starting with personal contacts, after we have validated the data and our restoration plan. We will then continue to work around the clock to restore data to all affected users, including calendar, notes, tasks, photographs and high scores, as quickly as possible.”
If the company has stumbled upon a novel recovery avenue or some unknown backup that somehow remained missing for nearly two weeks, then this is great news for Sidekick users and helps to wipe some of the egg from the company’s cloud computing services face, although the situation still remains the worst datacenter failure to ever impact mobile users, as well as one of the most absurd responses to lost data.
The public notice Microsoft posted also states “we have made changes to improve the overall stability of the Sidekick service and initiated a more resilient backup process to ensure that the integrity of our database backups is maintained” (italics mine). Certainly, if the source is correct and Ho ordered the work to continue without a backup, this is an understatement to say the least.
Is this just vaPoRware?
However, Microsoft is also well known for advertising bullshit it can’t deliver. Bill Gates talked up OS/2, floated a vision of Cairo that never materialized, falsely proclaimed himself the Moses of tablet computing, and blew so much vaporware at competitors (Bob, ActiveMovie, DirectMovie, Surround Video, Chromeffects, WinFS, SPOT, Mira, PlaysForSure, Advanced Streaming Format, Soapbox, Longhorn, Surface, Natal, Courier) that it wouldn’t exactly be a surprise if the company decided that the best way to compete with bad news was to generate some distracting good news that just never seemed to materialize after people’s attention spans moved on.
If your attention span wanes after a few years, it might have slipped your mind that Microsoft demonstrated a blockbuster new version of Windows back in 2003, over a half-decade ago. Tomorrow’s Windows 7 still pales in comparison to the vapor Microsoft issued at the time, as this PDC2003 video highlights. Special highlight: a Pink song that edits out lyrics about “kicking ass.”
Vaporware is exactly Microsoft’s core competency as a company. I hope I’m wrong, and the million Sidekick users who depended upon Microsoft get their data back. But this weasel-worded announcement, issued nearly two weeks after the initial problem, suggests the possibility that the company primarily hopes to provide a pat answer for Windows Enthusiasts to use when denying that there was ever a problem.
Ho wrote, “We will work with T-Mobile to post the next update on data restoration timing no later than Saturday.” That’s a status update two weeks after the problem first appeared. And it’s not a recovery due date, just a progress update. Who, two weeks after losing their contacts and other data, isn’t going to just move on and scrounge together their information from other sources?
If Microsoft strings along users long enough, it will be able to pat itself on the back with a “mission accomplished” even if it ultimately never actually delivered anything. It’s like saying you’ll call somebody back after a date and then just waiting until they figure out that you’re not really interested. After two weeks, the party on the other end begins blaming itself for waiting around.
Is it real or is it Microsoft?
If Microsoft can deliver even most of most users’ data, it will be a relief to those users who relied upon it, but it won’t erase the reason why those users lost their data in the first place. The available evidence says this was not because of some unfortunate, unforeseen accident involving a very complex situation, but because Microsoft management decided to play fast and loose with users’ data just to save time and money. That’s pretty outrageous.
If instead it turns out that this latest announcement is just a public relations stunt designed to deflect criticism away from the company until observers decide that the personal pictures and contacts of a bunch of Sidekick-bearing kids weren’t really anything that mattered too much anyway, then Microsoft’s management failure and all associated lessons pertaining to partnering with the company will simply be erased like so much Danger data. This is beyond outrageous.
And really, the fact that Microsoft is officially trying to associate its datacenter problems with Sun hardware or Oracle software is additional evidence that the company is irresponsible and disingenuous on a professional level. If the company has actually found a way to recover users’ data, it will mean that, despite all sorts of incompetence on Microsoft’s part, a solution built using competitive technology from Sun and Oracle is pretty damned resilient. But Microsoft won’t tell you that of course. It’s too busy suggesting the fearsome idea that products from competing vendors might introduce uncertainty and risk, the exact opposite of the truth.
Further, with this announcement, even if the company has no real data to recover, it will have erected a plausible story for denying anything significant ever happened. Know somebody who actually lost their important Sidekick data? You’ll be able to write them off as “one of the few who didn’t benefit from Microsoft’s miraculous data recovery.” It will be their word against Microsoft’s PR. Nobody will have records of who was impacted and whose data was recovered apart from Microsoft and probably T-Mobile, and the provider will likely have its records sealed by court order when it gets its big SLA settlement from Microsoft.
This all happened before
If this sounds like a conspiracy theory, take the day off and read through the archives of one of the monopoly trials against Microsoft. The company has regularly pursued a combination of criminal and incompetent activities that it was regularly able to later hide underneath big settlements, except for the occasions where court documents have been made public. They’re all quite damning.
There are plenty of examples of Microsoft’s criminal past being sealed up behind secret settlement payouts:
- A few years ago Microsoft settled with IBM in an antitrust suit involving OS/2 and IBM’s Lotus SmartSuite applications to the tune of $775 million.
- Microsoft paid Novell $536 million to settle its antitrust suit over the NetWare operating system, and Microsoft is still being sued by Novell over claims related to WordPerfect.
- Microsoft paid Palm over $23 million to settle an antitrust suit over the unfinished BeOS.
- Microsoft settled with Sun in an agreement that included $700 million in antitrust and $900 million in patent infringements, both related to Java.
- Microsoft paid AOL $750 million to settle the antitrust suit over Netscape.
- And of course, during the 1997 return of Steve Jobs, Apple settled its San Francisco Canyon case which involved the outright theft of QuickTime code by Microsoft, as part of a secret settlement that involved a number of issues ranging from Office and Internet Explorer for Mac, to a cross licensing agreement, to a truce that prevented Apple from working on voice technologies.
The common thread that runs through the entire, multifaceted Pink/Danger imbroglio is that Microsoft’s management is criminally inept. But rather than being chastised to the point where real changes are made, it appears that the Microsoft-serving tech media is primarily concerned with moving past this issue so that the ineptitude can continue without causing any sort of unpleasant shakeup.
Which is exactly why dinosaurs die and empires collapse.