How Microsoft makes its own WLAN secure

by Guy Kewney | posted on 19 November 2002


Microsoft was one of the first large corporations to enable its entire corporate LAN over wireless, starting in 1999 with a rash promise by Bill Gates at Comdex. Three years later, says security chief John Biccum, they have something which is actually quite secure - more secure than the wired LAN. He told delegates at IT Forum how he did it.

John Biccum has been "the most unpopular man on campus" at Microsoft. "I didn't want to be the guy who told Bill Gates that it couldn't be done," he explained, "when he said that Microsoft would have the world's largest wireless network within a year." And so he was the man who had to make it work.

That involved making it not work, too - much to the rage of people who wanted to use the wireless network. John was the man who "switched off" 20,000 Pocket PC users by introducing security to his WLAN. Why?

"There was an executive call to action with Bill's speech," said Biccum at the Forum in Copenhagen today, "but there was also the issue of productivity. Microsoft employees are very mobile, and I'm pretty typical in having an office where I'm rarely to be found. So we had to have a network; but when we first put it up, it was very, very insecure."

The problem was the static WEP key that had been adopted. Every access point and every client wireless adapter had the same static WEP key burned into it. It provided some security and at the time, WEP attack methods were a vision, not a reality. But the problem with a single secret key is that it was a secret. "And as Blackbeard said - any two people can keep a secret, if one is dead. Having 50,000 network cards with the same 'secret' burned into them? It was only a matter of time before it was posted on the Internet - and for some weeks, part of my job was looking on the Internet to see if it was posted."

And the idea of switching it was just a dream. "Reality is that if you have 3,500 access points, you can't just say 'On Monday we will switch keys!'"

"We can't keep count!"

The scale of the Microsoft WLAN is daunting. "Our deployment has about 3,500 access points, and 30,000 plus client adapters - we can't keep count. It's a lot of plus - the last number was 36,000; but after Pocket PCs came back on, it got to be a whole lot more. All internal laptops today inside Microsoft come with integrated WiFi. We have 72 buildings in Puget Sound; another 46 sites in America. The EMEA region has 41 sites; and there are another 23 in Far East."

And there really wasn't a secure way of getting the key to the clients. You ordered the card and the delivery people burned the key on. The secret was bound to leak out eventually (although, in fact, it never did before it was withdrawn).

"We looked at MAC address filter as an alternative, and it didn't scale. The problem was, if each of the access points was going to have a list of 50,000 client MACs and match it, there wouldn't be any memory left for it to do its normal functions. Also we couldn't associate a MAC address with a user name; all our network permissions are based on who you are, not what computer you use. And besides, we also felt that users would not be able to report a card if it were stolen. How many of you know the MAC address of your wireless card? - I don't have that information committed to memory."

But the matter was becoming urgent. "AirSnort and WEPCrack were still future developments, but we saw academic papers and it was clear that they were vulnerable."

The security department could have decided to cut it loose and make people VPN back into corporate - so it was just an insecure net. That was one option considered. They also considered LEAP or RSA Secure IT or PEAP. Or other things. The trouble was, there was always a trade off between usability, security, and deployment simplicity. "For example, raw 802.11b and a static key is very high on usability. Security level is dreadful. If you go to VPN; security level would be OK; but not terrific. Ease of deployment was OK, but scalability was hard. Integration was low. IPSEC is very secure, but boy, it's a pain to deploy. Each server needs to maintain an association with each client accessing the server; there are resource issues if the server is handling a lot of clients."

So they decided to go for the new IEEE security protocol: 802.1x with transport layer security. It had high security and an excellent user experience; the only down side was that it was not the easiest to deploy. "But we reckoned what the heck, we have the best IT staff ... "

The VPN solution was rejected despite the fact that it's very popular. "A lot of people do it because they already have the VPN server out there; they already have users used to VPN. The down side is you need to RAS in and use a RAS login. You don't get good policy. We manage the environment with group policy; updating virus signatures. So the idea of not getting group policy objects (GPOs) on our wireless clients was scary. Then on top of that, the servers have to be able to handle all the traffic; it's like a funnel to the VPN server, and so we were seeing pretty severe bottlenecks; we didn't like it."

"Throwing users to the wolves"

But the final straw was the security flaw. "It scared me from the security standpoint because it was like throwing your users to the wolves. The wireless network was untrusted; they'd be out there, logging into it without any protection for their devices, completely vulnerable to people attacking them."

LEAP was rejected because it was judged vulnerable to a dictionary attack: you log on in the clear for authentication, which means the CHAP packets can be sniffed and cracked offline.

IPSEC was rejected because it offers machine-level authentication only. "Our permissions, again, are based on who you are. We really didn't like the idea that the users might be the wrong people, when all we knew was whether they had the right machine or not. And finally, the certificate management system was unwieldy and processor intensive."

The only thing wrong with 802.1x security was that it was a bit difficult to deploy. "But it scales. It is a simple user experience and it uses an existing public key infrastructure. Once a client is authenticated, it's able to talk on a control port to get DHCP. Once you have a user logged on, we have to go through the authentication process again with user context; if the user fails, we drop him off the network."
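
The two-stage flow described here (the machine authenticates first, then the user) can be sketched as a small state machine. This is an illustration only: the class, method and state names are hypothetical, and the boolean arguments stand in for the outcome of a real certificate exchange with the Radius server.

```python
from enum import Enum


class PortState(Enum):
    BLOCKED = "blocked"            # only authentication traffic is allowed
    MACHINE_OPEN = "machine_open"  # machine authenticated: DHCP, group policy
    USER_OPEN = "user_open"        # user authenticated as well


class ClientPort:
    """Hypothetical model of one client's logical port on an access point."""

    def __init__(self):
        self.state = PortState.BLOCKED

    def machine_auth(self, cert_ok: bool) -> PortState:
        # At boot the machine presents its certificate.
        self.state = PortState.MACHINE_OPEN if cert_ok else PortState.BLOCKED
        return self.state

    def user_auth(self, cert_ok: bool) -> PortState:
        # At logon the exchange is repeated in the user's context.
        if self.state is PortState.BLOCKED:
            return self.state  # no machine authentication to build on
        # A failed user authentication drops the client off the network.
        self.state = PortState.USER_OPEN if cert_ok else PortState.BLOCKED
        return self.state
```

The point of the ordering is that a machine can pick up its address and policy before anyone logs on, while a failed user logon still closes the port.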

The big advantage of 802.1x with EAP/TLS is that the only machines needing certificates are client machines. So this truly delivers on the "Ethernet-like" user experience. "When I boot up, machine just works. Because the machine authenticates at the beginning, you get group policy (GPO). There are no network bottlenecks. The Radius server is doing only authentication; it's a very light-weight protocol, so not much traffic."

And it's not vendor proprietary. This sounds odd from a Microsoft employee, but remember, Biccum is an IT geek, not a product manager. That was important; "We envisaged visitors coming to the campus, and they might not be running Microsoft platforms."

Least popular man on campus

But when this secure system first rolled out, it didn't support non-Microsoft clients. In fact, it didn't support that many Microsoft clients; the only client was the XP client. That was OK for Microsoft; it could just tell users that if they wanted to use wireless, they had to upgrade and quit complaining. But the problem was the 20,000 users who couldn't upgrade!

"I became the least popular man on the campus with PocketPC users. There were 20,000 of them used every day. They tended to be quite rabid about how this made their day productive. When we cut over to 1x, they were off the network. Just like that. Instantly, they all sent me email saying what a jerk I was!"

Complainers were advised to tell the PocketPC software development team to accelerate the 802.1x PocketPC code. It became one of the fastest development projects in Redmond, as a result.

"So what we're doing: we now have clients for XP and PocketPC; Win2K or .Net Active Directory; and we're using our own flavour of Radius - IAS. We're using EAP - extensible authentication protocol, and PEAP, protected extensible authentication. Client access at the link layer is controlled by the access port. The client goes through IAS server; it authenticates him, tells the access port to open up the control port. We still use WEP keys, but they are dynamic; each client has a WEP key with the AP, and none in common with anything else. Every time they roam, they re-authenticate and pick a new WEP key."

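The dynamic-key scheme described above, in which each client gets its own key and picks a new one whenever it roams, can be sketched roughly as follows. This is a sketch under stated assumptions: the `AccessPoint` class is hypothetical, the 13-byte key length is an assumption, and in a real deployment the key material comes out of the authentication exchange rather than being generated by the access point itself.

```python
import secrets
from typing import Optional

WEP_KEY_BYTES = 13  # assumed 104-bit key; the article does not give a length


class AccessPoint:
    """Hypothetical AP holding one unicast key per associated client."""

    def __init__(self, name: str):
        self.name = name
        self.session_keys = {}  # client MAC -> current key

    def associate(self, client_mac: str, authenticated: bool) -> Optional[bytes]:
        # The port stays closed unless the Radius server accepted the client.
        if not authenticated:
            self.session_keys.pop(client_mac, None)
            return None
        # A fresh random key for every association, shared with nothing else.
        key = secrets.token_bytes(WEP_KEY_BYTES)
        self.session_keys[client_mac] = key
        return key
```

Because every association produces a new key, a client that roams from one access point to another never carries a key with it, and no two clients hold the same one.
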
Other problems were mostly in the area of troubleshooting: "That was a problem. There were no good tools; if a user didn't get on, it was quite difficult finding out why they weren't on. There were also huge problems with RF troubleshooting. Again, we found there were no tools. Co-channel interference was awful - there are only three discrete channels that don't overlap in the US, but even then, overlap occurred because we had problems with ad hoc networks. People would tell their notebook it was in ad hoc mode, but use the same SSID as the corporate network; so other people would log on and associate with it. Then we'd get the support call ... "

Radius failover was the next big issue. "This made me nearly the least popular person again. The problem was that if a Radius server would not authenticate a user in a certain time (whatever reason, including bad certificate) then the AP would failover - it would go and talk to another Radius server. But it would never fail back. So as soon as it got onto the second failover, it couldn't have any more failover. We couldn't get them to fail back! - and the problem couldn't be fixed until our AP maker delivered new code."
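
The missing behaviour was fail-back. A minimal sketch of a server selector that fails over on a timeout and returns to the primary after a hold-down period might look like this. The class name, the server names in the usage note and the 300-second hold-down are all assumptions for illustration; this is not the AP vendor's actual fix.

```python
class RadiusSelector:
    """Hypothetical server picker for an AP, with fail-back to the primary."""

    def __init__(self, servers, retry_primary_after=300.0):
        self.servers = list(servers)            # ordered by preference
        self.retry_after = retry_primary_after  # seconds; an assumed value
        self.active = 0                         # index of the server in use
        self.failed_over_at = None

    def current(self, now: float) -> str:
        # Fail back: once the hold-down has elapsed, try the primary again.
        if self.active != 0 and now - self.failed_over_at >= self.retry_after:
            self.active = 0
            self.failed_over_at = None
        return self.servers[self.active]

    def report_timeout(self, now: float) -> None:
        # Fail over: move to the next server, wrapping round the list.
        self.active = (self.active + 1) % len(self.servers)
        self.failed_over_at = now if self.active != 0 else None
```

With two servers named `ias1` and `ias2`, a timeout reported at 10 seconds moves the AP to `ias2`; any call to `current()` 300 seconds or more after that returns `ias1` again.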

They have multiple certificate distribution points; so in that sense there's redundancy. But when it comes to publishing a CRL (certificate revocation list) it normally has to be from CA number one. If it wasn't available, it failed; a good thing, but a dreadful thing for the user! "The trick there is to make sure that the CRL you publish is available for a longer period of time than the publication; if you publish daily, make sure it's good for two days; and make sure that you can keep track of CRL publications so that if it isn't published, you know immediately."
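
The two checks in that advice can be written down in a few lines. This is only a sketch of the rule of thumb; the function names are hypothetical and nothing here talks to a real certificate authority.

```python
from datetime import datetime, timedelta


def crl_overlap_ok(publish_every: timedelta, valid_for: timedelta) -> bool:
    # The CRL must outlive the gap between publications, so that one missed
    # run does not leave clients holding an expired list.
    return valid_for > publish_every


def publication_missed(last_published: datetime, publish_every: timedelta,
                       now: datetime) -> bool:
    # True as soon as the next scheduled publication is overdue.
    return now > last_published + publish_every
```

Publishing daily with a two-day validity passes the first check; publishing daily with a one-day validity does not, because a single late run would leave every client with an expired list.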

Another big problem: Rogue APs! "Our wireless network is considerably more secure than a wired network; users are authenticated before they are allowed to transmit a single packet. Then an employee buys a home AP (we sell them in employee store!) - and plugs it into the switch. Then it starts issuing DHCP leases. Think of the rogue access point as a 20-ft wide hole in the barbed wire fence around the LAN ... "

Controlling rogues needs to be a multi-pronged procedure. First, users need to know it's prohibited and second, there has to be user-friendly technology control to detect it and get it off. "We have an internal web page, which hosts the security policy. Technology control is a home grown app that scans our wired IP states, looking for MAC addresses associated with APs and also, wireless client hardware. When it finds any, it opens up a help desk ticket, closes it, and then disables the network path. When the user complains, the help desk will pull up the ticket, saying: 'Ah yes; you have a rogue AP, and you have to talk to Security before we turn you back on.'"
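
The article gives no detail of how the home-grown scanner works, but the matching step it describes, spotting MAC addresses that belong to wireless hardware, could be sketched like this. Everything here is an assumption for illustration: the vendor prefixes are made up, the function name is hypothetical, and the ticketing and port-disabling steps are left out.

```python
# Hypothetical vendor prefixes (first three bytes of the MAC address).
# A real list would come from the hardware vendors being watched for.
WIRELESS_OUIS = {"00:aa:01", "00:aa:02"}


def find_rogues(seen, sanctioned):
    """Return (switch_port, mac) pairs for unsanctioned wireless hardware.

    seen: iterable of (switch_port, mac) pairs observed on the wired LAN.
    sanctioned: set of MACs belonging to the official access points.
    """
    sanctioned = {m.lower() for m in sanctioned}
    rogues = []
    for port, mac in seen:
        mac = mac.lower()
        # Flag wireless-vendor hardware that is not on the official list.
        if mac[:8] in WIRELESS_OUIS and mac not in sanctioned:
            rogues.append((port, mac))
    return rogues
```

Each pair returned identifies the switch port to disable and the address to record on the help desk ticket.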

Biccum has a few other homilies he can run off. "We mistakenly thought that if we stuck to one brand AP and client then everything would all work together. Wrong! We had more problems with one client interworking with its own AP, than all others against each other. We had to drop that manufacturer."

They're lying ...

Another misapprehension is how many APs you need. "Scalability is a trap; we scale for 3-5 users per AP. If you really believe one AP maker who says 50 users per access point works, then be warned; they are lying. It's too many. You don't want to go over 20 users per AP or they will be really, really unhappy with the experience. Put lots of APs in, turn down the power."
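
To see what those ratios mean at this scale, here is a back-of-envelope sketch using the figures quoted earlier (about 36,000 client adapters and 3,500 access points). It assumes, unrealistically, that every adapter is active at once, so the results are upper bounds rather than a sizing method; the function name is hypothetical.

```python
import math


def aps_needed(concurrent_users: int, users_per_ap: int) -> int:
    # Round up: a partly loaded access point is still an access point.
    return math.ceil(concurrent_users / users_per_ap)


CLIENTS = 36_000  # adapter count quoted in the article

at_vendor_claim = aps_needed(CLIENTS, 50)    # 720 APs
at_upper_limit = aps_needed(CLIENTS, 20)     # 1,800 APs
at_design_target = aps_needed(CLIENTS, 5)    # 7,200 APs
```

The deployed ratio, 36,000 adapters over 3,500 access points, is roughly ten adapters per AP, which falls between the design target and the 20-user ceiling once you allow for adapters that are not in use at the same moment.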

The WLAN deals with guests at the moment by two different methods. In the Exec Briefing centre - a convention centre, essentially - any person there can open up a notebook and get an address, and they're on the Internet. There is no authentication at all, no encryption. That access gets them on the Internet, and then they can VPN back into their own company.

This isn't ideal from a security standpoint; "They are coming out of space owned by MS. So if a guest did something bad, hacked on Norad or FBI or DoS attack against White House, it would look bad. So I have given people a security requirement that they need to individually authenticate guests to a facility, and note who has what IP address."

Right now, the access point firmware allows only a single decision; authenticated, or not. Soon, there will be a new version of firmware. "That allows us to do gating; we create VLANs on the fly. We hope to create two VLANs on the same physical LAN. One would be 1x security with only 1x people passing - and the second would be visitors who passed some other test, who would get put onto the Internet."
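
The planned gating replaces a yes/no decision with a three-way one. A minimal sketch of that decision follows; the function and the VLAN labels are hypothetical, not the firmware's own identifiers.

```python
from typing import Optional


def assign_vlan(passed_8021x: bool, passed_guest_check: bool) -> Optional[str]:
    # Staff who pass 802.1x authentication go onto the corporate VLAN.
    if passed_8021x:
        return "corporate"
    # Visitors who pass some other test get Internet access only.
    if passed_guest_check:
        return "internet-only"
    # Anyone else gets no access at all.
    return None
```

The current firmware corresponds to the first branch alone: a client is either authenticated or refused.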

Here's how they're thinking of doing that; "You come to our facility, check in, we look at the roster, you show photo ID. That already happens for all visitors. We want to leverage that procedure and ask 'Do you have Passport or Hotmail? Would you like Internet access?' and if they say yes, we ask for that, and give Internet access. If you don't, we point you at a kiosk in the lobby and you get one. We would use our own Radius servers to proxy the request."