Most people use whatever the canonical file-read suggestion for their language is, until they need to read large files and it’s too slow. Then they google “efficiently reading large files” in their language of choice.
However, in Halvar’s recent QCon talk he had several slides on how most code is written around the old assumptions of spinning disks. With non-SSD hard drives there’s usually a single read head and you can’t do much in parallel. That pushes code to optimise for single sequential reads, minimal seeks, and large read-ahead of data laid out contiguously on disk. But modern SSDs are much more comfortable with seeks and parallelism.
So I wanted to test it. To do this I wrote five simple rust programs that read data from a large file. To keep it simple, I didn’t do any line reading - just read as much as you can as fast as you can.
The code for each of these is available here.
Vanilla is the simplest and based on what you get when you google “reading a file in rust” which points you to [this chapter] in the rust handbook.
It tries to read the whole file, and convert it into a single String in memory.
IO read dispenses with the String conversion and does the same as vanilla but with a raw read into a single byte buffer.
Both (1) and (2) will fail if the file you’re trying to read can’t fit into memory.
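As a minimal sketch of approaches (1) and (2) — this is my own illustration, not the benchmark code from the repo, and the function and file names are made up:

```rust
use std::fs;

// (1) Vanilla: whole file into a String (allocates and validates UTF-8).
fn read_vanilla(path: &str) -> std::io::Result<String> {
    fs::read_to_string(path)
}

// (2) IO read: whole file into a raw byte buffer, no UTF-8 work.
fn read_raw(path: &str) -> std::io::Result<Vec<u8>> {
    fs::read(path)
}

fn main() -> std::io::Result<()> {
    // Small stand-in file so the sketch is self-contained.
    fs::write("sample_vanilla.txt", b"hello, disk")?;
    let s = read_vanilla("sample_vanilla.txt")?;
    let b = read_raw("sample_vanilla.txt")?;
    assert_eq!(s.as_bytes(), &b[..]); // same bytes, minus the String overhead
    fs::remove_file("sample_vanilla.txt")
}
```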
Block read is a modification of (2) to read the file in 8M blocks instead of trying to read the whole file into memory.
The 8M block size is based on some simple tests I did on my machine.
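A sketch of what approach (3) looks like — again my own minimal version, with a hypothetical `block_read` function (the benchmark would pass 8 * 1024 * 1024 as the block size):

```rust
use std::fs::File;
use std::io::Read;

// Read the file in fixed-size blocks: memory use stays at one block no
// matter how big the file is.
fn block_read(path: &str, block_size: usize) -> std::io::Result<u64> {
    let mut f = File::open(path)?;
    let mut buf = vec![0u8; block_size];
    let mut total = 0u64;
    loop {
        let n = f.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        // process buf[..n] here
        total += n as u64;
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    std::fs::write("sample_block.bin", vec![0u8; 20 * 1024])?;
    assert_eq!(block_read("sample_block.bin", 8 * 1024)?, 20 * 1024);
    std::fs::remove_file("sample_block.bin")
}
```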
If you google “efficiently reading large files in rust” you’ll likely hit an article pointing you to BufReader. The most common use case is to read lines. This is a slight modification to do block reads instead, keeping it consistent with the other approaches.
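A sketch of the BufReader variant (my own minimal version, not the repo code): `fill_buf()`/`consume()` hands us each buffered chunk without copying it into a second buffer of our own.

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

// Let BufReader do the block buffering for us.
fn buffered_read(path: &str) -> std::io::Result<u64> {
    let mut reader = BufReader::with_capacity(8 * 1024 * 1024, File::open(path)?);
    let mut total = 0u64;
    loop {
        let n = reader.fill_buf()?.len();
        if n == 0 {
            break; // EOF
        }
        // process the chunk here, then tell the reader we're done with it
        reader.consume(n);
        total += n as u64;
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    std::fs::write("sample_buf.bin", vec![0u8; 12_345])?;
    assert_eq!(buffered_read("sample_buf.bin")?, 12_345);
    std::fs::remove_file("sample_buf.bin")
}
```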
Finally, this is a threaded version of (3) where each thread simultaneously opens its own file handle, seeks to its offset and reads a part of the file.
This used to be a “bad idea” - multiple concurrent seeks, and concurrent reads would be slow on spinning disks.
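The threaded version can be sketched like this (a simplified std-only illustration, not the actual benchmark code): each thread gets its own `File` handle, seeks to its range and reads only that slice.

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};
use std::thread;

// Split the file into `threads` ranges and read them concurrently.
fn parallel_read(path: &str, threads: u64) -> std::io::Result<u64> {
    let len = std::fs::metadata(path)?.len();
    let chunk = (len + threads - 1) / threads; // ceiling division
    let handles: Vec<_> = (0..threads)
        .map(|i| {
            let path = path.to_string();
            thread::spawn(move || -> std::io::Result<u64> {
                let (start, end) = (i * chunk, len.min((i + 1) * chunk));
                if start >= end {
                    return Ok(0); // nothing left for this thread
                }
                let mut f = File::open(&path)?; // own handle per thread
                f.seek(SeekFrom::Start(start))?;
                let mut buf = vec![0u8; (end - start) as usize];
                f.read_exact(&mut buf)?; // process buf here
                Ok(buf.len() as u64)
            })
        })
        .collect();
    let mut total = 0;
    for h in handles {
        total += h.join().unwrap()?;
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    std::fs::write("sample_par.bin", vec![0u8; 100_000])?;
    assert_eq!(parallel_read("sample_par.bin", 4)?, 100_000);
    std::fs::remove_file("sample_par.bin")
}
```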
I’m quite simply measuring total execution time of each version when reading a 5G file. I do this using the fantastic hyperfine tool.
I run each test three times to warm up caches, then I do five measured runs. The tests were run on my 2021 MBP with an M1 chip.
Hyperfine gives the mean of the five runs with standard deviation, as well as a min and max. Finally it gives some stats comparing each run.
Here are the results of the run. As you can see, the vanilla approach is horribly slow: over 12x slower than the best approach. The IO reader is slightly faster, but not by much, because it doesn’t need to mess with String allocations/conversions. There’s a significant speedup reading blocks, and the buffered reader can do this for you and is even very slightly faster than doing it manually. But when we switch to concurrent reads, we get a significant speedup: nearly 3x faster than the buffered reader.
In short, Halvar was right, which isn’t a very controversial statement. However, I was genuinely surprised to see how big a difference it made, and that there’s little to no discussion on the topic. I hope this helps someone somewhere.
If you’re only interested in the results, here they are, under a variety of scenarios against hashcat, and you’ll see it ranges from waaay faster to much faster than hashcat. You can get the code at https://github.com/sensepost/ntcrack/.
Click for the bigger image. (If you’re wondering why there’s no hashcat run for the full rockyou hashes as input, it’s because it takes about 10m with hashcat.)
I optimised for total running time, and got some good gains over hashcat from a faster startup. But even if we look at raw hashes generated per second, when cracking the 143 test hashes against the 1G sized insidepro wordlist, hashcat gets 26 205 kH/s while ntcrack gets 40 207 kH/s.
In this post, I’ll go through what I did that worked, and didn’t work to get this result.
The first response to this sort of work and comparison always seems to be to suggest shifting the goalposts to a different comparison, often driven by a belief that this is some sort of fight, so let me get all the caveats out of the way.
Hashcat is amazing, not just the tool, the project and the community around it is too. In terms of total functionality, hashcat thrashes my little project into a little whimpering mess. They support a bajillion different hashes, and a bajillion different ways to crack them. For NT hashes and other fast hashes, they have a ton of rules and other manipulations that can be used. In fact, the second you throw a simple mutation to brute force an extra ASCII byte (?a) on the end of each word (-a6) in the wordlist, hashcat hits 900+ MH/s, which also thrashes the hashes per second, *and* total running time of ntcrack.
So this isn’t a “hashcat bad, me good” post.
ntcrack is a simple rust program, weighing in at around 150 lines of code. It runs multi-threaded on CPU only, no GPU. It reads a list of input hashes to crack from a file, and a wordlist to check the hashes against from stdin. So you run it like this:
./ntcrack input.hashes < wordlist
It’s rough and ready right now: no error handling, it expects a wordlist with unix line breaks, you can’t specify the number of threads, and you can’t even pipe the wordlist in (it has to be redirected). I’ll get to that … maybe … pull requests welcome.
I’ve commented the code so you can see what I did to speed things up, but it doesn’t really show what the alternatives were, and which of those I tried. In the next section I’ll go through each of them, in roughly descending order of impact, i.e. I’ll start with what made stuff go the fastest, not the order I actually built things in.
Multi-threading is an obvious way to speed something up. Especially for large scale brute forcing like this, we should just be able to parallelise the tasks and get an instant speed up. But it doesn’t always work like that.
The main problem with password cracking is first to write an optimised hash generator, and second to feed it from the wordlist fast enough. So if you reach for threading too soon, you’ll either end up with an inefficient hash generator that threading hides a little, or you’ll end up constrained waiting for data to be read from the file.
As I spent a lot of time working on getting the hashing fast first, by the time I got to threading I didn’t have that problem, but I did have the other … simple threading made stuff *slower* because the threads sat around waiting for things to be read and fed to them.
Threading in rust is hard. If you follow the “Rust by Example” guide (https://doc.rust-lang.org/rust-by-example/std_misc/channels.html) you quickly run into a protracted battle with the borrow checker. Steve Klabnik has a perfect write-up of why here (https://news.ycombinator.com/item?id=24992747). Because rust’s borrow checker stops you from being a bonehead, it also makes it very hard to share data between threads (I’m not even talking about writes here). I tried his suggestion in the end, scoped threads, and it made things slower due to the read problem. I then tried the Rayon crate (https://docs.rs/rayon/) and its par_iter(), which made things less slow than scoped threads but still slower than no threads. So I decided to build my own raw threading approach.
I moved all the logic for generating the hash and comparing it to a thread, and kept as little in the read_line_from_file loop as possible, to maximise read speed (i.e. read and send to the thread, then read the next line). I also used a multiple receiver single sender channel from the crossbeam crate (https://docs.rs/crossbeam/latest/crossbeam/channel/index.html) to implement a sort of queue that the threads could pick work from as fast as I could read it. I used crossbeam because the standard channel (https://doc.rust-lang.org/std/sync/mpsc/) is a multi-producer, single-consumer, which is the opposite of what I needed.
The big stumbling block was that, as far as the compiler can tell, what was read from the file could go out of scope while the threads live on. Rust’s compiler isn’t smart enough to spot that we wait for all the threads to exit at the end, so you instead have to give each thread its own owned copy of the line, which means an expensive allocation. So I buffered a bunch of words into a single array (Vec) and sent a whole buffer to a thread to work through, to reduce both the alloc()s and the number of messages that needed to be sent and picked up over the channel.
Lastly, I wanted to be able to exit early if all the hashes supplied had been cracked; don’t waste time reading the rest of the file and generating hashes for nothing. This is why the first item in the test screenshot at the top of this post is so fast. But each thread doesn’t know what the other threads have cracked, and introducing a shared, writable list was going to cause more blocking than it’s worth. So instead each thread sends back the number of hashes it has cracked, and the main program checks if the total matches. That needed some batching too, as sending a message for every cracked hash introduced a significant slowdown on large input hash lists, so I buffer the counts and send them through in chunks.
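The overall shape can be sketched with std only (ntcrack uses crossbeam; here the single Receiver is shared behind a Mutex to emulate crossbeam’s multi-consumer queue, which crossbeam does without the lock contention). The hash-and-compare work is stubbed out with a cheap predicate, and all names here are my own:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// std's channel is multi-producer/single-consumer -- the opposite of what
// the design needs -- so this sketch shares the one Receiver to emulate an
// MPMC work queue.
fn crack_batched(batches: Vec<Vec<String>>, workers: usize) -> usize {
    let (work_tx, work_rx) = mpsc::channel::<Vec<String>>();
    let work_rx = Arc::new(Mutex::new(work_rx));
    let (done_tx, done_rx) = mpsc::channel::<usize>();

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let work_rx = Arc::clone(&work_rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                // Pull a whole batch of words off the shared queue.
                let batch = match work_rx.lock().unwrap().recv() {
                    Ok(b) => b,
                    Err(_) => break, // queue closed and drained
                };
                // Stand-in for "NT-hash each word and look it up".
                let cracked = batch.iter().filter(|w| w.starts_with('p')).count();
                // Report counts in one message per batch, not per hash.
                done_tx.send(cracked).unwrap();
            })
        })
        .collect();
    drop(done_tx); // keep only the workers' clones alive

    // The read loop stays minimal: just send batches as fast as possible.
    for batch in batches {
        work_tx.send(batch).unwrap();
    }
    drop(work_tx); // closing the queue lets the workers exit

    let total = done_rx.iter().sum();
    for h in handles {
        h.join().unwrap();
    }
    total
}

fn main() {
    let batches = vec![
        vec!["password".to_string(), "letmein".to_string()],
        vec!["p4ssw0rd".to_string()],
    ];
    assert_eq!(crack_batched(batches, 4), 2); // two words start with 'p'
}
```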
You would have seen multiple references to file read speed above. That’s because with a fast hash like an NT hash, you’re likely to get large wordlists thrown at the input hashes (and less likely to get large input hash lists), so the thing that needs to be optimised the most is the file read speed.
For this I tried numerous options. The first was a vanilla lines() iterator which is what the “Rust by Example” documentation suggests (https://doc.rust-lang.org/stable/rust-by-example/std_misc/file/read_lines.html). This is very slow, primarily because it allocates a new String for each line, or so my bad reading of perf data tells me.
I then tried a few different versions of implementing my own line reader, all of which worked out either slower or only marginally faster, until I was pointed to ripline (https://twitter.com/killchain/status/1482770333958553603 and https://github.com/sstadick/ripline). Ripline takes its implementation from ripgrep, and has a few different ways of reading from a file. The one I was most interested in was its use of the mmap() call (https://www.man7.org/linux/man-pages/man2/mmap.2.html), which in their benchmarking was the fastest way to read from a file and still get it line by line.
I of course tried several variations, including my own mmap reader, using mmap with other line readers etc., but ripline gave me the fastest iterator over a mmap’ed file. I also noticed that it was marginally faster getting the file from stdin rather than a filename. But ripline + mmap2 worked the best. The only downside is that it breaks DTrace profiling (https://gist.github.com/singe/70010e2f48a7ad8fdcbab177eeb9b18a).

You’d think the most expensive part of cracking a single hash would be generating the candidate hash, but it’s not: it’s finding out whether the candidate hash you generated is in the list of input hashes you provided. If you only provide one input hash, that isn’t a problem, but if you provide thousands or hundreds of thousands, you have to look up every candidate hash against this list. To take it a bit further, if you have 10 input hashes, the average linear search through that list finds a hash in 5 attempts. So if your wordlist generates 100 candidate hashes, now you’re doing 100*5 = 500 comparisons. But, given that the majority of hashes you generate *won’t* be in your input list, the performance is actually much worse, since a miss has to scan all 10 entries.
My first attempt was to use some sort of balanced tree. Rust has some built in BTree functionality (https://doc.rust-lang.org/std/collections/struct.BTreeMap.html) and I used that (BTreeSet). This gave a bit of a speedup for larger input hash lists. However, it wasn’t what I hoped. I experimented with removing items from the set to speed up future lookups and allow an early exit if we cracked everything, but it still wasn’t what I hoped.
Then a friend pointed out I could just use a hash table (https://doc.rust-lang.org/std/collections/hash_map/struct.HashMap.html) because it gives an expected O(1) cost for each lookup, rather than a BTree’s O(log n). That worked well and gave a bit of a speed up.
But what really made the impact, was to switch the HashMap’s hasher function to the NoHashHasher (https://docs.rs/nohash-hasher/0.2.0/nohash_hasher/index.html), a hasher specifically designed for already hashed data, which a list of NT hashes is! With that in place, I got a great combined speed boost when looking up whether a hash generated from a word in the wordlist matched any of the input hashes provided.
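The same idea can be sketched with std only by writing a pass-through hasher (nohash-hasher does this properly; this is my own simplified version, with the simplifying assumption that we key on the first 8 bytes of each 16-byte NT hash):

```rust
use std::collections::HashSet;
use std::hash::{BuildHasherDefault, Hasher};

// The lookup keys are NT hashes, i.e. already uniformly distributed, so
// running them through SipHash again is wasted work. This hasher just
// passes the u64 key straight through -- the idea behind nohash-hasher.
#[derive(Default)]
struct IdentityHasher(u64);

impl Hasher for IdentityHasher {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write_u64(&mut self, n: u64) {
        self.0 = n;
    }
    fn write(&mut self, _bytes: &[u8]) {
        unimplemented!("this sketch only supports u64 keys");
    }
}

type PreHashedSet = HashSet<u64, BuildHasherDefault<IdentityHasher>>;

// Simplifying assumption: key on the first 8 bytes of each NT hash.
fn build_lookup(input_hashes: &[[u8; 16]]) -> PreHashedSet {
    input_hashes
        .iter()
        .map(|h| u64::from_le_bytes(h[..8].try_into().unwrap()))
        .collect()
}

fn main() {
    let set = build_lookup(&[[0xAA; 16]]);
    assert!(set.contains(&u64::from_le_bytes([0xAA; 8])));
    assert!(!set.contains(&0));
}
```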
Finally, I did one more thing. If only a single input hash is provided, it’s faster to check whether a candidate hash starts with the same byte as the hash we’re looking for, rather than comparing all 16 bytes of the two hashes. If the first bytes don’t match we can move on and skip the slightly more expensive HashMap lookup; if they do, the pre-check was a very small extra price to pay. This has the added advantage that, for small input hash lists, we can reduce the total HashMap lookups to the number of unique first bytes. Given there are only 256 possible byte values, it stops making sense for input hash lists much larger than that. The small number of possible single bytes means we can store them in an array of 256 items, and do a very fast lookup by using the byte as the index, e.g. if the hash starts with ‘AA’ then our boolean array[170] (170 is the decimal of hex AA) is set to true.
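That pre-filter is tiny to sketch (my own minimal version, hypothetical names):

```rust
// A 256-entry table indexed by the first byte of each input hash. One
// array read rejects most candidates before the pricier hash-table probe.
fn build_first_byte_filter(input_hashes: &[[u8; 16]]) -> [bool; 256] {
    let mut seen = [false; 256];
    for h in input_hashes {
        seen[h[0] as usize] = true; // a hash starting 0xAA sets seen[170]
    }
    seen
}

fn main() {
    let filter = build_first_byte_filter(&[[0xAA; 16]]);
    assert!(filter[0xAA]); // candidate starting 0xAA needs the full lookup
    assert!(!filter[0x00]); // anything else can be skipped outright
}
```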
Finally, we want to make sure the actual hash computation is efficient. An NT hash has two operations: encoding the text as UTF16LE, then MD4 hashing the result. The latter part turned out to be pretty easy. The Rust Crypto team (https://github.com/RustCrypto) has done a great job of building performant algorithms in rust, and MD4 is no exception (https://docs.rs/md4/latest/md4/). One small tweak was to do the digest in one call, rather than an update and then a finalise call as per their docs.
What took longer to get right, and I didn’t see coming, was the UTF16 encoding. At its most simple, UTF16 will just widen an ASCII character to two bytes instead of one by adding a NULL byte. The “LE” stands for little endian, which places the NULL after the ASCII byte when written out: an ASCII “A” is the byte 0x41, and a UTF16LE encoded “A” is the bytes 0x41 0x00. What hashcat does (did? https://github.com/hashcat/hashcat/commit/045701683430ce0c0a0c1545a637edf7b659a8f3) for speed and to avoid complexity in the GPU code, is to just stuff that NULL byte in, and assume it’s always an ASCII charset in use. I initially tried the same but ran into two problems. The first was that it required alloc()ing a whole new Vec for each candidate we encode, which becomes expensive. This was resolved by doing it per char in a map instead and reusing the resulting Vec. The more pernicious problem is that most wordlists don’t only have ASCII characters, and doing proper encoding matters if you deal with non-English hashes.
Rust forces you to be explicit about Strings by enforcing a UTF8 requirement for a String. That’s fine if your input file is guaranteed to be UTF8 encoded, but wordlists are often a mixed bag and might not all be in one encoding. So it makes more sense to read bytes from the file and not assume you’re reading UTF8. That means that to do “proper” UTF16 encoding you need to first convert the raw bytes to UTF8. After that, Rust has native UTF16 encoding (https://doc.rust-lang.org/std/primitive.char.html#method.encode_utf16), which can be converted to little endian bytes natively too (https://doc.rust-lang.org/std/primitive.u16.html#method.to_le_bytes). This works, and is ok speed wise. But in the end going unsafe and using align_to() worked much faster. I suspect at least half the speed up is from using unsafe and dropping some of the checks that brings.
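The safe encoding path described above can be sketched like this (my own minimal version, not ntcrack’s code; ntcrack’s fastest path replaces it with unsafe align_to()):

```rust
// Treat the raw wordlist bytes as UTF-8 (lossily, since wordlists are a
// mixed bag), widen each char to UTF-16, and emit little-endian bytes.
fn utf16le_bytes(word: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(word.len() * 2);
    for unit in String::from_utf8_lossy(word).encode_utf16() {
        out.extend_from_slice(&unit.to_le_bytes());
    }
    out
}

fn main() {
    // ASCII widens with the NULL after the character byte.
    assert_eq!(utf16le_bytes(b"A"), [0x41, 0x00]);
    // Non-ASCII still encodes correctly ('é' is U+00E9).
    assert_eq!(utf16le_bytes("é".as_bytes()), [0xE9, 0x00]);
}
```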
This almost always catches me. By default the println! macro is *slow* for writing large amounts of data to stdout: it allocs a String, calls a formatter and flushes the stream (I think). Doing a raw write of bytes to the file handle is much faster. Add in something I learned a few years ago when discussing perf optimisations with Atom (https://twitter.com/hashcat/status/1137335572970790912): use an output buffer. The combination of those two makes a massive speed difference over a basic println!. Then I went down the rabbit hole and squeezed out a few more milliseconds by using the write! macro instead of format! or similar to get a printable hex encoding of the resulting hash, doing it per byte instead of across multiple bytes at once, and using extend_from_slice() to add to the buffer rather than push() or append().
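A std-only sketch of that output path (function name, format and buffer size are my own choices, not ntcrack’s exact code):

```rust
use std::io::{self, BufWriter, Write};

// Raw writes into one shared buffer instead of a println! per result,
// with the hex encoding done per byte via the write! macro (no
// intermediate String from format!).
fn write_cracked<W: Write>(out: &mut W, hash: &[u8; 16], word: &str) -> io::Result<()> {
    for b in hash {
        write!(out, "{:02x}", b)?; // hex-encode one byte straight into the buffer
    }
    out.write_all(b":")?;
    out.write_all(word.as_bytes())?;
    out.write_all(b"\n")
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // Lock once and buffer; flush explicitly at the end.
    let mut out = BufWriter::with_capacity(64 * 1024, stdout.lock());
    write_cracked(&mut out, &[0u8; 16], "password")?;
    out.flush()
}
```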
That’s it for now. I hope this is interesting to someone else who enjoys going deep on performance issues, or who just needs a fast basic NT hash cracker. Next up, I want to see if I can add the GPUs to the mix … left to my own devices … this is what you get.
ntcrack is already the name of a password cracker circa 1997 (https://seclists.org/bugtraq/1997/Mar/103) that cracks LM hashes (which were used by Windows NT, hence the name). Just for kicks I got it compiled on my machine to test (you can get libdes from https://ftp.nluug.nl/security/coast/libs/libdes/libdes-4.01.tar.gz).
The release mail for v2 states:
“We ran a user list of length 1006 with a word list of around 860,000 in 5 minutes 30 seconds on a pentium 133 with 32MB RAM running Windows NT Server. Roughly 2,606,000 cracks per second.”

So let’s run it on a modern M1 Pro and see how it performs …
That’s a cracking speed of … 1 977 446 H/s which is *slower* than jwilkins’ speed from 25 years ago. But the elapsed time is much faster, 5.5 mins on the pentium for an 860k wordlist, compared with 7.2 seconds for a 14 million wordlist.
Anyway, I hope he doesn’t mind me using the same name :)
tl;dr
We reported a long standing PEAP bug in all Apple devices that would allow an attacker to force any Apple device (iOS, macOS or tvOS) to associate with a malicious access point, even if the authentication server (RADIUS) couldn’t prove knowledge of the password. To understand it fully, we go on a deep dive into EAP and MSCHAPv2.
While prepping for our Defcon talk last year, Michael kept pushing me to implement hostapd-wpe’s EAP success attack. In this attack, the authentication server will accept any username, then skip the step where it proves knowledge of the password back to the station (because it doesn’t know the password), and instead sends an EAP-success message back to the station. I refused for a long time, because I thought it was a dumb attack that would never work. This is because in MSCHAPv2 the authentication server also proves knowledge of the password back to the station, and if it couldn’t, I assumed the station would just refuse to continue, after all, that’s the whole point.
Eventually, I caved and tested hostapd-wpe’s “always send EAP success” attack against a few devices, and bizarrely, my Apple devices (iPads, iPhones, Macbooks) all successfully connected to the malicious access point. Huh?
Since WPE is written by Brad Antoniewicz, I asked him if he was aware of the bug, to which he replied:
So I wrote up a bug report and sent it off to Apple. It was a weird one, because Brad did the technical work that led to discovery of the vulnerability, plus it had been a feature in hostapd-wpe for a few years already. The disclosure timeline and original report are at the end of this post.
To understand the vulnerability, we need to know how MSCHAPv2 in PEAP works and that requires a deep dive into some concepts. I’m writing them up and releasing the toy code to validate what I’m saying, because there are no good recent writeups of how this works, and how to see it for yourself.
The first “P” in PEAP stands for “Protected” and practically that means the whole exchange is wrapped in TLS. This part is called the outer tunnel. Within that tunnel, a MSCHAPv2 challenge response happens where the station (or the client, or the peer) and the authentication server (or RADIUS or AAA) prove knowledge of an identified user’s password to each other. This is done via the AP (because most often it isn’t also the RADIUS authentication server). If you’re familiar with wifi hacking, this is the part where if you person-in-the-middle it, you get the challenge:response hash to send to JtR/hashcat/asleap.
After this, the normal WPA/2 4-way handshake occurs. But, instead of using a typical pre-shared key, it uses a key (the pairwise master key or PMK) negotiated during the outer TLS session. This means, while you can capture these handshakes, you won’t be able to crack them.
MSCHAPv2 is a challenge response protocol. The station and authenticator first identify themselves (to make sure that user is authorised). Then both share a random challenge (peer and authenticator challenge) which is combined with things like the username and password hash to prove to each other that they both know the password, without ever sending the password across the wire.
There are several RFCs that cover EAP, PEAP, CHAP, MSCHAPv1, MSCHAPv2, MPPE and MPPE key derivation. These are pretty frustrating to read as they refer to each other, and no one document puts it all together. I did, in the code at https://github.com/sensepost/understanding-eap.
This is typically where packet captures would come in. However, the whole MSCHAPv2 exchange is encrypted by TLS. Years ago, Michal wrote a perl script to decrypt this inner session and display it in Wireshark, as well as documenting what was happening in the inner tunnel. However, modern TLS isn’t so easily decrypted thanks to perfect forward secrecy, and I wanted to see how things changed when we made the mana authenticator act differently. So instead, I told wpa_supplicant and hostapd to use the openssl eNULL cipher. This provides no encryption, only authentication of the data, which means we can see the data in the clear. This, combined with the hexdumps provided by hostapd-mana run with debugging (-d), let me see what was happening in the inner tunnel.
You can enable eNULL in wpa_supplicant and hostapd by adding the following line to the respective config (use quotes for wpa_supplicant’s config, no quotes for hostapd’s):
openssl_ciphers="eNULL"
A packet capture of a successful association looks like this:
You can see the following happening:
As you can see the MSCHAPv2 exchange happens over seven frames. These are listed here, and the specific bytes described after:
If you look at the encrypted data within the first frame, wireshark helpfully “decrypts” it for you:
MSCHAPv2 Frame 1: Authenticator -> Station – Initiation.
This is an EAP/CHAP format, which is made up of the following. All bytes are in hex except where they conform to ASCII strings.
MSCHAPV2 Frame 2: Station -> Authenticator – Username
MSCHAPV2 Frame 3: Authenticator -> Station – Authenticator Challenge
MSCHAPV2 Frame 4: Station -> Authenticator – Peer Challenge & NTResponse
MSCHAPV2 Frame 5: Authenticator -> Station – Authenticator Response
MSCHAPV2 Frame 6: Station -> Authenticator – Success
MSCHAPV2 Frame 7: Authenticator -> Station – Success
We can check the above by implementing the code described in the RFC 2759 Section 8 which you can grab from our repo at https://github.com/sensepost/understanding-eap.
The Station/Client Side
Both the authenticator and the station send each other some random data (the challenges). The authenticator sends its challenge first (the Authenticator Challenge), so the client gets to kick off the computations. Using the values from above and the code I just posted, it looks like this in the python3 interpreter:
from eap import MSCHAPV2
UserName = b'Oliver.Parker'
Password='123456Seven'
AuthenticatorChallenge = b''.fromhex('f5 b8 ad ee e9 ff 08 15 dd 83 e8 2d 89 6e eb 2a')
PeerChallenge = b''.fromhex('e3 32 bf 8e c5 37 e5 72 1d 0d 9a 0e e4 40 46 d6')
chap = MSCHAPV2(UserName, Password, AuthenticatorChallenge, PeerChallenge)
PasswordHash = chap.NtPasswordHash(Password)
Challenge = chap.ChallengeHash(PeerChallenge, AuthenticatorChallenge, UserName)
NTResponse = chap.ChallengeResponse(Challenge, PasswordHash)
print ('Challenge : '+Challenge.hex())
print ('NTResponse: '+NTResponse.hex())
Challenge : ada74b1fca661d15
NTResponse: 6cdadb80dd5310b805f2a0da9bb45ead51ee65344c95e600
The station then sends the NTResponse and its peer challenge to the authenticator. You can see the calculated NTResponse matches that from frame 4 above.
A WPE interlude
That challenge and response should look familiar. It’s basically the same as a NetNTLMv1 hash. However, in NetNTLMv1 the challenge is just sent over the network, in MSCHAPv2 the challenge is computed from the two challenges and the username. This is also what freeradius-wpe, hostapd-wpe and hostapd-mana give you when they PitM (Person in the Middle) a PEAP session and capture a challenge response.
We can test this is correct using asleap/hashcat/JtR, I’ll use asleap:
> asleap -C ad:a7:4b:1f:ca:66:1d:15 -R 6c:da:db:80:dd:53:10:b8:05:f2:a0:da:9b:b4:5e:ad:51:ee:65:34:4c:95:e6:00 -W passwords
asleap 2.2 - actively recover LEAP/PPTP passwords. jwright@hasborg.com
Using wordlist mode with "passwords".
hash bytes: 2b6f
NT hash: 79337ad5724e777b41e8fc81ad232b6f
password: 123456Seven
And indeed, if we check the value of PasswordHash in our python, it will match asleap’s “NT hash”.
The Authenticator/RADIUS Side
At this point, the authenticator now has the station’s challenge (the peer challenge) and can do similar calculations. They look like this:
from eap import MSCHAPV2
UserName = b'Oliver.Parker'
Password='123456Seven'
AuthenticatorChallenge = b''.fromhex('f5 b8 ad ee e9 ff 08 15 dd 83 e8 2d 89 6e eb 2a')
PeerChallenge = b''.fromhex('e3 32 bf 8e c5 37 e5 72 1d 0d 9a 0e e4 40 46 d6')
chap = MSCHAPV2(UserName, Password, AuthenticatorChallenge, PeerChallenge)
NTResponse = b''.fromhex('6c da db 80 dd 53 10 b8 05 f2 a0 da 9b b4 5e ad 51 ee 65 34 4c 95 e6 00')
PasswordHash = chap.NtPasswordHash(Password)
AuthenticatorResponse = chap.GenerateAuthenticatorResponse(Password, NTResponse, PeerChallenge, AuthenticatorChallenge, UserName)
print('Authenticator Response: ' + AuthenticatorResponse)
Authenticator Response: S=3EC7654786779579D27FCB870C93670D66E5AFB7
The authenticator then sends the authenticator response to the station, along with a success or failure code. You can see that the calculated response matches that from frame 5 above.
In the case of a normal access point and authenticator, the station would send its username, and if the authenticator has a record for that user, authentication will continue. That failure condition isn’t particularly interesting.
However, if you set up a malicious authenticator, that will accept any username, you can capture the two challenges as well as the NTResponse from the station, which you can crack as detailed above. This was what Joshua Wright and Brad Antoniewicz published in 2008 with their initial freeradius-wpe work.
Interestingly however, the exchange ends, because the authenticator ended it, not the station. It can’t validate the NTResponse from the station (because it doesn’t have the right password). So the authenticator can’t compute an Authenticator Response, and instead sends a failure response in frame 5 along the lines of:
E=691 R=0 C=00000000000000000000000000000000 V=3 M=FAILED
WPE’s EAP-Success
In the case of WPE’s -s switch, to implement the “always return EAP-Success” attack, the authenticator skips sending the authenticator response, and jumps ahead to a success frame, much like frame 7 above.
If a normal station/client/supplicant sees this, it will end the exchange, because it was expecting the authenticator response. In wpa_supplicant’s case, it will hard stop and send a deauthentication frame at the AP.
In the case of unpatched Apple devices, the authenticator would skip sending the authenticator response and just send a MSCHAPv2 success frame as per frame 7 above. A vulnerable Apple device happily jumps ahead in its state machine, accepts that, and exits out of the inner MSCHAPv2 tunnel. It then sends a PEAP response, to which hostapd-wpe sends the EAP-Success.
Earlier, when introducing PEAP, we said that by default (i.e., if there’s no cryptobinding), the pairwise master key used for starting the WPA2 4-way handshake is taken from the outer TLS session. The authenticator sends this to the AP at this point, and the AP and Apple device happily complete the 4-way handshake and the device connects. Here’s an example:
If you’d like to read the original vulnerability report, it’s at the bottom of this post.
The Risk
This means that if an Apple device connects to a rogue AP that doesn’t know the user’s password, not only will the attacker get the NetNTLMv1 challenge response, the device will also connect to the network. Because EAP’ed networks are typically corporate networks, the Apple device will think it’s connected to one (sans user interaction), at which point Responder-style attacks are also possible.
That said, this isn’t exactly CVSS 10 territory, and we rated the initial vulnerability as a CVSS3 5.5.
However, the vulnerability seemed to affect multiple iOS and macOS versions, as well as multiple Apple devices such as Macbooks, iPhones and iPads. Apple’s advisory confirms it also affected Apple TVs.
Apple released three updates for macOS, iOS and tvOS to fix this, and assigned it CVE-2019-6203. It took them approximately 8 months from the time of reporting to the fix. We don’t always appreciate the engineering effort that goes into fixing the vulns we fling at these teams, especially one that affects so many devices. A big thanks to anyone involved in getting it fixed.
That said, the way Apple fixed this confuses me to no end. Devices that have been patched exhibit the exact same behaviour at a PEAP, MSCHAPv2 and WPA2 level i.e. the device still connects to the network, and in some cases will even request DHCP. Here’s an example:
Instead, Apple made the devices disconnect from the network after connecting. The device displays a “cannot connect” error, and a log entry shows up on the device saying:
This is a little bit like a security guard letting someone in the building, then chasing them out once they’re inside. While it has the same end effect, I’d be a little worried about what could be exposed during that time. That said, different chips may be doing different things, and maybe this is a temporary fix until it can get fixed in firmware. I can only imagine it’s an engineering nightmare and wish the people dealing with it luck.
However, while testing the new fix, I did notice one outlier, when the device connected but derived a different PMK, evidenced by the MIC in the second message of the handshake. (That’s what the WPA code in the repo is for.) I haven’t been able to get it to repeat, but it should be impossible since the PMK is taken from the outer TLS session and cryptobinding wasn’t enabled. I also haven’t tested extensively across different devices. So there may be updates to my understanding of this fix later.
I’d also like to thank the anonymous Apple employee who spoke to me off the record about progress.
While it’s lovely to see my name credited to this, Brad Antoniewicz deserves most of the credit as he wrote the initial exploit, I just spotted the specifics and reported it.
iOS and macOS will connect to a malicious wifi access point using PEAP/MSCHAPv2 if an EAP-Success message is sent with an invalid authenticator MSCHAPv2 response.
Only a few versions were tested, these were:
iOS 11.4.1 (iPhone)
iOS 9.3.5 (iPad)
macOS 10.13.6 (MBP Pro 2017)
PEAP establishes an outer TLS tunnel, and typically MSCHAPv2 is used within the tunnel to authenticate a supplicant (client iOS device) to an authenticator (backend RADIUS server). With MSCHAPv2 a challenge is sent to the supplicant, the supplicant combines this challenge and their password to send a nt-response. The authenticator generates the same expected nt-response based on its knowledge of the password, and compares them. If they match, an EAP-Success frame is sent to allow the supplicant to authenticate. However, this EAP-Success frame is sent with a 42-byte message authenticator based on the authenticator’s knowledge of the password (aka authenticator response). The supplicant should validate this message authenticator.
iOS and macOS do not. This makes it possible to stand up a fake access point, that will accept any username and password, and merely send an EAP-Success back. iOS/macOS devices will then connect.
wpa_supplicant on Linux and Android, and Windows 8/10, have been tested and are not vulnerable, as they validate the message authenticator sent by the authenticator and refuse to connect.
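For reference, the 42-byte authenticator response the supplicant should be checking is defined in RFC 2759’s GenerateAuthenticatorResponse. Here’s a minimal Python sketch of that derivation; the parameter names are mine, and it takes the double-MD4 PasswordHashHash as an input rather than computing it, because MD4 isn’t reliably available in modern hashlib/OpenSSL builds:

```python
import hashlib

# Constants from RFC 2759, GenerateAuthenticatorResponse
MAGIC1 = b"Magic server to client signing constant"
MAGIC2 = b"Pad to make it do more than one iteration"

def authenticator_response(password_hash_hash: bytes, nt_response: bytes,
                           peer_challenge: bytes, auth_challenge: bytes,
                           username: bytes) -> bytes:
    """Derive the "S=<40 hex chars>" value sent alongside EAP-Success.

    password_hash_hash is MD4(MD4(UTF-16LE(password))); computing that is
    left out here because MD4 support varies between OpenSSL builds.
    """
    digest = hashlib.sha1(password_hash_hash + nt_response + MAGIC1).digest()
    challenge_hash = hashlib.sha1(
        peer_challenge + auth_challenge + username).digest()[:8]
    final = hashlib.sha1(digest + challenge_hash + MAGIC2).hexdigest().upper()
    return b"S=" + final.encode()
```

A supplicant that recomputes this value itself and compares it to the one sent by the server (as wpa_supplicant does) can’t be fooled by an access point that doesn’t know the password.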
CVSS3 5.5
https://www.first.org/cvss/calculator/3.0#CVSS:3.0/AV:A/AC:L/PR:N/UI:R/S:U/C:L/I:L/A:L
Devices could end up connected to networks the user believes are trusted. This could allow additional MitM attacks against the device or applications running on it.
Devices connecting to PEAP networks should validate the certificate sent by the authenticator, but users aren’t good at validating certificates. However, iOS devices won’t automatically connect to the network if it has a different certificate, meaning users will need to manually select the network and choose to trust the new certificate. That said, cloning all aspects of the certificate with tools such as https://github.com/sensepost/apostille will make it hard for a user to differentiate a fake one from the original.
Install hostapd-wpe https://github.com/OpenSecurityResearch/hostapd-wpe/blob/master/hostapd-wpe.patch
This is most simply done in Kali with “apt-get install hostapd-wpe” and the following assumes that approach.
Run it with the -e switch to enable “EAP Success”
https://github.com/OpenSecurityResearch/hostapd-wpe/blob/master/README#L135
On an iOS device, under Wifi, connect to the “hostapd-wpe” network. Choose to trust the certificate. Any credentials can be used.
The device will connect. Running dnsmasq to hand out DHCP will show the device gets an IP.
Attempting the same client connection with wpa_supplicant using the following sample configuration will not work:
network={
    ssid="hostapd-wpe"
    key_mgmt=WPA-EAP
    eap=PEAP
    phase2="auth=MSCHAPV2"
    identity="test"
    password="password"
    ca_cert="/etc/hostapd-wpe/certs/ca.pem"
}
You will see the supplicant reject the final message authenticator and disconnect.
Validate the message authenticator sent in the final EAP-Success message, and do not allow iOS/macOS devices to connect to rogue access points that cannot prove knowledge of the user’s password.
An example of wpa_supplicant performing this validation can be found at:
https://w1.fi/cgit/hostap/tree/src/eap_peer/mschapv2.c#n112
Credit for the functionality I used goes to Brad Antoniewicz (@brad_anton), the author of hostapd-wpe, although he was not aware of the iOS/macOS specifics.
Originally published at SensePost's Blog.
Essentially, WebAssembly is a way to compile code to a browser-native binary format, .wasm, which you can then load with JavaScript and interact with.
Since wasm is a binary format, I wanted to start with a C program. To avoid includes and C<->JS string handling, I’m just going to return 42, like other tutorials start with :)
int main() { return 42; }
If we compile and run it as usual:
> gcc -o 42 -O1 42.c
> ./42
> echo $?
42
If we disassemble 42, we get:
push rbp
mov rbp, rsp
mov eax, 0x2a
pop rbp
ret
Right, now let’s see what it looks like as WASM. The easiest way to get started is to use an online fiddle tool such as:
https://mbebenita.github.io/WasmExplorer/
or
https://wasdk.github.io/WasmFiddle/?q1rr6
There is a human-readable intermediate form wasm can be represented as (a .wat). For our 42 program this looks like:
(module
  (table 0 anyfunc)
  (memory $0 1)
  (export "memory" (memory $0))
  (export "main" (func $main))
  (func $main (; 0 ;) (result i32)
    (i32.const 42)
  )
)
If we look at WasmExplorer, it also shows the asm of the resulting binary .wasm:
sub rsp, 8      ; 0x000000 48 83 ec 08
mov eax, 0x2a   ; 0x000004 b8 2a 00 00 00
nop             ; 0x000009 66 90
add rsp, 8      ; 0x00000b 48 83 c4 08
ret
I’ve no idea why that nop is in there.
Online tools are nice, but what if we wanted to compile and host it ourselves?
First you need emscripten. Hopefully your OS has a nice package. On macOS the homebrew version broke badly, so I followed the manual installation instructions which were super easy.
Once you’ve got it installed, you can compile a “hello world” to wasm with:
emcc hello.c -o hello.html -s WASM=1
This will generate three files: the .wasm binary, a .js loader, and a .html emscripten front-end. Put them up on a webserver of your choice and access the .html, and you’ll see ‘hello world’ in the console. Alternatively, you can have emscripten host a webserver and run it for you with:
emrun --browser firefox --port 8080 .
Or try it at https://sensepost.github.io/wasm-demos/emscripten/hello.html
It’s nice that emscripten automates a bunch of stuff for us, like the JS, but I wanted to see what the simplest calls are. So let’s make our own. Mozilla documents this here.
The .wasm emcc produces is rather large, so I used the .wasm from the fiddle above (click the download icon next to “Wasm”).
The simplest loader for our 42 program that I can come up with is this:
<html>
<body>
<script>
var wasmCode = new Uint8Array([0,97,115,109,1,0,0,0,1,133,128,128,128,0,1,96,0,1,127,3,130,128,128,128,0,1,0,4,132,128,128,128,0,1,112,0,0,5,131,128,128,128,0,1,0,1,6,129,128,128,128,0,0,7,145,128,128,128,0,2,6,109,101,109,111,114,121,2,0,4,109,97,105,110,0,0,10,138,128,128,128,0,1,132,128,128,128,0,0,65,42,11]);
var m = new WebAssembly.Instance(new WebAssembly.Module(wasmCode));
console.log(m.exports.main())
</script>
</body>
</html>
The buffer is simply a decimal representation of the .wasm file’s bytes. WasmFiddle can do it for you if you change from “Text Format” to “Code Buffer” in the dropdown. You can also generate it with this horrible one-liner:
out="";for x in $(xxd -ps -c1 42.wasm); do out="$out,$(( 16#$x ))"; done; echo $out|sed "s/^,\(.*\)$/var wasmCode = new Uint8Array([\1]);/"
or expanded to a script:
#!/bin/sh
# Usage: ./wasm2cb.sh <filename>.wasm
out=""
for x in $(xxd -ps -c1 $1)
do
  out="$out,$(( 16#$x ))"
done
echo $out|sed "s/^,\(.*\)$/var wasmCode = new Uint8Array([\1]);/"
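If shell isn’t your thing, the same conversion is a couple of lines of Python (the function name is mine; it just mirrors the xxd/sed pipeline above):

```python
def wasm_to_codebuffer(data: bytes) -> str:
    # Sanity-check the wasm magic number: "\0asm"
    if data[:4] != b"\x00asm":
        raise ValueError("not a .wasm file")
    # Iterating a bytes object yields ints, which is exactly what we want
    return "var wasmCode = new Uint8Array([%s]);" % ",".join(str(b) for b in data)

# Usage: print(wasm_to_codebuffer(open("42.wasm", "rb").read()))
```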
Just running binaries and logging to the console isn’t very interesting. The good news is that passing parameters in and out is very simple.
Here’s the C code I’m going to use; note that it doesn’t *need* a main():
int foo(int x) { return x+1; }
Throwing the resulting code buffer into some HTML looks like:
<html>
<body>
<script>
function calc(num) {
  var wasmCode = new Uint8Array([0,97,115,109,1,0,0,0,1,134,128,128,128,0,1,96,1,127,1,127,3,130,128,128,128,0,1,0,4,132,128,128,128,0,1,112,0,0,5,131,128,128,128,0,1,0,1,6,129,128,128,128,0,0,7,144,128,128,128,0,2,6,109,101,109,111,114,121,2,0,3,102,111,111,0,0,10,141,128,128,128,0,1,135,128,128,128,0,0,32,0,65,1,106,11]);
  var m = new WebAssembly.Instance(new WebAssembly.Module(wasmCode));
  document.getElementById('out').innerHTML = m.exports.foo(num);
}
</script>
<input id="in" />
<button onclick="calc(document.getElementById('in').value)">Go</button>
<div id="out"></div>
</body>
</html>
And voila, we can now pass input to our binary and get a response.
You don’t need to use a code buffer each time, browsers provide WebAssembly.instantiateStreaming() to do it on the fly for you. Here’s the calc() function from above rewritten to call an external .wasm file with fancy Promise style code I don’t really grok:
function calc(num) {
  WebAssembly.instantiateStreaming(fetch('io-simple.wasm')).then(obj =>
    obj.instance.exports.foo(num)
  ).then(res =>
    document.getElementById('out').innerHTML = res
  );
}
Although this doesn’t work on Safari.
You can also call JavaScript functions from inside your binary! You do that with imports. For example, given the following C:
int foo(int x) { bar(x); return x+1; }
You can define a function bar() in the JavaScript and import it to the WebAssembly like this (building on from the calc() example earlier):
function calc(num) {
  var importObj = { env: { bar: arg => console.log('Got it: ' + arg) } };
  WebAssembly.instantiateStreaming(fetch('io-adv.wasm'), importObj).then(obj =>
    obj.instance.exports.foo(num)
  ).then(res =>
    document.getElementById('out').innerHTML = res
  );
}
The importObj dictionary’s “env” and “bar” entries were from the resulting .wat, which included the line:
(import "env" "bar" (func (;0;) (type 1)))
So I knew how to build the import.
We’re hackers, and we’re probably going to need to reverse this at some point. This article from the Flare-On challenge pointed me to the WebAssembly Binary Toolkit (wabt, pronounced wabbit). It includes the wasm-objdump and wasm2wat tools. wasm2wat will convert the binary to the human-readable .wat stack language and is probably the most useful disassembly. wasm-objdump will give you much the same info, but in more of a typical disasm format. To get actual asm, IDA does some magic with SpiderMonkey that I haven’t looked into yet.
I hope this was useful to you and helped give you a hacker rather than dev intro to wasm.
On a traditional Linux-based host, docker runs on the native OS and provisions and isolates containers with things like containerd and runc. Windows and MacOS don’t run Linux kernels and so you can’t run dockerd directly on the host OS. Instead, a Linux VM is used as an interstitial to run dockerd on. There’s some dark integration magic to allow things like bind volume mounts to the host OS from a container. In short, it looks like this:
You can get a proper explanation of it all from this docker blog post.
If you don’t believe me and want to connect directly to your docker host VM on MacOS, you can run this:
screen ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty
You can also see the .iso it boots from at:
/Applications/Docker.app/Contents/Resources/linuxkit/docker-for-mac.iso
In line with Docker’s principle of “swappable batteries”, the Moby framework includes LinuxKit for building small, secure Linux host OSes that can run on all sorts of things like MacOS, Windows, mainframes and more. This article has a good overview. Several months ago, Docker-CE for Mac moved away from the boot2docker VirtualBox VM to a LinuxKit-based HyperKit VM.
For our upcoming Defcon talk, I wanted to port the work we’d done on AWS to build WiFi CTF environments in the cloud to Docker, so people can learn/practise WiFi hacking without needing hardware. This necessitates loading the kernel module mac80211_hwsim to create fake wifi devices. The LinuxKit VM doesn’t have these modules compiled, so my first failed attempt was to try to build them. You can build kernel modules for the existing LinuxKit VM by reading the documentation and looking at these examples. One critical piece of information not included is that you can grab the config for the running kernel from /proc/config.gz like so:
> docker run -it --rm -v /:/host -v $(pwd):/macos alpine:latest
/ # uname -a
Linux 23b3e591c4eb 4.9.93-linuxkit-aufs #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 Linux
/ # cp /host/proc/config.gz /macos/
/ # exit
You’ll also need the kernel source for that version, e.g.
wget https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.93.tar.xz
But while this will work for simple modules, it won’t work for wifi modules, as the existing LinuxKit kernel doesn’t have base support for wifi capabilities. So building the modules will get you errors like this:
can't insert 'mac80211.ko': unknown symbol in module, or unknown parameter
Ok, it looks like we’re going to need to build a whole new kernel.
Building a new kernel is relatively easy, especially if you’re familiar with building Linux kernels in general. The documentation is now clear and accurate. If you’re here for wifi, you don’t need to do this, as I’ve done it for you already.
First you need a copy of LinuxKit, then you need to work out what new kernel options you need to add, then you need to build your custom kernel.
Build your custom config.
git clone https://github.com/linuxkit/linuxkit
cd linuxkit/kernel
docker run --rm -ti -v $(pwd):/src linuxkit/kconfig
# In container
cd linux-4.9.96/
make menuconfig
# Configure the kernel the way you want; I enabled base wifi
cp .config /src/config-4.9.x-x86_64-custom
exit #Exit Container
For my purposes I only added kernel options, so I keep those in a config-wifi file so I can reuse them for other kernels, as these mostly haven’t changed across versions. You can now build your kernel; I’ve got the steps for that documented in the readme here. It should be as simple as:
make EXTRA=-custom build_4.9.x-custom
This will build an image and store it in your docker image store with the tag linuxkit/kernel. You can now use this in your own LinuxKit builds. (Amusingly, this build is kernel panic’ing my vanilla docker-ce LinuxKit host right now, so there are still some bugs).
Great, we’ve got a shiny new kernel image; time to build a new LinuxKit iso. There are a bunch of examples of these in the linuxkit/examples directory. It makes sense to start with the docker-for-mac.yml. Open it up in your favourite editor, and replace the line
image: linuxkit/kernel:4.14.52
with your kernel e.g.
image: linuxkit/kernel:4.14.52-wifi-ba03a8d668eb6be981e1ff71883b5e9e26274971-amd64
Or just use my prebuilt kernel:
image: singelet/kernel:4.14.52-wifi-ba03a8d668eb6be981e1ff71883b5e9e26274971-amd64
If you want to have the kernel modules loaded automatically with modprobe, you’ll also need to add this to the .yml file:
- name: modprobe
image: linuxkit/modprobe:v0.4
command: ["modprobe", "-a", "mac80211_hwsim"]
Next up, build yourself an iso with:
linuxkit build --format iso-efi docker-for-mac.yml
If you haven’t built linuxkit, just run make in its top-level directory, and you’ll get the binary in linuxkit/bin/linuxkit.
This will create a file named docker-for-mac-wifi-efi.iso. You can use it by stopping Docker, backing up your existing docker-ce iso, replacing it with this one, then restarting Docker.
mv /Applications/Docker.app/Contents/Resources/linuxkit/docker-for-mac.iso /Applications/Docker.app/Contents/Resources/linuxkit/docker-for-mac.iso.orig
cp docker-for-mac-wifi-efi.iso /Applications/Docker.app/Contents/Resources/linuxkit/docker-for-mac.iso
You can check that it boots alright by watching the console with the screen command at the top of this post.
If all went well, you should now have a docker running with your shiny new LinuxKit host, and immediately notice several problems. This is where it gets messy.
Despite Docker’s principle of swappable batteries, the LinuxKit image they build for Docker-CE for MacOS has a docker-ce image with proprietary and non-redistributable code. This means that the docker-for-mac.yml that comes with LinuxKit creates a LinuxKit image that:
This is primarily because of a tool called transfused that communicates with the osxfs process running on the Mac. It used to be open source, but disappeared, and after some significant commit archeology, I eventually just asked:
To which the reply was:
And his suggestion was to just copy it out of the existing image, but:
Shucks. Then there’s also sendtohost, which sends simple state info to the Docker taskbar agent. It appears to come from a private docker repo called pinata (according to strings inside it). There’s an older version available here. There’s also possibly a lot more I haven’t figured out yet.
This is why I built get-dockerce, to extract the things you need from the existing LinuxKit image while it’s running, because I can’t redistribute them. Plus, with the regular release cycle of docker-ce, this stuff is likely to change over time. All it really does is copy two files, transfused and sendtohost. Those two files are then used in the docker-fakece image you’ll need to build yourself, after which they can be used in the modified LinuxKit docker-for-mac.yml file. I had to painstakingly figure out how to make that work with limited documentation (I think LinuxKit .yml files are Docker Cloud Stack files) and by recreating the .iso and restarting Docker each time. There was all sorts of weirdness (like scripts refusing to execute backgrounding directives, &), and figuring out how to pass the fuse device through. But it works.
Unfortunately, while transfused will now let you bind mount locations on the LinuxKit host, I haven’t got it working to allow bind mounts to the macOS host. If you have any ideas …
While this was an interesting dive into the innards of LinuxKit and Docker for Mac, the latter part feels like a lot of work and ugly hacks that are mostly not redistributable and fragile to change. Ideally, Docker would release the docker-ce image they use, containing transfused and similar, publicly on Docker Hub (it’s already on everyone’s machines, just in a hard-to-access way), as well as an updated docker-for-mac.yml. Then we could just change the kernel and build a first-tier LinuxKit image. A request I made here.
Alternatively, they could add wireless options to the kernel they ship with LinuxKit, to allow wifi modules to be built. This isn’t great, because it’s super specific to a wifi edge case and doesn’t help people wanting to build custom kernels.
BLUF: I put together a cracking technique, and tested it against other techniques, generating some insight into the best performing cracking techniques. Rockyou with hob064 rules won, but my technique came a close second, and had a faster crack speed. Get the script here.
You can use the technique with a list of common substrings from your own lists (sorry, we can’t share ours). Or use the technique targeted specifically at a dump you’ve been going at, to mine more cracks out of it.
As my eyes blurred over some boring work, I had the thought: “what if we used the most common substrings found in already-cracked passwords to crack more?” For example, if users regularly use “companyname” or “!!” in their passwords, this would pull them out.
To this end, I wrote some dirty python. It took 38 minutes to run across one list. Before optimising I thought I should try awk, which is famously good at this sort of processing.
That led me to a kernel of an idea taken from these forums. awk is magic, if hard to understand. I’ll leave doing that as an exercise to the reader. Needless to say, this is *much* faster than my pythonic attempts.
The way to use this is to dump all the clears you’ve cracked so far to a file, then run this over that output. It’ll output some stats like the percentage and number of times each substring was seen (sorted by percentage). Just cut on tabs to get the substrings only. Make sure you don’t unique anything: if a dump has lots of the same password repeated, you *want* that to show up as “more common”. If you unique either the hashes or the clears, you’ll lose that.
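If awk isn’t your thing, the core idea is easy to sketch in Python. This is my illustrative reconstruction, not the awk from the forums, and counting each substring at most once per password line is a design choice of the sketch:

```python
from collections import Counter

def substring_counts(cleartexts, min_len=3, max_len=8):
    """Count how many cracked passwords contain each substring.

    cleartexts must NOT be uniqued first: repeated passwords are signal.
    """
    counts = Counter()
    for pw in cleartexts:
        seen = set()
        for n in range(min_len, min(max_len, len(pw)) + 1):
            for i in range(len(pw) - n + 1):
                seen.add(pw[i:i + n])
        counts.update(seen)  # each substring counted once per password line
    total = len(cleartexts)
    # (substring, occurrences, percentage of lines), most common first
    return [(s, c, 100.0 * c / total) for s, c in counts.most_common()]
```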
I then used this to generate a list of common substrings specific to various password dumps, and managed to crack a whole lot more that I hadn’t cracked before. I used hashcat’s -a1 combinator attack mode with the substrings as the rightmost list and other password lists as the left. I’d run it twice, once with -jc (i.e. capitalise the first letter) and then again without.
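In Python terms, that -a1 combinator with an optional -jc rule (capitalise the first letter of the left word) amounts to something like this sketch:

```python
from itertools import product

def combinator(left_words, right_substrings, capitalize_left=False):
    """Yield hashcat -a1 style candidates: left word + right substring."""
    for left, right in product(left_words, right_substrings):
        if capitalize_left:  # hashcat's -jc rule applies to the left wordlist
            left = left[:1].upper() + left[1:]
        yield left + right
```

hashcat of course does this on the GPU at hundreds of MH/s; the sketch is only to show what candidates get generated.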
I then took the most common substrings (everything >= 1%) by percentage from various dumps, and combined those to form a short super list of common substrings.
It looked like it was working well, but I wanted to see how it compared to other techniques.
It’s fine to say “something worked well”, but what does that actually mean? Well, stand back, I’m going to try science!
I ran 88 different tests on my laptop (kept constant), trying different techniques against different sets of hashes to see what worked best. I’d clear the potfile, run the test, then make a note of the time it took, the H/s, the number of hashes cracked and the percentage of the total that constituted.
The experiments combined several things:
If you want raw results, my excel calcs are here: cracking-stats.xlsx
The overall result was that a rules-based approach with hob064 and rockyou featured in the top 4 for each password list as the most efficient, cracking on average 9.4% (ranging between 4.5%-18.2%) of the respective hash lists in 4-6s (your speed may vary). The second most effective was using facebook firstnames with my substring list and uppercasing the first letter (i.e. -jc). This cracked on average 9.8% (ranging between 6.2%-12.2%) of the passwords in the respective dumps in 7-8s. The next best technique (facebook-firstnames with best64) only averaged 3% and only did well against one password list, which skewed its results. However, the substring attack had a significantly higher H/s on average than the rules-based attack, which may give it an edge. To put this in a table:
Approach                     | Average % cracked | Average time (s) | Average MH/s
-----------------------------|-------------------|------------------|-------------
rockyou rules hob064         | 9.4%              | 5                | 190.25
fb-firstnames substrings -jc | 9.8%              | 7.3              | 914.35
I did a brief test of our private wordlists against one set of hashes. Those lists outperformed both rockyou and facebook-firstnames in effectiveness. So it makes sense to develop your own for your specific use cases. The first list with hob064 rules did 15% of the hashes in 2s, and the second list with my substrings and -jc did 13% in 2s.
I also did a quick check of a mask attack out of interest; I used facebook-firstnames and -jc, and it took 37s to get 6% of the passwords.
Finally, I checked what the overlap between the rules-based and the substring approaches was (i.e. are they finding the same passwords or different ones). This was less conclusive: on average there was a 4.5% non-overlap between the rules and substring approaches. I suspect this has a lot to do with the wordlists.
I’ve long been interested in the physics of RF, but never had a chance to play with it until recently. This post covers my experiments with the propagation of 7MHz signals; the equipment, the setup, the code, the results and the science.
My setup is at home, where I’ve got an ancient HF radio (ICOM 738), plugged in to a 20m dipole antenna on my roof and a laptop doing WSPR with the wsjt-x program for this experiment.
If you’re interested in more detail of the setup, read on; otherwise skip this part. The antenna is borrowed from Wicus (thanks dude!) and consists of two 10m wires coming into a balun which connects to the coax (RG58) feed line. The balun effectively filters out unwanted signals that are picked up by the coax sheath (I think, still trying to grok these fully). It’s jury-rigged to what I had in the house at the time, a piece of too-thin PVC cable tied to a telescopic painter’s pole. The radio is too old to do digital modes (which WSPR is), so I’ve got a homebrew SignalLink-like device which routes sound out of the radio to a USB soundcard for input (thanks ZS6SKY). For output, wsjt-x converts the digital signal to audio which is sent via the microphone input of the radio (a proprietary 8-pin ICOM plug). There’s also a serial cable that triggers the push-to-talk (PTT) pin of the microphone input (by pulling RTS down to GND).
This is focused on 7MHz signals because I had to start somewhere, it’s where my antenna has the lowest SWR (1:1) (aka signals are efficiently radiated via the antenna, and nothing is reflected back down again), and I’d need another USB-to-RS232 serial converter to auto-control my rig to switch frequencies while also triggering the microphone to transmit.
I’m also using the WSPR digital mode for three reasons. Firstly, I’m based in a suburban area, which means the noise floor is *terrible* (using the S-meter measure of signal strength, my noise sits at 9 with the preamp on and 6 with it off, which is terribad!). This means I can barely hear traditional analog voice contacts through the noise. Secondly, digital modes are much less susceptible to RFI, and can encode the info more efficiently in narrower bandwidths which a DSP can pick out. Lastly, WSPR is designed to work even in really poor signal conditions.
My setup does two things. First, it monitors for signals from others and uploads these “spots” to WSPRnet. These are done in two minute windows (about 110s). WSPR is a very narrow digital mode (about 6Hz), so the radio’s output can include several at once and the software can pick out even very weak signals (that’s the point of WSPR).
Then my setup broadcasts a signal every six minutes or so. I vary the power of the transmitted signals between 2W and 10W (as I wanted to see how power affects things) but the default is 5W. Other stations monitoring for WSPR signals will report when they see me.
The WSPR signal includes some very basic information: your callsign, your grid locator, and your transmit power.
These get encoded into 50 bits: 28 bits for the callsign, 15 for the locator, and 7 for the power level (I’m using a 6-digit grid locator, so it’s more complex). I’m still learning what the signal looks like exactly. You’ll notice that it’s completely spoofable, but at the moment it just relies on an honour system not to pollute the data.
Before running this, I had tried a couple of voice contacts and hadn’t gotten much farther than PMB (about 400km away). I could sometimes barely hear a friend of mine from Cape Town (about 2000km away) thanks to the noise issue I spoke about, so I didn’t think I’d get much further than a few hundred kilometers.
When I first started running this, one afternoon two weekends ago, I thought I had messed something up. I was spotting other local ZA transmissions (even one in Cape Town!), but nobody was spotting me. Then, in the early evening, I suddenly got spotted by Russia and large parts of Europe (the furthest contact was over 9000km). When I woke up the next morning, I saw that I had made contacts as far away as Wisconsin, America (over 14 000km away), and by the next evening I managed my furthest contact at 16 941km in California!
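Those distances fall straight out of the reported grid locators. A quick Python sketch (my own, using 4-character Maidenhead squares, the centre of each square, and a haversine on a spherical earth) shows how, for example, a Johannesburg-ish square (KG33) to a London-ish one (IO91) works out to roughly 9000km:

```python
from math import radians, sin, cos, asin, sqrt

def locator_to_latlon(loc: str):
    """Centre of a 4-character Maidenhead grid square as (lat, lon)."""
    loc = loc.upper()
    lon = (ord(loc[0]) - ord('A')) * 20 - 180 + int(loc[2]) * 2 + 1
    lat = (ord(loc[1]) - ord('A')) * 10 - 90 + int(loc[3]) + 0.5
    return lat, lon

def distance_km(loc_a: str, loc_b: str) -> float:
    """Great-circle distance between two grid squares (haversine)."""
    (la1, lo1), (la2, lo2) = locator_to_latlon(loc_a), locator_to_latlon(loc_b)
    la1, lo1, la2, lo2 = map(radians, (la1, lo1, la2, lo2))
    h = sin((la2 - la1) / 2) ** 2 + cos(la1) * cos(la2) * sin((lo2 - lo1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))  # mean earth radius in km
```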
Looking at lines of text is pretty dry. However, WSPRnet gives you a pretty map where you can see who’s seen your signal and whose signals you’ve seen. What you’ll notice is that most of the contacts (in the 40m band we’re looking at) were made between two night-side stations. Here are some example pictures:
However, I thought it would be really cool to visualise these changing over time in a video, so you could see how the day/night change affects things. I figured it would be a couple of lines of code and a few minutes of work. Instead it took me about 6 hours and 231 lines of Python and JavaScript to get it done over a few nights. The code is here if you’d like it. Here’s a video of the last two days of activity from my station, neatly showing how day/night affects the propagation.
This three-pager from the American Amateur Radio League explains it better than most other references I’ve seen. Essentially, at night, the ionosphere thins, and starts reflecting 7MHz signals back to earth. This reflection can land anywhere between 2000–4000km away (depending on the frequency of the signal and the state of the ionosphere at the time). The 17000km contacts are due to multiple bounces, where the signal gets reflected back to earth, then bounces back up to the ionosphere, multiple times. This works particularly well over the Atlantic because the sea reflects better than, say, the Sahara.
The dangerous thing about letting your company become bureaucratic is that when the smart people leave, they won't tell you that’s why.
https://twitter.com/paulg/status/910519167949971456
The two fastest ways to implement a bureaucracy, in my opinion, are centralizing decision making and implementing process.
Centralizing decisions moves the person implementing something as far away from the person with power to change it as possible. It's why your bank teller just looks at you and says "there’s nothing I can do". It means the people on the ground with the knowledge of how best to do something are being ignored and disempowered to make good changes. You can try all sorts of things to fight that, spend time talking to the do'ers, put in a suggestion box etc. But why not just give them the power to change things and rather use intelligent oversight? If it's because you can't trust them, then you have a bigger problem, and one that won't be fixed by continuing to not trust them.
As for processes, the Netflix Culture Deck puts it well (https://jobs.netflix.com/culture). Paraphrasing badly, if you want to do something the same way every time, processify it, if you want people to keep doing it better, let exceptional people be exceptional. Process is a way of telling people "how" to do something instead of just "what to do". It encourages less thinking. Worse, if it's enforced, it disempowers people from optimizing or inventing. My suggestion, write down the objectives, why they're needed, and who can help you with them, then let smart people figure out how to go about getting those done most appropriately for the situation.
Fight organizational atrophy!
If you're interested in more on this see this post; On Large Companies and Staff Retention
SP gave some talks; Charl spoke about where we’re headed in a talk entitled Love Triangles in CyberSpace; a tale about trust in 5 chapters. Chris discussed his DLL preloading work and released his toolset. Finally, Darryn & Thomas spoke about exploiting unauth’ed X sessions and released their tool XRDP, it was also their first con talk ever.
The other thing we did was run a CTF challenge off the back of the cool badge & CTF platform AndrewNoHawk and elasticninja built. This is a write up of that challenge.
The first hint that the challenge existed was on the challenge portal:
This pointed to the food tickets everyone was given to redeem for food and drink. They looked a little like this:
There were five types of tickets. The left has what looks like a QR code. However, most QR code readers can’t read them, because the colours have been inverted. This required collecting pics of all the lunch ticket codes, which in turn required you to speak to some people, since not everyone had all the tickets. Given the low numbers, vegetarians would have been the most popular. These decode to:
left right, up up, skip BA, down down, , left right again
The oldies among us, or those using a simple Google, will recognise that as parts of the Konami code. The “skip” part was due to the fact that I read the badge code really badly and thought it didn’t have a B or A button. Later I hoped it could be used to prevent people just guessing the Konami code (i.e. you type the Konami code, you get one thing; you type the truncated one, you get another). Unfortunately, time was short.
Typing in the real, truncated Konami code (up-up-down-down-left-right-left-right) displayed a sort of riddle:
There's a wifi net you can't see. It's hidden not easy.@JP_14c
This tells you three things: the first is that the next part has to do with wifi (although anyone who knows me should have guessed), the second is that it’s hidden in some way, and the third is that there’s a Twitter account. The name of the Twitter account was itself a hint, but just in case, it had only one tweet, which directed people to this link.
In case you didn’t get it, it’s pointing to the fact that there’s a wifi network running on channel 14, a frequency only available in Japan’s regulatory domain.
I had a ton of fun setting this network up. Not only was it running at very low power and on channel 14, so most devices couldn’t see it at all, I also had it doing 802.11n, which is not something that should be possible (in Japan channel 14 is only allowed to do 802.11b, i.e. no OFDM). If you’re interested, the code to comment out is here. And finally, it was running mana’s proportionality ACLs, so it wouldn’t even respond to probes from other devices. Initially, I did a bunch of editing to wpa_supplicant’s code to get it to connect, but eventually it turned out that with the right regulatory settings it connects just fine.
However, none of that detail is really necessary, because airodump-ng in its default configuration spots the network just fine. The idea with the next part was to teach participants some wifi hacking.
Initially, the network ESSID was http://bit.ly/1Gm8CGe which points to the aircrack newbie guide, with tutorials on how to capture wifi traffic. The intention was to have two data requests going over the network, the first was an HTTP GET request to aircrack’s writeup of packetforge-ng. The second was a UDP packet to the badge challenge server with the string ‘1234567890’ and a varying response containing the cryptographic challenge hash. My hope was that someone would be able to grab the UDP packet, modify it to use their badge number, and re-inject it into the network spoofing the existing connected client. This could be done using three steps:
1. Capturing the packet can be done with airodump-ng and the -w switch to write the packets to a file. Just make sure you’re not channel hopping (-c14 fixes that) and ideally filter for just that network with the --bssid switch. The packet can then be extracted using wireshark.
2. Modifying the packet is a three-step process. First, use your favourite hex editor to change the badge number from 1234567890 to your own. Then re-open the capture in wireshark, which will inform you the checksum is wrong and what it should be. Armed with this, re-edit the capture to set the checksum to the right value.
3. Re-injecting the packet can be done using aireplay-ng -3 -r <single packet capture> -h <MAC of the actual client connected> -c <BSSID of the AP> -j <injection device>. The response can be captured the same way the initial packet was.
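For step two, instead of round-tripping through wireshark, the UDP checksum can be recomputed directly once the payload is edited. Here is a hypothetical Python sketch of that computation; the IPs, ports and function names are my own inventions for illustration, not part of the challenge tooling:

```python
import struct

def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement sum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return (~total) & 0xFFFF

def udp_checksum(src_ip: bytes, dst_ip: bytes, udp_segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the UDP header/payload,
    with the checksum field (bytes 6-7) zeroed first."""
    zeroed = udp_segment[:6] + b"\x00\x00" + udp_segment[8:]
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp_segment))
    return internet_checksum(pseudo + zeroed)
```

After patching the badge number in the payload, write `udp_checksum(...)` back into bytes 6-7 of the UDP header and the capture is ready to replay.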
Unfortunately, due to a bizarre string of technical failures, I was unable to replicate it on the day. Also, everybody I spoke to told me it was too advanced. So I changed the ESSID to another bit.ly link pointing to an Internet-connected, HTTP-version of the badge server. Now all that was needed was to modify the GET request with a new badge number.
Upon completing the challenge, the wifi scanner would be unlocked on the badge, allowing you to scan for wifi networks using just your badge (nice work Andrew).
At the end of the day, one person made it all the way through, Cobus Bernard. For his troubles, we gave him a R1k Takealot (ZA’s Amazon) voucher. Well done Cobus!
Snoopy’s core functionality was to observe probe requests for remembered networks from wireless clients, although it ended up doing much more.
The problem tools like Snoopy face is that they can’t monitor the whole 2.4GHz spectrum for probe requests without using multiple wireless cards. So they channel hop to make sure they see probes on multiple channels. In the 2.4GHz range this wasn’t terrible, because the channels overlap, meaning you didn’t have to tune in to all 11 or 14 (depending on location) channels individually to see probes across the spectrum. So while you may have missed a few probe requests, you didn’t miss many.
However, with the introduction of the 5GHz spectrum, you now have an additional 24 non-overlapping channels to monitor. This means that when monitoring for probe requests across both the 2.4GHz and 5GHz ranges, there is a high chance that some probes will arrive while your transceiver isn’t tuned to that frequency, and won’t be recorded.
Wireless clients have a similar problem. They need to quickly find nearby APs and can’t monitor the whole spectrum. Through a combination of (usually proprietary) active and passive scanning techniques, they will be “attracted” to channels with APs on them and send their probes there. So we can make use of an AP and have the clients come to us, rather than us looking for them. Additionally, this is already core mana functionality, as it needs to see probe requests to know what networks to impersonate.
Additionally, to make sure we’re getting as much as possible from the PNL (preferred networks list) of the devices we’re observing, mana can also pretend to be a hidden network in its beacons (with ignore_broadcast_ssid=1), while still responding to probe requests. This triggers iOS devices to probe for hidden networks on their PNL but still lets you impersonate non-hidden networks.
So, I added an option to hostapd-mana that will have it log station MACs, the network they’re probing for, and whether it is a locally administered (aka random) MAC. You can enable this functionality by adding the following line to your hostapd.conf:
mana_outfile=/some/file
enable_mana=1
mana_loud=0
The last two lines enable mana and disable loud mode. Disabling loud mode is required to track individual stations; with it enabled, you’ll be limited to a single entry per SSID.
Practically, the output will look something like:
00:11:22:33:44:55, FunnyNetwork, 0
That’s a CSV of station MAC, ESSID and a 1/0 flag with 1 indicating a random MAC.
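If you want to wrangle that log outside of a GUI, parsing it is trivial. A hypothetical Python sketch (the function name and grouping are my own, not part of mana):

```python
import csv
from collections import defaultdict

def probes_by_ssid(path):
    """Group station MACs under each probed ESSID from a mana_outfile log,
    flagging which MACs were randomised (third CSV field == "1")."""
    ssids = defaultdict(list)
    with open(path, newline="") as f:
        # skipinitialspace handles the ", " separators mana writes
        for mac, essid, randomised in csv.reader(f, skipinitialspace=True):
            ssids[essid.strip()].append((mac.strip(), randomised.strip() == "1"))
    return dict(ssids)
```

Each value is a list of (MAC, is_random) pairs, so networks probed for by several stations, randomised or not, are easy to spot at a glance.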
The real magic is when you import this into Maltego for visualisation. You can do this using the new “Import/Export -> Import Graph from Table” function in Maltego 4. Before doing so, make sure you have the SensePost Toolset installed from the Transform Hub on the front page, otherwise you won’t have the entities we’re about to map to.
There’s a nice tutorial when you click the Import Graph from Table button, but effectively you need to configure Column 1 as a MAC Address, Column 2 as an SSID and Column 3 as a dynamic property of the MAC Address. This looks like this picture:
Doing so will get you a graph of which devices were probing for which networks.
Next, you can map a network name to a location using wigle.net and the “Geolocate SSID (Wigle)” transform from the SensePost Toolset. You’ll need to register for an account at wigle.net and if you’re planning on doing anything more than point lookups, you may need to contact wigle to ask for an account with less API rate limiting.
The other advantage of running mana to do this is that you can “decloak” random MACs when the device tries to join the network. For example, here we can see three devices probing for a network: two of them are random and one is a non-random Apple device. In all likelihood, we’ve “decloaked” the random MACs when the device attempted to associate to our AP. This won’t work for Windows randomisation, however.
You can grab the code now from https://github.com/sensepost/hostapd-mana. I haven’t rolled it into mana-toolkit yet.
And so, if we manipulate a wise man’s quote to say something we want it to say: “pentesters need to emulate real world attacks”. We’re hoping that with enough hackers equipped with these things, there will be enough “audit findings” to move the needle.
If you’re just here for the tl;dr:
We took some fairly common attacks (fake keyboards in small USB devices that type nasty things) and extended them to provide us with a bi-directional binary channel over our own wifi network to give us remote access independent of the host’s network. This gives us several improvements over traditional “Rubber Ducky” style attacks:
Lastly, we wanted this to be a working, end-to-end, attack. This means we also spent time adding some nifty features like:
Before we get into that, we wanted to acknowledge the giants whose shoulders we stood on:
* Adrian Crenshaw’s Plug and Prey: Malicious USB Devices and his PHUKD – Adrian did the initial work on this, and was the inspiration for the Rubber Ducky.
* Michael Ossman & Dominic Spill’s NSA Playset, TURNIPSCHOOL
Mike and Dom showed that this can be miniaturised like the NSA’s devices with some awesome work, but didn’t get to the on-host stuff.
* There are numerous projects that make use of “typing attacks”, such as Hak5’s Rubber Ducky, Samy’s USBdriveby, Nikhil’s Kautilya or Elie Bursztein’s work (presented at BHUSA2016).
* Lastly, Seunghun Han released his Iron-HID at HITB AMS after we had submitted our Defcon CFP. It’s cool work, and a very similar idea to ours, but our implementations are very different.
We initially prototyped the attacks on April Brother’s Cactus Micro revision 2. Think of it as a Teensy 2 with an ESP8266 stuck on it. This is still the cheapest way to get the hardware for this attack ($11).
The device has two microcontrollers, an Atmega32u4 and an ESP8266. The Atmega32u4 (hereafter AVR) gives us USB device capability using the LUFA stack. The ESP8266 (hereafter ESP) is much faster than the AVR and provides a WiFi interface; however, it doesn’t have USB support. We based our code for the ESP on the esp-link TCP-UART firmware.
However, there are a couple of disadvantages with the Cactus Micro, so we had a new board designed by Ignatius Havemann at BlackBox. It’s open hardware, and the full specs are in the code repo. We’re currently working on plans to make fully assembled versions available. Our boards make a couple of improvements:
Here’s an overview of how it all fits together.
The ESP runs a modified version of the esp-link firmware. This provides a VNC server to the attacker, which is how HID events are received. The telnet interface is used to send binary data. Originally, Rogan built a Java client and custom protocol to take HID input, but soon realised that this is what VNC was designed for, and built a VNC server into the esp-link firmware instead.
The ESP is connected via UART to the AVR. The AVR is running our own firmware built on the excellent LUFA stack. The AVR’s job is mostly to be the UART to USB interface. The AVR will present itself as three devices to the host OS. A keyboard and mouse, which are used to replay HID events from the ESP’s VNC server, and a “binary pipe” device. Currently, we’re using a Generic HID device, as it has standard drivers that don’t require privileges in Windows. Other innocuous devices, such as text-only printers or MIDI devices are planned for the future. This is also where the mouse jiggler code, to prevent screensavers from engaging, sits.
On the host, a two or three stage process is run, depending on the type of attack.
Theoretically, this attack is nothing new. However, the gap between theory and implementation was pretty big. There were some particularly face-punching issues related to developing this sort of thing on that sort of hardware we thought we should share.
Debugging embedded hardware is painful, because there is no mechanism for persistent debug logs. The ESP’s watchdog means that any lockups, or taking too long to receive/process data, ends in the ESP hard resetting, which means your debug logs disappear. We ended up snooping on the UART between the two microcontrollers with a pair of FTDI USB-UART adapters on the Cactus Micro, where we could simply clip test clips onto the .1″ headers. On our hardware, we made sure test pads are exposed to do the same. We also built a laser-cut test jig to hold the test pads firmly in place on an array of pogo pins, and utilised a Teensy LC (which has multiple hardware UART interfaces) in place of the FTDI adapters. The Teensy also has functionality to trigger the reset on the board, for fully hands off reprogramming!
When you are dealing with processors of vastly differing capability, flow control becomes a critical part of the equation.
The first place this was noticed was between the ESP and the AVR. The ESP8266 has a 128 byte output FIFO, and the AVR has a 1 byte receive register. The AVR is also much more limited in terms of RAM and CPU cycles, running at 8MHz to the ESP’s 80MHz. Even if the AVR sent a message to the ESP when it realised its own 256 byte receive ring buffer was half full, the ESP already had 128 bytes in flight in the UART FIFO. Simply making the AVR’s ring buffer larger didn’t solve the problem reliably, and we had to revert to making the ESP wait after every message it sent for the AVR to acknowledge it, and give the go-ahead for additional messages to be sent. There is definitely scope here for performance improvements!
This then triggered the second place. By default, the esp-link expects to be able to transmit all the data received via TCP to the UART in a single method invocation. However, by introducing flow control from the AVR, this could end up taking significant time, enough to trigger the ESP8266 watchdog! As a result, it was necessary to save the data received from the TCP connection to a local buffer on the ESP, so that it could be transmitted as and when the AVR was ready to receive it. This then required implementation of a periodic task that checked to see if the AVR was “receptive”, and then transmitted the next message from the local buffer.
Unfortunately, the act of returning from the “receive TCP data” method allows the TCP sender to transmit more data! If left uncontrolled, the sender would overwhelm the ESP as well. This made it necessary to add TCP flow control, using espconn_recv_hold and espconn_recv_unhold calls. It also necessitated allocation of a 5 packet buffer per connection on the ESP, as TCP can have several packets “in flight” simultaneously, that the ESP is obliged to accept.
Finally, the victim may want to transmit data faster than the AVR can send it to the ESP. For the Generic HID interface, this was achieved by setting a flag in a “control byte”, indicating that the AVR could not receive any more data, and that the sender should pause.
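The stop-and-wait handshake described above can be modelled in a few lines. This is an illustrative Python model of the scheme, not the actual firmware (which runs in C on both microcontrollers); the one-slot queue stands in for the AVR's single-byte receive register:

```python
import queue
import threading

def stop_and_wait_sender(messages, uart_out, ack_in):
    """Model of the ESP side: after each message, block until the slow
    receiver acknowledges it before sending the next one."""
    for msg in messages:
        uart_out.put(msg)
        ack_in.get()              # wait for the AVR's go-ahead

def slow_receiver(count, uart_in, ack_out, received):
    """Model of the AVR side: drain one message at a time, then ack."""
    for _ in range(count):
        received.append(uart_in.get())
        ack_out.put(b"ACK")

uart = queue.Queue(maxsize=1)     # the AVR's tiny receive buffer
acks = queue.Queue()
got = []
msgs = [b"msg%d" % i for i in range(5)]
t = threading.Thread(target=slow_receiver, args=(len(msgs), uart, acks, got))
t.start()
stop_and_wait_sender(msgs, uart, acks)
t.join()
assert got == msgs                # nothing overruns the one-slot buffer
```

The cost is throughput: one round trip per message, which is exactly the "scope for performance improvements" mentioned above.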
All this makes you appreciate the problems long solved by the folks that gave us the IP protocol, and those that have implemented it since then, that we no longer have to solve when working with high-level applications! Unfortunately, when working low-level, we relearn the problems of old!
Following various code examples found scattered around the internet resulted in use of a device file name looking like “\??\HID\VID_03EB&PID_2066&MI_00&Col01\9&32bfc41&0&0000\{4d1e55b2-f16f-11cf-88cb-001111000030}”. This worked fine on Windows 7, but failed on Windows XP. Windows devs will likely be slapping their heads, but it took searching through the XP registry to realise that the correct device prefix should be “\\?\”, not “\??\”. Funny that the latter worked on Windows 7, though!
If you check the powershell, you’ll notice a couple of .Replace() and char[] calls, which may look odd in a payload that needs to be as lean as possible to be typed fast. This is due to differences between international keyboard layouts that we ran into, which result in different output for the same keycode. The obvious solution would be to use powershell.exe’s -EncodedCommand flag and just base64 the payload, but given powershell expects base64 of the double-width (UTF-16LE) string, this considerably inflates the size of the typed payload. An alternative might be to use a keyboard mapping in either the VNC client or the VNC server, but that makes the payload less generic, and assumes the attacker knows the victim’s keyboard layout.
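To see the size cost concretely: PowerShell's encoded-command form is base64 over the UTF-16LE bytes of the script. A quick illustration (the payload string here is just an example, not our real payload):

```python
import base64

cmd = 'Write-Output "hi"'   # stand-in payload
# UTF-16LE doubles the byte count, then base64 adds roughly a third more,
# so the typed form ends up well over twice the plaintext length.
encoded = base64.b64encode(cmd.encode("utf-16-le")).decode("ascii")
print(len(cmd), len(encoded))
```

Every extra character is another keystroke the fake keyboard has to type, so keeping the raw payload lean wins.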
Using Generic HID interfaces for the binary pipe works fine on Windows, but will fail on Linux or OS X, as unprivileged users are not granted access to the device by default as they are on Windows. An alternative implementation would be required for these operating systems.
Possible devices yet to be properly investigated include:
Wow, thanks for making it this far; we were certain you’d drop off by flow control. The code is up at https://github.com/sensepost/USaBUSe if you’re looking to grab a copy. The release has pre-compiled firmware binaries you can flash by following the instructions in the README. If you’d like to build your own firmware, be warned that setting up the toolchain requires some patience. We’ve added some documentation to the project, and the sub-projects have a fair bit too.
As always, everything has been released under an open source license. This includes the hardware designs.
In our 2014 Defcon talk where we released the mana toolkit, we pointed out how stupidly easy it was to get a root CA installed on both iOS and Android devices with no hacking required. Two years later, not much has changed in the iOS world, except for a single extra unclear prompt.
To prompt a user to install a malicious root CA on an iOS device, all you need do is serve a self-signed certificate via HTTP (it has to be self-signed, otherwise it won’t install as a root CA). You just need to serve the file, you don’t even need the right mime-type. In my world, this is most easily done during the captive portal check (made up of two requests to http://captive.apple.com/hotspot-detect.html) when a device first connects to a wifi network, with the bonus being it’s done via the WebSheet and pops up over the user interface. To make things a little more tempting, you can name it something like “Free Wifi Autoconfiguration”. If the user has no pass{code|word} setup (our most likely target group) then the flow looks like this:
1 The prompt to install the self-signed malicious certificate. The red “Not Verified” is the closest sign of danger a non-technical user will see.
1.1 What you see on clicking “More Details”
2 The new warning about how it will be added to your trusted certificate store. You’ll notice that, for the average user, this doesn’t say “why” this is bad. Ideally, something like “This will allow someone to intercept and modify much of your encrypted communication.” should be added.
Even to a technical user, it doesn’t make it clear this is being added as a new trusted root, it just says something about it being trusted.
We also see a warning about the profile being unverified, which we’ll make go away later in this post.
3 The second “Install” prompt. If the user has a passcode or password enabled, they will be prompted for it before this prompt.
4 The certificate is now installed.
That’s a simple three-step process for the user, and other than some text saying “not verified”, it doesn’t really give the user any idea that something bad just happened. At this point, MitM attacks on encrypted traffic are feasible, all for the cost of serving a single cert file.
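The serving side really is that minimal. Here is a hypothetical Python sketch of a catch-all portal that answers every request, including the captive portal probe, with the cert bytes. The payload is a placeholder, and the real mana setup uses its own captive portal scripts, not this:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder -- in practice, the bytes of your self-signed CA cert
# (or, later in this post, the signed configuration profile).
PAYLOAD = b"-----BEGIN CERTIFICATE-----\n...example...\n-----END CERTIFICATE-----\n"

class CaptivePortalHandler(BaseHTTPRequestHandler):
    """Answer every GET -- including /hotspot-detect.html, iOS's captive
    portal probe -- with the payload, so the WebSheet pops the prompt."""
    def do_GET(self):
        self.send_response(200)           # no special mime-type needed
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass                              # keep the portal quiet

def serve(host="0.0.0.0", port=80):
    HTTPServer((host, port), CaptivePortalHandler).serve_forever()
```

Because the handler ignores the path entirely, both requests of the captive portal check hit the same payload.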
But, let’s take it one step further and see if we can get rid of that red “Not Verified” warning and maybe do a bit more than just add a root CA. In steps Apple’s Configuration Profiles.
First, we put together a simple configuration profile using Apple Configurator 2 or the older iPhone Configuration Utility. Both of these generate a simple plist file. In this configuration, I add the same self-signed certificate as a credential and export the configuration to a .mobileconfig, making sure to leave it unsigned and unencrypted. Next, I sign the file with a valid code signing certificate using openssl (as detailed here). Make sure you include the full certificate chain, as the device won’t download and follow the chain itself. Just to show the profile doing something else, I also add a hidden network that devices will probe for (and get responded to by mana). Finally, we update our captive portal to serve the .mobileconfig file instead of the bare certificate. Again, you just serve it; no fancy headers or mime-types required. This is what a user sees.
1 What a user is prompted with on connection. Gone is the red “Not Verified”, now replaced by a green “Verified” resplendent with a tick, due to the profile having been signed, even though it contains a malicious, unverified root CA. We also get to add some explanatory text to make the user feel more comfortable.
I’ve redacted the signing certificate’s details because I don’t want someone getting it revoked :)
1.1 What you’d see after clicking “More Details”. You’ll notice the wifi network is included here.
2 The same warning as before about how a certificate will be added to your trusted root store, but gone is the “This profile is unverified” warning. Again, to a non-technical user, this really doesn’t sound scary. Even technical users may still be fooled, since it doesn’t mention that it will be added as a trusted root CA.
3 The second install prompt. Again, if you have a passcode you’d be prompted to enter that before this.
4 Voila, the profile is installed, and you can MitM away.
Now, this is a malicious profile that’s been installed. You can configure nearly every aspect of an iOS device with a configuration profile, even going so far as to set up a remote MDM server for pushing new profiles down later, as well as doing things like preventing a user from removing it. Of course, additional configs come with additional warnings.
I hope this has demonstrated how easy it can be to push a malicious root CA to an iOS device, making this pre-requirement of the iMessage and similar attacks far from implausible. For the level of compromise it provides, particularly in the case of a configuration profile, the “overhead” on the attacker is ridiculously low.
Additionally, I really wish Apple would make it clearer to the user what’s going on, by explaining the implications of their choice in big, red, obvious writing, as well as encouraging a secure default choice. For example, when the same is done to an Android device, Google shows a persistent warning in the notifications telling the user that their communications may be intercepted. Of course, none of this will prevent all users from clicking through warnings, but the mobile OSes need to follow the lead of the browsers and encourage users to make good security choices “by default” (think of the big red “this site is insecure” warnings you get for an invalid cert these days), rather than relying on users to make good choices despite the OS’s encouragement.
I’ve not really been charitable to the interpretation of “requires a root CA”, but it was really just the hook that got me to write this entry. Apple’s fix to the iMessage flaw was to implement certificate pinning on some iMessage requests. This was done around December 2015 according to the Johns Hopkins’ paper. Certificate pinning effectively blocks a MitM of iMessage traffic by forcing either a specific trust chain or a specific certificate to be used. Our malicious certificate will not be part of that chain, nor will the certs we sign with it match the specific cert. That’s why apps such as Twitter and Facebook are also not vulnerable to MitM from this. However, older iOS versions (pre-9) are still vulnerable according to the JHU paper. Thus, people claiming this attack requires a root CA’s private key are correct only insofar as they mean on iOS 9, which doesn’t make them wrong. The attack against iMessage is much harder than “any root CA” because you’d need access to a specific, built-in root CA’s private key, or one of Apple’s. In short, the technique described above will not let you perform the JHU attack against iMessage on updated phones.
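For intuition, pinning reduces to comparing what the server presents against a fingerprint baked into the app at build time. An illustrative sketch (the "cert" bytes are placeholders, and real implementations often pin the public key rather than the whole certificate):

```python
import hashlib

# Fingerprint of the one certificate the app will accept, computed at
# build time from the real server cert's DER bytes (placeholder here).
PINNED_SHA256 = hashlib.sha256(b"server-cert-der-bytes").hexdigest()

def pin_ok(presented_der: bytes) -> bool:
    """Accept the TLS peer only if its certificate hashes to the pinned
    fingerprint. A rogue root CA can mint chains the OS trusts, but it
    cannot reproduce this exact certificate."""
    return hashlib.sha256(presented_der).hexdigest() == PINNED_SHA256

assert pin_ok(b"server-cert-der-bytes")
assert not pin_ok(b"cert-signed-by-malicious-root")
```

This is why installing our root CA, despite being trusted system-wide, buys nothing against a pinned connection.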
The functionality is exactly the same (although the probe response is a little more aggressive), and you can grab either the patch or the full tarball here: