mitmproxy and pathod 0.11

I'm happy to announce that we've just released v0.11 of both mitmproxy and pathod. This release features a huge revamp of mitmproxy's internals and a long list of important features. Pathod has much improved SSL support and fuzzing.

Our thanks to the many testers and contributors that helped get this out the door. Please lodge bug reports and feature requests here.

MITMPROXY CHANGELOG

  • Performance improvements for mitmproxy console
  • SOCKS5 proxy mode allows mitmproxy to act as a SOCKS5 proxy server
  • Data streaming for response bodies exceeding a threshold (bradpeabody@gmail.com)
  • Ignore hosts or IP addresses, forwarding both HTTP and HTTPS traffic untouched
  • Finer-grained control of traffic replay, including options to ignore contents or parameters when matching flows (marcelo.glezer@gmail.com)
  • Pass arguments to inline scripts
  • Configurable size limit on HTTP request and response bodies
  • Per-domain specification of interception certificates and keys (see --cert option)
  • Certificate forwarding, relaying upstream SSL certificates verbatim (see --cert-forward)
  • Search and highlighting for HTTP request and response bodies in mitmproxy console (pedro@worcel.com)
  • Transparent proxy support on Windows
  • Improved error messages and logging
  • Support for FreeBSD in transparent mode, using pf (zbrdge@gmail.com)
  • Content view mode for WBXML (davidshaw835@air-watch.com)
  • Better documentation, with a new section on proxy modes
  • Generic TCP proxy mode
  • Countless bugfixes and other small improvements

PATHOD CHANGELOG

  • Hugely improved SSL support, including dynamic generation of certificates using the mitmproxy cacert
  • pathoc -S dumps information on the remote SSL certificate chain
  • Big improvements to fuzzing, including random spec selection and memoization to avoid repeating randomly generated patterns
  • Reflected patterns, allowing you to embed a pathod server response specification in a pathoc request, resolving both on client side. This makes fuzzing proxies and other intermediate systems much better.

A few weeks ago, I posted that I had hacked up a version of mitmproxy that exploited CVE-2014-1266, giving unrestricted access to nearly all HTTPS traffic on affected IOS and OSX devices. I chose not to release working code at the time, but a number of POCs have been floating about publicly almost since the issue was first discovered. So, the time has come to publish - as of yesterday, mitmproxy's master branch supports #gotofail.

To see the exploit in action, invoke mitmproxy as follows:

mitmproxy --ciphers="DHE-RSA-AES256-SHA" --cert-forward 

After configuring your device proxy, you should see something like this screenshot, which shows off interception of miscellaneous iTunes traffic:

Note that the client device here has no mitmproxy CA certificate installed, and we get circumvention of certificate pinning "for free".

Two new options make the magic work. The --ciphers option specifies which SSL ciphers we should expose to connecting clients. In this case, we force the client to use a DHE cipher, which is required to trigger the issue. The --cert-forward option tells mitmproxy to pass upstream SSL certificates down to the client unmodified. Usually we'd expect this to fail, since the upstream certs won't match mitmproxy's private key. In this case #gotofail means the client fails to properly execute the check, letting us pass certificates through to the client verbatim as if we owned them.

There's one additional wrinkle that mitmproxy smooths over - before we can get the mismatching certificate and key to the client, OpenSSL itself has to be coaxed into accepting them. The first version of my exploit involved a patch to OpenSSL to remove the library's own consistency check, but this is inconvenient. Luckily it turns out that we can munge an obscure flag in the RSA data-structures to circumvent this, which allows us to exploit #gotofail in pure Python.

The moment I got this exploit working, I marched upstairs and confiscated my wife's un-updated iPhone 5 to add it to my pool of test devices (never fear - it's been replaced with a nice new 5S). Devices running IOS of the right vintage have suddenly become the gold standard for analysis and pen testing. This beautiful vulnerability lets us circumvent SSL effortlessly, completely sidestepping certificate pinning for all the applications I've tried, without any cumbersome and invasive interference with the device. Combine this with the fact that these same devices also have an un-tethered jailbreak, and I think it's unlikely that we'll ever have an analysis platform this nice again. So, stockpile your pre-7.0.6 IOS devices now, and intercept all the things.

This post is a quick recap of work I've been discussing on Twitter in the last few hours. I've just finished putting together a version of mitmproxy that takes advantage of CVE-2014-1266, Apple's critical SSL/TLS bug. We knew in theory that the issue should give access to all SSL traffic using Apple's broken implementation - I can now report that this is also true in practice.

I've confirmed full transparent interception of HTTPS traffic on both IOS (prior to 7.0.6) and OSX Mavericks. Nearly all encrypted traffic, including usernames, passwords, and even Apple app updates can be captured. This includes:

  • App store and software update traffic
  • iCloud data, including KeyChain enrollment and updates
  • Data from the Calendar and Reminders
  • Find My Mac updates
  • Traffic for applications that use certificate pinning, like Twitter

It's difficult to over-state the seriousness of this issue. With a tool like mitmproxy in the right position, an attacker can intercept, view and modify nearly all sensitive traffic. This extends to the software update mechanism itself, which uses HTTPS for deployment.

At the time of writing, Apple still doesn't have a fix deployed for OSX. It took less than a day to get the patched version of mitmproxy and its supporting libraries up and running. I won't be releasing my patches until well after Apple's pending update, but it's safe to assume that this is now being exploited in the wild. Of course, intelligence agencies have no doubt been on top of this for some time - perhaps some of the inflammatory Sochi security horror stories were plausible after all.

mitmproxy and pathod 0.10

I've just released v0.10 of both mitmproxy and pathod. This is chiefly a bugfix release, with a few nice additional features to sweeten the pot.

Perhaps the most visible change has been a huge improvement in the recommended method for installing the mitmproxy certificates. Certs are now served straight from the web application hosted in mitmproxy, which means that in most cases cert installation is as simple as typing the mitmproxy URL into the device's browser. See the docs for more.

In other, minor news - I see that the mitmproxy project has just passed 2000 stars on GitHub. Between PyPi and the files we serve from mitmproxy.org, the project has also seen nearly 100k downloads in the last year (after removing obvious bots). I know, I know - figures like these don't mean much, but it's still nice to see that people are using and enjoying mitmproxy.

CHANGELOG

  • Support for multiple scripts and multiple script arguments
  • Easy certificate install through the in-proxy web app, which is now enabled by default
  • Forward proxy mode, that forwards proxy requests to an upstream HTTP server
  • Reverse proxy now works with SSL
  • Search within a request/response using the "/" and "n" shortcut keys
  • A view that beautifies CSS files if cssutils is available
  • Many bug fixes, documentation improvements, and more.

Here's a riff on Malcolm Gladwell's rule of thumb about mastery: you don't really know a programming language until you've written 10,000 lines of production-quality code in it. Like the original, this is a generalization that is undoubtedly false in many cases - still, it broadly matches my intuition for most languages and most programmers[1]. At the beginning of this year, I wrote a sniffy post about Go when I was about 20% of the way to knowing the language by this measure. Today's post is an update from further along the curve - about 80% - following a recent set of adventures that included entirely rewriting choir.io's core dispatcher in Go. My opinion of Go has changed significantly in the meantime. Despite my initial exasperation, I found that the experience of actually writing Go was not unpleasant. The shallow issues became less annoying over time (perhaps just due to habituation), and the deep issues turned out to be less problematic in practice than in theory. Most of all, though, I found Go was just a fun and productive language to work in. Go has colonized more and more use cases for me, to the point where it is now seriously eroding my use of both Python and C.

After my rather slow Road to Damascus experience, I noticed something odd: I found it difficult to explain why Go worked so well in practice. Sure, Go has a triad of really smashing ideas (interfaces, channels and goroutines), but my list of warts and annoyances is long enough that it's not clear on paper that the upsides outweigh the downsides. So, my experience of actually cutting code in Go was at odds with my rational analysis of the language, which bugged me. I've thought about this a lot over the last few months, and eventually came up with an explanation that sounds like nonsense at first sight: Go's weaknesses are also its strengths. In particular, many design choices that seem to reduce coherence and maintainability at first sight actually combine to give the language a practical character that's very usable and compelling. Let's see if I can convince you that this isn't as crazy as it sounds.

Maps and magic

Let's pretend that we're the designers of Go, and see if we can follow the thinking that went into a seemingly simple part of the language - the value retrieval syntax for maps. We begin with the simplest possible case - direct, obvious, and familiar from a number of other languages:

v := mymap["foo"]

It would be nice if we could keep it this simple, but there's a complication - what if "foo" doesn't exist in the map? The fact that Go doesn't have exceptions limits the possibilities. We can discard some gross options out of hand - for instance, making this a runtime error or returning a magic value flagging non-existence are both pretty horrible. A more plausible route is to pass an existence flag back as a second return value:

v, ok := mymap["foo"]

So far, so logical, and if consistency was the primary goal, we would stop here. However, having two return arguments would make many common patterns of use inconvenient. You would constantly be discarding the ok flag in situations where it wasn't needed. Another repercussion is that you couldn't directly use the results in an if clause. Instead of a clean phrasing like this (relying on the zero value returned by default):

if mymap["foo"] {
    // Do something (mymap must hold bool values for this to compile)
}

... you would have to do this:

if _, ok := mymap["foo"]; ok {
    // Do something
}

Ugh. What we really want is to get the best of both worlds: the ease of the first signature, plus the flexibility of the second. In fact, Go does exactly that, in a surprising way: it discards some basic conceptual constraints, and makes the data returned by the map accessor depend on how many variables it's assigned to. When it's assigned to one variable, it just returns the value. When it's assigned to two variables, it also returns an existence flag.
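To make the two forms concrete, here is a minimal sketch. The helper name describe and the sample map are made up for illustration; the point is only that the same accessor expression yields one result or two depending on the assignment.

```go
package main

import "fmt"

// describe (a hypothetical helper) looks a key up through both forms
// of the map accessor and reports what each one returned.
func describe(m map[string]int, key string) string {
	v := m[key]     // one-variable form: zero value if the key is absent
	_, ok := m[key] // two-variable form: ok reports whether the key exists
	return fmt.Sprintf("%s: value=%d present=%t", key, v, ok)
}

func main() {
	mymap := map[string]int{"foo": 0}
	fmt.Println(describe(mymap, "foo")) // key exists, but holds the zero value
	fmt.Println(describe(mymap, "bar")) // key is absent
}
```

Both lookups yield the value 0, so only the existence flag tells a stored zero apart from a missing key.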

Compare this with Python. The dictionary access syntax is identical:

v = mymap["foo"]

Python does have exceptions, so non-existence is signaled through a KeyError, and the dictionary interface includes a get method that allows the user to specify a default return when this is too cumbersome. This is certainly consistent on the surface, but there's also a deeper structure that helps the user understand what's going on. The square bracket accessor syntax is just syntactic sugar, because the call above is equivalent to this:

v = mymap.__getitem__("foo")

In a sense, then, the value access is just a method call. The coder can write a dictionary of their own that acts just like a built-in dictionary[2], and can also build a clear mental model of what's going on underneath. Python dictionaries are conceptually built up from more primitive language elements, where Go maps are designed down from concrete use cases.

Range: a compendium of use cases

An even stranger beast is the range clause of Go's for loops. Like map accessors, range will return either one value or two, depending on the number of variables assigned to. What's particularly revealing about range is the way these results differ depending on the data type being ranged over. Consider this piece of code, for example:

for x, y := range v {
}

To figure out what this does, we need to know the type of v, and then consult a table like this[3]:

Range expression    1st Value          2nd Value
array or slice      index i            a[i]
map                 key k              m[k]
string              index i of rune    rune int
channel             element            error

What range does for arrays and maps seems consistent and not particularly surprising. Things get a tad odd with channels. A second variable arguably doesn't make much sense when ranging over a channel, so trying to do this results in a compile time error. Not terribly consistent, but logical.
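The channel case can be sketched as follows. The function name drain and the buffered channel are illustrative choices, not anything taken from the Go documentation.

```go
package main

import "fmt"

// drain (a hypothetical helper) sums everything received on ch
// until the channel is closed.
func drain(ch <-chan int) int {
	total := 0
	for v := range ch { // a single variable: the received element
		total += v
	}
	return total
}

func main() {
	ch := make(chan int, 3)
	for i := 1; i <= 3; i++ {
		ch <- i
	}
	close(ch) // range stops once the channel is closed and drained
	fmt.Println(drain(ch))
	// Writing "for i, v := range ch" instead is rejected by the compiler.
}
```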

Weirder still is range over strings. When operating on a string, range returns runes (Unicode code points) not bytes. So, this code:

s := "a\u00fcb"
for a, b := range s {
    fmt.Println(a, b)
}

Prints this:

0 97
1 252
3 98

Notice the jump from 1 to 3 in the index, because the rune at offset 1 is two bytes wide in UTF-8. And look what happens when we now retrieve the value at that offset from the string. This:

fmt.Println(s[1])

Prints this:

195

What gives? At first glance, it's reasonable to expect this to print 252, as returned by range. That's wrong, though, because string access by index operates on bytes, so what we're given is the first byte of the UTF-8 encoding of the rune. This is bound to cause subtle bugs. Code that works perfectly on ASCII text simply due to the fact that UTF-8 encodes these in a single byte will fail mysteriously as soon as non-ASCII characters appear.
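One way to stay out of trouble is to be explicit about whether you want bytes or runes. The sketch below uses the standard unicode/utf8 package; runeAt is a made-up helper name, and this is one possible approach rather than the only one.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAt (a hypothetical helper) decodes the rune that starts at
// byte offset i of s.
func runeAt(s string, i int) rune {
	r, _ := utf8.DecodeRuneInString(s[i:])
	return r
}

func main() {
	s := "a\u00fcb"
	fmt.Println(len(s))                    // 4: length in bytes
	fmt.Println(utf8.RuneCountInString(s)) // 3: length in runes
	fmt.Println(s[1])                      // 195: first byte of the encoded rune
	fmt.Println(runeAt(s, 1))              // 252: the rune starting at byte 1
	fmt.Println([]rune(s)[1])              // 252: index by rune position instead
}
```

Converting to []rune copies the string, so it is a convenience for small inputs rather than something to do in a hot loop.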

My argument here is that range is a very clear example of design directly from concrete use cases down, with little concern for consistency. In fact, the table of range return values above is really just a compendium of use cases: at each point the result is simply the one that is most directly useful. So, it makes total sense that ranging over strings returns runes. In fact, doing anything else would arguably be incorrect. What's characteristic here is that no attempt was made to reconcile this interface with the core of the language. It serves the use case well, but feels jarring.

Arrays are values, maps are references

One final example along these lines. A core irregularity at the heart of Go is that arrays are values, while maps are references. So, this code will modify the s variable:

package main

import "fmt"

// mod receives a map, which refers to the caller's underlying data.
func mod(x map[int]int) {
    x[0] = 2
}

func main() {
    s := map[int]int{}
    mod(s)
    fmt.Println(s)
}

And print:

map[0:2]

While this code won't:

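package main

import "fmt"

// mod receives a copy of the array, so the caller's value is untouched.
func mod(x [1]int) {
    x[0] = 2
}

func main() {
    s := [1]int{}
    mod(s)
    fmt.Println(s)
}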

And will print:

[0]

This is undoubtedly inconsistent, but it turns out not to be an issue in practice, mostly because slices are references, and are passed around much more frequently than arrays. This issue has surprised enough people to make it into the Go FAQ, where the justification is as follows:

There's a lot of history on that topic. Early on, maps and channels were syntactically pointers and it was impossible to declare or use a non-pointer instance. Also, we struggled with how arrays should work. Eventually we decided that the strict separation of pointers and values made the language harder to use. This change added some regrettable complexity to the language but had a large effect on usability: Go became a more productive, comfortable language when it was introduced.

This is not exactly the clearest explanation for a technical decision I've ever read, so allow me to paraphrase: "Things evolved this way for pragmatic reasons, and consistency was never important enough to force a reconciliation".
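The remark above that slices behave like references can be checked with a small sketch that puts the three cases side by side. The function names are made up for illustration.

```go
package main

import "fmt"

// modSlice writes through the slice to the backing array it shares
// with the caller.
func modSlice(x []int) { x[0] = 2 }

// modArray receives a copy, so the caller's array is untouched.
func modArray(x [1]int) { x[0] = 2 }

// modArrayPtr receives a pointer, so the caller's array is modified.
func modArrayPtr(x *[1]int) { x[0] = 2 }

func main() {
	s := []int{0}
	modSlice(s)
	fmt.Println(s) // [2]

	a := [1]int{}
	modArray(a)
	fmt.Println(a) // [0]

	modArrayPtr(&a)
	fmt.Println(a) // [2]
}
```

Passing a pointer is the explicit way to get reference behaviour for an array; in most code a slice does the same job with less ceremony.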

The G Word

Now we get to that perpetual bugbear of Go critiques: the lack of generics. This, I think, is the deepest example of the Go designers' willingness to sacrifice coherence for pragmatism. One gets the feeling that the Go devs are a tad weary of this argument by now, but the issue is substantive and worth facing squarely. The crux of the matter is this: Go's built-in container types are super special. They can be parameterized with the type of their contained values in a way that user-written data structures can't be.

The supported way to do generic data structures is to use empty interfaces. Let's look at an example of how this works in practice. First, here is a simple use of the built-in slice type.

l := make([]string, 1)
l[0] = "foo"
str := l[0]

In the first line we initialize the slice with the element type string. We then insert a value, and in the final line, we retrieve it. At this point, str has type string and is ready to use. The user-written analogue of this might be a modest data structure with put and get methods. We can define this using interfaces like so:

type gtype struct {
    data interface{}
}
func (t *gtype) put(v interface{}) {
    t.data = v
}
func (t *gtype) get() interface{} {
    return t.data
}

To use this structure, we would say:

v := gtype{}
v.put("foo")
str := v.get().(string)

We can assign a string to a variable with the empty interface type without doing anything special, so put is simple. However, we need to use a type assertion on the way out, otherwise the str variable will have type interface{}, which is probably not what we want.

There are a number of issues here. It's cosmetically bothersome that we have to place the burden of type assertion on the caller of our data structure, making the interface just a little bit less nice to use. But the problems extend beyond syntactic inconvenience - there's a substantive difference between these two ways of doing things. Trying to insert a value of the wrong type into the built-in slice causes a compile-time error, but the type assertion acts at run-time and causes a panic on failure. The empty-interface paradigm sidesteps Go's compile time type checking, negating any benefit we may have received from it.
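The run-time nature of the check is easy to demonstrate. The sketch below reuses the gtype structure and adds a hypothetical getString helper that uses the two-value form of the type assertion, which reports failure through a flag instead of panicking. It softens the failure mode, but the check still happens at run-time.

```go
package main

import "fmt"

type gtype struct {
	data interface{}
}

func (t *gtype) put(v interface{}) { t.data = v }
func (t *gtype) get() interface{}  { return t.data }

// getString (a hypothetical helper) returns the stored value as a
// string, with ok set to false if the value has some other type.
func getString(t *gtype) (string, bool) {
	s, ok := t.get().(string)
	return s, ok
}

func main() {
	v := gtype{}
	v.put(42) // compiles: any value satisfies interface{}
	if _, ok := getString(&v); !ok {
		fmt.Println("not a string")
	}

	v.put("foo")
	s, ok := getString(&v)
	fmt.Println(s, ok)
}
```

Note that the wrong-typed put is accepted by the compiler; the mistake only surfaces when the value is read back.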

The biggest issue for me, though, is the conceptual inconsistency. This is something that's difficult to put into words, so here's a picture:

The fact that the built-in containers magically do useful things that user-written code can't irks me. It hasn't become less jarring over time, and still feels like a bit of grit in my eye that I can't get rid of. I might be an extreme case, but this is an aesthetic instinct that I think is shared by many programmers, and would have convinced many language designers to approach the problem differently.

The extent to which Go's lack of generics is a critical problem, however, is not the point here. The meat of the matter is why this design decision was taken, and what it reveals about the character of Go. Here's how the lack of generics is justified by the Go developers:

Many proposals for generics-like features have been mooted both publicly and internally, but as yet we haven't found a proposal that is consistent with the rest of the language. We think that one of Go's key strengths is its simplicity, so we are wary of introducing new features that might make the language more difficult to understand.

Instead of creating the atomic elements needed to support generic data structures then adding a suite of them to the standard library, the Go team went the other way. There was a concrete use case for good data structures, and so they were added. Attempting a deep reconciliation with the rest of the language was a secondary requirement that was so unimportant that it fell by the wayside for Go 1.x.

A Pragmatic Beauty

Let's over-simplify for a moment and divide languages into two extreme camps. On the one hand, you have languages that are highly consistent, with most higher order functionality deriving from the atomic elements of the language. In this camp, we can find languages like Lisp. On the other hand are languages that are shamelessly eager to please. They tend to grow organically, sprouting syntax as needed to solve specific pragmatic problems. As a consequence, they tend to be large, syntactically diverse, not terribly coherent, and occasionally even unparseable. In this camp, we find languages like Perl. It's tempting to think that there exists a language somewhere in the infinite multiverse of possibilities that unites perfect consistency and perfect usability, but if there is, we haven't found it. The reality is that all languages are a compromise, and that balancing these two forces against each other is really what makes language design so hard. Placing too much value on consistency constrains the human concessions we can make for mundane use cases. Making too many concessions results in a language that lacks coherence.

Like many programmers, I instinctively prefer purity and consistency and distrust "magic". In fact, I've never found a language with a strongly pragmatic bent that I really liked. Until now, that is. Because there's one thing I'm pretty clear on: Go is on the Perl end of this language design spectrum. It's designed firmly from concrete use cases down, and shows its willingness to sacrifice consistency for practicality again and again. The effects of this design philosophy permeate the language. This, then, is the source of my initial dissatisfaction with Go: I'm pre-disposed to dislike many of its core design decisions.

Why, then, has the language grown on me over time? Well, I've gradually become convinced that practically-motivated flaws like the ones I list in this post add up to create Go's unexpected nimbleness. There's a weird sort of alchemy going on here, because I think any one of these decisions in isolation makes Go a worse language (even if only slightly). Together, however, they jolt Go out of a local maximum many procedural languages are stuck in, and take it somewhere better. Look again at each of the cases above, and imagine what the cumulative effect on Go would have been if the consistent choice had been made each time. The language would have more syntax, more core concepts to deal with, and be more verbose to write. Once you reason through the repercussions, you find that the result would have been a worse language overall. It's clear that Go is not the way it is because its designers didn't know better, or didn't care. Go is the result of a conscious pragmatism that is deep and audacious. Starting with this philosophy, but still managing to keep the language small and taut, with almost nothing dispensable or extraneous took great discipline and insight, and is a remarkable achievement.

So, despite its flaws, Go remains graceful. It just took me a while to appreciate it, because I expected the grace of a ballet dancer, but found the grace of a battered but experienced bar-room brawler.

--

Edited to remove some inaccuracies about channels.


  1. I don't mean mundane details like the syntax and core concepts of a language. In the case of Go, you can get a handle on these in an hour by reading the language specification. 

  2. Pedant hedge: yes, the illusion isn't perfect, and there are in fact subtle ways in which Python dictionaries are not just objects like any other. 

  3. Simplified from here 

I've just released v0.9.2 of both mitmproxy and pathod. This is a bugfix release, chiefly to address two crashing issues affecting mitmproxy when relaying SSL traffic. A range of other fixes and improvements are also included - if you use mitmproxy, you should upgrade.

CHANGELOG

  • Improvements to the mitmproxywrapper.py helper script for OSX.
  • Don't take minor version into account when checking for serialized file compatibility.
  • Fix a bug causing resource exhaustion under some circumstances for SSL connections.
  • Revamp the way we store interception certificates. We used to store these on disk; they're now in-memory. This fixes a race condition related to cert handling, and improves compatibility with Windows, where the rules governing permitted file names are weird, resulting in errors for some valid IDNA-encoded names.
  • Display transfer rates for responses in the flow list.
  • Many other small bugfixes and improvements.

Introducing choir.io

Today, I'm raising the veil (slightly) on a new project - choir.io. The most succinct description of choir.io is that it is a service that turns events into sound. Why would you want to do that? Well, I believe that there are compelling reasons to make sound part of your monitoring stack. Let's see if I can convince you.

The soundscape

When I walk into my study every morning, I'm surrounded by a rich, subtle soundscape that exists just beneath conscious perception. My air-conditioner, computers and monitors all emit hums and purrs. I can "tune in" to these if I focus, but they usually only draw my attention when something changes. When the power goes out there is a deathly silence; when a CPU fan noise changes pitch or texture, it bothers me immediately.

Layered over this background are more obtrusive sounds, closer to the threshold of awareness - the clacking of keyboards, faint noises of my family getting ready for their day upstairs, the front door opening and closing. Whether or not I pay attention to these is somewhat context dependent. Am I waiting, for instance, for my wife and kids to start trooping down the stairs so I can join them for my son's swimming lesson? If I am, I listen out for those sounds specifically. I get an enormous amount of information about my world from these more discrete, event-related noises.

Finally, there are the really obtrusive sounds, things that immediately get my attention. This might be someone saying my name, my phone ringing, a knock at the door, or a smoke alarm. I'm very aware of these, and they usually signal something I have to deal with immediately.

These layers of more and less obtrusive sounds form a soundscape that is ever-present, and utterly necessary in our day-to-day lives. Notice how effortless this process of extracting meaning from our ambient sounds is. Our minds process this information stream without any mental exertion, filter out what we don't need to notice, and draw our attention to what we do. There's a lot of cognitive research (that I might delve into in future posts) that shows that our brains and auditory systems are specifically designed to make sense of the world in this way.

We have nothing like this rich texture of ambient awareness for the technology that surrounds us. Our monitoring mechanisms seem to be stuck at the ends of the intrusiveness spectrum. At one end, we have email notifications that demand our attention until we start to ignore them or silence them with a filter. At the other end we have passive status dashboards that require us to remember to switch context and visually consult a different interface. Choir.io doesn't aim to supplant either of these, but tries to fill in the blank portion of the awareness spectrum between them.

When I sit at my desk, I can hear our server architecture humming away. There's the subtle pitter-patter of hits to various webservers, the occasional clack of an SSH login. Occasionally there is a chime when @alexdong pushes to GitHub, followed shortly by the celebratory cheer of a server deploy. When I hear the jarring note of a 500 server error, I switch context to view logs or a dashboard, but otherwise my focus stays with my editor window. Choir is young, but it's already become an indispensable part of my life.

The demo

To give you an idea of what we're trying to achieve, we've put together a demo feed that consumes all public GitHub events. The demo is tuned to be more intrusive than it would be in production, because we want someone listening casually for a few minutes to be able to hear sounds that cover the whole intrusiveness spectrum. Have a listen here:

Github Realtime Activity

Challenges and next steps

There are a number of key questions that we'd like to answer with the help of our intrepid early adopters. First among these is the question of soundscape design. What makes a good sound pack? What is the right mix of intrusive and non-intrusive sounds? How do we construct soundscapes that blend into the background like natural sounds do? Another set of questions surrounds the API and integration. What is the right blend of simplicity and power in the API? Which services should we integrate with next?

There are some obvious next steps in the works. We recognize that sound pack design is a deep problem with subjective solutions. So, letting users assemble, edit and eventually share their own sound packs is high on our list of priorities. Free-standing Choir.io player apps for Windows and OSX will also be on the way soon, so you won't need to remember to keep a browser tab open. Technical improvements to the API that are on the way include UDP and SSL support.

Choir is trying to do something new, and we want as much feedback as early in the process as possible. So, we've decided to start sending out invites today, even though Choir is far from the polished system that it will be in a few months. If you're brave, willing to give frank feedback, and want to help us explore this exciting idea, please request an invite.

mitmproxy 0.9.1

I'm happy to announce the release of mitmproxy 0.9.1. This is a bugfix release, with no significant changes in behaviour.

As hinted in my previous release note, the project itself is also evolving. As of this release, mitmproxy and its sister projects (pathod and netlib) are housed under a separate organization on GitHub, rather than my own personal space:

github.com/mitmproxy

I'm also very happy to welcome the first external core developer to the mitmproxy project: Maximilian Hils. Max is the author of HoneyProxy, a web analysis front-end for mitmproxy. In the next few months, he'll be working on integrating and expanding his work to become mitmproxy's official web interface. Max's efforts will be sponsored by Google under their Summer of Code program, and will be mentored by the HoneyNet Project.

Changelog

  • Use "correct" case for Content-Type headers added by mitmproxy.
  • Make UTF environment detection more robust.
  • Improved MIME-type detection for viewers.
  • Always read files in binary mode (Windows compatibility fix).
  • Correct PyOpenSSL dependency declaration.
  • Some developer documentation.