Is behavioural unit testing brittle?

I've recently been in discussions around whether doing lots of interaction/behaviour and mock based test first development leads to brittle tests. The examples that have prompted this usually result from code needing some rework, and as such specifications have to be ripped out and new ones formed as replacements. Thus the inference is that:

changing tests = brittle tests = more work required

I've been thinking a lot about this and went back to reading works on the web such as Martin Fowler's Mocks Aren't Stubs and my old TDD books. If you have already seen my previous posts on this blog about context specification then you may already know that I am a huge interaction tester - a "mockist" rather than a "classicist", if you like.

My conclusion is that in the world of the single responsibility principle and "tell, don't ask" we often want to ensure that our classes are small, do one thing if possible, and do not query their collaborators for state and then make decisions based on the answer. We want our code loosely coupled, well defined and segregated, and abiding by core principles, whether they come from SOLID, DDD or BDD.

So when performing test first development how can we make sure that these isolated and responsible collaborators all interact exactly how we want them to?
With behavioural unit tests.

How can I ensure that the core objects responsible for my complicated business rules, which have all been thoroughly test driven and are beautifully laid out, are actually used by all the confederates that should be using them?
With behavioural unit tests.

Let's take the blank slate approach too - the first few test first classes require interaction with a few other future objects that we have in mind in the architecture of this current piece of work... let's define the use of that (currently non-existent) API through a seam, often an interface, and thus requiring a stub. I use Rhino Mocks or a similar tool to generate a stub, assert that certain calls were made against the API with the correct inputs... I have behavioural unit tests.
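
As a rough illustration of that blank slate approach, here is a minimal sketch in Python using the standard library's unittest.mock rather than Rhino Mocks. The names OrderProcessor, charge and the payment gateway are hypothetical, invented purely for this example; the point is only that the test double defines the not-yet-written API and the test asserts on the call made against it.

```python
from unittest.mock import Mock


class OrderProcessor:
    """Hypothetical class under test; it tells its collaborator what to do."""

    def __init__(self, payment_gateway):
        self._payment_gateway = payment_gateway

    def place_order(self, customer_id, amount):
        # Tell, don't ask: hand the work to the collaborator.
        self._payment_gateway.charge(customer_id, amount)


def test_place_order_charges_the_customer():
    # The real gateway does not exist yet; the test double defines its API.
    # spec=["charge"] means any other attribute access raises AttributeError.
    gateway = Mock(spec=["charge"])

    OrderProcessor(gateway).place_order("customer-42", 1999)

    # The behavioural assertion: the right call, with the right inputs.
    gateway.charge.assert_called_once_with("customer-42", 1999)


test_place_order_charges_the_customer()
```

Restricting the double with spec keeps the seam honest: if the class under test starts calling something the agreed API does not offer, the test fails straight away.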

Now the argument may be that this white box testing knows far too much about what is going on in the internals of these classes... why should we know what a class does with its colleagues as long as it brings back the desired result? This is of course state based testing's favourite argument - just verify the outputs given the inputs.

But again, if we do this, how can we be one hundred percent certain that the heart of the code exercised in this particular test hasn't been ripped out at some point and our all singing, all dancing business rules are not even being used at runtime?
Full test coverage isn't just about each unit being tested - it's also often about the collaborations too... this code MUST use the business rules. It is an absolute requirement.
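
To make that concrete, here is a small sketch in Python with unittest.mock. DiscountRules and Checkout are hypothetical names made up for the example. It shows one case where a state based check still passes after the business rules have been ripped out, while an interaction check notices. A state based test with a better chosen input would catch this particular bug too; the sketch only shows how easily the missing collaboration can slip through.

```python
from unittest.mock import Mock


class DiscountRules:
    """Hypothetical, thoroughly test driven business rules."""

    def apply(self, total):
        return total * 0.9 if total >= 100 else total


class Checkout:
    """Hypothetical caller whose use of the rules has been ripped out."""

    def __init__(self, rules):
        self._rules = rules

    def total_for(self, prices):
        return sum(prices)  # bug: self._rules.apply(...) is never called


# State based check: below the discount threshold the output still looks right.
assert Checkout(DiscountRules()).total_for([10, 20]) == 30

# Interaction check: the missing collaboration is detected.
rules = Mock(spec=DiscountRules)
Checkout(rules).total_for([10, 20])
try:
    rules.apply.assert_called_once_with(30)
    collaboration_verified = True
except AssertionError:
    collaboration_verified = False

print(collaboration_verified)
```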

Behaviour driven test first development can be achieved at the unit test level through context specification frameworks and this is a fantastic way to describe the behaviour of our code, including all the collaborations, the interactions, the state that moves from one seam to another - these are the important pieces of information, the core of what we should be testing.

With specifications so tight to the implementation, reworking code means that a specification has to be deleted and a new one created - this is absolutely right and purposeful as our behaviour has completely changed, the state moving through our objects is different, the cogs move in a different way... we need to harness these changes with specifications that ensure they do the new thing correctly.

We don't need the old specification so let's throw it away. Did we put all that effort into crafting those tests just to erase them now? Yes! They served their purpose; they helped shape our original code. They performed their role in our CI process and ensured our first few releases worked the way we wanted them to. It was not wasted effort by any means.

Behavioural unit testing is not brittle; it is an essential part of our testing process to make sure the heart of our machine does exactly what we've specified it should.