A rule is only as strong as how hard it is to break

not how good the rule is. how hard it is to break.

you have seen the proof of this. a comment that says do not call this directly. a line in a style guide everyone nodded at. a convention the whole team agreed to in a meeting and meant. and a week later someone calls it directly, at the end of a long day, because the thing was right there and nothing stopped them. the rule was not wrong. it was unenforced. those are different problems and we keep treating them as one.

most of what we call discipline is a rule resting on the weakest enforcement there is: a thing you have to remember not to do. and remembering does not scale, because the person who has to remember is tired, or in a hurry, or new, or all three, and the rule was written once by someone who was none of those things and has long since moved teams. a rule that runs on willpower has scheduled its own violation.

so the question is never whether the rule is good. it is what happens when someone tries to break it. and there turns out to be a ladder.

01 · the ladderfive answers to what happens when someone breaks it, weakest to strongest

here are the rungs, from the rule that asks nicely to the rule that cannot be broken.

convention. a comment, a doc, a line in CONTRIBUTING.md, a thing said in standup. enforced by memory and goodwill. it is free to write and it decays the instant someone is busy. this is where most rules live and it is the weakest rung there is.

review. a human who is supposed to catch it. better, because now two people have to fail instead of one. but the reviewer is also tired, also busy, and a diff that looks unalarming gets an LGTM. review lowers the rate. it does not close the door.

the gate. a check that fails the build. a test, a lint, a CI job. this is a real wall, because the bad change does not merge. it is also the rung most teams reach and mistake for the top, because a gate is a file in the repository, and whoever can edit the file can delete the gate. the same tired person who would have broken the rule can add --no-verify or loosen the config, and now the wall has a door in it that only they know about.

unrepresentable. the violation does not compile. the type system refuses to express the bad program. you have not made the mistake harder to make. you have made it impossible to say. the constraint stops living in anyone's memory and starts living in the language.

nonexistent. the dangerous thing is not reachable at all. the function is not in scope, the lock does not exist, the capability was never handed to this code. you cannot misuse what you cannot name. this is the strongest rung, and it is usually a restructure rather than a check. you did not guard the hazard. you removed it.

if this shape feels familiar it is because the safety world has had it for fifty years. it is the hierarchy of controls: elimination, substitution, engineering controls, administrative controls, and last of all PPE. it is the framework occupational safety people use to rank ways of keeping a worker from getting hurt, ordered most effective first. eliminate the hazard, or failing that engineer it out, and only at the very bottom, when you have run out of better options, do you put up a sign and ask people to be careful. a comment that says do not call this directly is a sign on a wall. software keeps half rediscovering this and keeps stopping at the sign.

the idea i want to poke at is simple. good engineering moves a rule down this ladder, from the thing you must remember toward the thing that cannot happen. discipline is the name we give a rule still stuck on the top rung, and it is a tax, paid not once but on every future change, by every future person, for as long as the code lives.

which is an easy thing to say and costs nothing to say. so let us check the bill.

02 · the billevery rung up is a claim that costs nothing. it isn't.

to measure the cost of climbing you need one rule you can build at every rung and compare. i picked the oldest one in concurrency: always acquire your locks in the same order. break it and two threads can each grab one lock and then wait forever for the other's, which is a deadlock. the rule is real, it is famous, and in almost every codebase it lives on the top rung, written in a wiki, enforced by nobody, debugged at 2am when the thing hangs in production and not in the test.

so i built it at four rungs and measured each. the gate rung is missing on purpose, and i will get to why.

at the bottom, convention: a doc comment naming the order, and a test that proves the comment is worthless. two threads, a barrier so each is holding its first lock before it reaches for its second, opposite order. it hangs, every time. the firing is the point.

at the top of what a type system can do, unrepresentable, built on the lock_ordering crate. the idea is small once you see it. holding a lock is modeled as a mutable borrow of a zero sized token, and acquiring the next lock requires a trait bound proving this lock comes before that one in a declared order. acquire them in order and the bounds are satisfied and it compiles. acquire them out of order and the bound is unprovable and it does not compile. the deadlock is not detected. it is unspeakable.

i did not invent that. the crate did, with the same lineage as Fuchsia's network stack, and a 2002 paper out of MIT laid down the core idea two decades before either. i measured a thing that already exists. the mechanism is not mine and is not the interesting part for me here. the bill is, mostly because i had not seen anyone add it up.

what the climb costs at the compiler

start with the surprising part. the type level version does not get slow as your lock hierarchy grows. past a certain size it stops compiling.

rustcerror

error[E0275]: overflow evaluating the requirement `L1: LockAfter<L127>`
help: consider increasing the recursion limit by adding a
      `#![recursion_limit = "256"]` attribute to your crate

at 128 levels deep it hits the compiler's default recursion limit and gives up. not a slowdown. a wall. the same wall on two different toolchains, nearly two years and about sixteen releases apart. the safety has a hard ceiling, and you do not see it until you hit it.

under the wall there is a curve. the compiler's type checking work grows with roughly the square of the number of locks in the order, measured exponent near 1.9, and there is a clean reason for it that took me an embarrassing detour to get right.

“the cost tracks the number of ordered pairs the type system has to know about. not the number of locks. not how deep the chain is. the pairs.”

— what the quadratic is actually counting

i first thought the cost was about depth, long chains slow and shallow trees fast, because every example i had built happened to make those the same thing. then i built the one that pulls them apart. take 160 locks. arrange them as four independent chains of 40 and it costs five milliseconds. arrange the same 160 as four dense tiers of 40, every lock in one tier ordered before every lock in the next, and it costs 138 milliseconds. same number of locks, same depth, 28 times the cost. the difference is that the dense version has thousands of ordered pairs and the sparse one has a few hundred, and for the nontrivial hierarchies the compiler pays on the order of 14 microseconds for each pair it has to prove on my machine, roughly the same whether those pairs come from a deep chain or a shallow dense set of tiers. (the very sparse cases run higher per pair, nearer 20, because fixed overhead dominates when there are only a couple hundred pairs to amortise it over.) depth was never the thing. it was a proxy that happened to track the real thing until i forced it not to.

so the honest version is not that shallow hierarchies are safe. it is that a realistic lock hierarchy is cheap because it is sparse, not because it is shallow. declare a few orderings between a few groups and you pay almost nothing. declare that everything is ordered against everything and you pay the full square, even if it is only two layers deep. the model is simple enough to estimate your own case on a napkin: count the ordered pairs and multiply by a per pair cost you measure once on your own hardware, which is the kind of result i trust precisely because it is boring. and it is worth saying plainly that the drama in this section, the wall and the steep climb, lives in the corner almost nobody builds in, very deep or very densely ordered hierarchies. the case you will actually hit is a few hundred pairs and a few milliseconds. the wall is the better story and the sparse case is the true headline.

what it costs at runtime

here the surprise runs the other way. on the hot path the type level version is not just the same speed, it is the same instructions. the lock, increment, and release code the compiler emits for it is identical to the plain version for the first 157 instructions, the entire acquire and release path. the two functions diverge only in the cold panic unwind cleanup that runs when a mutex is poisoned, and even there it is the same instructions in a different order, a layout difference that depends on build settings. the token is zero sized and the bounds are gone before code generation, so there is nothing of them left to run. the proof apparatus is not made cheap. it is made absent. that is the half everyone expects, and it is the weakest thing in this section only because it is the one nobody argues with.

the interesting runtime cost is one rung down, at review, the version that catches the deadlock at runtime with a detector. i used parking_lot's deadlock detection and benchmarked the same three lock path four ways: the standard library lock, and parking_lot with detection off and on, so the library and the feature could be told apart.

lock acquisition cost · 3-lock path · ns / acquire

backend	std mutex	parking_lot, detection off	parking_lot, detection on
ns per acquisition	29.4	25.8	36.2

read the last two columns. the bookkeeping that lets the detector find a cycle costs about 10 nanoseconds on every lock you take, 40 percent, whether or not a deadlock ever happens. you pay it on every acquisition for the entire life of the program to be told, occasionally, in testing, that you wrote a cycle. the rung that catches it at runtime is the only rung whose cost you keep paying after the bug is gone.

what it costs to write

the type level version is more to type, and the cost scales: roughly three lines of ordering declaration per lock you put in the hierarchy. for a small program it is a couple dozen lines of ceremony you would not otherwise write. the honest framing is not that it doubles your code, because the actual locking stays about the same size. it is that you pay a per lock tax to declare the hierarchy, whether or not you ever lock those two together. the boilerplate scales with how many locks you name, not with how much you use them.

the missing rung

i did not build the gate, and not out of laziness. there is no honest version of it for this rule. a CI check for lock order either runs the runtime detector, which is just the review rung again with the same hole, that it only fires if a test happens to hit the cycle, or it performs the type level analysis, which is just the compile rung done outside the compiler for no reason. a hand rolled lint that pattern matches lock() calls would be a green checkmark that does not mean what the dashboard says it means, which is worse than no checkmark at all. so the rung is empty, and the emptiness is a finding. not every rule has a clean answer at every rung. sometimes the ladder skips a step, and pretending otherwise is how you ship a wall with a hidden door.

03 · strength is not the only axisa good binding is hard to break by accident and easy to undo on purpose

the ladder ranks rungs by how hard the rule is to break. that is necessary and it is not sufficient, because there is a second axis people forget, and it is the one Odysseus actually got right.

he tied himself to the mast so he could not steer toward the sirens. but he also told the crew to untie him once the ship was past. the binding had an exit. it was hard to break in the moment and easy to undo when the danger was gone. that is the whole art. a good constraint is expensive to violate on impulse and cheap to reverse on purpose.

most gates fail this. they are equally annoying to bypass whether you are cutting a corner at midnight or making a deliberate, reviewed exception in daylight, so people learn one reflex, --no-verify, and apply it to both. a gate that punishes the legitimate override as hard as the illegitimate one trains everyone to route around it, and a rung you have trained everyone to skip has quietly fallen back to convention. the strength was real and the reversibility was wrong and the wrongness ate the strength.

there is a deeper version of this, the question under every rung: who binds the binder. a gate is a file you can edit. a convention is a habit you can drop. the reason the compile rung is strong is not that the wall is taller, it is that the wall is much harder to move. you can drop the crate or raise a recursion limit, sure, but you cannot talk the type checker into accepting an unprovable bound, and rewriting a dependency's type discipline is a different kind of effort from deleting a line in a CI config. the enforcement does not get tired at 2am and does not accept just this once. the strongest pacts are the ones whose enforcement you have handed to a party that cannot be talked out of it. that is the same reason a constitution is harder to change than a law, and it is on purpose for the same reason.

04 · the cost that isn't in any benchmarka binding holds the wrong thing exactly as tightly as the right thing

everything in section 02 was a cost you can put a number on. here is the one you cannot, and i think it is the one that matters most, because it is the cost of being wrong.

a constraint is rigid. that is the point of it. and rigidity is a catastrophe when you have bound the wrong thing, because the type system will defend your mistake with exactly the conviction it would defend a truth. the compile rung does not check that your rule is correct. it checks that your code is consistent with the rule you wrote. those are very different promises and it is easy to hear the first when only the second was made.

the deadlock version makes this concrete in three ways, and i built each one so it could not be waved away.

it rejects correct programs. the textbook fix for transferring between two accounts is to lock the lower id first, a real and sound ordering, decided at runtime from the data. the type system cannot say that. it knows about orderings between named, fixed levels, and whichever of these two has the smaller id is not a name. so the safe, standard, correct program does not compile, and you are pushed toward a worse design to satisfy a rule that was supposed to help. that is the price of a binding that only speaks in types when the truth is in the data.

it accepts broken ones. declare that lock A comes before B and also that B comes before A, a cycle, the exact thing that deadlocks, and the type system believes you. it never checks your declared order for consistency, only that your code matches it, and a contradictory order makes everything match. you can write the deadlock back in at the level of the rule and the rule will wave it through, because the rule trusts its own premises and you wrote the premises.

and it guards the wrong half. it constrains the order you acquire locks and says nothing about the order you release them. release out of order, which can matter, and it compiles without complaint. the binding is real and it is pointed at acquisition, and anything outside that frame is unprotected in a way that looks, from the green build, exactly like being protected.

“the type system did not check that the rule was right. it checked that the code was consistent with the rule. nobody told the green checkmark which of those it meant.”

— consistency is not correctness

none of this makes the rung bad. it makes it a rung: a specific guarantee with specific edges, not a magic word. the danger is not using it. the danger is climbing to it and then believing you have bought more than you have, because the higher you climb the more total the binding feels, and a total binding around the wrong thing is worse than the comment you replaced. premature binding is the twin of premature abstraction, and it has the same cause, committing the structure before you are sure of the truth.

05 · the shape movessame rung, different rule, different bill

i built one more rule at the compile rung, to make a point that one rule cannot make alone. the cost of climbing is not a property of the ladder. it is a property of the rule.

the second rule is a trading system one: an order may not be submitted until it has passed a risk check. at the compile rung it is a small state machine. an unchecked order has no submit method. the only way to get a checked order is through the risk check, which consumes the unchecked one. only a checked order can be submitted, and its fields are private so you cannot forge one. submit without check does not fail a test. it does not exist as a thing you can write.

and its bill is nothing like deadlock's. there is no compile time curve, because there is no hierarchy to grow. it is two states, fixed, forever, no matter how big the system around it gets. there is no wall at 128. the entire cost of climbing to the compile rung for this rule is about twice the boilerplate of the comment version, and then nothing. deadlock's compile rung has a screaming quadratic and a hard ceiling. this one is nearly free. same rung, different rule, different shape.

so there is no single answer to what it costs to make a rule impossible. it depends on the rule, specifically on whether the rule has structure that grows. a fixed relationship is cheap to enforce in types forever. a relationship that scales with your data can get expensive enough to hit a wall. two rules is not a taxonomy and i am not going to pretend it is one. it is enough to make the one point i wanted. the cost moves, so you have to weigh it per rule, which means you have to be able to measure it, which is the whole reason this post has numbers in it.

06 · closethe whole thing in one move

the argument fits in one move, repeated.

a rule is only as strong as how hard it is to break, so move it down the ladder until breaking it is hard, or impossible, or cannot be expressed at all. but every step down costs something, build time, runtime, lines, lost flexibility, and the standing risk that you have bound the wrong thing tightly, and the cost is different for every rule and almost never measured before it is paid. so measure it. climb deliberately, knowing the bill, to the rung the rule actually warrants and no higher. the comment that asks nicely and the type that cannot be violated are different prices for different amounts of certainty, and the skill is not always reaching for the strongest. the skill is knowing what you are buying and what it costs.

a lot of engineering is unenforced rules we keep meaning to enforce. the better version is not more discipline. it is fewer rules, each one pushed down to a rung where it stops needing anyone to remember it, each climb paid for on purpose. willpower does not scale and structure does, and at least now there is a price tag on it.

Credits & sources

the seed for this was Tris's No Boilerplate video on standardizing teams on plain text. one frame of it, the Ulysses pact, is what i got curious about generalizing. the ladder itself is the occupational safety hierarchy of controls, and a 2019 dev.to post by dealeron already mapped that framework to software in prose, which is worth reading and which this post tries to put numbers under.

the type level lock ordering is the lock_ordering crate by akonradi. the underlying idea, giving locks a typed partial order the compiler checks, is older than any of it: Boyapati, Lee, and Rinard, Ownership Types for Safe Programming: Preventing Data Races and Deadlocks, OOPSLA 2002. all the measurements, the harness, the topology sweep, and the decision records are in the repo: github.com/yussypu/binding-ladder.