Think of it this way. I'm an apply-templates instruction, and I select a set of nodes. Think of the nodes like tennis balls. I throw the selected balls across a high wall. On the other side of the wall are a set of template rules, waiting for balls to be thrown across. Each template rule is designed to handle different shapes or colours of tennis ball. When a ball comes across, every template rule looks at it and decides "is this one for me?". If it is, then it tries to catch it. If several rules try to catch the same ball, then the highest-priority one wins.
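If code is easier to follow than metaphors, here is a toy Python model of the throw-and-catch dispatch, built on the standard `xml.etree.ElementTree` module. It is only a sketch under stated assumptions. Real XSLT processors don't work this way internally. The rule names and lambda predicates are hypothetical stand-ins for match patterns. Import precedence and modes are ignored. The priorities are the XSLT 1.0 defaults for those pattern shapes: -0.5 for `*`, 0 for a plain name like `book`, and 0.5 for a pattern with a predicate.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional, Tuple

# (hypothetical rule name, match predicate, priority)
Rule = Tuple[str, Callable[[ET.Element], bool], float]

RULES: List[Rule] = [
    ("*", lambda el: True, -0.5),                # like match="*"
    ("book", lambda el: el.tag == "book", 0.0),  # like match="book"
    ("book[@lang='fr']",                         # like match="book[@lang='fr']"
     lambda el: el.tag == "book" and el.get("lang") == "fr", 0.5),
]


def catch(el: ET.Element, rules: List[Rule]) -> Optional[str]:
    """Every rule looks at the ball; the highest-priority match catches it."""
    candidates = [rule for rule in rules if rule[1](el)]
    if not candidates:
        return None  # in XSLT, a built-in rule would handle it instead
    # max() keeps the first maximum it sees, so scanning in reverse means a
    # tie goes to the rule declared last (the XSLT 1.0 recovery behaviour).
    best = max(reversed(candidates), key=lambda rule: rule[2])
    return best[0]


def apply_templates(nodes: List[ET.Element],
                    rules: List[Rule]) -> List[Tuple[str, Optional[str]]]:
    """Throw each selected node over the wall and record which rule caught it."""
    return [(el.tag, catch(el, rules)) for el in nodes]


shelf = ET.fromstring('<shelf><book lang="en"/><book lang="fr"/><magazine/></shelf>')
for tag, caught_by in apply_templates(list(shelf), RULES):
    print(tag, "->", caught_by)
```

Both `book` balls match `*` and `book`, but only the French one also matches the higher-priority predicate rule. The `magazine` is caught only by the catch-all `*` rule.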
So when you say there is no direct correspondence between apply-templates and the template rule, you are spot on. That's the whole point: it's what makes the idea so powerful. It creates a very loose coupling between the code for processing one element and the code for processing its children, which means that the code is very resilient to changes in the document structure.
In your example there's no explicit rule that catches tennis balls labelled "publisher". But there's an implicit rule, the built-in template rule for elements, which applies templates to each of the element's children. If one of the children is a book element, that element will in turn be thrown over the wall, to be caught this time by the publisher/book template rule.
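The role of the built-in rule can be sketched the same way. Again this is a toy Python model rather than real XSLT, under these assumptions: `matches_publisher_book` is a hypothetical stand-in for a template with `match="publisher/book"`, the sample document is invented, and the built-in rule for text nodes (which copies the text through) is left out. There is no rule for `publisher`, so the built-in element rule recurses into it, and its `book` children are then caught by the explicit rule.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def matches_publisher_book(el: ET.Element, parent: Optional[ET.Element]) -> bool:
    """Hypothetical stand-in for match="publisher/book".

    ElementTree elements don't know their parent, so it is passed in explicitly.
    """
    return el.tag == "book" and parent is not None and parent.tag == "publisher"


# The stylesheet's only explicit rule; note that nothing matches "publisher".
RULES = [(matches_publisher_book, lambda el: "caught " + el.get("title", "?"))]


def process(el: ET.Element, parent: Optional[ET.Element], out: List[str]) -> None:
    """Throw one element over the wall."""
    for matches, action in RULES:
        if matches(el, parent):
            out.append(action(el))
            return
    # No explicit rule caught it: the built-in element rule behaves like
    # <xsl:apply-templates/>, throwing each child over the wall in turn.
    for child in el:
        process(child, el, out)


def run(xml_text: str) -> List[str]:
    root = ET.fromstring(xml_text)
    out: List[str] = []
    process(root, None, out)
    return out


print(run('<catalog><publisher name="P"><book title="A"/><book title="B"/>'
          '</publisher><book title="C"/></catalog>'))
```

Books A and B are caught because their parent is a `publisher`. Book C sits directly under `catalog`, so no explicit rule matches it, and the built-in rule simply recurses into its (empty) content.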
Michael Kay
http://www.saxonica.com/
Author, XSLT 2.0 and XPath 2.0 Programmer's Reference