Printing a Book with CSS: Boom!

By Håkon Wium Lie and Bert Bos

Book cover: Lie & Bos, “Cascading Style Sheets: Designing for the Web,” Addison-Wesley, 3rd edition, 2005

HTML and CSS, two of our favorite acronyms, are normally associated with web pages. And deservedly so: HTML is the dominant document format on the web and CSS is used to style most HTML pages. But are they suitable for off-screen use? Can CSS be used for serious print jobs? To find out, we decided to take the ultimate challenge: to produce the next edition of our book directly from HTML and CSS files. In this article we sketch our solution and quote from the style sheet we used. Towards the end we describe the book microformat (boom!) we developed in the process.

The studious reader may want to fetch a sample HTML file, a sample style sheet, and the PDF file generated by Prince. The PDF file is similar to the one we sent to the printer. We encourage you to base your own book on the sample file and tell us how it goes.

Print vs. paper

A printed book has many features not seen on screens. There are page numbers, headers and footers, a table of contents, and an index. The content must be split into pages of fixed size, and cross-references within the book (for example, “see definition on page 35”) must be resolved. Finally, the content must be converted to PDF, which is sent to the printer.

Web browsers are good at dealing with pixels on a screen, but not very good at printing. To print a full book we turned to Prince, a dedicated batch processor which converts XML to PDF by way of CSS. Prince supports the print-specific features of CSS2, as well as functionality proposed for CSS3.

CSS2

CSS2 has a notion of paged media (think sheets of paper), as opposed to continuous media (think scrollbars). Style sheets can set the size of pages and their margins. Page templates can be given names and elements can state which named page they want to be printed on. Also, elements in the source document can force page breaks. Here is a snippet from the style sheet we used:

@page {
  size: 7in 9.25in;
  margin: 27mm 16mm 27mm 16mm;
}

Having a US-based publisher, we were given the page size in inches. We, being Europeans, continued with metric measurements. CSS accepts both.

After setting up the page size and margins, we needed to make sure there are page breaks in the right places. The following excerpt shows how page breaks are generated after chapters and appendices:

div.chapter, div.appendix {
  page-break-after: always;
}
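
The other CSS2 page-break properties can complement this rule. The following is a sketch rather than a quotation from the style sheet used for the book: it keeps headings together with the text that follows them and tries not to split a figure across pages. The choice of heading levels is an assumption; div.figure is the class described in the microformat section of this article.

/* Sketch only: keep a heading on the same page as the text after it */
h1, h2, h3 {
  page-break-after: avoid;
}

/* Sketch only: try not to split a figure and its caption across pages */
div.figure {
  page-break-inside: avoid;
}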

Also, we used CSS2 to declare named pages:

div.titlepage {
  page: blank;
}

That is, the title page is to be printed on pages with the name “blank.” CSS2 described the concept of named pages, but their value only becomes apparent when headers and footers are available. For this we have to turn to CSS3.

CSS3

The CSS Working Group has published a CSS3 Module for Paged Media. It describes additional functionality required for printing. We will start by looking at running headers and footers.

Headers and footers

Here is an example:

@page :left {
  @top-left {
    content: "Cascading Style Sheets";
  }
}

The example above puts a string (“Cascading Style Sheets”) in the top left corner of all left pages of the book. All pages? Not quite. A subsequent rule removes the header from pages named “blank”:

@page blank :left {
  @top-left {
    content: normal;
  }
}

Recall from earlier that all <div class="titlepage"> elements are to be printed on “blank” pages. Given the style sheet above, “blank” left pages will be printed without a header.
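
Right-hand “blank” pages can be handled the same way. The rule below is a sketch rather than a quotation from the book's style sheet, and it assumes that a header has been placed in the top right corner of right pages, as is done in the next section:

@page blank :right {
  @top-right {
    content: normal; /* suppress the running header on blank right pages */
  }
}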

Stealing strings

Our book consists of many chapters and the title of each chapter is displayed in a header on right pages. To achieve this, the title string must be copied from an element with the string-set property:

h1 {
  string-set: header content();
}

Just like there were named pages in the previous section, CSS3 also has named strings. In the example above, the string named “header” is assigned the chapter headings. Each time a chapter heading is encountered, the chapter title is copied into this string. The string can be referred to in other parts of the style sheet:

@page :right {
  @top-right {
    content: string(header, first); 
  }
}

In the example above, the right header is set to be the value of the “header” string. The keyword “first” indicates that we want the first value of “header” in case there are several assignments on that page.

Page numbers

Like headers, page numbers are a navigational aid in books. Setting the page numbers is easy:

@page :left {
  @bottom-left {
    content: counter(page);
  }
}
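
Right pages need a matching rule. A minimal sketch, assuming the number should sit in the outer (bottom right) corner to mirror the left pages:

@page :right {
  @bottom-right {
    content: counter(page); /* same predefined counter, mirrored position */
  }
}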

One requirement from our publisher was to use roman numerals in the first part of the book. This part is referred to as “front-matter”. Here is the style sheet for roman page numbers in the front-matter:

@page front-matter :left {
  @bottom-left {
    content: counter(page, lower-roman);
  }
}

The numbering systems are the same as for the list-style-type property and lower-roman is one of them. The counter called “page” is predefined in CSS.
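
The main body of a book then usually restarts at page 1 in arabic numerals. One way this might be expressed is to reset the predefined counter on the element that opens the main body. This is a sketch under assumptions: the class name “first” is hypothetical and not part of the markup described in this article, and support for resetting the page counter from an element should be checked against the formatter's documentation.

/* Sketch only: "first" is a hypothetical class on the first chapter */
div.chapter.first {
  counter-reset: page 1; /* assumed: restart page numbering at 1 here */
}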

Cross-references

The web is a huge collection of cross-references: all hyperlinks are cross-references. Cross-references in books are similar in nature, but presented differently. Instead of the blue underlined text we know from our screens, books contain text such as “see the figure on page 35.” The number “35” is unknown to the authors of the book — one can only find the page number by formatting the content. Therefore, the number “35” cannot be typed into the manuscript but must be inserted by the formatter. To do so, the formatter needs a pointer to the figure. In HTML, this is done with an A element:

<a class="pageref" href="#figure">see the figure</a>

The corresponding style sheet looks like this:

a.pageref::after {
  content: " on page " target-counter(attr(href), page);
}

The example above needs some explanation. The selector refers to a generated pseudo-element (::after) which comes after the content of the A element. The first part of that pseudo-element is the string “ on page ”. After that comes the most interesting part, the target-counter function, which fetches the value of the “page” counter at the location pointed to by the “href” attribute. The result is that the string “ on page ” is concatenated with the number “35”.
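
The target-counter function also accepts an optional third argument naming a counter style, which would be useful for references that point into the front-matter. The following is a sketch; the class name “frontref” is hypothetical:

a.frontref::after {
  /* prints, for example, " on page xii" for a target in the front-matter */
  content: " on page " target-counter(attr(href), page, lower-roman);
}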

Table of contents

Similar magic is invoked to generate a table of contents (TOC). Given a bunch of hyperlinks pointing to chapters, sections and other TOC entries, the style sheet describes how to present the hyperlinks as TOC. Here is a sample TOC entry:

<ul class="toc">
  <li><a href="#intro">Introduction</a>
  <li><a href="#html">HTML</a>
</ul>

The style sheet for the TOC uses the same target-counter function to fetch a page number:

ul.toc a::after {
  content: leader('.') target-counter(attr(href), page);
}

Also, a new function, 'leader', is used to generate “leaders.” In typography, a “leader” is a line that guides the eye from the textual entry to the page number. In our example, a set of dots is added between the text and the page number:

The Web and HTML................1
CSS.............................3

Note that this functionality is experimental; no Working Draft for leaders has been published yet.
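
A TOC also needs some ordinary CSS to stop it from looking like a bulleted list of links. A minimal sketch using only CSS2 properties; the values are assumptions, not the ones used in the book:

ul.toc {
  list-style: none;   /* no bullets in front of the entries */
  margin-left: 0;
  padding-left: 0;
}

ul.toc a {
  color: black;            /* links need no special color on paper */
  text-decoration: none;   /* and no underline */
}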

The book microformat — boom!

As you probably have guessed by now, we succeeded in producing our book using HTML and CSS. In doing so, we also developed a set of conventions for marking up a book in HTML. HTML has the wonderful “class” attribute which lets anyone extend the semantics of HTML documents while building on HTML's universally known semantics. So, in our book, we used a rich set of HTML elements and added a bunch of class names.

Since then, the concept of “microformats” has entered the web and we are happy to discover that we actually developed (at least the beginnings of) a microformat for books. We think other authors will be able to use the boom! microformat and improve upon it in the process.

Sections of a book

The chapters in the first part of the book, such as preface, foreword, and table of contents, are enclosed in a DIV with a corresponding class name. The chapters in the main body are DIVs with a class of “chapter” and the appendices are DIVs with class “appendix”. In the style sheet, the class names are primarily used to select the right named page with the correct headers and footers.
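
A sketch of how such class names might be mapped to named pages follows. The class names “preface”, “foreword”, and “toc” are assumptions based on the description above; “front-matter” is the named page used earlier for roman page numbers:

/* Sketch only: the three class names are assumed, not quoted */
div.preface, div.foreword, div.toc {
  page: front-matter; /* pages with lower-roman page numbers */
}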

Although HTML has six levels of headings (H1, H2, etc.) to distinguish chapter headings, section headings, and subsection headings, it is convenient to enclose sections in an element, if only to be able to style the end of a section. We used a DIV with class “section”.

Tables and figures

HTML doesn't have a dedicated element for figures with captions, but it is easy to create one by specializing a DIV:

<div class="figure">
  <p class="caption">...</p>
  <p class="art"><img src="..." alt="..."></p>
</div>

The TABLE element has a CAPTION element, but support is spotty. We therefore used a similar strategy for marking up tables:

<div class="table">
  <p class="caption">...</p>
  <table class="lined">
    ...
  </table>
</div>

We used a variety of figure styles (normal, wide, on the side, etc.) and table styles (normal, wide, lined, top-floating, etc.) in our book. An element can be given several class names, so that (say) a table can be both “lined” and “wide”. We have cut down on the number of alternatives in the sample document for the sake of simplicity.
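
Combining class names keeps the style sheet modular, since each class contributes one aspect of the presentation. A sketch, assuming both class names are set on the TABLE element; the rules themselves are illustrative and not taken from the book's style sheet:

/* Sketch only: "lined" draws a thin rule under every row */
table.lined {
  border-collapse: collapse;
}

table.lined td {
  border-bottom: 0.5pt solid black;
}

/* Sketch only: "wide" uses the full width of the text block */
table.wide {
  width: 100%;
}

With these rules, a table marked up as <table class="lined wide"> picks up both the row rules and the full width.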

Side notes and side bars

A DIV with class “sidenote” is used for side remarks, related to the (following) text in the main body but not necessarily shown in-line. A typical way to show them is to put them in the margin.
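
One classic CSS2 way to place a side note in the margin is a float pulled out with a negative margin. This is a sketch with assumed measurements, and it assumes the main text is indented to make room; the book's style sheet may do this differently:

/* Sketch only: all measurements are assumptions */
div.chapter {
  padding-left: 35mm; /* reserve room for side notes */
}

div.sidenote {
  float: left;
  clear: left;
  width: 30mm;
  margin-left: -35mm; /* pull the note into the reserved space */
  font-size: 0.85em;
}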

A “sidebar” is longer than a “sidenote”. The latter is typically only one paragraph, maybe two; the former is several paragraphs or includes lists or other material. In the sample document there is one sidebar that floats to the top, uses the full width of the page, and is given a gray background.

Summing up

The Prince formatter has opened up the processing pipeline from HTML and CSS to PDF. It is now possible, even feasible, to use HTML as the document format for books. This makes it easier to cross-publish content on the web and in print. Authors who attempt to use the techniques described in this article will face some technical issues along the way. For example, we have not discussed how to generate the TOC structures and how to display wide tables. We have also left some room for improvement in the boom! microformat. However, compared to the headaches of actually writing a book, formatting is now a joy!

About the authors

Håkon Wium Lie proposed the concept of CSS in 1994 while working at CERN, the birthplace of the web. He is now the CTO of Opera Software, making sure Opera is faster, better, and more standards-compliant than the one you know. He is also a director of YesLogic, the company behind the Prince formatter which was used to produce the book.

Bert Bos proposed and implemented his own style sheet language before joining forces with Håkon at W3C in 1995. He was the co-author of the original CSS specification and launched W3C's internationalization activities. He is currently the Style Sheets activity lead at W3C.