Problems with Styles in Word Processing:
A Weak Foundation for Electronic Publishing with SGML

Pål Sørgaard
Department of Informatics, University of Oslo, Norway, and
Department of Informatics, Göteborg University, Sweden
pal.sorgaard@ifi.uio.no and
Tone Irene Sandahl
Department of Informatics and
University Center of Information Technology Services
University of Oslo, Norway
tone.sandahl@ifi.uio.no

Copyright 1996 IEEE.
Published in the Proceedings of the 30th HICSS, January 7-10, 1997, Wailea, Hawai`i

Abstract

Most word processors have facilities for styles and document templates. These mechanisms may help maintain typographic consistency while editing, and they may support document exchange and conversion. Document conversion is of special importance to electronic publishing. Our own experience suggests that there are problems with the use of styles and templates. A sample of documents drawn from three case organisations confirms this suggestion, and indicates that the use of these mechanisms is infrequent and riddled with difficulties. An initial classification suggests that most of the problems with the use of paragraph styles can be described as ignoring or overlooking these mechanisms. These problems have consequences, and one should be careful in assuming that material written with current word processors can easily be converted to formats for electronic publishing. Careful organisational implementation of word processing may be critical. Our interpretation of the problems encountered also indicates that they may be related to the paper metaphor communicated by the principle of WYSIWYG. We claim that the paper metaphor does not communicate any understanding of the structure beneath the surface of a digital document.

Introduction

The recent massive diffusion of the World Wide Web [3] has turned electronic publishing into a practical reality for many providers and consumers of documents. As a result, the aim of electronic word processing has expanded. Where traditionally the emphasis has very much been on the production of printed documents (see, for example, Preece et al. [20, p. 18] and Barker et al. [2]), today the purpose of word processing for many users is extended to the production of digital documents.

There are several important opportunities with digital documents in areas such as search, retrieval and reuse of text. Several organisations plan to use these possibilities. For an example, see ACM's Electronic Publishing Plan [8].

Several of the opportunities of digital documents can only be fulfilled if the documents are available in some standardised format, as specified by, for example, SGML (Standard Generalized Markup Language, see section 2) [10]. In fact, the World Wide Web relies on this standard, as the Web's language for document exchange, HTML (Hyper Text Markup Language), is defined as a so-called SGML DTD (Document Type Definition).

Hence, to provide a Web-service, and to benefit from digital documents, it is highly relevant to look at how to make text available in SGML (or some other standard). Eric van Herwijnen identifies four approaches for capturing text in SGML [14, ch. 5]:

  1. typing SGML as plain text,
  2. OCR (optical character recognition) reading of existing documents combined with an intelligent markup system,
  3. converting text written in conventional word processors, and
  4. using specialised SGML editors.

Today, the third approach could be extended to include word processors with built-in converters to HTML/SGML. van Herwijnen argues that the success of the third approach will depend on the use of so-called paragraph styles (see section 2). People working in Web-services often spend a lot of time cleaning up errors in automatically converted HTML-files. Moreover, conversion to HTML can be much simpler than conversion to ``richer'' Document Type Definitions. Such rich DTDs are needed to get the full benefit of digital documents, i.e. search and retrieval based on the logical structure of the documents (see figure 1).

Paragraph styles provide a mechanism which helps such conversions, and paragraph styles can conceivably be used for conversions to very rich DTDs. Little is known, however, about the actual use of paragraph styles in word processing. In this paper we report from the first of a series of studies of use of word processing, electronic publishing, and organisation of Web-services. Here we study the use of paragraph styles, in order to discuss whether current word processors are suitable applications for the capture of text in SGML. Our findings are not restricted to SGML, but apply to several contexts where text is to be converted and made available in more than one medium.
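
To illustrate why paragraph styles help conversion, the following is a minimal sketch, not a description of any actual converter. It assumes that a document has already been read into a list of (style name, text) pairs, and the mapping from style names to element names is hypothetical. Paragraphs whose style has no mapping are reported, since these are the ones that would need manual clean-up.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from paragraph style names to element names.
STYLE_TO_ELEMENT = {
    "Title": "title",
    "Author": "author",
    "Heading1": "h1",
    "Body": "p",
}


def convert(paragraphs, mapping=STYLE_TO_ELEMENT):
    """Turn (style, text) pairs into tagged lines.

    Returns the tagged lines together with the style names that had no
    mapping and therefore fell back to a plain paragraph element.
    """
    lines, unmapped = [], []
    for style, text in paragraphs:
        element = mapping.get(style)
        if element is None:
            unmapped.append(style)
            element = "p"  # fallback chosen for this sketch only
        lines.append(f"<{element}>{escape(text)}</{element}>")
    return lines, unmapped
```

The sketch only works as well as the styles are applied: a document written entirely in one style yields one kind of element, whatever the text contains.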

In the next section paragraph styles, SGML and related issues will be discussed. In section 3 our research method is described. In section 4 we present a classification of the problems we have observed with the use of paragraph styles. Our findings are discussed in section 5, and finally some conclusions are drawn in section 6.

Styles, templates, markup and SGML

Styles or style rules are widespread mechanisms in word processing. Styles have been available in commercial WYSIWYG (what you see is what you get) word processors since the end of the 80s, and were available in several research prototypes a decade earlier [16]. Styles are named formats, enabling consistent formatting of all parts (typically paragraphs) of a document referring to the same style.

Styles can be applied to several property domains of a document, such as characters, paragraphs, tables, and headings [16]. Typical paragraph styles are `Heading1,' `Body,' `Author,' etc. A word processor will typically have a dialogue box for the definition of properties of a style.

It is common to define style sheets or document templates which contain a coherent set of styles to be used for the composition of specific document types like letter, article, memo and book. Often efforts are made to make a set of company specific templates. Making good document templates requires good knowledge of the word processor as well as a fair sense of typographic quality.

Johnson and Beach distinguish between static functionality and dynamic functionality of a word processor [16]: Styles do not add to the power of a word processor in terms of what kind of paper documents it can be used to produce (the static functionality is the same). The dynamic functionality, however, is increased through the use of styles, typically by making consistent updates, document-wide format changes, typographic consistency, and reuse of text easier.

In classical, markup-based, word processors (like LaTeX [18]) there is a distinction between visual markup and logical markup [9]. Visual markup indicates how the text should be set (size, width, etc.), while logical markup classifies or ``tags'' the text so that it can be appropriately set using the definitions of the word processor (or of the user, if the definitions have been modified). LaTeX is in fact a macro package implementing logical markup on top of TeX's [17] commands for visual markup.

SGML (ISO 8879) is a standard for describing and providing logical structure in documents. SGML was originally introduced for technical documentation, but has later been used in several other areas. Logical structure is defined in Document Type Definitions (DTDs); for an example, see figure 1. DTDs essentially define context-free grammars [1] for the markup of one type of document. The markup itself (see an example in figure 2) follows specific conventions for syntax: <EMNE>...</EMNE> is a typical example of an element. Moreover, there is notation for attributes (e.g., DEL="in260") and for entities (e.g., &aring;, used to denote `å'). A DTD may or may not be a part of the document itself. There are also several standard DTDs, some of which are defined by ISO 12083. A wide range of SGML tools, such as parsers, editors, converters and browsers, is available.

 



Figure 1: Part of a rich DTD for describing courses
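
The notation for elements, attributes and entities described above can be illustrated with a small sketch. It uses Python's standard HTML parser, which is not a full SGML parser and knows nothing about DTDs, so it should be read as an approximation only. The sample fragment is hypothetical: only the element name EMNE, the attribute DEL="in260" and the entity &aring; are taken from the text, while the word `Språk' is invented for the example.

```python
from html.parser import HTMLParser


class MarkupLister(HTMLParser):
    """Collect elements, attributes and text from SGML-like markup."""

    def __init__(self):
        # convert_charrefs=True makes the parser resolve entities such as
        # &aring; before the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lower case.
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))


def list_markup(source):
    """Return the list of markup events found in source."""
    parser = MarkupLister()
    parser.feed(source)
    parser.close()
    return parser.events


# Hypothetical fragment; the content between the tags is invented.
SAMPLE = '<EMNE DEL="in260">Spr&aring;k</EMNE>'
events = list_markup(SAMPLE)
```

Because the parser has no DTD, it cannot tell whether EMNE is allowed at this point in a document; that check is what a validating SGML parser adds.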

SGML is built on the idea of a strong separation between markup and lay-out (formatting can be specified, but this is not encouraged). SGML-files are assumed to be processed by suitable output programs, which will process the SGML-tags and make lay-out according to instructions from some other source. Hendry argues that it may be very hard to write a text independently of how it will be processed [13]. Nevertheless, the potential is attractive, and there are working examples. At the University of Oslo, the complete catalogue of all teaching is written in SGML, with a DTD defined for the purpose [15]. The printed catalogue is produced through a conversion from SGML to TeX, while an online version is available on the WWW. Figures 1 and 2 originate from this example. Course information is typed in and modified using a syntax-directed editor: the editor only allows input as defined by the DTD, and tags need not be typed but can nevertheless be displayed. The system takes input and displays output in ISO-Latin-1, reducing the need for SGML-entities like &aring;.

Use of paragraph styles has several potential benefits [2, 5, 16, 19]. For the purposes of this paper, we focus on benefits related to electronic publishing and SGML, three of which are identified below, named and labelled with letters to ease reference to them later in the paper.

M: meaning

Paragraph styles provide a mechanism for classifying text. Different kinds of paragraphs can be explicitly named, using paragraph names like `Example,' `Definition,' etc. This is useful in large, technical documents like software manuals. This may be used to attach meaning to the text, supporting the authors in discussions about the document, its content and its lay-out.

C: conversion

In several settings there is a need to convert the files for a document. Conversions may be needed to transfer a document between different word processors or to formats like SGML. Today, authors often need a lot of assistance to get documents on the World Wide Web. Sometimes separate organisational units are set up for this purpose. Electronic publishing efforts are simplified if the authors use a format which is easily converted to HTML.

S: search

Indexing, search, and retrieval are improved when text is equipped with logical markup. Although free-text search also has its advantages, it is certainly desirable to be able to distinguish between occurrences of a name as an author name and as any other textual occurrence of the same name. ACM's Electronic Publishing Plan [8] explicitly calls for digital documents which are logically structured for search and retrieval.
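
The search benefit (S) can be sketched as follows. The example assumes, hypothetically, that a document is available as a list of (style name, text) pairs; the sample document and the name in it are invented. Restricting the search to one style separates an author name from other occurrences of the same word.

```python
def find(paragraphs, needle, style=None):
    """Return indices of paragraphs containing needle.

    If style is given, only paragraphs carrying that style are considered.
    The comparison is case-insensitive.
    """
    needle = needle.lower()
    return [
        index
        for index, (para_style, text) in enumerate(paragraphs)
        if needle in text.lower() and (style is None or para_style == style)
    ]


# Invented sample document.
DOC = [
    ("Author", "A. Smith"),
    ("Body", "Smith is also a common trade name."),
]

free_text_hits = find(DOC, "smith")
author_hits = find(DOC, "smith", style="Author")
```

Free-text search returns both paragraphs, while the style-restricted search returns only the author line. This distinction is lost if the author line was formatted manually.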

Research method

We used a pragmatic approach for analysing use of styles in word processing. We used our established contacts with three different organisations to get access to documents and to conduct interviews with users.

The idea behind our approach was to analyse use of paragraph styles in a set of ``normal'' documents. Since we found no other studies or attempts to systematise the use of paragraph styles, we developed our own classification of the problems we found.

Initial Classification of Problems

Before we started our empirical work we identified four categories of problems associated with the use of paragraph styles: overlooking styles, weak functionality, semantic problems and structural problems. This classification was based on personal experiences with word processing and on experiences from development projects and consulting. Our understanding and naming of these problems changed during the study. Still, the initial categories survived, and have been used to structure the findings in this paper (subsections 4.1-4.4).

Case Organisations

We collected documents from three different organisations. Here we call them AA, BB, and CC, respectively.

AA

has 100 employees, most of whom are engineers, mathematicians or computer scientists doing contract research. AA is highly computerised; each employee has at least one computer. Several different kinds of computers as well as different word processing packages are being used. There is little systematic education in word processing. Document templates are available, their use being mildly encouraged, but not followed up. Only limited resources have been put into developing templates, although efforts have increased during the last year. Facilities for converting documents are not always available.

BB

has 4,500 employees, most of whom use Microsoft Word or WordPerfect. Within the administration, Word is the dominant word processor. Every year BB offers courses in word processing for both Microsoft Word and WordPerfect. In 1995 about 500 persons participated in internal word processing courses. Availability and enforcement of document templates vary heavily. Within BB we have had contacts with the administrative staff at different levels. Some of these have attended courses in word processing, others have not. For the administration, a document template for letters is available from servers in the network.

CC

is a major public bureaucracy, consisting of several, large, and very independent units. Most employees use PCs, using AmiPro, Microsoft Word and WordPerfect for word processing. From 1993 CC introduced email for most employees, and in 1995 CC launched a common, externally oriented WWW-service. This service is considered politically important. The WWW-service is operated by a newly established unit (CCW) within the department for common services. CCW receives documents for publishing via electronic mail, on floppy disks, and sometimes on paper. Documents are normally received from the authors, but for large publications the text is often obtained from a print shop.

Figure 2: SGML code describing a course in the University catalogue

Use and availability of document templates vary heavily among the units in CC. CCW has no authority over authors, and makes documents available on an ``as is'' basis, taking whatever steps may be needed (in some cases retyping the text). Thus the burden of authors not using appropriate templates falls on CCW. As often as possible, the personnel in CCW perform automatic conversions. Within the half-year period we have followed CCW there has been an increasing emphasis on encouraging the use of paragraph styles among authors. New tools for document analysis and conversion are brought into use as they become available.

Selection Procedure

We had different strategies for the selection of documents in the three organisations, but the general emphasis was on selecting documents where use of paragraph styles would make sense.

In AA we searched for a project where several authors were involved. We obtained access to a software manual of more than 300 pages (referred to as AA-01), written together with people from an external customer. The documented software was of a very technical nature and was written by domain specialists, not by computer scientists. To avoid bias we considered avoidance of computer scientists important, and this made it hard to find many more documents.

In BB we randomly selected 18 persons in the administration, and asked them to send us the last three documents they had written. Not all responded. We received 18 documents from nine different authors.

In CC we asked CCW to file all documents they received for publishing on the Web during a week. The result was a floppy disk containing 27 documents.

Document collection was supplemented by interviews in AA and BB. Some interviews were conducted using the telephone, and others were face-to-face meetings. We did not draw up a structured questionnaire, but asked questions related to the concrete use of styles in the documents we received.

Given the difficulties in obtaining a representative sample of documents, we restrict our interpretations to state the existence of problems. We cannot report reliable figures about ``problem frequencies,'' but we believe the classification reported here can be used to compare the use of word processing in two or more different organisations.

Document Analysis

Document analysis was primarily performed manually. We addressed only the use of paragraph styles. This delimitation means that there are other problems in the way word processors are used which will not be covered. It is only when paragraph styles have a role that we identify the problem.

Examples of problems falling outside our categorisation are the practices of adding empty paragraphs to align a new heading on top of a page and of ``simulating'' an indented list by manually breaking the text in single lines and indenting these lines with tabs or blanks (this is also reported by Trigg and Bødker [21]). Of course, such practices create problems when the document is being changed or converted. To us, however, this is not related to paragraph styles. It is more an issue of other mechanisms in the word processors not being used as intended.

Manual document analysis was performed by opening documents with the word processor. When going through the documents, we looked for use and non-use of styles, redefinitions of styles, manual formatting, and all aspects of paragraph styles we could think of. For some of the documents we conducted follow-up interviews with authors or other people involved. The findings were classified in discussions in our research group. The problems presented below were developed bottom up from our findings, but we choose to present them within the four predefined categories, as there were not really any new categories of problems encountered.

In an attempt to do automatic document analysis we developed a system based on DynaTag (from Electronic Book Technologies) and Balise (from Advanced Information Systems/Berger-Levrault). With this system we could automatically compare the use of paragraph styles (and indeed use of character formats) with the structure defined by a ``normal'' Document Type Definition provided by us. Automatic document analysis turned out to be of little use. It could only be applied to Microsoft Rich Text Format (RTF), and was thus hard to apply to all documents. Moreover, the automatic analysis was geared to look for structural problems, a category of problems which was not very common.

A classification of problems

In this section we present a classification of the problems we found. In most of the categories we have identified several subcategories. There were large frequency differences among the problems. For each problem we also try to identify which of the benefits of styles (identified in section 2) may not be achieved when the problem is present. Other consequences, like loss of dynamics in editing, are not discussed. The findings are summarised in table 1.

``Overlooking Styles:'' Not Using Style Capabilities

We define ``overlooking styles'' as the manual formatting of text which could have been formatted using paragraph styles. Since the use of paragraph styles was not very extensive, this category is by far the most common category. Analysis of the documents reveals a series of different problems in this category.

I-a: Style exists but is not used.

It is very common to find documents where the document template contains styles which could have done the required formatting, but where these styles are not being used. Instead the format of a paragraph is modified to achieve the same effect. We have seen this in practically all the documents we analysed. In many documents styles are not used at all. It is also quite common to find documents that only use the most elementary styles, typically the styles for headings at the higher levels. Some interviewees explained that these styles were useful, as they helped construct the table of contents. In a large technical document (AA-01), where styles were used heavily, we found sporadic examples of ``manually'' created section titles at low levels.

Not using existing styles may lead to problems when the file is to be converted to some other format. Moreover, depending on which styles are not being used, search (for, e.g., `Author') and meaning (of, e.g., `Example') may also suffer.

I-b: New type of paragraph, but no style.

In developing a text, new ``types'' of paragraphs often come into being. In a strategy document we analysed, there were some important paragraphs in each section, and these paragraphs were printed in bold. This consistent formatting was done manually, rather than defining a new paragraph style. Clearly, consistent formatting of a new kind of paragraph is more easily achieved with a specific style.

This problem may have consequences when the file is converted. In addition, the possible benefits of attaching meaning to paragraphs are lost.

I-c: Examples of a style consistently reformatted, while style is unchanged.

In early versions of one of the documents we studied, all examples of one specific heading style were reformatted. This was not done, however, through a modification of the style for that heading, but by manually changing the format for each individual example of the kind of heading. In the final version of the document this had been changed to a corresponding modification of the style.

Although inconvenient, this problem does not affect any of the mentioned benefits of meaning, conversion and search.

I-d: Incidental use of style.

In some of the documents from BB, only one paragraph style was applied, for example the style `Date.' This came as a surprise to us, since we had expected that if a document should have been written in only one paragraph style, it would be `Normal.' It turned out that many documents were written using a letter template, where `Date' was the first style encountered. As a side effect, we saw an example where a quote which clearly was pasted in from another document, was carrying the style `Date'. We have personally experienced this problem in other contexts, where the choice of template, however, was more appropriate.

This problem causes difficulties for any conceivable benefit from the use of paragraph styles. Meaning, conversion and search are all affected. As an example, one needs to be prepared for documents where there are many examples of the `Date' style, and where the contents of such paragraphs bear no resemblance to a date.
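
A conversion program could guard against incidental use of the `Date' style with a plausibility check. The sketch below is only an illustration: the notion of what ``looks like a date'' (a short paragraph containing a four-digit year or a numeric day, month and year pattern) is a crude assumption, and the word limit is arbitrary.

```python
import re

# Assumed, crude notion of a date: a four-digit year or a numeric
# pattern such as 7.1.1997 or 07/01/97.
DATE_HINT = re.compile(r"\b\d{4}\b|\b\d{1,2}[./-]\d{1,2}[./-]\d{2,4}\b")


def suspicious_date_paragraphs(paragraphs, max_words=6):
    """Return indices of 'Date' paragraphs that do not look like a date."""
    flagged = []
    for index, (style, text) in enumerate(paragraphs):
        if style != "Date":
            continue
        looks_like_date = (
            len(text.split()) <= max_words
            and DATE_HINT.search(text) is not None
        )
        if not looks_like_date:
            flagged.append(index)
    return flagged
```

Such a check can only flag candidates for manual inspection; it cannot recover the style the paragraph should have had.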

``Weak Functionality:'' Style Use Made Difficult

Style mechanisms often have unsatisfactory designs. This can be observed by analysing documents, and is also brought forward in interviews with users. It is our impression that these problems make style use difficult or even directly discouraging. What we report here are problems with the use of styles which we have observed and which we believe are related to the design of style mechanisms. To try to catch the full range of problems with styles, a more detailed analysis of different implementations of style mechanisms is needed.

II-a: Context-free mechanisms cause many styles.

In a large technical document (AA-01) we were struck by the number of paragraph styles associated with various kinds and levels of indented lists. The lists occurred at three levels, and in three kinds: bulleted, numbered and equipped with descriptions. To achieve good lay-out, there were separate styles for beginnings and ends of lists. As a result, 24 out of a total of 45 paragraph styles were involved with lists. Even so, not all alternatives were available at level three, and there were no definitions beyond that level.

In the word processors we looked at, it is not possible to define that a style should be indented 1 cm relative to the preceding or surrounding paragraph. There is no notion of context-sensitive style definitions (see Johnson and Beach [16] for an analysis of various design alternatives). Hence the style for an itemised list must have different names for each level of indentation. This problem does not occur in markup-based systems like LaTeX or with DTDs like HTML: both feature general definitions of three kinds of lists: unnumbered, numbered and labelled.

This problem clearly makes it difficult to work with styles. There is an unnecessary conflict between simplicity (few styles) and generality (many styles) for those who make document templates. Document conversion is made difficult, although not impossible.
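
The growth in the number of list styles can be made concrete with a small enumeration. The sketch assumes three positions per list (first, middle and last item) and an invented naming scheme; neither is taken from document AA-01. Under these assumptions, absolute indentation needs one name per level, kind and position, whereas a hypothetical context-sensitive mechanism would need one name per kind and position only.

```python
from itertools import product

KINDS = ["Bullet", "Number", "Description"]
POSITIONS = ["First", "Middle", "Last"]  # assumed, not from the source


def absolute_style_names(levels):
    """Style names needed when every indentation level has its own styles."""
    return [
        f"{kind}{level}{position}"
        for level, kind, position in product(
            range(1, levels + 1), KINDS, POSITIONS
        )
    ]


def relative_style_names():
    """Style names needed if indentation were relative to the context."""
    return [f"{kind}{position}" for kind, position in product(KINDS, POSITIONS)]
```

With three levels the first function yields 27 names and the second 9. The figure of 27 is an upper bound under the stated assumptions; the document we analysed had 24 list styles, with not all alternatives defined at level three.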

II-b: Copy and paste resulting in enormous style catalogues.

In some word processors, it often happens that when text from one document is pasted into another, the style catalogue of the target document is updated with the styles of the pasted text. We have found documents with style catalogues containing styles which obviously do not belong there (see also problem I-d above). We and our colleagues have experienced shared authoring or editing processes where the number of styles became simply overwhelming. We have experienced as many as 200 styles in a catalogue, many of which were ``synonyms.'' Mixing different language versions of the same word processor tends to generate this problem.

It is clearly hard to work with large style catalogues. Selecting the appropriate style requires scrolling in the style catalogue, and it often happens that one document mixes different styles for the same purpose. It is not surprising that those who have been confronted with such examples find paragraph styles hard to use and understand. Large style catalogues also make conversion more difficult, since large tables of style names need to be maintained.
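
One way to tidy a large style catalogue before conversion is to look for ``synonyms,'' that is, styles with identical formatting properties but different names. The sketch below assumes that the catalogue has been extracted into a dictionary from style name to a dictionary of properties; this representation, the property names and the sample catalogue are all hypothetical.

```python
from collections import defaultdict


def synonym_groups(catalogue):
    """Group style names whose formatting properties are identical."""
    groups = defaultdict(list)
    for name, properties in catalogue.items():
        # Sort the properties so that their order does not matter.
        key = tuple(sorted(properties.items()))
        groups[key].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Invented catalogue mixing two language versions of a heading style.
CATALOGUE = {
    "Heading 1": {"font": "Helvetica", "size": 14, "bold": True},
    "Overskrift 1": {"font": "Helvetica", "size": 14, "bold": True},
    "Normal": {"font": "Times", "size": 12, "bold": False},
}

groups = synonym_groups(CATALOGUE)
```

Identical properties do not prove that two styles mean the same thing, so the groups found are candidates for merging rather than a final answer.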

II-c: Association with line-oriented formatting.

We analysed several press releases from CC. They were all written according to an agreed lay-out, although they were not written using any document template. In the agreed lay-out, several different ``fields'' of information, for example date and organisational unit, are placed on the same line. Since paragraph styles are associated with complete paragraphs, they always result in a line break, making it impossible to combine the agreed lay-outs with paragraph styles carrying meaning.

This problem has several consequences. Use of paragraph styles to attach meaning to elements of the document is discouraged. The example with heading lines containing several important pieces of information is typical. Some of the fields found in such headings could also be useful when searching for documents. Depending on how users circumvent the problem, problems may also arise with conversion.

Semantic Problems

We define ``semantic problems'' to be situations where styles with specific meaning are used without regard to this meaning.

III-a: Wrong document template.

In BB we saw several examples of documents written with a template for letters although the documents themselves were not letters. Clearly this resulted in strange use of styles as well (see problems I-d and III-b). An obvious reason for the problem to turn up was the lack of other document templates.

This problem does, like problem I-d, cause difficulties for all mentioned benefits of styles.

III-b: Wrong paragraph style.

When a paragraph style is used only for its lay-out properties, but contrary to the meaning associated with the name of the paragraph style, we would classify it as use of the ``wrong paragraph style''. The `Author' style, for example, should not be used as a convenient way to centre text in a heavy type face. We have encountered this problem personally, but we did not really observe it in the set of documents we analysed. Those examples we found were hard to interpret as using the wrong paragraph style. Instead we classified them as incidental use of styles (I-d), related to use of the wrong document template (III-a).

Again, all mentioned benefits of using paragraph styles are affected.

 



Table 1: Overview of problems and consequences, M = meaning, C = conversion, S = search

III-c: Logical style not applied.

Some documents have a well developed set of logical paragraph styles, styles which are of specific relevance to the document or the kind of document in question. In document AA-01 there were special styles for `Code,' `Note,' and `Example,' etc. Sometimes, such logical styles are defined, but not used when appropriate. There is clearly a heavy overlap between this problem and problem I-a. When we propose this as a separate problem, we want to highlight the potential for expressing meaning with the paragraph style mechanism.

The main consequence of this problem is the lack of representation of meaning. In addition, search for documents with specific content is made more difficult. As some word processors (e.g., Word and FrameMaker) allow search for examples of specific styles, even search within documents may suffer.

Structural Problems

SGML Document Type Definitions normally impose restrictions on where different elements may appear in a document. Examples of such restrictions may be that `Author' should come after `Title,' that a heading should not be more than one level below the previous heading, or that all letters should have a `Title.' None of these restrictions apply with the use of styles. Simple-minded translation of style examples to SGML elements may therefore result in text which cannot be parsed according to the DTD. Clever conversion programs may fix several such ``errors,'' but practical experience with automatic conversion shows that some ``errors'' always tend to slip through.

Initially we defined this as structural problems. We thought this was an important category, since it addresses a problem which arises when text is to be converted to some DTD, say HTML. In the set of documents we analysed, we found no clear examples of structural problems. Still, when documents are to serve as input to information systems, the structure of the documents must conform to what we expect.
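
Two of the restrictions mentioned above can be checked directly on the sequence of paragraph style names in a document. The following sketch is not the DynaTag and Balise system described in section 3; it is a simplified illustration which assumes that heading styles are named `Heading1', `Heading2', and so on, and that the style sequence has already been extracted.

```python
import re

HEADING = re.compile(r"^Heading(\d+)$")


def structural_problems(styles):
    """Check a sequence of paragraph style names against two sample rules.

    Rule 1: 'Author' must not appear before a 'Title' has been seen.
    Rule 2: a heading may be at most one level below the previous heading.
    """
    problems = []
    previous_level = 0
    seen_title = False
    for index, style in enumerate(styles):
        if style == "Title":
            seen_title = True
        elif style == "Author" and not seen_title:
            problems.append((index, "Author before Title"))
        match = HEADING.match(style)
        if match:
            level = int(match.group(1))
            if level > previous_level + 1:
                problems.append(
                    (index, f"Heading{level} follows level {previous_level}")
                )
            previous_level = level
    return problems
```

A real DTD expresses such rules as a grammar, and a validating parser checks all of them at once; the sketch only shows that a style sequence carries enough information for this kind of check when styles are applied consistently.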

Summary of Findings

Our main findings are (1) that there are great variations in the use of paragraph styles, (2) that generally paragraph styles are seldom used, and (3) that there are several problems associated with the use of paragraph styles. The problems with the use of styles, and their consequences (affected benefits), are summarised in table 1.

Discussion

In discussing our findings we will first go back to our question regarding approaches for electronic publishing and capturing input to SGML-based systems. Thereafter we will analyse our cases and findings with respect to organisational implementation of word processing. Finally we will try to establish a connection between the problems we have observed and the dominating paradigm of modern WYSIWYG word processors, which we here will call the paper metaphor.

Word processors as input to SGML

Compared to other ways of obtaining input to SGML-based systems there are obvious advantages with the approach of converting documents written in a word processor to SGML. Other approaches involve disadvantages as [4]:

Our study shows that the apparently attractive alternative of taking SGML input from word processors may not be that attractive either. Text from word processors, as written today, is a very weak starting point for automatic conversion to SGML and thus capture of important data. For some DTDs, like HTML, it will always be possible to make a conversion which looks reasonable (converting paragraphs with big letters to <h5>...</h5>, etc.), but this is not a way to capture the structure of the documents.

The press releases of CC may serve as a good example, since they contain a series of ``fields'' of information which would be relevant to capture for further processing, indexing, search, etc. Examples of such fields are date, organisational unit, topic, and contact person. Today, press releases are made such that they look reasonably similar, but they are not standardised with respect to use of styles. Technically it is of course possible to attack this problem with AI-inspired techniques for heuristic classification (see, e.g., Clancey [7]). The documents could be analysed with respect to words, formatting, etc. In such an approach it would most likely be possible to convert a large proportion of the press releases to a suitable DTD. Still, there would be uncertainty with respect to the correctness of the classification. It is not our purpose in this paper, however, to enter into a debate on the relevance of heuristic classification and other AI-based techniques. To us, it appears attractive to let those who write a text declare that the title indeed is a title, and we are looking for ways of making this convenient.

Our findings imply that other approaches to capturing text in SGML may, after all, turn out to be more attractive than the conversion of word processor files. Such an inference cannot, however, be made in general. We have not tried to change practice with respect to use of paragraph styles in our case organisations. The efforts needed in order to change word processing practice in a more ``SGML-friendly'' direction may vary. We discuss these efforts below. Our conclusion is therefore more modest: one cannot simply assume that comprehensive use of word processing represents a simple way to capture text in SGML.

Organisational implementation

Common to the three case organisations we have worked with is the lack of what we would call organisational implementation of the use of word processing and use of styles.

In AA there have been some efforts, but in the project where the software manual (AA-01) was written, there were no discussions about the specific template used for this document (it was not a template from AA). The person we interviewed had done her best in using styles as she saw they were used in the document, but some features of the template were unknown to her.

In BB only one template, the one for letters, had been developed and made available.

In CC there is no common policy on word processing. It is left to the different units, and the people in the Web-service do what they can to cope with what they get. The Web-service is an add-on to the organisation, and has no authority to instruct the units regarding their word processing practices.

Although some organisations claim to succeed with the implementation of word processing, it is clear from our cases and other examples we know of, that often too little effort is spent on developing templates and on training in how to access and use document templates. As a result, the interpersonal, long-term benefits of using word processing are not realised. Of course, it may always be difficult to realise such benefits, since they are inherently associated with Grudin's dilemma of ``who does the job and who gets the benefit'' [11, 12], but in our cases one has hardly attempted to pave the way for these benefits, much less made an effort to make potential benefits visible. Benefits visible to the users are effective, as demonstrated by the use of styles which support the construction of a table of contents.

To be effective, we believe the implementation of word processing should include the following tasks in order to prepare for electronic publishing and conversion of documents to SGML:

The paper metaphor

The use of paragraph styles does not change the looks of a printed document. It is only a property of the digital, unprinted document residing in the computer's storage.

Currently WYSIWYG -- what you see is what you get -- and direct manipulation are the dominant principles for user interface design [6], especially for off-the-shelf software like word processors. Unfortunately, it appears that WYSIWYG communicates the idea that word processing is just another way of writing on paper. As an example, Preece et al. [20, p. 18] write:

Alongside developments in interactive graphics, interactive text processing systems were also evolving at a rapid rate. Following in the footsteps of line and display editors was the development of systems that allowed users to create and edit documents that were represented fully on the screen. The underlying philosophy of these systems is captured by the term WYSIWYG, .... In other words, the documents were displayed on the screen exactly as they would look in printed form. This was in stark contrast to earlier document editors, where commands were embedded in the text and it was impossible to see what a document would look like without printing it.

Implicitly or not, Preece et al. place WYSIWYG and ``how a document looks in printed form'' on an equal footing. A related opinion is expressed by Barker et al. [2, p. 306]:

Although non-WYSIWYG word processors can produce a handsome final product, you won't see what you'll get until you get it. That's fine for people who like surprises. But if you work in a busy environment, where the deadline is always yesterday, you need an editor that gives you a clear idea of how your document will look before it comes off the printer. You need a product that does word styling as well as word processing.

The main problem with the idea that word processing is for processing printed paper -- the paper metaphor -- is that, within this view, invisible differences have no importance. This may be appropriate for documents that will only be viewed in one format and one medium. Documents which look exactly the same on paper may, however, have vastly different properties with respect to conversion, reformatting, search and exchange.

Warnock claims that until a few years ago the only use of word processors was to produce printed documents [22]. Nowadays, however, many documents are published on paper as well as electronically (e.g., in WWW), and exchanging documents through email is also quite common. With these trends, the use of word processors has shifted from producing printed documents to producing documents which can be reproduced or communicated in several different ways. The paper metaphor does not keep up with these changes in use of word processing.

Users clearly focus on paper. The person working with the large manual (AA-01) used styles a lot, the motivation being a consistent layout, not a digital document. In BB, after a course in the use of styles, a user stated ``I get the same results without [styles],'' even asking back ``Do you see the point in using them?'' Users also refer to the use of simple icons in the toolbar to change the alignment of text. One user referred to the use of shortcuts: she knew all the standard formatting commands by heart. There were also cases where users were unaware of planned reuse of the text in WWW. When perceived as beneficial, e.g., for the construction of a table of contents, some styles were indeed used.

Based on our findings and interviews we claim that the paper metaphor communicated by WYSIWYG is a part of the problem. Within the paper metaphor the use of styles makes little sense. Many of the problems reported from our empirical material can be explained by the users' focus on printed results. The paper metaphor provides no room for invisible structures in the document, while inviting non-style based editing of paragraph formats. Thus, we feel that the paper metaphor is an overly simple notion of what is going on. In the era of electronic documents people should not be misled to believe that they write on paper.

Conclusions

We have developed a classification of problems with paragraph styles. The classification appears to cover the problems we have identified, but one category, ``Overlooking styles,'' dominates quantitatively.

In a sample of documents, paragraph styles appear to be little used. Even in documents where use of paragraph styles would make sense (for example, because of planned publishing in WWW), there is little use. These practices will make it difficult to benefit from the opportunities of digital documents, standard exchange formats, etc. One cannot realistically assume that current practices in use of word processing provide a good basis for electronic publishing.

In our cases, little effort has so far been made to encourage or support the use of paragraph styles. We have provided some recommendations about how this could be done, but further studies are needed to interpret and understand differences in style use in different organisations.

On a general level the ``paper-metaphor'' of current WYSIWYG word processors can be used to explain the findings, but there are also problems connected to the way styles have been implemented in some word processors. Comparative evaluations and studies of use of different word processors under comparable conditions could shed light on these issues.

Acknowledgements

Jonathan Grudin and Jeffrey Johnson provided comments early in the work with this paper. They, Unni Astad, Susanne Bødker, and anonymous referees helped us with constructive comments on previous versions. Kristin Braa and Fredrik Ljungberg have participated in some of the work behind this paper, and they have also commented on earlier versions. This work was supported by the Research Council of Norway through its grant to the BEST-programme and the Swedish Transport & Communications Research Board (Kommunikationsforskningsberedningen) through its grant to the ``Internet project.''

References

1
Alfred V. Aho and Jeffrey D. Ullman. The Theory of Parsing, Translation, and Compiling, volume 1. Prentice-Hall, Englewood Cliffs, 1972.

2
D. Barker, David L. Edvards, and Stan Wszola. Writing in style. BYTE, 17(6):306-315, June 1992.

3
Tim Berners-Lee, Robert Cailliau, Ari Luotonen, Henrik Frystyk Nielsen, and Arthur Secret. The World Wide Web. Communications of the ACM, 37(8):76-82, August 1994.

4
Kristin Braa and Tone Sandahl. Standardization and flexibility in the distribution and exchange of documents. In International Working Conference on Integration of Enterprise Information and Processes ``Rethinking Documents'' (IPIC'96), Boston, November 1996.

5
Martin Bryan. Document markup for open information exchange. In IEE Colloquium on ``Adding Value to Documents with Markup Languages'', London, 6 June 1994. The Institution of Electrical Engineers. IEE colloquium digest no. 1994/142.

6
Bill Buxton. HCI and the inadequacies of direct manipulation systems. SIGCHI Bulletin, 25(1):21-22, January 1993.

7
William J. Clancey. Heuristic classification. Artificial Intelligence, 27:289-350, 1985.

8
Peter J. Denning and Bernard Rous. The ACM electronic publishing plan. Communications of the ACM, 38(4):97-103, April 1995.

9
Peter Flynn. The World Wide Web handbook. Thomson Computer Press, London, 1995.

10
Charles F. Goldfarb and Yuri Rubinsky. The SGML handbook. Clarendon Press, 1990.

11
Jonathan Grudin. Why groupware applications fail: Problems in design and evaluation. Office: Technology and People, 4(3):245-264, June 1989.

12
Jonathan Grudin. Groupware and social dynamics: Eight challenges for developers. Communications of the ACM, 37(1):92-105, January 1994.

13
D. G. Hendry. Breakdowns in writing intentions when simultaneously deploying SGML-marked texts in hard copy and electronic copy. Behaviour and Information Technology, 14(2):80-92, March-April 1995.

14
Eric van Herwijnen. Practical SGML. Kluwer, Dordrecht, 1990.

15
Astrid E. Jenssen and Tone Sandahl. Conflicts between the possibilities and the reality in the field of structured electronic documents: Experiences from a large-scale SGML-project. In Bo Dahlbom et al., editors, Proceedings of the 19th Information systems Research seminar In Scandinavia, pages 935-955, Lökeberg (Göteborg), 10-13 August 1996.

16
Jeff Johnson and Richard J. Beach. Styles in document editing systems. Computer, 21(1), January 1988.

17
Donald E. Knuth. The TeXbook. Addison-Wesley, Reading, Mass., 1984.

18
Leslie Lamport. LaTeX: A Document Preparation System. Addison-Wesley, Reading, Mass., 1986.

19
Stan Miastkowsky. Beyond word processing. BYTE, 18(5):85-86, 88, 90, Spring (Special issue) 1993.

20
Jenny Preece et al. Human-Computer Interaction. Addison-Wesley, Wokingham, England, 1994.

21
Randall Trigg and Susanne Bødker. From implementation to design: Tailoring and the emergence of systematization in CSCW. In Richard Furuta and Christine Neuwirth, editors, ACM 1994 Conference on Computer Supported Cooperative Work -- CSCW'94, pages 45-54, Chapel Hill, October 22-26, 1994. ACM Order Number 612940.

22
John E. Warnock. The new age of documents. BYTE, 17(6):257-260, June 1992.

The translation was initiated by Pål Sørgaard on Fri Sep 27 12:42:24 MET DST 1996

...WWW
See http://tag.uio.no/fkat/katalog.html
...meaning.
Clever use (abuse?) of table mechanisms or columns may be used to circumvent this problem. Alternatively character styles could be used, but in most cases it is natural to use paragraph styles to classify text, like with `Author,' `Abstract,' etc.
...ways.
For a discussion of the difficulties in writing text for multiple kinds of presentation, see Hendry [13].
 

