Word doc to HTML Conversion: May 06 2011


Like HTML/CSS, a Word document has two parts: the basic text content and the formatting, or "styles", applied to it. The most basic "style" is simply the font type and font size. Beyond that are margins, indents and other features that have close equivalents in the on-screen HTML/CSS world.

Easy option: Save As "Web Page, Filtered" (.htm)

In Word 2010, an unfiltered "Web Page" save keeps extra MS Office-specific info that can bloat the file tremendously, whereas the "filtered" version in this example is a "mere" 16K!

Word Style doc conversion example

Note: each saved document then gets its own companion folder for its images!

Other Option: Copy and Paste raw text into an HTML editor then insert your own tags for headings

Not as difficult or as time consuming as it may seem, at least without graphics involved.
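The copy-and-paste route can also be partly automated. The following is only a minimal sketch, not a finished converter: the function name `text_to_html`, the `max_heading_len` cutoff, and the rule "a short block with no closing punctuation is a heading" are all assumptions made for illustration, so the result still needs a check by eye.

```python
import html


def text_to_html(raw_text, max_heading_len=60):
    """Wrap pasted plain text in <h2> and <p> tags (heuristic sketch)."""
    parts = []
    # Blocks of text are assumed to be separated by blank lines.
    for block in raw_text.split("\n\n"):
        # Join the lines of one block into a single line of text.
        text = " ".join(line.strip() for line in block.splitlines() if line.strip())
        if not text:
            continue
        # Escape &, < and > so the text cannot break the markup.
        escaped = html.escape(text)
        # Assumption: a short block that does not end a sentence is a heading.
        is_heading = len(text) <= max_heading_len and not text.endswith((".", "!", "?"))
        tag = "h2" if is_heading else "p"
        parts.append(f"<{tag}>{escaped}</{tag}>")
    return "\n".join(parts)


if __name__ == "__main__":
    sample = "Fonts and Fashion\n\nStyle is like fashion, it changes."
    print(text_to_html(sample))
```

Headings that the heuristic misses, or paragraphs it wrongly promotes, can then be corrected by hand in the HTML editor.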

MS Word has Styles as well:

MS Word lets you define a style

Style in MS Word: Changing the look of Characters and Paragraphs

In MS Word, a style is a collection of character formatting and paragraph formatting that can be saved. These style parameters are then saved in templates.
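To make the parallel with CSS concrete, here is a rough sketch of a Word-like style modeled as a small Python class that prints an equivalent CSS rule. The class name `WordStyle`, its fields, and the `p.normal` selector are hypothetical and chosen only for this illustration; Word's real style definitions hold many more settings.

```python
from dataclasses import dataclass


@dataclass
class WordStyle:
    """A simplified stand-in for a Word style (illustrative only)."""
    name: str
    font_family: str          # character formatting
    font_size_pt: float       # character formatting
    alignment: str = "left"   # paragraph formatting
    space_before_pt: float = 0
    space_after_pt: float = 0

    def to_css(self, selector):
        """Render the style as one CSS rule for the given selector."""
        return "\n".join([
            f"{selector} {{",
            f'  font-family: "{self.font_family}";',
            f"  font-size: {self.font_size_pt:g}pt;",
            f"  text-align: {self.alignment};",
            f"  margin-top: {self.space_before_pt:g}pt;",
            f"  margin-bottom: {self.space_after_pt:g}pt;",
            "}",
        ])


if __name__ == "__main__":
    normal = WordStyle("Normal", "Times New Roman", 12)
    print(normal.to_css("p.normal"))
```

The character settings map onto CSS font properties and the paragraph settings onto alignment and margins, which is the same split Word makes between character and paragraph formatting.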

MS Word's "default" template is called the "Normal" template.

Unless you specify otherwise, all Word documents you create will be based on the Normal template. Note that every Word document has an associated "Style", even if it is just "Normal". Styles, and what counts as "Normal", change over time:

Fonts and Fashion

It's important to note that "Style" is like "fashion": it changes, and what is "normal" at one time is not "normal" a few years later, or in a different application!

Example: What is "normal" in Hollywood screenplays is not exactly "normal" in a newspaper like The Times.

Even Microsoft has varying definitions of what is Normal!

- Example: Excel 2003: default font is Arial 10
- Example: Word 2007 (Vista): default font is Calibri size 11!
- Example: Word 2002 & Word 2003: 12 point regular Times New Roman, left-aligned, single spaced, no extra space above and none below.

Font samples:

- This text is in Times New Roman 12 pt regular
- This text is in Courier New 10pt regular (BIG spacing between words!)
- This text is in Arial 10pt regular

Courier and Courier New: Dates back to Happy Days

Courier is a monospaced "slab serif" typeface, developed in 1955. Monospaced means that every character occupies the same horizontal width. That is important for "lining up" columns of numbers and for "tables". Courier New was developed for the IBM Selectric typewriter series. 12pt Courier has become an "industry standard" for screenplays and was the standard for the US State Department until January 2004, when font fashions changed and 14 pt Times New Roman was chosen instead.
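Why monospacing matters for columns can be shown with a small sketch. Padding each value to a fixed number of characters only lines up on screen when every character has the same width, as in Courier. The function name `format_columns` and the sample rows are made up for this example.

```python
def format_columns(rows):
    """Pad labels and amounts so they align in a monospaced font."""
    # Widest label and widest formatted amount set the column widths.
    label_width = max(len(label) for label, _ in rows)
    value_width = max(len(f"{value:.2f}") for _, value in rows)
    lines = []
    for label, value in rows:
        # Labels are padded on the right, amounts on the left.
        lines.append(f"{label.ljust(label_width)}  {value:>{value_width}.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_columns([("Paper", 4.5), ("Ink ribbon", 12.25)]))
```

In a proportional font such as Times New Roman or Arial the same padded text drifts out of alignment, so on a web page this kind of output belongs inside a `<pre>` element or a real HTML table.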

Arial and Helvetica Font:

Helvetica was the font that "ruled" in the "professional" world for the last half of the 20th century. Arial, designed by Monotype and bundled with Microsoft Windows, looks very similar to the untrained, non-professional eye and, because of its pervasiveness on MS systems, has largely replaced Helvetica as the new "standard". Thus, in terms of setting an MS Word "style", Arial 10 seems to be another "common" or "classically fashionable/normal" font.

Times New Roman Font: Think Newspaper

Times New Roman is a serif typeface that was commissioned by the British newspaper "The Times" in 1931! Even now in the digital font era, Microsoft has distributed Times New Roman with every version of MS Windows since version 3.1. It's the default font in many applications such as web browsers and word processors. As of Word 2007, MS changed their "font fashion" mind and made Calibri, a sans-serif font, the default font in MS Office 2007. This is only the tip of the iceberg! The fact is there are so many different font styles to choose from! Some even look like handwritten script (Amazone BT), the kind used for wedding invitations.