My First DEXembed Word Embedded Index
I have been ground between the gears of Word embedded indexing—forced sorts, special characters, and a Word manuscript already formatted.
My client planned to self-publish directly from this already formatted Word document, so I began the embedded indexing job using the WordEmbed utility by James Lamb, knowing that it wouldn’t mess up the formatting. But then, more than a hundred pages into the job, my client, who had not wanted an e-book, changed his mind and asked for an e-book with a linked index.
The DEXembed utility by The Editorium works well with its sister utility, IndexLinker, to link index page numbers to the pages. WordEmbed creates XE index entries that differ from the format expected by IndexLinker and so IndexLinker won’t work with WordEmbed at this point.
In WordEmbed, clicking in the Word document then hitting a keyboard shortcut places a comment with a unique ID number where an index mark should be placed. The unique ID number on the clipboard is then pasted into the locator field for the index entry in the indexing software.
The DEXembed utility puts bright red paragraph numbers on the copy of the Word manuscript used for indexing. The paragraph numbers are typed into the locator fields.
So I converted my WordEmbed references, which are a combination of the page number and line number, into DEXembed paragraph numbers. It only took a couple of hours.
(Later I discovered that the DEXembed embed process put its tags into the paragraph BEFORE the number I’d entered. I don’t know what that was about but fortunately this was a manuscript with empty paragraph marks. I removed one so that the tags embedded into the paragraphs I’d specified.)
DEXembed is nicely quick to use, with the efficiency of not having to hit that keyboard shortcut in Word for each entry. Also, I LOVE the fact that you work in a copy of the Word manuscript during indexing with DEXembed. I use Cindex indexing software, in which you move to an entry by typing the opening letters of that entry, and I occasionally find myself typing those letters into the Word document by mistake.
Then came time to embed the index and the gears started grinding. Instead of detailing my difficulties, I’m going to simply outline the fixes.
First let’s discuss page references with attributes such as bold or italic. My client wanted page numbers of pages with photos to appear in bold. DEXembed works from a tab-delimited export from the indexing software, which is a text format. This means that bold and italic are lost. At the recommendation of the DEXembed developer, I used the following procedure to get bold page numbers into Word:
End the lowest level heading with a special character. I used the +% character combination to mark bold page numbers. So if the bold page number appeared after a main heading, then the last characters of the main heading text were +%. If the bold page number was in a subentry, the subentry heading text ended with +%.
Once the embedding process into the Word document was done, I showed index tags and did the following search and replace:
Find What: +%^034
Replace With: ^034 \b
This searches for my unique character code plus a double quote mark and replaces it with a double quote mark, a space, then the flag for bolding the page number.
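For anyone who would rather test the logic outside Word first, here is a minimal Python sketch of that same replacement run against the text of an XE field. The +% marker and the \b flag come from the procedure above; the function name and the sample entries are made up for illustration, and this is not part of DEXembed.

```python
# Sketch of the bold-page-number fix, applied to XE field text as a string.
# Assumption: the heading ends with the +% marker, immediately followed by
# the straight double quote that closes the XE heading text.

BOLD_MARK = "+%"  # marker described in the article


def apply_bold_flag(field_code: str, marker: str = BOLD_MARK) -> str:
    """Replace marker + closing quote with closing quote, space, \\b flag."""
    return field_code.replace(marker + '"', '" \\b')


if __name__ == "__main__":
    # Hypothetical entries, one main heading and one subentry.
    for code in ['XE "Tahoe, Lake+%"', 'XE "boats:sailing+%"', 'XE "plain entry"']:
        print(apply_bold_flag(code))
```

Entries without the marker pass through unchanged, which matches what the Word search and replace does.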
My index had many book titles. Their italics were lost in the text export too, so I surrounded each title with special characters that I could search for after embedding and replace with italic formatting of the text:
Replace With: \1
Nearly every name in this book had an associated nickname that I included in the index within double quote marks, which my Macintosh system automatically turns into curly quote marks on entry.
Double quote marks are a special character in Word indexing, surrounding the heading text. When using WordEmbed, the curly quotes in my headings were differentiated by Word from the straight quote marks Word uses around the XE index heading text. But DEXembed works from a tab-delimited text file exported from the indexing software, and the curly quotes were all converted to straight quotes.
If I simply escaped the double quote marks, I had a heck of a time determining opening versus closing for my search and replace. So I ended up replacing all my open quotes with #OQ# and my close quotes with #CQ#, which I replaced with the proper curly quotes after the embed.
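The placeholder round trip is easy to picture in a few lines of Python. This is only a sketch of the idea: the #OQ# and #CQ# codes are the ones I used, but the function names and the sample heading are invented for the example, and the real replacements were done in the indexing software and in Word.

```python
# Sketch of protecting curly quotes through a plain-text export.
# Before export: curly quotes become placeholder codes.
# After embedding: placeholder codes become curly quotes again.

OPEN_CODE, CLOSE_CODE = "#OQ#", "#CQ#"    # codes from the article
OPEN_QUOTE, CLOSE_QUOTE = "\u201c", "\u201d"  # left and right curly quotes


def protect_quotes(heading: str) -> str:
    """Swap curly quotes for placeholder codes before the export."""
    return heading.replace(OPEN_QUOTE, OPEN_CODE).replace(CLOSE_QUOTE, CLOSE_CODE)


def restore_quotes(heading: str) -> str:
    """Swap placeholder codes back to curly quotes after the embed."""
    return heading.replace(OPEN_CODE, OPEN_QUOTE).replace(CLOSE_CODE, CLOSE_QUOTE)


if __name__ == "__main__":
    original = "Jones, Nicholas \u201cNick\u201d"  # hypothetical heading
    exported = protect_quotes(original)
    print(exported)
    print(restore_quotes(exported) == original)
```

Because opening and closing quotes get different codes, there is no guessing about which is which when the curly quotes are put back.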
The colon is a special character in Word indexes, separating head levels, but my index had no colons, so I didn’t have to deal with them.
Okay, the semicolon is a special character in Word, used to force sorting. So here we’re into special characters and forced sorting.
Let’s sidetrack into forced sorting.
DEXembed was written with the Sky indexing software in mind. Sky’s tab-delimited export apparently contains information, not found in Cindex’s tab-delimited export file, that DEXembed uses to force sorting within Word. The DEXembed developer made it work with Cindex for me, but I decided to use the semicolon method to force sorting.
Seth Maislin has a great page on issues with Word indexes that includes a section on forcing sorting with the semicolon:
Troubleshooting Those Horrible Microsoft Word Index Problems
An example of using the semicolon to force sorting is easily seen with an entry that begins with a quote mark. Word does its sorts based on ASCII value, so an entry beginning with a quote mark jumps to the top of the index. The following is the syntax for forcing the proper sorting:
This sorts the entry
under Nick in the index.
My index heading text used semicolons in several entries. Plus I wanted to use semicolons to sort. The DEXembed utility forces sorting in a Sky manner that doesn’t use the semicolon, and recognizes that semicolons may appear in heading text, so it escapes all semicolons.
I found it easiest to simply use a special character code for in-text semicolons and another for sorting semicolons that I could then search for and replace after embedding. The sorting semicolon character codes were replaced with semicolons but the in-text semicolon character codes were replaced with backslash semicolon.
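Here is a rough Python sketch of that two-code cleanup. The codes #SORT# and #SEMI# are hypothetical stand-ins (I have not listed the codes I actually used), and the sample strings are made up; the only parts taken from the process above are the two replacements themselves.

```python
# Sketch of the post-embed semicolon cleanup on XE field text.
# Assumption: two distinct placeholder codes were typed in the indexing
# software, one for sorting semicolons and one for in-text semicolons.

SORT_CODE = "#SORT#"  # hypothetical code for a sorting semicolon
TEXT_CODE = "#SEMI#"  # hypothetical code for a semicolon in heading text


def restore_semicolons(field_code: str) -> str:
    """Sorting codes become bare semicolons; in-text codes become \\;"""
    field_code = field_code.replace(SORT_CODE, ";")
    return field_code.replace(TEXT_CODE, "\\;")


if __name__ == "__main__":
    # Placeholder text only; these are not real index entries.
    print(restore_semicolons('XE "first part#SORT#second part"'))
    print(restore_semicolons('XE "ships#SEMI# boats"'))
```

Keeping the two codes distinct is the whole point: a plain semicolon in the export could mean either thing, and DEXembed would escape it regardless.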
The overall process, then, was to embed and then run a series of searches and replaces for the bold page numbers, italic headings, quote marks, and semicolons.
This Word manuscript was formatted and had many photos. If a page consisted of only a photo, I had trouble getting the tags to be embedded on the correct page so that the correct page number was bolded. Fortunately in each of those cases, there was an empty paragraph mark on the page with the photo. I moved the index tags by hand from paragraphs on nearby pages to the empty paragraph mark on the photo page. This got the right page number to bold to indicate the photo.
This is undoubtedly bad form—putting index tags into an empty paragraph. But DEXembed’s overall embedding is probably good practice: ganging all the index marks near the start of a paragraph.
Once the final index is generated and converted to text, corrections need to be made: Word puts the commas outside of quotation marks. A See also hanging off a subentry will probably have a comma before that needs to be changed to a period. Multiple cross-references in one entry need to be ganged together. These are all Word errors in generating the index.
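Two of those corrections are mechanical enough to sketch in Python, assuming the generated index has been saved out as plain text, one entry per line. The function name and sample lines are invented for illustration; I made these fixes by hand in Word, and ganging cross-references together is left out because it needs a human eye.

```python
import re

# Sketch of two cleanup passes on a generated index converted to plain text.
# 1. Move a comma that follows a closing curly quote to inside the quote.
# 2. Change the comma before a trailing "See also" to a period.

CLOSE_QUOTE = "\u201d"  # right curly double quote


def tidy_index_line(line: str) -> str:
    """Apply both punctuation fixes to a single index line."""
    line = line.replace(CLOSE_QUOTE + ",", "," + CLOSE_QUOTE)
    return re.sub(r",(\s+See also\b)", r".\1", line)


if __name__ == "__main__":
    # Hypothetical index lines.
    print(tidy_index_line("Jones, Nicholas \u201cNick\u201d, 12"))
    print(tidy_index_line("childhood, 4, See also family"))
```

Any automated pass like this should be followed by a read-through, since a comma before See also is only wrong where the cross-reference hangs off a subentry with locators.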
So how did IndexLinker do, after all this? It seems that IndexLinker has trouble with entries containing quote marks or bold page numbers. *sigh* A lot of the page numbers were properly linked, and a lot weren’t.
AND: You have to present IndexLinker with a generated index that’s still a field. When it asks if it should convert the index to text, say yes:
- If you leave the index as a field then convert to text, you lose the hyperlinks.
- If you leave the index as a field then re-generate the index, you lose the hyperlinks.
So I don’t know why you’d say no and leave the index as a field. But life is for learning.
The job was delivered, the client was happy, but there are still rough spots in the process.