Thursday, April 14, 2011

Lexigraph Universal "Tag" Markup Process

Lexigraph's pdfExpress Pro series of products allows you to use PDF files as templates for highly efficient data merge functions.

This post describes the use of Acrobat "Tags" to facilitate that process.

Background

Full Acrobat versions starting with 5.0 support a feature called "Tags".  This feature is defined by version 1.4 and later of the PDF standard.

Starting with version 4.0.3 of pdfExpress Pro, Acrobat tags can be used as an alternative to the plug-in markup process for Lexigraph's pdfExpress line of products.  While Acrobat Tags offer many general capabilities for PDF files, Lexigraph uses only a few specific elements of native Acrobat Tagged PDF as an alternative to the Lexigraph markup plug-in.

In general, tags allow an Acrobat user to add specific types of metadata to PDF content in a way that is surprisingly similar to Lexigraph's markup process (which has been in use since 1998 and Acrobat version 3.0).

Acrobat Tags operate by marking a region of operators within a PDF page using the BDC command.  Each BDC section is numbered and brackets a range of PDF operators of any type.  Since PDF does not require a string, e.g., "This is a test.", to be drawn with a single PDF command, most text appears as multiple BDC sections.
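
To illustrate why one visible string can end up in several BDC sections, here is a minimal Python sketch that scans an already-decompressed page content stream for marked-content sections. It assumes the common "/Tag <</MCID n>> BDC ... EMC" layout (EMC is the PDF operator that closes a BDC section), simple literal strings shown with Tj, and no nesting. The sample stream and the function name are made up for illustration, and this is not a description of how pdfExpress itself reads tags.

```python
import re

# Matches "/Tag <</MCID n>> BDC ... EMC" in a decompressed content stream.
# Assumes non-nested sections with an inline property dictionary.
BDC_SECTION = re.compile(
    r"/(?P<tag>\w+)\s*<<\s*/MCID\s+(?P<mcid>\d+)\s*>>\s*BDC(?P<body>.*?)EMC",
    re.DOTALL,
)
# Matches a simple literal string shown with Tj, e.g. "(This is ) Tj".
SHOWN_TEXT = re.compile(r"\(([^()\\]*)\)\s*Tj")


def marked_sections(content: str) -> list[tuple[str, int, str]]:
    """Return (tag, MCID, text) for each BDC section found."""
    sections = []
    for match in BDC_SECTION.finditer(content):
        text = "".join(SHOWN_TEXT.findall(match.group("body")))
        sections.append((match.group("tag"), int(match.group("mcid")), text))
    return sections


# Hypothetical stream: one sentence split across two BDC sections.
sample = """
/P <</MCID 0>> BDC BT /F1 12 Tf 72 700 Td (This is ) Tj ET EMC
/P <</MCID 1>> BDC BT /F1 12 Tf 120 700 Td (a test.) Tj ET EMC
"""

for tag, mcid, text in marked_sections(sample):
    print(tag, mcid, repr(text))
```

In this made-up stream the sentence "This is a test." is split across two numbered sections, which is why a single selection in Acrobat can map to more than one BDC section.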

Both InDesign and Acrobat are able to add tags to a document.

Tags are accessible in both products through specific features: View>Navigation Tabs>Tags in Acrobat and Window>Tags in InDesign.

The Markup Process

All strictures requiring distillation by an older Acrobat Distiller to maintain proper font embedding, etc. are now relaxed.  PDF files created directly by Adobe applications can be marked up within those applications so long as the file is a legal PDF file relative to the particular version of Acrobat being used.

Creating markup using InDesign CS2:

1. Launch InDesign

2. Add your text and image elements to the page as desired.

3. First you must make each markup its own separate "Style".  Do this by selecting the story block and clicking the small create style icon at the bottom of the Object Styles palette.

Note that the story block must be "selected" when setting the Object Style, as indicated by the small squares at the corners.  Here the Object Style has been renamed to match the markup.  This is not necessary; the default Style names will work just as well.

4. Open the "Tags" palette from the Window>Tags menu.

5. For each Story highlight the text you wish to mark up.
Then click the small arrow at the upper right of the Tags palette and create a name for the markup using the "New Tag…" option.  (Note that the blue highlight around "markup1" indicates that the text is actually tagged.)

Name the tag a legal pdfExpress markup name.

6. Select each line, block or image separately on the page and click on the corresponding tag name you assigned previously in the Tags palette to set the tag.

7. Cycle through each selection on the template and verify that the Tags palette shows the tag set correctly.

8. Go to File>Export to save the PDF file. Use standard pdfExpress output settings and make sure you click on "Create Tagged PDF" under the Options section as shown below in order to retain Tags set in InDesign.

9. To verify the tags have been correctly created open up Acrobat and go to View>Navigation Tabs>Tags.  A "Tags" palette will appear.

Next to the small red box by "Tags" click the triangle to open up the Tag tree.  Open the successive triangles as shown in the example above.

A correctly marked up PDF from InDesign will have the "Tags//
/markup names" structure as shown above.  Note that each markup name must appear as a separate entry directly under the
tag.  Any other format from InDesign will not work inside pdfExpress.

Using Acrobat for Markup

The steps below describe how to use Acrobat to add tags to an existing PDF file.  The following steps assume that no tags currently exist in the PDF file.  However, these steps will also work to change or replace existing markup in any PDF.

Basic One-Element Markup Steps


1. Open the file with a full version of Acrobat (Acrobat Reader will not work) and select the desired page for markup.

2. Locate and select the "Tags" menu.  Depending on your version of Acrobat this may be located under the "Windows" or "View>Navigation Tabs>" menu.  A small floating dialog will appear with a "Tags" tab and "Tags" drop-down menu.  Assuming no prior tags have been added to the document (and not just the particular page) there will be a small text message saying "No Tags available".

3. Select the drop-down choice "Create Tags Root".  The "No Tags available" text should change to "Tags Root" or "Tags".

4. On the Acrobat toolbar select the "TouchUp Text Tool", "Text Select Tool" or "TouchUp Object Tool". What these tools do varies with the version of Acrobat, and some experimenting may be required to find the correct tool for selecting a specific set of characters or an image. For this example, using CS2 and Acrobat 7.0, the "TouchUp Text Tool" was used:


5. Using the selected tool select the text or image that requires markup.  Note that certain types of tools tend to select larger groups of text than others, depending on the context of the selection.

A full selection using "TouchUp Text Tool" should look like this (note the gaps where kerning was done by using different PDF placement operators):


This same tool will select multiple lines (as indicated by the blue outline) along with text you are specifically interested in:

The black markup in this case is what will drive pdfExpress.

6. Once the item to markup has been selected, move the cursor into the Tag dialog window.  Select the "Tags" root item.  Then control click (Ctrl-Click) this item (there may also be other ways to do this within the palette).  A pop-up menu will appear with the choice "Create Tag From Selection".  Choose this item and a small "New Tag" dialog will appear:

Make sure that "Article" is selected and name the tag.  Press OK when finished.  A new entry should appear under the main "Tags" root:



The entry must begin with "<Art>" and be followed by the name of the tag.

To reference the element in a .LIS file the "Title:" value provided in step #6 above is used as follows:

/Art_XXX_Line (some replacement text)

where XXX is the "Title:" value.

7. Feel free to select the "Tags" root and press Delete to remove all markup at any time.  You may also select individual tags and delete them in the same way.
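
The .LIS reference from step 6 can be generated mechanically. The following is a minimal Python sketch, not part of pdfExpress: the helper names are hypothetical, and the escaping of backslashes and parentheses assumes that .LIS strings follow PostScript-style string rules, which this post does not state.

```python
def escape_ps_string(text: str) -> str:
    """Escape backslashes and parentheses (assumed PostScript-style rules)."""
    return (
        text.replace("\\", "\\\\")
        .replace("(", "\\(")
        .replace(")", "\\)")
    )


def lis_line(title: str, replacement: str) -> str:
    """Build '/Art_<title>_Line (<replacement>)' for a one-line tag."""
    return f"/Art_{title}_Line ({escape_ps_string(replacement)})"


# "markup1" stands in for the "Title:" value entered in the New Tag dialog.
print(lis_line("markup1", "some replacement text"))
```

For the title "markup1" this produces the same shape as the example in step 6, with XXX replaced by the title.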

Multiple Line Markup


Multi-line text blocks may be marked up with a tag.  pdfExpress Pro automatically determines the block structure from the marked up text block.

pdfExpress determines the size of the block being created by using the smallest rectangle able to enclose all of the marked up elements.  Note that sometimes selections may include text outside the desired area - usually due to the nature of the underlying PDF commands themselves.  Other than altering the PDF file there is no way to separate text elements in such a markup region.
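
The "smallest rectangle able to enclose all of the marked up elements" is the union of the elements' bounding boxes. The sketch below shows that calculation in Python under stated assumptions: the Rect type, the function name and the sample coordinates are invented for illustration, and coordinates are taken to be PDF points with the origin at the lower left. It is not pdfExpress code.

```python
from typing import Iterable, NamedTuple


class Rect(NamedTuple):
    left: float
    bottom: float
    right: float
    top: float


def enclosing_rect(rects: Iterable[Rect]) -> Rect:
    """Return the smallest rectangle that contains every input rectangle."""
    rects = list(rects)
    if not rects:
        raise ValueError("at least one marked-up element is required")
    return Rect(
        left=min(r.left for r in rects),
        bottom=min(r.bottom for r in rects),
        right=max(r.right for r in rects),
        top=max(r.top for r in rects),
    )


# Three hypothetical text lines of a marked-up block.
lines = [
    Rect(72, 700, 300, 712),
    Rect(72, 686, 340, 698),
    Rect(72, 672, 250, 684),
]
print(enclosing_rect(lines))
```

Any stray element caught in the selection widens the result in the same way, which is why text outside the desired area enlarges the block.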

The following pdfExpress .LIS replacements are created for each block:

/block.Art_XXX_Top [ (block step 1) (block step 11) (block step 111) ]
/block.Art_XXX_Bottom [ (block step 1) (block step 11) (block step 111) ]

The block.Art_XXX_Top substitution dictionary name starts with the "top" line of the multi-line markup and steps downward.  The block.Art_XXX_Bottom name starts with the "bottom" line of the block and works upward.
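
One way to picture the difference between the two names is to map each replacement string onto a line of the block. The Python sketch below is only a reading of the description above, not pdfExpress behavior: the function name is hypothetical, and leaving unfilled lines empty and ignoring extra strings are assumptions, since the post does not say what happens in those cases.

```python
from typing import Optional


def place_values(
    line_count: int, anchor: str, values: list[str]
) -> list[Optional[str]]:
    """Return, listed top to bottom, the value that lands on each block line."""
    if anchor not in ("Top", "Bottom"):
        raise ValueError("anchor must be 'Top' or 'Bottom'")
    slots: list[Optional[str]] = [None] * line_count
    for step, value in enumerate(values[:line_count]):
        # Top fills from the first line down; Bottom fills from the last line up.
        index = step if anchor == "Top" else line_count - 1 - step
        slots[index] = value
    return slots


values = ["block step 1", "block step 11"]
print(place_values(4, "Top", values))
print(place_values(4, "Bottom", values))
```

With a four-line block and two strings, the Top form fills the first two lines from the top, while the Bottom form fills the last two lines starting from the bottom.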

Command Line


By default pdfExpress Pro does not interpret tags within a PDF as pdfExpress markup.  This default can be overridden on a file-by-file basis using the carousel creation option "/InterpretTagsAsMarkup true".  For example:

[ /createCarousel (test3) (smith_base_2.pdf) << /InterpretTagsAsMarkup true >> ] merge

will cause the tags in "smith_base_2.pdf" to be interpreted as markup.
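
When several template files need the per-file option, the carousel line can be produced by a small script. This Python sketch only formats the syntax shown in the example; the function name and the second carousel and file name are hypothetical, and names are inserted without any escaping.

```python
def carousel_with_tags(name: str, pdf_file: str) -> str:
    """Format a createCarousel line that enables /InterpretTagsAsMarkup."""
    return (
        f"[ /createCarousel ({name}) ({pdf_file}) "
        "<< /InterpretTagsAsMarkup true >> ] merge"
    )


# The first pair is the example from this post; the second is invented.
templates = [("test3", "smith_base_2.pdf"), ("test4", "another_base.pdf")]
for name, pdf_file in templates:
    print(carousel_with_tags(name, pdf_file))
```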

If you intend to use only tag markup or to have all files with tags processed as markup, you can use the command line option

pdfc-Pro ... -tagsasmarkup true ...

This ensures that all tags used in every PDF file encountered by pdfc-Pro will be converted to markup.
