Tuesday, January 23, 2018

Adding Terms to Javadoc Search with Java 9

A relatively old web page called "Proposed Javadoc Tags," which appears to have been written in conjunction with Javadoc 1.2, lists "tags that Sun may implement in Javadoc someday." The tags in this list are @category, @example, @tutorial, @index, @exclude, @todo, @internal, @obsolete, and @threadsafety. One of these tags, @index, has moved from "Proposed Tags" to "Standard Tags" with its inclusion in Java 9 as the inline tag {@index}. The Java 9 Javadoc tool documentation states that the @index tag is used to specify an indexed "search term or a phrase" that can be searched for in Java 9's new Javadoc Search feature.

The ability to add search terms to Javadoc-generated documentation has been desired for some time, as demonstrated by the existence of JDK-4034228 ("stddoclet: Add @index doc-comment tag for generating an index from common words"), JDK-4279638 ("Javadoc comments: Need ability to tag words for inclusion in the API index"), and JDK-4100717 ("Allow user-specified index entries"). JEP 225 ("Javadoc Search") was introduced to "add a search box to API documentation generated by the standard doclet that can be used to search for program elements and tagged words and phrases within the documentation."

Javadoc in Java 9 and later automatically includes several constructs in the "Search" that can be performed from the generated HTML output. The strings that are searchable by default are the names of methods, members, types, packages, and modules. The advantage offered by @index is that phrases or search terms not built into the names of these just-listed constructs can be explicitly added to the search index.

There are several cases in which the ability to add customized search text to Javadoc-generated documentation can be useful. The Javadoc tool documentation references the "domain-specific term ulps" ("units in the last place") and explains that although "ulps is used throughout the java.lang.Math class," it "doesn't appear in any class or method declaration names." Using @index would allow the API designers of the Math class to add "ulps" to the searchable index to help people find the Math class when searching for "ulps." In Effective Java's Third Edition, Josh Bloch references another example of where Javadoc {@index} might be useful. In Item 56, Bloch cites an example using {@index IEEE 754} ("IEEE Standard for Floating-Point Arithmetic").

I recently ran into a case in the JDK where I thought use of {@index} would be appropriate. I posted recently on the Dual-Pivot Quicksort, but realized that one does not find any matches for that term when searching the Javadoc-generated output. It seems like it would be useful to add terms such as "Dual-Pivot Quicksort" and "Mergesort" to the Javadoc search index via {@index}.

Unfortunately, having spaces in the text embedded in the {@index} tag seems to result in only the text before the first space showing up in the rendered HTML (and being the only portion that can be searched). To demonstrate this, the following ridiculously contrived Java code contains three {@index} Javadoc tags representative of the three examples just discussed.

Java Code Using {@index} in Its Documentation

package dustin.examples.javadoc;

/**
 * Used to demonstrate use of JDK 9's Javadoc tool
 * "@index" tag.
 */
public class JavadocIndexDemonstrator
{
   /**
    * This method complies with the {@index IEEE 754} standard.
    */
   public void doEffectiveJava3Example()
   {
   }

   /**
    * Accuracy of the floating-point Math methods is measured in
    * terms of {@index ulps}, "units in the last place."
    */
   public void doMathUlpsExample()
   {
   }

   /**
    * This method uses a version of the {@index Dual-Pivot Quicksort}.
    */
   public void doDualPivotQuicksort()
   {
   }
}

When the Javadoc tool is executed against the above code on my Windows 10 machine with Java 9.0.4, the generated HTML page drops part of two of the indexed phrases. The "754" is missing in the generated HTML after "IEEE" and the "Quicksort" is missing after "Dual-Pivot" in the methods' documentation. The next code listing shows the generated HTML source code for these pieces with missing text.

HTML Source

<div class="block">This method uses a version of the <a id="Dual-Pivot" class="searchTagResult">Dual-Pivot</a>.</div>
 . . .
<div class="block">This method complies with the <a id="IEEE" class="searchTagResult">IEEE</a> standard.</div>

From the HTML output just shown, it is apparent that only the text before the first space appears in the page and is searchable. The "id" attribute associated with the "searchTagResult" class for each searchable entry consists of the searchable string, and HTML "id" attributes cannot contain spaces. The more direct cause, though, is the tag's syntax: in an unquoted {@index} tag only the first word is the search term and the words after it are treated as a description, so a multi-word phrase has to be enclosed in quotes to be indexed as a whole, as the comment at the end of this post points out.

Unless the phrase is quoted, then, one of the following work-arounds would need to be used when dealing with multiple words in a single phrase for which search is desired.

  1. Remove spaces
    • "{@index IEEE 754}" becomes "{@index IEEE754}"
    • "{@index Dual-Pivot Quicksort}" becomes "{@index Dual-PivotQuicksort}"
  2. Replace spaces with an allowable character (for example, hyphen)
    • "{@index IEEE 754}" becomes "{@index IEEE-754}"
    • "{@index Dual-Pivot Quicksort}" becomes "{@index Dual-Pivot-Quicksort}"
  3. Use separate {@index} for each word in phrase
    • "{@index IEEE 754}" becomes "{@index IEEE} {@index 754}"
    • "{@index Dual-Pivot Quicksort}" becomes "{@index Dual-Pivot} {@index Quicksort}"
  4. Use {@index} only on most important terms in phrase
    • "{@index Dual-Pivot Quicksort}" becomes "{@index Dual-Pivot} Quicksort"
  5. Represent multiple-word phrase with common single-word representation
    • This is why "ulps" in the Javadoc tool documentation works well rather than "units in the last place."
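
The first three work-arounds above are mechanical enough to sketch in code. The following is only an illustration: IndexTagWorkarounds and its method names are hypothetical (they are not part of the JDK or the Javadoc tool), and each method simply builds the tag text that the corresponding work-around would have you type into a doc comment.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of the JDK or javadoc.
final class IndexTagWorkarounds {

    private IndexTagWorkarounds() {
    }

    // Work-around 1: remove the spaces from the phrase.
    static String removeSpaces(String phrase) {
        return "{@index " + phrase.trim().replaceAll("\\s+", "") + "}";
    }

    // Work-around 2: replace each run of spaces with a hyphen.
    static String hyphenate(String phrase) {
        return "{@index " + phrase.trim().replaceAll("\\s+", "-") + "}";
    }

    // Work-around 3: emit a separate tag for each word in the phrase.
    static String tagEachWord(String phrase) {
        return Arrays.stream(phrase.trim().split("\\s+"))
                .map(word -> "{@index " + word + "}")
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String phrase = "Dual-Pivot Quicksort";
        System.out.println(removeSpaces(phrase));
        System.out.println(hyphenate(phrase));
        System.out.println(tagEachWord(phrase));
    }
}
```

Work-arounds 4 and 5 are editorial decisions about which words matter, so they are left out of the sketch.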

The "Motivation" section of JEP 225 ("Javadoc Search") nicely summarizes the benefits of this ability to search for terms in Javadoc:

The API documentation pages generated by the standard doclet can be hard to navigate if you're not already familiar with their layout. An external search engine can be used, but that may lead to an out-dated or irrelevant page. The browser's built-in search function can be used, but that is limited to searching within the current page rather than an entire body of documentation.

Although adding search capability to Javadoc-generated documentation is a minor addition in Java 9, it can be used to make documentation of one's Java code more useful to other developers and users of that code.

1 comment:

Jonathan Gibbons said...

If you want to enter a phrase into the search index, enclose it in quotes, as in {@index "IEEE 754"}.

Any words appearing after the word or quoted phrase will be used as additional info that is presented in the search menu. For example, {@index "IEEE 754" Floating Point Standard}. These extra words are not searchable.
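
The rule described in this comment can be modeled with a short sketch. The class below is a hypothetical illustration, not the Javadoc tool's actual parser: IndexTagModel and parse are invented names, and the splitting logic is an assumption based only on the behavior described above (first word or quoted phrase is the search term, anything after it is non-searchable description). Its own class-level doc comment uses the quoted form.

```java
/**
 * Models how the body of an index tag splits into a search term and a
 * description. Indexed here as {@index "IEEE 754" Floating Point Standard}.
 */
// Declared without "public" so the snippet compiles under any file name;
// javadoc documents only public and protected classes by default.
final class IndexTagModel {

    final String term;        // searchable word or quoted phrase
    final String description; // additional, non-searchable words (may be empty)

    private IndexTagModel(String term, String description) {
        this.term = term;
        this.description = description;
    }

    // Splits the text that follows "@index" inside the braces.
    // This is an illustrative model, not javadoc's implementation.
    static IndexTagModel parse(String body) {
        String text = body.trim();
        if (text.startsWith("\"")) {
            int close = text.indexOf('"', 1);
            if (close > 0) {
                // Quoted phrase: everything between the quotes is the term.
                return new IndexTagModel(text.substring(1, close),
                        text.substring(close + 1).trim());
            }
        }
        // Unquoted: only the first word is the term.
        String[] parts = text.split("\\s+", 2);
        return new IndexTagModel(parts[0], parts.length > 1 ? parts[1] : "");
    }
}
```

Under this model, the unquoted body IEEE 754 yields the term "IEEE" with "754" as description, which matches the truncated output seen earlier in the post, while the quoted body keeps "IEEE 754" together as the term.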