
Authoring API Documentation

Briton Barker edited this page Sep 30, 2015 · 7 revisions

Python API Documentation

| Type | Style | Output |
|------|-------|--------|
| HTML+PDF for publication | custom .rst | *.rst files |
| Static Program Analysis (SPA) for IDEs (IJ, iPy) | standard, archaic .rst | docstubs*.py files |
| “Live” doc strings - Interactive docs for REPLs | Numpydoc | installed API entity.__doc__ |

REST API Documentation

| Type | Style | Output |
|------|-------|--------|
| HTML+PDF for publication or download | custom .rst | *.rst files |

Command/Method Documentation Elements

  1. One-line summary – concise one-liner.
  2. Extended summary – rich description, can be several paragraphs
  3. Args – describes parameters, specifically, name: (type, description)
  4. Return – describes return value, (type, description)
  5. Maturity – api maturity tag (alpha, beta, deprecated)
  6. Examples – (client specific) shows how to use the method; Python examples should pass as doctests
  7. Notes – (client specific?) Extra notes, particulars

The command metadata provides elements 1-5, using json-schema standard, with some additions. See http://json-schema.org/documentation.html

The REST documentation requires more information: the request parameters, headers, and body, and the response status, headers, and body.
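The elements listed above map fairly directly onto a numpydoc-style docstring. The following is a minimal sketch of that mapping, not the project's actual generator: the function name `render_numpydoc`, its parameters, and the bracketed maturity prefix are all assumptions made for illustration.

```python
def render_numpydoc(one_line, extended="", args=(), returns=None, maturity=None):
    """Render documentation elements 1-5 as a numpydoc-style docstring.

    Hypothetical helper: `args` is a sequence of (name, type, description)
    tuples and `returns` is a (type, description) tuple or None.
    """
    # Assumed convention: show the maturity tag as a prefix on the summary line.
    summary = "[{}] {}".format(maturity.upper(), one_line) if maturity else one_line
    lines = [summary]
    if extended:
        lines += ["", extended]
    if args:
        lines += ["", "Parameters", "----------"]
        for name, type_name, description in args:
            lines.append("{} : {}".format(name, type_name))
            lines.append("    " + description)
    if returns:
        type_name, description = returns
        lines += ["", "Returns", "-------", type_name, "    " + description]
    return "\n".join(lines)


# Text taken from the ECDF plugin example on this page.
ecdf_doc = render_numpydoc(
    one_line="Build new frame with columns for data and distribution.",
    extended="Generates the empirical cumulative distribution for the input column.",
    args=[("column", "str", "The name of the input column containing sample.")],
    returns=("Frame", "A new Frame containing each distinct value in the sample "
                      "and its corresponding ECDF value."),
)
print(ecdf_doc)
```

Keeping the elements as structured data until the last moment is what would let the same metadata feed the .rst, SPA, and live docstring outputs.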

ATK Engine (Scala)

The CommandDoc class (the “doc” property) provides One Line Summary and Extended Summary for each plugin. This is being phased out in favor of annotations. There is an @PluginDoc annotation which decorates the plugin class. It provides fields for text descriptions of:

  1. oneLine – the one-liner
  2. extended – the rich description
  3. returns – optional text describing the returns object

The apiMaturityTags property persists on its own, as metadata that is possibly useful outside of documentation. The case class used for the Arguments of the plugin should use field annotations (@ArgDoc) to provide description text for each argument. The argument name and type are extracted through reflection.

The case class used for the Returns object may also use field annotations, particularly for more complex and/or custom return types.

A special case class for Returns is often unnecessary and is even discouraged. The return type should be a standard type, like a FrameReference or a DoubleValue. In this situation, where the author would still like to document the returned object (always recommended), she may add text to the “returns” argument of the general PluginDoc annotation mentioned above. The PluginDoc.returns text and the ArgDoc field annotations can be used together for returns documentation; the PluginDoc.returns text goes first.

Example of annotations on the plugin:

case class EcdfArgs(frame: FrameReference,
  @ArgDoc("The name of the input column containing sample.") column: String,
  @ArgDoc("A name for the resulting frame which is created by this operation.") resultFrameName: Option[String] = None) {
  require(frame != null, "frame is required")
  require(column != null, "column is required")
}

/** Json conversion for arguments and return value case classes */
object EcdfJsonFormat {
  import DomainJsonProtocol._
  implicit val EcdfArgsFormat = jsonFormat3(EcdfArgs)
}

import EcdfJsonFormat._
/**
 * Empirical Cumulative Distribution for a column
 */
@PluginDoc(oneLine = "Build new frame with columns for data and distribution.",
  extended = """Generates the :term:`empirical cumulative distribution` for the input column.""",
  returns = Some("A new Frame containing each distinct value in the sample and its corresponding ECDF value."))
class EcdfPlugin extends SparkCommandPlugin[EcdfArgs, FrameEntity] {
  // ... plugin implementation omitted ...
}

Default values

For *Arguments case classes, each optional parameter must be defined with a default value. All optional parameters must come at the end of the signature, mirroring the Python rule that parameters with defaults follow those without. The Scala API metadata generation validates this.
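The ordering rule can be sketched in a few lines. This is an illustration only, not the Scala validator itself: the function name `find_misplaced_required` and the `(name, has_default)` input shape are assumptions.

```python
def find_misplaced_required(params):
    """Return names of required parameters that follow an optional one.

    Hypothetical helper: `params` is a list of (name, has_default) pairs
    in signature order. An empty result means the signature obeys the rule.
    """
    seen_optional = False
    misplaced = []
    for name, has_default in params:
        if has_default:
            seen_optional = True
        elif seen_optional:
            # A required parameter after an optional one breaks the rule.
            misplaced.append(name)
    return misplaced


# Signature order of EcdfArgs from the example above: only the last has a default.
ecdf_params = [("frame", False), ("column", False), ("resultFrameName", True)]
ecdf_problems = find_misplaced_required(ecdf_params)

# A made-up signature that violates the rule.
bad_problems = find_misplaced_required([("a", True), ("b", False)])
```

Here `ecdf_problems` is empty, while `bad_problems` names `b`, the required parameter that follows an optional one.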

We use reflection to get the default values when generating the command definitions (we tell the compiler to keep the information around). See http://stackoverflow.com/questions/14034142/how-do-i-access-default-parameter-values-via-scala-reflection

We collect the defaults for the *Arguments case class only. It’s hard to imagine documenting default values for the *Return object.

Scala Option

In the past we have used the Scala Option type to indicate an optional parameter. This is inaccurate, and confusing if we rely on Scala’s naming it Option. An Option specifies the well-defined presence or absence of a value, not the quality of being optional. For example, consider a parameter frame_name which gives a name to a frame produced by a method. We would use an Option[String] type to state that if Some("new_name") is provided (i.e. presence of value), we will give the frame that name, and if None is provided (i.e. absence of value), we will not name the new frame. The value of the Option determines the behavior. As written, however, the caller must still pass one or the other. It isn’t until we provide a default value (probably = None) that the parameter becomes optional.

Furthermore, having a default value makes it available to reflection and the command definition metadata. This means we can accurately represent the default value in the documentation. If the default value is squirreled away down in a case class accessor method, then we can’t use it.

Therefore, optional parameters should have default values provided directly in the signature. The type Option[_] should be used for situations where the presence of a value is significant, not to mark a parameter as optional. We should never see a parameter defined as x: Option[Int] = Some(10). It should be simply x: Int = 10. At the same time, we should also never see strings defaulted to an empty string. Empty strings are rarely useful. It is better to use Option[String] = None.

In the code (source_code/shared/src/main/scala/com/intel/intelanalytics/shared/JsonSchema.scala) we’ve been deciding whether a parameter is optional based on the type Option. We should instead base it on whether a default value is provided, and only use Option when something can be None. The code for this change was implemented, but many plugins violate the rule, so it can’t be turned on until a cleanup effort takes place. Don’t forget to revisit this.

Python Client

The API info from the server comes nicely dressed up with metadata. Client meta-programming auto-generates classes and functions, with documentation, when the client “connects” to the server. However, some of the Python API is manually coded. This makes it difficult to operate on the API as a whole, like generating complete docs in various formats, collecting API coverage, and querying the API. We need the manually coded (“clientside”) API methods to have the same metadata.

To manually create a clientside API method:

  1. Make the method or property name start with a double-underscore. Example: “__add_columns”. If our API method is adapting a command coming from the server, make the name following the double-underscore match the name coming from the server, to prevent the server one from getting installed as well. [It may be suitable for the server’s meta data to be used directly. This would be a TODO to implement that option]

  2. Decorate it with @api which marks it for installation as a “clientside” API.

  3. Decorate it with @arg and @returns to provide excellent metadata: name, type and description. Still write a __doc__ string for the function which has the one-line summary at the top, which should end in a period, and then the extended summary following with multi-line text. Do not include details about “Parameters” or “Returns”.

a. @arg(name, type, description) # type can be an actual type (preferred) or a string that identifies the type

b. @returns(type, description)

  4. For API maturity tags, use a different set of decorators:
   @alpha
   @beta
   @deprecated

The decorators (@arg, @alpha, …) may be applied in any order, but @api must be the first one. Properties need special handling: the @property decorator should come before @alpha, @returns, etc., but @api should still come first.

@api
@property
@beta
@returns(str, "phone number in string format")
def __phone(self):
    '''The customer support hotline'''
    return self._phone

However, for a setter, the @*.setter decorator must come first. This is the one exception to the “@api comes first” rule.

@phone.setter
@api
def __phone(self, value):
    self._phone = value
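To make the ordering rule concrete, here is a minimal sketch of how metadata decorators of this kind could work. It is not the client's actual implementation: the attribute names (`_args`, `_returns`, `_maturity`, `_command_def`) and the example function `__scale` are hypothetical. Because Python applies stacked decorators bottom-up, the top-most decorator runs last, which is one way a rule like “@api must be first” could arise: @api can then collect whatever the decorators beneath it attached.

```python
def arg(name, data_type, description):
    """Record one argument's name, type and description on the function."""
    def decorator(func):
        # Decorators run bottom-up, so insert at the front to keep source order.
        func.__dict__.setdefault("_args", []).insert(0, (name, data_type, description))
        return func
    return decorator


def returns(data_type, description):
    """Record the return type and description on the function."""
    def decorator(func):
        func._returns = (data_type, description)
        return func
    return decorator


def beta(func):
    """Tag the function with the 'beta' maturity level."""
    func._maturity = "beta"
    return func


def api(func):
    """Gather the metadata attached by the decorators below; runs last."""
    doc_lines = (func.__doc__ or "").strip().splitlines()
    func._command_def = {
        "name": func.__name__.lstrip("_"),  # drop the leading double underscore
        "one_line": doc_lines[0] if doc_lines else "",
        "args": list(getattr(func, "_args", [])),
        "returns": getattr(func, "_returns", None),
        "maturity": getattr(func, "_maturity", None),
    }
    return func


@api
@beta
@arg("value", float, "The number to scale.")
@arg("factor", float, "The multiplier.")
@returns(float, "The scaled value.")
def __scale(value, factor):
    """Multiply a value by a factor.

    Extended summary goes here; no Parameters or Returns sections.
    """
    return value * factor


scale_def = __scale._command_def
```

In this sketch, swapping @api with any decorator below it would cause that decorator's metadata to be missing from `scale_def`, since @api would run before it.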

The Python metaprogramming builds the “live” docstrings naturally as part of building out the API. The other two types (.rst and SPA) are generated explicitly on request. The process for all three types, however, is very similar.

Examples are pulled from intelanalytics/doc/examples/{install_path}.rst where they’re also executed as integration tests.

Process for autogenerating the .rst

This section covers some details of the actual .rst generation process, which is not required knowledge for authoring.

intelanalytics/doc/api_template contains a skeleton directory tree with folders and some canned files. During .rst doc creation, this tree is copied to a temporary location as “python_api/”, where it is filled out by the metaprogramming. The resulting tree is then moved from the temporary location to source_code/doc/source/python_api.

The python_api folder is the root folder. Under it are the entity collection subfolders, such as “frames”, “models”, and “graphs”, one for each type of named entity. Under each entity collection sit subfolders for the specific class types, like frame-, frame-vertex, and frame-edge.

In a class folder, index.rst describes the class (the __doc__ string of the Python class) and holds the attribute and method summary tables. These tables have hyperlinks to the other *.rst files in the class directory, each of which describes a single method or attribute. The __init__ method for the class is documented in index.rst as well. Note the distinction between the doc string for the class and the doc string for its __init__ method. (In the past, only the class’s doc string was used, and it doubled as the documentation for __init__.)

Back in the collection folder, index.rst provides hyperlinks to the class pages (their index.rst files) and to the *.rst files that describe the collection’s global methods, such as get_frame and get_frame_names, one .rst file each.

Example structure when done…

python_api/
  index.rst
  frames/
    index.rst   # has summary table with global methods and classes
    get_frame_names.rst
    get_frames.rst
    drop_frames.rst
    frame-/
        index.rst # has summary table with methods
        ecdf.rst
        assign_sample.rst
    frame-vertex/
        index.rst
        etc.
    frame-edge/
        index.rst
        etc.

Python Metaprogramming: Important Terms

CommandDefinition – all the metadata for a single command, but does not have a reference to its parent class or module

CommandInstallation – an object added to a Python class which holds the CommandDefinitions and other metadata associated with the methods and attributes installed into that class.

InstallPath – a string which defines the location of a CommandInstallation. Ex. “frame:vertex” means the command installs to the class VertexFrame

Class store – global collection of all the classes which have an instance of a CommandInstallation

api_globals – set of all the objects (Types) which belong to the API, whether clientside or serverside
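Comparing the InstallPath example (“frame:vertex”) with the folder names in the example tree (frame-, frame-vertex) suggests a simple naming scheme. The sketch below encodes that guess. It is an inference from this page, not the generator's actual code: the function names, the rule that the collection folder is the entity type plus “s”, and the replacement of “:” with “-” are all assumptions.

```python
import posixpath


def class_dir(install_path, root="python_api"):
    """Guess the class folder for an install path (assumed naming scheme)."""
    entity_type, _, subtype = install_path.partition(":")
    collection = entity_type + "s"              # assumption: "frame" -> "frames"
    folder = entity_type + "-" + subtype        # assumption: "frame:vertex" -> "frame-vertex"
    return posixpath.join(root, collection, folder)


def rst_path(install_path, name, root="python_api"):
    """Guess the .rst file that documents one method or attribute."""
    return posixpath.join(class_dir(install_path, root), name + ".rst")


vertex_dir = class_dir("frame:vertex")    # "python_api/frames/frame-vertex"
ecdf_file = rst_path("frame:", "ecdf")    # "python_api/frames/frame-/ecdf.rst"
```

Both results line up with entries in the example tree above, which is the only evidence for the scheme.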


Building the HTML Docs Manually

To generate the documentation manually and see the changes you've made, follow these instructions. Each step assumes starting in the root atk/ source folder.

  1. For any Scala changes or API .rst changes in doc-api-examples/, rebuild ATK. For Python edits or .rst changes in doc/, this step can be skipped.

    mvn install -DskipTests -T 8
    
  2. Start the REST server

    ./bin/rest-server.sh
    
  3. Generate the .rst files for the Python API and REST API (as well as the docstubs*.py files)

    cd python-client/trustedanalytics/doc
    python2.7 build_docs.py
    
  4. Build the html*

    cd doc
    make html
    

    * If you have trouble with step 4, you may not have sphinx (the rendering tool) or its dependencies installed:

    pip2.7 install sphinx
    pip2.7 install numpydoc
    
  5. View the html in a browser on that machine: file:///path/to/your/atk/doc/build/html/index.html

    To view in a browser on a different machine, start a simple web server with these commands:

    cd doc/build/html
    python2.7 -m SimpleHTTPServer
    

    Then open any browser and go to that machine's IP address on port 8000 (e.g. 10.7.152.51:8000)
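The commands above use Python 2.7, where the module is named SimpleHTTPServer. In Python 3 that module was merged into http.server, so the command-line equivalent is `python3 -m http.server`. If you would rather serve the docs without changing directory, the following sketch does the same thing programmatically. It assumes Python 3.7 or later (for the `directory` argument); the function name `make_docs_server` is made up for this example.

```python
import functools
import http.server


def make_docs_server(directory, port=8000):
    """Create an HTTP server that serves static files from `directory`.

    Hypothetical helper. Passing port=0 lets the OS pick a free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # An empty host string binds to all interfaces, as the -m command does.
    return http.server.HTTPServer(("", port), handler)
```

Calling `make_docs_server("doc/build/html").serve_forever()` from the atk/ root would then serve the built pages until interrupted with Ctrl-C.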