Authoring API Documentation
Python API Documentation
Type | Style | Output |
---|---|---|
HTML+PDF for publication | custom .rst | *.rst files |
Static Program Analysis (SPA) for IDEs (IJ, iPy) | standard, archaic .rst | docstubs*.py files |
“Live” doc strings - Interactive docs for REPLs | Numpydoc | installed API (entity.__doc__) |
REST API Documentation
Type | Style | Output |
---|---|---|
HTML+PDF for publication or download | custom .rst | *.rst files |
1. One-line summary – concise one-liner.
2. Extended summary – rich description; can be several paragraphs.
3. Args – describes parameters, specifically name: (type, description).
4. Return – describes the return value, (type, description).
5. Maturity – API maturity tag (alpha, beta, deprecated).
6. Examples – (client specific) shows how to use the method; Python examples should pass as doctests.
7. Notes – (client specific?) extra notes and particulars.
The command metadata provides elements 1-5 of this list (one-line summary through maturity), using the json-schema standard with some additions. See http://json-schema.org/documentation.html
The REST documentation requires more information, such as the request parameters, headers, and body, and the response status, headers, and body.
The CommandDoc class (the “doc” property) provides One Line Summary and Extended Summary for each plugin. This is being phased out in favor of annotations. There is an @PluginDoc annotation which decorates the plugin class. It provides fields for text descriptions of:
- oneLine – the one-liner
- extended – the rich description
- returns – optional text describing the returns object
The apiMaturityTags property persists on its own, as metadata that is possibly useful outside of documentation. The case class used for the Arguments of the plugin should use field annotations (@ArgDoc) to provide description text for each argument. The argument name and type are extracted through reflection.
The case class used for the Returns object may also use field annotations, particularly for more complex and/or custom return types.
A special case class for Returns is often not needed and is even discouraged. The return type should be a standard type, like a FrameReference or a DoubleValue. In this situation, an author who would still like to document the returned object (always recommended) may add text to the “returns” argument of the general PluginDoc annotation mentioned above. The PluginDoc.returns annotation and the ArgDoc field annotations can be used together for returns documentation; the PluginDoc.returns text goes first.
Example of annotations on the plugin:
case class EcdfArgs(frame: FrameReference,
                    @ArgDoc("The name of the input column containing sample.") column: String,
                    @ArgDoc("A name for the resulting frame which is created by this operation.") resultFrameName: Option[String] = None) {
  require(frame != null, "frame is required")
  require(column != null, "column is required")
}

/** Json conversion for arguments and return value case classes */
object EcdfJsonFormat {
  import DomainJsonProtocol._
  implicit val EcdfArgsFormat = jsonFormat3(EcdfArgs)
}

import EcdfJsonFormat._

/**
 * Empirical Cumulative Distribution for a column
 */
@PluginDoc(oneLine = "Build new frame with columns for data and distribution.",
  extended = """Generates the :term:`empirical cumulative distribution` for the input column.""",
  returns = Some("A new Frame containing each distinct value in the sample and its corresponding ECDF value."))
class EcdfPlugin extends SparkCommandPlugin[EcdfArgs, FrameEntity] {
Default values
For *Arguments case classes, each optional parameter must be defined with a default value. All optional parameters must come at the end of the signature (i.e. following the same args/kwargs ordering rule as Python). The Scala API metadata generation will validate this.
We use reflection to get the default values when generating the command definitions (we tell the compiler to keep the information around). http://stackoverflow.com/questions/14034142/how-do-i-access-default-parameter-values-via-scala-reflection
We collect the defaults for the *Arguments case class only. It’s hard to imagine documenting default values for the *Return object.
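The ordering rule for optional parameters can be illustrated with a small Python sketch. It is not the Scala validation itself: the function name and the (name, has_default) tuples are hypothetical stand-ins for whatever the metadata generator actually inspects.

```python
def validate_optional_last(parameters):
    """Check that every parameter with a default follows all required ones.

    `parameters` is a list of (name, has_default) tuples in signature order.
    Both this function and its input shape are hypothetical, for illustration.
    """
    first_optional = None
    for name, has_default in parameters:
        if has_default:
            if first_optional is None:
                first_optional = name
        elif first_optional is not None:
            # A required parameter appeared after an optional one.
            raise ValueError(
                "required parameter '%s' follows optional parameter '%s'"
                % (name, first_optional))
    return True


# Mirrors EcdfArgs: two required parameters, then one with a default.
ecdf_args = [("frame", False), ("column", False), ("resultFrameName", True)]
print(validate_optional_last(ecdf_args))
```

Python enforces this ordering in the language itself, whereas Scala allows a defaulted parameter before a required one, which is presumably why an explicit check is needed on the Scala side.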
Scala Option
In the past we have used the Scala Option type to indicate an optional parameter. This is inaccurate, and it is confusing to rely on Scala’s name “Option” for this. An Option specifies the well-defined presence or absence of a value, not the quality of being optional. For example, consider a parameter frame_name which gives a name to a frame produced by a method. We would use an Option[String] type to state that if Some("new_name") is provided (i.e. presence of value), we name the frame with that name, and if None is provided (i.e. absence of value), we do not name the new frame. The value of the Option determines the behavior. However, as is, the user must still choose one way or the other. It isn’t until we provide a default value (probably = None) that the parameter becomes optional.
Furthermore, having a default value makes it available to reflection and the command definition metadata. This means we can accurately represent the default value in the documentation. If the default value is squirreled away down in a case class accessor method, then we can’t use it.
Therefore, optional parameters should have default values provided directly in the signature. The type Option[_] should be used for situations where value presence is significant, not to express the quality of being optional. We should never see a parameter defined as x: Option[Int] = Some(10); it should be simply x: Int = 10. At the same time, we should also never see strings defaulted to an empty string. Empty strings are rarely useful; it is better to use Option[String] = None.
In the code (source_code/shared/src/main/scala/com/intel/intelanalytics/shared/JsonSchema.scala) we’ve been determining whether a parameter is optional based on the type Option. We should instead base it on whether a default value is provided, and only use Option when something can be None. The code for this change was implemented, but many plugins violate the rule, so it can’t be turned on until a cleanup effort takes place. Don’t forget to revisit this.
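As a rough Python analogue of the intended rule, the sketch below decides whether a parameter is optional purely from the presence of a default in the signature and ignores the declared type. The functions `ecdf` and `rename` are hypothetical stand-ins for plugin argument classes; the Scala implementation in JsonSchema.scala may work quite differently.

```python
import inspect
from typing import Optional


def describe_parameters(func):
    """Return a (name, is_optional, default) tuple per parameter of func.

    A parameter counts as optional only when the signature supplies a
    default value; its type annotation is not consulted.
    """
    described = []
    for name, param in inspect.signature(func).parameters.items():
        has_default = param.default is not inspect.Parameter.empty
        default = param.default if has_default else None
        described.append((name, has_default, default))
    return described


# Hypothetical stand-in for EcdfArgs: the last parameter has a default.
def ecdf(frame, column: str, result_frame_name: Optional[str] = None):
    pass


# Hypothetical: an Optional type WITHOUT a default is still a required parameter.
def rename(frame, new_name: Optional[str]):
    pass


for name, is_optional, default in describe_parameters(ecdf):
    print(name, "optional" if is_optional else "required")
```

Here `new_name` in `rename` is reported as required even though its type admits None, which is the distinction the paragraphs above draw between Option and a default value.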
The API info from the server comes nicely dressed up with metadata. Client meta-programming auto-generates classes and functions, with documentation, when the client “connects” to the server. However, some of the Python API is manually coded. This makes it difficult to operate on the API as a whole, like generating complete docs in various formats, collecting API coverage, and querying the API. We need the manually coded (“clientside”) API methods to have the same metadata.
To manually create a clientside API method:
- Make the method or property name start with a double-underscore. Example: “__add_columns”. If our API method is adapting a command coming from the server, make the name following the double-underscore match the name coming from the server, to prevent the server one from getting installed as well. [It may be suitable for the server’s metadata to be used directly. Implementing that option is a TODO.]
- Decorate it with @api, which marks it for installation as a “clientside” API.
- Decorate it with @arg and @returns to provide excellent metadata: name, type and description. Still write a __doc__ string for the function which has the one-line summary at the top, which should end in a period, and then the extended summary following with multi-line text. Do not include details about “Parameters” or “Returns”.
  a. @arg(name, type, description) # type can be an actual type (preferred) or a string that identifies the type
  b. @returns(type, description)
- For API maturity tags, use a different set of decorators:
  @alpha
  @beta
  @deprecated
The decorators (@arg, @alpha, …) may be applied in any order, but @api must be the first (topmost) one. It’s a rule. Properties need special rules: the @property decorator should come before @alpha, @returns, etc., but @api should still come first of all.
@api
@property
@beta
@returns(str, "phone number in string format")
def __phone(self):
    '''The customer support hotline'''
    return self._phone
However, for a setter, the @*.setter decorator must come first. This is the one exception to the “@api comes first” rule.
@phone.setter
@api
def __phone(self, value):
    self._phone = value
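To make the ordering rules more concrete, here is a minimal sketch of how decorators like these could collect metadata. These are not the real decorators from the Python client: the `_api_meta` attribute, the `API_REGISTRY` dict, and the `Customer` class are all hypothetical, and setter handling is left out.

```python
API_REGISTRY = {}  # hypothetical: public name -> collected metadata


def _meta(func):
    """Fetch (or create) the metadata dict stored on a function."""
    if not hasattr(func, "_api_meta"):
        func._api_meta = {"args": [], "returns": None, "maturity": None}
    return func._api_meta


def arg(name, data_type, description):
    def decorator(func):
        # Decorators apply bottom-up, so insert at the front to keep source order.
        _meta(func)["args"].insert(0, (name, data_type, description))
        return func
    return decorator


def returns(data_type, description):
    def decorator(func):
        _meta(func)["returns"] = (data_type, description)
        return func
    return decorator


def beta(func):
    _meta(func)["maturity"] = "beta"
    return func


def api(item):
    """Record a method or property; must be topmost so it sees the final object."""
    func = item.fget if isinstance(item, property) else item
    public_name = func.__name__.lstrip("_")  # "__phone" -> "phone"
    API_REGISTRY[public_name] = dict(_meta(func), doc=func.__doc__)
    return item


class Customer(object):
    def __init__(self, phone):
        self._phone = phone

    @api
    @property
    @beta
    @returns(str, "phone number in string format")
    def __phone(self):
        """The customer support hotline."""
        return self._phone

    @api
    @arg("column", str, "name of the column to add")
    @returns(int, "new column count")
    def __add_columns(self, column):
        """Add a column."""
        return 1


print(sorted(API_REGISTRY))
```

In this sketch, @beta and @returns annotate the plain function before @property wraps it, and @api then reads the metadata back through the property's `fget`. That gives one plausible reason for requiring @api on top and @property above the other decorators.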
The Python metaprogramming builds the “live” docstrings naturally as part of building out the API. The other two types (rst and spa) must be requested explicitly. The process for all three types, however, is very similar.
Examples are pulled from intelanalytics/doc/examples/{install_path}.rst where they’re also executed as integration tests.
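Running example text as a test can be sketched with the standard library's doctest module. This only shows the general technique; the actual integration test harness is not described here, and the sample text and the `run_example_text` helper are hypothetical.

```python
import doctest

# Hypothetical example text, in the style of an examples .rst file.
EXAMPLE_RST = """
Examples
--------
Adding two numbers:

>>> 1 + 2
3
>>> sorted(["b", "a"])
['a', 'b']
"""


def run_example_text(text, name="example"):
    """Parse the >>> examples out of text, run them, and return the results."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, globs={}, name=name, filename=None, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)  # TestResults with .failed and .attempted


results = run_example_text(EXAMPLE_RST)
print(results.failed, results.attempted)
```

An example file passes when `results.failed` is zero, so the same text can serve as both documentation and test.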
This section covers some details of the actual .rst generation process, which is not required knowledge for authoring.
intelanalytics/doc/api_template contains a skeleton directory tree with folders and some canned files. During .rst doc creation this tree is copied to a temporary location as “python_api/”, where it is filled out by the metaprogramming. The resulting tree is then moved from the temp location to source_code/doc/source/python_api.
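The copy, fill, and move sequence can be sketched as follows. This is an illustration only: `build_api_tree` and the `fill_out` callback are hypothetical names, and replacing an existing destination is an assumption about the desired behavior, not something the real build script is known to do.

```python
import os
import shutil
import tempfile


def build_api_tree(template_dir, destination_dir, fill_out):
    """Copy the template to a temp "python_api/", fill it out, move it into place."""
    temp_root = tempfile.mkdtemp()
    try:
        work_tree = os.path.join(temp_root, "python_api")
        shutil.copytree(template_dir, work_tree)
        fill_out(work_tree)  # stands in for the metaprogramming step
        # Assumption: a previously generated tree is simply replaced.
        if os.path.exists(destination_dir):
            shutil.rmtree(destination_dir)
        parent = os.path.dirname(os.path.abspath(destination_dir))
        if not os.path.isdir(parent):
            os.makedirs(parent)
        shutil.move(work_tree, destination_dir)
    finally:
        shutil.rmtree(temp_root, ignore_errors=True)
    return destination_dir


# Small demonstration using throwaway directories.
scratch = tempfile.mkdtemp()
template = os.path.join(scratch, "api_template")
os.makedirs(os.path.join(template, "frames"))
with open(os.path.join(template, "index.rst"), "w") as f:
    f.write("Python API\n==========\n")


def add_frames_index(tree):
    with open(os.path.join(tree, "frames", "index.rst"), "w") as f:
        f.write("Frames\n======\n")


result = build_api_tree(
    template, os.path.join(scratch, "doc_source", "python_api"), add_frames_index)
print(sorted(os.listdir(result)))
```

Working in a temporary location means a failure partway through generation does not leave a half-written tree in the doc source folder.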
The python_api folder is the root folder. Under it are the entity collection subfolders, like “frames”, “models”, “graphs”, and one for any other type of named entity.
Under each entity collection sit subfolders for the specific class types, like frame-, frame-vertex, frame-edge.
In the class folder, the index.rst describes the class (the __doc__ string of the Python class) and has the attribute and method summary tables. These tables have hyperlinks to the other *.rst files in the class directory, each of which describes a single method or attribute. The __init__ method for the class is documented in the index.rst. Note the distinction between the doc string for the class and the doc string for its __init__ method, and how each is used. (In the past, only the class’s doc string was used, even for the __init__ method.)
Back in the collection folder, the index.rst file provides hyperlinks to the class pages (their index.rst) and to the *.rst files that describe the collection’s global methods, such as get_frame, get_frame_names, etc., one .rst file each.
Example structure when done…
python_api/
    index.rst
    frames/
        index.rst # has summary table with global methods and classes
        get_frame_names.rst
        get_frames.rst
        drop_frames.rst
        frame-/
            index.rst # has summary table with methods
            ecdf.rst
            assign_sample.rst
        frame-vertex/
            index.rst
            etc.
        frame-edge/
            index.rst
            etc.
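A class-level index.rst with a summary table that links to the per-method files might be produced along these lines. The layout is a guess: the real generator uses its own custom .rst style, and `class_index_rst` is a hypothetical helper. It relies only on the standard `list-table` directive and the Sphinx `:doc:` role.

```python
def class_index_rst(class_name, class_doc, members):
    """Build index.rst text for one class.

    `members` is a list of (method_name, one_line_summary) tuples; each
    method is assumed to have its own <method_name>.rst in the same folder.
    """
    lines = [
        class_name,
        "=" * len(class_name),
        "",
        class_doc.strip(),
        "",
        ".. list-table:: Methods",
        "   :header-rows: 1",
        "",
        "   * - Method",
        "     - Summary",
    ]
    for name, summary in sorted(members):
        # :doc: links to a sibling document by file name (without extension).
        lines.append("   * - :doc:`%s <%s>`" % (name, name))
        lines.append("     - %s" % summary)
    return "\n".join(lines) + "\n"


text = class_index_rst(
    "Frame",
    "Large table of data.",  # placeholder class doc string
    [("ecdf", "Build new frame with columns for data and distribution."),
     ("assign_sample", "Placeholder summary.")])
print(text)
```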
CommandDefinition – all the metadata for a single command, but does not have a reference to its parent class or module
CommandInstallation – an object added to Python classes which holds the CommandDefinitions and other metadata associated with the methods and attributes installed into the class.
InstallPath – a string which defines the location of a CommandInstallation. Ex. “frame:vertex” means the command installs to the class VertexFrame
Class store – global collection of all the classes which have an instance of a CommandInstallation
api_globals – set of all the objects (Types) which belong to the API, whether clientside or serverside
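How these terms relate can be shown with a toy model. Every field name, the `install_commands` helper, and the choice of installing `ecdf` on `VertexFrame` are hypothetical; the real client classes carry more state than this.

```python
from collections import namedtuple

# Hypothetical fields; note there is no reference to a parent class or module.
CommandDefinition = namedtuple(
    "CommandDefinition",
    ["name", "one_line", "parameters", "return_info", "maturity"])


class CommandInstallation(object):
    """Holds the CommandDefinitions installed into one class."""

    def __init__(self, install_path):
        self.install_path = install_path  # e.g. "frame:vertex"
        self.commands = {}

    def add(self, definition):
        self.commands[definition.name] = definition


class_store = {}  # install path -> class carrying a CommandInstallation


def install_commands(cls, install_path, definitions):
    """Attach a CommandInstallation to cls and register cls in the class store."""
    installation = CommandInstallation(install_path)
    for definition in definitions:
        installation.add(definition)
    cls._command_installation = installation
    class_store[install_path] = cls
    return cls


class VertexFrame(object):
    pass


ecdf_definition = CommandDefinition(
    name="ecdf",
    one_line="Build new frame with columns for data and distribution.",
    parameters=[("column", str, "The name of the input column containing sample.")],
    return_info=("Frame", "A new Frame containing each distinct value in the sample "
                          "and its corresponding ECDF value."),
    maturity=None)

install_commands(VertexFrame, "frame:vertex", [ecdf_definition])
print(class_store["frame:vertex"].__name__)
```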
To generate the documentation manually, to see changes you've made, follow these instructions. Each step assumes starting in the root atk/ source folder.
1. For any Scala changes or API .rst changes in doc-api-examples/, rebuild ATK. For Python edits or .rst changes in the doc/ folder, this step can be skipped.
   mvn install -DskipTests -T 8
2. Start the REST server
   ./bin/rest-server.sh
3. Generate the .rst files for the Python API and REST API (as well as the docstubs*.py files)
   cd python-client/trustedanalytics/doc
   python2.7 build_docs.py
4. Build the html*
   cd doc
   make html
   * If you have trouble with step #4, you may not have sphinx (the rendering tool) or its dependencies installed:
   pip2.7 install sphinx
   pip2.7 install numpydoc
5. View the html in a browser on that machine:
   file:///path/to/your/atk/doc/build/html/index.html
   To view in a browser on a different machine, start a simple web server with these commands:
   cd doc/build/html
   python2.7 -m SimpleHTTPServer
   Then open any browser and go to that machine’s IP with port 8000 (e.g. 10.7.152.51:8000)