Skip to content
Avinash Ranganath edited this page Apr 16, 2018 · 3 revisions

LIP 14 - Products node with heterogeneous input sizes

LIP 14
Title Products node with heterogeneous input sizes
Author A. Ranganath
Status Draft
Type Standard
Discussion Issue #46
PR #48
Created March 9, 2018

Introduction

This is the product equivalent of multi-ops node SumsLayer. The interface and the reason behind both ops are common, however, they differ in implementation.

Technical Background

The current implementation of multi-ops Products node, which can model multiple product operations, accepts either a single input or list of inputs, concatenates them into a single wide 2D tensor, reshapes it to (batch x num-ops x input-size), and then performances reduction operation on the last dimension for value calculation.

The underlying assumption with Products node is that all the ops modeled within, have a common inputs-size. This assumption falls through when: (1) you want to model an entire layer of an SPN graph's tree structure, or (2) while pruning an SPN graph with multi-op nodes. This deficit in the current design facilitates a need for a multi-op node that can model multiple product operations of varying input-sizes.

Proposal

We want to implement a new type of multi-ops node - ProductsLayer - for modeling multiple product operations with heterogeneous input-sizes. The parameter num_prods in the current Products class accepts the number of product ops to be modelled. This parameter will be replaces with num_or_size_prods, which, as a list would encode the input-size of each product operation modeled, or as an int would be treated similar to the parameter num_prods (indicating that each product op modeled would have a common input-size).

Custom tf-op GatherColumns3D will be used to gather subsets of columns, per product operation modeled, shaped to (batch x num-ops x max(input-sizes)), and then reduction operation performed for value calculation.

For products that have input-size less than max(input-size), residual columns will be padded with either 0's or 1's, for value calculation in log or non-log space respectively, which would be done implicitly by the GatherColumns3D operation.

Similarly for path calculation, subsets or columns from the 2D counts tensor are gathered, one per unique (input, index) pair in the node's input-list, into a 3D tensor and reduce-sum operation is performed on the 3rd axis to add-up counts emanating from all the parents of each unique (input, index) pair, before split and scattered to unique inputs in the inputs-list.

The proposed new ProductsLayer multi-op node would contain the following changes compared to the Products node class:

  • Create a ProductsLayer class that can model products with heterogeneous input-sizes
  • Introduce parameter num_or_size_prods, which as a list would indicate input-sizes of each product modeled, or number of products modeled as an int.
  • Modify the following methods:
    • _compute_value_common
    • _compute_value
    • _compute_log_value
    • _compute_mpe_path

Examples

After this, we should be able to do replace the following

X = spn.ContVars(num_vars=6, name="X")
prods_1 = spn.Products((X, [0, 2, 4, 1, 3, 5]), num_prods=1, name="prods_1") # input-sizes = [6]
prods_2 = spn.Products((X, [1, 3, 5, 0, 2, 4]), num_prods=2, name="prods_2") # input-sizes = [3, 3]
prods_3 = spn.Products((X, [3, 4, 5, 0, 1, 2]), num_prods=3, name="prods_3") # input-sizes = [2, 2, 2]

with:

X = spn.ContVars(num_vars=6, name="X")
prods_1 = spn.Products((X, [0, 2, 4, 1, 3, 5]), (X, [1, 3, 5, 0, 2, 4]),
                       (X, [3, 4, 5, 0, 1, 2]), num_or_size_prods=[6, 3, 3, 2, 2, 2],
                       name="prods_1")

The parameter num_or_size_prods, as a list would encode the input-sizes of each product operation modeled. The number of products modeled would then be len(num_or_size_prods).

Performance comparison

Following are performance test results comparing value-computation between different product node types, performed on a machine with E5-1650v3 | GTX1080TI. Test cases include value calculation in log and non-log space, and for non-padded and padded (homogeneous and heterogeneous input-sizes respectively) cases. Batch size for the test is 1000, and evaluated for 50 runs.

Here, for the non-padded case, total number of products modeled in the layer is 5⁵ x 10 = 3.125 x 10⁴ product operations. All 3.125 x 10⁴ products are modeled in a single ProductsLayer node. While there are 10 PermProds or Products nodes respectively, each modeling 3.125 x 10³ products ops, and 3.125 x 10⁴ individual Product nodes, modeling one product operation each.

Similarly, for the padded case, total number of products modeled in the layer are 8.55 x 10³.

------------------------------------------------------------------------
NON-PADDED
InferenceType: MARGINAL
num_inputs: [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
num_prods: [3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125]
------------------------------------------------------------------------
CPU          op  size indices  single_input  setup_time  first_run_time  rest_run_time    correct
        product 437619   Yes         No      169068.66        38823.31         378.97       True
       products 312679   Yes         No      121881.48       304540.42         250.39       True
  perm_products   199    No         No          78.25          229.58         138.40       True
  perm_products   299   Yes         No         111.60          287.52         146.37       True
  perm_products   201   Yes        Yes          94.05          248.98         142.94       True
 products_layer   123   Yes         No        3567.80          605.04         564.50       True
 products_layer    23   Yes        Yes        6166.31          606.10         482.46       True
GPU          op  size indices  single_input  setup_time  first_run_time  rest_run_time    correct
        product 437619   Yes         No      178614.85        40375.54        1940.26       True
       products 312679   Yes         No      120312.67       302684.60          36.64       True
  perm_products   199    No         No          78.12           55.87          23.66       True
  perm_products   299   Yes         No         111.09           69.16          23.76       True
  perm_products   201   Yes        Yes          93.94           56.13          20.30       True
 products_layer   123   Yes         No        3655.28           40.88          24.74       True
 products_layer    23   Yes        Yes         692.47           24.62          18.42       True

------------------------------------------------------------------------
NON-PADDED
InferenceType: MARGINAL-LOG
num_inputs: [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
num_prods: [3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125]
------------------------------------------------------------------------
CPU          op  size indices  single_input  setup_time  first_run_time  rest_run_time    correct
        product 437682   Yes         No      173310.67        37997.24         434.48      False
       products 312742   Yes         No      121119.08       310505.22         314.10      False
  perm_products   262    No         No          92.83          295.61         229.09      False
  perm_products   362   Yes         No         125.34          340.07         231.62      False
  perm_products   215   Yes        Yes          92.94          307.48         222.74      False
 products_layer   186   Yes         No        8882.09          626.43         605.27      False
 products_layer    37   Yes        Yes         763.38          531.15         578.62      False
GPU          op  size indices  single_input  setup_time  first_run_time  rest_run_time    correct
        product 437682   Yes         No      174900.02        40615.28        1894.39       True
       products 312742   Yes         No      116497.61       308969.12          41.45       True
  perm_products   262    No         No          93.30           60.14          26.24       True
  perm_products   362   Yes         No         124.69           70.04          25.70       True
  perm_products   215   Yes        Yes          93.15           52.91          22.53       True
 products_layer   186   Yes         No        3747.06           44.71          27.36       True
 products_layer    37   Yes        Yes         603.29           27.68          20.74       True

--------------------------------------------------------------
PADDED
InferenceType: MARGINAL
num_inputs: [2, 3, 4, 5, 4, 3, 2, 3, 4, 5]
num_prods: [25, 125, 625, 3125, 625, 125, 25, 125, 625, 3125]
--------------------------------------------------------------
CPU          op  size indices  single_input  setup_time  first_run_time  rest_run_time    correct
        product 114239   Yes         No       43468.50         9164.96          76.60       True
       products 80099   Yes         No       30489.87        63484.72          64.49       True
  perm_products   169    No         No          57.25          146.15         131.45       True
  perm_products   239   Yes         No          92.27          167.72         124.45       True
  perm_products   171   Yes        Yes          78.69          149.60         130.51       True
 products_layer    93   Yes         No         962.92          225.97         224.66       True
 products_layer    23   Yes        Yes         240.93          222.76         150.36       True
GPU          op  size indices  single_input  setup_time  first_run_time  rest_run_time    correct
        product 114239   Yes         No       44641.18         9775.85         517.66       True
       products 80099   Yes         No       29748.52        63156.28          10.81       True
  perm_products   169    No         No          66.20           30.59           9.34       True
  perm_products   239   Yes         No          91.71           36.27           9.44       True
  perm_products   171   Yes        Yes          67.21           22.73           6.73       True
 products_layer    93   Yes         No         913.59           16.33           9.67       True
 products_layer    23   Yes        Yes         244.87           10.19           6.26       True

--------------------------------------------------------------
PADDED
InferenceType: MARGINAL-LOG
num_inputs: [2, 3, 4, 5, 4, 3, 2, 3, 4, 5]
num_prods: [25, 125, 625, 3125, 625, 125, 25, 125, 625, 3125]
--------------------------------------------------------------
CPU          op  size indices  single_input  setup_time  first_run_time  rest_run_time    correct
        product 114287   Yes         No       43500.43         8936.22          88.80       True
       products 80147   Yes         No       30635.42        64758.48          76.79       True
  perm_products   217    No         No          72.60          182.51         138.91       True
  perm_products   287   Yes         No         109.52          184.04         142.52       True
  perm_products   185   Yes        Yes          83.99          136.13         137.69       True
 products_layer   141   Yes         No         888.07          176.53         228.94       True
 products_layer    37   Yes        Yes         204.79          172.45         179.84       True
GPU          op  size indices  single_input  setup_time  first_run_time  rest_run_time    correct
        product 114287   Yes         No       44684.13         9587.35         405.56       True
       products 80147   Yes         No       29858.64        64512.84          12.18       True
  perm_products   217    No         No          83.12           35.08          10.42       True
  perm_products   287   Yes         No         109.91           40.67          10.06       True
  perm_products   185   Yes        Yes          85.03           29.53           6.68       True
 products_layer   141   Yes         No         906.27           22.07          10.80       True
 products_layer    37   Yes        Yes         214.22           12.43           6.65       True

Performance results show that there is between 5x and 7x reduction in TF graph-size, across the board, while a slight (~10%) improvement in performance for the non-padded (homogeneous input-sizes) case, and similar performance for the padded (heterogeneous input-sizes) case.

Similarly, performance test results for path-calculation among product node types, with the same structure and parameters as above, are as follows:

------------------------------------------------------------------------
NON-PADDED
InferenceType: MARGINAL
num_inputs: [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
num_prods: [3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125]
------------------------------------------------------------------------
CPU          op  size indices  single_input  setup_time  first_run_time  rest_run_time    correct
        product 750182   Yes         No      326970.99        94056.85        1760.23       True
       products 625292   Yes         No      289457.04       343305.91        2019.92       True
  perm_products   282    No         No         171.33          445.59         316.98       True
  perm_products   282   Yes         No         170.98          432.38         328.86       True
  perm_products   435   Yes        Yes         235.15          495.91         379.11       True
 products_layer   138   Yes         No        5571.58         1060.60        1063.77       True
 products_layer    38   Yes        Yes         947.10         1116.14        1011.80       True
GPU          op  size indices  single_input  setup_time  first_run_time  rest_run_time    correct
        product 750182   Yes         No      340425.74        97985.53        2965.77       True
       products 625292   Yes         No      283081.47       347562.34        5649.17       True
  perm_products   282    No         No         171.30          123.80          51.56       True
  perm_products   282   Yes         No         171.67          116.61          50.64       True
  perm_products   435   Yes        Yes         234.74          140.44          47.26       True
 products_layer   138   Yes         No        5581.78           91.02          54.34       True
 products_layer    38   Yes        Yes         940.51           64.08          51.36       True

------------------------------------------------------------------------
NON-PADDED
InferenceType: MARGINAL-LOG
num_inputs: [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
num_prods: [3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125]
------------------------------------------------------------------------
CPU          op  size indices  single_input  setup_time  first_run_time  rest_run_time    correct
        product 750245   Yes         No      337322.48        92822.89        1767.17       True
       products 625355   Yes         No      282556.16       347116.62        2035.92       True
  perm_products   345    No         No         187.37          492.23         347.76       True
  perm_products   345   Yes         No         186.70          466.18         337.36       True
  perm_products   449   Yes        Yes         234.43          473.36         399.30       True
 products_layer   201   Yes         No        5359.41         1164.09        1091.25       True
 products_layer    52   Yes        Yes         927.65          933.24        1053.17       True
GPU          op  size indices  single_input  setup_time  first_run_time  rest_run_time    correct
        product 750245   Yes         No      337619.45        95252.60        2884.61       True
       products 625355   Yes         No      276265.29       353030.89        5568.78       True
  perm_products   345    No         No         186.76          122.87          53.48       True
  perm_products   345   Yes         No         186.87          164.51          53.38       True
  perm_products   449   Yes        Yes         233.48          130.09          48.67       True
 products_layer   201   Yes         No        5341.68           96.11          54.27       True
 products_layer    52   Yes        Yes         920.32           67.12          53.15       True

--------------------------------------------------------------
PADDED
InferenceType: MARGINAL
num_inputs: [2, 3, 4, 5, 4, 3, 2, 3, 4, 5]
num_prods: [25, 125, 625, 3125, 625, 125, 25, 125, 625, 3125]
--------------------------------------------------------------
CPU          op  size indices  single_input  setup_time  first_run_time  rest_run_time    correct
        product 194237   Yes         No       94674.83        22815.26         473.13       True
       products 160147   Yes         No       71653.87        72491.76         643.23       True
  perm_products   252    No         No         107.76          302.28         240.91       True
  perm_products   252   Yes         No         122.78          276.25         222.67       True
  perm_products   360   Yes        Yes         171.33          301.02         233.77       True
 products_layer   108   Yes         No        4658.76          436.08         435.16       True
 products_layer    38   Yes        Yes         355.13          415.69         392.69       True
GPU          op  size indices  single_input  setup_time  first_run_time  rest_run_time    correct
        product 194237   Yes         No       87654.73        22038.33         700.53       True
       products 160147   Yes         No       71287.60        72703.72         650.39       True
  perm_products   252    No         No         119.90           54.57          18.50       True
  perm_products   252   Yes         No         120.88           59.52          18.33       True
  perm_products   360   Yes        Yes         169.61           67.42          15.32       True
 products_layer   108   Yes         No        1158.40           38.18          21.26       True
 products_layer    38   Yes        Yes         275.66           25.27          16.29       True

--------------------------------------------------------------
PADDED
InferenceType: MARGINAL-LOG
num_inputs: [2, 3, 4, 5, 4, 3, 2, 3, 4, 5]
num_prods: [25, 125, 625, 3125, 625, 125, 25, 125, 625, 3125]
--------------------------------------------------------------
CPU          op  size indices  single_input  setup_time  first_run_time  rest_run_time    correct
        product 194285   Yes         No       81847.55        20454.00         476.21       True
       products 160195   Yes         No       70388.23        74397.83         659.00       True
  perm_products   300    No         No         121.36          274.26         227.51       True
  perm_products   300   Yes         No         138.66          307.67         236.26       True
  perm_products   374   Yes        Yes         152.15          304.46         245.76       True
 products_layer   156   Yes         No        1087.28          500.73         449.69       True
 products_layer    52   Yes        Yes         243.54          452.36         421.52       True
GPU          op  size indices  single_input  setup_time  first_run_time  rest_run_time    correct
        product 194285   Yes         No       85890.92        21046.80         581.75       True
       products 160195   Yes         No       70913.65        73223.11         695.57       True
  perm_products   300    No         No         120.41           54.30          17.95       True
  perm_products   300   Yes         No         136.22           57.48          18.03       True
  perm_products   374   Yes        Yes         172.37           54.07          14.37       True
 products_layer   156   Yes         No        1083.38           48.67          21.47       True
 products_layer    52   Yes        Yes         257.40           27.48          16.77       True

Here, PermProds nodes with single-input (which can only be the case for the input layer of a raw-inputs distribution SPN) performs slightly (~10%) better than ProductsLayer node. Although, there is between 5x and 7x reduction in TF graph-size.

Extensions

  • Given a list of any product type nodes, should be able to automatically convert it into a single ProductsLayer node
  • Current way of handling heterogeneous input-sizes case, where products modeled do not all have the same input-size, involves padding residual columns with 0's or 1's, and then reduce-summing or reduce-multiplying. As disparity between product input-sizes grow, it negatively impacts performance, because of all the extra summing or multiplying of padded-columns. A better approach would be to use the TF's Segment class of ops (Eg: tf.segment_sum or tf.segment_prod). But currently these TF ops do not have an axis parameter, and hence only supports segmented operation of the first dimension of the tensor. So, extending the OpKernal to support segmented operations on the second dimension would be an interesting option to pursue.

Decision

Clone this wiki locally