-
Notifications
You must be signed in to change notification settings - Fork 3
LIP0014
LIP | 14 |
---|---|
Title | Products node with heterogeneous input sizes |
Author | A. Ranganath |
Status | Draft |
Type | Standard |
Discussion | Issue #46 |
PR | #48 |
Created | March 9, 2018 |
This is the product equivalent of multi-ops node SumsLayer
. The interface and the reason behind both ops are common, however, they differ in implementation.
The current implementation of multi-ops Products
node, which can model multiple product operations, accepts either a single input or list of inputs, concatenates them into a single wide 2D tensor, reshapes it to (batch x num-ops x input-size)
, and then performances reduction operation on the last dimension for value calculation.
The underlying assumption with Products
node is that all the ops modeled within, have a common inputs-size. This assumption falls through when: (1) you want to model an entire layer of an SPN graph's tree structure, or (2) while pruning an SPN graph with multi-op nodes. This deficit in the current design facilitates a need for a multi-op node that can model multiple product operations of varying input-sizes.
We want to implement a new type of multi-ops node - ProductsLayer
- for modeling multiple product operations with heterogeneous input-sizes. The parameter num_prods
in the current Products
class accepts the number of product ops to be modelled. This parameter will be replaces with num_or_size_prods
, which, as a list would encode the input-size of each product operation modeled, or as an int would be treated similar to the parameter num_prods
(indicating that each product op modeled would have a common input-size).
Custom tf-op GatherColumns3D
will be used to gather subsets of columns, per product operation modeled, shaped to (batch x num-ops x max(input-sizes))
, and then reduction operation performed for value calculation.
For products that have input-size less than max(input-size)
, residual columns will be padded with either 0's or 1's, for value calculation in log or non-log space respectively, which would be done implicitly by the GatherColumns3D
operation.
Similarly for path calculation, subsets or columns from the 2D counts tensor are gathered, one per unique (input, index) pair in the node's input-list, into a 3D tensor and reduce-sum operation is performed on the 3rd axis to add-up counts emanating from all the parents of each unique (input, index) pair, before split and scattered to unique inputs in the inputs-list.
The proposed new ProductsLayer
multi-op node would contain the following changes compared to the Products
node class:
- Create a
ProductsLayer
class that can model products with heterogeneous input-sizes - Introduce parameter
num_or_size_prods
, which as a list would indicate input-sizes of each product modeled, or number of products modeled as an int. - Modify the following methods:
_compute_value_common
_compute_value
_compute_log_value
_compute_mpe_path
After this, we should be able to do replace the following
X = spn.ContVars(num_vars=6, name="X")
prods_1 = spn.Products((X, [0, 2, 4, 1, 3, 5]), num_prods=1, name="prods_1") # input-sizes = [6]
prods_2 = spn.Products((X, [1, 3, 5, 0, 2, 4]), num_prods=2, name="prods_2") # input-sizes = [3, 3]
prods_3 = spn.Products((X, [3, 4, 5, 0, 1, 2]), num_prods=3, name="prods_3") # input-sizes = [2, 2, 2]
with:
X = spn.ContVars(num_vars=6, name="X")
prods_1 = spn.Products((X, [0, 2, 4, 1, 3, 5]), (X, [1, 3, 5, 0, 2, 4]),
(X, [3, 4, 5, 0, 1, 2]), num_or_size_prods=[6, 3, 3, 2, 2, 2],
name="prods_1")
The parameter num_or_size_prods
, as a list would encode the input-sizes of each product operation modeled. The number of products modeled would then be len(num_or_size_prods)
.
Following are performance test results comparing value-computation between different product node types, performed on a machine with E5-1650v3 | GTX1080TI
. Test cases include value calculation in log and non-log space, and for non-padded and padded (homogeneous and heterogeneous input-sizes respectively) cases. Batch size for the test is 1000
, and evaluated for 50
runs.
Here, for the non-padded case, total number of products modeled in the layer is 5⁵ x 10 = 3.125 x 10⁴
product operations. All 3.125 x 10⁴
products are modeled in a single ProductsLayer
node. While there are 10 PermProds
or Products
nodes respectively, each modeling 3.125 x 10³
products ops, and 3.125 x 10⁴
individual Product
nodes, modeling one product operation each.
Similarly, for the padded case, total number of products modeled in the layer are 8.55 x 10³
.
------------------------------------------------------------------------
NON-PADDED
InferenceType: MARGINAL
num_inputs: [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
num_prods: [3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125]
------------------------------------------------------------------------
CPU op size indices single_input setup_time first_run_time rest_run_time correct
product 437619 Yes No 169068.66 38823.31 378.97 True
products 312679 Yes No 121881.48 304540.42 250.39 True
perm_products 199 No No 78.25 229.58 138.40 True
perm_products 299 Yes No 111.60 287.52 146.37 True
perm_products 201 Yes Yes 94.05 248.98 142.94 True
products_layer 123 Yes No 3567.80 605.04 564.50 True
products_layer 23 Yes Yes 6166.31 606.10 482.46 True
GPU op size indices single_input setup_time first_run_time rest_run_time correct
product 437619 Yes No 178614.85 40375.54 1940.26 True
products 312679 Yes No 120312.67 302684.60 36.64 True
perm_products 199 No No 78.12 55.87 23.66 True
perm_products 299 Yes No 111.09 69.16 23.76 True
perm_products 201 Yes Yes 93.94 56.13 20.30 True
products_layer 123 Yes No 3655.28 40.88 24.74 True
products_layer 23 Yes Yes 692.47 24.62 18.42 True
------------------------------------------------------------------------
NON-PADDED
InferenceType: MARGINAL-LOG
num_inputs: [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
num_prods: [3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125]
------------------------------------------------------------------------
CPU op size indices single_input setup_time first_run_time rest_run_time correct
product 437682 Yes No 173310.67 37997.24 434.48 False
products 312742 Yes No 121119.08 310505.22 314.10 False
perm_products 262 No No 92.83 295.61 229.09 False
perm_products 362 Yes No 125.34 340.07 231.62 False
perm_products 215 Yes Yes 92.94 307.48 222.74 False
products_layer 186 Yes No 8882.09 626.43 605.27 False
products_layer 37 Yes Yes 763.38 531.15 578.62 False
GPU op size indices single_input setup_time first_run_time rest_run_time correct
product 437682 Yes No 174900.02 40615.28 1894.39 True
products 312742 Yes No 116497.61 308969.12 41.45 True
perm_products 262 No No 93.30 60.14 26.24 True
perm_products 362 Yes No 124.69 70.04 25.70 True
perm_products 215 Yes Yes 93.15 52.91 22.53 True
products_layer 186 Yes No 3747.06 44.71 27.36 True
products_layer 37 Yes Yes 603.29 27.68 20.74 True
--------------------------------------------------------------
PADDED
InferenceType: MARGINAL
num_inputs: [2, 3, 4, 5, 4, 3, 2, 3, 4, 5]
num_prods: [25, 125, 625, 3125, 625, 125, 25, 125, 625, 3125]
--------------------------------------------------------------
CPU op size indices single_input setup_time first_run_time rest_run_time correct
product 114239 Yes No 43468.50 9164.96 76.60 True
products 80099 Yes No 30489.87 63484.72 64.49 True
perm_products 169 No No 57.25 146.15 131.45 True
perm_products 239 Yes No 92.27 167.72 124.45 True
perm_products 171 Yes Yes 78.69 149.60 130.51 True
products_layer 93 Yes No 962.92 225.97 224.66 True
products_layer 23 Yes Yes 240.93 222.76 150.36 True
GPU op size indices single_input setup_time first_run_time rest_run_time correct
product 114239 Yes No 44641.18 9775.85 517.66 True
products 80099 Yes No 29748.52 63156.28 10.81 True
perm_products 169 No No 66.20 30.59 9.34 True
perm_products 239 Yes No 91.71 36.27 9.44 True
perm_products 171 Yes Yes 67.21 22.73 6.73 True
products_layer 93 Yes No 913.59 16.33 9.67 True
products_layer 23 Yes Yes 244.87 10.19 6.26 True
--------------------------------------------------------------
PADDED
InferenceType: MARGINAL-LOG
num_inputs: [2, 3, 4, 5, 4, 3, 2, 3, 4, 5]
num_prods: [25, 125, 625, 3125, 625, 125, 25, 125, 625, 3125]
--------------------------------------------------------------
CPU op size indices single_input setup_time first_run_time rest_run_time correct
product 114287 Yes No 43500.43 8936.22 88.80 True
products 80147 Yes No 30635.42 64758.48 76.79 True
perm_products 217 No No 72.60 182.51 138.91 True
perm_products 287 Yes No 109.52 184.04 142.52 True
perm_products 185 Yes Yes 83.99 136.13 137.69 True
products_layer 141 Yes No 888.07 176.53 228.94 True
products_layer 37 Yes Yes 204.79 172.45 179.84 True
GPU op size indices single_input setup_time first_run_time rest_run_time correct
product 114287 Yes No 44684.13 9587.35 405.56 True
products 80147 Yes No 29858.64 64512.84 12.18 True
perm_products 217 No No 83.12 35.08 10.42 True
perm_products 287 Yes No 109.91 40.67 10.06 True
perm_products 185 Yes Yes 85.03 29.53 6.68 True
products_layer 141 Yes No 906.27 22.07 10.80 True
products_layer 37 Yes Yes 214.22 12.43 6.65 True
Performance results show that there is between 5x and 7x reduction in TF graph-size, across the board, while a slight (~10%) improvement in performance for the non-padded (homogeneous input-sizes) case, and similar performance for the padded (heterogeneous input-sizes) case.
Similarly, performance test results for path-calculation among product node types, with the same structure and parameters as above, are as follows:
------------------------------------------------------------------------
NON-PADDED
InferenceType: MARGINAL
num_inputs: [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
num_prods: [3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125]
------------------------------------------------------------------------
CPU op size indices single_input setup_time first_run_time rest_run_time correct
product 750182 Yes No 326970.99 94056.85 1760.23 True
products 625292 Yes No 289457.04 343305.91 2019.92 True
perm_products 282 No No 171.33 445.59 316.98 True
perm_products 282 Yes No 170.98 432.38 328.86 True
perm_products 435 Yes Yes 235.15 495.91 379.11 True
products_layer 138 Yes No 5571.58 1060.60 1063.77 True
products_layer 38 Yes Yes 947.10 1116.14 1011.80 True
GPU op size indices single_input setup_time first_run_time rest_run_time correct
product 750182 Yes No 340425.74 97985.53 2965.77 True
products 625292 Yes No 283081.47 347562.34 5649.17 True
perm_products 282 No No 171.30 123.80 51.56 True
perm_products 282 Yes No 171.67 116.61 50.64 True
perm_products 435 Yes Yes 234.74 140.44 47.26 True
products_layer 138 Yes No 5581.78 91.02 54.34 True
products_layer 38 Yes Yes 940.51 64.08 51.36 True
------------------------------------------------------------------------
NON-PADDED
InferenceType: MARGINAL-LOG
num_inputs: [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
num_prods: [3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125, 3125]
------------------------------------------------------------------------
CPU op size indices single_input setup_time first_run_time rest_run_time correct
product 750245 Yes No 337322.48 92822.89 1767.17 True
products 625355 Yes No 282556.16 347116.62 2035.92 True
perm_products 345 No No 187.37 492.23 347.76 True
perm_products 345 Yes No 186.70 466.18 337.36 True
perm_products 449 Yes Yes 234.43 473.36 399.30 True
products_layer 201 Yes No 5359.41 1164.09 1091.25 True
products_layer 52 Yes Yes 927.65 933.24 1053.17 True
GPU op size indices single_input setup_time first_run_time rest_run_time correct
product 750245 Yes No 337619.45 95252.60 2884.61 True
products 625355 Yes No 276265.29 353030.89 5568.78 True
perm_products 345 No No 186.76 122.87 53.48 True
perm_products 345 Yes No 186.87 164.51 53.38 True
perm_products 449 Yes Yes 233.48 130.09 48.67 True
products_layer 201 Yes No 5341.68 96.11 54.27 True
products_layer 52 Yes Yes 920.32 67.12 53.15 True
--------------------------------------------------------------
PADDED
InferenceType: MARGINAL
num_inputs: [2, 3, 4, 5, 4, 3, 2, 3, 4, 5]
num_prods: [25, 125, 625, 3125, 625, 125, 25, 125, 625, 3125]
--------------------------------------------------------------
CPU op size indices single_input setup_time first_run_time rest_run_time correct
product 194237 Yes No 94674.83 22815.26 473.13 True
products 160147 Yes No 71653.87 72491.76 643.23 True
perm_products 252 No No 107.76 302.28 240.91 True
perm_products 252 Yes No 122.78 276.25 222.67 True
perm_products 360 Yes Yes 171.33 301.02 233.77 True
products_layer 108 Yes No 4658.76 436.08 435.16 True
products_layer 38 Yes Yes 355.13 415.69 392.69 True
GPU op size indices single_input setup_time first_run_time rest_run_time correct
product 194237 Yes No 87654.73 22038.33 700.53 True
products 160147 Yes No 71287.60 72703.72 650.39 True
perm_products 252 No No 119.90 54.57 18.50 True
perm_products 252 Yes No 120.88 59.52 18.33 True
perm_products 360 Yes Yes 169.61 67.42 15.32 True
products_layer 108 Yes No 1158.40 38.18 21.26 True
products_layer 38 Yes Yes 275.66 25.27 16.29 True
--------------------------------------------------------------
PADDED
InferenceType: MARGINAL-LOG
num_inputs: [2, 3, 4, 5, 4, 3, 2, 3, 4, 5]
num_prods: [25, 125, 625, 3125, 625, 125, 25, 125, 625, 3125]
--------------------------------------------------------------
CPU op size indices single_input setup_time first_run_time rest_run_time correct
product 194285 Yes No 81847.55 20454.00 476.21 True
products 160195 Yes No 70388.23 74397.83 659.00 True
perm_products 300 No No 121.36 274.26 227.51 True
perm_products 300 Yes No 138.66 307.67 236.26 True
perm_products 374 Yes Yes 152.15 304.46 245.76 True
products_layer 156 Yes No 1087.28 500.73 449.69 True
products_layer 52 Yes Yes 243.54 452.36 421.52 True
GPU op size indices single_input setup_time first_run_time rest_run_time correct
product 194285 Yes No 85890.92 21046.80 581.75 True
products 160195 Yes No 70913.65 73223.11 695.57 True
perm_products 300 No No 120.41 54.30 17.95 True
perm_products 300 Yes No 136.22 57.48 18.03 True
perm_products 374 Yes Yes 172.37 54.07 14.37 True
products_layer 156 Yes No 1083.38 48.67 21.47 True
products_layer 52 Yes Yes 257.40 27.48 16.77 True
Here, PermProds
nodes with single-input (which can only be the case for the input layer of a raw-inputs distribution SPN) performs slightly (~10%) better than ProductsLayer
node. Although, there is between 5x and 7x reduction in TF graph-size.
- Given a list of any product type nodes, should be able to automatically convert it into a single
ProductsLayer
node - Current way of handling heterogeneous input-sizes case, where products modeled do not all have the same input-size, involves padding residual columns with 0's or 1's, and then reduce-summing or reduce-multiplying. As disparity between product input-sizes grow, it negatively impacts performance, because of all the extra summing or multiplying of padded-columns. A better approach would be to use the TF's
Segment
class of ops (Eg:tf.segment_sum
ortf.segment_prod
). But currently these TF ops do not have anaxis
parameter, and hence only supports segmented operation of the first dimension of the tensor. So, extending the OpKernal to support segmented operations on the second dimension would be an interesting option to pursue.