Using FINN to translate large, sparse MLPs #957
sdittmeier asked this question in Q&A
Hi all,
I am looking for some input as to whether what I'm doing is meaningful, or if there is a better way to do it.
I have trained a large MLP, quantized with Brevitas using quantization-aware training.
During training, I pruned it iteratively, from originally 800k parameters down to roughly 15k non-zero parameters, while retaining performance.
The layers have a dimensionality of 3 - 512 - 512 - 512 - 512 - 12, with bit widths of 4 - 6 bits for weights and activations.
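For context, here is a minimal sketch of how such a model can be built and pruned with Brevitas; the exact quantizer settings and pruning schedule of my actual model differ, and all names here are illustrative:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune
from brevitas.nn import QuantLinear, QuantReLU

# Illustrative reconstruction of the 3-512-512-512-512-12 MLP;
# the real model's quantizer configuration may differ.
model = nn.Sequential(
    QuantLinear(3, 512, bias=True, weight_bit_width=4),
    QuantReLU(bit_width=6),
    QuantLinear(512, 512, bias=True, weight_bit_width=4),
    QuantReLU(bit_width=6),
    QuantLinear(512, 512, bias=True, weight_bit_width=4),
    QuantReLU(bit_width=6),
    QuantLinear(512, 512, bias=True, weight_bit_width=4),
    QuantReLU(bit_width=6),
    QuantLinear(512, 12, bias=True, weight_bit_width=4),
)

# One magnitude-pruning iteration; in practice this is interleaved with
# QAT fine-tuning epochs and repeated until ~15k weights remain.
for module in model:
    if isinstance(module, QuantLinear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor
```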
I have created a build dataflow targeting an Alveo U280 card, and it works quite nicely when targeting rather low values of target_fps and mvau_wwidth_max; but like this, it cannot really make use of the sparsity.
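For reference, my build configuration looks roughly like this (the file names and the concrete target_fps value are placeholders):

```python
from finn.builder.build_dataflow import build_dataflow_cfg
import finn.builder.build_dataflow_config as build_cfg

# Sketch of the "low parallelism" build: FINN time-multiplexes each
# MVAU, so sparsity in the weights buys essentially nothing here.
cfg = build_cfg.DataflowBuildConfig(
    output_dir="build_u280",
    target_fps=10000,       # deliberately low target
    mvau_wwidth_max=36,
    synth_clk_period_ns=5.0,
    board="U280",
    shell_flow_type=build_cfg.ShellFlowType.VITIS_ALVEO,
    generate_outputs=[build_cfg.DataflowOutputType.BITFILE],
)
build_dataflow_cfg("pruned_mlp_qonnx.onnx", cfg)
```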
My goal was now to create a fully unrolled version of this MLP, to understand the resource usage of this very sparse network; my attempt at this is sketched below.
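Full unrolling should correspond to a folding configuration that sets PE to the output dimension and SIMD to the input dimension of every MVAU. A sketch, assuming the MatrixVectorActivation_<i> node naming of recent FINN versions (the actual node names depend on the FINN version and on how the graph was converted):

```python
import json

# Hypothetical fully-unrolled folding config: one PE per output neuron,
# full-width SIMD per input, LUT-based compute.
layer_dims = [(3, 512), (512, 512), (512, 512), (512, 512), (512, 12)]
folding = {"Defaults": {}}
for i, (simd, pe) in enumerate(layer_dims):
    folding[f"MatrixVectorActivation_{i}"] = {
        "PE": pe,
        "SIMD": simd,
        "resType": "lut",  # matches the ap_resource_lut seen in the log below
    }
with open("folding_unrolled.json", "w") as f:
    json.dump(folding, f, indent=2)
```

This file is then passed to the builder via folding_config_file in the DataflowBuildConfig, instead of target_fps.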
With full unrolling, the build dataflow gets to the step of generating the HLS IP, and this is now an extremely long process.
I understand that the underlying HLS synthesis tool, when unrolling the individual layers, has to work through these 512x512 loops, which is of course not ideal: in the end, most of the multiplications are synthesized away, since they are multiplications by 0, but the tool does not know this from the start. I assume FINN is not intended to be used in such a way?
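To put numbers on this: fully unrolled, one multiplier is instantiated per weight position, zero or not, so only about 2% of the generated loop body ends up doing useful work:

```python
# Back-of-the-envelope count for the 3-512-512-512-512-12 MLP.
dims = [3, 512, 512, 512, 512, 12]
total = sum(a * b for a, b in zip(dims, dims[1:]))
print(total)           # 794112 weight positions (~800k)
print(15_000 / total)  # ~0.019: only ~2% are non-zero after pruning
```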
Anyway, most of the layers have actually finished HLS synthesis (within a month!); one layer is still pending, and the last message in the vitis_hls.log file, from weeks ago, is:
INFO: [XFORM 203-602] Inlining function 'comp::less_equal<ap_int<13>, ap_int<13> >::operator()' into 'Matrix_Vector_Activate_Batch<512u, 512u, 512u, 512u, 1u, Slice<ap_uint<4>, 4u>, Slice<ap_uint<6>, 6u>, Identity, ap_uint<2048>, ap_uint<3072>, FixedPointWeights<512u, ap_int<4>, 512u, 1u>, ThresholdsActivation<1u, 512u, 63u, ap_int<13>, ap_uint<6>, 0, comp::less_equal<ap_int<13>, ap_int<13> > >, ap_resource_lut>' (/home/sebastian/git/finn/deps/finn-hlslib/activations.hpp:218->/home/sebastian/git/finn/deps/finn-hlslib/mvau.hpp:168) automatically.
So what I'm asking is whether anyone has tried something similar, or has a better suggestion for translating this network.
I'm also not sure if I should simply trust the process and let the HLS synthesis continue.
Is it possible to reuse the already finished layers and only re-synthesize the one that is missing, i.e. to intervene manually in the build dataflow?
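What I imagine, but have not verified, is restarting the builder from the HLS IP generation step on the same output directory, roughly like this (whether already-synthesized layers are actually skipped is part of my question):

```python
from finn.builder.build_dataflow import build_dataflow_cfg
import finn.builder.build_dataflow_config as build_cfg

# Assumed approach, not verified: resume the stuck build from the
# intermediate checkpoint instead of starting over from scratch.
cfg = build_cfg.DataflowBuildConfig(
    output_dir="build_u280_unrolled",  # same directory as the stuck build
    start_step="step_hls_ipgen",
    stop_step="step_create_stitched_ip",
    folding_config_file="folding_unrolled.json",
    synth_clk_period_ns=5.0,
    board="U280",
    shell_flow_type=build_cfg.ShellFlowType.VITIS_ALVEO,
    generate_outputs=[build_cfg.DataflowOutputType.STITCHED_IP],
)
build_dataflow_cfg("pruned_mlp_qonnx.onnx", cfg)
```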
Any input is very welcome! I'm also happy to share more information if helpful.