# oplib-optimize

**Repository Path**: zeasa/oplib-optimize

## Basic Information

- **Project Name**: oplib-optimize
- **Description**: Implementations of a few deep learning operators, written for learning purposes
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2022-07-02
- **Last Updated**: 2024-10-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

This library includes implementations of three deep learning operators (conv2D, ReLU, and pooling), both as single layers and as a fused version.

1. Environment requirements

```
a. Ubuntu 18.04.5 with gcc 7.5.0
b. cmake 3.10.2 or above
c. Intel oneMKL 2022.1.0
```

2. How to build

```
cd oplib-optimize
mkdir -p build
cd build
cmake ..
make -j
```

3. Run the tests

```
cd oplib-optimize
cd build
ctest -V
```

4. How to run

```
cd oplib-optimize
cd build/example
./benchmark_net_conv2d_relu_pooling_nofuse -d
./benchmark_net_conv2d_relu_pooling_fuse -d
```

The benchmark report is then printed to the console, like this:

```
[DEBUG]: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[DEBUG]: dump mode = [false]
[DEBUG]: peak avx2 vmul+vadd computation performance(1 thread) is [36.693602] GFLOPS/s
[DEBUG]: peak fpu computation performance(1 thread) is [4.584952] GFLOPS/s
[DEBUG]: allocate buffers for input/output/intermediate tensors!
[DEBUG]: generate test data for [pbuf_ifm_conv]
[DEBUG]: generate test data for [pbuf_wt_conv]
[DEBUG]: generate test data for [pbuf_bs_conv]
[DEBUG]: oplib_layer_conv2d_s1 param : IN=[4],IH=[512],IW=[512],IC=[32],KW=[3],KH=[3],OC=[16],gflops=[9.663676]
[DEBUG]: oplib_layer_relu param : IN=[4],IH=[512],IW=[512],IC=[16],OH=[512],OW=[512],OC=[16],gflops=[0.016777]
[DEBUG]: oplib_layer_avgpool param : IN=[4],IH=[512],IW=[512],IC=[16],OH=[256],OW=[256],OC=[16],gflops=[0.016777]
[DEBUG]: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[DEBUG]: report : oplib_layer_conv2d_s1 cost [7.991605] seconds in avg within [1] iters
[DEBUG]: report : oplib_layer_relu cost [0.009072] seconds in avg within [200] iters
[DEBUG]: report : oplib_layer_avgpool cost [0.033691] seconds in avg within [200] iters
[DEBUG]: report : oplib_layer_conv2d_s1_omp cost [1.044546] seconds in avg within [1] iters
[DEBUG]: report : oplib_layer_relu_omp cost [0.006228] seconds in avg within [200] iters
[DEBUG]: report : oplib_layer_avgpool_omp cost [0.004644] seconds in avg within [200] iters
[DEBUG]: report : oplib_layer_conv2d_s1 calculation profermance is [1.209229] GFLOPS/s
[DEBUG]: report : oplib_layer_relu calculation profermance is [1.849413] GFLOPS/s
[DEBUG]: report : oplib_layer_avgpool calculation profermance is [0.497979] GFLOPS/s
[DEBUG]: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
```

Note: the `-d` option prints the input and output tensor contents to the console.

5. More about this code

The library uses GCC's OpenMP support to speed up program execution.