JeVois  1.20
JeVois Smart Embedded Machine Vision Toolkit
JeVois-Pro Deep Neural Network Benchmarks

JeVois-Pro neural network backends

The measurements below were made on a JeVois-Pro smart camera running JeVois v1.18.0 (September 2022).

  • OpenCV: network loaded by OpenCV DNN framework and running on CPU.
  • ORT: network loaded by ONNX Runtime framework and running on CPU.
  • NPU: network running native on the JeVois-Pro integrated 5-TOPS NPU (neural processing unit).
  • TPU: network running on the optional 4-TOPS Google Coral TPU accelerator (tensor processing unit).
  • SPU: network running on the optional 26-TOPS Hailo8 SPU accelerator (stream processing unit).
  • VPU: network running on the optional 1-TOPS MyriadX VPU accelerator (vector processing unit).
  • NPUX: network loaded by OpenCV and running on NPU via the TIM-VX OpenCV extension. To run efficiently, the network should be quantized to int8; otherwise, slow CPU-based emulation will occur.
  • VPUX: network optimized for VPU but running on CPU when no VPU is available. VPUX entries are created automatically by scanning all VPU entries and changing their target from Myriad to CPU when no VPU accelerator is detected. If a VPU is detected, the VPU models are listed and the VPUX ones are not. VPUX emulation runs on the JeVois-Pro CPU and uses the Arm Compute Library for efficient implementations of various network layers and operations.
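The VPU/VPUX selection rule described above can be illustrated with a minimal sketch. The dictionary layout, the key names `name` and `target`, and the function `visible_vpu_entries` are hypothetical and chosen only for illustration; they are not the actual JeVois data structures.

```python
def visible_vpu_entries(vpu_entries, vpu_detected):
    """Return the entries to list, following the rule described in the text.

    vpu_entries: list of dicts with hypothetical keys "name" and "target".
    vpu_detected: True if a VPU accelerator was found at startup.
    """
    if vpu_detected:
        # A VPU is present: list the VPU models as they are, no VPUX entries.
        return [dict(entry) for entry in vpu_entries]

    # No VPU: derive one VPUX entry per VPU entry, retargeted to the CPU.
    emulated = []
    for entry in vpu_entries:
        clone = dict(entry)  # copy so the original entry is left untouched
        clone["name"] = entry["name"].replace("VPU:", "VPUX:", 1)
        clone["target"] = "CPU"
        emulated.append(clone)
    return emulated


entries = [{"name": "VPU:Classify:Inception-V3", "target": "Myriad"}]
print(visible_vpu_entries(entries, vpu_detected=False))
```

With `vpu_detected=True`, the same call returns the original VPU entries with their Myriad target unchanged.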

Benchmarking conditions

  • The display was on, at 1920x1080/60Hz. Operation is a bit slower if you enable 4K display, likely because of higher contention on the memory bus.
  • The DNN module was used, with 1920x1080 YUYV video capture for display purposes, and 1024x576 RGB24 capture for vision processing.
  • Batch size is always 1, i.e., we measure the round-trip time to pre-process, infer, and post-process one frame at a time. Higher throughput is usually achieved with a larger batch size, but that is not a real-time scenario: it would lead to larger delays between when a video frame is captured and when the inference results are available and displayed.
  • These benchmarks are for JeVois-Pro only and not meant to be representative of a particular accelerator's peak performance. In particular:
    • The Myriad-X VPU used was a USB dongle connected to JeVois-Pro over a 480 Mbit/s USB 2.0 link. The dongle supports 5 Gbit/s USB 3.0 but the JeVois-Pro CPU has no available USB 3.0 port.
    • The NPU is integrated into the Amlogic A311D processor of JeVois-Pro and hence has the highest memory bandwidth (direct memory access to the main RAM of the processor), and highest available memory (up to 4 GBytes of main RAM).
    • Coral Edge TPU and Hailo-8 SPU were M.2 2230 A+E cards optionally installed inside JeVois-Pro. Data transfer is over PCIe at 5 Gbit/s. Note that Hailo-8 can support up to PCIe x4 but the A311D processor of JeVois-Pro only has one PCIe x1 lane. Note also that Hailo-8 can support larger PCIe transaction packets (up to 4 Kbytes) than the A311D can provide (only up to 256 bytes).
    • Coral Edge TPU has only about 6.5 MBytes of usable RAM on chip. Thus, for larger networks, performance is slower as some of the weights may need to constantly be loaded/unloaded over PCIe on every video frame. See, e.g., 45 fps for Inception-V3 on 5-TOPS NPU vs. only 21 fps on 4-TOPS TPU, as model size is about 25 MBytes.
    • You can only install one M.2 2230 A+E card inside JeVois-Pro, so you have to choose between a Hailo-8 card, or a single-TPU card, or a dual-TPU card (only dual-TPU cards made by JeVois will work; the dual-TPU card made by Google requires a PCIe x2 link while JeVois-Pro only has PCIe x1).
    • The PreProc time includes resizing the input video (1024x576 RGB24) to the network's input size, and possibly swapping RGB/BGR order, NCHW/NHWC order, mean subtraction, normalization by scale factor and/or stdev, and quantization to the network's desired data type.
    • The Network inference time includes data transfer from main memory to device, on-chip inference, data transfer of outputs back to main memory, and possibly dequantization to float32.
    • The PostProc time includes decoding of network outputs (e.g., decoding YOLO boxes from raw YOLO layer outputs), and drawing results using OpenGL.
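The quantization step mentioned for PreProc, and the dequantization step mentioned for Network inference, can be sketched as follows. This is a minimal illustration that assumes an affine (scale and zero-point) scheme with unsigned 8-bit storage; the scheme actually used by a given network may differ, and the function names and parameter values are hypothetical.

```python
def quantize_u8(values, scale, zero_point):
    """Map float values to unsigned 8-bit integers (assumed affine scheme)."""
    quantized = []
    for v in values:
        q = round(v / scale) + zero_point
        quantized.append(max(0, min(255, q)))  # clamp to the uint8 range
    return quantized


def dequantize(qvalues, scale, zero_point):
    """Map quantized integers back to floats, as done on network outputs."""
    return [(q - zero_point) * scale for q in qvalues]


# Hypothetical parameters: floats in about [-1, 1) mapped onto 0..255.
scale, zero_point = 1.0 / 128.0, 128
q = quantize_u8([-1.0, 0.0, 0.5, 1.0], scale, zero_point)
print(q)
print(dequantize(q, scale, zero_point))
```

Note that the last input (1.0) falls just outside the representable range and is clamped, so it does not survive the round trip exactly. This kind of saturation is one reason quantization parameters are normally chosen per network.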

Benchmark results

Pipeline | Input | Output | PreProc | Network | PostProc | Total | FPS
NPU:Classify:Inception-V3 | 4D 1x299x299x3 8U | 2D 1x1001 32F | 1.0 +/- 0.0 ms | 21.0 +/- 0.0 ms | 132.5 +/- 16.1 us | 22.2 +/- 0.1 ms | 45.1 fps
NPU:Classify:MobileNet-V1 | 4D 1x224x224x3 8S | 2D 1x1001 32F | 778.2 +/- 41.9 us | 8.0 +/- 0.1 ms | 135.4 +/- 19.1 us | 8.9 +/- 0.1 ms | 112.7 fps
NPU:Detect:Yolo-Face-DFP | 4D 1x3x416x416 8S | 4D 1x30x13x13 32F | 3.3 +/- 0.8 ms | 7.1 +/- 0.1 ms | 249.3 +/- 97.0 us | 10.7 +/- 0.8 ms | 93.8 fps
NPU:Detect:YoloV3-Tiny-DFP | 4D 1x3x416x416 8S | 4D 1x255x13x13 32F, 4D 1x255x26x26 32F | 3.5 +/- 0.7 ms | 7.3 +/- 0.2 ms | 340.8 +/- 246.5 us | 11.1 +/- 0.8 ms | 90.1 fps
NPU:Detect:YoloV4-DFP | 4D 1x3x416x416 8S | 4D 1x255x52x52 32F, 4D 1x255x26x26 32F, 4D 1x255x13x13 32F | 3.8 +/- 0.6 ms | 164.7 +/- 0.3 ms | 671.9 +/- 137.9 us | 169.2 +/- 0.6 ms | 5.9 fps
NPU:Detect:YoloV3-DFP | 4D 1x3x416x416 8S | 4D 1x255x13x13 32F, 4D 1x255x26x26 32F, 4D 1x255x52x52 32F | 4.0 +/- 0.3 ms | 86.6 +/- 0.2 ms | 1.0 +/- 0.9 ms | 91.6 +/- 1.0 ms | 10.9 fps
NPU:Detect:YoloV2-DFP | 4D 1x3x416x416 8S | 4D 1x425x13x13 32F | 3.4 +/- 0.8 ms | 22.6 +/- 0.1 ms | 369.1 +/- 114.4 us | 26.4 +/- 0.9 ms | 37.8 fps
NPU:Detect:YoloV7-Tiny-AA | 4D 1x3x416x416 8U | 4D 1x255x52x52 32F, 4D 1x255x26x26 32F, 4D 1x255x13x13 32F | 2.7 +/- 0.8 ms | 18.4 +/- 0.2 ms | 538.1 +/- 97.4 us | 21.6 +/- 0.9 ms | 46.3 fps
NPU:Detect:YoloV7-Tiny-DFP | 4D 1x3x416x416 8S | 4D 1x255x52x52 32F, 4D 1x255x26x26 32F, 4D 1x255x13x13 32F | 4.0 +/- 0.2 ms | 44.1 +/- 0.2 ms | 691.1 +/- 179.9 us | 48.9 +/- 0.3 ms | 20.5 fps
NPU:Detect:yolov7-tiny-512x288 | 4D 1x3x288x512 8U | 4D 1x255x36x64 32F, 4D 1x255x18x32 32F, 4D 1x255x9x16 32F | 3.2 +/- 0.5 ms | 14.7 +/- 0.2 ms | 591.0 +/- 95.8 us | 18.6 +/- 0.6 ms | 53.9 fps
NPU:Detect:yolov7-tiny-1024x576 | 4D 1x3x576x1024 8U | 4D 1x255x72x128 32F, 4D 1x255x36x64 32F, 4D 1x255x18x32 32F | 1.2 +/- 0.1 ms | 62.0 +/- 0.5 ms | 1.0 +/- 0.7 ms | 64.2 +/- 0.9 ms | 15.6 fps
NPU:Detect:yolov2-coco | 4D 1x3x416x416 8U | 4D 1x425x13x13 32F | 3.1 +/- 0.7 ms | 22.7 +/- 0.0 ms | 382.9 +/- 112.1 us | 26.2 +/- 0.7 ms | 38.2 fps
NPU:Detect:yolov2-voc | 4D 1x3x416x416 8U | 4D 1x125x13x13 32F | 3.2 +/- 0.7 ms | 23.9 +/- 0.1 ms | 283.6 +/- 85.8 us | 27.4 +/- 0.7 ms | 36.5 fps
NPU:Detect:yolov3-tiny | 4D 1x3x416x416 8U | 4D 1x255x13x13 32F, 4D 1x255x26x26 32F | 3.0 +/- 0.8 ms | 7.7 +/- 0.2 ms | 405.9 +/- 423.2 us | 11.1 +/- 0.9 ms | 90.4 fps
NPU:Detect:yolov4-tiny | 4D 1x3x416x416 8U | 4D 1x255x13x13 32F, 4D 1x255x26x26 32F | 2.9 +/- 0.8 ms | 11.0 +/- 0.1 ms | 424.9 +/- 134.8 us | 14.3 +/- 0.8 ms | 69.7 fps
NPU:Detect:yolov3-spp | 4D 1x3x608x608 8U | 4D 1x255x19x19 32F, 4D 1x255x38x38 32F, 4D 1x255x76x76 32F | 4.0 +/- 1.1 ms | 197.0 +/- 1.1 ms | 931.1 +/- 247.5 us | 201.9 +/- 1.7 ms | 5.0 fps
NPU:Detect:yolov4-csp-x-swish | 4D 1x3x640x640 8U | 4D 1x255x80x80 32F, 4D 1x255x40x40 32F, 4D 1x255x20x20 32F | 4.1 +/- 1.6 ms | 307.0 +/- 1.6 ms | 1.1 +/- 1.8 ms | 312.2 +/- 2.6 ms | 3.2 fps
NPU:Python:yolov7-tiny-512x288-PyPost | 4D 1x3x288x512 8U | 4D 1x255x36x64 32F, 4D 1x255x18x32 32F, 4D 1x255x9x16 32F | 2.8 +/- 0.9 ms | 14.0 +/- 0.1 ms | 3.0 +/- 0.5 ms | 19.8 +/- 1.2 ms | 50.4 fps
NPUX:Classify:ResNet-50-int8 | 4D 1x3x224x224 32F | 2D 1x1000 32F | 3.8 +/- 0.1 ms | 556.0 +/- 1.8 ms | 170.3 +/- 5.0 us | 560.0 +/- 1.8 ms | 1.8 fps
NPUX:Segment:PP-HumanSeg | 4D 1x3x192x192 8U | 4D 1x2x192x192 32F | 589.5 +/- 34.5 us | 26.8 +/- 0.2 ms | 944.0 +/- 66.2 us | 28.3 +/- 0.3 ms | 35.3 fps
NPUX:YuNet:YuNet-Face-512x288 | 4D 1x3x288x512 8U | 2D 8448x2 32F, 2D 8448x1 32F, 2D 8448x14 32F | 1.6 +/- 0.4 ms | 16.1 +/- 0.5 ms | 222.3 +/- 75.4 us | 17.9 +/- 0.6 ms | 55.9 fps
NPUX:YuNet:YuNet-Face-768x432 | 4D 1x3x432x768 8U | 2D 18984x2 32F, 2D 18984x1 32F, 2D 18984x14 32F | 2.7 +/- 0.9 ms | 35.0 +/- 1.7 ms | 515.9 +/- 189.9 us | 38.2 +/- 2.1 ms | 26.2 fps
SPU:Classify:ResNext-50-32x4d | 4D 1x224x224x3 8U | 4D 1x1000x1x1 32F | 721.5 +/- 56.7 us | 9.2 +/- 1.0 ms | 163.2 +/- 17.5 us | 10.1 +/- 1.0 ms | 99.4 fps
SPU:Classify:EfficientNet-Large | 4D 1x300x300x3 8U | 4D 1x1001x1x1 32F | 2.1 +/- 0.1 ms | 23.6 +/- 0.3 ms | 132.5 +/- 17.3 us | 25.8 +/- 0.3 ms | 38.7 fps
SPU:Classify:EfficientNet-Medium | 4D 1x240x240x3 8U | 4D 1x1001x1x1 32F | 1.4 +/- 0.0 ms | 6.2 +/- 1.4 ms | 130.1 +/- 15.3 us | 7.7 +/- 1.4 ms | 129.0 fps
SPU:Classify:EfficientNet-Small | 4D 1x224x224x3 8U | 4D 1x1001x1x1 32F | 1.3 +/- 0.1 ms | 5.0 +/- 1.7 ms | 128.4 +/- 15.0 us | 6.4 +/- 1.7 ms | 155.7 fps
SPU:Classify:EfficientNet-Lite4 | 4D 1x300x300x3 8U | 4D 1x1000x1x1 32F | 2.1 +/- 0.1 ms | 27.6 +/- 0.2 ms | 133.8 +/- 38.4 us | 29.8 +/- 0.3 ms | 33.6 fps
SPU:Classify:Hardnet68 | 4D 1x224x224x3 8U | 4D 1x1000x1x1 32F | 657.7 +/- 40.5 us | 38.3 +/- 0.1 ms | 167.3 +/- 17.3 us | 39.1 +/- 0.1 ms | 25.6 fps
SPU:Classify:Inception-v1 | 4D 1x224x224x3 8U | 4D 1x1001x1x1 32F | 663.2 +/- 40.4 us | 3.1 +/- 1.9 ms | 129.4 +/- 15.4 us | 3.9 +/- 1.9 ms | 255.5 fps
SPU:Classify:MobileNetV3 | 4D 1x224x224x3 8U | 4D 1x1001x1x1 32F | 664.5 +/- 35.0 us | 3.7 +/- 2.0 ms | 129.3 +/- 14.3 us | 4.5 +/- 2.0 ms | 223.4 fps
SPU:Classify:ResNet-V1-50 | 4D 1x224x224x3 8U | 4D 1x1000x1x1 32F | 664.8 +/- 35.1 us | 3.8 +/- 1.9 ms | 129.5 +/- 17.5 us | 4.6 +/- 1.9 ms | 216.5 fps
SPU:Classify:ResNet-V2-34 | 4D 1x224x224x3 8U | 4D 1x1000x1x1 32F | 663.2 +/- 41.1 us | 4.7 +/- 1.9 ms | 171.2 +/- 21.2 us | 5.5 +/- 1.9 ms | 180.8 fps
SPU:Classify:SqueezeNet | 4D 1x224x224x3 8U | 4D 1x1000x1x1 32F | 666.0 +/- 36.8 us | 1.8 +/- 2.3 ms | 131.5 +/- 21.1 us | 2.6 +/- 2.3 ms | 383.4 fps
SPU:Classify:ViT-Base | 4D 1x224x224x3 8U | 4D 1x1000x1x1 32F | 669.6 +/- 32.1 us | 171.1 +/- 6.2 ms | 165.9 +/- 16.8 us | 172.0 +/- 6.2 ms | 5.8 fps
SPU:Classify:ViT-Tiny | 4D 1x224x224x3 8U | 4D 1x1000x1x1 32F | 647.1 +/- 44.8 us | 11.0 +/- 0.5 ms | 167.7 +/- 19.3 us | 11.8 +/- 0.5 ms | 84.6 fps
SPU:Detect:YOLOv5m | 4D 1x640x640x3 8U | 4D 1x80x80x255 32F, 4D 1x40x40x255 32F, 4D 1x20x20x255 32F | 3.1 +/- 1.2 ms | 19.9 +/- 0.8 ms | 1.9 +/- 0.4 ms | 25.0 +/- 1.5 ms | 40.0 fps
SPU:Detect:YOLOv3-tiny | 4D 1x416x416x3 8U | 4D 1x13x13x255 32F, 4D 1x26x26x255 32F | 2.7 +/- 0.7 ms | 4.6 +/- 2.0 ms | 565.0 +/- 729.8 us | 7.8 +/- 2.2 ms | 127.6 fps
SPU:Detect:YOLOv5s | 4D 1x640x640x3 8U | 4D 1x80x80x255 32F, 4D 1x40x40x255 32F, 4D 1x20x20x255 32F | 3.4 +/- 1.3 ms | 18.2 +/- 1.1 ms | 2.1 +/- 1.8 ms | 23.7 +/- 2.5 ms | 42.2 fps
SPU:Detect:YOLOv5xs | 4D 1x512x512x3 8U | 4D 1x64x64x255 32F, 4D 1x32x32x255 32F, 4D 1x16x16x255 32F | 2.9 +/- 1.4 ms | 12.5 +/- 1.1 ms | 1.6 +/- 0.6 ms | 17.0 +/- 1.8 ms | 58.7 fps
SPU:Detect:YOLOv7 | 4D 1x640x640x3 8U | 4D 1x80x80x255 32F, 4D 1x40x40x255 32F, 4D 1x20x20x255 32F | 3.8 +/- 1.4 ms | 72.2 +/- 1.0 ms | 1.7 +/- 0.7 ms | 77.7 +/- 2.0 ms | 12.9 fps
SPU:Detect:YOLOv7-tiny | 4D 1x640x640x3 8U | 4D 1x80x80x255 32F, 4D 1x40x40x255 32F, 4D 1x20x20x255 32F | 3.2 +/- 1.2 ms | 17.5 +/- 0.8 ms | 1.6 +/- 1.0 ms | 22.3 +/- 1.7 ms | 44.9 fps
SPU:Segment:DeepLabV3-MobileNetV2 | 4D 1x513x513x3 8U | 4D 1x513x513x21 8U | 2.2 +/- 1.2 ms | 45.2 +/- 1.3 ms | 17.3 +/- 4.5 ms | 64.6 +/- 5.1 ms | 15.5 fps
SPU:Segment:DeepLabV3-MobileNetV2-NoDilation | 4D 1x513x513x3 8U | 4D 1x513x513x1 8U | 2.9 +/- 1.5 ms | 13.9 +/- 0.2 ms | 1.1 +/- 0.2 ms | 17.9 +/- 1.6 ms | 55.9 fps
SPU:Python:FastDepth | 4D 1x224x224x3 8U | 4D 1x224x224x1 32F | 650.7 +/- 52.9 us | 5.4 +/- 1.6 ms | 2.0 +/- 0.1 ms | 8.0 +/- 1.6 ms | 124.7 fps
OpenCV:Classify:SqueezeNet | 4D 1x3x227x227 32F | 4D 1x1000x1x1 32F | 991.0 +/- 224.7 us | 35.2 +/- 1.3 ms | 116.1 +/- 39.2 us | 36.3 +/- 1.4 ms | 27.6 fps
Python:Python:SqueezeNet | 4D 1x3x227x227 32F | 2D 1000x1 32F | 3.7 +/- 0.4 ms | 36.9 +/- 1.7 ms | 336.1 +/- 9.7 us | 41.0 +/- 1.7 ms | 24.4 fps
OpenCV:Classify:Inception-V3 | 4D 1x3x299x299 32F | 2D 1x1001 32F | 3.1 +/- 0.3 ms | 514.0 +/- 6.1 ms | 145.8 +/- 30.3 us | 517.3 +/- 6.2 ms | 1.9 fps
OpenCV:Classify:GoogleNet | 4D 1x3x224x224 32F | 2D 1x1000 32F | 1.6 +/- 0.0 ms | 139.8 +/- 2.7 ms | 144.0 +/- 24.3 us | 141.5 +/- 2.7 ms | 7.1 fps
OpenCV:Classify:ResNet-50-int8 | 4D 1x3x224x224 32F | 2D 1x1000 32F | 3.6 +/- 0.3 ms | 265.3 +/- 4.5 ms | 171.7 +/- 31.6 us | 269.1 +/- 4.5 ms | 3.7 fps
OpenCV:Detect:YoloV3-Tiny | 4D 1x3x416x416 32F | 2D 507x85 32F, 2D 2028x85 32F | 3.7 +/- 0.4 ms | 207.2 +/- 6.2 ms | 821.2 +/- 275.3 us | 211.8 +/- 6.4 ms | 4.7 fps
OpenCV:Detect:YoloV2-Tiny-VOC | 4D 1x3x416x416 32F | 2D 845x25 32F | 3.6 +/- 0.6 ms | 288.9 +/- 11.9 ms | 177.2 +/- 81.7 us | 292.7 +/- 11.9 ms | 3.4 fps
OpenCV:Detect:OpenCV-Face | 4D 1x3x300x300 32F | 4D 1x1x200x7 32F | 2.3 +/- 0.4 ms | 102.2 +/- 2.7 ms | 20.3 +/- 3.6 us | 104.6 +/- 3.0 ms | 9.6 fps
OpenCV:Detect:YOLOv3 | 4D 1x3x416x416 32F | 2D 507x85 32F, 2D 2028x85 32F, 2D 8112x85 32F | 3.9 +/- 0.5 ms | 2.0 +/- 0.0 s | 3.8 +/- 1.1 ms | 2.0 +/- 0.0 s | 0.5 fps
OpenCV:Detect:MobileNet-SSD-VOC | 4D 1x3x300x300 32F | 4D 1x1x100x7 32F | 3.2 +/- 0.2 ms | 123.0 +/- 2.5 ms | 57.4 +/- 12.4 us | 126.2 +/- 2.5 ms | 7.9 fps
OpenCV:Detect:YOLOv7-Tiny | 4D 1x3x256x480 32F | 3D 1x7560x85 32F | 3.1 +/- 0.2 ms | 210.3 +/- 4.6 ms | 2.0 +/- 0.2 ms | 215.4 +/- 4.6 ms | 4.6 fps
OpenCV:Segment:ENet-CityScapes | 4D 1x3x256x512 8U | 4D 1x20x256x512 32F | 1.0 +/- 0.1 ms | 243.3 +/- 3.7 ms | 53.0 +/- 16.1 ms | 297.3 +/- 16.0 ms | 3.4 fps
OpenCV:Segment:DeepLabV3-CPU | 4D 1x3x513x513 32F | 4D 1x21x513x513 32F | 7.2 +/- 0.8 ms | 890.5 +/- 5.9 ms | 99.8 +/- 5.4 ms | 997.5 +/- 8.9 ms | 1.0 fps
OpenCV:Segment:Skin-Clothes-Hair-DeepLab | 4D 1x3x512x512 32F | 4D 1x3x512x512 32F | 15.7 +/- 2.3 ms | 952.2 +/- 4.9 ms | 6.0 +/- 3.1 ms | 973.9 +/- 6.4 ms | 1.0 fps
OpenCV:Segment:Skin-Clothes-Hair-PAN | 4D 1x3x512x512 32F | 4D 1x3x512x512 32F | 16.8 +/- 1.3 ms | 285.7 +/- 4.9 ms | 7.0 +/- 2.0 ms | 309.6 +/- 6.1 ms | 3.2 fps
OpenCV:Segment:Skin-Clothes-Hair-UNet | 4D 1x3x512x512 32F | 4D 1x3x512x512 32F | 17.3 +/- 0.2 ms | 1.1 +/- 0.0 s | 6.3 +/- 2.1 ms | 1.1 +/- 0.0 s | 0.9 fps
OpenCV:YuNet:YuNet-Face-512x288 | 4D 1x3x288x512 8U | 2D 8448x2 32F, 2D 8448x1 32F, 2D 8448x14 32F | 1.1 +/- 0.1 ms | 40.4 +/- 1.5 ms | 230.6 +/- 33.7 us | 41.7 +/- 1.5 ms | 24.0 fps
OpenCV:Python:FastDepth | 4D 1x3x224x224 32F | 4D 1x1x224x224 32F | 1.9 +/- 0.2 ms | 61.5 +/- 1.6 ms | 2.3 +/- 0.2 ms | 65.6 +/- 1.5 ms | 15.2 fps
TPU:Classify:MobileNetV3-1.0-224 | 4D 1x224x224x3 8U | 2D 1x1001 32F | 817.3 +/- 36.8 us | 2.9 +/- 0.0 ms | 135.3 +/- 15.9 us | 3.9 +/- 0.0 ms | 257.4 fps
TPU:Classify:MobileNetV2-1.0-224 | 4D 1x224x224x3 8U | 2D 1x1001 32F | 814.0 +/- 42.7 us | 2.4 +/- 0.0 ms | 135.0 +/- 15.3 us | 3.3 +/- 0.1 ms | 299.0 fps
TPU:Classify:MobileNetV1-1.0-224 | 4D 1x224x224x3 8U | 2D 1x1001 32F | 815.1 +/- 39.5 us | 2.2 +/- 0.0 ms | 135.8 +/- 17.1 us | 3.1 +/- 0.0 ms | 321.9 fps
TPU:Classify:EfficientNet-L | 4D 1x300x300x3 8U | 2D 1x1001 32F | 1.3 +/- 0.1 ms | 27.5 +/- 0.0 ms | 128.8 +/- 10.9 us | 28.9 +/- 0.1 ms | 34.6 fps
TPU:Classify:EfficientNet-M | 4D 1x240x240x3 8U | 2D 1x1001 32F | 887.3 +/- 48.6 us | 9.6 +/- 0.0 ms | 134.7 +/- 13.6 us | 10.6 +/- 0.1 ms | 94.4 fps
TPU:Classify:EfficientNet-S | 4D 1x224x224x3 8U | 2D 1x1001 32F | 817.2 +/- 38.5 us | 4.8 +/- 0.0 ms | 133.4 +/- 19.8 us | 5.7 +/- 0.1 ms | 174.1 fps
TPU:Classify:MobileNetV1-1.0-224-TF2 | 4D 1x224x224x3 8U | 2D 1x1001 32F | 819.4 +/- 42.4 us | 2.2 +/- 0.0 ms | 137.2 +/- 17.3 us | 3.1 +/- 0.0 ms | 318.1 fps
TPU:Classify:MobileNetV2-1.0-224-TF2 | 4D 1x224x224x3 8U | 2D 1x1001 32F | 817.9 +/- 40.3 us | 2.4 +/- 0.0 ms | 136.2 +/- 15.7 us | 3.4 +/- 0.0 ms | 295.1 fps
TPU:Classify:MobileNetV3-1.0-224-TF2 | 4D 1x224x224x3 8U | 2D 1x1001 32F | 819.9 +/- 42.9 us | 2.9 +/- 0.0 ms | 136.0 +/- 17.3 us | 3.9 +/- 0.0 ms | 257.2 fps
TPU:Classify:MobileNetV2-iNat-Insects | 4D 1x224x224x3 8U | 2D 1x1022 32F | 668.1 +/- 44.4 us | 2.4 +/- 0.0 ms | 144.4 +/- 18.4 us | 3.2 +/- 0.1 ms | 311.4 fps
TPU:Classify:MobileNetV2-iNat-Plants | 4D 1x224x224x3 8U | 2D 1x2102 32F | 668.9 +/- 39.4 us | 2.5 +/- 0.0 ms | 264.2 +/- 22.6 us | 3.5 +/- 0.1 ms | 289.1 fps
TPU:Classify:MobileNetV2-iNat-Birds | 4D 1x224x224x3 8U | 2D 1x965 32F | 668.9 +/- 37.6 us | 2.4 +/- 0.0 ms | 137.6 +/- 15.9 us | 3.2 +/- 0.0 ms | 315.4 fps
TPU:Classify:Inception-V1 | 4D 1x224x224x3 8U | 2D 1x1001 32F | 817.4 +/- 39.7 us | 3.2 +/- 0.0 ms | 135.8 +/- 16.5 us | 4.2 +/- 0.1 ms | 238.2 fps
TPU:Classify:Inception-V2 | 4D 1x224x224x3 8U | 2D 1x1001 32F | 815.0 +/- 37.7 us | 15.1 +/- 0.0 ms | 142.4 +/- 17.3 us | 16.0 +/- 0.0 ms | 62.4 fps
TPU:Classify:Inception-V3 | 4D 1x299x299x3 8U | 2D 1x1001 32F | 1.3 +/- 0.1 ms | 45.5 +/- 0.0 ms | 140.0 +/- 16.0 us | 46.9 +/- 0.1 ms | 21.3 fps
TPU:Classify:Inception-V4 | 4D 1x299x299x3 8U | 2D 1x1001 32F | 1.3 +/- 0.0 ms | 90.5 +/- 0.0 ms | 141.7 +/- 17.0 us | 91.9 +/- 0.1 ms | 10.9 fps
TPU:Classify:Resnet-50 | 4D 1x224x224x3 8U | 2D 1x1001 32F | 815.5 +/- 47.8 us | 45.7 +/- 0.0 ms | 136.3 +/- 14.5 us | 46.6 +/- 0.1 ms | 21.4 fps
TPU:Classify:Popular-US-Products | 4D 1x224x224x3 8U | 2D 1x100000 32F | 655.7 +/- 35.4 us | 7.6 +/- 0.0 ms | 11.2 +/- 0.4 ms | 19.4 +/- 0.4 ms | 51.4 fps
TPU:Detect:MobileDetSSD-Coco | 4D 1x320x320x3 8U | 3D 1x10x4 32F, 2D 1x10 32F, 2D 1x10 32F, 2D 1x1 32F | 2.8 +/- 0.2 ms | 8.3 +/- 0.2 ms | 108.0 +/- 17.5 us | 11.2 +/- 0.3 ms | 89.3 fps
TPU:Detect:MobileNetSSDv2-face | 4D 1x320x320x3 8U | 3D 1x50x4 32F, 2D 1x50 32F, 2D 1x50 32F, 2D 1x1 32F | 2.8 +/- 0.2 ms | 12.2 +/- 0.1 ms | 43.5 +/- 10.9 us | 15.1 +/- 0.2 ms | 66.3 fps
TPU:Detect:MobileNetSSDv2-Coco | 4D 1x300x300x3 8U | 3D 1x20x4 32F, 2D 1x20 32F, 2D 1x20 32F, 2D 1x1 32F | 993.5 +/- 44.6 us | 8.3 +/- 0.2 ms | 124.1 +/- 12.8 us | 9.4 +/- 0.2 ms | 105.9 fps
TPU:Detect:MobileNetSSDv1-Coco | 4D 1x300x300x3 8U | 3D 1x20x4 32F, 2D 1x20 32F, 2D 1x20 32F, 2D 1x1 32F | 1.0 +/- 0.1 ms | 49.7 +/- 0.8 ms | 127.3 +/- 13.8 us | 50.9 +/- 0.8 ms | 19.7 fps
TPU:Detect:EfficientDetLite0-Coco | 4D 1x320x320x3 8U | 3D 1x25x4 32F, 2D 1x25 32F, 2D 1x25 32F, 2D 1x1 32F | 2.7 +/- 0.5 ms | 52.0 +/- 0.3 ms | 127.7 +/- 13.5 us | 54.9 +/- 0.5 ms | 18.2 fps
TPU:Detect:EfficientDetLite1-Coco | 4D 1x384x384x3 8U | 3D 1x25x4 32F, 2D 1x25 32F, 2D 1x25 32F, 2D 1x1 32F | 2.9 +/- 0.5 ms | 76.0 +/- 0.4 ms | 130.9 +/- 33.0 us | 79.0 +/- 0.5 ms | 12.7 fps
TPU:Detect:EfficientDetLite2-Coco | 4D 1x448x448x3 8U | 3D 1x25x4 32F, 2D 1x25 32F, 2D 1x25 32F, 2D 1x1 32F | 3.4 +/- 0.6 ms | 118.6 +/- 0.6 ms | 154.9 +/- 41.4 us | 122.1 +/- 0.6 ms | 8.2 fps
TPU:Detect:EfficientDetLite3-Coco | 4D 1x512x512x3 8U | 3D 1x25x4 32F, 2D 1x25 32F, 2D 1x25 32F, 2D 1x1 32F | 2.3 +/- 1.7 ms | 133.2 +/- 2.1 ms | 101.8 +/- 45.9 us | 135.6 +/- 3.0 ms | 7.4 fps
TPU:Detect:EfficientDetLite3x-Coco | 4D 1x640x640x3 8U | 3D 1x25x4 32F, 2D 1x25 32F, 2D 1x25 32F, 2D 1x1 32F | 3.7 +/- 1.8 ms | 339.0 +/- 11.6 ms | 110.0 +/- 44.4 us | 342.9 +/- 11.8 ms | 2.9 fps
TPU:Segment:UNet-MobileNetV2-Pets-128 | 4D 1x128x128x3 8U | 4D 1x128x128x3 8U | 339.6 +/- 24.2 us | 3.1 +/- 0.0 ms | 313.1 +/- 19.7 us | 3.8 +/- 0.0 ms | 264.8 fps
TPU:Segment:UNet-MobileNetV2-Pets-256 | 4D 1x256x256x3 8U | 4D 1x256x256x3 8U | 800.1 +/- 52.1 us | 15.2 +/- 0.0 ms | 1.2 +/- 0.1 ms | 17.2 +/- 0.1 ms | 58.2 fps
TPU:Segment:DeepLabV3-dm0.5 | 4D 1x513x513x3 8U | 3D 1x513x513 32S | 3.0 +/- 1.5 ms | 87.8 +/- 0.1 ms | 1.1 +/- 0.2 ms | 92.0 +/- 1.5 ms | 10.9 fps
TPU:Segment:DeepLabV3-dm1.0 | 4D 1x513x513x3 8U | 3D 1x513x513 32S | 2.5 +/- 1.2 ms | 93.8 +/- 0.2 ms | 1.1 +/- 0.1 ms | 97.4 +/- 1.2 ms | 10.3 fps
TPU:Segment:DeepLab-slim | 4D 1x513x513x3 8U | 3D 1x513x513 32S | 2.9 +/- 1.3 ms | 92.6 +/- 0.2 ms | 1.5 +/- 0.2 ms | 97.0 +/- 1.3 ms | 10.3 fps
VPU:Classify:Inception-V3 | 4D 1x3x299x299 32F | 2D 1x1001 32F | 3.2 +/- 0.1 ms | 122.6 +/- 0.2 ms | 143.4 +/- 24.3 us | 125.9 +/- 0.2 ms | 7.9 fps
VPU:Detect:face-detection-retail-0004 | 4D 1x3x300x300 8U | 4D 1x1x200x7 32F | 1.1 +/- 0.1 ms | 27.6 +/- 0.1 ms | 40.6 +/- 16.0 us | 28.8 +/- 0.1 ms | 34.7 fps
VPU:Detect:face-detection-adas-0001 | 4D 1x3x384x672 8U | 4D 1x1x200x7 32F | 4.1 +/- 1.4 ms | 108.8 +/- 0.2 ms | 23.5 +/- 6.0 us | 112.9 +/- 1.4 ms | 8.9 fps
VPU:Detect:person-detection-retail-0013 | 4D 1x3x320x544 8U | 4D 1x1x200x7 32F | 3.6 +/- 0.3 ms | 156.7 +/- 0.1 ms | 55.3 +/- 29.9 us | 160.4 +/- 0.3 ms | 6.2 fps
VPU:Detect:pedestrian-detection-adas-0002 | 4D 1x3x384x672 8U | 4D 1x1x200x7 32F | 4.0 +/- 1.3 ms | 113.3 +/- 0.1 ms | 23.6 +/- 6.4 us | 117.3 +/- 1.3 ms | 8.5 fps
VPU:Detect:vehicle-detection-adas-0002 | 4D 1x3x384x672 8U | 4D 1x1x200x7 32F | 4.1 +/- 1.3 ms | 108.5 +/- 0.1 ms | 78.2 +/- 32.4 us | 112.6 +/- 1.3 ms | 8.9 fps
VPU:Detect:pedestrian-and-vehicle-detector-adas-0001 | 4D 1x3x384x672 8U | 4D 1x1x200x7 32F | 4.1 +/- 1.3 ms | 127.7 +/- 0.1 ms | 28.0 +/- 39.6 us | 131.8 +/- 1.3 ms | 7.6 fps
VPU:Detect:product-detection-0001 | 4D 1x3x512x512 8U | 4D 1x1x200x7 32F | 4.2 +/- 1.2 ms | 152.7 +/- 0.2 ms | 75.3 +/- 62.9 us | 157.0 +/- 1.3 ms | 6.4 fps
VPU:Detect:YoloV5s | 4D 1x3x640x640 8U | 4D 1x255x80x80 32F, 4D 1x255x40x40 32F, 4D 1x255x20x20 32F | 4.6 +/- 1.3 ms | 523.9 +/- 2.9 ms | 1.7 +/- 1.8 ms | 530.2 +/- 3.9 ms | 1.9 fps
VPU:Segment:road-segmentation-adas-0001 | 4D 1x3x512x896 8U | 4D 1x4x512x896 32F | 4.3 +/- 1.1 ms | 655.0 +/- 6.4 ms | 18.0 +/- 4.1 ms | 677.3 +/- 7.7 ms | 1.5 fps
ORT:Detect:YOLOv7-Tiny | 4D 1x3x256x480 32F | 3D 1x7560x85 32F | 4.0 +/- 0.5 ms | 175.7 +/- 1.7 ms | 2.7 +/- 0.3 ms | 182.4 +/- 1.9 ms | 5.5 fps
ORT:Python:DamoYOLO-tinynasL20_T-320x192 | 4D 1x3x192x320 32F | 3D 1x1260x80 32F, 3D 1x1260x4 32F | 1.3 +/- 0.3 ms | 117.3 +/- 0.8 ms | 7.6 +/- 2.0 ms | 126.2 +/- 2.1 ms | 7.9 fps
Python:Python:DamoYOLO-tinynasL20_T-320x192-Python | 4D 1x3x192x320 32F | 2D 1x1260 32FC80, 2D 1x1260 32FC4 | 1.5 +/- 0.2 ms | 135.4 +/- 2.5 ms | 3.6 +/- 0.4 ms | 140.5 +/- 2.5 ms | 7.1 fps
ORT:Python:DamoYOLO-tinynasL20_T-480x288 | 4D 1x3x288x480 32F | 3D 1x2835x80 32F, 3D 1x2835x4 32F | 2.5 +/- 0.2 ms | 253.9 +/- 1.4 ms | 10.1 +/- 2.9 ms | 266.5 +/- 4.0 ms | 3.8 fps
ORT:Python:DamoYOLO-tinynasL25_S-320x192 | 4D 1x3x192x320 32F | 3D 1x1260x80 32F, 3D 1x1260x4 32F | 962.0 +/- 217.8 us | 222.9 +/- 0.6 ms | 8.9 +/- 0.7 ms | 232.8 +/- 1.0 ms | 4.3 fps
ORT:Python:DamoYOLO-tinynasL25_S-480x288 | 4D 1x3x288x480 32F | 3D 1x2835x80 32F, 3D 1x2835x4 32F | 2.5 +/- 0.1 ms | 488.9 +/- 2.1 ms | 11.0 +/- 3.0 ms | 502.4 +/- 4.6 ms | 2.0 fps
ORT:Detect:YOLOv7-Tiny | 4D 1x3x256x480 32F | 3D 1x7560x85 32F | 3.8 +/- 0.5 ms | 178.7 +/- 1.1 ms | 2.6 +/- 0.4 ms | 185.2 +/- 1.2 ms | 5.4 fps
ORT:Python:DamoYOLO-tinynasL20_T-320x192 | 4D 1x3x192x320 32F | 3D 1x1260x80 32F, 3D 1x1260x4 32F | 1.4 +/- 0.3 ms | 117.2 +/- 0.5 ms | 6.6 +/- 1.4 ms | 125.2 +/- 1.8 ms | 8.0 fps
ORT:Python:DamoYOLO-tinynasL20_T-480x288 | 4D 1x3x288x480 32F | 3D 1x2835x80 32F, 3D 1x2835x4 32F | 2.5 +/- 0.2 ms | 256.7 +/- 2.7 ms | 10.7 +/- 2.3 ms | 269.9 +/- 4.5 ms | 3.7 fps
ORT:Python:DamoYOLO-tinynasL25_S-320x192 | 4D 1x3x192x320 32F | 3D 1x1260x80 32F, 3D 1x1260x4 32F | 1.0 +/- 0.3 ms | 222.5 +/- 0.4 ms | 6.3 +/- 0.6 ms | 229.8 +/- 0.9 ms | 4.4 fps
ORT:Python:DamoYOLO-tinynasL25_S-480x288 | 4D 1x3x288x480 32F | 3D 1x2835x80 32F, 3D 1x2835x4 32F | 3.1 +/- 0.9 ms | 484.5 +/- 2.0 ms | 9.3 +/- 2.8 ms | 496.9 +/- 3.0 ms | 2.0 fps
ORT:Python:DamoYOLO-tinynasL35_M-320x192 | 4D 1x3x192x320 32F | 3D 1x1260x80 32F, 3D 1x1260x4 32F | 1.5 +/- 0.1 ms | 367.7 +/- 0.8 ms | 6.6 +/- 0.2 ms | 375.9 +/- 0.9 ms | 2.7 fps
ORT:Python:DamoYOLO-tinynasL35_M-480x288 | 4D 1x3x288x480 32F | 3D 1x2835x80 32F, 3D 1x2835x4 32F | 3.4 +/- 0.6 ms | 782.2 +/- 1.5 ms | 12.0 +/- 1.3 ms | 797.6 +/- 1.4 ms | 1.3 fps
ORT:Segment:Skin-Clothes-Hair-DeepLab | 4D 1x3x512x512 32F | 4D 1x3x512x512 32F | 16.5 +/- 2.8 ms | 401.9 +/- 1.7 ms | 10.0 +/- 4.5 ms | 428.4 +/- 5.8 ms | 2.3 fps
ORT:Segment:Skin-Clothes-Hair-PAN | 4D 1x3x512x512 32F | 4D 1x3x512x512 32F | 15.3 +/- 1.8 ms | 206.4 +/- 1.5 ms | 7.3 +/- 4.7 ms | 229.0 +/- 6.0 ms | 4.4 fps
ORT:Segment:Skin-Clothes-Hair-UNet | 4D 1x3x512x512 32F | 4D 1x3x512x512 32F | 17.6 +/- 2.0 ms | 1.2 +/- 0.0 s | 10.4 +/- 4.5 ms | 1.2 +/- 0.0 s | 0.8 fps
ORT:Segment:LaneSOD | 4D 1x3x192x320 32F | 4D 1x1x192x320 32F | 4.7 +/- 0.1 ms | 2.3 +/- 0.0 s | 907.9 +/- 42.4 us | 2.3 +/- 0.0 s | 0.4 fps
ORT:Python:URetinex-Net | 4D 1x3x180x320 32F | 4D 1x3x180x320 32F | 1.3 +/- 0.3 ms | 3.0 +/- 0.0 s | 6.4 +/- 0.4 ms | 3.0 +/- 0.0 s | 0.3 fps
ORT:Python:FastDepth | 4D 1x3x224x224 32F | 4D 1x1x224x224 32F | 1.9 +/- 0.1 ms | 74.2 +/- 0.3 ms | 2.6 +/- 0.3 ms | 78.8 +/- 0.4 ms | 12.7 fps
VPUX:Classify:Inception-V3 | 4D 1x3x299x299 32F | 2D 1x1001 32F | 3.3 +/- 0.1 ms | 618.1 +/- 1.7 ms | 142.3 +/- 19.2 us | 621.6 +/- 1.7 ms | 1.6 fps
VPUX:Detect:face-detection-retail-0004 | 4D 1x3x300x300 8U | 4D 1x1x200x7 32F | 1.1 +/- 0.0 ms | 110.6 +/- 0.3 ms | 22.1 +/- 1.4 us | 111.8 +/- 0.3 ms | 8.9 fps
VPUX:Detect:face-detection-adas-0001 | 4D 1x3x384x672 8U | 4D 1x1x200x7 32F | 2.7 +/- 0.5 ms | 547.9 +/- 1.7 ms | 34.9 +/- 22.5 us | 550.7 +/- 1.8 ms | 1.8 fps
VPUX:Detect:person-detection-retail-0013 | 4D 1x3x320x544 8U | 4D 1x1x200x7 32F | 1.4 +/- 0.8 ms | 633.8 +/- 1.2 ms | 23.9 +/- 10.5 us | 635.3 +/- 1.4 ms | 1.6 fps
VPUX:Detect:pedestrian-detection-adas-0002 | 4D 1x3x384x672 8U | 4D 1x1x200x7 32F | 2.6 +/- 0.4 ms | 556.4 +/- 1.0 ms | 83.4 +/- 81.1 us | 559.1 +/- 1.1 ms | 1.8 fps
VPUX:Detect:vehicle-detection-adas-0002 | 4D 1x3x384x672 8U | 4D 1x1x200x7 32F | 2.9 +/- 0.6 ms | 550.9 +/- 0.5 ms | 35.8 +/- 17.3 us | 553.8 +/- 0.7 ms | 1.8 fps
VPUX:Detect:pedestrian-and-vehicle-detector-adas-0001 | 4D 1x3x384x672 8U | 4D 1x1x200x7 32F | 2.6 +/- 0.5 ms | 654.7 +/- 1.4 ms | 81.9 +/- 123.0 us | 657.4 +/- 1.5 ms | 1.5 fps
VPUX:Detect:product-detection-0001 | 4D 1x3x512x512 8U | 4D 1x1x200x7 32F | 3.0 +/- 0.0 ms | 4.0 +/- 0.0 s | 150.2 +/- 170.1 us | 4.0 +/- 0.0 s | 0.2 fps
VPUX:Detect:YoloV5s | 4D 1x3x640x640 8U | 4D 1x255x80x80 32F, 4D 1x255x40x40 32F, 4D 1x255x20x20 32F | 3.9 +/- 1.3 ms | 1.3 +/- 0.0 s | 625.9 +/- 24.0 us | 1.3 +/- 0.0 s | 0.8 fps
VPUX:Segment:road-segmentation-adas-0001 | 4D 1x3x512x896 8U | 4D 1x4x512x896 32F | 3.2 +/- 1.1 ms | 2.4 +/- 0.0 s | 9.5 +/- 4.2 ms | 2.4 +/- 0.0 s | 0.4 fps
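When comparing rows, it can help to convert between the Total time per frame and the frame rate, and to compute speed-ups between backends. The sketch below does this for the three Inception-V3 totals copied from the table (NPU, TPU, and OpenCV on CPU). The helper names are hypothetical, and because the published times are rounded to one decimal, a recomputed frame rate can differ slightly from the FPS column.

```python
def fps_from_ms(total_ms):
    """Frames per second for a given per-frame total time in milliseconds."""
    return 1000.0 / total_ms


def speedup(slow_ms, fast_ms):
    """How many times faster the second pipeline is than the first."""
    return slow_ms / fast_ms


# Total times (ms) for Inception-V3, copied from the table above.
totals_ms = {
    "NPU": 22.2,
    "TPU": 46.9,
    "OpenCV": 517.3,
}

for backend, total in totals_ms.items():
    print(f"{backend}: {fps_from_ms(total):.1f} fps")

print(f"NPU vs OpenCV: {speedup(totals_ms['OpenCV'], totals_ms['NPU']):.1f}x")
print(f"NPU vs TPU: {speedup(totals_ms['TPU'], totals_ms['NPU']):.1f}x")
```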

Older benchmarks

Older benchmarks are provided for comparison as the software evolves over time. Typically, networks running on CPU with the OpenCV backend should get faster over time as more optimized kernels are added to OpenCV. Networks running on hardware accelerators tend to remain the same. Pre- and post-processing are under our control, and we strive to make them faster over time as well, though adding more features may sometimes decrease speed slightly.