JeVoisBase  1.21
JeVois Smart Embedded Machine Vision Toolkit Base Modules
DarknetYOLO.C
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// JeVois Smart Embedded Machine Vision Toolkit - Copyright (C) 2016 by Laurent Itti, the University of Southern
// California (USC), and iLab at USC. See http://iLab.usc.edu and http://jevois.org for information about this project.
//
// This file is part of the JeVois Smart Embedded Machine Vision Toolkit. This program is free software; you can
// redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software
// Foundation, version 2. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
// without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
// License for more details. You should have received a copy of the GNU General Public License along with this program;
// if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
//
// Contact information: Laurent Itti - 3641 Watt Way, HNB-07A - Los Angeles, CA 90089-2520 - USA.
// Tel: +1 213 740 3527 - itti@pollux.usc.edu - http://iLab.usc.edu - http://jevois.org
// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
/*! \file */

#include <jevois/Core/Module.H>
#include <jevois/Debug/Timer.H>
#include <jevois/Image/RawImageOps.H>
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <jevoisbase/Components/ObjectDetection/Yolo.H>

// icon from https://pjreddie.com/darknet/yolo/

static jevois::ParameterCategory const ParamCateg("Darknet YOLO Options");

//! Parameter \relates DarknetYOLO
JEVOIS_DECLARE_PARAMETER(netin, cv::Size, "Width and height (in pixels) of the neural network input layer, or [0 0] "
                         "to make it match camera frame size. NOTE: for YOLO v3 sizes must be multiples of 32.",
                         cv::Size(320, 224), ParamCateg);


//! Detect multiple objects in scenes using the Darknet YOLO deep neural network
/*! Darknet is a popular neural network framework, and YOLO is a very interesting network that detects all objects in a
scene in one pass. This module detects all instances of any of the objects it knows about (determined by the
network structure, labels, dataset used for training, and weights obtained) in the image that is given to it.

See https://pjreddie.com/darknet/yolo/

This module runs a YOLO network and shows all detections obtained. The YOLO network is currently quite slow, hence
it is only run once in a while. Point your camera towards some interesting scene, keep it stable, and wait for YOLO
to tell you what it found. The framerate figures shown at the bottom left of the display reflect the speed at which
each new video frame from the camera is processed, but in this module this just amounts to converting the image to
RGB, sending it to the neural network for processing in a separate thread, and creating the demo display. Actual
network inference speed (time taken to compute the predictions on one image) is shown at the bottom right. See
below for how to trade off speed and accuracy.
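
The split between per-frame display speed and inference speed described above follows a common pattern: start the
slow work in another thread, then poll its future briefly on each frame. The sketch below is only an illustration of
that pattern: it uses std::async as a stand-in for jevois::async, and a hypothetical slowPredict() in place of the
Yolo component's predict(); the 5ms poll mirrors the wait_for() call in process().

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for network inference: sleeps for 'ms' milliseconds and returns
// that duration, much like the module's predict() call returns its inference time in ms.
float slowPredict(int ms)
{
  std::this_thread::sleep_for(std::chrono::milliseconds(ms));
  return float(ms);
}

// Hypothetical per-frame step. If no prediction is running, launch one in another thread.
// Otherwise poll it for up to 5ms. Returns true only on frames where a finished result was
// collected; on all other frames the caller would just redraw the previous results.
bool processFrame(std::future<float> & fut, float & ptime, int inference_ms)
{
  assert(inference_ms >= 0);

  if (fut.valid() == false)
  {
    fut = std::async(std::launch::async, slowPredict, inference_ms);
    return false;
  }

  if (fut.wait_for(std::chrono::milliseconds(5)) != std::future_status::ready) return false;

  ptime = fut.get(); // get() rethrows any exception from the worker and leaves fut invalid
  return true;       // the next call will launch a new prediction
}
```

Calling processFrame() once per camera frame, several frames go by for each new result when inference is slow, which
is why the display framerate and the reported inference time differ.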

Note that by default this module runs tiny-YOLO V3, which can detect and recognize 80 different kinds of objects from
the Microsoft COCO dataset. This module can also run tiny-YOLO V2 for COCO, or tiny-YOLO V2 for the Pascal-VOC
dataset with 20 object categories. See the module's \b params.cfg file to switch network.

- The 80 COCO object categories are: person, bicycle, car, motorbike, aeroplane, bus, train, truck, boat, traffic
  light, fire hydrant, stop sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra,
  giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat,
  baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana,
  apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, sofa, pottedplant, bed, diningtable,
  toilet, tvmonitor, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book,
  clock, vase, scissors, teddy bear, hair drier, toothbrush.

- The 20 Pascal-VOC object categories are: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow,
  diningtable, dog, horse, motorbike, person, pottedplant, sheep, sofa, train, tvmonitor.

Sometimes it will make mistakes! The yolov3-tiny network achieves about 33.1% mean average precision (mAP) on the
COCO test set.

\youtube{d5CfljT5kec}

Speed and network size
----------------------

The parameter \p netin allows you to rescale the neural network to the specified size. Beware that this will only
work if the network used is fully convolutional (as is the case with the default tiny-yolo network). This not only
allows you to adjust processing speed (and, conversely, accuracy), but also to better match the network to the input
images (e.g., the default size for tiny-yolo is 416x416, and, thus, passing it an input image of size 640x480 will
result in first scaling that input to 416x312, then letterboxing it by adding gray borders on top and bottom so that
the final input to the network is 416x416). This letterboxing can be completely avoided by just resizing the network
to 320x240.
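
The letterboxing arithmetic in the 640x480 example above can be checked with the small helper below. It is a sketch
that assumes aspect-preserving scaling by the smaller of the two ratios, with the leftover space split evenly into
borders on both sides; the Letterbox struct and letterbox() function are hypothetical, not part of the Yolo component.

```cpp
#include <cassert>

// Hypothetical result of fitting a camera frame into a network input while preserving aspect ratio.
struct Letterbox
{
  int scaledw, scaledh; // size of the rescaled camera image
  int padx, pady;       // border width added on each side (left/right and top/bottom)
};

// Scale an imgw x imgh frame so it fits inside netw x neth without distortion, then center it.
// The ratios netw/imgw and neth/imgh are compared by cross-multiplying to stay in integer arithmetic.
Letterbox letterbox(int imgw, int imgh, int netw, int neth)
{
  assert(imgw > 0 && imgh > 0 && netw > 0 && neth > 0);
  Letterbox lb;
  if (netw * imgh < neth * imgw) { lb.scaledw = netw; lb.scaledh = imgh * netw / imgw; } // width limits
  else { lb.scaledh = neth; lb.scaledw = imgw * neth / imgh; }                           // height limits
  lb.padx = (netw - lb.scaledw) / 2;
  lb.pady = (neth - lb.scaledh) / 2;
  return lb;
}
```

Here letterbox(640, 480, 416, 416) yields a 416x312 image with 52-pixel borders above and below, while
letterbox(640, 480, 320, 240) needs no padding at all.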

Here are expected processing speeds for yolov2-tiny-voc:
- when netin = [0 0], processes letterboxed 416x416 inputs, about 2450ms/image
- when netin = [320 240], processes 320x240 inputs, about 1350ms/image
- when netin = [160 120], processes 160x120 inputs, about 695ms/image

YOLO V3 is faster, more accurate, uses less memory, and can detect 80 COCO categories:
- when netin = [320 240], processes 320x240 inputs, about 870ms/image
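
The \p netin parameter description states that, for YOLO v3, the input width and height must be multiples of 32. The
hypothetical helper below sketches one way to enforce that on a requested non-zero size by rounding down; it is an
illustration under that assumption, not code from this module, which passes \p netin to resizeInDims() unchanged.

```cpp
#include <cassert>

// Hypothetical helper: round a requested input dimension down to a multiple of 32,
// but never below 32, the smallest non-zero size satisfying the rule.
int snapTo32(int v)
{
  assert(v > 0); // [0 0] has its own meaning (match camera size) and is handled separately
  int const snapped = (v / 32) * 32;
  return snapped < 32 ? 32 : snapped;
}
```

For instance, a requested [320 240] becomes [320 224], which happens to be the default value of \p netin.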

\youtube{77VRwFtIe8I}

Serial messages
---------------

When detections above threshold are found, one message is sent for each detected object (i.e., for each box that
gets drawn when USB outputs are used), using a standardized 2D message:
+ Serial message type: \b 2D
+ `id`: the category of the recognized object, followed by ':' and the confidence score in percent
+ `x`, `y`, or vertices: standardized 2D coordinates of object center or corners
+ `w`, `h`: standardized object size
+ `extra`: any number of additional category:score pairs which had an above-threshold score for that box

See \ref UserSerialStyle for more on standardized serial messages, and \ref coordhelpers for more info on
standardized coordinates.
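
The `id` and `extra` fields above could be assembled as in this sketch. It assumes each box comes with its
above-threshold (category, score) pairs sorted by decreasing score, scores in [0, 1] rounded to whole percent, and
space-separated `extra` pairs; the Labels type, pairStr() and makeIdExtra() are hypothetical, and the authoritative
message layout is the one described in \ref UserSerialStyle.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation of one box's above-threshold labels: (category, score in [0, 1]),
// assumed already sorted by decreasing score.
using Labels = std::vector<std::pair<std::string, float>>;

// Format one "category:percent" pair, rounding the score to the nearest whole percent.
std::string pairStr(std::pair<std::string, float> const & p)
{
  return p.first + ':' + std::to_string(std::lround(p.second * 100.0F));
}

// Build the id field (best label) and the extra field (remaining labels, space-separated).
std::pair<std::string, std::string> makeIdExtra(Labels const & labels)
{
  assert(labels.empty() == false);
  std::string extra;
  for (std::size_t i = 1; i < labels.size(); ++i)
  {
    if (extra.empty() == false) extra += ' ';
    extra += pairStr(labels[i]);
  }
  return { pairStr(labels[0]), extra };
}
```

For a box labeled person (0.87), chair (0.41) and sofa (0.35), this yields id `person:87` and extra
`chair:41 sofa:35`.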

@author Laurent Itti

@displayname Darknet YOLO
@videomapping NONE 0 0 0.0 YUYV 640 480 0.4 JeVois DarknetYOLO
@videomapping YUYV 1280 480 15.0 YUYV 640 480 15.0 JeVois DarknetYOLO
@email itti\@usc.edu
@address University of Southern California, HNB-07A, 3641 Watt Way, Los Angeles, CA 90089-2520, USA
@copyright Copyright (C) 2017 by Laurent Itti, iLab and the University of Southern California
@mainurl http://jevois.org
@supporturl http://jevois.org/doc
@otherurl http://iLab.usc.edu
@license GPL v3
@distribution Unrestricted
@restrictions None
\ingroup modules */
class DarknetYOLO : public jevois::StdModule,
                    public jevois::Parameter<netin>
{
  public:
    // ####################################################################################################
    //! Constructor
    // ####################################################################################################
    DarknetYOLO(std::string const & instance) : jevois::StdModule(instance)
    {
      itsYolo = addSubComponent<Yolo>("yolo");
    }

    // ####################################################################################################
    //! Virtual destructor for safe inheritance
    // ####################################################################################################
    virtual ~DarknetYOLO()
    { }

    // ####################################################################################################
    //! Un-initialization
    // ####################################################################################################
    virtual void postUninit() override
    {
      if (itsPredictFut.valid()) try { itsPredictFut.get(); } catch (...) { }
    }

    // ####################################################################################################
    //! Processing function, no video output
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe) override
    {
      bool ready = true; float ptime = 0.0F;

      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();
      unsigned int const w = inimg.width, h = inimg.height;

      // Convert input image to RGB for predictions:
      cv::Mat cvimg = jevois::rawimage::convertToCvRGB(inimg);

      // Resize the network and/or the input if desired:
      cv::Size nsz = netin::get();
      if (nsz.width != 0 && nsz.height != 0)
      {
        itsYolo->resizeInDims(nsz.width, nsz.height);
        itsNetInput = jevois::rescaleCv(cvimg, nsz);
      }
      else
      {
        itsYolo->resizeInDims(cvimg.cols, cvimg.rows);
        itsNetInput = cvimg;
      }

      cvimg.release();

      // Let camera know we are done processing the input image:
      inframe.done();

      // Launch the predictions, will throw logic_error if we are still loading the network:
      try { ptime = itsYolo->predict(itsNetInput); } catch (std::logic_error const & e) { ready = false; }

      if (ready)
      {
        LINFO("Predicted in " << ptime << "ms");

        // Compute the boxes:
        itsYolo->computeBoxes(w, h);

        // Send serial results:
        itsYolo->sendSerial(this, w, h);
      }
    }

    // ####################################################################################################
    //! Processing function with video output to USB
    // ####################################################################################################
    virtual void process(jevois::InputFrame && inframe, jevois::OutputFrame && outframe) override
    {
      static jevois::Timer timer("processing", 50, LOG_DEBUG);

      // Wait for next available camera image:
      jevois::RawImage const inimg = inframe.get();

      timer.start();

      // We only handle one specific pixel format, and any image size in this module:
      unsigned int const w = inimg.width, h = inimg.height;
      inimg.require("input", w, h, V4L2_PIX_FMT_YUYV);

      // While we process it, start a thread to wait for out frame and paste the input into it:
      jevois::RawImage outimg;
      auto paste_fut = jevois::async([&]() {
          outimg = outframe.get();
          outimg.require("output", w * 2, h, inimg.fmt);

          // Paste the current input image:
          jevois::rawimage::paste(inimg, outimg, 0, 0);
          jevois::rawimage::writeText(outimg, "JeVois Darknet YOLO - input", 3, 3, jevois::yuyv::White);

          // Paste the latest prediction results, if any, otherwise a wait message:
          cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);
          if (itsRawPrevOutputCv.empty() == false)
            itsRawPrevOutputCv.copyTo(outimgcv(cv::Rect(w, 0, w, h)));
          else
          {
            jevois::rawimage::drawFilledRect(outimg, w, 0, w, h, jevois::yuyv::Black);
            jevois::rawimage::writeText(outimg, "JeVois Darknet YOLO - loading network - please wait...",
                                        w + 3, 3, jevois::yuyv::White);
          }
        });

      // Decide on what to do based on itsPredictFut: if it is valid, we are still predicting, so check whether we are
      // done and if so draw the results. Otherwise, start predicting using the current input frame:
      if (itsPredictFut.valid())
      {
        // Are we finished predicting?
        if (itsPredictFut.wait_for(std::chrono::milliseconds(5)) == std::future_status::ready)
        {
          // Do a get() on our future to free up the async thread and get any exception it might have thrown. In
          // particular, it will throw a logic_error if we are still loading the network:
          bool success = true; float ptime = 0.0F;
          try { ptime = itsPredictFut.get(); } catch (std::logic_error const & e) { success = false; }

          // Wait for paste to finish up:
          paste_fut.get();

          // Let camera know we are done processing the input image:
          inframe.done();

          if (success)
          {
            cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);

            // Update our output image: First paste the image we have been making predictions on:
            if (itsRawPrevOutputCv.empty()) itsRawPrevOutputCv = cv::Mat(h, w, CV_8UC2);
            itsRawInputCv.copyTo(outimgcv(cv::Rect(w, 0, w, h)));

            // Then draw the detections:
            itsYolo->drawDetections(outimg, w, h, w, 0);

            // Send serial messages:
            itsYolo->sendSerial(this, w, h);

            // Draw some text messages:
            jevois::rawimage::writeText(outimg, "JeVois Darknet YOLO - predictions", w + 3, 3, jevois::yuyv::White);
            jevois::rawimage::writeText(outimg, "YOLO predict time: " + std::to_string(int(ptime)) + "ms",
                                        w + 3, h - 13, jevois::yuyv::White);

            // Finally make a copy of these new results so we can display them again while we wait for the next round:
            outimgcv(cv::Rect(w, 0, w, h)).copyTo(itsRawPrevOutputCv);
          }
        }
        else
        {
          // Future is not ready, do nothing except drawings on this frame (done in paste_fut thread) and we will try
          // again on the next one...
          paste_fut.get();
          inframe.done();
        }
      }
      else
      {
        // Note: resizeInDims() could throw if the network is not ready yet.
        try
        {
          // Convert input image to RGB for predictions:
          cv::Mat cvimg = jevois::rawimage::convertToCvRGB(inimg);

          // Also make a raw YUYV copy of the input image for later displays:
          cv::Mat inimgcv = jevois::rawimage::cvImage(inimg);
          inimgcv.copyTo(itsRawInputCv);

          // Resize the network if desired:
          cv::Size nsz = netin::get();
          if (nsz.width != 0 && nsz.height != 0)
          {
            itsYolo->resizeInDims(nsz.width, nsz.height);
            itsNetInput = jevois::rescaleCv(cvimg, nsz);
          }
          else
          {
            itsYolo->resizeInDims(cvimg.cols, cvimg.rows);
            itsNetInput = cvimg;
          }

          cvimg.release();

          // Launch the predictions:
          itsPredictFut = jevois::async([&](int ww, int hh)
                                        {
                                          float pt = itsYolo->predict(itsNetInput);
                                          itsYolo->computeBoxes(ww, hh);
                                          return pt;
                                        }, w, h);
        }
        catch (std::logic_error const & e) { }

        // Wait for paste to finish up:
        paste_fut.get();

        // Let camera know we are done processing the input image:
        inframe.done();
      }

      // Show processing fps:
      std::string const & fpscpu = timer.stop();
      jevois::rawimage::writeText(outimg, fpscpu, 3, h - 13, jevois::yuyv::White);

      // Send the output image with our processing results to the host over USB:
      outframe.send();
    }

    // ####################################################################################################
  protected:
    std::shared_ptr<Yolo> itsYolo;
    std::future<float> itsPredictFut;
    cv::Mat itsRawInputCv;
    cv::Mat itsRawPrevOutputCv;
    cv::Mat itsNetInput;
};

// Allow the module to be loaded as a shared object (.so) file:
JEVOIS_REGISTER_MODULE(DarknetYOLO);