JeVoisBase  1.21
JeVois Smart Embedded Machine Vision Toolkit Base Modules
TensorFlowSaliency.C
1// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
2//
3// JeVois Smart Embedded Machine Vision Toolkit - Copyright (C) 2016 by Laurent Itti, the University of Southern
4// California (USC), and iLab at USC. See http://iLab.usc.edu and http://jevois.org for information about this project.
5//
6// This file is part of the JeVois Smart Embedded Machine Vision Toolkit. This program is free software; you can
7// redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software
8// Foundation, version 2. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
9// without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public
10// License for more details. You should have received a copy of the GNU General Public License along with this program;
11// if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
12//
13// Contact information: Laurent Itti - 3641 Watt Way, HNB-07A - Los Angeles, CA 90089-2520 - USA.
14// Tel: +1 213 740 3527 - itti@pollux.usc.edu - http://iLab.usc.edu - http://jevois.org
15// ///////////////////////////////////////////////////////////////////////////////////////////////////////////////////
16/*! \file */
17
18#include <jevois/Core/Module.H>
19#include <jevois/Debug/Timer.H>
20#include <jevois/Image/RawImageOps.H>
21#include <opencv2/core/core.hpp>
22#include <opencv2/imgproc/imgproc.hpp>
23#include <jevoisbase/Components/Saliency/Saliency.H>
24#include <jevoisbase/Components/ObjectRecognition/TensorFlow.H>
25
26// icon from tensorflow youtube
27
28static jevois::ParameterCategory const ParamCateg("TensorFlow Saliency Options");
29
30//! Parameter \relates TensorFlowSaliency
31JEVOIS_DECLARE_PARAMETER(foa, cv::Size, "Width and height (in pixels) of the focus of attention. "
32 "This is the size of the image crop that is taken around the most salient "
33 "location in each frame. The foa size must fit within the camera input frame size. To avoid "
34 "rescaling, it is best to use here the size that the deep network expects as input.",
35 cv::Size(128, 128), ParamCateg);
36
37//! Detect salient objects and identify them using TensorFlow deep neural network
38/*! TensorFlow is a popular neural network framework. This module first finds the most conspicuous (salient) object in
39 the scene, then identifies it using a deep neural network. It returns the top scoring candidates.
40
41 See http://ilab.usc.edu/bu/ for more information about saliency detection, and https://www.tensorflow.org for more
42 information about the TensorFlow deep neural network framework.
43
44 \youtube{TRk8rCuUVEE}
45
46 This module runs a TensorFlow network on an image window around the most salient point and shows the top-scoring
47 results. We alternate, on every other frame, between updating the salient window crop location, and predicting what
48 is in it. Actual network inference speed (time taken to compute the predictions on one image crop) is shown at the
49 bottom right. See below for how to trade-off speed and accuracy.
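
    The frame-parity alternation can be sketched as follows (illustrative helper names only, not part of the
    module; the actual code also forces a saliency update whenever no valid crop exists yet):

    ```cpp
    #include <cstdint>

    // Even frames update the salient window location; odd frames run the network on the last crop:
    inline bool updatesSaliency(std::uint64_t framenum) { return (framenum & 1) == 0; }
    inline bool runsNetwork(std::uint64_t framenum) { return (framenum & 1) != 0; }
    ```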
50
51 Note that by default this module runs a fast variant of MobileNets trained on the ImageNet dataset. There are 1000
52 different kinds of objects (object classes) that this network can recognize (too long to list here). It is possible
53 to use bigger and more complex networks, but it will likely slow down the framerate.
54
55 For more information about MobileNets, see
56 https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md
57
58 For more information about the ImageNet dataset used for training, see
59 http://www.image-net.org/challenges/LSVRC/2012/
60
61 Sometimes this module will make mistakes! The performance of MobileNets is about 40% to 70% correct (mean average
62 precision) on the test set, depending on network size (bigger networks are more accurate but slower).
63
64 Neural network size and speed
65 -----------------------------
66
67 This module provides a parameter, \p foa, which determines the size of a region of interest that is cropped around
68 the most salient location. This region will then be rescaled, if needed, to the neural network's expected input
69 size. To avoid wasting time rescaling, it is hence best to select an \p foa size that is equal to the network's
70 input size.
71
72 The network actual input size varies depending on which network is used; for example, mobilenet_v1_0.25_128_quant
73 expects 128x128 input images, while mobilenet_v1_1.0_224 expects 224x224. We automatically rescale the cropped
74 window to the network's desired input size. Note that there is a cost to rescaling, so, for best performance, you
75 should match \p foa size to the network input size.
76
77 For example:
78
79 - mobilenet_v1_0.25_128_quant (network size 128x128), runs at about 8ms/prediction (125 frames/s).
80 - mobilenet_v1_0.5_128_quant (network size 128x128), runs at about 18ms/prediction (55 frames/s).
81 - mobilenet_v1_0.25_224_quant (network size 224x224), runs at about 24ms/prediction (41 frames/s).
82 - mobilenet_v1_1.0_224_quant (network size 224x224), runs at about 139ms/prediction (7 frames/s).
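
    The quoted frame rates follow from the per-prediction times by simple inversion; a sketch (truncation toward
    zero reproduces the rounded numbers above):

    ```cpp
    // Convert a per-prediction time in milliseconds to predictions (frames) per second:
    inline int fpsFromMs(double ms) { return static_cast<int>(1000.0 / ms); }
    // e.g., 8ms gives 125 fps, 18ms gives 55 fps, 24ms gives 41 fps, 139ms gives 7 fps
    ```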
83
84 When using video mappings with USB output, irrespective of \p foa, the crop around the most salient image region
85 (with size given by \p foa) will always also be rescaled so that, when placed to the right of the input image, it
86 fills the desired USB output dims. For example, if camera mode is 320x240 and USB output size is 544x240, then the
87 attended and recognized object will be rescaled to 224x224 (since 224 = 544-320) for display purposes only. This is
88 so that one does not need to change USB video resolution while playing with different values of \p foa live.
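
    The display-rescaling arithmetic above can be sketched like this (the sizes are the example values from the
    text and the names are illustrative, not JeVois API):

    ```cpp
    struct DispSize { int w, h; };

    // Compute the size at which the foa crop is shown to the right of the camera image:
    inline DispSize displaySize(int usbw, int camw, int foaw, int foah)
    {
      int const dispw = usbw - camw;                            // width available right of the input image
      float const fac = static_cast<float>(dispw) / foaw;       // uniform scale factor
      return { dispw, static_cast<int>(foah * fac + 0.4999F) }; // round height to nearest
    }
    // e.g., displaySize(544, 320, 128, 128) yields 224x224
    ```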
89
90 Serial messages
91 ---------------
92
93 On every frame where detection results were obtained that are above \p thresh, this module sends a standardized 2D
94 message as specified in \ref UserSerialStyle
95 + Serial message type: \b 2D
96 + `id`: top-scoring category name of the recognized object, followed by ':' and the confidence score in percent
97 + `x`, `y`, or vertices: standardized 2D coordinates of object center or corners
98 + `w`, `h`: standardized object size
99 + `extra`: any number of additional category:score pairs which had an above-threshold score, in order of
100 decreasing score
101 where \a category is the category name (from \p namefile) and \a score is the confidence score from 0.0 to 100.0
102
103 See \ref UserSerialStyle for more on standardized serial messages, and \ref coordhelpers for more info on
104 standardized coordinates.
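
    On the receiving side, such a message can be split into fields with a few lines of code. This is a
    hypothetical host-side helper (not part of JeVois), assuming for illustration the \b Normal style token
    layout <code>N2 id x y w h ...</code>; see \ref UserSerialStyle for the authoritative formats:

    ```cpp
    #include <sstream>
    #include <string>

    struct Msg2D { std::string id; float x, y, w, h; };

    // Parse one standardized 2D message line into its leading fields (extras ignored):
    inline bool parse2D(std::string const & line, Msg2D & m)
    {
      std::istringstream ss(line);
      std::string tok; ss >> tok;
      if (tok != "N2") return false;          // assumed key for Normal-style 2D messages
      ss >> m.id >> m.x >> m.y >> m.w >> m.h;
      return !ss.fail();
    }
    ```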
105
106 Using your own network
107 ----------------------
108
109 For a step-by-step tutorial, see [Training custom TensorFlow networks for
110 JeVois](http://jevois.org/tutorials/UserTensorFlowTraining.html).
111
112 This module supports RGB or grayscale inputs, byte or float32. You should create and train your network using fast
113 GPUs, and then follow the instructions here to convert your trained network to TFLite format:
114
115 https://www.tensorflow.org/lite/
116
117 Then you just need to create a directory under <b>JEVOIS:/share/tensorflow/</b> with the name of your network, and,
118 in there, two files: \b labels.txt with the category labels, and \b model.tflite with your model converted to
119 TensorFlow Lite (flatbuffer format). Finally, edit <b>JEVOIS:/modules/JeVois/TensorFlowSaliency/params.cfg</b> to
120 select your new network when the module is launched.
121
122 @author Laurent Itti
123
124 @displayname TensorFlow Saliency
125 @videomapping NONE 0 0 0.0 YUYV 320 240 15.0 JeVois TensorFlowSaliency
126 @videomapping YUYV 448 240 30.0 YUYV 320 240 30.0 JeVois TensorFlowSaliency # recommended network size 128x128
127 @videomapping YUYV 512 240 30.0 YUYV 320 240 30.0 JeVois TensorFlowSaliency # recommended network size 192x192
128 @videomapping YUYV 544 240 30.0 YUYV 320 240 30.0 JeVois TensorFlowSaliency # recommended network size 224x224
129 @email itti\@usc.edu
130 @address University of Southern California, HNB-07A, 3641 Watt Way, Los Angeles, CA 90089-2520, USA
131 @copyright Copyright (C) 2018 by Laurent Itti, iLab and the University of Southern California
132 @mainurl http://jevois.org
133 @supporturl http://jevois.org/doc
134 @otherurl http://iLab.usc.edu
135 @license GPL v3
136 @distribution Unrestricted
137 @restrictions None
138 \ingroup modules */
139class TensorFlowSaliency : public jevois::StdModule,
140 public jevois::Parameter<foa>
141{
142 public:
143 // ####################################################################################################
144 //! Constructor
145 // ####################################################################################################
146 TensorFlowSaliency(std::string const & instance) : jevois::StdModule(instance), itsRx(0), itsRy(0),
147 itsRw(0), itsRh(0)
148 {
149 itsSaliency = addSubComponent<Saliency>("saliency");
150 itsTensorFlow = addSubComponent<TensorFlow>("tensorflow");
151 }
152
153 // ####################################################################################################
154 //! Virtual destructor for safe inheritance
155 // ####################################################################################################
156 virtual ~TensorFlowSaliency()
157 { }
158
159 // ####################################################################################################
160 //! Un-initialization
161 // ####################################################################################################
162 virtual void postUninit() override
163 {
164 try { itsPredictFut.get(); } catch (...) { }
165 }
166
167 // ####################################################################################################
168 //! Helper function: compute saliency ROI in a thread, return top-left corner and size
169 // ####################################################################################################
170 virtual void getSalROI(jevois::RawImage const & inimg)
171 {
172 int const w = inimg.width, h = inimg.height;
173
174 // Check whether the input image size is small, in which case we will scale the maps up one notch:
175 if (w < 170) { itsSaliency->centermin::set(1); itsSaliency->smscale::set(3); }
176 else { itsSaliency->centermin::set(2); itsSaliency->smscale::set(4); }
177
178 // Find the most salient location, no gist for now:
179 itsSaliency->process(inimg, false);
180
181 // Get some info from the saliency computation:
182 int const smlev = itsSaliency->smscale::get();
183 int const smfac = (1 << smlev);
184
185 // Find most salient point:
186 int mx, my; intg32 msal; itsSaliency->getSaliencyMax(mx, my, msal);
187
188 // Compute attended ROI (note: coords must be even to avoid flipping U/V when we later paste):
189 cv::Size roisiz = foa::get(); itsRw = roisiz.width; itsRh = roisiz.height;
190 itsRw = std::min(itsRw, w); itsRh = std::min(itsRh, h); itsRw &= ~1; itsRh &= ~1;
191 unsigned int const dmx = (mx << smlev) + (smfac >> 2);
192 unsigned int const dmy = (my << smlev) + (smfac >> 2);
193 itsRx = int(dmx + 1 + smfac / 4) - itsRw / 2;
194 itsRy = int(dmy + 1 + smfac / 4) - itsRh / 2;
195 itsRx = std::max(0, std::min(itsRx, w - itsRw));
196 itsRy = std::max(0, std::min(itsRy, h - itsRh));
197 itsRx &= ~1; itsRy &= ~1;
198 if (itsRw <= 0 || itsRh <= 0) LFATAL("Ooops, foa size cannot be zero or negative");
199 }
200
201 // ####################################################################################################
202 //! Processing function, no video output
203 // ####################################################################################################
204 virtual void process(jevois::InputFrame && inframe) override
205 {
206 // Wait for next available camera image:
207 jevois::RawImage const inimg = inframe.get();
208 unsigned int const w = inimg.width, h = inimg.height;
209
210 // Find the most salient location, no gist for now:
211 getSalROI(inimg);
212
213 // Extract a raw YUYV ROI around attended point:
214 cv::Mat rawimgcv = jevois::rawimage::cvImage(inimg);
215 cv::Mat rawroi = rawimgcv(cv::Rect(itsRx, itsRy, itsRw, itsRh));
216
217 // Convert the ROI to RGB:
218 cv::Mat rgbroi;
219 cv::cvtColor(rawroi, rgbroi, cv::COLOR_YUV2RGB_YUYV);
220
221 // Let camera know we are done processing the input image:
222 inframe.done();
223
224 // Launch the predictions, will throw if network is not ready:
225 itsResults.clear();
226 try
227 {
228 int netinw, netinh, netinc; itsTensorFlow->getInDims(netinw, netinh, netinc);
229
230 // Scale the ROI if needed:
231 cv::Mat scaledroi = jevois::rescaleCv(rgbroi, cv::Size(netinw, netinh));
232
233 // Predict:
234 float const ptime = itsTensorFlow->predict(scaledroi, itsResults);
235 LINFO("Predicted in " << ptime << "ms");
236
237 // Send serial results and switch to next frame:
238 sendSerialObjDetImg2D(w, h, itsRx + itsRw/2, itsRy + itsRh/2, itsRw, itsRh, itsResults);
239 }
240 catch (std::logic_error const & e) { } // network still loading
241 }
242
243 // ####################################################################################################
244 //! Processing function with video output to USB
245 // ####################################################################################################
246 virtual void process(jevois::InputFrame && inframe, jevois::OutputFrame && outframe) override
247 {
248 static jevois::Timer timer("processing", 30, LOG_DEBUG);
249
250 // Wait for next available camera image:
251 jevois::RawImage const inimg = inframe.get();
252
253 timer.start();
254
255 // We only handle one specific pixel format, but any image size in this module:
256 unsigned int const w = inimg.width, h = inimg.height;
257 inimg.require("input", w, h, V4L2_PIX_FMT_YUYV);
258
259 // While we process it, start a thread to wait for out frame and paste the input into it:
260 jevois::RawImage outimg;
261 auto paste_fut = jevois::async([&]() {
262 outimg = outframe.get();
263 outimg.require("output", outimg.width, outimg.height, V4L2_PIX_FMT_YUYV);
264
265 // Paste the current input image:
266 jevois::rawimage::paste(inimg, outimg, 0, 0);
267 jevois::rawimage::writeText(outimg, "JeVois TensorFlow Saliency", 3, 3, jevois::yuyv::White);
268
269 // Paste the latest prediction results, if any, otherwise a wait message:
270 cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);
271 if (itsRawPrevOutputCv.empty() == false)
272 itsRawPrevOutputCv.copyTo(outimgcv(cv::Rect(w, 0, itsRawPrevOutputCv.cols, itsRawPrevOutputCv.rows)));
273 else
274 {
275 jevois::rawimage::drawFilledRect(outimg, w, 0, outimg.width - w, h, jevois::yuyv::Black);
276 jevois::rawimage::writeText(outimg, "Loading network -", w + 3, 3, jevois::yuyv::White);
277 jevois::rawimage::writeText(outimg, "please wait...", w + 3, 15, jevois::yuyv::White);
278 }
279 });
280
281 // On even frames, update the salient ROI, on odd frames, run the deep network on the latest ROI:
282 if ((jevois::frameNum() & 1) == 0 || itsRw == 0)
283 {
284 // Run the saliency model, will update itsRx, itsRy, itsRw, and itsRh:
285 getSalROI(inimg);
      // Wait for paste to finish up, since we draw on outimg below:
      paste_fut.get();

      // Let camera know we are done processing the input image:
      inframe.done();
286 }
287 else
288 {
289 // Extract a raw YUYV ROI around attended point:
290 cv::Mat rawimgcv = jevois::rawimage::cvImage(inimg);
291 cv::Mat rawroi = rawimgcv(cv::Rect(itsRx, itsRy, itsRw, itsRh));
292
293 // Convert the ROI to RGB:
294 cv::Mat rgbroi;
295 cv::cvtColor(rawroi, rgbroi, cv::COLOR_YUV2RGB_YUYV);
296
297 // Let camera know we are done processing the input image:
298 inframe.done();
299
300 // Launch the predictions, will throw if network is not ready:
301 itsResults.clear();
302 try
303 {
304 // Get the network input dims:
305 int netinw, netinh, netinc; itsTensorFlow->getInDims(netinw, netinh, netinc);
306
307 // Scale the ROI if needed:
308 cv::Mat scaledroi = jevois::rescaleCv(rgbroi, cv::Size(netinw, netinh));
309
310 // In a thread, also scale the ROI to the desired output size, i.e., USB width - camera width:
311 auto scale_fut = jevois::async([&]() {
312 float fac = float(outimg.width - w) / float(rgbroi.cols);
313 cv::Size displaysize(outimg.width - w, int(rgbroi.rows * fac + 0.4999F));
314 cv::Mat displayroi = jevois::rescaleCv(rgbroi, displaysize);
315
316 // Convert back the display ROI to YUYV:
317 jevois::rawimage::convertCvRGBtoCvYUYV(displayroi, itsRawInputCv);
318 });
319
320 // Predict:
321 float const ptime = itsTensorFlow->predict(scaledroi, itsResults);
322
323 // Wait for paste and scale to finish up:
324 paste_fut.get(); scale_fut.get();
325
326 int const dispw = itsRawInputCv.cols, disph = itsRawInputCv.rows;
327 cv::Mat outimgcv = jevois::rawimage::cvImage(outimg);
328
329 // Update our output image: First paste the image we have been making predictions on:
330 itsRawInputCv.copyTo(outimgcv(cv::Rect(w, 0, dispw, disph)));
331 jevois::rawimage::drawFilledRect(outimg, w, disph, dispw, h - disph, jevois::yuyv::Black);
332
333 // Then draw the detections: either below the detection crop if there is room, or on top of it if not enough
334 // room below:
335 int y = disph + 3; if (y + itsTensorFlow->top::get() * 12 > h - 21) y = 3;
336
337 for (auto const & p : itsResults)
338 {
339 jevois::rawimage::writeText(outimg, jevois::sformat("%s: %.2F", p.category.c_str(), p.score),
340 w + 3, y, jevois::yuyv::White);
341 y += 12;
342 }
343
344 // Send serial results:
345 sendSerialObjDetImg2D(w, h, itsRx + itsRw/2, itsRy + itsRh/2, itsRw, itsRh, itsResults);
346
347 // Draw some text messages:
348 jevois::rawimage::writeText(outimg, "Predict time: " + std::to_string(int(ptime)) + "ms",
349 w + 3, h - 11, jevois::yuyv::White);
350
351 // Finally make a copy of these new results so we can display them again on the next frame while we compute
352 // saliency:
353 itsRawPrevOutputCv = cv::Mat(h, dispw, CV_8UC2);
354 outimgcv(cv::Rect(w, 0, dispw, h)).copyTo(itsRawPrevOutputCv);
355
356 }
357 catch (std::logic_error const & e) { itsRawPrevOutputCv.release(); } // network still loading
358 }
359
360 // Show processing fps:
361 std::string const & fpscpu = timer.stop();
362 jevois::rawimage::writeText(outimg, fpscpu, 3, h - 13, jevois::yuyv::White);
363
364 // Show attended location:
365 jevois::rawimage::drawRect(outimg, itsRx, itsRy, itsRw, itsRh, 2, jevois::yuyv::LightPink);
367
368 // Send the output image with our processing results to the host over USB:
369 outframe.send();
370 }
371
372 // ####################################################################################################
373 protected:
374 std::shared_ptr<Saliency> itsSaliency;
375 std::shared_ptr<TensorFlow> itsTensorFlow;
376 std::vector<jevois::ObjReco> itsResults;
377 std::future<float> itsPredictFut;
379 cv::Mat itsCvImg;
380 cv::Mat itsRawPrevOutputCv; // copy of our previous results panel, shown while the next prediction runs
381 int itsRx, itsRy, itsRw, itsRh; // last computed saliency ROI
382 };
383
384// Allow the module to be loaded as a shared object (.so) file:
385JEVOIS_REGISTER_MODULE(TensorFlowSaliency)