The events of 2020 have shown the need to prepare for future pandemics. Life under lockdown came to a standstill, driving businesses and services to their lowest point in years and creating demand for alternatives that can carry out essential work without human presence. This has made computer vision a trending topic, since the spread of the virus makes vision without human presence crucial. Traditional object detection methods, such as infrared multi-beam, laser-based, and video sensor-based detectors, are expensive and complex to operate. Deep learning techniques such as R-CNN, SSD, and YOLO are therefore being adopted to achieve high accuracy at low cost; however, many of these methods remain unsuitable for real-time detection. The YOLOv3 algorithm, built on a modified Darknet-53 framework, can perform real-time detection in a single step, making it both faster and more accurate. The proposed network will train directly on 3D images, unlike current networks that train on 2D images and then extrapolate to recover 3D object coordinates. Its small size allows detection to run on low-power devices such as phones and tablets, contributing to a smarter city with improved security and safety.
During a pandemic, coming into close contact with other people can put lives in danger; this one has taught us the value of keeping distance even from those closest to us. Human-delivered services and businesses have seen a sharp decline under the current situation. While vaccines are being tested, creative alternatives are emerging to replace physical services and offices. One of the hot topics is using computer vision to detect objects, which can keep human lives out of risk while getting the work done accurately and efficiently. Object detection has been incorporated into many services to replace humans and make cities smarter, for instance in traffic monitoring, surveillance, self-driving, and general monitoring systems. Computer vision is also used to detect wildlife in digital photos, where recognition by the human eye can be very challenging against complex backgrounds affected by weather and seasonal changes. Recognizing wildlife in digital images with complex backgrounds can be of social, economic, and technical significance.
The traditional methods used for object detection include infrared multi-beam detectors, laser-based detectors, and video sensor-based detectors. However, these methods have high false-detection rates because of large blind spots that are not covered during the detection process, and they require additional equipment to remain stable, which increases their cost. Infrared multi-beam and laser-based detectors also perform poorly at long distances. Moreover, traditional methods must be designed manually, which is time-consuming and labor-intensive [5–8]. Deep learning has therefore become a popular replacement for these object detection approaches, being efficient, accurate, and fast. R-CNN, SSD, and YOLO are the best-known deep learning detectors. Fast, Faster, and Mask R-CNN have all evolved from R-CNN, but these networks are generally large and costly. SSD is faster and more robust but requires many parameters. YOLOv1, YOLOv2, and YOLOv3 are versions of YOLO, which is faster than SSD and higher in accuracy; in particular, YOLOv3 achieves better speed and accuracy than SSD300 and Faster R-CNN, which are not advisable for real-time object detection. Complex-YOLO and YOLO3D are widely used for real-time detection because their architectures are smaller than those of other YOLO versions, SSD, and the R-CNN family [11–12]. YOLOv3 can also accurately detect small and overlapping objects. Thus, YOLOv3 is an efficient method to use for detection.
Currently, YOLOv3 is trained on 2D images, and the 3D coordinates of a detected object in the image are then obtained using triangulation methods. This two-step process is accurate but slow; furthermore, small objects are difficult to detect, and very similar scenes are hard to differentiate with it. To address this problem, a network can be built that trains on 3D images and outputs the detected locations of objects in a single step. This network will be based on the Darknet-53 feature extractor, whose 53 layers include shortcut connections between layers that aid the network in learning. The extractor will be modified so that depth values are labelled along with the objects in the 3D images. To train this network, 670 3D vehicle images from the KITTI dataset will be used: 70 percent of the data for training, 15 percent for verification, and the remaining 15 percent for testing. Before training, the data must be labelled manually, including the depth values. The labelled data will then be used to train the YOLOv3 network for 3D detection; the trained network will be verified on the verification data and then evaluated on the testing data.
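The shortcut connections that Darknet-53 inherits from residual networks can be illustrated with a minimal NumPy sketch. This is illustrative only: the layer widths, random weights, and the use of plain matrix multiplies in place of real convolutions are assumptions, not the actual Darknet-53 configuration.

```python
import numpy as np

def conv_block(x, weight):
    """Stand-in for a convolutional layer: a linear map plus leaky ReLU."""
    y = x @ weight
    return np.where(y > 0, y, 0.1 * y)  # leaky ReLU activation

def residual_block(x, w1, w2):
    """Residual (shortcut) block: output = input + transformed input.
    The identity shortcut lets information and gradients flow past the
    stacked layers, which is what allows a deep 53-layer extractor to train."""
    return x + conv_block(conv_block(x, w1), w2)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 64))          # one feature vector of width 64
w1 = rng.standard_normal((64, 32)) * 0.1  # bottleneck down to 32 channels
w2 = rng.standard_normal((32, 64)) * 0.1  # back up to 64 so shapes match for the addition
out = residual_block(x, w1, w2)
print(out.shape)  # (1, 64): shortcut addition requires matching shapes
```

The design point is that the block's output shape equals its input shape, so blocks can be stacked deeply and chained with the identity path intact.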
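The data preparation described above can be sketched as follows. The 670-image count and the 70/15/15 split come from the text; the placeholder file names, the shuffling, and the fixed seed are assumptions for illustration.

```python
import random

def split_dataset(samples, train_pct=70, val_pct=15, seed=42):
    """Shuffle and split samples into training, verification, and testing sets.
    Integer-percentage arithmetic avoids floating-point rounding surprises."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remaining ~15 percent
    return train, val, test

# 670 KITTI vehicle samples, referenced here by placeholder names only
images = [f"kitti_{i:06d}" for i in range(670)]
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 469 100 101
```

Each image would additionally carry its manually created label file, including the depth values, alongside the image itself.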
Incorporating this network can enhance the world of computer vision by making it more efficient and accurate. It can improve security and surveillance by a large margin, since small and very similar objects can be detected even in complex environments. Reducing the network's weights and overall size can make this technology run on devices with low computational power, such as mobile phones and tablets, making it portable and easy to use. This can lead to a safer environment in which normal life is not interrupted by pandemics and restrictions, and to a smarter, lower-risk city.
[1] C. Kumar B., R. Punitha, and Mohana, "YOLOv3 and YOLOv4: Multiple Object Detection for Surveillance Applications," in 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT), Aug. 2020, pp. 1316–1321.
[2] M. Gabriel, S. Cha, N. Y. B. Al-Nakash, and D. Yun, "Wildlife Detection and Recognition in Digital Images Using YOLOv3: Extended Abstract," in 2020 IEEE Cloud Summit, Oct. 2020, pp. 170–171.
[3] Z. Zhao et al., "Object detection with deep learning: A review," IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 11, pp. 3212–3232, 2019.
[4] Y. Dai, W. Liu, H. Li, and L. Liu, "Efficient Foreign Object Detection Between PSDs and Metro Doors via Deep Neural Networks," IEEE Access, vol. 8, pp. 46723–46734, 2020.
[5] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), San Diego, CA, USA, Jun. 2005, pp. 886–893, doi: 10.1109/CVPR.2005.177.
[6] P. F. Felzenszwalb, D. A. McAllester, and D. Ramanan, "A discriminatively trained, multiscale, deformable part model," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Anchorage, AK, USA, Jun. 2008, pp. 24–26, doi: 10.1109/CVPR.2008.4587597.
[7] Z. Tu, Z. Guo, W. Xie, M. Yan, R. C. Veltkamp, B. Li, and J. Yuan, "Fusing disparate object signatures for salient object detection in video," Pattern Recognit., vol. 72, pp. 285–299, Dec. 2017, doi: 10.1016/j.patcog.2017.07.028.
[8] Z. Zou, Z. Shi, Y. Guo, and J. Ye, "Object detection in 20 years: A survey," 2019, arXiv:1905.05055. [Online]. Available: http://arxiv.org/abs/1905.05055.
[9] M. Takahashi, A. Moro, Y. Ji, and K. Umeda, "Expandable YOLO: 3D Object Detection from RGB-D Images," arXiv [cs.CV], Jun. 26, 2020.
[10] P. Li and H. Li, "Research on FOD Detection for Airport Runway based on YOLOv3," in 2020 39th Chinese Control Conference (CCC), Jul. 2020, pp. 7096–7099.
[11] M. Simon, S. Milz, K. Amende, and H. M. Gross, "Complex-YOLO: An Euler-Region-Proposal for Real-Time 3D Object Detection on Point Clouds," in Proc. of the European Conference on Computer Vision (ECCV), 2018, pp. 197–209.
[12] W. Ali, S. Abdelkarim, M. Zahran, M. Zidan, and A. E. Sallab, "YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud," arXiv preprint arXiv:1808.02350, 2018.
[13] L. Chen, B. Li, and L. Qi, "Improved YOLOv3 Algorithm for Ship Target Detection," in 2020 39th Chinese Control Conference (CCC), Jul. 2020, pp. 7288–7293.
[14] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.