Note: You are welcome to try the code with images of your own. This may require training the classifier on the new fruit. To do this, open Fruit (Colour Classifier)2.clf in the Color Classification Tool, open your new image, and tell the tool what the different fruits are.
In this project we implemented a fruit detection system for automatic harvesting. The system detects mature strawberries on the plants so that a robot can pluck the fruits and collect them on a tray. The image processing system uses MATLAB, and we trained deep neural networks for detection. The proposed algorithm works well and compares favourably with other algorithms in terms of accuracy.
Demonstration of (A) five different control mechanisms in V-REP, (B) inverse kinematics chain, (C, D) minimum distance calculation from tip vision sensor and plant/fruit model, (E) collision detection between robot links and plant model, and (F) path planning for moving a harvested fruit to the bin.
Messages appear in the Diagnostics Viewer. The code generator produces CUDA source and header files, and an HTML code generation report. The code generator places the files in a build folder (a subfolder named edgeDetection_grt_rtw under your current working folder).
This paper is organized as follows. Section 2 covers the background of the study, summarizing the role of CV methods applied in the domain of fruit detection and classification. Section 3 gives an overview of DL, i.e., the CNN model and the prerequisites for implementing a CNN model. Section 4 discusses fruit detection and classification as well as the various DL models applied in the field. We discuss the datasets used by different authors in Section 5. In Section 6, we discuss evaluation metrics, while Section 7 illustrates our experimental analysis of fruit classification based on a DL model using the popular Fruit 360 dataset. We further illustrate the use of transfer learning in fruit detection and classification and compare the results with CNN models trained and developed from scratch. Section 8 highlights future needs in the design of fruit detection and classification algorithms, and finally, we conclude our study in Section 9.
Recognition (detection and classification) can be interpreted in various ways: (i) identification of a fruit (differentiating a fruit from another object, e.g., a leaf or the background); (ii) classification of the fruit species (e.g., orange versus tangelo); (iii) recognition of varieties within a species (e.g., Crimson Snow apples versus Granny Smith apples).
Preprocessing and segmentation are vital steps in classification and detection. Since fruits vary in shape, size, color, and texture, preprocessing is the first and most important task in fruit detection and classification. During the preprocessing stage, captured images are preprocessed to remove background noise, thereby isolating the fruit, as shown in Figure 3. After that, most researchers convert the RGB image to a grayscale image before converting it to a binary image. Since the introduction of DL, feature extraction has been the most widely used preprocessing step after dataset acquisition. Techniques such as FCH, MI, and so on are used to extract fruit features (shape, color, and size) before converting them into feature vectors.
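As an illustration of the RGB-to-grayscale-to-binary conversion step, here is a minimal NumPy sketch. It assumes a standard luminance-weighted grayscale conversion and a fixed global threshold; real pipelines typically add noise filtering and adaptive thresholding.

```python
import numpy as np

def rgb_to_binary(rgb, threshold=128):
    """Convert an RGB image (H x W x 3, uint8) to grayscale, then binary."""
    # Standard luminance weights for RGB -> grayscale conversion.
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # Pixels brighter than the threshold become foreground (1), others 0.
    return (gray > threshold).astype(np.uint8)

# Toy 2x2 "image": one bright pixel, three dark pixels.
img = np.array([[[255, 255, 255], [10, 10, 10]],
                [[20, 20, 20],    [30, 30, 30]]], dtype=np.uint8)
mask = rgb_to_binary(img)
```

On the toy image, only the bright pixel survives the threshold, giving a single foreground pixel in the binary mask.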
This model was adopted by Sa et al. [71]. The purpose was to establish a neural network that autonomous fruit-harvesting robots could use. The network employs transfer learning from ImageNet and two input image types: RGB and NIR (near infrared). There were two ways to merge the RGB and NIR inputs: early fusion and late fusion. Early fusion combines the 3 RGB channels and 1 NIR channel into a single 4-channel input. Late fusion uses two separately trained models and sums their predictions. Chen et al. [40] likewise employed a faster region-based convolutional neural network for fruit detection in orchards and compared its performance against other architectures (VGG and ZFNet). The ZF network has five convolutional layers, while the VGG-16 network has 13 deeper layers. A 3-channel input image is propagated through a set of convolutional layers, from which regions of interest are extracted. Each box is passed through fully connected layers that output its class likelihood and regress a finer bounding box around the object. The ground truth of the input image is used during training in the RPN and the R-CNN layers. During testing, a class-specific detection threshold is applied, and non-maximum suppression is used to avoid overlapping detections.
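The two fusion strategies can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; array shapes (H x W x C images, per-class score vectors) are assumed for the example.

```python
import numpy as np

def early_fusion(rgb, nir):
    """Stack the 3 RGB channels and 1 NIR channel into one 4-channel input."""
    return np.concatenate([rgb, nir[..., None]], axis=-1)

def late_fusion(scores_rgb, scores_nir):
    """Combine two separately trained models by summing their class scores."""
    return scores_rgb + scores_nir

# Early fusion: a 4-channel tensor fed to a single network.
rgb = np.zeros((4, 4, 3))
nir = np.ones((4, 4))
fused_input = early_fusion(rgb, nir)

# Late fusion: sum the per-class scores of the two models.
fused_scores = late_fusion(np.array([1.0, 2.0]), np.array([3.0, 4.0]))
```

Early fusion trades a single (larger-input) network for one training run, while late fusion doubles the training cost but lets each modality specialize.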
Bargoti and Underwood [41] used VGG-16, whose weight layers include 13 convolutional layers, within a faster region-based convolutional neural network for deep fruit detection in orchards. The output of the convolutional layers is a high-dimensional feature map, downsampled by a factor of 16 due to the strides of the pooling layers. Over local feature-map regions, a box-regression layer and a box-classification layer are applied as two sibling fully connected branches.
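The stride-16 downsampling determines the spatial resolution at which proposals are made; a small helper makes the arithmetic explicit (a sketch that ignores padding effects):

```python
def feature_map_size(height, width, stride=16):
    """Spatial size of the conv feature map for a given input size,
    assuming a cumulative downsampling stride of 16 (four 2x2 poolings)."""
    return height // stride, width // stride

# Example: a 512 x 640 orchard image yields a 32 x 40 feature map,
# so box regression/classification operate on a 32 x 40 grid of locations.
grid = feature_map_size(512, 640)
```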
YOLO V2 was used by Xiong et al. [88] for the visual detection of green mangoes in orchards with the aid of an unmanned aerial vehicle (UAV). YOLO V2 consists of 19 convolutional layers and five max-pooling layers and achieves greater detection accuracy while preserving the detection speed of YOLO. YOLO V2 also appears in the work of Santos et al. [87] on a fruit recognition system, where it was used to identify, segment, and track grapes, among other things. For the assessment of fruit identification, they used the Embrapa Wine Grape Instance Segmentation Dataset (WGISD).
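Detectors of this kind produce many overlapping candidate boxes and prune them with non-maximum suppression. A minimal greedy sketch, assuming boxes in (x1, y1, x2, y2) format (an illustration of the general technique, not any one paper's code):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, then drop any remaining
    box whose overlap with it exceeds the IoU threshold; repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep

# Two heavily overlapping detections of the same fruit plus one distinct one.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = nms(boxes, scores=[0.9, 0.8, 0.7])
```

Here the second box overlaps the first well above the threshold, so only the first and third detections survive.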
We noted the use of the EfficientNet model in the work of Thi Phuong Chung and Van Tai [77] for a fruit recognition system. It uses a pretrained convolutional neural network as the base network for image-related tasks. EfficientNet first performs a grid search on the small base network to determine the relationships among the different scaling dimensions of the network (depth, width, and input resolution), taking into account the model size and the available computational resources.
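The compound-scaling idea can be sketched as follows. The constants alpha, beta, and gamma are the defaults published for EfficientNet (found by grid search on the base network, subject to alpha * beta^2 * gamma^2 ~ 2); the function itself is an illustration, not the authors' code.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """EfficientNet compound scaling: a single coefficient phi jointly
    scales network depth, width, and input resolution by fixed ratios."""
    depth_mult = alpha ** phi        # more layers
    width_mult = beta ** phi         # more channels per layer
    resolution_mult = gamma ** phi   # larger input images
    return depth_mult, width_mult, resolution_mult

# phi = 0 is the unscaled base network; larger phi spends a larger
# compute budget across all three dimensions at once.
base = compound_scale(0)
step = compound_scale(1)
```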
A single-shot CNN was employed by Bresilla et al. [78] for real-time fruit detection within a tree. A modified YOLO V2 model was employed here; the modification concerned the model's grid structure. They changed the standard input grid by removing some layers of the model, arriving at a new model that uses only 11 layers, a dual grid size, and two additional new blocks.
For the task of detecting and segmenting overlapping fruits with an apple harvesting robot, Jia et al. [90] introduced an optimized Mask R-CNN. They replaced the original backbone network of the Mask R-CNN with a combination of the ResNet and DenseNet network structures for feature extraction. The reason for the replacement was to increase feature reusability and transitivity while using fewer parameters, and they still obtained excellent performance. The feature map generated by this backbone was fed into the RPN to generate region proposals for each feature map. Finally, a fully convolutional network generates the mask showing the region where the apple is found. Mask R-CNN was also employed by Santos et al. [87] for grape detection, segmentation, and tracking using the Embrapa Wine Grape Instance Segmentation Dataset (WGISD). Yu et al. [81] likewise modified Mask R-CNN for fruit detection with a strawberry harvesting robot in a non-structured environment. The feature extractor was built on a ResNet-50 backbone with a feature pyramid network (FPN). After modelling the region proposals for each feature map, they trained the region proposal network end to end, generated the ripe-fruit mask images, and applied a visual localization method to find the strawberry picking points.
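As a simple illustration of deriving a picking point from a predicted mask, the centroid of a binary mask can serve as a crude stand-in for the visual localization step; the actual localization methods in these papers are more involved.

```python
import numpy as np

def picking_point(mask):
    """Centroid (row, col) of a binary fruit mask - a simplistic proxy
    for localizing a picking point from an instance-segmentation mask."""
    ys, xs = np.nonzero(mask)
    return ys.mean(), xs.mean()

# A 3x3 block of foreground pixels centred at (2, 2) in a 5x5 mask.
mask = np.zeros((5, 5), dtype=np.uint8)
mask[1:4, 1:4] = 1
point = picking_point(mask)
```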
Liu et al. [82] employed this method for kiwifruit detection. They pretrained the original convolutional layers of the VGG-16 network on the ImageNet dataset and fine-tuned them with the RGB and NIR images of the kiwifruit training dataset, denoted RGB-only and NIR-only accordingly.
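The transfer-learning setup can be sketched as a frozen feature extractor plus a trainable head. In this toy NumPy stand-in, a fixed random projection plays the role of the pretrained VGG-16 layers, and only a logistic-regression head is updated on the task-specific data; none of this is the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pretrained" extractor: a fixed random projection standing in for
# ImageNet-pretrained convolutional layers. It is never updated below.
W_frozen = rng.normal(size=(8, 4))

def features(x):
    return np.maximum(x @ W_frozen, 0.0)  # ReLU features

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, X, y):
    p = sigmoid(features(X) @ w)
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

# Toy task-specific data standing in for the fruit training set.
X = rng.normal(size=(64, 8))
y = (X[:, 0] > 0).astype(float)

F = features(X)      # frozen features: computed once, never retrained
w = np.zeros(4)      # trainable classification head
loss_before = log_loss(w, X, y)
for _ in range(200):                     # gradient descent on the head only
    p = sigmoid(F @ w)
    w -= 0.1 * F.T @ (p - y) / len(y)
loss_after = log_loss(w, X, y)
```

Fine-tuning in the papers goes further, also updating (some of) the pretrained layers with a small learning rate; the frozen-extractor variant shown here is the simplest form of transfer.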
This model was proposed by Xue et al. [109] for hybrid deep learning-based fruit classification. The model pretrains on images with a convolutional autoencoder (CAE) and extracts image features using an attention-based DenseNet (ADN). In the first part of the system, the greedy layerwise CAE is pretrained on an image set in an unsupervised manner. In the second part, the CAE structure initializes the weights and biases of the ADN, which is then trained in a supervised manner with the ground truth. The last part of the system predicts the fruit class.
Sampled images containing real-world information are referred to as datasets, and data acquisition is the process of collecting such images digitally. A high-quality dataset is important for obtaining a good classifier. The most difficult aspect of the detection task is the absence of sufficient labelled samples. During our research, we found that most researchers, especially in object detection, deal with real-time identification of fruits, mostly in orchards, and that each utilized his or her own dataset. We will briefly discuss some of the datasets deployed by researchers for fruit classification, focusing on the datasets that the authors made available online. Table 3 summarizes the datasets recorded in our reviewed papers that were not made available for other authors to use in their work. We will briefly discuss the datasets made available by the authors within our reviewed publications.