Publications

2020

  Conference   ASCENSAO, N.; AFONSO, L.; COLOMBO, D.; OLIVEIRA, L.; PAPA, J. P. Information Ranking Using Optimum-Path Forest. In: International Joint Conference on Neural Networks (IJCNN), IEEE World Congress on Computational Intelligence, Glasgow, 2020.

  Conference   FONTINELE, J.; MENDONÇA, M.; RUIZ, M.; PAPA, J.; OLIVEIRA, L. Faster α-expansion via dynamic programming and image partitioning. In: International Joint Conference on Neural Networks (IJCNN), IEEE World Congress on Computational Intelligence, Glasgow, 2020.

Abstract: Image segmentation is the task of assigning a label to each image pixel. When the number of labels is greater than two (multi-label), the segmentation can be modelled as a multi-cut problem in graphs. In the general case, finding the minimum cut in a graph is an NP-hard problem, and improving results in terms of time and quality is a major challenge. This paper addresses the multi-label problem applied to interactive image segmentation. The proposed approach makes use of dynamic programming to initialize an α-expansion, thus reducing its runtime while keeping the Dice-score measure in an interactive segmentation task. Over the BSDS data set, the proposed algorithm was approximately 51.2% faster than its standard counterpart, 36.2% faster than Fast Primal-Dual (FastPD) and 10.5 times faster than quadratic pseudo-Boolean optimization (QPBO) optimizers, while preserving the same segmentation quality.
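
The paper's initializer is described only at a high level above; as a hedged illustration of the dynamic-programming ingredient, the Python sketch below solves the multi-label energy exactly on a 1D chain of pixels via Viterbi-style DP, the kind of fast exact sub-solution that can seed an α-expansion on a full 2D grid. All names and the toy Potts model are ours, not the paper's.

    import numpy as np

    def chain_dp_labeling(unary, pairwise):
        """Exact multi-label energy minimization on a 1D chain (Viterbi DP).

        unary:    (n, k) cost of giving pixel i label l
        pairwise: (k, k) cost of adjacent labels (l, l')
        """
        n, k = unary.shape
        cost = unary[0].copy()              # best cost per label at pixel 0
        back = np.zeros((n, k), dtype=int)  # backpointers
        for i in range(1, n):
            trans = cost[:, None] + pairwise       # trans[l_prev, l_cur]
            back[i] = np.argmin(trans, axis=0)
            cost = trans[back[i], np.arange(k)] + unary[i]
        labels = np.zeros(n, dtype=int)
        labels[-1] = int(np.argmin(cost))
        for i in range(n - 1, 0, -1):
            labels[i - 1] = back[i, labels[i]]
        return labels

    # Toy run: 6 pixels, 3 labels, Potts smoothness term
    unary = np.random.rand(6, 3)
    potts = 0.5 * (1 - np.eye(3))
    print(chain_dp_labeling(unary, potts))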

  Journal   CHAGAS, P.; SOUZA, L.; ARAÚJO, I.; ALDEMAN, N.; DUARTE, A.; ANGELO, M.; DOS-SANTOS, W. L.; OLIVEIRA, L. Classification of glomerular hypercellularity using convolutional features and support vector machine. In: Artificial Intelligence in Medicine, v. 103, 101808, 2020.

Abstract: Glomeruli are histological structures of the kidney cortex formed by interwoven blood capillaries, and are responsible for blood filtration. Glomerular lesions impair kidney filtration capability, leading to protein loss and metabolic waste retention. An example of lesion is glomerular hypercellularity, which is characterized by an increase in the number of cell nuclei in different areas of the glomeruli. Glomerular hypercellularity is a frequent lesion present in different kidney diseases. Automatic detection of glomerular hypercellularity would accelerate the screening of scanned histological slides for the lesion, enhancing clinical diagnosis. With this in mind, we propose a new approach for classification of hypercellularity in human kidney images. Our proposed method introduces a novel architecture of a convolutional neural network (CNN) along with a support vector machine, achieving near-perfect average results on the FIOCRUZ data set in a binary classification (lesion or normal). Additionally, classification of hypercellularity sub-lesions was also evaluated, considering mesangial, endocapillary and both lesions, reaching an average accuracy of 82%. In both the binary and the multi-class tasks, our proposed method outperformed the Xception, ResNet50 and InceptionV3 networks, as well as a traditional handcrafted-based method. To the best of our knowledge, this is the first study on deep learning over a data set of glomerular hypercellularity images of human kidney.
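
The paper's CNN architecture is custom; the minimal sketch below shows the generic pipeline the abstract describes (convolutional features feeding an SVM), using an off-the-shelf ResNet-50 backbone as a stand-in. Backbone choice, input size and SVM settings are assumptions, not the paper's.

    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from sklearn.svm import SVC

    # Feature extractor: any pretrained CNN body works for this sketch.
    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()   # keep the 2048-d pooled features
    backbone.eval()

    preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                            T.Normalize([0.485, 0.456, 0.406],
                                        [0.229, 0.224, 0.225])])

    @torch.no_grad()
    def features(pil_images):
        batch = torch.stack([preprocess(im) for im in pil_images])
        return backbone(batch).numpy()

    # X_train: list of PIL glomerulus crops; y_train: 0 = normal, 1 = hypercellular
    # clf = SVC(kernel="rbf", C=1.0).fit(features(X_train), y_train)
    # pred = clf.predict(features(X_test))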

2019

  Journal   ARAÚJO, POMPÍLIO; FONTINELE, JEFFERSON; OLIVEIRA, L. Multi-perspective object detection for remote criminal analysis using drones. In: IEEE Geoscience and Remote Sensing Letters, 2019.

Abstract: When a crime is committed, the associated site must be preserved and reviewed by a criminal expert. Some tools are commonly used to ensure the complete registration of the crime scene with minimal human interference. As a novel tool, we propose here an intelligent system that remotely recognizes and localizes objects considered as important evidence at a crime scene. Starting from a general viewpoint of the scene, a drone system defines trajectories through which the aerial vehicle performs a detailed search to record evidence. A multiperspective detection approach is introduced by analyzing several images of the same object in order to improve the reliability of the object recognition. To our knowledge, it is the first work on remote autonomous sensing of crime scenes. Experiments showed an accuracy increase of 18.2 percentage points when using multiperspective detection.

  Conference   RUIZ, M.; FONTINELE, J.; PERRONE, R.; SANTOS, M.; OLIVEIRA, L. A Tool for Building Multi-purpose and Multi-pose Synthetic Data Sets. In: ECCOMAS THEMATIC CONFERENCE ON COMPUTATIONAL VISION AND MEDICAL IMAGE PROCESSING, Lecture Notes in Computational Vision and Biomechanics, 2019.

Abstract: Modern computer vision methods typically require expensive data acquisition and accurate manual labeling. In this work, we instead leverage the recent progress in computer graphics to propose a novel approach to designing and generating large-scale multi-purpose image data sets directly from 3D object models, captured from multiple categorized camera viewpoints and controlled environmental conditions. The set of rendered images provides data for geometric computer vision problems such as depth estimation, camera pose estimation, 3D box estimation, 3D reconstruction, camera calibration, and also pixel-perfect ground truth for scene understanding problems, such as semantic and instance segmentation and object detection, just to cite a few. In this paper, we also survey the most well-known synthetic data sets used in computer vision tasks, pointing out the relevance of rendering images for training deep neural networks. When compared to similar tools, our generator contains a wide set of features that are easy to extend, besides allowing for building sets of images in the MSCOCO format, thus ready for deep learning work. To the best of our knowledge, the proposed tool is the first one to generate large-scale, multi-pose, synthetic data sets automatically, allowing for training and evaluation of supervised methods for all of the covered features.
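
For the MSCOCO-format output the abstract mentions, a minimal annotation file looks like the sketch below; field names follow the public COCO spec, while the file, image and category names are made up for illustration.

    import json

    # Minimal MSCOCO-style annotation file (hypothetical content).
    coco = {
        "images": [
            {"id": 1, "file_name": "render_0001.png", "width": 640, "height": 480},
        ],
        "annotations": [
            {"id": 1, "image_id": 1, "category_id": 1,
             "bbox": [120.0, 80.0, 200.0, 150.0],     # [x, y, width, height]
             "area": 200.0 * 150.0, "iscrowd": 0,
             "segmentation": [[120, 80, 320, 80, 320, 230, 120, 230]]},
        ],
        "categories": [{"id": 1, "name": "engine_part", "supercategory": "object"}],
    }

    with open("instances_synthetic.json", "w") as f:
        json.dump(coco, f)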

  Conference   BARBOSA, L.; DAHIA, G.; PAMPLONA, M. Expression removal in 3D faces for recognition purposes. In: Brazilian Conference on Intelligent Systems, 2019.

Abstract: We present an encoder-decoder neural network to remove deformations caused by expressions from 3D face images. It receives a 3D face with or without expressions as input and outputs its neutral form. Our objective is not to obtain the most realistic results but to enhance the accuracy of 3D face recognition systems. To this end, we propose using a recognition-based loss function during training so that our network can learn to maintain important identity cues in the output. Our experiments using the Bosphorus 3D Face Database show that our approach successfully reduces the difference between face images from the same subject affected by different expressions and increases the gap between intraclass and interclass difference values. They also show that our synthetic neutral images improved the results of four different well-known face recognition methods.
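
A hedged sketch of what a "recognition-based loss" can look like (PyTorch): a reconstruction term plus an identity term computed by a frozen recognition network. The weighting, the L1 reconstruction term and the cosine identity term are our assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def recognition_based_loss(pred_neutral, gt_neutral, embed_net, alpha=0.5):
        """Reconstruction term plus an identity-preserving term.

        embed_net is any frozen face-recognition network mapping a 3D face
        (here, a depth-map tensor) to an identity embedding.
        """
        rec = F.l1_loss(pred_neutral, gt_neutral)
        with torch.no_grad():
            target_emb = embed_net(gt_neutral)
        ident = 1.0 - F.cosine_similarity(embed_net(pred_neutral), target_emb).mean()
        return rec + alpha * ident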

  Conference   Emeršič et al. The Unconstrained Ear Recognition Challenge 2019. In: IAPR International Conference on Biometrics, 2019.

Abstract: This paper presents a summary of the 2019 Unconstrained Ear Recognition Challenge (UERC), the second in a series of group benchmarking efforts centered around the problem of person recognition from ear images captured in uncontrolled settings. The goal of the challenge is to assess the performance of existing ear recognition techniques on a challenging large-scale ear dataset and to analyze the performance of the technology from various viewpoints, such as generalization abilities to unseen data characteristics, sensitivity to rotations, occlusions and image resolution, and performance bias on sub-groups of subjects selected based on demographic criteria, i.e. gender and ethnicity. Research groups from 12 institutions entered the competition and submitted a total of 13 recognition approaches ranging from descriptor-based methods to deep-learning models. The majority of submissions focused on ensemble-based methods combining either representations from multiple deep models or hand-crafted with learned image descriptors. Our analysis shows that methods incorporating deep learning models clearly outperform techniques relying solely on hand-crafted descriptors, even though both groups of techniques exhibit similar behaviour when it comes to robustness to various covariates, such as the presence of occlusions, changes in (head) pose, or variability in image resolution. The results of the challenge also show that there has been considerable progress since the first UERC in 2017, but that there is still ample room for further research in this area.

  Journal   MINETTO, R.; PAMPLONA, M.; SARKAR, S. Hydra: An Ensemble of Convolutional Neural Networks for Geospatial Land Classification. In: IEEE Transactions on Geoscience and Remote Sensing, 2019.

Abstract: In this paper, we describe Hydra, an ensemble of convolutional neural networks (CNNs) for geospatial land classification. The idea behind Hydra is to create an initial CNN that is coarsely optimized but provides a good starting point for further optimization, which will serve as the Hydra’s body. Then, the obtained weights are fine-tuned multiple times with different augmentation techniques, crop styles, and class weights to form an ensemble of CNNs that represent the Hydra’s heads. By doing so, we prompt convergence to different endpoints, which is a desirable aspect for ensembles. With this framework, we were able to reduce the training time while maintaining the classification performance of the ensemble. We created ensembles for our experiments using two state-of-the-art CNN architectures, residual networks (ResNet) and dense convolutional networks (DenseNet). We have demonstrated the application of our Hydra framework on two data sets, functional map of the world (fMoW) and NWPU-RESISC45, achieving results comparable to the state-of-the-art for the former and the best-reported performance so far for the latter. Code and CNN models are available at https://github.com/maups/hydra-fmow.
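
A minimal sketch of the inference side of such an ensemble: each fine-tuned head scores the batch and the class posteriors are averaged. Mean-of-softmax fusion is an assumption here; the paper may combine the heads differently.

    import torch

    @torch.no_grad()
    def hydra_predict(heads, batch):
        """Average class posteriors over all fine-tuned heads of the ensemble."""
        probs = torch.stack([head(batch).softmax(dim=1) for head in heads])
        return probs.mean(dim=0).argmax(dim=1)

    # heads = [resnet_a, resnet_b, densenet_a, ...]  # same coarse body, different fine-tunings
    # labels = hydra_predict(heads, images)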

  Journal   NEVES, G.; RUIZ, M.; FONTINELE, J.; OLIVEIRA, L. Rotated object detection with forward-looking sonar in underwater applications. In: Elsevier Expert Systems with Applications, 2019.

Abstract: Autonomous underwater vehicles (AUVs) are often used to inspect the condition of submerged structures in oil and gas fields. Because the use of global positioning systems to aid AUV navigation is not feasible, object detection is an alternative method of supporting underwater inspection missions by detecting landmarks. Objects are detected not only to plan the trajectory of the AUVs, but their inspection can be the ultimate goal of the mission. In both cases, detecting an object’s distance and orientation with respect to the AUV provides clues for the vehicle’s navigation. Accordingly, we introduce a novel multi-object detection system that outputs object position and rotation from sonar images to support AUV navigation. To achieve this aim, two novel convolutional neural network-based architectures are proposed to detect and estimate rotated bounding boxes: an end-to-end network (RBoxNet), and a pipeline comprised of two networks (YOLOv2+RBoxDNet). Both proposed networks are structured from one of three novel representations of rotated bounding boxes regressed deep inside. Experimental analyses were performed by comparing several configurations of our proposed methods (by varying the backbone, regression representation, and architecture) with state-of-the-art methods using real sonar images. Results showed that RBoxNet presents the optimum trade-off between accuracy and speed, reaching an averaged mAP@[.5,.95] of 90.3% at 8.58 frames per second (FPS), while YOLOv2+RBoxDNet is the fastest solution, running at 16.19 FPS but with a lower averaged mAP@[.5,.95] of 77.5%. Both proposed methods are robust to additive Gaussian noise variations, and can detect objects even when the noise level is up to 0.10.
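
The paper evaluates three of its own rotated-box representations, which are not reproduced here; as a generic illustration, the sketch below uses the common (center, size, angle) parameterization of a rotated bounding box and converts it to the four box corners.

    import numpy as np

    def rbox_to_corners(cx, cy, w, h, theta):
        """Convert a (center, size, angle) rotated box to its 4 corners.

        theta in radians; one of several possible parameterizations of a
        rotated bounding box.
        """
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
        return half @ R.T + np.array([cx, cy])

    print(rbox_to_corners(100.0, 50.0, 40.0, 20.0, np.pi / 6))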

  Journal   ARAUJO JR., P.; MENDONÇA, M.; OLIVEIRA, L. Towards Autonomous Investigation of Crime Scene by Using Drones. In: Sensors & Transducers, 2019.

Abstract: A location associated with a committed crime must be preserved, even before criminal experts start collecting and analyzing evidence. Indeed, crime scenes should be recorded with minimal human interference. In order to help specialists accomplish this task, we propose an autonomous system for investigation of a crime scene using a drone. Our proposed autonomous system recognizes objects considered as important evidence at a crime scene, defining the trajectories through which a drone performs a detailed search. We used our previously proposed method, called Air-SSLAM, to estimate the drone’s pose, as well as proportional-integral-derivative controllers for aircraft stabilization. The goal is to make the drone fly through the paths defined by the objects recognized across the scene. At the end, the proposed system outputs a report containing a list of evidence, sketches, images and videos collected during the investigation. The performance of our system is assessed in a simulator, and a real-life drone system is being prepared to reach the goal.
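
The stabilization loop mentioned above is a textbook PID controller; a minimal sketch follows, with hypothetical gains and set-point names (the paper does not publish its controller code).

    class PID:
        """Textbook PID controller, one instance per stabilized axis."""
        def __init__(self, kp, ki, kd):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral = 0.0
            self.prev_error = None

        def step(self, error, dt):
            self.integral += error * dt
            deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
            self.prev_error = error
            return self.kp * error + self.ki * self.integral + self.kd * deriv

    # Hypothetical altitude loop:
    # altitude_pid = PID(kp=1.2, ki=0.05, kd=0.4)
    # thrust = altitude_pid.step(target_alt - current_alt, dt=0.02)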

  Journal   ABDALLA, K; MENEZES, I.; OLIVEIRA, L. Modelling perceptions on the evaluation of video summarization. In: Elsevier Expert Systems with Applications, 2019.

Abstract: Hours of video are uploaded to streaming platforms every minute, with recommender systems suggesting popular and relevant videos that can help users save time in the searching process. Recommender systems regularly require video summarization as an expert system to automatically identify suitable video entities and events. Since there is no well-established methodology to evaluate the relevance of summarized videos, some studies have made use of user annotations to gather evidence about the effectiveness of summarization methods. Aimed at modelling the user’s perceptions, which ultimately form the basis for testing video summarization systems, this paper proposes: (i) a guideline to collect unrestricted user annotations, (ii) a novel metric called compression level of user annotation (CLUSA) to gauge the performance of video summarization methods, and (iii) a study on the quality of annotated video summaries collected from different assessment scales. These contributions lead to benchmarking video summarization methods with no constraints, even if user annotations are collected from different assessment scales for each method. Our experiments showed that CLUSA is less susceptible to unbalanced compression data sets in comparison to other metrics, hence achieving higher reliability estimates. CLUSA also allows comparing results from different video summarization approaches.

  Conference   ARAUJO JR., P.; MENDONÇA, M.; OLIVEIRA, L. AirCSI – Remotely Criminal Investigator. In: International Conference on Advances in Signal Processing and Artificial Intelligence (ASPAI'2019), Barcelona, Spain, 2019.

Abstract: Since a location associated with a committed crime must be preserved even before criminal experts start collecting and analyzing evidence, the crime scene should be recorded with minimal human interference. In this work, we introduce an autonomous system for investigation of a crime scene using a drone. Our proposed intelligent system recognizes objects considered as important evidence at the crime scene, and defines the trajectories through which the drone performs a detailed search to record evidence of the scene. We used our own method, called Air-SSLAM, to estimate the drone’s pose, as well as proportional–integral–derivative (PID) controllers for aircraft stabilization, while flying through the paths defined by the environment recognition step. We evaluated the performance of our system in a simulator, and are also preparing a real-drone system to work in a real environment.

2018

  Conference   ROTICH, G.; AAKUR, S.; MINETTO, R.; PAMPLONA, M.; SARKAR, S. Continuous Biometric Authentication using Possibilistic C-Means. In: IEEE International Conference on Fuzzy Systems, 2018.

Abstract: We propose a continuous biometric authentication framework that uses the Possibilistic C-Means (PCM) algorithm to guarantee that only authorized users can access a protected system. PCM is employed to cluster a history of biometric samples in two classes: genuine and impostor. The degree of membership of the current biometric sample to those classes is then used as a score, which is fused over time to reach a decision regarding the safety of the system. The main advantage of our approach is that it is training-free, and thus applicable without modification to any biometric feature that can be captured continuously. We evaluated our system using 2D, 3D and NIR videos of faces and achieved results comparable to a training-based state-of-the-art work.
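
A hedged sketch of the two ingredients the abstract names: the standard PCM typicality (membership) function, and one plausible temporal fusion rule (exponential smoothing, our assumption; the paper's exact fusion may differ).

    def pcm_membership(d2, eta, m=2.0):
        """Possibilistic C-Means typicality of a sample w.r.t. one cluster.

        d2:  squared distance between the sample and the cluster prototype
        eta: the cluster's bandwidth parameter; m: fuzzifier
        """
        return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))

    def fused_score(scores, decay=0.9):
        """Exponentially weighted fusion of per-frame scores over time."""
        out = scores[0]
        for s in scores[1:]:
            out = decay * out + (1 - decay) * s
        return out

    # per-frame genuine score = pcm_membership(d2_genuine, eta_genuine)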

  Conference   ROTICH, G.; AAKUR, S.; MINETTO, R.; PAMPLONA, M.; SARKAR, S. Using Semantic Relationships among Objects for Geospatial Land Use Classification. In: IEEE Applied Imagery Pattern Recognition Workshop, 2018.

Abstract: Geospatial land recognition is often cast as a local region-based classification problem. We show in this work that prior knowledge, in terms of global semantic relationships among detected regions, allows us to leverage semantics and visual features to enhance land use classification in aerial imagery. To this end, we first estimate the top-k labels for each region using an ensemble of CNNs called Hydra. Twelve different models based on two state-of-the-art CNN architectures, ResNet and DenseNet, compose this ensemble. Then, we use Grenander’s canonical pattern theory formalism coupled with the common-sense knowledge base ConceptNet to impose context constraints on the labels obtained by deep learning algorithms. These constraints are captured in a multi-graph representation involving generators and bonds with a flexible topology, unlike an MRF or Bayesian networks, which have fixed structures. Minimizing the energy of this graph representation results in a graphical representation of the semantics in the given image. We show our results on the recent fMoW challenge dataset. It consists of 1,047,691 images with 62 different classes of land use, plus a false detection category. The biggest improvement in performance with the use of semantics was for false detections. Other categories with significantly improved performance were: zoo, nuclear power plant, park, police station, and space facility. For the subset of fMoW images with multiple bounding boxes, the accuracy is 72.79% without semantics and 74.06% with semantics. Overall, without semantic context, the classification performance was 77.04%. With semantics, it reached 77.98%. Considering that less than 20% of the dataset contained more than one ROI for context, this is a significant improvement that shows the promise of the proposed approach.

  Journal   HANSLEY, EE.; PAMPLONA, M.; SARKAR, S. Employing fusion of learned and handcrafted features for unconstrained ear recognition. In: IET Biometrics, 2018.

Abstract: We present an unconstrained ear recognition framework that outperforms state-of-the-art systems on different publicly available image databases. To this end, we developed CNN-based solutions for ear normalization and description, we used well-known handcrafted descriptors, and we fused learned and handcrafted features to improve recognition. We designed a two-stage landmark detector that successfully worked under untrained scenarios. We used the results generated to perform a geometric image normalization that boosted the performance of all evaluated descriptors. Our CNN descriptor outperformed other CNN-based works in the literature, especially in more difficult scenarios. The fusion of learned and handcrafted matchers appears to be complementary, as it achieved the best performance in all experiments. The obtained results outperformed all other reported results for the UERC challenge, which contains the most difficult database available today.
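
Score-level fusion of a learned and a handcrafted matcher is often done by normalizing each score set and summing; a minimal sketch under that assumption (the paper's exact fusion rule may differ, and the equal weight is ours).

    import numpy as np

    def minmax(scores):
        s = np.asarray(scores, dtype=float)
        return (s - s.min()) / (s.max() - s.min() + 1e-12)

    def fuse(cnn_scores, handcrafted_scores, w=0.5):
        """Weighted sum of min-max normalized matcher scores."""
        return w * minmax(cnn_scores) + (1 - w) * minmax(handcrafted_scores)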

  Journal   SANTOS, M.; OLIVEIRA, L. ISEC: Iterative over-Segmentation via Edge Clustering. In: Elsevier Image and Vision Computing, 2018.

Abstract: Several image pattern recognition tasks rely on superpixel generation as a fundamental step. Image analysis based on superpixels facilitates domain-specific applications, also speeding up the overall processing time of the task. Recent superpixel methods have been designed to fit boundary adherence, usually regulating the size and shape of each superpixel in order to mitigate the occurrence of undersegmentation failures. Superpixel regularity and compactness sometimes impose an excessive number of segments in the image, which ultimately decreases the efficiency of the final segmentation, especially in video segmentation. We propose here a novel method to generate superpixels, called iterative over-segmentation via edge clustering (ISEC), which addresses the over-segmentation problem from a different perspective in contrast to recent state-of-the-art approaches. ISEC iteratively clusters edges extracted from the image objects, providing superpixels that are adaptive in size, shape and quantity, while preserving suitable adherence to the real object boundaries. All this is achieved at a very low computational cost. Experiments show that ISEC stands out from existing methods, meeting a favorable balance between segmentation stability and accurate representation of motion discontinuities, features that are especially suitable for video segmentation.

  Conference   JADER, G.; FONTINELE, J.; RUIZ, M.; ABDALLA, K.; PITHON, M.; OLIVEIRA, L. Deep instance segmentation of teeth in panoramic X-ray images. In: Conference on Graphics, Patterns and Images (SIBGRAPI'2018), Foz do Iguaçu, 2018.

Abstract: In dentistry, radiological examinations help specialists by showing the structure of the tooth bones, with the goal of screening embedded teeth, bone abnormalities, cysts, tumors, infections, fractures, problems in the temporomandibular regions, just to cite a few. Sometimes, relying solely on the specialist’s opinion can bring differences in the diagnoses, which can ultimately hinder the treatment. Although tools for complete automatic diagnosis are not yet expected, image pattern recognition has evolved towards decision support, mainly starting with the detection of teeth and their components in X-ray images. Tooth detection has been an object of research during at least the last two decades, mainly relying on threshold- and region-based methods. Following a different direction, this paper proposes to explore a deep learning method for instance segmentation of teeth. To the best of our knowledge, it is the first system that detects and segments each tooth in panoramic X-ray images. It is noteworthy that this image type is the most challenging one for isolating teeth, since it shows other parts of the patient’s body (e.g., chin, spine and jaws). We propose a segmentation system based on a mask region-based convolutional neural network to accomplish instance segmentation. Performance was thoroughly assessed on a challenging data set of 1,500 images, with high variation and containing 10 categories of different buccal image types. By training the proposed system with only 193 mouth images containing 32 teeth on average, using transfer learning strategies, we achieved 98% accuracy, 88% F1-score, 94% precision, 84% recall and 99% specificity over 1,224 unseen images, results far superior to those of 10 unsupervised methods.
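
A minimal inference sketch with an off-the-shelf Mask R-CNN from torchvision, standing in for the paper's model (which is trained with transfer learning on the 193 annotated panoramic X-rays mentioned above); the confidence threshold is an assumption.

    import torch
    import torchvision

    model = torchvision.models.detection.maskrcnn_resnet50_fpn(
        weights=torchvision.models.detection.MaskRCNN_ResNet50_FPN_Weights.DEFAULT)
    model.eval()

    @torch.no_grad()
    def segment_instances(image):        # image: (3, H, W) float tensor in [0, 1]
        out = model([image])[0]          # dict with boxes, labels, scores, masks
        keep = out["scores"] > 0.7       # hypothetical confidence threshold
        return out["boxes"][keep], out["masks"][keep]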

  Journal   SOUZA, L.; OLIVEIRA, L.; PAMPLONA, M.; PAPA, J. How far did we get in face spoofing detection? In: Elsevier Engineering Applications of Artificial Intelligence, 2018.

Abstract: The growing use of access control systems based on face recognition sheds light on the need for even more accurate systems to detect face spoofing attacks. In this paper, an extensive analysis of face spoofing detection works published in the last decade is presented. The analyzed works are categorized by their fundamental parts, i.e., descriptors and classifiers. This structured survey also brings a comparative performance analysis of the works considering the most important public data sets in the field. The methodology followed in this work is particularly relevant to observe the temporal evolution of the field and trends in the existing approaches, to discuss still open issues, and to propose new perspectives for the future of face spoofing detection.

  Journal   SILVA, G.; OLIVEIRA, L.; PITHON, M. Automatic segmenting teeth in X-ray images: Trends, a novel data set, benchmarking and future perspectives. In: Elsevier Expert Systems with Applications, 2018.

Abstract: This review presents an in-depth study of the literature on segmentation methods applied in dental imaging. Several works on dental image segmentation were studied and categorized according to the type of method (region-based, threshold-based, cluster-based, boundary-based or watershed-based), the type of X-ray images analyzed (intra-oral or extra-oral), and the characteristics of the data set used to evaluate the methods in each state-of-the-art work. We found that the literature has primarily focused on threshold-based segmentation methods (54%). 80% of the reviewed articles used intra-oral X-ray images in their experiments, demonstrating a preference for performing segmentation on images of already isolated parts of the teeth, rather than using extra-oral X-rays, which also show the tooth structure of the mouth and the bones of the face. To fill a scientific gap in the field, a novel data set based on extra-oral X-ray images, presenting high variability and a large number of images, is introduced here. A statistical comparison of the results of 10 pixel-wise image segmentation methods over our proposed data set comprised of 1,500 images is also carried out, providing a comprehensive source of performance assessment. Discussion of limitations of the benchmarked methods, as well as future perspectives on exploiting learning-based segmentation methods to improve performance, is also addressed. Finally, we present a preliminary application of the mask region-based convolutional neural network (Mask R-CNN) to demonstrate the power of a deep learning method to segment images from our data set.

2017

  Conference   DAHIA, G.; SANTOS, M. M. B.; PAMPLONA SEGUNDO, M. A study of CNN outside of training conditions. In: IEEE International Conference on Image Processing (ICIP2017), Beijing, 2017.

Abstract: Convolutional neural networks (CNNs) are the main development in face recognition in recent years. However, their description capacities have been somewhat understudied. In this paper, we show that training CNNs only with color images is enough to properly describe depth and near-infrared face images, by assessing the performance of three publicly available CNN models on these other modalities. Furthermore, we find that, despite displaying results comparable to human performance on LFW, not all CNNs behave like humans when recognizing faces in other scenarios.

  Journal   CERQUEIRA, R.; TROCOLI, T.; NEVES, G.; JOYEUX, S.; ALBIEZ, J.; OLIVEIRA, L. A novel GPU-based sonar simulator for real-time applications. In: Elsevier Computers and Graphics, 2017.

Abstract: Especially when applied in the underwater environment, sonar simulation requires great computational effort due to the complexity of acoustic physics. Simulation of sonar operation allows evaluating algorithms and control systems without going to the real underwater environment, which reduces the costs and risks of in-field experiments. This paper tackles the problem of real-time underwater imaging sonar simulation by using the OpenGL shading language chain on the GPU. Our proposed system is able to simulate two main types of acoustic devices: mechanical scanning imaging sonars and forward-looking sonars. The underwater scenario simulation is performed based on three frameworks: (i) OpenSceneGraph reproduces the ocean visual effects, (ii) Gazebo deals with physical forces, and (iii) the Robot Construction Kit controls the sonar in underwater environments. Our system exploits the rasterization pipeline in order to simulate the sonar devices by means of three parameters: the pulse distance, the echo intensity and the sonar field-of-view, all calculated over observable object shapes in the 3D rendered scene. Sonar-intrinsic operational parameters, speckle noise and object material properties are also considered as part of the acoustic image. Our evaluation demonstrated that the proposed system is able to operate close to, or faster than, the real-world devices. Also, our method generates visually realistic sonar images when compared with real-world sonar images of the same scenes.

  Journal   ARAÚJO, POMPÍLIO; MIRANDA, RODOLFO; CARMO, DIEDRE; ALVES, RAUL; OLIVEIRA, L. Air-SSLAM: A visual stereo indoor SLAM for aerial quadrotors. In: IEEE Geoscience and Remote Sensing Letters, 2017.

Abstract: In this letter, we introduce a novel method for visual simultaneous localization and mapping (SLAM) – so-called Air-SSLAM – which exploits a stereo camera configuration. In contrast to monocular SLAM, scale definition and 3D information are issues that can be more easily dealt with using stereo cameras. Air-SSLAM starts by computing keypoints and the corresponding descriptors over the pair of images, using good-features-to-track and rotated binary robust independent elementary features, respectively. Then a map is created by matching each pair of right and left frames. Long-term map maintenance is continuously performed by analyzing the quality of each match, as well as by inserting new keypoints into uncharted areas of the environment. Three main contributions can be highlighted in our method: (i) a novel method to match keypoints efficiently, (ii) three quality indicators with the aim of speeding up the mapping process, and (iii) map maintenance with uniform distribution performed by image zones. Using a drone equipped with a stereo camera, flying indoors, the translational average error with respect to a marked ground truth was computed, demonstrating promising results.
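
A sketch of the keypoint front end described above, using OpenCV's good-features-to-track corners described by ORB's rotated BRIEF and brute-force Hamming matching between rectified frames; all parameter values are assumptions, and the paper's efficient matching scheme is its own.

    import cv2

    def stereo_keypoint_matches(left, right, max_pts=500):
        """left, right: rectified grayscale frames (np.uint8)."""
        orb = cv2.ORB_create()
        kps, des = {}, {}
        for name, img in (("L", left), ("R", right)):
            pts = cv2.goodFeaturesToTrack(img, maxCorners=max_pts,
                                          qualityLevel=0.01, minDistance=7)
            kp = [cv2.KeyPoint(float(x), float(y), 7) for [[x, y]] in pts]
            kps[name], des[name] = orb.compute(img, kp)   # rotated BRIEF descriptors
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        return matcher.match(des["L"], des["R"]), kps["L"], kps["R"]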

  Conference   CARMO, DIEDRE; ALVES, RAUL; OLIVEIRA, L. Face identification based on synergism of classifiers in rectified stereo images. In: Workshop of Undergraduate Works (SIBGRAPI'2017), Niteroi, 2017.

Abstract: This paper proposes a method to identify faces from a stereo camera. Our approach tries to avoid common problems that come with using only one camera, which arise when detecting from a relatively unstable view in real-world applications. The proposed approach exploits a local binary pattern (LBP) to describe the faces in each image of the stereo camera, after detecting the face using the Viola-Jones method. The LBP histogram then feeds multilayer perceptron (MLP) and support vector machine (SVM) classifiers to identify the faces detected in each stereo image, considering a database of target faces. Computational cost problems due to the use of dual cameras are alleviated by the use of co-planar rectified images, achieved through calibration of the stereo camera. Performance is assessed on the well-established Yale face dataset, using either one or both camera images.
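
A minimal sketch of the LBP-histogram-plus-classifier pipeline named above, using scikit-image and scikit-learn; the LBP parameters and classifier settings are assumptions.

    import numpy as np
    from skimage.feature import local_binary_pattern
    from sklearn.svm import SVC
    from sklearn.neural_network import MLPClassifier

    def lbp_histogram(gray_face, p=8, r=1):
        """Uniform LBP histogram of a cropped grayscale face."""
        lbp = local_binary_pattern(gray_face, P=p, R=r, method="uniform")
        hist, _ = np.histogram(lbp, bins=p + 2, range=(0, p + 2), density=True)
        return hist

    # X = np.array([lbp_histogram(f) for f in face_crops]); y = identities
    # svm = SVC(kernel="linear").fit(X, y)
    # mlp = MLPClassifier(hidden_layer_sizes=(64,)).fit(X, y)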

2016

  Conference   CERQUEIRA, R.; TROCOLI, T.; NEVES, G.; OLIVEIRA, L.; JOYEUX, S.; ALBIEZ, J. Custom Shader and 3D Rendering for computationally efficient Sonar Simulation. In: XIX Conference on Graphics, Patterns and Images (SIBGRAPI), Sao Jose dos Campos, 2016. 4 p.

Abstract: This paper introduces a novel method for simulating underwater sonar sensors by vertex and fragment processing. The virtual scenario used is composed of the integration between the Gazebo simulator and the Robot Construction Kit (ROCK) framework. A 3-channel matrix with depth and intensity buffers and angular distortion values is extracted from OpenSceneGraph 3D scene frames by shader rendering, and subsequently fused and processed to generate the synthetic sonar data. To export and display simulation resources, this approach was written in C++ as ROCK packages. The method is evaluated on two use cases: the virtual acoustic images from a mechanical scanning sonar and forward-looking sonar simulations.

  Journal   FRANCO, A.; OLIVEIRA, L. Convolutional covariance features: Conception, integration and performance in person re-identification. In: Pattern Recognition, 2016.

Abstract: This paper introduces a novel type of features based on covariance descriptors – the convolutional covariance features (CCF). Differently from the traditional, handcrafted way of obtaining covariance descriptors, CCF is computed from adaptive and trainable features, which come from a coarse-to-fine transfer learning (CFL) strategy. CFL provides generic-to-specific knowledge and noise-invariant information for person re-identification. After training the deep features, convolutional and flat features are extracted from, respectively, intermediate and top layers of a hybrid deep network. Intermediate-layer features are then wrapped in covariance matrices, composing the so-called CCF, which are integrated with the top-layer features, called here flat features. The integration of CCF and flat features was demonstrated to improve the proposed person re-identification in comparison with the use of the component features alone. Our person re-identification method achieved the best top-1 performance when compared with 18 other state-of-the-art methods over the VIPeR, i-LIDS, CUHK01 and CUHK03 data sets. The compared methods are based on deep learning, covariance descriptors, or handcrafted features and similarity functions.
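
The core CCF operation, wrapping intermediate-layer activations in a covariance matrix, can be sketched as below: each spatial position is treated as a C-dimensional sample and the descriptor is their C x C covariance (how the paper integrates this with the flat features is not reproduced here).

    import numpy as np

    def covariance_descriptor(feature_maps):
        """Covariance matrix of intermediate-layer CNN activations.

        feature_maps: (C, H, W) activations from one image.
        """
        C = feature_maps.shape[0]
        X = feature_maps.reshape(C, -1)            # C x (H*W) samples
        X = X - X.mean(axis=1, keepdims=True)
        return (X @ X.T) / (X.shape[1] - 1)        # C x C covariance

    cov = covariance_descriptor(np.random.rand(64, 16, 8))
    print(cov.shape)    # (64, 64)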

  Conference   FRANCO, A.; OLIVEIRA, L. A coarse-to-fine deep learning for person re-identification. In: IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, New York, 2016.

Abstract: This paper proposes a novel deep learning architecture for person re-identification. The proposed network is based on a coarse-to-fine learning (CFL) approach, attempting to acquire generic-to-specific knowledge throughout a transfer learning process. The core of the method relies on a hybrid network composed of a convolutional neural network and a deep belief network denoising autoencoder. This hybrid network is in charge of extracting features invariant to illumination variation, certain image deformations, horizontal mirroring and image blurring, and is embedded in the CFL architecture. The proposed network achieved the best results when compared with other state-of-the-art methods on the i-LIDS, CUHK01 and CUHK03 data sets, and also a competitive performance on the VIPeR data set.

  Conference   TROCOLI, T.; OLIVEIRA, L. Using the scene to calibrate the camera. In: XIX Conference on Graphics, Patterns and Images (SIBGRAPI), Sao Jose dos Campos, 2016. 7 p.

Abstract: Surveillance cameras are used in public and private security systems. Typical systems may contain a large number of different cameras, installed in different locations. Manual calibration of every single camera in the network becomes an exhausting task. Although we can find methods that semiautomatically calibrate a static camera, to the best of our knowledge there is no fully automatic calibration procedure so far. To fill this gap, we propose here a novel framework for completely automatic calibration of static surveillance cameras, based on information from the scene (environment and walkers). Characteristics of the method include robustness to walkers’ pose and to camera location (pitch, roll, yaw and height), and rapid camera parameter convergence. For a thorough evaluation of the proposed method, the walkers’ foot-head projection, the length of the lines projected on the ground plane and the walkers’ real heights were analyzed over public and private data sets, demonstrating the potential of the proposed method.

2015

  Conference   CARMO, D.; JOVITA, R.; FERRARI, R.; OLIVEIRA, L. A study on multi-view calibration methods for RGB-D cameras. In: Workshop of Undergraduate Works (SIBGRAPI'2015), Salvador, 2015. 6 p.

Abstract: RGB-D cameras have become part of our daily life in applications such as human-computer interfaces and game interaction, just to cite a few. Because of their easy programming interface and response precision, such cameras have also been increasingly used for 3D reconstruction and movement analysis. In view of that, calibration of multiple cameras is an essential task. On that account, the goal of this paper is to present a preliminary study of methods which tackle the problem of multi-view geometry computation using RGB-D cameras. A brief overview of camera geometry is presented, some methods of calibration are discussed and one of them is evaluated in practice; finally, some important points are addressed about practical issues involving the problem.
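
One standard building block for extrinsically calibrating a pair of RGB-D cameras, given corresponding 3D points seen by both, is the Kabsch least-squares rigid alignment, sketched below; this is a generic method offered for context, not necessarily one of those evaluated in the paper.

    import numpy as np

    def rigid_transform(P, Q):
        """Least-squares rigid transform (R, t) with Q ≈ R @ P + t,
        from corresponding 3D points. P, Q: arrays of shape (3, N)."""
        cp, cq = P.mean(axis=1, keepdims=True), Q.mean(axis=1, keepdims=True)
        H = (P - cp) @ (Q - cq).T                  # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                         # proper rotation (det = +1)
        t = cq - R @ cp
        return R, t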

  Conference   NOBRE, T.; OLIVEIRA, L. Finger phalanx detection and tracking by contour analysis on RGB-D images. In: Workshop of Works in Progress (SIBGRAPI'2015), Salvador, 2015. 4 p.

Abstract: In this paper, we propose a method for identification of the finger phalanges based on the analysis of the hand contour in RGB-D sensors. The proposed method is able to partially identify and track the kinematic structure of the fingers. Tracking was performed using the ORB algorithm to match points between a template with some hand images (in different poses) and the captured image. Principal component analysis was performed to compute the hand orientation relative to the image plane. The system will be used as a starting point for full tracking of the articulated movement of the fingers.

  Conference   CANÁRIO, J. P.; OLIVEIRA, L. Recognition of Facial Expressions Based on Deep Conspicuous Net. In: Iberoamerican Congress on Pattern Recognition, Salvador, 2015. 8 p.

Abstract: Facial expression has an important role in human interaction and non-verbal communication. Hence more and more applications that automatically detect facial expressions are becoming pervasive in various fields, such as education, entertainment, psychology, human-computer interaction, behavior monitoring, just to cite a few. In this paper, we present a new approach for facial expression recognition using a so-called deep conspicuous neural network. The proposed method builds a conspicuous map of face regions, training it via a deep network. Experimental results achieved an average accuracy of 90% over the extended Cohn-Kanade data set for seven basic expressions, demonstrating the best performance against four state-of-the-art methods.

  Conference   PAMPLONA SEGUNDO, M.; LEMES, P. R. Pore-based ridge reconstruction for fingerprint recognition. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, 2015. 6 p.

Abstract: The use of sweat pores in fingerprint recognition is becoming increasingly popular, mostly because of the wide availability of pores, which provides complementary information for matching distorted or incomplete images. In this work we present a fully automatic pore-based fingerprint recognition framework that combines both pores and ridges to measure the similarity of two images. To obtain the ridge structure, we propose a novel pore-based ridge reconstruction approach by considering a connect-the-dots strategy. To this end, Kruskal’s minimum spanning tree algorithm is employed to connect consecutive pores and form a graph representing the ridge skeleton. We evaluate our framework on the PolyU HRF database, and the obtained results are favorably compared to previous results in the literature.
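
A minimal sketch of the connect-the-dots idea with SciPy: build a minimum spanning tree over detected pore coordinates (the paper uses Kruskal's algorithm) and keep only short edges as the ridge skeleton; the pruning threshold is our assumption.

    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree
    from scipy.spatial.distance import pdist, squareform

    def ridge_skeleton_edges(pores, max_len=25.0):
        """MST over pore coordinates; overly long edges (likely bridging
        different ridges) are pruned. pores: (N, 2) array."""
        dist = squareform(pdist(pores))            # pairwise pore distances
        mst = minimum_spanning_tree(dist).toarray()
        i, j = np.nonzero((mst > 0) & (mst < max_len))
        return list(zip(i, j))                     # edges of the skeleton graph

    edges = ridge_skeleton_edges(np.random.rand(30, 2) * 100)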

  Conference   SANTOS, M.; OLIVEIRA, L. Context-supported Road Information for Background Modeling. In: XVIII Conference on Graphics, Patterns and Images (SIBGRAPI), Salvador, 2015. 8 p.

Abstract: Background subtraction methods commonly suffer from incompleteness and instability in many situations. A model tuned for fast updating when objects move quickly is not reliable when objects stop in the scene, and it is easy to find examples where the contrary is also true. In this paper we propose a novel method for unsupervised background modeling, designated Context-supported ROad iNformation (CRON), which deals with stationary foreground objects while presenting fast background updating. Differently from general-purpose methods, our method was specially conceived for traffic analysis, being stable in several challenging circumstances in urban scenarios. To assess the performance of the method, a thorough analysis was accomplished, comparing the proposed method with many others and demonstrating promising results in our favor.

  Journal   VIEIRA, J. P.; CARMO, D.; JOVITA, Y.; OLIVEIRA, L. A proposal of a non-intrusive, global movement analysis of hemiparesis treatment. In: Journal of Communication and Information Systems (Online), v. 30, n. 1, 2015. 11 p.

Abstract: Hemiparesis is the most disabling condition after a stroke. Hemiparetic individuals suffer from a loss of muscle strength on one side of the body, resulting in a decreased capacity to perform movements. To assess the quality of Physiotherapy treatment, rating scales are commonly used, but with the shortcoming of being subjective. With the aim of developing a system that objectively measures how a hemiparetic individual is responding to a Physiotherapy treatment, this paper proposes a method to analyze human functional movement by means of an apparatus comprised of multiple low-cost RGB-D cameras. After extrinsically calibrating the cameras, the system should be able to build a composite skeleton of the target patient, to globally analyze the patient’s movement according to a reachable workspace and specific energy. The latter two are proposed to be carried out by tracking the hand movements of the patient and the movement volume produced. Here we present the concept of the proposed system, as well as the idea of its parts.

Index Terms: movement volume; hemiparesis; RGB-D cameras; Kinect; specific energy; reachable workspace.

2014

  Conference   LEMES, R. P.; PAMPLONA SEGUNDO, M.; BELLON, O. R. P.; SILVA, L. Dynamic Pore Filtering for Keypoint Detection Applied to Newborn Authentication. In: 22nd International Conference on Pattern Recognition (ICPR), Stockholm, 2014. 6 p.

Abstract: We present a novel method for newborn authentication that matches keypoints in different interdigital regions from palmprints or footprints. The method then hierarchically combines the scores for authentication. We also present a novel pore detector for keypoint extraction, named Dynamic Pore Filtering (DPF), which does not rely on expensive processing techniques and adapts itself to different sizes and shapes of pores. We evaluated our pore detector using four different datasets. The results obtained by the DPF on newborn dermatoglyphic patterns (2,400 ppi) are comparable to the state-of-the-art results for adult fingerprint images with 1,200 ppi. For authentication, we used four datasets acquired by two different sensors, achieving true acceptance rates of 91.53% and 93.72% for palmprints and footprints, respectively, with a false acceptance rate of 0%. We also compared our results to our previous approach on newborn identification, and we considerably outperformed its results, increasing the true acceptance rate from 71% to 98%.

  Conference   VIEIRA, J. P.; CARMO, D.; FERREIRA, R.; MIRANDA, J. G.; OLIVEIRA, L. Analysis of Human Activity By Specific Energy of Movement Volume in Hemiparetic Individuals. In: XVII Conference on Graphics, Patterns and Images (SIBGRAPI), Workshop on Vision-based Human Activity Recognition, Rio de Janeiro, 2014. 7 p.

Abstract: Hemiparesis is the most disabling condition after a stroke. Hemiparetic individuals suffer from a loss of muscle strength on one side of the body, resulting in a decreased capacity to perform movements. To assess the quality of Physiotherapy treatment, rating scales are commonly used, but with the drawback of being subjective. With the aim of developing a system that objectively measures how a hemiparetic individual is responding to a Physiotherapy treatment, this paper proposes a method to analyze human functional movement by means of an apparatus comprised of multiple low-cost RGB-D cameras. The idea is to first reconstruct the human body from multiple points of view, stitching them all together, and, by isolating the movement of interest, track a movement volume and its specific energy in order to compare the same activity “before” and “after” treatment. With that, we intend to avoid common problems related to errors in the calculation of joints and angles. Here we present the concept of our system, as well as the idea of its parts.

  Journal   OLIVEIRA, L.; COSTA, V.; NEVES, G.; OLIVEIRA, T.; JORGE, E.; LIZARRAGA, M. A mobile, lightweight, poll-based food identification system. In: Pattern Recognition, v. 47, i. 5, p. 1941-1952, 2014.

Abstract: Even though there are many reasons that can lead to people being overweight, experts agree that ingesting more calories than needed is one of them. But besides the appearance issue, being overweight is actually a medical concern because it can seriously affect a person’s health. Losing weight then becomes an important goal, and one way to achieve it is to burn more calories than ingested. The present paper addresses the problem of food identification based on image recognition as a tool for dietary assessment. To the best of our knowledge, this is the first system totally embedded into a camera-equipped mobile device capable of identifying and classifying meals – that is, pictures which have multiple types of food placed on a plate. Considering the variability of the environmental conditions the camera will be in, the identification process must be robust. It must also be fast, sustaining very low wait-times for the user. In this sense, we propose a novel approach which integrates segmentation and learning in a multi-ranking framework. The segmentation is based on a modified region-growing method which runs over multiple feature spaces. These multiple segments feed support vector machines, which rank the most probable segment corresponding to a type of food. Experimental results demonstrate the effectiveness of the proposed method on a cellphone.

2013

  Journal   GRIMALDO, J.; SCHNITMAN, L.; OLIVEIRA, L. Constraining image object search by multi-scale spectral residue analysis. In: Pattern Recognition Letters, v. 39, p. 31-38, 2013.

Abstract: Using an object detector over a whole image can require significant processing time. This is so since the majority of images, in common scenarios, are composed of non-trivial amounts of background information, such as sky, ground and water. To alleviate this computational load, image search space reduction methods can make the detection procedure focus on more distinctive image regions. In this sense, we propose the use of saliency information to organize regions based on their probability of containing objects. The proposed method is grounded on a multi-scale spectral residue (MSR) analysis for saliency detection. For better search space reduction, our method enables fine control of the search scale, presents more robustness to variations in saliency intensity along an object’s length, and also provides a straightforward way to control the balance between search space reduction and false negatives, both being a consequence of region selection. MSR was capable of making object detection three to five times faster compared to the same detector without MSR. A thorough analysis was accomplished to demonstrate the effectiveness of the proposed method, using a custom LabelMe dataset of person images and also the Pascal VOC 2007 dataset, containing several distinct object classes.
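
For reference, the single-scale spectral residue of Hou and Zhang, on which MSR builds, can be sketched in a few lines; the multi-scale analysis and the region-organization step described above are the paper's own and are not shown here.

    import numpy as np
    from scipy.ndimage import uniform_filter, gaussian_filter

    def spectral_residual_saliency(gray):
        """Single-scale spectral residue saliency map. gray: 2D float array."""
        f = np.fft.fft2(gray)
        log_amp = np.log(np.abs(f) + 1e-12)
        residual = log_amp - uniform_filter(log_amp, size=3)  # drop the smooth part
        sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * np.angle(f)))) ** 2
        return gaussian_filter(sal, sigma=2.5)                # smooth the map

    sal_map = spectral_residual_saliency(np.random.rand(64, 64))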

  Conference   FRANCO, A.; LIMA, R.; OLIVEIRA, L. Person Classification in Images: An Unbiased Analysis from Multiple Poses and Situations. In: Simposio Brasileiro de Automacao Inteligente (SBAI), Fortaleza, 2013. 6 p.

Abstract: Person classification is one of the most important study topics in the field of image pattern recognition. Over the past decades, novel methods have evolved, and object features and classifiers have been created. Applications such as person detection and tracking, in intelligent transportation systems or video surveillance, benefit from person classification in real-life applications. Nevertheless, for such systems to be employed, there is a need to assess their performance to ensure that they will be effective in practice. From plots of classification performance to real-life applications, there seems to be a gap not yet closed, since a near-perfect performance curve is no guarantee of a flawless detection system. In this paper, we present a thorough study toward comprehending why person classifiers are so perfect in plots but not yet completely successful in practice. For that, several features (histogram of oriented gradients (HOG), pyramid HOG, local binary pattern, local phase quantization and Haar-like) and two of the most applied classifiers (support vector machine and adaptive boosting) are analyzed over the 2012 person classification Pascal VOC dataset with 27,647 cropped images, grouped into 8 person poses and situations. By relying on receiver operating characteristic and precision-recall tools, it was observed that person classification, in several poses and situations, demonstrated two different dominant performances, and even different variances between those two performance tools. One main conclusion drawn from the present study is that there is an inherently biased analysis when assessing the performance of a newly proposed method. Important guesses are given in the direction of explaining why most classification performance analyses are somewhat biased.

  Conference   DUARTE, C.; SOUZA, T.; ALVES, R.; SCHWARTZ, W. R.; OLIVEIRA, L. Re-identifying People based on Indexing Structure and Manifold Appearance Modeling. In: XVI Conference on Graphics, Patterns and Images (SIBGRAPI), Arequipa, 2013. 8 p.

Abstract: The role of person re-identification has increased in recent years due to the large camera networks employed in surveillance systems. The goal in this case is to identify individuals that have been previously identified by a different camera. Even though several approaches have been proposed, there are still challenges to be addressed, such as illumination changes, pose variation, low acquisition quality, appearance modeling and the management of the large number of subjects being monitored by the surveillance system. The present work tackles the last problem by developing an indexing structure based on inverted lists and a predominance filter descriptor, with the aim of ranking the candidates most likely to be the target person. From this initial ranking, a stronger classification is performed by means of a mean Riemannian covariance method, which is based on an appearance-modeling strategy. Experimental results show that the proposed indexing structure returns an accurate shortlist containing the most likely candidates, and that the manifold appearance model is able to place the correct candidate among the initial ranks in the identification process. The proposed method is comparable to other state-of-the-art approaches.

  Conference   SANTOS, M.; LINDER, M.; SCHNITMAN, L.; NUNES, U.; OLIVEIRA, L. Learning to segment roads for traffic analysis in urban images. In: IEEE Intelligent Vehicles Symposium, Gold Coast City, 2013. 6 p.

Abstract: Road segmentation plays an important role in many computer vision applications, either for in-vehicle perception or traffic surveillance. In camera-equipped vehicles, road detection methods are being developed for advanced driver assistance, lane departure, and aerial incident detection, just to cite a few. In traffic surveillance, segmenting road information brings special benefits: automatically delimiting regions for traffic analysis (consequently speeding up flow analysis in videos), helping with the detection of driving violations (improving contextual information in videos of traffic), and so forth. Methods and techniques can be used interchangeably for both types of application. In particular, we are interested in segmenting road regions from the remainder of an image, aiming to support traffic flow analysis tasks. In our proposed method, road segmentation relies on superpixel detection based on a novel edge density estimation method; in each superpixel, priors are extracted from features of gray-amount, texture homogeneity, traffic motion and horizon line. A feature vector with all those priors feeds a support vector machine classifier, which ultimately takes the superpixel-wise decision of being road or not. A dataset of challenging scenes was gathered from traffic video surveillance cameras in our city to demonstrate the effectiveness of the method.

  Conference   OLIVEIRA, L.; NUNES, U. Pedestrian detection based on LIDAR-driven sliding window and relational parts-based detection. In: IEEE Intelligent Vehicles Symposium, Gold Coast City, 2013. 6 p.

Abstract: Most standard image object detectors are comprised of one or multiple feature extractors or classifiers within a sliding-window framework. Nevertheless, this type of approach has demonstrated very limited performance on datasets of cluttered scenes and real-life situations. To tackle these issues, the LIDAR space is exploited here in order to detect 2D objects in 3D space, avoiding all the inherent problems of regular sliding-window techniques. Additionally, we propose a relational parts-based pedestrian detection in a probabilistic non-i.i.d. framework. With the proposed framework, we have achieved state-of-the-art performance on a pedestrian dataset gathered in a challenging urban scenario. The proposed system demonstrated superior performance in comparison with pure sliding-window-based image detectors.

  Conference   ANDREWS, S.; OLIVEIRA, L.; SCHNITMAN, L.; SOUZA, F. (Best Paper) Highway Traffic Congestion Classification Using Holistic Properties. In: 15th International Conference on Signal Processing (ICSP), Pattern Recognition and Applications, Amsterdam, 2013. 8 p.

Abstract: This work proposes a holistic method for highway traffic video classification based on vehicle crowd properties. The method classifies traffic congestion into three classes: light, medium and heavy. This is done using the average crowd density and crowd speed. First, crowd density is estimated by background subtraction, and crowd speed is estimated by the pyramidal Kanade-Lucas-Tomasi (KLT) tracker algorithm. Classification of these features with neural networks shows 94.50% accuracy in experimental results on 254 highway traffic videos from the UCSD data set.
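
A minimal sketch of the two holistic features with OpenCV: foreground ratio from background subtraction (density) and mean pyramidal KLT displacement (speed). The MOG2 subtractor and all parameter values are assumptions standing in for the paper's exact choices.

    import cv2
    import numpy as np

    mog = cv2.createBackgroundSubtractorMOG2()
    prev_gray, prev_pts = None, None

    def traffic_features(frame):
        """Per-frame crowd density (foreground ratio) and mean KLT speed."""
        global prev_gray, prev_pts
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        fg = mog.apply(frame)
        density = np.count_nonzero(fg) / fg.size
        speed = 0.0
        if prev_gray is not None and prev_pts is not None and len(prev_pts):
            nxt, st, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, prev_pts, None)
            good = st.ravel() == 1
            if good.any():
                speed = float(np.linalg.norm(nxt[good] - prev_pts[good],
                                             axis=2).mean())
        prev_gray = gray
        prev_pts = cv2.goodFeaturesToTrack(gray, 200, 0.01, 7)
        return density, speed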

2012

  Conference   SOUZA, T.; SCHNITMAN, L.; OLIVEIRA, L. Eigen analysis and gray alignment for shadow detection applied to urban scene image. In: IEEE International Conference on Intelligent Robots and Systems (IROS) Workshop on Planning, Perception and Navigation for Intelligent Vehicles, Vilamoura, 2012.

Abstract: Urban scene analysis is very useful for many intelligent transportation systems (ITS), such as advanced driver assistance, lane departure control and traffic flow analysis. All these systems are prone to any kind of noise, which ultimately harms system performance. Considering shadow as a noise problem, it may represent the critical line between the success or failure of an ITS framework. Therefore, shadow detection usually provides benefits for further stages of machine vision applications in ITS, although its practical use usually depends on the computational load of the detection system. To cope with those issues, a novel shadow detection method applied to urban scenes is proposed in this paper. This method is based on a measure of energy defined by the summation of the eigenvalues of image patches. The final decision on whether an image region contains a shadow is made according to a new metric for unsupervised classification, called here gray alignment. The characteristics of the proposed method include no supervision, very low computational cost and mathematical background unification, which make the method very effective. Our proposed approach was evaluated on two public datasets.

  Conference   SILVA, G.; SCHNITMAN, L.; OLIVEIRA, L. Multi-Scale Spectral Residual Analysis to Speed up Image Object Detection. In: XV Conference on Graphics, Patterns and Images (SIBGRAPI), Ouro Preto, 2012. 8 p.

Abstract: Accuracy in image object detection has usually been achieved at the expense of much computational load. Therefore, a trade-off between detection performance and fast execution commonly represents the ultimate goal of an object detector in real-life applications. In the present work, we propose a novel method toward that goal. The proposed method is grounded on a multi-scale spectral residual (MSR) analysis for saliency detection. Compared to a regular sliding-window search over the images, in our experiments, MSR was able to reduce by 75% (on average) the number of windows to be evaluated by an object detector. The proposed method was thoroughly evaluated over a subset of the LabelMe dataset (person images), improving detection performance in most cases.

  Conference   SILVA, C.; SCHNITMAN, L.; OLIVEIRA, L. Detecção de Landmarks em Imagens Faciais Baseada em Informações Locais [Detection of Landmarks in Facial Images Based on Local Information]. In: XIX Congresso Brasileiro de Automática (CBA), Campina Grande, 2012.

Abstract: This paper proposes a method for the detection of 19 facial points of interest (landmarks). Most methods available in the literature for detecting facial points fall into two main categories: global and local. Global methods are usually able to detect various landmarks simultaneously and robustly, while local methods can often detect landmarks more quickly. The method presented here is based on local information and is composed of several processing stages for the detection of landmarks that describe the eyes, eyebrows and mouth. The experimental results demonstrate that the proposed method achieved results compatible with the ASM technique.