2011
The coverage and volume of geo-referenced datasets are extensive and continuously growing. The systematic capture of geo-referenced information generates large volumes of spatio-temporal data to be analyzed. Clustering and visualization play a key role in exploratory data analysis and in the extraction of the knowledge embedded in these data. However, the special characteristics of these data pose new challenges for visualization and clustering: complex structures, a large number of samples, variables involved in a temporal context, high dimensionality, and large variability in cluster shapes. The central aim of my thesis is to propose new algorithms and methodologies for clustering and visualization that assist knowledge extraction from spatio-temporal geo-referenced data, thus improving decision-making processes.
I present two original algorithms: one for clustering, the Fuzzy Growing Hierarchical Self-Organizing Networks (FGHSON), and one for exploratory visual data analysis, the Tree-structured Self-Organizing Maps (SOM) Component Planes. In addition, I present methodologies that, combined with FGHSON and the Tree-structured SOM Component Planes, allow space and time to be integrated seamlessly and simultaneously in order to extract knowledge embedded in a temporal context.
The originality of the FGHSON lies in its capability to reflect the underlying structure of a dataset in a hierarchical fuzzy way. A hierarchical fuzzy representation of clusters is crucial when data include complex structures with large variability in cluster shapes, variances, densities and number of clusters. The most important characteristics of the FGHSON are: (1) it does not require the number of clusters to be set a priori; (2) the algorithm executes several self-organizing processes in parallel, so that, when dealing with large datasets, the processes can be distributed, reducing the computational cost; and (3) only three parameters are needed to set up the algorithm.
The novelty of the Tree-structured SOM Component Planes lies in its ability to create a structure that allows the visual exploratory analysis of large high-dimensional datasets. This algorithm creates a hierarchical structure of Self-Organizing Map Component Planes, arranging the projections of similar variables in the same branches of the tree. Hence, similarities in the behavior of variables can be easily detected (e.g. local correlations, maximal and minimal values, and outliers). Both FGHSON and the Tree-structured SOM Component Planes were applied to several agroecological problems and proved to be very efficient in the exploratory analysis and clustering of spatio-temporal datasets.
In this thesis I also tested three soft competitive learning algorithms. Two of them are well-known unsupervised soft competitive algorithms, namely the Self-Organizing Maps (SOMs) and the Growing Hierarchical Self-Organizing Maps (GHSOMs); the third is our original contribution, the FGHSON. Although the algorithms presented here have been used in several areas, to my knowledge no previous work has applied these techniques to spatio-temporal geospatial data and compared their performance, as is done in this thesis.
I propose original methodologies to explore spatio-temporal geo-referenced datasets through time. Our approach uses time windows to capture temporal similarities and variations by means of the FGHSON clustering algorithm. The developed methodologies are used in two case studies. In the first, the objective was to find similar agroecozones through time; in the second, it was to find similar environmental patterns shifted in time. Several results presented in this thesis have led to new contributions to agroecological knowledge, for instance in sugarcane and blackberry production.
Finally, in the framework of this thesis we developed several software tools: (1) a MATLAB toolbox that implements the FGHSON algorithm, and (2) a program called BIS (Bio-inspired Identification of Similar agroecozones), an interactive graphical user interface tool that integrates the FGHSON algorithm with Google Earth in order to show zones with similar agroecological characteristics.