
A High Performance Computational Environment for Hosting openModeller Framework

Jeferson Martin Araujo, Luis Carlos Trevelin, Pedro Correa, Antonio Saraiva, Liria Sato


Abstract


Balancing social and economic development with environmental conservation is a major challenge. There is a strong demand for software applications that determine the fundamental ecological niche of organisms. Such tools can help us understand the occurrence and distribution of biological species, for example invasive or endangered ones. The openModeller framework was developed to provide a software environment for conducting such analyses.

openModeller (oM) is a generic, open-source static spatial distribution modeling framework. Models are generated by an algorithm that receives as input a set of occurrence points (latitude/longitude) and a set of environmental layer files; any one of several algorithms can be used to produce a model and generate a probability surface. The algorithms evaluate the environmental conditions at known occurrence sites and then compute the preferred ecological niche. Ecological niche models play an important role in species distribution prediction, as they provide ways to study biodiversity distribution, past and present, to understand its causes, and to propose scenarios and strategies for sustainable use and for preservation initiatives.
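As a rough illustration of this workflow, the sketch below shows the two main steps in C++: fitting a model from occurrence points plus environmental layers, and projecting it over a grid to obtain the probability surface. All type and function names (Occurrence, EnvironmentalLayers, NicheAlgorithm, projectSurface) are hypothetical placeholders and do not mirror the actual openModeller API.

// Illustrative sketch only; types and functions are hypothetical, not the openModeller API.
#include <string>
#include <utility>
#include <vector>

struct Occurrence { double longitude, latitude; };                     // known presence point
struct EnvironmentalLayers { std::vector<std::string> rasterFiles; };  // climate, terrain, ...

class NicheAlgorithm {                                                 // e.g. GARP, Bioclim, ...
public:
    // Learn the preferred environmental conditions from the occurrence sample.
    virtual void fit(const std::vector<Occurrence>& presences,
                     const EnvironmentalLayers& environment) = 0;
    // Suitability of the conditions at one location (0 = unsuitable, 1 = ideal).
    virtual double suitability(double longitude, double latitude) const = 0;
    virtual ~NicheAlgorithm() = default;
};

// Project the fitted model over a grid of cells to obtain the probability surface.
std::vector<double> projectSurface(const NicheAlgorithm& model,
                                   const std::vector<std::pair<double, double>>& gridCells) {
    std::vector<double> surface;
    surface.reserve(gridCells.size());
    for (const auto& cell : gridCells)
        surface.push_back(model.suitability(cell.first, cell.second));
    return surface;
}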

However, to produce the model and generate the projected probability surface of species occurrences, the system makes intensive use of computational resources. The modeling process is complex and demands considerable processing power and time. To optimize the performance of oM, a careful study of the individual classes and of the typical execution flow chart, defined with the help of end users, was conducted.

In this study, the openModeller libraries were divided into several components, and each component was deployed and configured to communicate through an inter-process communication mechanism. Following this component distribution and configuration, the code was instrumented using Aspect Oriented Programming (AOP) techniques. This provided a detailed breakdown of how long each method in each component ran, and thus revealed which parts of a submitted job take the most time and consume the most computational resources, such as memory and processor. In this way, AOP and asynchronous message processing were useful in identifying the components that can be processed in parallel (as in the Parallel GARP development initiative) and in distributed ways.
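The fragment below is a minimal sketch of the kind of timing advice that such instrumentation weaves around component methods, written in AspectC++ syntax as one possible toolkit; the abstract does not name the AOP tool actually used, and the om namespace in the pointcut is an assumption.

// timing.ah -- minimal AspectC++-style sketch; toolkit choice and pointcut pattern are assumptions.
#include <chrono>
#include <cstdio>

aspect MethodTiming {
    // Match the execution of every member function of classes in the (assumed) om namespace.
    pointcut timed() = execution("% om::%::%(...)");

    advice timed() : around() {
        const auto start = std::chrono::steady_clock::now();
        tjp->proceed();                                   // run the original method
        const auto elapsed = std::chrono::steady_clock::now() - start;
        // Log the method signature and wall-clock time so hot spots can be ranked later.
        std::printf("%s ran for %lld us\n", JoinPoint::signature(),
                    static_cast<long long>(
                        std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count()));
    }
};

Ranking methods by these measurements is what points to candidates for parallel or distributed execution, such as the GARP runs mentioned above.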

These results were also used to propose a performance analysis model. The aim of this performance analysis model was to aid the development of an infrastructure able to deliver optimal performance, maximum system availability, and interoperability while minimizing financial costs. Through this infrastructure and architecture, it was possible to set up a highly scalable and available system that meets the needs of biological researchers and scientists and makes openModeller accessible to a wider range of users. The new openModeller system also offers several other benefits, such as remote and distributed processing, the ability to include new algorithms and functionality quickly, and improved processing speed to run more accurate simulations. Before the high-performance environment, running the modeling algorithms and building the projection on a researcher's desktop would have taken several days.