Implementing an Automated Processing System for Low-Frequency Streaming Data Using an Eclectic Approach
John H. Porter
Last modified: 2008-08-21
The path that streaming data follow from the sensor to a dataset or graph on the World-Wide Web has many steps, including ingestion, quality assurance, archival storage, and generation of products for display and download. The software available for these steps varies widely, and each product has its own strengths and weaknesses. No single piece of software is best at everything, although many have overlapping capabilities. For this reason, the Virginia Coast Reserve Long-Term Ecological Research Project has developed fully automated systems for processing low-frequency data (more than 0.1 hours, i.e., 6 minutes, between measurements) that build on the strengths of an eclectic mix of software products and computer systems.
This poster provides an overview of a system used to collect and process data from a small (10-node) network of water level recorders located on a Virginia barrier island. Hourly data are harvested from Campbell Scientific data loggers over serial and Internet Protocol (IP) wireless networks by proprietary LoggerNet software running on a PC at the Anheuser-Busch Coastal Research Center. Every few hours, the Windows scheduler runs a batch file that copies the downloaded files to a network-accessible directory on a Unix computer at the University of Virginia. There, the Statistical Analysis System (SAS) integrates the new data with existing data: it eliminates duplicates, converts data formats (e.g., dates and times) into standard forms, flags out-of-range values, and produces an integrated dataset for users to download. The integrated dataset also serves as input to R programs on a Linux-based web server, which produce a variety of graphical and textual statistical summaries that are posted automatically on the WWW.
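The integration step described above was carried out in SAS. The following is a minimal sketch of the same logic in Python, not the project's actual code. It assumes a hypothetical record layout of (station, year, day of year, HHMM time, water level in metres) and hypothetical range limits LEVEL_MIN and LEVEL_MAX. Records are keyed on station and timestamp, so re-processing overlapping download files adds no duplicates.

```python
"""Sketch of the integration step: merge new logger records into an
existing dataset, drop duplicates, standardize timestamps, flag values.

Assumptions (hypothetical, not from the original system): each record is
(station, year, day_of_year, hhmm, water_level_m), and valid water levels
lie between LEVEL_MIN and LEVEL_MAX metres.
"""
from datetime import datetime, timedelta

LEVEL_MIN = -1.0  # hypothetical lower bound (m)
LEVEL_MAX = 3.0   # hypothetical upper bound (m)


def to_timestamp(year, day_of_year, hhmm):
    """Convert year / day-of-year / HHMM fields into a datetime."""
    hours, minutes = divmod(int(hhmm), 100)
    # timedelta handles an HHMM of 2400 by rolling over to the next day.
    return datetime(int(year), 1, 1) + timedelta(
        days=int(day_of_year) - 1, hours=hours, minutes=minutes)


def integrate(existing, new_records):
    """Merge new_records into a copy of existing and return it.

    Both datasets map (station, datetime) -> row dict. A new record whose
    key is already present is treated as a duplicate and skipped.
    """
    merged = dict(existing)
    for station, year, doy, hhmm, level in new_records:
        ts = to_timestamp(year, doy, hhmm)
        key = (station, ts)
        if key in merged:
            continue  # duplicate from an overlapping download
        # "R" marks a value outside the hypothetical valid range.
        flag = "" if LEVEL_MIN <= level <= LEVEL_MAX else "R"
        merged[key] = {"station": station,
                       "datetime": ts.isoformat(sep=" "),
                       "level_m": level,
                       "flag": flag}
    return merged


if __name__ == "__main__":
    batch = [
        ("WL01", 2008, 234, 1300, 0.42),
        ("WL01", 2008, 234, 1300, 0.42),  # same record downloaded twice
        ("WL01", 2008, 234, 1400, 9.99),  # out of range, gets flagged
    ]
    dataset = integrate({}, batch)
    dataset = integrate(dataset, batch)  # re-running adds nothing new
    for key in sorted(dataset):
        print(dataset[key])
```

Keying on (station, timestamp) makes the step idempotent, which suits a scheduler that may copy the same logger files more than once.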
Systems of this type have three advantages. First, they require relatively simple programming, because each software product does what it does best and no esoteric programs are needed. Second, they can incorporate a variety of computers and operating systems, taking full advantage of whatever is available. Finally, they can operate unattended for months at a time, reliably providing data to users with minimal operator intervention.