Scalable and configurable echosounder data workflows

Abstract

Acoustic fisheries surveys and ocean observing systems collect terabytes of echosounder data, which require custom processing pipelines to obtain biological estimates of target species. These pipelines are often hard to reuse or adapt. There is also a rising need to scale computations on local and cloud computing clusters, but doing so requires elaborate configuration of computing infrastructure and distributed computing libraries, as well as the ability to monitor progress and performance. In this talk, we describe how we address some of these challenges by developing a framework that allows researchers to execute complex echosounder data processing procedures on both local and cloud platforms by editing text-based configuration “recipe” templates. We created a user-friendly Python package, Echodataflow, which leverages Prefect, a modern workflow orchestration framework, to run large data pipelines (reading raw files, computing volume backscatter, performing frequency differencing, etc.) with only a few lines of code. We will demonstrate how we used Echodataflow to process ship data from the U.S.-Canada Pacific Hake Acoustic Trawl Survey and discuss other use cases. We believe that this approach will increase the reproducibility and transparency of fisheries acoustics data pipelines and allow the community to learn from each other’s work.
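
To make the idea of a recipe-driven pipeline concrete, here is a minimal sketch in plain Python. It is not Echodataflow's or Prefect's API: the stage registry, the `run_recipe` helper, the recipe keys, and the sample values are all hypothetical, and a dictionary stands in for a text-based recipe file. The three stages only mimic the order of the steps named in the abstract (reading raw files, computing volume backscatter, frequency differencing) with toy arithmetic.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping recipe stage names to Python functions.
Stage = Callable[[Any, Dict[str, Any]], Any]
STAGES: Dict[str, Stage] = {}


def stage(name: str) -> Callable[[Stage], Stage]:
    """Register a function under the name a recipe uses to refer to it."""
    def register(func: Stage) -> Stage:
        STAGES[name] = func
        return func
    return register


@stage("open_raw")
def open_raw(_data: Any, options: Dict[str, Any]) -> Dict[str, list]:
    # Stand-in for reading raw files: returns made-up samples per frequency.
    return {freq: list(samples) for freq, samples in options["samples"].items()}


@stage("compute_sv")
def compute_sv(data: Dict[str, list], options: Dict[str, Any]) -> Dict[str, list]:
    # Stand-in for calibration: applies a constant dB offset to every sample.
    offset = options["offset_db"]
    return {freq: [s + offset for s in samples] for freq, samples in data.items()}


@stage("frequency_difference")
def frequency_difference(data: Dict[str, list], options: Dict[str, Any]) -> list:
    # Sample-wise difference (in dB) between two frequency channels.
    first = data[options["minuend"]]
    second = data[options["subtrahend"]]
    return [x - y for x, y in zip(first, second)]


def run_recipe(recipe: Dict[str, Any]) -> Any:
    """Run the stages in the order the recipe lists them, passing data along."""
    data = None
    for step in recipe["stages"]:
        data = STAGES[step["name"]](data, step.get("options", {}))
    return data


# Hypothetical recipe; in practice this would be loaded from a text file.
RECIPE = {
    "stages": [
        {"name": "open_raw",
         "options": {"samples": {"38kHz": [-70.0, -65.0], "120kHz": [-60.0, -62.0]}}},
        {"name": "compute_sv", "options": {"offset_db": -2.0}},
        {"name": "frequency_difference",
         "options": {"minuend": "120kHz", "subtrahend": "38kHz"}},
    ]
}

result = run_recipe(RECIPE)
print(result)
```

In this sketch, changing the processing chain means editing the recipe rather than the code, which is the property the abstract attributes to configuration-based workflows. A workflow orchestrator would additionally handle scheduling, retries, and monitoring, none of which is modeled here.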

Date
Apr 11, 2024 9:00 AM
Location
Brest, France
Valentina Staneva
Senior Data Scientist
Soham Kishor Butala
Research Software Engineer Intern

Interested in disrupting the norms and reshaping the digital landscape!

Wu-Jung Lee
Senior Oceanographer
Don Setiawan
Research Software Engineer

I am a Research Software Engineer at the University of Washington with a strong focus on designing, developing, and maintaining scientific data analysis systems. I contribute to various open source software projects. I learn new technologies and apply them in my work.
