Scalable and configurable echosounder data workflows

Name: Scalable and configurable echosounder data workflows
Start: 2024-04-11T09:00:00-05:00
Location: Brest, France

Valentina Staneva, Soham Kishor Butala, Wu-Jung Lee, Don Setiawan

Abstract

Acoustic fisheries surveys and ocean observing systems collect terabytes of echosounder data. Obtaining biological estimates of target species from these data requires custom processing pipelines, which are often hard to reuse or adapt. There is also a growing need to scale computations on local and cloud computing clusters, but doing so requires elaborate configuration of computing infrastructure and distributed computing libraries, as well as the ability to monitor progress and performance. In this talk, we describe how we address some of these challenges with a framework that lets researchers execute complex echosounder data processing procedures on both local and cloud platforms by editing text-based configuration “recipe” templates. We have created a user-friendly Python package, Echodataflow, that leverages Prefect, a modern workflow orchestration framework, to run large data pipelines (reading raw files, computing volume backscatter, performing frequency differencing, etc.) with only a few lines of code. We will demonstrate how we used Echodataflow to process ship data from the U.S.-Canada Pacific Hake Acoustic Trawl Survey and discuss other use cases. We believe that this approach will increase the reproducibility and transparency of fisheries acoustics data pipelines and allow the community to learn from each other’s work.
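
To make the recipe idea concrete, here is a minimal sketch of a configuration-driven pipeline in plain Python. It is not Echodataflow's API and does not use Prefect: the recipe layout, the stage names (read_raw, compute_sv, frequency_difference), the parameters, and the toy numbers are all hypothetical, chosen only to mirror the stages named in the abstract. A real recipe would presumably live in a text file, and the stage bodies would call actual echosounder processing code.

```python
from typing import Any, Callable, Dict

# Hypothetical recipe. In practice this would be a text file edited by the
# researcher; a plain dict keeps the sketch free of third-party dependencies.
RECIPE: Dict[str, Any] = {
    "name": "toy_pipeline",
    "stages": [
        {"name": "read_raw", "params": {"files": ["a.raw", "b.raw"]}},
        {"name": "compute_sv", "params": {"offset_db": -10.0}},
        {
            "name": "frequency_difference",
            "params": {"minuend": "120kHz", "subtrahend": "38kHz"},
        },
    ],
}

# Registry mapping stage names (as written in the recipe) to functions.
STAGES: Dict[str, Callable[..., Dict[str, Any]]] = {}


def stage(func: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Register a function so a recipe can refer to it by name."""
    STAGES[func.__name__] = func
    return func


@stage
def read_raw(data, files):
    # Placeholder: returns fixed made-up values instead of parsing real files.
    return {"files": list(files), "raw": {"38kHz": [1.0, 2.0], "120kHz": [4.0, 6.0]}}


@stage
def compute_sv(data, offset_db):
    # Placeholder for a calibration step: adds a constant offset per channel.
    sv = {ch: [v + offset_db for v in vals] for ch, vals in data["raw"].items()}
    return {**data, "sv": sv}


@stage
def frequency_difference(data, minuend, subtrahend):
    # Element-wise difference between two channels (values treated as dB).
    diff = [a - b for a, b in zip(data["sv"][minuend], data["sv"][subtrahend])]
    return {**data, "diff": diff}


def run_recipe(recipe: Dict[str, Any]) -> Dict[str, Any]:
    """Run each stage listed in the recipe, passing results down the chain."""
    data: Dict[str, Any] = {}
    for step in recipe["stages"]:
        func = STAGES[step["name"]]  # raises KeyError for an unknown stage
        data = func(data, **step.get("params", {}))
    return data


result = run_recipe(RECIPE)
print(result["diff"])
```

In this sketch, changing the workflow means editing the recipe (reordering stages, swapping channels, changing parameters) rather than the Python code. A workflow orchestrator such as Prefect would add scheduling, retries, and monitoring on top of this kind of structure.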

Event

WGFAST 2024 Meeting

Tags: open-source, pipeline, fisheries acoustics, community engagement

Valentina Staneva

Senior Data Scientist

Soham Kishor Butala

Software Data Operations Engineer at MAQ Software

Wu-Jung Lee

Principal Oceanographer

Don Setiawan

Research Software Engineer at UW eScience Institute
