Ask the Expert

Hannah Linder
Senior Manager, Data and Analysis

How do pipelines feed data into Global Fishing Watch’s map and how has the latest release standardized the process for future pipeline updates?

Our map is based on satellite data that contains key information on vessel identities and positions. A data pipeline takes this raw data and uses a series of automated processes to turn it into a format that can be used for visualization, analysis and reporting.

For example, one of our data pipelines uses information from vessels’ automatic identification system (AIS). This raw data consists of global position messages and separate identity messages that vessels broadcast each day. The data often has “noise” – errors or inaccuracies – meaning that some position and identity messages are incorrect, invalid or missing information. Before it can be used in our platform, the data must therefore be processed through our pipeline, where it is cleaned, organized and aggregated into distinct vessel tracks with merged vessel identity information. We can then apply additional information to the data, such as estimates of different vessel activities, like fishing, or specific indicators that are shared through our APIs and shown on our map.

A graphic shows data flowing from satellites, through the pipeline’s cleaning and organizing processes, into the Global Fishing Watch map: the AIS pipeline ingests raw data – global position messages and separate identity messages of vessels – removes errors and inaccuracies, then organizes and aggregates the data into streams for use in our map and through our APIs. Copyright: © 2024 Global Fishing Watch
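To make the cleaning-and-aggregation step more concrete, here is a minimal Python sketch of the general idea. The message fields, validity checks and track-building logic are illustrative assumptions only – the real pipeline also merges the separate identity messages and handles far more error cases than shown here.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class PositionMessage:
    mmsi: str         # vessel identifier broadcast with each AIS message (assumed field)
    timestamp: float  # seconds since epoch
    lat: float
    lon: float

def is_valid(msg: PositionMessage) -> bool:
    """Drop 'noisy' messages: missing identifiers or impossible coordinates."""
    if not msg.mmsi:
        return False
    return -90.0 <= msg.lat <= 90.0 and -180.0 <= msg.lon <= 180.0

def build_tracks(messages: list[PositionMessage]) -> dict[str, list[PositionMessage]]:
    """Group cleaned position messages into per-vessel tracks, ordered by time."""
    tracks: dict[str, list[PositionMessage]] = defaultdict(list)
    for msg in messages:
        if is_valid(msg):
            tracks[msg.mmsi].append(msg)
    for track in tracks.values():
        track.sort(key=lambda m: m.timestamp)
    return tracks
```

In this sketch, build_tracks(messages) returns one time-ordered track per reported vessel identifier, discarding any message that fails the validity checks – the same clean-then-aggregate pattern described above, in miniature.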

The Global Fishing Watch AIS pipeline has always been regularly updated and reviewed by our team. However, to improve accuracy and usability, we recently updated the pipeline through changes to its technical infrastructure, organization and underlying data. This marks the culmination of our first-ever end-to-end pipeline review – a two-year journey that included improving elements of processing and implementing quality assurance monitoring and evaluation throughout the entire pipeline.

These improvements include:

Modifying our processing methods and the architecture of the pipeline to ensure future stability and flexibility. As a result, we are much faster at resolving errors or inconsistencies in the pipeline. For example, it used to take nearly two months to rerun the entire pipeline – now it takes a few days.

Updating our boundary sources for the first time. Following an internal review, we have updated over half of our anchorage labels and connected boundary information to our AIS position information at a finer resolution for greater accuracy. We also updated to the latest version of the shapefiles for regional boundaries, which now include identifiable metadata for regional fisheries management organization (RFMO), marine protected area (MPA) and exclusive economic zone (EEZ) boundaries (see the sketch after this list).

Stabilizing our dataset naming, lineage and organization to provide more transparency internally and for users of our datasets.

Identifying and resolving a previously undetected error in our estimated vessel classes. This resulted in the classification of an additional 263,627 vessels, 29,619 of which we classified as fishing vessels.
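As a rough illustration of how boundary information can be connected to position data, here is a sketch using the open-source GeoPandas library. The file name and column names are hypothetical, and this is not Global Fishing Watch’s actual implementation – it simply shows the kind of spatial join that labels each vessel position with the boundary polygon it falls within.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical example data: one AIS position per row.
positions = gpd.GeoDataFrame(
    {"mmsi": ["123456789"], "geometry": [Point(2.5, 51.1)]},
    crs="EPSG:4326",
)

# Hypothetical shapefile of EEZ polygons with an "eez_name" metadata column.
eez = gpd.read_file("eez_boundaries.shp").to_crs("EPSG:4326")

# Spatial join: attach the metadata of the EEZ polygon each position falls within.
labeled = gpd.sjoin(positions, eez[["eez_name", "geometry"]],
                    how="left", predicate="within")
print(labeled[["mmsi", "eez_name"]])
```

Using polygons with richer metadata – and position data at finer resolution – makes such joins more accurate, which is the motivation behind the boundary updates described above.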

Changes to a pipeline can have meaningful impacts on the data products that our users apply to solve real-world problems. This makes it essential to sustain and improve our data health through regular pipeline releases. Global Fishing Watch has also grown significantly from a small start-up into a much larger organization and, with this growth, we now have the capacity to develop more consistent data governance structures, improve data communication and provide continual, thorough data quality assurance.

We now have the foundation to quickly resolve any errors that arise from the daily processing of an ever-growing dataset and to improve upon our data products. From this release forward, we will update our data versions approximately once per year. We will also continue to improve our communication to increase data accessibility and transparency for all users as we strive to ensure data quality, reliability and innovation.
