Datasets and Code: Vessel Identity

Transparency plays a central role in promoting the sustainability of marine resources and advancing ocean governance. Information that allows us to better understand vessel identity is a fundamental part of transparency. But such information is often fragmented, incomplete or inconsistent across multiple sources, and sometimes it is not publicly available at all.

At Global Fishing Watch, we combine over 30 public vessel registries worldwide and pair this information with the predictions of a machine learning model. The result is a comprehensive database of vessel identities for over 400,000 ships that broadcast their locations each year via the automatic identification system (AIS). The database comprises vessel characteristics (type and size), authorizations, and other information about the individuals who own or operate the vessels. While the majority of these vessels are non-fishing vessels, tens of thousands are industrial fishing vessels. Vessel identity data can provide fisheries authorities, researchers and policy-makers with vital information that can help build a more complete picture of fishing.

Vessel classes

Understanding the different types of vessels that operate on the ocean is a key step towards successful monitoring of human activity: cargo ships have different social and environmental implications than fishing vessels, and the impacts of different types of fishing vessels, like trawlers and drifting longlines, can differ as well. To date, we classify vessels into 40 different categories, of which 16 are types of fishing vessel.

Our vessel classes use a nested hierarchy that reflects increasing levels of confidence. For many fishing vessels, information from registries and our machine learning models allows us to assign a specific geartype, such as drifting longline or trawler. For many geartypes, including fixed gear and seiners, we first attempt to assign vessels to more specific categories before defaulting to broader classes. For other fishing vessels, incomplete, contradictory, or low-confidence information prevents us from assigning a specific vessel class, and instead we label the vessel class as simply “fishing”.

Some vessel types or fishing gears are interpreted differently across countries and regions. Our vessel classification does not represent all possible types of vessels around the world, but we are working to better align our vessel types, fishing gears and their nomenclature with those adopted by the Food and Agriculture Organization of the United Nations.

How it works

Compiling vessel registries

We collect information related to vessel identity from over 30 registries, either available in the public domain or obtained from authorities and researchers. The list includes public vessel registries from regional fisheries management organizations as well as country-level vessel registries. We also use lists of vessels that have been provided by other organizations or manually reviewed by Global Fishing Watch and our partners. The sources are regularly extracted to obtain updated information.

The collected registry records of vessels then undergo a process that matches them to identity records in AIS messages. The matching process is conducted by determining how similar a set of identity fields from two sources are to one another, including ship name, international radio call sign, International Maritime Organization number, flag State and Maritime Mobile Service Identifier (MMSI)—the identifier in AIS. To be paired as a match, multiple identity fields should generally match records from both sources without major conflicts of information in other fields. Once matched to AIS, these registry records are aggregated to produce synthesized information about vessels. We refer to these vessels that are listed on vessel registries as “known” fishing vessels.
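The matching rule described above can be illustrated with a minimal sketch. It is not Global Fishing Watch's implementation: the `Identity` record, the string-similarity measure, and the thresholds (`agree`, `conflict`, `min_agreeing`) are all hypothetical choices made for illustration. The sketch only encodes the stated idea that several identity fields must agree and that no compared field may show a major conflict.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical record layout; the field names are illustrative only.
@dataclass
class Identity:
    shipname: Optional[str] = None
    callsign: Optional[str] = None
    imo: Optional[str] = None
    flag: Optional[str] = None
    mmsi: Optional[str] = None

FIELDS = ("shipname", "callsign", "imo", "flag", "mmsi")

def normalize(value: Optional[str]) -> Optional[str]:
    """Uppercase and keep only letters and digits; empty values become None."""
    if value is None:
        return None
    cleaned = "".join(ch for ch in value.upper() if ch.isalnum())
    return cleaned or None

def field_similarity(a: Optional[str], b: Optional[str]) -> Optional[float]:
    """Similarity in [0, 1], or None when either side is missing."""
    a, b = normalize(a), normalize(b)
    if a is None or b is None:
        return None
    return SequenceMatcher(None, a, b).ratio()

def is_match(registry: Identity, ais: Identity,
             agree: float = 0.9, conflict: float = 0.5,
             min_agreeing: int = 2) -> bool:
    """Assumed rule: enough fields agree and no compared field conflicts."""
    agreeing = conflicting = 0
    for name in FIELDS:
        score = field_similarity(getattr(registry, name), getattr(ais, name))
        if score is None:
            continue  # a missing field is neither agreement nor conflict
        if score >= agree:
            agreeing += 1
        elif score < conflict:
            conflicting += 1
    return agreeing >= min_agreeing and conflicting == 0

registry_record = Identity(shipname="Ocean Star No. 7", callsign="ABCD1",
                           imo="1234567", flag="ESP")
ais_record = Identity(shipname="OCEAN STAR NO.7", callsign="ABCD1",
                      flag="ESP", mmsi="224000000")
conflicting_record = Identity(shipname="OCEAN STAR NO.7", callsign="ZZZZ9",
                              flag="ESP")
```

With these made-up records, `is_match(registry_record, ais_record)` is true because three compared fields agree, while `is_match(registry_record, conflicting_record)` is false because the call signs conflict. Treating missing fields as neutral is a design choice of this sketch, made because registries and AIS messages rarely populate the same set of fields.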

Vessel characterization

We use our vessel registry database to identify vessels in the AIS data that have a known vessel class. We then train a convolutional neural network, a cutting-edge form of machine learning model, to identify other vessels in the AIS data that behave in a similar fashion.

This vessel characterization model assigns every active MMSI in the AIS data to one of the 40 vessel classes distinguished by Global Fishing Watch, inferring the length, tonnage and engine power for each vessel. As a result, we are able to infer the vessel class and dimensions of tens of thousands of vessels in the AIS data for which we have no other information. We refer to vessels classified as fishing vessels by this method as “inferred” fishing vessels.

Self-reported fishing vessels and gear

When classifying vessels, we also consider the content of their AIS messages. The shiptype field is a two-digit number corresponding to the vessel’s activity. The full list of possible activities is available on Marine Traffic. About 70,000 vessels per year report that they are fishing and we refer to these vessels as “self-reported” fishing vessels. This information is broadly accurate, but because AIS devices require a manual input of information, there is potential for human error, and in some cases the shiptype is entered incorrectly. The margin of error also includes self-reported fishing vessels that are not actually fishing vessels, as well as fishing vessels that do not report as such.
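As a small illustration of the self-reported label, the sketch below flags an MMSI as self-reported fishing when the shiptype value it broadcasts most often is the fishing code. In the AIS standard, shiptype 30 denotes fishing. The majority rule and the function name are assumptions for this example and are not described in the text above.

```python
from collections import Counter
from typing import Iterable, Optional

AIS_SHIPTYPE_FISHING = 30  # AIS shiptype code for fishing

def self_reported_fishing(shiptypes: Iterable[Optional[int]]) -> bool:
    """Assumed rule: the most frequently broadcast shiptype is fishing."""
    codes = [code for code in shiptypes if code is not None]
    if not codes:
        return False  # no usable shiptype values were broadcast
    most_common_code, _count = Counter(codes).most_common(1)[0]
    return most_common_code == AIS_SHIPTYPE_FISHING
```

For example, `self_reported_fishing([30, 30, 70])` is true, while `self_reported_fishing([70, 70, 30])` is false. A majority rule tolerates the occasional mistyped value, although it cannot correct a shiptype that is entered wrongly in every message.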

While AIS devices are meant to correspond to vessels, many fishing boats attach AIS devices to fishing gear. We identify thousands of MMSI in the AIS data that likely represent gear and not vessels by examining the shipname field of an MMSI’s AIS messages, which often contains values like “NET MARK”, “BUOY”, and “FISHING NET”. We refer to these MMSI as “likely_gear.”
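The shipname check can be sketched as a keyword search. The three keywords come from the examples above; the normalization step, the function names, and the rule that a single matching name is enough are assumptions, and a real keyword list would likely be longer.

```python
import re
from typing import Iterable, Optional

# Keywords taken from the examples in the text; not an exhaustive list.
GEAR_KEYWORDS = ("NET MARK", "BUOY", "FISHING NET")

def normalize_shipname(name: str) -> str:
    """Uppercase and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^A-Z0-9]+", " ", name.upper()).strip()

def is_likely_gear(shipnames: Iterable[Optional[str]]) -> bool:
    """Assumed rule: any broadcast shipname containing a gear keyword."""
    for name in shipnames:
        if not name:
            continue  # skip missing or empty shipname values
        normalized = normalize_shipname(name)
        if any(keyword in normalized for keyword in GEAR_KEYWORDS):
            return True
    return False
```

Here `is_likely_gear(["net-mark 12%"])` is true and `is_likely_gear(["OCEAN STAR"])` is false. Plain substring matching is deliberately permissive and would also flag a vessel whose name merely contains one of the keywords, so a production rule would need more care.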

Using the best available information

After integrating all sources of information (known, inferred, self-reported, likely_gear), we attempt to select the best values for every MMSI in the AIS data. These “GFW” values are then assigned to that MMSI in all of our online and downloadable data products. This process is easy when vessel classes are consistent across sources. However, different sources can provide conflicting information. In these situations, we assign the MMSI to the most specific vessel class on which the sources agree, moving to a broader class in the hierarchy only as far as needed. For example, an MMSI inferred as a tuna purse seiner but registered simply as a purse seiner would be assigned a GFW class of purse seiner. If the inferred and known information for an MMSI disagree on whether a vessel is a fishing vessel at all, that MMSI is not assigned a vessel class.
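One way to picture this conflict rule is as a search for the nearest shared ancestor of two classes in the nested hierarchy. The sketch below is an assumption-laden illustration: the `PARENT` table is a tiny hypothetical excerpt, not the actual 40-class hierarchy, and the class names and the `reconcile` function are invented for the example.

```python
from typing import List, Optional

# Hypothetical excerpt of a nested class hierarchy (child -> parent).
PARENT = {
    "tuna_purse_seines": "purse_seines",
    "purse_seines": "seiners",
    "seiners": "fishing",
    "trawlers": "fishing",
    "drifting_longlines": "fishing",
    "cargo": "non_fishing",
    "fishing": None,
    "non_fishing": None,
}

def lineage(vessel_class: str) -> List[str]:
    """Return the class followed by its ancestors, most specific first."""
    chain = []
    current: Optional[str] = vessel_class
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain

def reconcile(known: str, inferred: str) -> Optional[str]:
    """Most specific class shared by both lineages, or None if none exists."""
    inferred_lineage = set(lineage(inferred))
    for candidate in lineage(known):
        if candidate in inferred_lineage:
            return candidate
    return None  # sources disagree on fishing versus non-fishing
```

With this excerpt, `reconcile("purse_seines", "tuna_purse_seines")` returns `"purse_seines"`, matching the example in the text, and `reconcile("trawlers", "drifting_longlines")` falls back to `"fishing"`. A call such as `reconcile("cargo", "trawlers")` returns `None`, which corresponds to leaving the vessel class unassigned.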

Limitations and caveats

Vessel registry information is not globally representative

The quantity and quality of our collected registry data vary by flag State, so our information about vessel identity is uneven across fleets. For instance, we have less complete information about Asian fleets because fewer Asian flag States make their vessel registries publicly available than European nations do. This gap is partly offset by other types of data, such as self-reported AIS data and our vessel characterization model. We continue to find more public sources of information as an increasing number of initiatives help make more vessel identity data publicly available.

Lack of smaller vessels

Our vessel identity data tend to include larger vessels, in particular vessels greater than 24 meters long, since larger vessels are more likely to be listed in regional or global public vessel registries. Apart from a few national-level registries that include small vessels (those less than 15 meters), most registries do not provide sufficient identity data for smaller vessels. Additionally, these small vessels are unlikely to use AIS due to lack of regulation, so we are unable to ascertain AIS-based identity information for them. We work with our partners to bring greater transparency to smaller vessels by encouraging broader use of AIS worldwide, assisting nations in opening up their vessel monitoring system data, and supporting initiatives that help small-scale fishery vessels become trackable.

Defining a vessel is challenging

A vessel in our data is generally defined by an MMSI number assigned to a vessel identity. While a vast majority of vessels, in particular fishing vessels, use their unique MMSIs for the entire duration of our data (2012 to the present), vessel identities—including MMSI numbers—associated with some vessel hulls change over time when a vessel changes its name or flag State, or it comes under new operation or ownership. Such changes can be tracked through IMO numbers, which are permanent unique identifiers that follow a vessel from construction to scrapping. For other vessels, however, we are currently unable to provide a complete track of identity changes due to lack of available information. We work toward identifying links between vessel identities associated with the same vessel hulls to share such information in the future.

Lastly, a fraction of vessels engage in behaviors that make their AIS data unreliable, such as by simultaneously broadcasting the same MMSI as another vessel, commonly referred to as “identity spoofing”. Our map and downloadable datasets omit these vessels in order to remove misleading information, as we continue to pursue ways to identify and correct the dataset for these behaviors.

Access our data

Data on vessel identity is available through several Global Fishing Watch tools:

Global Fishing Watch Map: Explore vessel identity information, including flag State, gear type, and vessel characteristics such as length, tonnage, and engine power.
APIs and packages: Access vessel identity data programmatically through the Global Fishing Watch Vessels API, as well as the gfwr package for R users and the Global Fishing Watch Python package. This is the same data that is available in the Map.
Data Download Portal: Download a complete static dataset and supporting documentation, including vessel identity attributes. This data differs from the data available in the Map and APIs.

Each platform offers access to different components of the vessel identity dataset. To determine which tool best fits your needs, refer to our data availability guide.
