Datasets and Code: Anchorages

2020-03-20T17:40:08-04:00

Large, ocean-going vessels routinely carry a device, called the Automatic Identification System, or AIS, that transmits position and identity information in a near-continuous stream. The system was originally designed for collision avoidance, with vessels sharing their speed, course, and position with their neighbors. In recent years, it has become possible to detect these same transmissions with receivers on low-orbit satellites and at terrestrial installations, allowing us to monitor vessel movements. Global Fishing Watch and its research partners use these data to provide insights into the movements of individual fishing vessels and to understand patterns of fishing around the globe. These data can also show us where vessels congregate, and thus identify the locations of anchorages and ports.

Using AIS vessel positions since 2012, we developed, and continue to update, an anchorages/ports database by identifying locations where vessels congregate. The logic works like this:

  1. We apply a grid to the surface of the globe. Without special care, such a grid would have cells at the poles that encompass different areas than cells at the equator. However, using a type of grid made up of what are called s2 cells, we can produce a gridded overlay in which all grid cells have roughly the same area. (For more details on the s2 concept, see the links at the bottom of the page.) The size of each s2 cell is specified by a level, from 0 (grid cells roughly 9220 km on a side) to 30 (grid cells roughly 1 cm on a side). We use an s2 level of 14, resulting in grid cells roughly 0.5 km on a side. Each s2 cell in the grid has a unique identifier (s2id) which corresponds to the spatial location of that cell.
  2. Across this grid, we identify where individual vessels (specifically, individual MMSI, or Maritime Mobile Service Identity, numbers) remain stationary, defined as moving less than 0.5 km over at least 12 hours. If, within an s2 cell, at least 20 unique MMSI have remained stationary at some point since 2012, we identify this cell as an anchorage point. We then assign the location (lat/lon) of the anchorage as the mean location of all the stationary periods within that cell. Note that this means an anchorage location is not necessarily at the center of its s2 cell.
  3. As there is one anchorage point per s2 cell, each anchorage point is identified uniquely by its s2id, along with its position (latitude, longitude) in decimal degrees.
  4. The anchorages dataset continues to be extended by incorporating user-contributed anchorages, as well as regional or country-specific anchorage databases (such as one provided by the Indonesian Ministry of Marine Affairs and Fisheries). All contributed anchorages and their locations (lat/lon) take precedence over AIS-derived locations within a given s2 cell.
  5. In some cases, when many anchorages are adjacent, such as in large ports, it is useful to group anchorages together. We implement a simple grouping scheme by combining anchorage points located within 4 kilometers of one another into anchorage groups. The method and code for generating these groupings using BigQuery and Python are described here.
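
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production pipeline (which runs in BigQuery and Python): the function names, the input tuple layout, and the treatment of "within 4 kilometers of one another" as transitive linking are all assumptions made for the example. Stationary periods are taken as given input, and the cell-size helper simply divides the sphere's area by 6 faces and quarters it at each level, which reproduces the rough figures quoted in step 1.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def approx_cell_side_km(level):
    """Rough s2 cell side: sphere area / (6 faces * 4**level), square-rooted."""
    area_km2 = 4 * math.pi * EARTH_RADIUS_KM ** 2 / (6 * 4 ** level)
    return math.sqrt(area_km2)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def anchorage_points(stationary_periods, min_vessels=20):
    """Map s2id -> (mean_lat, mean_lon) for cells with enough unique MMSI.

    stationary_periods: iterable of (mmsi, s2id, lat, lon), one per
    stationary period (hypothetical layout). The plain mean of longitudes
    ignores the antimeridian, which is a simplification.
    """
    by_cell = defaultdict(list)
    for mmsi, s2id, lat, lon in stationary_periods:
        by_cell[s2id].append((mmsi, lat, lon))
    points = {}
    for s2id, periods in by_cell.items():
        unique_mmsi = {mmsi for mmsi, _, _ in periods}
        if len(unique_mmsi) >= min_vessels:
            n = len(periods)
            points[s2id] = (sum(p[1] for p in periods) / n,
                            sum(p[2] for p in periods) / n)
    return points


def group_anchorages(points, max_km=4.0):
    """Link points within max_km of each other (union-find).

    Returns s2id -> representative s2id of its group. Linking is
    transitive here, which is an assumption about the grouping scheme.
    """
    parent = {s2id: s2id for s2id in points}

    def find(s2id):
        while parent[s2id] != s2id:
            parent[s2id] = parent[parent[s2id]]  # path halving
            s2id = parent[s2id]
        return s2id

    ids = list(points)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if haversine_km(*points[a], *points[b]) <= max_km:
                parent[find(a)] = find(b)
    return {s2id: find(s2id) for s2id in ids}
```

The pairwise loop in group_anchorages is quadratic in the number of anchorage points, which is fine for a sketch but would need a spatial index or a database join at global scale.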

Anchorage Naming

The raw anchorage data is useful, but we have also sought to name each anchorage point (s2id) by referencing publicly available datasets and provisionally applying names to each anchorage. Often, a single port is made up of a number of different anchorages. We assigned names to anchorages, grouping anchorages into ports using a multistep process and several primary data sources:

  1. World Port Index. Current data on GitHub.
  2. Geonames 1000 database. Current data on GitHub.
  3. Top destination as reported in the AIS messages of stationary vessels that defined the anchorage.
  4. User-contributed names and regional port databases (such as the one from the Indonesian Ministry of Marine Affairs and Fisheries).

To name each anchorage (s2id) we used the following process:

  1. First, we apply any names from the manually reviewed/corrected and user-contributed anchorage names (the current list is available on GitHub here).
  2. For any unnamed anchorages, we identify those anchorage points that are within 4 km of a World Port Index (WPI) port (using Haversine distance), and assign the unnamed anchorage point the WPI port name.
  3. Next, if an anchorage is provided by a curated regional list and corresponds to an anchorage in our database (occurs within the same s2 cell), we assign the curated anchorage name to the anchorage in our database.
  4. For the remaining unnamed anchorages, we identify those that are within 4 km of a city in the Geonames 1000 database, and assign the anchorage point the Geonames 1000 city name.
  5. For those anchorage points that remain unnamed, we assign the top AIS destination name.
  6. The named dataset includes the same anchorage groups described above for the unnamed anchorages.
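
The naming order above amounts to a fallback chain in which the first source that yields a name wins. The sketch below illustrates that chain; the function names, argument names, and data layouts are hypothetical, and choosing the closest entry when several World Port Index ports or Geonames cities fall within 4 km is an assumption the text does not specify.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_name(lat, lon, gazetteer, max_km=4.0):
    """Name of the closest (name, lat, lon) entry within max_km, else None.

    Picking the closest entry is an assumption made for this sketch.
    """
    best = None
    for name, glat, glon in gazetteer:
        dist = haversine_km(lat, lon, glat, glon)
        if dist <= max_km and (best is None or dist < best[0]):
            best = (dist, name)
    return best[1] if best else None


def name_anchorage(s2id, lat, lon, *, reviewed, wpi, regional, cities,
                   top_destination):
    """Try each source in the documented order and return the first name.

    reviewed, regional, top_destination: dicts keyed by s2id.
    wpi, cities: lists of (name, lat, lon) tuples.
    """
    sources = (
        lambda: reviewed.get(s2id),              # 1. reviewed / user-contributed
        lambda: nearest_name(lat, lon, wpi),     # 2. WPI port within 4 km
        lambda: regional.get(s2id),              # 3. curated regional list
        lambda: nearest_name(lat, lon, cities),  # 4. Geonames city within 4 km
        lambda: top_destination.get(s2id),       # 5. top AIS destination
    )
    for source in sources:
        name = source()
        if name:
            return name
    return None  # no source produced a name
```

Wrapping each source in a lambda defers the distance searches until earlier sources have failed, so an anchorage with a reviewed name never triggers a gazetteer scan.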

The complete named anchorages dataset is available via our data download portal.

Details regarding s2 quad-tree hierarchies

https://docs.google.com/presentation/d/1Hl4KapfAENAOf4gv-pSngKwvS_jwNVHRPZTTDzXXn6Q/view#slide=id.i28

http://blog.christianperone.com/2015/08/googles-s2-geometry-on-the-sphere-cells-and-hilbert-curve/

http://schd.ws/hosted_files/user2017/32/talk.html#(4)