Generic

read_bufr(path, reader='generic', columns=[], filters={}, required_columns=True)

Extract the specified columns from BUFR as a pandas.DataFrame using a hierarchical collector.

Parameters:
  • path (str, bytes, os.PathLike or a message list object) – path to the BUFR file or a message list object

  • columns (str, sequence[str]) – a list of BUFR keys and computed keys to extract from each BUFR message/subset. Please note that computed keys do not preserve their position in columns but are placed at the end of the resulting DataFrame.

  • filters (dict) – defines the conditions when to extract the specified columns. The individual conditions are combined together with the logical AND operator to form the filter. See Filters for details.

  • required_columns (bool, iterable[str]) –

    the list of ecCodes BUFR keys that are required to be present in the BUFR message/subset. It has a twofold meaning:

    • if any of the keys in required_columns is missing in the message/subset, the whole message/subset is skipped

    • if all the keys in required_columns are present, the message/subset is processed even if some keys from columns are missing (provided the filter conditions are met)

    Bool values are interpreted as follows:

    • True means all the keys in columns are required: if any of the keys in columns is missing in the message/subset, the whole message/subset is skipped.

    • False means no columns are required
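The interplay between filters and required_columns described above can be sketched in plain Python. This is only an illustration of the documented rules, not pdbufr's implementation: the helper names (resolve_required, filters_match, is_extracted) are hypothetical, a message/subset is modelled as a plain dict of already decoded keys, and only single values and lists of allowed values are modelled as filter conditions (see Filters for the forms pdbufr actually accepts).

```python
def resolve_required(columns, required_columns=True):
    """Turn the bool/iterable form of required_columns into a set of keys."""
    if isinstance(columns, str):
        columns = [columns]
    if required_columns is True:
        return set(columns)  # every requested column is required
    if required_columns is False:
        return set()  # nothing is required
    return set(required_columns)


def filters_match(values, filters):
    """Combine the individual conditions with a logical AND."""
    for key, condition in filters.items():
        # Assumption: a condition is a single value or a collection of allowed values.
        allowed = condition if isinstance(condition, (list, tuple, set)) else [condition]
        if key not in values or values[key] not in allowed:
            return False
    return True


def is_extracted(values, columns, filters=None, required_columns=True):
    """Decide whether a toy message/subset would be processed or skipped."""
    required = resolve_required(columns, required_columns)
    return required.issubset(values) and filters_match(values, filters or {})


# Toy subset that lacks "airTemperature" (hypothetical data).
subset = {"latitude": 58.47, "longitude": -78.08, "pressure": 100300.0}
columns = ["latitude", "longitude", "airTemperature"]

skipped_by_default = not is_extracted(subset, columns)
kept_when_optional = is_extracted(subset, columns, required_columns=False)
kept_with_partial_requirement = is_extracted(subset, columns, required_columns=["latitude"])
```

With required_columns=True the toy subset is skipped because "airTemperature" is missing, while required_columns=False or required_columns=["latitude"] lets it through with the missing column left empty.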

Return type:

pandas.DataFrame

How the generic reader works

The generic reader interprets each BUFR message/subset as a hierarchical structure (see Hierarchical structure for details). During data extraction pdbufr traverses this hierarchy, and whenever all the columns are collected and all the filters match, a new record is added to the output. As a result, several records can be extracted from the same message/subset.
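The idea can be illustrated with a small, self-contained sketch. It is not pdbufr's actual collector: the nested-dict message layout, the "children" key and the function name collect_records are assumptions made for this example, the third message is invented, and filter conditions are modelled only as lists of allowed values.

```python
def collect_records(node, columns, filters=None, inherited=None):
    """Return the records found in a toy nested message.

    Each node is a dict of key/value pairs plus an optional "children" list.
    """
    filters = filters or {}
    # Values collected on the way down are visible to every nested node.
    values = dict(inherited or {})
    values.update({k: v for k, v in node.items() if k != "children"})

    needed = set(columns) | set(filters)
    if needed.issubset(values):
        # Everything is collected: emit a record if all filters match (logical AND).
        if all(values[key] in allowed for key, allowed in filters.items()):
            return [{column: values[column] for column in columns}]
        return []

    # Otherwise keep descending; every branch may produce its own record.
    records = []
    for child in node.get("children", []):
        records.extend(collect_records(child, columns, filters, values))
    return records


# Toy radiosonde messages: one location each, one or more pressure levels.
messages = [
    {"count": 1, "latitude": 58.47, "longitude": -78.08, "children": [
        {"pressure": 100300.0, "airTemperature": 258.3},
        {"pressure": 100000.0, "airTemperature": 259.7},
    ]},
    {"count": 2, "latitude": 53.75, "longitude": -73.67, "children": [
        {"pressure": 25000.0, "airTemperature": 221.1},
    ]},
    {"count": 3, "latitude": 0.0, "longitude": 0.0, "children": [
        {"pressure": 100000.0, "airTemperature": 300.0},
    ]},
]

columns = ("latitude", "longitude", "pressure", "airTemperature")
records = []
for message in messages:
    records.extend(collect_records(message, columns, filters={"count": [1, 2]}))
```

In this sketch the first message yields two records, one per pressure level, each repeating the location inherited from the top of the hierarchy; the third message is dropped by the filter on "count". Requesting only ("latitude", "longitude") produces a single record per message, because all the columns are already collected at the top level.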

Example

The input is one of the test data files with classic radiosonde observations, where each message contains a single location (“latitude”, “longitude”) with several pressure levels of temperature, dewpoint etc. The message hierarchy is shown in the following snapshot:

[Figure: temp_structure.png, the message hierarchy of the radiosonde data]

To extract the temperature profile for the first two stations we can use this code:

import pdbufr

# Extract one record per pressure level from the first two messages
df = pdbufr.read_bufr(
    "tests/sample_data/temp.bufr",
    columns=("latitude", "longitude", "pressure", "airTemperature"),
    filters={"count": [1, 2]},
)

which results in the following DataFrame:

    latitude  longitude  pressure  airTemperature
0      58.47     -78.08  100300.0           258.3
1      58.47     -78.08  100000.0           259.7
2      58.47     -78.08   99800.0           261.1
...
46     53.75     -73.67   25000.0           221.1
47     53.75     -73.67   23200.0           223.1
48     53.75     -73.67   20500.0           221.5

[48 rows x 4 columns]

Examples