Flat

read_bufr(path, reader='flat', columns=[], filters={}, required_columns=True)

Extract data from BUFR as a pandas.DataFrame assuming a flat BUFR structure.

Parameters:

path (str, bytes, os.PathLike or a message list object) – path to the BUFR file or a message list object
columns (str, sequence[str]) –
specify the BUFR keys to extract. The following values are supported:
- ”all”, empty str or empty list (default): all the ecCodes BUFR keys (including both the header and data sections) are extracted
- ”header”: only the ecCodes BUFR keys from the header section are extracted
- ”data”: only the ecCodes BUFR keys from the data section are extracted
filters (dict) – defines the conditions when to extract the specified columns. The individual conditions are combined together with the logical AND operator to form the filter. See Filters for details.
required_columns (bool, iterable[str]) –
the list of ecCodes BUFR keys that are required to be present in a BUFR message/subset. It has a twofold meaning:
- if any of the keys in required_columns is missing in the message/subset the whole message/subset is skipped
- if all the keys in required_columns are present, the message/subset is processed even if some key from columns are missing (supposing the filter conditions are met)
- if it is a bool (either True or False), messages/subsets are always processed (supposing the filter conditions are met)

Return type:

pandas.DataFrame

How the flat reader works

The flat reader extracts each message/subset as a whole preserving the column order (see the warning below for exceptions). Each extracted message/subset will be a separate record in the resulting DataFrame.

By default, all the columns in a message/subset are extracted (see the exceptions below), but this can be changed by setting columns to “header” or “data” to get only the header or data section keys. Other column selection modes are not available.

In the results the original ecCodes keys containing the rank are used as column names, e.g. “#1#latitude” instead of “latitude”. The following set of keys are omitted:

from the header: “unexpandedDescriptors”

from the data section: data description operator qualifiers (e.g. “delayedDescriptorReplicationFactor”) and “operator”

key attributes e.g. “latitude->code”

The rank appearing in the keys in a message containing uncompressed subsets is not reset by ecCodes when a new subset started. To make the columns as aligned as a possible in the output pdbufr resets the rank and ensures that e.g. the first “latitude” key is always called “#1#latitude” in each uncompressed subset.

Warning

Messages/subsets in a BUFR file can have a different set of BUFR keys. When a new message/subset is processed the flat reader adds it to the resulting DataFrame as a new record and columns that are not yet present in the output are automatically appended by Pandas to the end changing the original order of keys for that message. When this happens pdbufr prints a warning message to the stdout (see the example below or the Flat: overview notebook for details).

Filters

With filters we can control which messages/subsets should be selected. The conditions are combined together with the logical AND operator to form the filter. See Filters for details. The following special rules apply to the flat reader filters:
they can only contain keys without a rank
for non-computed keys the filter condition matches if there is a match for the same key with any given rank in the message/subset. E.g. if
filters = {"pressure": 50000}
and there is e.g. a value “#12#pressure” = 50000 in the message/subset then the filter matches.
for computed keys the filter condition matches if there is a match for the involved keys at their first occurrence (e.i. rank=1) in the message/subset. E.g:
filters = {"WMO_station_id": 12925}
matches if “#1#blockNumber” = 12 and “#1#stationNumber” = 925 in the message/subset (remember WMO_station_id=blockNumber*1000+stationNumber)

Example

The input is one of the tests data files with classic radiosonde observations, where each message contains a single location (“latitude”, “longitude”) with several pressure levels of temperature, dewpoint etc. The message hierarchy is shown in the following snapshot:

To extract all the data values for the first two stations we can use this code:
df = pdbufr.read_bufr(
    "tests/sample_data/temp.bufr",
    reader="flat",
    columns="data",
    filters={"count": [1, 2]},
)
which results in the following DataFrame:
    subsetNumber  #1#blockNumber  #1#stationNumber    ...  #25#airTemperature  #25#dewpointTemperature  #25#windDirection  #25#windSpeed
0        1             71               907                    NaN                      NaN                NaN            NaN
1        1             71               823                  221.5                    191.5                NaN            NaN

[2 rows x 197 columns]
and generates the following warning:
Warning: not all BUFR messages/subsets have the same structure in the input file.
Non-overlapping columns (starting with column[189] = #1#generatingApplication)
were added to end of the resulting dataframe altering the original column order
for these messages.
This warning can be disabled by using the warnings module. The code below produces the same DataFrame as the one above but does not print the warning message:
import warnings

warnings.filterwarnings("ignore", module="pdbufr")

df = pdbufr.read_bufr(
    "tests/sample_data/temp.bufr",
    reader="flat",
    columns="data",
    filters={"count": [1, 2]},
)

Examples

Flat: overview