
[1]:
!test -f aircraft_small.bufr || wget https://get.ecmwf.int/repository/test-data/pdbufr/test-data/aircraft_small.bufr
!test -f temp.bufr || wget https://get.ecmwf.int/repository/test-data/pdbufr/test-data/temp.bufr

Flat: overview

[2]:
import pdbufr

The flat reader is activated with the reader="flat" option of read_bufr(). In this mode each message/subset is extracted as a whole, preserving the column order (see the exceptions under Column alignment below).

Since the results contain a large number of columns with very long names, all the examples below show the transpose of the DataFrame to make better use of the available space.

Options

By default all the header and data keys are extracted:

[3]:
df = pdbufr.read_bufr("aircraft_small.bufr", reader="flat")
df.T
[3]:
0 1 2 3 4 5 6 7 8 9
edition 3 3 3 3 3 3 3 3 3 3
masterTableNumber 0 0 0 0 0 0 0 0 0 0
bufrHeaderSubCentre 0 0 0 0 0 0 0 0 0 0
bufrHeaderCentre 98 98 98 98 98 98 98 98 98 98
updateSequenceNumber 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ...
#1#dewpointTemperature None None None None None None None None None None
#1#relativeHumidity None None None None None None None None None None
#1#airframeIcing None None None None None None None None None None
#1#centre 98 98 98 98 98 98 98 98 98 98
#1#generatingApplication 1 1 1 1 1 1 1 1 1 1

81 rows × 10 columns

However, we can extract only the header keys:

[4]:
df = pdbufr.read_bufr("aircraft_small.bufr", columns="header", reader="flat")
df.T.iloc[:6]
[4]:
0 1 2 3 4 5 6 7 8 9
edition 3 3 3 3 3 3 3 3 3 3
masterTableNumber 0 0 0 0 0 0 0 0 0 0
bufrHeaderSubCentre 0 0 0 0 0 0 0 0 0 0
bufrHeaderCentre 98 98 98 98 98 98 98 98 98 98
updateSequenceNumber 0 0 0 0 0 0 0 0 0 0
dataCategory 4 4 4 4 4 4 4 4 4 4

or only the data keys:

[5]:
df = pdbufr.read_bufr("aircraft_small.bufr", columns="data", reader="flat")
df.T.iloc[:18]
[5]:
0 1 2 3 4 5 6 7 8 9
subsetNumber 1 1 1 1 1 1 1 1 1 1
#1#aircraftFlightNumber QGOBTRRA QGOBTRRA UOZDOZ2S UOZDOZ2S UOZDOZ2S UOZDOZ2S VUVTEWZQ 4IPASOZA WSSASKBA WSSASKBA
#1#aircraftRegistrationNumberOrOtherIdentification HGSKJFBA HGSKJFBA O2RYR4JA O2RYR4JA O2RYR4JA O2RYR4JA 4NK13QZA 0IKWU1JA P4MAWDZA P4MAWDZA
#1#aircraftNavigationalSystem None None None None None None None None None None
#1#aircraftDataRelaySystemType 3 3 3 3 3 3 3 3 3 3
#1#instrumentationForWindMeasurement 4 4 4 4 4 4 4 4 4 4
#1#temperatureObservationPrecision 0.1 0.1 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25
#1#originalSpecificationOfLatitudeOrLongitude 1 1 10 10 10 10 10 1 10 10
#1#aircraftRollAngle None None None None None None None None None None
#1#stationType 0 0 0 0 0 0 0 0 0 0
#1#year 2009 2009 2009 2009 2009 2009 2009 2009 2009 2009
#1#month 1 1 1 1 1 1 1 1 1 1
#1#day 23 23 23 23 23 23 23 23 23 23
#1#hour 13 13 12 12 13 13 13 13 13 13
#1#minute 0 1 56 58 0 2 2 2 0 0
#1#latitude 35.1 35.07 41.45 41.52 41.59 41.67 38.26 19.62 42.31 42.29
#1#longitude -89.97 -89.97 -75.43 -75.63 -75.87 -76.16 -78.57 73.75 -70.7 -70.67
#1#phaseOfAircraftFlight 6.0 6.0 NaN NaN NaN NaN 3.0 NaN 5.0 5.0

Filtering works similarly to the hierarchical (i.e. non-flat) mode:

[6]:
df = pdbufr.read_bufr("aircraft_small.bufr",
    columns="data",
    filters={"aircraftFlightNumber": "UOZDOZ2S"},
    reader="flat")
df.T.iloc[:18]
[6]:
0 1 2 3
subsetNumber 1 1 1 1
#1#aircraftFlightNumber UOZDOZ2S UOZDOZ2S UOZDOZ2S UOZDOZ2S
#1#aircraftRegistrationNumberOrOtherIdentification O2RYR4JA O2RYR4JA O2RYR4JA O2RYR4JA
#1#aircraftNavigationalSystem None None None None
#1#aircraftDataRelaySystemType 3 3 3 3
#1#instrumentationForWindMeasurement 4 4 4 4
#1#temperatureObservationPrecision 0.25 0.25 0.25 0.25
#1#originalSpecificationOfLatitudeOrLongitude 10 10 10 10
#1#aircraftRollAngle None None None None
#1#stationType 0 0 0 0
#1#year 2009 2009 2009 2009
#1#month 1 1 1 1
#1#day 23 23 23 23
#1#hour 12 12 13 13
#1#minute 56 58 0 2
#1#latitude 41.45 41.52 41.59 41.67
#1#longitude -75.43 -75.63 -75.87 -76.16
#1#phaseOfAircraftFlight None None None None

Column alignment

The aircraft messages we have examined so far had an identical structure: each message contained the same keys in the same order. The result was always a neatly aligned DataFrame.

However, each message in a BUFR file can have a different structure, so alignment is not guaranteed. We will demonstrate this with a BUFR file containing radiosonde data.

First, we extract the first message only. From the output we can see that it contains 24 pressure level blocks.

[7]:
df = pdbufr.read_bufr("temp.bufr", columns="data", filters={"count": 1}, reader="flat")
df.T.iloc[-16:]
[7]:
0
#23#pressure 26300.0
#23#verticalSoundingSignificance 4
#23#nonCoordinateGeopotential 89290.0
#23#airTemperature 218.5
#23#dewpointTemperature 198.5
#23#windDirection None
#23#windSpeed None
#24#pressure 25800.0
#24#verticalSoundingSignificance 4
#24#nonCoordinateGeopotential 90490.0
#24#airTemperature 218.5
#24#dewpointTemperature 196.5
#24#windDirection None
#24#windSpeed None
#1#centre 98
#1#generatingApplication 1
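
The block number can also be read from the column names themselves, since each data key carries a "#rank#" prefix. The following is a minimal sketch, not part of pdbufr: the helpers split_rank() and count_blocks() are hypothetical names introduced here for illustration, and the sketch assumes that a key such as "pressure" occurs once per level block.

```python
import re

# Matches flat column names of the form "#<rank>#<key>", e.g. "#24#pressure".
RANK_KEY = re.compile(r"^#(\d+)#(.+)$")


def split_rank(column):
    """Split a flat column name into (rank, key).

    Columns without a rank prefix (e.g. "subsetNumber") return (None, column).
    Hypothetical helper, not part of pdbufr.
    """
    match = RANK_KEY.match(column)
    if match is None:
        return None, column
    return int(match.group(1)), match.group(2)


def count_blocks(columns, key="pressure"):
    """Return the highest rank found for the given key, or 0 if it is absent."""
    ranks = []
    for column in columns:
        rank, name = split_rank(column)
        if rank is not None and name == key:
            ranks.append(rank)
    return max(ranks, default=0)
```

Under these assumptions, count_blocks(df.columns) applied to the DataFrame above should return the number of pressure level blocks in the message.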

Next, we extract the second message. This message contains one more block (25 in total):

[8]:
df = pdbufr.read_bufr("temp.bufr", columns="data", filters={"count": 2}, reader="flat")
df.T.iloc[-16:]
[8]:
0
#24#pressure 23200.0
#24#verticalSoundingSignificance 4
#24#nonCoordinateGeopotential 98410.0
#24#airTemperature 223.1
#24#dewpointTemperature 192.1
#24#windDirection None
#24#windSpeed None
#25#pressure 20500.0
#25#verticalSoundingSignificance 4
#25#nonCoordinateGeopotential 106300.0
#25#airTemperature 221.5
#25#dewpointTemperature 191.5
#25#windDirection None
#25#windSpeed None
#1#centre 98
#1#generatingApplication 1

Now, if we extract these messages together, the columns will not be aligned:

[9]:
df = pdbufr.read_bufr("temp.bufr", columns="data", filters={"count": [1,2]}, reader="flat")
df.T.iloc[-16:]
Warning: not all BUFR messages/subsets have the same structure in the input file. Non-overlapping columns (starting with column[{column_info.first_count-1}] =#1#generatingApplication) were added to end of the resulting dataframealtering the original column order for these messages.
[9]:
0 1
#24#pressure 25800.0 23200.0
#24#verticalSoundingSignificance 4 4
#24#nonCoordinateGeopotential 90490.0 98410.0
#24#airTemperature 218.5 223.1
#24#dewpointTemperature 196.5 192.1
#24#windDirection None None
#24#windSpeed None None
#1#centre 98 98
#1#generatingApplication 1 1
#25#pressure NaN 20500.0
#25#verticalSoundingSignificance NaN 4.0
#25#nonCoordinateGeopotential NaN 106300.0
#25#airTemperature NaN 221.5
#25#dewpointTemperature NaN 191.5
#25#windDirection NaN NaN
#25#windSpeed NaN NaN

So what happened here? The resulting DataFrame was built message by message, and columns not yet present were automatically appended to the end by pandas. This is what happened to block #25 from the second message. It changed the original column order: “#1#centre” and “#1#generatingApplication” now come before block #25 instead of after it. The change is probably harmless here, but it can pose a significant challenge for more complex message types.

As a safety measure, read_bufr() prints a warning message to stderr when messages are not fully aligned.

To disable the warning message, use the warnings module as shown below:
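
The reordering can be reproduced without any BUFR data. The sketch below is a plain-Python imitation of the "first seen, first placed" column ordering described above. It is not pdbufr's or pandas' actual implementation, merged_column_order() is a hypothetical helper, and the key lists are shortened stand-ins for the two radiosonde messages.

```python
def merged_column_order(messages):
    """Return column names in the order they are first seen across messages.

    Hypothetical helper imitating how new columns end up at the end of the
    result when rows are added message by message.
    """
    order = []
    seen = set()
    for keys in messages:
        for key in keys:
            if key not in seen:
                seen.add(key)
                order.append(key)
    return order


# Shortened stand-ins for the first and second message (assumed for illustration).
msg1 = ["#24#pressure", "#24#airTemperature",
        "#1#centre", "#1#generatingApplication"]
msg2 = ["#24#pressure", "#24#airTemperature",
        "#25#pressure", "#25#airTemperature",
        "#1#centre", "#1#generatingApplication"]

columns = merged_column_order([msg1, msg2])
print(columns)
```

The keys of block #25 are appended after "#1#generatingApplication", which matches the order seen in the combined output.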

[10]:
import warnings
warnings.filterwarnings("ignore", module="pdbufr")

df = pdbufr.read_bufr("temp.bufr", columns="data", filters={"count": [1,2]}, reader="flat")
df.T.iloc[-16:]
[10]:
0 1
#24#pressure 25800.0 23200.0
#24#verticalSoundingSignificance 4 4
#24#nonCoordinateGeopotential 90490.0 98410.0
#24#airTemperature 218.5 223.1
#24#dewpointTemperature 196.5 192.1
#24#windDirection None None
#24#windSpeed None None
#1#centre 98 98
#1#generatingApplication 1 1
#25#pressure NaN 20500.0
#25#verticalSoundingSignificance NaN 4.0
#25#nonCoordinateGeopotential NaN 106300.0
#25#airTemperature NaN 221.5
#25#dewpointTemperature NaN 191.5
#25#windDirection NaN NaN
#25#windSpeed NaN NaN
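
Note that warnings.filterwarnings() changes the filter for the rest of the session. If the warning should be silenced for a single call only, the standard warnings.catch_warnings() context manager can be used instead. The sketch below illustrates the pattern with a hypothetical stand-in function, read_with_warning(), in place of a real read_bufr() call; it assumes the alignment message is issued through the warnings module, as the filter above implies.

```python
import warnings


def read_with_warning():
    """Hypothetical stand-in for a read_bufr() call that warns about alignment."""
    warnings.warn(
        "not all BUFR messages/subsets have the same structure", UserWarning
    )
    return "result"


def read_quietly():
    """Call the reader with warnings suppressed for this call only."""
    # Filters changed inside the context manager are restored on exit,
    # so warnings raised elsewhere in the session are still reported.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return read_with_warning()
```

With a real file, the body of the with block would contain the pdbufr.read_bufr(...) call from the cell above.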