
[1]:

!test -f aircraft_small.bufr || wget https://get.ecmwf.int/repository/test-data/pdbufr/test-data/aircraft_small.bufr
!test -f temp.bufr || wget https://get.ecmwf.int/repository/test-data/pdbufr/test-data/temp.bufr

Flat: overview

[2]:

import pdbufr

The flat reader is activated with the reader="flat" option of read_bufr(). In this mode each message/subset is extracted as a whole, preserving the column order (see the exceptions below).

Because the results contain a large number of columns with very long names, the examples below show the transpose of each DataFrame to make better use of the available space.

Options

By default all the header and data keys are extracted:

[3]:

df = pdbufr.read_bufr("aircraft_small.bufr", reader="flat")
df.T

[3]:

	0	1	2	3	4	5	6	7	8	9
edition	3	3	3	3	3	3	3	3	3	3
masterTableNumber	0	0	0	0	0	0	0	0	0	0
bufrHeaderSubCentre	0	0	0	0	0	0	0	0	0	0
bufrHeaderCentre	98	98	98	98	98	98	98	98	98	98
updateSequenceNumber	0	0	0	0	0	0	0	0	0	0
...	...	...	...	...	...	...	...	...	...	...
#1#dewpointTemperature	None	None	None	None	None	None	None	None	None	None
#1#relativeHumidity	None	None	None	None	None	None	None	None	None	None
#1#airframeIcing	None	None	None	None	None	None	None	None	None	None
#1#centre	98	98	98	98	98	98	98	98	98	98
#1#generatingApplication	1	1	1	1	1	1	1	1	1	1

81 rows × 10 columns

However, we can extract only the header keys:

[4]:

df = pdbufr.read_bufr("aircraft_small.bufr", columns="header", reader="flat")
df.T.iloc[:6]

[4]:

	0	1	2	3	4	5	6	7	8	9
edition	3	3	3	3	3	3	3	3	3	3
masterTableNumber	0	0	0	0	0	0	0	0	0	0
bufrHeaderSubCentre	0	0	0	0	0	0	0	0	0	0
bufrHeaderCentre	98	98	98	98	98	98	98	98	98	98
updateSequenceNumber	0	0	0	0	0	0	0	0	0	0
dataCategory	4	4	4	4	4	4	4	4	4	4

or only the data keys:

[5]:

df = pdbufr.read_bufr("aircraft_small.bufr", columns="data", reader="flat")
df.T.iloc[:18]

[5]:

	0	1	2	3	4	5	6	7	8	9
subsetNumber	1	1	1	1	1	1	1	1	1	1
#1#aircraftFlightNumber	QGOBTRRA	QGOBTRRA	UOZDOZ2S	UOZDOZ2S	UOZDOZ2S	UOZDOZ2S	VUVTEWZQ	4IPASOZA	WSSASKBA	WSSASKBA
#1#aircraftRegistrationNumberOrOtherIdentification	HGSKJFBA	HGSKJFBA	O2RYR4JA	O2RYR4JA	O2RYR4JA	O2RYR4JA	4NK13QZA	0IKWU1JA	P4MAWDZA	P4MAWDZA
#1#aircraftNavigationalSystem	None	None	None	None	None	None	None	None	None	None
#1#aircraftDataRelaySystemType	3	3	3	3	3	3	3	3	3	3
#1#instrumentationForWindMeasurement	4	4	4	4	4	4	4	4	4	4
#1#temperatureObservationPrecision	0.1	0.1	0.25	0.25	0.25	0.25	0.25	0.25	0.25	0.25
#1#originalSpecificationOfLatitudeOrLongitude	1	1	10	10	10	10	10	1	10	10
#1#aircraftRollAngle	None	None	None	None	None	None	None	None	None	None
#1#stationType	0	0	0	0	0	0	0	0	0	0
#1#year	2009	2009	2009	2009	2009	2009	2009	2009	2009	2009
#1#month	1	1	1	1	1	1	1	1	1	1
#1#day	23	23	23	23	23	23	23	23	23	23
#1#hour	13	13	12	12	13	13	13	13	13	13
#1#minute	0	1	56	58	0	2	2	2	0	0
#1#latitude	35.1	35.07	41.45	41.52	41.59	41.67	38.26	19.62	42.31	42.29
#1#longitude	-89.97	-89.97	-75.43	-75.63	-75.87	-76.16	-78.57	73.75	-70.7	-70.67
#1#phaseOfAircraftFlight	6.0	6.0	NaN	NaN	NaN	NaN	3.0	NaN	5.0	5.0

Filtering works similarly to the hierarchical (i.e. non-flat) mode:

[6]:

df = pdbufr.read_bufr("aircraft_small.bufr",
    columns="data",
    filters={"aircraftFlightNumber": "UOZDOZ2S"},
    reader="flat")
df.T.iloc[:18]

[6]:

	0	1	2	3
subsetNumber	1	1	1	1
#1#aircraftFlightNumber	UOZDOZ2S	UOZDOZ2S	UOZDOZ2S	UOZDOZ2S
#1#aircraftRegistrationNumberOrOtherIdentification	O2RYR4JA	O2RYR4JA	O2RYR4JA	O2RYR4JA
#1#aircraftNavigationalSystem	None	None	None	None
#1#aircraftDataRelaySystemType	3	3	3	3
#1#instrumentationForWindMeasurement	4	4	4	4
#1#temperatureObservationPrecision	0.25	0.25	0.25	0.25
#1#originalSpecificationOfLatitudeOrLongitude	10	10	10	10
#1#aircraftRollAngle	None	None	None	None
#1#stationType	0	0	0	0
#1#year	2009	2009	2009	2009
#1#month	1	1	1	1
#1#day	23	23	23	23
#1#hour	12	12	13	13
#1#minute	56	58	0	2
#1#latitude	41.45	41.52	41.59	41.67
#1#longitude	-75.43	-75.63	-75.87	-76.16
#1#phaseOfAircraftFlight	None	None	None	None

Column alignment

The aircraft messages we have examined so far had an identical structure: each message contained the same keys in the same order. The result was always a neatly aligned DataFrame.

However, each message in a BUFR file can have a different structure, so alignment is not guaranteed. We will demonstrate this with a BUFR file containing radiosonde data.

First, we extract the first message only. From the output we can see it contains 24 pressure level blocks.

[7]:

df = pdbufr.read_bufr("temp.bufr", columns="data", filters={"count": 1}, reader="flat")
df.T.iloc[-16:]

[7]:

	0
#23#pressure	26300.0
#23#verticalSoundingSignificance	4
#23#nonCoordinateGeopotential	89290.0
#23#airTemperature	218.5
#23#dewpointTemperature	198.5
#23#windDirection	None
#23#windSpeed	None
#24#pressure	25800.0
#24#verticalSoundingSignificance	4
#24#nonCoordinateGeopotential	90490.0
#24#airTemperature	218.5
#24#dewpointTemperature	196.5
#24#windDirection	None
#24#windSpeed	None
#1#centre	98
#1#generatingApplication	1

Next, we extract the second message. This message contains one more block (25 in total):

[8]:

df = pdbufr.read_bufr("temp.bufr", columns="data", filters={"count": 2}, reader="flat")
df.T.iloc[-16:]

[8]:

	0
#24#pressure	23200.0
#24#verticalSoundingSignificance	4
#24#nonCoordinateGeopotential	98410.0
#24#airTemperature	223.1
#24#dewpointTemperature	192.1
#24#windDirection	None
#24#windSpeed	None
#25#pressure	20500.0
#25#verticalSoundingSignificance	4
#25#nonCoordinateGeopotential	106300.0
#25#airTemperature	221.5
#25#dewpointTemperature	191.5
#25#windDirection	None
#25#windSpeed	None
#1#centre	98
#1#generatingApplication	1

Now, if we extract these messages together, the columns will not be aligned:

[9]:

df = pdbufr.read_bufr("temp.bufr", columns="data", filters={"count": [1,2]}, reader="flat")
df.T.iloc[-16:]

Warning: not all BUFR messages/subsets have the same structure in the input file. Non-overlapping columns (starting with column[{column_info.first_count-1}] =#1#generatingApplication) were added to end of the resulting dataframealtering the original column order for these messages.

[9]:

	0	1
#24#pressure	25800.0	23200.0
#24#verticalSoundingSignificance	4	4
#24#nonCoordinateGeopotential	90490.0	98410.0
#24#airTemperature	218.5	223.1
#24#dewpointTemperature	196.5	192.1
#24#windDirection	None	None
#24#windSpeed	None	None
#1#centre	98	98
#1#generatingApplication	1	1
#25#pressure	NaN	20500.0
#25#verticalSoundingSignificance	NaN	4.0
#25#nonCoordinateGeopotential	NaN	106300.0
#25#airTemperature	NaN	221.5
#25#dewpointTemperature	NaN	191.5
#25#windDirection	NaN	NaN
#25#windSpeed	NaN	NaN

So what happened here? The resulting DataFrame was built message by message, and columns not yet present were automatically appended to the end by pandas. This is what happened to block #25 from the second message. It changed the original column order, because "#1#centre" and "#1#generatingApplication" now come before block #25 rather than after it. While the change is probably harmless in this case, it could pose a significant challenge for more complex message types.
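The reordering can be reproduced without any BUFR data. The following is a minimal sketch of the "append unseen columns at the end" behaviour described above, written in plain Python. It is not the pdbufr implementation; merged_column_order() is a hypothetical helper, and the two key lists are shortened stand-ins for the two radiosonde messages.

```python
def merged_column_order(messages):
    """Return the column names in first-seen order across all messages."""
    order = []
    seen = set()
    for keys in messages:
        for key in keys:
            if key not in seen:
                seen.add(key)
                order.append(key)
    return order


# Shortened stand-ins for the key order of the first and second message.
message_1 = ["#24#pressure", "#24#airTemperature",
             "#1#centre", "#1#generatingApplication"]
message_2 = ["#24#pressure", "#24#airTemperature",
             "#25#pressure", "#25#airTemperature",
             "#1#centre", "#1#generatingApplication"]

columns = merged_column_order([message_1, message_2])
# The "#25#..." keys end up after "#1#generatingApplication",
# although they precede "#1#centre" in message_2.
```

The order is fixed by the first message, so any key that only appears in a later message lands after all keys already seen.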

As a safety measure, read_bufr() prints a warning message to stderr when messages are not fully aligned.

To disable the warning message, use the warnings module as shown below:

[10]:

import warnings
warnings.filterwarnings("ignore", module="pdbufr")

df = pdbufr.read_bufr("temp.bufr", columns="data", filters={"count": [1,2]}, reader="flat")
df.T.iloc[-16:]

[10]:

	0	1
#24#pressure	25800.0	23200.0
#24#verticalSoundingSignificance	4	4
#24#nonCoordinateGeopotential	90490.0	98410.0
#24#airTemperature	218.5	223.1
#24#dewpointTemperature	196.5	192.1
#24#windDirection	None	None
#24#windSpeed	None	None
#1#centre	98	98
#1#generatingApplication	1	1
#25#pressure	NaN	20500.0
#25#verticalSoundingSignificance	NaN	4.0
#25#nonCoordinateGeopotential	NaN	106300.0
#25#airTemperature	NaN	221.5
#25#dewpointTemperature	NaN	191.5
#25#windDirection	NaN	NaN
#25#windSpeed	NaN	NaN
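A filter installed with warnings.filterwarnings() stays active for the rest of the session. If the warning should be silenced for a single call only, the standard library's warnings.catch_warnings() context manager can scope the suppression. The sketch below assumes the message is emitted through the warnings module, as the filter above implies. read_quietly() and noisy_stand_in() are hypothetical helpers introduced for illustration, and the stand-in replaces the real reader so that the example runs without any BUFR file.

```python
import warnings


def read_quietly(func, *args, **kwargs):
    """Call func with all warnings suppressed for the duration of the call only."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return func(*args, **kwargs)


def noisy_stand_in():
    """Hypothetical stand-in for a reader call that emits a warning."""
    warnings.warn("not all BUFR messages/subsets have the same structure")
    return "result"


value = read_quietly(noisy_stand_in)

# With pdbufr installed, the equivalent call would look like this:
# df = read_quietly(pdbufr.read_bufr, "temp.bufr", columns="data",
#                   filters={"count": [1, 2]}, reader="flat")
```

Note that simplefilter("ignore") hides every warning raised during the call, not only those coming from pdbufr.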