[1]:
!test -f aircraft_small.bufr || wget https://get.ecmwf.int/repository/test-data/pdbufr/test-data/aircraft_small.bufr
!test -f temp.bufr || wget https://get.ecmwf.int/repository/test-data/pdbufr/test-data/temp.bufr
Flat: overview
[2]:
import pdbufr
The flat reader is activated with the reader="flat" option of read_bufr(). In this mode each message/subset is extracted as a whole, preserving the original column order (see the exceptions below).
Since the results contain a large number of columns with very long names, the examples below show the transposed DataFrames to make better use of the available space.
Options
By default all the header and data keys are extracted:
[3]:
df = pdbufr.read_bufr("aircraft_small.bufr", reader="flat")
df.T
[3]:
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| edition | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| masterTableNumber | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| bufrHeaderSubCentre | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| bufrHeaderCentre | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 |
| updateSequenceNumber | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| #1#dewpointTemperature | None | None | None | None | None | None | None | None | None | None |
| #1#relativeHumidity | None | None | None | None | None | None | None | None | None | None |
| #1#airframeIcing | None | None | None | None | None | None | None | None | None | None |
| #1#centre | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 |
| #1#generatingApplication | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
81 rows × 10 columns
However, we can extract only the header keys:
[4]:
df = pdbufr.read_bufr("aircraft_small.bufr", columns="header", reader="flat")
df.T[:6]
[4]:
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| edition | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| masterTableNumber | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| bufrHeaderSubCentre | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| bufrHeaderCentre | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 |
| updateSequenceNumber | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| dataCategory | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
or only the data keys:
[5]:
df = pdbufr.read_bufr("aircraft_small.bufr", columns="data", reader="flat")
df.T[:18]
[5]:
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| subsetNumber | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| #1#aircraftFlightNumber | QGOBTRRA | QGOBTRRA | UOZDOZ2S | UOZDOZ2S | UOZDOZ2S | UOZDOZ2S | VUVTEWZQ | 4IPASOZA | WSSASKBA | WSSASKBA |
| #1#aircraftRegistrationNumberOrOtherIdentification | HGSKJFBA | HGSKJFBA | O2RYR4JA | O2RYR4JA | O2RYR4JA | O2RYR4JA | 4NK13QZA | 0IKWU1JA | P4MAWDZA | P4MAWDZA |
| #1#aircraftNavigationalSystem | None | None | None | None | None | None | None | None | None | None |
| #1#aircraftDataRelaySystemType | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| #1#instrumentationForWindMeasurement | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| #1#temperatureObservationPrecision | 0.1 | 0.1 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 | 0.25 |
| #1#originalSpecificationOfLatitudeOrLongitude | 1 | 1 | 10 | 10 | 10 | 10 | 10 | 1 | 10 | 10 |
| #1#aircraftRollAngle | None | None | None | None | None | None | None | None | None | None |
| #1#stationType | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| #1#year | 2009 | 2009 | 2009 | 2009 | 2009 | 2009 | 2009 | 2009 | 2009 | 2009 |
| #1#month | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| #1#day | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| #1#hour | 13 | 13 | 12 | 12 | 13 | 13 | 13 | 13 | 13 | 13 |
| #1#minute | 0 | 1 | 56 | 58 | 0 | 2 | 2 | 2 | 0 | 0 |
| #1#latitude | 35.1 | 35.07 | 41.45 | 41.52 | 41.59 | 41.67 | 38.26 | 19.62 | 42.31 | 42.29 |
| #1#longitude | -89.97 | -89.97 | -75.43 | -75.63 | -75.87 | -76.16 | -78.57 | 73.75 | -70.7 | -70.67 |
| #1#phaseOfAircraftFlight | 6.0 | 6.0 | NaN | NaN | NaN | NaN | 3.0 | NaN | 5.0 | 5.0 |
Filtering works similarly to the hierarchical (i.e. non-flat) mode:
[6]:
df = pdbufr.read_bufr("aircraft_small.bufr",
                      columns="data",
                      filters={"aircraftFlightNumber": "UOZDOZ2S"},
                      reader="flat")
df.T.iloc[:18]
[6]:
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| subsetNumber | 1 | 1 | 1 | 1 |
| #1#aircraftFlightNumber | UOZDOZ2S | UOZDOZ2S | UOZDOZ2S | UOZDOZ2S |
| #1#aircraftRegistrationNumberOrOtherIdentification | O2RYR4JA | O2RYR4JA | O2RYR4JA | O2RYR4JA |
| #1#aircraftNavigationalSystem | None | None | None | None |
| #1#aircraftDataRelaySystemType | 3 | 3 | 3 | 3 |
| #1#instrumentationForWindMeasurement | 4 | 4 | 4 | 4 |
| #1#temperatureObservationPrecision | 0.25 | 0.25 | 0.25 | 0.25 |
| #1#originalSpecificationOfLatitudeOrLongitude | 10 | 10 | 10 | 10 |
| #1#aircraftRollAngle | None | None | None | None |
| #1#stationType | 0 | 0 | 0 | 0 |
| #1#year | 2009 | 2009 | 2009 | 2009 |
| #1#month | 1 | 1 | 1 | 1 |
| #1#day | 23 | 23 | 23 | 23 |
| #1#hour | 12 | 12 | 13 | 13 |
| #1#minute | 56 | 58 | 0 | 2 |
| #1#latitude | 41.45 | 41.52 | 41.59 | 41.67 |
| #1#longitude | -75.43 | -75.63 | -75.87 | -76.16 |
| #1#phaseOfAircraftFlight | None | None | None | None |
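For comparison, a similar selection can be made after loading, using ordinary pandas boolean indexing on the flat column name. The sketch below is only an illustration: it builds a small stand-in DataFrame by hand from the flight numbers and minutes shown in the unfiltered output earlier, instead of reading the BUFR file, and it does not describe how pdbufr applies filters internally.

```python
import pandas as pd

# Stand-in for the flat result: values copied from the unfiltered output,
# limited to two columns to keep the example short.
df = pd.DataFrame(
    {
        "#1#aircraftFlightNumber": [
            "QGOBTRRA", "QGOBTRRA", "UOZDOZ2S", "UOZDOZ2S", "UOZDOZ2S",
            "UOZDOZ2S", "VUVTEWZQ", "4IPASOZA", "WSSASKBA", "WSSASKBA",
        ],
        "#1#minute": [0, 1, 56, 58, 0, 2, 2, 2, 0, 0],
    }
)

# Flat column names carry the "#rank#" prefix, so the prefix is part of the key.
selected = df[df["#1#aircraftFlightNumber"] == "UOZDOZ2S"].reset_index(drop=True)
print(selected.T)
```

Note the difference in naming: the filters option takes the bare key name ("aircraftFlightNumber"), whereas the columns of the flat result include the rank prefix ("#1#aircraftFlightNumber").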
Column alignment
The aircraft messages we have examined so far had identical structure; each message contained the very same keys in the very same order. The result was always a nicely aligned DataFrame.
However, in a BUFR file each message can have a different structure and the alignment is not guaranteed at all. We will demonstrate it with a BUFR file containing radiosonde data.
First, we extract the first message only. From the output we can see it contains 24 pressure level blocks.
[7]:
df = pdbufr.read_bufr("temp.bufr", columns="data", filters={"count": 1}, reader="flat")
df.T.iloc[-16:]
[7]:
| | 0 |
|---|---|
| #23#pressure | 26300.0 |
| #23#verticalSoundingSignificance | 4 |
| #23#nonCoordinateGeopotential | 89290.0 |
| #23#airTemperature | 218.5 |
| #23#dewpointTemperature | 198.5 |
| #23#windDirection | None |
| #23#windSpeed | None |
| #24#pressure | 25800.0 |
| #24#verticalSoundingSignificance | 4 |
| #24#nonCoordinateGeopotential | 90490.0 |
| #24#airTemperature | 218.5 |
| #24#dewpointTemperature | 196.5 |
| #24#windDirection | None |
| #24#windSpeed | None |
| #1#centre | 98 |
| #1#generatingApplication | 1 |
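The number of blocks can also be counted from the column names rather than read off the output. The following is a minimal sketch that assumes the flat column names follow the "#rank#key" pattern seen above and that each block contains exactly one pressure key. The small DataFrame is a hand-made stand-in holding only two blocks (values copied from the table above), and count_blocks is a hypothetical helper, not part of pdbufr.

```python
import re

import pandas as pd


def count_blocks(df, key):
    """Count the columns named '#<rank>#<key>' (hypothetical helper)."""
    pattern = re.compile(rf"#\d+#{re.escape(key)}")
    return sum(1 for name in df.columns if pattern.fullmatch(name))


# Stand-in for a flat result with two pressure-level blocks.
df = pd.DataFrame(
    {
        "#23#pressure": [26300.0],
        "#23#airTemperature": [218.5],
        "#24#pressure": [25800.0],
        "#24#airTemperature": [218.5],
        "#1#centre": [98],
    }
)

print(count_blocks(df, "pressure"))
```

Applied to the real DataFrame from the cell above, the same call should report the number of "#n#pressure" columns, provided the naming assumption holds.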
Next, we extract the second message. This message contains one more block (25 in total):
[8]:
df = pdbufr.read_bufr("temp.bufr", columns="data", filters={"count": 2}, reader="flat")
df.T.iloc[-16:]
[8]:
| | 0 |
|---|---|
| #24#pressure | 23200.0 |
| #24#verticalSoundingSignificance | 4 |
| #24#nonCoordinateGeopotential | 98410.0 |
| #24#airTemperature | 223.1 |
| #24#dewpointTemperature | 192.1 |
| #24#windDirection | None |
| #24#windSpeed | None |
| #25#pressure | 20500.0 |
| #25#verticalSoundingSignificance | 4 |
| #25#nonCoordinateGeopotential | 106300.0 |
| #25#airTemperature | 221.5 |
| #25#dewpointTemperature | 191.5 |
| #25#windDirection | None |
| #25#windSpeed | None |
| #1#centre | 98 |
| #1#generatingApplication | 1 |
Now, if we extract these messages together the columns will not be aligned:
[9]:
df = pdbufr.read_bufr("temp.bufr", columns="data", filters={"count": [1,2]}, reader="flat")
df.T.iloc[-16:]
Warning: not all BUFR messages/subsets have the same structure in the input file. Non-overlapping columns (starting with column[{column_info.first_count-1}] =#1#generatingApplication) were added to end of the resulting dataframealtering the original column order for these messages.
[9]:
| | 0 | 1 |
|---|---|---|
| #24#pressure | 25800.0 | 23200.0 |
| #24#verticalSoundingSignificance | 4 | 4 |
| #24#nonCoordinateGeopotential | 90490.0 | 98410.0 |
| #24#airTemperature | 218.5 | 223.1 |
| #24#dewpointTemperature | 196.5 | 192.1 |
| #24#windDirection | None | None |
| #24#windSpeed | None | None |
| #1#centre | 98 | 98 |
| #1#generatingApplication | 1 | 1 |
| #25#pressure | NaN | 20500.0 |
| #25#verticalSoundingSignificance | NaN | 4.0 |
| #25#nonCoordinateGeopotential | NaN | 106300.0 |
| #25#airTemperature | NaN | 221.5 |
| #25#dewpointTemperature | NaN | 191.5 |
| #25#windDirection | NaN | NaN |
| #25#windSpeed | NaN | NaN |
So what happened here? The resulting DataFrame was built message by message, and columns not yet present were automatically appended to the end by pandas. This is what happened to block #25 from the second message. It changed the original column order, because “#1#centre” and “#1#generatingApplication” now come before block #25 rather than after it. The change is probably harmless in this case, but it can pose a significant challenge for more complex message types.
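This reordering can be reproduced with pandas alone. The sketch below is an illustration under stated assumptions, not pdbufr's actual code path: it uses pd.concat on two one-row frames as a stand-in for building the result message by message, with a few values copied from the tables above.

```python
import pandas as pd

# First message: the last block is #24, followed by the trailing key.
msg1 = pd.DataFrame([{"#24#pressure": 25800.0, "#1#centre": 98}])

# Second message: an extra block #25 sits before the trailing key.
msg2 = pd.DataFrame(
    [
        {
            "#24#pressure": 23200.0,
            "#25#pressure": 20500.0,
            "#25#verticalSoundingSignificance": 4,
            "#1#centre": 98,
        }
    ]
)

# Columns that are new in msg2 are appended after the columns already seen.
combined = pd.concat([msg1, msg2], ignore_index=True)
print(list(combined.columns))
print(combined.dtypes)
```

In this stand-in the "#25#" columns end up after "#1#centre", and the integer column that is missing from the first row is stored as float so that it can hold NaN. This would be consistent with the 4.0 shown for #25#verticalSoundingSignificance in the output above.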
As a safety measure, read_bufr() prints a warning message to stderr when the messages are not fully aligned.
To disable the warning, use the warnings module as shown below:
[10]:
import warnings
warnings.filterwarnings("ignore", module="pdbufr")
df = pdbufr.read_bufr("temp.bufr", columns="data", filters={"count": [1,2]}, reader="flat")
df.T.iloc[-16:]
[10]:
| | 0 | 1 |
|---|---|---|
| #24#pressure | 25800.0 | 23200.0 |
| #24#verticalSoundingSignificance | 4 | 4 |
| #24#nonCoordinateGeopotential | 90490.0 | 98410.0 |
| #24#airTemperature | 218.5 | 223.1 |
| #24#dewpointTemperature | 196.5 | 192.1 |
| #24#windDirection | None | None |
| #24#windSpeed | None | None |
| #1#centre | 98 | 98 |
| #1#generatingApplication | 1 | 1 |
| #25#pressure | NaN | 20500.0 |
| #25#verticalSoundingSignificance | NaN | 4.0 |
| #25#nonCoordinateGeopotential | NaN | 106300.0 |
| #25#airTemperature | NaN | 221.5 |
| #25#dewpointTemperature | NaN | 191.5 |
| #25#windDirection | NaN | NaN |
| #25#windSpeed | NaN | NaN |
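The filter above stays in effect for the rest of the session. If the warning should be silenced for a single call only, the standard library's warnings.catch_warnings() context manager can limit the scope. The sketch below uses a hypothetical noisy_read() function as a stand-in for the read_bufr() call so that it runs without a BUFR file; it assumes the warning is issued through the warnings module.

```python
import warnings


def noisy_read():
    """Hypothetical stand-in for a reader call that emits a warning."""
    warnings.warn("not all messages have the same structure", UserWarning)
    return "result"


# Suppress warnings only inside this block; the filters are restored on exit.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    quiet_result = noisy_read()

# Outside the block warnings are active again; record one to demonstrate it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_read()

print(quiet_result, len(caught))
```

With the real reader, the body of the first with block would hold the read_bufr() call instead of noisy_read().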