Teaching Notebook · v0.1 · 2026-04-21

Reading floods from
Synthetic Aperture Radar

A reproducible, runnable walkthrough of a UNet-RSMamba flood-segmentation pipeline over the November 2025 Banda Aceh flood event, observed with Sentinel-1 SAR imagery and tiled to the KuroSiwo[2] format.

Authors
RS Observatory, Teaching Team
Dataset
KuroSiwo · 49-tile Banda Aceh test split
Model
UNet-RSMamba + FloodFocus
Runtime
RTX 5070 via Cloudflare Tunnel
ABSTRACT

This notebook teaches how to read Sentinel-1 SAR imagery, why water appears dark, how a modern flood-segmentation model sees a tile, and how a single preprocessing hyperparameter — input clamping — visibly changes the model's output. Every experiment below is live: press Run to re-execute against the GPU.

2   Background

2.1   Why water is dark in SAR

Synthetic Aperture Radar (SAR) sensors such as Sentinel-1 emit a side-looking microwave pulse and measure what returns. Over a smooth surface — still water, wet pavement, a calm rice paddy — the pulse reflects specularly away from the sensor. Almost nothing comes back. The pixel is black.

Over a rough surface — vegetation, urban roofs, bare soil — the pulse scatters diffusely in many directions, including back toward the sensor. The pixel is bright. Flood detection, reduced to one sentence, is a search for pixels that used to be rough and are now smooth[1].
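The rough-then-smooth search can be sketched in a few lines of NumPy. This is a toy illustration on synthetic arrays, not the notebook's pipeline, and the two thresholds are illustrative values, not calibrated ones:

```python
import numpy as np

# Toy illustration (not the notebook's pipeline): flag pixels whose linear
# backscatter dropped sharply between a pre-event and a co-event image.
rng = np.random.default_rng(0)

pre = rng.uniform(0.10, 0.40, size=(8, 8))            # "rough" scene: moderate returns
co = pre.copy()
co[2:6, 2:6] = rng.uniform(0.005, 0.02, size=(4, 4))  # flooded patch goes dark

# Rough-then-smooth test: the co-event return fell below an absolute floor
# AND lost most of its pre-event brightness. Thresholds are illustrative.
candidate_flood = (co < 0.03) & (co < 0.25 * pre)
print(candidate_flood.sum(), "of", candidate_flood.size, "pixels flagged")  # 16 of 64
```

A real detector replaces the hand-set thresholds with a learned decision boundary, which is exactly what the model in Section 4 does.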

2.2   Banda Aceh, 26 November 2025

In late October and November 2025 the northern tip of Sumatra received sustained heavy rainfall, with the Banda Aceh metropolitan area reporting a widespread fluvial and pluvial flood on 26 November. Sentinel-1A revisited the region on 21 Oct (pre-event 1), 2 Nov (pre-event 2), and 26 Nov (co-event), yielding the three acquisitions used throughout this notebook. The scene is cropped into 49 KuroSiwo-format tiles of 224×224 px each[2], providing a three-timestamp, dual-polarisation stack per tile suitable for both teaching and benchmarking.

3   Data

Each tile directory ships six GeoTIFF rasters — two polarisations (VV, VH) at three acquisition timestamps: pre-event 1 (21 Oct, baseline), pre-event 2 (2 Nov, approach), and co-event (26 Nov, the flood scene). Running the code cell below samples one tile from the server and displays all six bands:

python
# Draw one KuroSiwo tile at random.
from sar_toolkit.data import sample_tile

tile = sample_tile(split='test', region='banda_aceh')
tile.bands   # ['SL1_IVV', 'SL1_IVH', 'SL2_IVV', 'SL2_IVH', 'MS1_IVV', 'MS1_IVH']
tile.info['lat_start'], tile.info['lat_end']

Row-order reminder: the two rows of the output figure are VV (top) and VH (bottom); the columns are pre-event 1, pre-event 2, and co-event, left to right. You should see the third column darken noticeably where water has pooled.

4   Method

4.1   The model

We use a UNet-RSMamba backbone — a U-shaped encoder-decoder with Mamba state-space blocks[3] replacing transformer self-attention at every stage. Mamba preserves the long-range receptive field of attention while reducing the per-layer cost from 𝒪(n²) to 𝒪(n), which matters at 224×224 patch scale. Bidirectional scanning follows Vision Mamba[4]; the remote-sensing-specific three-direction scan (forward, reverse, shuffled) and gated fusion follow RSMamba[5]. The head emits three logits per pixel: background, permanent water, flood water. We train with the FloodFocus loss, a Focal variant that up-weights the minority flood class.
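To make the 𝒪(n²)-versus-𝒪(n) claim concrete, here is a back-of-envelope count of per-layer interactions. The 4×4 patch size is a hypothetical choice for illustration — the actual tokenisation inside UNet-RSMamba may differ:

```python
# Back-of-envelope sequence-length cost, assuming (hypothetically) that the
# encoder tokenises a 224x224 tile into 4x4 patches. This only illustrates
# the O(n^2) vs O(n) gap, not the model's exact FLOP count.
side = 224 // 4                  # 56 tokens per side
n = side * side                  # 3136 tokens in the sequence
attention_pairs = n * n          # self-attention touches every token pair
mamba_steps = n                  # a state-space scan is linear in n
print(f"n = {n}, attention ~ {attention_pairs:,} pair interactions per layer")
print(f"ratio ~ {attention_pairs // mamba_steps}x more work than a linear scan")
```

At this token count the quadratic term is over three thousand times the linear one, which is why a linear-cost block is attractive at full-tile resolution.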

4.2   Input clamping

Raw backscatter in linear units is very heavy-tailed: a handful of bright urban pixels can dominate the 99th percentile. Standard practice is to clamp the input at some threshold c and re-scale:

x′ = min(x, c) / c,   x ∈ ℝ₊,   c ∈ {∞, 0.50, 0.30, 0.15}
Eq. (1) — One-line preprocessing that yields four training regimes.

Smaller c gives the network a flatter, contrast-rich view of low-backscatter (water-like) regions at the cost of saturating bright returns. That trade-off is the central experimental question we examine next.
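Eq. (1) translates directly into a few lines of NumPy. The function below is a minimal sketch of that preprocessing step (the toy backscatter values are invented for illustration); `c=None` stands in for c = ∞, where we simply scale by the maximum:

```python
import numpy as np

# Eq. (1) as code: clamp linear backscatter at c, then rescale to [0, 1].
# c=None plays the role of c = infinity (no clamping, scale by the max).
def clamp_rescale(x, c=None):
    x = np.asarray(x, dtype=np.float64)
    if c is None:
        return x / x.max()
    return np.minimum(x, c) / c

x = np.array([0.01, 0.05, 0.12, 0.30, 0.90])  # toy linear backscatter values
for c in (None, 0.50, 0.30, 0.15):
    print(c, np.round(clamp_rescale(x, c), 3))
```

Watch the darkest pixel: under no clamping it sits at 0.01/0.90 ≈ 0.011, while at c = 0.15 it climbs to 0.01/0.15 ≈ 0.067. That is the "flatter, contrast-rich view of low-backscatter regions" in action, bought by saturating everything above c to 1.0.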

5   Experiments

5.1   A single live inference

The cell below streams one tile to the GPU and renders the argmax segmentation mask. On an RTX 5070 with warm weights the call typically returns in 300–500 ms end-to-end.

python
# Load checkpoint, run one tile.
from sar_toolkit.models import UNetRSMamba

model  = UNetRSMamba.from_checkpoint('best_clamp03.pt').cuda().eval()
tile   = sample_tile(split='test', region='banda_aceh')
probs  = model(tile.to_tensor().cuda())    # (1, 3, 224, 224)
labels = probs.argmax(dim=1)                # (1, 224, 224)

5.2   Sweeping the clamp hyperparameter

To let you see why clamping matters rather than accept it on faith, the next cell fires four parallel inference calls against the same tile with c ∈ {∞, 0.50, 0.30, 0.15} and compares the resulting flood-water fractions. Expect non-trivial disagreement — that is the point.

python
# Re-run the same tile under four preprocessing regimes.
configs = ['original', 'clamp05', 'clamp03', 'clamp015']
results = [infer(tile, config=c) for c in configs]
for r in results:
    pct = 100 * r.class_histogram['flood_water'] / 224**2
    print(f'{r.config:>10s}  water={pct:5.1f}%  t={r.elapsed_ms} ms')

6   Discussion

Three observations for the reader to take away.

  1. Preprocessing is a modelling choice, not a technicality. The clamp sweep above shows two defensible values of c producing visibly different masks on the same tile. Documenting c is as important as documenting the model architecture.
  2. A model's confidence is not the user's confidence. Agreement between configurations is a cheap, instructor-friendly proxy for how trustworthy a pixel's label is — pixels where all four configurations agree are the ones to teach against, not the ambiguous majority.
  3. The interaction should be the lesson. Reading that a flood mask is sensitive to preprocessing is one thing. Pressing Run and watching the water-coverage percentage swing from 8 % to 38 % is another. That swing is what pedagogy calls productive failure.
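The agreement proxy in observation 2 is cheap to compute. The sketch below uses random synthetic masks as stand-ins for the four configuration outputs (the real masks come from the Section 5.2 sweep), so only the shape of the computation carries over:

```python
import numpy as np

# Sketch of the agreement idea in observation 2: per-pixel label agreement
# across the four clamp configurations. Masks here are synthetic stand-ins
# for the argmax outputs of the Section 5.2 sweep.
rng = np.random.default_rng(1)
masks = rng.integers(0, 3, size=(4, 224, 224))  # 4 configs x H x W, classes 0..2

# A pixel is "trustworthy" (for teaching) when all four configurations emit
# the same class; everywhere else is ambiguous and worth discussing.
unanimous = (masks == masks[0]).all(axis=0)
print(f"{100 * unanimous.mean():.1f}% of pixels are unanimous")
```

On real sweep outputs the unanimous fraction is far higher than on these independent random masks, and its complement — the disagreement map — is precisely where the clamp choice is doing the modelling work.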
If a student leaves this notebook believing the single sentence “water reflects radar away, so it looks dark — until the model's preprocessing tells it otherwise”, the notebook has earned its runtime.