| Subject: | DMs with S. Keshav |
|---|---|
| Date: | Wed, 17 Jun 2026 08:00:52 +0000 |
| From: | Zulip notifications <nor...@zulip.com> |
| Reply-To: | Zulip <mmff692a1d4f19fc2...@streams.zulipchat.com> |
| To: | Aaditeshwar Seth <as...@cse.iitd.ernet.in> |
S. Keshav said in #Blogs > Useful packages for working with Tessera:
I'm happy to announce the public release of several components that proved useful in developing TEE and that may be useful to others.

tessera-eval: assessing the representations (previously bundled into TEE)
<https://github.com/ucam-eo/tessera-eval>
A standalone library for evaluating land-cover and habitat classifiers on Tessera embeddings end to end: loading and dequantization, rasterization of shapefile labels onto the pixel grid, and scoring of a range of models (random forest, MLP, k-NN, and optionally XGBoost and a U-Net) using learning curves, k-fold cross-validation, and spatial hold-out splits. It also includes a local-compute server and a VQ data path, so that the downstream accuracy cost of tessera-vq's compression can be measured directly.

tessera-vq: per-tile vector quantisation (new; a drop-in replacement for geotessera that works with a VQ service running publicly at tee.cl.cam.ac.uk)
<https://github.com/sk818/tessera-vq>
Within an individual tile, typically only a small number of land-cover prototypes are present. A per-tile codebook together with an index map can therefore reduce the storage required for the embeddings by a factor of roughly 80 to 90, at some cost to downstream accuracy.
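To make the codebook-plus-index-map idea concrete, here is a minimal NumPy sketch that fits one codebook per tile with plain Lloyd's k-means, stores a uint16 index per pixel, and reconstructs each pixel as its prototype. It is single-stage VQ rather than the residual VQ that tessera-vq uses, the tile is synthetic, and the function names are hypothetical, not part of tessera-vq or VQTessera.

```python
import numpy as np


def fit_tile_codebook(tile, k, n_iter=20, seed=0):
    """Hypothetical helper: per-tile codebook via plain Lloyd's k-means.

    tile: (H, W, D) float32 embeddings for one tile.
    Returns a (k, D) float32 codebook and an (H, W) uint16 index map (k <= 65536).
    """
    h, w, d = tile.shape
    x = tile.reshape(-1, d).astype(np.float32)
    rng = np.random.default_rng(seed)
    # Initialise the codebook from k distinct pixels of this tile.
    codebook = x[rng.choice(len(x), size=k, replace=False)].copy()

    def assign(cb):
        # argmin_j ||x - c_j||^2 == argmin_j (||c_j||^2 - 2 x.c_j); ||x||^2 is constant per row.
        return ((cb ** 2).sum(axis=1) - 2.0 * (x @ cb.T)).argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(codebook)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # leave a centroid in place if its cluster is empty
                codebook[j] = members.mean(axis=0)
    index_map = assign(codebook).astype(np.uint16).reshape(h, w)
    return codebook, index_map


def reconstruct(codebook, index_map):
    # Fancy indexing replaces each pixel's index with its prototype: (H, W) -> (H, W, D).
    return codebook[index_map]


def compression_ratio(tile, codebook, index_map):
    # Raw tile bytes versus codebook + index-map bytes.
    return tile.nbytes / (codebook.nbytes + index_map.nbytes)


# Synthetic tile: 4 "land-cover prototypes" plus a little noise.
rng = np.random.default_rng(1)
prototypes = rng.normal(size=(4, 128))
truth = rng.integers(0, 4, size=(64, 64))
tile = (prototypes[truth] + 0.01 * rng.normal(size=(64, 64, 128))).astype(np.float32)

codebook, index_map = fit_tile_codebook(tile, k=4)
approx = reconstruct(codebook, index_map)
ratio = compression_ratio(tile, codebook, index_map)
```

Here the 2 MiB float32 tile is replaced by a 2 KiB codebook and an 8 KiB index map, about 205x smaller. That toy number is not comparable to the figure quoted above, which depends on the real tile size, the number of codes, the residual stages, and the baseline storage format.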
tessera-vq comprises the underlying study together with a server and a GeoTessera-compatible client (VQTessera) that serve residual-VQ-compressed embeddings and reconstruct them on request, including in the browser.

tessera-zarr-utils: reading a bounding box into a single grid (extracted from TEE)
<https://github.com/ucam-eo/tessera-zarr-utils>
GeoTessera's zarr store is efficient, but it is not universally available. Also, a `read_region` call assigns a bounding box to a single UTM zone and returns data in native UTM, clipped to the centre zone. tessera-zarr-utils checks whether zarr data are available for a bounding box and, if so, uses that format; if not, it falls back to numpy arrays, as before. Moreover, when using zarr, it reads the region in chunks, each in its native zone, and reprojects them onto a common EPSG:4326 grid. A bounding box spanning two UTM zones is therefore returned as a single lon/lat mosaic. Resampling uses nearest-neighbour interpolation, so that embeddings are not blended, and NaN nodata values are preserved. Dependencies are limited to numpy, rasterio, and affine.

blockwise-kmeans: k-means for large tiles on CPU (extracted from RVQ)
<https://github.com/sk818/blockwise-kmeans>
A blocked BLAS-GEMM implementation of k-means with sampled fitting, exact blocked assignment, and bounded memory use. It is intended to quantise on the order of millions of rows (with k up to roughly 1024) on a CPU without materialising the full N×k distance matrix. The implementation is pure NumPy; the principal cost is a single multithreaded `x @ cᵀ` matrix multiplication. Results are deterministic given a fixed seed.

Comments and feedback welcome!
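For readers curious how blocked assignment keeps memory bounded, here is a minimal NumPy sketch of the idea described for blockwise-kmeans: fit centroids with Lloyd iterations on a random subsample, then assign every row exactly, one block of rows at a time, so that the only large operation per block is an `x @ cᵀ` product. The function names, the plain random initialisation, and all default parameters are assumptions for illustration; this is not the blockwise-kmeans API.

```python
import numpy as np


def blocked_assign(x, centroids, block_rows=65536):
    """Exact nearest-centroid labels, computed block_rows rows at a time.

    Only a (block_rows, k) slab of distances exists at once, never the full N x k matrix.
    """
    c_sq = (centroids ** 2).sum(axis=1)  # ||c_j||^2, shape (k,)
    labels = np.empty(len(x), dtype=np.int64)
    for start in range(0, len(x), block_rows):
        xb = x[start:start + block_rows]
        # One GEMM per block: xb @ c^T. ||x||^2 is dropped because it does not change the argmin.
        d2 = c_sq - 2.0 * (xb @ centroids.T)
        labels[start:start + block_rows] = d2.argmin(axis=1)
    return labels


def sampled_kmeans(x, k, sample_size=100_000, n_iter=25, seed=0, block_rows=65536):
    """Hypothetical sketch: Lloyd's k-means fitted on a random subsample,
    followed by exact blocked assignment of every row. Deterministic for a fixed seed."""
    rng = np.random.default_rng(seed)
    sample = x[rng.choice(len(x), size=min(sample_size, len(x)), replace=False)]
    centroids = sample[rng.choice(len(sample), size=k, replace=False)].astype(np.float64)
    for _ in range(n_iter):
        lab = blocked_assign(sample, centroids, block_rows)
        sums = np.zeros_like(centroids)
        np.add.at(sums, lab, sample)  # per-cluster sums of member rows
        counts = np.bincount(lab, minlength=k)
        nonempty = counts > 0  # empty clusters keep their previous centroid
        centroids[nonempty] = sums[nonempty] / counts[nonempty][:, None]
    return centroids, blocked_assign(x, centroids, block_rows)


rng = np.random.default_rng(42)
x = rng.normal(size=(200_000, 16)).astype(np.float32)
centroids, labels = sampled_kmeans(x, k=64, sample_size=20_000, n_iter=10, block_rows=8192)
```

With `block_rows=8192` and k = 64, each block's distance buffer holds 8192 × 64 float64 values (about 4 MiB) regardless of N; raising `block_rows` trades memory for fewer, larger GEMM calls.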