The Data

Where it comes from, what happens to it, and how you can use it.

We collect air quality measurements from every source we can find across Kazakhstan — government stations, low-cost sensors, international aggregators. Each source reports data differently: different units, different formats, different levels of reliability.

Our job is to bring all of this together, clean it carefully, and publish a single dataset that people can actually trust and use. Here is how that works.

Where it comes from

Our Sources

Government Reference

KazHydroMet (KGMT)

Official government monitoring network covering all of Kazakhstan. Reports PM2.5, PM10, NO2, SO2, CO, O3, H2S, and weather data.

141+ stations · Since 2018 · All Kazakhstan

Low-Cost Sensors

AirGradient

Dense network of low-cost air quality sensors providing high-frequency PM, CO2, TVOC, temperature, and humidity data.

139 sensors · Every 5 min · Almaty

International Aggregator

OpenAQ

Global open air quality platform aggregating PM2.5 and PM10 measurements from government and research-grade monitors.

200+ locations · Since 2020 · Kazakhstan

International Aggregator

WAQI / aqicn.org

World Air Quality Index project providing multi-pollutant data including PM2.5, PM10, NO2, SO2, CO, O3, and meteorological parameters.

KZ + Central Asia · Since 2023 · Multi-pollutant

Low-Cost Sensors

AirKaz

Historical network of low-cost PM2.5 sensors across Almaty, providing daily city-wide and per-sensor measurements.

41 sensors · 2017–2020 · Almaty

The process

From sensor to open data

Every measurement goes through the same careful process before it reaches you. Nothing is changed silently — we keep the original, and we show our work.

1. Collect

We pull data from every source automatically. Every API response is saved exactly as we received it, and the original is never modified. If something goes wrong with one source, the others keep running.
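
A minimal sketch of that collection loop, assuming each source is wrapped in a hypothetical fetch function that returns the raw response bytes. The function name `collect_all`, the directory layout, and the file naming are illustrative assumptions, not the pipeline's actual code.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Dict


def collect_all(fetchers: Dict[str, Callable[[], bytes]], raw_dir: Path) -> Dict[str, str]:
    """Fetch every source and store each response byte for byte.

    Returns one status per source: the saved file path, or an error message.
    """
    results: Dict[str, str] = {}
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    for name, fetch in fetchers.items():
        try:
            payload = fetch()  # raw bytes, exactly as received
        except Exception as exc:
            # A failing source is recorded and skipped; the others keep running.
            results[name] = f"error: {exc}"
            continue
        digest = hashlib.sha256(payload).hexdigest()[:12]
        target = raw_dir / name / f"{stamp}_{digest}.raw"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)  # stored unchanged, never edited afterwards
        results[name] = str(target)
    return results
```

Catching the exception per source, rather than around the whole loop, is what keeps one outage from stopping the rest of the run.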

2. Harmonize

Different sources use different units and formats. We convert everything to a common standard: particulate and gas concentrations in micrograms per cubic meter (CO2 and TVOC stay in ppm and ppb), all timestamps aligned, all stations mapped to a single registry. The original values are always preserved alongside the converted ones.
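
As an illustration of the unit step, here is a hedged sketch of converting a gas reading to micrograms per cubic meter while keeping the original value and unit. It uses the standard relation µg/m³ = ppb × molar mass / molar volume. The reference conditions (25 °C, 1 atm, so 24.45 L/mol), the `Harmonized` record, and the function name are assumptions made for this example; the pipeline may use different conditions and names.

```python
from dataclasses import dataclass

# Assumed reference conditions: 25 °C and 1 atm, giving 24.45 litres per mole.
MOLAR_VOLUME_L = 24.45
# Molar masses in g/mol for the gases in this sketch.
MOLAR_MASS_G = {"no2": 46.01, "so2": 64.07, "o3": 48.00, "co": 28.01}


@dataclass(frozen=True)
class Harmonized:
    parameter: str
    value_ugm3: float      # converted value
    original_value: float  # preserved exactly as reported
    original_unit: str     # preserved exactly as reported


def to_ugm3(parameter: str, value: float, unit: str) -> Harmonized:
    """Convert a reading to ug/m3 and keep the original alongside it."""
    unit_key = unit.strip().lower()
    if unit_key == "ug/m3":
        converted = value
    elif unit_key == "mg/m3":
        converted = value * 1000.0
    elif unit_key in ("ppb", "ppm"):
        ppb = value * 1000.0 if unit_key == "ppm" else value
        converted = ppb * MOLAR_MASS_G[parameter] / MOLAR_VOLUME_L
    else:
        raise ValueError(f"unsupported unit: {unit!r}")
    return Harmonized(parameter, converted, value, unit)
```

For example, `to_ugm3("no2", 10, "ppb")` gives roughly 18.8 ug/m3 under these assumed conditions, with `original_value=10` and `original_unit="ppb"` still attached.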

3. Clean

This is where we spend the most care. Every measurement is checked for problems — impossible values, frozen sensors, sudden spikes, readings that do not make physical sense. For PM2.5 specifically, we run a deeper statistical analysis to catch subtle issues that simple checks would miss. Every problem is flagged openly, never silently removed.
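
The simple checks named above (impossible values, frozen sensors, sudden spikes) can be sketched as follows. This is only an illustration: the thresholds, the function name `flag_series`, and the exact rules are assumptions, and the deeper statistical analysis used for PM2.5 is not shown.

```python
from typing import List, Optional


def flag_series(
    values: List[Optional[float]],
    max_valid: float = 1000.0,  # assumed upper bound for a plausible reading
    frozen_run: int = 6,        # assumed: this many identical readings in a row
    spike_jump: float = 200.0,  # assumed: jump size that counts as a spike
) -> List[str]:
    """Return one flag per reading: 'clean', 'suspect', or 'invalid'."""
    flags = ["clean"] * len(values)

    # 1. Impossible values: missing, negative, or above the plausible range.
    for i, v in enumerate(values):
        if v is None or v < 0 or v > max_valid:
            flags[i] = "invalid"

    # 2. Frozen sensor: a long run of exactly identical readings.
    run_start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[run_start]:
            if i - run_start >= frozen_run and values[run_start] is not None:
                for j in range(run_start, i):
                    if flags[j] == "clean":
                        flags[j] = "suspect"
            run_start = i

    # 3. Sudden spike: a single point far above or below both neighbours.
    for i in range(1, len(values) - 1):
        prev, cur, nxt = values[i - 1], values[i], values[i + 1]
        if prev is None or cur is None or nxt is None:
            continue
        up, down = cur - prev, cur - nxt
        if flags[i] == "clean" and up * down > 0 and min(abs(up), abs(down)) > spike_jump:
            flags[i] = "suspect"

    return flags
```

Nothing is removed here: the function returns a flag for every input reading, so the flagged values stay visible next to the clean ones.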

4. Validate

Before anything is published, the entire dataset goes through a final round of validation — automated checks that look for anything we might have missed. If any check fails, nothing gets published until the issue is resolved. We would rather delay than publish bad data.
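
A hedged sketch of the "a failed check blocks publication" rule. The two example checks, the record fields `flag` and `raw_id`, and the function names are hypothetical; the real validation suite is not described in detail here.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Check = Callable[[List[Record]], bool]


class ValidationError(Exception):
    """Raised when at least one validation check fails."""


def every_record_is_flagged(records: List[Record]) -> bool:
    return all(r.get("flag") in {"clean", "suspect", "invalid"} for r in records)


def every_record_has_raw_reference(records: List[Record]) -> bool:
    return all(bool(r.get("raw_id")) for r in records)


def publish_if_valid(
    records: List[Record],
    checks: Dict[str, Check],
    publish: Callable[[List[Record]], None],
) -> None:
    """Run all checks first; call publish only if none of them failed."""
    failed = [name for name, check in checks.items() if not check(records)]
    if failed:
        raise ValidationError("publication blocked, failed checks: " + ", ".join(failed))
    publish(records)
```

All checks run before anything is written, so a single failure stops the whole release instead of producing a partial dataset.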

5. Publish

Only measurements that passed every check are included in the published dataset. The result is available in open formats — ready to download, ready to use, with every value traceable back to the exact raw reading it came from.

Data quality

Why trust this data?

We know that air quality data is only useful if people can trust it. Sensors break, readings spike for no reason, instruments freeze. We do not pretend this does not happen — instead, we deal with it openly:

Four layers of quality checks — from basic sanity checks at the moment data arrives to deep statistical analysis that catches subtle sensor problems. Each layer catches what the previous one might miss.

Every measurement is flagged — clean, suspect, or invalid. We do not silently drop bad data. You can see exactly what we flagged and why, and decide for yourself what to include.

Full traceability — every published value links back to the exact raw reading it came from, with the original value and original unit preserved. You can verify our work.

Failed checks block publication — if our validation finds a problem, nothing gets published until it is fixed. We would rather have a delay than let unreliable data through.
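
To show how the flags can be used in practice, here is a small sketch that counts flags and selects the readings you want to keep. It assumes each record is available as a dictionary with a `flag` field holding `clean`, `suspect`, or `invalid`; the field name and both function names are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List


def summarize_flags(records: Iterable[dict]) -> Dict[str, int]:
    """Count how many records carry each flag."""
    return dict(Counter(r["flag"] for r in records))


def select(records: Iterable[dict], allowed: Iterable[str] = ("clean",)) -> List[dict]:
    """Keep only the records whose flag is in the allowed set."""
    allowed_set = set(allowed)
    return [r for r in records if r["flag"] in allowed_set]
```

A strict analysis might call `select(records)` and keep only clean readings, while an exploratory one could pass `allowed=("clean", "suspect")`.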

What we measure

Parameters

Parameter      Description                                      Unit
pm25           PM2.5, fine particulate matter (≤ 2.5 µm)        µg/m³
pm10           PM10, inhalable particulate matter (≤ 10 µm)     µg/m³
pm1            PM1.0, submicron particulate matter (≤ 1 µm)     µg/m³
no2            Nitrogen dioxide                                 µg/m³
so2            Sulfur dioxide                                   µg/m³
co             Carbon monoxide                                  µg/m³
o3             Ozone                                            µg/m³
h2s            Hydrogen sulfide                                 µg/m³
co2            Carbon dioxide                                   ppm
tvoc           Total volatile organic compounds                 ppb
temperature    Air temperature                                  °C
humidity       Relative humidity                                %

Open data

Download the Data

All data is free and open. Measurements are partitioned by year and month. Only data that passed quality control is included.
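
Because measurements are split by year and month, a date range maps to a list of monthly partitions. The sketch below enumerates them; the `year=YYYY/month=MM` naming is a hypothetical layout chosen for illustration, so check the actual download paths before relying on it.

```python
from datetime import date
from typing import List


def month_partitions(start: date, end: date) -> List[str]:
    """List the monthly partitions that cover the range start..end (inclusive)."""
    if (end.year, end.month) < (start.year, start.month):
        raise ValueError("end must not be earlier than start")
    parts: List[str] = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        # Hypothetical naming scheme; the published layout may differ.
        parts.append(f"year={year:04d}/month={month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return parts
```

For instance, the range from 15 November 2023 to 1 February 2024 covers four partitions, two in each year.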

Measurements (.parquet)
Stations (.csv)
Locations (.geojson)