The Data
Where it comes from, what happens to it, and how you can use it.
We collect air quality measurements from every source we can find across Kazakhstan — government stations, low-cost sensors, international aggregators. Each source reports data differently: different units, different formats, different levels of reliability.
Our job is to bring all of this together, clean it carefully, and publish a single dataset that people can actually trust and use. Here is how that works.
Where it comes from
Our Sources
Government Reference
KazHydroMet (KGMT)
Official government monitoring network covering all of Kazakhstan. Reports PM2.5, PM10, NO2, SO2, CO, O3, H2S, and weather data.
Low-Cost Sensors
AirGradient
Dense network of low-cost air quality sensors providing high-frequency PM, CO2, TVOC, temperature, and humidity data.
International Aggregator
OpenAQ
Global open air quality platform aggregating PM2.5 and PM10 measurements from government and research-grade monitors.
International Aggregator
WAQI / aqicn.org
World Air Quality Index project providing multi-pollutant data including PM2.5, PM10, NO2, SO2, CO, O3, and meteorological parameters.
Low-Cost Sensors
AirKaz
Historical network of low-cost PM2.5 sensors across Almaty, providing daily city-wide and per-sensor measurements.
The process
From sensor to open data
Every measurement goes through the same careful process before it reaches you. Nothing is changed silently — we keep the original, and we show our work.
Collect
We pull data from our sources automatically. Every API response is saved exactly as we received it, and the original is never modified. If something goes wrong with one source, the others keep running.
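The collect step rests on two rules: archive each response byte-for-byte, and isolate failures per source. A minimal sketch of both, assuming each source is wrapped in a hypothetical zero-argument `fetch` callable that returns the raw response bytes. The directory layout and file naming are illustrative only.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Dict


def archive_raw(source: str, payload: bytes, root: Path) -> Path:
    """Write an API response to disk exactly as received; never overwrite an existing file."""
    digest = hashlib.sha256(payload).hexdigest()
    fetched_at = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = root / source / f"{fetched_at}_{digest[:16]}.raw"  # hypothetical naming scheme
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.write_bytes(payload)  # raw bytes, no parsing or re-encoding
    return path


def collect_all(fetchers: Dict[str, Callable[[], bytes]], root: Path) -> Dict[str, str]:
    """Run every source independently so that one failure does not stop the others."""
    status: Dict[str, str] = {}
    for source, fetch in fetchers.items():
        try:
            archive_raw(source, fetch(), root)
            status[source] = "ok"
        except Exception as exc:  # the failure is recorded for this source only
            status[source] = f"failed: {exc}"
    return status
```

If one fetcher raises (for example, a timeout), its entry in the returned status reads `failed: ...` while the remaining sources are still archived.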
Harmonize
Different sources use different units and formats. We convert everything to a common standard: all pollutant concentrations in micrograms per cubic meter (CO2 and TVOC keep their conventional units, ppm and ppb), all timestamps aligned, and all stations mapped to a single registry. The original values are always preserved alongside the converted ones.
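One common harmonization step is converting a gas reported as a mixing ratio (ppb or ppm) into a mass concentration, using µg/m³ = ppb × M / V_m, where M is the molar mass in g/mol and V_m the molar volume. The sketch below assumes V_m = 24.45 L/mol (25 °C, 1 atm). Other reference temperatures give a slightly different V_m, and this page does not say which conditions or which source units are used. The `Harmonized` record is a hypothetical shape that keeps the original value and unit next to the converted one.

```python
from dataclasses import dataclass

MOLAR_MASS = {"no2": 46.01, "so2": 64.06, "co": 28.01, "o3": 48.00, "h2s": 34.08}  # g/mol
MOLAR_VOLUME_L = 24.45  # L/mol at 25 °C and 1 atm (assumed reference conditions)
PPB_PER_UNIT = {"ppb": 1.0, "ppm": 1000.0}


@dataclass(frozen=True)
class Harmonized:
    parameter: str
    value_ugm3: float      # converted value in ug/m3
    original_value: float  # kept unchanged
    original_unit: str     # kept unchanged


def to_ugm3(parameter: str, value: float, unit: str) -> Harmonized:
    """Convert a gas concentration to ug/m3 while preserving the original reading."""
    if unit == "ug/m3":
        converted = value
    elif unit == "mg/m3":
        converted = value * 1000.0
    elif unit in PPB_PER_UNIT:
        ppb = value * PPB_PER_UNIT[unit]
        converted = ppb * MOLAR_MASS[parameter] / MOLAR_VOLUME_L
    else:
        raise ValueError(f"unsupported unit {unit!r} for {parameter}")
    return Harmonized(parameter, converted, value, unit)
```

For example, `to_ugm3("no2", 10, "ppb")` gives roughly 18.8 µg/m³, and `to_ugm3("co", 1, "ppm")` roughly 1145.6 µg/m³. In both cases the original `10 ppb` or `1 ppm` stays on the record.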
Clean
This is where we spend the most care. Every measurement is checked for problems — impossible values, frozen sensors, sudden spikes, readings that do not make physical sense. For PM2.5 specifically, we run a deeper statistical analysis to catch subtle issues that simple checks would miss. Every problem is flagged openly, never silently removed.
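This page does not specify the statistical method used for PM2.5. As a stand-in, the sketch below combines the three kinds of problems named above: impossible values, frozen sensors, and sudden spikes. Spikes are detected with a rolling median and median absolute deviation (MAD), a Hampel-style filter. Every reading gets a clean, suspect, or invalid flag with a reason, and nothing is removed. All thresholds are hypothetical placeholders.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical PM2.5 thresholds (ug/m3); real limits would be tuned per parameter and sensor.
PM25_MIN, PM25_MAX = 0.0, 1000.0
FROZEN_RUN = 6     # this many identical consecutive readings suggests a stuck sensor
SPIKE_WINDOW = 5   # readings on each side used for the local median
SPIKE_K = 5.0      # distance from the local median, in scaled MADs, that counts as a spike
MAD_FLOOR = 1.0    # avoids a zero MAD when the window is nearly constant

Flag = Tuple[str, str]  # (level, reason); level is "clean", "suspect" or "invalid"


def flag_series(values: List[float]) -> List[Flag]:
    """Flag every reading in one sensor's time-ordered series; nothing is dropped."""
    flags: List[Flag] = [("clean", "")] * len(values)

    # 1. Impossible values are invalid outright.
    for i, v in enumerate(values):
        if not PM25_MIN <= v <= PM25_MAX:
            flags[i] = ("invalid", "out of physical range")

    # 2. A long run of one identical value suggests a frozen sensor.
    start = 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            if i - start >= FROZEN_RUN:
                for j in range(start, i):
                    if flags[j][0] == "clean":
                        flags[j] = ("suspect", "frozen sensor")
            start = i

    # 3. Spikes: readings far from the local median, measured in robust MAD units.
    for i, v in enumerate(values):
        if flags[i][0] != "clean":
            continue
        window = values[max(0, i - SPIKE_WINDOW): i + SPIKE_WINDOW + 1]
        med = median(window)
        mad = max(median(abs(x - med) for x in window) * 1.4826, MAD_FLOOR)
        if abs(v - med) > SPIKE_K * mad:
            flags[i] = ("suspect", "sudden spike")
    return flags
```

The median and MAD are used instead of the mean and standard deviation so that the spike being tested does not inflate its own threshold.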
Validate
Before anything is published, the entire dataset goes through a final round of validation — automated checks that look for anything we might have missed. If any check fails, nothing gets published until the issue is resolved. We would rather delay than publish bad data.
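The "any failure blocks publication" rule amounts to a gate that runs every check and raises if any of them reports a problem. A minimal sketch, assuming rows are plain dictionaries. The two checks and the field names (`station_id`, `parameter`, `timestamp`, `raw_id`) are hypothetical examples, not the project's actual validation suite.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Check = Callable[[List[Record]], List[str]]


class ValidationFailed(Exception):
    """Raised when any check fails; nothing should be published."""


def no_duplicates(rows: List[Record]) -> List[str]:
    """Each (station, parameter, timestamp) should appear only once."""
    seen, problems = set(), []
    for r in rows:
        key = (r["station_id"], r["parameter"], r["timestamp"])
        if key in seen:
            problems.append(f"duplicate {key}")
        seen.add(key)
    return problems


def every_row_traceable(rows: List[Record]) -> List[str]:
    """Every row should point back to the raw record it came from."""
    return [f"row {i} has no raw_id" for i, r in enumerate(rows) if not r.get("raw_id")]


def validate_or_block(rows: List[Record], checks: List[Check]) -> List[Record]:
    """Run all checks; raise if any reports a problem, otherwise pass the rows through."""
    failures = {}
    for check in checks:
        problems = check(rows)
        if problems:
            failures[check.__name__] = problems
    if failures:
        raise ValidationFailed(failures)
    return rows
```

All checks run before the gate decides, so a single failed run reports every problem at once rather than only the first one found.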
Publish
Only measurements that passed every check are included in the published dataset. The result is available in open formats — ready to download, ready to use, with every value traceable back to the exact raw reading it came from.
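The publish step can be pictured as a filter plus a fixed column list that carries the traceability fields along. The sketch below uses CSV only as an example of an open format, since this page does not name the formats. The column names, including `raw_id`, `original_value` and `original_unit`, are assumptions.

```python
import csv
from typing import Dict, Iterable, TextIO

# Hypothetical published schema: converted value plus the fields that trace it to the raw reading.
PUBLISHED_COLUMNS = ["station_id", "parameter", "timestamp", "value", "unit",
                     "original_value", "original_unit", "raw_id"]


def publish_csv(rows: Iterable[Dict[str, object]], out: TextIO) -> int:
    """Write only clean rows; internal fields such as 'flag' are left out of the file."""
    writer = csv.DictWriter(out, fieldnames=PUBLISHED_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for r in rows:
        if r.get("flag") == "clean":
            writer.writerow(r)
            written += 1
    return written
```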
Data quality
Why trust this data?
We know that air quality data is only useful if people can trust it. Sensors break, readings spike for no reason, instruments freeze. We do not pretend this does not happen — instead, we deal with it openly:
Four layers of quality checks — from basic sanity checks at the moment data arrives to deep statistical analysis that catches subtle sensor problems. Each layer catches what the previous one might miss.
Every measurement is flagged — clean, suspect, or invalid. We do not silently drop bad data. You can see exactly what we flagged and why, and decide for yourself what to include.
Full traceability — every published value links back to the exact raw reading it came from, with the original value and original unit preserved. You can verify our work.
Failed checks block publication — if our validation finds a problem, nothing gets published until it is fixed. We would rather have a delay than let unreliable data through.
What we measure
Parameters
| Parameter | Description | Unit |
|---|---|---|
| pm25 | PM2.5, fine particulate matter (2.5 µm and smaller) | ug/m3 |
| pm10 | PM10, inhalable particulate matter (10 µm and smaller) | ug/m3 |
| pm1 | PM1.0, submicron particulate matter (1 µm and smaller) | ug/m3 |
| no2 | Nitrogen dioxide | ug/m3 |
| so2 | Sulfur dioxide | ug/m3 |
| co | Carbon monoxide | ug/m3 |
| o3 | Ozone | ug/m3 |
| h2s | Hydrogen sulfide | ug/m3 |
| co2 | Carbon dioxide | ppm |
| tvoc | Total volatile organic compounds | ppb |
| temperature | Air temperature | C |
| humidity | Relative humidity | % |
Open data
Download the Data
All data is free and open. Measurements are partitioned by year and month. Only data that passed quality control is included.
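Because measurements are partitioned by year and month, downloading a date range means working out which monthly partitions cover it. A minimal sketch of that calculation. The `year=YYYY/month=MM` naming is a hypothetical layout for illustration, not the dataset's documented paths.

```python
from datetime import date
from typing import List


def month_partitions(start: date, end: date) -> List[str]:
    """List the (hypothetical) year/month partition names covering start..end inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    parts = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        parts.append(f"year={year}/month={month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return parts
```

For example, `month_partitions(date(2023, 11, 15), date(2024, 2, 3))` returns the four partitions from `year=2023/month=11` through `year=2024/month=02`, crossing the year boundary correctly.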