Methodology
About the numbers
Updated 2026-05-20 · all figures derived from /data/summary.json.
The four counts that matter
| Stage | Count | Definition |
|---|---|---|
| All US hospitals | 5,426 | Rows in CMS Hospital General Information. |
| CMS-required | 4,625 | Acute care, critical access, children's, and rural emergency hospitals: the set 45 CFR Part 180 applies to. |
| Live MRF (compliant) | 3,986 | CMS-required hospitals where we verified a live, downloadable machine-readable file (MRF) URL. |
| Standardized prices | 3,692 | Live MRFs we successfully parsed into n > 0 standardized rows. This is the homepage headline. |
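The table is a funnel, so each stage should be a subset of the one before it. Here is a minimal bash sketch for checking the drop between adjacent stages. The function name funnel_drops and the stage labels are illustrative, not part of the site's tooling:

```bash
#!/usr/bin/env bash
# Sketch only: stage-to-stage attrition for the four counts in the table.
# Usage: funnel_drops ALL CMS_REQUIRED LIVE_MRF STANDARDIZED
funnel_drops() {
  if (( $# != 4 )); then
    echo "usage: funnel_drops ALL CMS_REQUIRED LIVE_MRF STANDARDIZED" >&2
    return 2
  fi
  local -a labels=(all cms_required live_mrf standardized)
  local -a counts=("$@")
  local i
  for (( i = 1; i < 4; i++ )); do
    # drop = hospitals lost between adjacent stages; kept = share of the previous stage
    awk -v from="${labels[i-1]}" -v to="${labels[i]}" \
        -v prev="${counts[i-1]}" -v cur="${counts[i]}" \
        'BEGIN { printf "%s -> %s: -%d (%.1f%% kept)\n", from, to, prev - cur, 100 * cur / prev }'
  done
}
```

Running funnel_drops 5426 4625 3986 3692 prints one line per transition. The last line, live_mrf -> standardized: -294 (92.6% kept), is the 294-hospital gap.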
The 294-hospital gap, explained
Between "live MRF" (3,986) and "standardized prices" (3,692) sits a coverage hole of 294 hospitals. Two failure modes:
- 108 hospitals parsed to 0 rows. Their MRF is downloadable, but the file is empty, a placeholder, or in a format from which the parser produced no usable rows (e.g. an XML or PDF file that didn't yield CDM / CPT / DRG / HCPCS rows). These show up in /data/prices/index.json with n = 0.
- 186 hospitals aren't in the price index at all yet. They have a live MRF (we can fetch the URL), but ingest has either failed (decompression, schema, or encoding errors) or hasn't been attempted in the current build. These are queued for the next ingest pass.
We surface only n > 0 entries on the homepage because zero-row outputs aren't useful for comparison. They count toward the parse-pipeline yield, not toward the patient-facing count.
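The two failure modes amount to a three-way bucketing of every live-MRF hospital by its price-index state. The sketch below assumes a hypothetical input of one "IN_INDEX N" pair per hospital: 1 or 0 for whether the CCN has an index entry, then that entry's n. The names classify_live_mrf and tally_buckets are illustrative, not the pipeline's code.

```bash
#!/usr/bin/env bash
# Sketch: bucket one live-MRF hospital by its state in /data/prices/index.json.
# Hypothetical interface: $1 is 1 if the CCN has an index entry (else 0),
# $2 is that entry's row count n (ignored when there is no entry).
classify_live_mrf() {
  local in_index=$1 n=${2:-0}
  if [[ $in_index != 1 ]]; then
    echo not_in_index   # live MRF, but ingest failed or has not run yet
  elif (( n > 0 )); then
    echo standardized   # counts toward the homepage headline
  else
    echo zero_rows      # indexed with n = 0: parse-pipeline yield only
  fi
}

# Tally buckets from "IN_INDEX N" lines on stdin, one hospital per line.
# Output: "BUCKET COUNT", sorted by bucket name.
tally_buckets() {
  local in_index n
  while read -r in_index n; do
    classify_live_mrf "$in_index" "$n"
  done | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Applied to all 3,986 live-MRF hospitals, this bucketing would give 3,692 standardized, 108 zero_rows and 186 not_in_index. The last two together make up the 294-hospital gap.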
The pipeline
- Seed — start with CMS hospital list (5,426) and the CMS HPT seed.
- Discover — sitemap walker + email-domain expansion + Wayback fallback to locate each hospital's MRF URL.
- Probe — HEAD / partial GET to confirm the URL is alive and serves a non-HTML, non-XML, non-PDF body. 3,986 of 4,625 CMS-required hospitals pass this gate.
- Parse → standardize — extract CDM / CPT / DRG / HCPCS rows into a uniform schema. 3,800 hospitals have an index entry; 3,692 of those have at least one usable row.
- Publish — write per-hospital JSON to R2 + /data/prices/index.json for cross-hospital comparison.
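The Probe gate can be approximated with curl. This is a sketch under stated assumptions, not the pipeline's implementation. It judges the body by the final Content-Type header, whereas the real probe may sniff the body instead. When HEAD isn't answered with a 2xx, it falls back to a ranged GET of the first KiB. It also uses an arbitrary 30-second timeout and treats a missing Content-Type as a failure. probe_mrf and mrf_content_type_ok are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch of the Probe gate (hypothetical helpers, not the pipeline's code).

# Succeeds when a Content-Type value is NOT HTML, XML, or PDF.
mrf_content_type_ok() {
  local ct=${1%%;*}                                   # drop "; charset=..." params
  ct=$(printf '%s' "$ct" | tr '[:upper:]' '[:lower:]')
  ct=${ct//[[:space:]]/}
  case $ct in
    "") return 1 ;;                                   # no Content-Type: unconfirmed (assumption)
    text/html|application/xhtml+xml) return 1 ;;      # landing or error pages
    text/xml|application/xml|*+xml) return 1 ;;
    application/pdf) return 1 ;;
    *) return 0 ;;                                    # e.g. text/csv, application/json
  esac
}

# Succeeds when the URL answers 2xx with an acceptable Content-Type.
probe_mrf() {
  local url=$1 out fmt='%{http_code} %{content_type}'
  # HEAD request (-I), following redirects (-L); -w prints the final status and type.
  out=$(curl -sIL -o /dev/null --max-time 30 -w "$fmt" "$url") || true
  if [[ ${out%% *} != 2?? ]]; then
    # Some hosts reject HEAD: retry as a ranged GET for the first KiB only.
    out=$(curl -sL -r 0-1023 -o /dev/null --max-time 30 -w "$fmt" "$url") || true
  fi
  [[ ${out%% *} == 2?? ]] && mrf_content_type_ok "${out#* }"
}
```

Because probe_mrf only returns an exit status, it can be used directly in shell conditionals. A 206 from the ranged GET counts as alive, since any 2xx passes.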
Closing the gap
The closeout pipeline (scripts/coverage_closeout.py) replays the ingest pass against the 186 not-in-index CCNs and the 108 zero-row CCNs with auto-tuned parallelism. Two open issues:
- A few host clusters (e.g. apps.para-hcfs.com, sthpiprd.blob.core.windows.net) account for most of the parse failures; these are format quirks rather than transient errors.
- Zero-row outputs are mostly placeholder files with no price content. CMS technically counts them as "available" even though they teach you nothing. The law's hole, not ours.
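One way to check whether failures cluster by host is to count failed MRF URLs by hostname. This sketch assumes a plain list of failed URLs on stdin, one full URL per line. That input is hypothetical, because the failure-log format of scripts/coverage_closeout.py isn't described here. failures_by_host is an illustrative name.

```bash
#!/usr/bin/env bash
# Sketch: rank hosts by number of failed MRF URLs read from stdin (one URL per line).
failures_by_host() {
  # Splitting "scheme://host[:port]/path" on "/" puts the host part in field 3.
  awk -F/ 'NF >= 3 {
      host = tolower($3)
      sub(/^.*@/, "", host)     # drop any user-info prefix
      sub(/[:?].*$/, "", host)  # drop a port or a query string glued to the host
      if (host != "") print host
    }' \
    | sort | uniq -c \
    | sort -k1,1nr -k2,2 \
    | awk '{ print $1, $2 }'   # "COUNT HOST", busiest host first
}
```

If the clustering described in the first open issue holds, hosts like apps.para-hcfs.com and sthpiprd.blob.core.windows.net should appear near the top of this ranking.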
Re-derive everything yourself
```bash
# Live summary
curl -s https://hospitalledger.com/data/summary.json | jq '{compliant, standardized_price_hospitals, standardized_price_index_hospitals, zero_price_index_entries}'
# Price index (full)
curl -s https://hospitalledger.com/data/prices/index.json | jq '.hospitals | length'
# Hospitals with n > 0
curl -s https://hospitalledger.com/data/prices/index.json | jq '[.hospitals[] | select(.n > 0)] | length'
```
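These counts can also be combined into a single reconciliation of the gap. This is a minimal sketch with three assumptions. Every index entry is assumed to belong to a live-MRF hospital and to carry a numeric n of 0 or more. Based on its name alone, summary.json's zero_price_index_entries is assumed to be the count of n = 0 entries. gap_report is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: split the live-MRF -> standardized gap into its two failure modes.
# Args: LIVE (compliant hospitals), INDEXED (index entries), NONZERO (entries with n > 0),
#       optional REPORTED_ZERO (assumed: summary.json zero_price_index_entries).
gap_report() {
  local live=$1 indexed=$2 nonzero=$3 reported_zero=${4:-}
  local zero=$(( indexed - nonzero ))     # indexed but n = 0 (assumes n is always >= 0)
  local missing=$(( live - indexed ))     # live MRF with no index entry yet
  echo "gap=$(( live - nonzero )) zero_rows=$zero not_in_index=$missing"
  # Cross-check the index-derived zero-row count against the summary, if given.
  if [[ -n $reported_zero ]] && (( reported_zero != zero )); then
    echo "mismatch: summary reports $reported_zero zero-row entries, index implies $zero" >&2
    return 1
  fi
}

# Example against the live files (needs network and jq):
#   idx=$(curl -s https://hospitalledger.com/data/prices/index.json)
#   gap_report 3986 "$(jq '.hospitals | length' <<<"$idx")" \
#     "$(jq '[.hospitals[] | select(.n > 0)] | length' <<<"$idx")"
```

With the figures on this page (3,986 live MRFs, 3,800 index entries, 3,692 with n > 0), gap_report prints gap=294 zero_rows=108 not_in_index=186.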