The Importance of Statistical Significance

Auditing for Compliance

Written by Joanne Byron, LPN, BS, CCA, CIFHA, CHA, COCAS, CORCM, CHCO, HPOC, OHCC, CMDP, ICDCT-CM/PCS

Information provided below is a basic overview of audit sampling used when Auditing for Compliance, specifically chart or billing audits. It is not intended as being comprehensive, legal, or consulting advice.

Introduction

A statistically significant chart audit in healthcare is a structured, randomized review of medical records designed to project findings onto an entire population of claims (the "universe") with measurable reliability.

The Office of Inspector General (OIG) expects these audits to be random, unbiased, and sufficiently large to be representative of the population. A common misconception is that a fixed percentage (e.g., 10%) of charts is always sufficient. The OIG does not set a fixed percentage; the sample size must be large enough to provide a reliable estimate of the universe's overpayment amount. A statistically significant chart audit, compliant with OIG guidelines, is a scientifically rigorous process.

In healthcare, audit sampling is crucial when auditing the entire population (100% of claims) is impractical due to high volume. A "statistically valid" sample differs from a simple "probe" or arbitrary sample (e.g., 10 charts) because it allows for the projection of error rates onto the larger population. A statistically valid sample is necessary for:

Provider Self-Disclosure Protocol: Submitting self-audits to the OIG.
Corporate Integrity Agreements (CIAs): Mandatory compliance for providers under investigation.
External Audits: Rebutting audits from Unified Program Integrity Contractors (UPICs) or Medicare Administrative Contractors (MACs).

Even your routine audits should be grounded in statistically valid sampling, which is fundamental when auditing a healthcare organization for compliance. Taking this approach transforms subjective chart reviews into defensible, objective, and scalable evidence that can be used to prove compliance to government agencies.

Statistical significance provides the necessary confidence, typically 90% or higher, that findings from a small sample accurately represent the entire population, minimizing the risk of false positives. Experts often check if the auditor used an appropriate one-sided 90% confidence level, which is a common standard in these audits.

The confidence level defines how often, across repeated samples, the range calculated from the sample would contain the true population value (e.g., total overpayment). The precision (Margin of Error) defines the range of accuracy around the point estimate (e.g., +/- $10,000). The trade-off is that a higher confidence level (e.g., 99%) requires either a wider margin of error or a significantly larger sample size to maintain the same precision.
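The trade-off between confidence level, sample size, and margin of error can be illustrated with a minimal sketch. It assumes a normal approximation for the sample mean and a hypothetical per-claim overpayment standard deviation of $120; the function name and all figures are illustrative, not taken from any OIG or CMS tool.

```python
import math
from statistics import NormalDist


def margin_of_error(std_dev, n, confidence):
    """Two-sided margin of error for a sample mean (normal approximation)."""
    # z-score that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * std_dev / math.sqrt(n)


# Hypothetical standard deviation of per-claim overpayments, in dollars
STD_DEV = 120.0

for confidence in (0.90, 0.95, 0.99):
    for n in (50, 200):
        moe = margin_of_error(STD_DEV, n, confidence)
        print(f"confidence={confidence:.0%}  n={n:>3}  margin=+/-${moe:.2f}")
```

Running the loop shows the margin widening as the confidence level rises and narrowing as the sample grows, which is the trade-off described above.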

Legal and Regulatory Defensibility

Mandatory for Extrapolation - Government contractors, such as Recovery Audit Contractors (RACs), Unified Program Integrity Contractors (UPICs) and Department of Health and Human Services (HHS) Office of Inspector General (OIG) Office of Audit Services, require statistical sampling for projecting overpayment amounts. If an audit lacks statistical significance, it cannot be legally extrapolated to the total claim population.
Rebuttal of Audit Findings - Organizations can use statistical expert testimony to challenge improper sampling methods used by auditors, as flawed sampling often leads to inflated repayment demands. Core areas challenged by experts are:
- Improper Audit Universe/Frame: Auditors may fail to define the correct population of claims, including irrelevant claims or excluding relevant, paid-in-full claims that would balance the error rate.
- Lack of Randomization/Bias: Experts look for patterns showing the sample was not truly random, such as a sample mean paid amount dramatically higher than the universe mean, indicating a biased selection.
- Failure to Account for Underpayments: A common error, and one that is frequently challenged successfully, is the auditor's failure to include underpayments, which skews the audit and can significantly overstate the overpayment.
- Imprecise Extrapolation: Even if a sample is random, it may be too small or produce a wide confidence interval (high imprecision), making the projection highly unreliable.
- Failure to Replicate: Government auditors often fail to document their work sufficiently, making it impossible to reproduce the sample or calculations.
Lower Bound Calculation - Statistical tools (such as RAT-STATS) calculate the lower limit of a 90% confidence interval, ensuring that recoupment amounts are statistically defensible and conservative.
- OIG RAT-STATS is a free statistical software package created by the Office of Inspector General (OIG) that provides a "rock-solid," defensible foundation for auditing healthcare claims. It is used to generate random samples, determine sample sizes, and extrapolate error rates to entire populations. It is widely used by auditors, and often by providers in corporate integrity agreements.
- While RAT-STATS is user-friendly, it requires a thorough understanding of statistics and the software itself to use it properly and to challenge, if necessary, the findings of an audit.
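To make the lower bound idea concrete, here is a minimal sketch of projecting a sample's overpayments onto a universe and taking the lower limit of a one-sided 90% confidence interval. It is not RAT-STATS and does not reproduce its procedure; it assumes simple random sampling, a normal approximation, and a finite population correction. The function name and the sample figures are hypothetical, and underpayments are entered as negative values so they offset overpayments.

```python
import math
from statistics import NormalDist, mean, stdev


def extrapolate_overpayment(sample_overpayments, universe_size, confidence=0.90):
    """Return (point_estimate, one_sided_lower_limit) for total overpayment."""
    n = len(sample_overpayments)
    if n < 2 or n > universe_size:
        raise ValueError("sample must hold 2..universe_size claims")
    avg = mean(sample_overpayments)
    sd = stdev(sample_overpayments)  # sample standard deviation (n - 1)
    # Finite population correction: sampling without replacement
    fpc = math.sqrt(1 - n / universe_size)
    std_error_total = universe_size * sd / math.sqrt(n) * fpc
    z = NormalDist().inv_cdf(confidence)  # one-sided critical value
    point = universe_size * avg
    return point, point - z * std_error_total


# Hypothetical per-claim results in dollars; -30.0 is an underpayment
sample = [0.0, 0.0, 50.0, 0.0, 120.0, 0.0, -30.0, 0.0, 75.0, 0.0,
          0.0, 40.0, 0.0, 0.0, 95.0, 0.0, 0.0, 60.0, 0.0, 0.0]
point, lower = extrapolate_overpayment(sample, universe_size=4000)
print(f"point estimate: ${point:,.2f}")
print(f"one-sided 90% lower limit: ${lower:,.2f}")
```

The lower limit always sits at or below the point estimate, which is why it is considered the conservative figure; a small or highly variable sample pushes it further down.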

Ensuring Accuracy in Large Datasets

Statistical tools calculate the minimum required sample size (often at least 30 but higher depending on variance) to ensure that the audit has enough power to detect errors without wasting resources on excessive, manual review.
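As a rough illustration of how variance drives the required sample size, the sketch below uses a textbook formula for estimating a mean to within a chosen margin of error, with a finite population adjustment. It assumes a two-sided interval, a normal approximation, and a standard deviation estimated beforehand (for example from a probe sample); the function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(std_dev, margin, universe_size, confidence=0.90):
    """Approximate sample size to estimate a mean within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z * std_dev / margin) ** 2      # size for an unlimited population
    n = n0 / (1 + n0 / universe_size)     # adjust for a finite universe
    return min(universe_size, math.ceil(n))


# Hypothetical universe of 5,000 claims, target margin of +/- $20 per claim
for std_dev in (60.0, 120.0, 240.0):
    n = required_sample_size(std_dev, margin=20.0, universe_size=5000)
    print(f"std_dev=${std_dev:.0f}  required sample size={n}")
```

Doubling the standard deviation roughly quadruples the required sample until the universe size starts to cap it, which is why a fixed percentage or a flat minimum of 30 cannot fit every audit.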

It is important to mitigate potential bias. Statistical sampling prevents "judgmental sampling," where auditors might only select high-dollar or potentially erroneous claims, which would falsely inflate the error rate. To achieve this, we need to address the confidence interval.
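A simple random draw, as opposed to judgmental selection, can be sketched as follows. The claim IDs, universe size, and seed are hypothetical; the point is that every claim in the frame has the same chance of selection and that recording the seed lets another auditor reproduce the exact sample.

```python
import random


def draw_sample(claim_ids, sample_size, seed):
    """Draw a reproducible simple random sample without replacement."""
    # Sort first so the same frame and seed always yield the same sample,
    # regardless of the order in which the claims were exported.
    frame = sorted(claim_ids)
    rng = random.Random(seed)
    return rng.sample(frame, sample_size)


# Hypothetical sampling frame of 2,000 claim IDs
universe = [f"CLM{i:05d}" for i in range(1, 2001)]
sample = draw_sample(universe, sample_size=60, seed=20250101)
print(sample[:5])
```

Documenting the frame, the seed, and the software used is what makes the selection replicable by a reviewer.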

Key Components of an Audit Confidence Interval

We strive to reduce "false positives." Testing at a 95% confidence level means that, if documentation and billing were actually compliant, deviations as large as those observed would arise from random chance alone no more than 5% of the time; results beyond that threshold point to a systematic compliance failure. Let’s dive a little deeper into the confidence level and margins of error.

A confidence level (e.g., 95%) is the reliability of the sampling method. A 95% confidence level means that if the audit were repeated 100 times, about 95 of the resulting intervals would be expected to contain the true population value. Applying a 90% confidence level is a common requirement for CMS contractors to use as a basis for extrapolation.
Precision refers to the margin of error or the width of the interval. A tighter (narrower) interval means more precise results, often requiring a larger sample size. Larger samples shrink the confidence interval, providing higher precision. A higher confidence level (e.g., 99% instead of 95%) makes the interval wider (less precise) because you are trying to be more certain.
Upper/Lower Limits are the boundaries of the interval, providing the "best-case" and "worst-case" scenario for errors. In healthcare audits, particularly those involving billing compliance, overpayment extrapolation, and quality of care, upper and lower limits define the range of plausible values for a population parameter (such as total overpayment) with a set level of confidence (typically 90% or 95%).
- Lower Limit (LL): The lowest expected value of the confidence interval. In many CMS audits, the lower limit of a one-sided 90% confidence interval is used to determine the minimum amount of overpayment to be recouped.
- Upper Limit (UL): The highest expected value of the confidence interval. It represents the worst-case scenario for error rates.
- Confidence Interval (CI): The full range between the Lower and Upper Limit. A narrower interval indicates higher precision.
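The lower and upper limits described above can be computed for an error rate with a short sketch. It uses the Wilson score interval, which is one common choice for proportions and is an assumption here rather than a method the article prescribes; the error counts are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(errors, n, confidence=0.90):
    """Two-sided Wilson score interval for an error rate."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = errors / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical audits with the same 12% observed error rate
for errors, n in ((12, 100), (48, 400)):
    lower, upper = wilson_interval(errors, n)
    print(f"n={n}: lower limit={lower:.1%}, upper limit={upper:.1%}")
```

Both audits observe the same error rate, but the larger sample yields a narrower interval, meaning higher precision.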

Key Statistical Concepts for Auditors

Confidence Levels: The percentage of times (e.g., 90% or 95%) that the true value of an error is expected to fall within the calculated confidence interval.

Null Hypothesis (H0): The assumption that there is no meaningful difference between the audited sample and the expected (compliant) standard.

P-Value: The probability of observing results at least as extreme as those found in the sample if the null hypothesis were true. A low p-value (typically p < 0.05) allows the auditor to reject the null hypothesis and conclude a real, significant error pattern exists.

Randomized & Unbiased: Every claim in the universe must have an equal chance of selection.

Representative: The sample must reflect the characteristics of the entire population.

Standard Deviation: Measures the variation in the data; higher variance in claims requires a larger sample size to achieve statistical significance.

Statistically Valid & Replicable: Another auditor using the same methodology should arrive at similar results.

Universe Definition: The specific time period, the provider, and types of claims being audited (e.g., CPT code range 99212-99215 from Jan-Dec 2025).
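The null hypothesis and p-value entries above can be tied together with a small worked example. This sketch assumes a hypothetical tolerable error rate of 5% as the compliant standard and uses an exact one-sided binomial calculation; the function name, threshold, and counts are illustrative only.

```python
from math import comb


def binomial_p_value(errors, n, tolerable_rate):
    """P(X >= errors) when X ~ Binomial(n, tolerable_rate).

    This is the chance of seeing at least this many errors in the sample
    if the true error rate were only the tolerable rate (the null hypothesis).
    """
    return sum(
        comb(n, k) * tolerable_rate**k * (1 - tolerable_rate) ** (n - k)
        for k in range(errors, n + 1)
    )


# Hypothetical audits of 60 charts against a 5% tolerable error rate
for errors in (3, 8):
    p = binomial_p_value(errors, n=60, tolerable_rate=0.05)
    verdict = "reject H0" if p < 0.05 else "do not reject H0"
    print(f"{errors} errors in 60 charts: p-value={p:.4f} -> {verdict}")
```

With 60 charts and a 5% standard, about three errors would be expected by chance, so a count near three is unremarkable while a much higher count is unlikely to be random.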

Limitations to Consider

Not Always Meaningful: A statistically significant result (due to a large sample size) does not always mean the error is clinically or financially important.
Small Populations: When auditing small departments, high variation may lead to non-significant results, even if errors are present.
Requires Expertise: Misapplication of statistical formulas can create misleading conclusions; statistical literacy is crucial for compliance officers.

General Rules of Thumb - When full statistical calculation is not possible, industry guidelines offer the following benchmarks:

Small Populations (<100): Audit all records (100% sampling).
Large Populations: 10% of the total eligible charts, up to a maximum of 1000, is often sufficient.
Rapid Cycle Sampling: Small, consecutive samples (e.g., 5-10 charts) can be used to track changes over time in quality improvement projects and for monitoring purposes.

Conclusion

It’s all about measuring the effectiveness of your compliance program

An effective compliance program must identify high-risk patterns. Organizations use statistical significance to track whether voluntary changes to coding or billing procedures resulted in significant, measurable reductions in error rates. This approach allows the organization to determine whether corrective actions, such as training, corrections to Electronic Health Record systems, or targeted pre-billing audits, are producing the improvements required for compliance.
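One way to check whether a corrective action produced a measurable reduction is to compare error rates from audits before and after the change. The sketch below uses a one-sided two-proportion z-test, which is an assumed choice of method suited to reasonably large samples; the audit counts are hypothetical.

```python
import math
from statistics import NormalDist


def error_rate_reduction_test(errors_before, n_before, errors_after, n_after):
    """One-sided two-proportion z-test: did the error rate go down?"""
    p_before = errors_before / n_before
    p_after = errors_after / n_after
    pooled = (errors_before + errors_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_before - p_after) / se
    p_value = 1 - NormalDist().cdf(z)  # small when the reduction is large
    return z, p_value


# Hypothetical audits: 30 errors in 200 charts before training, 12 in 200 after
z, p_value = error_rate_reduction_test(30, 200, 12, 200)
print(f"z={z:.2f}, p-value={p_value:.4f}")
```

A small p-value suggests the drop in error rate is unlikely to be sampling noise; whether the remaining error rate is acceptable is a separate compliance judgment.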

Statistical techniques allow internal auditors to identify trends in data, such as high-frequency billing of complex codes, which indicate potential risk for future external audits.

About the Author

Joanne Byron, BS, LPN, CCA, CHA, CHCO, CHBS, CHCM, CIFHA, CMDP, COCAS, CORCM, OHCC, ICDCT-CM/PCS is an educator with the American Institute of Healthcare Compliance, a Licensing/Certification non-profit partner with CMS. She shares her experience of over 40 years as a nurse, consultant, auditor, and investigator in the healthcare field.
