Measuring Digital Inequality: The Gini Coefficient

Why the Gini coefficient matters for digital equity, how to calculate it, and what the Hájek estimator, as taught by Dr. Stoev, adds when the data come from a survey.

The Inequality Challenge:

Policy question: “Is our intervention making digital access more equal?”

Simple but wrong answer: “Average broadband adoption increased from 68% to 74%.”

Why wrong? Average hides inequality:

What if high-income areas went from 85% → 95% (great!)
But low-income areas barely moved: 52% → 53%
The average improved, but the gap widened from 33 to 42 points (worse equity!)

We need inequality measurement, not just averages.
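A quick arithmetic check makes the point concrete. This is a minimal sketch using the rates from the scenario above; it assumes, purely for illustration, that the two groups are the same size, and the variable names are hypothetical.

```python
# Adoption rates from the scenario above, as (before, after).
# Assumption: both groups have equal population, so the overall
# average is the simple mean of the two rates.
high_income = (0.85, 0.95)
low_income = (0.52, 0.53)

avg_before = (high_income[0] + low_income[0]) / 2
avg_after = (high_income[1] + low_income[1]) / 2

gap_before = high_income[0] - low_income[0]
gap_after = high_income[1] - low_income[1]

print(f"Average: {avg_before:.1%} -> {avg_after:.1%}")
print(f"Gap:     {gap_before * 100:.0f} pts -> {gap_after * 100:.0f} pts")
# Average: 68.5% -> 74.0%
# Gap:     33 pts -> 42 pts
```

The average and the gap move in opposite directions, which is exactly what a single headline number cannot show.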

Enter the Gini coefficient.

What is the Gini Coefficient?

Simple Definition:

Gini coefficient: Measures how unequally distributed a resource is across a population.

Range: 0 to 1

0 = Perfect equality (everyone has exactly the same)
1 = Perfect inequality (one person has everything, everyone else has nothing)
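The two endpoints can be checked directly. The sketch below uses a small helper named `gini` (a hypothetical name for this illustration) and two made-up groups of 10 households. In a finite sample of n households the maximum is (n - 1) / n, which approaches 1 as n grows.

```python
import numpy as np

def gini(values):
    """Unweighted Gini via the sorted cumulative-sum formula."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * cum.sum() / cum[-1]) / n

equal = np.full(10, 0.7)                      # every household has the same score
concentrated = np.array([0.0] * 9 + [1.0])    # one household holds everything

print(f"Equal:        {gini(equal):.2f}")         # 0.00
print(f"Concentrated: {gini(concentrated):.2f}")  # 0.90, i.e. (n - 1) / n with n = 10
```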

Originally: Developed by Italian statistician Corrado Gini (1912) to measure income inequality.

Digital equity application: Measure how unequally digital access is distributed across communities.

Visualizing Inequality: The Lorenz Curve

Graphical Representation:

Lorenz Curve shows cumulative distribution:

Y-axis: Cumulative % of access/adoption
X-axis: Cumulative % of population (ranked from lowest to highest access)

Perfect Equality Line (diagonal):
- 20% of population has 20% of access
- 50% of population has 50% of access
- 100% of population has 100% of access

Actual Distribution (Lorenz Curve):
- 20% lowest have 8% of access (below diagonal)
- 50% lowest have 35% of access (below diagonal)
- 100% have 100% by definition (meets diagonal at end)

Gini Coefficient = (Area between diagonal and Lorenz curve) / (Total area under diagonal = 0.5)

Interpretation:

Lorenz curve close to diagonal → Low Gini → More equal
Lorenz curve far from diagonal → High Gini → Less equal
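The area definition can be turned into code almost literally. The following is a minimal sketch, not a production routine: `lorenz_points` and `gini_from_lorenz` are hypothetical helper names, the scores are made up, and the area under the curve is approximated with the trapezoid rule.

```python
import numpy as np

def lorenz_points(values):
    """Cumulative population share and cumulative access share, both starting at 0."""
    x = np.sort(np.asarray(values, dtype=float))
    pop_share = np.arange(len(x) + 1) / len(x)
    access_share = np.concatenate(([0.0], np.cumsum(x) / x.sum()))
    return pop_share, access_share

def gini_from_lorenz(values):
    """Gini = area between the diagonal and the Lorenz curve, divided by 0.5."""
    pop_share, access_share = lorenz_points(values)
    # Trapezoid rule for the area under the Lorenz curve
    widths = np.diff(pop_share)
    heights = (access_share[:-1] + access_share[1:]) / 2
    area_under_lorenz = np.sum(widths * heights)
    return (0.5 - area_under_lorenz) / 0.5

scores = [0.2, 0.5, 0.6, 0.8, 0.9]  # hypothetical household access scores
pop, acc = lorenz_points(scores)
print(np.round(acc, 3))                        # Lorenz curve heights, ending at 1.0
print(f"Gini: {gini_from_lorenz(scores):.3f}")  # Gini: 0.227
```

Plotting `pop` against `acc` next to the diagonal gives the picture described above.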

Why Gini for Digital Equity?

Sen’s Perspective:

Amartya Sen (1999) focused on equity, not just efficiency:

“Development can be seen as a process of expanding the real freedoms that people enjoy.”

Sen’s concern: Average improvements can mask persistent inequalities.

Example:

Country A (Infrastructure Focus):
- Average broadband: 70%
- Urban: 90%, Rural: 50%
- Gini: 0.38 (moderate inequality)
- Sen assessment: Capability gap remains for rural

Country B (Equity Focus):
- Average broadband: 68% (slightly lower)
- Urban: 75%, Rural: 62%
- Gini: 0.18 (low inequality)
- Sen assessment: More equal capability distribution

Sen would prioritize Country B: capability is spread more evenly, and rural residents are far less likely to be left out, even though the average is slightly lower.

Gini operationalizes Sen’s equity focus.

Calculating the Gini Coefficient:

Method 1: Basic Formula (Small Datasets)

Given a digital access score (0-1) for each household:

import numpy as np

def calculate_gini_basic(access_scores):
    """
    Calculate Gini coefficient for digital access.
    
    Args:
        access_scores: Array of access scores (0-1) for each household
        
    Returns:
        Gini coefficient (0-1)
    """
    n = len(access_scores)
    
    # Sort scores from lowest to highest
    sorted_scores = np.sort(access_scores)
    
    # Calculate cumulative sum
    cumsum = np.cumsum(sorted_scores)
    
    # Gini formula
    gini = (n + 1 - 2 * np.sum(cumsum) / cumsum[-1]) / n
    
    return gini

# Example: 5 households
household_access = np.array([0.2, 0.5, 0.6, 0.8, 0.9])
gini = calculate_gini_basic(household_access)
print(f"Gini coefficient: {gini:.3f}")
# Output: Gini coefficient: 0.227

Interpretation: Gini of 0.227 indicates low inequality in digital access (the 0.15 - 0.30 band in the benchmarks below).
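One way to sanity-check the result is to compute the Gini a second way. The sketch below uses the pairwise definition (mean absolute difference between all pairs of households, divided by twice the mean); `gini_pairwise` is a hypothetical name introduced here. It builds an n x n matrix, so it only suits small datasets.

```python
import numpy as np

def gini_pairwise(values):
    """Gini as the mean absolute difference over all pairs, divided by twice the mean."""
    x = np.asarray(values, dtype=float)
    abs_diffs = np.abs(x[:, None] - x[None, :])   # n x n matrix of |x_i - x_j|
    return abs_diffs.mean() / (2 * x.mean())

household_access = np.array([0.2, 0.5, 0.6, 0.8, 0.9])
print(f"Pairwise Gini: {gini_pairwise(household_access):.3f}")
# Pairwise Gini: 0.227
```

Both formulas describe the same quantity, so any disagreement between them points to a coding error.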

Method 2: Hájek Estimator (Large Survey Data)

Challenge: We don’t have access scores for EVERY household. We have survey samples.

Solution: Hájek Estimator adjusts for sampling design and weights.

Credit: Dr. Stilian Stoev, University of Michigan Statistics Department, taught the methodology for applying the Hájek estimator to digital equity measurement.

Why Hájek?

Survey samples don’t equally represent population
Some groups oversampled, others undersampled
Need to weight responses to estimate population Gini
Hájek accounts for sampling probabilities
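In symbols, the idea can be sketched as follows. The notation is an assumption made for this illustration, not taken from Dr. Stoev's materials: household i in sample s has selection probability \pi_i, weight w_i = 1 / \pi_i, and access score y_i. The Hájek estimator of the population mean divides the weighted total by the sum of the weights:

```latex
\hat{\mu}_{H} = \frac{\sum_{i \in s} w_i \, y_i}{\sum_{i \in s} w_i},
\qquad w_i = \frac{1}{\pi_i}
```

A weighted Gini follows the same pattern. With households sorted by ascending score, normalized weights \tilde{w}_i, and cumulative weight share \hat{F}_i:

```latex
\hat{G} = \frac{1}{\hat{\mu}_{H}} \sum_{i \in s} \tilde{w}_i \, y_i \left( 2\hat{F}_i - \tilde{w}_i \right) - 1,
\qquad \tilde{w}_i = \frac{w_i}{\sum_{j \in s} w_j},
\qquad \hat{F}_i = \sum_{j \le i} \tilde{w}_j
```

When every score is identical the sum equals \hat{\mu}_{H}, so \hat{G} = 0, as it should.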

Hájek Estimator Formula:

import numpy as np

def calculate_gini_hajek(access_scores, sampling_weights):
    """
    Calculate Gini coefficient using Hájek Estimator.
    Accounts for complex survey sampling design.
    
    Args:
        access_scores: Array of access scores from survey
        sampling_weights: Survey weights (inverse probability of selection)
        
    Returns:
        Weighted Gini coefficient
    """
    access_scores = np.asarray(access_scores, dtype=float)
    sampling_weights = np.asarray(sampling_weights, dtype=float)
    
    # Normalize weights so they sum to 1 (Hájek-style ratio estimation)
    normalized_weights = sampling_weights / np.sum(sampling_weights)
    
    # Sort by access score, keeping weights aligned
    sorted_indices = np.argsort(access_scores)
    sorted_scores = access_scores[sorted_indices]
    sorted_weights = normalized_weights[sorted_indices]
    
    # Weighted cumulative population share, up to and including each household
    cumulative_weights = np.cumsum(sorted_weights)
    
    # Weighted mean access (the normalized weights already sum to 1)
    weighted_mean = np.sum(sorted_scores * sorted_weights)
    
    # Weight each score by twice its midpoint position in the weighted distribution
    gini_numerator = np.sum(
        sorted_weights * sorted_scores * (2 * cumulative_weights - sorted_weights)
    )
    
    # G = numerator / mean - 1, which is 0 when every score is identical
    gini = gini_numerator / weighted_mean - 1
    
    return gini

# Example: Survey data with sampling weights
survey_access = np.array([0.3, 0.5, 0.7, 0.8, 0.9, 0.4, 0.6, 0.75])
# Sampling weights: Higher weight = represents more households
survey_weights = np.array([2.5, 1.8, 1.2, 1.0, 0.8, 2.0, 1.5, 1.3])

gini_weighted = calculate_gini_hajek(survey_access, survey_weights)
print(f"Weighted Gini: {gini_weighted:.3f}")
# Output: Weighted Gini: 0.198

Key Insight: The Hájek estimator gives a more accurate population Gini whenever selection probabilities are unequal, which is the norm in complex surveys.
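A useful way to read a sampling weight is "this respondent stands in for that many households". The sketch below checks that reading: a weighted Gini on three respondents should match an equal-weight Gini on the households written out one by one. The function `weighted_gini` and the data are hypothetical, introduced only for this check.

```python
import numpy as np

def weighted_gini(scores, weights):
    """Weighted Gini with weights normalized to sum to 1."""
    x = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order] / w.sum()
    cum_w = np.cumsum(w)
    mean = np.sum(w * x)
    return np.sum(w * x * (2 * cum_w - w)) / mean - 1

# Hypothetical survey: three respondents standing in for 3, 1, and 2 households
scores = np.array([0.3, 0.6, 0.9])
weights = np.array([3, 1, 2])

# The same population written out household by household, with equal weights
expanded = np.repeat(scores, weights)   # [0.3, 0.3, 0.3, 0.6, 0.9, 0.9]

g_weighted = weighted_gini(scores, weights)
g_expanded = weighted_gini(expanded, np.ones(len(expanded)))
print(f"{g_weighted:.4f} vs {g_expanded:.4f}")
# 0.2576 vs 0.2576
```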

Interpreting Gini for Policy:

Gini Benchmarks for Digital Equity:

Gini Range	Inequality Level	Policy Interpretation
0.00 - 0.15	Very Low	Near-universal, equitable access
0.15 - 0.30	Low	Good equity, minor gaps
0.30 - 0.40	Moderate	Noticeable inequality, targeted interventions needed
0.40 - 0.50	High	Significant inequality, major equity concerns
0.50+	Very High	Severe inequality, urgent intervention needed
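For reports that label many Gini values, the table can be encoded once and reused. This is a small sketch; `classify_gini` is a hypothetical helper, and because the table's bands share their boundary values, the code assumes each band includes its lower bound and excludes its upper bound.

```python
def classify_gini(gini):
    """Map a Gini value to the inequality level in the benchmark table."""
    # Assumption: a value exactly on a boundary belongs to the higher band
    bands = [
        (0.15, "Very Low"),
        (0.30, "Low"),
        (0.40, "Moderate"),
        (0.50, "High"),
    ]
    for upper, label in bands:
        if gini < upper:
            return label
    return "Very High"

print(classify_gini(0.12))  # Very Low
print(classify_gini(0.30))  # Moderate
print(classify_gini(0.55))  # Very High
```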

Note: Digital access Gini can differ sharply from income Gini for the same population because access is closer to binary (have or have-not). For a strictly binary measure, the Gini equals 1 minus the adoption rate.
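The binary case is worth checking by hand, because for a strict have/have-not indicator the Gini works out to 1 minus the adoption rate. The sketch below uses a made-up community of 10 households and a hypothetical `gini` helper.

```python
import numpy as np

def gini(values):
    """Unweighted Gini via the sorted cumulative-sum formula."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    cum = np.cumsum(x)
    return (n + 1 - 2 * cum.sum() / cum[-1]) / n

# Hypothetical community: 7 of 10 households have broadband (1), 3 do not (0)
has_broadband = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])
adoption_rate = has_broadband.mean()

print(f"Adoption rate: {adoption_rate:.2f}")        # 0.70
print(f"Gini:          {gini(has_broadband):.2f}")  # 0.30
```

In other words, a Gini on a binary indicator carries no information beyond the adoption rate. Graded access scores (speed, devices, affordability) are what make the Gini informative.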

Example: Upper Peninsula Digital Equity:

Baseline Assessment (2022):

# 15 UP counties, broadband adoption rates
county_adoption = {
    'Alger': 0.68, 'Baraga': 0.52, 'Chippewa': 0.71,
    'Delta': 0.74, 'Dickinson': 0.76, 'Gogebic': 0.58,
    'Houghton': 0.79, 'Iron': 0.61, 'Keweenaw': 0.64,
    'Luce': 0.55, 'Mackinac': 0.69, 'Marquette': 0.81,
    'Menominee': 0.73, 'Ontonagon': 0.49, 'Schoolcraft': 0.63
}

# County populations (for weighting)
county_population = {
    'Alger': 9100, 'Baraga': 8600, 'Chippewa': 36700,
    'Delta': 36000, 'Dickinson': 25200, 'Gogebic': 14400,
    'Houghton': 37400, 'Iron': 11000, 'Keweenaw': 2100,
    'Luce': 6300, 'Mackinac': 10800, 'Marquette': 66000,
    'Menominee': 23000, 'Ontonagon': 5700, 'Schoolcraft': 8000
}

# Calculate regional Gini
import numpy as np

counties = list(county_adoption.keys())
adoption = np.array([county_adoption[c] for c in counties])
population = np.array([county_population[c] for c in counties])

# Weighted Gini (population as weights)
baseline_gini = calculate_gini_hajek(adoption, population)
print(f"Baseline Gini (2022): {baseline_gini:.3f}")
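# Output: Baseline Gini (2022): 0.062

# Interpretation:
# - Range: 0.49 (Ontonagon) to 0.81 (Marquette) = 32 percentage point gap
# - Rural counties (Baraga, Ontonagon, Luce) significantly behind
# - Urban counties (Marquette, Houghton) much higher
# - The Gini is small because it is computed on county averages: it captures
#   inequality between counties only, not between households within a county

Policy Implication: A between-county Gini of 0.062 understates household-level inequality, so read it alongside the 32-point gap. The lagging rural counties signal the need for equity-focused interventions, not just average improvement.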

After Intervention (2024):

# Two years after targeted intervention in low-adoption counties
county_adoption_2024 = {
    'Alger': 0.74, 'Baraga': 0.68, 'Chippewa': 0.76,  # Baraga +16 points!
    'Delta': 0.78, 'Dickinson': 0.80, 'Gogebic': 0.70,  # Gogebic +12
    'Houghton': 0.83, 'Iron': 0.72, 'Keweenaw': 0.71,
    'Luce': 0.68, 'Mackinac': 0.74, 'Marquette': 0.85,  # Luce +13
    'Menominee': 0.77, 'Ontonagon': 0.65, 'Schoolcraft': 0.71  # Ontonagon +16
}

adoption_2024 = np.array([county_adoption_2024[c] for c in counties])
intervention_gini = calculate_gini_hajek(adoption_2024, population)

print(f"After Intervention Gini (2024): {intervention_gini:.3f}")
# Output: After Intervention Gini (2024): 0.040

improvement = baseline_gini - intervention_gini
print(f"Gini Reduction: {improvement:.3f}")
# Output: Gini Reduction: 0.022 (roughly a one-third drop in between-county inequality)

# Interpretation:
# - Average adoption (unweighted county mean): 66.2% → 74.1% (+7.9 points)
# - Gini: 0.062 → 0.040 (-0.022, less inequality!)
# - Gap: 32 points → 20 points (narrowed by 12 points)
# - Lowest counties improved MOST (equity focus working!)

Success: Not only did the average improve, but inequality decreased. This is Sen’s equity goal!

Gini vs. Other Inequality Measures:

Alternative Metrics:

Measure	Formula	Strengths	Weaknesses
Gini Coefficient	Area ratio (Lorenz curve)	Comprehensive, 0-1 scale, widely used	Less intuitive, complex calculation
Range	Max - Min	Very simple	Ignores distribution between extremes
Coefficient of Variation	StdDev / Mean	Easy to calculate	Sensitive to outliers
Theil Index	Entropy-based	Decomposable by group	Less intuitive interpretation
Palma Ratio	Top 10% / Bottom 40%	Focuses on extremes	Ignores middle 50%

Recommendation: Gini as primary metric + range for context.

def calculate_inequality_metrics(access_scores, weights=None):
    """
    Calculate multiple inequality metrics for comparison.
    """
    if weights is None:
        weights = np.ones(len(access_scores))
    
    # Gini
    gini = calculate_gini_hajek(access_scores, weights)
    
    # Range
    range_val = np.max(access_scores) - np.min(access_scores)
    
    # Coefficient of Variation
    weighted_mean = np.average(access_scores, weights=weights)
    weighted_var = np.average((access_scores - weighted_mean)**2, weights=weights)
    cv = np.sqrt(weighted_var) / weighted_mean
    
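    # Palma-style ratio: mean access of the top ~10% of the weighted population
    # divided by mean access of the bottom ~40%. (The classic Palma ratio
    # compares each group's share of the total rather than group means.)
    sorted_indices = np.argsort(access_scores)
    sorted_scores = access_scores[sorted_indices]
    sorted_weights = weights[sorted_indices]
    weight_share = sorted_weights / np.sum(sorted_weights)
    cumulative_weights = np.cumsum(weight_share)
    
    # Bottom group: units whose span starts below the 40% mark.
    # Top group: units whose span ends above the 90% mark.
    # Each mask always selects at least one unit, so the averages are defined.
    bottom_40 = (cumulative_weights - weight_share) < 0.40
    top_10 = cumulative_weights > 0.90
    
    bottom_40_avg = np.average(sorted_scores[bottom_40],
                               weights=sorted_weights[bottom_40])
    top_10_avg = np.average(sorted_scores[top_10],
                            weights=sorted_weights[top_10])
    palma = top_10_avg / bottom_40_avg if bottom_40_avg > 0 else np.inf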
    
    return {
        'gini': gini,
        'range': range_val,
        'cv': cv,
        'palma': palma
    }

# Example
metrics = calculate_inequality_metrics(adoption, population)
print(f"Gini: {metrics['gini']:.3f}")
print(f"Range: {metrics['range']:.3f}")
print(f"CV: {metrics['cv']:.3f}")
print(f"Palma Ratio: {metrics['palma']:.3f}")
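The table lists the Theil index as decomposable by group, which the function above does not show. The following is a minimal sketch of the Theil T index and its split into within-group and between-group parts. The function names and the six household scores are hypothetical, the version shown is unweighted, and it requires strictly positive scores.

```python
import numpy as np

def theil_t(values):
    """Theil T index: mean of (x / mu) * ln(x / mu). Needs strictly positive values."""
    x = np.asarray(values, dtype=float)
    ratio = x / x.mean()
    return np.mean(ratio * np.log(ratio))

def theil_decomposition(values, groups):
    """Split Theil T into a within-group part and a between-group part."""
    x = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    total_sum = x.sum()
    overall_mean = x.mean()
    within, between = 0.0, 0.0
    for g in np.unique(groups):
        xg = x[groups == g]
        share = xg.sum() / total_sum   # group's share of total access
        within += share * theil_t(xg)
        between += share * np.log(xg.mean() / overall_mean)
    return within, between

# Hypothetical household access scores for two groups
scores = np.array([0.3, 0.4, 0.5, 0.7, 0.8, 0.9])
labels = np.array(["rural", "rural", "rural", "urban", "urban", "urban"])

within, between = theil_decomposition(scores, labels)
print(f"Total:   {theil_t(scores):.4f}")
print(f"Within:  {within:.4f}")
print(f"Between: {between:.4f}")
```

The within and between parts add up to the total, which is what lets an analyst say how much of the inequality sits between rural and urban areas rather than inside each.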

Gini in the Bayesian Network:

Outcomes Node Measurement:

Recall from TrainingCompassBayesian.md:

The Bayesian network predicts Outcomes (digital equity) based on Connectivity, Skills, and Application.

How to measure Outcomes?

Option 1: Composite Score (0-1)

outcome_score = (employment_rate * 0.3 + 
                education_rate * 0.3 + 
                health_access_rate * 0.2 + 
                civic_engagement * 0.2)

Option 2: Gini Coefficient (Inequality)

# Lower Gini = Better equity (more people achieving outcomes)
outcome_gini = calculate_gini_hajek(household_outcomes, population_weights)
outcome_score = 1 - outcome_gini  # Invert so higher is better

Integration:

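from pgmpy.inference import VariableElimination

# `model` is the fitted Bayesian network from TrainingCompassBayesian.md;
# it is not rebuilt here.

# After measuring Connectivity, Skills, Application
connectivity_score = 0.68
skills_score = 0.62
application_score = 0.58

# pgmpy's discrete networks take state labels as evidence, not raw scores.
# Assumption: the nodes use 'low' / 'medium' / 'high' states with these cut
# points; match both to the states defined when the network was built.
def to_state(score):
    if score < 0.5:
        return 'low'
    return 'medium' if score < 0.75 else 'high'

# Query Bayesian network
inference = VariableElimination(model)
predicted_outcome = inference.query(
    variables=['Outcomes'],
    evidence={
        'Connectivity': to_state(connectivity_score),
        'Skills': to_state(skills_score),
        'Application': to_state(application_score)
    }
)

# predicted_outcome is a probability distribution over Outcomes states;
# reduce it to a single score before comparing with the measured value.
# Illustrative comparison:
# Predicted outcome score: 0.64 (Bayesian)
# Measured outcome Gini: 0.37 → Score = 0.63 (actual)
# Difference: 0.01 (excellent prediction!)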

This closes the loop: Theory → Prediction → Measurement → Validation → Learning

Policy Application: Budget Allocation

Using Gini to Guide Investment:

Traditional approach (wrong):

"Spend equally across all counties: $100K each"

Why wrong? Equal spending doesn’t address unequal need.

Equity approach (right):

"Allocate proportionally to need, measured by local Gini contribution"

Gini Decomposition by County:

def gini_contribution(county_access, county_pop, total_access, total_pop):
    """
    Score how much a county's shortfall contributes to regional inequality.
    This is a shortfall-based proxy, not a formal Gini decomposition.
    Higher contribution = further below the regional mean, for more people.
    """
    overall_mean = np.average(total_access, weights=total_pop)
    
    # Only a shortfall below the regional mean counts as need. Counties at or
    # above the mean score zero, so large high-adoption counties draw no funds.
    shortfall = max(overall_mean - county_access, 0.0)
    
    # Contribution: squared shortfall, weighted by the county's population share
    contribution = shortfall**2 * (county_pop / np.sum(total_pop))
    
    return contribution

# Calculate each county's contribution to inequality
contributions = {}
for county in counties:
    contrib = gini_contribution(
        county_adoption[county],
        county_population[county],
        adoption,
        population
    )
    contributions[county] = contrib

# Allocate $1M budget proportionally to contributions
total_budget = 1_000_000
allocation = {}
total_contrib = sum(contributions.values())

for county, contrib in contributions.items():
    allocation[county] = (contrib / total_contrib) * total_budget

# Counties with highest contribution get most funding
print("Budget Allocation:")
for county in sorted(allocation, key=allocation.get, reverse=True)[:5]:
    print(f"{county}: ${allocation[county]:,.0f}")

# Approximate output (rounded to the nearest $10,000):
# Baraga: about $250,000 (large shortfall, more residents than Ontonagon)
# Ontonagon: about $220,000 (lowest adoption)
# Gogebic: about $210,000
# Luce: about $130,000
# Iron: about $100,000

Result: Equity-focused budget allocation targets the below-average counties where the shortfall affects the most people.

Common Gini Mistakes:

Mistake 1: Using Mean Instead of Gini

Wrong:

"Average adoption improved from 67% to 74%. Success!"

Why wrong? Average can improve while inequality worsens.

Right:

"Average: 67% → 74% (+7 points)
 Gini: 0.44 → 0.37 (-0.07, equity improved!)
 Both average AND equity improved."

Mistake 2: Ignoring Sampling Weights

Wrong:

# Simple Gini on survey data (unweighted)
gini = calculate_gini_basic(survey_access)

Why wrong? Survey samples don’t equally represent population. Rural areas often oversampled, urban undersampled.

Right:

# Hájek estimator with sampling weights
gini = calculate_gini_hajek(survey_access, sampling_weights)

Dr. Stoev’s lesson: Always account for sampling design.
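How much the weights matter depends on the sample. The sketch below uses a made-up survey in which rural households are heavily oversampled; the scores, weights, and the `weighted_gini` helper are all hypothetical.

```python
import numpy as np

def weighted_gini(scores, weights):
    """Weighted Gini with weights normalized to sum to 1."""
    x = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order] / w.sum()
    cum_w = np.cumsum(w)
    mean = np.sum(w * x)
    return np.sum(w * x * (2 * cum_w - w)) / mean - 1

# Hypothetical survey: 6 rural respondents (each stands in for 50 households)
# and 3 urban respondents (each stands in for 400 households)
scores = np.array([0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.7, 0.8, 0.9])
weights = np.array([50, 50, 50, 50, 50, 50, 400, 400, 400])

g_unweighted = weighted_gini(scores, np.ones(len(scores)))
g_weighted = weighted_gini(scores, weights)

print(f"Unweighted: {g_unweighted:.3f}")  # Unweighted: 0.257
print(f"Weighted:   {g_weighted:.3f}")    # Weighted:   0.144
```

In this made-up sample the unweighted figure overstates inequality, because the oversampled rural respondents count for far more of the sample than of the population. With a different design the bias could run the other way.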

Mistake 3: Single Point-in-Time Measurement

Wrong:

"Gini is 0.38. What does that mean?"

Why wrong? No context. Is it improving? Worsening?

Right:

"Baseline Gini: 0.44 (high inequality)
 After intervention: 0.38 (moderate, improved!)
 Trend: -0.06 over 2 years (good progress)"

Need time series for interpretation.

Mistake 4: Gini Without Gap Analysis

Wrong:

"Gini improved from 0.42 to 0.36."

Incomplete. What gaps narrowed? Which groups benefited?

Right:

"Gini: 0.42 → 0.36 (-0.06)
 
 Gap Analysis:
 - Rural/Urban gap: 18 pts → 11 pts (narrowed)
 - Income gap (<$35K vs >$75K): 24 pts → 16 pts
 - Age gap (60+ vs 18-39): 15 pts → 9 pts
 
 All gaps narrowed → Equity improving across dimensions"

Gini + gap analysis tells complete story.

Advanced: Gini Confidence Intervals

Uncertainty in Gini Estimates:

Survey data = sampling variability → Gini estimate has uncertainty

Bootstrap method for confidence intervals:

from scipy.stats import bootstrap

def gini_statistic(scores, weights):
    """Statistic for bootstrap: weighted Gini of one resample."""
    return calculate_gini_hajek(scores, weights)

# Bootstrap 95% confidence interval.
# paired=True resamples each score together with its own weight.
result = bootstrap(
    (survey_access, survey_weights),
    gini_statistic,
    n_resamples=10000,
    confidence_level=0.95,
    method='percentile',
    vectorized=False,
    paired=True
)

gini_estimate = calculate_gini_hajek(survey_access, survey_weights)
ci_lower, ci_upper = result.confidence_interval

print(f"Gini: {gini_estimate:.3f} (95% CI: [{ci_lower:.3f}, {ci_upper:.3f}])")
# The interval bounds vary slightly from run to run and are wide here
# because the example has only 8 respondents.

Interpretation: We’re 95% confident the true population Gini lies between ci_lower and ci_upper. This simple bootstrap treats respondents as independent draws; stratified or clustered survey designs need a design-aware resampling scheme.

Policy use: If confidence interval is wide, need larger sample before making strong claims about inequality change.

Real Example: Dr. Stoev Consultation:

Michigan Digital Equity Analysis (User’s Experience):

Context: Analyzing Upper Peninsula broadband data for policy report.

Initial approach: Simple average adoption rates by county.

Problem: Didn’t account for:

Population weighting (Marquette is roughly 30x larger than Keweenaw)
Sampling variability (different survey sample sizes)
Statistical significance of changes

Dr. Stoev’s guidance:

Use Hájek Estimator for population-weighted Gini
Calculate confidence intervals via bootstrap
Test if Gini change statistically significant
Decompose by demographic groups
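The third step, testing whether a Gini change is statistically significant, can be sketched with a bootstrap on the difference between two survey waves. This is an illustration under stated assumptions, not the procedure used in the consultation: the two waves are treated as independent samples, respondents are resampled as simple random draws (no strata or clusters), the data are simulated, and all function names are hypothetical.

```python
import numpy as np

def weighted_gini(scores, weights):
    """Weighted Gini with weights normalized to sum to 1."""
    x = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order] / w.sum()
    cum_w = np.cumsum(w)
    mean = np.sum(w * x)
    return np.sum(w * x * (2 * cum_w - w)) / mean - 1

def bootstrap_gini_change(base, base_w, after, after_w, n_resamples=2000, seed=0):
    """Percentile 95% interval for (after Gini - baseline Gini)."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_resamples)
    for b in range(n_resamples):
        i = rng.integers(0, len(base), len(base))     # resample baseline respondents
        j = rng.integers(0, len(after), len(after))   # resample follow-up respondents
        diffs[b] = (weighted_gini(after[j], after_w[j])
                    - weighted_gini(base[i], base_w[i]))
    return np.percentile(diffs, [2.5, 97.5])

# Simulated survey waves (hypothetical, not real data)
rng = np.random.default_rng(42)
base_scores = rng.uniform(0.05, 1.0, 400)
after_scores = rng.uniform(0.50, 1.0, 400)
base_weights = rng.uniform(0.5, 2.0, 400)
after_weights = rng.uniform(0.5, 2.0, 400)

change = (weighted_gini(after_scores, after_weights)
          - weighted_gini(base_scores, base_weights))
ci_low, ci_high = bootstrap_gini_change(
    base_scores, base_weights, after_scores, after_weights
)

print(f"Change in Gini: {change:.3f} (95% CI: [{ci_low:.3f}, {ci_high:.3f}])")
print("Significant at the 5% level" if ci_high < 0 or ci_low > 0 else "Not significant")
```

If the interval for the change excludes zero, the change is significant at roughly the 5% level. An interval that straddles zero means the data cannot distinguish the change from sampling noise.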

Result:

Baseline Gini: 0.437 (95% CI: [0.411, 0.463])
After intervention: 0.368 (95% CI: [0.344, 0.392])
Change: -0.069 (statistically significant, p < 0.01)

Conclusion: Intervention reduced inequality with high confidence.

This statistical rigor made policy recommendation credible.

Bottom Line:

The Gini coefficient measures what Sen cares about: equity, not just efficiency.

Key Takeaways:

Averages hide inequality → Need Gini to see gaps
Hájek Estimator essential → Accounts for sampling design (Dr. Stoev)
Gini + gap analysis → Complete equity picture
Trend over time → Is equity improving?
Confidence intervals → How certain are we?
Policy application → Target resources to reduce inequality

Measurement workflow:

Baseline: Calculate Gini before intervention
Predict: Bayesian network forecasts equity improvement
Monitor: Quarterly Gini recalculation
Adapt: If Gini not improving, adjust strategy
Validate: Did intervention reduce inequality as predicted?

From theory (Sen’s equity focus) → to measurement (Gini) → to policy (budget allocation) → to accountability (did it work?).

This is evidence-based equity practice.

See Also:

TrainingCompassSen.md - Sen’s equity focus (theoretical foundation)
TrainingCompassMetrics.md - How Gini fits in Outcomes measurement
TrainingCompassDagg.md - Compass Outcomes component
TrainingCompassPolicy.md - Using Gini for budget allocation

Acknowledgments:

Dr. Stilian Stoev, University of Michigan Statistics Department, for Gini coefficient application to digital equity and Hájek Estimator methodology
Dagg et al. (2023), Merit Network, for Digital Opportunities Compass framework
Amartya Sen (1999), for theoretical grounding of equity measurement

Version: 1.0
Last Updated: November 2025
Part of: Project Compass (Merit Network) - Digital Opportunities Intelligence Network (DOIN) • Working draft