Interactive Probability Distribution Fitting Tool
Fit_It! is a free hobby project for educational and entertainment purposes only. It is not intended for critical decision-making. The service runs on free-tier cloud hosting with limited resources. Please use it considerately - limit yourself to 1-2 analyses per session to keep the service available for everyone.
Fit_It! by Gene is a free educational tool that brings the world of probability distributions to your browser. Powered by SciPy, FastAPI, and Plotly, it allows you to explore how different statistical distributions fit your data through an intuitive interface.
Upload your dataset and discover its probability distribution
Compare 4 fitting methods and evaluate goodness-of-fit
Interactive visualizations help you understand distribution fits
Understanding probability distributions helps in numerous domains:
"All models are wrong, but some are useful"
- George Box
Please Note: To keep this service free and available to all, we request that you limit your usage to 1-2 analyses per session and avoid automated scripts.
Single-column format:

```
Value
12.5
14.3
11.7
...
```

Two-column format with an index:

```
Index, Value
1, 12.5
2, 14.3
3, 11.7
...
```
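If you prepare your file in Python, either layout can be written with pandas; a minimal sketch (the file names are illustrative):

```python
import pandas as pd

values = [12.5, 14.3, 11.7]

# Single-column format
pd.DataFrame({"Value": values}).to_csv("data.csv", index=False)

# Two-column format with an explicit index
pd.DataFrame({"Index": range(1, len(values) + 1), "Value": values}).to_csv(
    "data_indexed.csv", index=False
)
```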
1. Upload CSV
2. Select Distributions
3. Choose Fitting Method
4. Review Results
Many distributions have mathematical constraints (data domain rules). Fit_It! tries to check compatibility automatically, but this check is not fully robust, so take its verdict with a pinch of salt:
```python
import numpy as np

def validate_data_domains(data):
    """Return the distributions whose domain rules the data violates."""
    incompatible = []
    for rule in DOMAIN_RULES:
        context = {'np': np, 'data': data}
        try:
            mask = eval(rule["domain_check"], context)
            if not np.all(mask):
                # Flag incompatible distribution (key name illustrative)
                incompatible.append(rule["distribution"])
        except Exception:
            # Handle evaluation error by skipping the rule
            continue
    return incompatible
```
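As a usage sketch for the function above, here is a hypothetical rule set in the shape the loop expects; the rule entries and the `distribution` key are illustrative assumptions, not Fit_It!'s actual configuration:

```python
import numpy as np

# Hypothetical rules: each "domain_check" expression is evaluated against the data
DOMAIN_RULES = [
    {"distribution": "lognorm", "domain_check": "data > 0"},                # positive support
    {"distribution": "beta", "domain_check": "(data >= 0) & (data <= 1)"},  # [0, 1] support
]

data = np.array([0.2, 0.5, 1.7])
print(validate_data_domains(data))  # -> ['beta'] is flagged (1.7 falls outside [0, 1])
```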
The Freedman-Diaconis rule is optimal for non-normal data with potential outliers: \( h = \frac{2 \cdot \mathrm{IQR}}{n^{1/3}} \)

The Rice rule is optimal for larger datasets: \( k = \lceil 2 n^{1/3} \rceil \)
```python
import numpy as np

def calculate_bins(data):
    """Histogram bin count: Freedman-Diaconis for small n, Rice rule otherwise."""
    n = len(data)
    if n < 500:
        # Freedman-Diaconis: bin width derived from the interquartile range
        q75, q25 = np.percentile(data, [75, 25])
        iqr = q75 - q25
        bin_width = 2 * iqr / (n ** (1 / 3))
        bins = int(np.ceil((np.max(data) - np.min(data)) / bin_width))
    else:
        # Rice rule for larger samples
        bins = int(np.ceil(2 * (n ** (1 / 3))))
    return max(bins, 10)
```
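A quick usage sketch (the sample data are illustrative): the adaptive bin count plugs straight into NumPy's histogram:

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(loc=10.0, scale=2.0, size=300)  # illustrative data

# Density histogram using the adaptive bin count from calculate_bins()
counts, edges = np.histogram(sample, bins=calculate_bins(sample), density=True)
print(f"{calculate_bins(sample)} bins for n={len(sample)}")
```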
While Fit_It! works with raw data values, normalizing your data before analysis can significantly improve results for many distributions. Normalization transforms data to a common scale without distorting differences in ranges.
Z-score standardization: best for normally distributed data
Min-max scaling: scales to the [0, 1] range
Robust scaling (median/IQR): resistant to outliers
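For reference, a minimal NumPy sketch of these three approaches (the values are illustrative; remember that normalization happens before you upload, not inside Fit_It!):

```python
import numpy as np

x = np.array([12.5, 14.3, 11.7, 25.0, 13.1])  # illustrative data with one outlier

# Z-score standardization: best for roughly normal data
z_score = (x - x.mean()) / x.std()

# Min-max scaling: maps the data onto [0, 1]
min_max = (x - x.min()) / (x.max() - x.min())

# Robust scaling: median/IQR based, resistant to outliers
q1, q3 = np.percentile(x, [25, 75])
robust = (x - np.median(x)) / (q3 - q1)
```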
IQR (Interquartile Range) normalization is particularly useful for skewed data:
\[ x' = \frac{x - Q_1}{Q_3 - Q_1} \]

Where Q₁ is the 25th percentile and Q₃ is the 75th percentile. This method maps Q₁ to 0 and Q₃ to 1, scaling data relative to the quartiles rather than the min/max, so values outside the quartile range fall outside [0, 1].
Advanced robust normalization uses the median and IQR:

\[ x' = \frac{x - \mathrm{median}(x)}{\mathrm{IQR}} \]
For best results with Fit_It!, we recommend normalizing your data before uploading. While not currently automated in this version, normalization can be easily done in spreadsheet software or Python:
```python
# Python normalization examples
import numpy as np
from sklearn.preprocessing import RobustScaler

# Robust scaling
data = np.array([...]).reshape(-1, 1)
scaler = RobustScaler(quantile_range=(25, 75))
robust_normalized = scaler.fit_transform(data)

# IQR-based normalization
Q1 = np.percentile(data, 25)
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1
iqr_normalized = (data - Q1) / IQR
```
Important: Remember to keep track of your scaling parameters (median, IQR, min/max) if you need to transform results back to original scale!
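As an illustration of that round trip, a short sketch with the IQR-based scaling shown above (values are illustrative):

```python
import numpy as np

data = np.array([12.5, 14.3, 11.7, 25.0])  # illustrative original values
Q1, Q3 = np.percentile(data, [25, 75])
IQR = Q3 - Q1

iqr_normalized = (data - Q1) / IQR     # forward transform
recovered = iqr_normalized * IQR + Q1  # inverse transform using the saved Q1 and IQR
assert np.allclose(recovered, data)    # round trip restores the original scale
```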
Beyond normalization, data transformation can make your data better conform to distributional assumptions. Transformations modify the shape of your distribution, often addressing skewness and making patterns more visible.
Logarithmic: best for right-skewed data. Compresses large values and expands small values. Requires x > 0; add a constant c if the data contains zeros.
Square root: moderate effect on right-skewed data, less aggressive than the log transform. Works for zero values; apply a constant adjustment for negatives.
Box-Cox: a power transformation that finds the optimal λ to make the data as normal-like as possible. Requires strictly positive values.
Yeo-Johnson: an extension of Box-Cox that works for both positive and negative values, making it more flexible for real-world datasets.
```python
import numpy as np
import scipy.stats as stats
from sklearn.preprocessing import PowerTransformer

# Sample data with positive and negative values
data = np.array([1.2, 5.7, 0.8, -2.3, 10.4, -0.5, 7.1])

# Logarithmic transformation (for positive data); log1p = log(1 + x) handles zeros
log_transformed = np.log1p(data[data > 0])

# Signed square root transformation (handles negative values)
sqrt_transformed = np.sqrt(np.abs(data)) * np.sign(data)

# Box-Cox transformation (strictly positive values only)
positive_data = data[data > 0] + 1e-6  # add a small constant in case of zeros
boxcox_transformed, lambda_val = stats.boxcox(positive_data)

# Yeo-Johnson transformation (handles all values)
yj_transformer = PowerTransformer(method='yeo-johnson', standardize=False)
yj_transformed = yj_transformer.fit_transform(data.reshape(-1, 1))

# Inverse transformation example
inverse_data = yj_transformer.inverse_transform(yj_transformed)
```
Important: Always test transformations visually with Q-Q plots or distribution comparisons. Remember that parameters (like λ in Box-Cox) must be saved to reverse transformations later.
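One way to do that visual check in Python is SciPy's probplot; this sketch (data and seed are illustrative) compares Q-Q plots before and after a Box-Cox transformation:

```python
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(7)
skewed = rng.lognormal(mean=0.0, sigma=0.8, size=400)  # illustrative right-skewed data

transformed, lam = stats.boxcox(skewed)  # Box-Cox with fitted lambda

# Q-Q plots against a normal distribution, before and after the transformation
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
stats.probplot(skewed, dist="norm", plot=axes[0])
axes[0].set_title("Before Box-Cox")
stats.probplot(transformed, dist="norm", plot=axes[1])
axes[1].set_title(f"After Box-Cox (lambda = {lam:.2f})")
plt.show()
```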
| Data Characteristic | Recommended Transformation |
| --- | --- |
| Right-skewed, positive values | Logarithmic, Square root, Box-Cox |
| Left-skewed, positive values | Exponential (x^k, k>1), Square (x^2) |
| Positive and negative values | Yeo-Johnson, Signed power transformations |
| Count data, Poisson-like | Square root, Anscombe (√(x + 3/8)) |
| Proportions, percentages | Logit, Arcsine square root |
Fit_It! ranks distributions using the Akaike Information Criterion (AIC). The distribution with the lowest AIC score is considered the best fit. This approach balances model fit with complexity, penalizing distributions with more parameters to avoid overfitting.
\[ \mathrm{AIC} = 2k - 2\ln(\hat{L}) \]

Where k is the number of parameters and \(\hat{L}\) is the maximized value of the likelihood function. AIC estimates the relative information lost by a given model: the lower the AIC, the better the model balances fit and complexity.
Similar to AIC but with a stronger penalty for additional parameters: \( \mathrm{BIC} = k\ln(n) - 2\ln(\hat{L}) \), where the penalty term grows with the sample size n. Lower values indicate better fit, with a preference for simpler models, especially on larger datasets.
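As a concrete illustration (a sketch, not Fit_It!'s internal code), both criteria can be computed from a SciPy maximum-likelihood fit; the sample data and the gamma model are placeholders:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.gamma(shape=2.0, scale=1.5, size=500)   # placeholder sample

params = stats.gamma.fit(data, floc=0)             # MLE fit with loc fixed at 0
log_L = np.sum(stats.gamma.logpdf(data, *params))  # maximized log-likelihood

k = 2            # free parameters: shape and scale (loc was fixed)
n = len(data)
aic = 2 * k - 2 * log_L
bic = k * np.log(n) - 2 * log_L
print(f"AIC = {aic:.2f}, BIC = {bic:.2f}")
```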
Measures the discrepancy between observed values (yᵢ) and values predicted by the model (ŷᵢ): \( \mathrm{SSE} = \sum_i (y_i - \hat{y}_i)^2 \). Lower SSE indicates better fit, but this metric doesn't account for model complexity and can favor overparameterized models.
Measures the maximum distance between the empirical distribution function (Fₙ) and the theoretical cumulative distribution function (F): \( D_n = \sup_x |F_n(x) - F(x)| \). Lower values indicate better fit. The KS statistic is particularly sensitive to differences near the center of the distribution.
Anderson-Darling (AD): a modification of KS that gives more weight to the tails of the distribution, making it more sensitive to outliers and extreme values. Lower values indicate better fit.
Cramér-von Mises (CvM): measures the integrated squared difference between the empirical and theoretical CDFs. Like AD, it is sensitive to tail behavior, though generally less so. Lower values indicate better fit.
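For reference, here is a sketch of computing the three EDF-based statistics with SciPy; Fit_It!'s own implementation may differ, and the data and fitted model are placeholders:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=1.5, size=500)      # placeholder sample
fitted = stats.gamma(*stats.gamma.fit(data, floc=0))  # frozen fitted distribution

# Kolmogorov-Smirnov: maximum EDF-CDF distance
ks_stat = stats.kstest(data, fitted.cdf).statistic

# Cramer-von Mises: integrated squared EDF-CDF difference
cvm_stat = stats.cramervonmises(data, fitted.cdf).statistic

# Anderson-Darling: tail-weighted EDF statistic from sorted CDF values
u = np.sort(fitted.cdf(data))
n, i = len(u), np.arange(1, len(u) + 1)
ad_stat = -n - np.mean((2 * i - 1) * (np.log(u) + np.log(1 - u[::-1])))

print(f"KS = {ks_stat:.4f}, CvM = {cvm_stat:.4f}, AD = {ad_stat:.4f}")
```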
While Fit_It! uses AIC as the primary ranking metric, we recommend considering multiple goodness-of-fit measures:
Always combine statistical metrics with visual inspection of the probability plots - a good statistical fit should also look reasonable when plotted against your data.
Fit_It! uses a consistent color-coding system to help you track distributions across all visualizations:
```javascript
// JavaScript color mapping implementation
const colorMap = {
  "norm": "#6366f1",   // Indigo
  "gamma": "#10b981",  // Emerald
  "beta": "#f59e0b",   // Amber
  "expon": "#ef4444"   // Red
};

function applyColorCoding(distribution) {
  const color = colorMap[distribution];
  // Apply to plot trace (`trace` is the existing trace for this distribution)
  Plotly.newPlot('graph', [{...trace, line: {color}}]);
  // Apply to documentation link
  document.getElementById(`doc-${distribution}`).style.color = color;
  // Apply to PDF formula display
  document.getElementById(`pdf-${distribution}`).style.borderColor = color;
}
```
[Figure: Interactive distribution plot with color-coded elements]
The normal distribution is also called the "Gaussian bell curve" after Carl Friedrich Gauss, who introduced it in 1809 to analyze astronomical data. It appears in nature more often than you'd expect - from human height distributions to measurement errors!
Show/hide distributions with checkboxes
Explore details with interactive controls
Download PNG or SVG for publications
Each distribution includes a direct link to its official SciPy documentation. These links are color-coded to match the distribution's plot color for quick reference. The documentation provides:
Normal Distribution Documentation:
scipy.stats.norm

| Feature | Fit_It! | Traditional Tools |
| --- | --- | --- |
| Color consistency | All elements synchronized | Often inconsistent |
| Distribution limit | Compare 7 simultaneously | Typically 1-2 distributions |
| Documentation access | Direct links with color coding | Manual search required |
| Contextual information | Fun facts and educational notes | Pure statistical output |
Fit_It! API is a free educational service hosted on free-tier cloud infrastructure. Please be considerate of resource constraints:
Fit_It! by Gene is built on a modern RESTful API architecture that handles all computational tasks. This design enables scalability, separation of concerns, and efficient resource management.
Upload CSV data for analysis
Perform distribution fitting
Retrieve visualization data
List available distributions
```bash
# Upload data
curl -X POST https://api.fitit-tool.com/upload \
  -F "file=@data.csv"

# Analyze data
curl -X POST https://api.fitit-tool.com/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "a1b2c3d4",
    "selected_dists": ["norm", "gamma", "expon"],
    "fit_method": "mle"
  }'

# Retrieve results
curl -X GET "https://api.fitit-tool.com/plot?session_id=a1b2c3d4"
```
```python
import requests

# Step 1: Upload data
upload_url = "https://api.fitit-tool.com/upload"
files = {'file': open('data.csv', 'rb')}
upload_response = requests.post(upload_url, files=files)
session_id = upload_response.json()['session_id']

# Step 2: Analyze data
analyze_url = "https://api.fitit-tool.com/analyze"
payload = {
    "session_id": session_id,
    "selected_dists": ["norm", "beta", "weibull_min"],
    "fit_method": "robust_min_sse"
}
analysis_response = requests.post(analyze_url, json=payload)

# Step 3: Retrieve visualization data
plot_url = f"https://api.fitit-tool.com/plot?session_id={session_id}"
plot_data = requests.get(plot_url).json()

# Process results
print(f"Best fit: {plot_data['best_fit']['name']}")
print(f"AIC: {plot_data['best_fit']['aic']}")
```
Use our Postman collection to quickly test the API endpoints.
{ "status": "success", "session_id": "a1b2c3d4e5", "results": { "best_fit": { "name": "gamma", "params": [1.85, 0.0, 0.75], "aic": 1245.67, "bic": 1258.92 }, "distributions": [ { "name": "gamma", "params": [1.85, 0.0, 0.75], "aic": 1245.67, "bic": 1258.92, "ks_stat": 0.042, "sse": 0.0032 }, { "name": "norm", "params": [5.2, 1.8], "aic": 1298.45, "bic": 1308.21, "ks_stat": 0.087, "sse": 0.0121 } ] }, "plot_data": { "histogram": { "x": [1.2, 2.4, 3.1, ...], "y": [0.05, 0.12, 0.18, ...] }, "pdfs": [ { "name": "gamma", "x": [0.5, 0.6, 0.7, ...], "y": [0.02, 0.04, 0.07, ...], "color": "#6366f1" }, { "name": "norm", "x": [0.5, 0.6, 0.7, ...], "y": [0.01, 0.03, 0.05, ...], "color": "#10b981" } ] }, "metadata": { "fit_method": "mle", "data_points": 1024, "processing_time": 1.24 } }
Fit_It! API is provided as a free educational resource running on free-tier cloud hosting. To ensure fair access for all users:
We appreciate your considerate usage to help keep this service available to the educational community.
"Probability is the very guide of life" - Cicero
Last Updated: June 2025 | Version 1.0.0 | By Gene Boo