aaa

10 Regression Estimators That Don’t Collapse When Outliers Slip Into Production Data (link)
Understanding Gradient Descent (link)
Marketing Data Science with Joe Domaleski (link)
Dual SCD2, Snapshots, and PIT Tables: A Technical Deep-Dive for Data Engineers (link)
Top 25 Machine Learning Interview Questions (link)
Decoding Dendrograms: A Comprehensive Guide (link)
Detecting Peaks and Valleys: Learn The Essentials for Accurate Analysis. (link)
Linear Regression Explained: From Theory to Real-World Implementation (link)
Linear Regression for Humans: Predicting the Future in Plain English (link)
How Google and Stanford made AI more Interpretable with a 20-Year-Old Technique (link)
Visual Intro to Machine Learning (link)
Linear Algebra Concepts Every Data Scientist Should Know (link)
Table Transformer (TATR) (link)
Correlation vs. Regression: A Key Difference That Many Analysts Miss (link)
A New Coefficient of Correlation (link)
Frustration: One Year With R (link)
Precision & Recall (link)
An overview of time-aware cross-validation techniques (link)
Unsupervised Learning: What, Why, and Where? (link)
Does Isolation Forest really perform well in its task? (link)
EDA (Exploratory Data Analysis) On Haberman’s Cancer Survival Dataset (link)
Data Pre-processing in Python using Scikit-learn - Heart Disease Kaggle (link)
Matthews correlation coefficient - tweet by Raschka (link)
Supercharge Your Machine Learning Experiments with PyCaret and Gradio (link)
Feature Selection — Exhaustive Overview (link)
Introduction to Parallel Processing in Machine Learning using Dask (link)
Scikit-Learn: A silver bullet for basic machine learning (link)
Clustering using PyCaret!!! (link)
Data scientist’s guide to efficient coding in Python (link)
Applied Machine Learning: Part 1 (link)
How to avoid machine learning pitfalls: a guide for academic researchers (link)
A data science project - Analysis of Berlin rental prices (link)
The Normal Distribution Simplified (link)
Visualizing Statistics with Python — Telling Stories with Matplot (link)
Analyzing the eigenvalues of a covariance matrix to identify multicollinearity (link)
Gradient Descent for Machine Learning (link)
PM4PY - Process Mining in Python - Fraunhofer (link)
26 Datasets For Your Data Science Projects (link)
Practical Machine Learning Tutorial: Part 1 (Exploratory Data Analysis) (link)
Data-Driven Artificial Intelligence (AI) for Churn Reduction (link)
Feature Transformation for Machine Learning, a Beginner’s Guide (link)
A Reference Notebook for 30+ Statistical Charts in Seaborn (link)
Multicollinearity — How does it create a problem? (link)
MAE, MSE, RMSE, Coefficient of Determination, Adjusted R Squared — Which Metric is Better? (link)
Essential Math for Data Science: Information Theory (link)
Bulldozer Prices Prediction (link)
3 must-have projects for your data science portfolio (link)
Understand Bayes’ Theorem Through Visualization (link)
A Complete Exploratory Data Analysis with Python (link)
What’s in the Black Box? (link)
How to peek inside a black box model — Understand Partial Dependence Plot (link)
Pitfalls To Avoid while Interpreting Machine Learning: PDP/ICE Case (link)
Building 10 Regression Models in Machine Learning with Python (link)
First neural network for beginners explained (with code) (link)
Data Pre-Processing in Machine Learning with Python and Jupyter (link)
Building 10+ Classifier Models in Machine Learning (link)
A field guide to the most popular parameters (link)
Customer Segmentation Analysis with Python (link)
Data Preparation and Data Binning (link)
Pipelines: Automated machine learning with HyperParameter Tuning! (link)
Correlation in Statistics (link)
Normal distribution (link)
Hierarchical Clustering: It’s just the order of clusters! (link)
Understanding AUC - ROC Curve (link)
Ridge Regression for Better Usage (link)
Data Pre-Processing in Machine Learning with Python+Notebook (link)
Entropy is a measure of uncertainty (link)
Support Vector Machine (link)
Multi-Dimensional Data (PCA) — boon or bane? (link)
Intuitions on L1 and L2 Regularisation (link)
Top Five Methods to Identify Outliers in Data (link)
Bengaluru House Price Prediction (link)
Bayes’ Rule Applied (link)
Starbucks offers: Advanced customer segmentation with Python (link)
How to Not Misunderstand Correlation (link)
Logistic Regression — Detailed Overview (link)
Introduction to Markov chains (link)
Scaling vs. Normalizing Data (link)
Chi-Square Test for Feature Selection in Machine learning (link)
Handling imbalanced datasets in machine learning (link)
Better Heatmaps and Correlation Matrix Plots in Python (link)
Logistic Regression Model Tuning with scikit-learn — Part 1 (link)
Building a Logistic Regression in Python (link)
Introduction to Bayesian Linear Regression (link)
Understanding Boxplots (link)
Patterns, Predictions, and Actions - Book (link)
Gradient Descent in Python (link)
17 types of similarity and dissimilarity measures used in data science (link)
Linear Regression using Gradient Descent (link)
Histograms and Density Plots in Python (link)
The Mathematics Behind Principal Component Analysis (link)
Fundamental Techniques of Feature Engineering for Machine Learning (link)
PCA using Python (scikit-learn) (link)
Machine Learning Basics with the K-Nearest Neighbors Algorithm (link)
Feature Selection with sklearn and Pandas (link)
How to Estimate the Bias and Variance with Python (link)
Comet - Supercharge Machine Learning (link)
Numerical Optimization: Understanding L-BFGS (link)
MLPerf (link)
Kaggle - Use Data from Different Kernels (link)
Regular Expressions for Data Scientists (link)
Python Machine Learning (2nd Ed.) Code Repository (link)
Learning Math for Machine Learning (link)
Is R-squared Useless? (link)
Google Machine Learning Guides (link)
Machine Learning cheatsheets (link)
A Comprehensive Guide to Gradient Descent (link)
What’s the trade-off between Bias and Variance? (link)
Top 5 Machine Learning Algorithms Explained (link)
Encoding Categorical Variables in Machine Learning Dataset (link)
17 Clustering Algorithms Used In Data Science and Mining (link)
Mathematics Resources For ML (link)
LDA vs. PCA (link)
How to do matrix derivatives (link)
The Clustering Algorithm with Geolocation data (link)
The Poisson Distribution (link)
9 Deadly Sins of Dataset Selection in ML (link)
Fraud detection — Unsupervised Anomaly Detection (link)
There is no classification — here’s why (link)
What Is Your Model Hiding? A Tutorial on Evaluating ML Models (link)
A Feature Selection Tool for Machine Learning in Python (link)
How to Remove Outliers for Machine Learning? (link)
Predicting House Prices in Ames, IA (link)
Using Random Forests to predict Housing Prices (link)
House Price Prediction using FastAI (link)
Customer Segmentation Using K Means Clustering (link)
Clustergam: visualisation of cluster analysis (link)
Bayes’ Theorem Unbound (link)

Bayesian Inference

Bayesian Inference: The Engine Behind Probabilistic Reasoning in Data Science (link)
Bayesian Inference - Wolfram (link)
An Introduction to Bayesian Thinking - R (link)

CRF

Performing Sequence Labelling using CRF in Python (link)
sklearn-crfsuite (link)
CRFsuite - Documentation (link)
Overview of Conditional Random Fields (link)
Conditional Random Fields for Sequence Prediction (link)
Getting started with Conditional Random Fields (link)
Introduction to Conditional Random Fields (link)

Curse of Dimensionality

What Is the Curse of Dimensionality? (link)
Curse of Dimensionality — A “Curse” to Machine Learning (link)
Curse of Dimensionality - Notebook (link)
What is the Curse of Dimensionality? Simplest Explanation! (link)
Curse of Dimensionality (link)
Curse of Dimensionality - notebook (link)
The Curse of Dimensionality – Illustrated With Matplotlib (link)
The Curse of Dimensionality (part 1) (link)
Top 40 Curse of Dimensionality Interview Questions (link)

Embeddings

Vector Embeddings Explained for Developers! (link)
Explained: Tokens and Embeddings in LLMs (link)
Vector Embeddings 101: The New Building Blocks for Generative AI (link)
Meet AI’s multitool: Vector embeddings (link)
New and improved embedding model (link)
OpenAI - Embeddings (link)
Jurafsky book, Chapter 6 (link)
Jurafsky book, Chapter 6 - Slides (link)
A Guide on Word Embeddings in NLP (link)
The Beginner’s Guide to Text Embeddings (link)

explained.ai

home (link)
The Mechanics of Machine Learning (link)
rent.csv (link)

Information Extraction

Twenty-five years of information extraction (link)

Metrics

Similarity Metrics in Vector Databases (link)
Distance Metrics in Vector Search (link)
9 Distance Measures in Data Science (link)
Euclidean vs. Cosine Distance (link)
Cosine Similarity Vs Euclidean Distance (link)
When to use Cosine Similarity over Euclidean Similarity? (link)
Understanding Distance Metrics in Vector Embeddings: Cosine Similarity, Euclidean Distance, and Dot Product (link)
Understanding Vector Similarity for Machine Learning (link)
How the dot product measures similarity (link)
Similarity Measures: Check Your Understanding (link)

Outlier Detection

Outlier Detection — 3 Effective Methods Every Data Scientist Should Know (link)

Projects

Mapping Healthcare Access in Chicago (link)
Lessons from the Titanic Kaggle Dataset (Part 1): Aggressive Data Cleaning Doesn’t Always Improve Model Accuracy (link)
Lessons from the Titanic Kaggle Dataset (Part 2): Which Features Matter Most in Predicting Survival? (link)
Diabetes Prediction Using Machine Learning Classification Approaches: A Capstone Project by Team Nabhan (link)

Probability

Understanding Probability Distribution (link)
Probability distributions — A deeper look (link)
Probability concepts explained: Maximum likelihood estimation (link)
Transforming Scores Into Probability (link)
Probability vs Likelihood (link)
Random Variables & Probability Distributions explained (link)
Probability Theory for Machine Learning: A Beginner’s Tutorial (link)
Bayesian Statistics: An introductory course (link)