DataCamp @datacamp Channel on Telegram



Data Science & Machine Learning

DataCamp (English)

Welcome to DataCamp, your go-to channel for all things data science and machine learning! Whether you're a beginner looking to learn the basics or an experienced data scientist seeking to enhance your skills, DataCamp is the perfect place for you. Our channel offers a wide range of tutorials, courses, and resources to help you excel in the field of data science. From Python programming to deep learning algorithms, we cover it all. Stay updated with the latest trends, tools, and technologies in the world of data science through our informative and engaging content. Join our community of data enthusiasts and unleash your potential with DataCamp. Who is DataCamp? DataCamp is a leading online platform that provides interactive data science and machine learning courses. What is DataCamp? DataCamp offers courses designed by industry experts to help individuals learn and improve their data science skills. With DataCamp, you can gain hands-on experience and practical knowledge that will set you apart in the competitive field of data science. Don't miss out on the opportunity to advance your career and become a data science pro with DataCamp!


04 Nov, 21:34

🚨 DataCamp Free Access Week is LIVE! 🚨

Get 100% free and unlimited access to our full course library.

No catch, no card details needed. Just hit the link and explore everything DataCamp has to offer.

Sign up now 👉


03 May, 05:13


31 Mar, 18:51

It took me 6 weeks to learn overfitting. I'll share in 6 minutes (business case study included). Let's dive in:

1. Overfitting is a common issue in machine learning and statistical modeling. It occurs when a model is too complex and captures not only the underlying pattern in the data but also the noise.

2. Key Characteristics of Overfitting: High Performance on Training Data, Poor Performance on Test Data, Overly Complex with many parameters, Sensitive to minor fluctuations in training data (not robust).

3. How to Avoid Overfitting (and Underfitting): The goal is a model that is robust (not overly sensitive) and generalizes well to new data (unseen during model training). We get there by balancing the bias-variance tradeoff. Common techniques: K-Fold Cross-Validation, regularization (penalizing large coefficients), and simplifying the model.

4. How I learned about overfitting (business case): I was making a forecast model using linear regression. The model had dozens of features: lags, external regressors, economic features, calendar features... You name it, I included it. And the model did well (on the training data). The problem came when I put my first forecast model into production...

5. Lack of Stability (is a nice way to put it): My model went out of whack. The linear regression predicted demand for certain products at 100X their recent trends. Luckily, the demand planner called me out on it before the purchase orders went into effect.

6. I learned a lot from this: Linear regression models can be highly sensitive. I switched to penalized regression (elastic net) and the model became much more stable. Luckily my organization knew I was onto something, and I was given more chances to improve.

7. The end result: We actually called the end of the Oil Recession of 2016 with my model, and workforce planning was ready to meet the increased demand. This saved us 3 months of inventory time and gave us a competitive advantage when orders began ramping up.

Estimated savings: 10% of sales x 3 months = $6,000,000.

Pretty shocking what a couple of data science skills can do for a business.
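
To make the train-versus-test gap from points 2 and 3 concrete, here is a minimal sketch of K-fold cross-validation in plain Python. Everything in it is an assumption made for illustration: the data is made up, and the "memorizer" is a hypothetical stand-in for an overly complex model, not the forecast model from the story.

```python
import random

def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; fold f holds every k-th index."""
    for f in range(k):
        val = [i for i in range(n) if i % k == f]
        train = [i for i in range(n) if i % k != f]
        yield train, val

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_memorizer(xs, ys):
    """Deliberately overfit model: returns the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a predict function on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cross_validate(fit, xs, ys, k=5):
    """Average training and validation MSE over k folds."""
    train_err, val_err = [], []
    for train, val in k_fold_indices(len(xs), k):
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        vx, vy = [xs[i] for i in val], [ys[i] for i in val]
        model = fit(tx, ty)
        train_err.append(mse(model, tx, ty))
        val_err.append(mse(model, vx, vy))
    return sum(train_err) / k, sum(val_err) / k

# Made-up data: a linear trend plus noise.
rng = random.Random(0)
xs = [i / 4 for i in range(40)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]

line_train, line_val = cross_validate(fit_line, xs, ys)
memo_train, memo_val = cross_validate(fit_memorizer, xs, ys)
print(f"line:      train MSE={line_train:.2f}, validation MSE={line_val:.2f}")
print(f"memorizer: train MSE={memo_train:.2f}, validation MSE={memo_val:.2f}")
```

The memorizer scores a perfect zero error on the data it was trained on because it has stored the noise along with the pattern. Its validation error is where the problem shows up, which is why the held-out folds are the number to watch.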


24 Dec, 18:07

Which one is the best classification algorithm?

Don't forget this line:

'All models are wrong, but some models are useful.' - George Box

Here are 5 classification models to start with 🔽

1. Logistic Regression
LR is mainly used for binary classifications, such as 'yes' or 'no' cases.
The output is between 0 and 1, so it can be translated into a probability.
It's effective with simple problems but may struggle with complex ones.

2. Decision Trees
Tree-based models split the data into different subsets based on the input.
They are easy to visualize, so you can follow each step and see how the model works.
They are simple and effective, but be careful with overfitting!

3. Random Forest
Random Forest builds multiple decision trees to improve accuracy.
It's great for large datasets and reduces the risk of overfitting.
Each tree in the forest has a so-called vote, and the majority vote decides the outcome.

4. Support Vector Machines (SVM)
SVM is effective for both linear and non-linear classification.
It works best when there is a clear margin between categories, but it can also tolerate some misclassified points.
It can be computationally expensive.

5. K-Nearest Neighbors (KNN)
KNN classifies data based on the closest neighboring points.
It may be a struggle to find the optimal K value in the model.
Yet it's simple and effective with small datasets.
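
As a rough illustration of the last model, here is a minimal KNN sketch in plain Python. The points, labels, and the function name knn_predict are made up for this example. The same majority-vote idea is what the Random Forest description refers to, only there the voters are trees rather than neighbors.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Made-up 2-D points in two loose clusters.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, (2.0, 2.0), k=3))  # low
print(knn_predict(points, labels, (6.0, 6.5), k=3))  # high
```

An odd K avoids tied votes in a two-class problem. Beyond that, K is usually picked by trying several values and comparing them on held-out data.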


14 Dec, 18:26

ROC and AUC are important concepts for evaluating classification models in business (e.g. lead scoring). In 6 minutes, I'll share what took me 60 days to figure out. Let's dive in.

1. ROC Curve: The ROC curve, which stands for Receiver Operating Characteristic curve, is a graphical representation used to evaluate the performance of a binary classifier system as its discrimination threshold is varied.

2. True Positive Rate (TPR): On the y-axis, the ROC curve plots the True Positive Rate (also known as sensitivity, or recall) which measures the proportion of actual positives that are correctly identified as such. It's calculated as TPR = TP / (TP + FN), where TP is true positives and FN is false negatives.

3. False Positive Rate (FPR): On the x-axis, the curve plots the False Positive Rate, which measures the proportion of actual negatives that are incorrectly identified as positives. It's calculated as FPR = FP / (FP + TN), where FP is false positives and TN is true negatives.

4. Thresholds: The ROC curve is created by plotting TPR against FPR at various threshold settings. A threshold in a classification algorithm is a point at which the decision is made whether a given instance belongs to a certain class.

5. Area Under the Curve (AUC): The area under the ROC curve is a measure of the effectiveness of a binary classification algorithm. An AUC of 1 represents a perfect classifier, while an AUC of 0.5 represents a classifier that does no better than random guessing.

6. AUC Calculation: The most common method for calculating the AUC of an ROC curve is by using the trapezoidal rule. This approach involves approximating the area under the curve by summing up the areas of trapezoids formed beneath the curve.

7. Interpretation: A curve closer to the top-left corner indicates a better performance. As the area under the ROC curve increases, the model is better at distinguishing between the positive and negative classes.
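
The steps above can be sketched in a few lines of plain Python. The labels and scores are made up (think of them as hypothetical lead-scoring outputs), and the function names are chosen for this example only.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, one per threshold, starting at (0, 0)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Sweep the threshold from the highest score down to the lowest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        # FPR = FP / (FP + TN) and TPR = TP / (TP + FN)
        points.append((fp / negatives, tp / positives))
    return points

def auc_trapezoid(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# 1 = positive (e.g. the lead converted), 0 = negative.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

curve = roc_points(labels, scores)
print(curve)
print(round(auc_trapezoid(curve), 3))  # 0.889
```

The result of 8/9 matches another way to read AUC: of the 9 possible positive-negative pairs in this toy data, the positive example has the higher score in 8.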


13 Dec, 18:21

Have you ever seen data that contradicts your expectations?
It could be due to Simpson's Paradox.

Trends in different groups can reverse when combined.

Let me explain this phenomenon.
Look at the first part of the image.
The correlation is clearly positive.
The problem is that this is not the whole picture.
This is just a subgroup of our population.
In the second part, we have 3 subgroups.
Individually all of them show a positive correlation.
But if we put everything together, we have a negative trend.
This is Simpson's paradox.

A trend appears in several groups of data but disappears or reverses when the groups are combined.

Here is an example:
A medical study on kidney stone removal in 1986 showed that:
A new treatment had 83% success rate.
The old treatment had 78% success rate.
One can conclude that the new treatment is better. The problem is that Simpson's Paradox is lurking in the data.
When researchers considered kidney stone size, the result was reversed. The old treatment was better for both small and large kidney stones.
How is it possible?
Let's take a look at the table.
The new treatment was tested mostly on small stones - which is probably an easier procedure - with an 87% success rate.
On large stones, it performed much worse, with a 69% success rate.
But the sample was smaller for large stones. Therefore, the overall success is weighted more toward 87%.
The old treatment had more samples for large stones, so the overall result was "pulled" toward this rate.
The old treatment was more successful in both cases, but the overall result was "pulled" toward the lower number due to the sample sizes.
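
The reversal is easy to check with a few lines of Python. The post only gives percentages, so the counts below are an assumption: they are the figures commonly cited for this 1986 study, and they reproduce the 83%, 78%, 87%, and 69% quoted above.

```python
# (successes, patients) per treatment and stone size.
# Assumed counts: the commonly cited figures for the 1986 study.
counts = {
    "old": {"small": (81, 87), "large": (192, 263)},
    "new": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, total):
    """Success rate within one subgroup."""
    return successes / total

def overall(groups):
    """Success rate after pooling all subgroups of one treatment."""
    successes = sum(s for s, _ in groups.values())
    total = sum(n for _, n in groups.values())
    return successes / total

for treatment, groups in counts.items():
    per_size = ", ".join(f"{size}: {rate(*g):.0%}" for size, g in groups.items())
    print(f"{treatment}: {per_size}, overall: {overall(groups):.0%}")
# old: small: 93%, large: 73%, overall: 78%
# new: small: 87%, large: 69%, overall: 83%
```

The old treatment wins within each stone size, yet loses overall, because most of its patients (263 of 350) had large stones while most of the new treatment's patients (270 of 350) had small ones.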

How to avoid this paradox?

1. Consider all relevant variables (kidney stone size in the example).

2. Visualize the data to identify any underlying patterns.

3. Be careful when interpreting aggregate data.


08 Dec, 03:11

Machine Learning models do not predict the future.
They just find patterns in the past.


04 Oct, 19:06

Try this:


12 Mar, 09:03


16 Sep, 08:26

Regression Analysis Cheat Sheet


14 Sep, 18:59

Probability for machine learning:


26 Aug, 04:46

Difference between supervised and unsupervised learning!





