Machine Learning And AI @geekycod Channel on Telegram



Hi all, and welcome! Join our channel for jobs, the latest programming blogs, and machine learning blogs.
If you have any doubts regarding ML/Data Science, please reach out to me at @ved1104, and subscribe to my channel:
https://youtube.com/@geekycodesin?si=JzJo3WS5E_VFmD1k

Machine Learning And AI (English)

Are you passionate about the world of Machine Learning and Artificial Intelligence? Look no further, as our Telegram channel 'Machine Learning And AI' is the perfect place for you to stay updated with the latest trends and developments in these cutting-edge technologies. Run by the username '@geekycod', this channel offers a wealth of resources including job opportunities, programming blogs, and insightful machine learning articles.

Whether you are a seasoned professional in the field or just starting out, 'Machine Learning And AI' is a valuable resource to broaden your knowledge and network with like-minded individuals. Need help with a specific topic in ML or Data Science? Simply reach out to @ved1104 who is always ready to assist and guide you through any challenges you may face.

Additionally, don't forget to subscribe to the YouTube channel '@geekycodesin' for even more in-depth tutorials and discussions on Machine Learning and AI. Stay ahead of the curve and join our channel today to immerse yourself in the exciting world of artificial intelligence and machine learning!

Machine Learning And AI

09 Jan, 11:21


💡 How does an LLM (large language model) actually learn?
It's a journey through 3 key phases:

1️⃣ Self-Supervised Learning (Understanding Language)
The model is trained on massive text datasets (Wikipedia, blogs, websites). This is where the transformer architecture comes into the picture: you can think of it simply as a neural network that reads text and predicts what comes next. No human labeling is needed, because the next word in the text itself serves as the answer.
For example:
“A flash flood watch will be in effect all _____.”
The model ranks possible answers like “night,” “day,” or even “giraffe.” Over time, it gets really good at picking the right one.
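To make the "ranking" concrete: the model produces a raw score (logit) for each candidate token, and a softmax turns those scores into probabilities. A minimal Python sketch; the three words and their scores are made up for illustration, whereas a real model scores every token in a vocabulary of tens of thousands.

```python
import math

def softmax(scores):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - m) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Hypothetical scores a model might assign to candidate next words
logits = {"night": 4.0, "day": 3.0, "giraffe": -2.0}
probs = softmax(logits)

# Print candidates from most to least likely
for word, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word}: {p:.3f}")
```

"night" comes out on top and "giraffe" gets a near-zero probability; training adjusts the weights so the correct continuation receives the highest score.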

2️⃣ Supervised Learning (Understanding Instructions)
Next, we teach it how humans like their answers. Thousands of examples of questions and well-crafted responses are fed to the model. This step is smaller but crucial: it's where the model learns to align with human intent.

3️⃣ Reinforcement Learning (Improving Behavior)
Finally, the model learns to improve its behavior based on feedback. Humans rate or compare its answers (e.g., thumbs up or thumbs down), a reward model is trained on those preferences, and the LLM is then tuned to produce answers that score well.
This helps it avoid harmful or wrong answers and focus on being helpful, honest, and safe.

Through this process, the model learns patterns and relationships in language, which are stored as numerical weights (parameters). These weights, a kind of compressed summary of the training data, are saved in the parameter file, the core of what makes the model function.

⚙️ So what happens when you ask a question?
The model breaks your question into tokens (small pieces of text, each mapped to a number). It processes these numbers through its neural network and generates the response one token at a time, each time predicting a likely next token.

For example:
“What should I eat today?” might turn into numbers like [123, 11, 45, 78], which the model uses to calculate the next best words for its answer.
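The text-to-numbers step can be sketched with a toy word-level vocabulary. Everything here is hypothetical: the words and ID numbers are made up, and real LLMs use learned subword tokenizers (such as BPE) with vocabularies of tens of thousands of entries, so their IDs look nothing like these.

```python
# Toy vocabulary: these words and ID numbers are made up for illustration.
vocab = {"what": 0, "should": 1, "i": 2, "eat": 3, "today": 4, "?": 5}
UNK = len(vocab)  # hypothetical ID reserved for unknown tokens

def encode(text):
    """Lowercase, split off '?', and map each piece to its ID (UNK if unseen)."""
    pieces = text.lower().replace("?", " ?").split()
    return [vocab.get(piece, UNK) for piece in pieces]

def decode(ids):
    """Map IDs back to text pieces (the reverse lookup)."""
    inverse = {i: w for w, i in vocab.items()}
    return " ".join(inverse.get(i, "<unk>") for i in ids)

ids = encode("What should I eat today?")
print(ids)          # [0, 1, 2, 3, 4, 5]
print(decode(ids))  # what should i eat today ?
```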

❗️But here's something important: every model has a token limit (its context window), the maximum number of tokens it can handle at once. This limit varies between smaller and larger models. Once a conversation exceeds it, the oldest tokens typically no longer fit, so the model effectively forgets the earlier context and works only with the most recent tokens.
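One simple way an application can keep a conversation within the limit is to drop the oldest tokens. A minimal Python sketch; `fit_to_context` and the limit value are hypothetical, and real chat applications may instead summarize older messages or reject over-long input.

```python
def fit_to_context(token_ids, max_tokens):
    """Keep only the most recent `max_tokens` tokens (drop the oldest ones)."""
    if max_tokens <= 0:
        return []  # guard: token_ids[-0:] would return the whole list
    return token_ids[-max_tokens:]

history = list(range(10))           # pretend these are 10 token IDs from a chat
print(fit_to_context(history, 4))   # [6, 7, 8, 9]
```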

Finally, you can imagine an LLM as just two files:

➡️ Parameter file: This is the big file where all the knowledge lives. Think of it like a giant (lossy) zip file containing everything the model has learned about language.

➡️ Run file: This is the set of instructions needed to use the parameter file. It defines the model's architecture, handles text tokenization, and manages how the model generates outputs.

That's a very simple way to break down how LLMs work!
These models are the backbone of AI agents, so let's not forget about them 😉

Machine Learning And AI

09 Jan, 10:34


1. Data Validation
✅Count Rows:
SELECT COUNT(*) FROM source_table;
SELECT COUNT(*) FROM target_table;
🔹Verify that the number of rows in the source and target tables match
✅Check Duplicates:
SELECT column_name, COUNT(*)
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1;
🔹Identify duplicate records in a table
✅Compare Data Between Source and Target:
SELECT *
FROM source_table
MINUS -- Oracle; use EXCEPT in SQL Server, PostgreSQL, and SQLite
SELECT *
FROM target_table;
🔹Check for discrepancies between the source and target tables (rows in the source that are missing or different in the target; swap the two queries to check the other direction)
✅Validate Data Transformation:
SELECT column_name,
CASE
WHEN condition THEN 'Valid' -- replace "condition" with your transformation rule
ELSE 'Invalid'
END AS validation_status
FROM table_name;
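To try these checks without an Oracle database, here is a minimal sketch using Python's built-in sqlite3 module with two small hypothetical tables. SQLite spells MINUS as EXCEPT. Note that the row counts match even though the data does not, which is why the comparison query matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_table (id INTEGER, name TEXT);
CREATE TABLE target_table (id INTEGER, name TEXT);
INSERT INTO source_table VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO target_table VALUES (1, 'alice'), (2, 'bob'), (2, 'bob');
""")

# Count rows: both tables have 3 rows
source_count = conn.execute("SELECT COUNT(*) FROM source_table").fetchone()[0]
target_count = conn.execute("SELECT COUNT(*) FROM target_table").fetchone()[0]

# Check duplicates in the target on the id column
duplicates = conn.execute("""
    SELECT id, COUNT(*) FROM target_table
    GROUP BY id HAVING COUNT(*) > 1
""").fetchall()

# Rows present in the source but not in the target (MINUS in Oracle)
missing_in_target = conn.execute("""
    SELECT * FROM source_table
    EXCEPT
    SELECT * FROM target_table
""").fetchall()

print(source_count, target_count)  # 3 3
print(duplicates)                  # [(2, 2)]
print(missing_in_target)           # [(3, 'carol')]
```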
2. Performance and Integrity Checks
✅Check for Null Values:
SELECT *
FROM table_name
WHERE column_name IS NULL;
✅Data Type Validation:
SELECT column_name,
CASE
WHEN column_name LIKE '%[^0-9]%' THEN 'Invalid' -- SQL Server pattern syntax; in Oracle use REGEXP_LIKE(column_name, '[^0-9]')
ELSE 'Valid'
END AS data_status
FROM table_name;
✅Primary Key Uniqueness:
SELECT primary_key_column, COUNT(*)
FROM table_name
GROUP BY primary_key_column
HAVING COUNT(*) > 1;
3. Data Aggregation and Summarization:
✅Aggregate Functions:
SELECT SUM(column_name), AVG(column_name), MAX(column_name), MIN(column_name)
FROM table_name;
✅Group Data:
SELECT column_name, COUNT(*)
FROM table_name
GROUP BY column_name;
4. Data Sampling and Extracting Subsets
✅Retrieve Top Records:
SELECT *
FROM table_name
WHERE ROWNUM <= 10; -- Oracle
SELECT *
FROM table_name
LIMIT 10; -- MySQL, PostgreSQL
✅Fetch Specific Columns:
SELECT column1, column2
FROM table_name;
5. Joins for Data Comparison
✅Inner Join:
SELECT a.column1, b.column2
FROM table_a a
INNER JOIN table_b b
ON a.common_column = b.common_column;
✅Left Join:
SELECT a.column1, b.column2
FROM table_a a
LEFT JOIN table_b b
ON a.common_column = b.common_column;
✅Full Outer Join:
SELECT a.column1, b.column2
FROM table_a a
FULL OUTER JOIN table_b b
ON a.common_column = b.common_column;
6. Data Cleaning
✅Remove Duplicates:
DELETE FROM table_name
WHERE rowid NOT IN ( -- Oracle ROWID; keeps one row per column_name value
SELECT MIN(rowid)
FROM table_name
GROUP BY column_name);
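The same ROWID-based de-duplication works in SQLite, whose tables have an implicit rowid column. A minimal sketch with a hypothetical single-column table, reusing the placeholder names from the query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column_name TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",)],  # rowids 1..5
)

# Keep the first row (smallest rowid) for each value, delete the rest
conn.execute("""
    DELETE FROM table_name
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM table_name GROUP BY column_name
    )
""")

remaining = [r[0] for r in conn.execute("SELECT column_name FROM table_name ORDER BY rowid")]
print(remaining)  # ['a', 'b', 'c']
```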
✅Update Data:
UPDATE table_name
SET column_name = new_value
WHERE condition;
7. ETL-Specific Validations
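✅ETL Data Loading Validation:
SELECT COUNT(*)
FROM target_table
WHERE load_date >= TRUNC(SYSDATE)
AND load_date < TRUNC(SYSDATE) + 1; -- Oracle: rows loaded today (SYSDATE includes the time of day, so "= SYSDATE" would almost never match)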
✅Compare Aggregates Between Source and Target:
SELECT SUM(amount) AS source_sum
FROM source_table;
SELECT SUM(amount) AS target_sum
FROM target_table;
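✅Check for Missing Records:
SELECT s.source_key
FROM source_table s
WHERE NOT EXISTS (
SELECT 1
FROM target_table t
WHERE t.target_key = s.source_key);
🔹NOT EXISTS is safer than NOT IN here: if target_key contains even one NULL, NOT IN returns no rows at all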
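The missing-records check has a subtle pitfall when written with NOT IN. A minimal sketch using Python's built-in sqlite3 with hypothetical tables, comparing a NOT IN and a NOT EXISTS version; a single NULL key in the target silently empties the NOT IN result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_table (source_key INTEGER);
CREATE TABLE target_table (target_key INTEGER);
INSERT INTO source_table VALUES (1), (2), (3);
INSERT INTO target_table VALUES (1), (NULL);  -- one NULL key slipped into the target
""")

not_in = conn.execute("""
    SELECT source_key FROM source_table
    WHERE source_key NOT IN (SELECT target_key FROM target_table)
""").fetchall()

not_exists = conn.execute("""
    SELECT s.source_key FROM source_table s
    WHERE NOT EXISTS (
        SELECT 1 FROM target_table t WHERE t.target_key = s.source_key
    )
    ORDER BY s.source_key
""").fetchall()

print(not_in)      # [] : comparing against NULL makes every NOT IN test unknown
print(not_exists)  # [(2,), (3,)]
```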
8. Metadata Queries
✅View Table Structure:
DESCRIBE table_name; -- Oracle
SHOW COLUMNS FROM table_name; -- MySQL
✅View Indexes:
SELECT *
FROM user_indexes
WHERE table_name = 'TABLE_NAME'; -- Oracle
✅View Constraints:
SELECT constraint_name, constraint_type
FROM user_constraints
WHERE table_name = 'TABLE_NAME'; -- Oracle

Machine Learning And AI

08 Jan, 21:15


https://geekycodes.in/2025/01/09/amazon-data-science-interview-question/

Machine Learning And AI

27 Dec, 22:40


I struggled with Data Science interviews until...

I followed this roadmap:

Python
👉🏼 Master the basics: syntax, loops, functions, and data structures (lists, dictionaries, sets, tuples)
👉🏼 Learn Pandas & NumPy for data manipulation
👉🏼 Matplotlib & Seaborn for data visualization

Statistics & Probability
👉🏼 Descriptive statistics: mean, median, mode, standard deviation
👉🏼 Probability theory: distributions, Bayes' theorem, conditional probability
👉🏼 Hypothesis testing & A/B testing

Machine Learning
👉🏼 Supervised vs. unsupervised learning
👉🏼 Key algorithms: Linear & Logistic Regression, Decision Trees, Random Forest, KNN, SVM
👉🏼 Model evaluation metrics: accuracy, precision, recall, F1 score, ROC-AUC
👉🏼 Cross-validation & hyperparameter tuning

Deep Learning
👉🏼 Neural Networks & their architecture
👉🏼 Working with Keras & TensorFlow/PyTorch
👉🏼 CNNs for image data and RNNs for sequence data

Data Cleaning & Feature Engineering
👉🏼 Handling missing data, outliers, and data scaling
👉🏼 Feature selection techniques (e.g., correlation, mutual information)

NLP (Natural Language Processing)
👉🏼 Tokenization, stemming, lemmatization
👉🏼 Bag-of-Words, TF-IDF
👉🏼 Sentiment analysis & topic modeling

Cloud and Big Data
👉🏼 Understanding cloud services (AWS, GCP, Azure) for data storage & computing
👉🏼 Working with distributed data using Spark
👉🏼 SQL for querying large datasets

Don't get overwhelmed by the breadth of topics. Start small: master one concept, then move to the next. 📈

You've got this! 💪🏼

Machine Learning And AI

23 Dec, 06:59


Data science interview questions:

Position: Data Scientist
There were 3 interview rounds followed by 1 HR discussion.

Coding-related questions:

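1. You are given 2 lists:
l1 = [1,2,2,3,4,5,6,6,7]
l2 = [1,2,4,5,5,6,6,7]
Return the elements of l2 that also appear in l1, keeping each element at most as many times as it occurs in both lists (a multiset intersection, in l2's order).
For example, 2 appears twice in l1 but only once in l2, so it is returned once; 5 appears twice in l2 but only once in l1, so it is also returned once; 6 appears twice in both, so it is returned twice.
Expected Output: [1,2,4,5,6,6,7]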
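One straightforward solution: count the elements of l1 with collections.Counter, then walk through l2 and keep an element only while l1 still has unused copies of it. A minimal Python sketch (the function name is our own); it runs in O(len(l1) + len(l2)) time and preserves l2's order.

```python
from collections import Counter

def common_by_frequency(l1, l2):
    """Elements of l2 that also occur in l1, each kept at most min(count) times."""
    remaining = Counter(l1)  # unused copies of each element in l1
    result = []
    for x in l2:
        if remaining[x] > 0:  # Counter returns 0 for missing keys
            result.append(x)
            remaining[x] -= 1
    return result

l1 = [1, 2, 2, 3, 4, 5, 6, 6, 7]
l2 = [1, 2, 4, 5, 5, 6, 6, 7]
print(common_by_frequency(l1, l2))  # [1, 2, 4, 5, 6, 6, 7]
```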

2.
text = ' I am a data scientist working in Paypal'
Return the longest word along with its letter count.
Expected Output: ('scientist', 9)
(In case of ties, sort the words in alphabetical order and return the first one.)
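A minimal Python sketch (function name our own): split on whitespace, then pick the word with the largest length, breaking ties alphabetically. It assumes words are separated by spaces with no punctuation to strip, and compares case-insensitively for the tie-break.

```python
def longest_word(text):
    """Return (word, length) for the longest word; ties broken alphabetically."""
    words = text.split()  # also discards the leading space
    # sort key: longer first (negative length), then alphabetical order
    best = min(words, key=lambda w: (-len(w), w.lower()))
    return best, len(best)

text = ' I am a data scientist working in Paypal'
print(longest_word(text))  # ('scientist', 9)
```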

3. You are given 2 tables in SQL. Table 1 has one column containing the single value '3' repeated 5 times; table 2 has one column with two values, '2' repeated 3 times and '3' repeated 4 times. How many records do you get for an inner, left, right, full outer, and cross join?
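A quick way to sanity-check the join counts is to compute them directly. This Python sketch (the helper `join_counts` is hypothetical) assumes a single join column with no NULLs and stores the values as integers rather than the strings '2' and '3', which does not change the counts.

```python
def join_counts(t1, t2):
    """Row counts for joins on a single equality column (assumes no NULLs)."""
    inner = sum(t2.count(v) for v in t1)                  # every matching pair
    left_unmatched = sum(1 for v in t1 if v not in t2)    # kept (with NULLs) by LEFT JOIN
    right_unmatched = sum(1 for v in t2 if v not in t1)   # kept (with NULLs) by RIGHT JOIN
    return {
        "inner": inner,
        "left": inner + left_unmatched,
        "right": inner + right_unmatched,
        "full outer": inner + left_unmatched + right_unmatched,
        "cross": len(t1) * len(t2),
    }

table1 = [3] * 5
table2 = [2] * 3 + [3] * 4
print(join_counts(table1, table2))
# {'inner': 20, 'left': 20, 'right': 23, 'full outer': 23, 'cross': 35}
```

So inner = 5 × 4 = 20, left = 20 (every 3 in table 1 finds a match), right = full outer = 20 + 3 = 23 (the three 2s are unmatched), and cross = 5 × 7 = 35.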

4. You are given a transaction table with 4 columns: txn_id (primary key), cust_id (foreign key), txn_date (datetime), and txn_amt. One cust_id can have multiple txn_ids. Find all cust_ids that made at least 2 transactions within a 10-second window. (This might help identify fraudulent patterns.)
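One common approach: order each customer's transactions by time, compare every transaction with the previous one using LAG(), and flag the customer if any gap is at most 10 seconds (if any two transactions are within 10 s, some consecutive pair is too). A minimal sketch run through Python's sqlite3 with made-up rows. It assumes SQLite 3.25+ for window functions, treats "within 10 seconds" as a gap of 10 s or less, and uses SQLite's strftime('%s', ...) to get epoch seconds; other engines have their own date-difference functions (e.g. DATEDIFF in SQL Server, TIMESTAMPDIFF in MySQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transactions (
        txn_id   INTEGER PRIMARY KEY,
        cust_id  INTEGER,
        txn_date TEXT,   -- ISO-8601 'YYYY-MM-DD HH:MM:SS'
        txn_amt  REAL
    )
""")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        (1, 101, "2024-01-01 10:00:00", 50.0),
        (2, 101, "2024-01-01 10:00:07", 20.0),  # 7 s after txn 1
        (3, 102, "2024-01-01 11:00:00", 30.0),
        (4, 102, "2024-01-01 11:00:45", 30.0),  # 45 s after txn 3
        (5, 103, "2024-01-01 12:00:00", 10.0),  # only one txn
    ],
)

# Compare each txn with the same customer's previous txn (LAG);
# the first txn per customer has prev_ts = NULL and is filtered out.
query = """
WITH ordered AS (
    SELECT cust_id,
           CAST(strftime('%s', txn_date) AS INTEGER) AS ts,
           LAG(CAST(strftime('%s', txn_date) AS INTEGER))
               OVER (PARTITION BY cust_id ORDER BY txn_date) AS prev_ts
    FROM transactions
)
SELECT DISTINCT cust_id
FROM ordered
WHERE ts - prev_ts <= 10
ORDER BY cust_id
"""
suspicious = [row[0] for row in conn.execute(query)]
print(suspicious)  # [101]
```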

Case study questions:
1. Explain PayPal's business model and revenue sources.
What are the general consequences of changing the price of a product?

2. You are a data scientist who studies the impact of pricing changes. Suppose the business team asks you what will happen if they increase the merchant fee per txn by 5%.
What will be your recommendation and strategy?
Which factors will you consider, and what would the proposed ML solution look like?

3. You see that some merchants are heavily misusing the refund facility (merchants get refunds for incorrect/disputed txns) by making fake txns to claim reimbursements.
List possible scenarios and ways to identify such merchants.

4. How would you decide the pricing of a premium product like the iPhone in India vs. South Africa? What factors would you consider?

Statistics Questions:
1. What is multicollinearity?
2. What are Type I and Type II errors? (Explain from a pricing-experimentation point of view.)
3. What is the Weibull distribution?
4. What is the CLT? What is the difference between the t-distribution and the normal distribution?
5. What is Wald's test?
6. What is the Ljung-Box test? Explain the null hypothesis of the ADF test in time series.
7. What is causality?

ML Questions:
1. What is logistic regression? What is deviance?
2. What is the difference between R-squared and adjusted R-squared?
3. How does a random forest work?
4. What is the difference between bagging and boosting?
In terms of the bias-variance tradeoff, what does bagging/boosting attempt to reduce?

Machine Learning And AI

14 Dec, 08:09


https://geekycodes.in/2024/12/10/changing-the-default-port-in-sql-server-a-complete-guide/

Machine Learning And AI

29 Nov, 05:20


https://geekycodes.in/2024/11/29/exploring-java-8-a-game-changer-in-modern-software-development/

Machine Learning And AI

27 Nov, 20:09


https://geekycodes.in/2024/11/28/the-complete-guide-to-the-manual-testing-process/

Machine Learning And AI

27 Nov, 19:15


https://geekycodes.in/2024/11/28/understanding-the-singleton-pattern-in-nestjs/

Machine Learning And AI

26 Nov, 09:52


https://geekycodes.in/2024/11/26/questions-asked-in-data-scientist-interviews-part-12/

Machine Learning And AI

26 Nov, 09:52


https://geekycodes.in/2024/11/26/questions-asked-in-data-scientist-interviews-part-11/

Machine Learning And AI

26 Nov, 07:05


https://geekycodes.in/2024/11/26/questions-asked-in-data-scientist-interviews-part-10/

Machine Learning And AI

21 Nov, 11:56


https://geekycodes.in/2024/11/21/questions-asked-in-data-scientist-interviews-part-9/

Machine Learning And AI

21 Nov, 09:13


https://geekycodes.in/2024/11/21/questions-asked-in-data-scientist-interviews-part-8-2/

Machine Learning And AI

20 Nov, 17:07


https://geekycodes.in/2024/11/20/questions-asked-in-data-scientist-interviews-part-7-2/

Machine Learning And AI

20 Nov, 13:11


Company Name: Swiggy
Role: Associate Software Engineer
Batch: 2024/2023/2022 passouts

Link: https://docs.google.com/forms/d/1E029cjZV8Em6zPC0YJYAMDDP_NjPtDkwufqHfvkVG2E/viewform?edit_requested=true&pli=1

Machine Learning And AI

18 Nov, 22:55


Company Name: Amazon
Role: Cloud Support Associate
Batch: 2024/2023 passouts

Link: https://www.amazon.jobs/en/jobs/2676989/cloud-support-associate

Machine Learning And AI

18 Nov, 19:29


https://geekycodesin.wordpress.com/2024/11/19/questions-asked-in-data-scientist-interviews-part-6/

Machine Learning And AI

18 Nov, 14:32


Data Science Interview Question:
How do outliers impact kNN?

Outliers can significantly impact the performance of kNN, leading to inaccurate predictions due to the model's reliance on proximity for decision-making. Here's a breakdown of how outliers influence kNN:

High Variance
The presence of outliers can increase the model's variance, as predictions near outliers may fluctuate unpredictably depending on which neighbors are included. This makes the model less reliable for regression tasks with scattered or sparse data.

Distance Metric Sensitivity
kNN relies on distance metrics, which can be significantly affected by outliers. In high-dimensional spaces, outliers can increase the range of distances, making it harder for the algorithm to distinguish between nearby points and those farther away. This issue can lead to an overall reduction in accuracy as the model's ability to effectively measure "closeness" degrades.

Reduced Performance in Classification/Regression Tasks
Outliers near class boundaries can pull the decision boundary toward them, potentially misclassifying nearby points that should belong to a different class. This is particularly problematic when k is small, because individual points (like outliers) have a greater influence. The same applies to regression tasks.
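To see the small-k effect, here is a toy 1-D example in Python (the data and the helper `knn_predict` are made up for illustration): a single mislabeled point sits inside the other class's region.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query` (1-D, absolute distance)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [(0.0, "A"), (1.0, "A"), (2.0, "A"), (3.0, "A"),
         (10.0, "B"), (11.0, "B"), (12.0, "B"), (13.0, "B"),
         (2.5, "B")]  # outlier: a "B" point sitting inside the "A" region

print(knn_predict(train, 2.4, k=1))  # B
print(knn_predict(train, 2.4, k=5))  # A
```

With k = 1 the outlier alone decides the prediction; with k = 5 the four genuine "A" neighbors outvote it.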

Feature Influence Disproportion
If certain features contain outliers, they can dominate the distance calculations and overshadow the impact of other features. For example, an outlier in a high-magnitude feature may cause distances to be determined largely by that feature, affecting the quality of the neighbor selection.

Machine Learning And AI

17 Nov, 12:46


https://geekycodesin.wordpress.com/2024/11/17/minimum-number-of-platforms-required-for-trains-a-problem-solution-approach/

Machine Learning And AI

16 Nov, 16:17


https://youtu.be/P1QX6bhnojk?si=gO0kplNJfzNL1AF6

Machine Learning And AI

15 Nov, 19:06


https://geekycodesin.wordpress.com/2024/11/16/finding-the-maximum-subarray-sum-in-on-time-a-guide-to-kadanes-algorithm/

Machine Learning And AI

15 Nov, 18:51


https://geekycodesin.wordpress.com/?p=11435&preview=true&_thumbnail_id=11438

Machine Learning And AI

15 Nov, 08:38


Amazon Data Science Interview Question:
In a linear regression model, what are the key assumptions that need to be satisfied for the model to be valid? How would you evaluate whether these assumptions hold in your dataset?

This is also the most common question I see across companies!

So the assumptions are:

Linearity
The relationship between the independent variables (predictors) and the dependent variable is linear. This means that the effect of each predictor on the outcome is constant and additive.
How to evaluate? - Scatter plots of predictors vs. the dependent variable and residual vs. fitted value plots. You can also use polynomial regression or transformations (log, square root) if non-linearity is detected.
How to fix? - Apply feature transformations (e.g., log, square root, polynomial) or use non-linear models.

Normality of Errors
The residuals are approximately normally distributed; this matters mainly for conducting statistical tests and constructing confidence intervals, especially with small samples.
How to evaluate - Q-Q plot or histogram of the residuals, or a formal test such as Shapiro-Wilk.
How to fix - Transform the dependent variable (log, Box-Cox) and/or check for outliers.

Independence of Errors
The residuals are not correlated with each other.
How to evaluate - Residual autocorrelation plots or the Durbin-Watson test for time-series data. For non-time-series data, this assumption can often be taken as satisfied if the data is randomly sampled.
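The Durbin-Watson statistic is simple to compute from the residuals of an already-fitted model: DW = Σ(e_t − e_{t−1})² / Σe_t². Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. A minimal Python sketch with made-up residual sequences (it assumes the residuals are not all zero):

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

alternating = [1, -1, 1, -1, 1, -1]  # sign flips every step
clustered = [1, 1, 1, -1, -1, -1]    # runs of same-signed errors

print(round(durbin_watson(alternating), 3))  # 3.333 -> negative autocorrelation
print(round(durbin_watson(clustered), 3))    # 0.667 -> positive autocorrelation
```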

Homoscedasticity (Constant Variance of Errors)
The variance of the residuals (errors) is constant across all levels of the independent variables. In other words, the spread of residuals should not increase or decrease as the predicted values increase.
How to evaluate - Plot the residuals against fitted values. If the plot shows a "fan" shape (i.e., increasing or decreasing spread of residuals), you may need to address heteroscedasticity using robust standard errors or a transformation (e.g., log-transformation).
How to fix - Transformation of dependent variable (log, box-cox) or weighted least squares regression can help

No Multicollinearity
The independent variables (predictors) are not highly correlated with each other. High correlation between predictors can lead to multicollinearity, which makes it difficult to determine the individual effect of each predictor on the dependent variable.
How to evaluate - Calculate the Variance Inflation Factor (VIF) for each predictor. If VIF is high (a common rule of thumb flags values above 5 or 10), consider removing highly correlated predictors or combining them into a single predictor (e.g., using Principal Component Analysis).
How to fix - Remove or combine correlated predictors, or use regularized regression models like Ridge or Lasso regression.
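For the special case of exactly two predictors, VIF reduces to 1 / (1 − r²), where r is their Pearson correlation (in general, VIF_j = 1 / (1 − R_j²), with R_j² from regressing predictor j on all the others). A minimal stdlib-only Python sketch with made-up data; with more than two predictors you would run the full auxiliary regressions instead.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def vif_two_predictors(x1, x2):
    """VIF when there are exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical, strongly related predictors
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 11]
vif = vif_two_predictors(x1, x2)
print(round(vif, 1))  # 122.0, far above the usual 5-10 threshold
```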

Machine Learning And AI

13 Nov, 15:59


https://geekycodesin.wordpress.com/2024/11/13/sorting-an-array-with-0s-1s-and-2s-in-one-pass-python-implementation/

Machine Learning And AI

13 Nov, 15:43


https://geekycodesin.wordpress.com/2024/11/13/understanding-dependency-injection-in-machine-learning-a-comprehensive-guide/

Machine Learning And AI

13 Nov, 15:31


https://geekycodesin.wordpress.com/2024/11/13/understanding-scrum-methodology-a-comprehensive-guide/

Machine Learning And AI

13 Nov, 14:43


https://geekycodesin.wordpress.com/2024/11/13/understanding-filters-in-mvc-a-deep-dive/

Machine Learning And AI

12 Nov, 13:39


https://youtu.be/P1QX6bhnojk

Machine Learning And AI

12 Nov, 12:26


https://geekycodesin.wordpress.com/2024/11/12/understanding-primary-and-secondary-constraints-in-sql/

Machine Learning And AI

11 Nov, 18:33


https://geekycodesin.wordpress.com/2024/11/12/understanding-extension-methods-in-programming/

Machine Learning And AI

11 Nov, 18:01


https://geekycodesin.wordpress.com/2024/11/11/understanding-the-singleton-design-pattern-ensuring-a-single-instance/

Machine Learning And AI

11 Nov, 16:21


https://geekycodesin.wordpress.com/2024/11/11/understanding-oops-concepts-a-deep-dive-with-real-life-examples/

Machine Learning And AI

11 Nov, 15:43


https://geekycodesin.wordpress.com/2024/11/11/understanding-normalization-indexing-sql-queries-and-constraints-in-dbms/

Machine Learning And AI

10 Nov, 17:28


https://geekycodesin.wordpress.com/2024/11/10/finding-the-third-greatest-element-in-an-array-a-step-by-step-guide/

Machine Learning And AI

26 Oct, 06:20


SQL Interview Questions (0-5 Years of Experience)!

Are you preparing for a SQL interview? Here are some essential SQL concepts to review:

Basic SQL Concepts:

1. What is SQL, and why is it important in data analytics?
2. Explain the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.
3. What is the difference between WHERE and HAVING clauses?
4. How do you use GROUP BY and HAVING in a query?
5. Write a query to find duplicate records in a table.
6. How do you retrieve unique values from a table using SQL?
7. Explain the use of aggregate functions like COUNT(), SUM(), AVG(), MIN(), and MAX().
8. What is the purpose of a DISTINCT keyword in SQL?

Intermediate SQL:

1. Write a query to find the second-highest salary from an employee table.
2. What are subqueries and how do you use them?
3. What is a Common Table Expression (CTE)? Give an example of when to use it.
4. Explain window functions like ROW_NUMBER(), RANK(), and DENSE_RANK().
5. How do you combine results of two queries using UNION and UNION ALL?
6. What are indexes in SQL, and how do they improve query performance?
7. Write a query to calculate the total sales for each month using GROUP BY.

Advanced SQL:

1. How do you optimize a slow-running SQL query?
2. What are views in SQL, and when would you use them?
3. What is the difference between a stored procedure and a function in SQL?
4. Explain the difference between TRUNCATE, DELETE, and DROP commands.
5. What are windowing functions, and how are they used in analytics?
6. How do you use PARTITION BY and ORDER BY in window functions?
7. How do you handle NULL values in SQL, and what functions help with that (e.g., COALESCE, ISNULL)?

Machine Learning And AI

25 Oct, 16:51


Tokenization is the essential first step in NLP: breaking text down into smaller pieces called "tokens." It looks simple, but it is the foundation of everything that follows in NLP tasks, from text classification to machine translation.


For example, in a sentence like "I love learning NLP", tokenization splits it into four tokens: ["I", "love", "learning", "NLP"].

But it gets more complicated with contractions, punctuation, and languages without clear word boundaries, such as Chinese.
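A naive whitespace split already runs into these problems ("stop," keeps its comma). A slightly better rule-based tokenizer, sketched in Python with a regular expression of our own choosing (not any particular library's rules), keeps simple contractions together and splits punctuation off:

```python
import re

def simple_tokenize(text):
    """Words (optionally with one apostrophe part, e.g. "don't") or single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

print("Don't stop, NLP!".split())           # ["Don't", 'stop,', 'NLP!']
print(simple_tokenize("I love learning NLP"))  # ['I', 'love', 'learning', 'NLP']
print(simple_tokenize("Don't stop, NLP!"))     # ["Don't", 'stop', ',', 'NLP', '!']
```

A rule like this still does nothing for text without spaces, such as Chinese, which is one reason subword methods are used instead.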

That's where subword techniques like Byte-Pair Encoding (BPE) and WordPiece help handle these complexities.
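The core idea of BPE can be sketched in a few lines: start from individual characters and repeatedly merge the most frequent adjacent pair of symbols. This is a toy Python version of the merge-learning step only, on a made-up corpus; real tokenizers add details such as end-of-word markers or byte-level symbols, and WordPiece chooses merges with a likelihood-based score rather than raw pair frequency.

```python
from collections import Counter

def pair_counts(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = Counter()
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] += freq
    return merged

def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-separated corpus."""
    words = Counter(tuple(word) for word in corpus.split())  # start from characters
    merges = []
    for _ in range(num_merges):
        pairs = pair_counts(words)
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words

merges, words = learn_bpe("low low low lower lowest", 3)
print(merges)      # [('l', 'o'), ('lo', 'w'), ('low', 'e')]
print(dict(words))
```

After three merges, "low" is a single token and "lower"/"lowest" share the subword "lowe", so unseen words built from known pieces can still be tokenized.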

Mastering tokenization helps NLP models capture the right meaning from the data.

Machine Learning And AI

24 Oct, 11:08


https://youtu.be/w2anY0hYsL0

Hi guys, a lot of you have not subscribed to my channel yet. If you're reading this message, don't forget to subscribe to my channel and share your views in the comments. At least half of you, please go and subscribe to my channel.
Thank you in advance

Machine Learning And AI

21 Oct, 20:00


Resume keywords for a data scientist role, explained in points:

1. Data Analysis:
- Proficient in extracting, cleaning, and analyzing data to derive insights.
- Skilled in using statistical methods and machine learning algorithms for data analysis.
- Experience with tools such as Python, R, or SQL for data manipulation and analysis.

2. Machine Learning:
- Strong understanding of machine learning techniques such as regression, classification, clustering, and neural networks.
- Experience in model development, evaluation, and deployment.
- Familiarity with libraries like TensorFlow, scikit-learn, or PyTorch for implementing machine learning models.

3. Data Visualization:
- Ability to present complex data in a clear and understandable manner through visualizations.
- Proficiency in tools like Matplotlib, Seaborn, or Tableau for creating insightful graphs and charts.
- Understanding of best practices in data visualization for effective communication of findings.

4. Big Data:
- Experience working with large datasets using technologies like Hadoop, Spark, or Apache Flink.
- Knowledge of distributed computing principles and tools for processing and analyzing big data.
- Ability to optimize algorithms and processes for scalability and performance.

5. Problem-Solving:
- Strong analytical and problem-solving skills to tackle complex data-related challenges.
- Ability to formulate hypotheses, design experiments, and iterate on solutions.
- Aptitude for identifying opportunities for leveraging data to drive business outcomes and decision-making.


Resume keywords for a data analyst role:

1. SQL (Structured Query Language):
- SQL is a programming language used for managing and querying relational databases.
- Data analysts often use SQL to extract, manipulate, and analyze data stored in databases, making it a fundamental skill for the role.

2. Python/R:
- Python and R are popular programming languages used for data analysis and statistical computing.
- Proficiency in Python or R allows data analysts to perform various tasks such as data cleaning, modeling, visualization, and machine learning.

3. Data Visualization:
- Data visualization involves presenting data in graphical or visual formats to communicate insights effectively.
- Data analysts use tools like Tableau, Power BI, or Python libraries like Matplotlib and Seaborn to create visualizations that help stakeholders understand complex data patterns and trends.

4. Statistical Analysis:
- Statistical analysis involves applying statistical methods to analyze and interpret data.
- Data analysts use statistical techniques to uncover relationships, trends, and patterns in data, providing valuable insights for decision-making.

5. Data-driven Decision Making:
- Data-driven decision making is the process of making decisions based on data analysis and evidence rather than intuition or gut feelings.
- Data analysts play a crucial role in helping organizations make informed decisions by analyzing data and providing actionable insights that drive business strategies and operations.

Machine Learning And AI

18 Oct, 04:35


https://youtu.be/w2anY0hYsL0

Hi guys, a lot of you have not subscribed to my channel yet. If you're reading this message, don't forget to subscribe to my channel and share your views in the comments. At least half of you, please go and subscribe to my channel.
Thank you in advance

Machine Learning And AI

28 Sep, 09:02


In my previous team at IBM, we hired over 450 AI Engineers worldwide. They are working on Generative AI pilots for our IBM customers across various industries.

Thousands applied, and we developed a clear rubric to identify the best candidates.

Here are 8 concise tips to help you ace a technical AI engineering interview:

1. Explain LLM fundamentals - Cover the high-level workings of models like GPT-3, including transformers, pre-training, fine-tuning, etc.

2. Discuss prompt engineering - Talk through techniques like demonstrations, examples, and plain-language prompts to optimize model performance.

3. Share LLM project examples - Walk through hands-on experience with models like GPT-4 and tools like LangChain or vector databases.

4. Stay updated on research - Mention the latest papers and innovations in few-shot learning, prompt tuning, chain-of-thought prompting, etc.

5. Dive into model architectures - Compare transformer networks like GPT-3 vs. Codex. Explain self-attention, encodings, model depth, etc.

6. Discuss fine-tuning techniques - Explain supervised fine-tuning, parameter-efficient fine-tuning, few-shot learning, and other methods to specialize pre-trained models for specific tasks.

7. Demonstrate production engineering expertise - From tokenization to embeddings to deployment, showcase your ability to operationalize models at scale.

8. Ask thoughtful questions - Inquire about model safety, bias, transparency, generalization, etc. to show strategic thinking.

Machine Learning And AI

26 Sep, 19:39


https://youtu.be/ZOJvKbbc6cw


Hi guys, a lot of you have not subscribed to my channel yet. If you're reading this message, don't forget to subscribe to my channel and share your views in the comments. At least half of you, please go and subscribe to my channel.
Thank you in advance

Machine Learning And AI

16 Sep, 12:39


Recently, I completed two rounds of technical interviews for an ML Engineer role focused on LLMs, which pushed me to dive deep into concepts like attention mechanisms, tokenization, RAG, and GPU parallelism. I ended up creating a 30-page document of notes to organize my learnings.

To further solidify these concepts, I built three projects:
1️⃣ Two follow-along RAG-based "ChatPDF" projects with slight variations: one using Google Gen AI + FAISS, and another using HuggingFace + Pinecone.
2️⃣ A custom web scraper project that creates a vector store from website data and leverages advanced RAG techniques (like top-k retrieval and reranking) to provide LLM-driven answers to queries about the website.

Although the company ultimately chose another candidate who better matched their specific requirements, I received positive feedback on both rounds, and I'm excited to continue building on what I've learned. Onward and upward!

Notes: https://lnkd.in/dAvJjawc
Google Gen AI + FAISS+ Streamlit: https://lnkd.in/d7hPEz8c
Huggingface + Pinecone: https://lnkd.in/dgbJTSpq
Web scraper + Advanced RAG: https://lnkd.in/ddJfbBcF

P.S. You will need your own API keys for Google Gen AI, Pinecone, and Cohere. All of these are free to use for small projects and for learning.

Machine Learning And AI

13 Sep, 11:51


https://youtu.be/ZOJvKbbc6cw


Hi guys, a lot of you have not subscribed to my channel yet. If you're reading this message, don't forget to subscribe to my channel and share your views in the comments. At least half of you, please go and subscribe to my channel.
Thank you in advance

Machine Learning And AI

10 Sep, 15:23


ARIMA is easier than you think.

Explained in 3 minutes.

ARIMA stands for AutoRegressive Integrated Moving Average. It’s a popular method for forecasting time series data.

In simple terms, ARIMA helps us predict future values based on past data. It combines three main components: autoregression, differencing, and moving averages.

Let's break down those three parts:

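1️⃣ Autoregression (AR) means we use past values to predict future ones.

2️⃣ Differencing (the "I", for Integrated) subtracts the previous value from each value to make the data stationary, which means its statistical properties, such as the mean and variance, stay roughly constant over time.

3️⃣ The moving-average (MA) part models each value using past forecast errors, which helps the model absorb short-term shocks.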

Using ARIMA can help you make better decisions, manage inventory, and boost profits. It’s a powerful tool for anyone looking to understand trends in their data!
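To make the AR and I parts concrete, here is a minimal pure-Python sketch of an ARIMA(1,1,0) forecast: difference the series once, fit an AR(1) model to the differences with ordinary least squares, then add the forecast differences back onto the last observed value. It leaves out the MA term and uses a simplified estimator, so treat it as an illustration only; in practice a library such as statsmodels (`statsmodels.tsa.arima.model.ARIMA`) estimates all three components.

```python
def difference(series, d=1):
    """Apply first differencing d times: y'_t = y_t - y_(t-1)."""
    for _ in range(d):
        series = [curr - prev for prev, curr in zip(series, series[1:])]
    return series

def fit_ar1(series):
    """Ordinary least-squares fit of x_t = c + phi * x_(t-1) + error."""
    lagged, current = series[:-1], series[1:]
    n = len(lagged)
    mean_l, mean_c = sum(lagged) / n, sum(current) / n
    s_ll = sum((x - mean_l) ** 2 for x in lagged)
    s_lc = sum((x - mean_l) * (y - mean_c) for x, y in zip(lagged, current))
    phi = s_lc / s_ll if s_ll else 0.0  # constant differences: no AR signal
    c = mean_c - phi * mean_l
    return c, phi

def forecast_ari11(series, steps):
    """ARIMA(1,1,0) sketch: AR(1) on first differences, then undo the differencing."""
    diffs = difference(series, d=1)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # AR step on the differenced scale
        level = level + last_diff        # integrate back to the original scale
        forecasts.append(level)
    return forecasts

print(forecast_ari11([1, 3, 5, 7, 9], steps=3))  # [11.0, 13.0, 15.0]
```

For the linear-trend series [1, 3, 5, 7, 9], one round of differencing gives a constant series of 2s (a stationary series), so the sketch simply continues the trend.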

Machine Learning And AI

03 Sep, 15:16


Master SQL Window Functions 🌟

SQL window functions are key to cracking technical interviews and optimizing your SQL queries. They’re often a focal point in data-focused roles, where showing your knowledge of these functions can set you apart. By mastering these functions, you can solve complex problems efficiently and write cleaner, faster queries, making you a valuable asset in any data-driven organization.

To make it easier to understand, I have divided SQL window functions into three main categories: Aggregate, Ranking, and Value functions.

1. Aggregate Functions

Aggregate functions like AVG(), SUM(), COUNT(), MIN(), and MAX() compute values over a specified window, such as running totals or averages. Unlike GROUP BY, they return a result for every row, so you get the aggregate while retaining row-level details.
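Here is a small sketch of aggregate window functions, run through Python's built-in sqlite3 module (SQLite 3.25 or newer supports window functions). The `sales` table and its values are made up for illustration.

```python
import sqlite3

def aggregate_window_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("north", 1, 10), ("north", 2, 20), ("north", 3, 5),
         ("south", 1, 7), ("south", 2, 3)],
    )
    rows = conn.execute("""
        SELECT region, day, amount,
               -- running total: from the start of the partition up to the current row
               SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total,
               -- no ORDER BY in the window: the frame is the whole partition
               AVG(amount) OVER (PARTITION BY region) AS region_avg
        FROM sales
        ORDER BY region, day
    """).fetchall()
    conn.close()
    return rows

for row in aggregate_window_demo():
    print(row)
```

All five input rows come back, each with its own running total and its region's average, which is the row-level detail a plain GROUP BY would collapse.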

2. Ranking Functions

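Ranking functions such as ROW_NUMBER(), RANK(), and DENSE_RANK() assign row numbers or ranks based on a specified order within a partition. ROW_NUMBER() gives every row a unique number, RANK() gives tied rows the same rank and skips the following ranks, and DENSE_RANK() gives tied rows the same rank without leaving gaps. These are crucial for solving common interview problems and creating optimized queries for ordered datasets.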
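A minimal sketch comparing the three ranking functions on a tie, again using Python's sqlite3 module with a made-up `scores` table:

```python
import sqlite3

def ranking_window_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?)",
        [("ann", 90), ("bob", 85), ("cat", 90), ("dan", 70)],
    )
    rows = conn.execute("""
        SELECT name, score,
               -- name breaks the tie so ROW_NUMBER is deterministic
               ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,
               RANK()       OVER (ORDER BY score DESC)       AS rnk,
               DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk
        FROM scores
        ORDER BY score DESC, name
    """).fetchall()
    conn.close()
    return rows

for row in ranking_window_demo():
    print(row)
```

Ann and Cat tie at 90, so both get RANK() 1; Bob is then ranked 3 by RANK() but 2 by DENSE_RANK(), while ROW_NUMBER() simply counts 1 to 4.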

3. Value Functions

Value functions like LAG(), LEAD(), FIRST_VALUE(), and LAST_VALUE() allow you to access specific rows within your window. These functions are essential for trend analysis, comparisons, and detecting changes over time.
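A minimal sketch of the value functions on a made-up `prices` table, run through Python's sqlite3 module. Note the explicit frame on LAST_VALUE(): with the default frame (which ends at the current row), it would just return the current row's value.

```python
import sqlite3

def value_window_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (day INTEGER, price INTEGER)")
    conn.executemany("INSERT INTO prices VALUES (?, ?)",
                     [(1, 100), (2, 105), (3, 103), (4, 110)])
    rows = conn.execute("""
        SELECT day, price,
               LAG(price)  OVER (ORDER BY day)        AS prev_price,
               LEAD(price) OVER (ORDER BY day)        AS next_price,
               price - LAG(price) OVER (ORDER BY day) AS price_change,
               FIRST_VALUE(price) OVER (ORDER BY day) AS first_price,
               -- widen the frame, otherwise LAST_VALUE stops at the current row
               LAST_VALUE(price) OVER (
                   ORDER BY day
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               ) AS last_price
        FROM prices
        ORDER BY day
    """).fetchall()
    conn.close()
    return rows

for row in value_window_demo():
    print(row)
```

LAG() and LEAD() return NULL (None in Python) where no previous or next row exists, so the first day has no price change.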

I’ve broken down each category with examples, sample code, expected output, interview questions, and even ChatGPT prompts to help you dive deeper into SQL window functions. Whether you're preparing for an interview or looking to optimize your SQL queries, understanding these functions is a game-changer.

Machine Learning And AI

02 Sep, 01:08


Making all my interview experiences public so that I am forced to learn new things :)

Machine Learning
1. Explain 'irreducible error' with the help of a real life example
2. What two models are compared while calculating R2 in a regression setup?
3. How do you evaluate clustering algorithms?
4. What are Gini and cross-entropy? What are the minimum and maximum values for both?
5. What does the MA component mean in ARIMA models?
6. You are a senior data scientist and one of your team members suggests using KNN with a 70:30 train-test split. What must you immediately correct in their approach?

AWS & DevOps
1. What is the runtime limit for Lambda functions?
2. What do you mean by a serverless architecture?
3. Tell me any four Docker commands.
4. What is Git Checkout?
5. How does ECS help container orchestration and how could you make it serverless?
6. Can you run a docker image locally?

Generative AI
1. What is the most important reason to still use RAG when LLMs offer context windows of a million tokens?
2. How do you handle a situation where the tokens in your retrieved context exceed the number of tokens your LLM supports?
3. What is context precision and context recall in the context of RAG?
4. What is hybrid search and what are the advantages / limitations?
5. What inputs are shared when you do recursive chunking?

Machine Learning And AI

01 Sep, 08:48


🚨 Major Announcement: Mukesh Ambani to transform Rel'AI'ince into a deeptech company

He is focused on driving AI adoption across Reliance Industries Limited's operations through several initiatives:

➡️ Developing cost-effective generative AI models and partnering with tech companies to optimize AI inferencing

➡️ Introducing Jio Brain, a comprehensive suite of AI tools designed to enhance decision-making, predictions, and customer insights across Reliance’s ecosystem

➡️ Building a large-scale, AI-ready data center in Jamnagar, Gujarat, equipped with advanced AI inference facilities

➡️ Launching JioAI Cloud with a special Diwali offer of up to 100 GB of free cloud storage

➡️ Collaborating with Jio Institute to create AI programs for upskilling

➡️ Introducing "Hello Jio," a generative AI voice assistant integrated with JioTV OS to help users find content on Jio set-top boxes

➡️ Launching "JioPhoneCall AI," a feature that uses generative AI to transcribe, summarize, and translate phone calls

Machine Learning And AI

28 Aug, 18:27


📚 Understanding Linear Regression Through a Student’s Journey

Let’s take a trip back to your student days to understand linear regression, one of the most fundamental concepts in machine learning.

Alex, a dedicated student, is trying to predict their final exam score based on the number of hours they study each week. They gather data over the semester and notice a pattern: more hours studied generally lead to higher scores. To quantify this relationship, Alex uses linear regression.

What is Linear Regression?
Linear regression is like drawing a straight line through a scatterplot of data points that best predicts the dependent variable (exam scores) from the independent variable (study hours). The equation of the line looks like this:

Score = Intercept + Slope × Study Hours

Here, the intercept is the score Alex might expect with zero study hours (hopefully not too low!), and the slope shows how much the score increases with each additional hour of study.
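A minimal pure-Python sketch of fitting that line with ordinary least squares: slope = covariance(hours, scores) / variance(hours), and intercept = mean score − slope × mean hours. Alex's study-hour data below is hypothetical.

```python
def fit_line(hours, scores):
    """Ordinary least squares for: score = intercept + slope * hours."""
    n = len(hours)
    mean_x = sum(hours) / n
    mean_y = sum(scores) / n
    s_xx = sum((x - mean_x) ** 2 for x in hours)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: weekly study hours and exam scores.
hours = [2, 4, 6, 8]
scores = [60, 70, 80, 90]
intercept, slope = fit_line(hours, scores)
print(intercept, slope)       # 50.0 5.0
print(intercept + slope * 5)  # predicted score for 5 hours: 75.0
```

Here each extra hour of study adds 5 points, and the line predicts 50 points for zero hours.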

Linear regression works under several assumptions:

1. Linearity: The relationship between study hours and exam scores should be linear, meaning each additional hour of study adds roughly the same number of points. But what if the benefit of extra hours diminishes over time? That’s where the linearity assumption can break down.

2. Independence: Each data point (study hours vs. exam score) should be independent of the others. If Alex’s friends start influencing their study habits, this assumption might be violated.

3. Homoscedasticity: The variance of errors (differences between predicted and actual scores) should be consistent across all levels of study hours. If Alex’s predictions are more accurate for students who study a little but less accurate for those who study a lot, this assumption doesn’t hold.

4. Normality of Errors: The errors should follow a normal distribution. If the errors are skewed, it might suggest that factors beyond study hours are influencing scores.
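Assumptions like homoscedasticity are usually checked on the residuals (actual minus predicted score). The rough sketch below compares the average residual size in the lower and upper halves of the study-hour range; a large gap hints at non-constant variance. The data is made up, and this is a heuristic, not a formal test such as Breusch-Pagan; a residual plot is the usual first check.

```python
def fit_line(hours, scores):
    """Ordinary least squares for: score = intercept + slope * hours."""
    n = len(hours)
    mean_x, mean_y = sum(hours) / n, sum(scores) / n
    s_xx = sum((x - mean_x) ** 2 for x in hours)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope

def residual_spread(hours, scores):
    """Return residuals (ordered by hours) and the mean absolute residual
    for the lower and upper halves of the study-hour range."""
    intercept, slope = fit_line(hours, scores)
    pairs = sorted(zip(hours, scores))
    residuals = [y - (intercept + slope * x) for x, y in pairs]
    half = len(residuals) // 2
    low = sum(abs(r) for r in residuals[:half]) / half
    high = sum(abs(r) for r in residuals[half:]) / (len(residuals) - half)
    return residuals, low, high

# Hypothetical data whose scatter grows with study hours (a "fan" shape).
hours = [1, 2, 3, 4, 5, 6]
scores = [56, 59, 67, 68, 83, 72]
residuals, low, high = residual_spread(hours, scores)
print(f"low-hours spread: {low:.2f}, high-hours spread: {high:.2f}")
```

With an intercept in the model, least-squares residuals always sum to (numerically) zero, so it is their spread, not their average, that reveals problems like heteroscedasticity.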


Despite its simplicity, linear regression isn’t perfect. Here are a few limitations of linear regression.

- Non-Linearity: If the relationship between study hours and exam scores isn’t linear (e.g., diminishing returns after a certain point), linear regression might not capture the true pattern.

- Outliers: A few students who study a lot but still score poorly can heavily influence the regression line, leading to misleading predictions.

- Overfitting: If Alex adds too many variables (like study environment, type of study material, etc.), the model might become too complex, fitting the noise rather than the true signal.
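To see the outlier problem in numbers, here is a hypothetical example using a least-squares fit: four students whose scores lie on a perfect line, then a fifth who studied the most but scored poorly.

```python
def fit_line(hours, scores):
    """Ordinary least squares for: score = intercept + slope * hours."""
    n = len(hours)
    mean_x, mean_y = sum(hours) / n, sum(scores) / n
    s_xx = sum((x - mean_x) ** 2 for x in hours)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope

hours = [2, 4, 6, 8]
scores = [60, 70, 80, 90]
print(fit_line(hours, scores))                # (50.0, 5.0)
print(fit_line(hours + [10], scores + [40]))  # (74.0, -1.0)
```

A single extreme point flips the slope from +5 to −1 points per hour, which is why outliers are worth inspecting (or handling with a robust method) before trusting the fitted line.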

In Alex’s case, while linear regression provides a simple and interpretable model, it’s important to remember these assumptions and limitations. By understanding them, Alex can better assess when to rely on linear regression and when it might be necessary to explore more advanced methods.

Machine Learning And AI

28 Aug, 16:14


https://datasciencewithved.wordpress.com/2024/08/28/advanced-data-transformation-and-manipulation-with-pyspark-a-comprehensive-tutorial/

Machine Learning And AI

28 Aug, 15:46


https://datasciencewithved.wordpress.com/2024/08/28/introduction-to-pyspark-a-beginners-tutorial/

Machine Learning And AI

28 Aug, 15:28


https://datasciencewithved.wordpress.com/2024/08/28/understanding-l1-and-l2-regularization-in-machine-learning/

Machine Learning And AI

28 Aug, 12:35


Remove Consecutive Duplicates in Python from a list โ€“ Data Science With Ved
https://datasciencewithved.wordpress.com/2024/08/28/remove-consecutive-duplicates-in-python-from-a-list/
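One common way to do this in Python (not necessarily the approach in the linked post) uses `itertools.groupby`, which groups runs of equal adjacent items:

```python
from itertools import groupby

def remove_consecutive_duplicates(items):
    """Keep one element from each run of equal adjacent elements."""
    return [key for key, _group in groupby(items)]

print(remove_consecutive_duplicates([1, 1, 2, 3, 3, 3, 1]))  # [1, 2, 3, 1]
```

Only adjacent repeats are collapsed, so the final 1 survives; removing all duplicates is a different task.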

Machine Learning And AI

28 Aug, 12:35


Check out my new blog.

Machine Learning And AI

27 Aug, 16:59


https://geekycodesin.wordpress.com/2024/08/27/3-sum-closest-leetcode/
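For reference, a common approach to LeetCode's 3Sum Closest is to sort the array and then sweep two pointers inward for each fixed first element, giving O(n²) time. This is a generic sketch, not the code from the linked post, and it assumes the input has at least three numbers.

```python
def three_sum_closest(nums, target):
    """Return the sum of three numbers in nums closest to target."""
    nums = sorted(nums)
    best = nums[0] + nums[1] + nums[2]
    for i in range(len(nums) - 2):
        lo, hi = i + 1, len(nums) - 1
        while lo < hi:
            total = nums[i] + nums[lo] + nums[hi]
            if abs(total - target) < abs(best - target):
                best = total
            if total < target:
                lo += 1      # need a larger sum
            elif total > target:
                hi -= 1      # need a smaller sum
            else:
                return total  # exact match cannot be beaten
    return best

print(three_sum_closest([-1, 2, 1, -4], 1))  # 2
```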

1,388 subscribers · 195 photos · 1 video