Data Science | Machine Learning with Python for Researchers @datasciencet Channel on Telegram


The Data Science and Python channel is for researchers and advanced programmers

Buy ads: https://telega.io/c/dataScienceT

Admin: @hussein_sheikho

Data Science | Machine Learning with Python for Researchers (English)

Are you a researcher or an advanced programmer looking to delve deeper into the world of data science and machine learning using Python? Look no further than the Data Science and Python channel, also known as @datasciencet. This channel is dedicated to providing valuable insights, resources, and tips for individuals interested in the fields of data science and machine learning. Whether you're a seasoned professional or just starting out, this channel offers something for everyone.

Stay up to date with the latest trends, tools, and techniques in the world of data science and machine learning. Learn how to harness the power of Python, a versatile and powerful programming language, to analyze data, build predictive models, and extract valuable insights. Connect with like-minded individuals, share your knowledge, and collaborate on exciting projects within the community.

In addition to valuable content, the Data Science and Python channel also offers opportunities to promote your own work or products. If you're interested in advertising on the channel, visit https://telega.io/c/dataScienceT for more information. The channel is managed by Admin @hussein_sheikho, who is dedicated to creating a supportive and engaging community for researchers and programmers alike.

Join the Data Science and Python channel today to take your skills to the next level and unlock new opportunities in the world of data science and machine learning. Whether you're looking to enhance your knowledge, network with professionals, or showcase your expertise, this channel has something for everyone. Don't miss out on this valuable resource for researchers and advanced programmers. Join @datasciencet today!

21 Nov, 05:43


Explore "Pretraining LLMs," a short course developed with upstageai.

The course covers pretraining from scratch, continuing pretraining on custom data, and how using smaller open-source models can reduce costs.

Take the course for free:
https://hubs.la/Q02YFKyx0

https://t.me/DataScienceT ✅

20 Nov, 12:03


📈 How to make $15,000 in a month in 2024?

Easy!!! Lisa is now the hippest trader who is showing crazy results in the market!

She was able to make over $15,000 in the last month! ❗️

Right now she has started a marathon on her channel and is running it absolutely free. 💡

To participate in the marathon, you will need to:

1. Subscribe to the channel SIGNALS BY LISA TRADER 📈
2. Write “Marathon” in private messages and start participating!

👉CLICK HERE👈

20 Nov, 09:53


🧹🪣 MOP+MiHo+NCC 🖼️👀: Image Matching Filtering and Refinement by Planes and Beyond

🖥 Github: https://github.com/fb82/miho

📕 Paper: https://arxiv.org/abs/2411.09484v1

🌟 Dataset: https://paperswithcode.com/dataset/scannet

https://t.me/DataScienceT ✅

16 Nov, 05:45


OpenCoder doesn't get enough love

They open-sourced the entire pipeline to create QwenCoder-level code models.

This includes:
- Large datasets
- High-quality models
- Eval framework

Tons of great lessons and observations in the paper

📝 Paper: arxiv.org/abs/2411.04905

https://t.me/DataScienceT ✅

14 Nov, 14:11


Coursera has launched a collaboration with the MAJOR platform to enable students to self-fund using the MAJOR platform.

Students can now access free Coursera scholarships through MAJOR.

Don't miss the opportunity: Click here.

14 Nov, 06:53


Most classical ML algorithms cannot be trained in mini-batches; they need the whole dataset at once.

This is concerning because enterprises typically deal with tabular data and classical ML algorithms, such as tree-based methods, are frequently used for modeling.

For instance, to train a random forest from sklearn, the entire dataset must be present in memory. This limits its usage to only small/intermediate datasets.

There are two ways to extend random forests to large datasets.

1) Use big-data frameworks like Spark MLlib to train them.

2) Use random patches, which I learned about from Dr. Gilles Louppe's PhD thesis, "Understanding Random Forests".

> Here's what he proposed.

Note: This approach only works in an ensemble setting. So, you would have to train multiple models.

The idea is to sample random data patches (both rows and columns) and train a decision tree model on the patch.

Repeat this step multiple times to obtain the entire random forest model.

> Here's why it works.

The core objective of Bagging is to build trees that are as different as possible.

Because each tree sees only a small patch of rows and columns, the overlap between the training data of any two trees is expected to be much smaller than in a typical random forest. This aids the Bagging objective.

His thesis presented benchmarks on 13 datasets:
- Random patches performed better than the random forest on 11 datasets.
- On the other two datasets, the difference was quite small (~0.05).

And this is how we can train a random forest model on large datasets that do not fit into memory.
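The random-patches idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis's reference implementation: the helper names `fit_random_patches` and `predict_random_patches`, the patch fractions, and the synthetic dataset are all hypothetical, and the data here is small enough to sit in memory so that the example runs anywhere.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_random_patches(X, y, n_trees=50, row_frac=0.1, col_frac=0.5, seed=0):
    """Fit one decision tree per random patch (a subset of rows AND columns)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    n_patch_rows = max(1, int(row_frac * n_rows))
    n_patch_cols = max(1, int(col_frac * n_cols))
    ensemble = []
    for _ in range(n_trees):
        rows = np.sort(rng.choice(n_rows, size=n_patch_rows, replace=False))
        cols = np.sort(rng.choice(n_cols, size=n_patch_cols, replace=False))
        # Only the sampled patch is handed to the tree. With an on-disk
        # array (for example numpy.memmap), this indexing would load just
        # the sampled rows instead of the full dataset.
        patch = X[rows][:, cols]
        tree = DecisionTreeClassifier(random_state=int(rng.integers(2**31 - 1)))
        tree.fit(patch, y[rows])
        # Remember which columns this tree was trained on.
        ensemble.append((tree, cols))
    return ensemble


def predict_random_patches(ensemble, X, classes):
    """Average per-tree class probabilities and return the most likely class."""
    classes = np.asarray(classes)  # must be sorted, e.g. from np.unique
    proba = np.zeros((X.shape[0], len(classes)))
    for tree, cols in ensemble:
        # Map the tree's own classes onto the global class order, in case
        # a patch happened to miss a class.
        idx = np.searchsorted(classes, tree.classes_)
        proba[:, idx] += tree.predict_proba(X[:, cols])
    return classes[np.argmax(proba, axis=1)]


# Synthetic stand-in for a large tabular dataset (assumption for the demo).
X, y = make_classification(
    n_samples=5000, n_features=20, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = fit_random_patches(X_train, y_train)
pred = predict_random_patches(ensemble, X_test, classes=np.unique(y_train))
accuracy = float(np.mean(pred == y_test))
print(f"trees: {len(ensemble)}, test accuracy: {accuracy:.3f}")
```

For data that does fit in memory, scikit-learn's `BaggingClassifier` with `max_samples` and `max_features` below 1.0 and `bootstrap=False`, `bootstrap_features=False` draws patches in the same spirit; the manual loop is shown here because it makes clear that each tree only ever needs its own patch.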

https://t.me/DataScienceT ⭐️

13 Nov, 12:00


🤑 EARN YOUR $100 TODAY! EASY!

Lisa Trader has launched a free marathon on her VIP channel.

Now absolutely everyone can earn from trading. It has become even easier to earn in the cryptocurrency market, you can start today!

WHAT DO YOU NEED TO START?

1. Subscribe to the channel SIGNALS BY LISA TRADER 📈.
2. Write “MARATHON” in private messages. She will then tell you how to get on the VIP channel for absolutely FREE!

👉CLICK HERE👈

12 Nov, 14:44


OmniGen: Unified Image Generation

Paper: https://arxiv.org/pdf/2409.11340v1.pdf

Code: https://github.com/vectorspacelab/omnigen

Datasets: DreamBooth - MagicBrush

https://t.me/DataScienceT ⭐️

12 Nov, 14:42


Docling Technical Report

Paper: https://arxiv.org/pdf/2408.09869v3.pdf

Code 1: https://github.com/DS4SD/docling
Code 2: https://github.com/DS4SD/docling-core

https://t.me/DataScienceT ✅

11 Nov, 20:25


📌 Practical exercises and additional materials for the book "Build a Large Language Model (From Scratch)"

A GitHub repository with practical exercises and code notebooks for developing, pretraining, and fine-tuning a GPT-style LLM, based on one of the best books on building an LLM from scratch.

▶️ About the book:
In this book, you will learn how large language models work from the inside by building your own LLM step by step, with each stage explained in clear language, diagrams, and examples.

The method described in the book mirrors the approach used to create large foundation models such as those underlying ChatGPT.

In the repository, each chapter of the book has several (3-4) applied examples as .ipynb notebooks or executable Python scripts. The code is aimed at a wide audience, is designed to run on regular laptops, and does not require specialized hardware.

▶️ The main value of the repository is the additional practical material, which helps you study the subtleties of setting up and training an LLM in more depth:

Setup

🟢 Tips on Setting Up Python
🟢 Installing Python Packages and Libraries
🟢 Docker Environment Setup Guide

Chapter 2: Working with Text Data

🟠 Comparison of different implementations of Byte Pair Encoding (BPE)
🟠 Understanding the difference between embedding layers and linear layers
🟠 Dataloader Intuition with Prime Numbers

Chapter 3: Coding Attention Mechanisms

🟢 Comparison of Efficient Implementations of Multi-Head Attention
🟢 PyTorch Buffers

Chapter 4: Implementing the GPT Model from Scratch

🟠 FLOPS Analysis

Chapter 5: Pretraining on Unlabeled Data

🟢 Alternative weight loading from Hugging Face using Transformers
🟢 Pretraining GPT on the Project Gutenberg dataset
🟢 Adding more features to the training loop
🟢 Hyperparameter optimization for pretraining
🟢 Creating a user interface for interacting with the LLM
🟢 Converting GPT to Llama
🟢 Llama 3.2 from scratch
🟢 Memory-efficient model loading

Chapter 6: Fine-tuning for Classification

🟠 More experiments on fine-tuning different layers and using larger models
🟠 Fine-tuning various models on a 50K-row IMDB movie review dataset
🟠 Building a User Interface for Interacting with a GPT-Based Spam Classifier

Chapter 7: Fine-tuning to Follow Instructions

🟢 Dataset utilities for finding near duplicates and creating passive voice entries
🟢 Evaluating instruction responses using the OpenAI and Ollama APIs
🟢 Creating a dataset for instruction fine-tuning
🟢 Improving the dataset for instruction fine-tuning
🟢 Creating a Preference Dataset with Llama 3.1 70B and Ollama
🟢 DPO for LLM alignment
🟢 Creating a user interface for interacting with the instruction fine-tuned GPT model

🖥 Github

https://t.me/DataScienceT ✅

11 Nov, 18:54


A promising digital wallet will distribute $40 for free to every user who creates an account on this wallet

Terms of creating an account: Subscribe to their channel only.

https://t.me/TronKeeperBot/app?startapp=418788114

07 Nov, 07:06


Constrained Diffusion Implicit Models!

We use diffusion models to solve noisy inverse problems like inpainting, sparse-recovery, and colorization. 10-50x faster than previous methods!

Paper: arxiv.org/pdf/2411.00359

Demo: https://t.co/m6o9GLnnZF

https://t.me/DataScienceT

06 Nov, 12:07


🎁 Your balance is credited $4,000 , the owner of the channel wants to contact you!

Dear subscriber, we would like to thank you very much for supporting our channel, and as a token of our gratitude we would like to provide you with free access to Lisa's investor channel, with the help of which you can earn today

T.me/Lisainvestor

Be sure to take advantage of our gift, admission is free, don't miss the opportunity, change your life for the better.

You can follow the link :
https://t.me/+j4-NLonPlWJmZDVh

06 Nov, 08:23


🔦 Biggest Sale Of The Year NOW ON 🔦 Double 11 Shopping Festival Event is live! Check out your most loved for less. ✨🛍️

Enjoy SPOTO Double 11 Crazy Sale to Join Lucky Draw and win gifts worth up to $1000! 💸
🎁⏯️: https://www.spotoexam.com/snsdouble11sale2024/?id=snstxrbzhussein

🔗📝 Test Your IT Skills for Free: https://bit.ly/48q8Cb3

🔗📲 Contact for 1v1 IT Certs Exam Help: https://wa.link/k0vy3x
📚 JOIN IT Study GROUP to Get Madness Discount 👇: https://chat.whatsapp.com/HqzBlMaOPci0wYvkEtcCDa

02 Nov, 04:49


Don't sleep on Vision Language Models (VLMs).

With the releases of Llama 3.2 and ColQwen2, multimodal models are gaining more and more traction.

VLMs are multimodal models that can handle image and text modalities:

Input: Image and text
Output: Text

They can be used for many use cases, including visual question answering or document understanding (as in the case of ColQwen2).

How do they work under the hood?

The main challenge in VLMs is to unify the image and text representations.

For this, a typical VLM architecture consists of the following components:

• image encoder (e.g., CLIP, SigLIP)
• embedding projector to align image and text representations
• text decoder (e.g., Vicuna, Gemma)
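
To make the data flow between these three components concrete, here is a toy NumPy sketch. It is an assumption-laden illustration, not the code of any particular model: random arrays stand in for the image encoder output and the decoder's token embeddings, the projector is reduced to a single linear layer, and all dimensions and names (`d_image`, `d_text`, `W`, and so on) are hypothetical. Real VLMs may use an MLP projector or other ways of combining the two modalities.

```python
import numpy as np

rng = np.random.default_rng(0)

d_image, d_text = 768, 512    # hypothetical encoder / decoder widths
n_patches, n_tokens = 16, 5   # hypothetical sequence lengths

# Stand-in for the image encoder output: one feature vector per image patch.
image_features = rng.normal(size=(n_patches, d_image))

# Stand-in for the decoder's token embeddings of the text prompt.
text_embeddings = rng.normal(size=(n_tokens, d_text))

# Embedding projector: a single linear layer here (weights would be learned).
W = rng.normal(scale=0.02, size=(d_image, d_text))
b = np.zeros(d_text)
image_tokens = image_features @ W + b  # shape (n_patches, d_text)

# The text decoder then receives one sequence in its own embedding space:
# projected image tokens followed by the text tokens.
decoder_input = np.concatenate([image_tokens, text_embeddings], axis=0)
print(decoder_input.shape)  # (21, 512)
```

The point of the projector is visible in the shapes: image features start in a 768-wide space the decoder cannot consume, and after projection they share the 512-wide space of the text embeddings, so both can be placed in one input sequence.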

huggingface.co/blog/vlms

https://t.me/DataScienceT

29 Oct, 19:27


📖 LLM-Agent-Paper-List is a repository of papers on agents based on large language models (LLMs). The papers are divided into categories such as LLM agent architectures, autonomous LLM agents, reinforcement learning (RL), natural language processing methods, multimodal approaches, tools for developing LLM agents, and more.

🖥 Github

https://t.me/DataScienceT ✅

27 Oct, 16:20


SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

🖥 Github: https://github.com/mark12ding/sam2long

📕 Paper: https://arxiv.org/abs/2410.16268v1

🤗 HF: https://huggingface.co/papers/2410.16268

23 Oct, 12:05


🚨With me you will make money! I have made over $20,000 in the last week! 🔥

I don't care where you are and what you can do, I will help absolutely everyone earn money.

My name is Lisa and:
✔️ I will teach you trading for FREE in a short period of time
✔️ I will give you FREE signals every day
✔️ I will help you to get income of 1,000$ in a week

Sounds unbelievable?

You have 2 hours to join our channel.

But it's true - just look at the results in my channel and JOIN FOR FREE 👉🏻 https://t.me/+fJ0XM3sZkaxkNjgx

21 Oct, 06:22


"Benchmarking Agentic Workflow Generation"! ⭐️

ArXiv:
https://arxiv.org/abs/2410.07869

Website:
https://www.zjukg.org/project/WorFBench/

Data:
https://huggingface.co/collections/zjunlp/worfbench-66fc28b8ac1c8e2672192ea1

Github:
https://github.com/zjunlp/WorFBench

https://t.me/DataScienceT ⭐

20 Oct, 08:42


Estimating body and hand motion from a pair of glasses 🤓

website:
http://egoallo.github.io

code:
http://github.com/brentyi/egoallo

https://t.me/DataScienceT 🏡

16 Oct, 12:08


LOOKING FOR A NEW SOURCE OF INCOME?
Average earnings from 100$ a day

Lisa is looking for people who want to earn money. If you are responsible, motivated and want to change your life. Welcome to her channel.

WHAT YOU NEED TO WORK:
1. phone or computer
2. Free 15-20 minutes a day
3. desire to earn

❗️ Requires 20 people ❗️
Access is available at the link below
👇

https://t.me/+NhwYZAXFlT8yZDIx

15 Oct, 20:42


Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts

💻 Github: https://github.com/freedomintelligence/apollomoe

🔖 Paper: https://arxiv.org/abs/2410.10626v1

🤗 Dataset: https://paperswithcode.com/dataset/mmlu

https://t.me/DataScienceT 🏡

12 Oct, 11:28


Generalizable and Animatable Gaussian Head Avatar

🖥 Github: https://github.com/xg-chu/gagavatar

📕 Paper: https://arxiv.org/abs/2410.07971v1

https://t.me/DataScienceT 🏡