Rami Krispin's Data Science Channel @ramikrispinds Channel on Telegram


Are you a data science enthusiast looking to expand your knowledge and skills in the field? Look no further than Rami Krispin's Data Science Channel on Telegram! This channel, managed by data scientist Rami Krispin himself, is dedicated to providing valuable insights, resources, and discussions related to the world of data science. Rami Krispin is a renowned expert in the field, with years of experience and a passion for sharing his knowledge with others. Whether you are just starting out in data science or are a seasoned professional, this channel has something for everyone. From tutorials and case studies to discussions on the latest trends and technologies, Rami Krispin's Data Science Channel covers it all. Join today to connect with like-minded individuals, stay up-to-date on industry news, and take your data science skills to the next level!

Rami Krispin's Data Science Channel

12 Feb, 13:59


Edition 23 is out! 🗞️

This week's agenda:
➡️ Open Source of the Week - The lang project
➡️ New learning resources - LLM course from Andrej Karpathy, trigger Airflow DAG with AI agent, running Polars on a GPU, running AI models on multiple serverless GPUs
➡️ Book of the week - Effective Visualization by Matt Harrison

Subscribe to receive weekly updates 👇🏼
https://www.linkedin.com/pulse/lang-project-effective-visualization-llm-course-more-rami-krispin-mn3oc

Rami Krispin's Data Science Channel

11 Feb, 16:46


Sounds familiar? 😅

Rami Krispin's Data Science Channel

10 Feb, 18:45


Large AI Models on Multiple Serverless GPUs in Python 🚀
This tutorial by NeuralNine provides a step-by-step guide for running AI models on multiple GPUs with Python 👇🏼

📽️ https://www.youtube.com/watch?v=xYtYB8lfxtI

Rami Krispin's Data Science Channel

10 Feb, 15:02


I came across Dive, a CLI tool for exploring Docker images. This cool tool lets you review the file system and identify layers that take up unnecessary space.

https://github.com/wagoodman/dive

License: MIT

Image credit: project repo

Rami Krispin's Data Science Channel

09 Feb, 19:46


Every week, I share a data science book in my newsletter. In this week's edition, the focus was on football analytics, dedicated to today's Super Bowl match 🏈!

The Introduction to NFL Analytics with R by Brad Congelio, as the name implies, provides an introduction to NFL sports analytics using R. The book covers the following topics:
Working and processing NFL data
NFL analytics with the nflverse libraries
Data visualization of NFL data
Modeling NFL data

Thanks to the author, the book has an open online version:
https://bradcongelio.com/nfl-analytics-with-r-book/

More details are available in the newsletter 🗞️.

Rami Krispin's Data Science Channel

08 Feb, 17:17


I thought I knew well how to optimize Python container size until I watched this talk from Matthijs Brouns 👇🏼

https://www.youtube.com/watch?v=Z1Al4I4Os_A

Rami Krispin's Data Science Channel

08 Feb, 15:53


Happy Saturday! ☀️

Continuing with the VScode extensions series 🚀. The 5th edition focuses on extensions for working with Docker containers in VScode 🐳.

Next week's edition will be the last in this series, focusing on AI extensions for VScode 🎯.

Subscribe to get weekly updates 👇🏼

https://www.linkedin.com/pulse/five-vscode-extensions-working-containers-rami-krispin-50mec

Rami Krispin's Data Science Channel

07 Feb, 14:17


Every week, I review an open-source project in my newsletter. This week, the focus is on the LazyGit project.

LazyGit is a cool CLI tool that provides a simple terminal UI for managing Git workflows. This Git UI has the following features:
Review local branches and historical commits
Commit changes
Graphical representation of the commit history
Interactive code rebase
Stage individual lines
Undo the last commit (love this functionality!)

License: MIT 🦄

Image credit: project repo

More details are available in my newsletter:
https://www.linkedin.com/pulse/lazygit-project-rami-krispin-zz82c

Rami Krispin's Data Science Channel

07 Feb, 03:57


Trigger Airflow DAGs with an AI Agent 🚀

A short tutorial by Marc Lamberti for setting up an AI Agent that can trigger Airflow DAGs 👇🏼

📽️ https://www.youtube.com/watch?v=R4UUAJjYvdI

Rami Krispin's Data Science Channel

06 Feb, 14:04


Introduction to Polars 🐻‍❄️

A short introduction to Polars by Mariya Sha 👇🏼

https://www.youtube.com/watch?v=8GoBlwgbirE

Rami Krispin's Data Science Channel

06 Feb, 03:23


Andrej Karpathy released a new course today focused on LLMs such as ChatGPT. This three-and-a-half-hour course dives into foundational topics such as processing text and language model architecture, and covers applications of foundation models such as Llama 3.1, DeepSeek-R1, etc.

https://www.youtube.com/watch?v=7xTGNNLPyMI

Rami Krispin's Data Science Channel

04 Feb, 13:31


Happy Tuesday! ☀️

Edition 22 of my newsletter is out 👇🏼

This week's agenda:
Open Source of the Week - The LazyGit project
New learning resources - Python Object Oriented Programming, Jax tutorial, Python init file, PyTest best practices
Book of the week - Introduction to NFL Analytics with R 🏈 by Brad Congelio


⭐️ Subscribe to get weekly updates
https://www.linkedin.com/pulse/lazygit-project-rami-krispin-zz82c

Rami Krispin's Data Science Channel

03 Feb, 16:47


If you are working with Docker 🐳, there is no reason your terminal inside the container shouldn't look and feel like your local one 🚀. I recently created a new dockerized Python development environment template with UV and added my Zsh settings. This includes:
Zsh and Oh-My-Zsh settings
Mounting my local Zsh history
Zsh syntax highlighting
Colorls


Planning to add a few more CLI tools such as btop, bat, fzf, zsh-autosuggestions, etc.

You can fork the template and create your own version:
https://github.com/RamiKrispin/vscode-python-uv-template

Rami Krispin's Data Science Channel

02 Feb, 21:57


The __init__.py 🐍

The following short tutorial by Tech with Tim explains the functionality of the __init__.py file in Python 👇🏼

https://www.youtube.com/watch?v=VEbuZox5qC4
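
A minimal sketch, not taken from the video, of what an `__init__.py` file does: it runs once when the package is first imported and can re-export names from submodules. The package name `demo_pkg`, the `helpers` module, the `greet` function, and the `VERSION` constant are all hypothetical and exist only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk: demo_pkg/__init__.py and demo_pkg/helpers.py
# (all names here are hypothetical, used only for illustration)
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "helpers.py").write_text("def greet(name):\n    return f'Hello, {name}!'\n")

# __init__.py is executed once, when the package is first imported.
# Here it re-exports greet so callers do not need to know about helpers.py.
(pkg / "__init__.py").write_text(
    "from .helpers import greet\n"
    "__all__ = ['greet']\n"
    "VERSION = '0.1.0'\n"
)

# Make the temporary directory importable, then import the package
sys.path.insert(0, str(root))
demo_pkg = importlib.import_module("demo_pkg")

message = demo_pkg.greet("Rami")
version = demo_pkg.VERSION
print(message, version)
```

Because `__init__.py` imports `greet` from the submodule, `demo_pkg.greet` works without importing `demo_pkg.helpers` explicitly.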

Rami Krispin's Data Science Channel

02 Feb, 19:30


JAX Tutorial

This one-hour tutorial from NeuralNine provides an introduction to the JAX library - Google's machine learning framework for transforming numerical functions

https://www.youtube.com/watch?v=wq-UsiOkBRU

Rami Krispin's Data Science Channel

02 Feb, 16:37


Every week, I share a data science book in my newsletter. In this week's edition, the focus was on one of my favorite topics - time series forecasting 🎯.

The Modern Time Series Forecasting with Python (2nd edition) by Manu Joseph and Jeffrey Tackes is a new book that focuses, as the name implies, on forecasting with Python with machine learning and deep learning models. The book covers both foundations and advanced topics in time series forecasting, such as:
Working with time series data
Exploratory analysis of time series data
Core statistical models (ARIMA, exponential smoothing, etc.)
Forecasting with regression models
Feature engineering
Forecasting with machine learning
Ensembling and stacking methods
Global forecasting models
Forecasting with deep learning

The book is for data scientists or data analysts who want to learn about forecasting using industry standards. Most of the topics in the book are explained from the ground up.

Rami Krispin's Data Science Channel

01 Feb, 18:54


TIL about The F**K, a cool CLI tool that corrects errors in previous console commands 👇🏼

https://github.com/nvbn/thefuck

Rami Krispin's Data Science Channel

01 Feb, 15:48


Happy Saturday! ☀️

Continuing with the VScode extensions series 🚀. The 4th edition focuses on extensions for working with Python in VScode. Next week's edition will focus on extensions for working with Docker in VScode 🐳.

Subscribe to get weekly updates 👇🏼

https://www.linkedin.com/pulse/five-vscode-extensions-working-python-rami-krispin-nlqnc

Rami Krispin's Data Science Channel

31 Jan, 16:50


PyTest Best Practices

The following tutorial by Josh provides examples and best practices for writing tests in Python with PyTest 👇🏼

https://www.youtube.com/watch?v=WxMFCfFRY2w

Rami Krispin's Data Science Channel

31 Jan, 12:56


The project provides a built-in UI based on a Streamlit application to analyze the results.

License: Apache 2.0 🦄

Source code: https://github.com/plurai-ai/intellagent

Image credit: project repo

Rami Krispin's Data Science Channel

24 Jan, 13:00


Edition 20 is out! 🎉

This week's agenda:
➡️ Open Source of the Week - The PydanticAI project
➡️ New learning resources - PydanticAI Tutorial, Deep Learning research explained, Feature Engineering 101, Agentic Analytics with PhiData and DuckDB, Python Full Course
➡️ Book of the week - Julia for Data Analysis by Bogumil Kamiński

https://www.linkedin.com/pulse/pydanticai-project-agentic-analytics-phidata-duckdb-julia-krispin-kqlyc

Rami Krispin's Data Science Channel

24 Jan, 03:41


Introduction to PydanticAI - a new Python framework for GenAI model validation:
https://www.youtube.com/watch?v=YKRqnWLZbpU

Rami Krispin's Data Science Channel

22 Jan, 17:19


Gaussian Elimination From Scratch in Python 🚀

The following short tutorial provides a step-by-step guide for creating a Gaussian elimination process in Python 🐍
https://www.youtube.com/watch?v=SCZEfkSSRoQ&t=328s
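
For reference, here is a minimal pure-Python sketch of Gaussian elimination. It is not the code from the video: it assumes a square system and adds partial pivoting, and the function name `gaussian_elimination` and the example system are chosen only for illustration.

```python
def gaussian_elimination(a, b):
    """Solve a x = b for a square system, using partial pivoting."""
    n = len(a)
    # Work on an augmented copy [a | b] so the inputs are left untouched
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]

        # Eliminate the entries below the pivot
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]

    # Back substitution on the resulting upper-triangular system
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


solution = gaussian_elimination(
    [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]],
    [8, -11, -3],
)
print(solution)
```

For this example system the exact solution is x = 2, y = 3, z = -1, so the printed values should match those up to floating-point rounding.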

Rami Krispin's Data Science Channel

20 Jan, 15:28


One of the things I learned when I started to write a newsletter was that the content scope should be limited to specific topics. My first newsletter editions included mixed topics, such as time series forecasting, Docker, and general data science topics.

Then I realized that while I enjoy writing about both Docker and forecasting, it is not necessarily the case that folks who are interested in forecasting are also interested in Docker and the other way around.

Therefore, I narrowed my newsletter scope and decided to dedicate separate newsletters to forecasting and MLOps/Docker. In recent weeks, I started to work on my second newsletter - the Forecaster, which will be focusing on... forecasting 🎯.

This email newsletter is going to be my brain dump about time series analysis and forecasting, and the plan is to launch it next month. While I am working on setting up the newsletter website, you can subscribe on the following link:
https://the-forecaster.beehiiv.com/subscribe

P.S. if you subscribe, please fill in the survey, thx! 🙏🏼

Rami Krispin's Data Science Channel

20 Jan, 14:08


Introduction to agent data analysis with PhiData by Mark Needham:

https://www.youtube.com/watch?v=sVBFPNW_GGc

Rami Krispin's Data Science Channel

19 Jan, 15:32


Python Course for Beginners 🚀

This Python 🐍 course by Dave Gray is a beginner-level course that focuses on the foundation of Python programming. The 9-hour course covers topics such as:
Operators
Data types
Functions, loops, and statements
Objects, classes, and OOP
Flask REST API

https://www.youtube.com/playlist?list=PL0Zuz27SZ-6MQri81d012LwP5jvFZ_scc

Rami Krispin's Data Science Channel

18 Jan, 14:24


Happy Saturday!

The second weekend edition focuses on VScode extensions, this time great utility extensions 🚀.

Next week's edition will focus on VScode extensions for working with data.

https://www.linkedin.com/pulse/five-vscode-utility-extensions-rami-krispin-ipwsc

Rami Krispin's Data Science Channel

18 Jan, 12:04


A short tutorial for running LLMs locally with Ollama:
https://www.youtube.com/watch?v=UtSSMs6ObqY

Rami Krispin's Data Science Channel

15 Jan, 17:20


Deep Learning research explained

The following 1-hour tutorial explains how to read and implement deep learning papers effectively

https://www.youtube.com/watch?v=onU5Hbb3qao

Rami Krispin's Data Science Channel

15 Jan, 04:11


Object Oriented Programming with Python 🚀

A two-hour course about OOP with Python for beginners

https://www.youtube.com/watch?v=LjmgrupmAl4

Rami Krispin's Data Science Channel

14 Jan, 13:58


Happy Tuesday! ☀️

I started to get bored with the old newsletter cover, so I decided to start leveraging Midjourney more for creating weekly covers 😅

This week's agenda:
Open Source of the Week - The RTutor project
New learning resources - Git course, Creating Python library, VScode Python settings, State-space models with deep learning, Loading data with AWS EMR and Airflow, Decorators in Python
Book of the week - Connecting the Dots by Milan Janosov

Are you enjoying reading? Please do share! 🙏🏼

https://www.linkedin.com/pulse/rtutor-project-python-resources-git-full-course-rami-krispin-isvzc

Rami Krispin's Data Science Channel

13 Jan, 12:08


Agents in Production

A new survey by YouGot.Us focuses on the deployment of AI agents in production. The survey covers the characteristics of the companies, roles, and skills in the industry that are involved in AI agent development and deployment:
https://yougot.us/news/2024-12-28-AI-Agents-Survey-Results/

Thanks to Martin Stein for sharing the survey!

Rami Krispin's Data Science Channel

13 Jan, 00:43


A short tutorial for OOP in Python
https://www.youtube.com/watch?v=rLyYb7BFgQI

Rami Krispin's Data Science Channel

11 Jan, 23:27


Feature Engineering 101

The following tutorial by NeuralNine provides an introduction to feature engineering for machine learning applications in Python.

https://www.youtube.com/watch?v=kGemHLOEF3w

Rami Krispin's Data Science Channel

11 Jan, 15:09


Five VScode extensions for Git 🚀

I started a sequence of weekend editions of my newsletter that are going to focus on VScode extensions for data science and engineering applications. This weekend - VScode extensions for Git applications 👇🏼

https://www.linkedin.com/pulse/five-vscode-extensions-git-rami-krispin-qajrc

Rami Krispin's Data Science Channel

10 Jan, 15:36


Build a REST API with Python

A step-by-step tutorial for setting up a REST API from scratch in Python.

https://www.youtube.com/watch?v=Ha3ls0EAtW8

Rami Krispin's Data Science Channel

09 Jan, 13:28


Python Full Course 🐍👇🏼

This 12-hour course by Bro Code covers the foundation of Python programming for beginners.

https://www.youtube.com/watch?v=ix9cRaBkVe0

Rami Krispin's Data Science Channel

07 Jan, 13:28


My weekly newsletter is back after the holiday break 👇🏼

This week's agenda:
Open Source of the Week - the ebook2audiobook and Latexify projects
New learning resources - a guide for setting up a new machine, 10 Python concepts, introduction to tables in R with reactable, Airflow for Beginners, Ollama + Postgres, Fine-Tuning LLMs for RAG, Introduction to Latexify
Book of the week - An Introduction to Statistical Learning


Please share this if you find it useful!

https://www.linkedin.com/pulse/introduction-statistical-learning-new-python-fine-tuning-rami-krispin-mdp3c

Rami Krispin's Data Science Channel

06 Jan, 15:18


Setting up Ruff and Pytest within a FastAPI project with UV 👇🏼

The following short tutorial demonstrates how to manage libraries within a Python project using UV.

https://www.youtube.com/watch?v=ph_XLky5pRs

Rami Krispin's Data Science Channel

05 Jan, 21:26


Python Magic Methods 👇🏼

The following tutorial explains a core concept of Python - magic (or dunder) methods - and demonstrates how to use them.

https://www.youtube.com/watch?v=qqp6QN20CpE
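
As a quick illustration (a sketch that is not taken from the video), the hypothetical `Vector` class below implements a few common dunder methods so that instances work with `repr()`, `==`, `+`, and `len()`.

```python
class Vector:
    """A tiny 2D vector, used only to illustrate dunder methods."""

    def __init__(self, x, y):
        # Called when an instance is created: Vector(1, 2)
        self.x = x
        self.y = y

    def __repr__(self):
        # Called by repr() and when printing in the interpreter
        return f"Vector({self.x}, {self.y})"

    def __eq__(self, other):
        # Called by the == operator
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __add__(self, other):
        # Called by the + operator
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.x + other.x, self.y + other.y)

    def __len__(self):
        # Called by len(); a 2D vector always has two components
        return 2


total = Vector(1, 2) + Vector(3, 4)
print(total)
```

Returning `NotImplemented` (rather than raising) lets Python try the other operand's method before giving up, which is the conventional pattern for binary dunder methods.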

Rami Krispin's Data Science Channel

04 Jan, 18:50


The ebook2audiobook is a new open-source project that enables the conversion of ebooks into audiobooks.

According to the project documentation, it supports 1124 languages 🚀.

Key features:
Web GUI interface
Docker image 🐳
Works with CPUs and GPUs

License: Apache 2.0 🦄

Source code: https://github.com/DrewThomasson/ebook2audiobook

Image credit: Project documentation

Rami Krispin's Data Science Channel

04 Jan, 17:51


A guide for setting up a Python library 👇🏼

The following tutorial by Mariya Sha provides a step-by-step guide for setting up a Python library and publishing it to the Python Package Index (PyPI), so it can be installed with pip.

https://www.youtube.com/watch?v=9Ii34WheBOA

Rami Krispin's Data Science Channel

03 Jan, 20:35


Loading data with AWS EMR and Airflow 🚀

The following tutorial by Marc Lamberti focuses on loading data into S3 Iceberg Tables with AWS EMR and Airflow.

https://www.youtube.com/watch?v=3q9PozvkV-c

Rami Krispin's Data Science Channel

03 Jan, 15:05


Decorators in Python 🚀

The following tutorial covers what decorators in Python are and when to use them:

https://www.youtube.com/watch?v=BeNH2WdETYc
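
A minimal sketch of the pattern, not taken from the video: a decorator is a function that takes a function and returns a wrapped version of it. The `count_calls` decorator and the `square` function are hypothetical names used only for this example.

```python
import functools


def count_calls(func):
    """Decorator that counts how many times `func` has been called."""

    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def square(x):
    return x * x


results = [square(n) for n in range(4)]
print(results, square.calls)
```

Decorators like this are a good fit for cross-cutting concerns such as logging, timing, or caching, where the same behavior wraps many otherwise unrelated functions.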

Rami Krispin's Data Science Channel

02 Jan, 14:20


The combination of my poor memory and the frequent need to set up a new machine led me to document the process a few years ago. Over time, it became a tutorial for setting up a new machine with core data science tools.

During the holiday break, I found the time to refresh and update this tutorial, and it covers the following topics:
Setting up git and ssh to GitHub ⚙️
Installing and setting up CLI tools 🛠️
Installing Docker 🐳
Setting up Postgres 🐘
Setting up VScode
Installing tools for Python 🐍
Installing R and Positron
General purpose tools

🔗 https://github.com/RamiKrispin/awesome-ds-setting

Rami Krispin's Data Science Channel

01 Jan, 15:39


State-Space Models and Deep Learning 🚀

The talk by Mike Erlihson focuses on using deep learning models for state space applications.

https://www.youtube.com/watch?v=15jTs82U2SI

Rami Krispin's Data Science Channel

01 Jan, 14:19


Happy New Year! 🎉

Are you considering learning a new data science or engineering skill as part of your New Year’s resolution? Here is a collection of free courses and resources covering a variety of topics, including programming, statistics, math, machine learning, deep learning, natural language processing (NLP), large language models (LLM), and more.

Please share this if you find it useful!

https://www.linkedin.com/pulse/curated-list-data-science-free-courses-rami-krispin-tfjwc

Rami Krispin's Data Science Channel

31 Dec, 21:42


A great VScode tip for creating a new folder with a shortcut 👇🏼

https://www.youtube.com/shorts/wUcYp_J93VM

Rami Krispin's Data Science Channel

31 Dec, 16:05


Hi folks,

I just wanted to wish you all a Happy New Year! 🎉

Thank you for supporting this channel! 🙏🏼

Rami

Rami Krispin's Data Science Channel

30 Dec, 18:14


A tutorial for querying Polars DataFrames with SQL using Marimo Notebooks by BugBytes 👇🏼

https://www.youtube.com/watch?v=XP4fvOsXLAM

Rami Krispin's Data Science Channel

28 Dec, 18:22


Yesterday I started playing with uv - the new Python package and project manager - and here is what I have learned so far:
Simple and easy to use
Super fast!
So far, I am aware of two main approaches to setting up a Python development environment with uv:
➡️ Setting up a virtual environment
➡️ Setting up a project

Setting up a virtual environment is fairly similar to setting up venv with python -m venv [arguments], just using uv:
uv venv [arguments]
So, if your previous workflow was based on venv, moving to uv is straightforward, and the installation is faster.
The choice between the venv and project approaches mainly comes down to your workflow and preference.

I created an initial GitHub template for a dockerized 🐳 Python 🐍 dev environment using uv:
https://github.com/RamiKrispin/vscode-python-uv-template

It will probably get updated once I start using it.

Rami Krispin's Data Science Channel

28 Dec, 12:41


VScode setting for Python 🚀
A great tutorial for setting up VScode for Python development by ArjanCodes 👇🏼

https://www.youtube.com/watch?v=PwGKhvqJCQM

Rami Krispin's Data Science Channel

27 Dec, 18:06


Here is a short tutorial by NeuralNine that provides an introduction to the latexify library:
https://www.youtube.com/watch?v=KC2R9JySqbU

Rami Krispin's Data Science Channel

27 Dec, 15:50


TIL about latexify - a Python 🐍 library from Google to generate LaTeX expressions from Python code 👇🏼

https://github.com/google/latexify_py

Rami Krispin's Data Science Channel

26 Dec, 17:50


Full Git Course 🚀

This four-hour course provides an in-depth introduction to git 👇🏼

https://www.youtube.com/watch?v=rH3zE7VlIMs

Rami Krispin's Data Science Channel

25 Dec, 16:30


Are you looking for something to learn in the coming break? My LinkedIn Learning course - Data Pipeline Automation with GitHub Actions Using R and Python - is open for a limited time 👇🏼

The course provides an introduction to setting up automation with GitHub Actions with both R and Python 🚀. Throughout the course, we will use a real-life example by working with the U.S. Energy Information Administration (EIA) API for data automation. This includes:

Learn how to work with the EIA API
Define the data pipeline scope and characteristics
Set functions to pull data and metadata from the API
Set data backfill and refresh process
Deploy the process to GitHub Actions
Create a monitoring dashboard and deploy it on GitHub Pages

Happy Holidays and Happy Automation! 😎

https://www.linkedin.com/posts/rami-krispin_data-dataengineering-datascience-activity-7277721990117957632-waOz?utm_source=share&utm_medium=member_desktop

Rami Krispin's Data Science Channel

24 Dec, 15:40


Me trying to create an image with the text "Happy Holidays & Happy New Year" using Midjourney 😎

Midjourney: No problem! Just make sure you run spell check 🤪

Me: 🤦🏻‍♂️

Happy Holidays & Happy New Year!

Rami Krispin's Data Science Channel

05 Dec, 14:30


Last but not least, a VScode dockerized project template:
https://github.com/RamiKrispin/vscode-r-template

Rami Krispin's Data Science Channel

05 Dec, 14:30


A tutorial for setting up a dockerized R development environment with VScode, Dev Containers extension, and Docker:
https://github.com/RamiKrispin/vscode-r

Rami Krispin's Data Science Channel

05 Dec, 14:29


A tutorial for customizing and launching RStudio Server inside a container with Docker Compose:
https://towardsdatascience.com/customizing-rstudio-container-with-docker-compose-60cdfe0e8894

Rami Krispin's Data Science Channel

05 Dec, 14:29


A step-by-step guide for setting up and customizing an RStudio server inside a container with your local RStudio settings:
https://medium.com/towards-data-science/running-rstudio-inside-a-container-e9db5e809ff8

Rami Krispin's Data Science Channel

05 Dec, 14:28


🐳+R = ❤️
If you want to learn how to dockerize your R development environment, here are some tutorials I created in the past year 🧶👇🏼

Rami Krispin's Data Science Channel

04 Dec, 14:46


Edition 15 is out!

This week's agenda:
Open Source of the Week - the nixtlar project
New learning resources - Gaussian Processes with PyMC, data science with the Positron IDE, introduction to multimodal embeddings, animations in Python with Matplotlib
Book of the week - Algorithms for Decision Making by Mykel J. Kochenderfer, Tim A. Wheeler, and Kyle H. Wray

https://www.linkedin.com/pulse/nixtlar-library-gaussian-processes-pymc-algorithms-decision-krispin-eigkc/

Rami Krispin's Data Science Channel

03 Dec, 13:28


The following events are based on a true story... 🤣

Credit: David Darnes

Rami Krispin's Data Science Channel

02 Dec, 04:21


Following today's post about animation with Python, here is another GREAT tutorial by Grant Sanderson. Grant Sanderson has one of the most amazing YouTube channels - 3Blue1Brown, where he uses animation to explain math concepts. He created the manim (Mathematical Animation Engine) Python library to support his tutorials 📽️👇🏼

https://www.youtube.com/watch?v=rbu7Zu5X1zI

Rami Krispin's Data Science Channel

01 Dec, 21:08


Introduction to Multimodal Embeddings by Shaw Talebi 👇🏼

https://www.youtube.com/watch?v=YOvxh_ma5qE

Rami Krispin's Data Science Channel

01 Dec, 16:15


Here is the output from the tutorial repo:

Rami Krispin's Data Science Channel

01 Dec, 16:01


Not a big fan of Matplotlib, but I love the animation plots. Here is a great tutorial for creating bar plot animation by Keith Galli 👇🏼

https://www.youtube.com/watch?v=mafzIn8TneQ

Rami Krispin's Data Science Channel

30 Nov, 22:48


An Introduction to Polars 🐻‍❄️

The Polars workshop from the PyData NYC conference is now available online. This great workshop, by Matt Harrison, focuses on the foundation of Polars 📽️ 👇🏼

https://www.youtube.com/watch?v=q3o2IdFQTOE

Rami Krispin's Data Science Channel

30 Nov, 14:14


Full review is available in this edition👆🏼👆🏼👆🏼

Rami Krispin's Data Science Channel

30 Nov, 14:13


Every week, I review open-source projects in my weekly newsletter. This week, the focus was on the Feature-engine Python library 🐍. Feature engineering is the fuel of machine learning models. The Feature-engine project, as the name implies, is a Python library for feature engineering applications. The library provides a set of tools to create features for machine learning models. This includes the following key functionality:
Tools to handle and impute missing values
Encoding of categorical variables
Tools for handling outliers
Creating new features
Functions for setting up features for time series data
Discretization
Scaling
Preprocessing

Project repo: https://github.com/feature-engine/feature_engine

Rami Krispin's Data Science Channel

29 Nov, 23:15


Times and Dates in Pandas ❤️

Here is a great talk by Reuven Lerner from the PyData Tel Aviv conference that I wish had existed when I started working with Pandas for time series applications. The talk, as the name implies, focuses on working with date and time objects in Pandas.

https://www.youtube.com/watch?v=J-7xcs8nq7s

Rami Krispin's Data Science Channel

28 Nov, 15:51


Also, me running to test it 😂

Rami Krispin's Data Science Channel

28 Nov, 15:44


Holy 🐮, TIL that Nixtla has an R version - nixtlar!

https://nixtla.github.io/nixtlar/index.html

Rami Krispin's Data Science Channel

28 Nov, 14:06


Multi-Arch Builds 🚀

Are you working on Apple Silicon (M1 - M4) and trying to deploy your code on GitHub Actions with Docker 🐳? It will probably fail, as Apple Silicon is based on the ARM64 CPU architecture and GitHub Actions uses the Intel x86 architecture. This is where multi-arch builds come into play. The following tutorial by Abhishek Veeramalla focuses on setting up multi-arch builds.

https://www.youtube.com/watch?v=lx00Do4yEpQ
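
To see which side of the ARM64/x86 split a machine is on, here is a small sketch (not from the tutorial) that maps a machine name, such as the one reported by Python's `platform.machine()`, to the matching Docker platform string. The helper name `docker_platform` and the lookup table are assumptions made for this example and cover only the two architectures discussed above.

```python
import platform

# Common machine names mapped to Docker platform strings.
# Only the two architectures discussed in the post are covered.
_DOCKER_PLATFORMS = {
    "arm64": "linux/arm64",    # e.g., macOS on Apple Silicon
    "aarch64": "linux/arm64",  # e.g., Linux on ARM64
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
}


def docker_platform(machine=None):
    """Return the Docker platform string for a machine name.

    When `machine` is omitted, the current host's architecture is used.
    """
    name = (machine or platform.machine()).lower()
    try:
        return _DOCKER_PLATFORMS[name]
    except KeyError:
        raise ValueError(f"unrecognized architecture: {name!r}") from None


print(docker_platform("arm64"), docker_platform("x86_64"))
```

Calling `docker_platform()` with no argument on a laptop and again inside a CI job is a quick way to confirm whether the two environments actually differ.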

Rami Krispin's Data Science Channel

28 Nov, 06:30


All the talks from the Øredev 2024 developers conference are now available online:
https://www.youtube.com/playlist?list=PLOUKmSqExtAFpg3krEd6CXr3uIyUgP97b

Rami Krispin's Data Science Channel

27 Nov, 22:45


Last but not least, a tutorial for setting up a multi-stage build:
https://medium.com/towards-data-science/introduction-to-multi-stage-image-build-for-python-41b94ebe8bb3

Rami Krispin's Data Science Channel

27 Nov, 22:45


An in-depth tutorial is available on the following repo:
https://github.com/RamiKrispin/vscode-python

Rami Krispin's Data Science Channel

27 Nov, 22:45


And here is the "Elegant Way" - setting up a Python development environment with VScode and the Dev Containers extension:
https://medium.com/p/f716ef85571d

Rami Krispin's Data Science Channel

27 Nov, 22:45


Set a Python environment via the command line (or the "Hard Way"):
https://medium.com/p/e62531bca7a0

Rami Krispin's Data Science Channel

27 Nov, 22:45


🐳+🐍 = ❤️
If you want to learn how to dockerize your Python development environment, here are some tutorials I created in the past year 🧶👇🏼

Rami Krispin's Data Science Channel

27 Nov, 14:16


Happy Wednesday and happy Thanksgiving 🦃 for US folks!

Edition 14 is out 🚀, this week's agenda:
Open Source of the Week - the Feature-engine project
New learning resources - Deploy data pipeline on Github Actions, time series EDA, times and dates in Pandas, ML pipeline on GitHub, Docker multi-arch builds, introduction to the dverse library, Ollama course, adaptive prediction intervals
Book of the week - Python Feature Engineering Cookbook by Soledad Galli

https://www.linkedin.com/pulse/feature-engineering-python-data-ml-pipelines-github-actions-krispin-ehs2c

Rami Krispin's Data Science Channel

26 Nov, 16:50


Ollama course 🚀

This new course by Paulo Dichone and freeCodeCamp focuses on building AI applications locally with Ollama. The course covers the following topics:
Pulling and customizing models
Python integration
RAG system

📽️ https://www.youtube.com/watch?v=GWB9ApTPTv4

Rami Krispin's Data Science Channel

25 Nov, 19:56


My talk at the PyData NYC conference 2024 about Deploy & Monitor ML Pipelines with Python 🐍, Docker 🐳 and GitHub Actions is now available online.

Thank you to the conference organizers for the invitation and for organizing this great conference!

📽️ https://www.youtube.com/watch?v=YM3UrQd2wEA

Rami Krispin's Data Science Channel

25 Nov, 18:15


The reactable R library is now available in Python! 👇🏼

https://machow.github.io/reactable-py/get-started/index.html

Rami Krispin's Data Science Channel

25 Nov, 15:08


A short intro to the new Python package manager uv by Mehdio Ouazza 👇🏼

📽️ https://www.youtube.com/watch?v=goIwKjsEPOI

Rami Krispin's Data Science Channel

24 Nov, 16:17


The book is available to purchase on Amazon:
https://www.amazon.com/Python-Feature-Engineering-Cookbook-complete-ebook/dp/B0DBJ2RYXH/

Rami Krispin's Data Science Channel

24 Nov, 16:05


Congratulations to my LinkedIn friend Soledad Galli for the release of her new book - Python Feature Engineering Cookbook! 🎉

Feature engineering is one of the core elements of data science, and I am excited to see a data science book that is fully dedicated to this topic ❤️. Soledad is the author and maintainer of the Feature-engine Python 🐍 library that, as the name implies, focuses on feature engineering for machine learning applications.

The book covers the following topics:
Multiple approaches for missing values imputation
Encode categorical variables
Transform numerical variables
Identify and handle outliers
Time series features
Features scaling

The book is for folks who are interested in getting started with machine learning applications and practitioners who wish to deepen their knowledge in this domain.

Rami Krispin's Data Science Channel

24 Nov, 15:52


This looks like an interesting project - AdalFlow for LLMops applications 👇🏼
https://github.com/SylphAI-Inc/AdalFlow

Rami Krispin's Data Science Channel

24 Nov, 04:30


Source code: https://github.com/nrennie/messy

Rami Krispin's Data Science Channel

24 Nov, 04:29


The Messy Library

The messy library is a new R project by Nicola Rennie that makes a data frame messy and untidy for learning purposes. In other words, it takes academic-like datasets (i.e., nice and clean) and turns them into messy ones by adding missing values, typos, white spaces, etc. This enables learners to encounter real-life data issues and learn how to handle them.
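
To make the idea concrete, here is a rough Python sketch of the same concept. It is not the messy package's API (messy is an R library), and the `make_messy` function, its parameters, and the sample rows are all hypothetical. It randomly replaces some values with missing ones and pads some strings with stray white space.

```python
import random


def make_messy(rows, missing_rate=0.2, space_rate=0.2, seed=0):
    """Return a degraded copy of `rows` (a list of dicts); the input is not modified."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    messy_rows = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            roll = rng.random()
            if roll < missing_rate:
                new_row[key] = None  # simulate a missing value
            elif roll < missing_rate + space_rate and isinstance(value, str):
                new_row[key] = f" {value}  "  # simulate stray white space
            else:
                new_row[key] = value
        messy_rows.append(new_row)
    return messy_rows


clean = [{"city": "Paris", "temp": 21}, {"city": "Lima", "temp": 18}]
messy = make_messy(clean)
print(messy)
```

A learner would then practice the reverse direction: detecting the missing values and trimming the white space to recover the clean data.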

Rami Krispin's Data Science Channel

21 Nov, 16:52


Multimodal AI

This short video by Shaw Talebi provides an introduction to Multimodal (Large) Language Models. This type of model combines text and video/audio models to create outputs. For example, voice -> text -> image/video.

https://www.youtube.com/watch?v=Ot2c5MKN_-w&t=346s

Rami Krispin's Data Science Channel

20 Nov, 13:08


The supplyseer Library 👇🏼

supplyseer is a new Python library for applied computational supply chain & logistics applications. This library, by Jako Rostami and Lambert Rutaganda, provides modeling solutions for supply chain and logistics challenges, from forecasting applications to inventory optimization.

Source code: https://github.com/supplyseer-ai/supplyseer/tree/develop

Rami Krispin's Data Science Channel

19 Nov, 13:03


Happy Tuesday! ☀️

Edition 13 of the newsletter is out! 🚀

This week's agenda:
Open Source of the Week - new projects: supplyseer, messy, scoutbaR, targetsboard, froggeR, and more
Learning resources - NGINX tutorial, graph neural networks, visualize data lineage with Airflow, running Ollama on Kubernetes
Book of the week - Deep Learning by Prof. John D. Kelleher

https://www.linkedin.com/pulse/new-open-source-projects-nginx-tutorial-running-ollama-rami-krispin-4ld3c

Rami Krispin's Data Science Channel

18 Nov, 04:08


That's huge news for the Keras user community. I wonder how it will impact the future of the project.

https://developers.googleblog.com/en/farewell-and-thank-you-for-the-continued-partnership-francois-chollet/

Rami Krispin's Data Science Channel

17 Nov, 23:34


This looks like an interesting webinar:
https://www.youtube.com/watch?v=_1eegWElgEM&ab_channel=CanonicalUbuntu

Rami Krispin's Data Science Channel

17 Nov, 05:22


A short tutorial for setting up a data lineage process with Airflow and Marquez by George Yates 👇🏼

📽️ https://www.youtube.com/watch?v=7cW-MCs0QpU

Rami Krispin's Data Science Channel

16 Nov, 17:50


The weekend edition of my newsletter is dedicated to Bluesky data starter packs 👇🏼

https://www.linkedin.com/pulse/bluesky-data-starter-packs-rami-krispin-cyldc/

Rami Krispin's Data Science Channel

16 Nov, 13:28


Google AI responds to a student...

Also, movie recommendation for the weekend - Terminator 2 😅

Rami Krispin's Data Science Channel

15 Nov, 20:54


Quarto Python Crash Course 🚀

This Quarto crash course by Keith Galli focuses on Quarto core applications for Python 🐍. This includes:
Quarto markdown features
Styling
Creating HTML, PDF, and Docx documents
Quarto dashboard
Quarto slides

📽️ https://www.youtube.com/watch?v=_VKxTPWDhA4

Rami Krispin's Data Science Channel

15 Nov, 02:32


Full NGINX Tutorial - Demo Project with Node.js, Docker

https://www.youtube.com/watch?v=q8OleYuqntY

Rami Krispin's Data Science Channel

14 Nov, 14:19


Build and Deploy a RAG Chatbot 👇🏼

This two-hour tutorial by Ania Kubow and freeCodeCamp focuses on the steps of building and deploying a RAG chatbot using tools such as JavaScript, LangChain.js, Next.js, Vercel, and OpenAI.

https://youtu.be/d-VKYF4Zow0

Rami Krispin's Data Science Channel

14 Nov, 01:23


A short tutorial for getting started with Polars 🐻‍❄️👇🏼
https://www.youtube.com/watch?v=gL_mGbwgSkE

Rami Krispin's Data Science Channel

13 Nov, 16:47


Project of the Week - dlt 🚀

The dlt (data load tool) Python library is a relatively new project for data engineering applications from dltHub. The library provides a framework for data ingestion from a source to a destination, for example, from an API to a Postgres database on AWS. The library's main goal is to simplify the ELT (extract, load, transform) process.

Code: https://github.com/dlt-hub/dlt
Docs: https://dlthub.com/docs/intro
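
For readers new to the ELT pattern mentioned above, here is a minimal sketch using only the Python standard library. It does not use dlt's API. The records, table names, and function names are hypothetical; an in-memory SQLite database stands in for the destination, and a hard-coded list stands in for an API response.

```python
import sqlite3


def extract():
    # Stand-in for an API response (hypothetical records)
    return [
        {"id": 1, "amount": "10.5", "currency": "usd"},
        {"id": 2, "amount": "4.0", "currency": "eur"},
    ]


def load(conn, records):
    # Load the raw data as-is; no cleaning happens at this stage
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_payments (id INTEGER, amount TEXT, currency TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_payments VALUES (:id, :amount, :currency)", records
    )


def transform(conn):
    # Transform inside the destination: cast types and normalize values
    conn.execute(
        """
        CREATE TABLE payments AS
        SELECT id, CAST(amount AS REAL) AS amount, UPPER(currency) AS currency
        FROM raw_payments
        """
    )


conn = sqlite3.connect(":memory:")
load(conn, extract())
transform(conn)
rows = conn.execute("SELECT id, amount, currency FROM payments ORDER BY id").fetchall()
print(rows)
```

The point of the ordering is that raw data lands in the destination first, and the transformation runs there afterward, which is what distinguishes ELT from ETL.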

Rami Krispin's Data Science Channel

12 Nov, 22:33


I love the improvements and the integration (I assume) of the Gemini model and WebAssembly in the Google search engine to answer coding questions. So far, it works great with core Python 🐍 applications 👇🏼

Rami Krispin's Data Science Channel

12 Nov, 12:34


This week in my newsletter:
Open Source of the Week - the dlt project
Learning resources - Hugging Face code generation with LLM, building RAG application with JS, insurance premium prediction
Book of the week - Data Science at the Command Line by Jeroen Janssens

https://www.linkedin.com/pulse/dlt-project-data-science-command-line-rami-krispin-rac4c

Rami Krispin's Data Science Channel

11 Nov, 18:20


Stanford CS234 - Reinforcement Learning 🚀

Stanford released a new course that focuses on reinforcement learning. Prof. Emma Brunskill teaches this full-semester course, which provides an introduction to the field. It includes basic topics of reinforcement learning as well as advanced topics such as deep reinforcement learning. The course requires proficiency with Python and basic knowledge of linear algebra, probability, and machine learning.

https://www.youtube.com/playlist?list=PLoROMvodv4rN4wG6Nk6sNpTEbuOSosZdX

Rami Krispin's Data Science Channel

11 Nov, 14:27


If you are using Meta's Llama 3 models or want to get started, the Llama Recipes project is a great resource. This project, as the name implies, provides a set of examples and use cases for the Llama 3 models, and it includes the following categories:
Quick start
Use cases
Integration with 3rd party tools
Responsible AI
Experimental

This project supports the latest Llama models - Llama 3.2 Vision and Llama 3.2 Text.

https://github.com/meta-llama/llama-recipes

Rami Krispin's Data Science Channel

09 Nov, 15:36


Happy Saturday! ☀️

This week's agenda:
Open Source of the Week - the Llama Recipes
Learning resources - Reinforcement learning and probabilistic methods in combinatorics courses, getting started with Quarto dashboard
Book of the week - Distributed Machine Learning Patterns by Yuan Tang

https://www.linkedin.com/pulse/llama-recipes-reinforcement-learning-probabilistic-methods-krispin-okuvc

Rami Krispin's Data Science Channel

08 Nov, 18:26


Probabilistic Methods in Combinatorics 🚀

MIT released a new graduate-level course in statistics taught by Prof. Yufei Zhao. The course provides an introduction to probabilistic methods and their applications in combinatorics and theoretical computer science.

📽️ https://www.youtube.com/playlist?list=PLUl4u3cNGP61cYB5ymvFiEbIb-wWHfaqO

Rami Krispin's Data Science Channel

08 Nov, 14:17


Quarto Dashboards 1: Hello, Dashboards! 🚀

Quarto Dashboards is one of my favorite frameworks for serverless dashboarding ❤️. It is language agnostic (R, Python, Julia, Observable) and easy to set up and deploy (e.g., GitHub Pages). Here is an introductory tutorial by Mine Çetinkaya-Rundel:

📽️: https://www.youtube.com/watch?v=HW7QbqI4fH0

This is part one of a sequence of tutorials.

Rami Krispin's Data Science Channel

03 Nov, 02:59


A great introduction to Python 🐍 f-strings 👇🏼
📽️: https://www.youtube.com/watch?v=xDTOz3qCOVM
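
As a quick taste of what f-strings do, here is a short sketch (the variable names and values are made up for illustration, and it assumes Python 3.8 or later for the `=` specifier):

```python
name = "Polars"
rows = 1234567
ratio = 0.4567

# Embed expressions directly in the string, with optional format specs:
# `:,` adds thousands separators, `:.1%` formats a fraction as a percentage
summary = f"{name}: {rows:,} rows, {ratio:.1%} missing"
print(summary)

# The `=` specifier prints both the expression and its value (handy for debugging)
debug = f"{rows=}"
print(debug)
```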

Rami Krispin's Data Science Channel

02 Nov, 18:58


Are you thinking about starting a newsletter? Here is what I learned from the first 10 weeks of publishing a weekly newsletter and why I chose LinkedIn over Substack 👇🏼

https://www.linkedin.com/feed/update/urn:li:activity:7258550383516483585/

Rami Krispin's Data Science Channel

02 Nov, 14:13


Here is a great list of 15 Python libraries for data engineering by Mehdi Ouazza:

https://www.youtube.com/watch?v=tEMhG9Pjaf4

Rami Krispin's Data Science Channel

01 Nov, 21:16


The talks from the Posit Conf 2024 are now available online:
https://www.youtube.com/playlist?list=PL9HYL-VRX0oSFkdF4fJeY63eGDvgofcbn

Rami Krispin's Data Science Channel

01 Nov, 14:14


Edition 10 is out! 🚀

This week's agenda:
Open Source of the Week - The Elmer project, Shiny new release for Python, and the Narwhals library
Learning resources - The GitHub Universe and PyData Amsterdam conferences
Book of the week - Mastering NLP from Foundations to LLMs

https://www.linkedin.com/pulse/elmer-project-new-shiny-release-python-mastering-nlp-from-krispin-r8mcc/

Rami Krispin's Data Science Channel

29 Oct, 13:01


Book of the week - Learning Python for Data 🚀

Learning Python for Data 🐍 by Matt Harrison is a great book for starting with Python for data applications. The book focuses both on the foundations of Python and a variety of data-related topics.

As someone who started with R and uses both languages, I often use this book to get a clear understanding of Python core concepts such as OOP and methods. The book covers the following topics:
Variables, objects, and functions
Iteration methods such as for and while loops
Python core data structures
Working with Pandas
NumPy applications
Classes and methods

The book is available to purchase on Amazon as well as on Matt's website:
https://store.metasnake.com/learningpy

Rami Krispin's Data Science Channel

25 Oct, 16:20


Every week, I review an open-source project in my weekly newsletter. This week, the focus is on the CopilotKit project 🪁.

I was not aware of this project until I met some of the project's contributors at the Open Source GenAI & ML Summit Europe 2024 conference. It is amazing to see the fast growth of this project and its community, with more than 12k stars ⭐️ on GitHub.

In a nutshell, this project provides a framework for AI application developers to incorporate their AI applications, similar to how data science frameworks such as Shiny, Dash, and Streamlit provide a UI for data scientists to deploy their applications 🚀.

Below is one of the cool demos available on the project repo.

Source code: https://github.com/CopilotKit/CopilotKit

More details are available in this week's edition 👉🏼: https://www.linkedin.com/pulse/copilotkit-project-data-engineering-mlops-science-fine-rami-krispin-3d7qc

Rami Krispin's Data Science Channel

25 Oct, 13:25


Writing Better R Code 🚀

Here are some great tips for writing code in R from Nicola Rennie's workshop. This includes tips for setting up projects, organizing scripts, and working with Git and GitHub from RStudio.

Slides 🔗: https://nrennie.rbind.io/training-better-r-code/slides/slides.html#/title-slide
Workshop website 🔗: https://nrennie.rbind.io/training-better-r-code/

Rami Krispin's Data Science Channel

25 Oct, 02:45


Coalesce 2024 👇🏼

The Coalesce 2024 conference is one of the main data engineering conferences, and it mainly focuses on dbt applications in data engineering. Thanks to the dbt Labs team for making the talks available online.

https://www.youtube.com/playlist?list=PL0QYlrC86xQnWJ72sJlzDqPS0peE7j9Ed

Rami Krispin's Data Science Channel

24 Oct, 13:23


Happy Thursday! ☀️

This week in my newsletter:
Open Source of the Week - the CopilotKit project
New learning resources - main focus on data engineering
Book of the week - Learning Python for Data

The next edition will be dedicated to VS Code ❤️

https://www.linkedin.com/pulse/copilotkit-project-data-engineering-mlops-science-fine-rami-krispin-3d7qc

Rami Krispin's Data Science Channel

23 Oct, 14:42


One of the things I like about running workshops about GitHub Actions is setting up some fun pipelines and seeing them work smoothly months afterward 😎.

Below is an example of an ML pipeline I created for a workshop at the useR!2024 conference four months ago. The pipeline runs daily to refresh data and forecast California's hourly demand for electricity. I used R and modeltime to set up the forecasting models 🚀.

Here are my favorite tools for setting up a pipeline with GitHub Actions:
🌟 Docker for setting the environment and deploying the code
🌟 Quarto docs to run the data and modeling pipelines, as they provide a great visual representation of the pipeline and are easier to debug
🌟 Last but not least, I use the Quarto dashboard to visualize the pipeline output

This framework is applicable to R, Python, and Julia 🚀

Here are some tips for setting up a robust pipeline:
Deploy with Docker 🐳
Set up unit tests 🛠️
Capture logs 📝

🔗 https://github.com/RamiKrispin/useR2024-pipeline-workshop
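
The "unit tests" and "capture logs" tips above can be sketched in a few lines of Python. This is a generic illustration, not code from the workshop repo; the function `refresh_data`, the record fields (`ts`, `demand`), and the logger name are all hypothetical.

```python
import logging

# Capture logs: timestamped messages make scheduled runs easier to debug later
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("pipeline")


def refresh_data(new_rows, existing_rows):
    """Append only rows with a timestamp newer than the latest one already stored."""
    latest = max((r["ts"] for r in existing_rows), default=None)
    fresh = [r for r in new_rows if latest is None or r["ts"] > latest]
    logger.info("refresh: %d new rows (latest stored ts: %s)", len(fresh), latest)
    return existing_rows + fresh


# Unit test: rows that are already stored must not be appended twice
def test_refresh_data_skips_old_rows():
    existing = [{"ts": 1, "demand": 100}]
    incoming = [{"ts": 1, "demand": 100}, {"ts": 2, "demand": 110}]
    assert refresh_data(incoming, existing) == existing + [{"ts": 2, "demand": 110}]


test_refresh_data_skips_old_rows()
```

In a real pipeline, a test like this would typically run as its own step in the workflow before the refresh job, so a broken change fails fast instead of corrupting the stored data.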

Rami Krispin's Data Science Channel

23 Oct, 02:39


Here is a great tutorial by Juan Orduz for time series forecasting with NumPyro from the PyData Amsterdam 2024 👇🏼

https://www.youtube.com/watch?v=9Q6r2w0CDB0

Rami Krispin's Data Science Channel

22 Oct, 13:43


I could not find exactly who this guy is, but he is definitely the funniest Excel person 🤣

Credit: shared by 9gag

Rami Krispin's Data Science Channel

22 Oct, 13:06


Here is an example of setting up an ETL pipeline with AWS Redshift, Airflow, and dbt by George Yates 👇🏼

https://www.youtube.com/watch?v=OH1bT7hYdgw

Rami Krispin's Data Science Channel

21 Oct, 17:48


Here is a great summary of the Python 🐍 Poetry core functionality by Eric Roby. Poetry is a Python packaging and dependency management tool, similar to conda, but a completely open-source project under an MIT license 🦄.

📽️: https://www.youtube.com/watch?v=nrm8Lre-x_8

Rami Krispin's Data Science Channel

21 Oct, 13:36


Fine-Tuning Large Language Models 🚀

I posted a few weeks ago about Oren Sultan's tutorial on fine-tuning LLMs. Here is the second part of the tutorial, which focuses on a real-life example of fine-tuning an LLM for video editing with Lightricks - a framework for processing and editing photos and videos with GenAI 🎯.

📽️: https://www.youtube.com/watch?v=8Z1T6YShwMY

This example is based on the following paper 📝:
https://arxiv.org/pdf/2410.02952

Rami Krispin's Data Science Channel

21 Oct, 01:05


Getting started with Streamlit 🚀

Here is a new Streamlit crash course by Tim (from Tech with Tim). Streamlit is a Python 🐍 library that enables you to set up interactive dashboards 🎯. The course requires basic Python knowledge.

📽️: https://www.youtube.com/watch?v=o8p7uQCGD0U

Rami Krispin's Data Science Channel

20 Oct, 17:07


All the talks and workshops from PyCon 2024 are now available online. This includes some great tutorials about data visualization with Python, data analytics with Ibis and DuckDB, and writing Python modules in Rust:
https://www.youtube.com/playlist?list=PL2Uw4_HvXqvYhjub9bw4uDAmNtprgAvlJ

Rami Krispin's Data Science Channel

19 Oct, 14:00


Happy Saturday! ☀️

Calculus Visualized is a three-course by Dennis F Davis that focuses on, as the name implies, learning calculus concepts using data visualization tools. I did not watch it, but it looks like a fun way to learn and understand calculus concepts such as derivatives, limits, integrals, etc.

📽️: https://www.youtube.com/watch?v=MO-AExWdl4Q

Rami Krispin's Data Science Channel

18 Oct, 18:11


This Python library is widely used for setting Bayesian and probabilistic applications such as MCMC algorithms, Hilbert Space Gaussian Process module, time series forecasting, Bayesian Neural Network, and others.

The library's documentation has a variety of examples of different use cases and supporting applications.

Rami Krispin's Data Science Channel

18 Oct, 18:11


Every week, I review an open-source project in my weekly newsletter. This week, the focus is on NumPyro.

NumPyro is a lightweight probabilistic programming library that provides a NumPy backend for the Pyro project (a deep universal probabilistic programming language built with Python and PyTorch). It relies on the JAX framework for automatic differentiation and JIT compilation to GPU / CPU.

Repo: https://github.com/pyro-ppl/numpyro

Rami Krispin's Data Science Channel

18 Oct, 03:25


I had the pleasure of presenting today about analyzing time series with cluster analysis methods in the "Workshops for Ukraine" series.

Workshop materials are available at the links below:
Code: https://github.com/RamiKrispin/ts-cluster-analysis-r
Dashboard: ramikrispin.github.io/ts-cluster-analysis-r/

Thanks to Dariia Mykhailyshyna for organizing the event!

Rami Krispin's Data Science Channel

18 Oct, 02:21


Building a Data Pipeline with Apache Airflow 🚀

Here is a 1-hour tutorial for setting up an ETL pipeline with Apache Airflow and Astro by Krish Naik:

https://www.youtube.com/watch?v=Y_vQyMljDsE

Rami Krispin's Data Science Channel

17 Oct, 12:28


Probability Bootcamp ❤️

The Probability Bootcamp by Prof. Steve Brunton from the University of Washington is a new crash course that focuses on the core theory of probability. It covers topics such as sampling, conditional probability, the binomial distribution, and more:
https://www.youtube.com/playlist?list=PLMrJAkhIeNNR3sNYvfgiKgcStwuPSts9V

Rami Krispin's Data Science Channel

16 Oct, 20:13


The talks from the Airflow Summit 2024 are now available online:

https://www.youtube.com/playlist?list=PLGudixcDaxY2NIjMYT8t5zA9KJ47wTCkM

Rami Krispin's Data Science Channel

15 Oct, 12:07


Happy Tuesday, edition 8 is out! ☀️

This week's agenda:
Open Source of the Week - the NumPyro project
New learning resources - Probability and stats courses, learn calculus with data visualizations, prompt engineering tutorials
Book of the week - Tidy Finance with R/Python

https://www.linkedin.com/pulse/probability-statistics-courses-numpyro-project-tidy-finance-krispin-lcs4c/