Data science research papers @data_science_research_papers Channel on Telegram


Stay updated with the latest data science research! Join our Telegram channel for quick insights, cutting-edge papers, and trends in AI, machine learning, and big data.

Data science research papers (English)

Are you passionate about data science and always looking for the latest research papers in the field? Look no further: our Telegram channel 'Data science research papers' provides a curated collection of the most cutting-edge research articles in the world of data science. Whether you are a student, researcher, or professional in the field, our channel is the perfect resource for staying up to date with the latest advancements and discoveries. From machine learning algorithms to big data analytics, our channel covers a wide range of topics to satisfy your thirst for knowledge. Join our community of like-minded individuals who share your passion for data science and immerse yourself in research papers that will expand your horizons and deepen your understanding of this rapidly evolving field. Don't miss out on this valuable opportunity to enhance your knowledge and stay ahead of the curve. Join 'Data science research papers' today and embark on a journey of discovery and innovation!

Data science research papers

20 Nov, 07:48


Explanation-Preserving Augmentation for Semi-Supervised Graph Representation Learning


Publication date: 16 Oct 2024

Topic: Representation Learning

Paper: https://arxiv.org/pdf/2410.12657v1.pdf

GitHub: https://github.com/junxia97/simgrace

Description:

In this paper, we propose a novel method, Explanation-Preserving Augmentation (EPA), that leverages graph explanation techniques for generating augmented graphs that can bridge the gap between semantics-preservation and data-perturbation. EPA first uses a small number of labels to train a graph explainer to infer sub-structures (explanations) that are most relevant to a graph's semantics. These explanations are then used to generate semantics-preserving augmentations for self-supervised GRL, namely EPA-GRL. We demonstrate theoretically, using an analytical example, and through extensive experiments on a variety of benchmark datasets that EPA-GRL outperforms the state-of-the-art (SOTA) GRL methods, which are built upon semantics-agnostic data augmentations.
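
For illustration, here is a minimal sketch of the core augmentation idea, assuming the trained explainer has already produced a boolean relevance mask over edges (the function name and drop rate are illustrative, not from the paper):

    import numpy as np

    def explanation_preserving_drop(edge_index, expl_mask, drop_rate=0.2, rng=None):
        # edge_index: (2, E) array of graph edges; expl_mask: (E,) boolean array,
        # True where the explainer marks an edge as semantically relevant.
        # Only non-explanation edges are dropped, so the perturbation leaves
        # the graph's inferred semantics intact.
        rng = rng or np.random.default_rng()
        drop = (~expl_mask) & (rng.random(expl_mask.shape[0]) < drop_rate)
        return edge_index[:, ~drop]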

Data science research papers

18 Nov, 07:48


SimCSE: Simple Contrastive Learning of Sentence Embeddings


Publication date: EMNLP 2021

Topic: Contrastive Learning

Paper: https://arxiv.org/pdf/2104.08821v4.pdf

GitHub: https://github.com/princeton-nlp/SimCSE

Description:

This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings. We first describe an unsupervised approach, which takes an input sentence and predicts itself in a contrastive objective, with only standard dropout used as noise. This simple method works surprisingly well, performing on par with previous supervised counterparts. We find that dropout acts as minimal data augmentation, and removing it leads to a representation collapse. Then, we propose a supervised approach, which incorporates annotated pairs from natural language inference datasets into our contrastive learning framework by using "entailment" pairs as positives and "contradiction" pairs as hard negatives.
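
A minimal sketch of the unsupervised objective, assuming a PyTorch encoder kept in train mode so that dropout is the only noise (the encoder interface and the temperature value are illustrative):

    import torch
    import torch.nn.functional as F

    def simcse_unsup_loss(encoder, input_ids, attention_mask, temperature=0.05):
        # Encode the same batch twice; independent dropout masks make the two
        # views of each sentence slightly different.
        z1 = encoder(input_ids, attention_mask)   # (batch, dim)
        z2 = encoder(input_ids, attention_mask)   # (batch, dim)
        z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
        sim = z1 @ z2.t() / temperature           # cosine similarities, (batch, batch)
        labels = torch.arange(sim.size(0), device=sim.device)
        return F.cross_entropy(sim, labels)       # positives sit on the diagonal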

Data science research papers

16 Nov, 07:45


OrientedFormer: An End-to-End Transformer-Based Oriented Object Detector in Remote Sensing Images


Publication date: IEEE Transactions on Geoscience and Remote Sensing 2024

Topic: Object detection

Paper: https://arxiv.org/pdf/2409.19648v1.pdf

GitHub: https://github.com/wokaikaixinxin/OrientedFormer

Description:

In this paper, we propose an end-to-end transformer-based oriented object detector, consisting of three dedicated modules to address these issues. First, Gaussian positional encoding is proposed to encode the angle, position, and size of oriented boxes using Gaussian distributions. Second, Wasserstein self-attention is proposed to introduce geometric relations and facilitate interaction between content and positional queries by utilizing Gaussian Wasserstein distance scores. Third, oriented cross-attention is proposed to align values and positional queries by rotating sampling points around the positional query according to their angles.
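
As a sketch of the geometric ingredient only: an oriented box (cx, cy, w, h, theta) can be modeled as a 2D Gaussian and compared via the Gaussian Wasserstein distance. How the paper turns this distance into attention scores is not reproduced here:

    import numpy as np
    from scipy.linalg import sqrtm

    def box_to_gaussian(cx, cy, w, h, theta):
        # Mean at the box center; covariance from the rotated half-extents.
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        S = np.diag([(w / 2) ** 2, (h / 2) ** 2])
        return np.array([cx, cy]), R @ S @ R.T

    def gaussian_wasserstein2(mu1, S1, mu2, S2):
        # Squared 2-Wasserstein distance between two Gaussians.
        r1 = sqrtm(S1)
        cross = sqrtm(r1 @ S2 @ r1)
        return float(np.sum((mu1 - mu2) ** 2) + np.trace(S1 + S2 - 2 * np.real(cross)))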

Data science research papers

14 Nov, 07:36


KPCA-CAM: Visual Explainability of Deep Computer Vision Models using Kernel PCA


Publication date: 30 Sep 2024

Topic: Image Classification

Paper: https://arxiv.org/pdf/2410.00267v1.pdf

GitHub: https://github.com/jacobgil/pytorch-grad-cam

Description:

This research introduces KPCA-CAM, a technique designed to enhance the interpretability of Convolutional Neural Networks (CNNs) through improved class activation maps. KPCA-CAM leverages Principal Component Analysis (PCA) with the kernel trick to capture nonlinear relationships within CNN activations more effectively. By mapping data into higher-dimensional spaces with kernel functions and extracting principal components from this transformed hyperplane, KPCA-CAM provides more accurate representations of the underlying data manifold. This enables a deeper understanding of the features influencing CNN decisions.
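
A rough sketch of the idea using scikit-learn; the layer choice, kernel, and normalization below are assumptions, and the linked repo implements the actual method:

    import numpy as np
    from sklearn.decomposition import KernelPCA

    def kpca_cam(activations, kernel="rbf"):
        # activations: (C, H, W) feature maps from a chosen CNN layer.
        C, H, W = activations.shape
        X = activations.reshape(C, H * W).T        # each spatial location is a sample
        kpca = KernelPCA(n_components=1, kernel=kernel)
        cam = kpca.fit_transform(X).reshape(H, W)  # first kernel principal component
        cam = np.maximum(cam, 0)                   # keep positively contributing regions
        return cam / (cam.max() + 1e-8)            # normalize to [0, 1] for overlay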

Data science research papers

12 Nov, 07:54


MedUniSeg: 2D and 3D Medical Image Segmentation via a Prompt-driven Universal Model


Publication date: 8 Oct 2024

Topic: Semantic Segmentation

Paper: https://arxiv.org/pdf/2410.05905v1.pdf

GitHub: https://github.com/yeerwen/uniseg

Description:

We evaluate MedUniSeg on a comprehensive multi-modal upstream dataset consisting of 17 sub-datasets. The results demonstrate that MedUniSeg achieves superior multi-task segmentation performance, attaining a 1.2% improvement in the mean Dice score across the 17 upstream tasks compared to nnUNet baselines, while using less than 1/10 of the parameters. For tasks that underperform during the initial multi-task joint training, we freeze MedUniSeg and introduce new modules to re-learn these tasks. This approach yields an enhanced version, MedUniSeg*, which consistently outperforms MedUniSeg across all tasks.
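
For reference, the mean Dice score cited above can be computed as follows (a generic sketch, not the paper's evaluation code):

    import torch

    def mean_dice(pred, target, num_classes, eps=1e-6):
        # pred, target: (N, H, W) integer label maps; Dice averaged over classes.
        scores = []
        for c in range(num_classes):
            p, t = (pred == c).float(), (target == c).float()
            inter = (p * t).sum()
            scores.append((2 * inter + eps) / (p.sum() + t.sum() + eps))
        return torch.stack(scores).mean()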

Data science research papers

10 Nov, 07:25


Unsupervised Representation Learning from Sparse Transformation Analysis


Publication date: 7 Oct 2024

Topic: Representation Learning

Paper: https://arxiv.org/pdf/2410.05564v1.pdf

GitHub: https://github.com/kingjamessong/latent-flow

Description:

In this paper, we propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components. Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model, before being decoded to predict a future input state. The flow model is decomposed into a number of rotational (divergence-free) vector fields and a number of potential flow (curl-free) fields. Our sparsity prior encourages only a small number of these fields to be active at any instant and infers the speed with which the probability flows along these fields.

Data science research papers

08 Nov, 07:44


Improved Baselines with Momentum Contrastive Learning


Publication date: 9 Mar 2020

Topic: Contrastive Learning

Paper: https://arxiv.org/pdf/2003.04297v1.pdf

GitHub: https://github.com/facebookresearch/moco

Description:

Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR. In this note, we verify the effectiveness of two of SimCLR's design improvements by implementing them in the MoCo framework. With simple modifications to MoCo, namely using an MLP projection head and more data augmentation, we establish stronger baselines that outperform SimCLR and do not require large training batches. We hope this will make state-of-the-art unsupervised learning research more accessible. Code will be made public.
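
The two modifications are small enough to sketch directly; this is a hedged PyTorch illustration, with dimensions and momentum value following common MoCo settings rather than the note verbatim:

    import torch
    import torch.nn as nn

    def mlp_projection_head(dim_in, dim_out=128):
        # Change #1: a 2-layer MLP head instead of a single linear projection.
        return nn.Sequential(nn.Linear(dim_in, dim_in), nn.ReLU(), nn.Linear(dim_in, dim_out))

    @torch.no_grad()
    def momentum_update(encoder_q, encoder_k, m=0.999):
        # The key encoder is a slowly moving average of the query encoder.
        for pq, pk in zip(encoder_q.parameters(), encoder_k.parameters()):
            pk.data.mul_(m).add_(pq.data, alpha=1.0 - m)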

Data science research papers

06 Nov, 07:43


HazyDet: Open-source Benchmark for Drone-view Object Detection with Depth-cues in Hazy Scenes


Publication date: 30 Sep 2024

Topic: Object detection

Paper: https://arxiv.org/pdf/2409.19833v1.pdf

GitHub: https://github.com/grokcv/hazydet

Description:

We introduce HazyDet, a large-scale dataset tailored for drone-based object detection in hazy scenes. It encompasses 383,000 real-world instances, collected from both naturally hazy environments and normal scenes with synthetically imposed haze effects to simulate adverse weather conditions. By observing the significant variations in object scale and clarity under different depth and haze conditions, we designed a Depth Conditioned Detector (DeCoDet) to incorporate this prior knowledge. DeCoDet features a Multi-scale Depth-aware Detection Head that seamlessly integrates depth perception, with the resulting depth cues harnessed by a dynamic Depth Condition Kernel module.

Data science research papers

04 Nov, 07:32


One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation


Publication date: 9 Oct 2024

Topic: Image Classification

Paper: https://arxiv.org/pdf/2410.07170v1.pdf

GitHub: https://github.com/ml-jku/EVA

Description:

We propose to enhance LoRA by initializing the new weights in a data-driven manner by computing singular value decomposition on minibatches of activation vectors. Then, we initialize the LoRA matrices with the obtained right-singular vectors and re-distribute ranks among all weight matrices to explain the maximal amount of variance and continue the standard LoRA fine-tuning procedure. This results in our new method Explained Variance Adaptation (EVA). We apply EVA to a variety of fine-tuning tasks ranging from language generation and understanding to image classification and reinforcement learning. EVA exhibits faster convergence than competitors and attains the highest average score across a multitude of tasks per domain.
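
A minimal sketch of the data-driven initialization, assuming activations collected for a single weight matrix; the rank re-distribution across matrices is omitted:

    import torch

    def eva_init(activations, rank, d_out):
        # activations: (num_tokens, d_in) minibatch of inputs to the weight matrix.
        _, s, vh = torch.linalg.svd(activations, full_matrices=False)
        A = vh[:rank].clone()               # right-singular vectors initialize LoRA "A"
        B = torch.zeros(d_out, rank)        # zero "B" keeps the initial update a no-op
        return A, B, s[:rank]               # singular values can guide rank re-distribution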

Data science research papers

02 Nov, 07:48


Towards Natural Image Matting in the Wild via Real-Scenario Prior


Publication date: 9 Oct 2024

Topic: Semantic Segmentation

Paper: https://arxiv.org/pdf/2410.06593v1.pdf

GitHub: https://github.com/xiarho/semat

Description:

We propose SEMat which revamps the network architecture and training objectives. For network architecture, the proposed feature-aligned transformer learns to extract fine-grained edge and transparency features. The proposed matte-aligned decoder aims to segment matting-specific objects and convert coarse masks into high-precision mattes. For training objectives, the proposed regularization and trimap loss aim to retain the prior from the pre-trained model and push the matting logits extracted from the mask decoder to contain trimap-based semantic information. Extensive experiments across seven diverse datasets demonstrate the superior performance of our method, proving its efficacy in interactive natural image matting.

Data science research papers

31 Oct, 07:42


UniMatch V2: Pushing the Limit of Semi-Supervised Semantic Segmentation


Publication date: 14 Oct 2024

Topic: Semantic Segmentation

Paper: https://arxiv.org/pdf/2410.10777v1.pdf

GitHub: https://github.com/LiheYoung/UniMatch-V2

Description:

In this work, we argue that it is necessary to switch the baseline of SSS from ResNet-based encoders to more capable ViT-based encoders (e.g., DINOv2) that are pre-trained on massive data. A simple update on the encoder (even using 2x fewer parameters) can bring more significant improvement than careful method designs. Built on this competitive baseline, we present our upgraded and simplified UniMatch V2, inheriting the core spirit of weak-to-strong consistency from V1, but requiring less training cost and providing consistently better results. Additionally, given the gradually saturated performance on Pascal and Cityscapes, we argue that future work should focus on more challenging benchmarks with complex taxonomy, such as the ADE20K and COCO datasets.
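
The inherited weak-to-strong consistency can be sketched as a single loss term (a simplified view; the confidence threshold and masking below are common practice, not V2's exact recipe):

    import torch
    import torch.nn.functional as F

    def weak_to_strong_loss(model, weak_img, strong_img, conf_thresh=0.95):
        # Predictions on the weakly augmented view supervise the strongly
        # augmented one; unreliable pixels are masked out.
        with torch.no_grad():
            probs = model(weak_img).softmax(dim=1)   # (N, C, H, W)
            conf, pseudo = probs.max(dim=1)          # per-pixel confidence and label
        loss = F.cross_entropy(model(strong_img), pseudo, reduction="none")
        return (loss * (conf >= conf_thresh).float()).mean()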

Data science research papers

29 Oct, 07:19


MatMamba: A Matryoshka State Space Model


Publication date: 9 Oct 2024

Topic: Representation Learning

Paper: https://arxiv.org/pdf/2410.06718v1.pdf

GitHub: https://github.com/scaledfoundations/matmamba

Description:

In this work, we present MatMamba: a state space model which combines Matryoshka-style learning with Mamba2 by modifying the block to contain nested dimensions, enabling joint training and adaptive inference. MatMamba allows for efficient and adaptive deployment across various model sizes. We train a single large MatMamba model and are able to get a number of smaller nested models for free, while maintaining or improving upon the performance of a baseline smaller model trained from scratch. We train language and image models at a variety of parameter sizes from 35M to 1.4B. Our results on ImageNet and FineWeb show that MatMamba models scale comparably to Transformers, while having more efficient inference characteristics.
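
The nested-dimension idea can be illustrated with a toy layer whose sub-widths share the same parameters; this is a generic Matryoshka-style slicing sketch, not the modified Mamba2 block itself:

    import torch
    import torch.nn as nn

    class NestedLinear(nn.Module):
        # One full-width weight matrix; smaller sub-models are top-left slices
        # of the same parameters, so all widths can be trained jointly.
        def __init__(self, d):
            super().__init__()
            self.weight = nn.Parameter(torch.randn(d, d) / d ** 0.5)

        def forward(self, x, width):
            # x: (..., width); uses only the first `width` rows and columns.
            return x @ self.weight[:width, :width].t()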

Data science research papers

27 Oct, 07:36


Momentum Contrast for Unsupervised Visual Representation Learning


Publication date: CVPR 2020

Topic: Contrastive Learning

Paper: https://arxiv.org/pdf/1911.05722v3.pdf

GitHub: https://github.com/facebookresearch/moco

Description:

We present Momentum Contrast (MoCo) for unsupervised visual representation learning. From a perspective on contrastive learning as dictionary look-up, we build a dynamic dictionary with a queue and a moving-averaged encoder. This enables building a large and consistent dictionary on-the-fly that facilitates contrastive unsupervised learning. MoCo provides competitive results under the common linear protocol on ImageNet classification. More importantly, the representations learned by MoCo transfer well to downstream tasks. MoCo can outperform its supervised pre-training counterpart in 7 detection/segmentation tasks on PASCAL VOC, COCO, and other datasets, sometimes surpassing it by large margins.
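
The dictionary look-up view reduces to a small amount of code. This sketch follows the paper's pseudocode; queue bookkeeping (enqueuing new keys, dequeuing the oldest) is omitted:

    import torch
    import torch.nn.functional as F

    def moco_loss(q, k, queue, temperature=0.07):
        # q: (N, D) queries; k: (N, D) keys from the momentum encoder;
        # queue: (D, K) stored keys acting as negatives.
        q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
        l_pos = (q * k).sum(dim=1, keepdim=True)   # (N, 1) positive logits
        l_neg = q @ queue                          # (N, K) negative logits
        logits = torch.cat([l_pos, l_neg], dim=1) / temperature
        labels = torch.zeros(logits.size(0), dtype=torch.long, device=q.device)
        return F.cross_entropy(logits, labels)     # the positive is class 0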

Data science research papers

25 Oct, 07:40


OSSA: Unsupervised One-Shot Style Adaptation


Publication date: 1 Oct 2024

Topic: Object detection

Paper: https://arxiv.org/pdf/2410.00900v1.pdf

GitHub: https://github.com/robingerster7/ossa

Description:

We introduce One-Shot Style Adaptation (OSSA), a novel unsupervised domain adaptation method for object detection that utilizes a single, unlabeled target image to approximate the target domain style. Specifically, OSSA generates diverse target styles by perturbing the style statistics derived from a single target image and then applies these styles to a labeled source dataset at the feature level using Adaptive Instance Normalization (AdaIN). Extensive experiments show that OSSA establishes a new state-of-the-art among one-shot domain adaptation methods by a significant margin, and in some cases, even outperforms strong baselines that use thousands of unlabeled target images.
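
A sketch of the style-perturbation step at the feature level. The perturbation scheme below (Gaussian noise on channel statistics) is an assumption; see the repo for the exact formulation:

    import torch

    def perturbed_adain(content_feat, style_mean, style_std, noise_scale=0.1):
        # content_feat: (N, C, H, W) labeled-source features; style_mean/style_std:
        # (1, C, 1, 1) channel statistics from the single unlabeled target image.
        c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
        c_std = content_feat.std(dim=(2, 3), keepdim=True) + 1e-6
        # Perturb the target statistics to generate diverse plausible styles.
        s_mean = style_mean + noise_scale * torch.randn_like(style_mean)
        s_std = style_std * (1 + noise_scale * torch.randn_like(style_std))
        return ((content_feat - c_mean) / c_std) * s_std + s_mean   # AdaIN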

Data science research papers

23 Oct, 07:28


DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention


Publication date: 11 Oct 2024

Topic: Image Classification

Paper: https://arxiv.org/pdf/2410.08582v1.pdf

GitHub: https://github.com/maclong01/DeBiFormer

Description:

We propose the Deformable Bi-level Routing Attention (DBRA) module, which optimizes the selection of key-value pairs using agent queries and enhances the interpretability of queries in attention maps. Based on this, we introduce the Deformable Bi-level Routing Attention Transformer (DeBiFormer), a novel general-purpose vision transformer built with the DBRA module. DeBiFormer has been validated on various computer vision tasks, including image classification, object detection, and semantic segmentation, providing strong evidence of its effectiveness.

Data science research papers

21 Oct, 07:36


MANet: Fine-Tuning Segment Anything Model for Multimodal Remote Sensing Semantic Segmentation


Publication date: 15 Oct 2024

Topic: Semantic Segmentation

Paper: https://arxiv.org/pdf/2410.11160v1.pdf

GitHub: https://github.com/sstary/ssrs

Description:

Building upon recent advancements in vision foundation models, particularly the Segment Anything Model (SAM), this study introduces a novel Multimodal Adapter-based Network (MANet) for multimodal remote sensing semantic segmentation. At the core of this approach is the development of a Multimodal Adapter (MMAdapter), which fine-tunes SAM's image encoder to effectively leverage the model's general knowledge for multimodal data. In addition, a pyramid-based Deep Fusion Module (DFM) is incorporated to further integrate high-level geographic features across multiple scales before decoding.

Data science research papers

19 Oct, 07:07


SPA: 3D Spatial-Awareness Enables Effective Embodied Representation


Publication date: 10 Oct 2024

Topic: Representation Learning

Paper: https://arxiv.org/pdf/2410.08208v2.pdf

GitHub: https://github.com/haoyizhu/realrobot

Description:

In this paper, we introduce SPA, a novel representation learning framework that emphasizes the importance of 3D spatial awareness in embodied AI. Our approach leverages differentiable neural rendering on multi-view images to endow a vanilla Vision Transformer (ViT) with intrinsic spatial understanding. We present the most comprehensive evaluation of embodied representation learning to date, covering 268 tasks across 8 simulators with diverse policies in both single-task and language-conditioned multi-task scenarios. The results are compelling: SPA consistently outperforms more than 10 state-of-the-art representation methods, including those specifically designed for embodied AI, vision-centric tasks, and multi-modal applications, while using less training data.

Data science research papers

17 Oct, 11:59


3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection


Publication date: 2 Oct 2024

Topic: Object detection

Paper: https://arxiv.org/pdf/2410.01647v1.pdf

GitHub: https://github.com/yangcaoai/3dgs-det

Description:

We propose a Box-Focused Sampling strategy using 2D boxes to generate an object probability distribution in 3D spaces, allowing effective probabilistic sampling in 3D to retain more object blobs and reduce noisy background blobs. Benefiting from our designs, our 3DGS-DET significantly outperforms the SOTA NeRF-based method, NeRF-Det, achieving improvements of +6.6 on mAP@0.25 and +8.1 on mAP@0.5 for the ScanNet dataset, and an impressive +31.5 on mAP@0.25 for the ARKITScenes dataset.

Data science research papers

15 Oct, 11:10


Rethinking the Evaluation of Visible and Infrared Image Fusion


Publication date: 9 Oct 2024

Topic: Semantic Segmentation

Paper: https://arxiv.org/pdf/2410.06811v1.pdf

GitHub: https://github.com/linfeng-tang/psfusion

Description:

Visible and Infrared Image Fusion (VIF) has garnered significant interest across a wide range of high-level vision tasks, such as object detection and semantic segmentation. However, the evaluation of VIF methods remains challenging due to the absence of ground truth. This paper proposes a Segmentation-oriented Evaluation Approach (SEA) to assess VIF methods by incorporating the semantic segmentation task and leveraging the segmentation labels available in the latest VIF datasets. Specifically, SEA utilizes universal segmentation models, capable of handling diverse images and classes, to predict segmentation outputs from fused images and compare these outputs with segmentation labels.

Data science research papers

10 Oct, 09:13


CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset


Publication date: 1 Oct 2024

Topic: Contrastive Learning

Paper: https://arxiv.org/pdf/2410.00379v1.pdf

GitHub: https://github.com/event-ahu/medical_image_analysis

Description:

We propose a large model for X-ray image report generation using a multi-stage pre-training strategy, including self-supervised autoregressive generation and X-ray-report contrastive learning, followed by supervised fine-tuning. Extensive experimental results indicate that the autoregressive pre-training based on Mamba effectively encodes X-ray images, and the image-text contrastive pre-training further aligns the feature spaces, achieving better experimental results. Source code can be found at https://github.com/Event-AHU/Medical_Image_Analysis.
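
The X-ray-report contrastive stage follows the usual symmetric InfoNCE pattern, sketched here with assumed encoder outputs (this is not the repo's actual training code):

    import torch
    import torch.nn.functional as F

    def xray_report_contrastive_loss(img_emb, txt_emb, temperature=0.07):
        # img_emb, txt_emb: (N, D) embeddings of paired X-rays and reports.
        img_emb = F.normalize(img_emb, dim=-1)
        txt_emb = F.normalize(txt_emb, dim=-1)
        logits = img_emb @ txt_emb.t() / temperature
        labels = torch.arange(logits.size(0), device=logits.device)
        # Pull each image toward its own report and vice versa.
        return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))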

Data science research papers

07 Oct, 10:30


Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT


Publication date: 16 Sep 2024

Topic: Representation Learning

Paper: https://arxiv.org/pdf/2409.10103v1.pdf

GitHub: https://github.com/ryota-komatsu/speaker_disentangled_hubert

Description:

Self-supervised speech representation learning has become essential for extracting meaningful features from untranscribed audio. Recent advances highlight the potential of deriving discrete symbols from the features correlated with linguistic units, which enables text-less training across diverse tasks. In particular, sentence-level Self-Distillation of the pretrained HuBERT (SD-HuBERT) induces syllabic structures within latent speech frame representations extracted from an intermediate Transformer layer. In SD-HuBERT, sentence-level representation is accumulated from speech frame features through self-attention layers using a special CLS token. However, we observe that the information aggregated in the CLS token correlates more with speaker identity than with linguistic content. To address this, we propose a speech-only self-supervised fine-tuning approach that separates syllabic units from speaker information. Our method introduces speaker perturbation as data augmentation and adopts a frame-level training objective to prevent the CLS token from aggregating paralinguistic information. Experimental results show that our approach surpasses the current state-of-the-art method in most syllable segmentation and syllabic unit quality metrics on Librispeech, underscoring its effectiveness in promoting syllabic organization within speech-only models.

Data science research papers

05 Oct, 09:29


Explicitly Modeling Pre-Cortical Vision with a Neuro-Inspired Front-End Improves CNN Robustness


Publication date: 21 Sep 2024

Topic: Image Classification

Paper: https://arxiv.org/pdf/2409.16838v1.pdf

GitHub: https://github.com/dicarlolab/vonenet

Description:

While convolutional neural networks (CNNs) excel at clean image classification, they struggle to classify images corrupted with different common corruptions, limiting their real-world applicability. Recent work has shown that incorporating a CNN front-end block that simulates some features of the primate primary visual cortex (V1) can improve overall model robustness. Here, we expand on this approach by introducing two novel biologically-inspired CNN model families that incorporate a new front-end block designed to simulate pre-cortical visual processing. RetinaNet, a hybrid architecture containing the novel front-end followed by a standard CNN back-end, shows a relative robustness improvement of 12.3% when compared to the standard model; EVNet is the second of the two proposed families.

Data science research papers

03 Oct, 09:16


MCUBench: A Benchmark of Tiny Object Detectors on MCUs


Publication date: 27 Sep 2024

Topic: Object detection

Paper: https://arxiv.org/pdf/2409.18866v1.pdf

GitHub: https://github.com/deeplite/deeplite-torch-zoo

Description:

We introduce MCUBench, a benchmark featuring over 100 YOLO-based object detection models evaluated on the VOC dataset across seven different MCUs. This benchmark provides detailed data on average precision, latency, RAM, and Flash usage for various input resolutions and YOLO-based one-stage detectors. By conducting a controlled comparison with a fixed training pipeline, we collect comprehensive performance metrics. Our Pareto-optimal analysis shows that integrating modern detection heads and training techniques allows various YOLO architectures, including legacy models like YOLOv3, to achieve a highly efficient tradeoff between mean Average Precision (mAP) and latency.
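
The Pareto-optimal analysis itself is simple to reproduce for any such table; here is a generic sketch over hypothetical (name, latency, mAP) rows:

    def pareto_frontier(models):
        # models: list of (name, latency_ms, map_score) tuples. A model is
        # Pareto-optimal if no other model is both faster and more accurate.
        frontier = []
        for name, lat, score in models:
            dominated = any(l <= lat and s >= score and (l, s) != (lat, score)
                            for _, l, s in models)
            if not dominated:
                frontier.append((name, lat, score))
        return sorted(frontier, key=lambda m: m[1])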

Data science research papers

01 Oct, 07:07


LKCell: Efficient Cell Nuclei Instance Segmentation with Large Convolution Kernels


Publication date: 25 July 2024

Topic: Semantic Segmentation

Paper: https://arxiv.org/pdf/2407.18054v1.pdf

GitHub: https://github.com/hustvl/lkcell

Description:

We propose LKCell, a high-accuracy and efficient cell segmentation method. Its core insight lies in unleashing the potential of large convolution kernels to achieve computationally efficient large receptive fields. Specifically, (1) we transfer pre-trained large-convolution-kernel models to the medical domain for the first time, demonstrating their effectiveness in cell segmentation; (2) we analyze the redundancy of previous methods and design a new segmentation decoder based on large convolution kernels.

Data science research papers

29 Sep, 07:47


Non-Adversarial Learning: Vector-Quantized Common Latent Space for Multi-Sequence MRI


Publication date: 03 Jul 2024

Topic: Contrastive Learning

Paper: https://arxiv.org/pdf/2407.02911v1.pdf

GitHub: https://github.com/fiy2w/mri_seq2seq

Description:

We propose a generative model that compresses discrete representations of each sequence to estimate the Gaussian distribution of vector-quantized common (VQC) latent space between multiple sequences. Moreover, we improve the latent space consistency with contrastive learning and increase model stability by domain augmentation.
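
The vector-quantization step at the heart of the VQC latent space can be sketched as follows; codebook learning and the Gaussian estimation are omitted:

    import torch

    def vector_quantize(z, codebook):
        # z: (N, D) encoder outputs; codebook: (K, D) learned embeddings.
        idx = torch.cdist(z, codebook).argmin(dim=1)   # nearest code per latent
        z_q = codebook[idx]
        # Straight-through estimator: gradients pass to z as if quantization
        # were the identity map.
        return z + (z_q - z).detach(), idx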

Data science research papers

27 Sep, 07:40


Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs


Publication date: 24 Jun 2024

Topic: Representation Learning

Paper: https://arxiv.org/pdf/2406.16860v1.pdf

GitHub: https://github.com/cambrian-mllm/cambrian

Description:

Our study uses LLMs and visual instruction tuning as an interface to evaluate various visual representations, offering new insights into different models and architectures (self-supervised, strongly supervised, or combinations thereof) based on experiments with over 20 vision encoders. We critically examine existing MLLM benchmarks, addressing the difficulties involved in consolidating and interpreting results from various tasks, and introduce a new vision-centric benchmark, CV-Bench. To further improve visual grounding, we propose the Spatial Vision Aggregator (SVA), a dynamic and spatially-aware connector that integrates high-resolution vision features with LLMs while reducing the number of tokens.

Data science research papers

25 Sep, 07:44


COALA: A Practical and Vision-Centric Federated Learning Platform


Publication date: 23 July 2024

Topic: Object detection

Paper: https://arxiv.org/pdf/2407.16560v1.pdf

GitHub: https://github.com/sonyresearch/coala

Description:

We present COALA, a vision-centric Federated Learning (FL) platform, and a suite of benchmarks for practical FL scenarios, which we categorize into three levels: task, data, and model. At the task level, COALA extends support from simple classification to 15 computer vision tasks, including object detection, segmentation, pose estimation, and more. It also facilitates federated multiple-task learning, allowing clients to tackle multiple tasks simultaneously. At the data level, COALA goes beyond supervised FL to benchmark both semi-supervised FL and unsupervised FL. It also benchmarks feature distribution shifts other than commonly considered label distribution shifts.

Data science research papers

23 Sep, 07:55


TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning


Publication date: 21 June 2024

Topic: Image Classification

Paper: https://arxiv.org/pdf/2406.15658v1.pdf

GitHub: https://github.com/seai-lab/torchspatial

Description:

To fill this gap, we propose TorchSpatial, a learning framework and benchmark for location (point) encoding, which is one of the most fundamental data types of spatial representation learning. TorchSpatial contains three key components: 1) a unified location encoding framework that consolidates 15 commonly recognized location encoders, ensuring scalability and reproducibility of the implementations; 2) the LocBench benchmark tasks encompassing 7 geo-aware image classification and 4 geo-aware image regression datasets; 3) a comprehensive suite of evaluation metrics to quantify geo-aware models' overall performance as well as their geographic bias, with a novel Geo-Bias Score metric.
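
To give a flavor of what a location (point) encoder does, here is a multi-scale sinusoidal variant in the spirit of Space2Vec-style encoders; this is an illustrative example, not TorchSpatial's API:

    import numpy as np

    def sinusoidal_location_encoding(lon, lat, num_scales=8, min_r=1.0, max_r=360.0):
        # Expand each coordinate into sine/cosine features at geometrically
        # spaced wavelengths, so the embedding captures multiple spatial scales.
        scales = min_r * (max_r / min_r) ** (np.arange(num_scales) / max(num_scales - 1, 1))
        feats = []
        for coord in (lon, lat):
            for s in scales:
                feats += [np.sin(coord / s), np.cos(coord / s)]
        return np.array(feats)   # (4 * num_scales,) location embedding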

Data science research papers

21 Sep, 07:26


SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds


Publication date: 16 July 2024

Topic: Semantic Segmentation

Paper: https://arxiv.org/pdf/2407.11569v1.pdf

GitHub: https://github.com/Cavendish518/SFPNet

Description:

Our SFPNet is capable of extracting multi-level contexts and dynamically aggregating them using a gate mechanism. By implementing a channel-wise information query, features that incorporate both local and global contexts are encoded. We also introduce a novel large-scale hybrid-solid LiDAR semantic segmentation dataset for robotic applications. SFPNet demonstrates competitive performance on conventional benchmarks derived from mechanical spinning LiDAR, while achieving state-of-the-art results on a benchmark derived from solid-state LiDAR. Additionally, it outperforms existing methods on our novel dataset sourced from hybrid-solid LiDAR.

Data science research papers

19 Sep, 07:37


An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models


Publication date: 05 Jul 2024

Topic: Contrastive Learning

Paper: https://arxiv.org/pdf/2407.04217v1.pdf

GitHub: https://github.com/ZJU-DAILY/MQA

Description:

In this paper, we present an interactive Multi-modal Query Answering (MQA) system, empowered by our newly developed multi-modal retrieval framework and navigation graph index, integrated with cutting-edge LLMs. It comprises five core components: Data Preprocessing, Vector Representation, Index Construction, Query Execution, and Answer Generation, all orchestrated by a dedicated coordinator to ensure smooth data flow from input to answer generation. One notable aspect of MQA is its utilization of contrastive learning to assess the significance of different modalities, facilitating precise measurement of multi-modal information similarity.

Data science research papers

17 Sep, 07:37


Unified Auto-Encoding with Masked Diffusion


Publication date: 25 Jun 2024

Topic: Representation Learning

Paper: https://arxiv.org/pdf/2406.17688v1.pdf

GitHub: https://github.com/google-research/big_vision

Description:

We propose a unified self-supervised objective, dubbed Unified Masked Diffusion (UMD), that combines patch-based and noise-based corruption techniques within a single auto-encoding framework. Specifically, UMD modifies the diffusion transformer (DiT) training process by introducing an additional noise-free, high masking representation step in the diffusion noising schedule, and utilizes a mixed masked and noised image for subsequent timesteps. By integrating features useful for diffusion modeling and for predicting masked patch tokens, UMD achieves strong performance in downstream generative and representation learning tasks, including linear probing and class-conditional generation.

Data science research papers

15 Sep, 07:40


LION: Linear Group RNN for 3D Object Detection in Point Clouds


Publication date: 25 July 2024

Topic: Object detection

Paper: https://arxiv.org/pdf/2407.18232v1.pdf

GitHub: https://github.com/happinesslz/LION

Description:

We propose a simple and effective window-based framework built on LInear grOup RNN (i.e., perform linear RNN for grouped features) for accurate 3D object detection, called LION. The key property is to allow sufficient feature interaction in a much larger group than transformer-based methods. However, effectively applying linear group RNN to 3D object detection in highly sparse point clouds is not trivial due to its limitation in handling spatial modeling. To tackle this problem, we simply introduce a 3D spatial feature descriptor and integrate it into the linear group RNN operators to enhance their spatial features rather than blindly increasing the number of scanning orders for voxel features.

Data science research papers

13 Sep, 07:50


Jacobian Descent for Multi-Objective Optimization


Publication date: 23 June 2024

Topic: Image Classification

Paper: https://arxiv.org/pdf/2406.16232v1.pdf

GitHub: https://github.com/torchjd/torchjd

Description:

We propose a new aggregator specifically designed to resolve conflicts between the objectives' gradients. Emphasizing conflict between objectives, we then highlight direct applications for our methods. Most notably, we introduce instance-wise risk minimization (IWRM), a learning paradigm in which the loss of each training example is considered a separate objective. On simple image classification tasks, IWRM exhibits promising results compared to the direct minimization of the average loss. The performance of our aggregator in those experiments also corroborates our theoretical findings.

Data science research papers

11 Sep, 07:32


Mitigating Background Shift in Class-Incremental Semantic Segmentation


Publication date: 16 July 2024

Topic: Semantic Segmentation

Paper: https://paperswithcode.com/paper/mitigating-background-shift-in-class

GitHub: https://github.com/roadonep/eccv2024_mbs

Description:

Class-Incremental Semantic Segmentation (CISS) aims to learn new classes without forgetting the old ones, using only the labels of the new classes. To achieve this, two popular strategies are employed: 1) pseudo-labeling and knowledge distillation to preserve prior knowledge; and 2) background weight transfer, which leverages the broad coverage of background in learning new classes by transferring background weight to the new class classifier. However, the first strategy heavily relies on the old model to detect old classes; undetected pixels are regarded as background, leading to a background shift towards the old classes (i.e., misclassification of old classes as background). Additionally, in the second approach, initializing the new class classifier with background knowledge triggers a similar background shift, but towards the new classes. To address these issues, we propose a background-class separation framework for CISS. To begin with, selective pseudo-labeling and adaptive feature distillation are used to distill only trustworthy past knowledge. In addition, we encourage the separation between the background and new classes with a novel orthogonal objective along with label-guided output distillation.

Data science research papers

09 Sep, 07:32


Language Models Encode Collaborative Signals in Recommendation


Publication date: 07 Jul 2024

Topic: Contrastive Learning

Paper: https://arxiv.org/pdf/2407.05441v1.pdf

GitHub: https://github.com/lehengthu/alpharec

Description:

Recent studies empirically indicate that language models (LMs) encode rich world knowledge beyond mere semantics, attracting significant attention across various fields. However, in the recommendation domain, it remains uncertain whether LMs implicitly encode user preference information. Contrary to the prevailing understanding that LMs and traditional recommender models learn two distinct representation spaces due to a huge gap in language and behavior modeling objectives, this work rethinks such understanding and explores extracting a recommendation space directly from the language representation space. Surprisingly, our findings demonstrate that item representations, when linearly mapped from advanced LM representations, yield superior recommendation performance.
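
The linear-mapping finding can be sketched in a few lines; the learned map W, the mean-pooled user representation, and cosine scoring below are assumptions for illustration:

    import torch
    import torch.nn.functional as F

    def recommend(item_lm_emb, W, user_history, k=10):
        # item_lm_emb: (I, D) language-model embeddings of item texts;
        # W: (D, d) linear map from the LM space into the recommendation space;
        # user_history: indices of items the user has interacted with.
        items = F.normalize(item_lm_emb @ W, dim=-1)                # (I, d)
        user = F.normalize(items[user_history].mean(dim=0), dim=0)  # pooled profile
        scores = items @ user                                       # cosine scores
        return scores.topk(k).indices                               # top-k item ids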

2,076 subscribers, 209 photos, 1 video