🍡 feedmeAI

Everything Datasets

📑 arXiv 2d ago

JFinTEB: Japanese Financial Text Embedding Benchmark

JFinTEB is the first comprehensive benchmark for Japanese financial text embeddings, covering retrieval and classification tasks including sentiment analysis, document categorization, and economic survey classification. It evaluates diverse embedding models on language-specific and domain-specific financial text processing scenarios.

📑 arXiv 3d ago

MADE: A Living Benchmark for Multi-Label Text Classification with Uncertainty Quantification of Medical Device Adverse Events

MADE introduces a living multi-label text classification benchmark for medical device adverse events, continuously updated with new reports to prevent training data contamination. It features long-tailed hierarchical labels and supports evaluating uncertainty quantification, which is critical for high-stakes healthcare ML. The benchmark targets two problems: benchmark saturation and the difficulty of distinguishing memorization from reasoning.

📑 arXiv 3d ago

OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

OpenMobile is an open-source framework for synthesizing high-quality mobile agent task instructions and trajectories, achieving nearly 70% success on AndroidWorld. It features scalable task synthesis using a global environment memory and a policy-switching strategy that alternates between learner and expert models during trajectory rollout. Unlike leading closed models, it makes its training recipes transparent.

🤗 Hugging Face 6d ago

Towards Autonomous Mechanistic Reasoning in Virtual Cells

VCR-Agent is a multi-agent framework that generates mechanistic action graphs to represent biological reasoning in virtual cells, enabling verification and falsification of LLM-generated explanations. The work also releases VC-TRACES, a dataset of verified biological mechanisms, to address the challenge of producing factually grounded scientific explanations from LLMs in open-ended domains like biology.