Papers
arxiv:2506.22141

DAPFAM: A Domain-Aware Family-level Dataset to benchmark cross domain patent retrieval

Published on Jun 27
Authors:
,

Abstract

DAPFAM is a benchmark for patent prior-art retrieval that assesses system performance across in-domain and out-domain partitions, revealing persistent challenges in cross-domain retrieval.

AI-generated summary

Patent prior-art retrieval becomes especially challenging when relevant disclosures cross technological boundaries. Existing benchmarks lack explicit domain partitions, making it difficult to assess how retrieval systems cope with such shifts. We introduce DAPFAM, a family-level benchmark with explicit IN-domain and OUT-domain partitions defined by a new IPC3 overlap scheme. The dataset contains 1,247 query families and 45,336 target families aggregated at the family level to reduce international redundancy, with citation based relevance judgments. We conduct 249 controlled experiments spanning lexical (BM25) and dense (transformer) backends, document and passage level retrieval, multiple query and document representations, aggregation strategies, and hybrid fusion via Reciprocal Rank Fusion (RRF). Results reveal a pronounced domain gap: OUT-domain performance remains roughly five times lower than IN-domain across all configurations. Passage-level retrieval consistently outperforms document-level, and dense methods provide modest gains over BM25, but none close the OUT-domain gap. Document-level RRF yields strong effectiveness efficiency trade-offs with minimal overhead. By exposing the persistent challenge of cross-domain retrieval, DAPFAM provides a reproducible, compute-aware testbed for developing more robust patent IR systems. The dataset is publicly available on huggingface at https://huggingface.co/datasets/datalyes/DAPFAM_patent.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2506.22141 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2506.22141 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.