We develop large sample theory for merged data from multiple sources. The main statistical
issues treated in this paper are that (1) the same unit may appear in multiple
datasets from overlapping data sources, (2) duplicated items are not identified and
(3) observations sampled from the same data source are dependent because of sampling without replacement.
We propose and study a new weighted empirical process and extend empirical process
theory to a dependent and biased sample with duplication. Specifically, we establish
a uniform law of large numbers and a uniform central limit theorem over a class of
functions, along with several other empirical process results, under conditions identical
to those in the i.i.d. setting. As an application, we study infinite-dimensional M-estimation
and establish its consistency, rates of convergence and asymptotic normality. Our theoretical
results are illustrated with simulation studies and a real data example.