Text Separation from Digital Images: a Pair-Copula Based Approach and Performance Analysis

Authors

  • Anandarup Roy, Sarojini Naidu College for Women
  • Oendrila Samanta, Academy of Technology, India

DOI:

https://doi.org/10.4114/intartif.vol29iss77pp92-107

Keywords:

Text extraction from image, D-vine copula, Pair-copula construction

Abstract

Automatic separation of text from digital images is important in several domains, including document processing and content-based image retrieval. This paper presents a statistical model-based approach for automatic extraction of text components from digital images. The methodology comprises two steps. The first step performs color image segmentation by means of mixture and neural models. This identifies distinct components within the image, some of which contain text. The second step separates the text components from the non-text components, which requires a learned model of text features. For this purpose, we use the ground truth text components provided by the "Born-Digital Images" dataset and extract text-representing features from them. A D-vine multivariate distribution is then fitted to these features and serves as the model for text features. The trained model is used to discriminate between the text and non-text components obtained after segmentation, by applying a statistical hypothesis test to the log-likelihood statistic. The experimental performance of the D-vine based model is compared with that of a multivariate Gaussian copula-based model, and the former generally outperforms the latter in terms of recall percentages. The segmentation algorithms are also evaluated in terms of recall and precision percentages. The novelty of this research lies in the use of D-vine modeling. A D-vine model can capture a variety of feature distributions and associations, which significantly improves the approximation of the joint distribution of the features. This, in turn, improves the method's ability to discriminate between text and non-text features.
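
The second step can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not confirm: only three features are used, every pair-copula is Gaussian (a D-vine may mix other families), the features have already been transformed to pseudo-observations in (0, 1), and the hypothesis test is approximated by an empirical quantile of the training log-likelihoods. All function names and parameters are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()   # standard normal distribution
_EPS = 1e-12        # keeps probabilities strictly inside (0, 1)


def _clip(p):
    """Clamp a probability away from 0 and 1 so inv_cdf stays defined."""
    return min(max(p, _EPS), 1.0 - _EPS)


def gauss_pair_logpdf(u, v, rho):
    """Log-density of a bivariate Gaussian copula at (u, v)."""
    x, y = _N.inv_cdf(_clip(u)), _N.inv_cdf(_clip(v))
    r2 = rho * rho
    return (-0.5 * math.log(1.0 - r2)
            - (r2 * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * (1.0 - r2)))


def gauss_h(u, v, rho):
    """Conditional CDF of u given v under a bivariate Gaussian copula."""
    x, y = _N.inv_cdf(_clip(u)), _N.inv_cdf(_clip(v))
    return _N.cdf((x - rho * y) / math.sqrt(1.0 - rho * rho))


def dvine3_logpdf(u, rho12, rho23, rho13_2):
    """Log-density of a 3-dimensional D-vine copula with variable order 1-2-3.

    Tree 1 couples (1,2) and (2,3); tree 2 couples 1 and 3 conditional on 2.
    """
    u1, u2, u3 = u
    tree1 = gauss_pair_logpdf(u1, u2, rho12) + gauss_pair_logpdf(u2, u3, rho23)
    tree2 = gauss_pair_logpdf(gauss_h(u1, u2, rho12),
                              gauss_h(u3, u2, rho23),
                              rho13_2)
    return tree1 + tree2


def threshold_from_training(loglikes, alpha=0.05):
    """Empirical alpha-quantile of training log-likelihoods (assumed rule)."""
    s = sorted(loglikes)
    k = min(int(math.floor(alpha * len(s))), len(s) - 1)
    return s[k]


def is_text(u, params, threshold):
    """Accept a component as text if its log-likelihood reaches the threshold."""
    return dvine3_logpdf(u, *params) >= threshold
```

In this sketch the parameters would be estimated from the ground truth text components, the threshold would be taken from the log-likelihoods of those same components, and each segmented component would then be passed to `is_text`. With all three correlations set to zero the D-vine reduces to the independence copula, whose log-density is zero everywhere.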

Published

2026-02-24

How to Cite

Roy, A., & Samanta, O. (2026). Text Separation from Digital Images: a Pair-Copula Based Approach and Performance Analysis. Inteligencia Artificial, 29(77), 92–107. https://doi.org/10.4114/intartif.vol29iss77pp92-107