IJRCS – Volume 3 Issue 2 Paper 2


Author’s Name : K. Ilakkia Tamil

Volume 03 Issue 02  Year 2016  ISSN No:  2349-3828  Page no: 6-10



Text embedded in images and video frames provides concise and important information about their content, which can be used to index and retrieve images and videos efficiently from large web databases. A coarse-to-fine algorithm is used to detect text lines in images and video frames with complex backgrounds. Coarse detection obtains candidate text regions by exploiting two properties, the dense intensity variation of text regions and the contrast between text and its background, using wavelet decomposition and density-based region growing. Fine detection uses texture properties to discriminate between text and non-text patterns by extracting four kinds of features: wavelet moment features, wavelet histogram features, wavelet co-occurrence features and crossing count histogram features. Before classification, the most effective of the extracted features are selected using a forward selection algorithm. Finally, a Support Vector Machine (SVM) classifier with a polynomial kernel function is used to separate text from non-text.
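The idea behind the coarse stage can be illustrated with a minimal sketch. It assumes a single-level Haar wavelet and a fixed energy threshold; the wavelet family, the number of decomposition levels and the threshold are not given in this abstract, and the function names `haar_level1`, `detail_energy` and `candidate_mask` are hypothetical. Regions with dense intensity variation produce large detail coefficients, so their summed squared detail coefficients stand out against flat background.

```python
def haar_level1(image):
    """Single-level 2D Haar decomposition of an even-sized grayscale image.

    `image` is a list of rows of numbers. Returns four sub-bands, each half
    the input size: the approximation, the row-difference detail, the
    column-difference detail and the diagonal detail.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    approx, row_diff, col_diff, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            # The 2x2 block of pixels handled in this step.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((a + b + d + e) / 2.0)  # local average (scaled)
            r_row.append((a + b - d - e) / 2.0)  # top row minus bottom row
            c_row.append((a - b + d - e) / 2.0)  # left column minus right column
            d_row.append((a - b - d + e) / 2.0)  # diagonal difference
        approx.append(a_row)
        row_diff.append(r_row)
        col_diff.append(c_row)
        diag.append(d_row)
    return approx, row_diff, col_diff, diag


def detail_energy(row_diff, col_diff, diag):
    """Sum of squared detail coefficients at each sub-band position."""
    return [
        [rd ** 2 + cd ** 2 + dg ** 2 for rd, cd, dg in zip(r_row, c_row, d_row)]
        for r_row, c_row, d_row in zip(row_diff, col_diff, diag)
    ]


def candidate_mask(energy, threshold):
    """Mark positions whose detail energy exceeds an assumed fixed threshold."""
    return [[value > threshold for value in row] for row in energy]
```

In a fuller pipeline, the marked positions would then be grouped into candidate regions (by density-based region growing in the approach described above) before the fine detection stage.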


Density-based region growing; Feature extraction; Feature selection; SVM classification; Text detection; Wavelet decomposition.

