Iraqi Digital Repository

Web Content and Usage Mining Based on Modified Clustering Techniques

Author name: احمد جبار عبيد

Supervisor name: توفيق عبد الخالق الاسدي

General topic: Computer Science

Specific topic: Computer Science

Degree: Doctorate

University: University of Babylon - Information Technology College - Department of Software

Language: English

University location: Babylon

First pages: 28T774 - p.pdf

Abstract: The growth of diverse information available on the Web, together with the massive number of users who frequently access Web services, poses several challenges for critical tasks such as controlling, monitoring, and understanding Web content. Novel techniques are therefore needed to meet modern requirements and to provide a better understanding of the colossal collection of diverse data types that grows rapidly on the Web every day.

Web Mining applies Data Mining techniques to data stored on the Web. It is classified into three categories according to the type of data used in the mining process: Web Content Mining (WCM), which is concerned with extracting useful information from the contents of Web pages; Web Usage Mining (WUM), which is concerned with discovering users' access patterns from Web usage data; and Web Structure Mining (WSM), which is concerned with extracting knowledge from the structure of hyperlinks. Web documents are the most complex data on the Web: they are scattered randomly, and many of them are created without any prior information. Clustering, an unsupervised Data Mining technique, is one of the most widely used techniques; it aims to partition objects into a set of coherent groups, such that objects in a cluster share more common patterns with each other than with objects in other clusters.

In this dissertation, the Web Mining task is divided into two parts, based on data collected from the universities of Kufa, Technology, Anbar, and Diyala. The first part handles Web documents by applying WCM techniques to the Web pages and images of the universities' Web sites, while the second part applies WUM techniques to the Web usage data collected from the Kufa University Web server. The proposed system consists of two parts: the first part uses a novel approach to pre-process and extract unobserved patterns from the text-block content of Web pages,