Title:
COMBINING INFORMATION EXTRACTION AND DATA INTEGRATION IN THE ESTEST SYSTEM
Author(s):
Dean Williams and Alexandra Poulovassilis
Abstract:
We describe an approach which builds on techniques from Data Integration and Information Extraction in order to make better use of the unstructured data found in application domains such as the Semantic Web which require the integration of information from structured data sources, ontologies and text. We describe the design and implementation of the ESTEST system which integrates available structured and semi-structured data sources into a virtual global schema which is used to partially configure an information extraction process. The information extracted from the text is merged with this virtual global database and is available for query processing over the entire integrated resource. As a result of this semantic integration, new queries can now be answered which would not be possible from the structured and semi-structured data alone. We give some experimental results from the ESTEST system in use.

Title:
A FRAMEWORK FOR THE DEVELOPMENT AND DEPLOYMENT OF EVOLVING APPLICATIONS - Elaborating on the Model Driven Architecture Towards a Change-resistant Development Framework
Author(s):
Georgios Voulalas and Georgios Evangelidis
Abstract:
Software development is an R&D intensive activity, dominated by human creativity and diseconomies of scale. Current efforts focus on design patterns, reusable components and forward-engineering mechanisms as the right next stage in cutting the Gordian knot of software. Model-driven development improves productivity by introducing formal models that can be understood by computers. Through these models the problems of portability, interoperability, maintenance, and documentation are also successfully addressed. However, the problem of evolving requirements, which is more prevalent within the context of business applications, additionally calls for efficient mechanisms that ensure consistency between models and code, and enable seamless and rapid accommodation of changes without severely interrupting the operation of the deployed application. This paper introduces a framework that supports rapid development and deployment of evolving web-based applications, based on an integrated database schema. The proposed framework can be seen as an extension of the Model Driven Architecture targeting a specific family of applications.

Title:
SMART BUSINESS OBJECT - A New Approach to Model Business Objects for Web Applications
Author(s):
Xufeng (Danny) Liang and Athula Ginige
Abstract:
At present, there is a growing need to accelerate the development of web applications and to support continuous evolution of web applications due to evolving business needs. The object persistence capability and web interface generation capability in contemporary MVC (Model View Controller) web application development frameworks and the model-to-code generation capability in Model-Driven Development tools have simplified the modelling of business objects for developing web applications. However, there is still a mismatch between the current technologies and the essential support for high-level, semantic-rich modelling of web-ready business objects for rapid development of modern web applications. Therefore, we propose a novel concept called Smart Business Object (SBO) to solve the above-mentioned problem. In essence, SBOs are web-ready business objects. SBOs have high-level, web-oriented attributes such as email, URL, video, image, document, etc. This allows SBOs to be modelled at a higher level of abstraction than traditional modelling approaches. A lightweight, near-English modelling language called SBOML (Smart Business Object Modelling Language) is proposed to model SBOs. We have created a toolkit to streamline the creation (modelling) and consumption (execution) of SBOs. With these tools, we are able to build fully functional web applications in a very short time without any coding.

Title:
MEASURING EFFECTIVENESS OF COMPUTING FACILITIES IN ACADEMIC INSTITUTES - A NEW SOLUTION FOR A DIFFICULT PROBLEM
Author(s):
Smriti Sharma and Veena Bansal
Abstract:
There has been a constant effort to evaluate the success of Information Technology in organizations. This kind of investment is extremely hard to evaluate because of the difficulty of identifying tangible benefits, as well as high uncertainty about achieving the expected value. Although a lot of research has taken place in this direction, not much has been written about evaluating IT in non-profit organizations such as educational institutions. Measures for evaluating the success of IT in such institutes are markedly different from those of business organizations. The purpose of this paper is to build further upon the existing body of research by proposing a new model for measuring the effectiveness of computing facilities in academic institutes. As a baseline, DeLone & McLean’s model for measuring the success of Information Systems (DeLone & McLean 1992, DeLone & McLean 2003) is used, as it is the pioneering model in this regard.

Title:
DISCOVERY AND AUTO-COMPOSITION OF SEMANTIC WEB SERVICES
Author(s):
Philippe Larvet and Bruno Bonnin
Abstract:
In order to facilitate the on-demand delivery of new services for mobile terminals as well as for fixed phones, we propose a user-centric solution based on Semantic Service-Oriented Architecture (SSOA) for the instant building and delivery of new services composed from existing Web services discovered and assembled on the fly. This solution, based on semantic descriptions of Web services, consists of three main mechanisms: a semantic service discoverer, transparent to the user, finds the pertinent Web services matching the user's original request, expressed vocally, by SMS or as simple text; a semantic service composer, using the semantic descriptions of the Web services, combines and orchestrates the discovered services in order to build a new service fully matching the user's request; and a service deliverer makes the new service immediately accessible to the user.

Title:
PCA-BASED DATA MINING PROBABILISTIC AND FUZZY APPROACHES WITH APPLICATIONS IN PATTERN RECOGNITION
Author(s):
Luminita State, Catalina Cocianu, Panayiotis Vlamos and Viorica Stefanescu
Abstract:
The aim of the paper is to develop a new learning-by-examples PCA-based algorithm for extracting skeleton information from data, to assure both good recognition performance and generalization capabilities. Here the generalization capabilities are viewed twofold: on the one hand, to identify the right class for new samples coming from one of the classes taken into account and, on the other hand, to identify the samples coming from a new class. The classes are represented in the measurement/feature space by continuous distributions, that is, the model is given by a family of density functions indexed by H, where H stands for the finite set of hypotheses (classes). The basis of the learning process is represented by samples of possibly different sizes coming from the considered classes. The skeleton of each class is given by the principal components obtained for the corresponding sample. The recognition algorithm results from a defuzzification technique that identifies the class whose skeleton is the “nearest” to the tested example, where the closeness degree is expressed in terms of the amount of disturbance determined by the decision of allotting it to the corresponding class.

Title:
PROGRAM VERIFICATION TECHNIQUES FOR XML SCHEMA-BASED TECHNOLOGIES
Author(s):
Suad Alagic, Mark Royer and David Briggs
Abstract:
Representation and verification techniques for XML Schema types, structures, and applications in the program verification system PVS are presented. Type derivations by restriction and extension as defined in XML Schema are represented in the PVS type system using predicate subtyping. The availability of parametric polymorphism in PVS makes it possible to represent XML sequences and sets via PVS theories. Powerful PVS logic capabilities are used to express complex constraints of XML Schema and its applications. The transaction verification methodology developed in the paper is based on a declarative, logic-based specification of the frame constraints and the actual transaction updates. A sample XML application given in the paper includes constraints typical for XML schemas, such as keys and referential integrity, and in addition ordering and range constraints. The developed proof strategy is demonstrated by a sample transaction verification with respect to this schema. The overall approach has a model theory based on the view of XML types and structures as theories. The core of this model theory is also presented in the paper.

Title:
ADMIRE FRAMEWORK: DISTRIBUTED DATA MINING ON DATA GRID PLATFORMS
Author(s):
Nhien An Le Khac, Tahar Kechadi and Joe Carthy
Abstract:
In this paper, we present the ADMIRE architecture, a new framework for developing novel and innovative data mining techniques to deal with very large and distributed heterogeneous datasets in both commercial and academic applications. The main ADMIRE components are detailed, as well as its interfaces, which allow users to efficiently develop and implement their data mining applications and techniques on a Grid platform such as Globus Toolkit, DGET, etc.

Title:
USAGE TRACKING LANGUAGE: A META LANGUAGE FOR MODELLING TRACKS IN TEL SYSTEMS
Author(s):
Christophe Choquet and Sébastien Iksal
Abstract:
In the context of distance learning and teaching, the re-engineering process needs feedback on the learners' usage of the learning system. This feedback is given by numerous vectors, such as interviews, questionnaires, videos or log files. We consider that it is important to interpret tracks in order to compare the designer’s intentions with the learners’ activities during a session. In this paper, we present the usage tracking language (UTL). This language is designed to be generic, and we present an instantiation of a part of it with IMS-Learning Design, the representation model we chose for our three years of experiments.

Title:
ON CONTEXT AWARE PREDICATE SEQUENCE QUERIES
Author(s):
Hagen Höpfner
Abstract:
Due to the limited input capabilities of small mobile information system clients such as mobile phones, it is not necessary to support a descriptive query language like SQL. Furthermore, information systems with mobile clients have to address characteristics resulting from the clients' mobility as well as from wireless communications. These additional functions can be supported by a reasonable, well-defined query notation. Moreover, such systems should be context aware. In this paper we present a query notation named "context aware predicate sequence queries" which respects these issues.

Title:
CLICKSTREAM DATA MINING ASSISTANCE - A Case-Based Reasoning Task Model
Author(s):
Cristina Wanzeller and Orlando Belo
Abstract:
This paper presents a case-based reasoning system to assist users in knowledge discovery from clickstream data. The system is especially oriented to storing and making use of the knowledge acquired from experience in solving specific clickstream data mining problems inside a corporate environment. We describe the main design, implementation and characteristics of this system. The system was implemented as a prototype Web-based application, centralizing the past mining processes in a corporate memory. Its main goal is the decentralized recommendation of the most suitable mining strategies to address the problem at hand, accepting as inputs the characteristics of the available clickstream data and the analysis requirements. The system also takes advantage of and integrates related corporate information resources, supporting a semi-automated data gathering approach across the organization.

Title:
A DATA MINING APPROACH TO LEARNING PROBABILISTIC USER BEHAVIOR MODELS FROM DATABASE ACCESS LOG
Author(s):
Mikhail Petrovskiy
Abstract:
The problem of user behavior modeling arises in many fields of computer science and software development. User models play an important role in recommendation and collaborative filtering systems, in intrusion detection systems and in some tasks of software engineering. In this paper we investigate a data mining approach for learning probabilistic user behavior models from database usage logs. We propose a simple but effective procedure for translating database traces into a representation suitable for the application of user behavior modeling techniques based on sequential, associative or classification data mining models. However, most existing methods have a serious drawback: they rely on the order of actions and ignore the time intervals between actions. To avoid this problem we propose a novel method based on a combination of a decision tree classification algorithm and an empirical time-dependent feature map, motivated by potential functions theory. As a result, the designed method generates understandable probabilistic user behavior models that take time dependencies into account. The performance of the proposed method was experimentally estimated on real-world data. The comparison with the results of state-of-the-art data mining methods has confirmed outstanding performance of our method in predictive user behavior modeling and has demonstrated competitive results in anomaly detection.

Title:
VIRTUAL MUSEUM – AN IMPLEMENTATION OF A MULTIMEDIA OBJECT-ORIENTED DATABASE
Author(s):
Rodrigo Filev Maia and Jorge Rady Almeida Junior
Abstract:
This paper describes the main characteristics involved in the process of using multimedia content in Internet sites and presents a proposal for an implementation of an object-oriented database to meet the multimedia data demands of a dynamic website. An implementation of the proposed architecture is described, consisting of a virtual museum made for the Contemporary Art Museum of the USP, called Virtual MAC, which was elected the 3rd best virtual museum in the world by INFOLAC Web 2005 (UNESCO). The main objective of Virtual MAC is to create a virtual collection of works of art and make it available on the Internet. Our analysis shows that it is more appropriate to use the Object Oriented paradigm instead of Relational Modelling due to the nature of the multimedia data and the structure of the dynamic web site used for the Virtual MAC.

Title:
AN ANALYSIS OF THE EFFECTS OF SPATIAL LOCALITY ON THE CACHE PERFORMANCE OF BINARY SEARCH TREES
Author(s):
Thomas B. Puzak and Chun-Hsi Huang
Abstract:
The topological structure of binary search trees does not translate well into the linear nature of a computer's memory system, resulting in high cache miss rates on data accesses. This paper analyzes the cache performance of search operations on several varieties of binary trees. Using uniform and nonuniform key distributions, the number of cache misses encountered per search is measured for Vanilla, AVL, and two types of Cache Aware Trees. Additionally, concrete measurements of the degree of spatial locality observed in the trees are provided. This allows the trees to be evaluated for situational merit, and definitive explanations of their performance to be given. Results show that the balancing operations of AVL trees effectively negate any spatial locality gained through naive allocation schemes. Furthermore, for uniform input this paper shows that large cache lines are only beneficial to trees that consider the cache's line size in their allocation strategy. Results in the paper demonstrate that adaptive cache aware allocation schemes that approximate the key distribution of a tree have universally better performance than static systems that favor a particular key distribution.
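
As a rough illustration of why allocation order matters for spatial locality, the following Python sketch counts how many distinct cache lines a single search touches. It is not the measurement setup of the paper: the node arena, the `slot` field, the `nodes_per_line` parameter and the naive insertion-order allocation are all assumptions made for this example, and distinct lines on one search path are only a proxy for cold-cache misses.

```python
class BstNode:
    def __init__(self, key, slot):
        self.key = key
        self.slot = slot      # position in a hypothetical contiguous node arena
        self.left = None
        self.right = None


def build_bst(keys):
    """Build an unbalanced BST; nodes get arena slots in insertion order."""
    root = None
    for slot, key in enumerate(keys):
        node = BstNode(key, slot)
        if root is None:
            root = node
            continue
        cur = root
        while True:
            if key < cur.key:
                if cur.left is None:
                    cur.left = node
                    break
                cur = cur.left
            else:
                if cur.right is None:
                    cur.right = node
                    break
                cur = cur.right
    return root


def lines_touched(root, key, nodes_per_line):
    """Count the distinct cache lines visited while searching for `key`."""
    lines = set()
    cur = root
    while cur is not None:
        # Assumption: a cache line holds `nodes_per_line` consecutive slots.
        lines.add(cur.slot // nodes_per_line)
        if key == cur.key:
            break
        cur = cur.left if key < cur.key else cur.right
    return len(lines)


# Sorted input degenerates into a chain whose nodes sit next to each other.
chain = build_bst([1, 2, 3, 4, 5, 6, 7, 8])
# Level-by-level input gives a balanced tree with the upper levels packed together.
balanced = build_bst([4, 2, 6, 1, 3, 5, 7])
```

With one node per line every visited node costs a new line, while larger lines only help when nodes on the same search path happen to be allocated next to each other, which is the effect the abstract attributes to the allocation strategy.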

Title:
A UNIFIED APPROACH FOR SOFTWARE PROCESS REPRESENTATION AND ANALYSIS
Author(s):
Vassilis C. Gerogiannis, George Kakarontzas and Ioannis Stamelos
Abstract:
This paper presents a unified approach for software process management which combines object-oriented (OO) structures with formal models based on (high-level timed) Petri nets. This pairing may prove beneficial not only for the integrated representation of software development processes, human resources and work products, but also for analysing properties and detecting errors of a software process specification before the process is put to actual use. The use of OO models provides the advantages of graphical abstraction, a high level of understanding and a manageable representation of software process classes and instances. The resulting OO models are mechanically transformed into a high-level timed Petri net representation to derive a model for formally proving process properties as well as applying managerial analysis. We demonstrate the applicability of our approach by addressing a software process modelling example problem used in the literature to exercise various software process modelling notations.
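
To make the Petri net side of this pairing concrete, here is a minimal Python sketch of the basic place/transition firing rule. It deliberately ignores the high-level (typed token) and timed extensions the abstract refers to, and the place and transition names (`spec_ready`, `developer_free`, `start_coding`, and so on) are hypothetical, chosen only to resemble a small software process.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= n
               for place, n in transition["inputs"].items())


def fire(marking, transition):
    """Return the marking reached by firing `transition` once."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)
    for place, n in transition["inputs"].items():
        new_marking[place] = new_marking.get(place, 0) - n
    for place, n in transition["outputs"].items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking


# Hypothetical process fragment: coding needs a specification and a free developer.
start_coding = {"inputs": {"spec_ready": 1, "developer_free": 1},
                "outputs": {"coding": 1}}
finish_coding = {"inputs": {"coding": 1},
                 "outputs": {"code_done": 1, "developer_free": 1}}

m0 = {"spec_ready": 2, "developer_free": 1}
m1 = fire(m0, start_coding)    # the only developer is now busy
m2 = fire(m1, finish_coding)   # the developer is released again
```

In marking `m1` the transition `start_coding` is no longer enabled although a specification is still waiting, which is the kind of resource contention that analysis of the net can expose before the process is enacted.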

Title:
USING LINGUISTIC TECHNIQUES FOR SCHEMA MATCHING
Author(s):
Ozgul Unal and Hamideh Afsarmanesh
Abstract:
Organizations from a variety of domains have now clearly realized the need for collaboration to achieve higher goals and/or to be more productive. Among others, the collaboration requirement has become prominent in the Biodiversity domain. Nevertheless, as in any other collaborative network, different Biodiversity nodes represent a variety of heterogeneous structuring/organization of the information, which is very challenging. Automatic resolution of semantic and schematic heterogeneity still remains a bottleneck for providing integrated access to and data sharing among heterogeneous, autonomous, and distributed biodiversity databases in a network of biodiversity organizations. In order to deal with this problem, matching components among database schemas need to be identified and heterogeneity needs to be resolved by creating the corresponding mappings in a process called schema matching. One important step in this process is the identification of the syntactic and semantic similarity among elements from different schemas, usually referred to as Linguistic Matching. The Linguistic Matching component of a schema matching and integration system called SASMINT is the focus of this paper. Unlike other systems, which typically utilize only a limited number of similarity metrics, SASMINT makes effective use of NLP techniques for Linguistic Matching and proposes a weighted usage of several syntactic and semantic similarity metrics. In order to demonstrate the accuracy of the weighted sum of metrics, a number of tests have been carried out, the results of which are presented in this paper. Since it is not easy for the user to determine the weights, SASMINT provides, as another novelty, a component called Sampler to support automatic generation of weights.
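
The idea of a weighted sum of similarity metrics can be sketched in a few lines of Python. This is not SASMINT's implementation: the two metrics used here (a character-level ratio from the standard `difflib` module and a Jaccard overlap of name tokens), the function names and the default weights are all assumptions chosen for illustration, and no semantic (dictionary-based) metric is included.

```python
import re
from difflib import SequenceMatcher


def tokens(name):
    """Split a schema element name into lowercase word tokens."""
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [w.lower() for w in words]


def char_similarity(a, b):
    """Character-level similarity in [0, 1] (difflib ratio, not edit distance)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_similarity(a, b):
    """Jaccard overlap of the token sets of the two names."""
    ta, tb = set(tokens(a)), set(tokens(b))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def weighted_similarity(a, b, w_char=0.5, w_token=0.5):
    """Weighted sum of the two metrics, normalised by the total weight."""
    total = w_char + w_token
    return (w_char * char_similarity(a, b)
            + w_token * token_similarity(a, b)) / total


same = weighted_similarity("speciesName", "speciesName")
close = weighted_similarity("speciesName", "species_name")
far = weighted_similarity("speciesName", "locality")
```

In this sketch the weights are fixed by hand; the abstract's point is precisely that choosing them is hard for users, which is what the Sampler component is said to automate.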

Title:
FORMAL FRAMEWORK FOR SEMANTIC INTEROPERABILITY
Author(s):
Nadia Yaacoubi Ayadi, Mohamed Ben Ahmed and Yann Pollet
Abstract:
We address in this paper the general issue of "a posteriori" semantic interoperability between systems relying on semantically heterogeneous schemas that have been designed for independent specific goals and activities. As conceptual schemas, we opt here for UML conceptual hierarchies of classes, integrating a very general notion of specialisation, including exclusive/inclusive and/or complete/incomplete constraints, etc., of which the classical inheritance link is a particular case. In our approach, we formalize all the knowledge embedded in hierarchies in terms of a consistent logical description. Using this set of predicates, we interpret a UML hierarchy as a property lattice by adding new abstractions of classes (that can be empty or non-empty). However, a property lattice is not unique. According to a particular attribute scaling, we may obtain different lattices that are all equivalent to the initial schema. So, we introduce the notion of conceptual structure, which is the equivalence set of all equivalent lattices. In this context, given a set of assertions stating the existence of various semantic links between properties (attributes, concepts) of two schemas S1 and S2, our problem is to automatically build an "interoperation" structure equivalent to the relevant parts of the S1 and S2 schemas. So, we propose an algorithm to incrementally reorganize property lattices, based on the logical formulation of schemas. For the purpose of the reorganization algorithm, we propose a set of elementary operators.

Title:
ON THE EVALUATION OF TREE PATTERN QUERIES
Author(s):
Yangjun Chen
Abstract:
The evaluation of XPath expressions can be handled as a tree embedding problem. In this paper, we propose two strategies for this issue. One is based on ordered-tree embedding and the other on unordered-tree embedding. For ordered-tree embedding, our algorithm needs only O(|T|·|P|) time and O(|T|·|P|) space, where |T| and |P| stand for the numbers of nodes in the target tree T and the pattern tree P, respectively. We show that unordered-tree embedding is NP-complete by a reduction from the constraint satisfaction problem (CSP). Based on this reduction, an algorithm is devised for the unordered problem. In the case that the branching of pattern trees is limited, the algorithm works in polynomial time.
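
For intuition, a simplified form of ordered tree pattern matching can be sketched in Python as follows. This is not the algorithm of the paper: it assumes a restricted definition in which pattern edges are parent-child edges only, labels must be equal, and the children of a pattern node must map to distinct children of the target node in left-to-right order. The class `TreeNode` and the function `matching_nodes` are names invented for this example.

```python
class TreeNode:
    def __init__(self, label, children=()):
        self.label = label
        self.children = tuple(children)


def matching_nodes(pattern, target):
    """Return the target nodes (in preorder) at which `pattern` embeds."""
    memo = {}  # (id(pattern node), id(target node)) -> bool

    def match(p, t):
        key = (id(p), id(t))
        if key in memo:
            return memo[key]
        ok = p.label == t.label
        if ok:
            i = 0
            for pc in p.children:
                # Greedily take the leftmost unused child of t that pc embeds in.
                while i < len(t.children) and not match(pc, t.children[i]):
                    i += 1
                if i == len(t.children):
                    ok = False
                    break
                i += 1
        memo[key] = ok
        return ok

    found = []
    stack = [target]
    while stack:
        t = stack.pop()
        if match(pattern, t):
            found.append(t)
        stack.extend(reversed(t.children))
    return found


# Target tree a(b, c(d), b(e)) and two patterns that differ only in child order.
target = TreeNode("a", [TreeNode("b"),
                        TreeNode("c", [TreeNode("d")]),
                        TreeNode("b", [TreeNode("e")])])
in_order = TreeNode("a", [TreeNode("c"), TreeNode("b", [TreeNode("e")])])
out_of_order = TreeNode("a", [TreeNode("b", [TreeNode("e")]), TreeNode("c")])
```

Because each pair of pattern node and target node is evaluated at most once thanks to the memo table, the work in this simplified setting grows with the product of the two tree sizes. Dropping the left-to-right requirement while still insisting on distinct children is what makes the unordered variant harder.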

Title:
CRYSTALLIZATION OF AGILITY - Back to Basics
Author(s):
Asif Qumer and Brian Henderson-Sellers
Abstract:
There are a number of agile and traditional methodologies for software development. Agilists provide agile principles and agile values to characterize the agile methods, but there is no clear and inclusive definition of agile methods; consequently, it is not feasible to draw a clear distinction between traditional and agile software development methods in practice. The purpose of this paper is to explain the concept of agility in detail, and then to suggest a definition of agile methods that would help us to rank agile methods or differentiate them from other available methods.

Title:
DATA MINING METHODS FOR GIS ANALYSIS OF SEISMIC VULNERABILITY
Author(s):
Florin Leon and Gabriela M. Atanasiu
Abstract:
This paper aims at designing data mining methods for evaluating the seismic vulnerability of regions in the built infrastructure. A supervised clustering methodology is employed, based on k-nearest neighbor graphs. Unlike other classification algorithms, the method has the advantage of taking into account any distribution of training instances and also data topology. For the particular problem of seismic vulnerability analysis using a Geographic Information System, the gradual formation of clusters (for different values of k) allows a decision-making stakeholder to visualize more clearly the details of the cluster areas. The performance of the k-nearest neighbor graph method is tested on three classification problems, and finally it is applied to a sample from a digital map of Iasi, a large city located in the North-Eastern part of Romania.
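
One plausible reading of clustering on a k-nearest neighbor graph is sketched below in Python. The abstract does not specify the method, so every detail is an assumption for illustration: each training point is linked to its k nearest neighbors, only links between points of the same class are kept, and the connected components of the resulting graph are reported as clusters. The function name `knn_graph_clusters` and the sample coordinates are hypothetical.

```python
import math


def knn_graph_clusters(points, labels, k):
    """Cluster points as connected components of a class-respecting kNN graph."""
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            # Assumption: only edges between same-class points are kept.
            if labels[i] == labels[j]:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical sample: four points of class "high" and two of class "low".
points = [(0, 0), (1, 0), (5, 0), (6, 0), (10, 0), (11, 0)]
labels = ["high", "high", "high", "high", "low", "low"]

fine = knn_graph_clusters(points, labels, k=1)
coarse = knn_graph_clusters(points, labels, k=2)
```

On this sample, raising k from 1 to 2 merges the two "high" groups while the "low" group stays separate, a small-scale version of the gradual cluster formation for different values of k that the abstract mentions.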

Title:
SOFTWARE SYNTHESIS OF THE WEB-BASED QUESTIONNAIRE SYSTEM
Author(s):
Masahiro Yamamoto
Abstract:
Web-based questionnaires have recently become increasingly common in both business and personal use. However, business staff and ordinary individuals cannot implement them by themselves. Usually they ask information technology professionals to build such questionnaires, which takes a lot of time and costs a great deal. It would be very useful if business staff and individuals could easily build questionnaires by themselves. A software synthesis system for web-based questionnaire systems has been developed for them.

Title:
WEB INFORMATION SYSTEM: A FOUR LEVEL ARCHITECTURE
Author(s):
Roberto Paiano, Anna Lisa Guido and Leonardo Mangia
Abstract:
Business processes play a very important role in companies, and their explicit introduction into Information System architecture is a must. Given the interest shown in Web Applications, it is important to introduce a new web-oriented class of software that gives the manager the possibility of operating directly on the process (we will talk about process-oriented WIS - Web Information Systems). It is necessary to replace the three-level logic of traditional application development (Data, Business Logic, Presentation), which hides processes in the Business Logic, with a four-level logic that separates the process level from the application level: the definition and management of processes will no longer be tied solely to business logic. Our research (work in progress) focuses on an innovative framework (software architecture and methodology) for Information System development that links together the know-how acquired in Web Application design and in process definition concepts.

Title:
INFORMATION SYSTEM DESIGN AND PROTOTYPING USING FORM TYPES
Author(s):
Jelena Pavićević, Ivan Luković, Pavle Mogin and Miro Govedarica
Abstract:
The paper presents the form type concept, which generalizes the screen forms that users utilize to communicate with an information system. The concept is semantically rich enough to enable the specification of an initial set of constraints that makes it possible to generate application prototypes together with the related implementation database schema. IIS*Case is a CASE tool based on the form type concept that supports conceptual modelling of an information system and its database schema. The paper outlines how this tool can generate XML specifications of application prototypes of an information system. The aim is to improve IIS*Case through the implementation of a module which can automatically produce an executable prototype of an information system.

Title:
EFFECTIVENESS OF WEB BASED PBL USING COURSE MANAGEMENT TECHNOLOGIES: A CASE STUDY
Author(s):
Havva H. Basak and Serdar Ayan
Abstract:
Maritime education and training has typically focused on delivering practical courses for a practical vocation. In the modern environment, maritime personnel now need to be more professional, more open to change and more business-like in their thinking. This has led to changes in the education system that supports the maritime industries. Teaching thinking skills has become a major agenda for education. Problem-Based Learning (PBL) is a part of this thinking. PBL within a web-based environment in the delivery of an undergraduate course has been investigated. The effects were evaluated by assessing the performance of the students using the web-based PBL and comparing the outcomes with those of the traditional PBL. The outcomes of the experiments were positive. By having real-life problems as focal points and students as active problem-solvers, the learning paradigm would shift towards the attainment of higher thinking skills.

Title:
A BAYESIAN NETWORK TO STRUCTURE A DATA QUALITY MODEL FOR WEB PORTALS
Author(s):
Angélica Caro, Coral Calero, Houari Sahraoui, Ghazwa Malak and Mario Piattini
Abstract:
Technological advances and the use of the internet have favoured the appearance of a great diversity of web applications, among them Web portals. Through them, organizations develop their businesses in a highly competitive environment. One decisive factor for this competitiveness is the assurance of data quality. In previous work, a data quality model for Web portals was developed. The model is represented as a matrix that links the user expectations of web data quality to the portal functionalities. In this matrix, a set of 34 attributes was classified. However, the quality attributes in this model lack the operational structure necessary for use in actual assessment. In this paper we present how we have structured these attributes by means of a probabilistic approach, using Bayesian networks. The final objective is to use the resulting Bayesian network to evaluate the data quality of a Web portal (or a subset of its characteristics).

Title:
A DETECTION METHOD OF STAGNATION SYMPTOMS BY USING PROJECT PROGRESS MODELS GENERATED FROM PROJECT REPORTS
Author(s):
Satoshi Tsuji, Yoshitmo Ikkai and Masanori Akiyoshi
Abstract:
The purpose of this research is to extract "stagnation symptoms" from progress reports about research projects. A stagnation symptom is defined as a portion where remarkable stagnation is seen in the progress of a project. Concretely, according to project managers, stagnation symptoms can be classified into the following three kinds: the first is a bottleneck of the project grasped from one document, the second is clarified by comparison with the most recent document, and the third is clarified from changes in the working object across a series of documents. We propose a method to extract stagnation symptoms through structural analysis of project progress. A progress model, a structural chart expressing the progress of a project, is generated from documents to which label tags indicating contexts or attributes have been attached beforehand. The model provides a multilevel layer view by degree of detail, situation analysis by color, and relation analysis between details and their basis by propagation of color. Stagnation symptoms are automatically extracted by applying stagnation symptom extraction rules to the progress model. The proposed method has been applied to a set of real progress reports. It could extract stagnation symptoms that had, in a sense, been extracted manually.