
A Domain Specific Language for Secure Outsourcing of Computation to the Cloud

Conference
Mohamed Nassar, Abdelkarim Erradi, Qutaibah M Malluhi
2015 IEEE 19th International Enterprise Distributed Object Computing Conference (EDOC), Pages 134-141, Publisher IEEE

Secure outsourcing of computation has gained importance with the proliferation of cloud services. However, existing outsourcing protocol specification languages are mainly suitable for secure multi-party computation. They offer limited support for secure outsourcing of computation over large datasets in cloud computing environments. This paper presents a model-driven approach to define and then coordinate the execution of secure outsourcing protocols. First, we present the details of our Outsourcing Protocol Definition Language (OPDL), used to define machine-processable protocols in an abstract and declarative way while leaving the implementation details to the underlying runtime components. The proposed language aims to simplify the design of these protocols while allowing their verification and the generation of cloud service compositions to coordinate protocol execution. We evaluated the expressiveness of OPDL by using it to define a set of representative secure outsourcing protocols from the literature.

A Model Driven Framework for Secure Outsourcing of Computation to the Cloud

Conference
Mohamed Nassar, Abdelkarim Erradi, Farida Sabry, Qutaibah M Malluhi
IEEE 7th International Conference on Cloud Computing, 2014, Pages 968-969, Publisher IEEE

This paper presents a model-driven approach to define and then coordinate the execution of protocols for secure outsourcing of computation over large datasets in cloud computing environments. First, we present our Outsourcing Protocol Definition Language (OPDL), used to define machine-processable protocols in an abstract and declarative way while leaving the implementation details to the underlying runtime components. The proposed language aims to simplify the design of these protocols while allowing their verification and the generation of cloud service compositions to coordinate protocol execution. We evaluated the expressiveness of OPDL by using it to define a set of representative secure outsourcing protocols from the literature.

A Practical and Scalable Tool to Find Overlaps between Sequences

Journal
Maan Haj Rachid, Qutaibah Malluhi
BioMed Research International, Volume 2015, Publisher Hindawi Publishing Corporation

The evolution of the next generation sequencing technology increases the demand for efficient solutions, in terms of space and time, for several bioinformatics problems. This paper presents a practical and easy-to-implement solution for one of these problems, namely, the all-pairs suffix-prefix problem, using a compact prefix tree. The paper demonstrates an efficient construction of this time-efficient and space-economical tree data structure. The paper presents techniques for parallel implementations of the proposed solution. Experimental evaluation indicates superior results in terms of space and time over existing solutions. Results also show that the proposed technique is highly scalable in a parallel execution environment.
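As a quick illustration of the all-pairs suffix-prefix problem the abstract addresses, the sketch below computes pairwise overlaps naively; it is not the paper's compact-prefix-tree algorithm, and the function and read names are illustrative only:

```python
def max_overlap(s, t):
    """Length of the longest suffix of s that equals a prefix of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s[-k:] == t[:k]:
            return k
    return 0

# All-pairs suffix-prefix: one overlap value per ordered pair of reads.
reads = ["GATTACA", "TACAGGT", "GGTGATT"]
overlaps = {(a, b): max_overlap(a, b)
            for a in reads for b in reads if a != b}
```

This brute-force version costs O(n^2) comparisons per pair, which is exactly the inefficiency that tree- and index-based solutions like the one in the paper are designed to avoid.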

A space-efficient solution to find the maximum overlap using a compressed suffix array

Conference
Maan Haj Rachid, Qutaibah Malluhi, Mohamed Abouelhoda
2nd Middle East Conference on Biomedical Engineering, Pages 329-333, Publisher IEEE

Compressed indices are important data structures in stringology. Compressed versions of many well-known data structures such as suffix tree and suffix array, which are used in string matching problems, have been studied and proposed. This paper takes advantage of a very recent compressed suffix array to build a space-economic solution for an important bioinformatics problem, namely the all-pairs suffix prefix problem. The paper also presents a simple technique for parallelizing the solution. Our results show that the proposed solution consumes less than one fifth of the space required by other solutions based on standard data structures. In addition, our results demonstrate that good performance scalability can be achieved by employing the proposed parallel algorithm.

A survey and taxonomy on energy efficient resource allocation techniques for cloud computing systems

Journal
Abdul Hameed, Alireza Khoshkbarforoushha, Rajiv Ranjan, Prem Prakash Jayaraman, Joanna Kolodziej, Pavan Balaji, Sherali Zeadally, Qutaibah Marwan Malluhi, Nikos Tziritas, Abhinav Vishnu, Samee U Khan, Albert Zomaya
Computing, Pages 1-24, Publisher Springer Vienna

In a cloud computing paradigm, energy-efficient allocation of different virtualized ICT resources (servers, storage disks, networks, and the like) is a complex problem due to the presence of heterogeneous application workloads (e.g., content delivery networks, MapReduce, web applications, and the like) having contentious allocation requirements in terms of ICT resource capacities (e.g., network bandwidth, processing speed, response time, etc.). Several recent papers have tried to address the issue of improving energy efficiency in allocating cloud resources to applications, with varying degrees of success. However, to the best of our knowledge, there is no published literature on this subject that clearly articulates the research problem and provides a research taxonomy for succinct classification of existing techniques. Hence, the main aim of this paper is to identify open challenges associated with energy-efficient resource allocation. In this regard, the study first outlines the problem and the existing hardware- and software-based techniques available for this purpose. Furthermore, available techniques already presented in the literature are summarized based on the energy-efficient research dimension taxonomy. The advantages and disadvantages of the existing techniques are comprehensively analyzed against the proposed research dimension taxonomy, namely: resource adaption policy, objective function, allocation method, allocation operation, and interoperability.

An Efficient Algorithm for Determining the Relationship Between Two Nodes on a DAG

Conference
Q Malluhi, Sungbum Hong

Anonymizing transactional datasets

Journal
Bechara al Bouna, Chris Clifton, Qutaibah Malluhi
Journal of Computer Security, Volume 23, Issue 1, Pages 89-106, Publisher IOS Press

In this paper, we study the privacy breach caused by unsafe correlations in transactional data where individuals have multiple tuples in a dataset. We provide two safety constraints to guarantee safe correlation of the data: (1) the safe grouping constraint, to ensure that quasi-identifier and sensitive partitions are bounded by l-diversity, and (2) the schema decomposition constraint, to eliminate non-arbitrary correlations between non-sensitive and sensitive values in order to protect privacy while preserving utility for aggregate analysis. In our technique, values are grouped together in unique partitions that enforce l-diversity at the level of individuals. We also propose an association-preserving technique to increase the ability to learn from and analyze the anonymized data. To evaluate our approach, we conduct a set of experiments to determine the privacy breach and investigate the anonymization cost of safe grouping and preserving associations.
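A minimal sketch of the distinct-l-diversity condition that safe grouping builds on: a partition is acceptable only if it contains at least l distinct sensitive values. This is an illustrative simplification (the paper's constraint operates jointly on quasi-identifier and sensitive partitions), and the example values are made up:

```python
def is_l_diverse(sensitive_values, l):
    """Distinct l-diversity: the partition must contain at least
    l distinct sensitive values, so no single value dominates it."""
    return len(set(sensitive_values)) >= l

# A partition of four tuples with three distinct diagnoses satisfies l = 3.
partition = ["flu", "flu", "hiv", "cold"]
is_l_diverse(partition, 3)
```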

Approving Updates in Collaborative Databases

Conference
Khaleel Mershad, Qutaibah M Malluhi, Mourad Ouzzani, Mingjie Tang, Walid G Aref
IEEE International Conference on Cloud Engineering (IC2E), 2015, Pages 42-47, Publisher IEEE

Data curation activities in collaborative databases mandate that collaborators interact until they converge and agree on the content of their data. Typically, updates by a member of the collaboration are made visible to all collaborators for comments but at the same time are pending the approval or rejection of the data custodian, e.g., the principal scientist or investigator (PI). In current database technologies, approval and authorization of updates is based solely on the identity of the user, e.g., via the SQL GRANT and REVOKE commands. However, in collaborative environments, the updated data is open for collaborators for discussion and further editing and is finally approved or rejected by the PI based on the content of the data and not on the identity of the updater. In this paper, we introduce a cloud-based collaborative database system that promotes and enables collaboration and data curation scenarios. We realize content-based update approval and history tracking of updates inside HBase, a distributed and scalable open-source cluster-based database. The design and implementation as well as a detailed performance study of several approaches for update approval are presented and contrasted in the paper.

Assessment of de novo assemblers for draft genomes: a case study with fungal genomes

Journal
Mostafa M Abbas, Qutaibah M Malluhi, Ponnuraman Balakrishnan
BMC Genomics, Volume 15, Issue 9, Pages 1, Publisher BioMed Central

Recently, many large bio-projects dealing with the release of different genomes have emerged. Most of these projects use next-generation sequencing platforms. As a consequence, many de novo assembly tools have evolved to assemble the reads generated by these platforms. Each tool has its own inherent advantages and disadvantages, which make the selection of an appropriate tool a challenging task.

Association Rule Mining on Fragmented Database

Conference
Amel Hamzaoui, Qutaibah Malluhi, Chris Clifton, Ryan Riley
Book: Data Privacy Management, Autonomous Spontaneous Security, and Security Assurance, Pages 335-342, Publisher Springer International Publishing

Anonymization methods are an important tool to protect privacy. The goal is to release data while preventing individuals from being identified. Most approaches generalize data, reducing the level of detail so that many individuals appear the same. An alternate class of methods, including anatomy, fragmentation, and slicing, preserves detail by generalizing only the link between identifying and sensitive data. We investigate learning association rules on such a database. Association rule mining on a generalized database is challenging, as specific values are replaced with generalizations, eliminating interesting fine-grained correlations. We instead learn association rules from a fragmented database, preserving fine-grained values. Only rules involving both identifying and sensitive information are affected; we demonstrate the efficacy of learning in such an environment.

Automatic Generation of Optimized Workflow for Distributed Computations on Large-Scale Matrices

Conference
Farida Sabry, Abdelkarim Erradi, Mohamed Nassar, Qutaibah M Malluhi
International Conference on Service-Oriented Computing, Pages 79-92, Publisher Springer Berlin Heidelberg

Efficient evaluation of distributed computation on large-scale data is prominent in modern scientific computation, especially in big-data analysis, image processing, and data mining applications. This problem is particularly challenging in distributed environments such as campus clusters, grids, or clouds on which the basic computation routines are offered as web/cloud services. In this paper, we propose a locality-aware workflow-based solution for evaluation of large-scale matrix expressions in a distributed environment. Our solution is based on automatic generation of BPEL workflows in order to coordinate long-running, asynchronous, and parallel invocation of services. We optimize the input expression in order to maximize parallel execution of independent operations while reducing the matrix transfer cost to a minimum. Our approach frees the end-user of the system from the burden of writing and debugging lengthy BPEL workflows. We evaluated our solution on realistic mathematical expressions executed on large-scale matrices distributed on multiple clouds.

Cloud computing without seeing

Conference
Qutaibah Malluhi, Khaled M Khan
Proceedings of the First International Workshop on Security and Privacy Preserving in e-Societies, Pages 42-44, Publisher ACM

In the emerging cloud computing model, security is of paramount concern. This paper discusses the need for practical techniques that enable private outsourcing on the cloud by allowing the service provider to work on clients’ data or computations without seeing the data being processed. The paper briefly discusses two examples of such techniques.

CloudFlow: A data-aware programming model for cloud workflow applications on modern HPC systems

Journal
Fan Zhang, Qutaibah M Malluhi, Tamer Elsayed, Samee U Khan, Keqin Li, Albert Y Zomaya
Future Generation Computer Systems, Volume 51, Pages 98-110, Publisher North-Holland

Traditional High-Performance Computing (HPC) based big-data applications are usually constrained by having to move large amounts of data to compute facilities for real-time processing. Modern HPC systems, represented by High-Throughput Computing (HTC) and Many-Task Computing (MTC) platforms, on the other hand, aim to achieve the long-held dream of moving compute to data instead. This kind of data-aware scheduling, typified by Hadoop MapReduce, has been successfully implemented in the Map phase, whereby each Map task is sent to the compute node where the corresponding input data chunk is located. However, Hadoop MapReduce limits itself to a one-map-to-one-reduce framework, leading to difficulties in handling complex logic such as pipelines or workflows. Meanwhile, it lacks built-in support and optimization when the input datasets are shared among multiple applications and/or jobs. Performance can be improved significantly when knowledge of the shared and frequently accessed data is taken into account in scheduling decisions.

To enhance the capability of managing workflows in modern HPC systems, this paper presents CloudFlow, a Hadoop MapReduce-based programming model for cloud workflow applications. CloudFlow is built on top of MapReduce and is not only data-aware but also shared-data-aware. It identifies the most frequently shared data, at both the task and job level, and replicates it to each compute node for data locality. It also supports multiple user-defined Map and Reduce functions, allowing users to orchestrate the required data-flow logic. We prove the correctness of the whole scheduling framework through theoretical analysis. Furthermore, experimental evaluation shows that the execution runtime speedup exceeds 4X compared to a traditional MapReduce implementation, with a manageable time overhead.

Coding for High Availability of a Distributed-Parallel Storage System

Journal
Qutaibah M. Malluhi, William E. Johnston
IEEE Trans. Parallel Distrib. Syst. 9(12): 1237-1252 (1998)

We have developed a distributed parallel storage system that employs the aggregate bandwidth of multiple data servers connected by a high-speed wide-area network to achieve scalability and high data throughput. This paper studies different schemes to enhance the reliability and availability of such network-based distributed storage systems. The general approach of this paper employs “erasure” error-correcting codes that can be used to reconstruct missing information caused by hardware, software, or human faults. The paper describes the approach and develops optimized algorithms for the encoding and decoding operations. Moreover, the paper presents techniques for reducing the communication and computation overhead incurred while reconstructing missing data from the redundant information. These techniques include clustering, multidimensional coding, and the full two-dimensional parity schemes. The paper considers trade-offs between redundancy, fault tolerance, and complexity of error recovery.
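A toy example of the erasure-coding idea described above, using a single XOR parity block across equal-size data blocks; the paper's codes are more general (this simple scheme tolerates only one lost block), and the block contents are made up:

```python
def xor_parity(blocks):
    """Compute a parity block as the bytewise XOR of equal-size blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

data = [b"serv", b"er_1", b"data"]  # three equal-size data blocks
parity = xor_parity(data)

# If any one block is lost, XOR-ing the parity with the surviving
# blocks reconstructs it, since x ^ x = 0 cancels the survivors.
recovered = xor_parity([data[0], data[2], parity])
assert recovered == data[1]
```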

Combinatorial Optimization of Distributed Queries

Journal
Bojan Groselj, Qutaibah M. Malluhi
IEEE Trans. Knowl. Data Eng. 7(6): 915-927 (1995)

In relational distributed databases a query cost consists of a local cost and a transmission cost. Query optimization is a combinatorial optimization problem. As the query size grows, the optimization methods based on exhaustive search become too expensive. We propose the following strategy for solving large distributed query optimization problems in relational database systems: (1) represent each query-processing schedule by a labeled directed graph; (2) reduce the number of different schedules by pruning away invalid or high-cost solutions; and (3) find a suboptimal schedule by combinatorial optimization. We investigate several combinatorial optimization techniques: random search, single start, multistart, simulated annealing, and a combination of random search and local simulated annealing. The utility of combinatorial optimization is demonstrated in the problem of finding the (sub)optimal semijoin schedule that fully reduces all relations of a tree query. The combination of random search and local simulated annealing was superior to other tested methods.

ConMR: concurrent MapReduce programming model for large scale shared-data applications

Conference
Fan Zhang, Qutaibah M Malluhi, Tamer M Elsayed
42nd International Conference on Parallel Processing, 2013, Pages 671-679, Publisher IEEE

The rapid growth of large-data processing has brought in the MapReduce programming model as a widely accepted solution. However, MapReduce limits itself to a one-map-to-one-reduce framework. Meanwhile, it lacks built-in support and optimization when the input datasets are shared among concurrent applications and/or jobs. Performance might be improved when the shared and frequently accessed data is read from the local instead of the distributed file system. To enhance the performance of big-data applications, this paper presents Concurrent MapReduce, a new programming model built on top of MapReduce that deals with large amounts of shared data items. Concurrent MapReduce provides support for processing heterogeneous sources of input datasets and offers optimization when the datasets are partially or fully shared. Experimental evaluation has shown an execution runtime speedup of 4X compared to a traditional non-concurrent MapReduce implementation, with a manageable time overhead.

Distributed Computing and Artificial Intelligence

Other
Sigeru Omatu, Qutaibah M. Malluhi, Sara Rodríguez-González, Grzegorz Bocewicz, Edgardo Bucciarelli, Gianfranco Giulioni, Farkhund Iqbal
12th International Conference, DCAI 2015, Salamanca, Spain, June 3-5, 2015. Advances in Intelligent Systems and Computing 373, Springer 2015, ISBN 978-3-319-19637-4

DTS: A System For Real-Time Doha Traffic Surveillance Using Zigbee

Conference
Qutaibah Malluhi, Asma Darwish Al-Emadi, Mona Abdulla Al-Ansari, Nassma Saleem Mohands
QNRS Repository, Volume 2011, Issue 1, Pages 3482, Publisher Bloomsbury Qatar Foundation Journals

Efficient alignment of next generation sequencing data using MapReduce on the cloud

Conference
Rawan AlSaad, Qutaibah Malluhi, Mohamed Abouelhoda
2012 Cairo International Biomedical Engineering Conference (CIBEC), Pages 18-22, Publisher IEEE

This paper presents a methodology for running NGS read mapping tools in the cloud environment based on the MapReduce programming paradigm. As a demonstration, the recently developed and robust sequence alignment tool, BFAST, is used within our methodology to handle massive datasets. The results of our experiments show that the transformation of existing read mapping tools to run within the MapReduce framework dramatically reduces the total execution time and enables the user to utilize the resources provided by the cloud.

Efficient Mapping of ANNs on Hypercube Massively Parallel Machines.

Journal
Qutaibah M. Malluhi, Magdy A. Bayoumi, T. R. N. Rao
IEEE Trans. Computers 44(6): 769-779 (1995)

This paper presents a technique for mapping artificial neural networks (ANNs) on hypercube massively parallel machines. The paper starts by synthesizing a parallel structure, the mesh-of-appendixed-trees (MAT), for fast ANN implementation. Then, it presents a recursive procedure to embed the MAT structure into the hypercube topology. This procedure is used as the basis for an efficient mapping of ANN computations on hypercube systems. Both the multilayer feedforward with backpropagation (FFBP) and the Hopfield ANN models are considered. Algorithms to implement the recall and the training phases of the FFBP model as well as the recall phase of the Hopfield model are provided. The major advantage of our technique is high performance. Unlike other techniques presented in the literature, which require O(N) time, where N is the size of the largest layer, our implementation requires only O(log N) time. Moreover, it allows pipelining of more than one input pattern and thus further improves performance.

Efficient Processing of Hamming-Distance-Based Similarity-Search Queries Over MapReduce.

Conference
Mingjie Tang, Yongyang Yu, Walid G Aref, Qutaibah M Malluhi, Mourad Ouzzani
EDBT Conference, Pages 361-372

Similarity search is crucial to many applications. Of particular interest are two flavors of the Hamming distance range query, namely, the Hamming select and the Hamming join (Hamming-select and Hamming-join, respectively). Hamming distance is widely used in approximate near-neighbor search for high-dimensional data, such as images and document collections. For example, using predefined similarity hash functions, high-dimensional data is mapped into one-dimensional binary codes that are then linearly scanned to perform Hamming-distance comparisons. These distance comparisons on the binary codes are usually costly and often involve excessive redundancies. This paper introduces a new index, termed the HA-Index, that speeds up distance comparisons and eliminates redundancies when performing the two flavors of Hamming distance range queries. An efficient search algorithm based on the HA-Index is presented. A distributed version of the HA-Index is introduced, and algorithms for realizing Hamming-select and Hamming-join operations on a MapReduce platform are prototyped. Extensive experiments using real datasets demonstrate that the HA-Index and the corresponding search algorithms achieve up to two orders of magnitude speedup over existing state-of-the-art approaches, while saving more than ten times in memory space.
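A naive sketch of the Hamming-select flavor described above, comparing binary codes via XOR and bit counting; this is the linear scan that an index such as the one in the paper is designed to beat, and the function names and codes are illustrative:

```python
def hamming(a, b):
    """Hamming distance between two equal-length binary codes,
    stored as ints: XOR then count the differing bits."""
    return bin(a ^ b).count("1")

def hamming_select(codes, query, r):
    """Naive Hamming-select: every code within distance r of the query."""
    return [c for c in codes if hamming(c, query) <= r]

codes = [0b10110, 0b10111, 0b00000, 0b11111]
hamming_select(codes, 0b10110, 1)  # keeps the query itself and 0b10111
```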

Efficient sanitization of unsafe data correlations

Conference
Bechara AL Bouna, Chris Clifton, Qutaibah Malluhi
Workshop Proceedings of the EDBT/ICDT 2015 Joint Conference (March 27, 2015, Brussels, Belgium) on CEUR-WS.org (ISSN 1613-0073)

In this paper, we present a study to counter privacy violation due to unsafe data correlation. We propose a safe correlation requirement to keep correlated values bounded by l-diversity and evaluate the trade-off to be made for the sake of a strong privacy guarantee. Finally, we present a correlation sanitization algorithm that enforces our safety constraint, and we demonstrate its efficiency.

Enabling High-Performance Data Service in the Web Environment

Conference
Qutaibah M. Malluhi, Gwang S. Jung
International Journal of Computers and Their Applications, Volume 10, Pages 125-133, Publisher ISCA

Entrepreneurship for Engineering Students: A Course Development, Teaching, and Impact Assessment

Conference
M Hasanain, M Abdulwahed, Q Malluhi
Proceedings of the IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE), Pages 26-29

Establishing Trust in Cloud Computing

Journal
Khaled M Khan, Qutaibah Malluhi
IT Professional, Volume 12, Issue 5, Pages 20-27, Publisher IEEE

This paper discusses the emerging technologies that can help address the challenges of trust in cloud computing. Cloud computing provides many opportunities for enterprises by offering a range of computing services. In today’s competitive environment, the service dynamism, elasticity, and choices offered by this highly scalable technology are too attractive for enterprises to ignore. These opportunities, however, don’t come without challenges. Cloud computing has opened up a new frontier of challenges by introducing a different type of trust scenario. Today, the problem of trusting cloud computing is a paramount concern for most enterprises. It’s not that the enterprises don’t trust the cloud providers’ intentions; rather, they question cloud computing’s capabilities.

Evaluation of Combined Genome Assemblies: A Case Study with Fungal Genomes

Conference
Mostafa M Abbas, Ponnuraman Balakrishnan, Qutaibah M Malluhi
International Conference on Bioinformatics and Biomedical Engineering, Pages 333-344, Publisher Springer International Publishing

The rapid advances in genome sequencing lead to the generation of huge amounts of data in a single sequencing experiment. Several genome assemblers with different objectives have been developed to process these genomic data. Naturally, the output assemblies produced by these assemblers have different qualities due to their diverse nature. Recent research efforts concluded that combining the assemblies from different assemblers can enhance the quality of the output assembly. Based on this, our study combines the five best assemblies of three fungal genomes and evaluates the quality of the output assembly as compared to those produced by individual assemblers. The results conclude that the output assembly quality is influenced more by an increase in the number of gaps in the input assemblies than by an increase in N50 size. Based on this conclusion, we propose a set of guidelines for obtaining better output assemblies.

Journal

With the wide adoption of the cloud computing paradigm, it is important to develop appropriate techniques to protect client data privacy in the cloud. Encryption is one of the major techniques that could be used to achieve this goal. However, data encryption at rest alone is insufficient for secure cloud computation environments. Efficient techniques for carrying out computation over encrypted data are also required. Fully homomorphic encryption (FHE) and garbled circuits are naturally used to process encrypted data without leaking any information about the data. However, existing FHE schemes are inefficient for processing large amounts of data in the cloud, and garbled circuits are one-time programs that cannot be reused. Using modern technologies such as FHE, several authors have developed reusable garbled circuit techniques in recent years. But these are not efficient either and could not be deployed at a large scale. By relaxing the privacy definition from perfect forward secrecy to all-or-nothing privacy, we are able to design efficient reusable garbled circuits in this paper. These reusable garbled computation techniques could be used to process encrypted cloud data efficiently.

Identifying Contextual Properties of Software Architecture in Cloud Computing

Conference
Khaled M Khan, Qutaibah Malluhi
IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing (DASC), 2011, Pages 561-568, Publisher IEEE

This paper argues that the contextual properties of cloud-enabled software architecture should be identified and understood differently by cloud consumers. Existing architectures are not designed to exploit the contextual properties of cloud computing. The emergence of cloud computing virtually forces cloud consumers to re-evaluate their software architectures in light of the cloud computing context, which requires relinquishing control over most architectural components to cloud providers. In a cloud-enabled architecture, the shift of ownership of and control over architectural components from consumers to cloud providers has a profound impact on the ways cloud consumers design their software architecture. From this perspective, we go beyond the traditional definition of software architecture and introduce the concepts of architectural scope, ownership, and control as the contextual properties of an architecture.

Maintaining database anonymity in the presence of queries

Conference
Ryan Riley, Chris Clifton, Qutaibah Malluhi
International Workshop on Security and Trust Management, Pages 33-48, Publisher Springer Berlin Heidelberg

With the advent of cloud computing there is an increased interest in outsourcing an organization’s data to a remote provider in order to reduce the costs associated with self-hosting. If that database contains information about individuals (such as medical information), it is increasingly important to also protect the privacy of the individuals contained in the database. Existing work in this area has focused on preventing the hosting provider from ascertaining individually identifiable sensitive data from the database, through database encryption or manipulating the data to provide privacy guarantees based on privacy models such as k-anonymity. Little work has been done to ensure that information contained in queries on the data, in conjunction with the data, does not result in a privacy violation. In this work, we present a hash-based method which provably allows the privacy constraint of an unencrypted database to be extended to the queries performed on the database. In addition, we identify a privacy limitation of such an approach, describe how it could be exploited using a known-query attack, and propose a countermeasure based on oblivious storage.

On Order-independent Semantics of the Similarity Group-By Relational Database Operator

Other
Mingjie Tang, Ruby Y Tahboub, Walid G Aref, Qutaibah M Malluhi, Mourad Ouzzani
arXiv preprint arXiv:1412.4303

Similarity group-by (SGB, for short) has been proposed as a relational database operator to match the needs of emerging database applications. Many SGB operators that extend SQL have been proposed in the literature, e.g., similarity operators in the one-dimensional space. These operators have various semantics. Depending on how these operators are implemented, some of the implementations may lead to different groupings of the data. Hence, if SQL code is ported from one database system to another, it is not guaranteed that the code will produce the same results. In this paper, we investigate the various semantics for the relational similarity group-by operators in the multi-dimensional space. We define the class of order-independent SGB operators that produce the same results regardless of the order in which the input data is presented to them. Using the notion of interval graphs borrowed from graph theory, we prove that, for certain SGB operators, there exist order-independent implementations. For each of these operators, we provide a sample algorithm that is order-independent. Also, we prove that for other SGB operators, there does not exist an order-independent implementation, and hence these SGB operators are ill-defined and should not be adopted in extensions to SQL to realize similarity group-by. In this paper, we introduce an SGB operator, namely SGB-All, for grouping multi-dimensional data using similarity. SGB-All forms groups such that a data item, say O, belongs to a group, say G, if and only if O is within a user-defined threshold from all other data items in G. In other words, each group in SGB-All forms a clique of nearby data items in the multi-dimensional space. We prove that SGB-All is order-independent, i.e., there is at least one algorithm for each option that is independent of the presentation order of the input data.
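
The SGB-All (clique) semantics described above can be illustrated with a short sketch. This is a naive greedy pass, not the paper's order-independent algorithm; the function name `sgb_all` and threshold `eps` are hypothetical, and points are assumed to be 2-D tuples:

```python
def sgb_all(points, eps):
    # A point joins a group only if it is within eps of EVERY member
    # of that group, so each group forms a clique of nearby points.
    def near(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5 <= eps

    groups = []
    for p in points:
        for g in groups:
            if all(near(p, q) for q in g):
                g.append(p)
                break
        else:  # no existing group accepts p: start a new one
            groups.append([p])
    return groups
```

For instance, `sgb_all([(0, 0), (0.5, 0), (10, 10)], 1.0)` produces two groups: the two nearby points form one clique and `(10, 10)` sits alone.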

Optimizing Distributed Storage Layouts in the Network Storage Manager

Conference
Zeyad Ali, Rui Zhang, Qutaibah Malluhi, Shaofeng Yang
Parallel and Distributed Computing and Systems: Proceedings of the 16th IASTED International Conference

Distributed storage systems are increasingly being used by data-intensive applications for efficient and reliable data delivery. The Network Storage Manager (NSM) is a distributed storage framework with a unique architecture that maximizes application control over many of the storage and retrieval policies. Several applications utilize NSM for efficient, tunable, and controllable performance. Data layout is one policy that is considered application-dependent, and tailored algorithms are preferred for applications with complex or irregular access patterns. Experimental results have shown dramatic performance enhancement when optimized layout policies override the default NSM implementation. Layout algorithms are more effective when proper prefetching and cache replacement policies are implemented.

Paillier's encryption: Implementation and cloud applications

Conference
Mohamed Nassar, Abdelkarim Erradi, Qutaibah M Malluhi
2015 International Conference on Applied Research in Computer Science and Engineering (ICAR), Pages 1-5, Publisher IEEE

Paillier’s additive homomorphic encryption is increasingly used in recent research in the field of secure cloud outsourcing and privacy-preserving computation, in addition to other cryptographic tools such as garbled circuits. In this paper, we review Paillier’s encryption and its application to privacy-preserving computation outsourcing and secure online voting. We present a new implementation of Paillier’s cryptosystem using Python as the interface language and fast GMP C routines for arithmetic operations.
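
As a rough illustration of the cryptosystem reviewed here, the sketch below demonstrates Paillier's additive homomorphism in plain Python. It is a toy with small primes and textbook parameter choices (g = n + 1), not the paper's GMP-backed implementation; all function names are hypothetical:

```python
import math
import random

def paillier_keygen(p, q):
    # Toy parameters for illustration; real deployments use large primes.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because we pick g = n + 1
    return (n, n + 1), (lam, mu)  # (public key, private key)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:  # r must be invertible mod n
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    n2 = n * n
    L = (pow(c, lam, n2) - 1) // n  # the L function: L(x) = (x - 1) / n
    return (L * mu) % n

pub, priv = paillier_keygen(547, 599)  # toy primes
a, b = 42, 58
# Additive homomorphism: multiplying ciphertexts adds the plaintexts.
c_sum = (encrypt(pub, a) * encrypt(pub, b)) % (pub[0] ** 2)
assert decrypt(pub, priv, c_sum) == a + b
```

The multiplication of two ciphertexts modulo n² decrypting to the sum of the plaintexts is exactly the property that makes Paillier attractive for privacy-preserving aggregation and voting.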

Performance analysis of data intensive cloud systems based on data management and replication: a survey

Journal
Saif Ur Rehman Malik, Samee U Khan, Sam J Ewen, Nikos Tziritas, Joanna Kolodziej, Albert Y Zomaya, Sajjad A Madani, Nasro Min-Allah, Lizhe Wang, Cheng-Zhong Xu, Qutaibah Marwan Malluhi, Johnatan E Pecero, Pavan Balaji, Abhinav Vishnu, Rajiv Ranjan, Sherali Zeadally, Hongxiang Li
Distributed and Parallel Databases, Volume 34, Issue 2, Pages 179-215, Publisher Springer US

As we delve deeper into the ‘Digital Age’, we witness an explosive growth in the volume, velocity, and variety of the data available on the Internet. For example, in 2012 about 2.5 quintillion bytes of data were created on a daily basis, originating from a myriad of sources and applications including mobile devices, sensors, individual archives, social networks, Internet of Things, enterprises, cameras, software logs, etc. Such ‘data explosions’ have led to one of the most challenging research issues of the current Information and Communication Technology era: how to optimally manage (e.g., store, replicate, filter, and the like) such large amounts of data and identify new ways to analyze them for unlocking information. It is clear that such large data streams cannot be managed by setting up on-premises enterprise database systems, as that leads to a large up-front cost in buying and administering the hardware and software systems. Therefore, next generation data management systems must be deployed on the cloud. The cloud computing paradigm provides scalable and elastic resources, such as data and services accessible over the Internet. Every cloud service provider must assure that data is efficiently processed and distributed in a way that does not compromise end-users’ Quality of Service (QoS) in terms of data availability, data search delay, data analysis delay, and the like. In the aforementioned perspective, data replication is used in the cloud for improving the performance (e.g., read and write delay) of applications that access data. Through replication, a data-intensive application or system can achieve high availability, better fault tolerance, and data recovery. In this paper, we survey data management and replication approaches (from 2007 to 2011) that are developed by both industrial and research communities.
The focus of the survey is to discuss and characterize the existing approaches of data replication and management that tackle resource usage and QoS provisioning with different levels of efficiency. Moreover, the breakdown of both key mechanisms (data replication and management) in providing different QoS attributes is deliberated. Furthermore, the performance advantages and disadvantages of data replication and management approaches in cloud computing environments are analyzed. Open issues and future challenges related to data consistency, scalability, load balancing, processing, and placement are also reported.

Performance Analysis of Distributed Parallel Archive Data Retrieval with Reliable Data Delivery

Conference
Zeyad Ali, Qutaiba Malluhi
Distributed Computing Laboratory, Pages 219-228

Performance and Evaluation of A VOD System Using A Distributed Parallel Data Delivery

Conference
Omar Aldaoud, Aparna Vaddi, Qutaibah Malluhi
Seventh IASTED International Conference on Internet and Multimedia Systems and Applications

Designing a Video-on-Demand (VoD) system imposes a great challenge due to the huge storage requirements and real-time constraints of continuous media. Continuous media are different from traditional textual data. First, the retrieval and display of continuous media are subject to real-time constraints. Second, objects of this media type are typically large in size. In this paper we present the performance evaluation of a VoD system utilizing a distributed parallel data delivery mechanism for the storage and retrieval of continuous media that satisfies the VoD QoS parameters in terms of delay jitter over best-effort networks.

Practical and Secure Outsourcing of Matrix Computations to the Cloud

Conference
Mohamed Nassar, Abdelkarim Erradi, Qutaibah M Malluhi
IEEE 33rd International Conference on Distributed Computing Systems Workshops, 2013, Pages 70-75, Publisher IEEE

Cloud computing enables cost-effective outsourcing of storage and resource-intensive computations. Secure outsourcing of data and computation is challenging in this emerging computing model. In particular, outsourcing of sensitive computations should be assured in terms of input privacy and cheating detection. Existing approaches use either expensive homomorphic encryption, which is not yet efficient enough for practical applications, or secure tamper-proof hardware that is expensive and not scalable. Work on devising a real-world framework enabling such secure outsourcing is still needed. In this paper, we propose practical protocols for secure outsourcing of matrix algebra using randomization and without using cryptography. We address the issues of a real deployment in terms of distributed, reliable and secure storage, secure data transfer, and computation on multiple, non-colluding clouds. We present the architecture and the APIs of our framework as well as some experimental results demonstrating the effectiveness of the proposed approach.

Ragged-Edge Array Coding for Reliable and Efficient Storage Management

Conference
Qutaibah Malluhi, MF Malouhi
QNRS Repository, Volume 2011, Issue 1, Pages 1393, Publisher Bloomsbury Qatar Foundation Journals

Role of contextual properties in enterprise service migration to cloud computing

Other
Khaled M Khan, Qutaibah Malluhi
Concurrency and Computation: Practice and Experience, Volume 25, Issue 18, Pages 2455-2470

This paper attempts to identify the role of contextual properties of enterprise systems architecture in relation to service migration to cloud computing. In a cloud-based service architecture, the shift of ownership, scope, and control over architectural elements from consumers to cloud providers has a profound impact on the ways cloud consumers design and manage their systems architecture. In this perspective, we introduce the concepts of architectural scope, ownership, and control as the contextual properties of systems architecture. The paper explores ways in which these properties can be mapped into a quantifiable framework that could be used to measure the degree of change of contextual properties due to service migration to cloud computing. We seek here to address the service migration problems from a different perspective, namely, focusing on the contextual properties of architectural elements.

Secure and efficient outsourcing of sequence comparisons

Conference
Marina Blanton, Mikhail J Atallah, Keith B Frikken, Qutaibah Malluhi
European Symposium on Research in Computer Security, Pages 505-522, Publisher Springer Berlin Heidelberg

We treat the problem of secure outsourcing of sequence comparisons by a client to remote servers, which given two strings λ and μ of respective lengths n and m, consists of finding a minimum-cost sequence of insertions, deletions, and substitutions (also called an edit script) that transform λ into μ. In our setting a client owns λ and μ and outsources the computation to two servers without revealing to them information about either the input strings or the output sequence. Our solution is non-interactive for the client (who only sends information about the inputs and receives the output) and the client’s work is linear in its input/output. The servers’ performance is O(σmn) computation (which is optimal) and communication, where σ is the alphabet size, and the solution is designed to work when the servers have only O(σ(m + n)) memory. By utilizing garbled circuit evaluation in a novel way, we completely avoid public-key cryptography, which makes our solution particularly efficient.
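
For reference, the underlying (non-private) computation that this protocol outsources is the classic O(mn) edit-distance dynamic program. The sketch below shows only that baseline cost computation, with none of the paper's security machinery; the function name is hypothetical:

```python
def edit_distance(lam, mu):
    # Classic dynamic program over insertions, deletions, substitutions,
    # keeping one row at a time so memory is O(m) rather than O(m*n).
    n, m = len(lam), len(mu)
    prev = list(range(m + 1))  # distance from prefixes of lam[:0]
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if lam[i - 1] == mu[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution or match
        prev = cur
    return prev[m]
```

For example, `edit_distance("kitten", "sitting")` returns 3 (two substitutions and one insertion).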

Secure and private outsourcing of shape-based feature extraction

Conference
Shumiao Wang, Mohamed Nassar, Mikhail Atallah, Qutaibah Malluhi
International Conference on Information and Communications Security, Pages 90-99, Publisher Springer International Publishing

There has been much recent work on secure storage outsourcing, where an organization wants to store its data at untrusted remote cloud servers in an encrypted form, such that its own employees can query the encrypted data using weak devices (both computationally and storage-wise). Alternatively, a weak client may want to outsource an expensive computational task without revealing to the servers either the inputs or the computed outputs. The framework requires that the bulk of the computational burden of query-processing be placed on the remote servers, without revealing to these servers anything about the data. Most of the existing work in this area deals with non-image data that is keyword based, whereas the present paper deals with raw image data (without any keyword annotations). We demonstrate that shape-based image feature extraction, a particularly computationally intensive task, can be carried out within this framework, by presenting two schemes for doing so and demonstrating their viability by experimentally evaluating them. Our results can be used in a number of practical situations. In one scenario the client has images and wants to securely outsource shape-based feature extraction on them; in another, the server has encrypted images and the client wants a feature-extracted representation of those that are feature-rich.

Secure Outsourcing of Matrix Operations as a Service

Conference
Mohamed Nassar, Abdelkarim Erradi, Farida Sabry, Qutaibah M Malluhi
IEEE Sixth International Conference on Cloud Computing, 2013, Pages 918-925, Publisher IEEE

This paper reports the design of a cloud-based service for coordinating secure outsourcing of storage and computation of scientific data, particularly matrices. While this service may support different secure outsourcing protocols and mechanisms (e.g. homomorphic encryption, secret sharing and randomization), we hide all the complexity from end-users and move it to a middleware broker. The broker manages the communication between the client and one or more clouds. The requests submitted by users are automatically translated into WS-BPEL workflows and then executed by the broker workflow engine to coordinate the control flow and the data flow for outsourcing of matrix operations. We detail the architecture of our framework and the design of its key components. Our work facilitates real-world and practical deployment of recently proposed secure outsourcing protocols.

Secure outsourcing of network flow data analysis

Conference
Mohamed Nassar, Bechara al Bouna, Qutaibah Malluhi
IEEE International Congress on Big Data, 2013, Pages 431-432, Publisher IEEE

In this paper, we identify a new and challenging application for the growing field of research on data anonymization and secure outsourcing of storage and computations to the cloud. Network flow data analysis is of high importance for network monitoring and management. Network monitoring applications reveal new challenges not yet addressed in the secure outsourcing literature. The secure and verifiable outsourcing of computation on anonymized network flow records provides a practical tool for network operators to harness the cloud benefits, which remained untapped until now because of privacy concerns. We present representative use-cases and problems, and identify sample related work that can be utilized for developing an effective solution.

Similarity group-by operators for multi-dimensional relational data

Journal
Mingjie Tang, Ruby Y Tahboub, Walid G Aref, Mikhail J Atallah, Qutaibah M Malluhi, Mourad Ouzzani, Yasin N Silva
IEEE Transactions on Knowledge and Data Engineering, Volume 28, Issue 2, Pages 510-523, Publisher IEEE

The SQL group-by operator plays an important role in summarizing and aggregating large datasets in a data analytics stack. While the standard group-by operator, which is based on equality, is useful in several applications, allowing similarity-aware grouping provides a more realistic view on real-world data that could lead to better insights. The Similarity SQL-based Group-By operator (SGB, for short) extends the semantics of the standard SQL Group-by by grouping data with similar but not necessarily equal values. While existing similarity-based grouping operators efficiently realize these approximate semantics, they primarily focus on one-dimensional attributes and treat multi-dimensional attributes independently. However, correlated attributes, such as in spatial data, are processed independently, and hence, groups in the multi-dimensional space are not detected properly. To address this problem, we introduce two new SGB operators for multi-dimensional data. The first operator is the clique (or distance-to-all) SGB, where all the tuples in a group are within some distance from each other. The second operator is the distance-to-any SGB, where a tuple belongs to a group if the tuple is within some distance from any other tuple in the group. Since a tuple may satisfy the membership criterion of multiple groups, we introduce three different semantics to deal with such a case: (i) eliminate the tuple, (ii) put the tuple in any one group, and (iii) create a new group for this tuple. We implement and test the new SGB operators and their algorithms inside PostgreSQL. The overhead introduced by these operators proves to be minimal and the execution times are comparable to those of the standard Group-by. The experimental study, based on TPC-H and social check-in data, demonstrates that the proposed algorithms can achieve up to three orders of magnitude enhancement in performance over baseline methods developed to solve the same problem.
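
The distance-to-any semantics described above amounts to computing connected components of the graph that links tuples within the threshold of each other, which is order-independent by construction. A naive quadratic union-find sketch (illustrative only, not the paper's PostgreSQL implementation; `sgb_any` and `eps` are hypothetical names):

```python
def sgb_any(points, eps):
    # Distance-to-any SGB: a point joins a group if it is within eps of
    # ANY member, i.e., groups are connected components of the
    # eps-neighborhood graph, found here with union-find.
    parent = list(range(len(points)))

    def find(i):  # find root with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if dx * dx + dy * dy <= eps * eps:
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())
```

Chaining is visible here: `sgb_any([(0, 0), (1, 0), (2, 0)], 1.0)` yields a single group even though the endpoints are two units apart, which is precisely what distinguishes distance-to-any from the clique semantics.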

Skyline Discovery and Composition of Multi-Cloud Mashup Services

Journal
Fan Zhang, Kai Hwang, Samee U Khan, Qutaibah M Malluhi
IEEE Transactions on Services Computing, Volume 9, Issue 1, Pages 72-83, Publisher IEEE

A cloud mashup is composed of multiple services with shared datasets and integrated functionalities. For example, the elastic compute cloud (EC2) provided by Amazon Web Service (AWS), the authentication and authorization services provided by Facebook, and the Map service provided by Google can all be mashed up to deliver real-time, personalized driving route recommendation service. To discover qualified services and compose them with guaranteed quality of service (QoS), we propose an integrated skyline query processing method for building up cloud mashup applications. We use a similarity test to achieve optimal localized skyline. This mashup method scales well with the growing number of cloud sites involved in the mashup applications. Faster skyline selection, reduced composition time, dataset sharing, and resources integration assure the QoS over multiple clouds. We experiment with the quality of Web service (QWS) benchmark over 10,000 Web services along six QoS dimensions. By utilizing block-elimination, data-space partitioning, and service similarity pruning, the skyline process is shortened by three times, when compared with two state-of-the-art methods.
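
The skyline of a service set is its Pareto-optimal subset: services not dominated in every QoS dimension by some other service. A minimal quadratic sketch of that core notion (assuming lower is better on every dimension; this omits the paper's block-elimination, partitioning, and similarity-pruning optimizations, and the names are hypothetical):

```python
def dominates(a, b):
    # a dominates b if a is no worse in every QoS dimension
    # and strictly better in at least one.
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def skyline(services):
    # Keep each service that no other service dominates.
    return [s for s in services
            if not any(dominates(t, s) for t in services if t is not s)]
```

For example, with two dimensions such as (latency, cost), `skyline([(1, 5), (2, 2), (5, 1), (3, 3)])` drops `(3, 3)` because `(2, 2)` dominates it.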

The Extended Cube Connected Cycles: An Efficient Interconnection for Massively Parallel Systems.

Journal
Rafic A. Ayoubi, Qutaibah M. Malluhi, Magdy A. Bayoumi
IEEE Trans. Computers 45(5): 609-614 (1996)

The hypercube structure is a very widely used interconnection topology because of its appealing topological properties. For massively parallel systems with thousands of processors, the hypercube suffers from a high node fanout which makes such systems impractical and infeasible. In this paper, we introduce an interconnection network called The Extended Cube Connected Cycles (ECCC) which is suitable for massively parallel systems. In this topology the processor fanout is fixed to four. Other attractive properties of the ECCC include a diameter of logarithmic order and a small average interprocessor communication distance which imply fast data transfer. The paper presents two algorithms for data communication in the ECCC. The first algorithm is for node-to-node communication and the second is for node-to-all broadcasting. Both algorithms take O(log N) time units, where N is the total number of processors in the system. In addition, the paper shows that a wide class of problems, the divide and conquer class, is easily and efficiently solvable on the ECCC topology. The solution of a divide and conquer problem of size N requires O(log N) time units.

The Hierarchical Hypercube: A New Interconnection Topology for Massively Parallel Systems

Journal
Qutaibah M. Malluhi, Magdy A. Bayoumi
IEEE Trans. Parallel Distrib. Syst. 5(1): 17-30 (1994)

Interconnection networks play a crucial role in the performance of parallel systems. This paper introduces a new interconnection topology that is called the hierarchical hypercube (HHC). This topology is suitable for massively parallel systems with thousands of processors. An appealing property of this network is the low number of connections per processor, which enhances the VLSI design and fabrication of the system. Other alluring features include symmetry and logarithmic diameter, which imply easy and fast algorithms for communication. Moreover, the HHC is scalable; that is, it can embed HHCs of lower dimensions. The paper presents two algorithms for data communication in the HHC. The first algorithm is for one-to-one transfer, and the second is for one-to-all broadcasting. Both algorithms take O(log₂ k) time, where k is the total number of processors in the system. A wide class of problems, the divide & conquer class (D&Q), is shown to be easily and efficiently solvable on the HHC topology. Parallel algorithms are provided to describe how a D&Q problem can be solved efficiently on an HHC structure. The solution of a D&Q problem instance having up to k inputs requires a time complexity of O(log₂ k).

The similarity-aware relational database set operators

Journal
Wadha J Al Marri, Qutaibah Malluhi, Mourad Ouzzani, Mingjie Tang, Walid G Aref
Information Systems, Volume 59, Pages 79-93

Identifying similarities in large datasets is an essential operation in several applications such as bioinformatics, pattern recognition, and data integration. To make a relational database management system similarity-aware, the core relational operators have to be extended. While similarity-awareness has been introduced in database engines for relational operators such as joins and group-by, little has been achieved for relational set operators, namely Intersection, Difference, and Union. In this paper, we propose to extend the semantics of relational set operators to take into account the similarity of values. We develop efficient query processing algorithms for evaluating them, and implement these operators inside an open-source database system, namely PostgreSQL. By extending several queries from the TPC-H benchmark to include predicates that involve similarity-based set operators, we perform extensive experiments that demonstrate up to three orders of magnitude speedup in performance over equivalent queries that only employ regular operators.
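
A minimal sketch of one plausible reading of similarity-based set intersection for one-dimensional values (the paper defines the precise semantics inside PostgreSQL; `sim_intersect` and the threshold `eps` are hypothetical illustration names):

```python
def sim_intersect(A, B, eps):
    # A value of A joins the result if B contains a value within eps
    # of it, relaxing the equality test of the standard INTERSECT.
    return [a for a in A if any(abs(a - b) <= eps for b in B)]
```

For example, `sim_intersect([1, 5, 20], [4.5, 19], 1)` returns `[5, 20]`: neither value appears exactly in the second set, but each is within the threshold of one.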

The similarity-aware relational intersect database operator

Conference
Wadha J Al Marri, Qutaibah Malluhi, Mourad Ouzzani, Mingjie Tang, Walid G Aref
International Conference on Similarity Search and Applications, Pages 164-175, Publisher Springer International Publishing

Identifying similarities in large datasets is an essential operation in many applications such as bioinformatics, pattern recognition, and data integration. To make the underlying database system similarity-aware, the core relational operators have to be extended. Several similarity-aware relational operators have been proposed that introduce similarity processing at the database engine level, e.g., similarity joins and similarity group-by. This paper extends the semantics of the set intersection operator to operate over similar values. The paper describes the semantics of the similarity-based set intersection operator, and develops an efficient query processing algorithm for evaluating it. The proposed operator is implemented inside an open-source database system, namely PostgreSQL. Several queries from the TPC-H benchmark are extended to include similarity-based set intersection predicates. Performance results demonstrate up to three orders of magnitude speedup in performance over equivalent queries that only employ regular operators.

Tree-based special-purpose Array architectures for neural computing

Journal
Qutaibah M. Malluhi, Magdy A. Bayoumi, T. R. N. Rao
VLSI Signal Processing 11(3): 245-262 (1995)

A massively parallel architecture called the mesh-of-appendixed-trees (MAT) is shown to be suitable for processing artificial neural networks (ANNs). Both the recall and the learning phases of the multilayer feedforward with backpropagation ANN model are considered. The MAT structure is refined to produce two special-purpose array processors, FMAT1 and FMAT2, for efficient ANN computation. This refinement tends to reduce circuit area and increase hardware utilization. FMAT1 is a simple structure suitable for the recall phase. FMAT2 requires little extra hardware but supports learning as well. A major characteristic of the proposed neurocomputers is high performance. It takes O(log N) time to process a neural network with N neurons in its largest layer. Our proposed architecture is shown to provide the best number of connections per unit time when compared to several major techniques in the literature. Another important feature of our approach is its ability to pipeline more than one input pattern, which further improves the performance.

Trust in cloud services: providing more controls to clients

Journal
Khaled M Khan, Qutaibah Malluhi
IEEE Computer, Volume 46, Issue 7, Pages 94-96, Publisher IEEE Computer Society
Trust is more important than money and will ultimately determine cloud computing’s success.

Updating outsourced anatomized private databases

Conference
Ahmet Erhan Nergiz, Chris Clifton, Qutaibah M Malluhi
Proceedings of the 16th International Conference on Extending Database Technology, Pages 179-190, Publisher ACM

We introduce operations to safely update an anatomized database. The result is a database where the view of the server satisfies standards such as k-anonymity or l-diversity, but the client is able to query and modify the original data. By exposing data where possible, the server can perform value-added services such as data analysis not possible with fully encrypted data, while still being unable to violate privacy constraints. Update is a key challenge with this model; naïve application of insertion and deletion operations reveals the actual data to the server. This paper shows how data can be safely inserted, deleted, and updated. The key ideas are that data is inserted or updated into an encrypted temporary table until enough data is available to safely decrypt, and that sensitive information of deleted tuples is left behind to ensure privacy of both deleted and undeleted individuals. This approach is proven effective in maintaining the privacy constraint against an adversarial server. The paper also gives empirical results on how much data remains encrypted, and the resulting quality of the server’s (anatomized) view of the data, for various update and delete rates.

Using safety constraint for transactional dataset anonymization

Conference
Bechara Al Bouna, Chris Clifton, Qutaibah Malluhi
IFIP Annual Conference on Data and Applications Security and Privacy, Pages 164-178, Publisher Springer Berlin Heidelberg

In this paper, we address privacy breaches in transactional data where individuals have multiple tuples in a dataset. We provide a safe grouping principle to ensure that correlated values are grouped together in unique partitions that enforce l-diversity at the level of individuals. We conduct a set of experiments to evaluate privacy breach and the anonymization cost of safe grouping.

Using the sadakane compressed suffix tree to solve the all-pairs suffix-prefix problem

Journal
Maan Haj Rachid, Qutaibah Malluhi, Mohamed Abouelhoda
BioMed Research International, Volume 2014, Publisher Hindawi Publishing Corporation

The all-pairs suffix-prefix matching problem is a basic problem in string processing. It has an application in the de novo genome assembly task, which is one of the major bioinformatics problems. Due to the large size of the input data, it is crucial to use fast and space-efficient solutions. In this paper, we present a space-economical solution to this problem using the generalized Sadakane compressed suffix tree. Furthermore, we present a parallel algorithm to provide more speed for shared memory computers. Our sequential and parallel algorithms are optimized by exploiting features of the Sadakane compressed index data structure. Experimental results show that our solution based on the Sadakane compressed index consumes significantly less space than the ones based on noncompressed data structures like the suffix tree and the enhanced suffix array. Our experimental results also show that our parallel algorithm is efficient and scales well with an increasing number of processors.
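
The problem itself can be stated with a naive quadratic baseline (illustrative only; the paper's contribution is solving it space-efficiently with the Sadakane compressed suffix tree, which this sketch does not attempt, and the function names are hypothetical):

```python
def overlap(s, t):
    # Longest suffix of s that is also a prefix of t (the quantity
    # computed for every ordered pair in all-pairs suffix-prefix).
    for k in range(min(len(s), len(t)), 0, -1):
        if s[-k:] == t[:k]:
            return k
    return 0

def all_pairs_suffix_prefix(strings):
    # Brute force over all ordered pairs: O(n^2) pairs, each overlap
    # check quadratic in string length. Compressed indexes avoid this.
    return {(i, j): overlap(s, t)
            for i, s in enumerate(strings)
            for j, t in enumerate(strings) if i != j}
```

For example, `overlap("ABCD", "CDEF")` is 2: the suffix "CD" of the first read matches the prefix of the second, which is the kind of evidence a genome assembler uses to chain reads.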