- Three tasks to be performed to improve the quality of datasets using the Software Development Life Cycle (SDLC) methodology, with a description of the activity at each stage.
It is cheaper to correct database-related issues when they are discovered at an early stage than when they surface in the final phase of development. Data quality checks should therefore be performed in each phase of the SDLC in order to deliver an error-free product, as discussed below.
The development stage covers the full coding and engineering of the system, with the aim of meeting the stated system requirements. Bassil (2012) explained that, for the sake of quality, data assessment should be an iterative process so that the end product is as free of defects as possible. Software and hardware specifications are reviewed together with the system architecture. Both newly created data and existing data should be monitored closely. In addition, there should be defined error detection processes supported by specialized tools such as CA Veracode Greenlight and CA Service Virtualization.
In the testing stage, the code produced during the development phase is tested. To refine the quality of datasets, process control should be considered together with process improvement. Dynamic, static, and manual analysis should all be carried out in this phase. A comprehensive array of unit, functional, integration, and performance tests is run, taking the language and the system into consideration.
In the third stage, system, application, and administrative changes are made to the application being developed. Appropriate continuous monitoring metrics are needed to check the quality of data and so provide the means for taking speedy action when required. In this way, errors are corrected easily and quality data is achieved.
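As an illustration of a continuous monitoring metric, the following minimal Python sketch computes a completeness rate and raises an alert when it drops below a threshold. The function names, the field names, and the 0.95 threshold are assumptions made for the example and do not come from any particular tool.

```python
def completeness_rate(records, fields):
    """Fraction of the expected field values that are present (not None or empty)."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0  # nothing to check, so nothing is missing
    filled = sum(
        1 for record in records for field in fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def check_quality(records, fields, threshold=0.95):
    """Return the metric and whether it calls for action (threshold is hypothetical)."""
    rate = completeness_rate(records, fields)
    return {"completeness": rate, "alert": rate < threshold}
```

Running such a check on a schedule would give the team an early signal that data quality is degrading, so that corrective action can be taken quickly.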
- Actions to be performed to optimize the selection of records and improve overall database performance, based on quantitative data quality assessment.
Automated controls can be applied in the design stage of the SDLC, where input, processing, and output controls are employed to ensure the security, reliability, and integrity of the system and its datasets (Chikkerur et al., 2011). For instance, duplicate information and blank fields are avoided with input controls such as duplication checks and completeness checks. Automated process controls, on the other hand, monitor whether the system processes and records information correctly. Error detection, process design, and process control are some of the quality management techniques that can be used to improve quality assessment.
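The two input controls named above can be sketched in a few lines of Python. This is a minimal illustration under assumed names: the required fields (`customer_id`, `email`) and the function names are hypothetical, and a real system would normally enforce the same rules with database constraints as well.

```python
REQUIRED_FIELDS = ("customer_id", "email")  # hypothetical schema for the example


def completeness_errors(record):
    """Completeness check: list the required fields that are missing or blank."""
    return [
        field for field in REQUIRED_FIELDS
        if record.get(field) is None or str(record.get(field)).strip() == ""
    ]


def accept_records(records):
    """Apply the completeness and duplication checks; return (accepted, rejected)."""
    seen_ids = set()
    accepted, rejected = [], []
    for record in records:
        missing = completeness_errors(record)
        if missing:
            rejected.append((record, "blank fields: " + ", ".join(missing)))
        elif record["customer_id"] in seen_ids:
            # Duplication check: the key has already been accepted once.
            rejected.append((record, "duplicate customer_id"))
        else:
            seen_ids.add(record["customer_id"])
            accepted.append(record)
    return accepted, rejected
```

Rejected records are returned with a reason rather than silently dropped, so that the errors can be corrected at the source.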
- Three maintenance plans, together with three activities to be performed to improve data quality.
Three maintenance plans. The corrective plan, the preventive plan, and the maintenance plan are vital for improving data quality. The corrective plan is carried out after a defect has been observed, whereas the preventive plan is a precaution put in place to avoid errors that may emerge. The maintenance plan, on the other hand, involves the day-to-day servicing of the entire system.
Three activities to be performed to improve data quality.
Error detection and correction. These are activities that can be performed while improving data quality. Missing values are checked, the available data is compared against a correct baseline, and the timestamp associated with the current data is examined. The complexity of the data, such as its processing stages, outputs, and inputs, is considered when implementing corrective policies.
Process control and improvement. The quality requirements of data are defined through Total Data Quality Management (TDQM), a methodology that results in analyzed and improved data. The techniques that support TDQM are quality dimension visualization, systematic demonstration of data, and quality improvement optimization.
Process design. Here, new data processes are built and existing ones are redesigned in order to eliminate or reduce data errors. The quality of data therefore improves because the root causes of defects are eliminated.
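The three checks listed under error detection and correction (missing values, comparison against a baseline, and timestamp examination) can be illustrated with a short Python sketch. The record layout, the `updated_at` field, and the one-day freshness limit are assumptions made for the example.

```python
from datetime import datetime, timedelta


def detect_errors(record, baseline, now, max_age=timedelta(days=1)):
    """Return a list of issues found in one record.

    record and baseline are dicts; the record is assumed to carry an
    'updated_at' datetime (a hypothetical field name).
    """
    issues = []
    for field, expected in baseline.items():
        value = record.get(field)
        if value is None:
            issues.append(f"missing value: {field}")
        elif value != expected:
            issues.append(f"mismatch with baseline: {field}")
    updated_at = record.get("updated_at")
    if updated_at is None:
        issues.append("missing value: updated_at")
    elif now - updated_at > max_age:
        issues.append("stale timestamp")
    return issues
```

Passing `now` in as a parameter, rather than reading the system clock inside the function, keeps the check repeatable when it is run as part of a test suite.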
- (i) The most efficient method for planning proactive concurrency control together with lock granularities, and how it minimizes security risks on a database in a multiuser environment.
When multiple users execute tasks simultaneously, conflicts arise and result in inconsistencies. Granular locking schemes lock cells, rows, pages, or even whole tables. High and low granularity are the two approaches used to keep distributed databases consistent. High granularity, which locks small units such as rows, attains maximum concurrency but needs additional overhead, whereas low granularity, which locks large units such as tables, reduces concurrency but requires minimal overhead. Proactive concurrency control is attained within the system by accepting that extra overhead and applying lock granularity at different levels of the object-oriented hierarchy (Cowling and Liskov, 2012).
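The trade-off between the two granularities can be seen in a toy lock manager. The Python sketch below is a simplified illustration, not the mechanism of any particular database: the class and method names are hypothetical, and only exclusive locks are modeled.

```python
class LockManager:
    """Toy lock table; granularity is 'table' (coarse) or 'row' (fine)."""

    def __init__(self, granularity):
        if granularity not in ("table", "row"):
            raise ValueError("granularity must be 'table' or 'row'")
        self.granularity = granularity
        self.held = {}  # lock key -> owning transaction

    def _key(self, table, row_id):
        # A table-level lock ignores the row; a row-level lock includes it.
        return (table,) if self.granularity == "table" else (table, row_id)

    def acquire(self, txn, table, row_id):
        """Try to take an exclusive lock; return False on conflict."""
        key = self._key(table, row_id)
        owner = self.held.get(key)
        if owner is not None and owner != txn:
            return False  # the caller would have to wait or abort
        self.held[key] = txn
        return True

    def release_all(self, txn):
        """Release every lock held by the transaction, as at commit."""
        self.held = {k: o for k, o in self.held.items() if o != txn}
```

With `LockManager("table")`, two transactions that touch different rows of the same table conflict, but only one lock entry is tracked. With `LockManager("row")`, both proceed concurrently, at the cost of one lock entry per row.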
(ii) How to avoid record-level locking of a database that is in use by current transactions, while employing the verify method to plan a system more effectively.
In a multiuser database that executes transactions simultaneously, consistency and concurrency have to be controlled well so that consistent results are obtained. The serializability model, a transaction isolation model, makes it appear as though the transactions execute one at a time. With a multi-version consistency model, each user is given a separate view of the real-time data, so record-level locking does not interfere with the database.
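The idea behind a multi-version consistency model can be sketched as follows. This is a deliberately minimal Python illustration with hypothetical names: it shows how readers see a stable snapshot without taking record locks, and it leaves out write-write conflict detection and the cleanup of old versions.

```python
class MVCCStore:
    """Toy multi-version store: writers add versions, readers use snapshots."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # logical commit counter

    def begin(self):
        """Return a snapshot timestamp for a new reader."""
        return self.clock

    def write(self, key, value):
        """Commit a new version; older versions stay visible to earlier snapshots."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before the snapshot."""
        result = None
        for commit_ts, value in self.versions.get(key, []):
            if commit_ts <= snapshot_ts:
                result = value
            else:
                break
        return result
```

A reader that called `begin()` before a later `write()` keeps seeing the older value, so the writer never has to wait for the reader and the reader never blocks the writer.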
Centralized Versus Decentralized Database Management System
Challenges that come with big data. Big data generally refers to a massive amount of structured or unstructured data that is so large that it is difficult to process with traditional software techniques and databases. The processing capacity currently available struggles to handle both its volume and its speed. Big data brings challenges that have not been an issue for traditionally designed databases such as relational ones (Özsu and Valduriez, 2011). The first is that big data is stored on clusters of servers, each of which holds a slice of the data. Applications communicate with many nodes in these clusters. This makes big data hard to protect, since the whole data center has to be secured rather than a single server.
The other challenge with big data is that it lacks a standard cluster. Tuple stores, wide columnar stores, and graph databases are just a few of the more than 150 data variants available in big data, each with a unique specialization. Components can be swapped among many of these variants: the resource manager, data model, data access layer, and orchestration tools, among others, are interchangeable. These platforms were built with performance and scalability as the building blocks, and security was not considered. As a result, their capabilities are limited compared with those of commercial distributions.
When it comes to compatibility between big data and existing traditional tools, a number of traditional tools do not fit or work well with big data technologies. The capabilities of traditional products are outpaced by the velocity, multi-node design, sheer scale, and variety of big data. Some forms of security, such as masking, row-level encryption, and packet analysis, are also difficult to scale. Other forms of security, such as query monitoring and content filtering, generally do not work at all.
How NoSQL addresses these challenges. The term NoSQL distinguishes these platforms from relational databases and is commonly read as “Not Only SQL”. The best-known way that NoSQL approaches big data security issues is the “walled garden” security model. In this approach, the entire cluster is placed on a separate network, and logical access to it is controlled through access controls and firewalls. This means that there is no security within the NoSQL database itself, only in the outer protective shell of applications and network around the database (Hecht and Jablonski, 2011). It is a cost-effective and simple approach, but only for organizations that are not greatly concerned about security.
The other way NoSQL approaches big data challenges is through third-party products, or by leveraging security tools that are built into the NoSQL cluster. These tools include Kerberos for node authentication, SSL or TLS for securing communication, and transparent encryption for data-at-rest security, among others. The only setback is that they do not control rogue administrators, although this is the most comprehensive and effective approach as far as NoSQL is concerned.
Data-centric security is another NoSQL security model, known for protecting data before it moves into a larger data repository. This is done with basic tools such as masking, tokenization, and data element encryption. The data-centric security model is employed when the system tasked with processing the data cannot be fully trusted. This implies that many enterprises do not trust big data clusters to keep their information. The controls are defined on the data before any attempt is made to move it.
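Masking and tokenization applied before data leaves a trusted system might look like the Python sketch below. It is an illustration only: the `TokenVault` class, the field names, and the masking rule are hypothetical, and a production vault would be a hardened, persistent service rather than an in-memory dictionary.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory vault that maps tokens back to original values."""

    def __init__(self):
        self._by_value = {}
        self._by_token = {}

    def tokenize(self, value):
        """Replace a sensitive value with a random token (stable per value)."""
        if value not in self._by_value:
            token = "tok_" + secrets.token_hex(8)
            self._by_value[value] = token
            self._by_token[token] = value
        return self._by_value[value]

    def detokenize(self, token):
        """Recover the original value; only the trusted side holds the vault."""
        return self._by_token[token]


def mask_email(email):
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * max(len(local) - 1, 0) + "@" + domain


def protect(record, vault):
    """Return a copy of the record that is safe to send to the cluster."""
    safe = dict(record)
    safe["card_number"] = vault.tokenize(record["card_number"])
    safe["email"] = mask_email(record["email"])
    return safe
```

Because the cluster only ever receives the token and the masked address, a compromise of the cluster does not expose the original values.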
NoSQL data models. Denormalization is one of the NoSQL data models. It entails copying the same data into multiple tables or documents in order to simplify querying or to fit a user's records into a particular data model. Its advantage is that the data needed to process a query is grouped in one place, which simplifies query processing. In traditional databases, normalization at modeling time leads to extra work at query time, which adds complexity for the query processor. Denormalization instead stores data in query-friendly structures, so query processing is simple.
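A small Python sketch can make the denormalization idea concrete. The customer and order structures below are hypothetical: customer attributes are copied into each order document at write time, so that a later query needs no lookup in a second collection.

```python
def denormalize(orders, customers):
    """Copy the customer attributes into each order document at write time."""
    docs = []
    for order in orders:
        customer = customers[order["customer_id"]]
        docs.append({
            "order_id": order["order_id"],
            "total": order["total"],
            "customer_name": customer["name"],
            "customer_city": customer["city"],
        })
    return docs


def totals_by_city(order_docs):
    """Answer the query from the denormalized documents alone, with no join."""
    totals = {}
    for doc in order_docs:
        city = doc["customer_city"]
        totals[city] = totals.get(city, 0) + doc["total"]
    return totals
```

The price of this convenience is duplication: if a customer moves to another city, every order document that carries the old value has to be updated.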
Benefits of NoSQL. The main purpose of denormalization is to tune a database to fit a particular application. Online Analytical Processing (OLAP) applications, such as financial reporting, business reporting, sales, and budgeting, benefit most from denormalization. This is because they extract data that has been kept over a long period. Here, denormalization helps by avoiding joins in the database, reducing the number of tables and foreign keys, and allowing a star alteration method.
Business Intelligence Tools
These are application software tasked with retrieving, transforming, reporting, and analyzing data from internal or external systems (Turban et al., 2010). Several such tools can be used to report business performance, such as the ones discussed below.
Actuate Business Intelligence and Reporting Tools (BIRT). BIRT has the advantage of being open source and written purely in Java, and it can publish reports drawn from multiple data sources, ranging from XML and business relational databases to in-memory Java objects. It also includes a Java runtime component. Its features include a single view of all data, user-friendliness, best-practice analytical techniques, enterprise reporting, performance indicators, and fast performance, among others.
Systems, Applications and Products in Data Processing (SAP) business intelligence. This is known as an enterprise-level application, typically used for open client and server systems. It is currently rated highly by organizations because of its portability and the quality of the services it provides. Its features include a simple data warehouse architecture, flexibility, applications that are compatible with any system, ease of use owing to its modular concept, support for both cloud and on-premise deployment, and, best of all, easy integration with both SAP and non-SAP applications. It has special add-ins that play a vital role in business performance reporting, such as Excel add-ins, and it works with other BI platforms such as arcplan, Cognos, and QlikView, among others.
Cost estimation. An individual or a company has to weigh several important considerations before purchasing any business intelligence product. For instance, when a company intends to buy one of the business intelligence tools discussed above, aspects such as functionality, integration capabilities, and the benefits that the product will bring to the company have to be considered. BIRT, for instance, is estimated to be one of the most expensive business intelligence tools currently available. It is estimated to cost around 20,000 dollars a year for a company to get the full set of services it offers. SAP, on the other hand, is much cheaper than BIRT, since it is estimated to cost 3,213 dollars a year for a professional license.
The functionality of BIRT, considering its price, is more complex, and the cost of training users is incorporated in the pricing of the software. Certain configurations have to be made to fit the buyer's business before the product is released to the buyer. Integration with BIRT is also complex because it runs on the Java platform only, which makes it expensive for the buyer. SAP, on the other hand, is a tool that integrates easily into multiple environments, and it works in any browser. This feature lowers its cost. The functionality of SAP is commendable because of its ease of use and portability, which is why the vendors did not include the cost of training and integration in the price of the product. Most enterprises are therefore choosing SAP because of the benefits it offers compared with BIRT. An organization does not require special servers and technicians to maintain and integrate SAP, unlike the requirements that come with BIRT.
References
Bassil, Y. (2012). A simulation model for the waterfall software development life cycle. arXiv preprint arXiv:1205.6904.
Chikkerur, S., Sundaram, V., Reisslein, M., & Karam, L. J. (2011). Objective video quality assessment methods: A classification, review, and performance comparison. IEEE Transactions on Broadcasting, 57(2), 165.
Cowling, J. A., & Liskov, B. (2012, June). Granola: Low-Overhead Distributed Transaction Coordination. In USENIX Annual Technical Conference (Vol. 12).
Hecht, R., & Jablonski, S. (2011, December). NoSQL evaluation: A use case oriented survey. In Cloud and Service Computing (CSC), 2011 International Conference on (pp. 336-341). IEEE.
Özsu, M. T., & Valduriez, P. (2011). Principles of distributed database systems. Springer Science & Business Media.
Turban, E., Sharda, R., & Delen, D. (2010). Decision Support and Business Intelligence Systems.