
Monday, October 29, 2012

Team Work..!


Friday, September 28, 2012


Bill Inmon vs. Ralph Kimball
 
In the data warehousing field, we often hear discussions about whether a person's or an organization's philosophy falls into Bill Inmon's camp or into Ralph Kimball's camp. We describe the difference between the two below.

Bill Inmon's paradigm: The data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form.

Ralph Kimball's paradigm: The data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.

There is no right or wrong between these two ideas, as they represent different data warehousing philosophies. In reality, the data warehouses in most enterprises are closer to Ralph Kimball's idea. This is because most data warehouses started out as a departmental effort, and hence they originated as data marts. Only when more data marts are built later do they evolve into a data warehouse.

Source: http://www.1keydata.com/datawarehousing/inmon-kimball.html



Ethical hacker?

An ethical hacker is a computer and network expert who attacks a security system on behalf of its owners, seeking vulnerabilities that a malicious hacker could exploit. To test a security system, ethical hackers use the same methods as their less principled counterparts, but report problems instead of taking advantage of them. Ethical hacking is also known as penetration testing, intrusion testing and red teaming. An ethical hacker is sometimes called a white hat, a term that comes from old Western movies, where the "good guy" wore a white hat and the "bad guy" wore a black hat. 

One of the first examples of ethical hackers at work was in the 1970s, when the United States government used groups of experts called red teams to hack its own computer systems. According to Ed Skoudis, Vice President of Security Strategy for Predictive Systems' Global Integrity consulting practice, ethical hacking has continued to grow in an otherwise lackluster IT industry, and is becoming increasingly common outside the government and technology sectors where it began. Many large companies, such as IBM, maintain employee teams of ethical hackers.
In a similar but distinct category, a hacktivist is more of a vigilante: detecting, sometimes reporting (and sometimes exploiting) security vulnerabilities as a form of social activism.

 

Data Vault Modeling

 Data Vault Modeling is a database modeling method that is designed to provide historical storage of data coming in from multiple operational systems. It is also a method of looking at historical data that, apart from the modeling aspect, deals with issues such as auditing, tracing of data, loading speed and resilience to change.

Data Vault Modeling focuses on several things. First, it emphasizes the need to trace where all the data in the database came from. Second, it makes no distinction between good and bad data ("bad" meaning not conforming to business rules),[1] leading to "a single version of the facts" versus "a single version of the truth",[2] also expressed by Dan Linstedt as "all the data, all of the time". Third, the modeling method is designed to be resilient to change in the business environment where the data being stored is coming from, by explicitly separating structural information from descriptive attributes.[3] Finally, Data Vault is designed to enable parallel loading as much as possible,[4] so that you can scale out for very large implementations.
An alternative (and seldom used) name for the method is "Common Foundational Integration Modelling Architecture."[5]


Basic notions

Data Vault attempts to solve the problem of dealing with change in the environment by separating the business keys (that do not mutate as often, because they uniquely identify a business entity) and the associations between those business keys, from the descriptive attributes of those keys.
The business keys and their associations are structural attributes, forming the skeleton of the data model. The Data Vault method has as one of its main axioms that real business keys only change when the business changes and are therefore the most stable elements from which to derive the structure of a historical database. If you use these keys as the backbone of a Data Warehouse, you can organize the rest of the data around them. This means that choosing the correct keys for the Hubs is of prime importance for the stability of your model.[13] The keys are stored in tables with a few constraints on the structure. These key-tables are called Hubs.
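A minimal sketch of this separation, using Python records as stand-ins for the relational tables (the entity, field names and values here are illustrative assumptions, not anything prescribed by the method):

# Minimal sketch: the Hub keeps only the stable business key plus
# bookkeeping fields, while a Satellite carries the changeable descriptive
# attributes and points back to the Hub via its surrogate key.
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Hub:
    surrogate_key: int      # connects other structures to this Hub
    business_key: str       # stable identifier of the business entity
    record_source: str      # system the key was first loaded from
    load_dts: datetime      # when the key was loaded


@dataclass(frozen=True)
class Satellite:
    hub_surrogate_key: int  # link back to the owning Hub
    load_dts: datetime      # one row per change, preserving history
    attributes: dict        # descriptive attributes that may change over time


# The business key never changes; a change in the description is recorded
# by appending a new Satellite row rather than updating the Hub.
customer_hub = Hub(1, "CUST-000042", "CRM", datetime(2012, 9, 28))
history = [
    Satellite(1, datetime(2012, 9, 28), {"name": "Acme Ltd", "city": "Oslo"}),
    Satellite(1, datetime(2012, 10, 29), {"name": "Acme Ltd", "city": "Bergen"}),
]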

Hubs

Hubs contain a list of unique business keys with low propensity to change. Hubs also contain a surrogate key for each Hub item and metadata describing the origin of the business key. The descriptive attributes for the information on the Hub (such as the description for the key, possibly in multiple languages) are stored in structures called Satellite tables which will be discussed below.
The Hub contains at least the following fields:[14]
  • a surrogate key, used to connect the other structures to this table.
  • a business key, the driver for this hub. The business key can consist of multiple fields.
  • the record source, which can be used to see where the business keys come from and whether the primary loading system also has all of the keys that are available in other systems.
  • optionally, you can also have metadata fields with information about manual updates (user/time) and the extraction date.
A Hub is not allowed to contain multiple business keys, except when two systems deliver the same business key but with collisions that have different meanings.
Hubs should normally have at least one satellite.[14]

Hub example

This is an example of a Hub table containing cars, called "Car" (H_CAR). The driving key is the Vehicle Identification Number (VIN).
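A rough sketch of how unique VINs could be collected into such a Hub, with illustrative field and system names (this is an assumption-laden illustration, not the Data Vault loading standard):

# Illustrative sketch of loading business keys into an H_CAR Hub: a VIN is
# added only the first time it is seen, so the Hub remains a list of unique
# business keys no matter how many source systems deliver it.
from datetime import datetime

h_car = {}  # vin -> Hub row, standing in for the H_CAR table


def load_hub_key(vin: str, record_source: str) -> None:
    if vin in h_car:  # key already known: the Hub needs no update
        return
    h_car[vin] = {
        "car_sqn": len(h_car) + 1,       # surrogate key
        "vin": vin,                      # business key (Vehicle Identification Number)
        "record_source": record_source,  # first system that delivered the key
        "load_dts": datetime.utcnow(),   # extraction / load timestamp
    }


load_hub_key("1HGCM82633A004352", "DEALER_SYSTEM")
load_hub_key("1HGCM82633A004352", "INSURANCE_SYSTEM")  # duplicate key: ignored
print(len(h_car))  # 1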

 

Tuesday, September 25, 2012

How Quantum Computers Work

The massive amount of processing power generated by computer manufacturers has not yet been able to quench our thirst for speed and computing capacity. In 1947, American computer engineer Howard Aiken said that just six electronic digital computers would satisfy the computing needs of the United States. Others have made similar errant predictions about the amount of computing power that would support our growing technological needs. Of course, Aiken didn't count on the large amounts of data generated by scientific research, the proliferation of personal computers or the emergence of the Internet, which have only fueled our need for more, more and more computing power.

Will we ever have the amount of computing power we need or want? If, as Moore's Law states, the number of transistors on a microprocessor continues to double every 18 months, the year 2020 or 2030 will find the circuits on a microprocessor measured on an atomic scale. And the logical next step will be to create quantum computers, which will harness the power of atoms and molecules to perform memory and processing tasks. Quantum computers have the potential to perform certain calculations significantly faster than any silicon-based computer.
Scientists have already built basic quantum computers that can perform certain calculations; but a practical quantum computer is still years away. In this article, you'll learn what a quantum computer is and just what it'll be used for in the next era of computing.
You don't have to go back too far to find the origins of quantum computing. While computers have been around for the majority of the 20th century, quantum computing was first theorized less than 30 years ago, by a physicist at the Argonne National Laboratory. Paul Benioff is credited with first applying quantum theory to computers in 1981, when he theorized about creating a quantum Turing machine. Most digital computers, like the one you are using to read this article, are based on the Turing theory.

Source: http://www.howstuffworks.com/quantum-computer.htm

How users see the programmers :)



Wednesday, August 22, 2012


Data Profiling

Data profiling, also called data archeology, is the statistical analysis and assessment of the quality of data values within a data set for consistency, uniqueness and logic.  

The insight gained by data profiling can be used to determine how difficult it will be to use existing data for other purposes. It can also be used to provide metrics to assess data quality and determine whether or not metadata accurately describes the actual values in the source data. The data profiling process cannot identify inaccurate data; it can only identify business rule violations and anomalies.
Profiling tools evaluate the actual content, structure and quality of the data by exploring relationships that exist between value collections both within and across data sets. For example, by examining the frequency distribution of different values for each column in a table, an analyst can gain insight into the type and use of each column. Cross-column analysis can be used to expose embedded value dependencies, and inter-table analysis allows the analyst to discover overlapping value sets that represent foreign key relationships between entities.
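As a rough illustration of that kind of column-level profiling, the sketch below (the table and all of its values are invented) computes the value frequency distribution and a simple uniqueness ratio per column, the sort of statistics an analyst would inspect:

# Simple column-profiling sketch: value frequency distribution and a
# uniqueness ratio for each column of a small in-memory table.
# The table and its values are made up for illustration.
from collections import Counter

rows = [
    {"customer_id": "C001", "country": "NO", "status": "active"},
    {"customer_id": "C002", "country": "NO", "status": "inactive"},
    {"customer_id": "C003", "country": "SE", "status": "active"},
    {"customer_id": "C004", "country": "SE", "status": "active"},
]

for column in rows[0]:
    values = [row[column] for row in rows]
    freq = Counter(values)
    uniqueness = len(freq) / len(values)  # 1.0 hints at a candidate key
    print(f"{column}: distinct={len(freq)}, "
          f"uniqueness={uniqueness:.2f}, top={freq.most_common(2)}")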

Source: http://searchdatamanagement.techtarget.com/definition/data-profiling