ASHOKA SAMRAT: SQL Questions

1) General Questions on SQL SERVER

What is RDBMS?

Relational Database Management Systems (RDBMS) are database management systems that maintain data records and indices in tables. Relationships may be created and maintained across and among the data and tables. In a relational database, relationships between data items are expressed by means of tables. Interdependencies among these tables are expressed by data values rather than by pointers. This allows a high degree of data independence. An RDBMS has the capability to recombine the data items from different files, providing powerful tools for data usage.

What are the Properties of the Relational Tables?

Relational tables have the following six properties:

Values are atomic.
Column values are of the same kind.
Each row is unique.
The sequence of columns is insignificant.
The sequence of rows is insignificant.
Each column must have a unique name.

What is Normalization?

Database normalization is a data design and organization process applied to data structures, based on rules that help build relational databases. In relational database design, the process of organizing data to minimize redundancy is called normalization. Normalization usually involves dividing a database into two or more tables and defining relationships between the tables. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.
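
The idea can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SQL Server so that it runs anywhere, and the customers and orders tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design: the city is stored once per customer,
    -- not repeated on every order row.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Asha', 'Pune');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 99.5);
""")

# The change of city is made in exactly one row of one table...
conn.execute("UPDATE customers SET city = 'Mumbai' WHERE customer_id = 1")

# ...and every order sees it through the defined relationship.
order_cities = conn.execute("""
    SELECT o.order_id, c.city
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(order_cities)
```

Had the city been stored on each order row instead, the same change would have required updating every order of that customer.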

What is De-normalization?

De-normalization is the process of attempting to optimize the performance of a database by adding redundant data. It is sometimes necessary because current DBMSs implement the relational model poorly. A true relational DBMS would allow for a fully normalized database at the logical level, while providing physical storage of data that is tuned for high performance. De-normalization is a technique to move from higher to lower normal forms of database modeling in order to speed up database access.

How is ACID property related to Database?

ACID (an acronym for Atomicity, Consistency, Isolation, Durability) is a concept that database professionals generally look for while evaluating databases and application architectures. For a reliable database, all four of these attributes should be achieved.

Atomicity is an all-or-none proposition.

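Consistency guarantees that a transaction never leaves your database in a half-finished state: every committed transaction takes the data from one valid state to another, with all constraints satisfied.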

Isolation keeps transactions separated from each other until they are finished.

Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination.
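
Atomicity is the easiest of the four to demonstrate. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server; the accounts table and the transfer helper are hypothetical names used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two accounts as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit is undone too.
        return False

ok = transfer(conn, 1, 2, 500)  # would overdraw account 1
balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
print(ok, balances)
```

The credit to the second account succeeds on its own, but because the debit fails, the whole transaction is rolled back and neither balance changes.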

What are the Different Normalization Forms?

1NF: Eliminate Repeating Groups

Make a separate table for each set of related attributes, and give each table a primary key. Each field contains at most one value from its attribute domain.

2NF: Eliminate Redundant Data

If an attribute depends on only part of a composite (multi-column) key, then remove it to a separate table.

3NF: Eliminate Columns Not Dependent On Key

If attributes do not contribute to a description of the key, then remove them to a separate table. All attributes must be directly dependent on the primary key.

BCNF: Boyce-Codd Normal Form

If there are non-trivial dependencies between candidate key attributes, then separate them out into distinct tables.

4NF: Isolate Independent Multiple Relationships

No table may contain two or more 1:n or n:m relationships that are not directly related.

5NF: Isolate Semantically Related Multiple Relationships

There may be practical constraints on information that justify separating logically related many-to-many relationships.

ONF: Optimal Normal Form

A model limited to only simple (elemental) facts, as expressed in Object Role Model notation.

DKNF: Domain-Key Normal Form

A model free from all modification anomalies is said to be in DKNF.

Remember, these normalization guidelines are cumulative. For a database to be in 3NF, it must first fulfill all the criteria of a 2NF and 1NF database.

What is a Stored Procedure?

A stored procedure is a named group of SQL statements that have been previously created and stored in the server database. Stored procedures accept input parameters so that a single procedure can be used over the network by several clients using different input data. And when the procedure is modified, all clients automatically get the new version. Stored procedures reduce network traffic and improve performance. Stored procedures can be used to help ensure the integrity of the database.

Examples of system stored procedures include sp_helpdb, sp_renamedb and sp_depends.

What is a Trigger?

A trigger is a SQL procedure that initiates an action when an event (INSERT, DELETE or UPDATE) occurs. Triggers are stored in and managed by the DBMS. Triggers are used to maintain the referential integrity of data by changing the data in a systematic fashion. A trigger cannot be called or executed directly; the DBMS automatically fires the trigger as a result of a data modification to the associated table. Triggers can be considered to be similar to stored procedures in that both consist of procedural logic that is stored at the database level. Stored procedures, however, are not event-driven and are not attached to a specific table as triggers are. Stored procedures are explicitly executed with EXECUTE (or CALL in some other DBMSs), while triggers are implicitly executed. In addition, triggers can also execute stored procedures.

Nested Trigger: A trigger can also contain INSERT, UPDATE and DELETE logic within itself; so when the trigger is fired because of data modification, it can also cause another data modification, thereby firing another trigger. A trigger that contains data modification logic within itself is called a nested trigger.
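
A trigger that fires implicitly on a data modification can be sketched as follows. The example assumes Python's built-in sqlite3 module as a stand-in for SQL Server, and the table and trigger names are hypothetical. Note one difference: SQLite triggers run once per row and expose OLD and NEW, whereas SQL Server triggers run once per statement and expose the inserted and deleted tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL NOT NULL);
    CREATE TABLE price_audit (product_id INTEGER, old_price REAL, new_price REAL);

    -- Fired automatically after every UPDATE of products.price;
    -- it is never called by name.
    CREATE TRIGGER trg_price_audit
    AFTER UPDATE OF price ON products
    BEGIN
        INSERT INTO price_audit VALUES (OLD.id, OLD.price, NEW.price);
    END;

    INSERT INTO products VALUES (1, 10.0);
""")

# The application only issues the UPDATE; the audit row is written by the trigger.
conn.execute("UPDATE products SET price = 12.5 WHERE id = 1")
audit_rows = conn.execute("SELECT * FROM price_audit").fetchall()
print(audit_rows)
```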

What are the Different Types of Triggers?

There are two types of Triggers.

1) DML Trigger

There are two types of DML Triggers

1. Instead of Trigger
Instead of Triggers are fired in place of the triggering action such as an insert, update, or delete.

2. After Trigger
After triggers execute following the triggering action, such as an insert, update, or delete.

2) DDL Trigger

This type of trigger is fired against Drop Table, Create Table, Alter Table or Login events. DDL Triggers are always After Triggers.

What is a View?

A simple view can be thought of as a subset of a table. It can be used for retrieving data as well as updating or deleting rows. Rows updated or deleted in the view are updated or deleted in the table the view was created from. As data in the original table changes, so does the data in the view, because a view is simply a way of looking at part of the original table. The results of using a view are not permanently stored in the database. The data accessed through a view is constructed using a standard T-SQL SELECT statement and can come from one or many base tables or even other views.
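
The point that a view stores a query rather than data can be sketched as follows, assuming Python's built-in sqlite3 module as a stand-in for SQL Server and hypothetical table and view names. SQLite views are read-only, so only the retrieval side is shown; updating through a view, as described above for SQL Server, is not covered by this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
    INSERT INTO employees VALUES (1, 'Asha', 'Sales', 50000), (2, 'Ravi', 'IT', 65000);

    -- The view stores only this SELECT, not a copy of the rows.
    CREATE VIEW sales_staff AS
        SELECT id, name FROM employees WHERE dept = 'Sales';
""")

before = conn.execute("SELECT name FROM sales_staff ORDER BY id").fetchall()

# Change the base table; the view is not touched.
conn.execute("INSERT INTO employees VALUES (3, 'Meera', 'Sales', 52000)")

after = conn.execute("SELECT name FROM sales_staff ORDER BY id").fetchall()
print(before, after)
```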

What is an Index?

An index is a physical structure containing pointers to the data. Indexes are created on an existing table to locate rows more quickly and efficiently. It is possible to create an index on one or more columns of a table, and each index is given a name. The users cannot see the indexes; they are just used to speed up queries. Effective indexes are one of the best ways to improve performance in a database application. A table scan happens when there is no index available to help a query. In a table scan, SQL Server examines every row in the table to satisfy the query results. Table scans are sometimes unavoidable, but on large tables, scans have a severe impact on performance.
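
The difference between a table scan and an index lookup can be observed in a query plan. The sketch below assumes Python's built-in sqlite3 module as a stand-in for SQL Server and uses SQLite's EXPLAIN QUERY PLAN (SQL Server has its own execution plan tools); the table, index and helper names are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE customer_id = 42"

plan_before = plan(query)  # no usable index yet: the plan reports a scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)   # the plan now names the new index

print(plan_before)
print(plan_after)
```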

What is a Linked Server?

A linked server is a SQL Server feature that lets you register another SQL Server instance with the local server and query the databases on both servers using T-SQL statements. With a linked server, you can create very clean, easy-to-follow SQL statements that allow remote data to be retrieved, joined and combined with local data. The stored procedures sp_addlinkedserver and sp_addlinkedsrvlogin are used to add a new linked server.

What is a Cursor?

A cursor is a database object used by applications to manipulate data in a set on a row-by-row basis, instead of the typical SQL commands that operate on all the rows in the set at one time.

In order to work with a cursor, we need to perform some steps in the following order:

Declare cursor
Open cursor
Fetch row from the cursor
Process fetched row
Close cursor
Deallocate cursor
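
The steps above can be loosely illustrated with a client-side cursor. This is an analogy rather than T-SQL: it assumes Python's built-in sqlite3 module, whose cursor object has no separate declare or deallocate step, and the items table is hypothetical. In T-SQL the corresponding statements are DECLARE CURSOR, OPEN, FETCH NEXT, CLOSE and DEALLOCATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, qty INTEGER);
    INSERT INTO items VALUES (1, 5), (2, 0), (3, 12);
""")

cur = conn.cursor()                                   # roughly: declare
cur.execute("SELECT id, qty FROM items ORDER BY id")  # roughly: open
in_stock = []
row = cur.fetchone()                                  # fetch the first row
while row is not None:
    item_id, qty = row
    if qty > 0:                                       # process the fetched row
        in_stock.append(item_id)
    row = cur.fetchone()                              # fetch the next row
cur.close()                                           # close (and release)

# The set-based equivalent does the same work in one statement.
set_based = [r[0] for r in conn.execute("SELECT id FROM items WHERE qty > 0 ORDER BY id")]
print(in_stock, set_based)
```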

What is Collation?

Collation refers to a set of rules that determine how data is sorted and compared. Character data is sorted using rules that define the correct character sequence with options for specifying case sensitivity, accent marks, Kana character types, and character width.

What is the Difference between a Function and a Stored Procedure?

A user-defined function (UDF) can be used in SQL statements anywhere in the WHERE/HAVING/SELECT section, whereas stored procedures cannot. UDFs that return tables can be treated as another rowset and can be used in JOINs with other tables. Inline UDFs can be thought of as views that take parameters and can be used in JOINs and other rowset operations.

What is a Subquery? Explain the Properties of a Subquery.

Subqueries are often referred to as sub-selects as they allow a SELECT statement to be executed arbitrarily within the body of another SQL statement. A subquery is executed by enclosing it in a set of parentheses. Subqueries are generally used to return a single row as an atomic value although they may be used to compare values against multiple rows with the IN keyword.

A subquery is a SELECT statement that is nested within another T-SQL statement. A subquery SELECT statement if executed independently of the T-SQL statement, in which it is nested, will return a resultset. This implies that a subquery SELECT statement can stand alone, and it does not depend on the statement in which it is nested. A subquery SELECT statement can return any number of values and can be found in the column list of a SELECT statement, and FROM, GROUP BY, HAVING, and/or ORDER BY clauses of a T-SQL statement. A subquery can also be used as a parameter to a function call. Basically, a subquery can be used anywhere an expression can be used.
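
Both uses mentioned above, a subquery returning a single value and a subquery feeding IN, can be sketched as follows. The example assumes Python's built-in sqlite3 module as a stand-in for SQL Server, and the tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, salary INTEGER);
    INSERT INTO departments VALUES (1, 'Pune'), (2, 'Delhi');
    INSERT INTO employees VALUES
        (1, 'Asha', 1, 50000), (2, 'Ravi', 2, 65000), (3, 'Meera', 1, 72000);
""")

# Scalar subquery: the inner SELECT returns one value (the average salary).
above_average = conn.execute("""
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

# Multi-row subquery: the inner SELECT returns a list of values for IN.
in_pune = conn.execute("""
    SELECT name FROM employees
    WHERE dept_id IN (SELECT dept_id FROM departments WHERE city = 'Pune')
    ORDER BY name
""").fetchall()

print(above_average, in_pune)
```

Each inner SELECT is enclosed in parentheses and could also be run on its own.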

What are Different Types of Join?

Cross Join

A cross join that does not have a WHERE clause produces the Cartesian product of the tables involved in the join. The size of a Cartesian product result set is the number of rows in the first table multiplied by the number of rows in the second table. A common example is when a company wants to combine each product with a pricing table to analyze each product at each price.

Inner Join

A join that displays only the rows that have a match in both joined tables is known as inner Join. This is the default type of join in the Query and View Designer.

Outer Join

A join that includes rows even if they do not have related rows in the joined table is an Outer Join. You can create three different outer joins to specify the unmatched rows to be included:

Left Outer Join: In Left Outer Join, all the rows in the first-named table, i.e. “left” table, which appears leftmost in the JOIN clause, are included. Unmatched rows in the right table do not appear.

Right Outer Join: In Right Outer Join, all the rows in the second-named table, i.e. “right” table, which appears rightmost in the JOIN clause are included. Unmatched rows in the left table are not included.

Full Outer Join: In Full Outer Join, all the rows in all joined tables are included, whether they are matched or not.
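
The cross, inner and left outer joins described above can be compared side by side. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server and hypothetical customers and orders tables. Right and full outer joins are left out only because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
    INSERT INTO orders VALUES (10, 1, 250), (11, 1, 100), (12, 2, 75);
""")

def rows(sql):
    """Run a query and return all result rows."""
    return conn.execute(sql).fetchall()

# Cross join: every customer paired with every order (3 x 3 rows).
cross = rows("SELECT c.name, o.id FROM customers c CROSS JOIN orders o")

# Inner join: only customers that have a matching order.
inner = rows("""
    SELECT c.name, o.id FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY o.id
""")

# Left outer join: every customer; order columns are NULL when unmatched.
left = rows("""
    SELECT c.name, o.id FROM customers c
    LEFT OUTER JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""")

print(len(cross), inner, left)
```

Meera has no orders, so she is absent from the inner join result and appears in the left outer join result with a NULL (None in Python) order id.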

Self Join

This is a particular case when one table joins to itself, with one or two aliases to avoid confusion. A self join can be of any type, as long as the joined tables are the same. A self join is rather unique in that it involves a relationship with only one table. A common example is when a company has a hierarchical reporting structure whereby one member of staff reports to another. A self join can be an outer join or an inner join.
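
The reporting-structure example can be sketched as follows, assuming Python's built-in sqlite3 module as a stand-in for SQL Server and a hypothetical staff table in which manager_id refers back to the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO staff VALUES
        (1, 'Asha', NULL), (2, 'Ravi', 1), (3, 'Meera', 1), (4, 'Kiran', 2);
""")

# The same table appears twice under two aliases: e (employee) and m (manager).
# An outer join keeps the top of the hierarchy, who has no manager.
reports = conn.execute("""
    SELECT e.name, m.name
    FROM staff AS e
    LEFT OUTER JOIN staff AS m ON m.id = e.manager_id
    ORDER BY e.id
""").fetchall()
print(reports)
```

With an inner join instead of the outer join, the row for Asha would be dropped because her manager_id is NULL.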

What are Primary Keys and Foreign Keys?

Primary keys are the unique identifiers for each row. They must contain unique values and cannot be null. Due to their importance in relational databases, Primary keys are the most fundamental aspect of all keys and constraints. A table can have only one primary key.

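A foreign key is a column (or set of columns) in one table that references the primary key (or another unique key) of another table. Foreign keys are a method of ensuring data integrity and are the manifestation of the relationship between tables.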
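
How the two kinds of key reject bad data can be sketched as follows. The example assumes Python's built-in sqlite3 module as a stand-in for SQL Server; the tables and the try_insert helper are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, whereas SQL Server enforces them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.executescript("""
    CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES departments(dept_id)
    );
    INSERT INTO departments VALUES (1, 'Sales');
    INSERT INTO employees VALUES (100, 'Asha', 1);
""")

def try_insert(sql):
    """Attempt one INSERT and report whether the constraints accepted it."""
    try:
        with conn:
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

duplicate_pk = try_insert("INSERT INTO employees VALUES (100, 'Ravi', 1)")   # emp_id 100 already exists
orphan_fk = try_insert("INSERT INTO employees VALUES (101, 'Ravi', 99)")     # department 99 does not exist
valid_row = try_insert("INSERT INTO employees VALUES (101, 'Ravi', 1)")      # unique key, existing department
print(duplicate_pk, orphan_fk, valid_row)
```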

What are User-defined Functions? What are the types of User-defined Functions that can be created?

User-defined Functions allow users to define their own T-SQL functions that can accept zero or more parameters and return a single scalar data value or a table data type.

Different Types of User-Defined Functions created are as follows:

Scalar User-defined Function

A scalar user-defined function returns one of the scalar data types. Text, ntext, image and timestamp data types are not supported. These are the type of user-defined functions that most developers are used to in other programming languages.

Inline Table-Value User-defined Function

An Inline table-value user-defined function returns a table data type and is an exceptional alternative to a view as the user-defined function can pass parameters into a T-SQL select command and in essence provide us with a parameterized, non-updateable view of the underlying tables.

Multi-Statement Table-Value User-defined Function

A multi-statement table-value user-defined function returns a table, and it is also an exceptional alternative to a view as the function can support multiple T-SQL statements to build the final result where the view is limited to a single SELECT statement. Also, the ability to pass parameters into a T-SQL select command or a group of them gives us the capability to in essence create a parameterized, non-updateable view of the data in the underlying tables. Within the create function command, you must define the table structure that is being returned. After creating this type of user-defined function, it can be used in the FROM clause of a T-SQL command, unlike a stored procedure, which can also return record sets.

What is an Identity?

Identity (or AutoNumber) is a column that automatically generates numeric values. A start and increment value can be set, but most DBAs leave these at 1. A GUID column also generates unique keys. Updated based on the comment of Aaron Bertrand.

What is DataWarehousing?

A data warehouse is a database used for reporting and analysis that has the following characteristics:

Subject-oriented, which means that the data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;
Time-variant, which means that the changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;
Non-volatile, which means that data in the database is never over-written or deleted; once committed, the data is static and read-only, but retained for future reporting;
Integrated, which means that the database contains data from most or all of an organization’s operational applications, and that this data is made consistent.

What Languages does BI use to Achieve the Goal?

BI uses the following languages to achieve the goal.

MDX – Multidimensional Expressions:

This language is used for retrieving data from SSAS cubes. It looks very similar to T-SQL, but it is very different in the areas of conceptualization and implementation.

DMX – Data Mining Extensions:

This is again used for SSAS, but rather than cubes it is used for data mining structures. This language is more complicated than MDX. Microsoft has provided many wizards in its BI tools, which further reduces the need for experts to learn this language.

XMLA – XML for Analysis:

This is mainly used for SSAS administrative tasks. It is quite commonly used in administration tasks such as backing up or restoring a database, copying and moving a database, or retrieving metadata. Again, MS BI tools provide a lot of wizards for the same.

What are Standby Servers? Explain Types of Standby Servers.

A standby server is a server that can be brought online when the primary server goes offline and the application needs continuous (high) availability of the server. There is always a need to set up a mechanism where data and objects from the primary server are moved to the secondary (standby) server. This mechanism usually involves the process of moving backups from the primary server to the secondary server using T-SQL scripts. Often, database wizards are used to set up this process.

Different types of standby servers are given as follows:

1) Hot Standby:

Hot Standby can be achieved in SQL Server using SQL Server 2005 Enterprise Edition and later Enterprise editions. SQL Server 2005 introduced database mirroring, which can be configured for automatic failover in a disaster situation. In the case of synchronous mirroring, the database is replicated to both servers simultaneously. This is a little expensive but provides the best high availability. In this case, both the primary and standby servers have the same data all the time.

2) Warm Standby:

In Warm Standby, automatic failover is not configured. This is usually set up using Log Shipping or asynchronous mirroring. Sometimes the warm standby lags by a few minutes or seconds, which results in the loss of the latest few updates when the primary server fails and the secondary server needs to come online. Sometimes a warm standby server that is lagging by a few transactions is brought back to the current state by applying the recent transaction log.

3) Cold Standby:

Cold Standby servers need to be switched manually, and sometimes all the backups as well as the required OS need to be applied. Cold Standby just physically replaces the previous server.


What is Dirty Read?

A dirty read occurs when one transaction reads data that another transaction has modified but not yet committed. Suppose A changes a row but has not committed the change. B reads the uncommitted data, but if A then rolls back, B's view of the data is wrong; that is a dirty read.
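
The A and B scenario can be sketched with two connections. This sketch assumes Python's built-in sqlite3 module as a stand-in for SQL Server, a temporary database file, and a hypothetical accounts table. In its default mode SQLite does not allow dirty reads between separate connections, so B sees the last committed value here; under SQL Server's READ UNCOMMITTED isolation level (or the NOLOCK hint), B would instead see A's uncommitted change.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    writer = sqlite3.connect(path)  # plays the role of A
    reader = sqlite3.connect(path)  # plays the role of B

    writer.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    writer.execute("INSERT INTO accounts VALUES (1, 100)")
    writer.commit()

    query = "SELECT balance FROM accounts WHERE id = 1"

    # A changes the row but has not committed yet.
    writer.execute("UPDATE accounts SET balance = 0 WHERE id = 1")

    # B reads while A's change is still pending: no dirty read, B sees 100.
    seen_before_commit = reader.execute(query).fetchall()[0][0]

    # Once A commits, B sees the new value.
    writer.commit()
    seen_after_commit = reader.execute(query).fetchall()[0][0]

    writer.close()
    reader.close()

print(seen_before_commit, seen_after_commit)
```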

Why can’t I use Outer Join in an Indexed View?

Rows can logically disappear from an indexed view based on OUTER JOIN when you insert data into a base table. This makes incrementally updating OUTER JOIN views relatively complex to implement, and the performance of the implementation would be slower than for views based on standard (INNER) JOIN.

What is the Correct Order of the Logical Query Processing Phases?

The correct order of the Logical Query Processing Phases is as follows:

1. FROM
2. ON
3. OUTER
4. WHERE
5. GROUP BY
6. CUBE | ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10. TOP
11. ORDER BY
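
The effect of this ordering can be seen by evaluating a query by hand, one phase at a time, and comparing the result with what a database returns. This is a simplified sketch covering only the FROM, WHERE, GROUP BY, HAVING, SELECT and ORDER BY phases; it assumes Python's built-in sqlite3 module as a stand-in for SQL Server, and the data is hypothetical.

```python
import sqlite3
from itertools import groupby

data = [("Asha", "Sales", 50), ("Ravi", "IT", 65), ("Meera", "Sales", 72),
        ("Kiran", "IT", 30), ("Dev", "HR", 40)]

sql = """
    SELECT dept, SUM(salary) AS total
    FROM employees
    WHERE salary >= 40
    GROUP BY dept
    HAVING COUNT(*) >= 2
    ORDER BY total DESC
"""

# The same query evaluated by hand, in logical processing order.
rows = list(data)                                                     # FROM
rows = [r for r in rows if r[2] >= 40]                                # WHERE
rows.sort(key=lambda r: r[1])                                         # (groupby needs sorted input)
groups = {d: list(g) for d, g in groupby(rows, key=lambda r: r[1])}   # GROUP BY
groups = {d: g for d, g in groups.items() if len(g) >= 2}             # HAVING
selected = [(d, sum(r[2] for r in g)) for d, g in groups.items()]     # SELECT
by_hand = sorted(selected, key=lambda t: t[1], reverse=True)          # ORDER BY

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", data)
from_sql = conn.execute(sql).fetchall()

print(by_hand, from_sql)
```

The IT department has two rows in the table, but only one survives the WHERE phase, so it fails the HAVING test. That outcome depends on WHERE being applied before GROUP BY and HAVING.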

Thursday, December 8, 2011
