Basics of Database
- Union: $A\cup B$
- Intersection: $A\cap B$
- $A - B$
- Cartesian Product: $A \times B$
- Union in database: will combine the data with matching common columns.
NaturalJoin and EquiJoin:
--EquiJoin where we specify what condition is used to join SELECT * FROM table1 JOIN table2 ON (table1.id = table2.id) --NaturalJoin where conditions are chosen by the database automatically SELECT * FROM table1 NATURAL JOIN table2
- Procedural language: derived from structural programming, focuses on breaking task into variables, data structures, and subroutines.
- B-tree: nodes has subnodes; Read wikipedia for more.
- Index architectures:
- Clustered indices: sort data based on the key column, thus no row locator is needed.
- Non-clustered indices: store both column and row locator so that sorting is not needed.
- “If the table has a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row.” from HackerRank.
- Leaf vs Non-leaf:
- Leaf level pages (nodes): the end of the search through B-tree;
- Non-leaf pages (nodes): root and intermediate
- Fill factor: the leaf level pages are not always filled with data. You can specify some pages to be reserved for future growth of data. The value is from 1 to 100 percent. Server side default is usually 0 which means that all leaf pages are filled.
CREATE INDEX index_name ON table_name(column1, column2);
OLAP Explained: OLAP Cubes explained the operations such as SLICE (slice by one dimensions so that we get a N-1 subset of the data), DICE (slice more than two dimensions), DRILL DOWN/UP (navigate among levels of data), ROLL-UP (aggrete, consolidate, involves all the data relations for one or more dimensions), PIVOT (rotation).
- OLAP: Online analytical processing, with a core of OLAP cube (multidimensional cube, hypercube).
- Aggregations: speed up time.
Both star schema and snowflake schema are the souce of the cube metadata for OLAP.
About how those operations works, please find the quiz at OLAP Operation Types @ HackerRank
An example is
--Aggregation number of rows n1*n2*n3, --Rollup number of rows n1*n2*n3+n1+n1*n2+1 --Cube number of rows n1*n2*n3+n1*n2+n2*n3+n1*n3+n1+n2+n3+1
- ∀ = forall : https://en.wikipedia.org/wiki/Turned_A
= such that : https://en.wikipedia.org/wiki/Set-builder_notation
- ∧ = logical conjunction (and) : https://en.wikipedia.org/wiki/List_of_logic_symbols
- → = is a function
Stolen from HackerRank: https://www.hackerrank.com/challenges/databases-relational-calculus/forum/comments/325506
- Primary key: unique values for each row of data; can not contain null values.
Dependency means a column that determines others.
Partial dependency means that we have some column that depends on only some of the columns but has nothing to do with some other columns. In this case we can see some kind of redundency. Hence in 2NF we remove partial dependency.
Transitive dependency is that if x determines y, y determines z, then x determines z, which is the transitive dependency.
- NF: Normal Form,e.g., 1NF Rules, 2NF Rules, 3NF Rules. Refer to this article. BCNF (Boyce-Codd normal form) please refer to wiki. 4NF is also explained well in wiki.
1NF = can not be broken down into more elementary tables : Single value on each field; no repeating groups (for example table with multiple columns of products which are in principle the same is NOT 1NF) 2NF = 1NF + no partial dependency: non-key columns depends on primary keys. 3NF = 2NF + Transitive functional dependency of non-prime attribute on any super key should be removed. BCNF = A 3NF table that does not have multiple overlapping candidate keys is guaranteed to be in BCNF
- Superkey: set of attributes within a table whose values can be used to uniquely identify a tuple
- Candidate key: column(s) to identify unique records
- Nonkey Dependency: simply as the name indicates