Introduction to Databases Summary

1.1 Data Management in Main Memory?

Data storage is not persistent in main memory. In many applications data must be permanently stored and survive restarts or system crashes.

With the enormous amount of data available today, main memory often isn't large enough to hold all the data to be managed.

1.2 Data Management in Files?

File names simplify addressing data and the operating system already provides directory hierarchies to support the grouping of files.

However, the File System does not enforce an explicit format or model of the data inside a file. Errors in the read/write process to a file can make the data unusable.

image-20240130094742571

The file system does not support searching for data written inside of files. This functionality would need to be provided by a separate programm.

Other problems:

1.3 Databases!

Requirements on a DBS

Architecture of DBS:

Intensional Layer: DB Schema describes possible content, structure or type of data stored in the DBS. Changes are rarely and expensive because all content needs to be adapted to the new Schema.

image-20240129152419760

Extensional Layer: Extend of the DB, actual content, information on objects. Changes very frequently for example over 10'000 transactions per minute.

image-20240129152434657

Database Languages:

Abstract Architecture of DBS:

image-20240129155227190

Physical and Logical Data Intependence

 

2.1 Information Modeling

Overview:

Main Phases of DB-Design:

Abstraction Concepts:

2.2 Extended/Enhanced Entity Relationship Model

Elements of the ER Model

Step 1: Classification

An entity type E designates a set of attributes. A set of objects of the same type (i.e. having the same attributes) are grouped into an entity set or ext(E) (stands for "extend of the type E")

An attribute is a function which maps an entity to a particular property value. (e.g. maps the Entity "Nico" to the Age value "25")

Identifiers:

Identifiers of an entity type E is a set K of single-valued attributes. For two different objects of type E, their respective attribute from K must always differ. In other words: Identifiers must always be unique for their type. Examples: Immatrikulationsnummer for Students

=> Usually, artifical keys are used as identfiers because they do not need to change over the lifetime of their corresponding entities.

Graphical Notation of Entity Tapes:

image-20240129164313251

Step 2: Generalisation and Specialisation

Add structure to the type system by arranging types into a "type hierarchy": Determine wheter there are entity types whose attributes are subsets of other entity types.

Subtypes "inherit" all attributes from their supertypes.

Generalisation hierarchies must not have any cycles! But they also don't have to be trees.

image-20240129165216305

Conditions for Generalisation and Specialisation

Step 3: Relationship Types

A relationship set of type B also called ext(B) is a subset of the Cartesian product of the involved sets of objects. "ext(E1)" describes the "extend" of the type E1 which is the set of all objects of type E1.

image-20240129171048335

A single binary relation b is defined by an n-tuple

image-20240129171103658

where e1 is the identifier of an object of type E.

image-20240129171604328

Attributes of Relationship Types

Attributes can be added to relationship types just like with entity types.

image-20240129172746350

Higher-Degree Relationships

image-20240129173043543

Keys in Relationship Types

A subset K of the union of all keys of the involved entity types is denoted as (candidate) key of the relationship type B. Two different relationships from ext(B) must always have different keys.

If there exist multiple candidate keys, one is selected to be the identifier of B.

Any relationship type with attributes could also be modeled as an entity type.

image-20240129174113008

CNo and PNo are the identifiers which together describe the relationship type "Order". However, since the identifier for "Order" needs to be unique, a customer can order a product only once!

image-20240129174346229

Step 4: Cardinality Restrictions

Cardinality constraints are used to limit the number of relationships an entity type can be involved with.

1:1 Relationship

The objects have either none or exactly one relationship with each other.

An Employee can manage at most one Departement but each Department has exactly one manager

image-20240129175019701

m:1 Relationship

Each object on the "many" side (employee) is related to at most one object on the "one" side (department). Usually not the other way around.

Every Employee works in exactly one Departement. Each Department has a minimum of two to unlimited number of employees.

image-20240129175311991

m:n Relationship

Either object can have several relations to the other object.

image-20240129175825444

Conversion from n-ary Relationship Types into Entity Types with binary relationship types.

An n-ary relationship type can always be modeled as standalone entity type with only binary relationship types relating to it. A new identifier is needed though.

image-20240129180046755

image-20240129180350047

Recursive Relationship Types

A relationship type can connect to the same entity type for two or more times. To do this, role names for the respective role of the entity type must be introduced.

image-20240129180826697

Weak Entity Type

An entity type is called weak if it is in a strict hierarchical relationship to an "owner" entity type. i.e the entity type is not viable on its own.

image-20240129181249782

3.1 The Relational Model

In Relational databases, data is managed in relations (tables). Relations are comprised of sets of tuples (rows). Each tuple has the same set of attributes (columns).

Each attribute has a domain. In First Normal Form the domain is atomic.

In a relation (table), a single attribute (column) or a set of attributes is marked as primary key. The value of this key attribute (set) has to be unique in the extension of each relation. This is called entity integrity.

Relationships between to relations (tables) are possible by enforcing that the values of attributes A of Relation R coincide with the primary keys of another relation (another table). The Attributes referencing a primary key in another table are called foreign keys.

Domain

A Domain is a set of "atomic" values that can not be further subdivided. It can have finite or infinite cardinality (number of elements in the set).

Example:

But also sets of encapsulated values of abstract data types or Binary Large Objects (BLOBs). An Essential property is that these values can only be accessed as a whole.

Relation, Relational Schema, Attribute, Extension

Given a set of domains {D1, ... , DM}, a relation R consists of

Example:

Customers(CID, Name, City, Balance)

"Customer" is a relational schema with an ordered set of attributes. In this case, Name, City and Balance.

Tuples

An Element t of the extension val(R) is a tuple

 

image-20240129184948901

 

Remarks:

In the context of database languages, other terms are used instead of the mathematical ones:

First Normal Form

The limitation to only atomic domains for attributes is called first normal form (1NF) of the relational model.

image-20240129192012245

image-20240129192027367

Keys

Candidate Key

A set of attributes of some Relation is called candidate key if for any pair of tuples t, if the attribute values of the tuple are equal, then the tuples themselves are equal.

Example from the image above:

The set of Attributes {Name, University} can not be a candidate key for Student because there could be multiple students with the same name at the same university. However as soon as "SID" is in the set, the set would be a candidate key since there are no two different people with the same SID.

Key attribute

An attribute of a relation that is member of at least one candidate key is called key attribtue.

Primary Key

The primary key of a relation is a dedicated, manually chosen candidate key. The primary key might consist of several attributes.

Foreign Key

A set of attributes F is called foreign key if there exists a relation (table) in which F is primary key.

Null Value

An attribute of a tuple (row) can have a so-called "null value". In this case, the value of this attribute is undefined.

Primary Key Constraint
Foreign Key Constraint

3.2 Transformation from EERM => Relational Model

In General, each entity type and each relationshop type is mapped to a relational schema with the same name.

Rule 1: Core Entity Types

Core Entity Types are entity types which are neither a weak entity nor a subtype in type hierarchy

Let E be a core entity type, then E is mapped to relational schema E that contains the primary key K of the entity E and all attributes.

image-20240129194830526

Rule 2: Transformation Rules (Sub and Super Types)

Assume there is the Core Entity Type Person and two sub-types Student and Employee.

2a: Vertical Partitioning

Relation Person contains ext(Person). In other words: The table Person contains all persons, students and Employees. There are different tables for Student and Employee with additional attributes for those entitites.

2b: Horizontal Partitioning

Relation Person contains ext(Person) - ext(Student) - ext(Employee). In other words: The table Person contains only persons which are not Students or Employees. There are different tables for Student and Employee containing the attributes from Person as well as additional attributes for those entities.

Rule 3: Weak Entities

The Weak Entity W has a composite key as primary key consisting of its own primary key A1 and the primary key K from the entity E it depends on.

image-20240129201015302

Rule 4: Relationship Types

Rule 5: (Special Case of hierarchic, binary relationship types)

If there is a 1:m or 1:1 relationship between two entity types E1 and E2, then the primary key of E2 is also primary of the relationship type between E1 and E2.

Example: A Person owns 0 or more bank accounts and every bank account is owned by exactly 1 Person. Then the relations "Bank Account" and "Owns" can be merged together.

image-20240129202928583

Rule 6a: Structured Attributes

If an Entity has structured Attributes, each "sub-attribute" will be listed separately as independed attibute.

image-20240129203206146

Rule 6b: Multi-Valued Attributes

If an Entity has a multi-valued attribute, then each value of the multi-valued attribute is mapped to a schema of that Entity.

Example: University has a multivalued Location attribute with the locations {Basel, Zürich, Bern}

This will be mapped to:

image-20240129203909071

4.1 SQL Overview

SQL - Structured Query Language is the standard language for data definition and data manipulation in relational database systems.

SQL consists of two sub-languages, the DDL for data definition and the DML for data manipulation.

SQL contains:

4.2 SQL Data Definition Language

Simple Data Definition

image-20240129224754618

Examples:

image-20240129225436225

4.3 SQL Data Manipulation Language

Queries

image-20240129230235819

 

When executing a Basic SelectBlock,

Examples:

image-20240129230807516

image-20240129230821693

Complex Search Predicates - BETWEEN AND, LIKE

image-20240129230938562

image-20240129230945983

Test for Null Value / Membership in Set - IS NULL

image-20240129231044970

Nested Select Blocks with "IN" Predicate

image-20240129231156953

Quantified Comparisons - ALL / ANY

image-20240129231309613

 

 

Test, whether a Set is empty or not - (NOT) EXISTS

image-20240129231435707

Aggregate Functions - MAX / MIN / AVG / SUM / COUNT

image-20240129231904441

image-20240129231955590

GROUP BY / HAVING

image-20240129232037115

image-20240129232147169

Data Modification - INSERT

image-20240129232229665

image-20240129232237706

UPDATE

image-20240129232307192

DELETE

image-20240129232337212

Schema Changes - ALTER

image-20240129232422319

5.1 Semantic Data Integrity

Goal

Types of Integrity Constraints

Static Integrity Constraints have to be guaranteed at any point in time. Those can be primary and foreign key constraints. Application-specfic constraints on an attribute of a tuple, a complete tuple, several tuples of a relation or even on several relations.

Examples:

Dynamic Integrity Constraints have to be re-established at the end of a state change. They are dynamic because they depend on current values in the database. (NOT PART OF THIS LECTURE)

 

The evaluation of integrity constraints can take place at the end of single operation (SQL Command) or at the end of a transaction (at the end of a sequence of logically associated operations). Because they can be checked at the end of a transaction, constraints might be violated temporarily during the transaction.

If a violation is detected, the DBMS can either not execute the operation, undo it, abort the transaction or execute a failure handling operation to re-establish consistency.

5.2 Views (Virtual Relations)

Goal:

Guaranteeing data integrity is easier with less derived data. For example "balance" is derived from all orders a customer has done. It is easier to just calculate the balance on demand instead of having it explicitly stored.

Example:

image-20240130101239485

Views can be used for simplyfying queries. If a certain information is often needed, it can be stored in a view which can then be used by other queries. Views can be used to define other view.

When a query uses a view, the viewname is automatically substituted with the view definitions for the participating relations. This is done transitively for views that use other views.

Update on Views:

Insert / Delete operations on tuples in a view are only possible if they can be unambiguously mapped to a tuple of the stored relations.

Example: u is a correct update on a view if there exists exactly one update u' that leads to the same result via the stored relations.

image-20240130102205649

Other Examples:

image-20240130102251343

Check Option for Views

Tuples can be inserted or updated into a view even if the new value does not match the view definition. e.g a View shows all customers from Basel but we can update the location of a customer to Bern.

Now our update is not visible (disappears) from the view.

This can be prevented using the CHECK OPTION which will check updates against the view definition and rejects changes if they do not match.

Views for Data Independence

If the Schema of a DBS changes, a view can provide the "old" schema to applications that depend on it.

image-20240130103159834

5.3 Static Integrity Constraints: CHECK Constraints

Integrity constraints can be specified as part of the CREATE TABLE statement. We can chosse between

We can use the same predicates that we used for the "WHERE" clause for our CHECK constraints.

image-20240130103821986

image-20240130103900048

5.4 Static Integrity Constraints: Referential Integrity

If a relation references another one via foreign keys, we can specify what should happen if the referenced value is deleted.

image-20240130104358087

image-20240130104808240

image-20240130104837200

 

Deferrability

image-20240130105003476

5.5 Data Privacy and Access Control

SQL supports access control via the GRANT, ON, TO command to control access rights on specific objects in the DBS.

image-20240130105353153

image-20240130105433297

image-20240130105645056

Propagation / Revocation of Access Rights

image-20240130105716650

6.1 NoSQL: Overview

There are alternatives to relational database systems:

6.2 Key-value Stores

image-20240130110339635

Key-value stores offer get(key), put(key, value) and delete(key) operations to return, insert/change or delete a key-value pair.

Applications:

6.3 Column Stores

image-20240130110816807

Applications:

OLAP and OLTP

OALP (Online Analytical Processing) is designed for complex querying and analysis of data.

OLTP (Online Transaction Processing) is designed to handle a large number of short online transactions (inserts, updates, and deletes).

6.4 Document Databases

Each Document is associated with a unique key but in contrast to a key-value store, the value (document) has an inherent structure which is specified either via JSON or XML

JSON:

Supports Objects and Arrays

Example:

image-20240130112043103

XML:

Consists of elements which are delimited by tags which can be nested but must not overlap. Tags may contain attributes.

Example:

image-20240130112208663

Applications:

6.5 Graph Databases

Graph databases are well suited data structures to manage networked information. Graph databases can provide sophisticated queries like "Friends of Friends" or give answers to if two elements are "connected" and what the "shortest path" between them is.

Limitations of teh Relational Model

 

 

RECAP: QUESTIONS:

What are the reasons for using databases for storing data? (and not relying on main memory and/or the file system)

There is an imense amount of data which has to be stored and accessed somehow. Databases help to structure, store and manage this data.

Main memory is not an option because data is lost when the system restarts, crashes or shuts down. Also, Main memory is not large enough most of the time.

Data in file systems may survive restarts, but the file system does not enforce any explicit data format. When everyone just stores his data in "his" file, we end up with many files containing the same data but they might not be equal. => inconsistent, redundant

What is the difference between the intensional and the extensional layer of a database?

The intensional layer describes the structure of the data and the structure of relations. It describes the schema of the database. Changes are expensive and happen therefore only rarely. (all the data needs to be adapted to the new schema)

The extensional layer contains the data itself. It changes very frequently with every transaction which can happen thousands of times per minute.

What is the difference between a Data Definition Language (DDL) and a Data Manipulation Language (DML)

Data Definition Languages describes how data is structured / defined. It sets the schema and constraints. Example: CREATE TABLE

Data Manipulation Languages are used to work with the data in the database. Example: SELECT, INSERT

Why is logical and physical data independence needed?

When the physical layer of a database is changed, for example when a index-search-tree is added for a more efficient search, the logical layers above including the applications should not be affected.

Also when the logical layer is changed, for example when an attribute is added to an Entity in the database, this new attribute should not interfere with exisiting views of the database and be hidden by introducing an appropriate view. (However this is not always the case)

Which level of information modeling do ER models address?

The model the conceptual level of a database which allows structuring the application domain (the mini-world we try to model)

What are the key building blocks of the ER model?

Entity types (objects)

Attribute types (properties of the entities)

Relationship types (relations between entities)

The difficulty is to determine appropriate entity types, attribute types and relationship types.

In which case are role names required in denoting relationship types?

In the case a relationship type is connected to the Entity itself. (Recursion)

For example if a Component is part of another Component, then we need role names to differenciate if a Component is made of other Components or if the Component is a subcomponent of another Component. -- (that's a lot of components...)

How is a relationship identified (or: what should never be added to a relationship type)?

A relationship describes a relation between two or more entities. A relationship type should never have attributes that contain data which is not needed to describe to relation. Also duplicate data which can be derived from another entity should not be part of the relationship type.

What are common cardinalities for relationship types?

1:1 (one to one) - Each Entity relates to either 0 or 1 other Entity.

m:1 (many to one) - Each Entity on the "many" side relates to at most one entity on the "one" side.

m:n (many to many) - Each Entity can relate to several other entities

What are limitations of the ER model?

ER Models get very large and very complex pretty fast...

Which are the two elements of a relation?

A relation consists of its relational schema (structure) and its extension (data)

What does first normal form of the relational model mean?

In first normal form, all attribute values of a tuple in a relation need to be atomic.

What are the key inherent integrity constraints of the relational model?

Entity integrity and refential integrity are the most important integrity constraints and called key inherent integrity constraints.

Entity integrity makes sure that the attributes of a primary key are not null while refential integrity enforces a foreign key to either fully exist (all attributes of the foreign key are not null) or that the foreign key is null.

Name the differences between a candidate key and a primary key.

Candidate keys are sets of attributes of a relation which can uniquely identify each row in a table. There can be multiple (combinations of) attributes that act as candidate key and they can also contain null values.

The primary key is a (manually) chosen candidate key used to identify each row in a table and also to be used as foreign key in other tables. The primary key must be unique, not null but can be a composition of multiple attributes or candidate keys. Usually the primary key is some arbitrary number which does not need to change over the lifetime of the entity.

Which alternatives exist to map type hierarchies in the EERM to the relational model?

Vertical Partitioning: All objects are available as supertype. The primary key of subtypes is also the foreign key of supertypes. Subtypes only contain additional information for the specific subtype.

Horizontal Partitioning: Each Subtype directly contains all the information of the supertype without linking to the supertype (no foreign key).

What determines the key of relationship types when mapped to a relational schema?

The key of a relationship type when mapped to a relational schema is mainly determined by the cardinality restriction of the related entities.

If the cardinality restriction on all edges is (1,*) or (0,*), then the primary key is either a composition of all primary keys of the related entities or a multi-element-subset of it.

If there is an edge with the cardinality restriction (0,1) or (1,1) then the primary key of that related entity is also the candidate key for the relationship type.

In which case does a relationship type in EERM not have to be transformed into a dedicated relation?

If there exists a hierarchic binary relation. If every room belongs to a building but there is no room without a building, then the primary key of the building can just be a foreign key of the room. No extra relation type needed.

Which sub-languages are contained in SQL?
Which usages are possible with SQL?
What are the differences between the relational model and SQL?
How are natural joins expressed in SQL?

SELECT * FROM Customers NATURAL JOIN Orders

or

SELECT * FROM Customers, Orders WHERE Customers.CID = Orders.CID

What are aggregate functions in SQL?

Aggregate function map a multiset of values to a single value. (MAX, MIN, AVG, SUM, COUNT) all return a single value for the set they are used on.

What does grouping and filtering in SQL do?

Grouping allows us to partition a set of tuples according to the values of an attribute.

Filtering then allows us to filter for specific groups.

What is a correlated subquery - and what makes the correlation?

Multiple Queries can be nested with the "IN" predicate together with the WHERE clause.

Select ... From ... WHERE x IN (Select ... From ... WHERE y in (Select ... From ... ))

If some of the attributes of an inner query correlate to statements of the outer query, it is called a correlated subquery.

Give an example for a query that cannot be expressed in SQL?

However: Such Queries that can not be computed by SQL can be computed by turing compete programming languages. They would need to first get the needed data from the database to do the calculation and then return it separately.

What happens with existing tuples when the schema of a relation is altered?

The new attribute will be set to null for existing tuples. However a second query could calculate and fill the attribute for the existing tuples.

What is the difference between static and dynamic integrity constraints?

Static constraints have to be guaranteed at any point in time. They are inherent to the data model. For example primary key constraints or constraints on a tuple or relation.

Dynamic constraints have to be re-established at the end of a state change. For example some value of an attribute must not be reduced.

When can column constraints be used, when table constraints?

Column and table constraints can be used in the CREATE Table command.

Column constraints constrain the values of a single attribute (column) while table constraints constrain the values of multiple attributes or the entire table.

How can the point in time when foreign key constraints are checked be specified, which options do exist?

Referential Integrity Constraints or foreign key constraints can be checked with the DEFERRABLE and INITIALLY DEFERRED / IMMEDIATE keywords in the constraint block of a command.

We can check the constraints after every statement with DEFERRABLE and INITIALLY IMEDIATE or we can check after an entire transaction when changes are about to be committed with DEFERRABLE and INTIALLY DEFERRED.

Which actions can be executed when referential integrity is violated?

NO ACTION - The query gets rejected, nothing happens

CASCADE - Deletions / Updates are cascaded to every object that refer to the deleted object.

SET NULL or SET DEFAULT, set the values of the foreign key in other objects that refer to the deleted object to NULL or the default value specified.

Name three different reasons why views are important concepts on databases?
When can data be updated via a view?

Data can be updated directly via a view if and only if there exists exactly one operation that results in the same outcome on the explicitly stored relations. Usually, data in views that is derived from multiple attributes can not be changed because its unclear what exactly changed. (e.g "Balance" can not be changed since it is derived from all transactions and its unclear which transaction led to the change in the balance)

How can access rights be specified in a database system?

Access rights can be given with the GRANT command. Access rights are managed with users. Each user from a database can have different rights.

Example: GRANT SELECT ON SomeTable TO Bob

How can data in a key/value store be accessed?

A key value pair can be accessed with the get(key) command. Other commands are put(key, value) and delete(key) to update/insert or delete a pair.

How is the schema of a document database defined?

Document database schemas are usually defined with JSON or XML.

What is the difference between column stores and row stores?

In a row store the data of the entire row is stored together. This is useful in applications where all or most attributes are of interest at a time. Usually row-store is used for OLTP where an entire row is added or read many times.

In a column store, the data of the entire column is stored together. In contrast to row stores it is now easy to access one attribute of all rows instead of all attributes of one row. Usually column stores are used for OLAP where single or a few attributes are compared for all entries.

What is the difference between OLAP and OLTP?

OLAP, Online Analytical Processing is mainly used in business intelligence and focuses on the comparison of single attributes for all entries. Queries usually consists of complex aggregate functions with SUM, AVG or COUNT. It is therefore also updated in batch mode in a specified time interval and not real-time.

OLTP, Online Transaction Processing is used for fast and efficient transactions, i.e entire rows (entries) are added very frequently. The focus lies in performance of write-intensive operations. Provides a high level of data integrity due to real-time updates.

What does the term "impedance mismatch" mean?

Programming languages store data differently than most DBMS. For example OOPL store data as objects with functions and attributes while relational databases store data in tables with rows and columns.

Also OOPL have concepts like Polymorphism and Inheritance which are difficult to represent in relational models.

Also, Relational Query Languages are often set-oriented while programming languages process one task at a time.

Where and how is data stored in a graph database?

Graph Databases store data in Nodes and Edges (which connect nodes). Nodes and Edges are also labeled to define their type. (e.g Node => likes => Node or Node => dislikes => Node)

What is the difference between a tuple table and an object table?

Attributes of a tuple table can be tuples, collections, tables, embedded objects etc...

Attributes of an object table are user defined types. Just like user defined objects (classes) in OO Langauges.

What is a subtable?

In Object-Relational Databases, one attribute a Table can reference an entire Table (instead of just one tuple of another table like with foreign keys)

This referenced table is then called a sub-table

How can different invocations of recusive SQL be linked to each other?

image-20240130125138605