Category Archives: DBMS

Choosing the right database

Choosing the right database is never easy. I have already discussed types of NoSQL databases and choosing between NoSQL and SQL.

I will try to cover some common use cases here

Use CaseChoice
Temporary fast access as Key-ValueRedis Cache
Data to be stored in a time-series fashionOpenTSDB
Object/ File dataBlob Data
Text SearchElastic Search
Structured Data, with relations between objects, need transactional properties, ACID complianceRDBMS
Semi-Structured Data, XML/ JSON document but the structure is not fixed, Flexible queriesDocument Based- MongoDB
Data increases with time, and a limited set of queriesColumnar database- Cassandra
Graph relation between objectsGraphDB- Neo4J
Database choice

Some useful resources from the Internet.

Image
https://storage.googleapis.com/gweb-cloudblog-publish/images/Which-Database_v07-10-21_1.max-2000x2000.jpeg
https://cloud.google.com/blog/topics/developers-practitioners/your-google-cloud-database-options-explained
https://aws.amazon.com/startups/start-building/how-to-choose-a-database/

SQL vs NoSQL

An old video but still relevant

Key Take aways

  • Complex SQL queries- Joins : SQL
  • Transaction management / ACID Properties: SQL (commit and rollback are by default whereas you need to tackle them in NSQL)
  • Huge quantity of data/ fast scalability – NOSQL
  • Write Heavy – Logging system – NoSQL
  • Read Heavy- Queries/ indexes – SQL
  • Fixed Schema- SQL, Flexible schema-NOSQL (Alter table statements are costly and have restrictions)
  • JPA/ hibernate/ django- by default support SQL
  • Archiving and managing huge data- NoSQL

Checking postgres server logs

At times you might want to see what all queries had hit your database, specially while testing and analyzing a solution. Similar need occurred recently in a project using postgres database. One can simply set log_statement flag to ‘all’ in postgresql.conf

postgres@server:/etc/postgresql/9.5/main$ vi postgresql.conf

Look for

log_statement = ‘all’ # none, ddl, mod, all

You can also set log_destination in the same file. By default logs will be at

postgres@server:/var/log/postgresql$ ls

And finally restart postgres server

/etc/init.d/postgresql restart

Please note enable logs on production server is not recommended in normal scenarios as they consume a lot of space.

Types of NoSQL Databases

NoSQL databases can be divided into 4 major type

Key-Value: The simplest one, can be thought of as a hashmap. Data can grow exponentially without impacting performance much, as long as your keys are unique.

Example: Redis, Riak

Document Based: This is kind of an extension to key-value format, by providing a proper format to the value/ document being saved. Meta-Data is provided to make sure documents are tagged and searchable.

Example: MongoDB, CouchDB

Column Based: In contrast to Row Based storage of normal RDBMS, a column based storage, keeps data stored columnwise. This gives an advantage for searching data based on columns easily and at the same time lets your data grow upto large levels by supporting distribution of data.

Good description of Column-based storage: https://en.wikipedia.org/wiki/Column-oriented_DBMS

Example: Cassandra, Vertica

Graph Based: This kind of database is ideal for data that is connected to each other in some logical way. Or in simple words, if you can represent your data in form of graph. One good example is A is friend of B, B is friend of C, so we can recommend A to be friends with C.

Example: Neo4J, OrientDB

Additional reads

https://www.3pillarglobal.com/insights/exploring-the-different-types-of-nosql-databases

http://opensourceforu.com/2017/05/different-types-nosql-databases/

http://www.jamesserra.com/archive/2015/04/types-of-nosql-databases/

https://en.wikipedia.org/wiki/NoSQL

CAP Theorem (you can choose 2 of the three- Consistency, Availability and Partition tolerance) Based analysis of NoSQL databases

http://blog.nahurst.com/visual-guide-to-nosql-systems

Generating ER diagram from database -2

Sometime back I wrote about DBvisualizer to generate schema ER design from database.

Here is another way by using schemaspy.

http://schemaspy.sourceforge.net/

This is a simple java based tool/ jar file. As per example given in link above, all you need to run the jar file providing database access details.

java -jar schemaSpy.jar -t dbType -db dbName [-s schema] -u user [-p password] -o outputDir 

You might want to give database drivers jar file path. For example, for Postgres

java -jar /home/kamal/pathto/schemaSpy_5.0.0.jar -t pgsql -db dbnamehere -s public -u dhusername -p dbpassword -host localhost -port 5432  -o /home/kamal/outputdir -dp /home/kamal/pathto/postgresql-9.3-1104.jdbc4.jar

Data Modeling at different levels

When you are designing database for an application, there can be 3 core levels at which you can design your database.

1. Conceptual Level: At this level you are only aware of high level entities and their relationships. For example you know that you have “Employee” Entity who “works for” a “Department” and “has” an “Address”. You are not worried about details.

2. Logical Level: You try to add as much details as possible, without worrying about how it will actually be converted to a physical database structure. So will provide any attributes for “Employee” i.e. Id, FirstName, LastName, AddressId, Salary and define primary and foreign key relations.

3. Physical Level: This is the actual representation of your database design with exact column names, types etc.

database

More info- http://www.1keydata.com/datawarehousing/data-modeling-levels.html

Shared Nothing vs Shared Everything

In database cluster implementation we can have multiple ways to make sure how different nodes will communicate with each other.

Shared nothing approach: None of the nodes will use others memory or storage. This is best suited for the solutions where inter node communication is not required, i.e. a node can come up with a solution on its own.

Shared Memory: In this approach memory is shared, i.e. each node/ processor is working with same memory. This is used when we need nodes to share solutions/ calculations done by other nodes and are available in memory.

Shared Everything: In this approach nodes share memory plus storage. This makes sense when nodes are working on problem where calculations and data created/ used by node is dependent on others.

Further Reads:
https://www.quora.com/What-are-the-differences-between-shared-nothing-shared-memory-and-shared-storage-architectures-in-the-context-of-scalable-computing-analytics

https://en.wikipedia.org/wiki/Shared_nothing_architecture

Kill blocking session in Oracle

To check which all session are live on an oracle server

select * from v$session

To check blocking sessions

SELECT
O.OBJECT_NAME,
S.SID,
S.SERIAL#,
P.SPID,
S.PROGRAM,
SQ.SQL_FULLTEXT,
S.LOGON_TIME
FROM
V$LOCKED_OBJECT L,
DBA_OBJECTS O,
V$SESSION S,
V$PROCESS P,
V$SQL SQ
WHERE
L.OBJECT_ID = O.OBJECT_ID
AND L.SESSION_ID = S.SID
AND S.PADDR = P.ADDR
AND S.SQL_ADDRESS = SQ.ADDRESS;

To kill a session

alter system kill session ‘SID,SERIAL#’;