
Books, books, books …

I find reading books on software development quite interesting. Not that they are my main source of information on programming. Reading tutorials, blog posts and documentation and playing around with a tool, framework or language is a great way to expand skills. However, I see reading books as a fundamental way to advance knowledge of software development in the long run. It helps me to better understand the underlying principles of programming and provides me with different perspectives on how to approach problem solving. Hence, I have enjoyed books since the beginning of my professional career. When I started compiling the list below, I was surprised to find that I have already read 35 books in the last 6 years. Some of them I regret wasting my time on, but some of them I would read again. Some I thought were amazing until I changed my opinion later and started considering them just OK.

Below, I list each of the books together with a short description of what it is about, the main lessons learned and some critique if I didn’t like something.

Code Craft: The Practice of Writing Excellent Code – Pete Goodliffe

This was among the first books I read, and I can heartily recommend it to people who are just starting their career in industry. It covers the main topics important for success in a software engineering career. Both technical and non-technical aspects are explained.

You will learn some important lessons on naming, self-documenting code, proper way to comment code, exception handling, defensive programming, tools, testing, building software, optimization, debugging, security and so on.

Furthermore, it explains software design, software architecture and how software becomes complex over time as it grows organically. For example, it says that even though smaller components can be improved to some extent, it is not practically possible to change the fundamental architectural decisions in a mature software project. Newcomers often overestimate what is possible and think that a complex (outdated) legacy application can be made green quickly. However, once you learn this is not the case, you play the Don Quixote role in projects less often and start aiming for practically achievable goals, which can still be quite significant. Also, once you see how some architectural decisions hurt in the long run, you may be better positioned to make important decisions in future greenfield projects.

Other things like source code management, requirements, code reviews and methodologies which are inseparable from software development field are given proper attention.

In addition to technical aspects, the book explains the social environment of project teams and describes different types of teams and programmers. Interestingly, you’ll recognize many of the portrayed personas around you. You will also learn that your social connections and soft skills will have a strong impact on your career, which comes as a bit of a surprise to young programming souls. Get used to it soon.

Design Patterns: Elements of Reusable Object-Oriented Software – Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides

This is a bible for almost any Object-Oriented shop. It is (or at least was) taught at university and is absolutely necessary for many technical interviews. If you don’t know (at least some of) these patterns, some people will consider you almost illiterate in programming. Half of one of my courses at university covered solely these patterns. Many of them are just common-sense, sound design choices with a name attached. Even though some patterns are applicable outside of OO, most of them aren’t (or at least they would feel a bit artificial in another context). The main selling point is that these patterns aid communication between programmers and make the software easier to understand.

I buy this to some extent and I use some of these patterns myself. However, I don’t think that Object-Oriented design is the best way to organize code. I feel that pure OO code is often over-engineered by pretending that everything is an object (see Kingdom of Nouns), and the main goal of encapsulation, which accounts for a lot of indirection and complexity in code, seems to be unattainable at larger scale. I still find OO programming applicable in certain situations: if you think of something as an object in the real world, it makes sense to model it as an object in the code. I just don’t like blindly following OO principles and applying them to everything in the name of good practices, and then ending up with a big ball of object mud. I have experienced what an OO application with 10000 classes looks like, and I can say that higher-level organization in code is needed to control dependencies. If that is left only to patterns, bigger projects will inevitably end up as a big ball of mud.

Refactoring: Improving the Design of Existing Code – Martin Fowler, Kent Beck

Another classic book related to Object-Oriented programming. It lists a large number of techniques for improving the design of your Object-Oriented code. In my opinion, most of these techniques make perfect sense and improve code readability at a smaller scale. The book also mentions Big Refactoring, but it is not that big, as it focuses on untangling inheritance or turning procedural code into object-oriented code (as if procedural code were a bad smell). It still talks about the scope of several classes. It fails to address the real Big Refactoring and to cover what happens with dependencies once the number of classes goes above 1000. This is where you start feeling the pain of Object-Oriented Design (OOD). Of course, in cases where OOD makes sense, most of the listed refactoring techniques make sense too and help reduce complexity in the small and improve the cohesion of classes. It is also fair to mention that this book was written in 1999, at the time the Object-Oriented paradigm exploded in popularity, and only many years later did some of its problems start becoming more apparent.

Patterns of Enterprise Application Architecture – Martin Fowler

This is a great book and it covers very important architectural patterns commonly found in enterprise applications. The three-layer architecture is well explained. This book is where the First Law of Distributed Object Design comes from. It clearly demystifies the magic power of RPC calls and states the importance of executing something locally instead of paying the high network cost. It offers several strategies for addressing the object-relational impedance mismatch (Active Record, Data Mapper etc.), concurrency (optimistic and pessimistic concurrency control, transactions), session state and where to keep it (client, server, memory, disk, db), domain logic patterns (Transaction Script, Domain Model, Service Layer, etc.), some web presentation patterns (MVC; unfortunately reactive patterns were not popular at that time) and many more.
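To make the optimistic concurrency control idea a bit more concrete, here is a minimal sketch in Python. It is not code from the book: the `InMemoryStore` class, the `StaleVersionError` exception and the version-number scheme are my own assumptions, chosen only to show the principle that a write is rejected when the record changed since it was read.

```python
class StaleVersionError(Exception):
    """Raised when a record changed after the caller read it."""


class InMemoryStore:
    """Hypothetical store keeping a (version, value) pair per key."""

    def __init__(self):
        self._rows = {}

    def read(self, key):
        # A missing key reads as version 0 with no value.
        return self._rows.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self._rows.get(key, (0, None))
        # Optimistic check: no lock is held between read and write,
        # we only verify that nobody else wrote in the meantime.
        if current_version != expected_version:
            raise StaleVersionError(key)
        self._rows[key] = (current_version + 1, value)


store = InMemoryStore()
store.write("order-1", 0, "draft")

# Two users read the same version of the record.
version_a, _ = store.read("order-1")
version_b, _ = store.read("order-1")

# The first write succeeds, the second one is based on stale data.
store.write("order-1", version_a, "confirmed")
try:
    store.write("order-1", version_b, "cancelled")
    conflict_detected = False
except StaleVersionError:
    conflict_detected = True
```

Pessimistic control would instead lock the record at read time, so the second user would wait (or be refused) before making any change at all.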

I think this is a really valuable book and I would recommend that any software developer working in the enterprise domain read it. It discusses important stuff.

Building Microservices: Designing Fine-Grained Systems – Sam Newman

After a long domination of SOA, services are becoming smaller (down to a micro level) and more nimble. Even though I don’t approach these buzzword-driven books with much passion, this one surprised me in a positive way. It is also nice to see a buzzword which promotes distributed system design.

Microservices are what you get if you break a monolith into pieces. A monolith means that everything happens as part of one single process (think of an operating system process with its own address space). The way to scale it out is normally to start multiple instances and put a load balancer in front of them. This service/process has sole ownership of its data in the database and integrates with other applications through web services or by exchanging messages. This was the SOA world. Now, under the Microservices movement, it is advised to split this monolith further into smaller pieces. This advice didn’t look really novel to me, as the application I am working on is already split into many smaller processes and bigger work is carried out through various interactions between these processes. However, it seems that monolithic applications are still dominant in industry (maybe for a good reason).

The book covers a lot of common sense which you may have seen before, like: do one thing and do it well, high cohesion, low coupling, centralized logging and monitoring, conformity across services, standardizing on a small number of integration technologies between services and so on. What I liked was the focus on aligning services to bounded contexts (a concept from Domain-Driven Design) and avoiding technical services in favor of functional services (each covering one coherent business sub-domain). Of course, the prerequisite for this is to know the business domain well and understand where the boundaries of the bounded contexts are.

The comparison between Orchestration and Choreography, and the strength of Choreography for more complex systems, was very eye-opening. Backends for Frontends and Data Pump were other interesting patterns which I hadn’t seen mentioned under these names before.

Continuous Delivery: Reliable Software Releases through Build, Test and Deployment Automation – Jez Humble and David Farley

This book covers the idea of completely automating the release and deployment processes. Developers commit code to the repository, and each commit triggers the creation of a test environment with that commit and a run of all unit tests. Deploying to staging environments is no more than one click away, and even deploying to the production environment is that simple. Code should be very easy to deliver and it should be delivered continuously (or as often as possible).

There is a lot of good advice on this automation and I have really seen it work well in practice. Jenkins, as a tool for this purpose, seems to be the most popular in the enterprise world, and I have had some contact with it too. However, the ideas from this book are not tool dependent, and it is possible to get quite far by simply scripting this automation yourself.

Domain-Driven Design: Tackling Complexity in the Heart of Software – Eric Evans

Timeless masterpiece! Especially the core ideas like Ubiquitous Language and Bounded Context really open your eyes to the essentials of a business domain and how it should be organized. After reading it, I understood that the promise of one Grand Design (one big data model covering the whole enterprise) cannot hold and that a big system has to be split into smaller cohesive Bounded Contexts. Then, those bounded contexts can be organized into a bigger picture while still guaranteeing their internal integrity. It is interesting to notice that many people today still believe in Grand Design, but I myself have never seen one really become reality, and not for lack of attempts. The only small objection I have to the book is that it focuses, at times, on implementation choices (the Repository pattern) and Object-Oriented design, even though the concepts are much more general and applicable with any implementation technique.

If you work in Enterprise software development and still believe in Grand Design, I would strongly recommend this book to you.

Working Effectively with Legacy Code – Michael Feathers

This book seems quite popular in some OO circles, but to me it came as a disappointment. It provides a long list of refactoring recipes for OO applications and thus overlaps to some extent with another book on this list (Refactoring: Improving the Design of Existing Code). According to this book, tests are everything. If you can’t build big test suites covering each little corner of your Object-Oriented application, then you are in very bad shape and will have a hard time maintaining the legacy monster. When it comes to testing, my philosophy is a little different: it shouldn’t be that much about coverage but first and foremost about understanding the nature of the application and its main data manipulation points, and then constructing test cases which really test the critical points. I think that tests shouldn’t be about quantity but about quality.

I believe that this book (together with the next one) is the main reason why many companies implemented test coverage as an enforceable metric which developers have to deliver. The fact that code coverage is easy to measure made it very appealing to enforce. I have seen the code coverage bar set as high as 90 percent. As a result, developers are forced to write and maintain unit tests even in cases where they provide almost no value. I don’t see this as a fruitful strategy: if you can’t motivate your team to come up with a good testing strategy which fits the context and provides the most value per test case, you are not going to get any better with this mechanical approach either.

Clean Code: A Handbook of Agile Software Craftsmanship – Robert Martin

I have seen fellows in the past following these practices as a religion. While I have no bad opinion of this book and find much of its advice very sound, I tend not to follow it before assessing for myself what is important in the given context. If it is followed as a religion, I am not quite sure you’ll end up with a simple and sound design. The author holds such strong opinions on the ideas he is suggesting that at times they sound like a Silver Bullet. In my opinion, complexity is mainly in the state (and state transformation) and not in the code itself, while this book’s focus is mostly on OO design, tests, rules on how long a function should be etc. I don’t disagree with it, but I believe priorities should be structured differently (algorithms, data structures, data transformation, functional programming where suitable, etc.) if our aim is simplicity of design. I believe more in design-driven development than in test-driven development. For example, even if you chose the wrong data structures and hence made processing much more complex and slower, you could still do quite well by this book’s criteria for clean code if you make your classes SOLID, your methods short and your tests cover most execution paths.

Enterprise Integration Patterns – Designing, Building and Deploying Messaging Solutions – Gregor Hohpe and Bobby Woolf

I just skimmed through parts of this book, but I went through each pattern it covers. Also, all patterns are well covered on the book’s website (a great resource). It lists all relevant patterns related to message-based integration. Many of these patterns can be easily applied with Enterprise Service Bus products. I’m not a big fan of transformations happening in the ESB; I prefer dumb pipes and smart endpoints. Of course, these patterns can be applied in smart endpoints too.

What I found so valuable in this book is its focus on data and data transformation as opposed to complex code constructs. You can compose various patterns in a data pipeline and end up with a very easy-to-understand solution. Although software is generally hard to visualize, this is not the case for data pipelines. Just look at the nice pictures depicting each pattern; it is often enough to look at them to understand what is going on (the textual description could be omitted). On the other hand, not everything is a data pipeline (with a series of transformations and routing). However, what is a data pipeline should be structured based on the patterns from this book. Data transformation is the essence of data pipelines, and I see any further wrappers or indirection in the implementation as accidental complexity.
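As a rough illustration of how such patterns compose, here is a small Python sketch that chains three of the book’s patterns (Message Filter, Message Translator and Content-Based Router) over plain dictionaries. The function names, the message fields and the "eu"/"us" channel names are all made up for this example; a real messaging product would of course work on channels and not on in-memory lists.

```python
def message_filter(messages, predicate):
    """Message Filter: drop messages that do not match the predicate."""
    for message in messages:
        if predicate(message):
            yield message


def message_translator(messages, translate):
    """Message Translator: convert each message to another format."""
    for message in messages:
        yield translate(message)


def content_based_router(messages, choose_channel):
    """Content-Based Router: group messages by the channel chosen from their content."""
    channels = {}
    for message in messages:
        channels.setdefault(choose_channel(message), []).append(message)
    return channels


# Hypothetical input messages.
orders = [
    {"id": 1, "country": "DE", "amount_cents": 1250, "test": False},
    {"id": 2, "country": "US", "amount_cents": 990, "test": True},
    {"id": 3, "country": "US", "amount_cents": 4000, "test": False},
]

real_orders = message_filter(orders, lambda m: not m["test"])
translated = message_translator(
    real_orders,
    lambda m: {"id": m["id"], "country": m["country"], "amount": m["amount_cents"] / 100},
)
routed = content_based_router(
    translated, lambda m: "eu" if m["country"] == "DE" else "us"
)
```

Each step is only a data transformation or a routing decision, which is why the whole pipeline can be read top to bottom just like the pictures in the book.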

Beginning Unix – Paul Love and Joe Merlino

Since I didn’t use Linux at university or at home and during my time at Microsoft, Windows was a god, I didn’t have a chance to learn Linux before I joined HP where I joined a team which was (and is) using only Linux. Today, I am proud user of Linux and find it very enjoyable. Interestingly, during studies I built my own operating system as a part of course assignment but still this knowledge of internal operating systems concepts is not very useful (even though they help better understand underlying principles) when working on another OS. You simply have to learn your way around the new system.

This book is a very decent introductory book on the Linux operating system. All fundamentals are there: filesystem, user management, process management, scheduling, main commands, bash programming etc. If you are new to Linux, this is a great resource to get the basic concepts right. A good foundation helps future learning too.

Linux System Administration: Solve Real-life Linux Problems Quickly – Tom Adelstein and Bill Lubanovic

My second book on Unix/Linux covered somewhat more advanced topics, such as setting up a Linux server and installing and configuring various services commonly found on Linux, such as Apache, a DNS server (bind), a mail server (postfix), a load balancer, data backup etc. Well, after reading this one, coupled with real experiments on a development machine, I started feeling more like a Linuxer.

Advanced Programming in the Unix Environment – Richard Stevens and Stephen Rago

This is the real Unix/Linux bible written by great authors. A classic. After reading this book in detail, I started understanding Linux and the POSIX interface quite well. This book covers in great detail (almost 900 pages) all fundamental POSIX APIs. There is file (and directory) handling, the standard I/O library, process control (fork, exit, exec, system, etc.), process relationships, interprocess communication (semaphores, shared memory, pipes, Unix sockets, TCP/IP), terminal I/O and so on.
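To give a taste of what process control and pipes look like in practice, here is a minimal sketch. The book uses C; I use Python’s `os` module here, which is a thin wrapper over the same POSIX calls (`pipe`, `fork`, `read`, `write`, `waitpid`). The helper name `run_in_child` and the exit code 7 are arbitrary choices for the example, and it only runs on Unix-like systems.

```python
import os


def run_in_child(message: bytes):
    """Fork a child that writes an upper-cased message into a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: close the unused read end, write, then exit
        # immediately without running any parent cleanup code.
        try:
            os.close(read_fd)
            os.write(write_fd, message.upper())
            os.close(write_fd)
        finally:
            os._exit(7)

    # Parent process: close the unused write end so that read()
    # returns end-of-file once the child has closed its copy.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)

    # Reap the child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    return b"".join(chunks), os.WEXITSTATUS(status)


data, exit_code = run_in_child(b"hello from the child")
```

Forgetting to close the write end in the parent is the classic mistake here: the read loop would then never see end-of-file.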

Well, after completing this one, I could have (and often win) an argument with much older and more experienced Linux warriors. It’s not that this book is very practical, but it does such a good job of explaining the fundamentals of Unix-based systems. If you want to be a great Linux user, I would highly recommend taking the time to read this book.

Unix Network Programming: The Sockets Networking API – Richard Stevens and Bill Fenner

Another classic, again by great authors. I have read no more than one third of this huge book (almost 1000 pages), but even that much equipped me to understand TCP sockets and Linux networking in general very well. Having shown a good interest in networks during my studies (and even having considered a career in networking) helped me learn this more quickly. In addition to understanding, this book also helped me master tools like netstat, tcpdump and wireshark. I often use these to troubleshoot network-related problems in my applications when they arise.

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement – Eric Redmond and Jim Wilson

I think I did Five Databases in Four Weeks: PostgreSQL, Riak, MongoDB, Neo4J and Redis. This was quite a nice adventure in NoSQL land. This book is really practical and is organized as a tutorial where you go step by step: set up a database, create a schema (if necessary), do some simple data manipulation and eventually learn some advanced features of the given store like indexing, sharding, full-text search, replication etc.

I think it is important to understand what is out there in the database world apart from relational databases. I was lucky enough to work with several kinds: relational, key-value, document-oriented and column-oriented. It is also important to understand the use cases where each one shines and to use it for that purpose. If all you know is a relational DB, you may be tempted to implement even a queue abstraction on top of it, even though a queue can be very problematic for a relational DB due to its dynamic nature (a relational DB is better suited to less dynamic record data).

Learning Apache Cassandra – Mat Brown

This is a decent introductory book for learning the fundamentals of Apache Cassandra and building a correct mental model of it. It is quite practical, and you can play with Cassandra on a development machine as you read to get more hands-on experience. This was my first encounter with a column-store kind of database. It is very different from columnar databases (like Vertica) in the way it organizes data on disk. Cassandra combines ideas from Dynamo (Riak being an open source variant) and Bigtable (HBase being an open source variant) into one database. I think that a database with these design choices makes a lot of sense in situations where high write throughput is needed and where the database must be highly available. It doesn’t surprise me that many big companies rely on it. Learning these fundamentals was quite fun, and I would enjoy working with a real production instance some time in the future.

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable and Maintainable System – Martin Kleppmann

I can’t say enough good things about this book. It is that good. And by the way, it is available only as an early release. It takes the complex topic of distributed database systems, decomposes it into its main building blocks and then explains each complex idea in a very easy-to-read manner. It discusses various trade-offs in distributed database design and mentions how popular databases (SQL as well as NoSQL) have approached those problems, how they solved them and what that means for you and your data.

For example, the book covers topics such as Data Models (key-value, document, relational, graph) and query languages for those data models, Storage and Retrieval (B-Tree vs LSM-Tree storage engines), Encoding (XML, JSON, Protocol Buffers, Apache Avro, Apache Thrift), Replication (synchronous, asynchronous, leader failure, leaderless (Dynamo-style) replication, master-master replication etc.), Partitioning (hash partitioning, range partitioning, partitioning of secondary indexes etc.), Transactions (weaker and stronger isolation levels, serializability, understanding what each guarantee does and does not provide etc.), Consistency and Consensus (Paxos, Raft, Zab, ZooKeeper, etcd), Batch Processing (MapReduce, Hadoop) and Stream Processing (using the log as primary data and using streams to keep derived data stores up to date).
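To show the difference between the two partitioning schemes mentioned above, here is a small Python sketch. It is my own illustration and not taken from the book: the function names, the choice of MD5 as the hash and the boundary keys "h" and "p" are assumptions made only for the example.

```python
import bisect
import hashlib


def hash_partition(key: str, num_partitions: int) -> int:
    """Hash partitioning: spreads keys evenly, but loses key ordering."""
    # A stable hash is used on purpose; Python's built-in hash() for
    # strings changes between interpreter runs.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def range_partition(key: str, boundaries: list) -> int:
    """Range partitioning: keeps adjacent keys together, so range
    scans are cheap, at the risk of hot spots."""
    return bisect.bisect_right(boundaries, key)


# Hypothetical boundaries giving three partitions:
# keys below "h", keys from "h" up to "p", and keys from "p" onwards.
boundaries = ["h", "p"]

range_assignment = {
    name: range_partition(name, boundaries) for name in ["alice", "henry", "zoe"]
}
hash_assignment = {
    name: hash_partition(name, 4) for name in ["alice", "henry", "zoe"]
}
```

With range partitioning, a query for all names starting with "a" touches a single partition. With hash partitioning the same query has to ask every partition, but a burst of writes to similar keys gets spread across the cluster.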

In addition to covering the most important concepts from distributed systems, this book also offers great references to other useful material across the internet. After each chapter you can find a large number (close to a hundred) of pointers to great resources for further reading (I enjoyed many of these too). If you are interested in Distributed Systems and have trouble connecting the dots, this is the right book for you. It connects the dots!

Using the STL: The C++ Standard Template Library – Robert Robson

A gentle introduction to the C++ STL library. The book is a bit older, but it covers well the 1998 ANSI C++ Standard, which was the latest major release before C++11. Generic STL Iterators, Containers and Algorithms are well explained. At that time (several years ago) it was quite a good introduction, and most of the C++ code out there is based on this version of the library. However, if you are considering a new project, I would recommend starting at least from C++11.

C++ 11 Standard Library: A Tutorial and Reference (2nd Edition) – Nicolai Josuttis

An in-depth overview of the new C++11 features and the new STL library. C++11 brings many new features. In addition to smaller language and library improvements, we now have lambdas and native language support for threads.

Professional Node.js: Building Javascript Based Scalable Software – Pedro Teixeira

A decent intermediate book on the Node.js platform. It explains the Node.js way of thinking, event-driven non-blocking I/O, the main libraries (mostly around providing and using network services at the TCP and HTTP level) and higher-level frameworks such as Express.js. Node.js and Express.js have excellent documentation online, and since this platform was (and still is) evolving at rapid speed, it is better to look online than in a book that is a few years old.

JavaScript: The Definitive Guide

A huge book going into great detail on the JavaScript language and Browser API. I have read about two thirds of it, but in general I think I could have spent my time better. It is just too detailed, and you probably don’t need to use each small feature of the language. It is enough to learn and use The Good Parts… I found the second part of the book, which focuses on the Browser API, more practical.

JavaScript: The Good Parts –  Douglas Crockford

An extremely lightweight (only about a hundred pages) and popular book. I think you’ll never need more than this to use JavaScript effectively. Its size also makes it a very light reference on the language. If you forget something (say, about function closures), you can quickly refresh your knowledge. If you do some JavaScript programming, I recommend having this book on your shelf.

Effective Enterprise Java – Ted Neward

This book is full of good advice on how to write and organize good Java (J2EE) code. However, most of it seems to me to be general practices of good design and not necessarily Java specific. I have seen those general practices (keep it simple, minimize lock window, establish threat model, avoid RPC etc.) in many other places. Now we know that they hold in the Java world too 🙂

Maximum Security: A Hacker’s Guide to Protecting Your Computer Systems and Network – Anonymous

This book does a great job of explaining various attack strategies used by hackers as well as tools to help protect against them. For example, it covers topics such as Firewalls, Intrusion Detection Systems, Scanners, Spoofing, Sniffers, DoS, Rootkits, Viruses, Crackers etc. It is quite comprehensive but maybe also a bit outdated at times. In any case, I don’t regret any hour spent on it, as I really learned many new things.

Writing Secure Code (2nd Edition) – Michael Howard and David LeBlanc

This is a classic book. It covers all the important security topics that you as a developer need to know in order to write secure applications. It is required reading at Microsoft (and I read it during my time there). The book introduces Threat Modelling, which basically means creating a Threat Model (something like a dataflow diagram) of your application and trying to understand (together with the team) what its critical/weak points are. Then, all threats should be ranked and mitigation strategies for each one should be listed. This sounds like a sane approach.

The meat of the book is the list of top ten security issues developers should pay attention to when developing code, like Buffer Overflow, Access Control, Least Privilege Principle, Protecting Secret Data, Validating Input, SQL Injection and a few others. This was definitely good and practical reading on an important topic.

Cryptography for Dummies – Chey Cobb

A gentle introduction to symmetric and asymmetric encryption and Public Key Infrastructure. Not that I learned much by reading it, as I already knew most of the topics from other sources, but this made it quite an easy read, and I picked up a few things along the way, like PGP encryption.

Real-Time Analytics: Techniques to Analyze and Visualize Streaming Data – Byron Ellis

This book introduces the modern Streaming Data Platform and the Real-Time Analytics enabled by it. It is an improvement over ETL processes, where you always see somewhat outdated data. In many applications (for example, fraud detection) there is value in data that is recent up to the current moment.

This book gives a high-level introduction to the architecture of Streaming Data Platforms. The coordination service ZooKeeper, an inherent part of many systems in this area, is given proper attention. Then, Apache Kafka and Apache Flume are presented as the main systems for capturing streaming data in the form of logs. These streams need to be processed, and for that reason two stream processors are explained: Apache Storm and Apache Samza. This was the most interesting part of the book: really understanding the structure of the systems which comprise a streaming data platform. The value of these technologies is not only in Real-Time Analytics; they can also serve as the main integration layer holding disparate applications in an enterprise together. For example, all data could be stored in logs, and then stream processors could keep all derived data stores (caches, materialized views, search indexes, databases, etc.) up to date. You can find more on this idea on Martin Kleppmann’s blog.
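The idea of a log feeding derived data stores can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the `Log` and `DerivedStore` classes, the event fields and the two apply functions are hypothetical and have nothing to do with the actual Kafka, Storm or Samza APIs.

```python
class Log:
    """Append-only list of events, addressed by offset."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset):
        return self._events[offset:]


class DerivedStore:
    """A store rebuilt purely from the log, remembering how far it has read."""

    def __init__(self, apply):
        self.state = {}
        self.offset = 0
        self._apply = apply

    def catch_up(self, log):
        for event in log.read_from(self.offset):
            self._apply(self.state, event)
            self.offset += 1


def apply_to_cache(state, event):
    # Cache of the latest price per product.
    state[event["product"]] = event["price"]


def apply_to_search_index(state, event):
    # Tiny inverted index: word -> set of product names.
    for word in event["product"].split():
        state.setdefault(word, set()).add(event["product"])


log = Log()
log.append({"product": "red bike", "price": 300})
log.append({"product": "red helmet", "price": 40})
log.append({"product": "red bike", "price": 280})

cache = DerivedStore(apply_to_cache)
index = DerivedStore(apply_to_search_index)
cache.catch_up(log)
index.catch_up(log)
```

Because every derived store tracks its own offset, a new store can be added later and simply replay the log from the beginning, which is what makes the log usable as an integration layer.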

The book also (lightly) covers a couple of topics which are not strictly related to data streaming, such as NoSQL databases (Redis, MongoDB, Cassandra), visualization techniques based on the d3.js library (by the way, a great visualization library) and approximation of streaming data (because it could come in big volumes).

Enterprise Supply Chain Management: Integrating Best in Class Processes – Vivek Sehgal

Programming in a certain business domain requires knowing that business domain well. Since my applications mostly covered processes in Supply Chain Management and Ordering, a former fellow programmer recommended this book to me. The book explains the main processes in Supply Chain Management; most of these are implemented in big ERP systems and are followed (usually in customized form) by many (especially bigger) companies. This particular book is probably not relevant for all software developers, but some book related to their business domain can be recommended to any developer in the enterprise arena. In the end, the main value created by enterprise software developers is the automation of business processes.

Don’t Make Me Think: A Common Sense Approach to Web Usability (2nd Edition) – Steve Krug

Your users may not find your interface as intuitive as you do. Understand how users come to your website and how quickly they leave if they don’t find what they need. This book is very relevant for web development where you are trying to attract users on a global scale. In the case of enterprise applications this is still relevant but a bit less so, as there is usually training for each feature, so even if something is not that intuitive, a small number of users will be trained in how to use it and will have to use it. This book is more about how to make a page appealing to a random internet user and make them engage with your website.

Peopleware: Productive Projects and Teams – Tom DeMarco and Tim Lister

This is the only book with a management perspective which I have read. It explains the art of creating and managing great teams. Many people say this is a classic and I totally agree. It takes a holistic approach to management and advocates people-oriented management as a recipe for success in the long run. It concisely explains more than 30 aspects of organizational dynamics and people management. Unfortunately, not all of these recommendations become reality in every organization (probably because of some facts from the next book).

Managing With Power: Politics and Influence in Organizations – Jeffrey Pfeffer

This was the main book behind an elective course, “Power Games”, which I didn’t take during my master’s studies. However, I decided to read it later. Office politics and understanding the power dynamics in an organization is the central topic. It starts from the individual and gives recommendations on how to gain power in an organization. There is even a 7-step process which you can follow: 1. What are your goals? 2. Diagnose the social network and inter-dependencies. 3. What are the points of view of the people you interact with? 4. What are their power bases? 5. What are your power bases? 6. Which strategies seem most effective in the given situation? 7. Choose one of these and execute.

The main problem with this is that it takes perspective of individual employee and that is not necessarily good for team or organization. Book even explains some destructive mechanisms of exercising power for your own advantage.

I find the book at the same time explaining something which is unjust and very real. Politics seems to be unavoidable, but I still prefer places where it is constrained and played only to limited extent.

Operating System Concepts – Abraham Silberschatz and Peter Galvin

This is a bible for understanding operating system concepts. I haven’t read the complete book, but I very much enjoyed the parts I have read. Operating systems often seem like a mystery at first, but after taking some time to understand them (and to construct one, as I did as part of a course assignment) they can be decomposed into very logical components. It is not absolutely necessary to understand the internal workings of an operating system, but I find it really interesting to think about what is happening when I make a system call and what the operating system is probably doing. It also helps in reasoning about those system calls and estimating their cost.

The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling – Ralph Kimball and Margy Ross

The world of data analytics has its bible too. Kimball has quite a strong opinion on his approach and on Star and Snowflake schemas for data analytics purposes.

The best way, in my opinion, to understand what Star and Snowflake schemas are is to start from the entity-relationship diagram of a fully normalized data model for your domain. This represents the real shape of your data. Maybe it is not suitable for analytic queries because it would require many expensive joins and could be quite complex, and your data analysts want something simpler and faster. However, it is still a very rich and detailed model of your data. Of course, not all entities are made the same: some are more stable, like reference data (currency rate); some are less stable, like master data (customer, product); coarse-grained transactional data (like an order) is more dynamic; and fine-grained transactional data (think of track and trace information during a shipment) is even more dynamic. We can group entities into three buckets: reference, master and transactional data (but in reality some entities live on the edge, so we can think of it as a continuum without exact boundaries).

More dynamic entities tend to depend on more stable ones. For example, track and trace information depends on a shipment, a shipment depends on an order, an order depends on a customer and a product, etc. Now you need to pick one of the more dynamic tables. In Kimball’s terminology that means choosing the grain. This table becomes your Fact Table. It sits at the center of what will become the Star Schema. This table has relationships to some more stable entities. Take them and put them together with your Fact Table. You now have a Star Schema with the Fact Table at the center and Dimensional Tables around it. However, this still limits the scope of your schema, and you would like to extract some intelligence from tables which are related to your first-level Dimensional Tables, so you pull in a few more tables related to some of your Dimensional Tables. That’s it: your Star Schema has become more hairy and starts looking like a snowflake, hence Snowflake Schema. It has a Fact Table at its center surrounded by first-level Dimensional Tables, which are further surrounded by second-level Dimensional Tables. Of course there could be any number of levels, but you get the point. When you query, you start by selecting constraints on the Dimensional Tables, which dictate selectivity on the Fact Table. That’s how you slice and dice your facts.
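To make the shape concrete, here is a minimal sketch using Python’s built-in sqlite3 module. All table names, column names and rows (fact_order, dim_customer, dim_product, dim_category) are my own illustrative assumptions, not taken from the book. The grain is one row per order, customer and product are first-level Dimensional Tables, and the category table is a second-level one that turns the star into a snowflake.

```python
import sqlite3


def build_schema(conn):
    """Create a tiny snowflake schema and load a few made-up rows."""
    conn.executescript(
        """
        -- Second-level dimension, hanging off dim_product (the snowflake part).
        CREATE TABLE dim_category (
            category_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- First-level dimensions, the more stable master data.
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            category_id INTEGER NOT NULL REFERENCES dim_category(category_id)
        );
        CREATE TABLE dim_customer (
            customer_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            country TEXT NOT NULL
        );
        -- Fact table, the chosen grain is one row per order.
        CREATE TABLE fact_order (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES dim_customer(customer_id),
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            amount REAL NOT NULL
        );
        """
    )
    conn.executemany("INSERT INTO dim_category VALUES (?, ?)",
                     [(1, "books"), (2, "stationery")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                     [(1, "Novel", 1), (2, "Pen", 2)])
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                     [(1, "Alice", "DE"), (2, "Bob", "NL")])
    conn.executemany("INSERT INTO fact_order VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 20.0), (2, 1, 2, 5.0),
                      (3, 2, 1, 10.0), (4, 1, 1, 10.0)])


def revenue_by_category(conn, country):
    """Slice facts by a customer constraint, then group by product category."""
    rows = conn.execute(
        """
        SELECT cat.name, SUM(f.amount)
        FROM fact_order AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_category AS cat ON cat.category_id = p.category_id
        WHERE c.country = ?
        GROUP BY cat.name
        ORDER BY cat.name
        """,
        (country,),
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_schema(connection)
    print(revenue_by_category(connection, "DE"))
```

The constraint sits on a Dimensional Table (the customer’s country) and the aggregation runs over the Fact Table, which is the slicing and dicing described above. A real warehouse would likely denormalize the category into the product dimension; it is kept separate here only to show the second level.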

Of course, it is not that simple, and there are many details on what Fact and Dimensional Tables should look like, how to do the naming, how to denormalize to get better performance or understandability, etc. Also, this schema is supposed to hold information for a much longer time than transactional systems do, so that more intelligence (insights) can be extracted out of it.

The main benefit of this approach is simplicity: data analysts find it very easy to understand. It is also very straightforward to slice and dice data and get insights. Performance is also good thanks to the recommended denormalization in Dimensional Tables. However, a problem appears when you want to draw insights from multiple transactional tables, because you have one single grain and one single Fact Table. At this point the book does not give up on Dimensional Modeling but starts looking for workarounds to still make it work. This adds complications to the once-clean Star Schema, and in some places it starts looking more like an Entity Relationship diagram.

Unfortunately, this parallel is not made in the book, and Dimensional Modeling is presented as a revolutionary approach to data modeling and the only correct one. It relies on ETL (Extract, Transform, Load) processes to bring data into the schema overnight, so this reasoning gives up on real-time analytics and focuses on historic data. Moreover, it is not shown how this schema could be implemented (but columnar databases come to mind).

Overall, if you read this book, I would recommend that you switch on your critical mind and evaluate the advice in the context of your application and your data, and think about what you will do when you need to draw insights from multiple transactional tables.

Pro Git – great learning resource, free, https://git-scm.com/book/en/v2

This is a great resource for learning Git. Advanced topics are covered as well. I use it as a reference when solving issues around Git and I rarely find a need to look elsewhere. Highly recommended.

Your Code as a Crime Scene: Use Forensic Techniques to Arrest Defects, Bottlenecks, and Bad Design in Your Programs – Adam Tornhill

This book is about applying analytics to your Git repository and drawing various insights out of it. Some interesting techniques are presented on how to query your repo and which answers you can get. The good thing is that this is based on real data. If you are interested in this, I can recommend glancing over it, but I probably wouldn’t read it again as I don’t find much value in the proposed insights.
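To give a flavor of what querying a repository can look like, here is a small sketch that counts how often each file was changed, based on the output of `git log --name-only --pretty=format:`. This is my own simplified illustration, not the tooling used in the book, and the function names are hypothetical.

```python
import subprocess
from collections import Counter


def read_git_log(repo_path="."):
    """Return the file names touched by each commit, one per line.

    Assumes git is installed and repo_path points at a Git repository.
    """
    result = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def change_frequency(log_text):
    """Count how many times each path appears in the log text."""
    touched = (line.strip() for line in log_text.splitlines())
    # Blank lines only separate commits, so they are skipped.
    return Counter(path for path in touched if path)


if __name__ == "__main__":
    for path, changes in change_frequency(read_git_log()).most_common(10):
        print(changes, path)
```

Files that change very often are one possible starting point for a closer look, although the count alone says nothing about why they change.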

 

 

 


