20 Feb 2018

CST/Cogs Framework - IT Organisational Principles for Craftsmanship and Innovation

Introduction


My name is Jonathan. I have been working for 11 years, trying to improve the performance of systems that use databases. Through that experience (and by observing leading people in my industry), I have developed a knack for viewing everything as a system and then identifying bottlenecks within that system.

Since the middle of last year, I have been applying this knack to human systems at work. I have also intensively studied concepts from psychology, philosophy, political theory, social systems, economics and business strategy.

After noticing some shortcomings that increasingly frustrated me at work, and in the spirit of 'don't just complain, try to fix it', I have come up with a system of organising work in IT organisations that I have given a lot of thought to.

In this post (or white paper), I plan to explain some shortcomings of our current way of working in IT and a possible future improvement to those systems.


In the Beginning 




IT organisations, or the IT departments within organisations, typically used to look like the diagram above. You would have Developers, QA, Database Administrators, System Administrators and Network Administrators. Some companies still have this same structure with slightly different divisions.

This structure developed problems over time. The main one, I would say, is that the objectives of the different teams diverged from those of the overall company and towards the priorities of each team. In other words, the teams became fiefdoms or tribes and started warring with each other.

Not physically warring with each other. It was more a matter of:

  • Territorial protectionism: "This falls into our area and we will decide whether to do it or not"
  • Resource allocation: "Team X needs us to do Y. It will take a lot of work and I can't be bothered with it now. I'll just tell them to write me a ticket and I'll put it in the backlog for a while" 
  • Communication process creep: "I know that the ticket was sent 2 months ago, but I have not received the detailed documentation of what to do, nor do I have written authorisation from manager X and head of Y"




If you look at the above chart as a hierarchy or a social system, it would look like Feudalism.

Story: A Java consultant once joined a company with a similar Feudalistic structure for a 6 month contract. He asked the DBA team to give him an Oracle dev database so that he could develop what was asked of him. He wrote up a ticket and waited. After a while of not getting the database, he continued with other things and tried to compensate with what he had available. There was some back and forth between the heads of the departments, and he did mention the lack of a dev database in meetings.
However, the contract finished at the end of the 6 months and he left the company. One month later, he received an email telling him that his Oracle dev database was now ready for him to use.


And Then What Happened


Around the start of the first dot-com boom, small start-ups began to appear. In those start-ups, developers were expected to set up the entire system; today we call them full stack developers. As those companies succeeded and grew, some chose not to split responsibilities along the lines of the feudal model, but instead decided to add more multi-skilled developers.
This produced the following and arguably the current model for small to mid-sized companies:




Now what you have is what I call a developer-centric IT company and, if I were to pick a hierarchical structure for it, I would say Monarchy.
Two phenomena got us here: job compression and automation.

Job compression means that a company restructures its processes to have fewer stages, which reduces the waiting time between stages.


The example above shows a mortgage approval process. There are 4 stages. Each stage is a person with different expertise and different authority. Between stages, the 'work request' sits in the next person's inbox until they can get to it. The combined processing time and queuing time is 18 days.

Job compression would give 1 person enough authority and expertise to make a decision on the approval process. 


You have now reduced the time it takes to approve a mortgage from 18 days to 7 days. Note that this was largely accomplished by reducing the overall queue time. 
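
To make the arithmetic concrete, here is a minimal sketch in Python. Only the totals of 18 and 7 days come from the example above. The stage names and the per-stage split between processing and queuing are assumptions made up for illustration, since the original diagram is what carries the real breakdown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    processing_days: float  # time someone actively works on the request
    queue_days: float       # time the request waits in an inbox


def lead_time(stages):
    """Total elapsed days: processing plus queuing across all stages."""
    return sum(s.processing_days + s.queue_days for s in stages)


def queue_time(stages):
    """Days spent waiting in inboxes only."""
    return sum(s.queue_days for s in stages)


# Hypothetical breakdown: only the totals (18 and 7 days) come from the text.
before = [
    Stage("intake", 1.5, 3.0),
    Stage("credit check", 1.5, 3.0),
    Stage("valuation", 1.5, 3.0),
    Stage("final approval", 1.5, 3.0),
]
# After job compression, one person with enough authority handles everything.
after = [Stage("single case owner", 6.0, 1.0)]

print(lead_time(before), queue_time(before))  # 18.0 12.0
print(lead_time(after), queue_time(after))    # 7.0 1.0
```

In this made-up split the processing effort stays at 6 days in both cases. The whole saving comes from the queue time dropping from 12 days to 1.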

As more developers needed to take care of more areas of expertise, they did so by applying developer philosophies to solve problems, in this case automation. This brought about innovations like Puppet, Chef and Ansible, alongside previous SysAdmin innovations like virtualisation and, later, cloud computing.
You can now, using code, boot up a container of a web server with all the files, scripts and images and run a slew of black box tests against it to see if it fully works.

Accordingly, developers now take on several roles in the IT organisation:
  • Development
  • Business Analysis
  • Quality Assurance
  • Database Administration
  • System Administration (now DevOps)
  • Security
  • Data Engineering

However, it is difficult to hold all that information inside one's head, and developers are using these automations as a crutch to progress with their original work. For example, you can download a few Puppet modules, then install and begin monitoring a new high availability database, but you have now lost the expertise (in the company) of what is going on under the hood and how to fix issues when they occur.

Very few innovations have been made in the areas outside the realm of pure development, as there are fewer experts in companies to make those innovations.
For example, while we have automated processes for storing and managing database schema changes, we have not had any innovations with deploying dev/test/staging databases that contain actual data to test against. Nor can we use existing automated systems for managing schema changes when our production databases become too big.

There is a general 'uneasy' feeling when needing to make changes to systems we don't fully understand. This negates the 'safe to fail' environments which we use today to make innovations. We also tend to take 'philosophies' that work in one area and apply them to another. This is sometimes helpful, but other times detrimental.

Story 1: I was involved in a data batching process that required roughly 200 million items to be processed through an existing API. Had that process gone through the usual way, it would have taken 64 days, with the usual chance of crashing along the way.
The idea to improve this process was to add more web servers and parallelise the work into as many threads as possible. This is a common philosophy that developers have picked up due to limitations with the speed of CPU cores. As core speeds have not improved in 7 years, the only option to improve performance would be to split the work across a number of threads.
I identified that the API spent the majority of its time making database calls and that, ultimately, the bottleneck would be hard disk IO and certain mutexes.
I recommended offloading part of the work to the database. This involved loading the 200 million items into a temporary table in the database, which took 7.5 minutes using a single thread. The rest of the work still needed to go through the API and took 8 hours to complete. Had the whole process been applied against the database in an efficient manner, I would assume it would have taken up to 45 minutes.

Story 2: A company had a batch process that took around 2 hours and had a detrimental effect on the website during that time. I configured the database to handle such loads better and brought the time down to 30 minutes using 6 application servers. I then rewrote the batch process to be more 'database friendly' (pushing work down to the database) and reduced the time to 3 minutes on 1 application server.
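
As a rough illustration of what 'pushing work down to the database' can mean, the sketch below uses Python's built-in sqlite3 module. It is not the code from either story. The items table, its columns and the 10% discount rule are hypothetical. It only contrasts an application-side loop, which issues one statement per row, with a single set-based statement that lets the database do the looping.

```python
import sqlite3


def make_db(n_rows):
    """Build an in-memory database with a hypothetical 'items' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        "id INTEGER PRIMARY KEY, price_cents INTEGER, discounted_cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO items (id, price_cents) VALUES (?, ?)",
        ((i, i * 37) for i in range(1, n_rows + 1)),
    )
    conn.commit()
    return conn


def row_by_row(conn):
    """Application-side loop: read every row, then one UPDATE per item."""
    rows = conn.execute("SELECT id, price_cents FROM items").fetchall()
    for item_id, price_cents in rows:
        conn.execute(
            "UPDATE items SET discounted_cents = ? WHERE id = ?",
            (price_cents * 9 // 10, item_id),
        )
    conn.commit()
    return len(rows)  # number of UPDATE statements issued


def pushed_down(conn):
    """Set-based version: one statement, the database iterates internally."""
    conn.execute("UPDATE items SET discounted_cents = price_cents * 9 / 10")
    conn.commit()
    return 1  # number of UPDATE statements issued


def results(conn):
    return conn.execute(
        "SELECT discounted_cents FROM items ORDER BY id"
    ).fetchall()


loop_db, set_db = make_db(1000), make_db(1000)
print(row_by_row(loop_db), "statements vs", pushed_down(set_db))
print(results(loop_db) == results(set_db))  # True
```

With an in-memory database the difference is small. Against a real database server, each statement in the loop would also pay a network round trip, which is where this kind of rewrite tends to pay off.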


Future?


So far, we have had a Feudalistic hierarchy, with its warring fiefdoms, and a Monarchy, with lost expertise and reduced innovation in the areas outside development.
How can we leverage more advanced social systems such as a Capitalistic/Democracy?

Let's look at Capitalism for a moment. Capitalism says that most things in nature follow the Pareto principle. You have 20% of the people who produce 80% of the products or services. The opposite would be Communism, where you would say: "I need 500 people to start farming wheat. You there! 500 random people! Start farming wheat". The idea with Capitalism is to encourage innovation and progress by rewarding those people who produce more of the products or services. At the same time, if you are not one of those top 20% of people, you can move to another market of products or services and become one of the top 20% there.
So the idea for Capitalism is to create many markets or areas of skill where the top people can then innovate and drive progress. 

What could happen in the future is that IT companies structure their teams as competency-based hierarchies. Meaning, areas of specific expertise and philosophies which are exclusive to one particular domain, thus maximising results in the domain they are suited to.

I have a list of what I believe are those specialist domains, but before I get to it, I would like to go over two concepts: 'economies of scope' and 'complexity = mess'.


Economies of Scope is a term from the world of business. You have probably heard of economies of scale, where you have a few products and you try to have bigger factories and bigger machines to pump out the same product in large quantities which would mean cheaper costs. 
For example, you can have a factory that makes 3 types of sandwiches. You purchase bigger machines and improve your processes as much as possible to make those 3 sandwiches as fast as possible and remove all possible waste. 
Economies of scope, on the other hand, is a system where you try to produce different and varied products at a cheaper price. For example, take Subway. You can go into one and have a wide variety of sandwiches made at a slightly higher price than a prepackaged sandwich in a shop.



The idea with economies of scope is to break down the process of creating new products into sub-processes that have a very defined scope and then set up communication systems to co-ordinate between those defined processes as well as have some synergy between them.


Complexity = Mess means that a complex system is difficult to work with. It is also difficult to work in a mess. Now complexity doesn't exactly equal a mess, but both of them are not an ordered and organised system. So (complexity or mess) is Chaos and not Order, in this context.



For us to get to order, we need to simplify the system by organising the mess with rules. Too many rules lead to complexity, so once there, we need to either remove unneeded rules or find patterns or philosophies in the rules and use those to simplify the system.

CST/Cogs Framework




The idea with this framework is to build on what we have discussed so far. It would be a more Capitalistic system where the right expertise is managing the correctly defined scope. That way, innovation and progress can occur in an optimal environment. 
The currency that is traded in this environment is skill. The idea is that skill leads to revenue, similar to how money currently leads to progress.

The theme in this framework is: Craftsmen making individual cogs in a large machine. Craftsmen are experts in their area, making the best possible cogs, but they are part of a system that needs all its cogs to work together to achieve its objectives.

We have divided the areas, but we might risk falling back to Feudalism. How can we prevent that?

The framework needs to focus on three philosophies: Competency, Simplification and Transparency. The more we have of each, the better for the overall system.

Transparency is ultimately the best way to prevent fiefdoms from occurring. Fiefdoms usually silo information and present it to other parts of the company in a way that benefits themselves.
For example, let's say an unethical manager would like a talented individual to stay in their division. That manager can simply not promote that individual and even give negative reviews to keep them where they are.
If, however, HR had access to objective metrics about all the employees, they could see that that person produced good work and has been in their position for some years. They would promote that person before they move to another company.

Some metrics that can support Transparency:
  • Time until first 100 lines of code (gitprime.com)
  • Complexity rating of class (PMD)
  • 95th percentile API response time
  • Average resolution time for SEV2 tickets
  • Orders per week
  • Website feature usage (clicks) per week
  • Usefulness of App feature - survey
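
As one example of turning a metric from this list into a number, here is a minimal sketch of a 95th percentile response time, assuming that is what the '95%' API metric refers to. It uses the nearest-rank method, which is one of several common percentile definitions. The sample timings are made up.

```python
def percentile(values, pct):
    """Nearest-rank percentile for a whole-number pct between 1 and 100.

    Returns the smallest sample such that at least pct% of all samples
    are less than or equal to it.
    """
    if not values:
        raise ValueError("no samples")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct% of n
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, with one very slow request.
response_ms = [120, 95, 110, 105, 130, 98, 102, 115, 101, 99,
               108, 97, 125, 103, 100, 96, 112, 104, 107, 2400]

print(percentile(response_ms, 50))   # 104
print(percentile(response_ms, 95))   # 130
print(percentile(response_ms, 100))  # 2400
```

A percentile is often preferred over an average for this kind of metric because a single slow request, like the 2400 ms one here, drags the average up without saying much about what most users experience.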




Let's take three measures of the output of a system to see how these philosophies could work: Speed, Control and Quality.

Speed

  • Competency: If we have experts, then we can make the best choices to build the products instead of trying out many choices until we reach the right one. 
  • Simplification: If we simplify the system as much as we can, we can both integrate new systems faster as well as produce easy to use systems. In a lot of ways, simplifying equals business agility as it helps you change the business faster to meet the needs of the marketplace. 
  • Transparency: If we have metrics that show us where the bottlenecks are in the system, we can make those systems as fast as possible.

Control

  • Competency: If we have a high degree of competency for a defined scope and area, then we have a high degree of control over the system.
  • Simplification: If the system is simplified, it is easy to use.
  • Transparency: If the movement of work is transparent, we can monitor the time it takes to exchange communication and complete work in the system. Another way of looking at it is that we can see when one cog is moving slower and is slowing the system down. Ultimately, this is where a manager would need to step in.

Quality

  • Competency: If we have craftsmen, the cogs they produce are of high quality. 
  • Simplification: If the products we deliver have been simplified, it provides an easy to use product for the customer (perceived quality).
  • Transparency: If we have metrics to see how popular the new product is and how it is used, we can improve the quality of that product. Ultimately, this will need direction from 'the business' and would require interaction with Technical Business Analysts (TBAs in the diagram).



New Roles

This framework redefines an old role, Managers, and adds a new role that I felt should be included, which I call Technical Business Analyst. Both are very important for the framework, so I will explain them now.

Managers

I would like to start off with saying that managers do not equal team leaders. In the developer-centric companies, there are very few managers and there are mainly team leaders: developers that have been promoted to lead other developers.

Dilbert.com

It is no secret that people do not like managers who have no idea about their technical role. In addition, there was a study that determined that 65% of managers actually produced negative value for the company. On the other hand, good managers produce huge value for the company (Pareto Principle), and that is not something we should write off.

Currently, with the lack of managers in IT companies, there is a reliance on hiring someone who 'is the right fit', which basically outsources the need to manage to the individual. If they don't work well, then there is something wrong with them.

In the context of a Capitalistic/Democracy, what role would managers play?
Well, in a Democracy, there is a need for Law-makers to make systems for people to interact in a helpful way to society. There is also a need for Courts for dispute resolution.

Managers should think of systems inside the company that promote honesty, tolerance and freedom of speech. Managers should also resolve disputes in the company and look for workplace complications before they turn into full-blown tribal warfare. Bear in mind that this framework encourages experts, and experts usually have opinions.


Following the values of the framework, let's go over what a manager should do:

  • Competency: The manager should be competent enough at coming up with social systems that are effective for that specific company culture. The idea is that the cogs turn smoothly.
  • Simplification: The manager should set out rules in those systems, but set out very few rules and then enforce them. With regards to communication, less is more. The manager should make sure that a group can handle things within its own expertise and scope and try to reduce communication dependencies.
  • Transparency: The manager should implement metrics gathering, both to know how the IT company is performing and to be transparent to stakeholders outside IT and build up trust with them.


Technical Business Analyst


Business Analysts seem to be something that only large companies have. That is a shame as there has been some huge innovation in documenting and expressing business knowledge in the last 5 years.
I have been looking into Business Process Model and Notation (BPMN) 2.0 and Decision Model and Notation (DMN), and I have found them very useful in bridging the understanding gap between business and IT.

Story 1: I was trying out decision tables as a way to document requirements. I talked with the Product Manager and asked her to give it a try. She took a ticket that a developer had quoted as taking 5-8 days to implement. She went over the requirements and built a decision table in Excel. She then showed it to the original developer, who said: "If this is all that is required, then it should take 1-2 days to implement".

Story 2: I was working on a way to document technical processes. I went over some code and found an if-then-else "pyramid of doom" in it. I then tried to put the conditions from the code into a decision table. After I was finished, I showed it to the original developer and he instantly understood it and made a correction to the table. I then told the business analysts in the company, who were extremely impressed that the developer understood it so quickly. Apparently, they had had difficulties communicating business requirements to him before.
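
To show the kind of translation described in Story 2, here is a minimal sketch in Python. The discount rules, the statuses and the thresholds are all hypothetical and are not the code from the story. The first function has the nested if-then-else shape. The second reads the same logic from a table where the first matching row wins, which is comparable to the 'First' hit policy in DMN.

```python
def discount_nested(customer_status, order_total, has_coupon):
    """The 'pyramid of doom' shape: the rules are buried in the nesting."""
    if customer_status == "Active":
        if order_total >= 100:
            if has_coupon:
                return 15
            else:
                return 10
        else:
            if has_coupon:
                return 5
            else:
                return 0
    else:
        return 0


# The same (hypothetical) rules as a decision table.
# Columns: customer status, minimum order total, coupon, discount in percent.
# None means "any value". Rows are checked top to bottom.
DISCOUNT_TABLE = [
    ("Active", 100, True, 15),
    ("Active", 100, False, 10),
    ("Active", 0, True, 5),
    ("Active", 0, False, 0),
    (None, 0, None, 0),
]


def discount_from_table(customer_status, order_total, has_coupon):
    """Return the discount of the first row whose conditions all match."""
    for status, min_total, coupon, discount in DISCOUNT_TABLE:
        if status is not None and status != customer_status:
            continue
        if order_total < min_total:
            continue
        if coupon is not None and coupon != has_coupon:
            continue
        return discount
    raise LookupError("no rule matches this combination")


print(discount_nested("Active", 150, True))        # 15
print(discount_from_table("Active", 150, True))    # 15
print(discount_from_table("Suspended", 150, True)) # 0
```

The table can be read, and corrected, row by row without following any nesting, which is the property that made it easy for the developer and the business analysts to discuss.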


In the old way, BPMN 1.0, mapping a process would look something like this:



I am sure everyone has run into something like this glued to a wall in an office. It's not very clear what is going on.

What happens in BPMN 2.0 and DMN is as follows:

Decision Table - Discount Decision

And then, the process mapping is simplified:

BPMN 2.0 - Notice the small square/hash icon in the discount decision

The magic happens in three different ways:
  1. The business logic is captured in a way that is easy for the business user to understand (notice, it's in Excel)
  2. That same decision table is understood by the developer
  3. The process mapping is now easy to understand, which makes it easier to understand more parts of the system.

We've gone over the business side, but we can go a bit further and apply this same process mapping to the technical side:

DMN for a Technical Process


So when you go into the 'Process Order' task from the diagram above, you would go to the technical process diagram below:

DMN for a Technical Process


Technical Business Analysts should be the ones to go over both sides and create both of these types of diagrams and tables. This should achieve several things:
  1. Provide a counter-balance and due diligence to new business requirements: "I understand you would like this new feature. Could you please explain to me in detail what it is that you need?"
  2. Reduce the time groups of developers spend next to whiteboards.
  3. Reduce risk by using decision tables to notice scenarios that were not considered: "We have Active for CustomerStatus, but I don't see a scenario where the OrderStatus is suspended."
  4. Reduce the meetings between developers and business users.
  5. Reduce the scope that developers need to work on and increase focus on a specific task.
  6. Create a system of business and technical documentation. 
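
The risk reduction in point 3 can be partly mechanised. The sketch below assumes a decision table keyed on two inputs, CustomerStatus and OrderStatus. The status values and the actions are hypothetical. It enumerates every combination of inputs and reports the ones that no rule covers.

```python
from itertools import product

# Hypothetical input domains for the two conditions.
CUSTOMER_STATUSES = ["Active", "Suspended"]
ORDER_STATUSES = ["New", "Paid", "Suspended"]

# Hypothetical decision table: (CustomerStatus, OrderStatus) -> action.
RULES = {
    ("Active", "New"): "process",
    ("Active", "Paid"): "ship",
    ("Suspended", "New"): "reject",
    ("Suspended", "Paid"): "refund",
    ("Suspended", "Suspended"): "reject",
}


def missing_scenarios(rules, *domains):
    """Return every combination of input values that has no rule."""
    return [combo for combo in product(*domains) if combo not in rules]


print(missing_scenarios(RULES, CUSTOMER_STATUSES, ORDER_STATUSES))
# [('Active', 'Suspended')]
```

The one gap it finds is the scenario from the quote above: an Active customer with a Suspended order. A check like this only finds missing combinations. Whether the existing rows are the right rules is still a conversation with the business.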

TBAs should spend time going over the backlog of tickets. This should increase the velocity of the team if the tickets are very well defined.




When a new ticket is taken on by the team, a developer and a QA engineer should pick up the same ticket: The QA should start writing functionality tests based on the scenarios in the decision table and the developer should write the code and test it against those tests.
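
One way this pairing could look in practice is a table-driven test, where each row of the decision table becomes one test scenario. The sketch below uses Python's unittest module. The scenarios and the decide_action function are hypothetical stand-ins for the real decision table and the developer's code.

```python
import unittest

# Hypothetical decision table rows, as the QA engineer would copy them:
# (CustomerStatus, OrderStatus, expected action)
SCENARIOS = [
    ("Active", "New", "process"),
    ("Active", "Suspended", "hold"),
    ("Suspended", "New", "reject"),
    ("Suspended", "Suspended", "reject"),
]


def decide_action(customer_status, order_status):
    """The developer's implementation under test (hypothetical logic)."""
    if customer_status != "Active":
        return "reject"
    return "hold" if order_status == "Suspended" else "process"


class DecisionTableTest(unittest.TestCase):
    def test_every_row_of_the_table(self):
        # subTest reports each failing row separately instead of
        # stopping at the first mismatch.
        for customer_status, order_status, expected in SCENARIOS:
            with self.subTest(customer=customer_status, order=order_status):
                self.assertEqual(
                    decide_action(customer_status, order_status), expected
                )


def run_suite():
    """Run the table-driven test and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DecisionTableTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_suite() runs the scenarios. Because the tests are generated from the rows, the QA engineer can write them as soon as the table exists, before the implementation is finished, and a change to the table shows up directly as a failing row.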

This role should cover the following points from 'Boehm's Top 10 Software Defect Reduction list':
  1. Finding and fixing a software problem after delivery is often 100 times more expensive than finding and fixing it during the requirements and design phase.
  2. Current software projects spend about 40 to 50 percent of their effort on avoidable rework.
  3. About 80 percent of avoidable rework comes from 20 percent of the defects.


In addition, this role should also prevent or at least greatly reduce cancelled projects or priority changes. I understand that these are extremely demoralising for developers.

Let us finish up by going over the framework values with this role:
  • Competency: This is a new role for most small-to-medium companies. It should streamline the development process by adding an expert into the right area and reducing the scope of work for other people in the company.
  • Simplification: Having easy to understand diagrams and documentation simplifies development work. TBAs should also identify parts of the system that could be simplified (value stream mapping) and suggest very specific and narrow work for technical debt.
  • Transparency: TBAs should make the whole system easy to understand for both IT and the business users outside of it.



F.A.Qs


  • Is this system a replacement for Agile? 
    • No, it's completely complementary to it and would probably better serve the principle of having 'multi-disciplinary teams'.
  • How do you prioritise or expedite work in this system?
    • That would be up to the manager. Technically, if you would like the option of expediting, you would need to leave some spare capacity in the teams.
  • What if there is not enough skill in house?
    • If you don't have the skills you need in the company, then consider bringing in an outside consultant, even if it's for a few days. You will not gain new innovations, but you will gain from other companies' experience.
  • What would happen if there isn't enough work to justify a new field?
    • It could be very possible to let one person in the company have a dual-role and still have time to try and innovate in this new field. 
  • How can I split up an area of expertise without it leading to a huge overhead of communication?
    • That would really depend on you and your needs. You need to find a balance of 'less is more' with regards to communication, but also have enough work concentrated in front of an expert for them to recognise patterns and generate innovation.