Tech for Product Managers
As product managers, we write user stories to define what should be built. These stories usually describe functionality: how the product should behave when a user interacts with it. But who likes a beautiful-looking app that takes forever to load? Even Google uses page load time as one of its ranking signals. Think about Apple: it delights users with great design, experience, and performance, all of which are non-functional qualities.
Non-functional requirements (NFRs) specify the quality of whatever is being built and constrain how the functional requirements are delivered.
It is therefore just as important, if not more so, to consider them while building your product. For example, a retail website will likely see a surge in users around Black Friday. No matter how well the application handles normal traffic, it must be designed to handle such surges seamlessly. Keep the product's properties and your users' expectations in mind when defining how the system should do what it does.
There are many NFRs, and different ones matter to different stakeholders. Maintainability, portability, reusability, and testability matter to the engineering team, whereas time to market, reporting, speed, and compliance matter to the business. Likewise, what matters to the business may not be what users value. As product managers, we have to strike that balance and prioritize.
High-level design decisions are made based on the NFRs you define as a product manager. For example, suppose the requirements are that the system must support more than 300 million users with low latency and high availability. Given these requirements, the team can decide which database to use, what the architecture should be, where the load balancers should sit, what kind of caching is needed, and so on.
Let’s discuss the NFRs that matter most to all stakeholders.
Performance concerns the time a system takes to fulfill a request. For WhatsApp, for example, it is measured by how long a message takes to be delivered. There are many ways to improve the performance of your system; let’s discuss a few.
Partitioning is one such technique. The client requests data that is stored in a database. A database is essentially a set of tables made of rows and columns, and the larger a table, the longer it can take to retrieve data from it. Storing data in smaller chunks, called partitions, is one solution. There are two ways to do this: splitting tables row-wise, called horizontal partitioning or sharding, and splitting them column-wise, called vertical partitioning. The split should be chosen so that it minimizes the time and computing power needed for the reads or writes your use case performs most.
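To make the two kinds of partitioning concrete, here is a minimal Python sketch, assuming an in-memory list of user records stands in for a database table. The helpers `shard_for`, `horizontal_partition`, `find_user`, and `vertical_partition` are hypothetical names for illustration only; real databases implement partitioning internally, and hashing the key is just one of several ways to pick a shard.

```python
import hashlib

NUM_SHARDS = 3  # assumption: a fixed shard count, chosen only for illustration


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard by hashing it, which spreads rows roughly evenly."""
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_shards


def horizontal_partition(rows, num_shards: int = NUM_SHARDS):
    """Row-wise split (sharding): each shard holds whole rows for a subset of keys."""
    shards = {i: [] for i in range(num_shards)}
    for row in rows:
        shards[shard_for(row["user_id"], num_shards)].append(row)
    return shards


def find_user(shards, user_id: int):
    """A lookup scans only the one shard that owns the key, not the whole table."""
    for row in shards[shard_for(user_id, len(shards))]:
        if row["user_id"] == user_id:
            return row
    return None


def vertical_partition(rows, hot_columns):
    """Column-wise split: frequently read columns go in one table, the rest in another.
    Both keep user_id so a full row can be stitched back together."""
    hot, cold = [], []
    for row in rows:
        hot.append({"user_id": row["user_id"], **{c: row[c] for c in hot_columns}})
        cold.append({k: v for k, v in row.items() if k == "user_id" or k not in hot_columns})
    return hot, cold


# Hypothetical sample rows; the field names are illustrative only.
users = [
    {"user_id": i, "name": f"user{i}", "email": f"user{i}@example.com", "bio": "long profile text"}
    for i in range(1, 11)
]
shards = horizontal_partition(users)
print({shard: len(rows) for shard, rows in shards.items()})
print(find_user(shards, 7))
hot, cold = vertical_partition(users, hot_columns=["name"])
print(hot[0], cold[0])
```

The design choice to hash the key keeps the shard lookup a constant-time calculation, so a read touches one partition instead of scanning every row.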
You have probably already encountered caching. When a new request arrives, the system first checks whether the data is available in the cache. If it is (a cache hit), the request is served from the cache; otherwise (a cache miss), the request goes to the main database. There are two broad types of caching: client-side caching, used mostly for static websites or content, and server-side caching, used for dynamic websites or for storing large videos and images. Each cached item is given a time to live (TTL), after which it is refreshed so that stale data is not served. Because a cache has limited capacity, items are continuously evicted according to a policy such as least frequently used (LFU).
Browser-side caching: When you request a web page for the first time, your browser fetches it from the server and renders it. It also stores a copy on disk, so the next time you request the page, it loads quickly.
DNS caching: Recent DNS lookups are stored locally so that the browser can quickly retrieve an IP address without querying the DNS server again.
Server-side caching: Data is stored on a cache server that sits alongside the main server.
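The cache-hit and cache-miss flow, time to live, and least-frequently-used eviction described above can be sketched in a few lines of Python. This is a simplified single-process model, assuming a plain dict plays the role of the main database; `TTLCache` and `fetch` are hypothetical names, not the API of any real caching library.

```python
import time


class TTLCache:
    """Toy cache: entries expire after `ttl` seconds, and when the cache is full
    the least frequently used entry is evicted."""

    def __init__(self, capacity: int, ttl: float, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be simulated
        self._data = {}  # key -> [value, expires_at, hit_count]

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None  # cache miss
        if self.clock() >= entry[1]:
            del self._data[key]  # TTL passed: drop it rather than serve stale data
            return None
        entry[2] += 1  # count the hit for the LFU policy
        return entry[0]

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.capacity:
            # Cache is full: evict the entry with the fewest hits.
            victim = min(self._data, key=lambda k: self._data[k][2])
            del self._data[victim]
        self._data[key] = [value, self.clock() + self.ttl, 0]


def fetch(key, cache: TTLCache, database: dict):
    """Check the cache first; on a miss, read the main database and fill the cache."""
    value = cache.get(key)
    if value is not None:
        return value, "hit"
    value = database[key]
    cache.put(key, value)
    return value, "miss"


# Hypothetical "database" of rendered pages.
pages = {"home": "<h1>Home</h1>", "about": "<h1>About</h1>", "faq": "<h1>FAQ</h1>"}
cache = TTLCache(capacity=2, ttl=60)
print(fetch("home", cache, pages))  # first request goes to the database
print(fetch("home", cache, pages))  # repeat request is served from the cache
```

A production cache would use a more efficient eviction structure than scanning every entry, but the behavior a product manager cares about (hits, misses, expiry, eviction) is the same.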
Security needs little explanation. We often work with very sensitive user information, such as payment details and personal information, and users should not have to worry about the security of their data. It is crucial to enable encryption and/or multi-factor authentication in your application. Explaining why these measures matter, in copy that users understand, can quickly help you gain their trust.
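One common form of multi-factor authentication is the time-based one-time password (TOTP, RFC 6238): the six-digit code an authenticator app shows. Below is a minimal Python sketch using only the standard library; the function names are hypothetical, and the secret is the published RFC test key, which must never be used in a real product.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226)."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low 4 bits pick a 4-byte slice
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): the counter is the number of 30-second windows elapsed."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify(secret: bytes, submitted: str, at=None, step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current window and its neighbours to tolerate clock drift."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + i * step, step), submitted)
        for i in range(-window, window + 1)
    )


secret = b"12345678901234567890"  # RFC test key; never hard-code real secrets
code = totp(secret)
print(code, verify(secret, code))
```

The server and the user's app share the secret once (typically via a QR code); after that, both derive the same code from the clock, so a stolen password alone is not enough to sign in.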
Availability ensures that every request receives a response. Apps and websites should not become unavailable at any time of the day, and even scheduled maintenance should happen when traffic is at its lowest.
A robust system does not lose data, even when a server fails. This is achieved by keeping redundant servers that store copies of the data; replication is the process of creating this redundancy. A master-slave (primary-replica) database configuration is one way to do it: the master receives all write requests and updates the slaves in the background. If the master goes down, one of the slaves is promoted to master.
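The master-slave setup and promotion on failure might be modeled like this. It is a toy sketch with hypothetical `Node` and `ReplicatedStore` classes; for simplicity it copies each write to the replicas immediately rather than in the background, and it simply promotes the first live replica instead of the more careful coordination a production database would use.

```python
class Node:
    """One database server holding a full copy of the data."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicatedStore:
    """Primary-replica sketch: writes go to the primary, which copies them to
    every live replica; if the primary is down, a replica is promoted."""

    def __init__(self, primary: Node, replicas):
        self.primary = primary
        self.replicas = list(replicas)

    def _ensure_primary(self):
        if not self.primary.alive:
            self.failover()

    def write(self, key, value):
        self._ensure_primary()
        self.primary.data[key] = value
        for replica in self.replicas:
            if replica.alive:
                # Simplification: copied immediately; real systems often do this asynchronously.
                replica.data[key] = value

    def read(self, key):
        self._ensure_primary()
        return self.primary.data.get(key)

    def failover(self):
        """Promote the first live replica to be the new primary."""
        candidates = [r for r in self.replicas if r.alive]
        if not candidates:
            raise RuntimeError("no live replica to promote")
        self.primary = candidates[0]
        self.replicas.remove(self.primary)


primary, r1, r2 = Node("db-1"), Node("db-2"), Node("db-3")
store = ReplicatedStore(primary, [r1, r2])
store.write("chat:42", "See you at 6")
primary.alive = False  # simulate the master crashing
print(store.read("chat:42"), store.primary.name)
```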
Consistency ensures that data is the same across all nodes. Transactional data, for example, has to be consistent: every node must be updated with the latest information before a read happens; otherwise someone could pay for their H&M shirt with a zero balance in their account.
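The H&M example can be reproduced with a toy ledger. In this hypothetical sketch, `sync=True` means a write returns only after every replica has applied it, while `sync=False` queues the update for later, so a payment check that reads from a replica may see an outdated balance. The `Ledger` class and its methods are illustrative names, not a real database API.

```python
class Replica:
    """One copy of the account balances."""

    def __init__(self):
        self.balance = {}


class Ledger:
    """Toy ledger with one primary and several replicas.
    sync=True: every replica is updated before a write returns.
    sync=False: replicas are updated later by apply_pending(), so reads can be stale."""

    def __init__(self, n_replicas: int, sync: bool):
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.sync = sync
        self.pending = []  # updates not yet copied to replicas (async mode only)

    def set_balance(self, account: str, amount: int):
        self.primary.balance[account] = amount
        if self.sync:
            for replica in self.replicas:
                replica.balance[account] = amount
        else:
            self.pending.append((account, amount))

    def apply_pending(self):
        """Background replication catching up."""
        for account, amount in self.pending:
            for replica in self.replicas:
                replica.balance[account] = amount
        self.pending.clear()

    def can_pay(self, account: str, price: int, replica_index: int) -> bool:
        """Authorize a payment by reading the balance from one replica."""
        return self.replicas[replica_index].balance.get(account, 0) >= price


for mode in (False, True):
    ledger = Ledger(n_replicas=2, sync=mode)
    ledger.set_balance("alice", 50)
    ledger.apply_pending()  # replicas catch up (nothing pending in sync mode)
    ledger.set_balance("alice", 0)  # alice spends her whole balance
    print("sync" if mode else "async", ledger.can_pay("alice", 30, replica_index=0))
```

The trade-off is that synchronous updates make every write slower, which is why product managers need to say which data truly requires strict consistency.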
Reliability means the system works as expected at all times. For example, it could mean keeping backups of chats so that users can restore them on a new device if their old one crashes.
Scalability describes how much traffic an app or website can support. When a system doesn’t scale, it may take forever to load or even crash. Imagine a startup's website whose database, holding all of its data, runs on a single computer. When many new users visit at the same time, that one computer has to handle every request itself, which quickly becomes slow. So the developers buy a bigger computer with more RAM, CPU, and storage that can handle more requests. This is called vertical scaling. As you may have realized, there is a ceiling to how big a single machine can get. So the developers add more servers and distribute the work among them. This is called horizontal scaling. It creates a distributed system in which a set of computers works together toward a common goal without a single point of failure.
So who distributes the requests? A load balancer figures out which machines in the system are online and available, and distributes tasks among them. It can sit wherever load needs to be balanced, and it uses algorithms such as round-robin, least requests, or least response time to assign tasks, depending on the use case.
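Here is a minimal Python sketch of the three assignment strategies named above. The `Server` and `LoadBalancer` classes and their fields are hypothetical; a real load balancer learns which servers are online through health checks and measures request counts and response times itself.

```python
import itertools


class Server:
    """A backend machine as the load balancer sees it (hypothetical fields)."""

    def __init__(self, name: str):
        self.name = name
        self.online = True
        self.active_requests = 0
        self.avg_response_ms = 0.0


class LoadBalancer:
    def __init__(self, servers):
        self.servers = servers
        self._next_index = itertools.cycle(range(len(servers)))

    def available(self):
        """Only servers that are online can receive work."""
        live = [s for s in self.servers if s.online]
        if not live:
            raise RuntimeError("no servers online")
        return live

    def round_robin(self) -> Server:
        """Hand requests to servers in a fixed rotation, skipping offline ones."""
        for _ in range(len(self.servers)):
            server = self.servers[next(self._next_index)]
            if server.online:
                return server
        raise RuntimeError("no servers online")

    def least_requests(self) -> Server:
        """Pick the online server currently handling the fewest requests."""
        return min(self.available(), key=lambda s: s.active_requests)

    def least_response_time(self) -> Server:
        """Pick the online server that has been responding fastest."""
        return min(self.available(), key=lambda s: s.avg_response_ms)


servers = [Server("app-1"), Server("app-2"), Server("app-3")]
lb = LoadBalancer(servers)
print([lb.round_robin().name for _ in range(4)])
```

Round-robin is simplest and works well when servers and requests are similar; the other two adapt when some requests are heavier or some machines are slower.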
These are a few of the requirements you will have to define and refine, as a product manager, together with your tech team. You have to act as a gatekeeper who ensures these requirements are met and that product decisions and trade-offs are centered on them, so that you build a quality product.