Principles of System Design – Part 1

When designing systems, three major concerns should be addressed: reliability, scalability, and maintainability. These terms are used often, and this post explains each of them.

Reliability

Reliability is a system's capacity to tolerate faults so that they do not lead to breakdowns or a total shutdown. Large systems are built from components that can each fail on their own. The art and beauty of system design lie in creating fault-tolerant systems from fault-intolerant components.

Faults can be classified as either hardware or software. A large data center with tens of thousands of hard drives, each with an MTTF of 50-100 years, can still expect a disk to fail roughly every day, simply because there are so many of them. Memory corruption also occurs regularly at that scale. Redundancy is the usual answer to hardware faults. Disks, for example, may be kept in a RAID configuration, data centers can have multiple backup power sources, and CPUs can be hot-swapped.
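The arithmetic behind daily disk failures can be sketched in a few lines. This is a rough estimate that assumes disks fail independently at a constant rate; the fleet size of 40,000 disks is an assumption chosen for illustration, not a figure from any particular data center.

```python
def expected_failures_per_day(num_disks: int, mttf_years: float) -> float:
    """Expected disk failures per day, assuming independent failures
    at a constant rate (a simplification of real disk behavior)."""
    mttf_days = mttf_years * 365
    return num_disks / mttf_days


# Hypothetical fleet of 40,000 disks at both ends of the MTTF range.
for mttf_years in (50, 100):
    rate = expected_failures_per_day(num_disks=40_000, mttf_years=mttf_years)
    print(f"MTTF {mttf_years} years: about {rate:.2f} failures per day")
```

Even at the optimistic end of the MTTF range, a fleet of this size loses about one disk per day, which is why redundancy has to be designed in rather than added later.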

Software faults have many causes. A single rogue process might consume shared resources and trigger a systematic crash across all nodes, or an assumption the application makes about its environment may stop being true, resulting in crashes. Ways to handle software faults include understanding the business requirements and building in resilience to deviations from them, better monitoring that raises warnings early, more thorough unit testing, and better abstractions and interfaces that let problems be isolated quickly.

Scalability

Scalability is a system's ability to maintain adequate performance as demand grows. Load can be described by a few parameters that best capture what the application does. For a social networking site, for example, load might be expressed as writes (new posts) per second or reads (posts rendered in timeline views) per second. You may also describe load by peak rather than average reads and writes.
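To make the difference between average and peak load concrete, here is a minimal sketch that derives both from a list of write timestamps. The function name, the one-second bucketing, and the sample timestamps are all assumptions made for illustration.

```python
from collections import Counter


def load_parameters(write_timestamps: list[float]) -> tuple[float, int]:
    """Return (average writes/sec, peak writes/sec) over the observed window.

    Timestamps are non-negative seconds; each write is counted in the
    one-second bucket it falls into.
    """
    if not write_timestamps:
        return 0.0, 0
    per_second = Counter(int(t) for t in write_timestamps)
    window_seconds = int(max(write_timestamps)) - int(min(write_timestamps)) + 1
    average = len(write_timestamps) / window_seconds
    peak = max(per_second.values())
    return average, peak


# Hypothetical burst: quiet at first, then five writes within one second.
timestamps = [0.1, 0.5, 0.9, 1.2, 3.4, 3.5, 3.6, 3.7, 3.9, 4.0]
average, peak = load_parameters(timestamps)
print(f"average: {average:.1f} writes/s, peak: {peak} writes/s")
```

In this sample the average is 2 writes per second while the peak is 5, so a system sized for the average alone would be overloaded during the burst.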

Performance describes how the system behaves when a load parameter changes. Average response time is one common measure, but performance can also be expressed as a distribution of response times. For example, you might target a 99th percentile response time under one second and an average response time of 300 milliseconds. Performance targets like these are often written into the SLA with your customers.
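The gap between an average and a percentile is easiest to see with numbers. The sketch below uses the nearest-rank method for percentiles, which is one of several common definitions, and the response times are made-up sample data rather than measurements.

```python
import math
from statistics import mean


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds: mostly fast, a few slow outliers.
response_times_ms = [120.0] * 90 + [400.0] * 8 + [2500.0] * 2

avg = mean(response_times_ms)
p50 = percentile(response_times_ms, 50)
p99 = percentile(response_times_ms, 99)
print(f"mean={avg:.0f} ms, p50={p50:.0f} ms, p99={p99:.0f} ms")
print("meets targets:", avg <= 300 and p99 < 1000)
```

With this sample the mean is 190 ms, comfortably under a 300 ms target, yet the 99th percentile is 2500 ms and misses a one-second target. That is why targets are often stated on percentiles as well as averages.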

Maintainability

Maintainability means writing code that someone other than the original author can readily understand, refactor, and upgrade. A machine will run even incomprehensible spaghetti code; people are the ones who have to read it. Good code should be readable and simple so that teams can collaborate on it. It should also have the right level of abstraction, with clear APIs and interfaces, so that new functionality can be added to an existing codebase easily.
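As a small illustration of how a clear interface makes new functionality easy to add, consider the sketch below. The Notifier interface and both implementations are hypothetical names invented for this example.

```python
from typing import Protocol


class Notifier(Protocol):
    """Hypothetical interface: anything that can deliver a message."""

    def send(self, message: str) -> str: ...


class EmailNotifier:
    def send(self, message: str) -> str:
        return f"email: {message}"


class SmsNotifier:
    # Imagine this class was added later; no existing caller had to change.
    def send(self, message: str) -> str:
        return f"sms: {message}"


def alert_all(notifiers: list[Notifier], message: str) -> list[str]:
    """Depends only on the Notifier interface, not on any concrete class."""
    return [notifier.send(message) for notifier in notifiers]


print(alert_all([EmailNotifier(), SmsNotifier()], "disk failed"))
```

Because alert_all only knows about the interface, supporting a new delivery channel means writing one new class and leaving the existing code untouched.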

