Sunday, May 7, 2023

Mirdin Coding Tips

Tradeoffs for Pre and Post Conditions for Code Blocks

A weaker precondition, or fewer preconditions, makes a code block more general and less restrictive for its callers.  A stronger precondition, or more preconditions, gives the code block more specific, guaranteed inputs to work with.  

A stronger postcondition, or more postconditions, offers better guarantees about the output of the code block, because it constrains the implementation with more specificity.  A weaker postcondition, or fewer postconditions, offers fewer guarantees, but leaves more room to change the implementation later.  
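As a minimal Haskell sketch of this tradeoff (the function names are illustrative, not from any particular library):

-- Stronger precondition: the caller must guarantee a non-empty list.
-- The implementation is simple, but the function is restrictive.
headUnsafe :: [a] -> a
headUnsafe (x:_) = x
headUnsafe []    = error "precondition violated: empty list"

-- Weaker precondition: any list is accepted, so the function is more
-- general; the price is a weaker postcondition, since the caller must
-- now handle the Nothing case.
headSafe :: [a] -> Maybe a
headSafe (x:_) = Just x
headSafe []    = Nothing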

Decision Tables and the Code Smell due to Boolean Blindness

Decision tables, or Parnas tables, are clear in table form, but their T and F entries are not descriptive, and this results in boolean blindness:

| n mod 3 == 0 | n mod 5 == 0 |  f(n)   |
|--------------|--------------|---------|
|      T       |      T       | "Both"  |
|      T       |      F       | "Three" |
|      F       |      T       | "Five"  |
|      F       |      F       | "Nil"   |

To remove boolean blindness in Haskell, we can introduce a dedicated data type and pattern matching:

data ThreeFive = Both | Three | Five | Neither

f :: Int -> ThreeFive
f n | byThree && byFive = Both
    | byThree           = Three
    | byFive            = Five
    | otherwise         = Neither
  where
    byThree = n `mod` 3 == 0
    byFive  = n `mod` 5 == 0
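A small follow-up sketch, using the type above, recovers the table's string outputs by pattern matching, keeping the four cases distinguishable right up to the point where strings are actually needed:

render :: ThreeFive -> String
render Both    = "Both"
render Three   = "Three"
render Five    = "Five"
render Neither = "Nil"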

Code Smells due to Linguistic Antipatterns
from https://www.linguistic-antipatterns.com/
- multiple methods with similar names or effects which are confusing
- fields or functions with names that do not match their specificity or generality
- functions with names strongly and wrongly associated with inapplicable specifications or properties
- functions with names suggesting a return type but there's no return value
- names of functions, parameters or fields do not match their types
- functions with names suggesting purity, but they actually have side effects
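As a small Haskell sketch of the last antipattern (the counter example is invented for illustration):

import Data.IORef

-- Antipattern: "getCount" reads like a pure query, but the function
-- also increments the counter, a surprising side effect for a "get".
getCount :: IORef Int -> IO Int
getCount ref = do
  n <- readIORef ref
  writeIORef ref (n + 1)
  return n

-- Better: separate the query from the mutation and name each honestly.
readCount :: IORef Int -> IO Int
readCount = readIORef

incrementCount :: IORef Int -> IO ()
incrementCount ref = modifyIORef ref (+ 1)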

Quality through Representability and Validness

Representability in code is achieved by including only the necessary information and not representing things that don't exist in reality.  Validness in code is achieved when different real-world situations are clearly distinguishable in the code.  The goal is SPOT, or Single Point of Truth, where the data structure or system has a one-to-one relationship with the real-world system it models, so that quality is built into the code from the start instead of being tested in later.  Testing quality in later means that quality is treated as an afterthought, with efforts to improve or fix the data structure or system made only after it has already been built.  
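A minimal Haskell sketch, assuming an invented connection domain, shows the difference:

-- Less valid: two fields that can disagree, so situations that don't
-- exist in reality (connected, yet no socket) are representable.
data ConnectionBad = ConnectionBad
  { isConnected  :: Bool
  , socketHandle :: Maybe Int
  }

-- One-to-one with reality: each real situation has exactly one
-- representation, so invalid combinations cannot be constructed.
data Connection
  = Disconnected
  | Connected Int   -- the socket handle exists only when connected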

Exhibit "Good Taste" when Coding

The crux of the “good taste” requirement is the elimination of edge cases, which tend to reveal themselves as conditional statements.  The fewer conditions we test for, the better our code “tastes”.  

Example that loops over every point in the grid and uses conditionals to test for the edges:

for (int r = 0; r < GRID_SIZE; ++r) {
    for (int c = 0; c < GRID_SIZE; ++c) {
        // Top Edge
        if (r == 0)
            grid[r][c] = 0;
        // Left Edge
        if (c == 0)
            grid[r][c] = 0;
        // Right Edge
        if (c == GRID_SIZE - 1)
            grid[r][c] = 0;
        // Bottom Edge
        if (r == GRID_SIZE - 1)
            grid[r][c] = 0;
    }
}

Better version of the code with "good taste":

for (int i = 0; i < GRID_SIZE; ++i) {
    // Top Edge
    grid[0][i] = 0;
    // Bottom Edge
    grid[GRID_SIZE - 1][i] = 0;
    // Left Edge
    grid[i][0] = 0;
    // Right Edge
    grid[i][GRID_SIZE - 1] = 0;
}

Tradeoffs for DRY or Modularisation vs Coupling

DRY (Don't Repeat Yourself), or modularisation, reduces code bloat, but it may result in more coupled code, especially if there are cross-dependencies between modules.  When the software design is apparent or clearly represented in the code, and its structure is subsetted hierarchically, the code will be less coupled.  It is often not possible or practical to remove coupling from code completely, because some level of coupling and dependency is necessary for different parts of a software system to communicate and interact with each other.  The goal is to minimise and manage this coupling to create a more maintainable and flexible codebase.
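As a small Haskell sketch of this tradeoff (the functions are invented for illustration):

-- DRY: formatName is written once and shared, reducing duplication...
formatName :: String -> String -> String
formatName given family = family ++ ", " ++ given

-- ...but both callers are now coupled to it: changing its format
-- changes the behaviour of every caller at once.
invoiceHeader :: String -> String -> String
invoiceHeader given family = "Invoice for " ++ formatName given family

emailGreeting :: String -> String -> String
emailGreeting given family = "Dear " ++ formatName given family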

Tuesday, December 6, 2022

Misconceptions When Learning Multiple Programming Languages And How To Learn More Effectively

When we create bugs during coding, it is sometimes due to typos and forgetfulness.  These can be resolved by checking through the code a few times.  When bugs arise due to misconceptions based on the different mental models used in different programming languages, they will have to be researched, reasoned about, and relearned.  In some cases, the keywords and mental models stored in our LTM from knowing one programming language can be transferred when we learn to comprehend another programming language.  The tools we use, such as the IDE, may be the same too.  This is called transfer during learning.  

Another form of learning, called transfer of learning, usually happens only for language constructs or tools that are similar between languages, such as conditional statements, indentation, and various keyboard shortcuts in the IDE.  Transfer of learning allows us to apply what we already know, less consciously, in unfamiliar situations.  Transfer of automatised skills, such as keyboard shortcuts, is called low-road transfer.  Transfer of more complex skills, such as declaration of variables and types, is called high-road transfer, and it happens more consciously.  

There is also near transfer, which describes transfer of skills between similar programming languages, and far transfer, which is transfer of skills between dissimilar languages.  There is positive transfer, where mental models from knowing one programming language can be reused or adapted for learning another programming language.  And there is negative transfer, which results in misconceptions, where existing mental models create wrong assumptions when learning a new programming language.  Certain existing mental models may be so influential that learners of the new programming language may become critical of the language if it doesn't fit their mental models.  

Resolving misconceptions may require conceptual change.  This involves changing the existing concept or mental model in the learner's mind, which may require unlearning existing concepts, not just adapting them.  When misconceptions arise because basic reasoning that works for simple cases is applied to more complex problems where it no longer holds, that reasoning may inhibit progress towards the right conclusion and so has to be suppressed.  One effective way of overcoming misconceptions is to research and create a checklist of known misconceptions that arise when learning the new programming language after knowing another programming language.  To detect misconceptions in codebases we are working on, we may pair program, group program, or simply run the program, create tests to verify it, and then add documentation in relevant places to prevent future mistakes.  

Various factors influence the amount of transfer from one programming language to another:
* our Mastery of one programming language.  
* the Similarity between the programming languages, the tools, the algorithms used, or even the context or environment of learning and coding.  
* knowing about and learning the Key Attributes and Associated Knowledge that will improve our learning of a new programming language.  
* our Feelings about the programming task, language, algorithm, or IDE.  
* Paying Attention to the Similarities and Differences (in syntax, type system, programming concepts, runtime, programming and testing environments and practices, IDE), and writing them down.  

Reference:
  • Felienne Hermans, "The Programmer's Brain: What every programmer needs to know about cognition", 2021.  


Mental Models in Programming

This post is made referencing Chapter 6 "Getting better at solving programming problems" from the book "The Programmer's Brain: What every programmer needs to know about cognition".  

Mental models are in general stored in the LTM, and relevant or selected ones are identified and recalled for use.  They are also stored in the WM for processing temporarily because they are learnt, created, organised and adapted there to form different representations of the code.  Hence, both flashcards and visualisations will help us to remember and use mental models.  The STM is likely used to support mental models too, although it is used more for assisting in the storage of inputs and outputs from the processing in WM.  Mental models are generally used to solve problems. 

In trying to solve the problem of understanding complex code, we can use the models of state diagrams, dependency graphs, architectural diagrams of the code, or entity relationship diagrams of a software system, which are all explicitly created outside our brains.  Hence, they are local models, which are in general computationally simpler or quantitatively smaller so that our memories can process and store them with less assistance at the start.  With the help of local models, users' WM can focus on creating larger mental models with more ease.  A mental model of code enables reasoning about the relevant elements, interactions and constraints in the code.  This means asking and answering questions about the code and refining the mental models.  Mental models can also be based on remembered schemata of trees, networks and systems from our LTM.  The details depend on the domain, programming language, and architecture of the code, but they generally have data structures, design patterns, architectural patterns, diagrams, and modelling tools.  

In general, mental models are incomplete because they simplify and abstract away irrelevant details.  These abstractions can be specific to notional machines, which are models used for reasoning about how computers execute code at the required level of abstraction.  Mental models are not permanent and can be adapted to fit the problems.  The degree of complexity and the nature of the mental models depend on the users' abilities, expertise and beliefs, so multiple mental models can coexist and they can be inconsistent with each other.  Simpler mental models are sometimes locally coherent, but globally inconsistent.  Mental models are usually kept simple and concrete so that users' brains consume less energy in processing the model and its information inputs, and if more processing or storage is required, it is usually done outside of our brains, such as with pen and paper, or with computers.  

This is another helpful post about mental models, with a "Hierarchy of needs for code and systems" that is quite relevant to coding: 

https://copyconstruct.medium.com/effective-mental-models-for-code-and-systems-7c55918f1b3e

Sunday, December 4, 2022

Reaching a Deeper Understanding of Code

I write this based on the book "The Programmer's Brain: What every programmer needs to know about cognition" Chapter 5 "Reaching a deeper understanding of code".  

For a deeper understanding of code, we can use Jorma Sajaniemi's framework for the roles of variables.  These are the roles in the framework: fixed value, container, organiser, temporary, flag, walker, most recent holder, stepper, follower, gatherer, most wanted holder.  These roles can be determined via a series of questions:
* If the variable is of constant value, it is a Fixed Value.  
* If not, then is it temporary storage?  Yes means that it's either a Container, Organiser or Temporary.  
* If not, then if it is just used for checking, it's a Flag.  
* If not, then is there repetitive traversal over loops?  Yes means that it's a Walker.  
* As a Walker, it can also be considered the Most Recent Holder.  
* If the Walker variable as Most Recent Holder counts each loop iteration in a predetermined manner, it's a Stepper.  
* If it is coupled to another variable to keep track of its previous or subsequent value, it is a Follower.  
* If it is accumulating, it is a Gatherer.  
* If it is selective of the value it holds, it is the Most Wanted Holder.  
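As a minimal Haskell sketch (the function is invented for illustration), several of these roles can appear in a single accumulating loop:

-- Computes the sum and maximum of a non-empty list in one pass.
sumAndMax :: [Int] -> (Int, Int)
sumAndMax []           = error "precondition: non-empty list"
sumAndMax (first:rest) = go first first rest
  where
    -- total is a Gatherer (it accumulates every element);
    -- best is a Most Wanted Holder (it keeps the best value seen so far).
    go total best []     = (total, best)
    go total best (x:xs) =
      -- x is a Walker and the Most Recent Holder on each iteration.
      go (total + x) (max best x) xs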

We can draw up a table with rows consisting of the variables, and columns indicating the names, types, operations and roles of the variables.  We can also include comments from the code and experiences with the code as columns to elaborate on our understanding.  It will also help to print the code on paper so annotations can be made on it, and to name variables, methods and classes descriptively from the start, based on their roles.  

There are 4 steps to take to move from a superficial text knowledge of code, to a deeper plan knowledge of code.  Firstly, find a focal point to know where to start reading the program.  Then, expand knowledge from the focal point by inspecting the code and finding related code (variables, methods, and classes) from that entry point.  Then, understand a broader concept from a set of related entities linked by Call Patterns associated with methods.  Finally, understand concepts across multiple entities based on the data structures, structural operations and constraints in the code.  This final conceptual understanding can be expressed in the documentation for the code.  

The best predictor for programming ability and programming accuracy is not numeracy skills, but in fact consists of a combination of working memory capacity and reasoning skills.  The best predictor for learning rate is language ability.  Reading code is similar to reading any normal textual content, because when we read code, we also identify keywords and try to relate their meanings together.  This requires initial code scanning by the reader to get an overview of the structure of the code.  In general, code reading is less linear than reading normal textual content.  

There are roughly 7 strategies for code reading comprehension:
* Activating prior knowledge by actively thinking of related things that are stored in our LTM.  
* Monitoring by keeping track using annotations of what we do not understand in the code.  
* Determining the most important lines of code in a program based on their roles and related entities.  
* Inferring the meaning of the names of variables, methods and classes by using our WM and LTM.  
* Visualising the code by using annotations, dependency graphs, and operation and state tables.  
* Asking questions to better understand the code's algorithms, data structures, assumptions, techniques, decisions, alternatives, constraints, goals and functionalities.  
* Summarising the code in natural language documentation.  




Friday, December 2, 2022

How To Better Remember Code And Understand It

In the book "The Programmer's Brain: What every programmer needs to know about cognition", three cognitive processes are laid out: the LTM (Long Term Memory), the STM (Short Term Memory), and the WM (Working Memory).  The knowledge stored in the LTM persists through time and is what we recall from the past.  This knowledge store is usually the biggest.  The STM is used to handle incoming new information and is much smaller than the LTM.  It is generally considered to be able to store only 2 to 6 items at any moment, and is meant to facilitate the processing or calculation of information in the WM.  The WM is essentially a processor or calculator that changes or processes incoming information.  

When we read code, the incoming information first goes through a filter based on knowledge from the LTM, where information considered unimportant for processing is left out before storage in the STM.  From the STM, information will be sent to the WM to be processed for comprehension, in different ways and to different degrees depending on the time available, before storage in the LTM.  The storage in the LTM may be lossy, depending on the way the information is comprehended and associated with other information and knowledge, and also over a longer period of time due to a lack of refreshing or recalling.  Storage strength is therefore improved by repeated study, while retrieval strength is improved by recalling what we have studied.  Repetition and recollection intervals should be spaced out and interleaved.  

Code reading and comprehension can be faster based on a few factors, such as familiarity with the programming language, the presence of well chunked comments in the code, the presence of familiar design patterns in the structure of the code, the use of meaningful names for variables and class names and meaningful log messages in the code, and the quick use of iconic memory to visually capture the overall structure of the code.  Essentially, we should write code well in order for others and our future self to read and comprehend it faster.  

Flashcards will help with memory retention to improve familiarity with the syntax of the programming language.  We should test our memory recall with the same set of flashcards once a month to improve our LTM storage.  For complex code or programming concepts, elaboration of our memory network through repeated active thinking and reflecting on the information to build mental schemata and actively connect new knowledge to existing related memories will strengthen our LTM.  This will mean recreating and improving the content of flashcards.  

Besides a lack of knowledge due to inadequate storage in our LTM, we can also suffer from a lack of processing power in the WM.  This is related to a lack of information, either due to the small size of our STM or due to insufficient information provided by the code: in the first case, the WM has more to process to clear what has overloaded our STM; in the second, it must engage in conjectures and assumptions, which both occupy the STM and take up WM processing space and time.  The WM's capacity is also about 2 to 6 items at a time, and the demand placed on it is known as the cognitive load.  The WM can better process information when it has been divided efficiently into chunks.  The cognitive load on our brain can be divided into 3 types: the intrinsic load due to the complexity of the problem, the extraneous load due to distractions external to the problem, and the germane load caused by the processing necessary to store our thoughts in the LTM.  

There are several ways to reduce the cognitive load on the WM: refactoring or changing the internal structure of the code to reduce duplication or improve readability; replacing unfamiliar language constructs, such as lambdas, list comprehensions or ternary operators, with more familiar equivalents; using code synonyms in flashcards; and creating a dependency graph by annotating complex and interconnected code, and/or creating a state table containing the intermediate values of variables in timed sequence for code that is heavy on calculations, in order to understand the overall coherence of the program.  
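For example, a sketch of a code synonym in Haskell, rewriting a possibly unfamiliar list comprehension into named functions that may be easier to chunk:

-- Unfamiliar construct: a list comprehension with a guard.
squaresOfEvens :: [Int] -> [Int]
squaresOfEvens xs = [x * x | x <- xs, even x]

-- Synonym with the same behaviour, built from named functions.
squaresOfEvens' :: [Int] -> [Int]
squaresOfEvens' xs = map (\x -> x * x) (filter even xs)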


Wednesday, November 23, 2022

Learning PureScript - Discipline is Freedom

This post is written based on the book "Functional Programming Made Easier: A Step-by-Step Guide".  

In the 1960s, the Structured Programming paradigm was developed to overcome the Spaghetti Code situation resulting from an overuse of GOTO statements.  Essentially, GOTO via Jump instructions was phased out, but there was a lot of resistance to the new Structured Programming paradigm.  Today, we are trying to replace the Imperative Programming (IP) paradigm with the Functional Programming (FP) paradigm.  There are five reasons for this: global state, mutable state, purity, optimisation, and types.  

The Global State, or Global Variables, in the IP paradigm allow data to be changed at any time, anywhere in the program, which makes it difficult to remember or reason about every possible use.  Code that shares Global Variables becomes tightly coupled, and this coupling is easily broken by non-compliant code.  Safe concurrent access is not enforced, and there can be variable name collisions.  Object-Oriented Programming (OOP) does not resolve these problems, because we can easily create a Singleton Object that contains all our variables and expose it publicly, in an emulation of Global Variables.  FP makes mutable Global State impossible.  

In IP, variables are mutable, which results in programs that are much harder to reason about, because values can change drastically.  This makes programs more fragile.  In FP, there are no mutable variables.  There are only expressions binding variable names to a fixed number, boolean, string, or other variables.  There is referential transparency, because these variables can be replaced by their values without changing the program's behaviour.  Loops in IP, which require mutable variables, become recursive functions in FP, which parallel the mathematical definitions and are easy to understand.  
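A minimal sketch of this correspondence in Haskell (the imperative version is shown only as a comment):

-- Imperative version (pseudocode): total = 0; for each x in xs: total += x
-- The recursive version needs no mutable variable at all.
sumList :: [Int] -> Int
sumList []     = 0                 -- the loop's starting state
sumList (x:xs) = x + sumList xs    -- one element per call, no mutation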

In FP, there is functional purity when functions are pure: such functions take one or more inputs, perform a computation, and return a result.  In IP, a result may not necessarily be returned, because functions may be used to perform other tasks, such as printing a value to the screen, which do not require returning a result from the computation.  These other tasks are called Side Effects.  In FP, pure functions have no Side Effects: the same input just heats up the CPU and will always produce the same output, without writing to files, displaying to the screen, or updating a database.  A function with multiple parameters can also be rewritten as a chain of functions, each taking only one parameter; this is called currying.  
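A minimal Haskell sketch of currying and partial application (the names are illustrative):

-- add :: Int -> Int -> Int really means Int -> (Int -> Int):
-- a function of one argument returning another function.
add :: Int -> Int -> Int
add x y = x + y

-- Partial application: supplying only the first argument yields
-- a new pure function.
increment :: Int -> Int
increment = add 1   -- increment 41 == 42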

In FP, there is memoisation: the results of expensive calculations can be stored, so that results can be looked up from cached values instead of being recalculated.  Hence, there is optimisation.  Purity is what makes this optimisation safe, because Side Effects cannot be cached: when a computation with a Side Effect is performed multiple times in IP, there will be multiple costly Side Effects.  
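A classic Haskell sketch of memoisation: the lazily evaluated list caches each Fibonacci number the first time it is computed, so later lookups reuse it:

fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

fib :: Int -> Integer
fib n = fibs !! n   -- repeated calls reuse the cached values in fibs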

In FP, types are static, so they can be checked at compile time; hence, type errors can be detected early, refactoring can be done more easily, and IDEs can better support code development.  This gives better code quality and correctness.  Static Types, compared to Dynamic Types, limit flexibility in coding and require explicit typing, which can result in more cumbersome code expressions.  However, some FP languages support Type Inference, which frees the programmer from explicitly declaring Types, because the compiler can infer the Types from the usage of the expressions.  Some FP languages mix type definitions with variable names, which can be confusing.  PureScript separates the type definitions from the variable expressions.  
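A minimal sketch of Type Inference in Haskell (PureScript behaves similarly): no type is declared, yet the compiler infers one and rejects misuse at compile time:

-- The compiler infers double :: Num a => a -> a on its own;
-- a call like double "oops" is then rejected at compile time.
double x = x + x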

Monday, November 21, 2022

Locating the source of a problem

To locate the source of a problem, we can either begin where the problem occurs and work towards the source of the bug, or start at the top level of the application and drill down until the buggy source is located.  

When a program crashes and the error messages indicate a specific problem routine, we can follow the sequence of routine calls (for example, in a stack trace) from the point of failure back towards its origin.  This will likely lead to the source of the bug.  When a program freezes, the same method can be used by starting from a memory dump.  Memory dumps are possible if there are tools or commands available for them.  Otherwise, we'll have to create a dump of log messages from the entries and exits of our code routines and examine them.  One possible source of the problem is the libraries we use in our code.  
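As a minimal Haskell sketch (the routine and its work are invented), logging entry and exit with Debug.Trace so that a dump of log messages shows how far execution got:

import Debug.Trace (trace)

-- Wraps a routine's real work with entry and exit log messages.
process :: Int -> Int
process x =
  let result = x * 2   -- the routine's real work would go here
  in trace ("enter process " ++ show x) $
     trace ("exit process, returning " ++ show result) result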

If the problem is an emergent property which cannot be readily associated with any part of the code, then we'll have to begin at the top level of the code, break down the code into parts, and examine the contribution of each part to the problem individually.  These problems relate to the performance, security and reliability of our code.  


Reference:
  • Diomidis Spinellis, "Effective Debugging: 66 Specific Ways to Debug Software and Systems", 2016.  
