Pattern Recognition, Generalisation & Abstraction

Once a problem has been decomposed into smaller tasks, it is useful to try and identify common themes or patterns that might exist in other programs. This helps the programmer to save time reinventing the wheel when a solution to a given problem may already exist.

In this lesson, we will learn about the process of identifying common patterns in a Program including:

  1. Pattern recognition
  2. Pattern generalisation and abstraction
  3. Representing parts of a problem or system in general terms

Media Attachments: Presentation

1. Pattern Recognition

Pattern Recognition

Patterns exist everywhere. If you were to look at how your day is organised in your School or College, you will see that it follows a pattern:

  1. Your day will start at a set time
  2. It will be broken up into a number of lessons of a set length
  3. You will have a lesson with a teacher and the teacher will take a register
  4. You may or may not be set homework for a particular lesson.

This pattern holds true for each day of the week for most students in most schools and colleges.

This pattern can then be applied to any systems that tracks and monitors student data, including attendance, punctuality and recording homework marks. Such systems are known as Information Management Systems (IMS).

Recognising patterns – things that are common between problems or programs – is one of the key aspects of computational thinking.

Pattern recognition is based on five key steps:

Identifying common elements in problems or systems

Once you identify a common pattern, there is more than likely going to be an existing solution to the problem.

For example, you might want to search for a student in a school IMS. To do this you would need to use a searching algorithm, like a Binary Search or a Linear Search. We will look at searching algorithms later on in the course.

Identifying and Interpreting common differences in problems or systems

Two different Student IMS systems might have different ways of taking a register.

One system might simply record present and absent. Another system might record, present, planned absence, unplanned absence and late.

Although these are differences, all School and College IMS systems fundamentally need to be able to take a register.

How they go about doing it is different.

Identifying individual elements within problems

The elements can be broken down into inputs, processes and outputs.

In the case of the school register, the input will be a Character entered against the student name It could be ‘/’ or ‘P’ if the student is present, and ‘N’, ‘\’ or ‘L’ if they are not present.

Behind the scenes, a process will occur to add up the number of times the student was present for a lesson. This data will be saved in a database.

This data will also be output as a Percentage Attendance score for each student.

Describing patterns that have been identified

Once you have identified a pattern, you can now start to describe it. It might be a new pattern that occurs several times in your own program, or it might exist elsewhere in other programs.

In 1994, four Software engineers, nicknamed the ‘Gang of Four’, Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides, published a book on design patterns which formalised patterns in software use.

Making predictions based on identified patterns

Once you have identified a pattern you can speculate whether it can be reused in your existing program, or used in another program.

Further Thought

Have a look at the following website about the Gang of Four design patterns https://www.tutorialspoint.com/design_pattern/design_pattern_overview.htm

Can you spot any patterns about the patterns?

2. Pattern Generalisation and Abstraction

Pattern Generalisation and Abstraction

Abstraction means hiding the complexity of something away from the thing that is going to be using it.

For example, when you press the power button on your computer, do you know what is going on?

Of course not, your computer just turns itself on. That’s all you need to know.

The process of powering up your computer and loading the Operating System into RAM memory from the Boot Sector has been hidden from you.

Abstraction in computational thinking is a technique where we split individual parts of the program down into imaginary ‘black boxes’ that carry out operations.

We don’t care HOW they do them only that they work.

There are two parts to this phase:

  1. Identify the information required to solve a problem. You will need to know the type and format of your information and when it is required.
  2. Filter out information you do not need and be able to justify this.

Let’s consider our Student IMS. A teacher wants to look up details about a specific student. To do this, they type the student’s surname, click ‘enter’, and information is displayed

The information needed will be surname only.

This will give us a list of students with the specific surname, but the information brought back would include their first, middle and last name, and their year of registration.

The data needs to be text only.

Information not needed is gender, age and date of birth as all this will be obtained from the student search.

Abstraction of the Search Process in a Student IMS
Figure 2.1 Abstraction of the Search Process in a Student IMS

The ‘Search for A Student’ process does not know that the Student Search Pattern connects to a database and gets a list, all it knows is that it gives the black box a surname, and gets back some results.

This is Abstraction; the student search functionality is hidden away from the rest of the system.

Generalisation happens when you can spot common themes between patterns. For example, you might want to search for students in a class, or who are being taught by a specific teacher – all these involve some form of searching, the only thing that differs is what you are searching for.

Further Thought

Think of your two favourite games. Can you think of any abstraction in each one? Can you think of any generalisation of processes between the two?

3. Representing Parts of a Problem or System in General Terms

Representing Parts of a Problem or System in General Terms

Consider the student search system, it can be represented using the following terms:

  • Variables – these are the values that will change – in this case the surname of a student.
  • Constants – this will be something that is likely to remain fixed for a while, e.g. a student will typically study a 2-year course.
  • Key Processes – these are the things that are critical to the system – for example, searching the database for a given student surname.
  • Repeated Processes – these are things that happen multiple times, for example adding students with matching surnames to the list of students.
  • Inputs – the values entered into the system, in this case, it is the student surname.
  • Outputs – the information produced by the system, in this case, a list of students with a matching surname, also including their first name, middle name and year of registration.

Further Thought

Think back to your student planner program from Lesson 1. Can you identify all the general terms that you would need for this program to securely manage your timetable and your homework?

Lesson Summary

So to summarise what we have learned in this lesson:

  • Patterns are things that are the same within a problem and between problems.
  • Identifying patterns means that there is probably an existing solution already out there.
  • Pattern recognition is based on the 5 key steps of:
    • Identifying common elements in problems or systems
    • Identifying and Interpreting common differences in problems or systems
    • Identifying individual elements within problems
    • Describing patterns that have been identified
    • Making predictions based on identified patterns.
  • Pattern abstraction is hiding the complexities of one pattern from another.
  • Pattern generalisation is spotting things that are common between patterns.
  • We can represent parts of a system in general terms, including Variables, Constants, Key Processes, repeated Processes, Inputs and Outputs.