Before you start designing your data structure it is a good idea to first stop and think about what you want to use it for. If you require a simple structure to store your data before applying a function to each element of data (also called mapping) you probably want to use a list. If you want to store a huge amount of information but still want to be able to find individual elements fast, you probably want to use a tree. If you want to store a list of paths that a router can redirect a network package to (routing information), and it’s essential that it goes fast to find and holding a fixed sized array is no problem, then a hash-table probably is what you want.
In a real setting where you aren’t creating the data structure for teaching purposes, you would in many cases be done after this step. There are many libraries which have created data structures for you to use. But in an academic setting, you want to create it yourself. Still if given the choice it can be a good idea to know which structure is good where.
After choosing which data structure to use in your particular setting, you want to start designing the data structure. There are a few steps you want to do when designing:
- Are there any specific requirements that you have on the data structure?
- Which types should it be able to operate on?
- How much do you plan to reuse it and how important is performance?
The answer for if there are any specific requirements may be answered by the reasoning of the choice of data structure. The types it should be able to operate on is usually answered with “any type” if you know the concept of typecasting (if the programming language you use allows it). In some cases, you may want to restrict which types that can be used with the data structure, perhaps because you want high predictability of the system, and then it makes sense to use for example numerical values as the only type.
You should also decide if you plan to reuse the data structure in the future and what kind of design choices you should think about because of it. If you plan to create a list, for example, you may want to think about the scenario where the list is used as a stack or a queue. The single linked list might be easy to implement and work well for stacks, but for queues where each element is added to the front and removed from the end, the single linked list might cause the list to work slower. There are of course workarounds for the single linked list by keeping a pointer to the last element in the case of the queue, but all these stuff are things that should be thought about since there are other cases which might not be as easily fixed with a hack. Some might not only affect performance as the previous example with the single linked list but might actually require a lot of modifications to reuse.
After the data structure is planned you need to specify which functionality it should have. If we go back to the example of the single linked list it needs to have the following functionality:
- Create/initiate a list
- Add an element to the list
- Remove an element from the list
- Remove the list
This list would work but some extra functionality could improve usability such as:
- Get the length of the list
- Get a value from the list
- Print the list
The list could, of course, implement any other feature which makes sense in the setting in which it’s used. For example, functionality to sort the list or remove duplicates.
When we have the different functions that we want to implement we should then sit down and think about how it could be implemented. Often when programmers get a problem in front of them they then to want to start implementing right away (talking from my own experience), but after many years of programming, I’ve realized that I often create much better programs when I first stop and think about the problem.
When it comes to data structures it’s often a good idea to think about how the different functionalities can be solved recursively. That is with one base case and one or more recursive cases. To show how this process could be done I’ll go through the functionalities of the single linked list listed above. But before doing that let’s say that the values we want to manage the single linked list are numerical values.
Before starting we can draw an illustration of how each element of data (commonly called a node) looks, and to make it simple suppose that a list is just a series of nodes. The content of each node is then a chunk of data and a pointer to the next element.
Going back to the functions, using the data node illustrated above the create function is a piece of cake.
Create: Take nothing as input and send back a NULL pointer. When creating a list you assign a node pointer the value of the create function, in this case, NULL. In another setting where you might use a list-node containing, for example, a node pointer and a length variable, you might want to initialize that length variable to 0 and assign NULL to the pointer. In that case, the Create function would require a little bit more.
When it comes to the add function we have some stuff to consider. Should the add function add a node to the front, the back or a specific place in the list? Would it perhaps be a good idea to create one function for each of these specific cases (perhaps to enable easier reuse if the list is going to be used as a stack or a queue in the future)?
If we decide to create only a function that adds the node to a specified position (and at the last position if the position given is too large and at the first, if the position given is less than one) then we might solve it accordingly in a recursive manner:
Add : in-values(node pointer, data to add, current position value, position value):
If the position value is less than 1 then create a new element with the data to add as data value and connect the node pointer to it.
If the node pointer points to NULL then create a new element with the data to add as data value and connect the node pointer to it.
Else if the node pointer points to another node and the current position value is less than the position value, recursively call the Add function with all in-values the same except current position value equal to current position value + 1.
Else if the current position is equal to the position value then create a new element with the data to add as data value and connect the node pointer to it and the newly created element’s pointer to the node that the in-value node pointer pointed at.
This add function would be able to add an element to the linked list. An obvious improvement that could have been made to this function would be to create another function that creates a new node. Then you could just call that function at the respective position in the code. There are also other ways to create this add functionality. One way would be to create a function which added an element to the start to the list and another that added to the end of the list. If the first function was made in a smart way, the second could then just call on the first (in addition to a few lines of code of its own) to allow it to work. A third function could then be created to allow users to add wherever they want.
To remove an element from the list you’d want to do something similar to the add function. But instead of creating and connecting elements you’d want to find and then disconnect, remove and then reconnect (if any remaining) elements.
To remove the list you’d want to first run the remove function until there is no first element. Then set the node pointer used to define the list to NULL.
As you might have already realized by now most functionality of the list consist of two steps. Finding your way to the right place in the list by stepping forward incrementally (also called traversing), and then do any action that you want to do. Even getting the length, a value or printing the list works this way. To get the length or printing the list you’d want to traverse (step through) the full list, in the first scenario counting each node as you go and in the other printing each. To print a value you’d do just as in the add example, traverse to the right position and then simply read the data value in the node.
In almost all data structures the same rules apply, you traverse to a certain position and then do something either on each step there or at a specified position. Most of the times things can simply be done if you think recursively and if you get into any kind of trouble with seeing the solution, try drawing it! When drawing data structures try to draw in such a way so that the algorithm get clear, so that you can on the image and see how different selections could give different results.
This is all for this post on designing data structures. Please comment if there is any part that you’d like me to write more about or which you want me to go deeper into.