Repository for exercises
Starting Date: ‣
Topic: Maps, LRU
- The key needs to be unique within a Map. In JS, objects with the same properties might collide because nothing enforces that uniqueness.
- Maps are efficient because of the hashing function used to find a value from a key.
- If a hashing function produces a certain value more frequently than others, it will degrade the Map’s efficiency.
- LRU is Least Recently Used.
Notes
- Map is a collection of key-value pairs, where each key resolves to a single pair and no collision happens.
- A map has a load factor, which is the amount of data divided by its capacity. E.g. if a map has `7` items and can hold up to `10` items, its load factor is `0.7`.
- Maps have keys and values. A key is a hashable value and is used to look up data. A value is the value associated with a key.
- A collision is when two different keys of a map hash to the same spot.
- When hashing a key, you take the modulus of its hash value over the map’s capacity. Undoubtedly, some keys will resolve to the same spot of the map. So there’s a need to store those keys even when they hash to the same spot:
- Using an array list of key-value pairs (open addressing), you can go to the hashed index and keep traversing until you find a vacant position. When the hashed index is occupied, you may have to traverse up to N positions until a vacant spot.
- Using an array list of lists (separate chaining), you can go to the hashed index and add the pair to that list. This way you can use whatever data structure or search algorithm on the inner list.
- To keep the map efficient we need to increase its capacity after adding too many pairs. A way to know if we should increase its capacity is by looking at the load factor. For instance, if it is greater than 0.7, re-map.
- We increase a map’s capacity by traversing its items and adding them to the larger map while hashing the keys again.
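The chaining variant above can be sketched in JavaScript. This is a minimal illustration (the `SimpleMap` name and its methods are made up for this note): string keys are hashed with a modulus over the capacity, colliding pairs share a bucket list, and the table re-maps once the load factor passes 0.7.

```javascript
// Minimal hash map sketch: separate chaining + load-factor resizing.
// Only string keys are supported; names are illustrative.
class SimpleMap {
  constructor(capacity = 4) {
    this.capacity = capacity;
    this.size = 0;
    this.buckets = Array.from({ length: capacity }, () => []);
  }
  // Hash a string key, then take the modulus over the capacity.
  hash(key) {
    let h = 0;
    for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
    return h % this.capacity;
  }
  set(key, value) {
    const bucket = this.buckets[this.hash(key)];
    const pair = bucket.find(([k]) => k === key);
    if (pair) { pair[1] = value; return; }
    bucket.push([key, value]);
    this.size++;
    // Load factor = items / capacity; past 0.7, re-map into a larger table.
    if (this.size / this.capacity > 0.7) this.resize();
  }
  get(key) {
    const pair = this.buckets[this.hash(key)].find(([k]) => k === key);
    return pair ? pair[1] : undefined;
  }
  resize() {
    const old = this.buckets.flat();
    this.capacity *= 2;
    this.size = 0;
    this.buckets = Array.from({ length: this.capacity }, () => []);
    // Keys must be hashed again: the modulus changed with the capacity.
    for (const [k, v] of old) this.set(k, v);
  }
}
```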
- Least Recently Used (LRU) is a data structure built by combining other ones. As its name implies, it provides easy access to the least recently used node. It is a great structure for caches, for instance.
- An LRU cache is built by combining a hash map with a doubly linked list. The hash map provides the O(1) find, and the doubly linked list allows for low-cost re-ordering of nodes.
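A minimal sketch of that combination (the `LRUCache` name is illustrative): a `Map` from key to list node gives the O(1) find, and a doubly linked list with sentinel head/tail nodes gives the cheap re-ordering and eviction.

```javascript
// LRU cache sketch: hash map for O(1) lookup + doubly linked list
// for cheap re-ordering. Most recent lives right after the head.
class LRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map(); // key -> list node
    // Sentinel head/tail keep the link surgery branch-free.
    this.head = {}; this.tail = {};
    this.head.next = this.tail;
    this.tail.prev = this.head;
  }
  detach(node) {
    node.prev.next = node.next;
    node.next.prev = node.prev;
  }
  pushFront(node) {
    node.next = this.head.next;
    node.prev = this.head;
    this.head.next.prev = node;
    this.head.next = node;
  }
  get(key) {
    const node = this.map.get(key);
    if (!node) return undefined;
    // Accessing a key makes it the most recently used.
    this.detach(node);
    this.pushFront(node);
    return node.value;
  }
  put(key, value) {
    let node = this.map.get(key);
    if (node) { node.value = value; this.detach(node); }
    else {
      node = { key, value };
      this.map.set(key, node);
      if (this.map.size > this.capacity) {
        // Evict the least recently used node: the one before the tail.
        const lru = this.tail.prev;
        this.detach(lru);
        this.map.delete(lru.key);
      }
    }
    this.pushFront(node);
  }
}
```

Note that a plain JS `Map` already iterates in insertion order, so a delete-and-reinsert trick could also implement an LRU; the explicit linked list just makes the mechanics visible.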
Resources
SUMMARY: the advanced data structures all build upon the basic ones. Maps use hashing to provide nearly constant-time read access.
Starting Date: ‣
Topic: Graphs, Dijkstra’s Shortest Path Algorithm
Notes
- Graphs are structures of nodes (also called vertices) connected by links (also called edges). These are then constrained to create specialized structures with their own set of properties, much like any other data structure.
- A graph cycle is any path in which the first node is also the last one. An acyclic graph simply does not contain any cycle.
- A graph is connected when starting from any node, there is a path to reach any other node on the graph. If a graph is not connected, it is disconnected.
- Directed graphs (also called digraphs) have directed edges. So, from a node, you can only go through edges pointing away from it. Undirected graphs don’t have directed edges.
- Graphs’ edges can also have weights (more generally speaking, they can have labels). These graphs are called weighted graphs. On undirected graphs, a weight applies to both nodes connected by an edge, a symmetric relationship.
- For instance, Google Maps’ recommendation of which street to take during rush hour is implemented with weighted graphs.
- Edges can carry any data without the graph becoming a weighted graph. That only changes if that data is considered when traversing the graph.
- A directed acyclic graph (DAG) has directed edges in a way that no cyclic path can be formed. (Look on resources.)
- The Big O of a graph algorithm is commonly stated in terms of vertices and edges: `O(V * E)`. So it considers checking every edge of every vertex.
- There are two common implementations of a graph: adjacency list and adjacency matrix. In both, adjacency refers to the edges between vertices.
- An adjacency matrix has V columns and V rows, where V is the number of vertices in the graph. This may leave a lot of reserved memory unused, since for a vertex with a single edge, an array of size V is still reserved.
- An adjacency list uses a list containing, for each vertex, the list of vertices it connects to. E.g. for an edge `(0, 5)`, the first item of the list will have `adjacencyList[0] = [5]`, and the fifth item will have `adjacencyList[5] = [0]`. So `adjacencyList[0]` returns the list of vertices that 0 is connected to.
- Dijkstra’s shortest path algorithm consists of keeping track of the minimal distance to every vertex and of the vertex we came from to achieve that distance (aka the previous array). We traverse the graph starting from a specific vertex, look at the edges that vertex has, and evaluate whether they give a shorter distance to each neighbor; if they do, we update the minimal distance (and previous entry) for each. We keep doing that until there are no more unseen vertices. When there are no more vertices to traverse, we use the previous array to build the path.
- Be aware of loops on cyclic graphs.
- When implementing Dijkstra’s shortest path algorithm with simple arrays, you’ll have a runtime of `O(V^2)`. If you use a Min-Heap to keep track of the vertices left to traverse, you can get `O(V + E log V)` (the `log V` comes from the work the Min-Heap does when inserting a new node).
Resources

SUMMARY: if you know how to work with trees, you should know how to work with graphs, and vice-versa. Dijkstra’s algorithm iterates vertices, calculating the shortest distance to each neighbor, and uses a previous array to build the path back up.
Starting Date: ‣
Topic: Heap, Trie
Recall
- A min-heap has the current key smaller than its children. That is true for all sub-trees.
- Heap does not compare siblings. Only the current node with its children.
- Trie tree is mostly used for autocomplete.
Notes
- Heap is a data structure also named Priority Queue. It is a special type of complete binary tree. It can be a min-heap or a max-heap.
- A min-heap has every node’s children greater than or equal to the current value. So if the root of a min-heap is 5, all other nodes must be greater than or equal to 5.
- A max-heap has every child node smaller than its parent. E.g. if a node has value 100, all nodes under it must be smaller than 100.
- A heap is weakly ordered. That means there is an ordering rule, but it does not sort the whole tree: part of it will be sorted, other parts won’t.
- A min-heap states that a node must be smaller than its children. That is valid for all sub-trees. But there is no rule comparing the left and right children, besides the completeness of the tree (so leaf nodes on the last level should always lean to the left). The same goes for a max-heap.
- A heap can be implemented using an array, instead of nodes, for extremely better performance. All properties are kept when you use two formulas to walk through the items. Take that `i` is the current index on the array: `2*i + 1` equals the index of the left child; `2*i + 2` equals the index of the right child; `(i - 1) / 2` (floored) equals the index of the parent.
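Those index formulas in use, in a minimal min-heap sketch (the class and method names are illustrative):

```javascript
// Min-heap sketch backed by a plain array, using the index formulas:
// left = 2i + 1, right = 2i + 2, parent = floor((i - 1) / 2).
class MinHeap {
  constructor() { this.data = []; }
  insert(value) {
    this.data.push(value);
    // Bubble the new value up while it is smaller than its parent.
    let i = this.data.length - 1;
    while (i > 0) {
      const parent = Math.floor((i - 1) / 2);
      if (this.data[parent] <= this.data[i]) break;
      [this.data[parent], this.data[i]] = [this.data[i], this.data[parent]];
      i = parent;
    }
  }
  pop() {
    const top = this.data[0];
    const last = this.data.pop();
    if (this.data.length === 0) return top;
    this.data[0] = last;
    // Sink the moved value down toward the smaller child.
    let i = 0;
    while (true) {
      const left = 2 * i + 1;
      const right = 2 * i + 2;
      let smallest = i;
      if (left < this.data.length && this.data[left] < this.data[smallest]) smallest = left;
      if (right < this.data.length && this.data[right] < this.data[smallest]) smallest = right;
      if (smallest === i) break;
      [this.data[smallest], this.data[i]] = [this.data[i], this.data[smallest]];
      i = smallest;
    }
    return top;
  }
}
```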
- A trie tree (retrieval tree) is used for locating specific keys within a set. It is easy to think of it as the data structure behind auto-complete. A node has N children, and by depth-first traversing the tree and concatenating the node values, we form a word. (Look at resource `trie-tree`.) You know you have traversed enough to form a word by a property on the node: `isWord`, for instance.
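A small trie sketch with the `isWord` flag and a depth-first autocomplete walk (the `complete` method name is made up for this note):

```javascript
// Trie (retrieval tree) sketch for autocomplete. Each node maps a
// character to a child node; `isWord` marks where a complete word ends.
class Trie {
  constructor() { this.root = { children: {}, isWord: false }; }
  insert(word) {
    let node = this.root;
    for (const ch of word) {
      if (!node.children[ch]) node.children[ch] = { children: {}, isWord: false };
      node = node.children[ch];
    }
    node.isWord = true;
  }
  // Collect every word under `prefix` by depth-first traversal.
  complete(prefix) {
    let node = this.root;
    for (const ch of prefix) {
      node = node.children[ch];
      if (!node) return []; // no word starts with this prefix
    }
    const words = [];
    const walk = (n, acc) => {
      if (n.isWord) words.push(acc);
      for (const [ch, child] of Object.entries(n.children)) walk(child, acc + ch);
    };
    walk(node, prefix);
    return words;
  }
}
```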
Resources

SUMMARY: some trees can also be implemented with arrays (like on heap). Heap is great for finding min or max value. Trie tree doesn’t require traversing that much for autocomplete.
Starting Date: ‣
Topic: Trees
Recall
- A tree node (and anything in software development) can have whatever properties you need. One that tree nodes sometimes have is a parent reference.
- Double ⇒ 2 N
- Quadruple ⇒ Area ⇒ N^2
- Eightfold ⇒ Volume ⇒ N^3
- Binary Trees are categorized based upon the number of children and completion level.
Notes
- A tree is the composition of nodes. Each of those nodes has a list of children nodes. There are a lot of specializations of trees.
- The most common one is the Binary Tree. A node of this tree has at max two children. So it is commonly used for sorting algorithms.
- The height of a tree is the longest distance between the root and a leaf.
- The leaf of a tree is a node without any children.
- A balanced tree has all its leaves at the same level, meaning the distance from the root to any leaf is the same.
- A tree does not need to be balanced. You can have one subtree with a height of 1000 while the other children of the root have heights of 2.
- For some algorithms, the more unbalanced a tree, the more problematic it becomes.
- The branching factor is the maximum number of children a node can have.
- Traversing a tree is different from traversing linear data structures (like Lists), which have only one way to do so.
- You can traverse a binary tree in three different ways: pre-order, in-order, post-order. What changes between them is when we visit the current node.
- With pre-order, the node is visited before its children.
- In-order, the left child is visited, then the node, then the right child.
- Post-order, we visit the left child, then the right one and lastly the current node.
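The three orders can be sketched as below; the only difference between them is where the push of the current node’s value sits relative to the recursive calls.

```javascript
// The three depth-first orders differ only in when the current node's
// value is visited relative to its children.
function preOrder(node, out = []) {
  if (!node) return out;
  out.push(node.value);          // node first...
  preOrder(node.left, out);
  preOrder(node.right, out);
  return out;
}
function inOrder(node, out = []) {
  if (!node) return out;
  inOrder(node.left, out);
  out.push(node.value);          // ...node between the children...
  inOrder(node.right, out);
  return out;
}
function postOrder(node, out = []) {
  if (!node) return out;
  postOrder(node.left, out);
  postOrder(node.right, out);
  out.push(node.value);          // ...node last.
  return out;
}
```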
- Breadth-first search on a binary tree means we visit nodes level by level: the current node, then its direct children, then their children, and so on. It is typically implemented with a queue rather than recursion.
- Binary Search Tree is a specialization in which each node has at most two children. The left subtree contains only values that are smaller than the current node. The right subtree contains only values that are greater than the current node.
- It is faster for dynamic data than a sorted array, since we insert in order and don’t need to shift or resize anything.
- Note that no node (not even a deep descendant) on the left side of a node can be greater than it. The same occurs for the right side: no node can be smaller than it.
- The performance of algorithms on a Binary Search Tree does not simply depend on the quantity of nodes it has, but also on how its height is distributed (how balanced it is). Depending on how balanced your tree is, the algorithms’ performance might go from O(log N) to O(N).
- Some kinds of binary search trees are self-balancing. AVL and red-black trees use variant algorithms to maintain a height balanced tree.
- There are also some self-balancing trees that are not always height balanced, but are balanced in some other way. For instance, the Splay tree is quick to access recently accessed nodes.
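A minimal (non-self-balancing) BST sketch of insert and search, following the left-smaller/right-greater rule; function names are made up for this note:

```javascript
// Binary Search Tree sketch: the left subtree holds smaller values,
// the right subtree holds greater ones.
function insert(root, value) {
  if (!root) return { value, left: null, right: null };
  if (value < root.value) root.left = insert(root.left, value);
  else root.right = insert(root.right, value);
  return root;
}
function contains(root, value) {
  if (!root) return false;
  if (value === root.value) return true;
  // The BST rule tells us which whole side to discard at every step.
  return value < root.value
    ? contains(root.left, value)
    : contains(root.right, value);
}
```

On a balanced tree each step discards half the remaining nodes, which is where the O(log N) comes from; on a skewed tree the search degrades to O(N).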
- A binary tree can be categorized by the number of children its nodes have. In a full binary tree every node has 0 or 2 children. A degenerate binary tree contains nodes with only one child. A skewed binary tree is a degenerate tree in which all children belong to one side (e.g. every node has only a right child).
- When categorizing a binary tree on the basis of completion levels you can have:
- a complete binary tree has all levels, except possibly the last one, filled, and the leaf elements must lean towards the left;
- in a perfect binary tree, all nodes have two children and the leaves are all on the same level;
- the last one, a balanced binary tree, must have a maximum height difference of 1 between the sub-trees of every node.
Resources
SUMMARY: trees can be structured in many ways. Most algorithms on trees can make use of recursion. A binary tree is great for sorting and when it is not balanced, it can have slow search.
Starting Date: ‣
Topic: Recursion, Quick Sort
Recall
- Quick Sort can be worse than its usual O(N log N). But it has the advantage of sorting in-place.
Notes
- A recursion needs two main steps: base and recursion. Base is the condition that will interrupt the recursion. The recursion step is where the current procedure is called again, with different inputs.
- The recursion step of a recursive procedure has operations split into pre-recursion and post-recursion. For example, see the code (recursion-1) on resources.
- Quick Sort uses the Divide and Conquer Strategy. We pick an element (the pivot) and swap it so that it goes to the index it should be. Now, with a single value at its right position, we divide the remaining elements into before the pivot and after the pivot. Then using recursion, we run the same algorithm into both parts.
- It uses an idea of weak sorting: when partitioning the array, we do one pass over the elements and swap them toward sorted order, but the array doesn’t need to be entirely sorted at the end.
- Quick Sort is not always O(N log N). If the array it is sorting is reversed, and we pick the pivot from the last element, we will end up only using one of the parts in the recursion. That makes the algorithm perform `(N - 1) + (N - 2) + (N - 3) + … + 1` operations. So it can be O(N^2).
- Quick Sort, in contrast with another algorithm of runtime O(N log N), the Merge Sort, uses far fewer operations and less memory. Quick Sort sorts in-place using swaps. Merge Sort requires allocating multiple arrays along its execution.
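The partition-around-a-pivot scheme described above, as a sketch that always picks the last element of the range as the pivot:

```javascript
// In-place Quick Sort sketch using the last element as the pivot.
function quickSort(arr, lo = 0, hi = arr.length - 1) {
  if (lo >= hi) return arr; // base: ranges of 0 or 1 elements are sorted
  const pivot = arr[hi];
  let i = lo;
  // Swap every element smaller than the pivot to the front of the range.
  for (let j = lo; j < hi; j++) {
    if (arr[j] < pivot) {
      [arr[i], arr[j]] = [arr[j], arr[i]];
      i++;
    }
  }
  // Drop the pivot into its final index, then recurse on both parts.
  [arr[i], arr[hi]] = [arr[hi], arr[i]];
  quickSort(arr, lo, i - 1);
  quickSort(arr, i + 1, hi);
  return arr;
}
```

On a reversed input this pivot choice triggers exactly the degenerate case from the note: one recursion side is always empty.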
Resources
```javascript
function recursiveFunction(input) {
  if (input === 0) {
    return 0;
  }
  // code here is executed before reaching the base conditional
  // (note: `input - 1`, not `input--`, which would pass the old value
  // and never reach the base)
  const nextValue = recursiveFunction(input - 1);
  // code here is executed after the base has been reached, on the way
  // back from the execution
  return nextValue;
}
```
SUMMARY: some algorithms have scenarios where they perform differently from the “common” worst case scenario (Quick Sort when the data source is reversed).
Starting Date: ‣
Topic: Array vs Linked List, Array List, Array Buffer
Recall
- You gotta consider the algorithm you are writing. Think of usability, running time, and memory space.
- Ring Buffer is a Queue implemented with an Array.
- When you are doing performance testing you should use largely different sizes of data: 10, 100, 1,000, 10,000, 100,000, 1,000,000.
Notes
- There is a trade-off for each choice. Arrays provide indexes; Linked Lists provide managing of elements at low cost. It depends on your algorithm. Binary Search can only be implemented on an Array. Push and Pop are way more efficient on Linked Lists.
- On Linked Lists you don’t have access to indexes, so you cannot hop from one node to another. You will have to traverse all nodes between the starting (either from head or tail) to your node.
- When you want to insert an element at the middle (or the beginning) of an Array, you’ll have to shift all the values after it (O(N)). On a Linked List, you just need to update 4 references (O(1)).
- You can have a middle term between Linked Lists and Arrays with an Array List. It provides Linked-List-like operations but uses an Array under the hood to have constant-time access.
- An Array Buffer (also called Ring Buffer or Circular Buffer) is a contiguous structure that allows for push and pop operations using an Array. So it’s a way to implement a Queue while leveraging an Array’s power, such as constant-time gets.
- You can infer the data structure behind an interface by testing the running time of its operations.
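A Ring Buffer sketch, assuming a fixed capacity given up front; the read and write indexes wrap around with a modulus instead of shifting elements:

```javascript
// Ring Buffer sketch: a Queue over a fixed Array. Indexes wrap with a
// modulus, so enqueue and dequeue never shift elements.
class RingBuffer {
  constructor(capacity) {
    this.data = new Array(capacity);
    this.capacity = capacity;
    this.readIdx = 0;
    this.writeIdx = 0;
    this.length = 0;
  }
  enqueue(value) {
    if (this.length === this.capacity) throw new Error("buffer full");
    this.data[this.writeIdx] = value;
    this.writeIdx = (this.writeIdx + 1) % this.capacity; // wrap around
    this.length++;
  }
  dequeue() {
    if (this.length === 0) return undefined;
    const value = this.data[this.readIdx];
    this.readIdx = (this.readIdx + 1) % this.capacity; // wrap around
    this.length--;
    return value;
  }
}
```

A growable variant would re-allocate a larger array when full, much like an Array List; this sketch simply refuses instead.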
Resources
SUMMARY: an Array List is a middle term between an Array and a Linked List. It can be your go-to choice, but always consider the specific use case before choosing the structure.
Starting Date: ‣
Topic: Linked List, Queue, Stack
Recall
- The running time of an operation must exclusively consider that operation’s steps. (E.g. for deletion on a Linked List, you must assume you already have the node you want to delete.)
- Queue and Stack are Linked Lists with restricted operations.
Notes
- Linked List is a container of nodes. It can have the `head` property, which points to the first node of the list; and can also have the `tail` property, which points to the last node of the list. It might also have other properties, but those two are the most common ones (one example of another useful property would be the length of the list).
- Each node of a Linked List has a fixed set of properties: `value` and `next`. `value` is the actual value of the node, and `next` points to the next node.
- Linked List has another common implementation: the Doubly Linked List, which besides the `next` reference has a `previous` reference as well.
- Some operations on Linked Lists are `O(1)`, others are `O(N)`. `O(1)`: insert, deletion, get head or tail (considering you already have the node where you want to insert or delete); `O(N)`: get anywhere else.
- Queue is a Linked List which can only insert items at the tail and get or remove items from the head. Since there is no need to traverse the list for any of these operations, operations on a Queue are essentially constant time.
- Queues normally have another operation, `peek`, which returns the next element without actually removing it from the queue.
- Just like Queue, Stack is a Linked List with limitations. It is a First In, Last Out structure. So, the first element `push`ed to it will be the last to be `pop`ped.
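Both restricted structures can be sketched over linked nodes (class names are illustrative):

```javascript
// Queue sketch over a singly linked list: enqueue at the tail,
// dequeue (and peek) at the head, all constant time.
class Queue {
  constructor() { this.head = null; this.tail = null; }
  enqueue(value) {
    const node = { value, next: null };
    if (this.tail) this.tail.next = node;
    else this.head = node; // first node is both head and tail
    this.tail = node;
  }
  peek() { return this.head ? this.head.value : undefined; }
  dequeue() {
    if (!this.head) return undefined;
    const { value } = this.head;
    this.head = this.head.next;
    if (!this.head) this.tail = null;
    return value;
  }
}
// Stack sketch: push and pop both happen at the head (First In, Last Out).
class Stack {
  constructor() { this.head = null; }
  push(value) { this.head = { value, next: this.head }; }
  pop() {
    if (!this.head) return undefined;
    const { value } = this.head;
    this.head = this.head.next;
    return value;
  }
}
```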
Resources
SUMMARY: a Linked List is a chain of connected nodes with a few operations on it. If you want a more specific use case, you can restrict its operations to get a Queue or a Stack.
Date: ‣
Topic: Bubble Sort
Recall
- We can solve recurrence relations with the Master Theorem.
- Bubble Sort’s runtime is defined by O(N^2). Which comes from N(N + 1)/2, the sum of the first N integers.
Notes
- Bubble sort consists of going through the array and swapping elements. E.g. you go from 0 to N, swapping adjacent values so that the bigger is on the right (if you want ascending order) and the smaller on the left.
- Bubble sort’s running time is O(N^2). That is true because for each value in the array (so N times) we compare it against the remaining values, subtracting one comparison at each pass: N + (N - 1) + (N - 2) + … + 1. This is a recurrence relation that can be solved using the Master Theorem.
- To understand an algorithm’s runtime mathematically you have to infer how many times a certain amount of operations is done. To do that you will have to rely on mathematical concepts like roots, logarithms, arithmetic progression, geometrical progression, …
- You can find the sum of the elements of a range that progresses by a specific pace with the arithmetic series formula n(first + last)/2.
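A bubble sort sketch showing where the shrinking inner loop, and hence the triangular sum of comparisons, comes from:

```javascript
// Bubble Sort sketch: each pass floats the largest remaining value to
// the end, so the inner loop shrinks by one on every outer iteration.
function bubbleSort(arr) {
  for (let i = 0; i < arr.length; i++) {
    // After i passes, the last i positions are already sorted.
    for (let j = 0; j < arr.length - 1 - i; j++) {
      if (arr[j] > arr[j + 1]) {
        [arr[j], arr[j + 1]] = [arr[j + 1], arr[j]];
      }
    }
  }
  return arr;
}
```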
Resources
SUMMARY: Big O is defined upon an expression; to find out the amount of iterations during an algorithm you might need to rely on mathematical concepts, such as arithmetic progression, logarithms, …
Date: ‣
Topic: Array, Linear Search, Binary Search, Two Crystal Balls Problem
Recall
- `Array` is different than a list.
- Reading and writing on an `Array` is O(1).
- Linear search is O(N).
- You need to take note of the limits of the array.
- Keep in mind the bound of your search area. Some changes to the algorithm are needed if you choose inclusive bounds or lower inclusive and higher exclusive.
- When you shorten your source set by a value derived from its size, your algorithm will probably be faster than shortening it by a constant.
- Don’t just look for loops to guess the Big O of an algorithm. Consider if and how the source set changes per each iteration.
Notes
- Array is just a contiguous sequence of data. So you just use an offset and the size of each unit of data to access specific positions of the array.
- A `LinkedList` is not an `Array`, as it can distribute itself across the memory.
- An `Array` needs a specific size. It is different than a `LinkedList`, which can grow as much as memory allows just by adding a reference to the next value. An `Array` would need to be replicated at a “new” position in memory with a greater length.
- When your source set is ordered, you can use Binary Search, an O(log N) search algorithm. That is because at each iteration, the source set is halved.
- If both bounds of the array are inclusive, there will be a scenario where the lower and higher bounds are equal. If one of the bounds is exclusive, that will not happen.
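A binary search sketch using a lower-inclusive, higher-exclusive bound, as discussed above:

```javascript
// Binary search sketch with lower-inclusive, higher-exclusive bounds,
// so lo === hi means the search space is empty.
function binarySearch(arr, target) {
  let lo = 0;
  let hi = arr.length; // exclusive
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (arr[mid] === target) return mid;
    if (arr[mid] < target) lo = mid + 1; // discard the lower half
    else hi = mid;                        // discard the upper half
  }
  return -1;
}
```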
- The Two Crystal Balls Problem: you have two equal crystal balls; you have to find the lowest point from which they break when dropped. Once a ball breaks, it cannot be reused, so you only get two tries.
- To solve the Two Crystal Balls Problem you can assume an array of booleans representing positions of height. Starting from the ground, the first positions will be false, since the ball is not high enough to break, followed by positions of true. You can use one of the balls, jumping a constant offset (like a linear search that is not one-by-one). Once you find the position of your first true (`highBound`), and you have the last position where you got a false (`lowBound`), you can run a regular linear search iterating only over `highBound - lowBound` positions, instead of the whole array.
- How should you choose the offset size to solve the Two Crystal Balls Problem more efficiently? If we used any constant, it would still be O(N) (because constants are dropped). So it has to be a value based upon N itself. If we use sqrt(N) as the offset size, we get O(sqrt(N)): at most sqrt(N) jumps, plus a final walk of at most sqrt(N) steps.
- For the Two Crystal Balls Problem, you could use log(N) as your offset, but it is worse than using sqrt(N). On this algorithm, you jump by an offset. The smaller your offset, the smaller your jumps, thus more jumps to clear the array. As the value of N grows, log(N) grows endlessly slower than sqrt(N).
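The sqrt(N) jumping strategy as a sketch over the boolean array described above (`twoCrystalBalls` is a made-up name):

```javascript
// Two Crystal Balls sketch: jump by sqrt(N) with the first ball, then
// walk the last sqrt(N)-sized window with the second. O(sqrt(N)) total.
function twoCrystalBalls(breaks) {
  const jump = Math.floor(Math.sqrt(breaks.length));
  let i = jump;
  // First ball: jump until it breaks (first `true`).
  for (; i < breaks.length; i += jump) {
    if (breaks[i]) break;
  }
  // Back up one jump to the last known-safe position,
  // then walk forward with the second ball.
  i -= jump;
  for (let j = 0; j <= jump && i < breaks.length; j++, i++) {
    if (breaks[i]) return i;
  }
  return -1; // the ball never breaks
}
```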
Resources
SUMMARY: Array is a contiguous space of memory. Reading and writing to it takes O(1). Take care of bounds of your source set. Consider how the source set changes on each iteration to understand the Big O.
Date: ‣
Topic: Big O Time Complexity
Recall
- As your input grows, how fast does computation or memory usage grow?
- In real life, constants are important; but they are not relevant theoretically.
- There are multiple other ways to measure time complexity, but Big O is the most common.
Notes
- An incorrect data structure picked for an algorithm can ruin the performance of the algorithm.
- You always ignore constants in Big O.
- Big O is not exact; it is just an approximation, usually of the worst case scenario (it doesn’t have to be the worst case, but that is its most common usage).
- You gotta think about the size of your data set when choosing an algorithm. Some algorithms are fast for small sets of data, while for large ones they lose to their siblings.
- `O(log N)` is for algorithms where, on every iteration, the source set is halved (e.g. binary search). So `O(N log N)` is an algorithm that iterates on the whole source set `log N` times, halving the sub-problems on every pass (e.g. merge sort or quicksort).
Visuals
SUMMARY: Big O is not exact, but a good enough approximation to pick an algorithm for a use case.
Date: ‣
Topic: Introduction
Visuals
SUMMARY: it seems pretty worth it. I will probably have to study a lot.