I will be using the NetworkX (2.4) Python library along with Matplotlib (3.2.2). (Updated on 01.06.2020)
First, we define a simple method to draw the graph and the centrality metrics of nodes with a heat map.
Zachary’s Karate Club graph is defined as the example graph G. It is basically a social network of the members of a university karate club, where undirected edges connect people who interact outside the club.
We also need a directed graph to demonstrate some other centrality measures. Here we define the toy directed graph DiG, which is given as an example on Wikipedia.
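As a quick setup sketch (the draw helper and the DiG edge list here are my assumptions, not necessarily the original code or the exact Wikipedia example):

```python
import networkx as nx

# Zachary's Karate Club graph, bundled with NetworkX.
G = nx.karate_club_graph()

# A small toy directed graph. These edges are illustrative assumptions,
# not necessarily the exact Wikipedia example graph.
DiG = nx.DiGraph()
DiG.add_edges_from([(1, 2), (1, 3), (3, 2), (3, 5), (4, 2),
                    (4, 6), (5, 4), (6, 5), (7, 5)])

def draw(graph, centrality):
    # Color nodes by a centrality dict with a heat map
    # (requires Matplotlib), e.g. draw(G, nx.degree_centrality(G)).
    import matplotlib.pyplot as plt
    nx.draw(graph, node_color=list(centrality.values()),
            cmap=plt.cm.plasma, with_labels=True)
    plt.show()
```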
The degree of a node is simply the number of edges it has. The basic intuition is that nodes with more connections are more influential and important in a network. In other words, the person with the higher friend count in a social network, or the more-cited paper (higher in-degree) in a scientific citation network, is the more central one according to this metric.
For directed graphs, the in-degree, the number of incoming edges, is considered the importance factor for nodes.
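With NetworkX this is a single call; a small sketch on the karate club graph (for directed graphs, `nx.in_degree_centrality` plays the same role):

```python
import networkx as nx

G = nx.karate_club_graph()

# Degree centrality: degree divided by (n - 1).
dc = nx.degree_centrality(G)

# The node with the most connections is the most central by this metric.
most_connected = max(dc, key=dc.get)
print(most_connected, dc[most_connected])
```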
Eigenvector centrality is a basic extension of degree centrality, which defines the centrality of a node as proportional to its neighbors’ importance. When we sum up all connections of a node, not all neighbors are equally important. Let’s consider two nodes with the same degree in a friend network: the one who is connected to more central nodes should be more central.
First, we define an initial guess for the centrality of the nodes in a graph as \(x_i=1\). Then we iterate to get the new centrality value \(x_i'\) for node \(i\) as follows:
\[x_i' = \sum_{j} A_{ij}x_j\]Here \(A_{ij}\) is an element of the adjacency matrix, which is \(1\) or \(0\) depending on whether an edge exists between nodes \(i\) and \(j\). It can also be written in matrix notation as \(\mathbf{x'} = \mathbf{Ax}\). We iterate over \(t\) steps to find the vector \(\mathbf{x}(t)\) as:
\[\mathbf{x}(t) = \mathbf{A}^t \mathbf{x}(0)\]The drawing also shows that nodes with the same number of connections are not necessarily the same color in the heat map. The one that is connected to more central nodes is hotter in this visualization.
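The iteration above can be sketched in plain Python, with normalization at each step so the values converge; the small undirected graph here is an illustrative assumption, chosen so that two degree-1 nodes end up with different scores:

```python
import math

# Adjacency list of a small undirected graph (illustrative assumption):
# node 0 is a hub; nodes 3 and 5 both have degree 1 but different neighbors.
adj = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0, 5], 5: [4]}

x = {i: 1.0 for i in adj}                                 # initial guess x_i = 1
for _ in range(100):
    x_new = {i: sum(x[j] for j in adj[i]) for i in adj}   # x' = A x
    norm = math.sqrt(sum(v * v for v in x_new.values()))
    x = {i: v / norm for i, v in x_new.items()}           # normalize

# Nodes 3 and 5 have the same degree, but 3 (attached to the hub)
# ends up more central than 5.
print(x)
```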
However, as we can see from the definition, this is a problematic measure for directed graphs. Let’s say a new research paper is published and it references a handful of existing papers. It would not contribute to any of those referenced papers in the citation network, because it is not cited by any other paper and therefore has zero eigenvector centrality itself. In other words, eigenvector centrality does not take zero in-degree nodes into account in directed graphs such as citation networks.
Here the contribution from zero in-degree nodes is zero; consequently, all values are zero except for the two nodes that reference each other.
Katz centrality introduces two positive constants \(\alpha\) and \(\beta\) to tackle the problem of eigenvector centrality with zero in-degree nodes:
\[x_i = \alpha \sum_{j} A_{ij} x_j + \beta,\]where \(A_{ij}\) is again an element of the adjacency matrix; in matrix notation, \(\mathbf{x} = \alpha \mathbf{Ax} + \beta \mathbf{1}\). The constant \(\beta\) gives a free centrality contribution to all nodes, even if they get no contribution from other nodes, so the mere existence of a node provides it some importance. The constant \(\alpha\) determines the balance between the contribution from other nodes and the free constant.
Although this method was introduced as a solution for directed graphs, it can be useful for some applications on undirected graphs as well.
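A minimal sketch of the iteration \(\mathbf{x} = \alpha\mathbf{Ax} + \beta\mathbf{1}\) on a small directed graph (the edges are illustrative assumptions); note that the zero in-degree nodes still receive the free contribution \(\beta\):

```python
# Directed edges (u, v): u points to v (illustrative assumption).
edges = [(1, 2), (2, 1), (3, 1), (3, 2), (4, 2)]
nodes = {1, 2, 3, 4}

alpha, beta = 0.1, 1.0   # alpha must stay below 1/lambda_max to converge

in_nbrs = {i: [u for (u, v) in edges if v == i] for i in nodes}

x = {i: 1.0 for i in nodes}
for _ in range(100):
    # x_i = alpha * sum over incoming neighbors + beta
    x = {i: alpha * sum(x[j] for j in in_nbrs[i]) + beta for i in nodes}

# Nodes 3 and 4 have zero in-degree, yet keep centrality beta > 0.
print(x)
```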
PageRank was introduced by the founders of Google to rank websites in search results. It can be considered an extension of Katz centrality. The websites on the web can be modeled as a directed graph, where the hyperlinks between websites determine the edges. Let’s consider a popular web directory with a high Katz centrality value that links to millions of other websites. It would contribute to every single one of them significantly, yet not all of them are important. To overcome this issue, the contribution of a node is divided by its out-degree:
\[x_i = \alpha \sum_{j} A_{ij} \frac{x_j}{k_j^{out}} + \beta,\]where \(k_j^{out} = 1\) for zero out-degree nodes to avoid division by zero. It can also be written in matrix terms as:
\[\mathbf{x} = \alpha \mathbf{A D^{-1} x} + \beta \mathbf{1},\]where \(\mathbf{D}\) is a diagonal matrix with elements \(D_{ii} = \max(k_i^{out}, 1)\).
As the drawing demonstrates, nodes with lower out-degree contribute much more to each of their neighbors compared to Katz centrality. Here the node at the top right receives only the reference of a very important node, and it becomes much more important than under Katz centrality; on the other hand, the node in the center, which gets its contributions from high out-degree nodes, loses its importance.
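The out-degree division is the only change needed in the Katz iteration. A sketch on an assumed toy graph, where node 1 sprays links to everyone (like the web directory) but each link now carries only \(x_1/k_1^{out}\):

```python
# Directed edges (u, v): u links to v (illustrative assumption).
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 5), (3, 5)]
nodes = {1, 2, 3, 4, 5}

alpha, beta = 0.85, 0.15

in_nbrs = {i: [u for (u, v) in edges if v == i] for i in nodes}
# k_j^out, clamped to 1 for zero out-degree nodes to avoid division by zero.
out_deg = {i: max(sum(1 for (u, _) in edges if u == i), 1) for i in nodes}

x = {i: 1.0 for i in nodes}
for _ in range(100):
    x = {i: alpha * sum(x[j] / out_deg[j] for j in in_nbrs[i]) + beta
         for i in nodes}

# Node 1 has out-degree 4, so each outgoing link contributes only x_1/4.
print(x)
```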
Up until this point, we have discussed measures that capture high node centrality; however, there can be nodes that are important for the network without being central. In particular, consider a survey (review) article in a scientific citation network. The article itself does not necessarily state a new discovery and is not central, but it is nevertheless helpful material for acquiring knowledge on a topic, because it cites many central research articles. To find such nodes, the HITS algorithm introduces two types of central nodes: hubs and authorities. Authorities are the nodes most cited by hubs, and hubs are the nodes citing the most high-authority nodes.
Authority centrality is defined as the sum of the hub centralities of the nodes which point to node \(i\):
\[x_i = \alpha \sum_{j} A_{ij} y_j,\]where \(\alpha\) is a constant. Likewise, hub centrality is the sum of the authority centralities of the nodes pointed to by node \(i\):
\[y_i = \beta \sum_{j} A_{ji} x_j,\]with constant \(\beta\). Notice that the indices of the adjacency matrix are swapped for hub centrality, because for hubs we are concerned with outgoing edges. So in matrix notation:
\[\mathbf{x} = \alpha \mathbf{Ay}, \quad \mathbf{y} = \beta \mathbf{A^Tx}.\]As can be seen from the drawing, the HITS algorithm also tackles the zero in-degree problem of eigenvector centrality: these zero in-degree nodes become central hubs and contribute to other nodes. We can still use a free centrality contribution constant as in Katz centrality or other variants.
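The two coupled iterations can be sketched in plain Python, normalizing each vector per step (which absorbs the \(\alpha\) and \(\beta\) constants); the citation edges below are an illustrative assumption:

```python
import math

# Directed edges (u, v): u cites v (illustrative assumption). Node 4
# cites many papers (a survey-like hub) but is never cited itself.
edges = [(4, 1), (4, 2), (4, 3), (1, 2), (3, 2)]
nodes = {1, 2, 3, 4}

auth = {i: 1.0 for i in nodes}
hub = {i: 1.0 for i in nodes}
for _ in range(50):
    # x = alpha * A y : authority = sum of hub scores pointing at i
    auth = {i: sum(hub[u] for (u, v) in edges if v == i) for i in nodes}
    # y = beta * A^T x : hub = sum of authority scores i points to
    hub = {i: sum(auth[v] for (u, v) in edges if u == i) for i in nodes}
    na = math.sqrt(sum(v * v for v in auth.values()))
    nh = math.sqrt(sum(v * v for v in hub.values()))
    auth = {i: v / na for i, v in auth.items()}
    hub = {i: v / nh for i, v in hub.items()}

# Node 4 has zero in-degree, yet is the top hub; node 2 is the top authority.
print(auth, hub)
```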
Closeness centrality is a self-explanatory measure, where each node’s importance is determined by its closeness to all other nodes. Let \(d_{ij}\) be the length of the shortest path between nodes \(i\) and \(j\); the average distance \(l_i\) is then:
\[l_i = \dfrac{1}{n} \sum_{j}d_{ij}\]Since we are looking for closer nodes, the closeness centrality \(C_i\) is inversely proportional to the average length \(l_i\), so:
\[C_i = \dfrac{1}{l_i} = \dfrac{n}{\sum_{j}d_{ij}}\]Here we use an unweighted graph, and every edge has a distance cost of \(1\) for calculating the shortest path length \(d_{ij}\). This measure can be used, for example, to determine the central distribution point in a delivery network.
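A plain-Python sketch, using BFS for the shortest path lengths on an assumed unweighted graph (a simple path, so the middle node should win):

```python
from collections import deque

# Undirected adjacency list (illustrative assumption): a path graph
# 0 - 1 - 2 - 3 - 4, where node 2 sits in the middle.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

def closeness(adj, i):
    # BFS from i gives shortest path lengths d_ij on an unweighted graph.
    dist = {i: 0}
    q = deque([i])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    n = len(adj)
    return n / sum(dist.values())   # C_i = n / sum_j d_ij

cc = {i: closeness(adj, i) for i in adj}
print(cc)   # the middle node 2 is closest on average to all others
```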
Betweenness centrality is another centrality based on shortest paths between nodes. It is determined by the number of shortest paths passing through the given node. For a starting node \(s\), a destination node \(t\), and the node of interest \(i\) with \(s \ne t \ne i\), let \(n_{st}^i\) be \(1\) if node \(i\) lies on the shortest path between \(s\) and \(t\), and \(0\) if not. The betweenness centrality is then defined as:
\[x_i = \sum_{st} n_{st}^i\]However, there can be more than one shortest path between \(s\) and \(t\), and each would then count toward the centrality measure more than once. Thus, we need to divide the contribution by \(g_{st}\), the total number of shortest paths between \(s\) and \(t\):
\[x_i = \sum_{st} \frac{n_{st}^i}{g_{st}}\]

References and further reading
Jaccard similarity (a.k.a. Jaccard index, intersection over union, or Jaccard similarity coefficient) is a measure of similarity between two sample sets. It is defined as the size of the intersection divided by the size of the union of the sample sets. Let \(A\) and \(B\) be two sets; the Jaccard similarity \(J\) is then:
\[J(A,B) = \frac{|A \cap B|}{|A \cup B|}\]To demonstrate the method, let’s consider a toy example where we try to measure the similarity between three fruit baskets in terms of common fruit types. We see the baskets as sets of distinct fruits:
\[A = \{Pineapple, Apple, Raspberry, Blueberry\} \\ B = \{Apple, Banana, Strawberry, Peach, Blueberry, Orange\} \\ C = \{Raspberry, Blueberry, Pear\}\]As the model shows, we have 2 fruits in the intersection over 8 in the union of sets \(A\) and \(B\), so the Jaccard similarity for \(A\) and \(B\) yields:
\[J(A,B) = \frac{|A \cap B|}{|A \cup B|} = \frac{2}{8} = 0.25\]Moreover, we can fill a similarity table for all pairs of sets as follows:
 | A | B | C |
---|---|---|---|
A | 1 | 0.25 | 0.4 |
B | 0.25 | 1 | 0.125 |
C | 0.4 | 0.125 | 1 |
This table demonstrates that, although \(A\) has 2 fruits in common with \(B\) as well as with \(C\), \(A\) is more similar to \(C\) than to \(B\), because the Jaccard similarity is inversely proportional to the size of the union.
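The measure is a one-liner with Python sets; a small sketch reproducing the table above:

```python
def jaccard(a, b):
    # |A ∩ B| / |A ∪ B|
    return len(a & b) / len(a | b)

A = {"Pineapple", "Apple", "Raspberry", "Blueberry"}
B = {"Apple", "Banana", "Strawberry", "Peach", "Blueberry", "Orange"}
C = {"Raspberry", "Blueberry", "Pear"}

print(jaccard(A, B), jaccard(A, C), jaccard(B, C))   # 0.25 0.4 0.125
```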
Fingerprinting in computing means mapping any kind of large input data to a bitwise smaller piece of data. In our example we are working with a handful of fruits, but when we have many more types of fruits and large strings as inputs, it is not convenient to work with these large identifiers. Instead, we use a fingerprint to represent every object in the sets. For each fruit, the fingerprint should be ideally (virtually) unique. (However, there can be collisions between two objects depending on the fingerprint size; see the Rabin fingerprint [Broder 1993] for a detailed real-life example.)
\[A = \{1, 2, 6, 7\} \\ B = \{2, 3, 4, 5, 7, 8\} \\ C = \{6, 7, 9\}\]Now every fruit is represented by a smaller piece of data. We can store each element of a set in 4 bits (\(2^4=16\) unique identifiers). This makes the sets easier to store and compute with.
Now imagine a real-life problem: instead of fruits in a couple of baskets, we compare every store on earth. Similarity between stores is defined as the Jaccard similarity of their distinct product sets. There are millions of stores with thousands of product types on average. To find the most similar grocery store to one of your favorites, you would need to load all these sets and compute intersection and union sizes for each pair. Instead of dealing with large sets, which requires a lot of computing time and memory, MinHash can approximate this measure in a scalable way by computing a small fixed-size sketch that represents each large set.
Here is the intuition behind this hashing method: let \(\pi\) be a randomly chosen permutation function that permutes the set object data. In our toy example, the object data is 4 bits, so consider every possible value of 4-bit data (0, 1, …, 15); the permutation function \(\pi\) shuffles them. After permuting the set elements, the minimum value is picked.
The chance of having the same minimum value after this permutation is equal to the number of common elements divided by the size of the union:
\[Pr(\min\{\pi(S_A)\}=\min\{\pi(S_B)\})= \frac{|S_A \cap S_B|}{|S_A \cup S_B|} = J(A,B)\]Since this is a probabilistic approximation of the actual value, using more of these functions increases the accuracy. We define how many permutation functions \(\pi\) we are going to use, and store the minimum value of each as a sketch value. This count \(t\) also determines our sketch size: the fixed-size data that represents our set \(S_A\) as the sketch \(\overline{S}_A\):
\[\overline{S}_A = (\min\{\pi_1(S_A)\}, \min\{\pi_2(S_A)\},...,\min\{\pi_t(S_A)\})\]Back to our toy problem: we are going to use sketch size \(t=3\), which means we need 3 permutation functions like \(\pi\). Our set members are 4-bit integers \(0, 1,..., 15\), so each permutation function is a bijection \(\pi: \mathbb{Z}_{16} \rightarrow \mathbb{Z}_{16}\). Let our first permutation function \(\pi_1\) be:
\[\pi_1 = \begin{bmatrix} 2& 8& 6&14&11& 9& 3& 5&10& 4& 7& 0&15&13&12& 1 \\ 0& 1& 2& 3& 4& 5& 6& 7& 8& 9&10&11&12&13&14&15 \end{bmatrix}.\]In this two-row notation, \(\pi_1\) maps 2 to 0, 8 to 1, 6 to 2, 14 to 3, and so on. We apply this function to the members of the sets as follows:
\[\begin{align} min\{\pi_1(S_A)\} & = min\{\pi_1(\{1, 2, 6, 7\})\} \\ & = min\{15, 0, 2, 10\} \\ & = 0 \end{align}\]For our sketch size \(t = 3\), we need additional permutation functions like:
\[\pi_2 = \begin{bmatrix} 3&15&13&11& 6& 8& 9& 0& 4& 7& 1&12&10& 5&14& 2 \\ 0& 1& 2& 3& 4& 5& 6& 7& 8& 9&10&11&12&13&14&15 \end{bmatrix}.\] \[\pi_3 = \begin{bmatrix} 12& 8& 3& 6& 9& 1&14&10& 7& 2& 5&13& 4&11& 0&15 \\ 0& 1& 2& 3& 4& 5& 6& 7& 8& 9&10&11&12&13&14&15 \end{bmatrix}.\]Now we can calculate the sketches for sets \(S_A\), \(S_B\) and \(S_C\) with permutation functions \(\pi_1\), \(\pi_2\) and \(\pi_3\).
\[\overline{S}_A = (\min\{\pi_1(S_A)\}, \min\{\pi_2(S_A)\},\min\{\pi_3(S_A)\})\] \[\overline{S}_A = (0,4,3) \\ \overline{S}_B = (0,0,1) \\ \overline{S}_C = (2,4,3) \\\]We can use these sketch values to estimate the Jaccard similarity by calculating how many permutation functions return the same minimum value between two sets (the Tanimoto coefficient of the arrays). Here is the approximated similarity table for the MinHash sketches:
 | \(\overline{S}_A\) | \(\overline{S}_B\) | \(\overline{S}_C\) |
---|---|---|---|
\(\overline{S}_A\) | 1 | 0.33 | 0.66 |
\(\overline{S}_B\) | 0.33 | 1 | 0 |
\(\overline{S}_C\) | 0.66 | 0 | 1 |
This toy example demonstrates that we can accurately find the most similar set to \(A\) using only the small sketches. Since this is a random-permutation-based approximation method, the accuracy can vary with the sketch size and the permutation functions. Another advantage is that when a new set \(D\) arrives, we only need to compute its 3x4-bit sketch and compare it with the other sketches to find the set most similar to it.
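The whole toy computation fits in a short sketch; the permutations are exactly the \(\pi_1\), \(\pi_2\), \(\pi_3\) tables from the text:

```python
# Two-row permutations from the text: top_row[k] is the value mapped to k,
# so invert it to get pi as a value -> image mapping.
def as_mapping(top_row):
    return {v: k for k, v in enumerate(top_row)}

pi1 = as_mapping([2, 8, 6, 14, 11, 9, 3, 5, 10, 4, 7, 0, 15, 13, 12, 1])
pi2 = as_mapping([3, 15, 13, 11, 6, 8, 9, 0, 4, 7, 1, 12, 10, 5, 14, 2])
pi3 = as_mapping([12, 8, 3, 6, 9, 1, 14, 10, 7, 2, 5, 13, 4, 11, 0, 15])

def sketch(s, perms=(pi1, pi2, pi3)):
    # Keep only the minimum of each permuted set.
    return tuple(min(p[x] for x in s) for p in perms)

S_A, S_B, S_C = {1, 2, 6, 7}, {2, 3, 4, 5, 7, 8}, {6, 7, 9}
sk_a, sk_b, sk_c = sketch(S_A), sketch(S_B), sketch(S_C)

def approx_jaccard(u, v):
    # Fraction of permutations with matching minima.
    return sum(x == y for x, y in zip(u, v)) / len(u)

print(sk_a, sk_b, sk_c)            # (0, 4, 3) (0, 0, 1) (2, 4, 3)
print(approx_jaccard(sk_a, sk_c))  # 2/3
```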
References and further reading
The objective function is to minimize the sum of squared distances of each observation to its cluster mean.
\[\operatorname*{argmin}_{\mu_1,...,\mu_k} E(k) = \sum_{i=1}^{k} \sum_{x_j \in C_i} \|x_j-\mu_i \|^2\]Since \(E(k)\) has numerous local minima, there is no algorithm known today that guarantees an optimal solution.
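This objective is typically attacked with Lloyd's algorithm, which alternates between assigning points to the nearest mean and recomputing the means; a minimal pure-Python sketch on assumed 1-D data:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    rng = random.Random(seed)
    means = rng.sample(points, k)          # initial cluster means
    for _ in range(iters):
        # Assignment step: each point goes to its nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: (p - means[c]) ** 2)
            clusters[nearest].append(p)
        # Update step: recompute each mean from its cluster.
        means = [sum(c) / len(c) if c else means[i]
                 for i, c in enumerate(clusters)]
    return sorted(means)

# Two well-separated 1-D blobs (assumed data).
data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
print(kmeans(data, 2))   # means converge near the two blob centers
```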
It is the most widely known algorithm for solving k-means clustering, and it is sometimes mistaken for the method itself. However, it is:
[code] [presentation]
User.
We also need Customer and Maintainer classes as custom types of User, so we inherit them from the superclass. The Customer and Maintainer subclasses, which are inherited from User, have the username and password fields in common but also some additional specific fields.
When we want to store those objects in a relational database system, they need to be mapped to tables. However, relational databases don’t support inheritance. In this example, we have one base class and two subclasses to map into the relational database as tables. There are three approaches to do that, each with its own trade-offs.
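The object model being mapped can be sketched as follows (a Python stand-in for the original classes; field names follow the example):

```python
# Base class with the common fields.
class User:
    def __init__(self, username, password):
        self.username = username
        self.password = password

# Subclasses add their own specific field each.
class Customer(User):
    def __init__(self, username, password, customer_number):
        super().__init__(username, password)
        self.customer_number = customer_number

class Maintainer(User):
    def __init__(self, username, password, employee_number):
        super().__init__(username, password)
        self.employee_number = employee_number

alice = Customer("Alice", "123", 10001)
print(isinstance(alice, User))   # True: a Customer is a User
```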
User table:
id | username | password | customerNumber | employeeNumber | type |
---|---|---|---|---|---|
1 | Alice | 123 | 10001 | NULL | Customer |
2 | Bob | abc | NULL | 10001 | Maintainer |
In this approach, all fields defined under a super (parent) class are stored in a single table.
It is easy to query and retrieve different types from one table without the need for join statements.
However, a query for Customer class objects would also return the irrelevant employeeNumber field; hence the relevant fields should be specified explicitly in the select statement.
Another problem is that it is not possible to use constraints such as NOT NULL for a subclass field. For example, customerNumber is an essential field for all Customer records. Yet applying a NOT NULL constraint to customerNumber would prevent us from storing objects without a customerNumber, such as Maintainer.
User table
id | username | password | type |
---|---|---|---|
1 | Alice | 123 | Customer |
2 | Bob | abc | Maintainer |
Customer table
id | customerNumber |
---|---|
1 | 10001 |
Maintainer table
id | employeeNumber |
---|---|
2 | 10001 |
In this approach, there exists one database table per class.
Separate tables provide consistent data storage with constraint definitions, but it is more complex to query a subclass.
It requires writing join statements, which reduces performance. For example, to get a Customer object, the Customer table needs to be joined with the User table.
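A minimal sqlite3 sketch of this class-table mapping (the schema and data follow the example tables; the SQL itself is my assumption, not from the original post):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE User (id INTEGER PRIMARY KEY,
                       username TEXT NOT NULL,
                       password TEXT NOT NULL,
                       type TEXT NOT NULL);
    CREATE TABLE Customer (id INTEGER PRIMARY KEY REFERENCES User(id),
                           customerNumber INTEGER NOT NULL);
    CREATE TABLE Maintainer (id INTEGER PRIMARY KEY REFERENCES User(id),
                             employeeNumber INTEGER NOT NULL);
""")
con.execute("INSERT INTO User VALUES (1, 'Alice', '123', 'Customer')")
con.execute("INSERT INTO Customer VALUES (1, 10001)")

# Retrieving a Customer object requires a join with the User table.
row = con.execute("""
    SELECT u.username, u.password, c.customerNumber
    FROM User u JOIN Customer c ON u.id = c.id
    WHERE u.id = 1
""").fetchone()
print(row)   # ('Alice', '123', 10001)
```

Note that the NOT NULL constraint on customerNumber is now possible, because it lives in its own table.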
Customer table
id | username | password | customerNumber |
---|---|---|---|
1 | Alice | 123 | 10001 |
Maintainer table
id | username | password | employeeNumber |
---|---|---|---|
2 | Bob | abc | 10001 |
In this approach, there exists one table for each concrete class.
Every concrete class has a table with the duplicated common fields.
Updating a field type of the base class would therefore require migrating multiple tables. For example, if we change the character size of the password field, we need to alter both the Maintainer and Customer tables. It might lead to a conflict.
"C:\Program Files (x86)\IIS Express\iisexpress.exe" /path:C:\MyWeb /port:8084
However, not all Windows machines have IIS enabled by default, so I needed another solution for client profiles as well. Eventually, I wrote a simple C# class called SimpleHTTPServer to serve static files in a given directory, using the System.Net.HttpListener class.
First, I define a private HttpListener member and create a listener for the assigned port.
private HttpListener _listener;
private int _port;
private void Listen()
{
_listener = new HttpListener();
_listener.Prefixes.Add("http://*:" + _port.ToString() + "/");
_listener.Start();
while (true)
{
try
{
HttpListenerContext context = _listener.GetContext();
Process(context);
}
catch (Exception ex)
{
    // Ignore individual request failures and keep listening.
}
}
}
This while loop is endless, and it processes the context in a private method whenever a new request arrives.
Since this method is synchronous, your UI would become unresponsive.
That’s why we need to run it on a new thread when our SimpleHTTPServer is constructed.
private Thread _serverThread;
public SimpleHTTPServer(string path, int port)
{
this.Initialize(path, port);
}
private void Initialize(string path, int port)
{
this._rootDirectory = path;
this._port = port;
_serverThread = new Thread(this.Listen);
_serverThread.Start();
}
It is also important to stop the server before exiting the app. Therefore, another public method is defined to be called before termination.
public void Stop()
{
_serverThread.Abort();
_listener.Stop();
}
When a new context appears, the Process method is called to handle the request.
Like any HTTP server, it should handle the path value, and if it is the root path /, serve a default file.
Thus, a default file list is defined as a constant.
private readonly string[] _indexFiles = {
"index.html",
"index.htm",
"default.html",
"default.htm"
};
Besides, for file extensions like .html, .png, and .js, a MIME type list is needed.
private static IDictionary<string, string> _mimeTypeMappings =
new Dictionary<string, string>(StringComparer.InvariantCultureIgnoreCase) {
{".asf", "video/x-ms-asf"},
{".asx", "video/x-ms-asf"},
{".avi", "video/x-msvideo"},
{".bin", "application/octet-stream"},
...
};
Now we are ready to implement the Process method.
private void Process(HttpListenerContext context)
{
string filename = context.Request.Url.AbsolutePath;
Console.WriteLine(filename);
filename = filename.Substring(1);
if (string.IsNullOrEmpty(filename))
{
foreach (string indexFile in _indexFiles)
{
if (File.Exists(Path.Combine(_rootDirectory, indexFile)))
{
filename = indexFile;
break;
}
}
}
filename = Path.Combine(_rootDirectory, filename);
if (File.Exists(filename))
{
try
{
Stream input = new FileStream(filename, FileMode.Open);
//Adding permanent http response headers
string mime;
context.Response.ContentType = _mimeTypeMappings.TryGetValue(Path.GetExtension(filename), out mime) ? mime : "application/octet-stream";
context.Response.ContentLength64 = input.Length;
context.Response.AddHeader("Date", DateTime.Now.ToString("r"));
context.Response.AddHeader("Last-Modified", System.IO.File.GetLastWriteTime(filename).ToString("r"));
byte[] buffer = new byte[1024 * 16];
int nbytes;
while ((nbytes = input.Read(buffer, 0, buffer.Length)) > 0)
context.Response.OutputStream.Write(buffer, 0, nbytes);
input.Close();
context.Response.StatusCode = (int)HttpStatusCode.OK;
context.Response.OutputStream.Flush();
}
catch (Exception ex)
{
context.Response.StatusCode = (int)HttpStatusCode.InternalServerError;
}
}
else
{
context.Response.StatusCode = (int)HttpStatusCode.NotFound;
}
context.Response.OutputStream.Close();
}
Here is the result:
It is as easy to use as the well-known one-line HTTP servers for Python, Ruby, Node.js, and so on. Creating a server with an auto-assigned port:
string myFolder = @"C:\folderpath\to\serve";
SimpleHTTPServer myServer;
//create server with auto assigned port
myServer = new SimpleHTTPServer(myFolder);
//Creating server with specified port
myServer = new SimpleHTTPServer(myFolder, 8084);
//Now it is running:
Console.WriteLine("Server is running on this port: " + myServer.Port.ToString());
//Stop method should be called before exit.
myServer.Stop();
Webots is a robot simulation environment widely used for educational purposes. You can edit the environment with its GUI and write controller programs for mobile robots in C, C++, Java, Python, and MATLAB. It is a very convenient tool for working on robotic algorithms, but it is not free. Fortunately, I had a Webots EDU license to work with this tool.
The robot knows the distance and direction to the goal. Here is the plan of the environment:
I used a Pioneer robot equipped with the following sensors:
The algorithms were implemented in three individual projects with the same sensors. The goal of these motion planning algorithms is to reach the target, which is represented by a potted tree in the simulation environment.
The Kinect detects the obstacle, and the robot turns right or left to avoid it. It starts to move around the obstacle, and the infrared sensors control the distance to it.
Two GPS sensors are mounted on the robot to determine the position and the angle to the target. The angle from the back GPS to the center GPS is defined as “currentAngle”, and the angle from the back GPS to the target as “targetAngle”. (Although real-world GPS accuracy wouldn’t allow determining this, I believe it was a quite good approach for this environment.)
When the robot is in the “GO_TO_TARGET” state, it tries to reduce the difference between the two angles. When the target angle is bigger than the current angle, the speed of the right wheel is increased and the speed of the left wheel is decreased; it is vice versa when the target angle is smaller than the current angle. This continues up to a threshold value: if the difference between the two angles is smaller than that value, both wheels are set to normal speed.
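The wheel-speed rule described above can be sketched as follows (the speed values and the threshold are my assumptions, not the original controller’s constants):

```python
NORMAL_SPEED = 5.0   # assumed cruise speed
DELTA = 1.0          # assumed speed adjustment per wheel
THRESHOLD = 0.05     # assumed angle tolerance in radians

def steer(current_angle, target_angle):
    """Return (left_speed, right_speed) to reduce the angle difference."""
    diff = target_angle - current_angle
    if abs(diff) < THRESHOLD:
        return NORMAL_SPEED, NORMAL_SPEED      # aligned: go straight
    if diff > 0:
        # Target angle is bigger: speed up the right wheel, slow the left.
        return NORMAL_SPEED - DELTA, NORMAL_SPEED + DELTA
    # Target angle is smaller: vice versa.
    return NORMAL_SPEED + DELTA, NORMAL_SPEED - DELTA

print(steer(0.0, 0.5))   # (4.0, 6.0): turning toward a larger target angle
```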
It was not defined as a monotonically right- or left-turning algorithm, so the Pioneer robot turns in the direction that has the longer distance to the wall. It follows the wall until the side sensors no longer detect an obstacle, and then it tries to go to the target. When it reaches the target, the main while loop ends and the robot stops.
The Robot has two states
The robot calculates its initial angle to the target. In the “GO_TO_TARGET” state, it tries to reach the target. If it detects an obstacle, it switches to the “TURN_RIGHT_FOLLOW” or “TURN_LEFT_FOLLOW” state. These actually do the same thing except for the turning direction. In a following state, the robot follows the obstacle until it reaches the same angle as in the initial step.
Here is the code:
References
The concepts of normalization and normal forms were introduced after the invention of the relational model. Database normalization is an essential procedure for avoiding inconsistency in a relational database management system, and it should be performed in the design phase. To achieve this, redundant fields should be refactored into smaller pieces.
Normal forms are defined structures for relations, with sets of constraints that relations must satisfy, in order to detect data redundancy and correct anomalies. The following anomalies can occur while performing a database operation:
First Normal Form has the initial constraints; further normal forms like 2NF, 3NF, BCNF, 4NF, and 5NF add new constraints cumulatively. In other words, every relation in 2NF is also in 1NF, and every relation in 3NF is also in 2NF. If the groups of relations are represented as sets, the following figure can be drawn:
As can be seen, relations that satisfy 5NF also satisfy all other normal forms.
This is the most basic form of a relation. Constraints:
As an example, we have movies and actors to be stored in a relational database. We have the following dependency model:
Here is the table for 1NF:
MOVIEID | TITLE | COU | LANG | ACTORID | NAME | ORD |
---|---|---|---|---|---|---|
6 | Usual Suspects | UK | EN | 308 | Gabriel Byrne | 2 |
228 | Ed Wood | US | EN | 26 | Johnny Depp | 1 |
70 | Being John Malkovich | US | EN | 282 | Cameron Diaz | 2 |
1512 | Suspiria | IT | IT | 745 | Udo Kier | 9 |
70 | Being John Malkovich | US | EN | 503 | John Malkovich | 14 |
Anomaly examples for this model:
To correct these anomalies, the new constraint is introduced:
Regarding this new constraint, the table should be divided into three tables, so that every non-key attribute is fully functionally dependent on its table’s primary key. Now the dependency model becomes the following:
Here are the three tables for 2NF:
MOVIEID | TITLE | COU | LANG |
---|---|---|---|
6 | Usual Suspects | UK | EN |
228 | Ed Wood | US | EN |
70 | Being John Malkovich | US | EN |
1512 | Suspiria | IT | IT |
ACTORID | NAME |
---|---|
308 | Gabriel Byrne |
26 | Johnny Depp |
282 | Cameron Diaz |
745 | Udo Kier |
503 | John Malkovich |
MOVIEID | ACTORID | ORD |
---|---|---|
6 | 308 | 2 |
228 | 26 | 1 |
70 | 282 | 2 |
1512 | 745 | 9 |
70 | 503 | 14 |
2NF corrects the anomalies listed for 1NF, but some anomalies remain:
For this form, the following constraint is added:
With this new constraint, country and language should be moved to a separate table, and the country field should be the primary key of this new table. Here is the new dependency model:
Here are all the tables for 3NF. (There is no change to the Actor and Actor-Movie matching tables.)
COU | LANG |
---|---|
UK | EN |
US | EN |
IT | IT |
MOVIEID | TITLE | COU |
---|---|---|
6 | Usual Suspects | UK |
228 | Ed Wood | US |
70 | Being John Malkovich | US |
1512 | Suspiria | IT |
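The 3NF decomposition can be sketched with sqlite3, where the original flat row can still be reconstructed with joins (the schema and the matching-table name are my assumptions; columns follow the tables above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Country (cou TEXT PRIMARY KEY, lang TEXT NOT NULL);
    CREATE TABLE Movie (movieId INTEGER PRIMARY KEY,
                        title TEXT NOT NULL,
                        cou TEXT NOT NULL REFERENCES Country(cou));
    CREATE TABLE Actor (actorId INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Casting (movieId INTEGER REFERENCES Movie(movieId),
                          actorId INTEGER REFERENCES Actor(actorId),
                          ord INTEGER,
                          PRIMARY KEY (movieId, actorId));
""")
con.execute("INSERT INTO Country VALUES ('UK', 'EN')")
con.execute("INSERT INTO Movie VALUES (6, 'Usual Suspects', 'UK')")
con.execute("INSERT INTO Actor VALUES (308, 'Gabriel Byrne')")
con.execute("INSERT INTO Casting VALUES (6, 308, 2)")

# Joining the 3NF tables reconstructs the original 1NF row.
row = con.execute("""
    SELECT m.movieId, m.title, m.cou, c.lang, a.actorId, a.name, x.ord
    FROM Movie m
    JOIN Country c ON m.cou = c.cou
    JOIN Casting x ON x.movieId = m.movieId
    JOIN Actor a ON a.actorId = x.actorId
""").fetchone()
print(row)   # (6, 'Usual Suspects', 'UK', 'EN', 308, 'Gabriel Byrne', 2)
```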
3NF corrects the anomalies listed for 2NF. Although 3NF is sufficient to avoid delete, update, and insert anomalies in most cases, there are further normal forms like BCNF, 4NF, and 5NF.
References
Algorithm | Worst case | Best case | Average case |
---|---|---|---|
Quicksort | \(O(n^2)\) | \(O(n\log{}n)\) | \(O(n\log{}n)\) |
Insertion sort | \(O(n^2)\) | \(O(n)\) | \(O(n^2)\) |
Radix sort | \(O(kn)\) | \(O(kn)\) | \(O(kn)\) |
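The \(O(kn)\) behavior comes from one stable pass per digit; a minimal LSD radix sort sketch (my own, not the C++ code from the experiment), reversed at the end for the descending order used here:

```python
def radix_sort_desc(nums):
    """LSD radix sort for non-negative integers, descending order."""
    if not nums:
        return []
    out = list(nums)
    exp = 1
    while max(nums) // exp > 0:           # one stable pass per digit (k passes)
        buckets = [[] for _ in range(10)]
        for x in out:
            buckets[(x // exp) % 10].append(x)
        out = [x for b in buckets for x in b]
        exp *= 10
    return out[::-1]                      # reverse for descending order

print(radix_sort_desc([170, 45, 75, 90, 802, 24, 2, 66]))
```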
In this experiment, each input record is a pair of a number and a word. First, the file is read, and then the numbers are sorted in descending order with each sorting algorithm. Here is what the input looks like:
In order to test boundary cases for different input orders, three different input versions are used:
Here are the average runtimes for each file and algorithm:
Algorithm | data1.txt | data2.txt | data3.txt |
---|---|---|---|
Quicksort | 1980 ms | 680 ms | 1840 ms |
Insertion Sort | 11270 ms | 5890 ms | 400 ms |
Radix Sort | 680 ms | 690 ms | 690 ms |
Based on the results:

- For Quicksort, data2.txt has the best run-time of the three inputs because it is in random order.
- For Insertion Sort, data3.txt is the best case and data1.txt is the worst case.

This experiment shows that the order of the input is as important as the input size. Although the sizes of the inputs are the same, the order changes the runtime dramatically.
Here is the code in C++:
References