“K-means is a widely used method in cluster analysis. In my understanding, this method does NOT require ANY assumptions, i.e., give me a data set and a pre-specified number of clusters, k, then I just apply this algorithm which minimize the SSE, the within cluster square error.” – David Robinson, Data Scientist at StackOverflow.
Introduction
K-means clustering is one of the most reliable unsupervised learning data mining algorithms, used as a workaround to the famous clustering problem that had previously been considered computationally hard. The algorithm was initially proposed by Hugo Steinhaus in 1956 and by Stuart Lloyd in 1957, and the name "k-means" was coined by James MacQueen in 1967. Since then, the k-means clustering algorithm has been used to solve a variety of computational problems, such as minimizing the sum of squared errors (SSE) in regression analysis, graphical image segmentation, building recommender engines, and maintaining efficient software features that provide public safety and anti-theft protection over the Internet.
In this article, we'll discuss how to use the k-means clustering algorithm to implement a simple recommender engine that performs user-to-item collaborative filtering, recommending articles published on a social media website to a community of users according to their personal preferences.
Background
In this section, we'll provide an understanding of the basic representation of the classical k-means clustering algorithm used to solve many computational problems. In particular, we'll thoroughly discuss the essentials of the k-means clustering procedure as well as its application to producing recommendations.
K-Means Clustering Algorithm Fundamentals
In general, the k-means algorithm provides a solution to the classification problem by splitting a dataset up into $k$ clusters, each containing a number of the most similar data items (or just "observations") grouped around the nearest "mean", which in turn serves as a "prototype" of that cluster. In other words, the k-means clustering algorithm selects items from the entire dataset that have the closest distance to a particular nearest mean and builds a new cluster from them. The procedure selects the most similar items for each nearest mean, whose value is computed beforehand for each new cluster being created.
In this case, for each nearest mean's value we compute a separate cluster containing the data items most similar to that mean. Another important condition of the clustering process is that each item selected by the k-means procedure from the given dataset must have the closest distance to (i.e., be the most similar to) the other items included in the same cluster. The main idea behind k-means clustering is that if two different items have the minimal distance (i.e., the greatest similarity) to the same nearest mean, then the actual distance between those items also tends to a minimum.
The value of each item in the given dataset is basically represented by a set of attributes that describe that item. The set of attributes for a given item normally has a numerical representation, in which each attribute is a real value. That's why the entire set of attributes associated with a data item is denoted as a vector $\vec{x}_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ in $n$-dimensional Euclidean space, where $n$ is the actual number of attributes that describe the data item.
In turn, the value of the nearest mean computed for each new cluster is also represented as a numerical real vector of attributes $\vec{u} = (u_1, u_2, \ldots, u_n)$ having a particular distance to the vector of attributes of each item in the $n$-dimensional Euclidean space. The vector of attributes of a nearest mean contains the attribute values that are the most common to all items in the cluster. As we've already mentioned above, the nearest mean itself provides a generalization of the items selected into a cluster and serves as a prototype of the entire cluster. For example, if we have two items belonging to the same cluster, each with its own vector of attributes, then the vector of attributes $\vec{u}$ associated with the nearest mean of these two items consists of values that have the closest distance to the corresponding attribute values of both items.
The distance between a particular data item in the given dataset and the nearest mean is computed using an objective function, such as the Euclidean or Manhattan (taxicab) distance, whose value normally represents the similarity between an item and the nearest mean based on the distance between their vectors of attributes. For the k-means clustering algorithm, the Euclidean distance function is most preferable compared to the other distance functions. The Euclidean distance is normally computed as the square root of the sum of squares of the partial distances between the particular attributes that describe the nearest mean and an item, and can be formulated as follows:
$$d(\vec{u}, \vec{x}_i) = \sqrt{\sum_{j=1}^{n} \left( x_{ij} - u_j \right)^2} \qquad (1)$$
where $d$ is the distance between the vectors of attributes of the nearest mean and of an item in a cluster; $x_{ij}$ is the value of attribute $j$ of item $i$; $u_j$ is the value of attribute $j$ of the current nearest mean; $n$ is the actual number of attributes.
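As a quick illustration, here's a minimal C# sketch of formula (1); the method name and the plain double[] representation of the attribute vectors are illustrative assumptions, not part of the engine's code shown later:

using System;

class DistanceSketch
{
    // A minimal sketch of formula (1): the Euclidean distance between an
    // item's vector of attributes and a centroid's vector of attributes.
    public static double Euclidean(double[] item, double[] centroid)
    {
        double sum = 0;
        // Accumulate the squared partial distances over all attributes.
        for (int j = 0; j < item.Length && j < centroid.Length; j++)
            sum += Math.Pow(item[j] - centroid[j], 2.0);
        return Math.Sqrt(sum);
    }
}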
Basically, there's a variety of methods by which we can compute the nearest mean's vector of attributes for each particular cluster. In most cases, the nearest mean's attribute vector is computed as the vector that represents the center of mass of a certain subset of items. This center of mass is also called the "centroid" of a newly built cluster. Particularly, the centroid's vector of attributes is obtained by computing the average ("mean") of the attribute values of all items within a cluster. To compute the average vector of attributes, we perform by-component addition of the attribute vectors of all items in the cluster. The following formula illustrates the computation of a single centroid attribute value, which is one component of the centroid's vector:
$$u_j = \frac{1}{m} \sum_{i=1}^{m} x_{ij} \qquad (2)$$
where $u_j$ is the value of attribute $j$ in the vector of attributes of the given centroid; $x_{ij}$ is the value of attribute $j$ of item $i$; $m$ is the actual number of items in the cluster.
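The following short C# sketch performs the by-component averaging of formula (2); again, the names used here are illustrative assumptions:

using System;
using System.Collections.Generic;

class CentroidSketch
{
    // A sketch of formula (2): each centroid attribute u_j is the mean of
    // attribute j over all m items of the cluster.
    public static double[] Compute(IList<double[]> items, int nAttribs)
    {
        double[] centroid = new double[nAttribs];
        for (int j = 0; j < nAttribs; j++)
        {
            double sum = 0;
            foreach (double[] item in items)
                sum += item[j];               // by-component addition
            centroid[j] = sum / items.Count;  // average ("mean") value
        }
        return centroid;
    }
}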
Generally, the step-by-step k-means clustering procedure can be formulated as follows:
0. Build an initial cluster from the entire dataset of items: Before we begin, we need to assign all items in the given dataset to an initial cluster and append it to the array of clusters. We also have to select a set of multiple centroids for the initial cluster. Notice that, unlike the succeeding new clusters produced by the k-means procedure, the initial cluster should contain more than one centroid in order to perform efficient clustering;
1. Randomly initialize the centroids of the initial cluster using either the Forgy or the Random Partition method: To perform the initialization, we generate a vector of attributes for each centroid by assigning random values to each of its attributes. At this step we also assign initial values to the loop counter variables;
2. For the current cluster retrieved from the array of clusters, find the items with the minimal distance to each centroid and produce the new clusters: During this step we retrieve a cluster from the array and proceed with the next steps to create one or more new clusters based on the set of centroids belonging to the current cluster;
3. For the current centroid, compute the actual distance to each item using the Euclidean distance formula (1): At this point we compute the distances between the vector of attributes associated with the current centroid and the vector of attributes of each item;
4. Choose the items having the closest distance to the current centroid and produce a new cluster by assigning each item with the minimal distance to the newly created cluster: To do this, we basically use formula (1) to find all items whose vectors of attributes satisfy the minimal-distance criterion with respect to the current centroid's vector of attributes;
5. Compute the new value of the centroid based on formula (2) and assign it to the new cluster produced at step 4: During this essential step we compute the new centroid's vector of attributes as the center of mass, obtained by by-component summation of the attribute vectors of the items contained in the newly built cluster. After that, we assign the new centroid to that cluster. If we cannot compute a new centroid value for the new cluster, go to the next step of the algorithm; otherwise proceed with step 7;
6. Output the newly built cluster: At this point we assume that no further clustering is possible for that particular cluster, which is why we pass it to the output. Similarly, we perform the output for each new cluster that meets the condition checked in the previous step 5. After we've produced the output, go to step 8 and proceed with the next centroid;
7. Append the new cluster obtained at the previous steps to the array of clusters and proceed with the next step of the algorithm: At this step we insert the newly obtained cluster into the array of clusters, which is basically used as a clusters queue during the recursive clustering, in which each cluster in the array is separated into a number of new clusters based on the values of one or more of its centroids. Finally, we proceed with the next step of the algorithm;
8. Select the next centroid for which we'll be performing the actual clustering, and check whether this is the final centroid of the current cluster retrieved from the array of clusters: If so, proceed with step 9; otherwise perform steps 3-7 to produce the rest of the new clusters for each centroid of the current cluster;
9. Select the next cluster from the array of clusters and check whether this is the final cluster in the clusters queue, or whether there's no way to re-compute a new centroid for the currently processed cluster: If so, terminate the k-means clustering process; otherwise return to step 2;
Fig. 1. The k-means clustering process producing three new clusters based on their centroids; each item is represented by a vector of attributes $(x_1, x_2)$ in two-dimensional Euclidean space.
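For comparison, here is a compact C# sketch of a conventional Lloyd-style k-means iteration (an assignment step followed by an update step). It is a simplified illustration under assumed helper names, reusing the DistanceSketch and CentroidSketch helpers from the sketches above; it is not the recursive, queue-based variant described in steps 0-9:

using System;
using System.Collections.Generic;
using System.Linq;

class LloydSketch
{
    // A simplified Lloyd-style k-means loop, shown only to contrast with
    // the recursive, queue-based procedure described in steps 0-9 above.
    public static int[] Cluster(double[][] items, double[][] centroids, int maxIter)
    {
        int[] assignment = new int[items.Length];
        for (int iter = 0; iter < maxIter; iter++)
        {
            // Assignment step: attach each item to its nearest centroid.
            for (int i = 0; i < items.Length; i++)
            {
                int best = 0; double bestDist = double.MaxValue;
                for (int c = 0; c < centroids.Length; c++)
                {
                    double d = DistanceSketch.Euclidean(items[i], centroids[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                assignment[i] = best;
            }
            // Update step: recompute each centroid as the mean of its members.
            for (int c = 0; c < centroids.Length; c++)
            {
                List<double[]> members =
                    items.Where((item, i) => assignment[i] == c).ToList();
                if (members.Count > 0)
                    centroids[c] = CentroidSketch.Compute(members, centroids[c].Length);
            }
        }
        return assignment;
    }
}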
Using K-Means Clustering to Produce Recommendations
Beyond the classical k-means clustering algorithm itself, in this article we will explain the algorithm through an example: implementing a simple recommender engine that recommends articles to the users who visit a social media website.
To produce recommendations, the k-means procedure performs clustering over two types of objects – "users" and "items" (i.e., articles) – each associated with its own vector of attributes. In particular, the k-means recommender engine implements the user-to-item collaborative filtering approach and recommends a certain number of items to a specific user according to that user's personal preferences. By performing clustering, the k-means procedure actually "filters" a large number of articles, described by attributes whose values correspond to the set of preferences of a particular user. The main idea of the k-means clustering procedure is that it attempts to find the similarity between user and item objects that have different semantics but reside in the same $n$-dimensional Euclidean space, where the correlation between them is represented by the actual distance between the users' and items' vectors of attributes. Those vectors of attributes are used as sets of simple latent parameters by which we establish the relationship between the user and item objects. The similarity measure, represented by the Euclidean distance between a user's and an item's vectors of attributes, is computed based on the most similar attributes (i.e., those having the closest distances to one another). In other words, we obtain the similarity between users and items by computing the sum of squares of partial distances between particular attributes. This process generally parallels either the sum-of-squared-errors (SSE) minimization performed with the ordinary least squares (OLS) method in regression analysis, or the singular value decomposition (SVD) computation in latent semantic analysis, both of which are capable of providing user-to-item recommendations.
To do this, the k-means clustering algorithm aims to find the items that are most similar to a specific user, whose preferences are represented by the user's own vector of attributes, and computes clusters in which that user, associated with a number of items, serves as the nearest "mean" (or "centroid") of the entire cluster. In this particular case, the similarity of users and items is represented by the Euclidean distance (i.e., "metric") between the two vectors of the user's and the item's attributes respectively. For that purpose, we use slightly different formulas to perform the similarity computation:
$$d_w(\vec{u}, \vec{x}_i) = \sqrt{\sum_{j=1}^{n} r_j \left( x_{ij} - u_j \right)^2} \qquad (3)$$
where $d_w$ is the weighted measure of similarity between a user and an item; $x_{ij}$ is the value of attribute $j$ of item $i$; $u_j$ is the value of attribute $j$ of the specific user; $r_j$ is the coefficient of relevance between attribute $j$ of the user's and item's vectors of attributes; $n$ is the actual number of attributes.
Unlike the conventional Euclidean distance formula (1) discussed above, which computes the plain distance between two vectors of attributes, this measure of similarity is obtained as a weighted distance: a sum of squares of partial distances between the user's and the item's particular attributes, each multiplied by a specific weight coefficient $r_j$. In this case we deal with a distance-based similarity whose value depends on more than one parameter. In general, the weight of a specific attribute in formula (3) can itself be represented as a sum of extra parameters, $r_j = \sum_q p_{jq}$, where $p_{jq}$ is an extra parameter of attribute $j$ for a specific user or item.
In our case, besides the partial distance between the user's and item's attributes, we use a single extra parameter – the relevance between those attributes. Since we implement a contextual data model to produce recommendations, we employ the measure of lexicographical relevance as this parameter of formula (3):
$$r_j = \frac{1}{L} \sum_{p=1}^{L} \big[\, c_p^{\,item} = c_p^{\,user} \,\big] \qquad (4)$$
where $r_j$ is the value of the lexicographical relevance between attribute $j$ of a specific user and item; $c_p^{\,item}$ is the character at position $p$ in the string representation of attribute $j$ of the item; $c_p^{\,user}$ is the character at position $p$ in the string representation of attribute $j$ of the user; $L$ is the actual length of the smaller of the two strings representing attribute $j$; and $[\cdot]$ equals 1 when the condition holds and 0 otherwise.
To compute the relevance of the user's and item's attributes, we normally use a trivial linear search algorithm: we iterate over the two strings that represent attribute $j$ and, for each position $p$, compare the characters located at that position. If the characters $c_p^{\,item}$ and $c_p^{\,user}$ exactly match, we add the value $1/L$ to the total estimated relevance $r_j$. The parameter $L$ is the actual length of the smaller of the two strings representing the user's or item's attribute $j$. More about the lexicographical representation of users' and items' attributes, as an essential part of the contextual data representation model, is discussed in the next section of this article.
Also, to provide a more flexible distance-based similarity computation using formulas (3) and (4), it's recommended to modify formula (3) with another extra parameter $k$, which represents the actual number of the user's and item's attributes that exactly match (i.e., have relevance $r_j = 1$). The main reason we use this parameter in formula (3) is that the similarity of a specific user and item, represented by the actual distance between their vectors of attributes, largely depends on the number of attributes that are lexicographically equal. In this case, formula (3) takes the following form:
$$d_w(\vec{u}, \vec{x}_i) = \frac{1}{k} \sqrt{\sum_{j=1}^{n} r_j \left( x_{ij} - u_j \right)^2} \qquad (5)$$
where $k$ is the number of attributes that exactly match.
A large number of lexicographically matching attributes ensures that the Euclidean distance between the two vectors of the user's and item's attributes is typically small, and the measure of similarity between that user and item accordingly grows. However, there's a special case in which a specific user and item are entirely dissimilar (i.e., their vectors of attributes don't have any attributes that exactly match) and the value of the extra parameter $k$ is zero. In that case, we assume the value of $k$ to be 0.01, which makes the resulting distance between the user's and item's vectors of attributes large. In other words, using this value of $k$ yields a typically large distance between a dissimilar user and item, so that the most dissimilar items never end up with a closer distance to a specific user than items having a greater distance-based similarity.
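For instance, for the attribute strings "C# " and "C++ " the shorter length is $L = 3$ and only the first characters match, so the relevance is $1/3$. A condensed C# sketch of formula (4) follows; it mirrors the Relevance() method of the full listing below:

using System;

class RelevanceSketch
{
    // Formula (4): the fraction of positions, taken over the shorter
    // string's length, at which the two attribute strings hold the same
    // character. A value of 1.0 means the strings match over the entire
    // compared length.
    public static double Relevance(string a, string b)
    {
        int len = Math.Min(a.Length, b.Length);
        double rel = 0;
        for (int p = 0; p < len; p++)
            rel += (a[p] == b[p]) ? 1.0 / len : 0;
        return rel;
    }
}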
Using the code
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace KMeans
{
class Attrib : ICloneable
{
private string m_Name;
private double m_Value;
public Attrib(string name, double value)
{
Name = name; Value = value;
}
public object Clone()
{
Attrib TargetAttrib = (Attrib)this.MemberwiseClone();
TargetAttrib.Name = Name; TargetAttrib.Value = Value;
return TargetAttrib;
}
public string Name
{
get { return m_Name; }
set { m_Name = value; }
}
public double Value
{
get { return m_Value; }
set { m_Value = value; }
}
}
class IAttribList : ICloneable, IEnumerable<Attrib>
{
private List<Attrib> m_AttribList = null;
public IAttribList()
{
if (m_AttribList == null)
m_AttribList = new List<Attrib>();
}
public void Add(Attrib attrib_item)
{
m_AttribList.Add((Attrib)attrib_item.Clone());
}
public object Clone()
{
IAttribList TargetAttribList = new IAttribList();
foreach (Attrib attrib in m_AttribList)
TargetAttribList.Add(attrib);
// Return the copied list directly; calling Clone() on the copy here
// would recurse endlessly for any non-empty list.
return TargetAttribList.Count() > 0 ? TargetAttribList : null;
}
public Attrib this[int iIndex]
{
get { return m_AttribList[iIndex]; }
set { m_AttribList[iIndex] = value; }
}
public int Count() { return m_AttribList.Count(); }
public IEnumerator GetEnumerator()
{
return m_AttribList.GetEnumerator();
}
IEnumerator<Attrib> IEnumerable<Attrib>.GetEnumerator()
{
return (IEnumerator<Attrib>)this.GetEnumerator();
}
}
class Item : ICloneable
{
private bool m_IsUser;
private bool m_Exists;
private string m_ItemText;
private double m_Distance;
private IAttribList m_AttribList = null;
public Item(string item_text, IAttribList attributes,
double distance, bool is_user, bool exists)
{
if (AttribList == null)
AttribList = (IAttribList)attributes;
IsUser = is_user; Exists = exists;
ItemText = item_text; Distance = distance;
}
public IAttribList GetAttribList()
{
return m_AttribList;
}
public object Clone()
{
Item TargetItem = (Item)this.MemberwiseClone();
IAttribList TargetAttribList = new IAttribList();
foreach (Attrib attrib in this.AttribList)
TargetAttribList.Add(attrib);
if (TargetAttribList.Count() > 0)
TargetItem.AttribList = (IAttribList)TargetAttribList.Clone();
TargetItem.IsUser = this.IsUser; TargetItem.Exists = this.Exists;
TargetItem.ItemText = this.ItemText; TargetItem.Distance = this.Distance;
return TargetItem;
}
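// Computes the lexicographical relevance of two attribute strings
// (formula 4): the fraction of character positions, taken over the
// shorter string's length, at which both strings match exactly.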
private double Relevance(string attrib1, string attrib2)
{
double nRelevance = 0;
int nLength = attrib1.Length < attrib2.Length ? attrib1.Length : attrib2.Length;
for (int iIndex = 0; iIndex < nLength; iIndex++)
nRelevance += (attrib1[iIndex] == attrib2[iIndex]) ? (double)1 / nLength : 0;
return nRelevance;
}
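// Computes the weighted distance between this item and another one
// (formulas 3 and 5): each squared partial distance is weighted by the
// attributes' relevance, and the square root is divided by the number
// of exactly matching attributes nCount (or by 0.01 when none match).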
public double EuclDW(Item item)
{
int nCount = 0;
int iIndex = 0; double nDistance = 0;
while (iIndex < item.AttribList.Count() && iIndex < AttribList.Count())
{
double nRel = Relevance(item.AttribList[iIndex].Name, AttribList[iIndex].Name);
if (nRel == 1) nCount++;
nDistance+= Math.Pow(item.AttribList[iIndex].Value - AttribList[iIndex].Value, 2.0) *
((double)((nRel > 0) ? nRel : 1));
iIndex++;
}
return Math.Sqrt(nDistance) * ((double)1 / ((nCount > 0) ? nCount : 0.01));
}
public string ItemText
{
get { return m_ItemText; }
set { m_ItemText = value; }
}
public double Distance
{
get { return m_Distance; }
set { m_Distance = value; }
}
public bool IsUser
{
get { return m_IsUser; }
set { m_IsUser = value; }
}
public bool Exists
{
get { return m_Exists; }
set { m_Exists = value; }
}
public IAttribList AttribList
{
get { return m_AttribList; }
set { m_AttribList = value; }
}
}
class IItemsList : ICloneable, IEnumerable<Item>
{
private List<Item> m_ItemsList = null;
public IItemsList()
{
if (m_ItemsList == null)
m_ItemsList = new List<Item>();
}
public void Add(Item item)
{
m_ItemsList.Add(item);
}
public object Clone()
{
IItemsList TargetItems = new IItemsList();
foreach (Item item in m_ItemsList)
{
IAttribList TargetAttribList = new IAttribList();
foreach (Attrib attrib in item.GetAttribList())
TargetAttribList.Add(new Attrib(attrib.Name, attrib.Value));
if (TargetAttribList.Count() > 0)
TargetItems.Add(new Item(item.ItemText, TargetAttribList,
item.Distance, item.IsUser, item.Exists));
}
return TargetItems;
}
public Item this[int iIndex]
{
get { return m_ItemsList[iIndex]; }
set { m_ItemsList[iIndex] = value; }
}
public int Count() { return m_ItemsList.Count(); }
public void RemoveAt(int iIndex) { m_ItemsList.RemoveAt(iIndex); }
public IEnumerator GetEnumerator()
{
return m_ItemsList.GetEnumerator();
}
IEnumerator<Item> IEnumerable<Item>.GetEnumerator()
{
return (IEnumerator<Item>)this.GetEnumerator();
}
}
class ICluster : ICloneable
{
private IItemsList m_Items;
private IItemsList m_Centroids;
public ICluster(IItemsList cnts_list, IItemsList items_list)
{
Items = (IItemsList)items_list;
Centroids = (IItemsList)cnts_list;
}
public object Clone()
{
IItemsList TargetItemsList = new IItemsList();
IItemsList TargetCentroidsList = new IItemsList();
ICluster TargetCluster = (ICluster)this.MemberwiseClone();
foreach (Item centroid in Centroids)
TargetCentroidsList.Add(centroid);
foreach (Item item in Items)
TargetItemsList.Add(item);
TargetCluster.Items = TargetItemsList;
TargetCluster.Centroids = TargetCentroidsList;
return TargetCluster;
}
public IItemsList Items
{
get { return (IItemsList)m_Items; }
set { m_Items = (IItemsList)value; }
}
public IItemsList Centroids
{
get { return (IItemsList)m_Centroids; }
set { m_Centroids = (IItemsList)value; }
}
}
class IClustersList : ICloneable, IEnumerable<ICluster>
{
private List<ICluster> m_ClustersList = null;
public IClustersList()
{
if (m_ClustersList == null)
m_ClustersList = new List<ICluster>();
}
public void Add(ICluster cluster)
{
m_ClustersList.Add(cluster);
}
public object Clone()
{
IClustersList TargetClustersList = new IClustersList();
foreach (ICluster cluster in m_ClustersList)
{
IItemsList TargetCentroidsList = new IItemsList();
foreach (Item centroid in (IItemsList)cluster.Centroids.Clone())
TargetCentroidsList.Add(new Item(centroid.ItemText, (IAttribList)centroid.AttribList.Clone(),
centroid.Distance, centroid.IsUser, centroid.Exists));
IItemsList TargetItemsList = new IItemsList();
foreach (Item item in (IItemsList)cluster.Items.Clone())
TargetItemsList.Add(new Item(item.ItemText, (IAttribList)item.AttribList.Clone(),
item.Distance, item.IsUser, item.Exists));
TargetClustersList.Add(new ICluster((IItemsList)TargetCentroidsList.Clone(),
(IItemsList)TargetItemsList.Clone()));
}
return TargetClustersList;
}
public ICluster this[int iIndex]
{
get { return m_ClustersList[iIndex]; }
}
public int Count() { return m_ClustersList.Count(); }
public IEnumerator GetEnumerator()
{
return m_ClustersList.GetEnumerator();
}
IEnumerator<ICluster> IEnumerable<ICluster>.GetEnumerator()
{
return (IEnumerator<ICluster>)this.GetEnumerator();
}
}
class KMeans
{
private IItemsList m_Items;
private IItemsList m_Users;
private IClustersList m_Clusters;
private IClustersList m_TargetClusters;
private readonly System.Random rnd = new System.Random();
private double m_MinVal = 0;
private double m_MaxVal = 0;
private void Swap(ref IAttribList attribs, int indexA, int indexB)
{
Attrib temp = attribs[indexA];
attribs[indexA] = attribs[indexB];
attribs[indexB] = temp;
}
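// Rescales the attribute values of the loaded items (or users) into a
// common min-max interval, so that the distances computed over
// different vectors of attributes remain comparable.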
private void Normalize(IItemsList items_list, int n_attribs,
bool is_users, ref double min_val, ref double max_val)
{
if (min_val == 0 && max_val == 0)
{
min_val = (double)1 / n_attribs;
max_val = (double)n_attribs / (n_attribs + 1);
}
for (int iItem = 0; iItem < items_list.Count(); iItem++)
{
IAttribList AttribsTarget = items_list[iItem].AttribList;
for (int iAttrib = 0; iAttrib < AttribsTarget.Count(); iAttrib++)
if (AttribsTarget[iAttrib].Value > 1 || is_users == false)
AttribsTarget[iAttrib].Value = ((AttribsTarget[iAttrib].Value /
(n_attribs + 1)) - min_val) / (max_val - min_val) + 0.01;
}
}
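// Loads the articles file, one "<text> => <keywords>" line per article.
// Each token becomes an attribute: a token seen in a previously loaded
// item reuses that item's attribute value, otherwise the running token
// counter nAttrib provides the initial value.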
public int LoadItemsFromFile(string filename)
{
using (System.IO.FileStream fsFile = new System.IO.FileStream(filename,
System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read))
{
using (System.IO.StreamReader fsStream = new System.IO.StreamReader(
fsFile, System.Text.Encoding.UTF8, true, 128))
{
int iItem = 0;
int nAttrib = 1; string textBuf = "\0";
while ((textBuf = fsStream.ReadLine()) != null)
{
if (!String.IsNullOrEmpty(textBuf))
{
IAttribList TargetAttribList = new IAttribList();
string sItemPattern = " => "; string[] sItemTokens;
if ((sItemTokens = Regex.Split(textBuf, sItemPattern)) != null)
{
for (int iToken = 0; iToken < 2; iToken++)
{
string sPattern = " "; string[] sTokens;
if ((sTokens = Regex.Split(sItemTokens[iToken], sPattern)) != null)
{
foreach (string token in sTokens)
{
bool bExists = false; int nToken = 0;
int nIndex = iItem; double nCoeff = 0;
while (--nIndex >= 0 && bExists == false)
{
nToken = 0;
IAttribList attribs = m_Items[nIndex].AttribList;
while (nToken < attribs.Count() && bExists == false)
bExists = (attribs[nToken++].Name.Equals(token)) ? true : false;
}
nCoeff = (bExists == true) ?
m_Items[nIndex + 1].AttribList[nToken - 1].Value : nAttrib;
bool bInAttribList = false; int iAttrib = 0;
while (iAttrib < TargetAttribList.Count() && !bInAttribList)
bInAttribList = (token.Equals(TargetAttribList[iAttrib++].Name)) ? true : false;
if (bInAttribList == false)
TargetAttribList.Add(new Attrib(token, nCoeff));
nAttrib++; }
}
}
}
m_Items.Add(new Item(textBuf, TargetAttribList, 0, false, false));
iItem++; }
}
Normalize(m_Items, nAttrib, false, ref m_MinVal, ref m_MaxVal);
fsStream.Close();
}
fsFile.Close();
}
return m_Items.Count();
}
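// Loads the users file and builds each user's vector of attributes,
// reusing attribute values already assigned to matching tokens among
// the loaded items or the previously loaded users.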
public int LoadUsersFromFile(string filename)
{
using (System.IO.FileStream fsFile = new System.IO.FileStream(filename,
System.IO.FileMode.Open, System.IO.FileAccess.Read, System.IO.FileShare.Read))
{
using (System.IO.StreamReader fsStream = new System.IO.StreamReader(
fsFile, System.Text.Encoding.UTF8, true, 128))
{
int iItem = 0;
int nAttrib = 1; string textBuf = "\0";
while ((textBuf = fsStream.ReadLine()) != null)
{
if (!String.IsNullOrEmpty(textBuf))
{
string sPattern = " => "; string[] sTokens;
IAttribList TargetAttribList = new IAttribList();
if ((sTokens = Regex.Split(textBuf, sPattern)) != null)
{
foreach (string token in sTokens[1].Split(new char[] { ' ' }))
{
bool bExists = false; int nToken = 0;
int nIndex = 0; double nCoeff = 0;
while (nIndex < m_Items.Count() && bExists == false)
{
nToken = 0;
while (nToken < m_Items[nIndex].AttribList.Count() && bExists == false)
bExists = (m_Items[nIndex].AttribList[nToken++].Name.Equals(token)) ? true : false;
nIndex++;
}
if (bExists == false)
{
int nItem = iItem - 1;
bool bUserAttrib = false;
while (nItem >= 0 && bUserAttrib == false)
{
nToken = 0;
while (nToken < m_Users[nItem].AttribList.Count() && !bUserAttrib)
bUserAttrib = (m_Users[nItem].AttribList[nToken++].Name.Equals(token)) ? true : false;
nItem--;
}
nCoeff = (bUserAttrib == true) ? m_Users[nItem + 1].AttribList[nToken - 1].Value : nAttrib;
}
else nCoeff = m_Items[nIndex - 1].AttribList[nToken - 1].Value;
TargetAttribList.Add(new Attrib(token, nCoeff));
nAttrib++; }
m_Users.Add(new Item(textBuf, TargetAttribList, 0, true, false));
iItem++; }
}
}
Normalize(m_Users, nAttrib, true, ref m_MinVal, ref m_MaxVal);
fsStream.Close();
}
fsFile.Close();
}
return m_Users.Count();
}
public KMeans()
{
m_Items = new IItemsList();
m_Users = new IItemsList();
}
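// The core k-means procedure: picks the initial user centroids with a
// k-means++-style distance-weighted sampling, recursively splits the
// clusters queue until each user centroid has converged, and prints
// the per-cluster recommendations along with the clustering statistics.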
public void Compute(int nInitialCentroids, int nItemsPerCluster)
{
m_TargetClusters = new IClustersList();
while (m_TargetClusters.Count() < m_Users.Count())
{
m_Clusters = new IClustersList();
if (nInitialCentroids != m_Users.Count())
{
List<int> CentroidIndexes = new List<int>();
int nInitialCentroid = rnd.Next(0, m_Users.Count());
while (CentroidIndexes.Count() < nInitialCentroids)
{
double nDistance = 0, nDistanceSum = 0;
double nDistanceMin = 0; int nCntMin = -1;
for (int nItem = 0; nItem < m_Users.Count(); nItem++)
{
if (nItem != nInitialCentroid)
{
if ((nDistance = Math.Pow(m_Users[nItem].EuclDW(m_Users[nInitialCentroid]), 2.0)) >= 0)
{
if (nDistance < nDistanceMin || nCntMin == -1)
{
bool bFound = false; int iCntIndex = 0;
while (iCntIndex < CentroidIndexes.Count() && bFound == false)
bFound = (CentroidIndexes[iCntIndex++] == nItem) ? true : false;
if (bFound == false)
{
nDistanceMin = nDistance; nCntMin = nItem;
}
}
nDistanceSum += nDistance;
}
}
}
nDistanceSum = rnd.NextDouble() * nDistanceSum;
int nIndex = 0; double nSum = 0;
while (nIndex < m_Users.Count() && nSum < nDistanceSum)
{
int iTargetIndex = 0; bool bFound = false;
double nDist = Math.Pow(m_Users[nIndex++].EuclDW(m_Users[nCntMin]), 2.0);
while (iTargetIndex < CentroidIndexes.Count() && !bFound)
bFound = (CentroidIndexes[iTargetIndex++] == nIndex) ? true : false;
if (bFound == false)
nSum += nDist;
}
if (nCntMin != -1)
CentroidIndexes.Add(nCntMin);
}
IItemsList CentroidItems = new IItemsList();
for (int iIndex = 0; iIndex < CentroidIndexes.Count(); iIndex++)
CentroidItems.Add(m_Users[CentroidIndexes[iIndex]]);
m_Clusters.Add(new ICluster(CentroidItems, m_Items));
}
else m_Clusters.Add(new ICluster(m_Users, m_Items));
for (int iCluster = 0; iCluster < m_Clusters.Count(); iCluster++)
{
IItemsList Items = (IItemsList)m_Clusters[iCluster].Items.Clone();
IItemsList Centroids = (IItemsList)m_Clusters[iCluster].Centroids.Clone();
for (int iCentroid = 0; iCentroid < Centroids.Count(); iCentroid++)
{
IAttribList attribsA = Centroids[iCentroid].AttribList;
for (int iAttrib = 0; iAttrib < attribsA.Count(); iAttrib++)
for (int iItem = 0; iItem < Items.Count(); iItem++)
{
IAttribList attribsB = Items[iItem].AttribList;
for (int nAttrib = 0; nAttrib < attribsB.Count(); nAttrib++)
if (attribsB[nAttrib].Name.Equals(attribsA[iAttrib].Name))
if (iAttrib < attribsB.Count()) Swap(ref attribsB, iAttrib, nAttrib);
}
double nDistanceAvg = 0;
List<int> ItemsIndexes = new List<int>();
int nNeighbors = Items.Count() < nItemsPerCluster ? Items.Count() : nItemsPerCluster;
bool bDone = false;
while (ItemsIndexes.Count() < nNeighbors && !bDone)
{
nDistanceAvg = 0;
double nDistanceMin = 0; int nItemMin = -1;
for (int iItem = 0; iItem < Items.Count(); iItem++)
{
double nDistance = 0;
if (m_Clusters[iCluster].Items[iItem].Exists == false)
{
Item temp = new Item(null, attribsA, 0, false, false);
if ((nDistance = Items[iItem].EuclDW(temp)) >= 0)
{
if (((nDistance < nDistanceMin) ||
(nItemMin == -1)) && Items[iItem].ItemText != null && nDistance <= 1.0)
{
bool bExists = false;
int nItem = ItemsIndexes.Count() - 1;
while (nItem >= 0 && bExists == false)
bExists = (ItemsIndexes[nItem--] == iItem) ? true : false;
if (bExists == false)
{
nDistanceMin = nDistance; nItemMin = iItem;
}
}
}
}
nDistanceAvg += nDistance / Items.Count();
}
if (nItemMin > -1)
ItemsIndexes.Add(nItemMin);
else bDone = true;
}
for (int iIndex = 0; iIndex < ItemsIndexes.Count(); iIndex++)
m_Clusters[iCluster].Items[ItemsIndexes[iIndex]].Exists = true;
Centroids[iCentroid].Distance = nDistanceAvg;
IItemsList TargetItems = new IItemsList();
for (int iItem = 0; iItem < ItemsIndexes.Count(); iItem++)
{
if (m_Clusters[iCluster].Centroids.Count() <= 1)
Items[ItemsIndexes[iItem]].Distance =
Items[ItemsIndexes[iItem]].EuclDW(m_Clusters[iCluster].Centroids[0]);
Items[ItemsIndexes[iItem]].Exists = false;
TargetItems.Add(Items[ItemsIndexes[iItem]]);
}
int nMinAttribs = -1;
for (int iItem = 0; iItem < TargetItems.Count(); iItem++)
{
int nAttribCount = TargetItems[iItem].AttribList.Count();
if (nAttribCount < nMinAttribs || nMinAttribs == -1)
nMinAttribs = nAttribCount;
}
IAttribList attribs = new IAttribList();
for (int nAttrib = 0; nAttrib < nMinAttribs &&
nAttrib < Centroids[iCentroid].AttribList.Count(); nAttrib++)
{
double nAttribAvg = 0; int nCount = 0;
for (int iItem = 0; iItem < TargetItems.Count(); iItem++)
if (TargetItems[iItem].AttribList[nAttrib].Value > 0)
{
nAttribAvg += TargetItems[iItem].AttribList[nAttrib].Value;
nCount++;
}
attribs.Add(new Attrib(Centroids[iCentroid].AttribList[nAttrib].Name,
((nAttribAvg / (double)(nCount + 1)) - m_MinVal) / (m_MaxVal - m_MinVal) + 0.01));
}
bool bDiff = false; int nIndex = 0;
while (nIndex < attribs.Count() && nIndex < attribsA.Count() && bDiff == false)
bDiff = (attribs[nIndex].Value != attribsA[nIndex++].Value) ? true : false;
if (bDiff == true)
{
IItemsList TargetCentroids = new IItemsList();
TargetCentroids.Add(new Item(Centroids[iCentroid].ItemText,
attribs, nDistanceAvg, Centroids[iCentroid].IsUser, false));
m_Clusters.Add(new ICluster(TargetCentroids, TargetItems));
}
else
{
bool bExists = false; int iTargetCluster = 0;
while (iTargetCluster < m_TargetClusters.Count() && !bExists)
bExists = m_TargetClusters[iTargetCluster++].Centroids[0].
ItemText.Equals(Centroids[iCentroid].ItemText) ? true : false;
if (bExists == false)
{
IItemsList TargetUsers = new IItemsList();
TargetUsers.Add(new Item(Centroids[iCentroid].ItemText,
Centroids[iCentroid].AttribList, Centroids[iCentroid].Distance, true, true));
m_TargetClusters.Add(new ICluster((IItemsList)TargetUsers.Clone(),
(IItemsList)TargetItems.Clone()));
}
}
}
}
}
double nDiAvg = 0;
for (int iCluster = 1; iCluster < m_TargetClusters.Count(); iCluster++)
nDiAvg += m_TargetClusters[iCluster].Centroids[0].Distance / (m_TargetClusters.Count() - 1);
double nD0Avg = 0; int nClustersCount = 0;
for (int iCluster = 1; iCluster < m_TargetClusters.Count(); iCluster++)
for (int nCluster = iCluster + 1; nCluster < m_TargetClusters.Count(); nCluster++)
{
nD0Avg += m_TargetClusters[iCluster].Centroids[0].EuclDW(
m_TargetClusters[nCluster].Centroids[0]);
nClustersCount++;
}
nD0Avg /= nClustersCount; double nQ = ((m_TargetClusters.Count() - 1) * nDiAvg) / nD0Avg;
for (int iCluster = 0; iCluster < m_TargetClusters.Count(); iCluster++)
{
IItemsList ItemsList = m_TargetClusters[iCluster].Items;
IItemsList Centroids = m_TargetClusters[iCluster].Centroids;
Console.WriteLine("\nCluster={0}, Centroid=\"{1}\"", iCluster, Centroids[0].ItemText);
Console.WriteLine("-----------------------------------------------------------");
for (int iAttrib = 0; iAttrib < Centroids[0].AttribList.Count(); iAttrib++)
Console.WriteLine("\"{0,-30}\" => u({1},{2}) = {3}", Centroids[0].AttribList[iAttrib].Name,
iCluster, iAttrib, Centroids[0].AttribList[iAttrib].Value);
Console.WriteLine("-----------------------------------------------------------");
for (int iItem = 0; iItem < ItemsList.Count(); iItem++)
{
Console.WriteLine("\n(cluster={0}, item={1}, distance={2})\n{3}",
iCluster, iItem, ItemsList[iItem].Distance, ItemsList[iItem].ItemText);
Console.WriteLine("-----------------------------------------------------------");
for (int iAttrib = 0; iAttrib < ItemsList[iItem].AttribList.Count(); iAttrib++)
Console.WriteLine("\"{0,-30}\" => i({1},{2}) = {3}", ItemsList[iItem].AttribList[iAttrib].Name,
iCluster, iAttrib, ItemsList[iItem].AttribList[iAttrib].Value);
Console.WriteLine("-----------------------------------------------------------");
}
}
Console.WriteLine("\n===========================================================");
Console.WriteLine("Recommendations:");
Console.WriteLine("===========================================================\n");
for (int iCluster = 0; iCluster < m_TargetClusters.Count(); iCluster++)
{
IItemsList ItemsList = m_TargetClusters[iCluster].Items;
Console.WriteLine("{0}", m_TargetClusters[iCluster].Centroids[0].ItemText);
Console.WriteLine("======================================================");
for (int iItem = 0; iItem < ItemsList.Count(); iItem++)
Console.WriteLine("{0}", ItemsList[iItem].ItemText);
Console.WriteLine();
}
Console.WriteLine("KMeans Statistics:");
Console.WriteLine("===========================================================");
Console.WriteLine("The total number of clusters nClustersCount = {0}\n", m_TargetClusters.Count());
Console.WriteLine("Average distance between clusters nDiAvg = {0}", nDiAvg);
Console.WriteLine("Average distance within a cluster nD0Avg = {0}\n", nD0Avg);
Console.WriteLine("Average quality of KMeans clustering nQ = {0}\n", nQ);
}
}
class Program
{
static void Main(string[] args)
{
string[] filenames = new string[2] { @"ARTICLES.TXT", @"USERS.TXT" };
string[] sBanner = new string[2] { "K-Means++ Recommender Engine v.1.2.65535 (Euclidix) by Arthur V. Ratz",
"=====================================================================\n" };
for (int iBanner = 0; iBanner < 2; iBanner++)
Console.WriteLine(sBanner[iBanner]);
KMeans km = new KMeans();
int nItemsCount = km.LoadItemsFromFile(filenames[0]);
int nUsersCount = km.LoadUsersFromFile(filenames[1]);
int nInitialUsers = 0;
int nItemsPerCluster = 0, nItemsPerClusterMax = 0;
if (nItemsCount > 0 && nUsersCount > 0)
{
do
{
nItemsPerClusterMax = nItemsCount / nUsersCount;
Console.Write("Enter the number of articles per user [2-{0}]: ", nItemsPerClusterMax);
nItemsPerCluster = int.Parse(Console.ReadLine());
} while (nItemsPerCluster < 2 || nItemsPerCluster > nItemsPerClusterMax);
do
{
Console.Write("\nEnter the number of users (initial centroids) [1-{0}]: ", nUsersCount);
nInitialUsers = int.Parse(Console.ReadLine());
} while (nInitialUsers < 1 || nInitialUsers > nUsersCount);
}
km.Compute(nInitialUsers, nItemsPerCluster);
Console.Read();
}
}
}
Output
K-Means++ Recommender Engine v.1.2.65535 (Euclidix) by Arthur V. Ratz
=====================================================================
Enter the number of articles per user [2-9]: 5
Enter the number of users (initial centroids) [1-3]: 3
Cluster=0, Centroid="User0 => C++ Linux"
-----------------------------------------------------------
"C++ " => u(0,0) = 0,0641355115153619
"Linux " => u(0,1) = 0,135503210492002
-----------------------------------------------------------
(cluster=0, item=0, distance=0,0181865932348965)
ShaneMcDonald. Microsoft Visual C++ Static and Dynamic Libraries => C++ Visual-Studio Dev virtual-machine Virtualization
-----------------------------------------------------------
"C++ " => i(0,0) = 0,0676382057531689
"Microsoft " => i(0,1) = 0,175408304563312
"Visual " => i(0,2) = 0,17791458593099
"ShaneMcDonald. " => i(0,3) = 0,172902023195634
"Static " => i(0,4) = 0,182927148666345
"and " => i(0,5) = 0,15535805362189
"Dynamic " => i(0,6) = 0,187939711401701
"Libraries " => i(0,7) = 0,190445992769378
"Visual-Studio " => i(0,8) = 0,122776395842079
"Dev " => i(0,9) = 0,0400691107087137
"virtual-machine " => i(0,10) = 0,200471118240089
"Virtualization " => i(0,11) = 0,202977399607767
-----------------------------------------------------------
(cluster=0, item=1, distance=0,0216683871819057)
Nish Nishant. Using lambdas - C++ vs. C# vs. C++/CX vs. C++/CLI => C# C++ Windows Visual-Studio Dev QA Architect
-----------------------------------------------------------
"C++ " => i(0,0) = 0,0676382057531689
"Nishant. " => i(0,1) = 0,0876884566945908
"Using " => i(0,2) = 0,0901947380622685
"lambdas " => i(0,3) = 0,0927010194299463
"- " => i(0,4) = 0,095207300797624
"Nish " => i(0,5) = 0,0851821753269131
"vs. " => i(0,6) = 0,10021986353298
"C# " => i(0,7) = 0,102726144900657
"C++/CX " => i(0,8) = 0,107738707636013
"C++/CLI " => i(0,9) = 0,112751270371368
"Windows " => i(0,10) = 0,0726507684885244
"Visual-Studio " => i(0,11) = 0,122776395842079
"Dev " => i(0,12) = 0,0400691107087137
"QA " => i(0,13) = 0,127788958577435
"Architect " => i(0,14) = 0,130295239945112
-----------------------------------------------------------
(cluster=0, item=2, distance=0,0216683871819057)
Nish Nishant. 7 reasons C++ devs will love the VS 14 CTP => Visual C++ C++11 C++14 VC14.0 C++ Windows Dev Architect
-----------------------------------------------------------
"C++ " => i(0,0) = 0,0676382057531689
"Nishant. " => i(0,1) = 0,0876884566945908
"7 " => i(0,2) = 0,243077901490611
"reasons " => i(0,3) = 0,245584182858289
"Nish " => i(0,4) = 0,0851821753269131
"devs " => i(0,5) = 0,250596745593644
"will " => i(0,6) = 0,253103026961322
"love " => i(0,7) = 0,255609308329
"the " => i(0,8) = 0,0576130802824579
"VS " => i(0,9) = 0,260621871064355
"14 " => i(0,10) = 0,263128152432033
"CTP " => i(0,11) = 0,26563443379971
"Visual " => i(0,12) = 0,17791458593099
"C++11 " => i(0,13) = 0,273153277902744
"C++14 " => i(0,14) = 0,275659559270421
"VC14.0 " => i(0,15) = 0,278165840638099
"Windows " => i(0,16) = 0,0726507684885244
"Dev " => i(0,17) = 0,0400691107087137
"Architect " => i(0,18) = 0,130295239945112
-----------------------------------------------------------
(cluster=0, item=3, distance=0,032726256995075)
Pablo Aliskevicius. 4350% Performance Improvement with the StringBuilder for C++! => C++ Linux Windows STL Eclipse MFC Dev
-----------------------------------------------------------
"C++ " => i(0,0) = 0,0676382057531689
"Linux " => i(0,1) = 0,0701444871208466
"4350% " => i(0,2) = 0,0475879548117469
"Performance " => i(0,3) = 0,0500942361794247
"Improvement " => i(0,4) = 0,0526005175471024
"with " => i(0,5) = 0,0551067989147802
"the " => i(0,6) = 0,0576130802824579
"StringBuilder " => i(0,7) = 0,0601193616501357
"for " => i(0,8) = 0,0626256430178134
"C++! " => i(0,9) = 0,0651319243854911
"Pablo " => i(0,10) = 0,0425753920763915
"Aliskevicius. " => i(0,11) = 0,0450816734440692
"Windows " => i(0,12) = 0,0726507684885244
"STL " => i(0,13) = 0,0751570498562021
"Eclipse " => i(0,14) = 0,0776633312238798
"MFC " => i(0,15) = 0,0801696125915576
"Dev " => i(0,16) = 0,0400691107087137
-----------------------------------------------------------
(cluster=0, item=4, distance=0,0930085673936484)
Michael Chourdakis. One line template call for a dynamically loaded DLL function - Extensions to GetProcAddress() and Invoke() => C++ Win32
-----------------------------------------------------------
"C++ " => i(0,0) = 0,0676382057531689
"Chourdakis. " => i(0,1) = 0,343329156197721
"One " => i(0,2) = 0,345835437565398
"line " => i(0,3) = 0,348341718933076
"template " => i(0,4) = 0,350848000300754
"call " => i(0,5) = 0,353354281668431
"for " => i(0,6) = 0,0626256430178134
"a " => i(0,7) = 0,358366844403787
"dynamically " => i(0,8) = 0,360873125771465
"loaded " => i(0,9) = 0,363379407139142
"DLL " => i(0,10) = 0,36588568850682
"function " => i(0,11) = 0,368391969874498
"- " => i(0,12) = 0,095207300797624
"Extensions " => i(0,13) = 0,373404532609853
"to " => i(0,14) = 0,375910813977531
"GetProcAddress() " => i(0,15) = 0,378417095345209
"and " => i(0,16) = 0,15535805362189
"Invoke() " => i(0,17) = 0,383429658080564
"Michael " => i(0,18) = 0,340822874830043
"Win32 " => i(0,19) = 0,38844222081592
-----------------------------------------------------------
Cluster=1, Centroid="User1 => C# .NET"
-----------------------------------------------------------
"C# " => u(1,0) = 0,0935222110939783
".NET " => u(1,1) = 0,15019656028131
-----------------------------------------------------------
(cluster=1, item=0, distance=0,0110986410505081)
Rahul Rajat Singh. A Beginner's Tutorial - Type Casting and Type Conversion in C# => C# .NET
-----------------------------------------------------------
"C# " => i(1,0) = 0,102726144900657
".NET " => i(1,1) = 0,170395741827956
"Singh. " => i(1,2) = 0,137814084048146
"A " => i(1,3) = 0,140320365415823
"Beginner's " => i(1,4) = 0,142826646783501
"Tutorial " => i(1,5) = 0,145332928151179
"- " => i(1,6) = 0,095207300797624
"Type " => i(1,7) = 0,150345490886534
"Casting " => i(1,8) = 0,152851772254212
"and " => i(1,9) = 0,15535805362189
"Conversion " => i(1,10) = 0,160370616357245
"in " => i(1,11) = 0,162876897724923
"Rahul " => i(1,12) = 0,13280152131279
"Rajat " => i(1,13) = 0,135307802680468
-----------------------------------------------------------
(cluster=1, item=1, distance=0,0110986410505081)
Faisal(mfrony). Access Modifiers for Beginners => C# .NET1.1 .NET2.0 .NET .NET1.0
-----------------------------------------------------------
"C# " => i(1,0) = 0,102726144900657
".NET " => i(1,1) = 0,170395741827956
"Modifiers " => i(1,2) = 0,295709810211843
"for " => i(1,3) = 0,0626256430178134
"Beginners " => i(1,4) = 0,300722372947199
"Faisal(mfrony). " => i(1,5) = 0,290697247476488
".NET1.1 " => i(1,6) = 0,305734935682554
".NET2.0 " => i(1,7) = 0,308241217050232
"Access " => i(1,8) = 0,293203528844166
".NET1.0 " => i(1,9) = 0,313253779785588
-----------------------------------------------------------
(cluster=1, item=2, distance=0,0110986410505081)
V.Lorz. Adding JavaScript scripting support to one application: One easy way => C# .NET
-----------------------------------------------------------
"C# " => i(1,0) = 0,102726144900657
".NET " => i(1,1) = 0,170395741827956
"JavaScript " => i(1,2) = 0,428542722698764
"scripting " => i(1,3) = 0,431049004066441
"support " => i(1,4) = 0,433555285434119
"to " => i(1,5) = 0,375910813977531
"one " => i(1,6) = 0,438567848169475
"application: " => i(1,7) = 0,441074129537152
"One " => i(1,8) = 0,345835437565398
"easy " => i(1,9) = 0,446086692272508
"way " => i(1,10) = 0,448592973640186
"V.Lorz. " => i(1,11) = 0,423530159963408
"Adding " => i(1,12) = 0,426036441331086
-----------------------------------------------------------
(cluster=1, item=3, distance=0,0110986410505081)
Florian Rappl. An advanced introduction to C# - Lecture Notes Part 1 of 4 => C# .NET
-----------------------------------------------------------
"C# " => i(1,0) = 0,102726144900657
".NET " => i(1,1) = 0,170395741827956
"An " => i(1,2) = 0,719271361349382
"advanced " => i(1,3) = 0,72177764271706
"introduction " => i(1,4) = 0,724283924084737
"to " => i(1,5) = 0,375910813977531
"Florian " => i(1,6) = 0,714258798614026
"- " => i(1,7) = 0,095207300797624
"Lecture " => i(1,8) = 0,734309049555448
"Notes " => i(1,9) = 0,736815330923126
"Part " => i(1,10) = 0,501224882361418
"1 " => i(1,11) = 0,741827893658482
"of " => i(1,12) = 0,744334175026159
"4 " => i(1,13) = 0,746840456393837
"Rappl. " => i(1,14) = 0,716765079981704
-----------------------------------------------------------
(cluster=1, item=4, distance=0,0110986410505081)
B. Ahmed (KU-00). Asynchronous programming and Threading in C# (.NET 4.5) => C# .NET4.5 .NET
-----------------------------------------------------------
"C# " => i(1,0) = 0,102726144900657
".NET " => i(1,1) = 0,170395741827956
"(KU-00). " => i(1,2) = 0,852104273836302
"Asynchronous " => i(1,3) = 0,85461055520398
"programming " => i(1,4) = 0,857116836571658
"and " => i(1,5) = 0,15535805362189
"Threading " => i(1,6) = 0,862129399307013
"in " => i(1,7) = 0,162876897724923
"B. " => i(1,8) = 0,847091711100947
"(.NET " => i(1,9) = 0,869648243410046
"4.5) " => i(1,10) = 0,872154524777724
".NET4.5 " => i(1,11) = 0,87716708751308
"Ahmed " => i(1,12) = 0,849597992468624
-----------------------------------------------------------
Cluster=2, Centroid="User2 => Java Javascript"
-----------------------------------------------------------
"Java " => u(2,0) = 0,030550711996943
"Javascript " => u(2,1) = 0,5603509244
-----------------------------------------------------------
(cluster=2, item=0, distance=0,00499811776007409)
Dr. Song Li. A Simple Method to Upload Files by jQuery Ajax Calls => Java Javascript jQuery Ajax Dev Architect
-----------------------------------------------------------
"Java " => i(2,0) = 0,027537703870325
"Javascript " => i(2,1) = 0,556363072450329
"Li. " => i(2,2) = 0,639070357583694
"A " => i(2,3) = 0,140320365415823
"Simple " => i(2,4) = 0,64408292031905
"Method " => i(2,5) = 0,646589201686727
"to " => i(2,6) = 0,375910813977531
"Upload " => i(2,7) = 0,651601764422083
"Files " => i(2,8) = 0,654108045789761
"by " => i(2,9) = 0,656614327157438
"jQuery " => i(2,10) = 0,659120608525116
"Ajax " => i(2,11) = 0,661626889892794
"Calls " => i(2,12) = 0,664133171260472
"Dr. " => i(2,13) = 0,634057794848339
"Song " => i(2,14) = 0,636564076216016
"Dev " => i(2,15) = 0,0400691107087137
"Architect " => i(2,16) = 0,130295239945112
-----------------------------------------------------------
(cluster=2, item=1, distance=0,00803725509399426)
Sergey Zubarev. MPSC Lock Free Intrusive Linked Queue with state => Java Architect
-----------------------------------------------------------
"Java " => i(2,0) = 0,027537703870325
"Zubarev. " => i(2,1) = 0,581425886127106
"MPSC " => i(2,2) = 0,583932167494784
"Lock " => i(2,3) = 0,586438448862462
"Free " => i(2,4) = 0,588944730230139
"Intrusive " => i(2,5) = 0,591451011597817
"Linked " => i(2,6) = 0,593957292965495
"Queue " => i(2,7) = 0,596463574333173
"with " => i(2,8) = 0,0551067989147802
"state " => i(2,9) = 0,601476137068528
"Sergey " => i(2,10) = 0,578919604759428
"Architect " => i(2,11) = 0,130295239945112
-----------------------------------------------------------
(cluster=2, item=2, distance=0,0558027646300975)
U. Calakovic. Swing Based GUIs in JRuby => Ruby Java Windows Swing Dev
-----------------------------------------------------------
"Java " => i(2,0) = 0,027537703870325
"Calakovic. " => i(2,1) = 0,458618099110897
"Swing " => i(2,2) = 0,0350565479733582
"Based " => i(2,3) = 0,463630661846252
"GUIs " => i(2,4) = 0,46613694321393
"in " => i(2,5) = 0,162876897724923
"JRuby " => i(2,6) = 0,471149505949285
"Ruby " => i(2,7) = 0,473655787316963
"U. " => i(2,8) = 0,456111817743219
"Windows " => i(2,9) = 0,0726507684885244
"Dev " => i(2,10) = 0,0400691107087137
-----------------------------------------------------------
(cluster=2, item=3, distance=0,133988415037227)
Salar Hafezi. Understanding Recursion in Java through Basic Number Theory => Java Windows Design Dev
-----------------------------------------------------------
"Java " => i(2,0) = 0,027537703870325
"Hafezi. " => i(2,1) = 0,914761308028246
"Understanding " => i(2,2) = 0,917267589395924
"Recursion " => i(2,3) = 0,919773870763601
"in " => i(2,4) = 0,162876897724923
"Salar " => i(2,5) = 0,912255026660568
"through " => i(2,6) = 0,927292714866634
"Basic " => i(2,7) = 0,929798996234312
"Number " => i(2,8) = 0,93230527760199
"Theory " => i(2,9) = 0,934811558969668
"Windows " => i(2,10) = 0,0726507684885244
"Design " => i(2,11) = 0,629045232112983
"Dev " => i(2,12) = 0,0400691107087137
-----------------------------------------------------------
(cluster=2, item=4, distance=0,22911612749109)
Christopher Swiderski. JAX-WS: Using Apache CXF to Create a Bottom-Up Web Service, Web Service Client, and Securing the Web Service => Java Windows Apache Dev
-----------------------------------------------------------
"Java " => i(2,0) = 0,027537703870325
"Swiderski. " => i(2,1) = 0,789447239644359
"JAX-WS: " => i(2,2) = 0,791953521012036
"Using " => i(2,3) = 0,0901947380622685
"Apache " => i(2,4) = 0,796966083747392
"CXF " => i(2,5) = 0,79947236511507
"to " => i(2,6) = 0,375910813977531
"Create " => i(2,7) = 0,804484927850425
"a " => i(2,8) = 0,358366844403787
"Bottom-Up " => i(2,9) = 0,809497490585781
"Web " => i(2,10) = 0,812003771953458
"Service, " => i(2,11) = 0,814510053321136
"Service " => i(2,12) = 0,819522616056492
"Client, " => i(2,13) = 0,822028897424169
"and " => i(2,14) = 0,15535805362189
"Securing " => i(2,15) = 0,827041460159525
"the " => i(2,16) = 0,0576130802824579
"Christopher " => i(2,17) = 0,786940958276681
"Windows " => i(2,18) = 0,0726507684885244
"Dev " => i(2,19) = 0,0400691107087137
-----------------------------------------------------------
===========================================================
Recommendations:
===========================================================
User0 => C++ Linux
======================================================
ShaneMcDonald. Microsoft Visual C++ Static and Dynamic Libraries => C++ Visual-Studio Dev virtual-machine Virtualization
Nish Nishant. Using lambdas - C++ vs. C# vs. C++/CX vs. C++/CLI => C# C++ Windows Visual-Studio Dev QA Architect
Nish Nishant. 7 reasons C++ devs will love the VS 14 CTP => Visual C++ C++11 C++14 VC14.0 C++ Windows Dev Architect
Pablo Aliskevicius. 4350% Performance Improvement with the StringBuilder for C++! => C++ Linux Windows STL Eclipse MFC Dev
Michael Chourdakis. One line template call for a dynamically loaded DLL function - Extensions to GetProcAddress() and Invoke() => C++ Win32
User1 => C# .NET
======================================================
Rahul Rajat Singh. A Beginner's Tutorial - Type Casting and Type Conversion in C# => C# .NET
Faisal(mfrony). Access Modifiers for Beginners => C# .NET1.1 .NET2.0 .NET .NET1.0
V.Lorz. Adding JavaScript scripting support to one application: One easy way => C# .NET
Florian Rappl. An advanced introduction to C# - Lecture Notes Part 1 of 4 => C# .NET
B. Ahmed (KU-00). Asynchronous programming and Threading in C# (.NET 4.5) => C# .NET4.5 .NET
User2 => Java Javascript
======================================================
Dr. Song Li. A Simple Method to Upload Files by jQuery Ajax Calls => Java Javascript jQuery Ajax Dev Architect
Sergey Zubarev. MPSC Lock Free Intrusive Linked Queue with state => Java Architect
U. Calakovic. Swing Based GUIs in JRuby => Ruby Java Windows Swing Dev
Salar Hafezi. Understanding Recursion in Java through Basic Number Theory => Java Windows Design Dev
Christopher Swiderski. JAX-WS: Using Apache CXF to Create a Bottom-Up Web Service, Web Service Client, and Securing the Web Service => Java Windows Apache Dev
KMeans Statistics:
===========================================================
The total number of clusters nClustersCount = 3
Average distance within a cluster nDiAvg = 0,0487435885265023
Average distance between clusters nD0Avg = 41,496025364381
Average quality of KMeans clustering nQ = 0,0023493136076759
Points of Interest
The C#/.NET code above was developed especially for use in ASP.NET web application projects, to power the recommendation engine of online stores and other websites that promote their commercially distributed products or digital content over the Internet.
History
- December 11, 2016 - The first version of the article was published