$$ A = \left[\begin{array}{cc} 0 & B \\ B^T & 0 \end{array}\right].$$
A classic paper is Freeman's "Finding Social Groups: A Meta-Analysis of the Southern Women Data (2003)", which compares and analyzes 21 approaches to community detection.
Block Models
- A block model approach was proposed by Doreian, Batagelj, and Ferligoj in their paper "Generalized Blockmodeling of Two-mode Network Data (2004)". Their approach models blocks via specific attributes (all-ones, all-zeros, row-regular, etc.) and then infers the positions of these blocks in $B$. To do so, they propose a sequential heuristic that minimizes the difference between the ideal block model and the actual adjacency matrix. Hence, their method requires specifying not only the number of communities but also the types of blocks used in the model. Different choices yield different solutions, as they illustrate on the Southern Women dataset. Their approach can also be used for weighted graphs.
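To make the criterion concrete, here is a small NumPy sketch (hypothetical, restricted to null and complete ideal blocks) that counts the entries of $B$ deviating from a prescribed block image:

```python
import numpy as np

def blockmodel_cost(B, row_labels, col_labels, ideal):
    """Count entries of B that deviate from an ideal block image.

    ideal[r][c] is 0 ('null' block) or 1 ('complete' block) for row
    cluster r and column cluster c. A toy criterion in the spirit of
    generalized blockmodeling, restricted to these two block types;
    labels are given as NumPy integer arrays.
    """
    cost = 0
    for r in np.unique(row_labels):
        for c in np.unique(col_labels):
            block = B[np.ix_(row_labels == r, col_labels == c)]
            # Deviations: ones inside a null block, zeros inside a complete block.
            cost += block.sum() if ideal[r][c] == 0 else block.size - block.sum()
    return cost
```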
Stochastic block models assume that the number of edges between two nodes is drawn from a probability distribution whose parameters depend only on the blocks the two nodes belong to, so that within-block and between-block edges have different probabilities. Usually, the block-level parameters are estimated, and the blocks are inferred by maximizing the (log-)likelihood of the graph under the block model.
- The paper "Efficiently inferring community structure in bipartite networks (2014)" by Larremore, Clauset, and Jacobs proposes a block model in which the number of edges is assumed to be drawn from a Poisson distribution, admitting a different Poisson parameter for every pair of blocks. They introduce a second, degree-corrected model that accommodates bipartite graphs with broad degree distributions. The authors suggest a sequential algorithm with a heuristic to escape local minima. The algorithm is implemented for MATLAB and R. It seems to work also for weighted graphs, but requires the number of blocks (i.e., communities) as input.
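For intuition, the non-degree-corrected Poisson log-likelihood that such a model maximizes has, up to constants, the standard form

$$\log P(B \mid z, \omega) = \sum_{i,j} \left( B_{ij} \log \omega_{z_i z_j} - \omega_{z_i z_j} \right),$$

where $z_i$ denotes the block of node $i$ and $\omega_{rs}$ is the Poisson rate for edges between blocks $r$ and $s$; the notation here is illustrative rather than taken from the paper. Estimating $\omega$ at its maximum-likelihood value and then optimizing over the assignment $z$ yields the inferred communities.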
Spectral/Modularity-based Methods
Modularity is a measure of how well a graph can be split into modules. It is based on a null model (typically one in which the edges are randomized such that the vertex degrees remain constant) and compares a given partition of the network to a partition of this null model. Modularity is usually maximized by spectral methods. Although these methods seem to work also for weighted graphs, the derivation of the null model is only justified for unweighted graphs (i.e., for $B$ consisting of 0s and 1s).
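Concretely, for a unipartite graph with adjacency matrix $A$, node degrees $k_i$, and $m$ edges, the standard modularity of a partition assigning node $i$ to community $c_i$ reads

$$Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j),$$

where $k_i k_j / (2m)$ is the expected number of edges between $i$ and $j$ under the degree-preserving null model.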
- Barber proposed a modularity measure for bipartite networks in "Modularity and Community Detection in Bipartite Networks (2007)". He proposed optimizing this measure iteratively: fixing the row partition, the column partition can be obtained easily, and vice versa. He therefore alternates between optimizing these two partitions, calling his algorithm BRIM (a sketch of this alternating scheme follows after this list). Barber also suggested a bisection approach to determine the number of communities. A fundamental drawback is that his method requires the same number of clusters for rows and columns (see eq. (21) in the paper). The cost function is also used in a MATLAB implementation described in a separate paper.
- A modularity measure was also proposed by Guimera, Sales-Pardo, and Nunes Amaral in "Module Identification in Bipartite and Directed Networks (2007)". They optimize their cost function via simulated annealing, but show that their results do not differ from those of modularity-based methods applied to $B^TB$.
- Dormann and Strauss proposed a simulated-annealing Monte Carlo method for maximizing modularity in "A method for detecting modules in quantitative bipartite networks (2014)". They measure modularity using Barber's definition. Their algorithm, QuanBiMo, is based on fitting a tree to the graph, a method proposed in earlier work. An R package is available.
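To make the alternating optimization behind BRIM concrete, here is a minimal NumPy sketch. It is not Barber's reference implementation: it runs a fixed number of iterations, omits the bisection step for choosing the number of communities, and assumes a binary incidence matrix.

```python
import numpy as np

def brim(B, c, n_iter=50, seed=0):
    """Alternating (BRIM-style) maximization of Barber's bipartite modularity.

    B: binary p x q incidence matrix; c: number of communities (the same
    for rows and columns, as Barber's formulation requires).
    Returns row labels, column labels, and the final modularity Q.
    """
    rng = np.random.default_rng(seed)
    q = B.shape[1]
    m = B.sum()
    k = B.sum(axis=1)                 # row-node degrees
    d = B.sum(axis=0)                 # column-node degrees
    Bt = B - np.outer(k, d) / m       # bipartite modularity matrix
    T = np.eye(c)[rng.integers(c, size=q)]          # random initial column labels
    for _ in range(n_iter):
        R = np.eye(c)[np.argmax(Bt @ T, axis=1)]    # best row labels given columns
        T = np.eye(c)[np.argmax(Bt.T @ R, axis=1)]  # best column labels given rows
    Q = np.trace(R.T @ Bt @ T) / m
    return R.argmax(axis=1), T.argmax(axis=1), Q
```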
Projection Methods
Projection methods usually work on $B^TB$ (weighted projection) or on $\min\{1, B^TB\}$, taken entrywise (unweighted projection), rather than on $A$. In that sense, they use less information than is available in the original model, but allow community detection methods for unipartite graphs to be applied. If $B$ is binary, $B^TB$ and $BB^T$ together contain all "relevant" information about $A$ (and thus $B$); hence a dual-projection approach could potentially work as well as an approach targeted at $A$. This, at least, is argued by Everett and Borgatti in "The dual-projection approach for two-mode networks (2013)".
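As a minimal NumPy illustration of the two projections, using a small hypothetical incidence matrix:

```python
import numpy as np

# Hypothetical binary incidence matrix: 5 row nodes x 4 column nodes.
B = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [0, 0, 0, 1]])

weighted = B.T @ B                    # weighted projection onto the column nodes
unweighted = np.minimum(weighted, 1)  # unweighted projection: cap entries at 1
# The diagonal holds node degrees; it is usually zeroed out before detection.
np.fill_diagonal(weighted, 0)
np.fill_diagonal(unweighted, 0)
```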
- Such an approach was taken by Alzahrani, Horadam, and Boztas in "Community Detection in Bipartite Networks Using Random Walks (2016)". They propose using Infomap, a flow-based method that minimizes the map equation, a quantity measuring the description length of a random walk on the graph defined by $B^TB$. The Infomap code is available for C++, R, and Python, and also for MATLAB (a usage sketch follows after this list).
- Melamed investigated the dual-projection approach in "Community Structures in Bipartite Networks: A Dual-Projection Approach (2014)". He uses a modularity-maximizing walktrap algorithm (he is vague about the actual details, but provides an R script) and compares the results (presumably) to those of Barber, concluding that performance improves when the number of row clusters and the number of column clusters are allowed to differ.
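As a usage sketch for the projection-plus-Infomap route, the following assumes the mapequation Python bindings (installable via pip install infomap); the exact API may differ between versions, so treat the calls as indicative rather than definitive.

```python
import numpy as np
from infomap import Infomap  # mapequation's Python bindings

B = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]])    # toy binary incidence matrix
W = B.T @ B                  # weighted projection onto the column nodes
np.fill_diagonal(W, 0)

im = Infomap("--two-level --silent")
for i, j in zip(*np.nonzero(np.triu(W))):        # each undirected edge once
    im.add_link(int(i), int(j), float(W[i, j]))
im.run()
print(im.get_modules())      # {node_id: module_id}
```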
Other Techniques
- A label propagation algorithm (LPA) was proposed by Raghavan, Albert, and Kumara in "Near linear time algorithm to detect community structures in large-scale networks (2007)". Initially, every node is assigned a unique label; in each subsequent step, every node adopts the label shared by most of its neighbors (a sketch of this propagation step appears after this list). The algorithm was adapted to bipartite networks by Liu and Murata in "Community Detection in Large-scale Bipartite Networks (2009)" and proposed as a preprocessing step for Barber's BRIM; the combined algorithm, LP&BRIM, is available for MATLAB. Barber and Clarke also presented a bipartite version of the LPA, called LPAb, in "Detecting network communities by propagating labels under constraints (2009)". LPAb was in turn improved with a greedy agglomerative heuristic by Liu and Murata, who dubbed the improved algorithm LPAb+ in their paper "An Efficient Algorithm for Optimizing Bipartite Modularity in Bipartite Networks (2010)". Finally, Beckett improved LPAb+ by restarting it multiple times with different initializations. He called his algorithm DIRTLPAb+ in his paper "Improved community detection in weighted bipartite networks (2016)", discusses the results in a blog post, and makes the core code available for R, Julia, and MATLAB.
- In "A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays (2016)", Naeni, Craig, Berretta, and Moscato proposed maximizing the modularity of a unipartite graph using a memetic algorithm. The unipartite network is a k-NN graph obtained by computing (dis-)similarities between the nodes, which are in turn based on the differences between rows of $B$; these differences are measured using the Jensen-Shannon divergence. Apparently, the technique does not require specifying the number of communities, but it does require choosing the $k$ for the k-NN graph.
- Weng, Cho, and Wu proposed a tailor-made algorithm for community detection in movies: "RoleNet: Movie Analysis from the Perspective of Social Networks (2009)". They first discover leading roles via centrality ranking, then micro-communities via links between actors, and finally macro-communities by connecting micro-communities to leading roles. Their analysis is based on $B^TB$ rather than on $A$, and it suffers from the shortcoming that leading roles can never be in the same macro-community (they are excluded from micro-communities by default).
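To illustrate the propagation step that the LPA variants above build on, here is a plain-LPA sketch in Python: the unipartite scheme of Raghavan et al., not the bipartite LPAb/LPAb+ variants, with ties broken at random.

```python
import random
from collections import Counter

def label_propagation(adj, max_iter=100, seed=0):
    """Plain LPA on an adjacency-list graph {node: [neighbors]}.

    Every node starts with a unique label; nodes then repeatedly adopt
    the label most common among their neighbors until nothing changes.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)               # asynchronous, random update order
        changed = False
        for v in nodes:
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            new = rng.choice([l for l, c in counts.items() if c == best])
            if new != labels[v]:
                labels[v], changed = new, True
        if not changed:                  # converged: every label is locally dominant
            break
    return labels
```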