Sparsity is a means and not an aim in inference of gene regulatory networks

Published 2010 by Torbjörn E. M. Nordling

Keywords: Network Inference; Regularization; Sparse Networks; Lasso; Gene Regulatory Networks; Natural Sciences; Engineering and Technology
Publication type: Conference contribution
Content type: Peer-reviewed publication

The availability of high-throughput gene expression data has led to numerous attempts to infer network models of gene regulation from expression changes. The low number of observations compared to the number of genes, the low signal-to-noise ratios, and the system being interampatte make the inference problem ill-posed and challenging. To solve the problem, a majority of published approaches resort to regularization, e.g. the LASSO penalty is used to find a sparse model. Regularization is known to introduce a bias, but its effect on inferred gene regulatory networks has hardly been investigated. In machine learning and compressed sensing, where regularization has been widely applied and studied, the objective is to reproduce a signal, and the actual variable selection is of minor importance as long as the signal is reproduced well.

In network inference, on the other hand, the variable selection is crucial, since we want to identify the true topology of the network, and a minimal number of links is not an aim per se. We first study the inference problem in a deterministic setting in order to gain insight and to derive conditions under which the regularization causes false negative and false positive links.
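As a rough illustration of how an L1 penalty can select the wrong links even without noise, the following toy sketch may help. It is not taken from the paper: the three-gene design matrix `X`, the "true" coefficient vector `a_true`, the penalty weight `lam`, and the helper `lasso_ista` (a plain iterative soft-thresholding solver) are all assumptions made for this example. Two noise-free observations are explained equally well by the true two-link model and by a single link to a third gene, and the penalty prefers the latter because its L1 norm is smaller.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1 (element-wise shrinkage towards zero).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    # Hypothetical helper: minimise 0.5 * ||y - X a||^2 + lam * ||a||_1
    # by iterative soft-thresholding (ISTA) with a fixed step size.
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / largest eigenvalue of X^T X
    a = np.zeros(X.shape[1])
    for _ in range(n_iter):
        gradient = X.T @ (X @ a - y)
        a = soft_threshold(a - step * gradient, step * lam)
    return a

# Assumed toy data: 2 observations (rows), 3 candidate regulator genes (columns).
# The third column equals the sum of the first two.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
a_true = np.array([1.0, 1.0, 0.0])  # true links: genes 1 and 2 only
y = X @ a_true                      # noise-free response, equals [1, 1]

a_hat = lasso_ista(X, y, lam=0.1)
print(np.round(a_hat, 3))           # approximately [0, 0, 0.95]
```

In this sketch the estimate reproduces the data almost perfectly, yet it contains one false positive link (gene 3) and two false negative links (genes 1 and 2), which is the kind of outcome the deterministic analysis is concerned with.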

By viewing the problem as a parameter identifiability problem, we establish three cases in which a subset of the parameters can be uniquely determined. Finally, we devise conditions for invalidation of the inferred links using existing or additional data, resulting in an iterative procedure of inference and experiment design that significantly increases the confidence in the inferred network model.