Implement own methods

A major strength of pygenesig is its flexibility. Here, we show how you can implement own methods for generating and testing signatures.

To standardize the API for creating and testing signatures, pygenesig provides two abstract classes, SignatureGenerator and SignatureTester which serve as template for the different methods.

Implement a SignatureGenerator

To get started, read the apidoc of SignatureGenerator. Essentially, all you have to do is to create a child class of SignatureGenerator and implement the method _mk_signatures(expr, target). The method needs to return a signature dictionary as described in the data preparation tutorial.

For example:

class MySignatureGenerator(SignatureGenerator): 
    def _mk_signatures(self, expr, target):
        tissues = set(target)
        return {
            tissue: get_enriched_genes_with_my_method(expr, target, tissue)
                for tissue in tissues
        }

You can also override the constructor __init__() to pass additional parameters to your class. Make always sure to call the parent constructor, though, as it performs some consistency checks on the data:

class MySignatureGenerator(SignatureGenerator):
    def __init__(self, expr, target, param1=42):
        super(MySignatureGenerator, self).__init__(expr, target)
        self.param1 = param1 

    def _mk_signatures(self, expr, target):
        # ...

Example: GiniSignatureGenerator

This is what our GiniSignatureGenerator looks like:

class GiniSignatureGenerator(SignatureGenerator):
    def __init__(self, expr, target, min_gini=.7, max_rk=3, min_expr=1, max_rel_rk=.33, aggregate_fun=np.median):
        super(GiniSignatureGenerator, self).__init__(expr, target)
        self.min_gini = min_gini
        self.max_rk = max_rk
        self.min_expr = min_expr
        self.aggregate_fun = aggregate_fun
        self.max_rel_rk = max_rel_rk

    def _mk_signatures(self, expr, target):
        df_aggr = collapse_matrix(expr, target, axis=1, aggregate_fun=self.aggregate_fun)
        return get_gini_signatures(df_aggr, min_gini=self.min_gini, max_rk=self.max_rk, min_expr=self.min_expr,
                                   max_rel_rk=self.max_rel_rk)

Implement a SignatureTester

To get started, read the apidoc of SignatureTester. Essentially, all you have to do is to create a child class of SignatureTester and implement the method _score_signatures(expr, signatures). The method returns a j x n score matrix with j signatures and n samples

For example, a random predictor, being agnostic of the signatures could look like:

import random 
import numpy as np

class MySignatureTester(SignatureTester):
    def _score_signatures(self, expr, signatures):
        n_tissues = len(signatures) # number of signatures (=j)
        n_samples = expr.shape[1]   # number of samples (=n)
        return np.array ([
            [
                random.random() for i in range(n_samples)
            ] for k in range(n_tissues)
        ])

Obviously, you want to implement something meaningful instead!

Beyond signatures

Actually, instead of passing signature dictionaries, you can pass whatever model between a SignatureGenerator and a SignatureTester, as long as the two implementations ‘speak the same language’. For example, one could imagine using a support vector machine for predicting the tissue. In that case you could pass the trained model instead of the signatures.