

# Diffusion-based generation of gene regulatory network from scRNA-seq data with DigNet #

**More details about DigNet will be released once the manuscript is accepted.**

**Please write to [zpliu@sdu.edu.cn](mailto:zpliu@sdu.edu.cn) if you have any questions.**

**Please cite our paper if it helps you.**

![workflow](https://github.com/zpliulab/DigNet/blob/main/images/framework.png)

## DigNet User Guide

```DigNet``` is a tool designed for scRNA-seq data analysis and gene regulatory network modeling. This guide explains how to train and test models with ```DigNet``` and how to prepare the input data.

## Dependencies ![Python](https://img.shields.io/badge/python-3.8-blue "Python")

Before using DigNet, make sure the following dependencies are installed:

#### key packages:
- python==3.8
- torch==2.0.0
- torch-geometric==0.11.4

#### other packages:
- networkx
- joblib
- pickle (part of the Python standard library; no installation needed)
- numpy
- pandas
- scikit-learn
- scipy


You can install these packages with pip. We recommend first creating a dedicated conda environment:

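```bash
# Create and activate a dedicated environment with Python 3.8
conda create -n diffusion-dignet python=3.8
conda activate diffusion-dignet
# Install the general-purpose dependencies listed above
pip install networkx joblib numpy pandas scikit-learn scipy
```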

We recommend using a virtual environment to manage dependencies to avoid any version conflicts.
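To confirm the environment before running DigNet, a small check like the following may help. It is a sketch, not part of DigNet: the `check_dependencies` helper is hypothetical, and it assumes the installed distribution names match the package list above (`pickle` is skipped because it ships with Python).

```python
"""Hypothetical environment check for the dependencies listed in this README."""
from importlib.metadata import PackageNotFoundError, version

# Pinned versions come from the "key packages" list; None means any version.
REQUIRED = {
    "torch": "2.0.0",
    "torch-geometric": "0.11.4",
    "networkx": None,
    "joblib": None,
    "numpy": None,
    "pandas": None,
    "scikit-learn": None,
    "scipy": None,
}


def check_dependencies(required=REQUIRED):
    """Return {package: problem} for every missing or mismatched package."""
    problems = {}
    for name, wanted in required.items():
        try:
            # Drop local build tags such as "+cu118" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if wanted is not None and installed != wanted:
            problems[name] = f"found {installed}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_dependencies()
    for pkg, msg in issues.items():
        print(f"{pkg}: {msg}")
    print("All dependencies OK" if not issues else f"{len(issues)} issue(s) found")
```

Save it under any file name and run it with `python` inside the activated environment; an empty report means every listed package was found at the expected version.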


## Folders and files

- `pre_train/`: The pre-trained DigNet model provided with the manuscript
- `Cancer_datasets/`: Preprocessed E-MTAB-8107 data used in the manuscript
- `pathway/`: Helper functions for data preprocessing
- `pathway/simulation/`: Synthetic networks and gene expression profiles generated by SERGIO
- `discrete/`: Model-related configuration files
- `discrete/models/`: Graph Transformer architecture
- `denoising_diffusion_pytorch/`: Helper functions for training, testing, and initializing DigNet
- `config.py`: Hyperparameters for training or testing DigNet
- `DigNet.py`: The main DigNet pipeline
- `Download_TF_file.py`: Downloads the transcription factor (TF) list
- `make_final_net.py`: Voting-based integration helper used to build the final network
- `Tutorial.py`: A quick tutorial for using DigNet


<div align="center">
  <img src="https://github.com/zpliulab/DigNet/blob/main/images/network.gif" alt="Schematic diagram of DigNet generation network" style="width: 200px; height: 100px;"/>
</div>


## Using DigNet

"Tutorial.py" gives the data training and testing process, you can run this file to learn how to use DigNet!


## A. Training the Model

Depending on the data you have, training falls into one of two scenarios:

### 1. Input scRNA-seq Gene Expression Profiles with Corresponding GRN

If you have multiple gene expression profiles together with their corresponding Gene Regulatory Networks (GRNs), you can use them to train the model directly. For the required data format, see `Supplement S1`.

### 2. Input scRNA-seq Gene Expression Profiles Only

If only gene expression profiles are available, we provide a method to construct a reference network from them; follow the steps described in the manuscript.
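The manuscript's reference-network procedure is not reproduced here. Purely as a stand-in that illustrates the expected input and output, the sketch below thresholds absolute Pearson correlations between genes to produce a 0-1, gene × gene matrix in the same layout as the `net` variable in Supplement S1. The function name and the `threshold` default are hypothetical; use the method from the manuscript for real analyses.

```python
import numpy as np
import pandas as pd


def correlation_reference_network(exp: pd.DataFrame, threshold: float = 0.5) -> np.ndarray:
    """Return a 0-1, genes x genes adjacency matrix from |Pearson r| >= threshold.

    Hypothetical stand-in, NOT the reference-network method from the manuscript.
    `exp` is a genes x cells DataFrame, as in Supplement S1.
    """
    # np.corrcoef treats each row as one variable, so this correlates genes.
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(exp.to_numpy(dtype=float))
    # Zero-variance genes yield NaN correlations; treat them as unconnected.
    corr = np.nan_to_num(corr, nan=0.0)
    net = (np.abs(corr) >= threshold).astype(int)
    np.fill_diagonal(net, 0)  # drop self-loops
    return net


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    exp = pd.DataFrame(rng.poisson(2.0, size=(5, 100)),
                       index=[f"gene{i}" for i in range(5)])
    net = correlation_reference_network(exp, threshold=0.3)
    print(net.shape)  # (5, 5)
```

Correlation is symmetric and undirected, so this placeholder cannot say which gene regulates which; it only shows the shape of a reference network.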

**To Start Training**: Once you have prepared the above files, use the following command to start training your model:

```python
DigNet.train('Your data file path')
```

## B. Testing the Model

To test a trained DigNet model, you will need to prepare the following files:

1. **Gene Expression Profile File**: Same format as the training file. If a ground-truth network is included, the inferred network is evaluated against it.
2. **Pre-trained DigNet Model File**: A `*.pth` file, found in the `/result` directory of our GitHub project.
3. **PCA Parameter Model File**: A file of PCA parameters fitted on, and exported from, the training data.
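How the PCA file is produced is not specified here. Assuming it is a scikit-learn `PCA` object saved with `joblib` (both are listed dependencies), the round trip looks roughly like this; the helper names and file path are hypothetical. The point it illustrates is that test data are projected with the parameters fitted on the training data rather than with a newly fitted PCA.

```python
import joblib
from sklearn.decomposition import PCA


def fit_and_export_pca(train_matrix, n_components, path):
    """Fit PCA on training data (rows = samples) and save the fitted model."""
    pca = PCA(n_components=n_components)
    pca.fit(train_matrix)
    joblib.dump(pca, path)  # e.g. path = "pca_params.joblib" (hypothetical name)
    return pca


def load_and_apply_pca(path, test_matrix):
    """Reload the exported PCA and project new data with the training parameters."""
    pca = joblib.load(path)
    return pca.transform(test_matrix)
```

Refitting PCA on the test data would produce different components, so the reduced features would no longer match what the model saw during training.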

**Optional Parameter**:

- `args.test_pathway`: Restricts network construction to a subset of genes, given either as a KEGG pathway ID or as a user-defined gene list in table format.
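For the user-defined option, restricting a gene × cell expression table to the genes in a one-column table might look like the sketch below. The helper name and the `gene` column header are assumptions for illustration, not DigNet's internal handling of `args.test_pathway`.

```python
import pandas as pd


def subset_by_gene_list(exp: pd.DataFrame, gene_table: pd.DataFrame,
                        column: str = "gene") -> pd.DataFrame:
    """Keep rows of a genes x cells table whose symbols appear in gene_table[column].

    Hypothetical helper; the column name "gene" is an assumed header.
    """
    # Normalize the user list: cast to str and trim stray whitespace.
    wanted = set(gene_table[column].astype(str).str.strip())
    # Preserve the original row order of the expression table.
    keep = [g for g in exp.index if g in wanted]
    return exp.loc[keep]
```

A typical call would be `subset_by_gene_list(exp, pd.read_csv("my_gene_list.csv"))`, where the CSV file name is a placeholder. Genes in the list that are absent from the expression table are silently skipped.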

**To Start Testing**:

```python
DigNet.test('Gene expression profile file path', 'Pre-trained model file path', 'PCA parameter model file path')
```

## Supplementary Notes

### Supplement S1: Data Format Requirements

If your data include both expression profiles and networks, they can be imported directly as a `*.data` file or any other file saved with pickle. The file should hold a dataset of multiple list-type entries, each with the following structure:

- **'net' variable**: The adjacency matrix of the network, a 0-1 weight matrix (NumPy ndarray) of size gene × gene. Non-binary weights can also be loaded.
- **'exp' variable**: Preprocessed scRNA-seq expression data as a pandas DataFrame (e.g., read from a CSV file), of size gene × cell.
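A minimal sketch of writing such a file follows, assuming the dataset is a plain Python list of dictionaries keyed by `'net'` and `'exp'`; the exact container DigNet expects should be checked against `Tutorial.py`. The helper names are hypothetical.

```python
import pickle

import numpy as np
import pandas as pd


def make_sample(exp: pd.DataFrame, net: np.ndarray) -> dict:
    """Pair a genes x cells expression table with its genes x genes adjacency matrix."""
    n_genes = exp.shape[0]
    net = np.asarray(net)
    # The network must have one row and one column per gene in `exp`.
    if net.shape != (n_genes, n_genes):
        raise ValueError(f"net must be {n_genes} x {n_genes}, got {net.shape}")
    return {"net": net, "exp": exp}


def save_dataset(samples, path):
    """Pickle a list of {'net', 'exp'} samples, e.g. to a file ending in .data."""
    with open(path, "wb") as fh:
        pickle.dump(list(samples), fh)


def load_dataset(path):
    """Read back the list of samples written by save_dataset."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

The shape check catches the most common mistake, a network whose size does not match the number of genes in the expression table.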

### Supplement S2: Data Preprocessing Recommendation

Before inputting sequencing data, we recommend performing matrix completion (imputation) and quality control. `CancerDatasets/Create_BRCA_data.py` offers a simple example of data processing. High-quality gene expression data should be stored in table format (csv or xlsx), with cell IDs in the first row and gene symbols in the first column.
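As a generic illustration of quality control (not the pipeline in `Create_BRCA_data.py`, and without imputation), the sketch below filters rarely detected genes and low-complexity cells from a gene × cell count table, then applies log1p counts-per-million normalization. The helper name and threshold defaults are assumptions to tune for your data.

```python
import numpy as np
import pandas as pd


def basic_qc(exp: pd.DataFrame, min_cells: int = 3, min_genes: int = 200) -> pd.DataFrame:
    """Filter and normalize a genes x cells count table (hypothetical helper).

    Thresholds are illustrative defaults, not values from the manuscript.
    """
    # Keep genes detected (count > 0) in at least `min_cells` cells.
    exp = exp.loc[(exp > 0).sum(axis=1) >= min_cells]
    # Keep cells with at least `min_genes` detected genes and a non-zero total.
    n_detected = (exp > 0).sum(axis=0)
    exp = exp.loc[:, (n_detected >= min_genes) & (exp.sum(axis=0) > 0)]
    # Scale each cell (column) to counts per million, then log-transform.
    cpm = exp.div(exp.sum(axis=0), axis=1) * 1e6
    return np.log1p(cpm)
```

With the layout above, `pd.read_csv("expression.csv", index_col=0)` (placeholder file name) loads gene symbols as the row index and cell IDs as the column headers; `pd.read_excel(..., index_col=0)` does the same for xlsx files but needs an Excel engine such as openpyxl installed.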
