Recently Published
Homework 5 for DATA 101
Document
Code along 10
Part 2 of Seurat on Gastric Carcinoma Data GSE308231: randomForest Top Genes Added to Pathologies Database
This project appends part 2 to part 1 in the same document, with the two parts separated by a row of equal signs and 3 stars. Part 1 covers QC, filtering, normalization, selection of highly variable genes, and clustering with KNN, UMAP, and t-SNE. Part 2 gets fold change (FC) values for the top 20 genes, plus the top 10 genes from Seurat's algorithm with their FC values added as well, and then tests how well each gene set predicts the class type, GC (Gastric Carcinoma) or PM (Peritoneal Metastasis). Both gene sets scored 100% accuracy. For the top 20 fold change genes, accuracy was 100% on both the training set and the held-out testing validation set, after removing NA and infinite values and omitting fold changes of 0.000000. We then added these genes to our pathologies database.

The document also links to the Tableau dashboard of FCs for each pathology we have analyzed so far, selected by FCs alone. These are pathologies related to EBV, or not related but close: fibromyalgia, Lyme disease, EBV infection, mononucleosis (only one in miRNA, with no genes in common with the other sets), multiple sclerosis, Hodgkin's Lymphoma, Natural Killer T Cell Lymphoma, Gastric Carcinoma, HIV-infected Hodgkin's with EBV, and uterine fibroids. After gathering more data, we will see how well a model can be tuned with these top fold change genes to predict pathologies, or to show similarities across pathologies through the effects of disease on genes.
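The fold change filtering step described above can be sketched roughly as follows. This is only an illustration in plain Python, not the project's actual code, which was written in R with Seurat and randomForest. The function names, the gene names, the per-class mean expression inputs, and the PM/GC direction of the ratio are all assumptions made for the example.

```python
import math


def fold_changes(gc_means, pm_means):
    """Return {gene: PM mean / GC mean} (ratio direction is an assumption).

    A zero GC mean gives inf, or NaN when the PM mean is also zero or missing.
    """
    fc = {}
    for gene, gc in gc_means.items():
        pm = pm_means.get(gene, float("nan"))
        if gc == 0:
            fc[gene] = float("nan") if pm == 0 or math.isnan(pm) else float("inf")
        else:
            fc[gene] = pm / gc
    return fc


def top_genes(fc, n=20):
    """Drop NaN, infinite, and zero fold changes, then keep the n largest."""
    usable = {g: v for g, v in fc.items() if math.isfinite(v) and v != 0}
    return sorted(usable, key=usable.get, reverse=True)[:n]


# Hypothetical mean expression per class; the values are made up for illustration.
gc = {"geneA": 2.0, "geneB": 0.0, "geneC": 1.0, "geneD": 4.0, "geneE": 0.0}
pm = {"geneA": 8.0, "geneB": 3.0, "geneC": 0.0, "geneD": 2.0, "geneE": 0.0}

print(top_genes(fold_changes(gc, pm), n=2))
```

Here geneB (infinite), geneC (zero), and geneE (NaN) are filtered out before ranking, which mirrors the order of operations in the description: remove NAs and infinites, omit zero values, then take the top genes.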