Chuming Chen | Software Projects Portfolio

🚀 Recent & Active Projects

KNODE 2026

Knowledge Discovery Engine (KNODE): a comprehensive platform bridging AI and graph analytics to explore complex biomedical networks. Features Global Hybrid Search powered by Lucene, Advanced Graph Analytics via Neo4j Graph Data Science (GDS) Studio, Guided Discovery Workflows, and a dedicated AI Chatbot Assistant for conversational querying of the graph dataset.

Next.js React Docker Tailwind CSS TypeScript Neo4j GDS Cytoscape.js Graph Analytics

MIMIC-IV-KG 2026

Knowledge graph and converter for the MIMIC-IV clinical dataset. Analyzes and converts MIMIC-IV data files into a format suitable for direct import into the Neo4j graph database. Defines schemas that map entities such as Patients, Admissions, ICU Stays, Diagnoses, and Procedures, along with the relationships among them.

Converter Script: Converts raw MIMIC-IV datasets to node & relationship CSVs.

Python Neo4j Knowledge Graph MIMIC-IV
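The node and relationship CSVs mentioned for MIMIC-IV-KG are what neo4j-admin import consumes. As a minimal sketch, assuming in-memory rows with a few columns from the MIMIC-IV `patients` and `admissions` tables (`subject_id`, `gender`, `anchor_age`, `hadm_id`), the helpers below emit the import header conventions (`:ID(space)`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`). The function names, the `HAD_ADMISSION` relationship type, and the example values are hypothetical, not taken from the project's converter.

```python
import csv
import io


def write_nodes(rows, id_field, id_space, label, props):
    """Return a node CSV (neo4j-admin import format) as a string."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    # The ID column is tagged with an ID space so relationship files can refer to it.
    writer.writerow([f"{id_field}:ID({id_space})", *props, ":LABEL"])
    for row in rows:
        writer.writerow([row[id_field], *(row[p] for p in props), label])
    return buf.getvalue()


def write_relationships(rows, start_field, start_space, end_field, end_space, rel_type):
    """Return a relationship CSV linking nodes from two ID spaces."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([f":START_ID({start_space})", f":END_ID({end_space})", ":TYPE"])
    for row in rows:
        writer.writerow([row[start_field], row[end_field], rel_type])
    return buf.getvalue()


# Hypothetical example rows (not real MIMIC-IV records).
patients = [{"subject_id": "1", "gender": "F", "anchor_age": "52"}]
admissions = [{"subject_id": "1", "hadm_id": "101"}]
print(write_nodes(patients, "subject_id", "Patient", "Patient", ["gender", "anchor_age"]))
print(write_relationships(admissions, "subject_id", "Patient", "hadm_id", "Admission", "HAD_ADMISSION"))
```

Files written this way would be passed to neo4j-admin import through its `--nodes` and `--relationships` options; a real converter would also need type suffixes (for example `:int`) and batching for the full dataset.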

ProKN-Data 2025

A repository containing the code and notebooks that generate data for bulk import into the Neo4j graph database with the neo4j-admin import command. Includes a component for each data source, which together build an integrated knowledge network.

notebooks: Jupyter notebooks for generating integrated knowledge data.

Python Jupyter Neo4j Data Integration

ProKN-Website 2025

The front-end repository for the Protein Knowledge Network (ProKN). Built on the MaayanLab Knowledge Graph UI, this Next.js web interface enables interactive exploration of the integrated ProKN graph data. Supports deployment on Vercel and orchestration with Docker Compose.

Next.js React Docker Tailwind CSS TypeScript

NIGMS-Sandbox-UD 2024

Practical Data-Centric AI/ML for Biomedical Researchers — a cloud-based training module developed under the NIGMS Sandbox initiative (NIH Award 3T32GM142603-03S1). Equips biomedical researchers with data science and AI/ML skills to make data FAIR and AI/ML-ready, using AWS SageMaker with interactive Jupyter notebooks. Includes lectures, tutorials, and hands-on exercises across 5 submodules.

Submodule 1: Introduction to AI/ML — core concepts, NumPy & Pandas.
Submodule 2: Data Science Life Cycle, FAIR Data Principles, Data-Centric & Responsible AI/ML.
Submodule 3: Data Preparation — cleaning, feature engineering, scaling & selection.
Submodule 4: Model Building, Evaluation, Interpretation & Deployment.
Submodule 5: AI/ML for Biomedical Applications — deep learning, protein classification, drug activity prediction.

Python Jupyter AWS SageMaker TensorFlow scikit-learn FAIR Data

BioProKGTM / KG_Builder 2024

Bioprocess Knowledge Graph builder pipeline. Provides step-by-step Jupyter notebooks that aggregate data and construct a Neo4j knowledge graph from Ontological Concepts, Dictionary Concepts, Text Mining Data, and PubMed metadata. Includes steps for exploratory data analysis of the graph and for Neo4j setup. Published in PLoS ONE (2025).

Cleanup script: Graph database reset utilities.
Ontology loader: Scripts for creating nodes/edges from Ontologies.
Text Mining loader: Scripts for converting text mining results into relations.

Python Jupyter Neo4j Text Mining PubMed API

PRO_REST_API 2017

Protein Ontology (PRO) RESTful API and Linked Open Data service, powered by a SPARQL endpoint on OpenLink Virtuoso. Exposes protein-related entities on the Semantic Web using URIs and RDF, enabling querying and integration with other Linked Open Datasets. Provides a Swagger UI REST API, a YASGUI SPARQL endpoint, and RDF data dumps. Published in Scientific Data (2020) and Nucleic Acids Research (2017).

PRO Linked Open Data: Browse protein ontology data via a faceted browser interface.
SPARQL Endpoint: YASGUI-powered interface for querying PRO RDF data.
REST API: RESTful API for PRO Linked Open Data, backed by the SPARQL endpoint and documented with Swagger UI.

SPARQL RDF Swagger UI RESTful API Virtuoso OWL
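A SPARQL endpoint such as the one behind PRO_REST_API can be queried over plain HTTP following the SPARQL 1.1 Protocol: a GET request carrying the query in a `query` parameter, with JSON results requested through the `Accept` header. The sketch below only builds such a request. It assumes PRO terms are identified by OBO PURLs of the form `http://purl.obolibrary.org/obo/PR_...` with `rdfs:label` annotations; the endpoint URL and the helper names are hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder; substitute the actual PRO SPARQL endpoint URL.
ENDPOINT = "https://example.org/sparql"


def label_query(pro_id: str) -> str:
    """Build a SPARQL query for the rdfs:label of a PRO term such as 'PR:000000001'."""
    local_name = pro_id.replace(":", "_")  # CURIE -> OBO PURL local name (assumed)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f"SELECT ?label WHERE {{ <http://purl.obolibrary.org/obo/{local_name}> rdfs:label ?label }}"
    )


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Prepare (but do not send) a SPARQL 1.1 Protocol GET request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_request(label_query("PR:000000001"))
print(req.full_url)
```

Sending it with `urllib.request.urlopen(req)` against a live endpoint would return a JSON result set; keeping query construction separate from the network call makes the query text easy to inspect and test.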

Peptide Match 2013

A fast peptide match service for UniProt Knowledgebase (UniProtKB), designed to quickly retrieve all occurrences of a query peptide across millions of protein sequences with isoforms. Powers the official UniProt Peptide Search. Published in Bioinformatics (2013).

create_data: Prepare data for creating the Lucene index.
index_data: Create Lucene index and deploy the match service.
peptidematch_web: Java web application — original web interface for the Peptide Match service.
peptidematchapi2: Swagger UI powered REST API (v2) for the Peptide Match service.
peptidematchws: Asynchronous RESTful API powering the UniProt Peptide Search.
peptidematch_cmd: Command-line tool for querying peptide sequences against a custom protein database.

Java Apache Lucene RESTful API Swagger UI UniProtKB Tomcat
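Peptide Match itself is built on Apache Lucene. Purely to illustrate the general idea of retrieving every occurrence of a query peptide without scanning all sequences, here is a toy in-memory k-mer inverted index: candidate proteins must contain every k-mer of the peptide, and each candidate is then verified by exact search. The k-mer scheme, `k = 3`, and the `KmerIndex` class are assumptions of this sketch, not a description of the service's actual index.

```python
from collections import defaultdict


class KmerIndex:
    """Toy inverted index mapping each k-mer to the proteins that contain it."""

    def __init__(self, k: int = 3):
        self.k = k
        self.postings = defaultdict(set)  # k-mer -> set of protein IDs
        self.sequences = {}               # protein ID -> sequence

    def add(self, protein_id: str, sequence: str) -> None:
        self.sequences[protein_id] = sequence
        for i in range(len(sequence) - self.k + 1):
            self.postings[sequence[i:i + self.k]].add(protein_id)

    def match(self, peptide: str) -> dict:
        """Return {protein_id: [1-based start positions]} for every exact occurrence."""
        if len(peptide) < self.k:
            raise ValueError(f"peptide must be at least {self.k} residues")
        kmers = [peptide[i:i + self.k] for i in range(len(peptide) - self.k + 1)]
        # Candidate filter: a protein must contain every k-mer of the peptide.
        candidates = set.intersection(*(self.postings.get(km, set()) for km in kmers))
        hits = {}
        for pid in sorted(candidates):
            seq = self.sequences[pid]
            positions = []
            start = seq.find(peptide)
            while start != -1:
                positions.append(start + 1)           # report 1-based positions
                start = seq.find(peptide, start + 1)  # allow overlapping matches
            if positions:
                hits[pid] = positions
        return hits


index = KmerIndex()
index.add("P1", "MKTAYIAKQRQISFVKSHFSRQ")  # hypothetical sequences
index.add("P2", "AKQAKQAKQ")
print(index.match("AKQ"))
```

The verification step matters because sharing all k-mers does not guarantee the peptide occurs contiguously; the index only narrows the set of sequences that need an exact search.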

PIRUniRule 2011

PIRUniRule (PIR SiteRule and PIR NameRule) — a rule-based protein functional annotation system integrated into UniProt's UniRule framework. Leverages InterPro family signatures, taxonomic data, and experimental evidence to automatically propagate annotations from reviewed to unreviewed proteins at scale. Includes a web predictor and command-line tool. Published in Database (2019) and Bioinformatics (2020).

pir_unirule: Parser and writer for PIR Site Rule and PIR Name Rule.
pirsitepredict: PIRSitePredict web application for protein functional site prediction.
pirsitepredictcommander: Standalone command-line tool for PIRSitePredict.

Java UniRule InterPro UniProtKB RESTful API Tomcat
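The PIRUniRule description amounts to conditional propagation: a rule fires for a protein when its conditions (such as InterPro signatures and taxonomic scope) are met, and the rule's annotation is then transferred. The sketch below models only those two kinds of conditions with hypothetical `Rule` and `Protein` classes and placeholder identifiers; actual PIR Site Rules and Name Rules carry richer conditions (for example, site residues checked against an alignment), which are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    accession: str
    interpro_ids: set[str]
    lineage: list[str]  # taxonomic lineage from root to species (assumed representation)


@dataclass
class Rule:
    rule_id: str
    required_signatures: set[str]  # every signature must be present
    taxonomic_scope: str           # rule applies only within this taxon
    annotation: dict               # annotation propagated when the rule fires

    def applies_to(self, protein: Protein) -> bool:
        return (self.required_signatures <= protein.interpro_ids
                and self.taxonomic_scope in protein.lineage)


def annotate(proteins, rules):
    """Return {accession: [(rule_id, annotation), ...]} for every rule that fires."""
    results = {}
    for protein in proteins:
        fired = [(r.rule_id, r.annotation) for r in rules if r.applies_to(protein)]
        if fired:
            results[protein.accession] = fired
    return results


# Hypothetical rule and proteins with placeholder identifiers.
rule = Rule("RULE-DEMO", {"IPR999991"}, "Bacteria", {"protein_name": "Example synthase"})
proteins = [
    Protein("X1", {"IPR999991", "IPR999992"}, ["cellular organisms", "Bacteria"]),
    Protein("X2", {"IPR999991"}, ["cellular organisms", "Eukaryota"]),
]
print(annotate(proteins, [rule]))
```

Here only `X1` is annotated: `X2` carries the signature but falls outside the rule's taxonomic scope.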

Representative Proteomes 2011

A stable, scalable, and unbiased proteome set for sequence analysis and functional annotation. Representative Proteomes (RPs) are algorithmically selected from Representative Proteome Groups (RPGs) based on co-membership in UniRef50 clusters, capturing the broadest sequence space. Co-membership thresholds of 75%, 55%, 35%, and 15% allow users to tune proteome granularity for their analysis needs. Published in PLoS ONE (2011).

RP_production: Source code for generating representative proteome groups.

Perl Bash UniRef50 UniProtKB
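To show how a co-membership threshold controls the granularity of Representative Proteome Groups, here is a simplified sketch. It assumes co-membership of proteome A with proteome B is the fraction of A's UniRef50 clusters that also contain a protein from B, and it uses a hypothetical greedy grouping (proteomes visited by decreasing cluster count, each joining the first representative it reaches the threshold with). The published RP algorithm, including how the representative of each group is chosen, is not reproduced here.

```python
def co_membership(clusters_a: set, clusters_b: set) -> float:
    """Fraction of proteome A's UniRef50 clusters that also contain a protein from B."""
    if not clusters_a:
        return 0.0
    return len(clusters_a & clusters_b) / len(clusters_a)


def group_proteomes(proteome_clusters: dict, threshold: float) -> dict:
    """Hypothetical greedy grouping returning {representative: [members]}."""
    groups = {}
    # Larger proteomes (more clusters) are visited first; ties broken by name.
    order = sorted(proteome_clusters, key=lambda p: (-len(proteome_clusters[p]), p))
    for p in order:
        for rep in groups:
            if co_membership(proteome_clusters[p], proteome_clusters[rep]) >= threshold:
                groups[rep].append(p)
                break
        else:
            groups[p] = [p]  # no existing group is close enough: start a new one
    return groups


# Hypothetical proteomes, each given as the set of UniRef50 cluster IDs it belongs to.
clusters = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 9}, "C": {5, 6, 7}, "D": {5, 6, 10, 11, 12}}
for t in (0.75, 0.55):
    print(t, group_proteomes(clusters, t))
```

Lowering the threshold from 0.75 to 0.55 lets proteome C (co-membership 2/3 with D) join D's group, so fewer, larger groups result; this is the sense in which lower thresholds yield a coarser proteome set.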

📚 Past Projects

Bacterial Pathogen Diagnostics 2010

Sequence analysis for identifying Category A bacterial pathogen proteins and for species identification from host DNA.

Perl Bash BLASTp/BLASTx/tBLASTn HTML

URA — UniRule Annotation 2009

Web-based automated protein annotation system.

Oracle Hibernate Java JSP/JSF Tomcat

OWLMan 2008

Scalable management framework for large-scale OWL ontologies.

Java MySQL OWLAPI Pellet

OntoCM 2007

Change metrics suite for evolving OWL ontologies (class, definition, hierarchy, axiom metrics).

Java OWLAPI Pellet

DBBE Website 2006

Department of Biostatistics, Bioinformatics and Epidemiology (MUSC) website.

PHP HTML/CSS MySQL

Parallel fMRI Data Mining 2006

Parallel Bayesian probabilistic component analysis of brain imaging data for deception detection.

Matlab MPI Linux Cluster

OMSSAWeb 2006

Web interface for OMSSA MS/MS search engine with PBS cluster scheduling and email notifications.

CGI/Perl Linux Cluster PBS/Maui

WWWmpiBlast 2006

Web front end for parallel BLAST searches on a Linux cluster, with LDAP authentication.

CGI/Perl Linux Cluster mpiBLAST LDAP

Universal LIMS 2005

Initial release of a Laboratory Information Management System (LIMS).

PHP HTML/CSS/JS MySQL/PostgreSQL

AGML Central 2005

XML parser and visualization tool for Melanie 2-DE gel data.

PHP Matlab Java Applet PostgreSQL

Online Gradebook 2004

Course assignment dropbox used by 2000+ students per semester.

Tomcat JSP MySQL

NessusWeb 2003

Open source web interface for Nessus network security scanner with SSL and access control.

Nessus Tomcat/JSP MySQL JSSE

Choice Apparel System 2003

Order entry and inventory management system for a wholesale dealer.

VB.NET ASP.NET ADO.NET SQL Server

Non-destructive Integrity Evaluation 2001

Multi-channel non-destructive integrity evaluation program.

LabView
