📁 Dr. Chuming Chen

Software Projects Portfolio — Created & Owned Repositories

🚀 Recent & Active Projects

KNODE 2026
Knowledge Discovery Engine (KNODE). A comprehensive platform bridging AI and Graph Analytics to explore complex biomedical networks. Features Global Hybrid Search powered by Lucene, Advanced Graph Analytics via Neo4j Graph Data Science (GDS) Studio, Guided Discovery Workflows, and a dedicated AI Chatbot Assistant for conversational intelligence over the graph dataset.
Next.js React Docker Tailwind CSS TypeScript Neo4j GDS Cytoscape.js Graph Analytics
Knowledge Graph and converter for the MIMIC-IV clinical dataset. Analyzes and converts MIMIC-IV data files into a format suitable for direct import into the Neo4j graph database. Provides comprehensive schemas mapping entities like Patients, Admissions, ICU Stays, Diagnoses, and Procedures, along with their complex relationships.
Python Neo4j Knowledge Graph MIMIC-IV
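As a rough illustration of how relational rows can map onto graph relationships, the sketch below links Patient, Admission, and ICU stay entities through the MIMIC-IV key columns `subject_id`, `hadm_id`, and `stay_id`. The relationship names `HAS_ADMISSION` and `HAS_ICU_STAY`, the node labels, and the function `build_edges` are assumptions made for this example, not the schema used by the repository.

```python
def build_edges(admissions, icustays):
    """Return (start_label, start_id, rel_type, end_label, end_id) tuples.

    Relationship names and labels are hypothetical; only the key column
    names (subject_id, hadm_id, stay_id) come from MIMIC-IV itself.
    """
    edges = []
    for row in admissions:
        # Each admission row links one patient to one hospital admission.
        edges.append(
            ("Patient", row["subject_id"], "HAS_ADMISSION", "Admission", row["hadm_id"])
        )
    for row in icustays:
        # Each ICU stay row links one admission to one ICU stay.
        edges.append(
            ("Admission", row["hadm_id"], "HAS_ICU_STAY", "ICUStay", row["stay_id"])
        )
    return edges


# Illustrative rows only; real MIMIC-IV tables carry many more columns.
sample_admissions = [{"subject_id": 1, "hadm_id": 100}, {"subject_id": 1, "hadm_id": 101}]
sample_icustays = [{"subject_id": 1, "hadm_id": 100, "stay_id": 9000}]

for edge in build_edges(sample_admissions, sample_icustays):
    print(edge)
```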
Code and notebooks that generate data for import into the Neo4j graph database with the neo4j-admin import command. Includes components for different data sources to build an integrated knowledge network.
  • notebooks: Jupyter notebooks for generating integrated knowledge data.
Python Jupyter Neo4j Data Integration
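The neo4j-admin import command reads CSV files whose header row marks the ID column, the label, and the relationship endpoints. A minimal sketch of producing such files is shown here; the helper names `nodes_csv` and `rels_csv`, the `Protein` and `Gene` ID spaces, the `ENCODED_BY` relationship type, and the sample values are all assumptions for illustration and are not taken from the repository.

```python
import csv
import io


def nodes_csv(rows, id_field, id_space, label):
    """Render node rows as CSV text with a neo4j-admin import style header."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Every column except the ID column is written as a plain property.
    props = [key for key in rows[0] if key != id_field]
    writer.writerow([f"{id_field}:ID({id_space})", *props, ":LABEL"])
    for row in rows:
        writer.writerow([row[id_field], *(row[p] for p in props), label])
    return buf.getvalue()


def rels_csv(pairs, start_space, end_space, rel_type):
    """Render (start_id, end_id) pairs as relationship CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([f":START_ID({start_space})", f":END_ID({end_space})", ":TYPE"])
    for start_id, end_id in pairs:
        writer.writerow([start_id, end_id, rel_type])
    return buf.getvalue()


# Hypothetical sample data.
print(nodes_csv([{"uniprot_id": "P12345", "name": "Example protein"}],
                "uniprot_id", "Protein", "Protein"))
print(rels_csv([("P12345", "G1")], "Protein", "Gene", "ENCODED_BY"))
```

Files written this way would be passed to the importer through its `--nodes` and `--relationships` options; the exact command form depends on the Neo4j version in use.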
The front-end repository for the Protein Knowledge Network (ProKN). Built on the MaayanLab Knowledge Graph UI, this Next.js web interface supports interactive exploration of the integrated ProKN graph data. It can be deployed on Vercel or orchestrated with Docker Compose.
Next.js React Docker Tailwind CSS TypeScript
Practical Data-Centric AI/ML for Biomedical Researchers — a cloud-based training module developed under the NIGMS Sandbox initiative (NIH Award 3T32GM142603-03S1). Equips biomedical researchers with data science and AI/ML skills to make data FAIR and AI/ML-ready, using AWS SageMaker with interactive Jupyter notebooks. Includes lectures, tutorials, and hands-on exercises across 5 submodules.
  • Submodule 1: Introduction to AI/ML — core concepts, NumPy & Pandas.
  • Submodule 2: Data Science Life Cycle, FAIR Data Principles, Data-Centric & Responsible AI/ML.
  • Submodule 3: Data Preparation — cleaning, feature engineering, scaling & selection.
  • Submodule 4: Model Building, Evaluation, Interpretation & Deployment.
  • Submodule 5: AI/ML for Biomedical Applications — deep learning, protein classification, drug activity prediction.
Python Jupyter AWS SageMaker TensorFlow scikit-learn FAIR Data
Bioprocess Knowledge Graph builder pipeline. Provides systematic Jupyter notebooks to aggregate and construct a Neo4j knowledge graph using Ontological Concepts, Dictionary Concepts, Text Mining Data, and PubMed metadata. Includes steps for graph exploratory data analysis and Neo4j setup. Published in PLoS ONE (2025).
Python Jupyter Neo4j Text Mining PubMed API
Protein Ontology (PRO) RESTful API and Linked Open Data service, powered by a SPARQL endpoint on OpenLink Virtuoso. Exposes protein-related entities on the Semantic Web using URIs and RDF, enabling querying and integration with other Linked Open Datasets. Provides a Swagger UI REST API, a YASGUI SPARQL endpoint, and RDF data dumps. Published in Scientific Data (2020) and Nucleic Acids Research (2017).
  • PRO Linked Open Data: Browse protein ontology data via Faceted Browser interface.
  • SPARQL Endpoint: YASGUI-powered interface for querying PRO RDF data.
  • REST API: RESTful API for PRO Linked Open Data, documented with Swagger UI and backed by the SPARQL endpoint.
SPARQL RDF Swagger UI RESTful API Virtuoso OWL
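For readers unfamiliar with SPARQL endpoints, the following sketch shows how a client might build a query URL under the SPARQL 1.1 Protocol and flatten a response in the standard SPARQL JSON results format. The endpoint URL is a placeholder, the sample response is made up, and the helper names `build_query_url` and `flatten_bindings` are hypothetical; none of this is the PRO service's own client code.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint; substitute the real SPARQL endpoint URL.
ENDPOINT = "https://example.org/sparql"

QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?term ?label WHERE { ?term rdfs:label ?label } LIMIT 5"""


def build_query_url(endpoint, query):
    """Build a GET URL carrying the query in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


def flatten_bindings(results_json):
    """Turn SPARQL JSON results into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in doc["results"]["bindings"]
    ]


# Made-up response illustrating the shape of the JSON results format.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["term", "label"]},
    "results": {"bindings": [
        {"term": {"type": "uri", "value": "http://example.org/term/1"},
         "label": {"type": "literal", "value": "example label"}},
    ]},
})

print(build_query_url(ENDPOINT, QUERY))
print(flatten_bindings(SAMPLE_RESPONSE))
```

A real request would also send an `Accept: application/sparql-results+json` header so the endpoint returns JSON.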
A fast peptide match service for UniProt Knowledgebase (UniProtKB), designed to quickly retrieve all occurrences of a query peptide across millions of protein sequences with isoforms. Powers the official UniProt Peptide Search. Published in Bioinformatics (2013).
  • create_data: Prepare data for creating the Lucene index.
  • index_data: Create Lucene index and deploy the match service.
  • peptidematch_web: Java web application — original web interface for the Peptide Match service.
  • peptidematchapi2: Swagger UI powered REST API (v2) for the Peptide Match service.
  • peptidematchws: Asynchronous RESTful API powering the UniProt Peptide Search.
  • peptidematch_cmd: Command-line tool for querying peptide sequences against a custom protein database.
Java Apache Lucene RESTful API Swagger UI UniProtKB Tomcat
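To give a feel for index-backed peptide lookup, here is a small in-memory sketch. It swaps in a plain k-mer inverted index for the Lucene index used by the actual service, so it illustrates the general idea only and says nothing about how the service is implemented. The function names, the choice of k = 3, and the toy sequences are assumptions.

```python
from collections import defaultdict


def build_kmer_index(proteins, k=3):
    """Map every length-k substring to the set of protein IDs containing it."""
    index = defaultdict(set)
    for pid, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(pid)
    return index


def find_peptide(peptide, proteins, index, k=3):
    """Return {protein_id: [0-based start positions]} for exact matches."""
    if len(peptide) < k:
        # Too short to use the index; fall back to scanning every protein.
        candidates = set(proteins)
    else:
        kmers = [peptide[i:i + k] for i in range(len(peptide) - k + 1)]
        # A protein can only match if it contains every k-mer of the query.
        candidates = set.intersection(*(index.get(km, set()) for km in kmers))
    hits = {}
    for pid in sorted(candidates):
        seq = proteins[pid]
        positions = []
        start = seq.find(peptide)
        while start != -1:
            positions.append(start)
            start = seq.find(peptide, start + 1)  # allow overlapping matches
        if positions:
            hits[pid] = positions
    return hits


# Toy sequences, not real UniProtKB entries.
toy = {"P1": "MKTAYIAKQR", "P2": "AYIAAYIA", "P3": "MKKK"}
toy_index = build_kmer_index(toy)
print(find_peptide("AYIA", toy, toy_index))
```

The index narrows the search to candidate proteins, and a direct string scan then confirms each occurrence and records its position.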
PIRUniRule (PIR SiteRule and PIR NameRule) — a rule-based protein functional annotation system integrated into UniProt's UniRule framework. Leverages InterPro family signatures, taxonomic data, and experimental evidence to automatically propagate annotations from reviewed to unreviewed proteins at scale. Includes a web predictor and command-line tool. Published in Database (2019) and Bioinformatics (2020).
Java UniRule InterPro UniProtKB RESTful API Tomcat
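The idea of rule-based annotation propagation can be sketched as follows: a rule fires when a protein carries the required family signatures and falls within the rule's taxonomic scope, and the rule's annotation is then proposed for that protein. The `Rule` structure, the function `apply_rules`, and all identifiers in the sample data are hypothetical and much simpler than actual PIR rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    required_signatures: frozenset  # family signature IDs that must all be present
    taxon: str                      # taxonomic scope the protein must fall under
    annotation: str                 # annotation proposed when the rule fires


def apply_rules(protein, rules):
    """Return (rule_id, annotation) pairs for every rule the protein satisfies."""
    proposed = []
    for rule in rules:
        has_signatures = rule.required_signatures <= protein["signatures"]
        in_scope = rule.taxon in protein["lineage"]
        if has_signatures and in_scope:
            proposed.append((rule.rule_id, rule.annotation))
    return proposed


# Hypothetical rule and proteins; the IDs are placeholders, not real entries.
rules = [Rule("RULE_1", frozenset({"SIG_A"}), "Bacteria", "Example function")]
unreviewed = {"signatures": {"SIG_A", "SIG_B"}, "lineage": ["Bacteria", "Proteobacteria"]}
out_of_scope = {"signatures": {"SIG_A"}, "lineage": ["Eukaryota"]}

print(apply_rules(unreviewed, rules))
print(apply_rules(out_of_scope, rules))
```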
A stable, scalable, and unbiased proteome set for sequence analysis and functional annotation. Representative Proteomes (RPs) are algorithmically selected from Representative Proteome Groups (RPGs) based on co-membership in UniRef50 clusters, capturing the broadest sequence space. Thresholds at 75%, 55%, 35%, and 15% allow users to tune proteome granularity for their analysis needs. Published in PLoS ONE (2011).
  • RP_production: Source code for generating representative proteome groups.
Perl Bash UniRef50 UniProtKB
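The effect of the co-membership threshold can be illustrated with a deliberately simplified sketch. Each proteome is reduced to the set of cluster IDs its proteins belong to, and a greedy pass assigns a proteome to an existing group when its co-membership with that group's representative meets the cutoff. The scoring formula, the greedy strategy, the choice of the largest proteome as representative, and the toy data are all assumptions for this example; the published method differs in its details.

```python
def comembership(a, b):
    """Percent of the smaller proteome's clusters that the other also has."""
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))


def group_proteomes(clusters_by_proteome, cutoff):
    """Greedily group proteomes; returns {representative: [members]}."""
    # Visit larger proteomes first so they become the representatives.
    order = sorted(clusters_by_proteome,
                   key=lambda p: (-len(clusters_by_proteome[p]), p))
    groups = {}
    for proteome in order:
        for rep in groups:
            score = comembership(clusters_by_proteome[proteome],
                                 clusters_by_proteome[rep])
            if score >= cutoff:
                groups[rep].append(proteome)
                break
        else:
            # No existing representative is similar enough: start a new group.
            groups[proteome] = [proteome]
    return groups


# Toy cluster memberships, not real UniRef50 data.
toy = {"A": {1, 2, 3, 4}, "B": {1, 2, 3}, "C": {4, 9}, "D": {7, 8}}
print(group_proteomes(toy, 75))
print(group_proteomes(toy, 35))
```

In this toy run the lower cutoff merges proteome C into the group represented by A, which shows how a looser threshold yields fewer, broader groups.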

📚 Past Projects

Bacterial Pathogen Diagnostics 2010
Sequence analysis to identify Category A bacterial pathogen proteins and to identify species from host DNA.
Perl Bash BLASTp/BLASTx/tBLASTn HTML
URA — UniRule Annotation 2009
Web-based automated protein annotation system.
Oracle Hibernate Java JSP/JSF Tomcat
OWLMan 2008
Scalable management framework for large-scale OWL ontologies.
Java MySQL OWLAPI Pellet
OntoCM 2007
Change metrics suite for evolving OWL ontologies (class, definition, hierarchy, axiom metrics).
Java OWLAPI Pellet
DBBE Website 2006
Department of Biostatistics, Bioinformatics and Epidemiology (MUSC) website.
PHP HTML/CSS MySQL
Parallel fMRI Data Mining 2006
Parallel Bayesian probabilistic component analysis of brain imaging data for deception detection.
Matlab MPI Linux Cluster
OMSSAWeb 2006
Web interface for the OMSSA MS/MS search engine with PBS cluster scheduling and email notifications.
CGI/Perl Linux Cluster PBS/Maui
WWWmpiBlast 2006
Web frontend for parallel BLAST searches on a Linux cluster with LDAP authentication.
CGI/Perl Linux Cluster mpiBLAST LDAP
Universal LIMS 2005
Initial release of a Laboratory Information Management System.
PHP HTML/CSS/JS MySQL/PostgreSQL
AGML Central 2005
XML parser and visualization tool for Melanie 2-DE gel data.
PHP Matlab Java Applet PostgreSQL
Online Gradebook 2004
Course assignment dropbox used by 2000+ students per semester.
Tomcat JSP MySQL
NessusWeb 2003
Open-source web interface for the Nessus network security scanner with SSL and access control.
Nessus Tomcat/JSP MySQL JSSE
Choice Apparel System 2003
Order entry and inventory management for a wholesale dealer.
VB.NET ASP.NET ADO.NET SQL Server
Non-destructive Integrity Evaluation 2001
Multi-channel non-destructive integrity evaluation program.
LabVIEW