Full Stack Ecoinformatics

Laboratory Information Management System

Private · R Shiny · AWS Lambda · DynamoDB · Docker

A cloud-native laboratory information management system (LIMS) for tracking environmental DNA samples from client intake through DNA extraction, assay processing, and reporting. Built as a modular R Shiny application backed by a custom API deployed to AWS Lambda with DynamoDB.

Role: Lead Developer & System Architect
Context: Developed while serving as Software Engineer at Jonah Ventures
Status: Production Deployment

Laboratory Information Management for Commercial eDNA Operations

The eDNA LIMS is a full-stack laboratory information management system built for Jonah Ventures, a commercial environmental DNA laboratory. The system manages the complete sample lifecycle — from client intake and kit fulfillment through DNA extraction, sequencing and qPCR assays, and results delivery — across two coordinated codebases: an interactive R Shiny application and a bespoke API deployed as an R package on AWS Lambda.

I served as the lead developer and system architect for both components.


The Problem

Commercial eDNA laboratories process hundreds to thousands of samples concurrently, each progressing through a multi-stage workflow: client onboarding, kit and vial orders, sample receipt, batching, DNA extraction, assay setup, sequencing or qPCR runs, and results delivery. Each stage involves distinct personnel, equipment, and data requirements.

Before the LIMS, sample tracking relied on a combination of spreadsheets and manual record-keeping. This approach introduced several operational risks:

  • No centralized view of sample status across workflow stages
  • Manual data entry at each handoff point, increasing error rates
  • Difficulty coordinating work across lab technicians, project managers, and clients
  • No structured mechanism for managing extraction plates, assay runs, or task assignment
  • Limited ability to search, audit, or report on historical sample data

The laboratory needed a purpose-built system that could track samples through every stage while remaining responsive to the fast-paced, hands-on realities of bench work.


System Architecture

The system is split into two repositories, each deployed independently:

Shiny Application (UI)

The user interface is a modular Shiny application built with the golem framework, comprising over 70 modules organized around core laboratory workflows. The application uses an event-driven architecture via the gargoyle package, decoupling module communication from Shiny’s reactive graph to maintain clarity as the system grows.

Key interface areas include:

  • Dashboard — Real-time overview of receiving, order, extraction, and task status
  • Receiving — Sample intake via manual entry, plate maps, or bulk sample sheet upload with validation
  • Orders — Kit and vial request management with ShipStation fulfillment integration
  • Lab Workflows — DNA extraction plate management, assay configuration (NGS and qPCR), task assignment and run tracking
  • Search — Cross-entity lookup by client, sample, batch, order, or project with fuzzy matching
  • Client Management — Account administration, project organization, personnel, and communications

All API calls are executed asynchronously using the promises and mirai packages, keeping the interface responsive during database operations.
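A minimal sketch of that async pattern, assuming a hypothetical `lims_api()` client function standing in for the package's signed Lambda calls (the real function names are not shown in this write-up). `mirai()` runs the call in a background process, and the returned object is promise-compatible, so the reactive loop never blocks on it:

```r
library(promises)
library(mirai)

# Sketch only: lims_api() is a hypothetical stand-in for the signed Lambda
# client. mirai() evaluates the call in a background daemon; everything the
# daemon needs must be passed in explicitly as named arguments.
fetch_samples_async <- function(lims_api, batch_id) {
  mirai(
    lims_api("getSamplesByBatch", list(batchId = batch_id)),
    lims_api = lims_api,
    batch_id = batch_id
  )
}

# In a module server (illustrative): chain the result as a promise, with
# %...>% for success and %...!% for errors, keeping the UI responsive.
# fetch_samples_async(lims_api, input$batch) %...>%
#   (function(samples) output$table <- renderTable(samples)) %...!%
#   (function(err) showNotification(conditionMessage(err), type = "error"))
```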

Lambda API (Backend)

The API is an R package deployed to AWS Lambda as a Docker container image via ECR. It exposes over 175 handler functions through a single Lambda URL, with each request specifying the target function and its parameters. Authentication uses AWS SigV4 request signing tied to IAM roles.
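The single-URL dispatch pattern can be sketched as a registry lookup; the handler names and response shape below are illustrative, not the production function list:

```r
library(jsonlite)

# Sketch of single-endpoint dispatch: each request body names the target
# function and its parameters; the Lambda entrypoint resolves the handler
# from a registry and invokes it. (Handler names here are invented.)
handlers <- list(
  getSample = function(params) list(id = params$id, status = "RECEIVED")
)

handle_request <- function(body) {
  req <- fromJSON(body, simplifyVector = FALSE)
  fn  <- handlers[[req$fun]]
  if (is.null(fn)) {
    return(list(
      statusCode = 404L,
      body = toJSON(list(error = "unknown function"), auto_unbox = TRUE)
    ))
  }
  list(statusCode = 200L, body = toJSON(fn(req$params), auto_unbox = TRUE))
}

# handle_request('{"fun": "getSample", "params": {"id": "S1"}}')
```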

The primary data store is a DynamoDB table organized using single-table design. Eleven record types — including projects, samples, batches, extracts, tasks, runs, orders, and sample sheets — share a single table with composite keys and five global secondary indexes supporting access patterns ranging from client-scoped queries to status-based filtering with temporal sharding.

Additional AWS integrations include S3 for sample sheet storage and email archival, SES for client communications, Cognito for user authentication, and Secrets Manager for third-party API credentials.


Technical Design

Several architectural decisions reflect the operational constraints of a working laboratory:

Event-driven over reactive. The gargoyle event system avoids deeply nested reactive dependencies that become difficult to reason about as module count grows. State changes propagate through explicit event triggers rather than implicit reactive invalidation.
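A minimal sketch of the gargoyle pattern, with invented event and module names: events are registered once in the top-level server, and modules then communicate through `trigger()`/`on()` rather than shared reactive values.

```r
library(shiny)
library(gargoyle)

# Event names below are illustrative, not the production ones.
app_server <- function(input, output, session) {
  gargoyle::init("sample_received", "extraction_updated")
  mod_receiving_server("receiving")
  mod_dashboard_server("dashboard")
}

mod_receiving_server <- function(id) {
  moduleServer(id, function(input, output, session) {
    observeEvent(input$save, {
      # ...persist the sample, then announce the state change explicitly
      gargoyle::trigger("sample_received")
    })
  })
}

mod_dashboard_server <- function(id) {
  moduleServer(id, function(input, output, session) {
    # Runs only when the event fires, not on every reactive invalidation
    gargoyle::on("sample_received", {
      # refresh_counts()  # hypothetical helper
    })
  })
}
```

The two modules never reference each other's reactives; the event name is the only shared contract.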

Async-first API layer. Every database operation runs asynchronously, preventing long-running queries from freezing the interface. This is particularly important during batch operations that touch hundreds of records.

Single-table DynamoDB. The single-table design consolidates all record types into one table with carefully designed key structures, enabling efficient cross-entity queries without joins. Status-based indexes use temporal sharding (e.g., TASK#COMPLETE#2024Q1) to distribute read load across partitions.
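The key shapes can be sketched as follows; attribute names and prefixes are assumptions based on the patterns described, with only the `TASK#COMPLETE#2024Q1` shard format taken from the design above:

```r
# Illustrative key construction for the single-table design. PK/SK give the
# base access path; a GSI attribute overloads a status index sharded by
# quarter so reads spread across partitions. (Attribute names are invented.)
make_task_item <- function(client_id, task_id, status, completed_on) {
  shard <- paste0(format(completed_on, "%Y"), quarters(completed_on))
  list(
    PK     = paste0("CLIENT#", client_id),           # client-scoped queries
    SK     = paste0("TASK#", task_id),               # entity type + id
    GSI1PK = paste0("TASK#", status, "#", shard),    # status + temporal shard
    GSI1SK = format(completed_on, "%Y-%m-%d")        # sortable within shard
  )
}

item <- make_task_item("acme", "T-0042", "COMPLETE", as.Date("2024-02-15"))
# item$GSI1PK is "TASK#COMPLETE#2024Q1"
```

Querying "all completed tasks in Q1 2024" then becomes a single GSI query on one shard key rather than a scan.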

Two-stage Docker builds. Both the Shiny application and the Lambda API use a two-stage Docker build process: a base image with pinned R dependencies (managed by renv), and a thin application layer installed on top. This separates dependency management from deployment, reducing build times and ensuring reproducibility.
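The pattern might look roughly like the following Dockerfile pair; image names, paths, and package names are illustrative, not the production configuration:

```dockerfile
# Stage 1 (published separately as a base image): pinned dependencies,
# rebuilt only when renv.lock changes.
FROM rocker/r-ver:4.3.2 AS base
COPY renv.lock renv.lock
RUN R -e "install.packages('renv'); renv::restore()"

# Stage 2: thin application layer on the cached base, so routine deploys
# skip dependency installation entirely.
FROM base
COPY . /app
RUN R -e "remotes::install_local('/app', dependencies = FALSE)"
CMD ["R", "-e", "myapp::run_app()"]
```

Because the base layer changes only when `renv.lock` does, application deploys rebuild just the thin top layer.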


Outcome

The eDNA LIMS replaced a manual, spreadsheet-driven workflow with a structured system that tracks samples from intake through delivery. Lab technicians, project managers, and administrators now operate from a shared interface with real-time visibility into sample status, extraction progress, and task assignments.

The project demonstrates how R — often associated with analysis scripts — can serve as the foundation for production-grade operational software when paired with appropriate architectural patterns and cloud infrastructure.

Have a Similar Challenge?

If you're developing or modernizing software and data systems for ecological or scientific work, let's connect. We'll begin with a focused conversation about your goals, technical constraints, and how to build infrastructure that supports long-term impact.

Discuss a Similar Project