
DSP Platform Docker Setup

This repository now includes a Docker-based development environment that brings up:

  • PHP + Apache web server (with Rscript available for the automated analyses)
  • MySQL 8.0 database seeded with the db/niph_dsps.sql dump on first run
  • phpMyAdmin for administering the database through the browser
  • JupyterHub (per-user R-enabled JupyterLab) for isolated notebook environments

Prerequisites

  • Docker Desktop (or Docker Engine + Docker Compose plugin)
  • ~2 GB of free disk space for the base images

Quick start

# From the project root
docker-compose up --build

Once the stack is healthy you can reach the services at:

| Service | URL | Notes |
| --- | --- | --- |
| PHP application | http://localhost:8082 | Uses DB credentials from docker-compose.yml |
| phpMyAdmin | http://localhost:8081 | Log in with dsp_user / dsp_pass (or MySQL root) |
| JupyterHub | https://localhost (published on port 443) | OAuth handshake redirects to your private notebook |
| MySQL | localhost:3307 (host access) | Database niph_dsps, user dsp_user / dsp_pass |

The first docker-compose up imports db/niph_dsps.sql automatically. Subsequent runs reuse the existing data volume (mysql_data), so the dump is not re-imported.

Configuration

Key environment variables are defined in docker-compose.yml. Adjust them if you need different credentials or ports. The PHP application now reads its database configuration from the following variables (with sensible defaults for non-Docker setups):

  • DB_HOST
  • DB_PORT
  • DB_NAME
  • DB_USER
  • DB_PASS
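
As an illustration of the "environment first, defaults otherwise" behaviour, here is a minimal Python sketch of resolving those five variables. The DbConfig type, the load_db_config helper and the DEFAULTS values are hypothetical: the PHP code's real fallbacks may differ, and the dsp_user / dsp_pass pair is simply the Docker credentials reused for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DbConfig:
    host: str
    port: int
    name: str
    user: str
    password: str


# Hypothetical non-Docker fallbacks; the real defaults live in the PHP config.
DEFAULTS = {
    "DB_HOST": "127.0.0.1",
    "DB_PORT": "3306",
    "DB_NAME": "niph_dsps",
    "DB_USER": "dsp_user",
    "DB_PASS": "dsp_pass",
}


def load_db_config(env: Mapping[str, str] = os.environ) -> DbConfig:
    """Read DB_* variables, falling back to DEFAULTS when a variable is unset or empty."""
    def get(key: str) -> str:
        return env.get(key) or DEFAULTS[key]

    return DbConfig(
        host=get("DB_HOST"),
        port=int(get("DB_PORT")),
        name=get("DB_NAME"),
        user=get("DB_USER"),
        password=get("DB_PASS"),
    )
```

Inside the Docker stack, load_db_config(os.environ) would pick up whatever docker-compose.yml injects; outside it, the fallbacks apply.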

api/run_r_script.php also honours RSCRIPT_PATH if you need to override the default location of the Rscript executable.

When the portal is hosted on a different hostname (for example, an Ubuntu server on your LAN), set the following variables—either in your shell or a .env file consumed by Docker Compose—to keep the embedded JupyterHub session aligned with browser security rules:

  • JUPYTER_EXTERNAL_URL: full base URL that the PHP app should point at (e.g. https://niphdev.local)
  • JUPYTERHUB_PORT: published port, if you map JupyterHub to something other than 443 (legacy deployments can continue to set JUPYTER_PORT)
  • DSP_APP_ORIGINS: space-separated list of origins allowed to call notebook APIs (CORS)
  • DSP_FRAME_ANCESTORS: space-separated list of origins permitted to embed JupyterHub in an iframe
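
A minimal Python sketch of how the portal might resolve the JupyterHub URL from these variables. The precedence (JUPYTER_EXTERNAL_URL first, then JUPYTERHUB_PORT, then the legacy JUPYTER_PORT, then 443) and the https://localhost fallback are assumptions drawn from the descriptions above, not the app's confirmed logic; jupyterhub_base_url is a hypothetical name.

```python
from typing import Mapping


def jupyterhub_base_url(env: Mapping[str, str]) -> str:
    """Return the base URL the portal should embed (assumed resolution order)."""
    external = env.get("JUPYTER_EXTERNAL_URL", "").strip()
    if external:
        # An explicit external URL wins; drop a trailing slash so paths join cleanly.
        return external.rstrip("/")
    # JUPYTERHUB_PORT is the current name; JUPYTER_PORT is the legacy one.
    port = env.get("JUPYTERHUB_PORT") or env.get("JUPYTER_PORT") or "443"
    # 443 is the default HTTPS port, so it is omitted from the URL.
    return "https://localhost" if port == "443" else f"https://localhost:{port}"
```

Whatever origin this produces for the portal page must also appear in DSP_APP_ORIGINS and DSP_FRAME_ANCESTORS, otherwise the browser blocks the embedded session.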

Platform roles at a glance

The application enforces the following roles via ist_tbl_users.isu_status and the helper functions in includes/auth.php. Use this matrix to confirm which actions (upload, read, download, approve) each role can take before issuing credentials:

| Role | Primary workspace | Upload / manage data sources | Approve access requests | Request / read / download datasets | Jupyter / R access |
| --- | --- | --- | --- | --- | --- |
| DAC Staff | admin/ area | Full oversight of every dataset, classification, and content entry. | Manage any permission; revoke and audit usage. | Can impersonate workflows when testing, but typically not used for research downloads. | Enable per user via isu_can_run_r; also seeds OAuth credentials. |
| Data Owner | data_owner/ | Create and maintain their own catalogue entries and metadata. | Approve, reject, or revoke requests for the data they own. | Access their own approved files plus anything they have requested from others (once approved). | Optional; grant by setting isu_can_run_r = 1. Only approved files sync into their notebook. |
| Data Contributor | data_hybrid/ | Like owners, contributors can upload and publish datasets delegated to them. | Limited to the resources they registered or steward. | Can request access to other datasets and, once approved, read, download, and analyze them. | Optional per account; ideal for analysts who both publish and consume data. |
| Data User | data_user/ | Browse-only catalogue view. | Cannot approve requests. | May request access, then read/download once a Data Owner or DAC Staff approves the request. | Optional; if enabled, only their approved files appear in Jupyter. |
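
To make the matrix easier to reason about (or to test against), here is a rough Python encoding of it. The Action enum and the can() and syncs_to_notebook() helpers are hypothetical; they key on the role names rather than the isu_status codes, which are not documented here, and allowing DAC Staff to read/download is an interpretation of "full oversight" (the table notes they rarely use it for research downloads).

```python
from enum import Enum


class Action(Enum):
    UPLOAD = "upload"
    APPROVE = "approve"
    REQUEST = "request"
    READ = "read"
    DOWNLOAD = "download"


ROLES = {"DAC Staff", "Data Owner", "Data Contributor", "Data User"}


def can(role: str, action: Action, *, owns_dataset: bool = False,
        request_approved: bool = False) -> bool:
    """Approximate the role matrix for one user acting on one dataset."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if role == "DAC Staff":
        return True  # full oversight of every dataset and permission
    publishes = role in {"Data Owner", "Data Contributor"}
    if action is Action.UPLOAD:
        return publishes
    if action is Action.APPROVE:
        return publishes and owns_dataset  # only for data they own or steward
    if action is Action.REQUEST:
        return True
    # READ and DOWNLOAD: their own data, or data whose access request was approved.
    return (publishes and owns_dataset) or request_approved


def syncs_to_notebook(role: str, *, can_run_r: bool, owns_dataset: bool = False,
                      request_approved: bool = False) -> bool:
    """A file reaches the user's notebook volume only if R access is on and they may read it."""
    return can_run_r and can(role, Action.READ, owns_dataset=owns_dataset,
                             request_approved=request_approved)
```

Keeping the dataset-level conditions (ownership, approval) as explicit flags mirrors how the table qualifies most permissions per dataset rather than per role alone.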

Tip: a user's role and R access flag are updated under Admin → Manage Users. Toggle the “Allow R/Jupyter” switch to control whether uploads are synchronized into their personal notebook volume.

To wire DSP into JupyterHub via OAuth, also provide:

  • DSP_OAUTH_CLIENT_ID / DSP_OAUTH_CLIENT_SECRET
  • DSP_OAUTH_AUTHORIZE_URL, DSP_OAUTH_TOKEN_URL, DSP_OAUTH_USERINFO_URL
  • JUPYTERHUB_OAUTH_CALLBACK
  • JUPYTERHUB_USER_PATH and JUPYTERHUB_USERNAME_TEMPLATE if you need custom routing/usernames
  • JUPYTERHUB_CULL_API_TOKEN (optional): set to enable the idle culler service
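
Before seeding the client, a quick preflight check can catch unset variables. This Python sketch (check_oauth_env and main are hypothetical names) reports every missing required variable and flags endpoint values that are not absolute http(s) URLs; treating the callback and the three DSP endpoints as absolute URLs is an assumption.

```python
import os
import sys
from typing import List, Mapping

REQUIRED = (
    "DSP_OAUTH_CLIENT_ID",
    "DSP_OAUTH_CLIENT_SECRET",
    "DSP_OAUTH_AUTHORIZE_URL",
    "DSP_OAUTH_TOKEN_URL",
    "DSP_OAUTH_USERINFO_URL",
    "JUPYTERHUB_OAUTH_CALLBACK",
)
# Assumption: the last four values are expected to be absolute http(s) URLs.
URL_VARS = REQUIRED[2:]


def check_oauth_env(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the settings look complete."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    for name in URL_VARS:
        value = env.get(name, "")
        if value and not value.startswith(("http://", "https://")):
            problems.append(f"{name} should be an absolute http(s) URL")
    return problems


def main() -> int:
    problems = check_oauth_env(os.environ)
    for problem in problems:
        print(problem, file=sys.stderr)
    culler = "enabled" if os.environ.get("JUPYTERHUB_CULL_API_TOKEN") else "disabled"
    print(f"idle culler: {culler}")
    return 1 if problems else 0
```

Wrapping it as raise SystemExit(main()) gives a non-zero exit status when something is missing, which can gate the seeding step.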

Seed or update the OAuth client after setting these env vars:

docker-compose exec app php scripts/seed_jupyterhub_client.php

The JupyterHub deployment trusts requests and iframe parents from localhost:8082, 127.0.0.1:8082, and https://dsp.niph.org.kh by default. To allow different origins (for example your own DSP deployment), set:

  • DSP_APP_ORIGINS: space-separated list of origins that should be accepted for CORS/websocket requests (e.g. DSP_APP_ORIGINS="https://dsp.niph.org.kh").
  • DSP_FRAME_ANCESTORS: space-separated list of origins allowed to embed the notebook in an iframe (e.g. DSP_FRAME_ANCESTORS="https://dsp.niph.org.kh").
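
The Python sketch below shows how space-separated origin lists like these are commonly turned into an exact-match CORS check and a Content-Security-Policy frame-ancestors value. The function names, the 'self' entry, and the http:// scheme on the two localhost defaults are assumptions; the repository's JupyterHub configuration may apply these variables differently.

```python
from typing import List, Mapping, Optional

# Mirrors the documented defaults; the http:// scheme on the local entries is assumed.
DEFAULT_ORIGINS = ("http://localhost:8082", "http://127.0.0.1:8082", "https://dsp.niph.org.kh")


def parse_origins(raw: Optional[str]) -> List[str]:
    """Split a space-separated origin list; fall back to the defaults when unset or blank."""
    if not raw or not raw.strip():
        return list(DEFAULT_ORIGINS)
    return [origin.rstrip("/") for origin in raw.split()]


def origin_allowed(origin: str, env: Mapping[str, str]) -> bool:
    """Exact-match check of a request's Origin header against DSP_APP_ORIGINS."""
    return origin.rstrip("/") in parse_origins(env.get("DSP_APP_ORIGINS"))


def frame_ancestors_csp(env: Mapping[str, str]) -> str:
    """CSP value limiting which pages may embed JupyterHub in an iframe."""
    return " ".join(["frame-ancestors", "'self'", *parse_origins(env.get("DSP_FRAME_ANCESTORS"))])
```

In a jupyterhub_config.py, a value like this is typically passed as c.JupyterHub.tornado_settings = {"headers": {"Content-Security-Policy": frame_ancestors_csp(os.environ)}}; the spawned single-user servers may need a matching header configured separately.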

JupyterHub is published on host port 443 (configurable via the JUPYTERHUB_PORT environment variable in docker-compose.yml), so a deployment reachable at https://dsp.niph.org.kh works out of the box.

Project directories shared with containers

| Host directory | Container (app) | Container (Jupyter) |
| --- | --- | --- |
| . (project root) | /var/www/html | (none) |
| r_scripts/ | /var/www/html/r_scripts | /home/jovyan/work/r_scripts |
| uploads/jupyter_workspace | /var/www/html/uploads/jupyter_workspace | /home/jovyan/work (per-user mount inside the spawned notebook) |

Uploads remain writable from the PHP container. If you run into permission warnings on macOS/Linux, running chmod -R 777 uploads on the host (or applying a tighter group-based permission scheme) usually resolves them. Because the path is bind-mounted into the dsp_app container, adjust permissions on the host side.

  • Uploaded files are stored under uploads/datasources/ with names like datasource_<unique>_<original-stem>.ext. This keeps paths unique while preserving a readable hint of the original filename. The default PHP upload limit is set to 20M (see docker/custom.ini).

  • The logs/app.log file (created via config.php) records upload activity. If you do not see [DataSource] entries after an upload, confirm that the app container can reach MySQL: docker exec dsp_app php -r 'require "config.php"; echo "connected";'
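
A small Python sketch of the upload naming scheme and size limit described in the first bullet. The sanitising regex, the 16-hex-character unique segment (from secrets.token_hex), and lower-casing the extension are assumptions for illustration; the PHP upload handler may build the unique part differently.

```python
import re
import secrets
from pathlib import PurePosixPath

# PHP's "20M" shorthand means 20 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024


def datasource_filename(original: str) -> str:
    """Build datasource_<unique>_<original-stem>.ext from a client-supplied name."""
    path = PurePosixPath(original)  # .stem/.suffix only look at the final component
    # Keep a readable, filesystem-safe hint of the original stem (assumed rule).
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", path.stem).strip("._") or "file"
    unique = secrets.token_hex(8)  # stand-in for the unique id the PHP code generates
    return f"datasource_{unique}_{stem}{path.suffix.lower()}"


def within_upload_limit(size_bytes: int) -> bool:
    """True when a non-empty upload fits under the configured 20M limit."""
    return 0 < size_bytes <= MAX_UPLOAD_BYTES
```

For example, "Survey Results (2024).csv" becomes datasource_<16 hex chars>_Survey_Results_2024.csv, so two uploads of the same file never collide.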

Architecture Overview

graph LR
    subgraph Client
        U[Browser / API Consumer]
    end

    subgraph Docker Stack
        A[PHP + Apache<br/>dsp_app]
        B[(MySQL 8.0<br/>dsp_db)]
        C[phpMyAdmin<br/>dsp_phpmyadmin]
        D[Jupyter Notebook<br/>dsp_jupyter]
        V1[(uploads/datasources)]
        V2[(r_scripts)]
    end

    U -->|HTTPS/HTTP :8082| A
    U -->|HTTPS/HTTP :8081| C
    U -->|HTTPS :443| D
    A <-->|SQL :3306| B
    C -->|Admin SQL| B
    A -.shared volume .-> V1
    A -.shared volume .-> V2
    D -.shared volume .-> V1
    D -.shared volume .-> V2

Traffic legend: solid lines represent runtime traffic; dotted lines represent bind-mounted volumes that share datasets and R scripts between containers.

Need the raw Mermaid for presentations? See assets/diagrams/data_ecosystem.mmd.

Data Model Snapshot

erDiagram
    IST_TBL_PEOPLE ||--o{ IST_TBL_USERS : "fkisp_id_of"
    IST_TBL_PEOPLE ||--o{ DSPS_TBL_DATASOURCE : "fkisp_id_of"
    DSPS_TBL_TYPEDATASOURCE ||--o{ DSPS_TBL_DATASOURCE : "fkdspstds_id"
    DSPS_TBL_DSPSCATEGORY ||--o{ DSPS_TBL_DATASOURCE : "fkdspscate_id"
    DSPS_TBL_DATASOURCE ||--o{ DSPS_TBL_DATASOURCE_PERMISSION : "fkdspsds_id"
    IST_TBL_PEOPLE ||--o{ DSPS_TBL_DATASOURCE_PERMISSION : "fkisp_id_of (requester)"
    DSPS_TBL_DATASOURCE ||--o{ DSPS_TBL_DATASOURCE_USED : "fkdspsdsused_id"
    IST_TBL_PEOPLE ||--o{ DSPS_TBL_DATASOURCE_USED : "fkisp_id_of (consumer)"

The diagram highlights how every dataset anchors to a person record, while permissions and usage logs capture cross-person interactions for auditing.

Analytics Catalog

Analytics scripts live in r_scripts/ and are exposed through api/run_r_script.php. Each script receives two CLI arguments: the absolute path to a CSV prepared by PHP and a JSON string of runtime parameters.
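
For debugging outside PHP, the calling convention can be reproduced with a Python sketch like the one below. It honours RSCRIPT_PATH and otherwise falls back to Rscript on PATH; the helper names, the r_scripts/ relative location, and the 120-second timeout are assumptions, not taken from api/run_r_script.php.

```python
import json
import os
import shutil
import subprocess
from pathlib import Path
from typing import Any, List, Mapping


def resolve_rscript(env: Mapping[str, str] = os.environ) -> str:
    """Prefer RSCRIPT_PATH, otherwise the first Rscript found on PATH."""
    override = env.get("RSCRIPT_PATH")
    if override:
        return override
    found = shutil.which("Rscript")
    if found is None:
        raise FileNotFoundError("Rscript not found; set RSCRIPT_PATH")
    return found


def build_command(rscript: str, script: str, csv_path: str,
                  params: Mapping[str, Any]) -> List[str]:
    """argv: script path, then the absolute CSV path, then the JSON parameter string."""
    return [
        rscript,
        str(Path("r_scripts") / script),  # assumed location relative to the app root
        str(Path(csv_path).resolve()),     # argument 1: absolute path to the prepared CSV
        json.dumps(dict(params)),          # argument 2: runtime parameters as JSON
    ]


def run_r_script(script: str, csv_path: str, params: Mapping[str, Any],
                 timeout: float = 120.0) -> Any:
    """Run the script and parse the JSON it prints on stdout."""
    cmd = build_command(resolve_rscript(), script, csv_path, params)
    # A list argv (no shell) passes the JSON string through without quoting problems.
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout, check=True)
    return json.loads(result.stdout)
```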

| Script | Purpose | Required parameters | Optional parameters | Output |
| --- | --- | --- | --- | --- |
| data_summary.R | Smoke test that confirms connectivity between PHP and R by echoing the received file path and parameters. | None | Any JSON payload; it is echoed back in params_received. | JSON with message, data_file, and the raw parameter string. |
| descriptive_stats.R | Generates descriptive statistics (count, mean, median, SD, min, max, missing) for every numeric column and returns up to five preview rows. | None (operates on all numeric columns) | encoding (default UTF-8); guess_max to control type inference | JSON payload containing numeric_columns keyed by column name, plus sample_rows. Missing values are encoded as null. |
| category_frequency.R | Builds a frequency distribution for a categorical column; useful for validating controlled vocabularies or spotting dominant categories. | column: name of the column to profile | top_n (default 10); encoding (default UTF-8); include_missing (default false) | JSON with the analyzed column, an echo of the configuration, and frequencies (value/count rows) sorted by frequency. |
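
To sanity-check category_frequency.R output, a rough Python equivalent of the documented behaviour can help. It is a re-implementation sketch, not a port of the R code: the "config" key name, treating blank cells and "NA" as missing, and alphabetical tie-breaking are all assumptions.

```python
import csv
from collections import Counter
from typing import Any, Dict, Iterable, Optional

MISSING_TOKENS = {"", "NA"}  # assumption: blank cells and "NA" count as missing


def category_frequency(lines: Iterable[str], column: str, top_n: int = 10,
                       include_missing: bool = False) -> Dict[str, Any]:
    """Frequency table for one categorical column, most frequent first."""
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column not found: {column!r}")
    counts: Counter = Counter()
    for row in reader:
        raw: Optional[str] = row.get(column)
        value = None if raw is None or raw.strip() in MISSING_TOKENS else raw.strip()
        counts[value] += 1
    if not include_missing:
        counts.pop(None, None)
    # Highest count first; ties broken alphabetically (missing sorts as "").
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], "" if kv[0] is None else kv[0]))
    return {
        "column": column,
        "config": {"top_n": top_n, "include_missing": include_missing},
        "frequencies": [{"value": v, "count": c} for v, c in ranked[:top_n]],
    }


def category_frequency_file(path: str, column: str, encoding: str = "utf-8",
                            **options: Any) -> Dict[str, Any]:
    """Open a CSV with the given encoding and profile one column."""
    with open(path, newline="", encoding=encoding) as fh:
        return category_frequency(fh, column, **options)
```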

Adding another R script

  1. Drop the script into r_scripts/ and ensure it prints JSON via jsonlite::toJSON(...).
  2. Append the filename and human-readable label to $allowed_r_scripts inside api/run_r_script.php.
  3. Document the new script in the table above so stakeholders understand its expected parameters and output contract.
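
The allowlist in step 2 is what keeps arbitrary files from being executed. The Python sketch below mirrors that idea with a hypothetical ALLOWED_R_SCRIPTS dict (its labels are invented for the example): only exact, allowlisted filenames are accepted, so an input such as ../../etc/passwd is rejected before any path is built.

```python
from pathlib import Path

# Hypothetical Python mirror of $allowed_r_scripts in api/run_r_script.php.
ALLOWED_R_SCRIPTS = {
    "data_summary.R": "Data summary (smoke test)",
    "descriptive_stats.R": "Descriptive statistics",
    "category_frequency.R": "Category frequency",
}


def resolve_allowed_script(name: str, scripts_dir: Path = Path("r_scripts")) -> Path:
    """Return the script path only if the exact filename is allowlisted and present."""
    if name not in ALLOWED_R_SCRIPTS:
        # Exact dict lookup: traversal strings and unknown names never match.
        raise PermissionError(f"script not allowed: {name!r}")
    path = scripts_dir / name
    if not path.is_file():
        raise FileNotFoundError(f"allowlisted but missing on disk: {path}")
    return path
```

Checking membership before touching the filesystem means a typo in step 1 (file not copied) and a hostile request (file not listed) fail with different, diagnosable errors.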

Useful commands

# Stop and remove containers, keeping the database volume
docker-compose down

# Stop containers and remove the database volume (fresh start)
docker-compose down -v

# Tail logs from all services
docker-compose logs -f

Running Tests

PHPUnit is configured via Composer:

# Install dependencies (first run)
composer install

# Execute the test suite
composer test

If you prefer running inside the app container:

docker-compose exec app composer install
docker-compose exec app composer test

Troubleshooting

  • MySQL already initialised: remove the mysql_data named volume (docker-compose down -v) to force a clean import.
  • Rscript not found: make sure the PHP container has R installed by rebuilding it (docker-compose build). Set RSCRIPT_PATH in docker-compose.yml if R lives elsewhere.
  • Port clashes: adjust the published ports (8082, 8081, 443, 3307) in docker-compose.yml to free ones on your machine.
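  • OAuth tables missing: run docker-compose exec -T db sh -c 'mysql -u root -p"$MYSQL_ROOT_PASSWORD" niph_dsps' < db/migrations/20241103_oauth_tables.sql (the -T flag lets the SQL file be piped in without a TTY; this assumes the db service sets MYSQL_ROOT_PASSWORD), then insert your JupyterHub client credentials.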

Happy hacking!
