DSP Platform Docker Setup
This repository now includes a Docker-based development environment that brings up:
- PHP + Apache web server (with Rscript available for the automated analyses)
- MySQL 8.0 database seeded with the `db/niph_dsps.sql` dump on first run
- phpMyAdmin for administering the database through the browser
- JupyterHub (per-user R-enabled JupyterLab) for isolated notebook environments
Prerequisites
- Docker Desktop (or Docker Engine + Docker Compose plugin)
- ~2 GB of free disk space for the base images
Quick start
```bash
# From the project root
docker-compose up --build
```
Once the stack is healthy you can reach the services at:
| Service | URL | Notes |
|---|---|---|
| PHP application | http://localhost:8082 | Uses DB credentials from docker-compose.yml |
| phpMyAdmin | http://localhost:8081 | Login with dsp_user / dsp_pass (or MySQL root) |
| JupyterHub | https://localhost | OAuth handshake redirects to your private notebook (published on port 443) |
| MySQL | localhost:3307 (host access) | Database niph_dsps, user dsp_user / dsp_pass |
The first `docker-compose up` imports `db/niph_dsps.sql` automatically. Subsequent runs reuse the `mysql_data` volume, so the dump is not re-imported and your data persists.
Configuration
Key environment variables are defined in docker-compose.yml. Adjust them if you need different credentials or ports. The PHP application now reads its database configuration from the following variables (with sensible defaults for non-Docker setups):
- `DB_HOST`
- `DB_PORT`
- `DB_NAME`
- `DB_USER`
- `DB_PASS`
api/run_r_script.php also honours RSCRIPT_PATH if you need to override the default location of the Rscript executable.
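To illustrate the "environment variable first, default otherwise" lookup described above, here is a minimal Python sketch. The fallback values in `DEFAULTS` are hypothetical placeholders (the real defaults live in the PHP configuration), and `load_db_config` / `resolve_rscript` are illustrative names, not functions from this repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical fallbacks for a non-Docker setup; the real defaults are
# defined in the PHP configuration and may differ.
DEFAULTS = {
    "DB_HOST": "127.0.0.1",
    "DB_PORT": "3306",
    "DB_NAME": "niph_dsps",
    "DB_USER": "dsp_user",
    "DB_PASS": "",
}


@dataclass(frozen=True)
class DbConfig:
    host: str
    port: int
    name: str
    user: str
    password: str


def _get(env: Mapping[str, str], key: str) -> str:
    # An unset or empty variable falls back to the hypothetical default.
    return env.get(key) or DEFAULTS[key]


def load_db_config(env: Mapping[str, str] = os.environ) -> DbConfig:
    """Build connection settings from DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASS."""
    return DbConfig(
        host=_get(env, "DB_HOST"),
        port=int(_get(env, "DB_PORT")),
        name=_get(env, "DB_NAME"),
        user=_get(env, "DB_USER"),
        password=_get(env, "DB_PASS"),
    )


def resolve_rscript(env: Mapping[str, str] = os.environ) -> str:
    # Assumes Rscript is on PATH unless RSCRIPT_PATH overrides it.
    return env.get("RSCRIPT_PATH") or "Rscript"
```

For example, `load_db_config({"DB_HOST": "db"})` overrides only the host and keeps every other field at its fallback, which is how a partially filled `.env` would behave under this assumption.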
When the portal is hosted on a different hostname (for example, an Ubuntu server on your LAN), set the following variables—either in your shell or a .env file consumed by Docker Compose—to keep the embedded JupyterHub session aligned with browser security rules:
- `JUPYTER_EXTERNAL_URL` – full base URL that the PHP app should point at (e.g. `https://niphdev.local`)
- `JUPYTERHUB_PORT` – published port if you map JupyterHub to something other than `443` (legacy deployments can continue to set `JUPYTER_PORT`)
- `DSP_APP_ORIGINS` – space-separated list of origins allowed to call notebook APIs (CORS)
- `DSP_FRAME_ANCESTORS` – space-separated list of origins permitted to embed JupyterHub in an iframe
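The sketch below shows one plausible way to derive the JupyterHub base URL from these variables. The precedence (`JUPYTER_EXTERNAL_URL` first, then `JUPYTERHUB_PORT`, then the legacy `JUPYTER_PORT`) and the `localhost` fallback host are assumptions, and `jupyter_base_url` is a hypothetical helper rather than code from the PHP app.

```python
from typing import Mapping


def jupyter_base_url(env: Mapping[str, str], default_host: str = "localhost") -> str:
    """Return the base URL the portal should embed (assumed precedence, see lead-in)."""
    explicit = env.get("JUPYTER_EXTERNAL_URL")
    if explicit:
        # A full URL wins outright; drop a trailing slash so paths can be appended.
        return explicit.rstrip("/")
    # JUPYTERHUB_PORT is preferred; JUPYTER_PORT is the legacy name.
    port = env.get("JUPYTERHUB_PORT") or env.get("JUPYTER_PORT") or "443"
    if port == "443":
        return f"https://{default_host}"  # default HTTPS port needs no suffix
    return f"https://{default_host}:{port}"
```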
Platform roles at a glance
The application enforces the following roles via ist_tbl_users.isu_status and the helper functions in includes/auth.php. Use this matrix to confirm which actions (upload, read, download, approve) each role can take before issuing credentials:
| Role | Primary workspace | Upload / manage data sources | Approve access requests | Request / read / download datasets | Jupyter / R access |
|---|---|---|---|---|---|
| DAC Staff | `admin/` area | ✅ Full oversight of every dataset, classification, and content entry. | ✅ Manage any permission, revoke and audit usage. | ✅ Can impersonate workflows when testing, but typically not used for research downloads. | ✅ Enable per-user via `isu_can_run_r`; also seeds OAuth credentials. |
| Data Owner | `data_owner/` | ✅ Create and maintain their own catalogue entries and metadata. | ✅ Approve, reject, or revoke requests for the data they own. | ✅ Access their own approved files plus anything they have requested from others. | ✅ Optional; grant by setting `isu_can_run_r = 1`. Only approved files sync into their notebook. |
| Data Contributor | `data_hybrid/` | ✅ Similar to owners, contributors can upload/publish datasets delegated to them. | ✅ Limited to the resources they registered or steward. | ✅ Can request access to other datasets and, once approved, read/download/analyze. | ✅ Optional per account; ideal for analysts who both publish and consume data. |
| Data User | `data_user/` | ❌ Browse-only catalogue view. | ❌ Cannot approve requests. | ✅ May request access, then read/download once a Data Owner or DAC Staff approves the request. | ✅ Optional; if enabled, only their approved files appear in Jupyter. |
Tip: updating a user’s role or R access flag happens under Admin → Manage Users. Toggle the “Allow R/Jupyter” switch to control whether uploads are synchronized into their personal notebook volume.
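As a cross-check when reviewing access, the matrix can be read as a handful of predicates. This Python sketch transcribes the table; the `Role` values and the function names are hypothetical and do not reproduce the helpers in `includes/auth.php` or the actual `isu_status` codes.

```python
from enum import Enum


class Role(Enum):
    # Hypothetical labels; the real values live in ist_tbl_users.isu_status.
    DAC_STAFF = "dac_staff"
    DATA_OWNER = "data_owner"
    DATA_CONTRIBUTOR = "data_contributor"
    DATA_USER = "data_user"


PUBLISHERS = {Role.DAC_STAFF, Role.DATA_OWNER, Role.DATA_CONTRIBUTOR}


def can_upload(role: Role) -> bool:
    """Upload / manage data sources: every role except Data User."""
    return role in PUBLISHERS


def can_approve(role: Role, stewards_dataset: bool) -> bool:
    """DAC Staff approve anything; owners/contributors only what they own or steward."""
    if role is Role.DAC_STAFF:
        return True
    if role in (Role.DATA_OWNER, Role.DATA_CONTRIBUTOR):
        return stewards_dataset
    return False


def can_download(role: Role, stewards_dataset: bool, request_approved: bool) -> bool:
    """Read/download: own datasets, or datasets whose access request was approved."""
    if role is Role.DAC_STAFF:
        return True
    return stewards_dataset or request_approved


def has_jupyter(isu_can_run_r: int) -> bool:
    """Jupyter/R access is opt-in per account for every role."""
    return isu_can_run_r == 1
```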
To wire DSP into JupyterHub via OAuth, also provide:
- `DSP_OAUTH_CLIENT_ID` / `DSP_OAUTH_CLIENT_SECRET`
- `DSP_OAUTH_AUTHORIZE_URL`, `DSP_OAUTH_TOKEN_URL`, `DSP_OAUTH_USERINFO_URL`
- `JUPYTERHUB_OAUTH_CALLBACK`
- `JUPYTERHUB_USER_PATH` and `JUPYTERHUB_USERNAME_TEMPLATE` if you need custom routing/usernames
- `JUPYTERHUB_CULL_API_TOKEN` (optional) – set to enable the idle culler service
Seed or update the OAuth client after setting these env vars:
```bash
docker-compose exec app php scripts/seed_jupyterhub_client.php
```
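Before running the seed script it can help to confirm the OAuth variables are present and well formed. This Python sketch treats the client ID, client secret, the three `DSP_OAUTH_*_URL` endpoints and `JUPYTERHUB_OAUTH_CALLBACK` as mandatory, and expects the endpoint and callback values to be absolute http(s) URLs. That split between required and optional variables is an assumption, and `oauth_config_problems` is a hypothetical helper, not part of `scripts/seed_jupyterhub_client.php`.

```python
from typing import List, Mapping
from urllib.parse import urlparse

# Assumed mandatory variables; JUPYTERHUB_USER_PATH, JUPYTERHUB_USERNAME_TEMPLATE
# and JUPYTERHUB_CULL_API_TOKEN are optional and deliberately not checked.
REQUIRED_OAUTH_VARS = (
    "DSP_OAUTH_CLIENT_ID",
    "DSP_OAUTH_CLIENT_SECRET",
    "DSP_OAUTH_AUTHORIZE_URL",
    "DSP_OAUTH_TOKEN_URL",
    "DSP_OAUTH_USERINFO_URL",
    "JUPYTERHUB_OAUTH_CALLBACK",
)
# The endpoints and the callback are expected to be absolute URLs.
URL_VARS = REQUIRED_OAUTH_VARS[2:]


def oauth_config_problems(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    problems = []
    for key in REQUIRED_OAUTH_VARS:
        if not env.get(key):
            problems.append(f"{key} is not set")
    for key in URL_VARS:
        value = env.get(key)
        if value:
            parsed = urlparse(value)
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                problems.append(f"{key} is not an absolute http(s) URL")
    return problems
```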
The JupyterHub deployment trusts requests and iframe parents from localhost:8082, 127.0.0.1:8082, and https://dsp.niph.org.kh by default. To allow different origins (for example your own DSP deployment), set:
- `DSP_APP_ORIGINS` – space-separated list of origins that should be accepted for CORS/websocket requests (e.g. `DSP_APP_ORIGINS="https://dsp.niph.org.kh"`).
- `DSP_FRAME_ANCESTORS` – space-separated list of origins allowed to embed the notebook in an iframe (e.g. `DSP_FRAME_ANCESTORS="https://dsp.niph.org.kh"`).
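The sketch below shows how space-separated origin lists of this kind are typically consumed: split on whitespace, fall back to the defaults when unset, compare incoming `Origin` headers exactly, and emit a `Content-Security-Policy: frame-ancestors` header. It is a Python illustration under assumptions (the `http://` scheme for the localhost defaults and all function names are hypothetical), not the JupyterHub configuration this stack actually ships.

```python
from typing import Optional, Tuple

# Defaults named in the text above; the http:// scheme for the localhost
# entries is an assumption.
DEFAULT_ORIGINS: Tuple[str, ...] = (
    "http://localhost:8082",
    "http://127.0.0.1:8082",
    "https://dsp.niph.org.kh",
)


def parse_origin_list(value: Optional[str],
                      default: Tuple[str, ...] = DEFAULT_ORIGINS) -> Tuple[str, ...]:
    """Split a variable such as DSP_APP_ORIGINS; fall back when unset or blank."""
    items = tuple(value.split()) if value else ()
    return items or default


def is_allowed_origin(origin: str, allowed: Tuple[str, ...]) -> bool:
    # Browsers send Origin as scheme://host[:port], so an exact match is required.
    return origin in allowed


def frame_ancestors_header(ancestors: Tuple[str, ...]) -> Tuple[str, str]:
    """CSP header restricting which pages may embed the hub in an iframe."""
    return ("Content-Security-Policy", "frame-ancestors " + " ".join(ancestors))
```

Exact matching means `https://dsp.niph.org.kh` and `https://dsp.niph.org.kh:443` or an `http://` variant are treated as different origins, so list every form your users actually reach the portal through.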
JupyterHub is published on host port 443 (configurable via the JUPYTERHUB_PORT environment variable in docker-compose.yml), so a deployment reachable at https://dsp.niph.org.kh works out of the box.
Project directories shared with containers
| Host directory | Container (app) | Container (Jupyter) |
|---|---|---|
| `.` (project root) | `/var/www/html` | – |
| `r_scripts/` | `/var/www/html/r_scripts` | `/home/jovyan/work/r_scripts` |
| `uploads/jupyter_workspace` | `/var/www/html/uploads/jupyter_workspace` | `/home/jovyan/work` (per-user mount inside spawned notebook) |
Uploads remain writable from the PHP container. If you run into permission warnings on macOS/Linux, running `chmod -R 777 uploads` on the host (or applying a tighter group-based permission) usually resolves it. The path is bind-mounted into the `dsp_app` container, so adjust permissions on the host side.
- Uploaded files are stored under `uploads/datasources/` with names like `datasource_<unique>_<original-stem>.ext`. This keeps paths unique while preserving a readable hint of the original filename. The default PHP upload limit is set to `20M` (see `docker/custom.ini`).
- The `logs/app.log` file (created via `config.php`) records upload activity. If you do not see `[DataSource]` entries after an upload, confirm the app container can reach MySQL (`docker exec dsp_app php -r 'require "config.php"; echo "connected";'`).
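The naming scheme `datasource_<unique>_<original-stem>.ext` can be sketched in Python as follows. The sanitisation rule, the lower-cased extension and the 12-hex-character token are assumptions made for illustration; the PHP upload handler may generate its unique part and clean the stem differently. `datasource_filename` is a hypothetical name.

```python
import re
import secrets
from pathlib import PurePath
from typing import Optional


def datasource_filename(original_name: str, unique: Optional[str] = None) -> str:
    """Build datasource_<unique>_<original-stem>.ext from an uploaded filename."""
    path = PurePath(original_name)  # stem/suffix come from the last path component only
    # Replace anything outside [A-Za-z0-9_-] so the stem is safe on disk and in URLs.
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", path.stem).strip("_") or "file"
    ext = path.suffix.lower()  # normalising the extension is an assumption
    token = unique or secrets.token_hex(6)  # 12 random hex characters
    return f"datasource_{token}_{stem}{ext}"
```

For example, `datasource_filename("My Survey (2023).csv", unique="abc123")` yields `datasource_abc123_My_Survey_2023.csv`, and any directory components a client sends (such as `../../`) are discarded.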
Architecture Overview
```mermaid
graph LR
    subgraph Client
        U[Browser / API Consumer]
    end
    subgraph Docker Stack
        A[PHP + Apache<br/>dsp_app]
        B[(MySQL 8.0<br/>dsp_db)]
        C[phpMyAdmin<br/>dsp_phpmyadmin]
        D[Jupyter Notebook<br/>dsp_jupyter]
        V1[(uploads/datasources)]
        V2[(r_scripts)]
    end
    U -->|HTTPS/HTTP :8082| A
    U -->|HTTPS/HTTP :8081| C
    U -->|HTTPS :443| D
    A <-->|SQL :3306| B
    C -->|Admin SQL| B
    A -.shared volume .-> V1
    A -.shared volume .-> V2
    D -.shared volume .-> V1
    D -.shared volume .-> V2
```
Traffic legend: solid lines represent runtime traffic; dotted lines represent bind-mounted volumes that synchronize datasets and R scripts between containers.
Need the raw Mermaid for presentations? See `assets/diagrams/data_ecosystem.mmd`.
Data Model Snapshot
```mermaid
erDiagram
    IST_TBL_PEOPLE ||--o{ IST_TBL_USERS : "fkisp_id_of"
    IST_TBL_PEOPLE ||--o{ DSPS_TBL_DATASOURCE : "fkisp_id_of"
    DSPS_TBL_TYPEDATASOURCE ||--o{ DSPS_TBL_DATASOURCE : "fkdspstds_id"
    DSPS_TBL_DSPSCATEGORY ||--o{ DSPS_TBL_DATASOURCE : "fkdspscate_id"
    DSPS_TBL_DATASOURCE ||--o{ DSPS_TBL_DATASOURCE_PERMISSION : "fkdspsds_id"
    IST_TBL_PEOPLE ||--o{ DSPS_TBL_DATASOURCE_PERMISSION : "fkisp_id_of (requester)"
    DSPS_TBL_DATASOURCE ||--o{ DSPS_TBL_DATASOURCE_USED : "fkdspsdsused_id"
    IST_TBL_PEOPLE ||--o{ DSPS_TBL_DATASOURCE_USED : "fkisp_id_of (consumer)"
```
The diagram highlights how every dataset anchors to a person record, while permissions and usage logs capture cross-person interactions for auditing.
Analytics Catalog
Analytics scripts live in r_scripts/ and are exposed through api/run_r_script.php. Each script receives two CLI arguments: the absolute path to a CSV prepared by PHP and a JSON string of runtime parameters.
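To make the two-argument contract concrete, here is a Python sketch of an equivalent invocation. It is not the code in `api/run_r_script.php`; the function names and the 120-second timeout are assumptions, and it assumes the script writes a single JSON document to stdout.

```python
import json
import subprocess
from typing import Any, List, Mapping


def build_rscript_command(rscript: str, script_path: str, csv_path: str,
                          params: Mapping[str, Any]) -> List[str]:
    # Argument 1: absolute CSV path; argument 2: JSON string of runtime parameters.
    return [rscript, script_path, csv_path, json.dumps(params)]


def run_r_script(rscript: str, script_path: str, csv_path: str,
                 params: Mapping[str, Any], timeout: float = 120.0) -> Any:
    """Run the script and decode the JSON it prints on stdout."""
    cmd = build_rscript_command(rscript, script_path, csv_path, params)
    # Passing an argv list (no shell) means the JSON string needs no shell quoting.
    completed = subprocess.run(cmd, capture_output=True, text=True,
                               check=True, timeout=timeout)
    return json.loads(completed.stdout)
```

Calling `run_r_script("Rscript", "/var/www/html/r_scripts/category_frequency.R", "/tmp/upload.csv", {"column": "sex"})` inside the app container would, under these assumptions, return the decoded JSON payload described in the table below; `check=True` raises if R exits with a non-zero status.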
| Script | Purpose | Required Parameters | Optional Parameters | Output |
|---|---|---|---|---|
| `data_summary.R` | Smoke-test script that confirms connectivity between PHP and R, echoing the received file path and parameters. | None | Any JSON payload is echoed back in `params_received`. | JSON with `message`, `data_file`, and the raw parameter string. |
| `descriptive_stats.R` | Generates descriptive statistics for every numeric column (count, mean, median, SD, min, max, missing) and returns up to five preview rows. | None (operates on all numeric columns). | `encoding` (default `UTF-8`), `guess_max` to control type inference. | JSON payload containing `numeric_columns` keyed by column name plus `sample_rows`. Missing values are encoded as `null`. |
| `category_frequency.R` | Builds a frequency distribution for a categorical column. Useful for validating controlled vocabularies or spotting dominant categories. | `column` – name of the column to profile. | `top_n` (default 10), `encoding` (default `UTF-8`), `include_missing` (`false` by default). | JSON with the analyzed `column`, a configuration echo, and `frequencies` (value/count rows) sorted by frequency. |
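For checking `category_frequency.R` results by hand, the following Python sketch mirrors the documented parameters (`column`, `top_n`, `include_missing`). It is an approximation under stated assumptions, not a port of the R script: the missing-value markers (`""` and `"NA"`, readr's defaults), the tie-breaking order and the `config` key name are all hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Any, Dict, Optional

# Assumed missing-value markers, matching readr's default na = c("", "NA").
MISSING_MARKERS = {"", "NA"}


def category_frequency(csv_text: str, column: str, top_n: int = 10,
                       include_missing: bool = False) -> Dict[str, Any]:
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column not found: {column}")
    counts: Counter = Counter()
    for row in reader:
        raw: Optional[str] = row.get(column)
        if raw is None or raw in MISSING_MARKERS:
            if not include_missing:
                continue  # drop missing values unless explicitly requested
            raw = None  # report every missing marker as one null bucket
        counts[raw] += 1
    # Most frequent first; ties broken alphabetically, with the missing bucket last.
    ranked = sorted(counts.items(),
                    key=lambda kv: (-kv[1], kv[0] is None, kv[0] or ""))
    return {
        "column": column,
        "config": {"top_n": top_n, "include_missing": include_missing},
        "frequencies": [{"value": v, "count": n} for v, n in ranked[:top_n]],
    }
```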
Adding another R script
- Drop the script into `r_scripts/` and ensure it prints JSON via `jsonlite::toJSON(...)`.
- Append the filename and human-readable label to `$allowed_r_scripts` inside `api/run_r_script.php`.
- Document the new script in the table above so stakeholders understand its expected parameters and output contract.
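The allowlist step matters because it keeps the API from executing arbitrary files. A Python sketch of the idea follows: only an exact key in the mapping is accepted, so names such as `../config.php` are rejected before any path is built. The labels and `resolve_r_script` are hypothetical; the real mapping is `$allowed_r_scripts` in `api/run_r_script.php`.

```python
from pathlib import PurePosixPath

# Mirrors the idea of $allowed_r_scripts (filename -> label); labels are hypothetical.
ALLOWED_R_SCRIPTS = {
    "data_summary.R": "Data summary (smoke test)",
    "descriptive_stats.R": "Descriptive statistics",
    "category_frequency.R": "Category frequency",
}


def resolve_r_script(requested: str,
                     scripts_dir: str = "/var/www/html/r_scripts") -> PurePosixPath:
    """Return the script path only if the requested name is an exact allowlist key."""
    if requested not in ALLOWED_R_SCRIPTS:
        # Exact-key matching also rejects traversal attempts such as "../config.php".
        raise ValueError(f"script not allowed: {requested!r}")
    return PurePosixPath(scripts_dir) / requested
```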
Useful commands
```bash
# Stop and remove containers, keeping the database volume
docker-compose down
# Stop containers and remove the database volume (fresh start)
docker-compose down -v
# Tail logs from all services
docker-compose logs -f
```
Running Tests
PHPUnit is configured via Composer:
```bash
# Install dependencies (first run)
composer install
# Execute the test suite
composer test
```
If you prefer running inside the app container:
```bash
docker-compose exec app composer install
docker-compose exec app composer test
```
Troubleshooting
- MySQL already initialised: remove the `mysql_data` named volume (`docker-compose down -v`) to force a clean import.
- Rscript not found: ensure the PHP container has R installed (run `docker-compose build` again). Set `RSCRIPT_PATH` in `docker-compose.yml` if R lives elsewhere.
- Port clashes: adjust the published ports (`8082`, `8081`, `443`, `3307`) in `docker-compose.yml` to free ones on your machine.
- Need the OAuth tables?: run `docker-compose exec -T db mysql -u root -p'<root password>' niph_dsps < db/migrations/20241103_oauth_tables.sql`, replacing `<root password>` with the MySQL root password configured for the `db` service, then insert your JupyterHub client credentials. The `-T` flag stops Compose from allocating a TTY, which would otherwise interfere with the redirected SQL file.
Happy hacking!