Readme
This file details the assumptions and workflow of the project. There is a
ticket system to keep track of what has been done and what still needs to be
done.
Installation
GHC
We develop in a bare Haskell Platform environment. Stack is not used at the
moment, due to the complexity of importing local packages that are not yet in
Hackage.
At the moment, the code needs to work with our Ubuntu 16.04 LTS (Xenial)
server, which uses GHC 7.10.3. On that distribution, it should be enough to
do:
sudo apt install haskell-platform{,-doc,-prof}
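Whichever way GHC gets installed, it may be worth confirming that the version matches the server. The following is a minimal sketch; `ghc_version_ok` is a hypothetical helper name, not something from this repository, and the expected version is simply the one named above.

```shell
# Hypothetical helper: succeeds only if the given version string is the one
# the server uses (7.10.3, per this README).
ghc_version_ok() {
    [ "$1" = "7.10.3" ]
}

# `ghc --numeric-version` prints only the version number.
if command -v ghc >/dev/null 2>&1; then
    installed=$(ghc --numeric-version)
    if ghc_version_ok "$installed"; then
        echo "GHC $installed matches the server"
    else
        echo "GHC $installed differs from the server's 7.10.3" >&2
    fi
else
    echo "ghc not found on PATH" >&2
fi
```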
On other OSes, the easiest way to get this specific version is perhaps to use
the generic installer:
wget -O /tmp/hp.tar.gz \
https://www.haskell.org/platform/download/7.10.3/haskell-platform-7.10.3-unknown-posix-x86_64.tar.gz
tar xf /tmp/hp.tar.gz
sudo ./install-haskell-platform.sh
# We also need to change some flags
sed -i 's/\(.*"C compiler flags",\s*"\)\(.*\)/\1-fno-PIE \2/g
;s/\(.*"C compiler link flags",\s*"\)\(.*\)/\1-no-pie \2/g
;s/\(.*"ld flags",\s*"\)\(.*\)/\1-no-pie \2/g' \
/usr/local/haskell/ghc-7.10.3-x86_64/lib/ghc-7.10.3/settings
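Since the command above edits the settings file in place, it can be reassuring to preview the substitutions first. The sketch below runs the same three expressions, without `-i`, on a scratch file. The sample lines are an assumption about the shape of the settings entries, not copied from a real installation, and `\s` assumes GNU sed.

```shell
# Scratch file with sample lines in the shape the sed script expects.
# These sample values are assumptions, not taken from a real settings file.
scratch=$(mktemp)
cat > "$scratch" << 'EOF'
 ("C compiler flags", "-fno-stack-protector"),
 ("C compiler link flags", ""),
 ("ld flags", ""),
EOF

# Same three substitutions as above, without -i, so nothing is modified.
patched=$(sed 's/\(.*"C compiler flags",\s*"\)\(.*\)/\1-fno-PIE \2/g
;s/\(.*"C compiler link flags",\s*"\)\(.*\)/\1-no-pie \2/g
;s/\(.*"ld flags",\s*"\)\(.*\)/\1-no-pie \2/g' "$scratch")

# Show the patched lines for inspection, then clean up.
printf '%s\n' "$patched"
rm -f "$scratch"
```

To preview against the real file instead, pass its path to the same `sed` invocation in place of `"$scratch"`.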
Database
Since the database is SQLite3, we need the SQLite binary and libraries. On
Debian-based distributions, this amounts to:
sudo apt install sqlite3 libsqlite3-dev
On Windows, you can get the required executables and DLLs at
sqlite.org.
The initial live database can later be built with the database-builder.exe
binary, like so:
./database-builder.exe -o advise-me.db
Web server
To run the binary locally, you can use any web server with CGI support. On
Debian-based distributions, the following sets up Apache to serve CGI scripts
from the /usr/lib/cgi-bin directory:
sudo apt install apache2
sudo a2enmod cgid
For other OSes, check this
guide.
Haskell environment
The source code of the project is contained in Git and Subversion
repositories. To obtain it:
git clone \
https://github.com/ideas-edu/ideas
cd ideas; make src/Ideas/Main/Revision.hs; cd -
svn checkout \
https://ideastest.science.uu.nl/svn/ideas/Tutors/math-types
svn checkout \
https://ideastest.science.uu.nl/svn/ideas/Tutors/Advise-Me/trunk
Install the sandbox:
cd trunk
cabal sandbox init
cabal sandbox add-source ../ideas
cabal sandbox add-source ../math-types
cabal install \
--only-dependencies \
--enable-tests \
--enable-executable-profiling \
--enable-library-profiling
cabal configure \
--enable-tests \
--enable-executable-profiling \
--enable-coverage
We use make, because there are many different files and interdependencies.
Reading the Makefile should give an idea of the workflow. It is also
recommended to make a config.mk file, overriding the variables in the
Makefile so that they point to the correct directories:
tee config.mk << EOF
IDEAS_DIR = ../ideas/src
MATHTYPES_DIR = ../math-types/src
CGI_BIN = /usr/lib/cgi-bin
EOF
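A mistyped path in config.mk tends to surface only later, as a confusing build error. As a rough sketch, the paths can be checked up front. `missing_config_dirs` is a hypothetical helper, not part of the repository, and it assumes simple `NAME = path` lines with paths that contain no make variables.

```shell
# Hypothetical helper: print every directory named in a config.mk-style file
# that does not exist. Relative paths are resolved against the directory
# containing that file.
missing_config_dirs() {
    config=$1
    base=$(dirname "$config")
    # Keep only the value of lines shaped like "NAME = value"; anything else
    # (blank lines, comments) does not match and is not printed.
    sed -n 's/^[A-Za-z_][A-Za-z0-9_]*[[:space:]]*=[[:space:]]*\(.*\)$/\1/p' "$config" |
    while IFS= read -r dir; do
        case $dir in
            /*) path=$dir ;;
            *)  path=$base/$dir ;;
        esac
        [ -d "$path" ] || printf '%s\n' "$dir"
    done
}

# Usage: report missing directories for the config.mk in the current
# directory, if there is one.
if [ -f config.mk ]; then
    missing_config_dirs config.mk
fi
```

No output means every listed directory was found.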
Bayesian networks
To create the Bayesian networks, Genie is used. We used to interface with the
SMILE library for using the networks, but that is now done in Haskell itself
by transforming the original .xdsl files into a Haskell interface. See
network-builder.exe.
Compiling
Now, we can compile the binaries. make processing should take care of
everything for us, but of course the binaries can also be created by cabal
separately.
Note that there is an xlsx cabal flag that is on by default. The flag exists
because building the xlsx library (used for reading human assessments) is not
straightforward on every machine. If you find that the xlsx library is
causing issues and you do not need its functionality, do
cabal configure --flags="-xlsx" before building.
Project structure
The following directories are important to know.
app/
: Haskell executables and scripts.
src/
: Haskell sources of the Advise-Me library.
tests/
: Haskell sources of the test suite.
test-data/
: Test input requests for the testing suite and shell scripts to
send test input to the server.
hpc-*
: Haskell code coverage reports as generated by the recipe in the
Makefile.
pilots/
:
raw/
: Databases, mostly untouched as they were collected during pilot
or evaluation studies.
processed/
: Databases that are created from the raw data after the fact, by processing
it in various ways using database-builder.exe. The Makefile contains recipes
to create these files.
assessments/
: Excel spreadsheets that mirror the names in the processed/ directory. These
spreadsheets contain evaluations by humans of the same data. They can be used
to evaluate or debug the application, using report.exe, or to change or
annotate the processed data. There are also documents in this directory that
are not machine-readable, containing remarks on IDEAS' output by a human
examiner.
regressions/
: This directory contains .exp files that concatenate the expected output of
the processed databases. This allows for a rudimentary regression test, using
diff.
networks/
: Bayesian networks created in Genie, and a supporting XML file
containing translations of the labels.
Apart from the main advise-me.cgi binary, there are a couple of auxiliary
binaries to use:
- The advise-me.cgi binary provides the service: you provide input via a POST
  or GET request, and it will respond with the information you requested.
  There are also additional commands that can be given to make it do other
  things, like rerunning or reporting. Some of these are deprecated, and they
  are not well documented.
- network-builder.exe builds, given an .xdsl file from networks/, the
  interface file necessary for running that network in our Haskell
  environment. Unfortunately, it cannot currently be built as a binary: it
  depends on the Advise-Me library, which itself depends on the files that it
  is supposed to generate. From cabal-install version 2, I believe that we
  could use its autogeneration facilities. For now, as a crutch, we run
  app/NetworkBuilder.hs as a script; see the Makefile.
- The database-builder.exe binary is a tool to create the initial database
  and process existing databases. It gives us the ability to reuse input
  data collected from a previous run and generate new output for it, as well
  as annotate the database with information tables. As there are many flags
  and switches, call it with --help for more info.
To inspect the resulting databases or to examine statistics, there are
multiple options.
- advise-me-admin.cgi provides a web interface to inspect the databases and
  report on statistics.
- report.exe can be used offline to compare assessments from IDEAS in the
  database against human assessments with the humanvsmachine subcommand. It
  can also count how often evidence occurs with the priors subcommand.
  Finally, it can generate a legacy HTML page with diagnostics info, similar
  to the overview in advise-me-admin.cgi.
Testing
The tests implemented so far relate exclusively to finding the evidence.
Other tests are mostly non-existent, so functionality may break without
warning. (For more fine-grained information on how well the evidence matches
our expectations, see report.exe.)
Rudimentary regression tests can be performed with a diff, simply to check
whether the output has changed since the last update. make regressions does
this for you.
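The idea behind such a check can be sketched in a few lines of shell. This is not the recipe from the Makefile; `check_regression` is a hypothetical helper that compares a stored expectation file with freshly generated output.

```shell
# Hypothetical helper: compare freshly generated output against a stored
# expectation file and report whether anything changed.
check_regression() {
    expected=$1
    actual=$2
    # diff exits with 0 when the files are identical and prints a unified
    # diff otherwise.
    if diff -u "$expected" "$actual"; then
        echo "unchanged: $actual"
    else
        echo "CHANGED: $actual" >&2
        return 1
    fi
}
```

It would be called with one of the .exp files from regressions/ as the first argument and the newly produced output as the second. A non-zero exit status signals that the output differs and should be reviewed before the expectation is updated.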
cabal test runs the tasty test suite with particular example requests, to
check if they still find the evidence we expect. Whenever you fix a specific
bug, please add a test along with the relevant request XML.
Coverage
To inspect code coverage, do cabal clean and cabal configure --enable-coverage
and rebuild the binaries that you want to test. Running the binaries creates
.tix files (which you can optionally combine with hpc sum *.tix). From the
tix and mix files, you can generate an HTML coverage index or a statistics
report. For example:
hpc report \
--hpcdir=dist/hpc/vanilla/mix/Advise-me-0.1 \
--hpcdir=dist/hpc/vanilla/mix/database-builder.exe \
database-builder.exe.tix
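Typing one --hpcdir flag per component gets tedious as binaries are added. As a sketch, the command line can be assembled from whatever subdirectories the mix directory contains. `hpc_report_command` is a hypothetical helper, and it assumes paths without spaces.

```shell
# Hypothetical helper: print an `hpc report` command line with one --hpcdir
# flag per subdirectory of the given mix directory.
hpc_report_command() {
    mixroot=$1
    tix=$2
    cmd="hpc report"
    for dir in "$mixroot"/*/; do
        [ -d "$dir" ] || continue   # the glob matched nothing
        cmd="$cmd --hpcdir=${dir%/}"
    done
    printf '%s %s\n' "$cmd" "$tix"
}
```

Calling `hpc_report_command dist/hpc/vanilla/mix database-builder.exe.tix` only prints the command, so it can be inspected before it is run.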
Profiling
If you have installed the libraries with --enable-library-profiling and
configured cabal with --enable-library-profiling --enable-executable-profiling,
then you can build a profiling version of the main CGI binary. The Makefile
contains a recipe for a PDF report.