Running LIRICAL

LIRICAL is a command-line Java tool that requires Java 17 or higher. It can be run with or without genomic data in the form of a VCF file from genome, exome, or NGS gene-panel sequencing.

On a typical computer, LIRICAL runs for about 15 to 60 seconds in phenotype-only mode, about 5 minutes with a typical exome file, and longer with a whole-genome file.

To get help, run LIRICAL with the -h or --help option:

lirical --help
LIkelihood Ratio Interpretation of Clinical AbnormaLities

Usage: lirical [-hV] [COMMAND]
  -h, --help      Show this help message and exit.
  -V, --version   Print version information and exit.
Commands:
  download, D     Download files for LIRICAL.
  prioritize, R   Run LIRICAL from CLI arguments.
  phenopacket, P  Run LIRICAL from a Phenopacket.
  yaml, Y         Run LIRICAL from a YAML file.

See the full documentation at https://lirical.readthedocs.io/en/master

Note

We assume that the lirical command alias was set as described in the Set up alias section.

Run LIRICAL with a specific command with the -h option to get information about the command:

lirical download -h
Usage: lirical download [-hVw] [-d=<datadir>]
Download files for LIRICAL.
  -d, --data=<datadir>   directory to download data (default: data)
  -w, --overwrite        overwrite previously downloaded files (default: false)
  -h, --help             Show this help message and exit.
  -V, --version          Print version information and exit.

LIRICAL has four main commands: download, prioritize, phenopacket, and yaml. We do not discuss the download command here because it is covered in the LIRICAL data files section.

Shared CLI options

LIRICAL offers several commands that receive phenotype and genotype inputs via CLI arguments, a phenopacket, or a YAML file. However, the commands share many CLI arguments for setting up the resource paths, configuring the analysis, and choosing where the results are written. This section describes the shared arguments.

Resources

The options from this group set up resources required for LIRICAL analysis.

  • -d | --data: path to LIRICAL data directory. Required if the data folder is not set up next to the LIRICAL JAR file.

  • -e19 | --exomiser-hg19: path to Exomiser variant database for hg19. Required if the analysis is run with exome/genome sequencing files and --assembly is set to hg19.

  • -e38 | --exomiser-hg38: path to Exomiser variant database for hg38. Required if the analysis is run with exome/genome sequencing files and --assembly is set to hg38.

  • -b | --background: path to a file with background variant frequencies for genes. Avoid this option unless there is a very good reason to use it, because the background variant frequencies are bundled with the LIRICAL code. See Background variant frequencies for more info.

  • --parallelism: the number of workers/threads to use. The value must be a positive integer (default: 1).
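The rules above couple two options: a genotype-aware run needs an Exomiser database whose flag matches the genome build (-e19 for hg19, -e38 for hg38), and --parallelism must be positive. The following is a minimal sketch that assembles these options as an argument list under those rules. The helper name resource_args and its parameters are hypothetical; only the flags come from this page, and --assembly is included because the Exomiser flag depends on it.

```python
from typing import List, Optional


def resource_args(
    assembly: str,
    with_vcf: bool,
    data_dir: Optional[str] = None,
    exomiser_db: Optional[str] = None,
    parallelism: int = 1,
) -> List[str]:
    """Build resource-related LIRICAL options (hypothetical helper, does not run LIRICAL)."""
    if assembly not in ("hg19", "hg38"):
        raise ValueError(f"unsupported assembly: {assembly}")
    if parallelism < 1:
        raise ValueError("--parallelism must be a positive integer")

    args: List[str] = []
    if data_dir is not None:
        # Only needed when the data folder is not next to the LIRICAL JAR file.
        args += ["-d", data_dir]
    if with_vcf:
        if exomiser_db is None:
            raise ValueError("genotype-aware runs need an Exomiser variant database")
        # The Exomiser option must match the genome build.
        exomiser_flag = "-e19" if assembly == "hg19" else "-e38"
        args += ["--assembly", assembly, exomiser_flag, exomiser_db]
    args += ["--parallelism", str(parallelism)]
    return args
```

For example, resource_args("hg19", True, exomiser_db="/path/to/exomiser/2302_hg19_variants.mv.db") yields the --assembly hg19 -e19 ... pair used in the VCF examples on this page, followed by --parallelism 1.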

Configuration options

The configuration options tweak the analysis.

  • -g | --global: global analysis, see Global mode for more info (default: false).

  • --sdwndv: show diseases even if no deleterious variants are found in the gene associated with the disease. The option is a flag (it takes no value); when present, all diseases are shown, including those with no deleterious variants. It only applies to the HTML and TSV reports when running with a VCF file (genotype-aware mode). The JSON report always includes all diseases.

  • --transcript-db: transcript database (default: RefSeq), see Transcript databases for more info.

  • --use-orphanet: use Orphanet annotations (default: false).

  • --strict: use strict penalties if the genotype does not match the disease model in terms of number of called pathogenic alleles (default: false).

  • --pathogenicity-threshold: variants with a pathogenicity score greater than this value are considered deleterious (default: 0.8).

  • --validation-policy: set the level of input sanity checks, see Analysis input validation for more info. Choose from MINIMAL, LENIENT, STRICT (default: MINIMAL).

  • --dry-run: check if the inputs meet the validation policy requirements, report any issues, and exit without running the analysis (default: false).
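To make the --pathogenicity-threshold rule concrete, here is a small sketch that splits a list of variants into deleterious and non-deleterious ones. The Variant record and its scores are hypothetical illustrations, not LIRICAL data structures, and we read "greater" as strictly greater; how LIRICAL treats a score exactly equal to the threshold is an assumption here.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Documented default of --pathogenicity-threshold.
DEFAULT_PATHOGENICITY_THRESHOLD = 0.8


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record with a made-up pathogenicity score."""
    gene: str
    pathogenicity: float


def deleterious_variants(
    variants: Iterable[Variant],
    threshold: float = DEFAULT_PATHOGENICITY_THRESHOLD,
) -> List[Variant]:
    """Keep variants whose score is strictly greater than the threshold (assumed comparison)."""
    return [v for v in variants if v.pathogenicity > threshold]
```

With the default threshold, a variant scored 0.95 is kept while ones scored 0.8 or 0.3 are dropped; lowering the threshold admits more variants as deleterious, which in turn affects which diseases are hidden unless --sdwndv is set.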

Output options

The output options dictate the format and location for the analysis results.

  • -o | --output-directory: where to write the analysis outputs (default: current working directory).

  • -f | --output-format: output format for writing the results; can be provided multiple times. Choose from html, tsv, and json (default: html).

  • -x | --prefix: prefix of the output files (default: lirical).

  • -t | --threshold: minimum post-test probability for a diagnosis to be shown in the HTML report. The value must be in the range [0, 1]. This option must not be used together with the -m | --mindiff option.

  • -m | --mindiff: minimal number of differential diagnoses to show.

  • --display-all-variants: display all variants in the HTML report, not just the variants passing the pathogenicity threshold (default: false).
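The output options carry three constraints: -f may be repeated, -t must lie in [0, 1], and -t and -m are mutually exclusive. The sketch below checks these constraints before assembling the options; output_args is a hypothetical helper that only mirrors the rules listed above and does not call LIRICAL.

```python
from typing import List, Optional, Sequence

ALLOWED_FORMATS = {"html", "tsv", "json"}


def output_args(
    formats: Sequence[str] = ("html",),
    output_dir: Optional[str] = None,
    prefix: Optional[str] = None,
    threshold: Optional[float] = None,
    mindiff: Optional[int] = None,
) -> List[str]:
    """Build output options, enforcing the documented constraints (hypothetical helper)."""
    if threshold is not None and mindiff is not None:
        raise ValueError("-t/--threshold and -m/--mindiff must not be used together")
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        raise ValueError("--threshold must be in the range [0, 1]")

    args: List[str] = []
    if output_dir is not None:
        args += ["-o", output_dir]
    for fmt in formats:
        if fmt not in ALLOWED_FORMATS:
            raise ValueError(f"unknown output format: {fmt}")
        # -f is repeated once per requested format.
        args += ["-f", fmt]
    if prefix is not None:
        args += ["-x", prefix]
    if threshold is not None:
        args += ["-t", str(threshold)]
    if mindiff is not None:
        args += ["-m", str(mindiff)]
    return args
```

Validating the -t/-m combination up front surfaces the conflict before a potentially long genotype-aware run is started.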

LIRICAL prioritization commands

LIRICAL provides three commands for receiving phenotype and genotype inputs via CLI, as a phenopacket, or as a YAML file.

prioritize - run LIRICAL via CLI options

Since the v2 release, all required inputs can be provided as command-line arguments of the prioritize command. This makes for rather long command lines, but the CLI is useful, e.g., when running LIRICAL from pipeline engines such as Nextflow or Snakemake.

The prioritize command takes the following options:

  • -p | --observed-phenotypes: a comma-separated list of HPO IDs of the phenotype terms observed in the proband.

  • -n | --negated-phenotypes: a comma-separated list of HPO IDs of the phenotype terms negated/excluded in the proband.

  • --assembly: genome build, choose from hg19 or hg38; must be provided if --vcf is used (default: hg38).

  • --vcf: path to VCF file with exome/genome sequencing results. The file can be compressed.

  • --sample-id: proband’s identifier, must be provided if running with a multi-sample VCF file (default: subject).

  • --age: proband’s age as an ISO 8601 duration (e.g. P9Y for 9 years, P2Y3M for 2 years and 3 months, or P33W for the 33rd gestational week).

  • --sex: proband’s sex, choose from MALE, FEMALE, UNKNOWN (default: UNKNOWN).
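Because prioritize is often driven from pipelines, it can help to assemble and sanity-check the argument list in code. The sketch below builds a prioritize command from the options listed above. The function name is hypothetical, the leading "lirical" token assumes the alias from the Set up alias section, and the age check accepts only a simplified subset of ISO 8601 durations that covers the documented examples. For genotype-aware runs, the shared resource options (e.g. -e19 or -e38) still have to be appended.

```python
import re
from typing import List, Optional, Sequence

HPO_ID = re.compile(r"HP:\d{7}")
# Simplified ISO 8601 subset: either weeks (P33W) or years/months/days (P9Y, P2Y3M).
AGE = re.compile(r"P(?:\d+W|(?=\d)(?:\d+Y)?(?:\d+M)?(?:\d+D)?)")
SEXES = ("MALE", "FEMALE", "UNKNOWN")


def prioritize_command(
    observed: Sequence[str],
    negated: Sequence[str] = (),
    vcf: Optional[str] = None,
    assembly: str = "hg38",
    sample_id: Optional[str] = None,
    age: Optional[str] = None,
    sex: str = "UNKNOWN",
) -> List[str]:
    """Assemble a `lirical prioritize` invocation (hypothetical helper)."""
    if not observed:
        raise ValueError("at least one observed HPO term is needed")
    for term in [*observed, *negated]:
        if not HPO_ID.fullmatch(term):
            raise ValueError(f"not an HPO ID: {term}")
    if age is not None and not AGE.fullmatch(age):
        raise ValueError(f"not a supported ISO 8601 duration: {age}")
    if sex not in SEXES:
        raise ValueError(f"sex must be one of {SEXES}")

    # HPO IDs are passed as comma-separated lists.
    cmd = ["lirical", "prioritize", "-p", ",".join(observed)]
    if negated:
        cmd += ["-n", ",".join(negated)]
    if vcf is not None:
        if assembly not in ("hg19", "hg38"):
            raise ValueError(f"unsupported assembly: {assembly}")
        cmd += ["--vcf", vcf, "--assembly", assembly]
        if sample_id is not None:
            # Required when the VCF file contains several samples.
            cmd += ["--sample-id", sample_id]
    if age is not None:
        cmd += ["--age", age]
    cmd += ["--sex", sex]
    return cmd
```

For instance, prioritize_command(["HP:0000244", "HP:0001363"], negated=["HP:0000238"], age="P2Y3M", sex="MALE") produces -p HP:0000244,HP:0001363 -n HP:0000238 --age P2Y3M --sex MALE after the command name.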

phenopacket - run LIRICAL with a Phenopacket

LIRICAL can be run with clinical data (HPO terms) only or with clinical data and a VCF file representing the results of gene panel, exome, or genome sequencing. The preferred input format is Phenopackets, an open standard for sharing disease and phenotype information. Phenopackets is a standard of the Global Alliance for Genomics and Health (GA4GH) that links detailed phenotype descriptions with disease, patient, and genetic information.

PhenopacketGenerator

For convenience, we provide a tool called PhenopacketGenerator that can be used to create a Phenopacket with a list of HPO terms and the path to a VCF file with which LIRICAL can be run.


Let’s consider an example of an individual with Pfeiffer syndrome:

{
  "id": "pfeiffer-example",
  "subject": {
    "id": "example-1"
  },
  "phenotypicFeatures": [{
    "type": {
      "id": "HP:0000244",
      "label": "Turribrachycephaly"
    }
  }, {
    "type": {
      "id": "HP:0001363",
      "label": "Craniosynostosis"
    }
  }, {
    "type": {
      "id": "HP:0000453",
      "label": "Choanal atresia"
    }
  }, {
    "type": {
      "id": "HP:0000327",
      "label": "Hypoplasia of the maxilla"
    }
  }, {
    "type": {
      "id": "HP:0000238",
      "label": "Hydrocephalus"
    }
  }],
  "metaData": {
    "createdBy": "Peter R.",
    "resources": [{
      "id": "hp",
      "name": "human phenotype ontology",
      "namespacePrefix": "HP",
      "url": "http://purl.obolibrary.org/obo/hp.owl",
      "version": "2018-03-08",
      "iriPrefix": "http://purl.obolibrary.org/obo/HP_"
    }],
    "phenopacketSchemaVersion": "2.0.0"
  }
}

Save the file above as pfeiffer.json.
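If you prefer to produce such a file programmatically rather than by hand, the sketch below builds the same minimal phenotype-only phenopacket as a plain Python dict and writes it as JSON. This is not PhenopacketGenerator and does not validate against the Phenopacket schema; it simply mirrors the fields of the example above, and the function names are hypothetical.

```python
import json
from typing import Dict, List, Sequence, Tuple

PFEIFFER_TERMS: List[Tuple[str, str]] = [
    ("HP:0000244", "Turribrachycephaly"),
    ("HP:0001363", "Craniosynostosis"),
    ("HP:0000453", "Choanal atresia"),
    ("HP:0000327", "Hypoplasia of the maxilla"),
    ("HP:0000238", "Hydrocephalus"),
]


def make_phenopacket(packet_id: str, subject_id: str,
                     terms: Sequence[Tuple[str, str]],
                     created_by: str, hpo_version: str) -> Dict:
    """Build a minimal phenotype-only phenopacket mirroring the example above."""
    return {
        "id": packet_id,
        "subject": {"id": subject_id},
        "phenotypicFeatures": [
            {"type": {"id": term_id, "label": label}} for term_id, label in terms
        ],
        "metaData": {
            "createdBy": created_by,
            "resources": [{
                "id": "hp",
                "name": "human phenotype ontology",
                "namespacePrefix": "HP",
                "url": "http://purl.obolibrary.org/obo/hp.owl",
                "version": hpo_version,
                "iriPrefix": "http://purl.obolibrary.org/obo/HP_",
            }],
            "phenopacketSchemaVersion": "2.0.0",
        },
    }


def write_phenopacket(packet: Dict, path: str) -> None:
    """Serialize the phenopacket as indented JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(packet, fh, indent=2)
```

Calling write_phenopacket(make_phenopacket("pfeiffer-example", "example-1", PFEIFFER_TERMS, "Peter R.", "2018-03-08"), "pfeiffer.json") writes a file with the same content as the example. For richer phenopackets, PhenopacketGenerator remains the documented route.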

Running LIRICAL with clinical data

LIRICAL performs a phenotype-only analysis if the phenopacket command invocation does not include the --vcf option. In this case, the only required argument is the phenopacket:

lirical phenopacket -p pfeiffer.json

Running LIRICAL with a VCF file

Alternatively, LIRICAL can include the VCF file if its path is provided using the --vcf option. Note that we must also provide the --assembly and -e19 (or -e38) options to indicate the genome assembly and the path to the Exomiser variant database:

lirical phenopacket -p pfeiffer.json --vcf path/to/pfeiffer.vcf.gz --assembly hg19 -e19 /path/to/exomiser/2302_hg19_variants.mv.db
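When several cases are analyzed in one batch (for example from a pipeline engine), each run needs its own output prefix so that results do not overwrite each other. The sketch below builds one phenopacket command per file, using the file name as the -x prefix and adding --vcf, --assembly, and the matching Exomiser flag only for genotype-aware runs. The helper name and the cases/ paths are hypothetical. The leading "lirical" token assumes the alias from the Set up alias section; if that alias is a shell alias, it is not visible to processes started outside the shell, so a pipeline may need the full invocation from that section instead.

```python
from pathlib import Path
from typing import List, Optional


def phenopacket_command(
    phenopacket: str,
    output_dir: str = "results",
    vcf: Optional[str] = None,
    assembly: str = "hg38",
    exomiser_db: Optional[str] = None,
) -> List[str]:
    """Assemble a `lirical phenopacket` invocation (hypothetical helper)."""
    # Use the file name without extension as the output prefix,
    # so several runs can share one output directory.
    prefix = Path(phenopacket).stem
    cmd = ["lirical", "phenopacket", "-p", phenopacket, "-o", output_dir, "-x", prefix]
    if vcf is not None:
        if assembly not in ("hg19", "hg38"):
            raise ValueError(f"unsupported assembly: {assembly}")
        if exomiser_db is None:
            raise ValueError("a VCF run needs an Exomiser variant database")
        exomiser_flag = "-e19" if assembly == "hg19" else "-e38"
        cmd += ["--vcf", vcf, "--assembly", assembly, exomiser_flag, exomiser_db]
    return cmd


# Print one phenotype-only command per (hypothetical) phenopacket file.
for packet in ["cases/pfeiffer.json", "cases/another-case.json"]:
    print(" ".join(phenopacket_command(packet)))
```

The design keeps command construction separate from execution, so the same function can feed a shell script, a Snakemake rule, or a Nextflow process.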

yaml - running LIRICAL with a YAML file

The other supported input format is YAML.

A typical command that runs LIRICAL with the settings from a YAML file and the default data directory is simply:

lirical yaml -y example.yml

This runs the phenotype-only analysis of Patient 4.

To run the genotype-aware analysis, modify the YAML file so that the vcf field points to the location of the VCF file on your file system. Then, run the analysis as:

lirical yaml -y example.yml --assembly hg19 -e19 /path/to/exomiser/2302_hg19_variants.mv.db

Choosing between YAML and Phenopacket input formats

How should users choose between YAML or Phenopacket as input?