Advanced Usage:
Parameters:
The following table lists the parameters that can be adjusted at run time:
Parameter name | Default value | Description
---|---|---
`infile` | N/A (required) | Prefix of the PLINK fileset or .raw file used as input
`out` | `chrY_hgs` | Prefix for the .out and .all files generated by SNAPPY
`min_hap_score` | 0.6 | Minimum match score for a haplogroup to be considered for assignment
`min_deep_score` | 0.8 | Minimum score required to switch from the highest-scoring haplogroup to the deepest haplogroup for assignment
`ref_files_dir` | `ref_data` | Directory where SNAPPY's reference files are saved
`id2pos` | `id_to_pos.txt` | File listing SNP IDs and their corresponding positions
`pos2allele` | `pos_to_allele.txt` | File listing SNP positions and their corresponding alleles
`hg2snp` | `y_hg_and_snps.sort` | File listing haplogroups and their associated markers
`tree_strct` | `tree_structure.txt` | File listing parent-child relationships for haplogroups whose names do not follow the naming conventions
`ancestral_hg_depth` | 2 | Number of ancestral haplogroups to check when deciding whether a haplogroup receives a score
`truncate_haps` | N/A | File listing haplogroups beyond which SNAPPY will not make assignments
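The descriptions of `min_hap_score` and `min_deep_score` suggest a two-step choice between the highest-scoring and the deepest haplogroup. The sketch below is one hypothetical reading of those two descriptions, not SNAPPY's actual code: the function name, the caller-supplied `depth` map, and all scores are made up for illustration, and `ancestral_hg_depth` and `truncate_haps` are ignored.

```python
from typing import Dict, Optional


def pick_haplogroup(
    scores: Dict[str, float],
    depth: Dict[str, int],
    min_hap_score: float = 0.6,
    min_deep_score: float = 0.8,
) -> Optional[str]:
    """Hypothetical assignment rule inferred from the parameter table."""
    # Only haplogroups whose match score reaches min_hap_score are candidates.
    candidates = {hg: s for hg, s in scores.items() if s >= min_hap_score}
    if not candidates:
        return None  # nothing scores well enough to be assigned
    # Default choice: the highest-scoring candidate.
    best = max(candidates, key=candidates.get)
    # Deepest candidate in the tree (depth is supplied by the caller here).
    deepest = max(candidates, key=lambda hg: depth[hg])
    # Switch to the deepest candidate only if its score reaches min_deep_score.
    return deepest if candidates[deepest] >= min_deep_score else best


# Made-up scores and depths, for illustration only.
scores = {"R1b": 0.95, "R1b1a": 0.85, "R1b1a1b": 0.70}
depth = {"R1b": 3, "R1b1a": 5, "R1b1a1b": 7}
print(pick_haplogroup(scores, depth))                      # R1b
print(pick_haplogroup(scores, depth, min_deep_score=0.7))  # R1b1a1b
```

Under this reading, lowering `min_deep_score` makes deeper (more specific) assignments more likely, at the cost of accepting lower-scoring matches.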
To list all adjustable parameters, run SNAPPY with --help. To change a parameter, add a double hyphen (--) followed immediately by the parameter name, then a space and the desired value.
Example:
python SNAPPY_v123.py --infile plink_prefix --min_hap_score 0.7
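When SNAPPY is run from a larger Python pipeline, the same `--name value` convention can be generated from a dictionary of settings. This is a minimal sketch; the helper name `snappy_command` is hypothetical, and the script name and values are taken from the example above.

```python
import shlex
from typing import Dict, List, Union


def snappy_command(script: str, params: Dict[str, Union[str, int, float]]) -> List[str]:
    """Build a SNAPPY argument list; each parameter becomes '--name value'."""
    cmd = ["python", script]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = snappy_command("SNAPPY_v123.py", {"infile": "plink_prefix", "min_hap_score": 0.7})
print(shlex.join(cmd))
# python SNAPPY_v123.py --infile plink_prefix --min_hap_score 0.7
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell-quoting problems with file prefixes that contain spaces.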
Notes and Considerations:
All reference files included in the current distribution of SNAPPY use positions from human genome build GRCh37. Genotype positions from other builds of the human genome may lead to inaccurate haplogroup assignments.
Before running SNAPPY, you may need to check strand concordance with the GRCh37 Y chromosome: flip or remove sites whose genotypes come from the non-reference strand, and remove strand-ambiguous sites (A/T and C/G SNPs), whose strand cannot be determined from the alleles alone.
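The per-site decision can be sketched as follows. This is an illustration only: the function name is hypothetical, the reference alleles are passed in directly (in practice they would come from a GRCh37 source such as SNAPPY's `pos_to_allele.txt`, whose layout is not described here), and both allele pairs are assumed to use the letters A/C/G/T.

```python
from typing import Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def strand_action(array_alleles: Tuple[str, str], ref_alleles: Tuple[str, str]) -> str:
    """Return 'keep', 'flip' or 'remove' for one biallelic SNP.

    Assumes both allele pairs use the letters A/C/G/T.
    """
    arr = {a.upper() for a in array_alleles}
    ref = {a.upper() for a in ref_alleles}
    flipped = {COMPLEMENT[a] for a in arr}
    # A/T and C/G SNPs read the same on both strands, so their strand is ambiguous.
    if flipped == arr:
        return "remove"
    if arr == ref:
        return "keep"
    if flipped == ref:
        return "flip"
    # Alleles match the reference on neither strand.
    return "remove"


# (array alleles, reference alleles) for a few made-up SNPs
snps = {
    "snp1": (("A", "G"), ("A", "G")),
    "snp2": (("T", "C"), ("A", "G")),
    "snp3": (("A", "T"), ("A", "T")),
    "snp4": (("A", "C"), ("A", "G")),
}
actions = {sid: strand_action(*pair) for sid, pair in snps.items()}
print(actions)
# {'snp1': 'keep', 'snp2': 'flip', 'snp3': 'remove', 'snp4': 'remove'}
```

The IDs marked `flip` and `remove` could then be written to text files and applied with PLINK's `--flip` and `--exclude` options before SNAPPY is run.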
A key factor in SNAPPY's success is the robustness of the Y-chromosome tree, combined with the informative variants included on the Multi-Ethnic Genotyping Array (MEGA). SNAPPY's current implementation was designed and tested using genotyping data from the MEGA, which includes over 11,000 variants on the Y chromosome. SNAPPY should apply readily to other arrays, but users should first confirm that an array genotypes a sufficient number of the loci included in SNAPPY's reference library.
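One way to check an array's suitability is to count how many reference-library positions it genotypes. The sketch below reads Y-chromosome positions from PLINK .bim lines (chromosome, variant ID, cM, bp position, two alleles). The reference positions are supplied as a set because the layout of SNAPPY's reference files is not described here, and all positions in the example are made up. No specific "sufficient" threshold is implied.

```python
from typing import Iterable, Set, Tuple

# Chromosome codes PLINK may use for the Y chromosome in a .bim file.
Y_CODES = {"Y", "24"}


def y_positions_from_bim(bim_lines: Iterable[str]) -> Set[int]:
    """Collect bp positions of Y-chromosome variants from PLINK .bim lines."""
    positions = set()
    for line in bim_lines:
        fields = line.split()
        # Column 1 is the chromosome code, column 4 the bp position.
        if len(fields) >= 4 and fields[0] in Y_CODES:
            positions.add(int(fields[3]))
    return positions


def reference_overlap(array_pos: Set[int], ref_pos: Set[int]) -> Tuple[int, float]:
    """Number and fraction of reference-library positions present on the array."""
    shared = len(array_pos & ref_pos)
    return shared, (shared / len(ref_pos) if ref_pos else 0.0)


bim = [
    "24 rsA 0 2655180 A G",
    "24 rsB 0 2887824 C T",
    "1 rsC 0 12345 A G",
    "Y rsD 0 7173143 G A",
]
ref_positions = {2655180, 2887824, 14000000, 15000000}  # made-up example
print(reference_overlap(y_positions_from_bim(bim), ref_positions))  # (2, 0.5)
```

With a real fileset, `y_positions_from_bim` can be given an open file handle, e.g. `with open(prefix + ".bim") as f: y_positions_from_bim(f)`.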
Genotyping by sequencing (GBS) is increasingly popular, and GBS data are compatible with SNAPPY, provided that all sites passing quality filters are included in the output genotypes during variant calling (for example, by using the --emit-all argument in GATK's variant calling pipeline). Otherwise, haplogroup-informative sites at which the reference sequence used for variant calling carries the derived allele may be missing from the genotype file: a sample that also carries the derived allele matches the reference at those sites, so no variant is reported.
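The toy model below illustrates why variant-only output loses these sites. It is a simplification for illustration, not a description of GATK's internals: the function name is hypothetical, the sample is treated as haploid, and the two sites and alleles are made up.

```python
from typing import Dict


def output_sites(sample: Dict[int, str], reference: Dict[int, str], emit_all: bool) -> Dict[int, str]:
    """Toy model of which haploid Y sites a caller writes to the genotype file.

    sample:    position -> allele observed in the reads (sites passing filters)
    reference: position -> allele in the reference sequence used for calling
    """
    out = {}
    for pos, allele in sample.items():
        # Variant-only output drops sites where the sample matches the reference.
        if emit_all or allele != reference[pos]:
            out[pos] = allele
    return out


# Site 100: the reference carries the derived allele T (ancestral C).
# Site 200: the reference carries the ancestral allele A (derived G).
sample = {100: "T", 200: "G"}  # sample is derived at both sites
reference = {100: "T", 200: "A"}
print(output_sites(sample, reference, emit_all=False))  # {200: 'G'}
print(output_sites(sample, reference, emit_all=True))   # {100: 'T', 200: 'G'}
```

In the variant-only output, site 100 cannot be distinguished from a site with no coverage, so the evidence that the sample carries the derived allele there is lost.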