Advanced Usage:

Parameters:

The following table outlines user-controllable parameters that can be adjusted at run time:

Parameter Name	Default Value	Description
infile	N/A, required	Prefix to PLINK library or .raw file to be used as input
out	‘chrY_hgs’	Prefix to .out and .all files generated by SNAPPY
min_hap_score	0.6	Minimum match score for a haplogroup to be considered for assignment
min_deep_score	0.8	Minimum score to switch from highest scoring haplogroup to the deepest haplogroup for assignment
ref_files_dir	‘ref_data’	Directory where SNAPPY’s reference files are saved
id2pos	‘id_to_pos.txt’	File listing SNP IDs and corresponding positions
pos2allele	‘pos_to_allele.txt’	File listing SNP positions and corresponding alleles
hg2snp	‘y_hg_and_snps.sort’	File listing markers and haplogroups
tree_strct	‘tree_structure.txt’	File listing parent-child relationships for haplogroups that do not conform to naming conventions
ancestral_hg_depth	2	Number of ancestral haplogroups to check when considering whether a haplogroup receives a score
truncate_haps	N/A	File with list of haplogroups past which SNAPPY will not make assignments

All adjustable parameters can be listed at run time by calling SNAPPY followed by --help. To adjust a parameter, append a double hyphen (--) followed immediately by the parameter name, a space, and the desired value for that parameter.

Example:

python SNAPPY_v123.py --infile plink_prefix --min_hap_score 0.7
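
When SNAPPY is run repeatedly with different settings, the command line can be assembled programmatically. The following is a minimal sketch and not part of SNAPPY: the helper name build_snappy_command is hypothetical, the script name is taken from the example above, and the only parameter names used are those listed in the table.

```python
def build_snappy_command(script, **params):
    """Return an argument list such as ['python', script, '--infile', 'x'].

    `script` is the path to the SNAPPY script; `params` maps parameter
    names (as listed in the table) to the desired values.
    This helper is hypothetical and only illustrates the calling convention.
    """
    if "infile" not in params:
        raise ValueError("infile is required")
    command = ["python", script]
    for name, value in params.items():
        # Each parameter is passed as: --name value
        command += ["--" + name, str(value)]
    return command


cmd = build_snappy_command(
    "SNAPPY_v123.py", infile="plink_prefix", min_hap_score=0.7
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the Python standard library to launch the run.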

Notes and Considerations:

All reference files included in the current distribution of SNAPPY use positions from human genome version GRCh37. Genotype positions from other versions of the human genome may produce inaccurate haplogroup assignments.
Prior to running SNAPPY, it may be necessary to check for strand concordance with the Y-chromosome of GRCh37, and to flip and/or remove ambiguous sites and those whose variants correspond to genotyping from the non-reference strand.
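
One way to find the ambiguous sites mentioned above is to scan a PLINK .bim file for A/T and C/G variants, whose strand cannot be determined from the alleles alone. The following is a rough sketch under stated assumptions, not part of SNAPPY: the function names are hypothetical, and the input is assumed to follow the standard six-column .bim layout (chromosome, variant ID, genetic distance, position, allele 1, allele 2).

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(allele1, allele2):
    """True when the two alleles are complementary (A/T or C/G)."""
    return COMPLEMENT.get(allele1.upper()) == allele2.upper()


def ambiguous_variant_ids(bim_lines):
    """Return IDs of strand-ambiguous variants from .bim-formatted lines."""
    ids = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        variant_id, allele1, allele2 = fields[1], fields[4], fields[5]
        if is_strand_ambiguous(allele1, allele2):
            ids.append(variant_id)
    return ids


# Made-up example records, for illustration only.
example = [
    "24 snp1 0 2655180 A G",
    "24 snp2 0 2700000 A T",
    "24 snp3 0 2800000 C G",
    "24 snp4 0 2900000 0 C",
]
print(ambiguous_variant_ids(example))
```

The returned IDs could be written one per line to a text file and removed with PLINK's --exclude option before running SNAPPY.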
A key aspect of SNAPPY’s success is the robust nature of the Y-chromosome tree and the inclusion of informative variants on the Multi-Ethnic Genotyping Array (MEGA). SNAPPY’s current implementation was designed and tested using genotyping data from the MEGA, which includes over 11,000 variants on the Y-chromosome. SNAPPY should readily apply to other arrays, but care should be taken to ensure that arrays have a sufficient number of loci that are included in the reference library.
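
A quick check of how well another array covers the reference library is to count the positions the two have in common. The sketch below is illustrative only: the function name reference_overlap is hypothetical, and it assumes the array and reference positions have already been read into Python collections, since the layout of the reference files is not described here. What counts as a sufficient number of loci is left to the user.

```python
def reference_overlap(array_positions, reference_positions):
    """Return (shared_count, fraction_of_reference_covered).

    Both arguments are iterables of GRCh37 Y-chromosome positions.
    """
    array_set = set(array_positions)
    reference_set = set(reference_positions)
    shared = array_set & reference_set
    # Guard against an empty reference to avoid division by zero.
    fraction = len(shared) / len(reference_set) if reference_set else 0.0
    return len(shared), fraction


# Made-up positions, for illustration only.
shared_count, fraction = reference_overlap([1, 2, 3, 5], [2, 3, 4, 6])
print(shared_count, fraction)
```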
Genotyping by sequencing (GBS) is increasingly popular, and data generated through GBS is compatible with SNAPPY, provided that all sites passing quality filters are included in the output genotypes during variant calling (this can be accomplished, for example, using the –emit-all argument in GATK’s variant calling pipeline). Otherwise, haplogroup-informative sites where the reference sequence used in variant calling has a derived allele may not be included in the genotype file.