3.1.  Initialization functions

(integer$)initializeAncestralNucleotides(is sequence)

This function, which may be called only in nucleotide-based models, supplies an ancestral nucleotide sequence for the model.  The sequence parameter may be an integer vector providing nucleotide values (A=0, C=1, G=2, T=3), or a string vector providing single-character nucleotides ("A", "C", "G", "T"), or a singleton string providing the sequence as one string ("ACGT..."), or a singleton string providing the filesystem path of a FASTA file which will be read in to provide the sequence (if the file contains more than one sequence, the first sequence will be used).  Only A/C/G/T nucleotide values may be provided; other symbols, such as those for amino acids, gaps, or nucleotides of uncertain identity, are not allowed.  The two semantic meanings of sequence that involve a singleton string value are distinguished heuristically; a singleton string that contains only the letters ACGT will be assumed to be a nucleotide sequence rather than a filename.  The length of the ancestral sequence is returned.

A utility function, randomNucleotides(), is provided by SLiM to assist in generating simple random nucleotide sequences.
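
The four accepted forms of the sequence parameter, and the heuristic that separates a sequence string from a file path, can be sketched in Python as follows.  This is an illustration only, not SLiM code; the names classify_sequence_argument() and encode_nucleotides() are hypothetical, and the sketch simply restates the rules given above.

```python
NUCLEOTIDES = "ACGT"  # position in this string is the integer code: A=0, C=1, G=2, T=3


def classify_sequence_argument(sequence):
    """Report which documented form a value would be taken as (hypothetical helper)."""
    values = [sequence] if isinstance(sequence, (str, int)) else list(sequence)
    if not values:
        raise ValueError("empty sequence")
    if all(isinstance(v, int) for v in values):
        if any(v not in (0, 1, 2, 3) for v in values):
            raise ValueError("integer nucleotide values must be 0, 1, 2, or 3")
        return "integer codes"
    if len(values) == 1 and isinstance(values[0], str):
        # Singleton string: only the letters ACGT means a sequence;
        # anything else is taken to be a filesystem path.
        text = values[0]
        if text and set(text) <= set(NUCLEOTIDES):
            return "nucleotide string"
        return "FASTA path"
    if all(isinstance(v, str) and len(v) == 1 and v in NUCLEOTIDES for v in values):
        return "single-character strings"
    raise ValueError("only A/C/G/T nucleotides are allowed")


def encode_nucleotides(text):
    """Convert a nucleotide string to integer codes using A=0, C=1, G=2, T=3."""
    return [NUCLEOTIDES.index(base) for base in text]
```

One consequence of the heuristic is visible in the sketch: a file whose path happens to consist only of the letters A, C, G, and T would be read as a sequence, not as a path.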

(void)initializeGeneConversion(numeric$ nonCrossoverFraction, numeric$ meanLength, numeric$ simpleConversionFraction, [numeric$ bias = 0], [logical$ redrawLengthsOnFailure = F])

Calling this function switches the recombination model from a “simple crossover” model to a “double-stranded break (DSB)” model, and configures the details of the gene conversion tracts that will therefore be modeled.  The fraction of DSBs that will be modeled as non-crossover events is given by nonCrossoverFraction.  The mean length of gene conversion tracts (whether associated with crossover or non-crossover events) is given by meanLength; the actual extent of a gene conversion tract will be the sum of two independent draws from a geometric distribution with mean meanLength/2.  The fraction of gene conversion tracts that are modeled as “simple” is given by simpleConversionFraction; the remainder will be modeled as “complex”, involving repair of heteroduplex mismatches.  Finally, the GC bias during heteroduplex mismatch repair is given by bias, with the default of 0.0 indicating no bias, 1.0 indicating an absolute preference for G/C mutations over A/T mutations, and -1.0 indicating an absolute preference for A/T mutations over G/C mutations.  A non-zero bias may only be set in nucleotide-based models.  This function, and the way that gene conversion is modeled, fundamentally changed in SLiM 3.3.
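
The tract-length rule (the sum of two independent geometric draws, each with mean meanLength/2) can be illustrated with a short Python sketch.  It is not SLiM's implementation.  In particular, the parameterization of the geometric distribution used here (support 1, 2, 3, ... with success probability 1/mean) is an assumption made for the example, and the function names are hypothetical.

```python
import random


def draw_geometric(mean, rng):
    """Draw from a geometric distribution on 1, 2, 3, ... with the given mean.

    Assumed parameterization: success probability p = 1/mean, so mean must be >= 1.
    """
    p = 1.0 / mean
    count = 1
    while rng.random() >= p:  # each trial succeeds with probability p
        count += 1
    return count


def draw_tract_length(mean_length, rng):
    """Sum of two independent geometric draws, each with mean mean_length / 2."""
    half = mean_length / 2.0
    return draw_geometric(half, rng) + draw_geometric(half, rng)


rng = random.Random(42)
lengths = [draw_tract_length(100.0, rng) for _ in range(20000)]
average = sum(lengths) / len(lengths)  # should be close to the requested mean of 100
```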

Beginning in SLiM 4.1, the redrawLengthsOnFailure parameter can be used to modify the internal mechanics of layout of gene conversion tracts.  If it is F (the default, and the only behavior supported before SLiM 4.1), then if an attempt to lay out gene conversion tracts fails (because the tracts overlap each other, or overlap the start or end of the chromosome), SLiM will try again by drawing new positions for the tracts – essentially shuffling the tracts around to try to find positions for them that don’t overlap.  If redrawLengthsOnFailure is T, then if an attempt to lay out gene conversion tracts fails, SLiM will try again by drawing new lengths for the tracts, as well as new positions.  This makes it more likely that layout will succeed, but risks biasing the realized mean tract length downward from the requested mean length (since layout of long tracts is more likely to fail due to overlap).  In either case, if SLiM attempts to lay out gene conversion tracts 100 times without success, an error will result.  That error indicates that the specified constraints for gene conversion are difficult to satisfy – tracts may commonly be so long that it is difficult or impossible to find an acceptable layout for them within the specified chromosome length.  Setting redrawLengthsOnFailure to T may mitigate this problem, at the price of biasing the mean tract length downward as discussed.

(object<GenomicElement>)initializeGenomicElement(io<GenomicElementType> genomicElementType, integer start, integer end)

Add a genomic element to the chromosome at initialization time.  The start and end parameters give the first and last base positions to be spanned by the new genomic element.  The new element will be based upon the genomic element type identified by genomicElementType, which can be either an integer, representing the ID of the desired element type, or an object of type GenomicElementType specified directly.

Beginning in SLiM 3.3, this function is vectorized: the genomicElementType, start, and end parameters do not have to be singletons.  In particular, start and end may be of any length, but must be equal in length; each start/end element pair will generate one new genomic element spanning the given base positions.  In this case, genomicElementType may still be a singleton, providing the genomic element type to be used for all of the new genomic elements, or it may be equal in length to start and end, providing an independent genomic element type for each new element.  When adding a large number of genomic elements, it will be much faster to add them in order of ascending position with a vectorized call.

The return value provides the genomic element(s) created by the call, in the order in which they were specified in the parameters to initializeGenomicElement().

(object<GenomicElementType>$)initializeGenomicElementType(is$ id, io<MutationType> mutationTypes, numeric proportions, [Nf mutationMatrix = NULL])

Add a genomic element type at initialization time.  The id must not already be used for any genomic element type in the simulation.  The mutationTypes vector identifies the mutation types used by the genomic element, and the proportions vector should be of equal length, specifying the relative proportion of mutations that will be drawn from the corresponding mutation type (proportions do not need to add up to one; they are interpreted relatively).  The id parameter may be either an integer giving the ID of the new genomic element type, or a string giving the name of the new genomic element type (such as "g5" to specify an ID of 5).  The mutationTypes parameter may be either an integer vector representing the IDs of the desired mutation types, or an object vector of MutationType elements specified directly.  The global symbol for the new genomic element type is immediately available; the return value also provides the new object.
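
Because proportions are interpreted relatively, only their ratios matter.  The Python sketch below (illustrative only; mutation_type_probabilities() is a hypothetical name, not a SLiM function) shows the normalization this implies.

```python
def mutation_type_probabilities(mutation_type_ids, proportions):
    """Map each mutation type ID to the probability that a new mutation is drawn from it."""
    if len(mutation_type_ids) != len(proportions):
        raise ValueError("mutationTypes and proportions must be equal in length")
    total = sum(proportions)
    return {mut_id: p / total for mut_id, p in zip(mutation_type_ids, proportions)}


# Proportions of 3 and 1 give the same result as 0.75 and 0.25.
probabilities = mutation_type_probabilities([1, 2], [3.0, 1.0])
```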

The mutationMatrix parameter is NULL by default, and in non-nucleotide-based models it must be NULL.  In nucleotide-based models, on the other hand, it must be non-NULL, and therefore must be supplied.  In that case, mutationMatrix should take one of two standard forms.  For sequence-based mutation rates that depend upon only the single nucleotide at a mutation site, mutationMatrix should be a 4×4 float matrix, specifying mutation rates for an existing nucleotide state (rows from 0–3 representing A/C/G/T) to each of the four possible derived nucleotide states (columns, with the same meaning).  The mutation rates in this matrix are absolute rates, per nucleotide per gamete; they will be used by SLiM directly unless they are multiplied by a factor from the hotspot map (see initializeHotspotMap()).  Rates in mutationMatrix that involve the mutation of a nucleotide to itself (A to A, C to C, etc.) are not used by SLiM and must be 0.0 by convention.

It is important to note that the order of the rows and columns used in SLiM, A/C/G/T, is not a universal convention; other sources will present substitution-rate/transition-rate matrices using different conventions, and so care must be taken when importing such matrices into SLiM.

For sequence-based mutation rates that depend upon the trinucleotide sequence centered upon a mutation site (the adjacent bases to the left and right, in other words, as well as the mutating nucleotide itself), mutationMatrix should be a 64×4 float matrix, specifying mutation rates for the central nucleotide of an existing trinucleotide sequence (rows from 0–63, representing codons as described in the documentation for the ancestralNucleotides() method of Chromosome) to each of the four possible derived nucleotide states (columns from 0–3 for A/C/G/T as before).  Note that in every case it is the central nucleotide of the trinucleotide sequence that is mutating, but rates can be specified independently based upon the nucleotides in the first and third positions as well, with this type of mutation matrix.
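
As an aid to building a 64×4 matrix, the Python sketch below computes a row index for a trinucleotide.  It assumes a base-4 encoding with A=0, C=1, G=2, T=3 and the first base most significant (so "AAA" is row 0 and "TTT" is row 63).  That encoding is an assumption of this example and should be checked against the codon description for ancestralNucleotides(); trinucleotide_row() is a hypothetical name.

```python
NUCLEOTIDES = "ACGT"  # A=0, C=1, G=2, T=3


def trinucleotide_row(trinucleotide):
    """Row index for a trinucleotide, assuming base-4 encoding with the first base most significant."""
    if len(trinucleotide) != 3:
        raise ValueError("expected exactly three nucleotides")
    left, center, right = (NUCLEOTIDES.index(base) for base in trinucleotide)
    return 16 * left + 4 * center + right


# The rate for the central C of "ACG" mutating to T would then sit at
# row trinucleotide_row("ACG"), column NUCLEOTIDES.index("T").
row = trinucleotide_row("ACG")
column = NUCLEOTIDES.index("T")
```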

Several helper functions are defined to construct common types of mutation matrices, such as mmJukesCantor() to create a mutation matrix for a Jukes–Cantor model.
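
To make the layout of the 4×4 form concrete, the following Python sketch builds a Jukes–Cantor style matrix by hand: a single rate alpha for every change between distinct nucleotides, and 0.0 on the diagonal.  It is an illustration of the matrix conventions described above, not a reimplementation of mmJukesCantor(); the function name jukes_cantor_matrix() is hypothetical.

```python
def jukes_cantor_matrix(alpha):
    """4x4 matrix, rows = existing state, columns = derived state, both ordered A/C/G/T."""
    return [[0.0 if row == col else alpha for col in range(4)] for row in range(4)]


A, C, G, T = range(4)
matrix = jukes_cantor_matrix(1e-7)

rate_a_to_g = matrix[A][G]         # rate for A mutating to G
total_rate_from_a = sum(matrix[A])  # A to any other nucleotide: three times alpha
```

Since each off-diagonal entry is a rate toward one specific derived state, the total rate at which a given nucleotide mutates is the sum of its row.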

(void)initializeHotspotMap(numeric multipliers, [Ni ends = NULL], [string$ sex = "*"])

In nucleotide-based models, set the mutation rate multiplier along the chromosome.  Nucleotide-based models define sequence-based mutation rates that are set up with the mutationMatrix parameter to initializeGenomicElementType().  If no hotspot map is specified by calling initializeHotspotMap(), a hotspot map with a multiplier of 1.0 across the whole chromosome is assumed (and so the sequence-based rates are the absolute mutation rates used by SLiM).  A hotspot map modifies the sequence-based rates by scaling them up in some regions, with multipliers greater than 1.0 (representing mutational hot spots), and/or scaling them down in some regions, with multipliers less than 1.0 (representing mutational cold spots).

There are two ways to call this function.  If the optional ends parameter is NULL (the default), then multipliers must be a singleton value that specifies a single multiplier to be used along the entire chromosome (typically 1.0, but not required to be).  If, on the other hand, ends is supplied, then multipliers and ends must be the same length, and the values in ends must be specified in ascending order.  In that case, multipliers and ends taken together specify the multipliers to be used along successive contiguous stretches of the chromosome, from beginning to end; the last position specified in ends should extend to the end of the chromosome (i.e. at least to the end of the last genomic element, if not further).

For example, if the following call is made:

initializeHotspotMap(c(1.0, 1.2), c(5000, 9999));

then the result is that the mutation rate multiplier for bases 0...5000 (inclusive) will be 1.0 (and so the specified sequence-based mutation rates will be used verbatim), and the multiplier for bases 5001...9999 (inclusive) will be 1.2 (and so the sequence-based mutation rates will be multiplied by 1.2 within the region).
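
The way multipliers and ends partition the chromosome can be expressed as a simple lookup.  The Python sketch below is illustrative only (multiplier_at() is a hypothetical name, not part of SLiM); it reproduces the example above, in which each entry of ends is the last position, inclusive, of its region.

```python
from bisect import bisect_left


def multiplier_at(position, multipliers, ends):
    """Return the multiplier for a base position; ends[i] is the last base of region i."""
    if position < 0:
        raise ValueError("position must be non-negative")
    index = bisect_left(ends, position)  # first region whose end is >= position
    if index == len(ends):
        raise ValueError("position lies beyond the last end in the map")
    return multipliers[index]


multipliers = [1.0, 1.2]
ends = [5000, 9999]
```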

Note that mutations are generated by SLiM only within genomic elements, regardless of the hotspot map.  In effect, the hotspot map given is intersected with the coverage area of the genomic elements defined; areas outside of any genomic element are given a multiplier of zero.  There is no harm in supplying a hotspot map that specifies multipliers for areas outside of the genomic elements defined; the excess information is simply not used.

If the optional sex parameter is "*" (the default), then the supplied hotspot map will be used for both sexes (which is the only option for hermaphroditic simulations).  In sexual simulations sex may be "M" or "F" instead, in which case the supplied hotspot map is used only for that sex (i.e., when generating a gamete from a parent of that sex).  In this case, two calls must be made to initializeHotspotMap(), one for each sex, even if a multiplier of 1.0 is desired for the other sex; no default hotspot map is supplied.

(object<InteractionType>$)initializeInteractionType(is$ id, string$ spatiality, [logical$ reciprocal = F], [numeric$ maxDistance = INF], [string$ sexSegregation = "**"])

Add an interaction type at initialization time.  The id must not already be used for any interaction type in the simulation.  The id parameter may be either an integer giving the ID of the new interaction type, or a string giving the name of the new interaction type (such as "i5" to specify an ID of 5).

The spatiality may be "", for non-spatial interactions (i.e., interactions that do not depend upon the distance between individuals); "x", "y", or "z" for one-dimensional interactions; "xy", "xz", or "yz" for two-dimensional interactions; or "xyz" for three-dimensional interactions.  The dimensions referenced by spatiality must be defined as spatial dimensions with initializeSLiMOptions(); if the simulation has dimensionality "xy", for example, then interactions in the simulation may have spatiality "", "x", "y", or "xy", but may not reference spatial dimension z and thus may not have spatiality "xz", "yz", or "xyz".  If no spatial dimensions have been configured, only non-spatial interactions may be defined.
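
The rule relating spatiality to the model's dimensionality amounts to a subset check, as in this Python sketch (illustrative only; spatiality_is_allowed() is a hypothetical name).

```python
VALID_SPATIALITIES = {"", "x", "y", "z", "xy", "xz", "yz", "xyz"}


def spatiality_is_allowed(spatiality, dimensionality):
    """True if every dimension named in spatiality is a dimension of the model."""
    return spatiality in VALID_SPATIALITIES and set(spatiality) <= set(dimensionality)


# With dimensionality "xy", these are the spatialities that pass the check.
allowed_for_xy = sorted(s for s in VALID_SPATIALITIES if spatiality_is_allowed(s, "xy"))
```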

The reciprocal flag may be T, in which case the interaction is guaranteed by the user to be reciprocal: whatever the interaction strength is for exerter B upon receiver A, it will be equal (in magnitude and sign) for exerter A upon receiver B.  In principle, this allows the InteractionType to reduce the amount of computation necessary by up to a factor of two (although it may or may not be used).  If reciprocal is F, the interaction is not guaranteed to be reciprocal and each interaction will be computed independently.  The built-in interaction formulas are all reciprocal, but if you implement an interaction() callback, you must consider whether the callback you have implemented preserves reciprocality or not.  For this reason, the default is reciprocal=F, so that bugs are not inadvertently introduced by an invalid assumption of reciprocality.  See below for a note regarding reciprocality in sexual simulations when using the sexSegregation flag.

The maxDistance parameter supplies the maximum distance over which interactions of this type will be evaluated; at greater distances, the interaction strength is considered to be zero (for efficiency).  The default value of maxDistance, INF (positive infinity), indicates that there is no maximum interaction distance; note that this can make some interaction queries much less efficient, and is therefore not recommended.  In SLiM 3.1 and later, a warning will be issued if a spatial interaction type is defined with no maximum distance to encourage a maximum distance to be defined.

The sexSegregation parameter governs the applicability of the interaction to each sex, in sexual simulations.  It does not affect distance calculations in any way; it only modifies the way in which interaction strengths are calculated.  The default, "**", implies that the interaction is felt by both sexes (the first character of the string value) and is exerted by both sexes (the second character of the string value).  Either or both characters may be M or F instead; for example, "MM" would indicate a male-male interaction, such as male-male competition, whereas "FM" would indicate an interaction influencing only female receivers that is influenced only by male exerters, such as male mating displays that influence female attraction.  This parameter may be set only to "**" unless sex has been enabled with initializeSex().  Note that a value of sexSegregation other than "**" may imply some degree of non-reciprocality, but it is not necessary to specify reciprocal to be F for this reason; SLiM will take the sex-segregation of the interaction into account for you.  The value of reciprocal may therefore be interpreted as meaning: in those cases, if any, in which A interacts with B and B interacts with A, is the interaction strength guaranteed to be the same in both directions?  The sexSegregation parameter is shorthand for setting sex constraints on the interaction type using the setConstraints() method; see that method for a more extensive set of constraints that may be used.
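
The two-character sexSegregation code can be read as a receiver constraint followed by an exerter constraint.  The Python sketch below (illustrative only; interaction_applies() is a hypothetical name) restates that reading.

```python
def interaction_applies(sex_segregation, receiver_sex, exerter_sex):
    """First character constrains the receiver, second the exerter; "*" matches either sex."""
    receiver_code, exerter_code = sex_segregation
    return receiver_code in ("*", receiver_sex) and exerter_code in ("*", exerter_sex)
```

For example, with "FM" the interaction is felt by a female receiver from a male exerter, but not by a male receiver from a female exerter.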

By default, the interaction strength is 1.0 for all interactions within maxDistance.  Often it is desirable to change the interaction function using setInteractionFunction(); modifying interaction strengths can also be achieved with interaction() callbacks if necessary.  In any case, interactions beyond maxDistance always have a strength of 0.0, and the interaction strength of an individual with itself is always 0.0, regardless of the interaction function or callbacks.

The global symbol for the new interaction type is immediately available; the return value also provides the new object.  Note that in multispecies models, initializeInteractionType() must be called from a non-species-specific initialize() callback (declared as species all initialize()), since interactions are managed at the community level.

(void)initializeMutationRate(numeric rates, [Ni ends = NULL], [string$ sex = "*"])

Set the mutation rate per base position per gamete.  To be precise, this mutation rate is the expected mean number of mutations that will occur per base position per gamete; note that this is different from how the recombination rate is defined (see initializeRecombinationRate()).  The number of mutations that actually occurs at a given base position when generating an offspring genome is, in effect, drawn from a Poisson distribution with that expected mean (but under the hood SLiM uses a mathematically equivalent but much more efficient strategy).  It is possible for this Poisson draw to indicate that two or more new mutations have arisen at the same base position, particularly when the mutation rate is very high; in this case, the new mutations will be added to the site one at a time, and as always the mutation stacking policy will be followed.
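
To give a sense of how rarely a Poisson draw yields two or more mutations at one base position, the Python sketch below evaluates that probability for a given per-base rate.  This is a standard property of the Poisson distribution, not SLiM code; prob_two_or_more() is a hypothetical name.

```python
import math


def prob_two_or_more(rate):
    """P(X >= 2) for X ~ Poisson(rate), that is, 1 - P(X = 0) - P(X = 1)."""
    # -expm1(-rate) is 1 - exp(-rate), computed accurately for tiny rates.
    return -math.expm1(-rate) - rate * math.exp(-rate)


typical = prob_two_or_more(1e-7)   # roughly rate**2 / 2 for small rates
very_high = prob_two_or_more(0.5)  # about 0.09
```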

There are two ways to call this function.  If the optional ends parameter is NULL (the default), then rates must be a singleton value that specifies a single mutation rate to be used along the entire chromosome.  If, on the other hand, ends is supplied, then rates and ends must be the same length, and the values in ends must be specified in ascending order.  In that case, rates and ends taken together specify the mutation rates to be used along successive contiguous stretches of the chromosome, from beginning to end; the last position specified in ends should extend to the end of the chromosome (i.e. at least to the end of the last genomic element, if not further).

For example, if the following call is made:

initializeMutationRate(c(1e-7, 2.5e-8), c(5000, 9999));

then the result is that the mutation rate for bases 0...5000 (inclusive) will be 1e-7, and the rate for bases 5001...9999 (inclusive) will be 2.5e-8.

Note that mutations are generated by SLiM only within genomic elements, regardless of the mutation rate map.  In effect, the mutation rate map given is intersected with the coverage area of the genomic elements defined; areas outside of any genomic element are given a mutation rate of zero.  There is no harm in supplying a mutation rate map that specifies rates for areas outside of the genomic elements defined; that rate information is simply not used.  The overallMutationRate family of properties on Chromosome provide the overall mutation rate after genomic element coverage has been taken into account, so it will reflect the rate at which new mutations will actually be generated in the simulation as configured.
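
The intersection of a mutation rate map with genomic element coverage can be sketched as follows.  This Python example is an illustration under stated assumptions, not SLiM's implementation: genomic elements are given as non-overlapping (start, end) pairs with inclusive ends, and expected_mutations_per_gamete() is a hypothetical name for the expected number of new mutations per gamete.

```python
def expected_mutations_per_gamete(rates, ends, elements):
    """Sum rate * covered bases over each rate region, counting only bases inside elements."""
    total = 0.0
    region_start = 0
    for rate, region_end in zip(rates, ends):
        for start, end in elements:
            # Number of bases shared by this element and this rate region.
            overlap = min(end, region_end) - max(start, region_start) + 1
            if overlap > 0:
                total += rate * overlap
        region_start = region_end + 1
    return total


rates = [1e-7, 2.5e-8]
ends = [5000, 9999]
elements = [(1000, 1999), (6000, 6999)]  # two elements of 1000 bases each
overall = expected_mutations_per_gamete(rates, ends, elements)
```

Here only 2000 of the 10000 bases lie within genomic elements, so the result is 1000 × 1e-7 + 1000 × 2.5e-8; the rates specified for the uncovered bases contribute nothing.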

If the optional sex parameter is "*" (the default), then the supplied mutation rate map will be used for both sexes (which is the only option for hermaphroditic simulations).  In sexual simulations sex may be "M" or "F" instead, in which case the supplied mutation rate map is used only for that sex (i.e., when generating a gamete from a parent of that sex).  In this case, two calls must be made to initializeMutationRate(), one for each sex, even if a rate of zero is desired for the other sex; no default mutation rate map is supplied.

In nucleotide-based models, initializeMutationRate() may not be called.  Instead, the desired sequence-based mutation rate(s) should be expressed in the mutationMatrix parameter to initializeGenomicElementType().  If variation in the mutation rate along the chromosome is desired, initializeHotspotMap() should be used.

(object<MutationType>$)initializeMutationType(is$ id, numeric$ dominanceCoeff, string$ distributionType, ...)

Add a mutation type at initialization time.  The id must not already be used for any mutation type in the simulation.  The id parameter may be either an integer giving the ID of the new mutation type, or a string giving the name of the new mutation type (such as "m5" to specify an ID of 5).  The dominanceCoeff parameter supplies the dominance coefficient for the mutation type; 0.0 produces no dominance, 1.0 complete dominance, and values greater than 1.0, overdominance.  The distributionType may be "f", in which case the ellipsis ... should supply a numeric$ fixed selection coefficient; "e", in which case the ellipsis should supply a numeric$ mean selection coefficient for an exponential distribution; "g", in which case the ellipsis should supply a numeric$ mean selection coefficient and a numeric$ alpha shape parameter for a gamma distribution; "n", in which case the ellipsis should supply a numeric$ mean selection coefficient and a numeric$ sigma (standard deviation) parameter for a normal distribution; "p", in which case the ellipsis should supply a numeric$ mean selection coefficient and a numeric$ scale parameter for a Laplace distribution; "w", in which case the ellipsis should supply a numeric$ λ scale parameter and a numeric$ k shape parameter for a Weibull distribution; or "s", in which case the ellipsis should supply a string$ Eidos script parameter.  The global symbol for the new mutation type is immediately available; the return value also provides the new object.
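
The parameters expected after each distributionType listed above can be summarized as a lookup table.  The Python sketch below is only a restatement of that list for reference; the table DFE_PARAMETERS and the function describe_dfe() are hypothetical names, not part of SLiM.

```python
DFE_PARAMETERS = {
    "f": ("fixed selection coefficient",),
    "e": ("mean selection coefficient",),
    "g": ("mean selection coefficient", "alpha shape"),
    "n": ("mean selection coefficient", "sigma"),
    "p": ("mean selection coefficient", "scale"),
    "w": ("lambda scale", "k shape"),
    "s": ("Eidos script",),
}


def describe_dfe(distribution_type, *parameters):
    """Pair the ellipsis arguments with their documented meanings, checking the count."""
    expected = DFE_PARAMETERS.get(distribution_type)
    if expected is None:
        raise ValueError(f"unknown distributionType {distribution_type!r}")
    if len(parameters) != len(expected):
        raise ValueError(
            f"distributionType {distribution_type!r} expects {len(expected)} parameter(s)"
        )
    return dict(zip(expected, parameters))


gamma_dfe = describe_dfe("g", -0.03, 0.2)
```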

Note that by default in WF models, all mutations of a given mutation type will be converted into Substitution objects when they reach fixation, for efficiency reasons.  If you need to disable this conversion, to keep mutations of a given type active in the simulation even after they have fixed, you can do so by setting the convertToSubstitution property of MutationType to F.  In contrast, by default in nonWF models mutations will not be converted into Substitution objects when they reach fixation; convertToSubstitution is F by default in nonWF models.  To enable conversion in nonWF models for neutral mutation types with no indirect fitness effects, you should therefore set convertToSubstitution to T.

(object<MutationType>$)initializeMutationTypeNuc(is$ id, numeric$ dominanceCoeff, string$ distributionType, ...)

Add a nucleotide-based mutation type at initialization time.  This function is identical to initializeMutationType() except that the new mutation type will be nucleotide-based – in other words, mutations belonging to the new mutation type will have an associated nucleotide.  This function may be called only in nucleotide-based models (as enabled by the nucleotideBased parameter to initializeSLiMOptions()).

Nucleotide-based mutations always use a mutationStackGroup of -1 and a mutationStackPolicy of "l".  This ensures that a new nucleotide mutation always replaces any previously existing nucleotide mutation at a given position, regardless of the mutation types of the nucleotide mutations.  These values are set automatically by initializeMutationTypeNuc(), and may not be changed.

See the documentation for initializeMutationType() for all other discussion.

(void)initializeRecombinationRate(numeric rates, [Ni ends = NULL], [string$ sex = "*"])

Set the recombination rate per base position per gamete.  To be precise, this recombination rate is the probability that a breakpoint will occur between one base and the next base; note that this is different from how the mutation rate is defined (see initializeMutationRate()).  All rates must be in the interval [0.0, 0.5].  A rate of 0.5 implies complete independence between the adjacent bases, which might be used to implement independent assortment of loci located on different chromosomes (see the example below).  Whether a breakpoint occurs between two bases is then, in effect, determined by a binomial draw with a single trial and the given rate as probability (but under the hood SLiM uses a mathematically equivalent but much more efficient strategy).  The recombinational process in SLiM will never generate more than one crossover between one base and the next (in one generation/genome), and a supplied rate of 0.5 will therefore result in an actual probability of 0.5 for a crossover at the relevant position.  (Note that this was not true in SLiM 2.x and earlier, however; their implementation of recombination resulted in a crossover probability of about 39.3% for a rate of 0.5, due to the use of an inaccurate approximation method.  Recombination rates lower than about 0.01 would have been essentially exact, since the approximation error became large only as the rate approached 0.5.)
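
The 39.3% figure quoted above coincides with 1 − exp(−0.5), the probability of at least one event when the rate is treated as the mean of a Poisson draw.  Whether that is exactly the approximation used by SLiM 2.x is an assumption of the Python sketch below, which simply compares that quantity with the rate itself; poisson_at_least_one() is a hypothetical name.

```python
import math


def poisson_at_least_one(rate):
    """P(X >= 1) for X ~ Poisson(rate), that is, 1 - exp(-rate)."""
    return -math.expm1(-rate)


at_half = poisson_at_least_one(0.5)  # about 0.393, versus the intended 0.5
relative_error_small = (0.01 - poisson_at_least_one(0.01)) / 0.01  # about half a percent
```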

There are two ways to call this function.  If the optional ends parameter is NULL (the default), then rates must be a singleton value that specifies a single recombination rate to be used along the entire chromosome.  If, on the other hand, ends is supplied, then rates and ends must be the same length, and the values in ends must be specified in ascending order.  In that case, rates and ends taken together specify the recombination rates to be used along successive contiguous stretches of the chromosome, from beginning to end; the last position specified in ends should extend to the end of the chromosome (i.e. at least to the end of the last genomic element, if not further).  Note that a recombination rate of 1 centimorgan/Mbp corresponds to a recombination rate of 1e-8 in the units used by SLiM.
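
The unit conversion mentioned above follows from 1 centimorgan corresponding to a recombination probability of 0.01, spread over one million base pairs.  A minimal Python sketch (the function name cm_per_mbp_to_rate() is hypothetical):

```python
def cm_per_mbp_to_rate(cm_per_mbp):
    """Convert centimorgans per megabase to a per-base recombination probability."""
    probability_per_centimorgan = 0.01
    bases_per_megabase = 1_000_000
    return cm_per_mbp * probability_per_centimorgan / bases_per_megabase


one_cm = cm_per_mbp_to_rate(1.0)  # corresponds to 1e-8 in SLiM's units
```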

For example, if the following call is made:

initializeRecombinationRate(c(0, 0.5, 0), c(5000, 5001, 9999));

then the result is that the recombination rates between bases 0 / 1, 1 / 2, ..., 4999 / 5000 will be 0, the rate between bases 5000 / 5001 will be 0.5, and the rate between bases 5001 / 5002 onward (up to 9998 / 9999) will again be 0.  Setting the recombination rate between one specific pair of bases to 0.5 forces recombination to occur with a probability of 0.5 between those bases, which effectively breaks the simulated locus into separate chromosomes at that point; this example effectively has one simulated chromosome from base position 0 to 5000, and another from 5001 to 9999.
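
Because recombination rates apply between adjacent bases, a lookup into the map is naturally keyed by the right-hand base of each pair.  The Python sketch below is an illustration consistent with the example above, not SLiM's implementation; breakpoint_rate() is a hypothetical name.

```python
from bisect import bisect_left


def breakpoint_rate(right_base, rates, ends):
    """Rate of a breakpoint between bases right_base - 1 and right_base."""
    if right_base < 1:
        raise ValueError("there is no base to the left of position 0")
    index = bisect_left(ends, right_base)  # first region whose end is >= right_base
    if index == len(ends):
        raise ValueError("position lies beyond the last end in the map")
    return rates[index]


rates = [0, 0.5, 0]
ends = [5000, 5001, 9999]
```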

If the optional sex parameter is "*" (the default), then the supplied recombination rate map will be used for both sexes (which is the only option for hermaphroditic simulations).  In sexual simulations sex may be "M" or "F" instead, in which case the supplied recombination map is used only for that sex.  In this case, two calls must be made to initializeRecombinationRate(), one for each sex, even if a rate of zero is desired for the other sex; no default recombination map is supplied.

(void)initializeSex(string$ chromosomeType)

Enable and configure sex in the simulation.  The argument chromosomeType gives the type of chromosome to be simulated; this should be "A", "X", or "Y".  Calling this function has the side effect of enabling sex in the simulation; individuals will be male and female (rather than hermaphroditic) regardless of the chromosomeType chosen for simulation.  There is no way to disable sex once it has been enabled; if you don’t want to have sex, don’t call this function.

The xDominanceCoeff parameter has been deprecated and removed.  In SLiM 3.7 and later, use the haploidDominanceCoeff property of MutationType instead.  In earlier versions, if the chromosomeType was "X", the optional xDominanceCoeff parameter could supply the dominance coefficient used when a mutation is present in an XY male, and is thus “heterozygous” (but in a different sense than the heterozygosity of an XX female with one copy of the mutation).

(void)initializeSLiMModelType(string$ modelType)

Configure the type of SLiM model used for the simulation.  At present, one of two model types may be selected.  If modelType is "WF", SLiM will use a Wright-Fisher (WF) model; this is the model type that has always been supported by SLiM, and is the model type used if initializeSLiMModelType() is not called.  If modelType is "nonWF", SLiM will use a non-Wright-Fisher (nonWF) model instead; this is a new model type supported by SLiM 3.0 and above.

If initializeSLiMModelType() is called at all then it must be called before any other initialization function, so that SLiM knows from the outset which features are enabled and which are not.

(void)initializeSLiMOptions([logical$ keepPedigrees = F], [string$ dimensionality = ""], [string$ periodicity = ""], [integer$ mutationRuns = 0], [logical$ preventIncidentalSelfing = F], [logical$ nucleotideBased = F], [logical$ randomizeCallbacks = T])

Configure options for the simulation.  If initializeSLiMOptions() is called at all then it must be called before any other initialization function (except initializeSLiMModelType()), so that SLiM knows from the outset which optional features are enabled and which are not.

If keepPedigrees is T, SLiM will keep pedigree information for every individual in the simulation, tracking the identity of its parents and grandparents.  This allows individuals to assess their degree of pedigree-based relatedness to other individuals (see Individual’s relatedness() and sharedParentCount() methods), as well as allowing a model to find “trios” (two parents and an offspring they generated) using the pedigree properties of Individual.  As a side effect of keepPedigrees being T, the pedigreeID, pedigreeParentIDs, and pedigreeGrandparentIDs properties of Individual will have defined values, as will the genomePedigreeID property of Genome.  Note that pedigree-based relatedness doesn’t necessarily correspond to genetic relatedness, due to effects such as assortment and recombination.  Beginning in SLiM 3.5, keepPedigrees=T also enables tracking of individual reproductive output, available through the reproductiveOutput property of Individual and the lifetimeReproductiveOutput property of Subpopulation.

If dimensionality is not "", SLiM will enable its optional “continuous space” facility.  Three values for dimensionality are presently supported: "x", "xy", and "xyz", specifying that continuous space should be enabled for one, two, or three dimensions, respectively, using (x), (x, y), and (x, y, z) coordinates respectively.  This has a number of side effects.  First of all, it means that the specified properties of Individual (x, y, and/or z) will be interpreted by SLiM as spatial positions; in particular, SLiMgui will use those properties to display subpopulations spatially.  Second, it allows spatial interactions to be defined, evaluated, and queried using initializeInteractionType() and interaction() callbacks.  And third, it enables the use of any other properties and methods related to continuous space, such as setting the spatial boundaries of subpopulations, which would otherwise raise an error.

If periodicity is not "", SLiM will designate the specified spatial dimensions as being periodic – wrapping around at the edges of the spatial boundaries of that dimension.  This option may only be used if the dimensionality parameter to initializeSLiMOptions() has been used to enable spatiality in the model, and only spatial dimensions that were specified in the dimensionality of the model may be declared to be periodic (but if desired, it is permissible to make just a subset of those dimensions periodic; it is not an all-or-none proposition).  For example, if the specified dimensionality is "xy", the model’s periodicity may be "x", "y", or "xy" (or "", the default, to specify that there are no periodic dimensions).  A one-dimensional periodic model would model a space like the perimeter of a circle.  A two-dimensional model periodic in one of those dimensions would model a space like a cylinder without its end caps; if periodic in both dimensions, the modeled space is a torus.  The shapes of three-dimensional periodic models are harder to visualize, but are essentially higher-dimensional analogues of these concepts.  Periodic boundary conditions are commonly used to model spatial scenarios without “edge effects”, since there are no edges in the periodic spatial dimensions.  The pointPeriodic() method of Subpopulation is typically used in conjunction with this option, to actually implement the periodic boundary condition for the specified dimensions.
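
The effect of periodic boundaries on a position can be illustrated by wrapping each periodic coordinate back into its bounds with a modulo operation.  The Python sketch below is a conceptual illustration only and does not describe how pointPeriodic() is implemented; the function wrap_point() and the bounds dictionary are hypothetical.

```python
def wrap_point(point, bounds, dimensionality, periodicity):
    """Wrap the coordinates of periodic dimensions into [lower, upper); leave others unchanged."""
    wrapped = []
    for axis, value in zip(dimensionality, point):
        lower, upper = bounds[axis]
        if axis in periodicity:
            value = lower + (value - lower) % (upper - lower)
        wrapped.append(value)
    return tuple(wrapped)


bounds = {"x": (0.0, 10.0), "y": (0.0, 10.0)}
# Periodic in x only: x wraps around, while y is left as it is.
cylinder_point = wrap_point((10.5, -0.25), bounds, "xy", "x")
# Periodic in both dimensions (a torus): both coordinates wrap.
torus_point = wrap_point((10.5, -0.25), bounds, "xy", "xy")
```

In the first case the y coordinate remains out of bounds, so a non-periodic dimension still needs some other boundary condition.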

If mutationRuns is not 0, SLiM will use the value given as the number of mutation runs inside Genome objects; if it is 0 (the default), SLiM will calculate a number of mutation runs that it estimates will work well.  Internally, SLiM divides genomes into a sequence of consecutive mutation runs, allowing more efficient internal computations.  The optimal mutation run length is short enough that each mutation run is relatively unlikely to be modified by mutation/recombination events when inherited, but long enough that each mutation run is likely to contain a relatively large number of mutations; these priorities are in tension, so an intermediate balance between them is generally desirable.  The optimal number of mutation runs will depend upon the machine and even the compiler used to build SLiM, so SLiM’s default value may not be optimal; for maximal performance it can thus be beneficial to experiment with different values and find the optimal value for the simulation.  Specifying the number of mutation runs is an advanced technique, but in certain cases it can improve performance significantly; in particular, if a simulation involves a very long chromosome but only a small portion of that chromosome is actually used by the simulation, it may be beneficial to specify that a single mutation run be used with mutationRuns=1.

If preventIncidentalSelfing is T, incidental selfing in hermaphroditic models will be prevented by SLiM.  By default (i.e., if preventIncidentalSelfing is F), SLiM chooses the first and second parents in a biparental mating event independently.  It is therefore possible for the same individual to be chosen as both the first and second parent, resulting in selfing events even when the selfing rate is zero.  In many models this is unimportant, since it happens fairly infrequently and does not have large consequences.  This behavior is SLiM’s default because it is the simplest option, and produces results that most closely align with simple analytical population genetics models.  However, in some models this selfing can be undesirable and problematic.  In particular, models that involve very high variance in fitness or very small effective population sizes may see elevated rates of selfing that substantially influence model results.  If preventIncidentalSelfing is set to T, all such incidental selfing will be prevented (by choosing a new second parent if the first parent was chosen again).  Non-incidental selfing, as requested by the selfing rate, will still be permitted.  Note that if incidental selfing is prevented, SLiM will hang if it is unable to find a different second parent; there must always be at least two individuals in the population with non-zero fitness, and mateChoice() and modifyChild() callbacks must not absolutely prevent those two individuals from producing viable offspring.  Enforcement of the prohibition on incidental selfing will occur after mateChoice() callbacks have been called (and thus the default mating weights provided to mateChoice() callbacks will not exclude the first parent!), but will occur before modifyChild() callbacks are called (so those callbacks may assume that the first and second parents are distinct).
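The redraw logic described above can be sketched in Python as follows. This is only an illustration of the idea, not SLiM's mating code: choose_parents() is a hypothetical name, fitness-proportional choice via weights is a simplification, and the sketch raises an error where SLiM would hang.

```python
import random


def choose_parents(weights, prevent_incidental_selfing, rng):
    """Draw two parent indices independently, optionally redrawing on a self-match."""
    indices = range(len(weights))
    first = rng.choices(indices, weights=weights)[0]
    second = rng.choices(indices, weights=weights)[0]
    if prevent_incidental_selfing:
        # Unlike SLiM, which would hang, this sketch refuses to loop forever.
        if sum(1 for w in weights if w > 0) < 2:
            raise ValueError("need at least two individuals with non-zero weight")
        while second == first:
            # The first parent was chosen again: draw a new second parent.
            second = rng.choices(indices, weights=weights)[0]
    return first, second
```

With the option off, the pair (i, i) can occur with probability proportional to the squared weight of individual i, which is why small populations or highly skewed fitness distributions see more incidental selfing.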

If nucleotideBased is T, the model will be nucleotide-based.  In this case, auto-generated mutations (i.e., mutation types used by genomic element types) must be nucleotide-based, and an ancestral nucleotide sequence must be supplied with initializeAncestralNucleotides().  Non-nucleotide-based mutations may still be used, but may not be referenced by genomic element types.  A mutation rate (or rate map) may not be supplied with initializeMutationRate(); instead, a hotspot map may (optionally) be supplied with initializeHotspotMap().  This choice has many consequences across SLiM. 

If randomizeCallbacks is T (the default), the order in which individuals are processed in callbacks will be randomized to make it easier to avoid order-dependency bugs.  This flag exists because the order of individuals in each subpopulation is non-random; most notably, females always come before males in the individuals vector, but non-random ordering may also occur with respect to things like migrant versus non-migrant status, origin by selfing versus cloning versus biparental mating, and other factors.  When this option is F, individuals in a subpopulation are processed in the order of the individuals vector in each tick cycle stage, which may lead to order-dependency issues if there is an enabled callback whose behavior is not fully independent between calls.  Setting this option to T will cause individuals within each subpopulation to be processed in a randomized order in each tick cycle stage; specifically, this randomizes the order of calls to mutationEffect() callbacks in both WF and nonWF models, and calls to reproduction() and survival() callbacks in nonWF models.  Each subpopulation is still processed separately, in sequential order, so order-dependency issues between subpopulations are still possible if callbacks have effects that are not fully independent.  This feature was added in SLiM 4, breaking backward compatibility; to recover the behavior of previous versions of SLiM, pass F for this option (but then be very careful about order-dependency issues in your script).  The default of T is the safe option, but a small speed penalty is incurred by the randomization of the processing order – for most models the difference will be less than 1%, but in the worst case it may approach 10%.  Models that do not have any order-dependency issue may therefore run somewhat faster if this is set to F.  
Note that anywhere that your script uses the individuals property of Subpopulation, the order of individuals returned will be non-random (regardless of the setting of this option); you should use sample() to shuffle the order of the individuals vector if necessary to avoid order-dependency issues in your script.

This function will likely be extended with further options in the future, added on to the end of the argument list.  Using named arguments with this call is recommended for readability.  Note that turning on optional features may increase the runtime and memory footprint of SLiM.

(void)initializeSpecies([integer$ tickModulo = 1], [integer$ tickPhase = 1], [string$ avatar = ""], [string$ color = ""])

Configure options for the species being initialized.  This initialization function may only be called in multispecies models (i.e., models with explicit species declarations); in single-species models, the default values are assumed and cannot be changed.

The tickModulo and tickPhase parameters determine the activation schedule for the species.  The active property of the species will be set to T (thus activating the species) every tickModulo ticks, beginning in tick tickPhase.  (However, when the species is activated in a given tick, the skipTick() method may still be called in a first() event to deactivate it.)  See the active property of Species for more details.
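One reading of this schedule, written as a small Python sketch: the species is active in tick tickPhase and every tickModulo ticks thereafter. The function is_active() is hypothetical, and the formula is an interpretation of the sentence above rather than SLiM's own code; it ignores any later call to skipTick().

```python
def is_active(tick, tick_modulo=1, tick_phase=1):
    """Return True if a species with this schedule would be activated in tick."""
    return tick >= tick_phase and (tick - tick_phase) % tick_modulo == 0


# Ticks 1..14 in which a species with tickModulo=5, tickPhase=3 is activated.
active_ticks = [t for t in range(1, 15) if is_active(t, 5, 3)]
```

Under this reading, the defaults (tickModulo = 1, tickPhase = 1) activate the species in every tick.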

The avatar parameter, if not "", sets a string value used to represent the species graphically, particularly in SLiMgui but perhaps in other contexts also.  The avatar should generally be a single character – usually an emoji corresponding to the species, such as "🦊" for foxes or "🐭" for mice.  If avatar is the empty string, "", SLiMgui will choose a default avatar.

The color parameter, if not "", sets a string color value used to represent the species in SLiMgui.  Colors may be specified by name, or with hexadecimal RGB values of the form "#RRGGBB" (see the Eidos manual for details).  If color is the empty string, "", SLiMgui will choose a default color.

(void)initializeTreeSeq([logical$ recordMutations = T], [Nif$ simplificationRatio = NULL], [Ni$ simplificationInterval = NULL], [logical$ checkCoalescence = F], [logical$ runCrosschecks = F], [logical$ retainCoalescentOnly = T], [Ns$ timeUnit = NULL])

Configure options for tree sequence recording.  Calling this function turns on tree sequence recording, as a side effect, for later reconstruction of the simulation’s evolutionary dynamics; if you do not want tree sequence recording to be enabled, do not call this function. Note that tree-sequence recording internally uses SLiM’s “pedigree tracking” feature to uniquely identify individuals and genomes; however, if you want to use pedigree tracking in your script you must still enable it yourself with initializeSLiMOptions(keepPedigrees=T).

The recordMutations flag controls whether information about individual mutations is recorded or not.  Such recording takes time and memory, and so can be turned off if only the tree sequence itself is needed, but it is turned on by default since mutation recording is generally useful.

The simplificationRatio and simplificationInterval parameters control how often automatic simplification of the recorded tree sequence occurs.  This is a speed–memory tradeoff: more frequent simplification (lower simplificationRatio or smaller simplificationInterval) means the stored tree sequences will use less memory, but at a cost of somewhat longer run times.  Conversely, a larger simplificationRatio or simplificationInterval means that SLiM will wait longer between simplifications.  There are three ways these parameters can be used.  With the first option, with a non-NULL simplificationRatio and a NULL value for simplificationInterval, SLiM will try to find an optimal tick interval for simplification such that the ratio of the memory used by the tree sequence tables, (before:after) simplification, is close to the requested ratio. The default of 10 (used if both simplificationRatio and simplificationInterval are NULL) thus requests that SLiM try to find a tick interval such that the maximum size of the stored tree sequences is ten times the size after simplification. INF may be supplied to indicate that automatic simplification should never occur; 0 may be supplied to indicate that automatic simplification should be performed at the end of every tick.  Alternatively – the second option – simplificationRatio may be NULL and simplificationInterval may be set to the interval, in ticks, between simplifications.  This may provide more reliable performance, but the interval must be chosen carefully to avoid exceeding the available memory.  The simplificationInterval value may be a very large number to specify that simplification should never occur (not INF, though, since it is an integer value), or 1 to simplify every tick.  
Finally – the third option – both parameters may be non-NULL, in which case simplificationRatio is used as described above, while simplificationInterval provides the initial interval first used by SLiM (and then subsequently increased or decreased to try to match the requested simplification ratio).  The default initial interval, used when simplificationInterval is NULL, is usually 20; this is chosen to be relatively frequent, and thus unlikely to lead to a memory overflow, but it can result in rather slow spool-up for models where the equilibrium simplification interval, as determined by the simplification ratio, is much longer.  It can therefore be helpful to set a larger initial interval so that the early part of the model run is not excessively bogged down in simplification.

The checkCoalescence parameter controls whether a check for full coalescence is conducted after each simplification.  If a model will call treeSeqCoalesced() to check for coalescence during its execution, checkCoalescence should be set to T.  Since the coalescence checks entail a performance penalty, the default of F is preferable otherwise.  See the documentation for treeSeqCoalesced() for further discussion.

The runCrosschecks parameter controls whether cross-checks between SLiM’s internal data structures and the tree-sequence recording data structures will be conducted.  These two sets of data structures record much the same thing (mutations in genomes), but using completely different representations, so such cross-checks can be useful to confirm that the two data structures do indeed represent the same conceptual state.  This slows down the model considerably, however, and would normally be turned on only for debugging purposes, so it is turned off by default.

The retainCoalescentOnly parameter controls how, exactly, simplification of the tree-sequence data is performed in SLiM (both for auto-simplification and for calls to treeSeqSimplify()).  More specifically, this parameter controls the behavior of simplification for individuals and genomes that have been “retained” by calling treeSeqRememberIndividuals() with the parameter permanent=F.  The default of retainCoalescentOnly=T helps to keep the number of retained individuals relatively small, which is helpful if your simulation regularly flags many individuals for retaining.  In this case, changing retainCoalescentOnly to F may dramatically increase memory usage and runtime, in a similar way to permanently remembering all the individuals.  See the documentation of treeSeqRememberIndividuals() for further discussion.

The timeUnit parameter controls the time unit stated in the tree sequence when it is saved (which can be accessed through tskit APIs); it has no effect on the running simulation whatsoever.  The default value, NULL, means that a time unit of "ticks" will be used for all model types.  (In SLiM 3.7 / 3.7.1, NULL implied a time unit of "generations" for WF models, but "ticks" for nonWF models; given the new multispecies timescale parameters in SLiM 4, a default of "ticks" makes sense in all cases since now even in WF models one tick might not equal one biological generation.)  It may be helpful to set timeUnit to "generations" explicitly when modeling non-overlapping generations in which one tick equals one generation, to tell tskit that the time unit does in fact represent biological generations; doing so may avoid warnings from tskit or msprime regarding the time unit, in cases such as recapitation where the simulation timescale is important.

3.2.  Nucleotide utilities

(is)codonsToAminoAcids(integer codons, [li$ long = F], [logical$ paste = T])

Returns the amino acid sequence corresponding to the codon sequence in codons.  Codons should be represented with values in [0, 63] where AAA is 0, AAC is 1, AAG is 2, and TTT is 63; see ancestralNucleotides() for discussion of this encoding.  If long is F (the default), the standard single-letter codes for amino acids will be used (where Serine is "S", etc.); if long is T, the standard three-letter codes will be used instead (where Serine is "Ser", etc.).  Beginning in SLiM 3.5, if long is 0, integer codes will be used as follows (and paste will be ignored):

stop (TAA, TAG, TGA) 0
Alanine 1
Arginine 2
Asparagine 3
Aspartic acid (Aspartate) 4
Cysteine 5
Glutamine 6
Glutamic acid (Glutamate) 7
Glycine 8
Histidine 9
Isoleucine 10
Leucine 11
Lysine 12
Methionine 13
Phenylalanine 14
Proline 15
Serine 16
Threonine 17
Tryptophan 18
Tyrosine 19
Valine 20

There does not seem to be a widely used standard for integer coding of amino acids, so SLiM just numbers them alphabetically, making stop codons 0.  If you want a different coding, you can make your own 64-element vector and use it to convert codons to whatever integer codes you need.  Other integer values of long are reserved for future use (to support other codings), and will currently produce an error.

When long is T or F and paste is T (the default), the amino acid sequence returned will be a singleton string, such as "LYATI" (when long is F) or "Leu-Tyr-Ala-Thr-Ile" (when long is T).  When long is T or F and paste is F, the amino acid sequence will instead be returned as a string vector, with one element per amino acid, such as "L" "Y" "A" "T" "I" (when long is F) or "Leu" "Tyr" "Ala" "Thr" "Ile" (when long is T).  Using the paste=T option is considerably faster than using paste() in script.

This function interprets the supplied codon sequence as the sense strand (i.e., the strand that is not transcribed, and which mirrors the mRNA’s sequence).  This uses the standard DNA codon table directly.  For example, if the nucleotide sequence is CAA TTC, that will correspond to a codon vector of 16 61, and will result in the amino acid sequence Gln-Phe ("QF").
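The translation can be mimicked outside SLiM with a 64-character lookup string indexed by codon value. The Python sketch below covers only the single-letter case (long=F); the name codons_to_amino_acids() and the use of "*" for stop codons are choices made for this sketch, not necessarily what SLiM emits.

```python
# Standard DNA codon table, ordered by codon value 16X + 4Y + Z with A=0, C=1, G=2, T=3
# (so index 0 is AAA, index 1 is AAC, ..., index 63 is TTT).  "*" marks stop codons.
AMINO_ACIDS = "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSS*CWCLFLF"


def codons_to_amino_acids(codons, paste=True):
    """Translate integer codons in [0, 63] to single-letter amino acid codes."""
    letters = []
    for codon in codons:
        if not 0 <= codon <= 63:
            raise ValueError("codon values must be in [0, 63]")
        letters.append(AMINO_ACIDS[codon])
    # paste=True returns one string; paste=False returns one element per amino acid.
    return "".join(letters) if paste else letters
```

For the example in the text, the codon vector 16 61 (CAA TTC) translates to "QF".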

(is)codonsToNucleotides(integer codons, [string$ format = "string"])

Returns the nucleotide sequence corresponding to the codon sequence supplied in codons.  Codons should be represented with values in [0, 63] where AAA is 0, AAC is 1, AAG is 2, and TTT is 63; see ancestralNucleotides() for discussion of this encoding.

The format parameter controls the format of the returned sequence.  It may be "string" to obtain the sequence as a singleton string (e.g., "TATACG"), "char" to obtain it as a string vector of single characters (e.g., "T", "A", "T", "A", "C", "G"), or "integer" to obtain it as an integer vector (e.g., 3, 0, 3, 0, 1, 2), using SLiM’s standard code of A=0, C=1, G=2, T=3.
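The decoding is simply the inverse of the codon encoding. A minimal Python sketch follows, with the hypothetical name codons_to_nucleotides() and a fmt parameter standing in for format.

```python
def codons_to_nucleotides(codons, fmt="string"):
    """Expand integer codons in [0, 63] into nucleotides (A=0, C=1, G=2, T=3)."""
    values = []
    for codon in codons:
        if not 0 <= codon <= 63:
            raise ValueError("codon values must be in [0, 63]")
        # A codon value is 16X + 4Y + Z, so X, Y, Z are its base-4 digits.
        values.extend([codon // 16, (codon // 4) % 4, codon % 4])
    if fmt == "integer":
        return values
    letters = ["ACGT"[v] for v in values]
    if fmt == "char":
        return letters
    if fmt == "string":
        return "".join(letters)
    raise ValueError("fmt must be 'string', 'char', or 'integer'")
```

Here the codon vector 51 6 decodes to TAT ACG, matching the "TATACG" example above.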

(float)mm16To256(float mutationMatrix16)

Returns a 64×4 mutation matrix that is functionally identical to the supplied 4×4 mutation matrix in mutationMatrix16.  The mutation rate for each of the 64 trinucleotides will depend only upon the central nucleotide of the trinucleotide, and will be taken from the corresponding entry for the same nucleotide in mutationMatrix16.  This function can be used to easily construct a simple trinucleotide-based mutation matrix which can then be modified so that specific trinucleotides sustain a mutation rate that does not depend only upon their central nucleotide.

See the documentation for initializeGenomicElementType() for further discussion of how these 64×4 mutation matrices are interpreted and used.
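The expansion can be sketched in Python using nested lists in place of Eidos matrices. Two things here are assumptions of the sketch rather than statements from the text above: that the 64 rows are ordered by trinucleotide value 16X + 4Y + Z (AAA first, TTT last), and that each row of the 4×4 matrix corresponds to the original nucleotide. The name mm16_to_256() is hypothetical.

```python
def mm16_to_256(mm16):
    """Expand a 4x4 mutation matrix into a 64x4 trinucleotide-based matrix."""
    if len(mm16) != 4 or any(len(row) != 4 for row in mm16):
        raise ValueError("mm16 must be a 4x4 matrix")
    rows = []
    for trinucleotide in range(64):
        # Assumed encoding 16X + 4Y + Z: the central nucleotide Y is the middle digit.
        central = (trinucleotide // 4) % 4
        rows.append(list(mm16[central]))
    return rows
```

Because each row is an independent copy, a specific trinucleotide's rates can afterwards be modified without touching the others, which is the use case described above.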

(float)mmJukesCantor(float$ alpha)

Returns a mutation matrix representing a Jukes–Cantor (1969) model with mutation rate alpha to each possible alternative nucleotide at a site.  This 4×4 matrix is suitable for use with initializeGenomicElementType().  Note that the actual mutation rate produced by this matrix is 3*alpha, since each nucleotide can mutate to any of three alternative nucleotides.
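For illustration, a Jukes–Cantor matrix over the four nucleotides can be built in Python as a nested list. This is a sketch with the hypothetical name mm_jukes_cantor(); the zero diagonal (a nucleotide does not mutate to itself) is an assumption about the layout.

```python
def mm_jukes_cantor(alpha):
    """Build a 4x4 matrix with rate alpha between every pair of distinct nucleotides."""
    return [[0.0 if i == j else alpha for j in range(4)] for i in range(4)]


matrix = mm_jukes_cantor(0.25)
# Each row sums to 3 * alpha: the total rate of mutating away from that nucleotide.
row_totals = [sum(row) for row in matrix]
```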

(float)mmKimura(float$ alpha, float$ beta)

Returns a mutation matrix representing a Kimura (1980) model with transition rate alpha and transversion rate beta.  This 4×4 matrix is suitable for use with initializeGenomicElementType().  Note that the actual mutation rate produced by this model is alpha+2*beta, since each nucleotide has one possible transition and two possible transversions.
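A Python sketch of such a matrix, using the nucleotide order A, C, G, T: transitions are A<->G and C<->T, and every other change is a transversion. The name mm_kimura() is hypothetical, and the zero diagonal is an assumption of the sketch.

```python
def mm_kimura(alpha, beta):
    """Build a 4x4 Kimura matrix: alpha for transitions, beta for transversions."""
    transitions = {(0, 2), (2, 0), (1, 3), (3, 1)}  # A<->G and C<->T
    matrix = []
    for i in range(4):
        row = []
        for j in range(4):
            if i == j:
                row.append(0.0)      # no self-mutation
            elif (i, j) in transitions:
                row.append(alpha)    # purine<->purine or pyrimidine<->pyrimidine
            else:
                row.append(beta)     # purine<->pyrimidine
        matrix.append(row)
    return matrix


kimura = mm_kimura(0.5, 0.25)
```

Each row contains one alpha and two beta entries, which is where the total rate of alpha+2*beta comes from.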

(integer)nucleotideCounts(is sequence)

A convenience function that returns an integer vector of length four, providing the number of occurrences of A / C / G / T nucleotides, respectively, in the supplied nucleotide sequence.  The parameter sequence may be a singleton string (e.g., "TATA"), a string vector of single characters (e.g., "T", "A", "T", "A"), or an integer vector (e.g., 3, 0, 3, 0), using SLiM’s standard code of A=0, C=1, G=2, T=3.

(float)nucleotideFrequencies(is sequence)

A convenience function that returns a float vector of length four, providing the frequencies of occurrences of A / C / G / T nucleotides, respectively, in the supplied nucleotide sequence.  The parameter sequence may be a singleton string (e.g., "TATA"), a string vector of single characters (e.g., "T", "A", "T", "A"), or an integer vector (e.g., 3, 0, 3, 0), using SLiM’s standard code of A=0, C=1, G=2, T=3.
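The behavior of nucleotideCounts() and nucleotideFrequencies() for the three accepted input formats can be sketched in Python as follows; the function names are hypothetical stand-ins.

```python
NUCLEOTIDE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def nucleotide_counts(sequence):
    """Count A/C/G/T in a string, a list of single characters, or a list of 0..3."""
    counts = [0, 0, 0, 0]
    for item in sequence:  # iterating a str yields its single characters
        value = NUCLEOTIDE_CODE[item] if isinstance(item, str) else item
        if value not in (0, 1, 2, 3):
            raise ValueError("nucleotide values must be 0, 1, 2, or 3")
        counts[value] += 1
    return counts


def nucleotide_frequencies(sequence):
    """Return the counts divided by the sequence length."""
    counts = nucleotide_counts(sequence)
    total = sum(counts)
    return [c / total for c in counts]
```

For the example sequence "TATA", the counts are 2 0 0 2 and the frequencies are 0.5 0 0 0.5.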

(integer)nucleotidesToCodons(is sequence)

Returns the codon sequence corresponding to the nucleotide sequence in sequence.  The codon sequence is an integer vector with values from 0 to 63, based upon successive nucleotide triplets in the nucleotide sequence.  The codon value for a given nucleotide triplet XYZ is 16X + 4Y + Z, where X, Y, and Z have the usual values A=0, C=1, G=2, T=3.  For example, the triplet AAA has a codon value of 0, AAC is 1, AAG is 2, AAT is 3, ACA is 4, and on upward to TTT which is 63.  If the nucleotide sequence AACACATTT is passed in, the codon vector 1 4 63 will therefore be returned.  These codon values can be useful in themselves; they can also be passed to codonsToAminoAcids() to translate them into the corresponding amino acid sequence if desired.

The nucleotide sequence in sequence may be supplied in any of three formats: a string vector with single-letter nucleotides (e.g., "T", "A", "T", "A"), a singleton string of nucleotide letters (e.g., "TATA"), or an integer vector of nucleotide values (e.g., 3, 0, 3, 0) using SLiM’s standard code of A=0, C=1, G=2, T=3.  If the choice of format is not driven by other considerations, such as ease of manipulation, then the singleton string format will certainly be the most memory-efficient for long sequences, and will probably also be the fastest.  The nucleotide sequence provided must be a multiple of three in length, so that it translates to an integral number of codons.
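The encoding 16X + 4Y + Z is straightforward to reproduce. The following Python sketch (hypothetical name nucleotides_to_codons()) accepts the same three input formats and enforces the multiple-of-three requirement.

```python
NUCLEOTIDE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def nucleotides_to_codons(sequence):
    """Convert a nucleotide sequence to codon values 16X + 4Y + Z."""
    values = [NUCLEOTIDE_CODE[n] if isinstance(n, str) else n for n in sequence]
    if len(values) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    codons = []
    for i in range(0, len(values), 3):
        x, y, z = values[i:i + 3]
        codons.append(16 * x + 4 * y + z)
    return codons
```

Passing "AACACATTT" yields the codon vector 1 4 63, as in the example above.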

(is)randomNucleotides(integer$ length, [Nif basis = NULL], [string$ format = "string"])

Generates a new random nucleotide sequence with length bases.  The four nucleotides ACGT are equally probable if basis is NULL (the default); otherwise, basis may be a 4-element integer or float vector providing relative fractions for A, C, G, and T respectively (these need not sum to 1.0, as they will be normalized).  More complex generative models such as Markov processes are not supported intrinsically in SLiM at this time, but arbitrary generated sequences may always be loaded from files on disk.

The format parameter controls the format of the returned sequence.  It may be "string" to obtain the generated sequence as a singleton string (e.g., "TATA"), "char" to obtain it as a string vector of single characters (e.g., "T", "A", "T", "A"), or "integer" to obtain it as an integer vector (e.g., 3, 0, 3, 0), using SLiM’s standard code of A=0, C=1, G=2, T=3.  For passing directly to initializeAncestralNucleotides(), format "string" (a singleton string) will certainly be the most memory-efficient, and probably also the fastest.  Memory efficiency can be a significant consideration; the nucleotide sequence for a chromosome of length 10^9 will occupy approximately 1 GB of memory when stored as a singleton string (with one byte per nucleotide), and much more if stored in the other formats.  However, the other formats can be easier to work with in Eidos, and so may be preferable for relatively short chromosomes if you are manipulating the generated sequence.
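A comparable generator can be sketched in Python with the standard library's random.choices(), which normalizes the weights itself. The name random_nucleotides() and the rng parameter are conveniences of this sketch; it makes no attempt to reproduce SLiM's random number stream.

```python
import random


def random_nucleotides(length, basis=None, fmt="string", rng=None):
    """Draw length nucleotides; basis gives relative weights for A, C, G, T."""
    rng = rng or random.Random()
    weights = [1.0, 1.0, 1.0, 1.0] if basis is None else list(basis)
    if len(weights) != 4:
        raise ValueError("basis must have exactly four elements")
    # Weights need not sum to 1; random.choices() treats them as relative.
    values = rng.choices(range(4), weights=weights, k=length)
    if fmt == "integer":
        return values
    letters = ["ACGT"[v] for v in values]
    if fmt == "char":
        return letters
    if fmt == "string":
        return "".join(letters)
    raise ValueError("fmt must be 'string', 'char', or 'integer'")
```

A basis such as c(0, 0, 1, 0) puts all of the weight on G, so every generated nucleotide is G.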

3.3.  Population genetics utilities

(float$)calcFST(object<Genome> genomes1, object<Genome> genomes2, [No<Mutation> muts = NULL], [Ni$ start = NULL], [Ni$ end = NULL])

Calculates the FST between two Genome vectors – typically, but not necessarily, the genomes that constitute two different subpopulations (which we will assume for the purposes of this discussion).  In general, higher FST indicates greater genetic divergence between subpopulations.

The calculation is done using only the mutations in muts; if muts is NULL, all mutations are used.  The muts parameter can therefore be used to calculate the FST only for a particular mutation type (by passing only mutations of that type).

The calculation can be narrowed to apply to only a window – a subrange of the full chromosome – by passing the interval bounds [start, end] for the desired window.  In this case, the vector of mutations used for the calculation will be subset to include only mutations within the specified window.  The default behavior, with start and end of NULL, provides the genome-wide FST, which is often used to assess the overall level of genetic divergence between sister species or allopatric subpopulations.

The code for calcFST() is, roughly, an Eidos implementation of Wright’s definition of FST (but see below for further discussion and clarification):

FST = 1 - HS / HT

where HS is the average heterozygosity in the two subpopulations, and HT is the total heterozygosity when both subpopulations are combined.  In this implementation, the two genome vectors are weighted equally, not weighted by their size.  In SLiM 3, the implementation followed Wright’s definition closely, and returned the average of ratios: mean(1.0 - H_s/H_t), in the Eidos code.  In SLiM 4, it returns the ratio of averages instead: 1.0 - mean(H_s)/mean(H_t).  In other words, the FST value reported by SLiM 4 is an average across the specified mutations in the two sets of genomes, where H_s and H_t are first averaged across all specified mutations prior to taking the ratio of the two.  This ratio of averages is less biased than the average of ratios, and is generally considered to be best practice (see, e.g., Bhatia et al., 2013).  This means that the behavior of calcFST() differs between SLiM 3 and SLiM 4.
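The difference between the two averaging schemes can be seen in a small Python sketch that works from per-mutation frequencies in the two genome vectors. The per-mutation forms of H_s and H_t used here (2p(1-p) within each vector, averaged with equal weight, and 2p(1-p) at the mean frequency) are assumptions of this sketch; functionSource("calcFST") shows the actual Eidos code.

```python
from statistics import mean


def heterozygosities(freqs1, freqs2):
    """Per-mutation H_s and H_t from the frequencies in two genome vectors."""
    h_s, h_t = [], []
    for p1, p2 in zip(freqs1, freqs2):
        mean_p = (p1 + p2) / 2.0                    # equal weighting of the two vectors
        h_t.append(2.0 * mean_p * (1.0 - mean_p))
        h_s.append(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    return h_s, h_t


def fst_ratio_of_averages(freqs1, freqs2):
    """SLiM 4 style: 1 - mean(H_s) / mean(H_t)."""
    h_s, h_t = heterozygosities(freqs1, freqs2)
    return 1.0 - mean(h_s) / mean(h_t)


def fst_average_of_ratios(freqs1, freqs2):
    """SLiM 3 style: mean(1 - H_s / H_t); requires H_t > 0 for every mutation."""
    h_s, h_t = heterozygosities(freqs1, freqs2)
    return mean(1.0 - s / t for s, t in zip(h_s, h_t))
```

For example, with one mutation fixed in only the first vector and a second mutation at frequency 0.1 in both, the average of ratios is about 0.5, while the ratio of averages is about 0.74, because the high-heterozygosity mutation dominates the averaged H_t.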

The implementation of calcFST(), viewable with functionSource(), treats every mutation in muts as independent in the heterozygosity calculations; in other words, if mutations are stacked, the heterozygosity calculated is by mutation, not by site.  Similarly, if multiple Mutation objects exist in different genomes at the same site (whether representing different genetic states, or multiple mutational lineages for the same genetic state), each Mutation object is treated separately for purposes of the heterozygosity calculation, just as if they were at different sites.  One could regard these choices as embodying an infinite-sites interpretation of the segregating mutations.  In most biologically realistic models, such genetic states will be quite rare, and so the impact of these choices will be negligible; however, in some models these distinctions may be important.

(float$)calcHeterozygosity(object<Genome> genomes, [No<Mutation> muts = NULL], [Ni$ start = NULL], [Ni$ end = NULL])

Calculates the heterozygosity for a vector of genomes, based upon the frequencies of mutations in the genomes.  The result is the expected heterozygosity, for the individuals to which the genomes belong, assuming that they are under Hardy-Weinberg equilibrium; this can be compared to the observed heterozygosity of an individual, as calculated by calcPairHeterozygosity().  Often genomes will be all of the genomes in a subpopulation, or in the entire population, but any genome vector may be used.  By default, with muts=NULL, the calculation is based upon all mutations in the simulation; the calculation can instead be based upon a subset of mutations, such as mutations of a specific mutation type, by passing the desired vector of mutations for muts.

The calculation can be narrowed to apply to only a window – a subrange of the full chromosome – by passing the interval bounds [start, end] for the desired window.  In this case, the vector of mutations used for the calculation will be subset to include only mutations within the specified window.  The default behavior, with start and end of NULL, provides the genome-wide heterozygosity.

The implementation of calcHeterozygosity(), viewable with functionSource(), treats every mutation as independent in the heterozygosity calculations.  One could regard this choice as embodying an infinite-sites interpretation of the segregating mutations.  In most biologically realistic models, such genetic states will be quite rare, and so the impact of this choice will be negligible; however, in some models this distinction may be important.  See calcPairHeterozygosity() for further discussion.

(float$)calcInbreedingLoad(object<Genome> genomes, [No<MutationType>$ mutType = NULL])

Calculates inbreeding load (the haploid number of lethal equivalents, or B) for a vector of genomes passed in genomes.  The calculation can be limited to a focal mutation type passed in mutType; if mutType is NULL (the default), all of the mutations for the focal species will be considered.  In any case, only deleterious mutations (those with a negative selection coefficient) will be included in the final calculation.

The inbreeding load is a measure of the quantity of recessive deleterious variation that is heterozygous in a population and can contribute to fitness declines under inbreeding.  This function implements the following equation from Morton et al. (1956), which assumes no epistasis and random mating:

B = sum(q s) − sum(q^2 s) − 2 sum(q(1−q) s h)

where q is the frequency of a given deleterious allele, s is the absolute value of the selection coefficient, and h is its dominance coefficient.  Note that the implementation sets a maximum |s| of 1.0 (i.e., a lethal allele); |s| can sometimes be greater than 1.0 when s is drawn from a distribution, but in practice an allele with s < -1.0 has the same lethal effect as when s = -1.0.  Also note that this implementation will not work when the model changes the dominance coefficients of mutations using mutationEffect() callbacks, since it relies on the dominanceCoeff property of MutationType. Finally, note that, to estimate the diploid number of lethal equivalents (2B), the result from this function can simply be multiplied by two.
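The equation and the two rules just described (only deleterious mutations count, and |s| is capped at 1.0) can be sketched in Python as follows. The function inbreeding_load() and its input format, a list of (q, s, h) tuples, are hypothetical; this is an illustration of the arithmetic, not the contributed Eidos implementation.

```python
def inbreeding_load(mutations):
    """Haploid lethal equivalents B from (q, s, h) tuples.

    q -- allele frequency; s -- selection coefficient (negative if deleterious);
    h -- dominance coefficient.
    """
    total = 0.0
    for q, s, h in mutations:
        if s >= 0.0:
            continue                      # only deleterious mutations are included
        s_abs = min(abs(s), 1.0)          # cap |s| at 1.0 (a lethal allele)
        total += q * s_abs - q * q * s_abs - 2.0 * q * (1.0 - q) * s_abs * h
    return total
```

A fully recessive lethal (h = 0, s = -1) at frequency 0.5 contributes 0.5 − 0.25 = 0.25, whereas the same allele with h = 0.5 contributes nothing, since its effect is already fully expressed in heterozygotes.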

This function was contributed by Chris Kyriazis; thanks, Chris!

(float$)calcPairHeterozygosity(object<Genome>$ genome1, object<Genome>$ genome2, [Ni$ start = NULL], [Ni$ end = NULL], [logical$ infiniteSites = T])

Calculates the heterozygosity for a pair of genomes; these will typically be the two genomes of a diploid individual (individual.genome1 and individual.genome2), but any two genomes may be supplied.

The calculation can be narrowed to apply to only a window – a subrange of the full chromosome – by passing the interval bounds [start, end] for the desired window.  In this case, the vector of mutations used for the calculation will be subset to include only mutations within the specified window.  The default behavior, with start and end of NULL, provides the genome-wide heterozygosity.

The implementation of calcPairHeterozygosity(), viewable with functionSource(), treats every mutation as independent in the heterozygosity calculations by default (i.e., with infiniteSites=T).  If mutations are stacked, the heterozygosity calculated therefore depends upon the number of unshared mutations, not the number of differing sites.  Similarly, if multiple Mutation objects exist in different genomes at the same site (whether representing different genetic states, or multiple mutational lineages for the same genetic state), each Mutation object is treated separately for purposes of the heterozygosity calculation, just as if they were at different sites.  One could regard these choices as embodying an infinite-sites interpretation of the segregating mutations.  In most biologically realistic models, such genetic states will be quite rare, and so the impact of this choice will be negligible; however, in some models this distinction may be important.  The behavior of calcPairHeterozygosity() can be switched to calculate based upon the number of differing sites, rather than the number of unshared mutations, by passing infiniteSites=F.
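The distinction between counting unshared mutations and counting differing sites can be illustrated with a Python sketch in which each mutation is a (position, id) tuple. The name pair_heterozygosity(), the tuple representation, and the division by the sequence length are all assumptions of this sketch; functionSource("calcPairHeterozygosity") is the authoritative reference.

```python
def pair_heterozygosity(muts1, muts2, length, infinite_sites=True):
    """Heterozygosity between two genomes given as lists of (position, id) tuples."""
    # Mutations carried by exactly one of the two genomes.
    unshared = set(muts1) ^ set(muts2)
    if infinite_sites:
        differences = len(unshared)                        # count by mutation
    else:
        differences = len({pos for pos, _ in unshared})    # count by site
    return differences / length


genome1 = [(10, "a"), (10, "b"), (20, "c")]   # two stacked mutations at position 10
genome2 = [(20, "c"), (30, "d")]
by_mutation = pair_heterozygosity(genome1, genome2, 100)
by_site = pair_heterozygosity(genome1, genome2, 100, infinite_sites=False)
```

Here three mutations are unshared but they occupy only two sites, so the two settings give different results when mutations are stacked.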

(float$)calcWattersonsTheta(object<Genome> genomes, [No<Mutation> muts = NULL], [Ni$ start = NULL], [Ni$ end = NULL])

Calculates Watterson’s theta (a metric of genetic diversity comparable to heterozygosity) for a vector of genomes, based upon the mutations in the genomes.  Often genomes will be all of the genomes in a subpopulation, or in the entire population, but any genome vector may be used.  By default, with muts=NULL, the calculation is based upon all mutations in the simulation; the calculation can instead be based upon a subset of mutations, such as mutations of a specific mutation type, by passing the desired vector of mutations for muts.

The calculation can be narrowed to apply to only a window – a subrange of the full chromosome – by passing the interval bounds [start, end] for the desired window.  In this case, the vector of mutations used for the calculation will be subset to include only mutations within the specified window.  The default behavior, with start and end of NULL, provides the genome-wide Watterson’s theta.

The implementation of calcWattersonsTheta(), viewable with functionSource(), treats every mutation as independent in the heterozygosity calculations.  One could regard this choice as embodying an infinite-sites interpretation of the segregating mutations, as with calcHeterozygosity().  In most biologically realistic models, such genetic states will be quite rare, and so the impact of this assumption will be negligible; however, in some models this distinction may be important.  See calcPairHeterozygosity() for further discussion.
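For reference, the textbook estimator divides the number of segregating sites by the harmonic number a_n = 1 + 1/2 + ... + 1/(n−1), where n is the number of genomes sampled. The Python sketch below follows that standard definition and additionally divides by the sequence length to give a per-site value; whether calcWattersonsTheta() normalizes in exactly this way is an assumption here, and functionSource() should be consulted for the actual code.

```python
def wattersons_theta(segregating_sites, genome_count, length):
    """Per-site Watterson's theta from a count of segregating mutations."""
    if genome_count < 2:
        raise ValueError("at least two genomes are required")
    # a_n is the (n-1)th harmonic number.
    a_n = sum(1.0 / i for i in range(1, genome_count))
    return segregating_sites / a_n / length
```

Consistent with the infinite-sites interpretation described above, each segregating mutation would be counted separately, even if several share a position.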

(float$)calcVA(object<Individual> individuals, io<MutationType>$ mutType)

Calculates VA, the additive genetic variance, among a vector of individuals, for a particular mutation type mutType that represents quantitative trait loci (QTLs) influencing a quantitative phenotypic trait.  The mutType parameter may be either an integer representing the ID of the desired mutation type, or a MutationType object specified directly.

This function assumes that mutations of type mutType encode their effect size upon the quantitative trait in their selectionCoeff property, as is fairly standard in SLiM.  The implementation of calcVA(), which is viewable with functionSource(), is quite simple; if effect sizes are stored elsewhere (such as with setValue()), a new user-defined function following the pattern of calcVA() can easily be written.
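The idea can be sketched in Python as follows: sum the effect sizes carried by each individual to get an additive genetic value, and take the variance of those values.  This is a hedged illustration, not the SLiM implementation; the name additive_variance and the list-of-lists input are hypothetical, and the use of the sample variance (dividing by n-1) is an assumption that should be checked against functionSource("calcVA").

```python
from statistics import variance


def additive_variance(individuals):
    """Illustrative only: individuals is a list in which each element is the
    list of QTL effect sizes carried by one individual (across both genomes)."""
    # Additive genetic value: the plain sum of effect sizes, with no dominance
    genetic_values = [sum(effects) for effects in individuals]
    # Sample variance (n - 1 denominator); an assumption, see the text above
    return variance(genetic_values)


# Hypothetical effect sizes for three individuals
print(additive_variance([[0.1, 0.2], [], [0.3]]))
```

If effect sizes live somewhere other than selectionCoeff, only the construction of the per-individual sums would need to change.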

3.4.  Other utilities

(float)summarizeIndividuals(object<Individual> individuals, integer dim, numeric spatialBounds, string$ operation, [Nlif$ empty = 0.0], [logical$ perUnitArea = F], [Ns$ spatiality = NULL])

Returns a vector, matrix, or array that summarizes spatial patterns of information related to the individuals in individuals.  In essence, those individuals are assigned into bins according to their spatial position, and then a summary value for each bin is calculated based upon the individuals each bin contains.  The individuals might be binned in one dimension (resulting in a vector of summary values), in two dimensions (resulting in a matrix), or in three dimensions (resulting in an array).  Typically the spatiality of the result (the dimensions into which the individuals are binned) will match the dimensionality of the model, as indicated by the default value of NULL for the optional spatiality parameter; for example, a two-dimensional ("xy") model would by default produce a two-dimensional matrix as a summary.  However, a spatiality that is more restrictive than the model dimensionality may be passed; for example, in a two-dimensional ("xy") model a spatiality of "y" could be passed to summarize individuals into a vector, rather than a matrix, assigning them to bins based only upon their y position (i.e., the value of their y property).  Whatever spatiality is chosen, the parameter dim provides the dimensions of the desired result, in the same form that the dim() function does: first the number of rows, then the number of columns, and then the number of planes, as needed (see the Eidos manual for discussion of matrices, arrays, and dim()).  The length of dim must match the requested spatiality; for spatiality "xy", for example, dim might be c(50,100) to request that the returned matrix have 50 rows and 100 columns.  The result vector/matrix/array is in the correct orientation to be directly usable as a spatial map, by passing it to the defineSpatialMap() method of Subpopulation.  For further discussion of dimensionality and spatiality, see initializeInteractionType() and InteractionType.

The spatialBounds parameter defines the spatial boundaries within which the individuals are binned.  Typically this is the spatial bounds of a particular subpopulation, within which the individuals reside; for individuals in p1, for example, you would likely pass p1.spatialBounds for this.  However, this is not required; individuals may come from any or all subpopulations in the model, and spatialBounds may be any bounds of non-zero area (if an individual falls outside of the given spatial bounds, it is excluded, as if it were not in individuals at all).  If you have multiple subpopulations that conceptually reside within the same overall coordinate space, for example, that can be accommodated here.  The bounds are supplied in the dimensionality of the model, in the same form as for Subpopulation; for an "xy" model, for example, they are supplied as a four-element vector of the form c(x0, y0, x1, y1) even if the summary is being produced with spatiality "y".  To produce the result, a grid with dimensions defined by dim is conceptually stretched out across the given spatial bounds, such that the centers of the edge and corner grid squares are aligned with the limits of the spatial bounds.  This matches the way that defineSpatialMap() defines its maps.
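The grid geometry described above, with the centers of the edge cells sitting exactly on the bounds, implies that a coordinate is assigned to the cell whose center is nearest.  A minimal one-dimensional Python sketch of that assignment follows; the function name bin_index is hypothetical, and the handling of points lying exactly midway between two centers is an assumption rather than documented behavior.

```python
def bin_index(coord, lower, upper, n_cells):
    """Illustrative only: index of the grid cell along one axis whose center
    is nearest to coord, or None if coord lies outside [lower, upper]."""
    if coord < lower or coord > upper:
        return None  # outside the bounds: excluded from the summary
    if n_cells == 1:
        return 0
    # Cell centers are evenly spaced, with the first at lower and the last at upper
    fraction = (coord - lower) / (upper - lower)
    # Round to the nearest center (ties go upward here; an assumption)
    return min(n_cells - 1, int(fraction * (n_cells - 1) + 0.5))


# Five cells over [0, 8]: centers at 0, 2, 4, 6, and 8
for x in (0.9, 1.1, 8.0, 9.0):
    print(x, bin_index(x, 0.0, 8.0, 5))
```

In two or three dimensions the same calculation would be applied independently along each axis.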

The particular summary produced depends upon the parameters operation and empty.  Consider a single grid square represented by a single element in the result.  That grid square contains zero or more of the individuals in individuals.  If it contains zero individuals and empty is not NULL, the empty value is used for the result, regardless of operation, providing specific, separate control over the treatment of empty grid squares.  If empty is NULL, this separate control over the treatment of empty grid squares is declined; empty grid squares will be handled through the standard mechanism described next.  In all other cases for the given grid square – when it contains more than zero individuals, or when empty is NULL – operation is executed as an Eidos lambda, a small snippet of code, supplied as a singleton string, that is executed in a manner similar to a function call.  Within the execution of the operation lambda, a constant named individuals is defined to be the focal individuals being evaluated – all of the individuals within that grid square.  This lambda should evaluate to a singleton logical, integer, or float value, comprising the result value for the grid square; these types will all be coerced to float (T being 1 and F being 0).

Two examples may illustrate the use of empty and operation.  To produce a summary indicating presence/absence, simply use the default of 0.0 for empty, and "1.0;" (or "1;", or "T;") for operation.  This will produce 0.0 for empty grid squares, and 1.0 for those that contain at least one individual.  Note that the use of empty is essential here, because operation doesn’t even check whether individuals are present or not.  To produce a summary with a count of the number of individuals in each grid square, again use the default of 0.0 for empty, but now use an operation of "individuals.size();", counting the number of individuals in each grid square.  In this case, empty could be NULL instead and operation would still produce the correct result; but using empty makes summarizeIndividuals() more efficient since it allows the execution of operation to be skipped for those squares.
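The interplay of empty and operation in these two examples can be mimicked with a short Python sketch, in which a Python callable stands in for the Eidos lambda string.  The function summarize_bins and the pre-binned input are hypothetical; this shows only the decision logic described above, not SLiM's implementation.

```python
def summarize_bins(bins, operation, empty=0.0):
    """Illustrative only: bins is a list of lists of individuals, one per cell."""
    result = []
    for members in bins:
        if len(members) == 0 and empty is not None:
            # Empty cell with a non-NULL empty value: operation is skipped
            result.append(float(empty))
        else:
            # Non-empty cell, or empty is None: run the operation on the cell
            result.append(float(operation(members)))
    return result


# Hypothetical pre-binned individuals: one empty cell, then one and two occupants
bins = [[], ["i1"], ["i2", "i3"]]

presence = summarize_bins(bins, lambda individuals: 1.0)
counts = summarize_bins(bins, lambda individuals: len(individuals), empty=None)
no_empty = summarize_bins(bins, lambda individuals: 1.0, empty=None)

print(presence)  # empty cells get the empty value
print(counts)    # the count operation handles empty cells correctly by itself
print(no_empty)  # without empty, the presence operation marks every cell
```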

Lambdas are not limited in their complexity; they can use if, for, etc., and can call methods and functions.  A typical operation to compute the mean phenotype in a quantitative genetic model that stores phenotype values in tagF, for example, would be "mean(individuals.tagF);", and this is still quite simple compared to what is possible.  However, keep in mind that the lambda will be evaluated for every grid cell (or at least those that are non-empty), so efficiency can be a concern, and you may wish to pre-calculate values shared by all of the lambda calls, making them available to your lambda code using defineGlobal() or defineConstant().

There is one last twist, if perUnitArea is T: values are divided by the area (or length, in 1D, or volume, in 3D) that their corresponding grid cell comprises, so that each value is in units of “per unit area” (or “per unit length”, or “per unit volume”).  The total area of the grid is defined by the spatial bounds, and the area of a given grid cell is defined by the portion of the spatial bounds that is within that cell.  This is not the same for all grid cells; grid cells that fall partially outside spatialBounds (because, remember, the centers of the edge/corner grid cells are aligned with the limits of spatialBounds) will have a smaller area inside the bounds.  For an "xy" spatiality summary, for example, corner cells have only a quarter of their area inside spatialBounds, while edge cells have half of their area inside spatialBounds; for purposes of perUnitArea, then, their respective areas are ¼ and ½ the area of an interior grid cell.  By default, perUnitArea is F, and no scaling is performed.  Whether you want perUnitArea to be F or T depends upon whether the summary you are producing is, conceptually, “per unit area”, such as density (individuals per unit area) or local competition strength (total interaction strength per unit area), or is not, such as “mean individual age”, or “maximum tag value”.  For the previous example of counting individuals with an operation of "individuals.size();", a value of F for perUnitArea (the default) will produce a simple count of individuals in each grid square, whereas with T it would produce the density of individuals in each grid square.
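The cell areas used for this scaling follow directly from the grid geometry, and can be worked out with a small Python sketch.  The names cell_extents and cell_areas are hypothetical, and the nesting order of the returned lists is arbitrary; it is not meant to reproduce the row/column orientation of the matrix that SLiM returns.

```python
def cell_extents(lower, upper, n_cells):
    """Illustrative only: length of each cell along one axis that lies inside
    [lower, upper], given that the edge cell centers sit on the bounds."""
    if n_cells == 1:
        return [upper - lower]
    spacing = (upper - lower) / (n_cells - 1)
    # The first and last cells have only half of their extent inside the bounds
    return [spacing / 2 if i in (0, n_cells - 1) else spacing
            for i in range(n_cells)]


def cell_areas(bounds, nx, ny):
    """Area inside the bounds for each cell of an nx-by-ny grid (xy spatiality)."""
    x0, y0, x1, y1 = bounds
    xs = cell_extents(x0, x1, nx)
    ys = cell_extents(y0, y1, ny)
    return [[ex * ey for ex in xs] for ey in ys]


# Hypothetical bounds c(0, 0, 4, 2) with 5 cells along x and 3 along y
areas = cell_areas((0.0, 0.0, 4.0, 2.0), 5, 3)
print(areas[0][0], areas[0][1], areas[1][1])  # corner, edge, interior
print(sum(sum(row) for row in areas))         # equals the total bounded area
```

Dividing a per-cell count by the corresponding area gives a density, which is what perUnitArea=T would yield for an operation of "individuals.size();".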

(object<Dictionary>$)treeSeqMetadata(string$ filePath, [logical$ userData = T])

Returns a Dictionary containing top-level metadata from the .trees (tree-sequence) file at filePath.  If userData is T (the default), the top-level metadata under the SLiM/user_metadata key is returned; this is the same metadata that can optionally be supplied to treeSeqOutput() in its metadata parameter, so it makes it easy to recover metadata that you attached to the tree sequence when it was saved.  If userData is F, the entire top-level metadata Dictionary object is returned; this can be useful for examining the values of other keys under the SLiM key, or values inside the top-level dictionary itself that might have been placed there by msprime or other software.

This function can be used to read in parameter values or other saved state (tag property values, for example), in order to resuscitate the complete state of a simulation that was written to a .trees file.  It could be used for more esoteric purposes too, such as to search through .trees files in a directory (with the help of the Eidos function filesAtPath()) to find those files that satisfy some metadata criterion.
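The effect of the userData switch described above can be pictured with a plain Python dictionary standing in for the top-level metadata.  This sketch does not read a .trees file; the function select_metadata and all of the sample keys other than SLiM and user_metadata are hypothetical, and returning an empty dictionary when the key is missing is an assumption rather than documented behavior.

```python
def select_metadata(top_level, user_data=True):
    """Illustrative only: mimic the choice made by the userData parameter."""
    if user_data:
        # Only the metadata stored under SLiM/user_metadata
        return top_level.get("SLiM", {}).get("user_metadata", {})
    # The entire top-level metadata dictionary
    return top_level


# Hypothetical top-level metadata, as it might look after loading
metadata = {
    "SLiM": {
        "user_metadata": {"K": [500], "seed": [12345]},  # hypothetical saved state
        "other_key": "value",                            # hypothetical sibling key
    },
    "other_software": {"note": "placed here by another tool"},  # hypothetical
}

print(select_metadata(metadata))                   # saved user metadata only
print(select_metadata(metadata, user_data=False))  # everything
```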