 NCL Home > Documentation > Functions > Random number generators, Array creators

# generate_sample_indices

Generate indices (subscripts) for resampling: with and without replacement. Available in version 6.3.0 and later.

## Prototype

```load "\$NCARG_ROOT/lib/ncarg/nclscripts/csm/contributed.ncl"  ; This library is automatically loaded
; from NCL V6.2.0 onward.
; No need for user to explicitly load.

function generate_sample_indices (
N       : integer or long,
method  : integer
)
```

## Arguments

N

The number of indices (subscripts) to be generated.

method

Integer which specifies which type of indices should be generated.

• method=0: Sample without replacement. These can be used to reshuffle the order. Use when the order of sampling may be important. Statistics like the mean and standard deviation will not be changed.
• method=1: Sample with replacement. These can be used for (say) bootstrap resampling. Statistics like the mean and standard deviation will be changed.

## Return value

A one-dimensional array of size N containing the random integer indices (subscripts).

## Description

Uses a random algorithm to generate subscripts.

If method=1, the indicies are set using random_uniform. Please read the documentation for random_setallseed. The 'seeds' can be manually set via random_setallseed. Example 1 of random_setallseed illustrates a method of using the current date as a seed generator.

## Examples

The generate_sample_indices function uses the uniform random number generator. All NCL's random number functions use random 'seeds'. The seeds can be any integer between 1 and 2147483562 (default is 1234567890). The seeds may be set: (i) by default; (ii) manually specified; or, (iii) 'dynamically set. Some examples:

```
Manually set the random number seed(s)
rand1 = 1234567890
rand2 =  236484749
random_setallseed(rand1, rand2) ; also: random_setallseed(1234567890, 236484749)

or, have  the system generate some numbers. One approach:

rand1 = toint(systemfunc(" date +%s"))
rand2 = toint(systemfunc(" date +%s"))

```

Example 1: Generate N random indices with (method=1) and without (method=0) replacement.

```         n   = 10
iwo = generate_sample_indices( n, 0 )
iw  = generate_sample_indices( n, 1 )

print("iwo+"   "+iw)

The indices would be
iwo iw
(0)     5   8
(1)     3   3
(2)     0   2
(3)     1   5
(4)     8   7
(5)     9   0
(6)     4   0
(7)     2   2
(8)     6   0
(9)     7   7
```
Note that the iwo are unique while the iw have indices repeated (0, 7) while other indices (1, 4, 6, 9) are not present.

Example 2: Resample an array x[*] with and without replacement.

```         N   = 10
x   = fspan(17.89, 52.30,N)
iwo = generate_sample_indices( N, 0 )
iw  = generate_sample_indices( N, 1 )

xAvg     = avg(x)
xAvg_iwo = avg(x(iwo))
xAvg_iw  = avg(x(iw) )

xStd     = stddev(x)
xStd_iwo = stddev(x(iwo))
xStd_iw  = stddev(x(iw) )

print(x+"   "+iwo+"   "+ x(iwo)+"   "+ iw+"   "+x(iw)  )

print("avg:  "+xAvg+"      "+xAvg_iwo+"      "+xAvg_iw )
print("std:  "+xStd+"      "+xStd_iwo+"      "+xStd_iw )

x         iwo   x(iwo)     iw    x(iw)
(0)     17.89        2   25.5367      4   33.1833
(1)     21.7133      3   29.36        0   17.89
(2)     25.5367      1   21.7133      9   52.3
(3)     29.36        0   17.89        7   44.6533
(4)     33.1833      5   37.0067      3   29.36
(5)     37.0067      8   48.4767      4   33.1833
(6)     40.83        7   44.6533      8   48.4767
(7)     44.6533      4   33.1833      0   17.89
(8)     48.4767      6   40.83        1   21.7133
(9)     52.3         9   52.3         0   17.89

avg:    35.095           35.095           31.654
std:    11.5757          11.5757          13.1459

```
Example 3: Resample an array x(N,J,K) with and without replacement.
```
iwo = generate_sample_indices( N, 0 )
iw  = generate_sample_indices( N, 1 )

xw   = new (dimsizes(x), typeof(x))
xwo  = new (dimsizes(x), typeof(x))

do nl=0,nlat-1
do ml=0,mlon-1
xw(:,nl,ml)  = x(iw ,nl,ml)
xwo(:,nl,ml) = x(iwo,nl,ml)

; .... do something ...

end do
end do
```

==================================
Examples 4 and 5 illustrate the 'mechanics' of bootstrapping. They bootstrap the mean (average) statistic of a univariate variable. Bootstrapping (say) regression coefficients would require 'x' and 'y' variables. In this case, the 'y' would be resampled. Bootstrapping correlations would require that 'x' and 'y' pairs be resampled.

There are other approaches. For example, in correlation bootstrapping, rather then pair resampling, individual 'x' and 'y' pairs, random block sequences could be used. Say, N=1200, then randomly calculate the correlations of many sub-sample blocks of length (say) 50.
==================================

Example 4:

Resample an array x(N) 'nBoot' times with replacement. Since 'x' is one-dimensional, avg and stddev are used. (dim_avg_n and dim_stddev_n could have been used.) Sort the boot-strapped array in ascending order. Get robust (a) two sided 2.5 and 97.5% confidence limits and (b) one-sided 95% confidence limit.

This example is provided to illustrate the 'mechanics' of the bootstrap methodology. This does not account for any trend or other complications.

Bootstrapping can be computationally intense and, hence, time comsuming. A good test strategy to test the script with a small 'nBoot' to make sure the mechanics are working correctly.

```

N       = 100000
xMean   = 0.0             ; population mean
xStdDev = 10.0            ; populaton standard deviation

random_setallseed(36484749,119494848)   ; optional: see above comments
x       = random_normal(xMean, xStdDev,N)

xAvg    = avg(x)          ; Mean
xStd    = stddev(x)       ; Sample Standard Deviation
xStdErr = xStd/sqrt(N)                       ; std error of mean
print("N="+N+"  xAvg="+xAvg+"  xStd="+xStd+"  xStdErr="+xStdErr)

nBoot =   10000                           ; user specified
xBoot   = new (nBoot, typeof(x))

random_setallseed(119494848, 36484749)   ; optional: see above comments

;For 'fun', generate the sampling indices directly in the 'x' array.
do ns=1,nBoot
xBoot(ns-1) = avg( x(generate_sample_indices(N,1)) )
end do

xAvgBoot= avg(xBoot)      ; Averages of bootstrapped samples
xStdBoot= stddev(xBoot)   ; Std Dev  "        "        "
print("nBoot="+nBoot+"  xAvgBoot="+xAvgBoot+"  xStdBoot="+xStdBoot)

qsort(xBoot)                ; sort bootstrap means into ascending order

n025 = toint(0.025*nBoot)   ; indices for sorted array
n950 = toint(0.950*nBoot)
n975 = toint(0.975*nBoot)
print("n025="+n025+"  n950="+n950+"  n975="+n975)

xBoot_025 = xBoot(n025)     ;  2.5% level
xBoot_950 = xBoot(n950)     ; 95.0% level
xBoot_975 = xBoot(n975)     ; 97.5% level
print("xBoot_025="+xBoot_025)
print("xBoot_950="+xBoot_950)
print("xBoot_975="+xBoot_975)
```
The output for this examples is:
```
N=100000  xAvg=-0.00324151  xStd=10.0478               <== near 0, 10
nBoot=10000  xAvgBoot=-0.00394782  xStdBoot=0.0316319  <== near xStdErr

n025=250  n950=9500  n975=9750
xBoot_025=-0.0662322
xBoot_950=0.0472644
xBoot_975=0.0571641

xStdErr=0.0316228
```

Example 5:

Similar to Example 4 but resample an array containing 100 years of monthly mean values: x(ntim,nlat,mlon) where ntim=1200. Certainly, all 1200 time steps could be used BUT there is an annual cycle. To bootstrap the (say) January and July values, the monthly sub-samples to be boot strapped would be x(0::12,:,:) and x(6::12,:,:), respectively. Another approach would be: (i) calculate a monthly climatology (ii) calculate the anomalies from each monthly climatology; then, (iii) assuming the anomalies are independent of month, bootstrap the derived anomaly array.

This example is provided to illustrate the 'mechanics' of the bootstrap methodology on a multi-dimensional array. This does not account for trend or other complications.

Bootstrapping can be computationally intense and, hence, time comsuming. This is particulary true if the array sizes [(ntim,nlat,mlon)] are large. A prudent test strategy would be to test script with a small 'nBoot' to make sure the mechanics are working correctly.

```	fmdl    = addfile("MODEL.nc", "r")
x       = fmdl->TREFHT                      ; [time | 1200] x [lat | 192] x [lon | 288]

xAvg    = avg(xAnom)                        ; mean anomalies over all grid pts and times
xStd    = stddev(xAnom)                     ; stddev of all anomalies and times
print(xAvg)                                  ; scalar [hopefully 0.0  :-) ]
print(xStd)                                  ; scalar

dimx    = dimsizes(x)
ntim    = dimx(0)
nlat    = dimx(1)
mlon    = dimx(2)

units   = x@units                            ; convenience only
; calculate monthly climatologies
xClm   = clmMonTLL(x)                       ; [time | 12] x [lat | 192] x [lon | 288]
printVarSummary(xClm)
; calculate monthly anomalies
xAnom  = calcMonAnomTLL(x, xClm)           ; [time | 1200] x [lat | 192] x [lon | 288]
xAnom@units = units

delete([/x, xClm/])                          ; no longer needed

nBoot   = 1000                               ; This could be time consuming
; Test with small nBoot (eg: nBoot=10)
dimx(0) = nBoot                              ; 'trick'
xBoot   = new (dimx, typeof(xAnom))         ; array to hold nBoot results

do ns=0,nBoot-1                              ; generate bootstrap results
iw = generate_sample_indices(ntim,1)
xBoot(ns,:,:) = dim_avg_n( xAnom(iw,:,:), 0)
end do
delete(xAnom)                                ; no longer needed

; mean of all bootstrap samples at each grid point
xAvgBoot = dim_avg_n(xBoot, 0)              ;  x 
printVarSummary(xBoot)

idx  = dim_pqsort_n(xBoot, 2, 0) ; sort bootstrap means into ascending order at each grid point
i025 = toint(0.025*nBoot)
i975 = toint(0.975*nBoot)
; NCL's 'dimension reduction' will result in a 2D array
xBoot_025   = xBoot(i025,:,:)           ;  2.5% level; (nlat,mlon)
xBoot_975   = xBoot(i975,:,:)           ; 97.5% level; (nlat,mlon)
xBoot_Range = xBoot_975 - xBoot_025

xBoot_025@long_bame = "Bootstrap  2.5% Mean"
xBoot_025@units     = units
xBoot_975@long_bame = "Bootstrap 97.5% Mean"
xBoot_975@units     = units
xBoot_Range@long_name = "95% anomaly range"
xBoot_Range@units     = units

copy_VarCoords(xAnom(0,:,:), xBoot_025)
copy_VarCoords(xAnom(0,:,:), xBoot_975)

printVarSummary(xBoot_025)    ; [lat | 192] x [lon | 288]
printVarSummary(xBoot_975)    ; [lat | 192] x [lon | 288]
printVarSummary(xBoot_Range)  ; [lat | 192] x [lon | 288]
printMinMax(xBoot_Range, 1))
```