- Add function `pcor()` to compute partial correlations.
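
A partial correlation measures how two variables are associated once the effect of other variables has been removed. The sketch below illustrates the concept only; it is not the code of `pcor()`. It regresses both `x` and `y` on the covariates `z` and then correlates the residuals. The helper name `partial_cor` is hypothetical.

```r
# Hypothetical helper (not bigstatsr's pcor()): partial correlation of x and y
# given covariates z, computed as the correlation of regression residuals.
partial_cor <- function(x, y, z) {
  z <- as.matrix(z)
  res_x <- residuals(lm(x ~ z))  # part of x not explained by z
  res_y <- residuals(lm(y ~ z))  # part of y not explained by z
  cor(res_x, res_y)
}

set.seed(1)
z <- rnorm(200)
x <- z + rnorm(200)
y <- z + rnorm(200)
cor(x, y)             # sizeable, driven by the shared z
partial_cor(x, y, z)  # close to 0 once z is accounted for
```

With a single covariate, this matches the textbook formula \((r_{xy} - r_{xz} r_{yz}) / \sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}\).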

- Add two options in `big_spLinReg()` and `big_spLogReg()`: `power_scale`, for using a different scaling for LASSO, and `power_adaptive`, for using adaptive LASSO (where larger marginal effects are penalized less). See documentation for details.

- `big_(c)prodVec()` and `big_(c)prodMat()` (re)gain an `ncores` parameter. Note that for `big_(c)prodMat()`, it might be beneficial to use BLAS parallelism (with `bigparallelr::set_blas_ncores()`) instead of this parameter, especially when the matrix `A` is large-ish.
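
The `power_adaptive` option makes variables with larger marginal effects receive a smaller penalty. The sketch below shows one way such per-variable penalty factors could be formed, as \(1 / m_j^{\gamma}\). It assumes that the marginal statistic \(m_j\) is the absolute correlation of column \(j\) with `y`. That choice, and the helper name `adaptive_pf`, are illustrative assumptions. Neither is taken from bigstatsr.

```r
# Hypothetical sketch: adaptive-LASSO-style penalty factors 1 / m_j^power,
# ASSUMING m_j = |cor(X[, j], y)| as the marginal statistic.
adaptive_pf <- function(X, y, power_adaptive = 0) {
  m <- abs(cor(X, y))[, 1]   # marginal effect of each column
  1 / m^power_adaptive       # larger |m_j| -> smaller penalty
}

set.seed(2)
X <- matrix(rnorm(100 * 3), 100, 3)
y <- 2 * X[, 1] + 0.2 * X[, 2] + rnorm(100)
adaptive_pf(X, y, power_adaptive = 0)  # all ones: plain LASSO
adaptive_pf(X, y, power_adaptive = 1)  # column 1 is penalized least
```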

- Function `big_colstats()` can now be run in parallel (added parameter `ncores`).

- It is now possible to use C++ FBM accessors without linking to {RcppArmadillo}.

- Functions `big_(c)prodMat()` and `big_(t)crossprodSelf()` now use much less memory and may be faster.

- Add `covar_from_df()` to convert a data frame with factors/characters to a numeric matrix using one-hot encoding.
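
One-hot encoding replaces each factor (or character) column with one 0/1 indicator column per level. The sketch below is a minimal base-R illustration of that idea, and it keeps numeric columns as they are. It is not `covar_from_df()` itself, and the helper name `one_hot_df` is hypothetical.

```r
# Hypothetical sketch (not covar_from_df()): one indicator column per level.
one_hot_df <- function(df) {
  cols <- lapply(names(df), function(nm) {
    x <- df[[nm]]
    if (is.character(x)) x <- factor(x)
    if (is.factor(x)) {
      m <- outer(as.character(x), levels(x), "==") * 1  # 0/1 indicators
      colnames(m) <- paste0(nm, levels(x))
    } else {
      m <- matrix(as.numeric(x), ncol = 1, dimnames = list(NULL, nm))
    }
    m
  })
  do.call(cbind, cols)
}

df <- data.frame(age = c(30, 45, 52), sex = c("F", "M", "F"),
                 site = factor(c("a", "b", "c")))
one_hot_df(df)  # columns: age, sexF, sexM, sitea, siteb, sitec
```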

- Remove some ‘Suggests’ dependencies.

- Add a new column `$all_conv` to the output of `summary()` for `big_spLinReg()` and `big_spLogReg()` to check whether all models have stopped because of “no more improvement”. Also add a new parameter `sort` to `summary()`.

- Now warn (with parameter `warn`, enabled by default) if some models may not have reached a minimum when using `big_spLinReg()` and `big_spLogReg()`.

- Fix `In .self$nrow * .self$ncol : NAs produced by integer overflow`.
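
That warning comes from multiplying two 32-bit R integers whose product exceeds `.Machine$integer.max`. The snippet below reproduces this kind of overflow and the usual remedy of computing in double precision. It shows the class of bug only; the actual fix in the package is not reproduced here.

```r
n_row <- 50000L
n_col <- 50000L
bad  <- suppressWarnings(n_row * n_col)  # NA: 2.5e9 > .Machine$integer.max
good <- as.double(n_row) * n_col         # 2.5e9, computed as a double
```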

- Make two different memory-mappings: one that is read-only (using `$address`) and one where it is possible to write (using `$address_rw`). This makes it possible to use file permissions to prevent modifying data.

- Also add a new field `$is_read_only` to be used to prevent modifying data (at least with `<-`) even when you have write permissions to it. Functions creating an FBM now gain a parameter `is_read_only`.

- Make vector accessors (e.g. `X[1:10]`) faster.

- Move some code to new packages {bigassertr} and {bigparallelr}.

- `big_randomSVD()` gains arguments related to matrix-vector multiplication.

- `assert_noNA()` is faster.

- Add `big_increment()`.

- In `plot.big_SVD()`:
  - Can now plot many PCA scores (more than two) at once.
  - Use `coord_fixed()` when plotting PCA scores because it is good practice.
  - Use a log scale in the scree plot to better see small differences in singular values.

- Reexport `cowplot::plot_grid()` to merge multiple ggplots.

- `AUCBoot()` is now 6-7 times faster.

- Add parameters `center` and `scale` to products.
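
Centering and scaling can be folded into a product without ever building the standardized matrix. For a center vector \(c\) and scale vector \(s\), the product is \(\left(X - \mathbf{1} c^\top\right) \operatorname{diag}(1/s)\, y = X (y / s) - \mathbf{1}\, c^\top (y / s)\). The sketch below checks this identity against `scale()`. It is an illustration of the algebra and is not claimed to be the package's implementation. The helper name `prod_scaled` is hypothetical.

```r
# Hypothetical sketch: scaled product without materializing the scaled matrix.
prod_scaled <- function(X, y, center, scale) {
  y2 <- y / scale                      # fold the scaling into the vector
  drop(X %*% y2) - sum(center * y2)    # subtract the centering term once
}

set.seed(3)
X <- matrix(rnorm(20 * 4), 20, 4)
y <- rnorm(4)
ctr <- colMeans(X)
scl <- apply(X, 2, sd)
direct <- drop(scale(X, center = ctr, scale = scl) %*% y)
all.equal(prod_scaled(X, y, ctr, scl), direct)  # TRUE
```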

- Fix a bug in `big_univLogReg()` for variables with no variation. IRLS was not converging, so `glm()` was used instead. The problem is that `glm()` drops dimensions causing singularities, so the Z-score of the first covariate (or intercept) was reported instead of a missing value.

- Use *mio* instead of *boost* for memory-mapping.

- Add a parameter `base.row` to `predict.big_sp_list()` and automatically detect whether it is needed (as well as for `covar.row`).

- It is now possible to subset a `big_sp_list` without losing attributes, so that one can access one model (corresponding to one alpha) even if it is not the ‘best’.

- Add parameters `pf.X` and `pf.covar` in `big_sp***Reg()` to provide different penalization for each variable (possibly no penalization at all).

- Add `%*%`, `crossprod` and `tcrossprod` operations for ‘double’ FBMs.

- Now also return the number of non-zero variables (`$nb_active`) and the number of candidate variables (`$nb_candidate`) for each step of the regularization paths of `big_spLinReg()` and `big_spLogReg()`.

- Parameters `warn` and `return.all` of `big_spLinReg()` and `big_spLogReg()` are deprecated; these functions now always return the maximum information. Now provide two methods (`summary` and `plot`) to get a quick assessment of the fitted models.

- Check for missing values in input vectors (indices and targets) and matrices (covariables).

- `AUC()` is now stricter: it accepts only 0s and 1s for `target`.
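
The AUC can be computed from ranks, since it equals the Mann-Whitney statistic divided by \(n_1 n_0\). The sketch below uses that formulation and includes a strict check that `target` contains only 0s and 1s. This is a minimal illustration under those assumptions, not the package's `AUC()` code. The helper name `auc_sketch` is hypothetical.

```r
# Hypothetical sketch: rank-based AUC with a strict 0/1 check on the target.
auc_sketch <- function(pred, target) {
  if (!all(target %in% c(0, 1))) stop("'target' should be only 0s and 1s.")
  r <- rank(pred)          # ties get average ranks (counted as 1/2)
  n1 <- sum(target == 1)
  n0 <- sum(target == 0)
  (sum(r[target == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc_sketch(c(0.1, 0.4, 0.35, 0.8), c(0, 0, 1, 1))  # 0.75
```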

- `$bm()` and `$bm.desc()` have been added in order to get an `FBM` as a `filebacked.big.matrix`. This enables using {bigmemory} functions.

- Type `float` added.

- `big_write` added.

- `big_read` now has a `filter` argument to filter rows, and argument `nrow` has been removed because it is now determined when reading the first block of data.

- Removed the `save` argument from `FBM` (and others); now, you must use `FBM(...)$save()` instead of `FBM(..., save = TRUE)`.

- You can now fill an FBM using a data frame. Note that factors will be used as integers.

- Package {bigreadr} has been developed and is now used by `big_read`.

- There have been some changes regarding how conversion between types is checked. Before, you would get a warning for any possible loss of precision (without actually checking it). Now, any loss of precision due to conversion between types is reported as a warning, and only in this case. If you want to disable this feature, you can use `options(bigstatsr.downcast.warning = FALSE)`, or you can use `without_downcast_warning()` to disable this warning for one call.
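
The new rule is to warn only when a conversion actually changes a value. The sketch below applies that rule to a double-to-integer conversion in base R. The helper `to_int_checked` is hypothetical and does not reproduce the package's internal check.

```r
# Hypothetical sketch: warn only if converting to integer changes some value.
to_int_checked <- function(x) {
  y <- suppressWarnings(as.integer(x))     # NA for out-of-range values
  lost <- !is.na(x) & (is.na(y) | y != x)  # values that did not survive
  if (any(lost)) warning("Some values have changed (loss of precision).")
  y
}

to_int_checked(c(1, 2, 3))  # exact conversion: no warning
tryCatch(to_int_checked(c(1, 2.5)),        # 2.5 would become 2
         warning = function(w) conditionMessage(w))
```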

- Change `big_read` so that it is faster (corresponding vignette updated).

- Possibility to add a “base predictor” for `big_spLinReg` and `big_spLogReg`.

- **Don’t store the whole regularization path (as a sparse matrix) in `big_spLinReg` and `big_spLogReg` anymore**, because it caused major slowdowns.

- Directly average the K predictions in `predict.big_sp_best_list`.

- Only use the “PSOCK” type of cluster because “FORK” can leave zombies behind. You can change this with `options(bigstatsr.cluster.type = "FORK")`.

- Fix a bug in `big_spLinReg` related to the computation of summaries.

- Now provide function `plus` to be used as the `combine` argument in `big_apply` and `big_parallelize` instead of `'+'`.
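
A likely reason for a dedicated `plus` is that the binary operator `'+'` accepts at most two arguments. It therefore fails when a combine function is applied to all block results at once, for example through `do.call()`. That calling convention is an assumption made here for illustration. The variadic `plus_sketch` below is hypothetical and is not the package's `plus`.

```r
# Hypothetical variadic sum, ASSUMING results are combined all at once.
plus_sketch <- function(x, ...) {
  for (y in list(...)) x <- x + y  # accumulate any number of arguments
  x
}

blocks <- list(1:3, 4:6, 7:9)                     # e.g. results of 3 blocks
res <- try(do.call("+", blocks), silent = TRUE)   # '+' takes at most 2 arguments
inherits(res, "try-error")                        # TRUE
do.call(plus_sketch, blocks)                      # 12 15 18
```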

- Before, this package used only the “PSOCK” type of cluster, which has some significant overhead. Now, it uses the “FORK” type on non-Windows systems. You can change this with `options(bigstatsr.cluster.type = "PSOCK")`. (Reverted to “PSOCK” in 0.4.0.)

- You can now provide multiple \(\alpha\) values (as a numeric vector) in `big_spLinReg` and `big_spLogReg`. One will be chosen by grid-search.

- Fixed a bug in `big_prodMat` when using a dimension of 1 or 0.

**Package {bigstatsr} is published in Bioinformatics**

- No scaling is used by default for `big_crossprod`, `big_tcrossprod`, `big_SVD` and `big_randomSVD` (before, there was no default at all).

- **Integrate Cross-Model Selection and Averaging (CMSA) directly in `big_spLinReg` and `big_spLogReg`**, a procedure that automatically chooses the value of the \(\lambda\) hyper-parameter.

- **Speed up `big_spLinReg` and `big_spLogReg`** (issue #12).

- Speed up AUC computations.

- **No longer use the `big.matrix` format of package bigmemory**