For some reason I’ve always struggled to perform regression successfully in R. Fitting linear or polynomial models is fairly straightforward, for instance:

```r
x <- 1:10
y <- (2 + rnorm(10, 0, 0.1)) * x + 1
myFit <- lm(y ~ x)
plot(x, y)
lines(x, x * myFit$coef[2] + myFit$coef[1])
```
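As an aside (my addition, not from the original post): base R can also draw and evaluate the fit for you, via `abline()` and `predict()`. A minimal sketch on noiseless data, so the recovered coefficients are exactly 1 and 2:

```r
# Same linear fit as above, drawn with abline() instead of lines();
# noiseless data, so lm() recovers intercept 1 and slope 2 exactly.
x <- 1:10
y <- 2 * x + 1
myFit <- lm(y ~ x)
plot(x, y)
abline(myFit)                                  # draws the fitted line directly
predict(myFit, newdata = data.frame(x = 5.5))  # evaluate the fit at a new point
```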

or:

```r
x <- 1:10
x2 <- x^2
y <- (2 + rnorm(10, 0, 0.1)) * x + (1 + rnorm(10, 0, 0.1)) * x2 + 1
myFit <- lm(y ~ x + x2)
plot(x, y)
lines(x, x * myFit$coef[2] + x2 * myFit$coef[3] + myFit$coef[1])
```

But for fitting more exotic curves, I’ve had to resort to dirty, hackish, not-too-satisfying workarounds, such as this one to fit a sigmoid:

```r
x <- seq(0, 10, length = 100)
y <- 1 / (1 + exp(-(2 + rnorm(100, 0, 0.05)) * x + 3 + rnorm(100, 0, 0.2)))
new_y <- log(1 / y - 1)
myFit <- lm(new_y ~ x)
plot(x, y)
lines(x, 1 / (1 + exp(x * myFit$coef[2] + myFit$coef[1])))
```
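Incidentally (my addition, not in the original post), base R ships the logistic link as built-ins: `qlogis(y)` computes `log(y/(1-y))`, i.e. the negative of the `log(1/y-1)` transform above, and `plogis()` is its inverse. On a noiseless sigmoid the same trick becomes:

```r
# The logit/logistic pair qlogis()/plogis() is built into base R.
# Noiseless sigmoid with true slope 2 and intercept -3, so the
# linear fit on the logit scale recovers them (numerically) exactly.
x <- seq(0, 10, length = 100)
y <- plogis(2 * x - 3)            # y = 1 / (1 + exp(-(2*x - 3)))
myFit <- lm(qlogis(y) ~ x)        # logit transform, then a plain linear fit
plot(x, y)
lines(x, plogis(coef(myFit)[[1]] + coef(myFit)[[2]] * x))
```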

I found a more detailed guide to this approach there. Long story short: it transforms the sigmoid into a linear function (it could be done with a polynomial one, too), fits that, then turns the result back into a sigmoid.
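The same linearize/fit/invert pattern works for any curve with an invertible transform. For example (a sketch of mine, not from the original post), `y = a*exp(b*x)` becomes linear after taking logs, since `log(y) = log(a) + b*x`:

```r
# Linearize/fit/invert for an exponential curve y = a * exp(b * x).
# Noiseless data with a = 2, b = 0.7, so the fit recovers them exactly.
x <- seq(0, 5, length = 50)
y <- 2 * exp(0.7 * x)
logFit <- lm(log(y) ~ x)       # log(y) is linear in x
a <- exp(coef(logFit)[[1]])    # intercept is log(a)
b <- coef(logFit)[[2]]         # slope is b
plot(x, y)
lines(x, a * exp(b * x))       # back-transform to the exponential
```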

The one advantage of this workaround is that it makes it easy to apply penalized regression to the sigmoid fit, for instance:

```r
library('penalized')
x <- seq(0, 10, length = 100)
y <- 1 / (1 + exp(-(2 + rnorm(100, 0, 0.05)) * x + 3 + rnorm(100, 0, 0.2)))
new_y <- log(1 / y - 1)
myFit <- penalized(response = new_y, penalized = ~x, lambda1 = 100, lambda2 = 10)
plot(x, y)
lines(x, 1 / (1 + exp(x * myFit@penalized[1] + myFit@unpenalized[1])))
```

(NB: the penalization is on purpose way too high, so that the curve differs visibly from the previous one.)

But that still left me with no way to fit more complex shapes. More precisely, I wanted to fit `a*(x-3)^b+c`, a shifted power law, so no easy transformation could bring me back to the linear case. I don’t know how I didn’t manage to find it earlier, but this time, after searching (again) for a while, I finally found the function to fit just about anything by nonlinear least squares: `nls`. So now I can fit anything I want, although I haven’t yet found how to add penalization. Example:

```r
x <- c(10, 40, 70, 100, 130, 160, 190, 220, 250, 280)
y <- c(0.147, 0.0325, 0.0203, 0.0150, 0.0123, 0.0109, 0.0101,
       0.0094, 0.0085, 0.0080)
myFit <- nls(y ~ a * (x - 3)^b + c,
             start = list(a = 1, b = -1, c = 0.05),
             trace = TRUE,
             control = list(maxiter = 10, tol = 1e-08, minFactor = 1e-10))
plot(x, y)
plotX <- seq(min(x), max(x), length = 100)
cf <- coef(myFit)
lines(plotX, cf["a"] * (plotX - 3)^cf["b"] + cf["c"])
```

NB: I added a couple of non-mandatory but potentially useful arguments (`trace` and `control`); see the documentation for more details on them.

Also, try to pick sensible starting values for your parameters: it can save you *a lot* of time. So if you have some *a priori* knowledge about the expected results, do use it!
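One way to obtain such starting values (my own sketch, on a synthetic data set, not the post's data) is to bootstrap them from the linearization trick: guess `c` from the flat right-hand tail, then run a log-log linear fit to get rough values for `a` and `b`, and only then hand everything to `nls()`:

```r
# Synthetic data following y = 0.5*(x-3)^-0.9 + 0.008, with a small
# deterministic perturbation so nls() isn't fed zero-residual data
# (which its documentation warns against).
x <- c(10, 40, 70, 100, 130, 160, 190, 220, 250, 280)
y <- (0.5 * (x - 3)^-0.9 + 0.008) * (1 + 0.01 * sin(1:10))

c0 <- 0.9 * min(y)                      # c is roughly the right-hand asymptote
linFit <- lm(log(y - c0) ~ log(x - 3))  # log-log fit gives rough a and b
start <- list(a = exp(coef(linFit)[[1]]), b = coef(linFit)[[2]], c = c0)

myFit <- nls(y ~ a * (x - 3)^b + c, start = start)
coef(myFit)   # close to the true a = 0.5, b = -0.9, c = 0.008
```

With starting values this close, `nls` converges quickly instead of wandering off or hitting `maxiter`.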

Hurray, now I can fit whatever I want. I’ll make sure to post an update if I find a way to fit arbitrary curves *and* use penalization at the same time; I’m not searching right now because I don’t really need penalization for the moment. For the sake of readability I tried to make sure all code blocks work out of the box and stand alone. If one is broken, don’t hesitate to drop a comment.

