Widrow-Hoff Learning

Contents

ADALINE
Convergence
Example: Analytical solution
Example: Numerical solution
Adaptive Filtering
Example: Adaptive filter
Problem P10.1
Problem P10.2
Problem P10.3
Problem P10.4
Problem P10.5
Problem P10.6
Problem P10.8
Problem P10.9
App
References

ADALINE

ADALINE (ADAptive LInear NEuron)

Introduced in the late 1950s
The network parameters are chosen to minimize the mean squared error
The optimization algorithm was named least mean squares (LMS)
It is a network for solving supervised learning problems
It can be used for classification (and in that sense it is more powerful than the perceptron)
When used for classification, like the perceptron it only handles linearly separable problems

A single-layer ADALINE

Note that the activation is the identity. With two inputs and one neuron, we can draw the separating hyperplane

We would have a positive class (versus class 1 of the perceptron), a negative class (versus class 0 of the perceptron), and a third case in which the output is exactly 0. The perceptron assigns this last case to class 1 (hardlims), so by the same convention we can assign it to the positive class in our ADALINE and thus replace a perceptron network with an ADALINE.

Convergence
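
As a compact reference, the quantities used in the examples below (c, h, R and the maximum stable rate 1/max(eig(R))) fit together as follows. This is a summary in the notation of those examples; the symbols x (parameter vector) and z (input vector, equal to p when there is no bias) are labeling assumptions, and the update is written to match the one implemented in adaline_epoca.

```latex
\begin{aligned}
F(\mathbf{x}) &= E\big[(t - \mathbf{x}^{T}\mathbf{z})^{2}\big]
  = c - 2\,\mathbf{x}^{T}\mathbf{h} + \mathbf{x}^{T}\mathbf{R}\,\mathbf{x},
  \qquad c = E[t^{2}],\quad \mathbf{h} = E[t\,\mathbf{z}],\quad \mathbf{R} = E[\mathbf{z}\mathbf{z}^{T}] \\
\nabla F(\mathbf{x}) &= -2\,\mathbf{h} + 2\,\mathbf{R}\,\mathbf{x} = \mathbf{0}
  \;\Longrightarrow\; \mathbf{x}^{*} = \mathbf{R}^{-1}\mathbf{h}
  \quad (\text{when } \mathbf{R} \text{ is invertible}) \\
\mathbf{x}_{k+1} &= \mathbf{x}_{k} + 2\alpha\, e_{k}\,\mathbf{z}_{k},
  \qquad e_{k} = t_{k} - \mathbf{x}_{k}^{T}\mathbf{z}_{k} \\
0 &< \alpha < \frac{1}{\lambda_{\max}(\mathbf{R})}
\end{aligned}
```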

Example: Analytical solution

p1 = [1 -1 -1]';

t1 = -1;

p2 = [1 1 -1]';

t2 = 1;

R = (1/2) * p1 * p1' + (1/2) * p2 * p2'

R = 3×3

     1     0    -1
     0     1     0
    -1     0     1

h = (1/2) * t1 * p1 + (1/2) * t2 * p2

h = 3×1

     0
     1
     0

c = (1/2) * (-1)^2 + (1/2) * (1)^2

c = 1

syms w1 w2 w3

x = [w1 w2 w3]';

F(w1,w2,w3) = c -2*x'*h + x'*R*x % mean squared error

F(w1, w2, w3) =

(w2-1)^2 + (w1-w3)^2 % infinitely many minimizers: w2 = 1, w1 = w3

ans =

E = gradient(F)==0

E(w1, w2, w3) =

% Compute analytic solution of a symbolic equation

E(w1, w2, w3) =

solution = solve(E,[w1,w2,w3]);

Warning: Unable to solve symbolically. Returning a numeric solution using vpasolve.

% Display symbolic solution returned by solve

displaySymSolution(solution);

solution = struct with fields:
    w1: [1×1 sym]
    w2: [1×1 sym]
    w3: [1×1 sym]


xopt = inv(R)*h % no unique solution (R is singular)

Warning: Matrix is singular to working precision.

xopt = 3×1

   NaN
   NaN
   NaN

eig(R)

ans = 3×1

     0
     1
     2

tasa_estable_max = 1/max(eig(R))

tasa_estable_max = 0.5000
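
Because R is singular here, inv(R)*h returns NaN even though minimizers exist. One way to pick a particular minimizer, offered as a sketch and not as part of the original example, is the Moore-Penrose pseudoinverse (pinv), which returns the minimum-norm solution. The variable names x_min and residual are hypothetical.

```matlab
% Minimum-norm minimizer of F when R is singular (sketch).
p1 = [1 -1 -1]';  t1 = -1;
p2 = [1  1 -1]';  t2 = 1;
R = (1/2)*(p1*p1') + (1/2)*(p2*p2');   % input correlation matrix
h = (1/2)*t1*p1 + (1/2)*t2*p2;         % cross-correlation vector
x_min = pinv(R)*h;                     % minimum-norm solution of R*x = h
residual = norm(R*x_min - h);          % zero (up to rounding) if x_min is a stationary point
disp(x_min')
disp(residual)
```

This choice satisfies w2 = 1 and w1 = w3 with w1 = w3 = 0, which is the same point the numerical example approaches when started from W0 = [0 0 0].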

Example: Numerical solution

% adaline_epoca(W0,b0,alpha,p,t,ba)
W0 = [0 0 0];
p1 = [1 -1 -1]';
t1 = -1;
p2 = [1 1 -1]';
t2 = 1;
p = [p1 p2];
t = [t1 t2];
% adaline_epoca(W0,b0,alpha,p,t,ba)
[W,b] = adaline_epoca(W0,[0],0.2,p,t)
W = 3×3
         0   -0.4000    0.1600
         0    0.4000    0.9600
         0    0.4000   -0.1600
b = 0
[W,b] = adaline_epoca(W(:,end)',[0],0.2,p,t)
W = 3×3
    0.1600    0.0160   -0.0384
    0.9600    1.1040    1.0496
   -0.1600   -0.0160    0.0384
b = 0
[W,b] = adaline_epoca(W(:,end)',[0],0.2,p,t)
W = 3×3
   -0.0384    0.0122    0.0028
    1.0496    0.9990    0.9897
    0.0384   -0.0122   -0.0028
b = 0
[W,b] = adaline_epoca(W(:,end)',[0],0.2,p,t)
W = 3×3
    0.0028   -0.0036    0.0009
    0.9897    0.9961    1.0005
   -0.0028    0.0036   -0.0009
b = 0
[W,b] = adaline_epoca(W(:,end)',[0],0.2,p,t)
W = 3×3
    0.0009    0.0004   -0.0003
    1.0005    1.0010    1.0003
   -0.0009   -0.0004    0.0003
b = 0
[W,b] = adaline_epoca(W(:,end)',[0],0.2,p,t)
W = 3×3
   -0.0003    0.0001    0.0000
    1.0003    0.9999    0.9999
    0.0003   -0.0001   -0.0000
b = 0
[W,b] = adaline_epoca(W(:,end)',[0],0.2,p,t) % in effect, a neuron with a single nonzero weight
W = 3×3
    0.0000   -0.0000    0.0000
    0.9999    1.0000    1.0000
   -0.0000    0.0000   -0.0000
b = 0

Adaptive Filtering

A new block is defined, which represents the delay of the input signal

An adaptive filter is a discrete-time LTI system whose parameters are adjusted according to the inputs presented to it, so that the resulting model can then be used on noisy inputs.
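
Written out, and assuming a tapped delay line with R taps feeding a single linear neuron, the filter output at time k would be as below. The symbols y, w_{1,i}, b and R follow the style of Hagan et al. and are notation assumptions; R here is the number of taps, not the correlation matrix.

```latex
a(k) = \sum_{i=1}^{R} w_{1,i}\, y(k-i+1) + b
```

With R = 3, b = 0 and weights [2 -1 3] this gives a(k) = 2y(k) - y(k-1) + 3y(k-2), the filter used in Problem P10.1.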

Example: Adaptive filter

The signals are assumed independent and zero mean

clearvars

v = @(k) 1.2 * sin(2 * pi * k / 3);

m = @(k) 0.12 * sin(2 * pi * k / 3 + pi / 2);

k = 0:10;

figure

subplot(1,2,1)

stem(k,v(k))

subplot(1,2,2)

stem(k,m(k))

% Periodic signals (period 3), so one period is enough to compute the averages

Evto2 = mean(v(1:3).^2)

Evto2 = 0.7200

Evv1 = mean(v(1:3).*v(0:2))

Evv1 = -0.3600

R = [Evto2, Evv1; Evv1, Evto2]

R = 2×2

    0.7200   -0.3600
   -0.3600    0.7200

Emv = mean(m(1:3).*v(1:3)) % this is zero (up to rounding)

Emv = 1.1373e-17

Emv1 = mean(m(1:3).*v(0:2))

Emv1 = -0.0624

h = [0;Emv1]

h = 2×1

         0
   -0.0624

xopt = inv(R)*h

xopt = 2×1

   -0.0577
   -0.1155

[ved, D] = eig(R)

ved = 2×2

   -0.7071   -0.7071
   -0.7071    0.7071

D = 2×2

    0.3600         0
         0    1.0800

Esto2 = (1/0.4)*integral(@(s) s.^2,-0.2,0.2)

Esto2 = 0.0133

Em2 = mean(m(1:3).^2)

Em2 = 0.0072

c = Esto2 + Em2

c = 0.0205

syms w1 w2

x = [w1 w2]';

F(w1,w2) = vpa(c -2*x'*h + x'*R*x) % mean squared error

F(w1, w2) =

figure

fmesh(F,[-2 2 -2 2])

figure

fcontour(F,[-2 2 -2 2])

hold on

plot(xopt(1),xopt(2),'or')

% adaline_epoca(W0,b0,alpha,p,t,ba)

W0 = [0 -2];

muestras = 90;

n = 0:muestras;

p = [v(n);v(n-1)]

p = 2×91

         0    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392    0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392    0.0000    1.0392   -1.0392   -0.0000    1.0392
   -1.0392         0    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392    0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392   -0.0000    1.0392   -1.0392    0.0000    1.0392   -1.0392   -0.0000

% rng(123) % fix the random seed for reproducibility

s = unifrnd(-0.2,0.2,1,muestras+1);

t = s + m(n);

alpha = 0.1;

% adaline_epoca(W0,b0,alpha,p,t,ba)

[W,b] = adaline_epoca(W0,[0],alpha,p,t)

W = 2×92

         0         0   -0.0234   -0.3411   -0.3411   -0.3085   -0.4845   -0.4845   -0.4026   -0.4326   -0.4326   -0.3531   -0.3534   -0.3534   -0.3123   -0.3243   -0.3243   -0.2873   -0.2798   -0.2798   -0.2513   -0.2865   -0.2865   -0.2179   -0.2123   -0.2123   -0.2115   -0.2474   -0.2474   -0.1800   -0.1622   -0.1622   -0.1793   -0.1738   -0.1738   -0.1769   -0.1432   -0.1432   -0.1585   -0.1639   -0.1639   -0.1247   -0.1403   -0.1403   -0.1613   -0.1134   -0.1134   -0.0988   -0.1158   -0.1158
   -2.0000   -1.5672   -1.5672   -1.2495   -1.0446   -1.0446   -0.8685   -0.7179   -0.7179   -0.6879   -0.5583   -0.5583   -0.5580   -0.4699   -0.4699   -0.4579   -0.3908   -0.3908   -0.3984   -0.3470   -0.3470   -0.3118   -0.3096   -0.3096   -0.3152   -0.2791   -0.2791   -0.2432   -0.2471   -0.2471   -0.2650   -0.2406   -0.2406   -0.2460   -0.2022   -0.2022   -0.2359   -0.2035   -0.2035   -0.1981   -0.1778   -0.1778   -0.1622   -0.1636   -0.1636   -0.2114   -0.1757   -0.1757   -0.1587   -0.1417

b = 0

plot(W(1,:),W(2,:))

figure

plot(n,s)

hold on

plot(n,t)

figure

plot(n,s)

salidas = sum(W(:,1:end-1).*p, 1)'; % filter output W(k)*p(k) for each sample, without forming the full 91x91 product

size(salidas)

ans = 1×2

    91     1

restaurada = t-salidas';

size(restaurada)

ans = 1×2

     1    91

hold on

plot(n,restaurada)

mean((s(floor(muestras/2):end)-restaurada(floor(muestras/2):end)).^2)

ans = 0.0019

Problem P10.1

syms k y(k)

a(k) =[2 -1 3]*[y(k);y(k-1);y(k-2)]

a(k) = 2*y(k) - y(k-1) + 3*y(k-2)

a(-1)

ans =

i) The output is zero before k = 0

for i=0:5

[i,a(i)]

end

ans =

ii) Solution

y_0 = 5;
y_1 = -4;
a_0 = 2*y_0
a_0 = 10
a_1 = -y_0 + 2*y_1
a_1 = -13
a_2 = 3*y_0-y_1
a_2 = 19
a_3 = 3*y_1
a_3 = -12

iii) y(0) contributes to the output from k = 0 to k = 2
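
The same outputs can be cross-checked numerically. The sketch below assumes the input is y(0) = 5, y(1) = -4 and zero elsewhere, as in part ii), and uses MATLAB's filter function with the weights [2 -1 3] as FIR coefficients. The variable names w, y_seq and a_seq are hypothetical.

```matlab
% FIR response of the ADALINE filter a(k) = 2*y(k) - y(k-1) + 3*y(k-2)
w = [2 -1 3];                 % tapped-delay-line weights
y_seq = [5 -4 0 0 0 0];       % y(0), y(1), ..., y(5); zero before k = 0
a_seq = filter(w, 1, y_seq);  % a(0), a(1), ..., a(5)
disp([0:5; a_seq])            % first row: k, second row: a(k)
```

The first four values reproduce a_0 to a_3 above, and the output returns to zero from k = 4 onward because the filter only remembers the last three inputs.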

Problem P10.2

p1 = [1 1]';
p2 = [-1 -1]';
p3 = [2 2]';
P = [p1 p2 p3];
figure
plot(P(1,1:2),P(2,1:2), 'ok','MarkerSize',15, 'LineWidth',5)
hold on 
plot(P(1,3),P(2,3),'ob','MarkerSize',15, 'LineWidth',5)
axis([-2 2.5 -2 2.5])
g = gca;
g.XAxisLocation = 'origin';
g.YAxisLocation = 'origin';
g.Box = 'off';

i) The problem is linearly separable, so an ADALINE can be implemented

ii) We find the equation of a line that passes through (0, 3) and (3, 0)

(there are infinitely many solutions).

syms p11 p12

p12(p11) =((0-3)/(3-0))*(p11 - 0) + 3

p12(p11) = 3 - p11

hold on

fplot(p12, [-0.5,3])

Thus the boundary is p11 + p12 - 3 = 0, so one choice is W = [1 1], b = -3. As a reminder, the weights can be any multiple of these; choosing the opposite direction swaps the classification, so positives become negatives and vice versa.
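
As a quick check of part ii), the sketch below evaluates a hypothetical ADALINE whose boundary is the line p12 = 3 - p11 plotted above. It takes W = [1 1] and b = -3 as one of the infinitely many valid choices, which is an assumption and not necessarily the values chosen in the original solution. The variable names a and classes are hypothetical.

```matlab
% Verify that the line p11 + p12 - 3 = 0 separates the two classes
P = [1 -1 2; 1 -1 2];   % columns: p1, p2 (one class) and p3 (the other)
W = [1 1];              % assumed weight vector, normal to the boundary
b = -3;                 % assumed bias
a = W*P + b;            % linear ADALINE outputs, one per pattern
classes = sign(a);      % sign of each output decides the class
disp([a; classes])
```

Negating both W and b keeps the same boundary but swaps which class is labeled positive.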

iii) The problem is not linearly separable

iv) The problem is not linearly separable

p1 = [1 1]';

p2 = [1 -1]';

p3 = [1 0]';

P = [p1 p2 p3];

figure

plot(P(1,1:2),P(2,1:2), 'ok','MarkerSize',15, 'LineWidth',5)

hold on

plot(P(1,3),P(2,3),'ob','MarkerSize',15, 'LineWidth',5)

axis([-2 2.5 -2 2.5])

g = gca;

g.XAxisLocation = 'origin';

g.YAxisLocation = 'origin';

g.Box = 'off';

Problem P10.3

p1 = [1 1]';

t1 = 1;

p2 = [1 -1]';

t2 = -1;

c = (1/2)*t1^2 + (1/2)*t2^2

c = 1

h = (1/2)*t1*p1 + (1/2)*t2*p2

h = 2×1

     0
     1

R = (1/2)*p1*p1' + (1/2)*p2*p2'

R = 2×2

     1     0
     0     1

syms w11 w12

x = [w11 w12]';

F(w11,w12)= c - 2 * x' * h + x' * R * x

F(w11, w12) =

figure

fmesh(F,[-3 3 -2 4])

figure

fcontour(F,[-3 3 -2 4])

axis('square')

xopt = inv(R)*h

xopt = 2×1

     0
     1

Problem P10.4

p1 = [1 1]';

t1 = 1;

p2 = [1 -1]';

t2 = -1;

figure

plot(p1(1,1),p1(2,1), 'ok','MarkerSize',15, 'LineWidth',5)

hold on

plot(p2(1,1),p2(2,1),'ob','MarkerSize',15, 'LineWidth',5)

axis([-2 2.5 -2 2.5])

g = gca;

g.XAxisLocation = 'origin';

g.YAxisLocation = 'origin';

g.Box = 'off';

W0 = [0 0];

p = [p1 p2];

t = [t1 t2];

alpha = 0.25;

% adaline_epoca(W0,b0,alpha,p,t,ba)

[W,b] = adaline_epoca(W0,[0],alpha,p,t)

W = 2×3

         0    0.5000         0
         0    0.5000    1.0000

b = 0

hold on

fimplicit(@(p1,p2) W(1,2)*p1 + W(2,2)*p2)

quiver(0,0,W(1,2),W(2,2),0, 'MaxHeadSize',0.5)

fimplicit(@(p1,p2) W(1,3)*p1 + W(2,3)*p2)

quiver(0,0,W(1,3),W(2,3),0, 'MaxHeadSize',0.5)
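
The two columns appended to W by adaline_epoca can be reproduced by hand. With W(0) = [0 0], alpha = 0.25, no bias, and the update W(k+1) = W(k) + 2 alpha e(k) p(k)' used in adaline_epoca, one pass over the two patterns gives:

```latex
\begin{aligned}
a(0) &= \mathbf{W}(0)\,\mathbf{p}_1 = 0, & e(0) &= t_1 - a(0) = 1, &
\mathbf{W}(1) &= \begin{bmatrix} 0 & 0 \end{bmatrix} + 2(0.25)(1)\begin{bmatrix} 1 & 1 \end{bmatrix}
               = \begin{bmatrix} 0.5 & 0.5 \end{bmatrix} \\
a(1) &= \mathbf{W}(1)\,\mathbf{p}_2 = 0, & e(1) &= t_2 - a(1) = -1, &
\mathbf{W}(2) &= \begin{bmatrix} 0.5 & 0.5 \end{bmatrix} + 2(0.25)(-1)\begin{bmatrix} 1 & -1 \end{bmatrix}
               = \begin{bmatrix} 0 & 1 \end{bmatrix}
\end{aligned}
```

W(2) = [0 1] already equals xopt from Problem P10.3, so both errors are zero and further passes leave the weights unchanged.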

Problem P10.5

val = eig(R)
val = 2×1
     1
     1
1/max(val)
ans = 1

Problem P10.6

% c = E(t^2) = E(y^2) = C(0)

c = 3

c = 3

% R = E(zz')

R = [3 -1;-1 3]

R = 2×2

     3    -1
    -1     3

% h = E(tz)

h = [-1 -1]'

h = 2×1

    -1
    -1

syms w11 w12

x = [w11 w12]';

F(w11,w12)= c - 2 * x' * h + x' * R * x

F(w11, w12) =

figure

fmesh(F,[-3 3 -2 4])

figure

fcontour(F,[-4 4 -4 4])

axis('square')

xopt = inv(R) * h

xopt = 2×1

   -0.5000
   -0.5000

[V,D] = eig(2*R)

V = 2×2

   -0.7071   -0.7071
   -0.7071    0.7071

D = 2×2

     4     0
     0     8

hold on

quiver(-0.5,-0.5,V(1,1),V(2,1),0, 'MaxHeadSize',0.5) % first eigenvector (column 1 of V)

quiver(-0.5,-0.5,V(1,2),V(2,2),0, 'MaxHeadSize',0.5) % second eigenvector (column 2 of V)

ii) Maximum stable learning rate

1 / max(eig(R))
ans = 0.2500

iii) Recover the coefficients of the process from the data of the problem

W0 = [0.75 0];

Mdl = arima('Constant',0,'AR',{-0.5 -0.5},'Variance',2);

rng(5)

sim = simulate(Mdl,100000);

t = sim(3:end)';

p1 =sim(2:end-1)';

p2 = sim(1:end-2)';

p = [p1;p2];

alpha = 0.00001;

% adaline_epoca(W0,b0,alpha,p,t,ba)

[W,b] = adaline_epoca(W0,[0],alpha,p,t);

figure

fcontour(F,[-1 1 -1 1])

axis('square')

hold on

plot(W(1,:),W(2,:))

plot(-0.5,-0.5,'o')

Problem P10.8

C1 = [1 1;1 2];

C2 = [2 2;-1 0];

C3 = [-1 -2;2 1];

C4 = [-1 -2;-1 -2];

figure

plot(C1(1,:),C1(2,:), 'bo', 'MarkerSize',20)

hold on

plot(C2(1,:),C2(2,:), 'gs','MarkerSize',20)

plot(C3(1,:),C3(2,:), 'ro', 'MarkerSize',30, 'LineWidth',5)

plot(C4(1,:),C4(2,:), 'ks','MarkerSize',30,'LineWidth',5)

axis([-3 3 -3 3])

g = gca;

g.XAxisLocation = 'origin';

g.YAxisLocation = 'origin';

g.Box = 'off';

legend("C1","C2","C3","C4")

legend('Location','eastoutside')

% as in the cited problem, we replace 0 with -1 for the negative targets
p1 = [1;1];
t1 = [-1;-1];
p2 = [1;2];
t2 = t1;
p3 = [2;-1];
t3 = [-1;1];
p4 = [2;0];
t4 = t3;
p5 = [-1;2];
t5 = [1;-1];
p6 = [-2;1];
t6 = t5;
p7 = [-1;-1];
t7 = [1;1];
p8 = [-2;-2];
t8 = t7;
W0 = [1 0;0 1];
b0 = [1;1];
p = [p1 p2 p3 p4 p5 p6 p7 p8];
t = [t1 t2 t3 t4 t5 t6 t7 t8];
alpha = 0.001; % Hagan suggests 0.04; it did not converge for me with that value
% adaline_epoca(W0,b0,alpha,p,t,ba)
[W,b] = adaline_epoca(W0,b0,alpha,p,t,1);
W(:,end-1:end)'
ans = 2×2
    0.9371   -0.0116
    0.0002    0.9446
W0 = W(:,end-1:end)';
b(end-1:end)'
ans = 2×1
    0.9839
    0.9802
b0 = b(end-1:end)';

epocas = 1000;

for i = 2:epocas

[W,b] = adaline_epoca(W0,b0,alpha,p,t,1);

W0 = W(:,end-1:end)';

b0 = b(end-1:end)';

end

Wf = W(:,end-1:end)'

Wf = 2×2

   -0.5937   -0.0510
    0.1680   -0.6659

bf = b(end-1:end)'

bf = 2×1

    0.0133
    0.1680

fimplicit(@(p1,p2) Wf(1,1)*p1 + Wf(1,2)*p2 + bf(1),[-3,3],'r')

fimplicit(@(p1,p2) Wf(2,1)*p1 + Wf(2,2)*p2 + bf(2),[-3,3],'k')

quiver(0,0.2,Wf(1,1),Wf(1,2),0, 'MaxHeadSize',0.5,"Color",'r','DisplayName',"_1w")

quiver(0,0.2,Wf(2,1),Wf(2,2),0, 'MaxHeadSize',0.5,"Color",'k','DisplayName',"_2w")

axis('square') % escala

grid on

THINK ABOUT THE ANALYTICAL FORMULATION OF THIS PROBLEM
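
One possible analytical formulation, given as a sketch: augment each input with a constant 1 so the bias becomes an extra weight, estimate R and h by averaging over the eight patterns (assuming they are equally likely), and solve the normal equations for both neurons at once. The names Q, Z, H, X, W_analytic and b_analytic are hypothetical.

```matlab
% Minimum mean-squared-error solution for P10.8 (sketch, equal pattern probabilities)
p = [1 1  2 2 -1 -2 -1 -2;
     1 2 -1 0  2  1 -1 -2];           % inputs, one pattern per column
t = [-1 -1 -1 -1  1  1  1  1;
     -1 -1  1  1 -1 -1  1  1];        % targets, one column per pattern
Q = size(p,2);                        % number of patterns
Z = [p; ones(1,Q)];                   % augmented inputs [p; 1]
R = (Z*Z')/Q;                         % estimate of E[z z']
H = (Z*t')/Q;                         % estimate of E[z t'], one column per neuron
X = (R\H)';                           % row i holds [w_i1 w_i2 b_i] for neuron i
W_analytic = X(:,1:2)
b_analytic = X(:,3)
```

The result should land close to the Wf and bf reached by the iterative training above, which gives a way to judge how far the LMS run still is from the minimum.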

Problem P10.9

p1 = [1 -1 -1 -1 1 1 1 1 1 -1 -1 -1 -1 -1 -1 -1]';

t1 = 60;

p2 = [-1 -1 -1 -1 1 -1 -1 -1 1 1 1 1 1 -1 -1 -1]';

t2 = 60;

p3 = [1 1 1 1 1 -1 1 1 1 -1 1 1 -1 -1 -1 -1]';

t3 = 0;

p4 = [-1 -1 -1 -1 1 1 1 1 1 -1 1 1 1 -1 1 1]';

t4 = 0;

p5 = [1 1 1 1 1 1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1]';

t5 = -60;

p6 = [-1 -1 -1 -1 1 1 1 1 1 1 -1 -1 1 -1 -1 -1]';

t6 = -60;

p = [p1 p2 p3 p4 p5 p6];

t = [t1 t2 t3 t4 t5 t6];

alpha = 0.03;

W0 =zeros(1,16);

b0 = 0;

[W,b] = adaline_epoca(W0,b0,alpha,p,t,1);

W = 16×7

         0    3.6000    0.2160   -1.0670    0.9512   -4.0244    2.3037
         0   -3.6000   -6.9840   -8.2670   -6.2488  -11.2244   -4.8963
         0   -3.6000   -6.9840   -8.2670   -6.2488  -11.2244   -4.8963
         0   -3.6000   -6.9840   -8.2670   -6.2488  -11.2244   -4.8963
         0    3.6000    6.9840    5.7010    3.6827   -1.2928   -7.6209
         0    3.6000    0.2160    1.4990   -0.5192   -5.4947  -11.8228
         0    3.6000    0.2160   -1.0670   -3.0853    1.8903   -4.4378
         0    3.6000    0.2160   -1.0670   -3.0853    1.8903   -4.4378
         0    3.6000    6.9840    5.7010    3.6827   -1.2928   -7.6209
         0   -3.6000   -0.2160    1.0670    3.0853    8.0608    1.7327

b = 1×7

         0    3.6000    6.9840    5.7010    3.6827   -1.2928   -7.6209

e = 0;

for i = 1:6

a=W(:,end)'*p(:,i)+b(end);

e = e + (t(i)-a)^2;

end

e = 1.7426e+04

sum((t-(W(:,end)'*p+b(end))).^2)

ans = 1.7426e+04

epocas = 100;

W0 = W(:,end)';

b0 = b(end)';

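e_epoca1 = e; % squared error after the first epoch, computed above

e = zeros(1,epocas+1);

e(1) = sum(t.^2); % initial squared error with W = 0 and b = 0 (4*60^2)

e(2) = e_epoca1;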

for i = 3:epocas+1

[W,b] = adaline_epoca(W0,b0,alpha,p,t,1);

W0 = W(:,end)';

b0 = b(end)';

e(i) = sum((t-(W0*p+b0)).^2);

end

Wf = W(:,end)'

Wf = 1×16

   23.1250  -25.0000  -25.0000  -25.0000   -5.6250  -21.2500   -7.5000   -7.5000   -5.6250  -13.7500   11.8750   11.8750  -23.1250    5.6250   -3.7500   -3.7500

bf = b(end)'

bf = -5.6250

e(end)

ans = 6.3109e-28

figure

plot(e)

xlabel("epochs")

App

Adaptive Noise Cancellation

nnd10nc

EEG Noise Cancellation

nnd10eeg

Linear Classification

nnd10lc

References

The material is taken from the book by Martin Hagan et al., Neural Network Design.

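function [W,b] = adaline_epoca(W0,b0,alpha,p,t,ba)
% ADALINE_EPOCA  One epoch of LMS (Widrow-Hoff) training.
%   W0: initial weights (S-by-R), b0: initial bias (S-by-1), alpha: learning rate,
%   p: inputs, one pattern per column, t: targets, one column per pattern.
%   ba = 1 enables the bias update (default 0: the bias stays fixed).
%   W and b store the history: each update appends Wn' and bn' as new columns.
if nargin < 6
    ba = 0;
end
col = size(p,2);                        % number of patterns
W = W0';
b = b0';
for i = 1:col
    a = W0*p(:,i) + b0;                 % linear output
    e = t(:,i) - a;                     % error for pattern i
    Wn = W0 + 2*alpha*e*p(:,i)';        % LMS weight update

    if ba
        bn = b0 + 2*alpha*e;            % LMS bias update
        b = [b bn'];
        b0 = bn;
    end

    W = [W Wn'];
    W0 = Wn;
end
end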