Using Persistent Communication in Fortran MPI

Dear community,
I attended the summer boot camp in 2015, where we discussed OpenMP, MPI and OpenACC.
In the boot camp we solved the well-known Laplace equation problem. However, I am now working on a bigger and more complex problem, a real CFD code.
In this code I need to update the ghost cells at every time step, so the communication overhead is quite high. To reduce it, I decided to use persistent communication.
The code runs, but the solution is not right: after the exchange, all my ghost cells are populated with zeroes. I have tried different things, but nothing has worked.
I would like to know if someone has used persistent communication with the Laplace equation. If so, I would greatly appreciate it if you could share the code with me, so that I can compare my syntax with a working version. Below I show a small part of my code.

The code calls MPI_Subroutine, where I set up the communication.

!Starting up MPI
call MPI_INIT(ierr)

!Compute the size of local block (1D Decomposition)
Jmax = JmaxGlobal
Imax = ImaxGlobal/npes
if ( - npes*Imax)) then
Imax = Imax + 1
end if
if ( then
Imax = Imax + 2
Imax = Imax + 1

! Computing neighbors
if (MyRank.eq.0) then
Left = MyRank - 1
end if

if (MyRank.eq.(npes -1)) then
Right = MyRank + 1
end if

! Initializing the Arrays in each processor, according to the number of local nodes
Call InitializeArrays

!Creating the channel of communication for this computation,
!Sending and receiving the u_old (Ghost cells)
Call MPI_SEND_INIT(u_old(2,: ),Jmax,MPI_DOUBLE_PRECISION,Left,tag,MPI_COMM_WORLD,req(1),ierr)
Call MPI_RECV_INIT(u_old(Imax,: ),jmax,MPI_DOUBLE_PRECISION,Right,tag,MPI_COMM_WORLD,req(2),ierr)
Call MPI_SEND_INIT(u_old(Imax-1,: ),Jmax,MPI_DOUBLE_PRECISION,Right,tag,MPI_COMM_WORLD,req(3),ierr)
Call MPI_RECV_INIT(u_old(1,: ),jmax,MPI_DOUBLE_PRECISION,Left,tag,MPI_COMM_WORLD,req(4),ierr)

End Subroutine MPI_Subroutine


In the main code, inside the time-step do loop, I call MPI_STARTALL and MPI_WAITALL at each time step.

Call MPI_STARTALL(4,req,ierr)
Call MPI_WAITALL(4,req,status,ierr)

req is an array of dimension (4), and status has the same dimension.
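
For reference, here is a minimal standalone sketch of the pattern I am trying to reproduce. It is not my CFD code: the program name, the array u, the sizes nj, ni and nsteps, and the two tags are made up for illustration. It also assumes a decomposition along the second index, so that each ghost line is a whole column u(:,k), which is contiguous in Fortran's column-major storage. My real code sends rows such as u_old(2,:), which are strided sections, so this sketch may not behave the same way as my code.

```fortran
program persistent_halo
  use mpi
  implicit none
  ! Hypothetical sizes, for illustration only
  integer, parameter :: nj = 4       ! points per column
  integer, parameter :: ni = 6       ! local columns, including 2 ghost columns
  integer, parameter :: nsteps = 3   ! number of time steps
  integer, parameter :: tag_to_left = 1, tag_to_right = 2
  double precision :: u(nj, ni)
  integer :: req(4), stats(MPI_STATUS_SIZE, 4)
  integer :: ierr, rank, npes, left, right, step, i

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, npes, ierr)

  ! Neighbors in a 1D decomposition; the end ranks talk to MPI_PROC_NULL,
  ! which turns the corresponding send/receive into a no-op
  left  = rank - 1
  right = rank + 1
  if (rank == 0)        left  = MPI_PROC_NULL
  if (rank == npes - 1) right = MPI_PROC_NULL

  u = -1.0d0

  ! Set up the four persistent requests once, outside the time loop.
  ! Every buffer is a full column, i.e. nj contiguous values.
  call MPI_SEND_INIT(u(:, 2),    nj, MPI_DOUBLE_PRECISION, left,  tag_to_left, &
                     MPI_COMM_WORLD, req(1), ierr)
  call MPI_RECV_INIT(u(:, ni),   nj, MPI_DOUBLE_PRECISION, right, tag_to_left, &
                     MPI_COMM_WORLD, req(2), ierr)
  call MPI_SEND_INIT(u(:, ni-1), nj, MPI_DOUBLE_PRECISION, right, tag_to_right, &
                     MPI_COMM_WORLD, req(3), ierr)
  call MPI_RECV_INIT(u(:, 1),    nj, MPI_DOUBLE_PRECISION, left,  tag_to_right, &
                     MPI_COMM_WORLD, req(4), ierr)

  do step = 1, nsteps
     ! Stand-in for the real update: mark interior cells with rank and step
     u(:, 2:ni-1) = dble(100*rank + step)

     ! Reuse the same requests at every time step
     call MPI_STARTALL(4, req, ierr)
     call MPI_WAITALL(4, req, stats, ierr)

     ! Each ghost column should now hold the neighbor's interior value
     if (left /= MPI_PROC_NULL) then
        if (any(u(:, 1) /= dble(100*left + step))) then
           print *, 'rank', rank, ': wrong left ghost column at step', step
           call MPI_ABORT(MPI_COMM_WORLD, 1, ierr)
        end if
     end if
     if (right /= MPI_PROC_NULL) then
        if (any(u(:, ni) /= dble(100*right + step))) then
           print *, 'rank', rank, ': wrong right ghost column at step', step
           call MPI_ABORT(MPI_COMM_WORLD, 1, ierr)
        end if
     end if
  end do

  ! Persistent requests stay allocated after MPI_WAITALL; free them explicitly
  do i = 1, 4
     call MPI_REQUEST_FREE(req(i), ierr)
  end do

  if (rank == 0) print *, 'ghost columns verified for', nsteps, 'steps'
  call MPI_FINALIZE(ierr)
end program persistent_halo
```

With a typical MPI installation this would be built and run with something like mpif90 persistent_halo.f90 followed by mpirun -np 4 ./a.out, although the wrapper and launcher names depend on the installation.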

I am using Fortran 90. Any suggestions or comments are welcome.
Thanks in advance.