Tree of Wisdom: 2009

Date
Note
2009.04.06
HGS
If I use Kraken to test the scalability of HGS for saturated flow with critical function assigned at assembly_element_3d, the machine regardless of using or not using the critical function. Such as,
!$omp critical(A_E_3d)
Call assembly_element_3d
!$omp end critical(A_E_3d)
I t is needed to be checked if the function works for linux system.
è the reason is that the computing is actually working but the results are totally different from the case of assigning the critical region. Thus, the critical region is necessary for this calculation.

Linux tip: top è this is for checking the status of threads on Kraken.

2009.04.07
HGS
Modified the private and shared variables, which were initially assigned default variables (Default (shared). I tested the modified version (hydrosphere-0401-ORIGINAL-OMP.f90) with OMP and serial versions, but those results are not the same each other.
The followings are modified one:

!$OMP PARALLEL &
!$OMP DEFAULT(none)
!$OMP SHARED(in,gb,r,fluxmbal,hold,n_flow_pos,rhof,NTHREADS)&
!$OMP SHARED(nrem,nchunk,neloop,TETRA,nln,NLNLOC,intet)&
!$OMP SHARED(HNEW,elev,DO_MASS_BALANCE,IADPIV)&
!$OMP SHARED(STEADY_STATE,link_pm2d,IC)&
!$OMP SHARED(FLUID_PRESSURE,PERMEABILITY_INPUT,DENSITY,flexch_flux)&
!$OMP SHARED(FIRST_TIMESTEP,FVOL_NODE_IS)&
!$OMP SHARED(PM_FLUID_SLICE,LINK_D2PM,DUAL_FLUID_SLICE) &
!$OMP SHARED(DO_N_FLOW_CHECK,INACTIVE) &

!$OMP PRIVATE(l,j,i,elem_orig,volf,jj,iband,nn,totmacc_pm) &
!$OMP PRIVATE(totmacc_dual,fmass_init,ast,term,sum) &
!$OMP PRIVATE(inloc,unknown,nodemd2,total_head,amat,rhs,astor) &
!$OMP PRIVATE(masspm,amatdual,jacob_exch,rhsdual,rho,exchmat,TID) &
!$OMP PRIVATE(incr,offset,i_start,i_end,nlnj,JLOOP) &
!$OMP PRIVATE(PROCESSING_3D,SATURATED) &
!$OMP PRIVATE(IPROP_DUAL,DUAL_CONTINUUM,VOL_FRACD,FLUX_VOLUME_CALC)
This one doesn’t work. It is needed to be corrected.

-----------------------------------------------------------------------
forrtl: severe (408): fort: (2): Subscript #1 of the array LINK_PM2D has value 1
which is greater than the upper bound of -1
-----------------------------------------------------------------------
forrtl: severe (408): fort: (3): Subscript #1 of the array UNKNOWN has value 0 w
hich is less than the lower bound of 1
-----------------------------------------------------------------------

If I assigned private variable to nlnloc, the code doesn’t work because of the following reason:

TID= 0 i_start= 1 i_end= 1030301 nlnloc= 210758952
Fortran Pause - Enter command or to continue.
TID= 1 i_start= 1030302 i_end= 2060602 nlnloc= 2009198838

I got unkown related error continuously, but I don’t know why.

I got it!! This is the reason that I assigned private variable jloop, but this was a big mistake because this jloop is already assigned before the parallel loop so the jloop should be the shared variable!!!
-----------------------------------------------------------------------

do i=1,nlnloc
do j=1,nlnj
jj=jloop(i,j)
amat(i,jj) = amat(i,jj)*volf
end do
end do
-----------------------------------------------------------------------

Usually, the left hand side variable is assigned shared variable and left one is private variable.

Problem!

integer, allocatable :: fvol_node_is(:)

the program doesn’t work because of this variable.

For Kraken, if there is a segment error, change the following code.

---------------hydroshpere.f90 -----------------

subroutine set_matrix_flow

line 928

if(nef .gt. 0) then
call fluid_evap_bc
else
! eflux(: ) =0.0d0 è this one should be commented!
end if

--------------------------------------------------

Error message:
This name has already been assigned a data type. [GO_WRITE]

Reason for this error:
integer(di) :: iflag,nel3d,nel2d,inloc(maxnln),go_write
integer(di) :: unknown(maxnln),ialloc,i,j,iel,jj,ielem3d,ielem2d
integer(di) :: n1,ncount,nnf,nodemd2(maxnln),node,go_write

================================================================
The lastest modified version for shared and private variables.

!$OMP PARALLEL &
!$OMP DEFAULT(none) &
!$OMP SHARED(in,gb,r,fluxmbal,hold,n_flow_pos,rhof) &
!$OMP SHARED(neloop,TETRA,nln,intet,nn) &
!$OMP SHARED(HNEW,elev,IADPIV,STEADY_STATE) &
!$OMP SHARED(IC,FLUID_PRESSURE,PERMEABILITY_INPUT) &
!$OMP SHARED(DENSITY,flexch_flux,FIRST_TIMESTEP,DO_MASS_BALANCE) &
!$OMP SHARED(PM_FLUID_SLICE,LINK_D2PM,DUAL_FLUID_SLICE) &
!$OMP SHARED(DO_N_FLOW_CHECK,INACTIVE,PROCESSING_3D,nlnj,nlnloc) &
!$OMP SHARED(jloop,link_pm2d,FVOL_NODE_IS,dual_continuum) &
!$OMP SHARED(FLUX_VOLUME_CALC,SATURATED,IPROP_DUAL,VOL_FRACD) &
!$OMP PRIVATE(l,j,i,elem_orig,volf,jj,iband,totmacc_pm) &
!$OMP PRIVATE(totmacc_dual,fmass_init,ast,term,sum) &
!$OMP PRIVATE(inloc,unknown,nodemd2,total_head,amat,rhs,astor) &
!$OMP PRIVATE(masspm,amatdual,jacob_exch,rhsdual,rho,exchmat,TID)&
!$OMP PRIVATE(nthreads,nrem,nchunk,incr,offset,i_start,i_end)

==================================================================
I wanted to simulate 100x100x100 domain, but I got error in grok eco file following as:

------------------------------------------------------------------------
GRID CHECK
Grid OK
error: allocating (run-time error number: 179 )
First type arrays
------------------------------------------------------------------------
This one is caused by the over taking the memory size in array_sizes.default file. So I changed the number of flux nodes, which is initially assigned:
-----------------------------------------------------------------------
flow bc: flux nodes
1000000000 è 1000
-------------------------------------------------------------------------
Actually this one is wrong, I mean what I should modify is that the number of head nodes because I assigned the first type boundary conditions to the both sides of the domain so the number of BC is about 21,000.
-----------------------------------------------------------------------
flow bc: head nodes
10000100 è 21000
-----------------------------------------------------------------------
And the eco file shows that the number of BC node specified on the domain.
Total specified number is 20808 so 21000 is enough to store the value in the memory.
My fault was that I assigned 100000000 to flux nodes without consideration, but it caused stalemate.
--------------------------------------------------------------------------------------------
INSTRUCTION: choose nodes block
Find nodes which are in the block defined by:
X from 0.000000000000000E+000 to 0.000000000000000E+000
Y from 0.000000000000000E+000 to 100.000000000000
Z from 0.000000000000000E+000 to 100.000000000000
Nodes affected: 10404

INSTRUCTION: specified head
First-type head for chosen nodes for porous medium
Time on Head
0.000000000000000E+000 120.0000
Nodes checked: 10404
New specified head nodes created 10404

INSTRUCTION: clear chosen nodes

INSTRUCTION: choose nodes block
Find nodes which are in the block defined by:
X from 100.000000000000 to 100.000000000000
Y from 0.000000000000000E+000 to 100.000000000000
Z from 0.000000000000000E+000 to 100.000000000000
Nodes affected: 10404

INSTRUCTION: specified head
First-type head for chosen nodes for porous medium
Time on Head
0.000000000000000E+000 110.0000
Nodes checked: 10404
New specified head nodes created 10404
--------------------------------------------------------------------------------------------
2009.04.08
HGS
Do simply assign the private variables such as SATURATED, which is used as logical variable.

The minor problem, which the results of Max. change are different each other.
I comprared two conditions such as serial and parallel with the mesh size from
1. 20x20x20 (the results are same)
2. 50x50x50 (the results are same)
3. 80x80x80 (the results are slightly different)

The error is getting bigger as the number of node increases.

To reduce the calculation time I commented the subroutine output_flux and save_nprop, which are related to saving heads, saving velocities, and saving fluxes. The calculation time decreases to 50% of the former version.

To get the output file for the global matrix for a certain time step I used the following conditions :

if(.not. flux_volume_calc) then
if(do_mass_balance)then
if(sim_time == 100.0) then

do i=1,ia(nn+1)-1
if(i .le. nn) then
write(4,'(a,i8,a,g12.5,a,i8,a,g12.5)')'
r(',i,')=',r(i),' gb(',i,')=',gb(i)
else
write(4,'(a,i8,a,g12.5)')' r(',i,')=',r(i)
endif
enddo
endif
endif
endif

this output is performed in the massbalance stage after calculating the global matrix.

I found something fish in gb() matrix by comparing the results of parallel with serial one.
Specifially, I couldn’t find any differences between gb() for parallel and serial results with 13 number of decimal, but there are some differences between those number calculated from the both methods.
To get the data I make the following code

!if(.not. flux_volume_calc) then
! if(.not. do_mass_balance)then
! if(sim_time .eq. 100.0) then
!
! write(5,'(i10)')nn
! write(5,'(i10)')ia(nn+1)-1
!
! do i=1,ia(nn+1)-1
! if(i .le. nn) then
! write(5,'(a,i8,a,g30.23,a,i8,a,g30.23)')' r( ',i,' )= ',r(i),' gb( ',i,' )= ',gb(i)
! else
! write(5,'(a,i8,a,g30.23)')' r( ',i,' )= ',r(i)
! endif
! enddo
!endif
!endif
!endif

And the compare file is in
D:\01-Hwang's Project\06-CODES\03-Waterloo\10-File comparison

if(unknown(i) == 9) then
if(sim_time == 100.0) then
write(6,'(a)')' === Diagonal === '
write(6,'(a,i8)')' unknown(i) = ', unknown(i)
write(6,'(a,i5,a,g30.23,a,i3,a,g30.23)')' gb(',unknown(i),')=', gb(unknown(i)),' rhs(',i,')= ',rhs(i)
write(6,'(a,i5,a,g30.23,a,i3,a,g30.23,a,i5,a,i10)')' r(',iadpiv(unknown(i)),')=',r(iadpiv(unknown(i))),' astor(',i,') = ',astor(i), ' iadpiv(',unknown(i),')=',iadpiv(unknown(i))
endif
endif

gb(unknown(i))= gb(unknown(i)) + rhs(i)
r(iadpiv(unknown(i))) = r(iadpiv(unknown(i))) + astor(i)

if(unknown(i) == 9) then
if(sim_time == 100.0) then
write(6,'(a)')' '
write(6,'(a)')' '
write(6,'(a,i5,a,g30.23)')' gb(',unknown(i),')=', gb(unknown(i))
write(6,'(a,i5,a,g30.23)')' r(',iadpiv(unknown(i)),')=',r(iadpiv(unknown(i)))
write(6,'(a)')' '
endif
endif

This continues to 2009.04.09
2009.04.09
HGS
This morning I check a certain entry based on the compared results.
As expected there is a difference between P and S.

rhs( i= 2)= 0.84372571257313603755765E-07 (for serial)
rhs( i= 2)= 0.84372571257313590520875E-07 (for parall)

the rhs is from call assembly_element_3d(…rhs…)

Don’t forget the numbering:

do l=i_start,i_end

elem_orig = l
do i=1,nlnloc
inloc(i) = in(i,l)
end do

do i=1,nlnloc
unknown(i)=inloc(i)
end do
end do

And I also put the conditional writing at assembly_element_3d subroutine.
if(unknown(i) == 9) then
if(sim_time == 100.0) then
write(6,'(a)')' === assembly_element_3d=== '
write(6,'(a,i8)')' unknown(i) = ', unknown(i)
write(6,'(a,i5,a,g30.23,a,g30.23,a,g30.23,a,i3,a,g30.23)') ' astor(',i,')=', astor(i),' sspm*ast*swval_t(i)=',sspm*ast*swval_t(i),' sspm=',sspm,' swval_t(',i,')= ',swval_t(i)
write(6,'(a,i5,a,g30.23,a,g30.23,a,g30.23,a,i3,a,g30.23)') ' rhs(',i,')=',rhs(i),' sspm*ast*swval_t(i)*head_old(i)=',sspm*ast*swval_t(i)*head_old(i),' ast=',ast,' head_old(',i,')=',head_old(i)
write(6,'(a)')' '

endif
endif

astor(i)= sspm*ast*swval_t(i)
rhs(i) = rhs(i) + sspm*ast*swval_t(i)*head_old(i)

if(unknown(i) == 9) then
if(sim_time == 100.0) then
write(6,'(a)')' '
write(6,'(a,i5,a,g30.23)')' astor(',i,')=', astor(i)
write(6,'(a,i5,a,g30.23)')' rhs(',i,')=',rhs(i)
write(6,'(a)')' '
endif
endif

I found the differences are

For P,
=== assembly_element_3d===
unknown(i) = 9
astor( 2)= 0.16277669433610031510031E-07
sspm*ast*swval_t(i)= 0.16277669433610031510031E-07
sspm= 0.10000000000000000818031E-04
swval_t( 2)= 1.0000000000000000000000
rhs( 2)= 0.0000000000000000000000
sspm*ast*swval_t(i)*head_old(i)= 0.84372571257313590520875E-07
ast= 0.16277669433610029211396E-02
head_old( 2)= 5.1833323929715406919172

astor( 2)= 0.16277669433610031510031E-07
rhs( 2)= 0.84372571257313590520875E-07

for S,
=== assembly_element_3d===
unknown(i) = 9
astor( 2)= 0.16277669433610031510031E-07
sspm*ast*swval_t(i)= 0.16277669433610031510031E-07
sspm= 0.10000000000000000818031E-04
swval_t( 2)= 1.0000000000000000000000
rhs( 2)= 0.0000000000000000000000
sspm*ast*swval_t(i)*head_old(i)= 0.84372571257313603755765E-07
ast= 0.16277669433610029211396E-02
head_old( 2)= 5.1833323929715415800956

astor( 2)= 0.16277669433610031510031E-07
rhs( 2)= 0.84372571257313603755765E-07

because of head_old( ) the rhs value is different.

YJ suggests that I decrease the solver criteria to 10e-15.

In grok file I put the following commander to assign the solver criteria

Flow solver convergence criteria
1e-15

However, the results are also different each other. So frustrating!!

2009.04.11
HGS
I could n’t figure out the cause of the differences but I checked again and expand the number of decimal from 30 to 50. I hope I can find out something causing the error. (Actually I’m not sure if the difference causes the error )

do i=1,nlnloc
do j=1,nlnj
jj=jloop(i,j)

call find(unknown(i),unknown(jj),iband)

if(iband == 5388449) write(8,'(a,g20.13,a,i5)')' SIM_TIME = ',sim_time , ' TID = ',TID
if(iband == 5388449) write(8,'(a,i10,a,i3,a,i10,a,i3,a,i10,a,i10,a,i3,a,i3,a,g50.43)') ' element = ',l,' unknown(',i,')=',unknown(i),' unknown(',jj,')=',unknown(jj),' iband=',iband,' amat(',i,',',jj,')= ',amat(i,jj)

if(iband.ne.0) then

if(iband == 5388449) write(8,'(a,i10,a,g50.43,a,i5,a,i5,a,g50.43)')'OffD r(',iband,')=', r(iband),' amat(',i,',',jj,')= ',amat(i,jj)

r(iband) = r(iband) + amat(i,jj)

if(iband == 5388449) write(8,'(a,i10,a,g50.43)')'OffD r(',iband,')=', r(iband)

if(iadpiv(unknown(i)) == 5388449) write(8,'(a,i10,a,g50.43,a,i10,a,g50.43,a,i10,a,i10)')'Diag r(',iadpiv(unknown(i)),')=',r(iadpiv(unknown(i))),' astor(',i,') = ',astor(i), ' iadpiv(',unknown(i),')=',iadpiv(unknown(i))

gb(unknown(i))= gb(unknown(i)) + rhs(i)
r(iadpiv(unknown(i))) = r(iadpiv(unknown(i))) + astor(i)

if(iadpiv(unknown(i)) == 5388449) write(8,'(a,i10,a,g50.43)')'Diag r(',iadpiv(unknown(i)),')=',r(iadpiv(unknown(i)))

== RESULTS ===========================================================

Serial code ::

Diag r( 5388449)= 0.0000000000000000000000000000000000000000000E-02
Off Dg amat( 5, 5)= + 0.3125000050446639660722825126981661015000000E-02
Diag astor( 5)= + 0.0162760416666666684736702874758407233500000E-02
_______________________________________________________________________________
Diag r( 5388449)= 0.3287760467113306182829202128914403147000000E-02
Off Dg amat( 4, 4)= + 0.3125000050446639227041956132779887412000000E-02
Diag astor( 4)= + 0.0162760416666666684736702874758407233500000E-02
_______________________________________________________________________________
Diag r( 5388449)= 0.6575520934226613233020142246232353500000000E-02
Off Dg amat( 4, 4)= + 0.5208333434226612018713709062467387412000000E-02
Diag astor( 4)= + 0.0162760416666666684736702874758407233500000E-02
_______________________________________________________________________________
Diag r( 5388449)= 1.1946614785119892207521097304834256650000000E-02
Off Dg amat( 2, 2)= + 0.3125000050446639227041956132779887412000000E-02
Diag astor( 2)= + 0.0162760416666666684736702874758407233500000E-02
_______________________________________________________________________________
Diag r( 5388449)= 1.5234375252233197522988561445345112590000000E-02
Off Dg amat( 1, 1)= + 0.3125000050446639227041956132779887412000000E-02
Diag astor( 1)= + 0.0162760416666666684736702874758407233500000E-02
_______________________________________________________________________________
Diag r( 5388449)= 1.8522135719346504573179501562663062940000000E-02
Off Dg amat( 1, 1)= + 0.5208333434226612018713709062467387412000000E-02
Diag astor( 1)= + 0.0162760416666666684736702874758407233500000E-02
_______________________________________________________________________________
Diag r( 5388449)= 2.3893229570239784415042194609668513290000000E-02

The summation error starts at 1.e-17

For parallel code ::

Diag r( 5388449)= 0.0000000000000000000000000000000000000000000E-02
Off Dg amat( 2, 2)= + 0.3125000050446639227041956132779887412000000E-02
Diag astor( 5)= + 0.0162760416666666684736702874758407233500000E-02
_______________________________________________________________________________
Diag r( 5388449)= 0.3287760467113305749148333134712629544000000E-02
Off Dg amat( 1, 1)= + 0.3125000050446639227041956132779887412000000E-02
Diag astor( 1)= + 0.0162760416666666684736702874758407233500000E-02
_______________________________________________________________________________
Diag r( 5388449)= 0.6575520934226611498296666269425259088000000E-02
Off Dg amat( 1, 1)= + 0.5208333434226612018713709062467387412000000E-02
Diag astor( 1)= + 0.0162760416666666684736702874758407233500000E-02
_______________________________________________________________________________
Diag r( 5388449)= 1.1946614785119888738074145351220067820000000E-02
Off Dg amat( 5, 5)= + 0.3125000050446639660722825126981661015000000E-02
Diag astor( 5)= + 0.0162760416666666684736702874758407233500000E-02
_______________________________________________________________________________
Diag r( 5388449)= 1.5234375252233194053541609491730923760000000E-02
Off Dg amat( 4, 4)= + 0.3125000050446639660722825126981661015000000E-02
Diag astor( 5)= + 0.0162760416666666684736702874758407233500000E-02
_______________________________________________________________________________
Diag r( 5388449)= 1.8522135719346501103732549609048874120000000E-02
Off Dg amat( 4, 4)= + 0.5208333434226612018713709062467387412000000E-02
Diag astor( 5)= + 0.0162760416666666684736702874758407233500000E-02
_______________________________________________________________________________
Diag r( 5388449)= 2.3893229570239780945595242656054324470000000E-02

The summation error starts at 1.e-16
Thus, the difference between the sumation from serial and parallel is

Diag r( 5388449)= 2.3893229570239784415042194609668513290000000E-02
Diag r( 5388449)= 2.3893229570239780945595242656054324470000000E-02

The summation error starts at 1.e-16
== RESULTS ===========================================================

The error problem solved.

The following one is the final version for saturated flow

!$OMP PARALLEL &
!$OMP DEFAULT(none) &
!$OMP FIRSTPRIVATE(nln,DENSITY,jloop,nlnloc,nlnj)&
!$OMP SHARED(gb,r,fluxmbal,hold,n_flow_pos,rhof,in) &
!$OMP SHARED(neloop,TETRA,intet,nn,DO_N_FLOW_CHECK) &
!$OMP SHARED(HNEW,elev,IADPIV,STEADY_STATE) &
!$OMP SHARED(PM_FLUID_SLICE,FLUID_PRESSURE,PERMEABILITY_INPUT) &
!$OMP SHARED(INACTIVE,IC,flexch_flux,FIRST_TIMESTEP,DO_MASS_BALANCE,DUAL_FLUID_SLICE) &
!$OMP SHARED(PROCESSING_3D,LINK_D2PM) &
!$OMP SHARED(link_pm2d,FVOL_NODE_IS,dual_continuum) &
!$OMP SHARED(FLUX_VOLUME_CALC,SATURATED,IPROP_DUAL,VOL_FRACD,sim_time) &
!$OMP PRIVATE(inloc,l,j,i,elem_orig,volf,jj,iband,totmacc_pm) &
!$OMP PRIVATE(totmacc_dual,fmass_init,ast,term,sum) &
!$OMP PRIVATE(unknown,nodemd2,total_head,amat,rhs,astor) &
!$OMP PRIVATE(masspm,amatdual,jacob_exch,rhsdual,rho,exchmat,TID) &
!$OMP PRIVATE(nthreads,nrem,nchunk,incr,offset,i_start,i_end,k,m,mm)

==importance of Parallel programming ==
Parallel programming has become important as a way to use multiple cores or CPUs, or cluster computing, effectively. Many people have lost sight of considerations which should come ahead of that, such as vectorization, when that is applicable. Once you decide to get involved in parallel program development, and want to choose a programming model to start out with, you get into an area where the fashions have swung back and forth in just the last 5 years. Some of the introductory course material is intended to give an overview of the options, so it already gets beyond the scope of this forum, unless you give more idea to help people make specific suggestions.OpenMP has regained momentum as a basis for shared memory parallelism; it's the basis for the Intel C, C++, and Fortran compiler auto-parallel options, Parallel Studio, and MKL threading. MPI also gained momentum; it's been running well ahead of general server market growth. For those who stay within C++, TBB has proven to be useful. You'll find a lot of both promotional and solid reference material on all of those in these forum areas.

== comparison between critical and without critical ==
1. Without critical region
SIM_TIME = 100.0000000000 TID = 1
element = 512001 unknown( 2)= 262442 unknown( 2)= 262442 iband= 5388449
amat( 2, 2)= 0.3125000050446639227041956132779887412000000E-02
astor( 2) = 0.1627766943361003151003123625480822900000000E-07 iadpiv( 262442)= 5388449

amat( 1, 1)= 0.5208333434226612018713709062467387412000000E-02
astor( 1) = 0.1627766943361003151003123625480822900000000E-07 iadpiv( 262442)= 5388449

amat( 5, 5)= 0.3125000050446639660722825126981661015000000E-02
astor( 5) = 0.1627766943361003151003123625480822900000000E-07 iadpiv( 262442)= 5388449

amat( 4, 4)= 0.3125000050446639660722825126981661015000000E-02
astor( 4) = 0.1627766943361003151003123625480822900000000E-07 iadpiv( 262442)= 5388449

2. With critical region
SIM_TIME = 100.0000000000 TID = 1
element = 512001 unknown( 2)= 262442 unknown( 2)= 262442 iband= 5388449
amat( 2, 2)= 0.3125000050446639227041956132779887412000000E-02
astor( 2) = 0.1627766943361003151003123625480822900000000E-07 iadpiv( 262442)= 5388449

amat( 1, 1)= 0.3125000050446639227041956132779887412000000E-02
astor( 1) = 0.1627766943361003151003123625480822900000000E-07 iadpiv( 262442)= 5388449

amat( 5, 5)= 0.3125000050446639660722825126981661015000000E-02
astor( 5) = 0.1627766943361003151003123625480822900000000E-07 iadpiv( 262442)= 5388449

amat( 4, 4)= 0.3125000050446639660722825126981661015000000E-02
astor( 4) = 0.1627766943361003151003123625480822900000000E-07 iadpiv( 262442)= 5388449

2009.04.13
HGS
This morning, I try coming up the parts causing difference for the results if I assigned critical section in assembly_element_3d.

The critical region should be assigned from
! ...Element dimensions to ! ...Form elemental matrix.

Studied about a lock function.

retested linux version. For 2 cpus, its computing time is 50% of that of 1 cpu, but, for 16 cpus, the computing time is 4 times more than the time taken by 2 cpus.
I try coming up the cause…

I also tested The threadprivate example.

By Assigning the following code in the front of the code, pglobal can be used as global array.
#include
#include
#endif
int *pglobal;
#pragma omp threadprivate(pglobal)

int main()
{
#pragma omp parallel \
shared(n,length) \
private(i,j,sum) \
private(TID,i_start,i_end,nthreads,nrem,nchunk,incr,offset)
{ }
}

I also compared its computing speed with general private. Theadprivate is relatively slower than the general private code.

#include
#include
#endif
int main()
{
int *pglobal;

#pragma omp parallel \
shared(n,length) \
private(i,j,sum) \
private(pglobal,TID,i_start,i_end,nthreads,nrem,nchunk,incr,offset)
{ }
}

Tree of Wisdom

Monday, April 13, 2009

Thursday, January 1, 2009

Random Fracture Generation

Followers

Blog Archive

About Me