sysmap_64bit: rmap ovflo, lost

QAIX > Oracle database development > sysmap_64bit: rmap ovflo, lost 4 May 2006 01:04:59


Guest 27 April 2006 01:21:28
 helpful gurus:

hp-ux 11.11 4 processor rpr3340 box crashed last night. Trying to
figure out how to prevent this in the future. Oracle 9.2.0.6.

The uptime was a little over 2 months. Looking at syslog, I see lots
(>17K lines) of:
Apr 25 20:47:34 ZEUS vmunix: sysmap_64bit: rmap ovflo, lost
[68419543,68419559)

They started at the exact time my Oracle RMAN backup started. The
script that does the backup does a number of things, such as remove old
backup files, run the RMAN script (nocatalog), then compress some of
the backup files onto an nfs device. The RMAN completed, system
crashed during compress.

Looking at
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=70397
(neither of the links in there work for me), I now have the idea that
something fragmented kernel memory. But what? I was about to write a
script to periodically capture the largest processes while RMAN is
running, but then I started wondering if it is not really RMAN, but
something previous to RMAN that sets up the problem. Looking again at
syslog, I see the rmaps happening on a few days in April at various
times during the day, once during production day and 7 times off-hours
(sometimes during RMAN, sometimes during compress), April 10-15, but no
other times since boot. If it were RMAN, wouldn't I see the problem
whenever RMAN ran? And why this time did it go nuts and crash the
system, but not the other times?
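
One way to check the "which days, which hours" pattern described above is to bucket the overflow messages by date and hour. The following is only a rough sketch: the line layout is assumed from the single syslog sample quoted in this post, the function name is made up for illustration, and the second sample line is invented to show a non-matching message.

```python
import re
from collections import Counter

MONTHS = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()

# Layout assumed from the one sample line in the post:
#   "Apr 25 20:47:34 ZEUS vmunix: sysmap_64bit: rmap ovflo, lost [a,b)"
RMAP_LINE = re.compile(
    r"^(?P<month>" + "|".join(MONTHS) + r")\s+(?P<day>\d{1,2})\s+"
    r"(?P<hour>\d{2}):\d{2}:\d{2}\s+\S+\s+vmunix: sysmap_64bit: rmap ovflo, lost"
)

def count_rmap_overflows(lines):
    """Count overflow messages per (month, day, hour) bucket."""
    counts = Counter()
    for line in lines:
        m = RMAP_LINE.match(line)
        if m:
            key = (m.group("month"), int(m.group("day")), int(m.group("hour")))
            counts[key] += 1
    return counts

def sorted_buckets(counts):
    """Return (bucket, count) pairs in calendar order within one year."""
    return sorted(counts.items(),
                  key=lambda kv: (MONTHS.index(kv[0][0]), kv[0][1], kv[0][2]))

sample = [
    # Taken from the post:
    "Apr 25 20:47:34 ZEUS vmunix: sysmap_64bit: rmap ovflo, lost [68419543,68419559)",
    # Invented, unrelated message to show that other lines are ignored:
    "Apr 25 20:47:40 ZEUS example: some other message",
]
for (month, day, hour), n in sorted_buckets(count_rmap_overflows(sample)):
    print(f"{month} {day:2d} {hour:02d}:00  {n} overflow message(s)")
```

Fed the real syslog file instead of `sample`, the buckets can be lined up against the cron schedule to see whether the bursts track RMAN, the compress step, or something else.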

Using the 'UNIX95= ps -e -o "vsz args" |sort' command, I see that some
third party application processes get big: 132640K is the biggest just
now, (those are killed off nightly if the users forget to log off - but
later than this backup). So I tried 'ps -efl|sort -nk10|tail -10' ,
which shows that same process as 29527 pages (and lets me see exactly
who it is). But I don't quite get what vsz and sz are telling me, I
guess I need to subtract some shared memory? man ps isn't too clear.
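
For comparing the two figures, it helps to put them in the same unit. The sketch below assumes that the sz column is in pages and that the page size is 4 KB; both are assumptions to verify on the box (for example with 'getconf PAGESIZE'), and the gap it prints says nothing by itself about shared memory.

```python
PAGE_SIZE_KB = 4  # assumption: 4 KB base page size; verify on the actual system

def pages_to_kb(pages, page_size_kb=PAGE_SIZE_KB):
    """Convert a page count to kilobytes."""
    return pages * page_size_kb

sz_pages = 29527   # sz value from 'ps -efl' in the post
vsz_kb = 132640    # vsz value from 'UNIX95= ps -e -o "vsz args"' in the post

sz_kb = pages_to_kb(sz_pages)
print(f"sz  = {sz_pages} pages = {sz_kb} KB")
print(f"vsz = {vsz_kb} KB, difference = {vsz_kb - sz_kb} KB")
```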

I don't see how to figure which process is fragmenting memory. Don't
have glance. Should I be looking for processes that get bigger and
smaller, rather than the largest? There is a transaction monitor that
appears to be doing that. Or should I watch for something continually
growing? I don't know of anything that has changed on this system
specific to this month, and don't really see how a memory leak could
come and go and come back big when users and cron do the same thing
day-to-day.

Is it really going to be necessary to reboot this thing monthly?

Any help appreciated, I'm trying to do as much as possible before the
hardware folk start interrupting production.

This is pretty typical swapinfo:

# swapinfo -am
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev        4096     633    3463   15%       0       -    1  /dev/vg00/lvol2
reserve       -    3400   -3400
memory     6320    4499    1821   71%

TIA

jg
--
@home.com is bogus.
s/home.com/cox.net/

Hpuxrac 27 April 2006 18:02:14
This looks like an issue to work on with HP support, and it is probably
not Oracle related.

I would open a support item with hp and work it from that angle first.

Guest 27 April 2006 18:25:15
 
hpuxrac wrote:
> This looks like an issue to work on with hp support and not "probably"
> necessarily oracle related.
>
> I would open a support item with hp and work it from that angle first.

Agreed. I was primarily posting this to comp.sys.hp.hpux, but you
never know who might have run into this. I had a vague memory of RMAN
and memory leakage, but it was probably some old bug from long ago.
Looks like this might be the kick they need to get glance (which I've
been recommending for 5 years). Looks like some weekend work for the
admin.

jg
--
@home.com is bogus.
http://www.flickr.com/photos/mudshark/117551768/in/set-72057594090059726/

Don Morris 27 April 2006 18:43:16
joel-garry@home.com wrote:
> helpful gurus:
>
> hp-ux 11.11 4 processor rpr3340 box crashed last night. Trying to
> figure out how to prevent this in the future. Oracle 9.2.0.6.
>
> The uptime was a little over 2 months. Looking at syslog, I see lots
> (>17K lines) of:
> Apr 25 20:47:34 ZEUS vmunix: sysmap_64bit: rmap ovflo, lost
> [68419543,68419559)

Ouch.. kernel virtual address space is so fragmented it is getting
dropped on the floor.

Next chance you get, double the nsysmap64 kernel tunable -- that's
a workaround... not a definite fix.

More information:
http://docs.hp.com/en/TKP-90202/re68.html

> They started at the exact time my Oracle RMAN backup started. The
> script that does the backup does a number of things, such as remove old
> backup files, run the RMAN script (nocatalog), then compress some of
> the backup files onto an nfs device. The RMAN completed, system
> crashed during compress.

My bet? This backup results in lots of little I/O buffers being
created which are freed in a very asynchronous manner -- with more than
a few taking a *long* time to be freed.

The easiest way for the kernel virtual address space to get fragmented
is to have lots of little dynamic allocations be made... and then lots
of little pieces to be freed back which can't form their original larger
ranges (coalesce) because pieces are missing. I/O buffers (being small
bits of kernel dynamic memory) are unsurprisingly good at this.
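
The mechanism described above can be illustrated with a toy model. This is a simplified sketch and not HP-UX kernel code: the class name, the slot count of 4, and the one-page "buffers" are all made up for illustration. The idea is a free-range table with a fixed number of slots, where a freed range that cannot merge with a neighbour needs a slot of its own and is dropped ("lost") when none is left.

```python
import bisect

class ResourceMap:
    """Toy free-range map with a fixed number of slots (hypothetical model)."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.free = []   # sorted, non-overlapping half-open ranges (start, end)
        self.lost = []   # ranges dropped because no slot was available

    def release(self, start, end):
        """Return [start, end) to the map; False means the range was lost."""
        i = bisect.bisect_left(self.free, (start, end))
        # Coalesce with the previous range if it ends exactly where we start.
        if i > 0 and self.free[i - 1][1] == start:
            start = self.free[i - 1][0]
            del self.free[i - 1]
            i -= 1
        # Coalesce with the next range if it starts exactly where we end.
        if i < len(self.free) and self.free[i][0] == end:
            end = self.free[i][1]
            del self.free[i]
        if len(self.free) >= self.max_entries:
            self.lost.append((start, end))   # the "rmap ovflo, lost" case
            return False
        self.free.insert(i, (start, end))
        return True

# Sixteen one-page buffers at addresses 0..15, map with only 4 slots.
frag = ResourceMap(max_entries=4)
for page in range(0, 16, 2):          # every other buffer is freed; the
    frag.release(page, page + 1)      # rest are still held, so nothing merges

contig = ResourceMap(max_entries=4)
for page in range(16):                # neighbours are freed together and
    contig.release(page, page + 1)    # coalesce into one large range

print("fragmented:", len(frag.free), "ranges kept,", len(frag.lost), "lost")
print("contiguous:", len(contig.free), "range kept,", len(contig.lost), "lost")
```

In this model, a larger table (which is what raising nsysmap64 amounts to, if the analogy holds) only postpones the overflow when the holes never get filled back in, which fits the "workaround, not a definite fix" caveat.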

> Looking at
> http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=70397
> (neither of the links in there work for me), I now have the idea that
> something fragmented kernel memory. But what? I was about to write a
> script to periodically capture the largest processes while RMAN is
> running, but then I started wondering if it is not really RMAN, but
> something previous to RMAN that sets up the problem. Looking again at
> syslog, I see the rmaps happening on a few days in April at various
> times during the day, once during production day and 7 times off-hours
> (sometimes during RMAN, sometimes during compress), April 10-15, but no
> other times since boot. If it were RMAN, wouldn't I see the problem
> whenever RMAN ran? And why this time did it go nuts and crash the
> system, but not the other times?

You only see the problem when the data structures to hold free kernel
virtual address space overflow. In other words, you may be fragmented
every time RMAN runs -- but just not fragmented *enough*. Alternately,
it may be that you have an intermittent I/O timeout or somesuch which
causes some runs of RMAN to hold on to every Nth buffer for a long time
and cause the critical fragmentation problem -- where your other runs
manage to release the buffers together and they coalesce back up without
much fragmentation.

> Using the 'UNIX95= ps -e -o "vsz args" |sort' command, I see that some
> third party application processes get big: 132640K is the biggest just
> now, (those are killed off nightly if the users forget to log off - but
> later than this backup). So I tried 'ps -efl|sort -nk10|tail -10' ,
> which shows that same process as 29527 pages (and lets me see exactly
> who it is). But I don't quite get what vsz and sz are telling me, I
> guess I need to subtract some shared memory? man ps isn't too clear.

User virtual address space is managed completely differently. ps isn't
going to help you here at all... (although if you're running lots and
lots of processes, that can consume a lot of dynamic memory to manage
the process metadata -- and can lead to fragmentation as well. Hence why
nsysmap64 defaults to being based off of nproc for nproc > 800).

> I don't see how to figure which process is fragmenting memory. Don't
> have glance. Should I be looking for processes that get bigger and
> smaller, rather than the largest? There is a transaction monitor that
> appears to be doing that. Or should I watch for something continually
> growing? I don't know of anything that has changed on this system
> specific to this month, and don't really see how a memory leak could
> come and go and come back big when users and cron do the same thing
> day-to-day.

You really don't have the tools to find this -- HP Support does. You
should have gotten a dump when the panic occurred, I highly recommend
using your support channels to track down the root cause of the
fragmentation and see what they recommend.

> Is it really going to be necessary to reboot this thing monthly?

You can mitigate it by increasing nsysmap64 as mentioned above -- but
you still need Support to help figure out the root cause to make sure
increasing the sysmap capacity isn't just applying a bandaid.

Don

> Any help appreciated, I'm trying to do as much as possible before the
> hardware folk start interrupting production.
>
> This is pretty typical swapinfo:
>
> # swapinfo -am
>              Mb      Mb      Mb   PCT  START/      Mb
> TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
> dev        4096     633    3463   15%       0       -    1  /dev/vg00/lvol2
> reserve       -    3400   -3400
> memory     6320    4499    1821   71%
>
> jg

Hpuxrac 27 April 2006 19:18:23
I have more than vague memories of RMAN memory leaks which, along with
other bugs (eccentricities, maybe?), stopped it from being very useful
until late 8.1.7 or (better) 9i.

But your maintenance level is way above that.

Guest 4 May 2006 01:04:59
 
Don Morris wrote:
> joel-garry@home.com wrote:

Sorry I didn't notice this post earlier, google sometimes doesn't show
everything without some prodding. Thanks, it was very helpful! A
couple of interspersed comments for the future curious:

> > helpful gurus:
> >
> > hp-ux 11.11 4 processor rpr3340 box crashed last night. Trying to
> > figure out how to prevent this in the future. Oracle 9.2.0.6.
> >
> > The uptime was a little over 2 months. Looking at syslog, I see lots
> > (>17K lines) of:
> > Apr 25 20:47:34 ZEUS vmunix: sysmap_64bit: rmap ovflo, lost
> > [68419543,68419559)
>
> Ouch.. kernel virtual address space is so fragmented it is getting
> dropped on the floor.
>
> Next chance you get, double the nsysmap64 kernel tunable -- that's
> a workaround... not a definite fix.

Workaround better than blind reboots!

"pathological workloads" I love it! Now why didn't I find this when I
searched for "rmap ovfl?" Yet another mystery of the universe. Rule
of Thumb in docs... (that's a smiley for some of us Oracle folk, there
have been heated discussions about it regarding configuration and
performance tuning).

> > They started at the exact time my Oracle RMAN backup started. The
> > script that does the backup does a number of things, such as remove old
> > backup files, run the RMAN script (nocatalog), then compress some of
> > the backup files onto an nfs device. The RMAN completed, system
> > crashed during compress.
>
> My bet? This backup results in lots of little I/O buffers being
> created which are freed in a very asynchronous manner -- with more than
> a few taking a *long* time to be freed.

That sounds like a good bet. We also have transaction processors which
run continuously, I don't know what they are doing, but I'm always
suspicious of such things. Support calls to both vendors involved (app
and tp) asking about memory leaks or fragmentation have been met with
resounding silence.

> The easiest way for the kernel virtual address space to get fragmented
> is to have lots of little dynamic allocations be made... and then lots
> of little pieces to be freed back which can't form their original larger
> ranges (coalesce) because pieces are missing. I/O buffers (being small
> bits of kernel dynamic memory) are unsurprisingly good at this.

What? No garbage collection? Ah well, not for me to question the
gods.

> > Looking at
> > http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=70397
> > (neither of the links in there work for me), I now have the idea that
> > something fragmented kernel memory. But what? I was about to write a
> > script to periodically capture the largest processes while RMAN is
> > running, but then I started wondering if it is not really RMAN, but
> > something previous to RMAN that sets up the problem. Looking again at
> > syslog, I see the rmaps happening on a few days in April at various
> > times during the day, once during production day and 7 times off-hours
> > (sometimes during RMAN, sometimes during compress), April 10-15, but no
> > other times since boot. If it were RMAN, wouldn't I see the problem
> > whenever RMAN ran? And why this time did it go nuts and crash the
> > system, but not the other times?
>
> You only see the problem when the data structures to hold free kernel
> virtual address space overflow. In other words, you may be fragmented
> every time RMAN runs -- but just not fragmented *enough*. Alternately,
> it may be that you have an intermittent I/O timeout or somesuch which
> causes some runs of RMAN to hold on to every Nth buffer for a long time
> and cause the critical fragmentation problem -- where your other runs
> manage to release the buffers together and they coalesce back up without
> much fragmentation.

Yes, we've also seen intermittent I/O timeouts during times of heavy
I/O (not just rman, but also compressing files and high database redo
activity), all advice about that has been not to worry about it. I've
been trying to use this as an argument to move away from RAID-5, but
that's another discussion. What you are saying here ties lots of
little bits of evidence together.

> > Using the 'UNIX95= ps -e -o "vsz args" |sort' command, I see that some
> > third party application processes get big: 132640K is the biggest just
> > now, (those are killed off nightly if the users forget to log off - but
> > later than this backup). So I tried 'ps -efl|sort -nk10|tail -10' ,
> > which shows that same process as 29527 pages (and lets me see exactly
> > who it is). But I don't quite get what vsz and sz are telling me, I
> > guess I need to subtract some shared memory? man ps isn't too clear.
>
> User virtual address space is managed completely differently. ps isn't
> going to help you here at all... (although if you're running lots and
> lots of processes, that can consume a lot of dynamic memory to manage
> the process metadata -- and can lead to fragmentation as well. Hence why
> nsysmap64 defaults to being based off of nproc for nproc > 800).

Thanks for clarifying that. I've noticed Oracle's recommendations for
nproc vary over time.

> > I don't see how to figure which process is fragmenting memory. Don't
> > have glance. Should I be looking for processes that get bigger and
> > smaller, rather than the largest? There is a transaction monitor that
> > appears to be doing that. Or should I watch for something continually
> > growing? I don't know of anything that has changed on this system
> > specific to this month, and don't really see how a memory leak could
> > come and go and come back big when users and cron do the same thing
> > day-to-day.
>
> You really don't have the tools to find this -- HP Support does. You
> should have gotten a dump when the panic occurred, I highly recommend
> using your support channels to track down the root cause of the
> fragmentation and see what they recommend.

Noted. I think that stuff all got turned off by habit from the days of
K-class and Autoraid.

> > Is it really going to be necessary to reboot this thing monthly?
>
> You can mitigate it by increasing nsysmap64 as mentioned above -- but
> you still need Support to help figure out the root cause to make sure
> increasing the sysmap capacity isn't just applying a bandaid.

It's posts like yours that really make me appreciate usenet. I can't
thank you enough.

jg
--
@home.com is bogus.
http://www.wacky-packs.com/crazylabels.html
