How can I avoid getting notifications to my own posts?
Confirm string GetHashCode reflects string, not object
Hello Guest
  
  • Login
• Register…
• Start blog
  • Who, Where, When
• What can I do?
• What to Read?
  • Polls
• Avatars
• Interests
  • Cities and Countries
• Random blog
• Users search
  • Search
• Games
• Tests
• QAIX
  • Сообщества
• Talxy Chat
• Horoscope
• Online
 
Зарегистрируйся!

QAIX > .Net Development > Confirm string GetHashCode reflects string, not object 1 May 2002 22:19:21

  Recent blog posts: 
  They have birthday today: 
  Forums:   
  Discuss: 
  Recent forum topics: 
  Recent forum comments:
  Moderators:

Confirm string GetHashCode reflects string, not object

Erick Thompson 1 May 2002 22:19:21
 I need to check if a particular string has been seen before, so I was going
to maintain a list of hashcodes from the strings. However, I want to confirm
that the GetHashCode method returns a hashcode based on the string contents,
not on the string object. The reason I'm not completely sure is that I'm not
totally confident I know how string interning will effect things. Can I use
GetHashCode to identify a string?

Thanks,
Erick

You can read messages from the DOTNET archive, unsubscribe from DOTNET, or
subscribe to other DevelopMentor lists at http://discuss.deve­lop.com.

Add comment
Erick Thompson 26 April 2002 23:41:21 permanent link ]
 Upon further reflection, I think that using GetHashCode isn't a good idea to
check if I've seen a string before. I think I'll use a MD5 sig instead, so I
don't have to worry about collisions.

Erick

----- Original Message -----
From: "Erick Thompson" <ethompson@NBR.ORG>­
To: <DOTNET@DISCUSS.DEV­ELOP.COM>
Sent: Friday, April 26, 2002 12:44 PM
Subject: [DOTNET] Confirm string GetHashCode reflects string, not object

I need to check if a particular string has been seen before, so I was
going> to maintain a list of hashcodes from the strings. However, I want to
confirm> that the GetHashCode method returns a hashcode based on the string
contents,> not on the string object. The reason I'm not completely sure is that I'm
totally confident I know how string interning will effect things. Can I
GetHashCode to identify a string?>
Thanks,> Erick>
You can read messages from the DOTNET archive, unsubscribe from DOTNET, or> subscribe to other DevelopMentor lists at http://discuss.deve­lop.com.>

You can read messages from the DOTNET archive, unsubscribe from DOTNET, or
subscribe to other DevelopMentor lists at http://discuss.deve­lop.com.

Add comment
Ian Griffiths 28 April 2002 01:41:00 permanent link ]
 To answer your original question though, the string's hash function *has* to
be based on the string value, not the object identity. (So interning or
lack thereof will not affect you.) This is because the String class
overrides Equals. Any class that overrides Equals is *required* to override
GetHashCode too and to make the two consistent. (I.e. if Equals says two
objects are the same, they must have the same hash values.)

However, just because two strings have the same hash code doesn't mean they
are the same - GetHashCode *is* allowed to return false positives. (It's
just false negatives that aren't allowed.) In fact here is a guaranteed
valid-for-every-occ­asion implementation of GetHashCode():

public override int GetHashCode() { return 1; }

This doesn't break any rules. Of course it won't yield great performance if
you use this on objects which are acting as keys in a HashTable... And
it'll get you rather a lot of false positives for comparison. :-)­


But *any* hash has some chance of collision, unless the hash actually has at
least as many bits as the data being hashed. The larger the hash, the lower
the probability of collision will be (given a suitably good hash algorithm
of course...) but the probability never drops to zero. MD5 is believed to
be a good hash algorithm, and it has a length of 128 bytes (i.e. 1024 bits)
IIRC, which gives a pretty low chance of a collision.

But the question I would ask is are you actually wasting more space by
storing MD5 hashes than you would be by just remembering the strings in the
first place? Unless your strings are on average longer than 128 bytes (the
size of an MD5 hash) then it will actually be rather more expensive to do
what you're doing than just to store all the strings... Or is there some
reason you can't store the strings?


--
Ian Griffiths
DevelopMentor

----- Original Message -----
From: "Erick Thompson" <ethompson@NBR.ORG>­

Upon further reflection, I think that using GetHashCode isn't a good idea
check if I've seen a string before. I think I'll use a MD5 sig instead, so
don't have to worry about collisions.>
Erick>
----- Original Message -----> From: "Erick Thompson" <ethompson@NBR.ORG>­>
I need to check if a particular string has been seen before, so I> > was going to maintain a list of hashcodes from the strings.> > However, I want to confirm that the GetHashCode method> > returns a hashcode based on the string contents, not on the> > string object. The reason I'm not completely sure is that I'm> > not totally confident I know how string interning will effect things.> > Can I use GetHashCode to identify a string?

You can read messages from the DOTNET archive, unsubscribe from DOTNET, or
subscribe to other DevelopMentor lists at http://discuss.deve­lop.com.

Add comment
Peter Stephens 28 April 2002 06:27:33 permanent link ]
 
Ian Griffiths spake:> MD5 is believed to> be a good hash algorithm, and it has a length of 128 bytes (i.e.> 1024 bits)

Actually MD5 is 128 *bits*, that is 16 bytes...


--
Peter

You can read messages from the DOTNET archive, unsubscribe from DOTNET, or
subscribe to other DevelopMentor lists at http://discuss.deve­lop.com.

Add comment
Erick Thompson 29 April 2002 20:34:21 permanent link ]
 
But the question I would ask is are you actually wasting more space by> storing MD5 hashes than you would be by just remembering the strings in
first place? Unless your strings are on average longer than 128 bytes
(the> size of an MD5 hash) then it will actually be rather more expensive to do> what you're doing than just to store all the strings... Or is there some> reason you can't store the strings?

There is no reason not to store the strings, except that I was a little
concerned about memory useage. The number of strings is going to fairly
large, so I was hoping for some savings using a hash. Given that the 128
bytes is actually a good sized string, and the possibility for false
positives, I think I may go back to storing the actual strings.

Thanks for the good overview of GetHashCode. I think I need to up the time I
spend with Richter's book.

Erick

You can read messages from the DOTNET archive, unsubscribe from DOTNET, or
subscribe to other DevelopMentor lists at http://discuss.deve­lop.com.

Add comment
Joel Mueller 1 May 2002 00:23:19 permanent link ]
 An MD5 hash is actually only 16 bytes, not 128. Given that each
character in a string is two bytes (unicode), if your strings are
typically longer than 8 characters, you ought to get a memory usage
saving from using the MD5 hash.
-----Original Message-----> From: dotnet discussion [mailto:D­OTNET@DISCUSS.DEVELOP.COM]> On Behalf Of Erick Thompson> Sent: Monday, April 29, 2002 12:34 PM> To: DOTNET@DISCUSS.DEVE­LOP.COM> Subject: Re: [DOTNET] Confirm string GetHashCode reflects> string, not object>
There is no reason not to store the strings, except that I> was a little concerned about memory useage. The number of> strings is going to fairly large, so I was hoping for some> savings using a hash. Given that the 128 bytes is actually a> good sized string, and the possibility for false positives, I> think I may go back to storing the actual strings.>
Thanks for the good overview of GetHashCode. I think I need> to up the time I spend with Richter's book.>
Erick>
You can read messages from the DOTNET archive, unsubscribe> from DOTNET, or subscribe to other DevelopMentor lists at> http://discuss.deve­lop.com.>

You can read messages from the DOTNET archive, unsubscribe from DOTNET, or
subscribe to other DevelopMentor lists at http://discuss.deve­lop.com.

Add comment
Ian Griffiths 1 May 2002 22:19:21 permanent link ]
 *goes back and actually reads the documentation for HashAlgorithm.HashS­ize*

"Gets the size of the computer hash code in *bits*." (my emphasis)

D'Oh!


--
Ian Griffiths
DevelopMentor

----- Original Message -----
From: "Joel Mueller" <jmueller@SWIFTK.CO­M>

An MD5 hash is actually only 16 bytes, not 128. Given that each> character in a string is two bytes (unicode), if your strings are> typically longer than 8 characters, you ought to get a memory usage> saving from using the MD5 hash.>
-----Original Message-----> > From: Erick Thompson> >
There is no reason not to store the strings, except that I> > was a little concerned about memory useage. The number of> > strings is going to fairly large, so I was hoping for some> > savings using a hash. Given that the 128 bytes is actually a> > good sized string, and the possibility for false positives, I> > think I may go back to storing the actual strings.> >
Thanks for the good overview of GetHashCode. I think I need> > to up the time I spend with Richter's book.

You can read messages from the DOTNET archive, unsubscribe from DOTNET, or
subscribe to other DevelopMentor lists at http://discuss.deve­lop.com.

Add comment
 

Add new comment

As:
Login:  Password:  
 
 
  
 
Пожалуйста, относитесь к собеседникам уважительно, не используйте нецензурные слова, не злоупотребляйте заглавными буквами, не публикуйте рекламу и объявления о купле/продаже, а также материалы нарушающие сетевой этикет или УК РФ.


QAIX > .Net Development > Confirm string GetHashCode reflects string, not object 1 May 2002 22:19:21

see also:
IIS won't let me upload to _private
Included Content
Attachment with Frontpage Formmail
пройди тесты:
see also:
How to burn iTunes movies to DVD
Upgraded Perfect Solution For iPod Fans
Joke your friends with prankish…

  Copyright © 2001—2008 QAIX
Idea: Miсhael Monashev
Помощь и задать вопросы можно в сообществе support.qaix.com.
Сообщения об ошибках оставляем в сообществе bugs.qaix.com.
Предложения и комментарии пишем в сообществе suggest.qaix.com.
Информация для родителей.
Write us at:
If you would like to report an abuse of our service, such as a spam message, please .