This patch implements the .trim() method for strings.
Now that I'm reading S29, I see there is no .trim() method there. I got that
because it was referenced in pugs in the cookbook (not in tests, though) and I
was trying to get the examples to run. Bummer.
Sorry for the constant spamming, but now that I've put disparate pieces together, I see what's going on here. I'll stop soon, but I am *sick* of rewriting the trim() function over and over.
There are no tests because it's not in the spec. If there's a spec, I know where to write the tests and will happily commit tests for them to Pugs and I'll submit a new patch against any-str.pir and t/spectest.data to get them to pass.
: (Replying to p6l instead of p6c as requested.)
:
: On Mon, Apr 04, 2005 at 10:39:16AM -0700, Larry Wall wrote:
: > (Now that builtins are just functions out in * space, we can probably
: > afford to throw a few more convenience functions out there for common
: > operations like word splitting and whitespace trimming. (Specific
: > proposals to p6l please.))
So even though it's not in the spec, it seems like something Larry is not entirely opposed to (or wasn't back in 2005). So here's my proposal (copied to p6l):
=item trim
our Str multi Str::trim ( Str $string )
Removes leading and trailing whitespace from a string.
=cut
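For illustration only, here is a minimal sketch of the proposed non-destructive semantics. It is written in Python rather than Perl 6 or PIR. `trim` is a hypothetical stand-in, and Python's `str.isspace()` is assumed as the definition of whitespace, which may not match what Perl 6 settles on.

```python
def trim(s: str) -> str:
    """Return a copy of s without leading and trailing whitespace."""
    # Walk forward past leading whitespace.
    start = 0
    while start < len(s) and s[start].isspace():
        start += 1
    # Walk backward past trailing whitespace, never crossing `start`.
    end = len(s)
    while end > start and s[end - 1].isspace():
        end -= 1
    # A single slice, so the surviving middle is copied once.
    return s[start:end]


print(repr(trim("  hello world \n")))  # 'hello world'
```

The original string is never modified. A whitespace-only or empty input yields the empty string.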
I could optionally make the following work:
$string.trim(:leading<0>);
$string.trim(:trailing<0>);
Setting leading or trailing to false (they default to true) would result in either leading or trailing whitespace not being trimmed. Setting both to false would be a no-op.
Unless someone protests loudly, I can add this to S29, and I (or someone else with tuits) can implement it in Rakudo.
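The optional flags described above can be sketched the same way, again in Python and purely as an illustration. The `leading` and `trailing` keyword arguments are hypothetical stand-ins for `:leading` and `:trailing`, and both default to true as proposed.

```python
def trim(s: str, leading: bool = True, trailing: bool = True) -> str:
    """Hypothetical flag-driven trim mirroring :leading/:trailing (both default True)."""
    start, end = 0, len(s)
    if leading:
        # Skip whitespace at the front only when asked to.
        while start < end and s[start].isspace():
            start += 1
    if trailing:
        # Skip whitespace at the back only when asked to.
        while end > start and s[end - 1].isspace():
            end -= 1
    return s[start:end]


print(repr(trim("  x  ", leading=False)))                  # '  x'
print(repr(trim("  x  ", leading=False, trailing=False)))  # '  x  ' (the no-op case)
```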
I've already submitted a patch for Rakudo which implements this for the trivial $string.trim and trim($string) case. The optional :leading and :trailing parameters aren't there.
I'm happy to finish the work according to whatever spec is agreed upon. I want this badly enough that it's important to me.
Alternatively, those could be ltrim() and rtrim(). If you need to dynamically determine what you're going to trim, you couldn't just set variables to do it, though. You'd have to figure out which methods to call. Or all could be allowed and $string.trim(:leading<0>) could call $string.rtrim internally.
On Mon, Jan 12, 2009 at 07:01:25AM -0800, Ovid wrote:
I could optionally make the following work:
$string.trim(:leading<0>);
$string.trim(:trailing<0>);
Alternatively, those could be ltrim() and rtrim().
'left' and 'right' are probably not the right names for functions which trim leading and/or trailing space, since their meanings get somewhat ambiguous if a language renders right-to-left instead of left-to-right or vice-versa.
On Mon, Jan 12, 2009 at 05:04:50AM -0800, Ovid wrote:
: ...the trivial $string.trim and trim($string) case.
Hmm, I'd think .trim should work like .chomp, and return the trimmed
string without changing the original. You'd use $str.=trim to do it
in place.
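The .chomp-like behaviour Larry describes can be modelled in Python. Python strings are immutable, so a hypothetical mutable `Str` box stands in for a Perl 6 string variable. `trim` returns a new string, and `trim_assign` (also hypothetical) models `$str.=trim`, i.e. `$str = $str.trim`.

```python
class Str:
    """Hypothetical mutable container standing in for a Perl 6 string variable."""

    def __init__(self, value: str) -> None:
        self.value = value

    def trim(self) -> str:
        # Like .chomp: build and return a new string, leave self.value untouched.
        return self.value.strip()

    def trim_assign(self) -> "Str":
        # Models $str.=trim, i.e. $str = $str.trim: store the trimmed result back.
        self.value = self.trim()
        return self


s = Str("  data  ")
t = s.trim()
print(repr(t), repr(s.value))  # 'data' '  data  '
s.trim_assign()
print(repr(s.value))           # 'data'
```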
In the pir, doesn't the "s = self" line copy self, thus ensuring that I'm changing "s" and not "self"? Or do I need "s = clone self" (or however it's written).
Can't say I really like the negated options though. They smell funny.
Agreed, but ltrim and rtrim will disappoint Israelis and dyslexics alike. Suggestions welcome as I can't think of anything better.
Setting leading or trailing to false (they default to true) would
result in either leading or trailing whitespace not being trimmed
Alternatively, those could be ltrim() and rtrim(). If you need to dynamically determine what you're going to trim, you couldn't just set variables to do it, though. You'd have to figure out which methods to call. Or all could be allowed and $string.trim(:leading<0>) could call $string.rtrim internally.
I like having the options, I think. If the default value of :trailing was /<ws>*/, then someone could change it to /<ws>*\#\N*/ to chomp trailing line comments. (Assuming they don't simply redefine the <ws> token)
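The idea of pattern-valued options can be sketched in Python. In this hypothetical `trim`, the `leading` and `trailing` parameters are regex fragments that default to whitespace. `\s*#[^\n]*` is an assumed Python translation of the `/<ws>*\#\N*/` example. The fragments are spliced into the pattern verbatim, so they must be valid regex syntax.

```python
import re


def trim(s: str, leading: str = r"\s*", trailing: str = r"\s*") -> str:
    """Hypothetical trim whose leading/trailing parts are regex fragments."""
    # Strip one match of `leading` anchored at the very start.
    s = re.sub(r"\A(?:" + leading + r")", "", s, count=1)
    # Strip one match of `trailing` anchored at the very end.
    s = re.sub(r"(?:" + trailing + r")\Z", "", s, count=1)
    return s


print(repr(trim("  x = 1  ")))                                # 'x = 1'
print(repr(trim("  x = 1  # note", trailing=r"\s*#[^\n]*")))  # 'x = 1'
```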
Geoffrey Broadwell 12 January 2009 20:33:32
On Mon, 2009-01-12 at 07:01 -0800, Ovid wrote:
I could optionally make the following work:
$string.trim(:leading<0>);
$string.trim(:trailing<0>);
Alternatively, those could be ltrim() and rtrim(). If you need to dynamically determine what you're going to trim, you couldn't just set variables to do it, though. You'd have to figure out which methods to call. Or all could be allowed and $string.trim(:leading<0>) could call $string.rtrim internally.
When I saw your proposed syntax above, instead of reading "don't trim leading/trailing whitespace", I read "change the definition of 'whitespace' to 'codepoint 0' for leading/trailing".
That of course raises the question of how one *would* properly override trim's concept of whitespace ....
On Mon, Jan 12, 2009 at 09:33:32AM -0800, Geoffrey Broadwell wrote:
: That of course raises the question of how one *would* properly override
: trim's concept of whitespace ....
Well, given that .trim is essentially just .comb(/\S.*\S/), which in turn is really just m:g/(\S.*\S)/, I don't see much need for alternate trimmings.
Larry
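Larry's match-based view of trimming can be tried with Python's `re` module, used here only for illustration. `re.DOTALL` makes `.` cross newlines, as `.` does in Perl 6. Taken literally, `/\S.*\S/` needs at least two non-whitespace characters, so this sketch makes the tail optional to cover one-character strings. `trim_by_match` is a hypothetical name.

```python
import re

# Taken literally: needs two non-space characters, so "a" does not match.
LITERAL = re.compile(r"\S.*\S", re.DOTALL)
# First non-space character, then (optionally) everything up to the last one.
TRIM = re.compile(r"\S(?:.*\S)?", re.DOTALL)


def trim_by_match(s: str) -> str:
    m = TRIM.search(s)
    return m.group(0) if m else ""


print(LITERAL.search("  a  "))               # None
print(repr(trim_by_match("  a  ")))          # 'a'
print(repr(trim_by_match("\n x \n y \n")))   # 'x \n y'
```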
I'm +1 on adding a trim; I do a lot of CSV import (with trimming) in Perl 5.
On a side note, between modifying the original and returning a fixed string, which one would we expect to be faster? And I assume either would be faster than the regex method of trimming?
Since that's RTL (Right To Left) text, should ltrim remove the leading or trailing whitespace?
I like Jonathan's trim_start and trim_end.
Side note: I'm implementing the tests now, but only for bog-standard .trim. I won't do the rest until we settle this.
So far I only have one failing test:
is_deeply(trim(()), (), "trim on empty list");
Results in:
not ok 10 - trim on empty list # have: "" # want: []
Note that this output is from my locally hacked version of Test.pm which is kind enough to tell you what the failure is. I'll submit a patch for that later.
Also `:!start` to imply `:end` unless `:!end` (which in turn
implies `:start` unless `:!end`)?
Ugh, forget this, I was having a blank moment.
Actually that makes me wonder now whether it’s a good idea at all to make the function parametrisable. Even `.ltrim.rtrim` is shorter and easier than `.trim(:start,:end)`! Plus if there are separate `.ltrim` and `.rtrim` functions it would be better to implement `.trim` by calling them rather than vice versa, so it wouldn’t even be less efficient to make two calls rather than a parametrised one.
And if anyone really needs to be able to decide the trimming based on flags, they can do that themselves with `.ltrim`/ `.rtrim` with rather little code anyway.
So I question the usefulness of parametrisation here.
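The composition argued for here can be sketched in Python, purely as an illustration. `ltrim`, `rtrim`, `trim` and `trim_by_flags` are hypothetical names. `.trim` is built from the two one-sided trims, and flag-driven trimming takes only a few lines of user code.

```python
def ltrim(s: str) -> str:
    """Hypothetical: drop leading whitespace."""
    i = 0
    while i < len(s) and s[i].isspace():
        i += 1
    return s[i:]


def rtrim(s: str) -> str:
    """Hypothetical: drop trailing whitespace."""
    j = len(s)
    while j > 0 and s[j - 1].isspace():
        j -= 1
    return s[:j]


def trim(s: str) -> str:
    # .trim built from the two one-sided trims rather than the other way round.
    return rtrim(ltrim(s))


def trim_by_flags(s: str, start: bool, end: bool) -> str:
    # Deciding what to trim from flags, in user code.
    if start:
        s = ltrim(s)
    if end:
        s = rtrim(s)
    return s
```

In CPython each slice generally produces a new string, so `rtrim(ltrim(s))` may copy the middle twice. Whether that matters depends on whether the string implementation can share storage between strings.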
Since that's RTL (Right To Left) text, should ltrim remove the
leading or trailing whitespace?
I like Jonathan's trim_start and trim_end.
Let me ask you first: does a string that runs Right-to-Left start at the left and end at the right or start at the right and end at the left?
Now to answer your question, *I* know where the *left* side is in a string that runs from right to left: it’s at the *left*, same as if the string ran from the left to the right, because left is at the *left*.
I mean, if the meaning of “left” was inverted by “right-to-left”, in which it is contained, then what does the latter even mean? (OK, we’re on a Perl 6 list so I guess the answer is it’s a junction… )
Clearly one of us has an inverted sense of which pair of terms is ambiguous, and I don’t think it’s me…
Let me ask you first: does a string that runs Right-to-Left start
at the left and end at the right or start at the right and end at
the left?
Now to answer your question, *I* know where the *left* side is in
a string that runs from right to left: it’s at the *left*, same
as if the string ran from the left to the right, because left is
at the *left*.
I see your point, but it complicates the internals of the trim method because then I have to detect if a string is RTL and reverse it, then unreverse it when done (or something conceptually similar).
I'd rather not toss in said complications for a problem space I don't know very well.
On the other hand, this is a core feature, not a quick CPAN jobbie, so it's important to get it RIGHT or it will be LEFT out. (I kill me. I really do.)
Plus if there are separate `.ltrim` and `.rtrim` functions it
would be better to implement `.trim` by calling them rather
than vice versa, so it wouldn’t even be less efficient to
make two calls rather than a parametrised one.
Depends on your string implementation if they're
non-destructive, since they potentially have to copy the middle
of the string twice if your implementation can't support one
string pointing into the middle of another. And again, I think
.trim should be non-destructive, and .=trim should be the
destructive version.
Sure, but that doesn’t affect my point: if `.trim` is implemented as calling `.ltrim` + `.rtrim`, as I assumed, then all ways of trimming a string at both ends will be equally efficient or inefficient depending on whether or not the implementation supports offsetted strings.
And now I see yours. I was visualising the memory layout of a string, wherein a right-to-left string gets displayed from the right end of its in-memory representation so “left” and “right” are absolutes in that picture. But of course RTL reverses the relation of left/right in memory and left/right on screen.
I think a week’s worth of wolf sleep is catching up to me, sorry.
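The memory-versus-screen distinction can be made concrete in Python. The sample word and spacing are arbitrary assumptions. Hebrew letters are strong right-to-left characters, so on an RTL display the logically trailing spaces would typically appear at the left edge. Start and end in memory order do not move, however: `lstrip` still removes from index 0.

```python
import unicodedata

word = "\u05e9\u05dc\u05d5\u05dd"  # Hebrew "shalom"
s = " " + word + "   "             # memory (logical) order: 1 space, word, 3 spaces

# Every letter has the bidi class "R" (strong right-to-left)...
assert all(unicodedata.bidirectional(ch) == "R" for ch in word)

# ...but trimming works on logical order: index 0 is the start.
print(len(s.lstrip()) - len(word))  # 3 (the trailing spaces are kept)
print(len(s.rstrip()) - len(word))  # 1 (the leading space is kept)
```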
I've just committed the pugs tests for trim. However, it's just 'trim' with no left/right, leading/trailing, Catholic/Protestant implementation. I'll submit a patch for trim with the spectest data updated and work on the rest after the dust settles.
On Mon, Jan 12, 2009 at 06:36:55PM +0100, Moritz Lenz wrote:
: Carl Mäsak wrote:
: > Jonathan (>), Ovid (>>), Larry (>>>):
: >>>> Can't say I really like the negated options though. They smell funny.
: >>>
: >>> Agreed, but ltrim and rtrim will disappoint Israelis and dyslexics alike.
: >>> Suggestions welcome as I can't think of anything better.
: >>
: >> The .Net framework calls 'em TrimStart and TrimEnd (and has a Trim that does
: >> both). So maybe trim_start and trim_end if we wanted to take that lead...
: >
: > How about .trim(:start) and .trim(:end)?
:
: That would be my favourite:
:
:     our Str multi method trim (Str $string:, :start = True, :end = True)
Er, that would make .trim(:start) also default :end to True...
Well, except it won't parse either. For at least two reasons.
: So $str.=trim would trim both start and end, and if you want only one,
: you can say $str.=trim(:!end);.
HEY! Don't ignore my nose. At least, not this time.
Switches should almost never default to true. It's more like:
    our Str multi method trim (Str $string: :$start = False, :$end = False, :$both = not $start || $end)
or really, since "start/end" really mean "startonly/endonly":
    our Str multi method trim (Str $string: :start($start_only) = False,
                                            :end($end_only) = False) {
        my $do_start = not $end_only;
        my $do_end   = not $start_only;
        ...
    }
I really shouldn't be participating in the bikeshedding though...
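Larry's "start only / end only" reading can be sketched in Python. The keyword arguments are hypothetical stand-ins for `:start($start_only)` and `:end($end_only)`. Both switches default to false, and with neither given, both ends are trimmed.

```python
def trim(s: str, start: bool = False, end: bool = False) -> str:
    """Hypothetical reading of :start/:end as "start only"/"end only"."""
    do_start = not end   # end-only suppresses trimming at the start
    do_end = not start   # start-only suppresses trimming at the end
    i, j = 0, len(s)
    if do_start:
        while i < j and s[i].isspace():
            i += 1
    if do_end:
        while j > i and s[j - 1].isspace():
            j -= 1
    return s[i:j]


print(repr(trim(" x ")))              # 'x'
print(repr(trim(" x ", start=True)))  # 'x '
print(repr(trim(" x ", end=True)))    # ' x'
```

Taken literally, the `$do_start`/`$do_end` lines also make passing both switches a no-op. The sketch keeps that behaviour rather than guessing at another one.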
Mark J. Reed 13 January 2009 03:05:19
On Mon, Jan 12, 2009 at 4:19 PM, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:
Maybe :h and :t (head/tail).
I like the echo of the csh pathname modifiers there. Unless that confuses people into thinking that .trim has something to do with pathname canonicalization...
Jason Switzer 13 January 2009 03:19:20
On Mon, Jan 12, 2009 at 9:07 AM, jesse <jesse@fsck.com> wrote:
'left' and 'right' are probably not the right names for functions which
trim leading and/or trailing space, since their meanings get somewhat
ambiguous if a language renders right-to-left instead of left-to-right
or vice-versa
I'm in favor of using the proposed syntax, but I will agree with lwall that it seems like overkill to have the specialized trims. .trim should be non-destructive and .=trim should be destructive (this seems intuitive).
Some languages run in the direction of left-to-right, some in the direction right-to-left (some even top-to-bottom). No matter what language you speak or which direction your native language reads, left is the same for everyone as well as right.
If we wanted to simplify matters, use :left, :right and :both. Those have the same meaning everywhere.
If we wanted a language-dependent version, use :leading, :trailing, and :both. That will require each implementation to properly handle the language variations.
By the way, good work on this. Everyone loves useful string functions.
If we wanted a language-dependent version, use :leading, :trailing, and :both.
That will require each implementation to properly handle the language
variations.
I think :start and :end are my favorites. Huffman++ (maybe :begin and :end for consistency?).
Still raises the question of what to do with arrays of hashes of arrays with @array >> .= trim;
I can't trim keys of pairs because they're used as unique identifiers in hashes and conflicts will occur. So recursive trimming needs to have a special case for keys or it needs to not be allowed (and thus fail with AoHoA and similar complex data structures).
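The "special case for keys" option can be sketched in Python over nested lists and dicts, standing in for arrays and hashes. `deep_trim` is a hypothetical name. Values are trimmed recursively while keys are left alone, so `" id"` and `"id"` cannot collide.

```python
from typing import Any


def deep_trim(value: Any) -> Any:
    """Hypothetical recursive trim over lists and dicts (AoHoA-style data)."""
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, list):
        return [deep_trim(v) for v in value]
    if isinstance(value, dict):
        # Keys are left untouched: trimming " id" and "id" would make them collide.
        return {k: deep_trim(v) for k, v in value.items()}
    return value  # numbers, None, etc. pass through unchanged


rows = [{" id": [" 1 ", "2 "], "id": " x "}]
print(deep_trim(rows))  # [{' id': ['1', '2'], 'id': 'x'}]
```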
Jason Switzer 13 January 2009 03:39:41
On Mon, Jan 12, 2009 at 6:26 PM, Ovid <publiustemp-perl6language2@yahoo.com> wrote:
From: jason switzer <jswitzer@gmail.com>
If we wanted a language-dependent version, use :leading, :trailing, and
:both.
That will require each implementation to properly handle the language
variations.
I think :start and :end are my favorites. Huffman++ (maybe :begin and :end
for consistency?).
My best advice is to keep it consistent. .chomp makes reference to chomping from the end, not the trailing. .substr makes reference to start. I think it's better to just find terminology that has already been agreed upon and keep abusing it.
and means "Hear O Israel, the Lord is our God, the Lord is One".
Actually that's the response: "blessed be the name of the glory of His kingdom for ever and ever" (and your guess is as good as anyone else's as to the actual meaning of that).
-- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH