|
I'm never going to figure it out!
Can I get some hints on a solution?
|
|
|
|
|
So, this is my attempt:
In Java 8 I tried the regex (?i)(Helló) (Wórld) to match HeLlÓ WóRlD. With the replacement string \U$1 \L$2 the result should be HELLÓ wórld, but it doesn't work.
Can someone please help me?
|
|
|
|
|
In C# (for example) there is no "proper case": one changes to "lower case", then uses "title case".
TextInfo.ToTitleCase(String) Method (System.Globalization) | Microsoft Docs
Quote: Converts the specified string to title case (except for words that are entirely in uppercase, which are considered to be acronyms).
It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it.
― Confucian Analects: Rules of Confucius about his food
|
|
|
|
|
Thanks for getting in touch.
I'm using Java 8.
Are you able to help?
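For the Java 8 case, here is a minimal sketch of one possible approach. As far as I know, java.util.regex does not interpret \U or \L in a replacement string, so the case change has to be done in code, for example with Matcher.appendReplacement. The class and method names are made up for illustration. The (?iu) flags are an assumption about the intent: in Java, (?i) on its own only ignores case for ASCII letters, so the u flag is added to let the accented letters match in either case.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class CaseReplaceSketch {
    // Unicode escapes keep the source ASCII-only: \u00f3 is the accented o.
    // (?iu) = CASE_INSENSITIVE + UNICODE_CASE.
    private static final Pattern GREETING =
            Pattern.compile("(?iu)(Hell\u00f3) (W\u00f3rld)");

    static String convert(String input) {
        Matcher m = GREETING.matcher(input);
        StringBuffer out = new StringBuffer(); // Java 8's appendReplacement takes a StringBuffer
        while (m.find()) {
            // Upper-case the first group, lower-case the second.
            String replacement = m.group(1).toUpperCase(Locale.ROOT)
                    + " " + m.group(2).toLowerCase(Locale.ROOT);
            // quoteReplacement keeps any $ or \ in the text from being read as syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("HeLl\u00d3 W\u00f3RlD"));
    }
}
```

Text outside the match is copied through unchanged by appendTail, so the same loop should work on longer input with several matches.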
|
|
|
|
|
Hi
I'm a Regex novice (very much learning as I go!) and I am trying to write a regex to capture the minimum and maximum temperature values in the string below. I'm hoping someone may be able to help me see where I have gone wrong.
class="outsideimage m0200006341e8" title=""></td>
</tr>
<tr>
<th>Humidity</th>
<td>100%</td>
</tr>
<tr>
<th valign="top">Temp Min</th>
<td valign="top" class="mm">3.1°C</td>
</tr>
<tr>
<th valign="top">Temp Max</th>
<td valign="top" class="mm">5.7°C</td>
I have to include the m0200006341e8 in the string to match as there are several other sensors that report in the overall text string and they all have a different serial number. I need to just extract the max and min values for this particular sensor. I think I've allowed for the fact that the humidity value may change.
I have tried the following, but it doesn't seem to be working:
For the minimum temperature:
class="outsideimage m0200006341e8" title=""></td></tr><tr><th>Humidity</th><td>[0-9]*\%</td></tr><tr><th valign="top">Temp Min</th><td valign="top" class="mm">([-]?[0-9]*[.]?[0-9]?)
and for the maximum temperature:
class="outsideimage m0200006341e8" title=""></td></tr><tr><th>Humidity</th><td>[0-9]*\%</td></tr><tr><th valign="top">Temp Min</th><td valign="top" class="mm">[-]?[0-9]*[.]?[0-9]?°C</td></tr><tr><th valign="top">Temp Max</th><td valign="top" class="mm">([-]?[0-9]*[.]?[0-9]?)°C</td>
|
|
|
|
|
Don't try to use regular expressions to parse HTML. Use a library which is designed for the job instead.
For example, in .NET you should use either AngleSharp or Html Agility Pack.
You may also need to look at the surrounding HTML - based on the fragment you've shown, it's not clear whether the data from the other sensors is sufficiently separated from the data you're trying to extract.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
|
|
|
|
Thanks Richard. Unfortunately, in the software I am using, I don't have an alternative and have to use Regex since that is all that is supported. I've since realised that it IS capturing the minimum temperature correctly, but it is not returning the Maximum temperature for some reason.
|
|
|
|
|
Check the raw source of the string you're trying to match. It could be that the ° character is actually &deg;, &#176;, or &#xB0;. Or it could be a different Unicode character entirely - for example:
º = &ordm; / &#186; / &#xBA;
˚ = &ring; / &#730; / &#x2DA;
ᵒ = &#7506; / &#x1D52;
゜ = &#12444; / &#x309C;
ᣞ = &#6366; / &#x18DE;
⁰ = &#8304; / &#x2070;
Your minimum temperature regex doesn't try to match the character, but your maximum temp regex does.
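If regex really is the only option, one way to sidestep the degree-sign question is to stop the match right after the number. A rough Java sketch of that idea follows; the class and method names are invented for illustration, and the lazy .*? between the sensor id and the temperature rows assumes no other sensor's data sits in between, which the fragment shown does not confirm.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class SensorTempSketch {
    // Optional minus sign, digits, optional decimal part.
    private static final String NUMBER = "(-?\\d+(?:\\.\\d+)?)";

    // \s* tolerates line breaks between tags; DOTALL lets .*? span lines.
    private static final Pattern TEMPS = Pattern.compile(
            "class=\"outsideimage m0200006341e8\""
            + ".*?Temp Min</th>\\s*<td[^>]*>" + NUMBER
            + ".*?Temp Max</th>\\s*<td[^>]*>" + NUMBER,
            Pattern.DOTALL);

    /** Returns {min, max} as text, or null when the sensor block is not found. */
    static String[] minMax(String html) {
        Matcher m = TEMPS.matcher(html);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String html = "class=\"outsideimage m0200006341e8\" title=\"\"></td></tr>"
                + "<tr><th valign=\"top\">Temp Min</th><td class=\"mm\">3.1&deg;C</td></tr>"
                + "<tr><th valign=\"top\">Temp Max</th><td class=\"mm\">5.7&deg;C</td>";
        String[] temps = minMax(html);
        System.out.println(temps[0] + " / " + temps[1]);
    }
}
```

Because nothing after the number is matched, it should not matter whether the degree sign arrives as a literal character, an entity, or a look-alike.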
|
|
|
|
|
Hello Community,
I'm trying to compile a regular expression that will search for strings that exclude certain characters.
For example, the following string value has 6 leading 0's
000000120
The next string excludes the 0's
121
122
I would like a regular expression that can find strings without the leading 0's and then add the 0's to it.
Therefore, 121, and 122 would become 000000121 and 000000122.
Can you help with this?
Thanks
Carlton
|
|
|
|
|
How about:
\b[1-9]\d*\b
Adding the correct number of leading zeros will depend on the language you're using. For example, in C#:
string output = Regex.Replace(input, @"\b[1-9]\d*\b", match => match.Value.PadLeft(9, '0'));
In JavaScript:
const output = input.replace(/\b[1-9]\d*\b/g, match => match.padStart(9, '0'));
|
|
|
|
|
This is fantastic. My platform uses Java SE, and the following regex was able to pick out the strings without leading 0's, e.g. 121 and 122. However, the "(9, '0')" part doesn't replace 121 and 122 with 000000121 and 000000122. Nevertheless, this is great.
\b[1-9]\d*\b
|
|
|
|
|
Does it have to be regex?
There's StringUtils.leftPad() (Apache Commons Lang) if you want to pad with leading 0s, as long as you know the total length you want. Or use a format string to do it:
String paddedStr = String.format("%09d", originalVal); (I think)
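If it does have to be regex in Java, a sketch along these lines might do it. It reuses the \b[1-9]\d*\b pattern from earlier in the thread and pads each match in a Matcher loop, since a plain replacement string cannot pad. The class name, method name and width parameter are made up for illustration. Note that "%09d" expects an integer argument, so a String value would need parsing first; the version below pads the matched text directly instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class ZeroPadSketch {
    // Whole numbers that do not already start with a zero.
    private static final Pattern UNPADDED = Pattern.compile("\\b[1-9]\\d*\\b");

    static String padNumbers(String input, int width) {
        Matcher m = UNPADDED.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Right-align in `width` columns, then turn the padding spaces into zeros.
            String padded = String.format("%" + width + "s", m.group()).replace(' ', '0');
            m.appendReplacement(out, padded); // digits only, so no quoting is needed
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(padNumbers("000000120 121 122", 9));
    }
}
```

Values that already carry leading zeros are not matched, so they pass through untouched.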
|
|
|
|
|
I need to get the values from the HTML snippet below. So far I have come up with two regexes that trim it down to the value I need, but to automate this I need to combine the two statements into one that returns "18", which is where I am stuck. Alternatively, please suggest a better method for getting the values.
I am using the WebHarvy scraping tool. The program is based on .NET, but it doesn't support inserting .NET code, so I need a regex-only solution.
First Regex Statement
(?s)(?<=attribute bathroom).+?(?=\/span)
Result:
" title="Bathrooms" style=" ">
<span class="value" style=" ">18<
Second Regex Statement
(?s)(?<=<span class="value" style=" ">).+?(?=<)
Result: 18
HTML Snippet
<ul class="iconContainer" style=" ">
<li class="attribute propertyId">
xxx1
</li>
<li class="attribute propertyType">
Factory
</li>
<li class="attribute bathroom" title="Bathrooms" style=" ">
18
</li>
<li class="attribute carspace" title="Car Spaces" style=" ">
18
</li>
<li class="attribute landArea">
<span title="Land Area">
5,010<span class="unit">mclass="superscript"></span>
</span>
<span>|</span>
<span title="Floor Area">
9,270<span class="unit">m^__b class="superscript">2</span>
</span>
</li>
</ul>
|
|
|
|
|
Please do not repost the same question. You can easily edit your own questions if you need to add more details.
|
|
|
|
|
Don't try to use Regex to parse an HTML document. You'll end up with an extremely fragile solution, where even the slightest change to the source document will cause it to break.
Use a proper HTML parsing library instead - for example, AngleSharp.
|
|
|
|
|
In my question I mentioned: "I am using WebHarvy scraping tool. The program is based on .NET but it doesn't support inserting .NET code so I need only regex command."
I cannot use any solution other than regex in this tool. Since my two regex statements together produce the result I want, I am pretty sure regex can provide the solution, but I am stuck due to lack of knowledge.
Parsing HTML with regex is not best practice, but I am willing to take the risk. Please suggest a solution.
|
|
|
|
|
He was saying to use AngleSharp instead of WebHarvy.
|
|
|
|
|
I'd suggest getting a better scraping tool, or writing your own.
Given the sample input, this regex should match:
(?<=class="attribute bathroom"[^>]*>\s*<span[^>]*>)[^<]+
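For anyone combining two statements like this outside a .NET-only tool, the same idea can be written with a capturing group instead of a lookbehind, which avoids relying on variable-length lookbehind (not every regex engine accepts it). The Java sketch below is only an illustration: the class and method names are invented, the markup is assumed to contain the span with class "value" shown in the first regex result, and whether WebHarvy can return a capture group is not something stated in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class BathroomValueSketch {
    // Find the bathroom <li>, step into its first <span>, capture the text up to the next tag.
    private static final Pattern BATHROOM = Pattern.compile(
            "class=\"attribute bathroom\"[^>]*>\\s*<span[^>]*>\\s*([^<]+?)\\s*<");

    /** Returns the trimmed span text, or null when the markup does not match. */
    static String bathrooms(String html) {
        Matcher m = BATHROOM.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<li class=\"attribute bathroom\" title=\"Bathrooms\" style=\" \">\n"
                + "  <span class=\"value\" style=\" \">18</span>\n"
                + "</li>";
        System.out.println(bathrooms(html));
    }
}
```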
|
|
|
|
|
Hi, I'd like to substitute the string Alfa("Beta") with Alfa(Gamma("Beta")) using regular expressions in Visual Studio.
The first part is simple: the search string will be Alfa\("(.*)"\)
But how do I specify the replacement string? I used Alfa\(Gamma\("(.*)"\)\) , but the result was Alfa(Gamma("(.*)")) and not the desired Alfa(Gamma("Beta"))
Thank you in advance for your advice.
|
|
|
|
|
|
Thank you! 
|
|
|
|
|
Hi Richard,
I used your advice and it worked perfectly. The search string was alfa\("(.*)"\) and the replacement string was alfa(gamma("$1")) so I obtained the desired result string alfa(gamma("beta"))
But one more question: I encountered the input string alfa("beta","delta") where the desired result is alfa(gamma("beta"),"delta"), but I obtained alfa(gamma("beta","delta"))
How should I change the regexp to achieve this?
Thank you, best regards,
Michael
|
|
|
|
|
Maybe like this?
alfa\("([^"]*)"(.*)
replace with
alfa(gamma("$1")$2
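To see why this works: [^"]* cannot run past the first closing quote, so only the first argument lands in $1, and (.*) carries the rest of the line over unchanged as $2. A quick way to check it outside Visual Studio is a small Java sketch like the one below (class and method names are invented; Java's replaceAll uses the same $1/$2 references).

```java
// Hypothetical helper, not part of any library.
final class WrapFirstArgSketch {
    static String wrapFirstArg(String input) {
        // Group 1: first quoted argument. Group 2: everything after its closing quote.
        return input.replaceAll("alfa\\(\"([^\"]*)\"(.*)", "alfa(gamma(\"$1\")$2");
    }

    public static void main(String[] args) {
        System.out.println(wrapFirstArg("alfa(\"beta\",\"delta\")"));
        System.out.println(wrapFirstArg("alfa(\"beta\")"));
    }
}
```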
|
|
|
|
|
|
Hi
I have a line in my csv file as below
""|*"I have delimiter |* and an escaped \" quote in me"|*100|*200|*300|*"am a string"|*""
I have to interpret " quote as text-qualifier and |* as delimiter. I have to ignore escaped quote \" and consider it part of the string. 100, 200, 300 are integer data fields, so, they are not surrounded by text-qualifier.
The expected result is an array of strings.
a[0] = "" which is a Null string
a[1] = "I have delimiter |* and an escaped \" quote in me"
a[2] = "100"
a[3] = "200"
a[4] = "300"
a[5] = "am a string"
a[6] = "" which is a Null string
The code is below. It looks like \" is not being treated as an escaped quote. Could you please let me know how to fix this? Thanks.
The regular expression code is taken from: Split Function that Supports Text Qualifiers
using System;
using System.Text.RegularExpressions;

public string[] Split(string expression, string delimiter,
    string qualifier, bool ignoreCase)
{
    // Match a delimiter only when an even number of qualifiers follows it,
    // i.e. when the delimiter sits outside a qualified string.
    // Note: this pattern does not treat \" as an escaped qualifier.
    string _Statement = String.Format(
        "{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))",
        Regex.Escape(delimiter), Regex.Escape(qualifier));
    RegexOptions _Options = RegexOptions.Compiled | RegexOptions.Multiline;
    if (ignoreCase) _Options = _Options | RegexOptions.IgnoreCase;
    Regex _Expression = new Regex(_Statement, _Options);
    return _Expression.Split(expression);
}
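One possible direction, sketched here in Java rather than C#, is to match whole fields instead of splitting on delimiters. Counting quotes ahead of each delimiter cannot tell an escaped \" from a real qualifier, whereas a field pattern can consume backslash escapes explicitly. Everything below is an assumption for illustration: the class and method names are invented, the |* delimiter and " qualifier are hard-coded, the surrounding qualifiers are stripped, and the backslash escape is left in the field text as-is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class QualifiedSplitSketch {
    // Group 1: body of a quoted field, where \x escapes are consumed as pairs.
    // Group 2: unquoted field, running up to the next |* delimiter.
    // Group 3: the |* delimiter, or empty at the end of the line.
    private static final Pattern FIELD = Pattern.compile(
            "(?:\"((?:[^\"\\\\]|\\\\.)*)\"|((?:(?!\\|\\*)[^\"])*))(\\|\\*|$)");

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        int pos = 0;
        while (true) {
            m.region(pos, line.length());
            if (!m.lookingAt()) { // the field must start exactly at pos
                throw new IllegalArgumentException("Malformed field at index " + pos);
            }
            fields.add(m.group(1) != null ? m.group(1) : m.group(2));
            if (m.group(3).isEmpty()) {
                break; // matched the end of the line instead of a delimiter
            }
            pos = m.end();
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"\"|*\"I have delimiter |* and an escaped \\\" quote in me\""
                + "|*100|*200|*300|*\"am a string\"|*\"\"";
        for (String field : split(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

As far as I know the pattern only uses constructs that .NET's Regex also supports, so the same approach could be tried in the original C# with a loop over matches, though that is untested here.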
|
|
|
|