Using Regular Expressions in VB.NET
by 호야호잇
2018. 9. 17.
Using regular expressions in VB.NET 2008
Regular expression class: System.Text.RegularExpressions.Regex
Example 1 - Getting the list of matches
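    ' Requires "Imports System.Text.RegularExpressions" at the top of the file.
    Dim regexPattern As String
    Dim regex As Regex
    Dim regexMatches As MatchCollection
    Dim strSource As String

    strSource = "~~~~~source string~~~~~"
    ' Matches lbl_xxx (any letter case for "lbl") wrapped in single quotes (\x27) or double quotes (\x22).
    regexPattern = "(?<lbl>([\x27]([l]|[L])([b]|[B])([l]|[L])_\w*[\x27])|([\x22]([l]|[L])([b]|[B])([l]|[L])_\w*[\x22]))"
    regex = New Regex(regexPattern)
    regexMatches = regex.Matches(strSource)

    For j As Integer = 0 To regexMatches.Count - 1
        ' Do whatever specific work is needed with each match, e.g. regexMatches(j).Value
    Next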
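
To make the loop above concrete, here is a minimal runnable sketch. The module name, the `FindLabels` helper, and the sample input are hypothetical, and the pattern is a shortened variant of the one above (it uses `RegexOptions.IgnoreCase` and a named backreference instead of spelling out every letter case), so treat it as an illustration rather than the original author's code.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LabelMatchDemo
    ' Hypothetical helper: returns every quoted lbl_ identifier found in source.
    Function FindLabels(ByVal source As String) As List(Of String)
        ' (?<q>['\x22]) captures the opening quote; \k<q> requires the same quote to close.
        Dim pattern As String = "(?<lbl>(?<q>['\x22])lbl_\w*\k<q>)"
        Dim labelRegex As New Regex(pattern, RegexOptions.IgnoreCase)
        Dim found As New List(Of String)
        For Each m As Match In labelRegex.Matches(source)
            ' Read the named group "lbl" from each match.
            found.Add(m.Groups("lbl").Value)
        Next
        Return found
    End Function

    Sub Main()
        ' The third candidate opens with ' but closes with ", so it is not matched.
        Dim sample As String = "a = 'lbl_name'; b = ""LBL_Title""; c = 'lbl_bad"""
        For Each label As String In FindLabels(sample)
            Console.WriteLine(label)
        Next
    End Sub
End Module
```

With the sample string, only the two consistently quoted labels are returned.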
Example 2 - Replacing matched patterns
The program creates a Regex object, passing its constructor a regular expression pattern that identifies the text to replace. It then calls the object's Replace method, passing it the input text and the replacement pattern.
    Private Sub btnGo_Click(ByVal sender As System.Object, _
            ByVal e As System.EventArgs) Handles btnGo.Click
        Try
            Dim reg_exp As New Regex(txtPattern.Text)
            lblResult.Text = reg_exp.Replace(txtInput.Text, _
                txtReplacementPattern.Text)
        Catch ex As Exception
            MessageBox.Show(ex.Message)
        End Try
    End Sub

The trick in this example lies in the search and replacement patterns. In this example, the search pattern is "(?m)^([^,]*), (.*)$". The pieces of this expression have the following meanings:
Piece | Meaning |
---|
(?m) | This is an option directive that indicates a multi-line string. This makes the ^ and $ characters match the beginning and end of a line rather than the beginning and end of the string. |
^ | Match the beginning of a line. |
([^,]*) | Match any character other than comma any number of times. This part is enclosed in parentheses so it forms the first match group. |
, (comma and space) | Match a comma followed by a space. |
(.*) | Match any character any number of times. This part is enclosed in parentheses so it forms the second match group. |
$ | Match the end of the line. |
The replacement pattern is "$2 $1". This says to replace the stuff that was matched with the second match group, a space, and then the first match group. This example takes this text:

    Archer, Ann
    Baker, Bob
    Carter, Cindy
    Deevers, Dan

And converts it into this:

    Ann Archer
    Bob Baker
    Cindy Carter
    Dan Deevers
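
The same search and replacement patterns can be tried outside a form. This is a minimal console sketch; the module name and the `SwapNames` helper are hypothetical, and the input lines are assumed to be separated by a line feed only.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module SwapNamesDemo
    ' Hypothetical helper: turns "Last, First" lines into "First Last" lines.
    Function SwapNames(ByVal input As String) As String
        Dim reg_exp As New Regex("(?m)^([^,]*), (.*)$")
        ' $2 is the text after the comma, $1 the text before it.
        Return reg_exp.Replace(input, "$2 $1")
    End Function

    Sub Main()
        ' Lines are joined with vbLf only; with vbCrLf the carriage return
        ' would end up inside group 2, because . matches it.
        Dim names As String = "Archer, Ann" & vbLf & "Baker, Bob" & vbLf & "Carter, Cindy" & vbLf & "Deevers, Dan"
        Console.WriteLine(SwapNames(names))
    End Sub
End Module
```

If the text comes from a multi-line TextBox, which typically uses CR+LF line endings, it may be worth normalizing the line endings first or excluding `\r` from the second group.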
Regular expression syntax
Metacharacters Defined |
---|
MChar | Definition |
---|
^ | Start of a string. |
$ | End of a string. |
. | Any character (except \n newline) |
| | Alternation. |
{...} | Explicit quantifier notation. |
[...] | Explicit set of characters to match. |
(...) | Logical grouping of part of an expression. |
* | 0 or more of previous expression. |
+ | 1 or more of previous expression. |
? | 0 or 1 of previous expression; also forces minimal matching when an expression might match several strings within a search string. |
\ | Preceding one of the above, it makes it a literal instead of a special character. Preceding a special matching character, see below. |
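
The "minimal matching" role of ? is easiest to see side by side with the default greedy behavior. A small sketch, with hypothetical helper names and sample text:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GreedyLazyDemo
    ' .* is greedy: it runs to the last </b> in the input.
    Function GreedyMatch(ByVal html As String) As String
        Return Regex.Match(html, "<b>.*</b>").Value
    End Function

    ' .*? is lazy: it stops at the first </b> that completes a match.
    Function LazyMatch(ByVal html As String) As String
        Return Regex.Match(html, "<b>.*?</b>").Value
    End Function

    Sub Main()
        Dim html As String = "<b>one</b> and <b>two</b>"
        Console.WriteLine(GreedyMatch(html)) ' <b>one</b> and <b>two</b>
        Console.WriteLine(LazyMatch(html))   ' <b>one</b>
    End Sub
End Module
```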
Character Escapes http://tinyurl.com/5wm3wl |
---|
Escaped Char | Description |
---|
ordinary characters | Characters other than . $ ^ { [ ( | ) ] } * + ? \ match themselves. |
\a | Matches a bell (alarm) \u0007. |
\b | Matches a backspace \u0008 if in a []; otherwise matches a word boundary (between \w and \W characters). |
\t | Matches a tab \u0009. |
\r | Matches a carriage return \u000D. |
\v | Matches a vertical tab \u000B. |
\f | Matches a form feed \u000C. |
\n | Matches a new line \u000A. |
\e | Matches an escape \u001B. |
\040 | Matches an ASCII character as octal (up to three digits); numbers with no leading zero are backreferences if they have only one digit or if they correspond to a capturing group number. (For more information, see Backreferences.) For example, the character \040 represents a space. |
\x20 | Matches an ASCII character using hexadecimal representation (exactly two digits). |
\cC | Matches an ASCII control character; for example \cC is control-C. |
\u0020 | Matches a Unicode character using a hexadecimal representation (exactly four digits). |
\* | When followed by a character that is not recognized as an escaped character, matches that character. For example, \* is the same as \x2A. |
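
A few of the escapes above, checked in a short sketch. The module and helper names are hypothetical; the point is the two meanings of \b and the three equivalent ways of writing a space.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module EscapeDemo
    ' Hypothetical helper: counts how often pattern matches in input.
    Function CountMatches(ByVal input As String, ByVal pattern As String) As Integer
        Return Regex.Matches(input, pattern).Count
    End Function

    Sub Main()
        ' Outside [] \b is a word boundary, so "cat" inside "concat" is skipped.
        Console.WriteLine(CountMatches("cat concat", "\bcat")) ' 1
        Console.WriteLine(CountMatches("cat concat", "cat"))   ' 2

        ' Inside [] the same escape means the backspace character \u0008.
        Dim backspace As String = ChrW(8).ToString()
        Console.WriteLine(Regex.IsMatch(backspace, "[\b]"))    ' True

        ' Octal, hexadecimal and Unicode escapes for a space character.
        Console.WriteLine(Regex.IsMatch(" ", "^\040$"))        ' True
        Console.WriteLine(Regex.IsMatch(" ", "^\x20$"))        ' True
        Console.WriteLine(Regex.IsMatch(" ", "^\u0020$"))      ' True
    End Sub
End Module
```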
Character Classes http://tinyurl.com/5ck4ll |
---|
Char Class | Description |
---|
. | Matches any character except \n. If modified by the Singleline option, a period character matches any character. For more information, see Regular Expression Options. |
[aeiou] | Matches any single character included in the specified set of characters. |
[^aeiou] | Matches any single character not in the specified set of characters. |
[0-9a-fA-F] | Use of a hyphen (-) allows specification of contiguous character ranges. |
\p{name} | Matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges. For example, Ll, Nd, Z, IsGreek, IsBoxDrawing. |
\P{name} | Matches text not included in groups and block ranges specified in {name}. |
\w | Matches any word character. Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \w is equivalent to [a-zA-Z_0-9]. |
\W | Matches any nonword character. Equivalent to the Unicode categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \W is equivalent to [^a-zA-Z_0-9]. |
\s | Matches any white-space character. Equivalent to the Unicode character categories [\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \s is equivalent to [ \f\n\r\t\v]. |
\S | Matches any non-white-space character. Equivalent to the Unicode character categories [^\f\n\r\t\v\x85\p{Z}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \S is equivalent to [^ \f\n\r\t\v]. |
\d | Matches any decimal digit. Equivalent to \p{Nd} for Unicode and [0-9] for non-Unicode, ECMAScript behavior. |
\D | Matches any nondigit. Equivalent to \P{Nd} for Unicode and [^0-9] for non-Unicode, ECMAScript behavior. |
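
The option-dependent rows above (the period with Singleline, \d with ECMAScript) can be checked with a short sketch. The helper names are hypothetical, and the Arabic-Indic digit is just one convenient example of a non-ASCII character in the Nd category.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module CharClassDemo
    ' Does "a.b" match "a", line feed, "b" under the given options?
    Function DotMatchesNewline(ByVal options As RegexOptions) As Boolean
        Return Regex.IsMatch("a" & vbLf & "b", "a.b", options)
    End Function

    ' Is the single character ch matched by \d under the given options?
    Function IsDigit(ByVal ch As Char, ByVal options As RegexOptions) As Boolean
        Return Regex.IsMatch(ch.ToString(), "^\d$", options)
    End Function

    Sub Main()
        Dim arabicIndicThree As Char = ChrW(&H663) ' U+0663, Unicode category Nd

        Console.WriteLine(DotMatchesNewline(RegexOptions.None))               ' False
        Console.WriteLine(DotMatchesNewline(RegexOptions.Singleline))         ' True
        Console.WriteLine(IsDigit(arabicIndicThree, RegexOptions.None))       ' True
        Console.WriteLine(IsDigit(arabicIndicThree, RegexOptions.ECMAScript)) ' False
        Console.WriteLine(IsDigit("7"c, RegexOptions.ECMAScript))             ' True
    End Sub
End Module
```

When only ASCII digits should be accepted, writing [0-9] explicitly avoids depending on the option.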
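Special characters used in regular expressions
\ | Treats the special character that follows as a literal character |
^ | Matches at the start of a line (in .NET, the start of the string unless the Multiline option is set) |
$ | Matches at the end of a line (in .NET, the end of the string unless the Multiline option is set) |
* | Matches the preceding element zero or more times |
+ | Matches the preceding element one or more times; same as {1,} |
? | Matches the preceding element zero or one time |
. | Matches any single character (except a newline) |
() | Captures (remembers) the text matched inside the parentheses |
| | OR |
{n} | Exactly n occurrences of the preceding element |
{n,} | n or more occurrences of the preceding element |
{n,m} | At least n and at most m occurrences of the preceding element |
[xyz] | Character set |
[^xyz] | Negated character set |
[\b] | Matches a backspace |
\b | Matches the empty string at the start or end of a word (word boundary) |
\B | Matches the empty string anywhere that is not the start or end of a word |
\cX | Matches a control character |
\d | Matches a decimal digit from 0 to 9; same as [0-9] (in .NET this holds with the ECMAScript option, otherwise any Unicode decimal digit matches) |
\f | Matches a form feed |
\n | Matches a line feed |
\r | Matches a carriage return |
\s | Matches a white-space character; same as [ \t\n\r\f\v] |
\S | Matches any character that \s does not; same as [^ \t\n\r\f\v] |
\t | Matches a tab |
\v | Matches a vertical tab |
\w | Matches a word character: a letter, a digit, or an underscore |
\W | Matches a non-word character, for example a special character such as % |
\1 to \9 | Backreference to the text captured by the nth group, where n is an integer from 1 to 9 |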
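
Capturing groups and numbered backreferences work together, which a doubled-word check shows well. This is a minimal sketch with a hypothetical helper name and sample sentence.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BackreferenceDemo
    ' Hypothetical helper: returns the first word that is immediately repeated,
    ' or an empty string when there is none.
    Function FirstDoubledWord(ByVal sentence As String) As String
        ' (\w+) captures a word; \1 must then match exactly the same text again.
        Dim m As Match = Regex.Match(sentence, "\b(\w+) \1\b")
        If m.Success Then
            Return m.Groups(1).Value
        End If
        Return String.Empty
    End Function

    Sub Main()
        Console.WriteLine(FirstDoubledWord("this is is a test")) ' is
        Console.WriteLine(FirstDoubledWord("nothing repeated").Length) ' 0
    End Sub
End Module
```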
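Functions used with regular expressions
These are JavaScript RegExp and String methods, not members of the .NET Regex class.
exec | RegExp method that searches a string for a match and returns the match details |
test | RegExp method that tests whether a string contains a match |
match | String method that searches a string for matches |
search | String method that tests for a match and returns the index of the first one |
replace | String method that searches for matches and replaces them |
split | String method that splits a string at each match and returns the pieces as an array |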
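
The method names in the table above come from JavaScript. In VB.NET the closest counterparts live on the Regex class; the pairing below is a rough mapping assumed for illustration, not an exact equivalence, and the sample date string is arbitrary.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexMethodsDemo
    Sub Main()
        Dim input As String = "2018-09-17"

        ' Roughly "test": is there any match at all?
        Console.WriteLine(Regex.IsMatch(input, "\d+"))        ' True

        ' Roughly "exec": details of the first match.
        Console.WriteLine(Regex.Match(input, "\d+").Value)    ' 2018

        ' Roughly "match" with a global flag: all matches.
        Console.WriteLine(Regex.Matches(input, "\d+").Count)  ' 3

        ' Roughly "search": position of the first match.
        Console.WriteLine(Regex.Match(input, "-").Index)      ' 4

        ' "replace" and "split" have direct counterparts.
        Console.WriteLine(Regex.Replace(input, "-", "/"))     ' 2018/09/17
        Dim parts As String() = Regex.Split(input, "-")
        Console.WriteLine(String.Join(",", parts))            ' 2018,09,17
    End Sub
End Module
```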
Sample Expression
Pattern | Description |
^\d{5}$ | 5 numeric digits, such as a US ZIP code. |
^(\d{5}|\d{5}-\d{4})$ | 5 numeric digits, or 5 digits-dash-4 digits. This matches a US ZIP or US ZIP+4 format. The alternation sits inside one group so that ^ and $ apply to both alternatives. |
^(\d{5})(-\d{4})?$ | Same as previous, but more efficient. Uses ? to make the -4 digits portion of the pattern optional, rather than requiring two separate patterns to be compared individually (via alternation). |
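^[+-]?\d+(\.\d+)?$ | Matches a decimal number with an optional sign; at least one digit is required before the optional fraction. |
^[+-]?\d*\.?\d*$ | Looser version of the above: also accepts forms such as .5 and 5., but it matches the empty string and a lone sign or period as well. |
^(20|21|22|23|[01]\d)[0-5]\d$ | Matches any 24-hour time value written as HHMM without a separator, for example 2359. |
/\*.*\*/ | Matches a C-style comment /* … */ including its delimiters, as long as it fits on one line. |
--.* | Matches a SQL comment, from -- to the end of the line. |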
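
Two of the sample patterns, wrapped in small validation helpers. The module and function names are hypothetical; the patterns are the ZIP+4 and 24-hour time expressions from the table.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SamplePatternDemo
    ' US ZIP or ZIP+4, using the optional-group form of the pattern.
    Function IsZip(ByVal value As String) As Boolean
        Return Regex.IsMatch(value, "^(\d{5})(-\d{4})?$")
    End Function

    ' 24-hour time written as HHMM, with no separator.
    Function IsTime24(ByVal value As String) As Boolean
        Return Regex.IsMatch(value, "^(20|21|22|23|[01]\d)[0-5]\d$")
    End Function

    Sub Main()
        Console.WriteLine(IsZip("12345"))      ' True
        Console.WriteLine(IsZip("12345-6789")) ' True
        Console.WriteLine(IsZip("1234"))       ' False
        Console.WriteLine(IsTime24("2359"))    ' True
        Console.WriteLine(IsTime24("2460"))    ' False
    End Sub
End Module
```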