Find | Match | Array Looping

Visual Basic for Applications

Data Processing

Excel

Find | Match | Array Looping

Using Find, Match or Loops are the basis of VBA automation. These approaches are the most frequently used types of locating the boundaries of data. In this post let’s learn all there is to know about the time it takes to find data, when to use which.

Visual Basic for Applications | Data Processing | Excel

< >

1. Introduction

Locating data in a range of values is an essential early step in most data processing projects and it often consists of taking the same steps to locate a dataset. Given the repetitiveness and similarity of such tasks, there are a few approaches that automate and help kickstart the process. Often one would jump between Find, Match and Looping arrays.

1.1 Generating the Test Data

Andreas Killer came up with a nice routine for generating test data. It was slightly modified so that we are able to provide the threshold when calling the function:

Function generate_test_data(ByVal dbl_threshold As Double)
Dim arr_test_data(1 To 100000, 1 To 2) As Variant
Dim lng_counter As Long
Rnd -1652 'Random seed
For lng_counter = LBound(arr_test_data) To UBound(arr_test_data, 1)
If Rnd > dbl_threshold Then arr_test_data(lng_counter, 1) = "foo"
If Rnd > dbl_threshold Then arr_test_data(lng_counter, 2) = "bar"
Next lng_counter
With ActiveWorkbook.Sheets(1)
.UsedRange.ClearContents
.Range("A1").Resize(UBound(arr_test_data, 1), UBound(arr_test_data, 2)).Value2 = arr_test_data
End With
End Function

We randomly generate foo and bar in column 1 and column 2 in a range of 100 000 rows, with the number of foos and bars controlled by the value of dbl_threshold.

Changing the dbl_threshold to a higher number will give few foos and bars, and changing it to a smaller value will give lots.

Since the foo and the bar are using different random numbers there will be rows with foo but no bar and bar but no foo, as well as rows with both foo and bar.

This makes it easy to test how the various methods compare with different densities of data.

2. Scope

Using the VBA function Timer - we will time how much time will it take to find every pair in the existing range.

The Timer function returns fractional portions of a second.

2.1 FIND

Function time_find(ByVal rng_test_data As Range) As Double
Dim lng_result_array As Long
Dim dbl_start_time As Double, dbl_end_time As Double
Dim rng_lookup_column As Range, rng_found_result As Range
Dim str_first_address As String
dbl_start_time = Timer
Set rng_lookup_column = rng_test_data.Resize(rng_test_data.Rows.Count, 1)
With rng_lookup_column
Set rng_found_result = .Find("foo", After:=.Cells(.Rows.Count, .Columns.Count), _
LookIn:=xlValues, SearchDirection:=xlNext, MatchCase:=False)
str_first_address = rng_found_result.Address
Do
Set rng_found_result = .FindNext(rng_found_result)
If rng_found_result.Offset(0, 1) = "bar" Then
lng_result_array = lng_result_array + 1
End If
Loop While Not rng_found_result Is Nothing And rng_found_result.Address <> str_first_address
End With
dbl_end_time = Timer
time_find = dbl_end_time - dbl_start_time
End Function

This Excel VBA function will loop over all of the sheets in an activated workbook and map their values to an array. It's your responsibility to activate the correct workbook prior to starting the function. The function will initialize and change the contents of an array defined by the user. The function call should be done in this way:

2.2 MATCH

Function time_match(ByVal rng_test_data As Range) As Double
Dim lng_match_position As Long, lng_result_array As Long
Dim dbl_start_time As Double, dbl_end_time As Double
Dim rng_lookup_column As Range
dbl_start_time = Timer
Set rng_lookup_column = rng_test_data.Resize(rng_test_data.Rows.Count, 1)
On Error GoTo Finish
Do
lng_match_position = Application.Match("foo", rng_lookup_column, False)
If rng_lookup_column(lng_match_position, 2) = "bar" Then
lng_result_array = lng_result_array + 1
End If
Set rng_lookup_column = rng_lookup_column.Resize(rng_lookup_column.Rows.Count - lng_match_position, 1).Offset(lng_match_position, 0)
Loop
Finish:
dbl_end_time = Timer
time_match = dbl_end_time - dbl_start_time
End Function

This Excel VBA function will extract the headers from an active sheet into an array. It's your responsibility to activate the correct workbook and sheet prior to starting the function. The function will initialize and change the contents of the array defined by the user. The user has the possibility to pick a predefined start cell or a whole range. The whole range can be chosen manualy if bPickWholeRange is set to True. The function call should be done in this way:

2.3 Array Looping

Function time_array(ByVal arr_test_data As Variant) As Double
Dim lng_array_counter As Long, lng_result_array As Long
Dim dbl_start_time As Double, dbl_end_time As Double
dbl_start_time = Timer
For lng_array_counter = LBound(arr_test_data, 1) To UBound(arr_test_data, 1)
If arr_test_data(lng_array_counter, 1) = "foo" And _
arr_test_data(lng_array_counter, 2) = "bar" Then
lng_result_array = lng_result_array + 1
End If
Next lng_array_counter
dbl_end_time = Timer
time_array = dbl_end_time - dbl_start_time
End Function

This Excel VBA function will extract the table of data from an active sheet into an array. It's your responsibility to activate the correct workbook and sheet prior to starting the function. The function will initialize and change the contents of the array defined by the user. The user has the possibility to pick a predefined start cell or a whole range. The whole range can be chosen manualy if bPickWholeRange is set to True. The function call should be done in this way:

3. Results

This Excel VBA function will create a new sheet with a user-defined name. The user must activate the desired workbook prior to calling the function. This function doesn't check if a sheet with the same name already exists and will halt the macro if a check hasn't been performed. The function call should be done in this way:

Sub compare_search()
Dim rng_test_data As Range
Dim int_trial_counter As Integer
Dim arr_results(1 To 100, 1 To 4) As Variant, arr_thresholds As Variant, _
arr_test_data As Variant, var_threshold As Variant
With ActiveWorkbook.Sheets(2)
.Cells(1, 1).Resize(1, UBound(arr_results, 2)).Value2 = Array("Threshold", "Find", "Match", "Array")
End With
arr_thresholds = Array(0.99, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.3, _
0.25, 0.2, 0.15, 0.1, 0.05, 0.001)
For Each var_threshold In arr_thresholds
Call generate_test_data(var_threshold)
Set rng_test_data = ActiveWorkbook.Sheets(1).UsedRange
arr_test_data = rng_test_data.Value2
For int_trial_counter = LBound(arr_results, 1) To UBound(arr_results, 1)
arr_results(int_trial_counter, 1) = var_threshold
arr_results(int_trial_counter, 2) = time_find(rng_test_data)
arr_results(int_trial_counter, 3) = time_match(rng_test_data)
arr_results(int_trial_counter, 4) = time_array(arr_test_data)
Next int_trial_counter
With ActiveWorkbook.Sheets(2)
.Cells(.UsedRange.Rows.Count + 1, 1).Resize(UBound(arr_results, 1), 4).Value2 = arr_results
End With
Next var_threshold
End Sub

4. Conclusions

The implementation of the following functions can greately reduce execution times while providing enough flexibility to transform and validate data.

Microsoft Logo
Excel Logo