How To Search In Code Site-Wide For Any Text


By  October 14th, 2015







Ever wondered if a crucial piece of text or code is present site-wide? Maybe some analytics, tracking, or tag manager code?


Or how about when you need to find old email addresses, specific spelling errors, or similar? This is where site-wide custom text search can help. With it you can answer questions like “which pages on my site are missing Google Analytics”, “where is the old Google Analytics code still in use”, or “is Google Tag Manager placed in the right spot on all pages”.


A1 Website Analyzer

One crawler tool that allows for custom search is our secret super tool A1 Website Analyzer. It can search in the full code of a page using regular expressions. Don’t know regular expressions? No worries; if your needs are simple, chances are you can simply write the text you are searching for or use one of the presets. But if you have complex needs, like finding variations of code blocks, regular expressions can be your savior.


Learning the basics of regular expressions is one of the most valuable things you can do as a web developer, or even just as a geeky user. Besides helping you find what you need and run advanced search-and-replace operations, regular expressions are supported by functions in many code libraries.


If you already know regex or don’t care, you can skip right to the search tutorial itself.


Regex



When using regular expressions, it is important to understand that certain characters have special meaning:



  • “.+” will match any character one to infinite times.
  • “.*” will match any character zero to infinite times.
  • “.*?” will match any character as few times as possible, stopping as soon as the next part of the regular expression can match.
  • “\s*” will match any whitespace character zero to infinite times.
  • “\s+” will match any whitespace character one to infinite times.
  • “\s” will match exactly one whitespace character.
  • “[0-9a-zA-Z]” will match an English lowercase/uppercase letter or digit one time.
  • “[^<]*” will match any character except “<” zero to infinite times.
  • “(center|centre)” will match “center” or “centre”
  • “(center|centre)?” works like the above but is optional: matching continues with the next part of the regular expression even if neither word is present.
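To see several of these building blocks in action, here is a minimal sketch using Python's `re` module. Python is only an assumption for illustration; the regex dialect used by A1 Website Analyzer may differ in details, and the sample HTML is made up.

```python
import re

html = '<p  class="note">Visit the city centre</p>'

# ".*" is greedy: it runs all the way to the last ">" in the text.
greedy = re.search(r"<.*>", html).group(0)

# ".*?" is lazy: it stops at the first ">" that lets the rest match.
lazy = re.search(r"<.*?>", html).group(0)

# "[^<]*" collects everything up to (not including) the next "<".
text = re.search(r">([^<]*)<", html).group(1)

# "\s+" matches the run of two spaces after "<p".
spaces = re.search(r"<p(\s+)class", html).group(1)

# "(center|centre)" accepts either spelling.
spelling = re.search(r"(center|centre)", html).group(1)

# "(center|centre)?" is optional, so the overall match still succeeds
# when neither word follows "city ".
optional = re.search(r"city (center|centre)?", "the city limits")

print(greedy)
print(lazy)
print(text)
print(repr(spaces))
print(spelling)
print(optional.group(0), optional.group(1))
```

The difference between the greedy and lazy lines is the one that trips people up most often when searching HTML: the greedy version swallows the whole line, while the lazy one returns only the opening tag.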

Say we want to look for occurrences of the following text strings:



  • search engine peoples
  • Search Engine Peoples
  • Search Engine Professionals

This regex can find any and all of these:


(S|s)earch (E|e)ngine (P|p)(rofessionals|eoples)
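If you want to try the idea outside the crawler first, a quick check could look like the sketch below. It assumes Python's `re` module and writes the letter alternatives as character classes (`[Ss]` instead of `(S|s)`), which is an equivalent formulation. The fourth test string is an added counterexample showing that the correct spelling is not flagged.

```python
import re

# Equivalent formulation using character classes for the letter case.
pattern = re.compile(r"[Ss]earch [Ee]ngine [Pp](rofessionals|eoples)")

candidates = [
    "search engine peoples",
    "Search Engine Peoples",
    "Search Engine Professionals",
    "Search Engine People",  # correct spelling, should not match
]

matches = [text for text in candidates if pattern.search(text)]
for text in matches:
    print("found:", text)
```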


For more information on regular expressions, try these resources:



Code Search Tutorial

In this demonstration, we’ll configure A1 Website Analyzer to search for two types of Google Analytics code throughout all pages it crawls.


We first select the presets “ga_old” and “ga_new”:


(Screenshot: the custom search presets popup in A1 Website Analyzer)


When you select them in the presets popup, they are automatically added to the dropdown list:


(Screenshot: the custom search dropdown list with the selected presets)


After we run the scan and inspect the results, we make sure to enable the column that shows custom search results.


(Screenshot: the data column that shows custom search results)


This column will contain the results. Examples of how to read them:



  • Old and new analytics code found in the page:
    ga_old=1;ga_new=1
  • Old analytics code found once in the page:
    ga_old=1
  • Old analytics code found twice in the page:
    ga_old=2
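To make the result format concrete, here is a rough Python sketch that produces and reads strings like the ones above, and then lists pages where nothing was found. Everything in it is an assumption for illustration: the two expressions are hypothetical stand-ins for the “ga_old” and “ga_new” presets (the real preset expressions may differ), the pages are made up, and an empty result for pages without matches is a guess based on the examples.

```python
import re

# Hypothetical stand-ins for the "ga_old" and "ga_new" presets.
SEARCHES = {
    "ga_old": re.compile(r"google-analytics\.com/ga\.js"),
    "ga_new": re.compile(r"google-analytics\.com/analytics\.js"),
}

def search_page(html):
    """Return a result string such as 'ga_old=1;ga_new=1' for one page."""
    counts = {name: len(rx.findall(html)) for name, rx in SEARCHES.items()}
    # Only searches that matched at least once are listed.
    return ";".join(f"{name}={n}" for name, n in counts.items() if n > 0)

def parse_result(cell):
    """Turn 'ga_old=2;ga_new=1' back into {'ga_old': 2, 'ga_new': 1}."""
    if not cell:
        return {}
    pairs = (part.split("=", 1) for part in cell.split(";"))
    return {name: int(n) for name, n in pairs}

# Made-up pages standing in for a crawl.
pages = {
    "/": "<script src='//www.google-analytics.com/analytics.js'></script>",
    "/old": "<script src='//www.google-analytics.com/ga.js'></script>" * 2,
    "/contact": "<p>No tracking here</p>",
}

results = {url: search_page(html) for url, html in pages.items()}
missing = [url for url, cell in results.items() if not parse_result(cell)]

print(results)
print("pages without analytics:", missing)
```

Reading the column this way answers the question from the introduction: pages with an empty result are the ones missing analytics code.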

Taking It Further

Now is the time to add your own regular expression searches. As the presets show, the format A1 Website Analyzer expects is:


“name=expression”


This is because, besides the regular expression itself, A1 Website Analyzer also needs a “name” it can use when showing the site search results.
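One detail of this format is worth spelling out: the expression itself may contain “=” (think of `rel=` in HTML), so only the first “=” can act as the separator. The Python sketch below illustrates that reading of the format. It is not how A1 Website Analyzer is implemented, and `parse_search` is a hypothetical helper name.

```python
import re

def parse_search(entry):
    """Split 'name=expression' at the first '=' only."""
    name, sep, expression = entry.partition("=")
    if not sep or not name or not expression:
        raise ValueError(f"expected name=expression, got {entry!r}")
    return name, re.compile(expression)

# The expression contains '=' itself; only the first one separates the name.
name, regex = parse_search('anf=<a [^>]*?rel="?nofollow"?')
count = len(regex.findall('<a href="/x" rel="nofollow">x</a>'))

print(name, regex.pattern, count)
```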


When you have written your new regular expression, e.g.


SEPMISSPELL=(S|s)earch (E|e)ngine (P|p)(rofessionals|eoples)


you can add it using the [+] button:


(Screenshot: adding a custom search with the [+] button)


Example Searches

Here are some useful example searches you can add with the [+] button:


Google Tag Manager Code


To check whether Google Tag Manager is used in a page:


gt=<iframe src="(https?:)?//www\.googletagmanager\.com/


Nofollow Present In Code


To check whether “nofollow” is used in any links on a page:


anf=<a [^>]*?rel="?nofollow"?


(Note: A1 Website Analyzer already has functionality to show links found on a page – this includes information such as “nofollow”)


Frame Tag Used In Code


To check whether “frame” tags are used in a page:


fra=<(iframe|frame)(\s|>)
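Before running a full crawl, it can help to try expressions like these against a small piece of HTML. The sketch below does that in Python, which is an assumption for illustration only. The patterns are written with straight quotes and `\s`, the `gt` pattern is a variant that also accepts `https://` and protocol-relative URLs, and the sample markup (including the `GTM-XXXX` id) is made up.

```python
import re

# name -> expression, mirroring the "name=expression" entries above.
searches = {
    "gt": r'<iframe src="(https?:)?//www\.googletagmanager\.com/',
    "anf": r'<a [^>]*?rel="?nofollow"?',
    "fra": r"<(iframe|frame)(\s|>)",
}

# Made-up page fragment for testing the expressions.
html = (
    '<noscript><iframe src="//www.googletagmanager.com/ns.html?id=GTM-XXXX">'
    "</iframe></noscript>"
    '<a href="/login" rel="nofollow">Log in</a>'
    '<a href="/about">About</a>'
    '<frameset><frame src="menu.html"></frameset>'
)

# Count how often each expression matches in the fragment.
counts = {name: len(re.findall(rx, html)) for name, rx in searches.items()}

print(";".join(f"{name}={n}" for name, n in counts.items()))
```

Note that `<frameset>` is not counted by the `fra` search because `\s` requires a whitespace character after the tag name. With a plain `s` in its place, `<frames` would match as well.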


With the above in hand, you are ready to crawl websites and run site-wide custom searches for just about anything!






* Includes images from CyberHades, Pleuntje




About the Author:





My paid passion at Search Engine People sees me applying my passions and knowledge to a wide array of problems, ones I usually experience as challenges. People who know me know I love coffee.

Ruud Hein


The post How To Search In Code Site-Wide For Any Text appeared first on Search Engine People Blog.

