Suppose you have a large collection of Word files and need to find a specific keyword inside them. Opening each file one by one is not an efficient way to do it. Wouldn't it be nice if there were a grep-like tool for .docx files? There is no dedicated one, but you can still use grep in combination with a simple trick. Follow along in this article to see how!

The Elegant Fix: Pandoc + Grep

pandoc can extract clean text from Word files. By piping its output to grep, you can search .docx files directly from your terminal.

Example:

pandoc file.docx -t plain | grep "keyword"

Here:

  • pandoc file.docx -t plain converts the Word document into plain text.
  • grep searches the extracted content with your usual options.

This works on Linux, macOS, and Windows (via Git Bash or WSL).
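
One thing to watch for: by default Pandoc wraps its plain-text output at roughly 72 columns, so a multi-word phrase can end up split across two lines, where grep will miss it. Here is a minimal sketch that turns wrapping off with --wrap=none and fails early when pandoc is not installed. The helper name docx_text is my own choice for illustration, not a Pandoc command.

```shell
# docx_text: hypothetical helper that prints a .docx file as unwrapped plain text.
docx_text() {
  # Fail early with a clear message if pandoc is not available.
  if ! command -v pandoc >/dev/null 2>&1; then
    echo "docx_text: pandoc not found" >&2
    return 127
  fi
  # --wrap=none keeps each paragraph on a single line, so phrases are not split.
  pandoc "$1" -t plain --wrap=none
}
```

You could then search for a whole phrase like this:

docx_text file.docx | grep "risk management"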

Create a Convenient Alias: wgrep

To avoid typing the full command every time, wrap it in a small shell function that you can call like an alias.

Add this line to your ~/.bashrc or ~/.zshrc:

function wgrep() {
  local file="$1"
  shift
  echo ">>> $file"
  pandoc "$file" -t plain | grep "$@"
}
export -f wgrep

Then reload your shell:

source ~/.bashrc

Now you can run something like:

wgrep mycv.docx -i certifica

>>> mycv.docx
🎓 Certificate in Translation (English-Greek)
Certifications
- MoR Foundation Certificate in Risk Management (October 2024)
- MSP Foundation Certificate in Programme Management (September 2024)
- French – B2 (D.E.L.F. Certificate)

Or include any grep flags you want:

wgrep file.docx -i "keyword"        # Case-insensitive
wgrep file.docx -E "error|failure"  # Regex
wgrep file.docx -n "TODO"           # Show line numbers
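
Because grep is the last command in the pipeline, its exit status is what a script sees, which makes yes/no checks easy. A minimal sketch of that idea follows. The helper name docx_has is hypothetical; it only combines pandoc with grep -q as shown above.

```shell
# docx_has: hypothetical helper that succeeds if the document contains the pattern.
# Usage: docx_has FILE [grep options] PATTERN
docx_has() {
  local file="$1"
  shift
  # -q suppresses grep's output; only the exit status matters here.
  pandoc "$file" -t plain | grep -q "$@"
}
```

It can be used in a condition, for example:

if docx_has mycv.docx -i certifica; then echo "found"; fi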

Searching Across Many .docx Files

A big advantage of a command-line tool is that it can be used in automation scenarios where you need to search many files quickly. For example:

find . -name "*.docx" | xargs -I {} bash -c 'wgrep "$@"' _ {} -i Certifica

>>> ./mycv1.docx
🎓 Certificate in Translation (English-Greek)
Certifications
- MoR Foundation Certificate in Risk Management (October 2024)
- MSP Foundation Certificate in Programme Management (September 2024)
- French – B2 (D.E.L.F. Certificate)

>>> ./mycv.docx
🎓 Certificate in Translation (English-Greek)
Certifications
- MoR Foundation Certificate in Risk Management (October 2024)
- MSP Foundation Certificate in Programme Management (September 2024)
- French – B2 (D.E.L.F. Certificate)

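This command recursively searches every .docx file under the current directory and prints each filename followed by any matches found in that file. It relies on the export -f wgrep line from earlier, because xargs runs wgrep in a new bash process.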
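
If you would rather see a header only for files that actually contain a match, the search can be written as a plain loop instead of xargs. This is a sketch under my own assumptions, not part of the original tool: the name wgrep_all is hypothetical, and the loop reads one path per line, so it would not handle filenames that contain newlines.

```shell
# wgrep_all: hypothetical helper that searches every .docx file under a directory.
# Usage: wgrep_all DIR [grep options] PATTERN
wgrep_all() {
  local dir="$1"
  shift
  find "$dir" -type f -name '*.docx' | while IFS= read -r file; do
    # Capture the matches first so the header is printed only when there are any.
    # </dev/null stops pandoc from consuming the file list on stdin.
    matches=$(pandoc "$file" -t plain </dev/null | grep "$@")
    if [ -n "$matches" ]; then
      printf '>>> %s\n%s\n' "$file" "$matches"
    fi
  done
}
```

For example:

wgrep_all . -i certifica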

Windows Solution

If you need this tool, you are more likely someone who works with documents than a developer. You might have a Mac, in which case the previous examples work as they are, but you probably work on Windows. So let's build a Windows/PowerShell version of the tool, with examples of how to use it on multiple files. Note that I don't own a Windows computer, so I relied on PowerShell for Mac. Let me know in the comments whether the PowerShell solution worked for you on Windows!

Install Pandoc on Windows

First, install Pandoc on Windows: go to https://github.com/jgm/pandoc/releases/tag/3.8.2, download the .msi file (the Windows installer), and run it.

Creating the PowerShell Script

Save the following file as wgrep.ps1:

#!/usr/bin/env pwsh
<#
.SYNOPSIS
Search text inside .docx files using Pandoc and grep (or Select-String if grep is missing)

.EXAMPLE
./wgrep.ps1 -Pattern "sofia" -CaseInsensitive
.EXAMPLE
./wgrep.ps1 -Pattern "error|fail" -Regex -Path "./docs"
#>

param(
    [Parameter(Mandatory=$true)][string]$Pattern,
    [string]$Path = ".",
    [switch]$Regex,
    [switch]$CaseInsensitive
)

# Use grep when it is available, otherwise fall back to Select-String
$hasGrep = [bool](Get-Command grep -ErrorAction SilentlyContinue)

# Get all DOCX files recursively
$files = Get-ChildItem -Path $Path -Recurse -Filter *.docx -ErrorAction SilentlyContinue

foreach ($file in $files) {
    Write-Host ">>> $($file.FullName)" -ForegroundColor Cyan

    # Convert DOCX to plain text via Pandoc.
    # A failing native command does not throw by itself, so check its exit code too.
    try {
        $text = & pandoc $file.FullName -t plain 2>$null
        if ($LASTEXITCODE -ne 0) { throw "pandoc exited with code $LASTEXITCODE" }
    } catch {
        Write-Warning "Pandoc failed on $($file.FullName)"
        continue
    }

    if ($hasGrep) {
        # $args is an automatic variable in PowerShell, so use a different name
        $grepArgs = @()
        if ($CaseInsensitive) { $grepArgs += "-i" }
        # Without -Regex, match the pattern literally (same as SimpleMatch below)
        if ($Regex) { $grepArgs += "-E" } else { $grepArgs += "-F" }
        $grepArgs += $Pattern

        # Run grep on the extracted text
        $text | & grep @grepArgs
    } else {
        # Select-String is case-insensitive by default, so set CaseSensitive explicitly
        $options = @{
            Pattern       = $Pattern
            CaseSensitive = -not $CaseInsensitive
            SimpleMatch   = -not $Regex
        }

        # Pipe the lines in so that each line is matched separately
        # ($matches is an automatic variable, so the result goes into $found)
        $found = $text | Select-String @options
        if ($found) { $found | ForEach-Object { $_.Line } }
    }
}

Run it against a single .docx file to see that it works:

./wgrep.ps1 -Pattern "Sofia" -Path mycv1.docx
>>> /Users/kpatronas/Downloads/mycv1.docx
- Bulgarian – B2 (University of Sofia)

To run it against all the files in a directory, enter the following:

./wgrep.ps1 -Pattern "Sofia" -Path .
>>> /Users/kpatronas/Downloads/mycv.docx
- Bulgarian – B2 (University of Sofia)
>>> /Users/kpatronas/Downloads/mycv1.docx
- Bulgarian – B2 (University of Sofia)

Apart from simple text search, the tool also supports regular expressions, case-insensitive matching, and a custom search path:

pwsh ./wgrep.ps1 -Pattern "confidential"
pwsh ./wgrep.ps1 -Pattern "error|fail" -Regex
pwsh ./wgrep.ps1 -Pattern "sofia" -CaseInsensitive
pwsh ./wgrep.ps1 -Pattern "TODO" -Path "./reports"

Conclusion

Text hidden inside Word files doesn't have to stay trapped in a GUI. With Pandoc, you can turn .docx documents into plain text streams and combine them with tools like grep to perform instant, scriptable searches directly from your terminal.

This simple pairing removes the barrier between formatted office files and traditional text processing. Whether you're scanning one résumé or hundreds of reports, you can now search them all with the same precision and speed you use for code — no clicks, no Word, just pure command-line control.