Suppose you have a large collection of Word files and need to find a specific keyword in them. Opening each file one by one is not an efficient way to do it. Wouldn't it be nice to have a grep-like tool for .docx files? There is no dedicated one, but you can still use grep in combination with a simple trick. Follow along in this article to see how!
The Elegant Fix: Pandoc + Grep
pandoc can extract clean text from Word files. By piping its output to grep, you can search .docx files directly from your terminal.
Example:
pandoc file.docx -t plain | grep "keyword"
Here:
pandoc file.docx -t plain converts the Word document into plain text.
grep searches the extracted content with your usual options.
This works on Linux, macOS, and Windows (via Git Bash or WSL).
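One caveat with the pipeline above: pandoc wraps its text output to a fixed line width by default, so a multi-word phrase can end up split across two lines and grep will miss it. The sketch below passes pandoc's --wrap=none option to keep each paragraph on one line. The helper name docx_text is an assumption made for illustration and is not part of the original workflow.

```shell
# docx_text is a hypothetical helper name, not part of the original article.
docx_text() {
    # --wrap=none asks pandoc not to re-wrap paragraphs at its default line width
    pandoc "$1" -t plain --wrap=none
}

# Usage (assumes file.docx exists in the current directory):
#   docx_text file.docx | grep -i "risk management"
```

The trade-off is that each match now prints a whole paragraph instead of a single wrapped line, which may be more output than you want for long documents.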
Create a Convenient Alias: wgrep
To avoid typing the full command every time, define a reusable shell function.
Add this function to your ~/.bashrc or ~/.zshrc:
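function wgrep() {
    local file="$1"   # first argument: the .docx file to search
    shift             # everything that remains is passed straight to grep
    echo ">>> $file"
    pandoc "$file" -t plain | grep "$@"
}
# Bash only: export the function so child bash processes can call it
export -f wgrep
Then reload your shell:
source ~/.bashrc
Now you can run something like: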
wgrep mycv.docx -i certifica
>>> mycv.docx
🎓 Certificate in Translation (English-Greek)
Certifications
- MoR Foundation Certificate in Risk Management (October 2024)
- MSP Foundation Certificate in Programme Management (September 2024)
- French – B2 (D.E.L.F. Certificate)
Or include any grep flags you want:
wgrep file.docx -i "keyword" # Case-insensitive
wgrep file.docx -E "error|failure" # Regex
wgrep file.docx -n "TODO" # Show line numbers
Searching Across Many .docx Files
A big advantage of a command-line tool is that it can be used in automation scenarios where you need to search many files quickly. To do this, use a command like the following:
find . -name "*.docx" | xargs -I {} bash -c 'wgrep "$@"' _ {} -i Certifica
>>> ./mycv1.docx
🎓 Certificate in Translation (English-Greek)
Certifications
- MoR Foundation Certificate in Risk Management (October 2024)
- MSP Foundation Certificate in Programme Management (September 2024)
- French – B2 (D.E.L.F. Certificate)
>>> ./mycv.docx
🎓 Certificate in Translation (English-Greek)
Certifications
- MoR Foundation Certificate in Risk Management (October 2024)
- MSP Foundation Certificate in Programme Management (September 2024)
- French – B2 (D.E.L.F. Certificate)
This command recursively finds every .docx file under the current directory and prints each filename followed by any matching lines.
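The xargs one-liner relies on wgrep being exported to child shells, and it prints a header for every file, matching or not. As a rough alternative, the sketch below loops over the find results directly and prints the header only when grep actually finds something. The name wgrep_all is hypothetical, and the function assumes pandoc and grep are on your PATH and that filenames contain no newlines.

```shell
# wgrep_all is a hypothetical name; usage: wgrep_all DIR [grep options] PATTERN
wgrep_all() {
    local dir="$1"
    shift
    local file matches
    find "$dir" -type f -name '*.docx' | while IFS= read -r file; do
        # Capture the matches first so the header appears only for files that hit
        if matches=$(pandoc "$file" -t plain | grep "$@"); then
            echo ">>> $file"
            printf '%s\n' "$matches"
        fi
    done
}

# Usage:
#   wgrep_all . -i certifica
```

Reading one filename per line keeps names with spaces intact, which is a common situation with Word documents.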
Windows solution
If you need this tool, you are more likely someone who works with documents than a developer. You might have a Mac, in which case the previous examples work as they are, but you probably work on Windows. So let's build a Windows / PowerShell version of the tool, with examples of how to run it against multiple files. Note that I don't own a Windows computer, so I rely on PowerShell for Mac; write to me in the comments if the PowerShell solution worked on Windows for you!
Install pandoc on Windows
First, install pandoc on Windows: go to https://github.com/jgm/pandoc/releases/tag/3.8.2, download the .msi file (the Windows installer), and run it on your computer.
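After the installer finishes, it can help to confirm that pandoc is reachable from a newly opened terminal before going further. The sketch below is meant for Git Bash; the function name check_pandoc is an assumption made for illustration. In PowerShell, Get-Command pandoc plays the same role.

```shell
# check_pandoc is a hypothetical helper name.
check_pandoc() {
    if command -v pandoc >/dev/null 2>&1; then
        echo "pandoc found: $(command -v pandoc)"
    else
        echo "pandoc not found on PATH" >&2
        return 1
    fi
}

# Usage:
#   check_pandoc && pandoc --version
```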
Creating the PowerShell script
Save the following file as wgrep.ps1
#!/usr/bin/env pwsh
<#
.SYNOPSIS
    Search text inside .docx files using Pandoc and grep (or Select-String on Windows if grep is missing)
.EXAMPLE
    ./wgrep.ps1 -Pattern "sofia" -CaseInsensitive
.EXAMPLE
    ./wgrep.ps1 -Pattern "error|fail" -Regex -Path "./docs"
#>
param(
    [Parameter(Mandatory=$true)][string]$Pattern,
    [string]$Path = ".",
    [switch]$Regex,
    [switch]$CaseInsensitive
)

# Detect platform search command
$hasGrep = Get-Command grep -ErrorAction SilentlyContinue
if ($hasGrep) {
    $searchCmd = "grep"
} else {
    $searchCmd = "Select-String"
}

# Get all DOCX files recursively
$files = Get-ChildItem -Path $Path -Recurse -Filter *.docx -ErrorAction SilentlyContinue

foreach ($file in $files) {
    Write-Host ">>> $($file.FullName)" -ForegroundColor Cyan

    # Convert DOCX to plain text via Pandoc
    try {
        $text = & pandoc $file.FullName -t plain 2>$null
    } catch {
        Write-Warning "Pandoc failed on $($file.FullName)"
        continue
    }
    # Native commands report failure through the exit code, not an exception
    if ($LASTEXITCODE -ne 0) {
        Write-Warning "Pandoc failed on $($file.FullName)"
        continue
    }

    if ($searchCmd -eq "grep") {
        # Avoid the name $args: it is a PowerShell automatic variable
        $grepArgs = @()
        if ($CaseInsensitive) { $grepArgs += "-i" }
        if ($Regex) { $grepArgs += "-E" }
        $grepArgs += $Pattern
        # Run grep
        $text | & grep @grepArgs
    } else {
        # Use PowerShell Select-String fallback
        $options = @{
            Pattern       = $Pattern
            # Select-String ignores case by default, so set this explicitly
            CaseSensitive = -not $CaseInsensitive
            SimpleMatch   = -not $Regex
        }
        # Pipe the lines in: -InputObject would treat the array as one string
        $found = $text | Select-String @options
        if ($found) { $found | ForEach-Object { $_.Line } }
    }
}
Running it against a single .docx file shows that it works:
./wgrep.ps1 -Pattern "Sofia" -Path mycv1.docx
>>> /Users/kpatronas/Downloads/mycv1.docx
- Bulgarian – B2 (University of Sofia)
To run it against multiple files in the same directory, enter the following:
./wgrep.ps1 -Pattern "Sofia" -Path .
>>> /Users/kpatronas/Downloads/mycv.docx
- Bulgarian – B2 (University of Sofia)
>>> /Users/kpatronas/Downloads/mycv1.docx
- Bulgarian – B2 (University of Sofia)
Beyond simple text search, the script also supports the following invocations:
pwsh ./wgrep.ps1 -Pattern "confidential"
pwsh ./wgrep.ps1 -Pattern "error|fail" -Regex
pwsh ./wgrep.ps1 -Pattern "sofia" -CaseInsensitive
pwsh ./wgrep.ps1 -Pattern "TODO" -Path "./reports"
Conclusion
Text hidden inside Word files doesn't have to stay trapped in a GUI.
With Pandoc, you can turn .docx documents into plain text streams and combine them with tools like grep to perform instant, scriptable searches directly from your terminal.
This simple pairing removes the barrier between formatted office files and traditional text processing. Whether you're scanning one résumé or hundreds of reports, you can now search them all with the same precision and speed you use for code — no clicks, no Word, just pure command-line control.