Generating Realistic Test Files for Secure and Comprehensive Application Testing

This blog post explores how generating realistic test files can enhance secure and comprehensive application testing, a strategy effectively utilized in BizStream’s YouthCenter product to ensure robust performance without compromising sensitive data.

In software development, generating test files is essential for various scenarios, especially when dealing with sensitive information. Creating realistic test data that mimics production data allows developers to thoroughly test and debug their applications in a controlled environment while ensuring compliance with data protection regulations.

[Figure: The YouthCenter application by BizStream shown on a smartphone, tablet, and laptop. The interface includes client summaries, case information, and dashboards tailored for juvenile case management.]

I have been working on the YouthCenter application by BizStream for a while. YouthCenter is a comprehensive juvenile case management system used by organizations such as juvenile probation, courts, and detention centers. Recently, I needed to implement a feature to facilitate file migration within the application. Using live production data was not an option due to the risk of exposing Personally Identifiable Information (PII) or other sensitive data protected under regulations like HIPAA. Instead, I generated a variety of documents, including Word files, Excel spreadsheets, PDFs, and images. This approach allowed me to test the application effectively without compromising real data, providing a reliable testing ground that closely resembled real-world usage.

Generated test files let you build realistic data sets without exposing sensitive information. File processing, uploads, and migrations can then be tested under conditions close to actual usage, which helps surface issues that only appear with large and varied data sets. Generating diverse file types and structures exercises different data formats and edge cases, and producing files in bulk supports performance testing by simulating high data loads. Together, these make the application more reliable before it reaches production.

In the example below, the file generator creates as many files as the user requests, and both the file names and the file contents are generated. My personal preference for filler text is Bacon Ipsum, but the generation call can be modified to use any Ipsum service you like.

Code Structure and Organization

To effectively generate various types of files for testing purposes, our code is organized into several key components:

  • Base file generator class
  • Specific file generators for each file type
    • DocxGenerator
    • XlsxGenerator
    • CsvGenerator
    • TxtGenerator
    • PdfGenerator
    • ImageGenerator
  • Utility classes

This modular structure enhances maintainability and scalability, allowing for easy addition of new file types or modification of existing ones. Below are the details and explanations of each code file.

1. Base File Generator Class

The abstract base class, FileGenerator, provides the functionality shared by all file generators and declares an abstract Generate method that each derived class must implement. The GetBaconIpsumTextAsync method fetches random text from the Bacon Ipsum API using RestSharp, the GetRandomImage method creates a solid-color image with the specified dimensions, and the GetRandomFileName method uses the Bogus library to generate random file names.

File Path: Generators/FileGenerator.cs

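Required Packages
  • RestSharp: For making HTTP requests to the Bacon Ipsum API.
  • Bogus: For generating random words and other test data.
  • System.Drawing.Common: For the Bitmap and Graphics types when targeting .NET Core or .NET 5 and later (supported only on Windows as of .NET 6).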

				
					
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Threading.Tasks;
using RestSharp;

public abstract class FileGenerator
{
    public const int MaxParagraphs = 10;

    protected Random Random = new();

    // Each concrete generator writes one file of its type to the given path
    public abstract Task Generate(string path);

    // Fetches the requested number of filler paragraphs as plain text
    public virtual async Task<string> GetBaconIpsumTextAsync(int paragraphCount)
    {
        var client = new RestClient("https://baconipsum.com/api/");
        var request = new RestRequest()
            .AddParameter("type", "all-meat")
            .AddParameter("paras", paragraphCount)
            .AddParameter("format", "text");

        var response = await client.ExecuteAsync(request);

        if (response.IsSuccessful && response.Content != null)
        {
            return response.Content;
        }

        Console.WriteLine($"Failed to fetch Bacon Ipsum text. Status: {response.StatusCode}, Error: {response.ErrorMessage}");
        throw new Exception("Failed to fetch Bacon Ipsum text.");
    }

    // Creates a solid-color PNG of the given size and returns its bytes
    public virtual byte[] GetRandomImage(int width, int height)
    {
        using (var bitmap = new Bitmap(width, height))
        {
            using (var g = Graphics.FromImage(bitmap))
            {
                g.Clear(Color.FromArgb(Random.Next(256), Random.Next(256), Random.Next(256)));
            }
            using (var ms = new MemoryStream())
            {
                bitmap.Save(ms, ImageFormat.Png);
                return ms.ToArray();
            }
        }
    }

    // Joins random lorem words with underscores to form a file name (without extension)
    public virtual string GetRandomFileName(int wordCount)
    {
        var lorem = new Bogus.DataSets.Lorem();
        return string.Join("_", lorem.Words(wordCount));
    }
}
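
Because every generated file triggers its own HTTP request, large runs depend heavily on the network. One possible mitigation is to cache the returned text per paragraph count, at the cost of less varied content. The following is only a sketch under assumptions: CachedTextSource and its fetch delegate are hypothetical names that are not part of the original project, and the delegate stands in for a call such as GetBaconIpsumTextAsync.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper (not part of the original project): remembers the text
// returned for each paragraph count so repeated requests skip the network.
// Not thread-safe; intended for a sequential loop like the one in Program.cs.
public sealed class CachedTextSource
{
    private readonly Func<int, Task<string>> _fetch;
    private readonly Dictionary<int, string> _cache = new Dictionary<int, string>();

    public CachedTextSource(Func<int, Task<string>> fetch)
    {
        _fetch = fetch ?? throw new ArgumentNullException(nameof(fetch));
    }

    // Returns cached text when available; otherwise fetches and stores it.
    // A failed fetch throws and is not cached, so a later call retries.
    public async Task<string> GetTextAsync(int paragraphCount)
    {
        if (paragraphCount < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(paragraphCount));
        }

        if (_cache.TryGetValue(paragraphCount, out var cached))
        {
            return cached;
        }

        var text = await _fetch(paragraphCount);
        _cache[paragraphCount] = text;
        return text;
    }
}
```

A generator could construct one instance with `new CachedTextSource(GetBaconIpsumTextAsync)` and call GetTextAsync in place of the direct API call.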
				
			

2. Word Document Generator

The DocxGenerator class is responsible for generating Word documents. It uses the DocX library to create and manipulate Word documents. The class inherits from FileGenerator and implements the Generate method, which creates a new Word document, inserts random text fetched from the Bacon Ipsum API, and saves the document to the specified path.

File Path: Generators/DocxGenerator.cs

Required Packages
  • DocX: For generating Word documents.
				
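using System.Threading.Tasks;
using Xceed.Words.NET; // Namespace used by current versions of the DocX package

public class DocxGenerator : FileGenerator
{
    public override async Task Generate(string path)
    {
        using var document = DocX.Create(path);
        // Random.Next excludes its upper bound, so add 1 to allow MaxParagraphs itself
        var text = await GetBaconIpsumTextAsync(Random.Next(1, MaxParagraphs + 1));
        document.InsertParagraph(text);
        document.Save();
    }
}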
				
			

3. Excel Spreadsheet Generator

The XlsxGenerator class handles generating Excel spreadsheets using the ClosedXML library. It follows the same pattern as the DocxGenerator class, fetching random text from the Bacon Ipsum API and populating the Excel sheet with random words. The class ensures that the generated file is saved correctly to the specified path.

File Path: Generators/XlsxGenerator.cs

Required Packages
  • ClosedXML: For generating Excel spreadsheets.
				
using System.Threading.Tasks;
using ClosedXML.Excel;

public class XlsxGenerator : FileGenerator
{
    private const int maxRows = 10;
    private const int maxColumns = 10;

    public override async Task Generate(string path)
    {
        using var workbook = new XLWorkbook();
        var worksheet = workbook.Worksheets.Add("Sheet1");
        var text = await GetBaconIpsumTextAsync(1);
        var words = text.Split(' ');
        // ClosedXML rows and columns are 1-based
        for (var row = 1; row <= maxRows; row++)
        {
            for (var column = 1; column <= maxColumns; column++)
            {
                worksheet.Cell(row, column).Value = words[Random.Next(words.Length)];
            }
        }
        workbook.SaveAs(path);
    }
}
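
The generator above splits the returned text on single spaces (the CSV generator does the same), which leaves punctuation attached to words and keeps paragraph breaks inside them. If cleaner cell values matter for your tests, a tokenizer along these lines could be used instead. This is a minimal sketch; IpsumWords and Tokenize are hypothetical names that are not part of the original project.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of the original project): turns filler text
// into clean words by splitting on any whitespace and trimming punctuation.
public static class IpsumWords
{
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };
    private static readonly char[] Punctuation = { '.', ',', ';', ':', '!', '?' };

    public static string[] Tokenize(string text)
    {
        if (text == null)
        {
            throw new ArgumentNullException(nameof(text));
        }

        return text
            .Split(Whitespace, StringSplitOptions.RemoveEmptyEntries)
            .Select(word => word.Trim(Punctuation)) // strip leading/trailing punctuation
            .Where(word => word.Length > 0)         // drop tokens that were only punctuation
            .ToArray();
    }
}
```

Replacing `text.Split(' ')` with `IpsumWords.Tokenize(text)` would also remove the stray commas that can misalign CSV output.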
				
			

4. CSV File Generator

The CsvGenerator class generates CSV files by fetching random text from the Bacon Ipsum API, splitting it into words, and arranging the words in a grid. The generator does not escape commas in the returned Ipsum text, so fields in the CSV files can end up misaligned. This was acceptable for my testing purposes, but escaping can be added to the generator if needed.

File Path: Generators/CsvGenerator.cs

Required Packages
  • N/A
				
using System.IO;
using System.Text;
using System.Threading.Tasks;

public class CsvGenerator : FileGenerator
{
    private const int maxRows = 10;
    private const int maxColumns = 10;

    public override async Task Generate(string path)
    {
        var text = await GetBaconIpsumTextAsync(1);
        var words = text.Split(' ');
        var sb = new StringBuilder();
        // One line per row, with comma-separated columns within each row
        for (var row = 0; row < maxRows; row++)
        {
            for (var column = 0; column < maxColumns; column++)
            {
                // Words are written as-is, so embedded commas are not escaped
                sb.Append(words[Random.Next(words.Length)]);
                if (column < maxColumns - 1) sb.Append(',');
            }
            sb.AppendLine();
        }
        await File.WriteAllTextAsync(path, sb.ToString());
    }
}
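
If misaligned columns do become a problem, the usual fix is to quote fields following the conventions described in RFC 4180. The sketch below shows one way to do that; CsvText, Escape, and JoinRow are hypothetical names that are not part of the original generator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of the original project) for CSV quoting.
public static class CsvText
{
    private static readonly char[] SpecialCharacters = { ',', '"', '\r', '\n' };

    // Wraps the field in double quotes when it contains a comma, a quote,
    // or a line break, and doubles any embedded quotes. Other fields are
    // returned unchanged.
    public static string Escape(string field)
    {
        if (field == null)
        {
            throw new ArgumentNullException(nameof(field));
        }

        if (field.IndexOfAny(SpecialCharacters) < 0)
        {
            return field;
        }

        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Escapes every field and joins them with commas into one CSV row.
    public static string JoinRow(IEnumerable<string> fields)
    {
        return string.Join(",", fields.Select(Escape));
    }
}
```

In the generator, appending `CsvText.Escape(word)` instead of the raw word would keep each value in its own column.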
				
			

5. Text File Generator

The TxtGenerator class creates plain text files filled with random text fetched from the Bacon Ipsum API. This class ensures the generated text files are populated with realistic content, aiding in comprehensive testing.

File Path: Generators/TxtGenerator.cs

Required Packages
  • N/A
				
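using System.IO;
using System.Threading.Tasks;

public class TxtGenerator : FileGenerator
{
    public override async Task Generate(string path)
    {
        // Random.Next excludes its upper bound, so add 1 to allow MaxParagraphs itself
        var content = await GetBaconIpsumTextAsync(Random.Next(1, MaxParagraphs + 1));
        await File.WriteAllTextAsync(path, content);
    }
}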
				
			

6. PDF File Generator

The PdfGenerator class handles generating PDF files using the PdfSharp library. It fetches random text from the Bacon Ipsum API, breaks it into lines that fit within the page width, and adds new pages as needed to accommodate the text. This class ensures that the generated PDFs are filled with realistic, multi-page content.

File Path: Generators/PdfGenerator.cs

Required Packages
  • PdfSharp: For generating PDF files.
				
using System.Threading.Tasks;
using PdfSharp.Drawing;
using PdfSharp.Pdf;

public class PdfGenerator : FileGenerator
{
    private new const int MaxParagraphs = 5;
    private static readonly XSize PageSize = new(595, 842); // Standard A4 size in points

    public override async Task Generate(string path)
    {
        // Random.Next excludes its upper bound, so add 1 to allow MaxParagraphs itself
        var text = await GetBaconIpsumTextAsync(Random.Next(1, MaxParagraphs + 1));
        var document = new PdfDocument();
        var firstPage = document.AddPage();
        var font = new XFont("Verdana", 20, XFontStyleEx.Bold);
        var rect = new XRect(40, 40, firstPage.Width - 80, firstPage.Height - 80); // Margins of 40

        var lines = BreakTextIntoLines(text, font, rect.Width);

        DrawTextOnPages(document, firstPage, lines, font, rect);
        document.Save(path);
    }

    private void DrawTextOnPages(PdfDocument document, PdfPage firstPage, string[] lines, XFont font, XRect rect)
    {
        var format = new XStringFormat
        {
            Alignment = XStringAlignment.Near,
            LineAlignment = XLineAlignment.Near
        };

        var lineHeight = font.Height;

        // Start drawing on the page created in Generate so the document
        // does not begin with a blank page
        var gfx = XGraphics.FromPdfPage(firstPage);
        var y = rect.Top;

        foreach (var line in lines)
        {
            // Move to a fresh page once the next line would cross the bottom margin
            if (y + lineHeight > rect.Bottom)
            {
                gfx.Dispose();
                gfx = XGraphics.FromPdfPage(document.AddPage());
                y = rect.Top;
            }
            gfx.DrawString(line, font, XBrushes.Black, new XRect(rect.Left, y, rect.Width, rect.Height), format);
            y += lineHeight;
        }

        gfx.Dispose();
    }

    private string[] BreakTextIntoLines(string text, XFont font, double maxWidth)
    {
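        // Split on spaces and line breaks so paragraph separators do not end up inside words
        var words = text.Split(new[] { ' ', '\r', '\n' }, System.StringSplitOptions.RemoveEmptyEntries);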
        var lines = new System.Collections.Generic.List<string>();
        var currentLine = string.Empty;

        using var gfx = XGraphics.CreateMeasureContext(PageSize, XGraphicsUnit.Point, XPageDirection.Downwards);
        foreach (var word in words)
        {
            var testLine = string.IsNullOrEmpty(currentLine) ? word : $"{currentLine} {word}";
            var size = gfx.MeasureString(testLine, font);

            if (!(size.Width > maxWidth))
            {
                currentLine = testLine;
                continue;
            }

            if (string.IsNullOrEmpty(currentLine))
            {
                lines.Add(word);
                currentLine = string.Empty;
                continue;
            }

            lines.Add(currentLine);
            currentLine = word;
        }

        if (!string.IsNullOrEmpty(currentLine))
        {
            lines.Add(currentLine);
        }

        return lines.ToArray();
    }
}
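
The line-breaking and page-breaking logic is easier to reason about (and to unit test) when it is separated from PdfSharp. The sketch below restates the same greedy approach with the text measurement passed in as a delegate. TextLayout, Wrap, and PagesNeeded are hypothetical names introduced for illustration only, and the line height of 24 points used in the usage note is an assumed value, not one taken from the Verdana font metrics.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the original project): the greedy
// wrapping and paging rules from PdfGenerator, independent of PdfSharp.
public static class TextLayout
{
    // Adds words to the current line until the measured width would exceed
    // maxWidth, then starts a new line. A word that is too wide on its own
    // still gets a line to itself.
    public static List<string> Wrap(IEnumerable<string> words, double maxWidth, Func<string, double> measure)
    {
        var lines = new List<string>();
        var current = string.Empty;

        foreach (var word in words)
        {
            var candidate = current.Length == 0 ? word : current + " " + word;
            if (current.Length == 0 || measure(candidate) <= maxWidth)
            {
                current = candidate;
            }
            else
            {
                lines.Add(current);
                current = word;
            }
        }

        if (current.Length > 0)
        {
            lines.Add(current);
        }

        return lines;
    }

    // Number of pages required when each page holds as many whole lines as
    // fit in contentHeight. An empty document still occupies one page.
    public static int PagesNeeded(int lineCount, double contentHeight, double lineHeight)
    {
        var linesPerPage = (int)Math.Floor(contentHeight / lineHeight);
        if (linesPerPage < 1)
        {
            throw new ArgumentException("A page must fit at least one line.");
        }

        return Math.Max(1, (lineCount + linesPerPage - 1) / linesPerPage);
    }
}
```

With an A4 content height of 762 points (842 minus two 40-point margins) and the assumed 24-point line height, 31 lines fit on a page, so 62 lines need two pages and 63 lines need three.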
				
			

7. Folder Path Generator

The FolderPathGenerator class is responsible for generating random folder paths and ensuring they exist. This utility class helps manage the organization of generated files by creating new directories and keeping track of existing ones. This class ensures that each file generated can be placed in a randomly chosen folder, either new or pre-existing.

File Path: Utilities/FolderPathGenerator.cs

Required Packages
  • N/A
				
using System;
using System.Collections.Generic;
using System.IO;

public static class FolderPathGenerator
{
    private static readonly Random Random = new();
    // Folders created during this run, in creation order
    private static readonly List<string> GeneratedPaths = new();
    public static int GeneratedPathsCount => GeneratedPaths.Count;

    // Creates a randomly named folder under rootPath and remembers it for reuse
    public static string GenerateRandomPath(string rootPath)
    {
        var path = Path.Combine(rootPath, Path.GetRandomFileName());
        Directory.CreateDirectory(path);
        GeneratedPaths.Add(path);
        return path;
    }

    // Returns one of the folders created earlier in this run
    public static string GetRandomExistingPath()
    {
        if (GeneratedPaths.Count == 0) throw new InvalidOperationException("No existing paths available.");
        return GeneratedPaths[Random.Next(GeneratedPaths.Count)];
    }
}
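
The choice between a new folder and an existing one can also be modeled without static state or disk access, which makes the distribution easy to test with a seeded Random. The following is a sketch under assumptions: FolderPicker, its folder naming scheme, and the configurable probability are hypothetical and not part of the original utility. The caller would still be responsible for creating the directory on disk.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the original project): decides whether
// the next file goes into a new folder or one that was picked before.
public sealed class FolderPicker
{
    private readonly Random _random;
    private readonly double _newFolderProbability;
    private readonly List<string> _folders = new List<string>();

    public FolderPicker(Random random, double newFolderProbability)
    {
        _random = random ?? throw new ArgumentNullException(nameof(random));
        _newFolderProbability = newFolderProbability;
    }

    public IReadOnlyList<string> Folders => _folders;

    // Returns a new folder name when none exist yet or when the random draw
    // falls below the configured probability; otherwise reuses a known one.
    public string Next()
    {
        if (_folders.Count == 0 || _random.NextDouble() < _newFolderProbability)
        {
            var name = "folder_" + (_folders.Count + 1);
            _folders.Add(name);
            return name;
        }

        return _folders[_random.Next(_folders.Count)];
    }
}
```

Constructing it as `new FolderPicker(new Random(42), 0.1)` would give a repeatable run in which roughly one file in ten starts a new folder.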
				
			

8. Image File Generator

The ImageGenerator class is responsible for generating image files. It relies on the base class's GetRandomImage method, which uses the .NET System.Drawing library to create an image filled with a random color. This generator is useful for creating test images that simulate image uploads and processing in applications. The images are nothing stunning (a solid canvas in a random RGB color), but they are valid PNG files that open in any image viewer.

File Path: Generators/ImageGenerator.cs

Required Packages
  • N/A
				
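using System.IO;
using System.Threading.Tasks;

public class ImageGenerator : FileGenerator
{
    public override Task Generate(string path)
    {
        // 800x600 solid-color PNG produced by the base class
        var imageData = GetRandomImage(800, 600);
        File.WriteAllBytes(path, imageData);
        return Task.CompletedTask;
    }
}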
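
One quick way to confirm that generated binary files are well formed is to check their leading bytes. PNG files start with a fixed 8-byte signature, PDF files start with "%PDF-", and .docx and .xlsx files are ZIP archives that start with "PK" followed by bytes 3 and 4. The sketch below checks for these; FileSignature and Detect are hypothetical names that are not part of the original project, and plain text and CSV files have no signature to check.

```csharp
using System;

// Hypothetical helper (not part of the original project): identifies a file
// type from the first bytes of its content.
public static class FileSignature
{
    private static readonly byte[] Png = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    private static readonly byte[] Pdf = { 0x25, 0x50, 0x44, 0x46, 0x2D }; // "%PDF-"
    private static readonly byte[] Zip = { 0x50, 0x4B, 0x03, 0x04 };       // container for .docx and .xlsx

    // Returns "png", "pdf", "zip", or "unknown" for the given leading bytes.
    public static string Detect(byte[] header)
    {
        if (header == null)
        {
            throw new ArgumentNullException(nameof(header));
        }

        if (StartsWith(header, Png)) return "png";
        if (StartsWith(header, Pdf)) return "pdf";
        if (StartsWith(header, Zip)) return "zip";
        return "unknown";
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length)
        {
            return false;
        }

        for (var i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i])
            {
                return false;
            }
        }

        return true;
    }
}
```

Passing the first eight bytes of a generated file to Detect is enough to tell these formats apart.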
				
			

9. Main Program

The main program of the application serves as the entry point and coordinates the entire file generation process. It begins by setting the root path for file generation to the user’s desktop and prompts the user for the number of files to generate. The program then initializes an array of file generators, each capable of creating a specific type of file, such as Word documents, Excel spreadsheets, CSV files, text files, PDF files, and images.

In a loop that runs for the specified number of files, the program randomly decides whether to create a new folder or use an existing one for each file. It selects a random file generator from the array and generates a file with a random name and appropriate extension in the chosen directory. The program ensures asynchronous operations complete before proceeding, providing an efficient and organized way to generate a diverse set of test files. This setup simulates real-world file handling scenarios, enhancing the robustness and comprehensiveness of application testing.

File Path: Program.cs

Required Packages
  • RestSharp
  • ClosedXML
  • DocX
  • PdfSharp
  • Bogus
				
					
public class Program
{
    private static async Task Main(string[] args)
    {
        // Set the root path to the desktop
        var rootPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "GeneratedFiles");

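        Console.WriteLine("Enter the number of files to generate (default 10):");
        // Fall back to 10 when the input is empty, non-numeric, or not positive
        if (!int.TryParse(Console.ReadLine(), out var fileCount) || fileCount < 1)
        {
            fileCount = 10;
        }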

        var generators = new Generators.FileGenerator[]
        {
            new DocxGenerator(),
            new XlsxGenerator(),
            new CsvGenerator(),
            new TxtGenerator(),
            new PdfGenerator(),
            new ImageGenerator()
        };

        var random = new Random();

        for (int i = 0; i < fileCount; i++)
        {
            string folderPath;
            if (random.NextDouble() < 0.1 || FolderPathGenerator.GeneratedPathsCount == 0)
            {
                folderPath = FolderPathGenerator.GenerateRandomPath(rootPath);
            }
            else
            {
                folderPath = FolderPathGenerator.GetRandomExistingPath();
            }

            var generator = generators[random.Next(generators.Length)];
            var fileName = Path.Combine(folderPath, $"{generator.GetRandomFileName(random.Next(1, 6))}.{GetFileExtension(generator)}");
            Console.WriteLine($"Generating: {fileName}");
            await generator.Generate(fileName);
        }

        Console.WriteLine("All files generated.");
    }

    static string GetFileExtension(Generators.FileGenerator generator)
    {
        return generator switch
        {
            DocxGenerator => "docx",
            XlsxGenerator => "xlsx",
            CsvGenerator => "csv",
            TxtGenerator => "txt",
            PdfGenerator => "pdf",
            ImageGenerator => "png",
            _ => throw new NotImplementedException()
        };
    }
}
				
			

Final Thoughts

An application that generates Word documents, Excel spreadsheets, CSV files, text files, PDF files, and images lets developers produce realistic and diverse data sets without using sensitive production data. This is particularly useful for testing features like file processing, uploads, and migrations while maintaining data privacy and security.

The application is organized into a base file generator class, specific file generators for each file type, and utility classes. The main program prompts the user for the number of files to generate, delegates content creation to the generators, and randomly chooses between creating a new directory and reusing an existing one to simulate real-world file organization. The result is a more reliable test setup that helps developers identify and fix issues before they reach production.

About the Author

Brandon Dekker

When Brandon was offered a position at BizStream, he almost couldn’t believe it! He gets to do what he loves daily in a fun environment – this is his dream job! In his free time, he hangs out with his Destiny 2 clan, writes more code, and works on cars.
