# Code Quality Improvement Techniques Part 29

Complexity in software often arises from "Gordian Variables": tangled data dependencies that make the logic flow difficult to trace and maintain. By identifying and designing an ideal intermediate data structure, developers can decouple these dependencies and simplify complex operations. This approach replaces convoluted conditional checks with a clean, structured data flow that highlights the core business logic.

## The Complexity of Tangled Dependencies

Synchronizing remote data with local storage often leads to fragmented logic when the relationship between data IDs and data objects is not managed explicitly.

* Initial implementations frequently use set operations like `subtract` on ID lists to determine which items to create, update, or delete (a sketch of this shape appears after the conclusion below).
* This approach forces the program to re-access the original data sets multiple times, creating a disconnected flow between identifying a change and executing it.
* The entangled dependencies often necessitate "impossible" runtime error handling (e.g., `error("This must not happen")`), because the compiler cannot guarantee that an ID found in one collection is also present in the lookup maps during the update phase.
* Inconsistent processing patterns emerge, where the "add" and "update" logic might follow one sequence while the "delete" logic follows an entirely different one.

## Designing Around Intermediate Data Structures

To untangle complex flows, developers should work backward from an ideal data representation that categorizes all possible states: additions, updates, and deletions.

* The first step is to build lookup maps for both the remote and local entries, providing O(1) access to the data objects by ID.
* A unified collection of all unique IDs from both sources then serves as the foundation for a single, comprehensive transformation pass.
* A specialized utility function, such as `partitionByNullity`, can transform a sequence of data pairs (`Pair<Remote?, Local?>`) into three distinct, non-nullable lists (see the sketch below).
* The transformation yields a `Triple` containing `createdEntries`, `updatedEntries` (as pairs), and `deletedEntries`, effectively separating data preparation from business execution.

## Improved Synchronization Flow

Restructuring the function around the categorized lists keeps the primary synchronization logic concise and readable (the final sketch below shows the resulting shape).

* The synchronization function becomes a sequence of two phases: data categorization followed by execution loops.
* The `partitionByNullity` pattern eliminates the need for manual null checks and "impossible" error branches during the update process.
* The final implementation highlights the most important part of the code, the `forEach` blocks for adding, updating, and deleting, by removing the noise of ID-based lookups and set mathematics.

When faced with complex data dependencies, prioritize the creation of a clean intermediate data structure over optimizing individual logical branches. Designing a data flow that naturally represents the different states of your business logic results in more robust, self-documenting, and maintainable code.
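To make the "before" shape concrete, here is a minimal Kotlin sketch of the tangled, set-math-based flow described above. The `RemoteEntry` and `LocalEntry` models and the `LocalStore` interface are hypothetical stand-ins introduced for illustration (and reused by the later sketches); the original post's types and operations may differ.

```kotlin
// Hypothetical data models and storage interface, shared with the sketches below.
data class RemoteEntry(val id: String, val content: String)
data class LocalEntry(val id: String, val content: String)

interface LocalStore {
    fun create(remote: RemoteEntry)
    fun update(local: LocalEntry, remote: RemoteEntry)
    fun delete(local: LocalEntry)
}

// The tangled "before" flow: set math on ID collections, followed by repeated
// lookups back into the original data to recover the objects.
fun synchronizeTangled(
    remoteEntries: List<RemoteEntry>,
    localEntries: List<LocalEntry>,
    store: LocalStore
) {
    val remoteMap = remoteEntries.associateBy { it.id }
    val localMap = localEntries.associateBy { it.id }

    val idsToCreate = remoteMap.keys.subtract(localMap.keys)
    val idsToUpdate = remoteMap.keys.intersect(localMap.keys)
    val idsToDelete = localMap.keys.subtract(remoteMap.keys)

    idsToCreate.forEach { id ->
        store.create(remoteMap[id] ?: error("This must not happen"))
    }
    idsToUpdate.forEach { id ->
        // The compiler cannot prove these lookups succeed, so "impossible"
        // error branches are needed to obtain non-null values.
        val remote = remoteMap[id] ?: error("This must not happen")
        val local = localMap[id] ?: error("This must not happen")
        store.update(local, remote)
    }
    idsToDelete.forEach { id ->
        store.delete(localMap[id] ?: error("This must not happen"))
    }
}
```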
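A `partitionByNullity` utility matching the description above could look like the following generic extension function; the exact signature is an assumption inferred from the `Pair<Remote?, Local?>` input and `Triple` output mentioned in the post.

```kotlin
// Partitions (A?, B?) pairs into three non-nullable lists:
// (value, null) -> created, (value, value) -> updated, (null, value) -> deleted.
fun <A : Any, B : Any> Iterable<Pair<A?, B?>>.partitionByNullity():
        Triple<List<A>, List<Pair<A, B>>, List<B>> {
    val firstOnly = mutableListOf<A>()
    val both = mutableListOf<Pair<A, B>>()
    val secondOnly = mutableListOf<B>()
    for ((first, second) in this) {
        when {
            first != null && second != null -> both += first to second
            first != null -> firstOnly += first
            second != null -> secondOnly += second
            // A (null, null) pair cannot occur when the pairs are built from a
            // union of IDs, so it is silently ignored here.
        }
    }
    return Triple(firstOnly, both, secondOnly)
}
```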
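Finally, a sketch of the restructured synchronization flow, reusing the hypothetical types from the first sketch: lookup maps and a union of IDs form the categorization phase, and three `forEach` blocks form the execution phase.

```kotlin
fun synchronize(
    remoteEntries: List<RemoteEntry>,
    localEntries: List<LocalEntry>,
    store: LocalStore
) {
    // Phase 1: data categorization. Lookup maps give O(1) access, and the
    // union of all IDs yields one Pair<RemoteEntry?, LocalEntry?> per ID.
    val remoteMap = remoteEntries.associateBy { it.id }
    val localMap = localEntries.associateBy { it.id }
    val allIds = remoteMap.keys + localMap.keys

    val (createdEntries, updatedEntries, deletedEntries) =
        allIds.map { id -> remoteMap[id] to localMap[id] }.partitionByNullity()

    // Phase 2: execution. No ID lookups, set math, or "impossible" branches.
    createdEntries.forEach { remote -> store.create(remote) }
    updatedEntries.forEach { (remote, local) -> store.update(local, remote) }
    deletedEntries.forEach { local -> store.delete(local) }
}
```

Because every pair is produced from the union of the two key sets, a `(null, null)` pair can never occur, so the three categorized lists are non-nullable by construction.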