
Query Collector

Dorai Thodla edited this page Nov 22, 2020 · 1 revision

Goal:

Collect SQL queries from open source projects on GitHub in order to study them.

Input:

  1. Programming Language (like Python)
  2. Number of projects to scan (n)

Output:

  1. SQL queries found in each project (note that some projects may use an ORM such as SQLAlchemy, or a framework's built-in ORM as Django projects do, so raw SQL may be sparse)

Process:

  1. Search for projects developed in the specified (user requested) programming language
  2. Pick the top n projects (n=100 by default)
  3. Analyze the repositories (their latest commits)
  4. Find whether they use SQL databases (a scan of import statements may be a good place to start)
  5. Find the SQL strings and extract them (they may be buried in the code, which is bad, or kept in separate files or stored procedures)
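
The steps above can be sketched roughly as follows for Python projects. This is only a minimal illustration, not a settled design: `build_search_url`, `scan_source`, `SQL_MODULES` and `SQL_PATTERN` are hypothetical names, ranking "top" projects by star count is an assumption (the page does not say how to rank), and the module list and the regular expression are rough heuristics. The sketch covers steps 1, 2, 4 and 5; cloning repositories and reading `.sql` files or stored procedures are left out.

```python
import ast
import re
from urllib.parse import urlencode


def build_search_url(language, n=100):
    """Build a GitHub REST search URL for repositories in one language.

    Assumption: "top" means most-starred; the page does not define the ranking.
    """
    params = {
        "q": f"language:{language}",
        "sort": "stars",
        "order": "desc",
        # The search endpoint returns at most 100 results per page.
        "per_page": min(n, 100),
    }
    return "https://api.github.com/search/repositories?" + urlencode(params)


# Hypothetical, incomplete list of top-level modules that hint at SQL or ORM use.
SQL_MODULES = {"sqlite3", "psycopg2", "pymysql", "MySQLdb", "sqlalchemy", "django"}

# Heuristic: treat a string literal as SQL if it starts with a common SQL keyword.
SQL_PATTERN = re.compile(
    r"^\s*(SELECT|INSERT|UPDATE|DELETE|CREATE|ALTER|DROP|WITH)\b",
    re.IGNORECASE,
)


def scan_source(source):
    """Return (SQL-related modules imported, SQL-looking string literals)."""
    tree = ast.parse(source)
    imported = set()
    queries = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imported.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module:
            imported.add(node.module.split(".")[0])
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            if SQL_PATTERN.match(node.value):
                queries.append(node.value.strip())
    return imported & SQL_MODULES, queries
```

The keyword test will produce false positives (for example, a UI message that begins with "Update" or "Select") and will miss queries assembled from several string fragments, so its output is a candidate list to review rather than a final result.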