Instaloctrack

TL;DR: asciinema, video of the project

A tool to scrape geotagged locations on Instagram profiles. Output in JSON & interactive map.

Requirements

sudo apt install chromium-chromedriver && chmod a+x /usr/bin/chromedriver

🛠️ Installation

git clone https://github.com/bernsteining/instaloctrack
cd instaloctrack
pip3 install .

Or use Docker:

sudo docker build -t instaloctrack -f Dockerfile .

Usage

instaloctrack -h
usage: instaloctrack [-h] [-t TARGET_ACCOUNT] [-l LOGIN] [-p PASSWORD] [-v]

Instagram location data gathering tool. Usage: python3 instaloctrack.py -t <target_account>

optional arguments:
  -h, --help            show this help message and exit
  -t TARGET_ACCOUNT, --target TARGET_ACCOUNT
                        Instagram profile to investigate
  -l LOGIN, --login LOGIN
                        Instagram profile to connect to, in order to access
                        the instagram posts of the target account
  -p PASSWORD, --password PASSWORD
                        Password of the Instagram profile to connect to
  -v, --visual          Spawns Chromium GUI, otherwise Chromium is headless

e.g.

instaloctrack -t <target_account>

If the target profile is private and you have an account that follows it, you can scrape the data with a connected session:

instaloctrack -t <target_account> -l <your_account> -p <your_password>

or with Docker:

sudo docker run -v /tmp/output:/tmp/output instaloctrack -t <target_account> -o /tmp/output

⚙️ How it works

First, we retrieve the links to all of the account's pictures by scrolling through the whole Instagram profile with Selenium's WebDriver.
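The scrolling step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `collect_post_links`, `pause` and `max_rounds` are hypothetical names, and the only thing assumed about `driver` is that it exposes `execute_script` the way a Selenium WebDriver does. Post links are recognised by the `/p/` path visible in the JSON sample further down.

```python
import re
import time

# Post URLs look like https://www.instagram.com/p/<shortcode>
POST_LINK = re.compile(r"^https://www\.instagram\.com/p/[^/?#]+")


def collect_post_links(driver, pause=1.0, max_rounds=1000):
    """Scroll to the bottom repeatedly and gather every /p/ link seen on the way."""
    links = []  # ordered, de-duplicated post links
    seen = set()
    last_height = -1
    for _ in range(max_rounds):
        # Harvest the anchors currently in the DOM before scrolling further,
        # in case the page drops older ones as it loads new ones.
        hrefs = driver.execute_script(
            "return Array.from(document.querySelectorAll('a')).map(a => a.href);"
        )
        for href in hrefs:
            match = POST_LINK.match(href or "")
            if match and match.group(0) not in seen:
                seen.add(match.group(0))
                links.append(match.group(0))
        # Scroll down, wait for new content, and stop once the page height
        # no longer grows.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        height = driver.execute_script("return document.body.scrollHeight;")
        if height == last_height:
            break
        last_height = height
    return links
```

With a real browser, `driver` would be the Selenium WebDriver instance after it has opened the target profile page.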

Then, we fetch each picture link asynchronously (asyncio), check whether the picture description contains a location, and, if it does, retrieve the location's data along with the timestamp.
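As a rough sketch of that asynchronous step, the snippet below fetches all posts concurrently and keeps only those with a location. It is an illustration under assumptions: `scrape_posts`, `fetch`, `parse` and `max_concurrency` are hypothetical names, `fetch` stands for any coroutine that downloads a page, and `parse` stands for whatever extracts `(place, timestamp)` from it. The real tool's parsing logic is not reproduced here.

```python
import asyncio


async def scrape_posts(links, fetch, parse, max_concurrency=10):
    """Fetch every post concurrently and keep those that carry a location."""
    limit = asyncio.Semaphore(max_concurrency)

    async def scrape_one(link):
        async with limit:  # cap the number of requests in flight
            html = await fetch(link)
        # parse() is expected to return (None, None) when no location is found
        place, timestamp = parse(html)
        if place is None:
            return None
        return {"link": link, "place": place, "timestamp": timestamp}

    # gather() preserves the order of the input links
    results = await asyncio.gather(*(scrape_one(link) for link in links))
    return [entry for entry in results if entry is not None]
```

The semaphore is a design choice for this sketch: it keeps the scraping concurrent without opening an unbounded number of connections at once.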

  • NB: In 2018, Instagram deprecated its location API, so it is no longer possible to get the GPS coordinates of a picture; all we can retrieve is the name of the location. (If you can prove me wrong about this, please tell me!)

Because Instagram doesn't provide GPS coordinates and only gives us the names of places, we have to geocode them (i.e. get the GPS coordinates from the place's name).

For this, I used Nominatim's awesome API, which uses OpenStreetMap. For our usage, no API key is required, and we respect Nominatim's usage policy by requesting GPS coordinates no more than once per second.
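A minimal sketch of rate-limited geocoding against Nominatim's public search endpoint, using only the standard library. The function names, the `delay` parameter and the `instaloctrack-example` user agent are assumptions made for this example; the project's own geocoding function is structured differently.

```python
import json
import time
import urllib.parse
import urllib.request

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(place_name):
    """Build a free-text search URL that returns at most one JSON result."""
    query = urllib.parse.urlencode({"q": place_name, "format": "json", "limit": 1})
    return f"{NOMINATIM_SEARCH}?{query}"


def geocode(place_name, user_agent="instaloctrack-example"):
    """Return {"lat": ..., "lon": ...} for a place name, or None if nothing matched."""
    request = urllib.request.Request(
        build_search_url(place_name),
        # The usage policy asks clients to identify themselves.
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        results = json.load(response)
    if not results:
        return None
    # Nominatim returns coordinates as strings; they are kept as-is here.
    return {"lat": results[0]["lat"], "lon": results[0]["lon"]}


def geocode_all(place_names, delay=1.0, geocoder=geocode):
    """Geocode names one by one, sleeping between calls to respect the rate limit."""
    coordinates = {}
    for index, name in enumerate(place_names):
        if index:
            time.sleep(delay)
        coordinates[name] = geocoder(name)
    return coordinates
```

Calling `geocode_all(["Musée du quai Branly - Jacques Chirac"])` would issue a single request; with several names, the calls are spaced one second apart by default.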

Finally, once we have all the GPS coordinates, we generate an HTML page (thanks to jinja2 templating) with embedded JavaScript that plots an OpenStreetMap map (thanks to the Leaflet library) with all our locations pinned. Once again, no API key is required for this step.
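The map generation can be illustrated with a small standalone sketch. To stay dependency-free, it swaps jinja2 for the standard library's `string.Template`, so it is not the project's actual template. `PAGE` and `render_map` are hypothetical names, the Leaflet files are loaded from the unpkg CDN, and the input entries are assumed to have the shape of the JSON dump.

```python
import json
from string import Template

# Leaflet page skeleton; $locations is replaced by a JSON array of markers.
PAGE = Template("""<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <link rel="stylesheet" href="https://unpkg.com/leaflet@1.9.4/dist/leaflet.css">
  <script src="https://unpkg.com/leaflet@1.9.4/dist/leaflet.js"></script>
  <style>#map { height: 100vh; }</style>
</head>
<body>
  <div id="map"></div>
  <script>
    var locations = $locations;
    var map = L.map('map').setView([0, 0], 2);
    L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png', {
      attribution: '&copy; OpenStreetMap contributors'
    }).addTo(map);
    locations.forEach(function (loc) {
      L.marker([loc.lat, loc.lon]).addTo(map)
        .bindPopup(loc.name + ' (' + loc.timestamp + ')');
    });
  </script>
</body>
</html>
""")


def render_map(locations):
    """Return a standalone HTML page with one Leaflet marker per location."""
    markers = [
        {
            "name": loc["place"]["name"],
            "timestamp": loc["timestamp"],
            "lat": float(loc["gps"]["lat"]),
            "lon": float(loc["gps"]["lon"]),
        }
        for loc in locations
    ]
    # Escape "</" so a place name cannot close the <script> element early.
    payload = json.dumps(markers).replace("</", "<\\/")
    return PAGE.substitute(locations=payload)
```

Writing the returned string to an `.html` file and opening it in a browser should show the pinned locations on OpenStreetMap tiles, with no API key involved.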

Also, the data collected by the script (location names, timestamps, GPS coordinates, errors) is dumped to a JSON file so that it can be reused.

Example

As an example, here's the output on the former French President's Instagram profile, @fhollande:

Map of @fhollande's locations on Instagram

The Heatmap:

Heatmap of @fhollande's locations on Instagram

Information available when clicking on a marker:

available data when clicking on a marker

Stats about the location data:

stats about the location data

The JSON data dump (just a part of it to show the format for a given location):

{
    "link": "https://www.instagram.com/p/-Q_9EvR9eu",
    "place": {
      "id": "290297",
      "name": "Musée du quai Branly - Jacques Chirac",
      "slug": "musee-du-quai-branly-jacques-chirac",
      "street_address": " 37 quai Branly",
      " zip_code": " 75007",
      " city_name": " Paris",
      " region_name": " ",
      " country_code": " FR"
    },
    "timestamp": "2015-11-19",
    "gps": {
      "lat": "48.8566969",
      "lon": "2.3514616"
    }
  }
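
Since the dump is meant to be reused, here is a small sketch of how it might be read back and summarized. It assumes the file is a UTF-8 JSON array of entries shaped like the one above; `load_dump`, `summarize` and the file path are hypothetical and not part of the tool.

```python
import json
from collections import Counter


def load_dump(path):
    """Read a dump file; assumes a JSON array of entries encoded as UTF-8."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(entries):
    """Count posts per place name and per year (timestamps are YYYY-MM-DD)."""
    by_place = Counter(entry["place"]["name"] for entry in entries)
    by_year = Counter(entry["timestamp"][:4] for entry in entries)
    return by_place, by_year
```

For example, `summarize(load_dump("some_dump.json"))[0].most_common(5)` would list the five most frequent places, given a dump in that format.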

Possible Improvements

  • Cleaner code :D
  • Refactor the geocoding function, which is waaay too long and cryptic
  • Use beautifulsoup instead of regex parsing
  • Remove weird blank space caused by progress bar
  • Use geocoding tools other than Nominatim (e.g. https://geo.api.gouv.fr/adresse) when it fails? (specify arg?)
    • Use geopy?
    • Use Overpass instead of Nominatim?
  • Add an argument to select only a subset of pictures (by date or rank)
  • Report how long the script took to run
