Troubleshooting

This page covers common errors that contributors run into. If you don’t find an answer here, reach out to us on Slack or open an issue on the GitHub repository. Troubleshooting is a great skill in its own right, but we hope you spend more time coding and less time staring at error messages. In most cases, we’ve seen your problem before. There are no stupid questions!

:smile:

General Tips

  1. It’s always a good idea to sync your fork with the main repo, especially if you’re seeing import errors.
    git remote add upstream https://github.com/pgh-public-meetings/city-scrapers-pitt.git
    git fetch upstream
    git merge upstream/master
    
  2. Run pipenv sync --dev --three to make sure you have all dependencies installed.
  3. Check that your virtual environment is in fact running and that your prompt looks something like
    (city-scrapers-pitt) $
    

    If not, simply run pipenv shell to get back in.

  4. It can be helpful to distinguish between an issue that affects just your spider versus the entire project / environment. Try running a different spider and see if you get the same error messages.
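
As a quick check for tip 3, the following minimal sketch reports whether the current Python interpreter is running inside a virtual environment. The helper name `in_virtualenv` is hypothetical, and the `VIRTUAL_ENV` variable is only normally (not always) exported by the shell that activates the environment, so treat the output as a hint rather than a definitive answer.

```python
import os
import sys


def in_virtualenv() -> bool:
    """Return True if this interpreter appears to run inside a virtual environment."""
    # venv and recent virtualenv releases point sys.base_prefix at the system
    # Python; older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
    # Assumption: the activation step exports VIRTUAL_ENV; it may be unset.
    print("VIRTUAL_ENV:", os.environ.get("VIRTUAL_ENV", "(not set)"))
```

If this prints `False` while you expected to be inside the project environment, running pipenv shell again is the first thing to try.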

No module named pathlib2

Problem: A dependency is missing. We fixed this in a recent update.

Solution: Sync your fork to get this dependency:

git remote add upstream https://github.com/pgh-public-meetings/city-scrapers-pitt.git
git fetch upstream
git merge upstream/master

Original Issue

Twisted Error using pipenv install twisted

Problem: A dependency is missing.

At least three contributors have had this problem, possibly for different reasons.

Solution: Try the steps described here

Updates from “upstream” not integrating in local environment

Problem: Updates from upstream are not integrating into the local environment.

Solution: After fetching from upstream (git fetch upstream), try rebasing your local branch onto it:

git rebase upstream/master

Original Issue

SSL: CERTIFICATE_VERIFY_FAILED

Problem: An SSL environment variable was not set or is incorrect.

Windows/Linux Solution: Here

Mac OS Solution: Try running:

/Applications/Python\ PYTHON VERSION/Install\ Certificates.command

Replace PYTHON VERSION with your version of Python, which you can find by running python3 -V. For example, with Python 3.6 the command is:

/Applications/Python\ 3.6/Install\ Certificates.command

If that still fails, go here for more options.
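
To see which certificate locations and environment variables your Python installation actually consults, a small sketch like the one below may help. It relies only on the standard library's `ssl.get_default_verify_paths()`; the helper name `describe_ca_settings` is hypothetical, and the output will differ from machine to machine.

```python
import os
import ssl


def describe_ca_settings() -> dict:
    """Collect where this Python's OpenSSL build looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        # Resolved locations; None means the file or directory does not exist.
        "cafile": paths.cafile,
        "capath": paths.capath,
        # Defaults compiled into OpenSSL.
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        # Environment variables that override the defaults, with current values.
        "env " + paths.openssl_cafile_env: os.environ.get(paths.openssl_cafile_env),
        "env " + paths.openssl_capath_env: os.environ.get(paths.openssl_capath_env),
    }


if __name__ == "__main__":
    for key, value in describe_ca_settings().items():
        print(f"{key}: {value}")
```

If `cafile` and `capath` are both `None` and neither environment variable is set, Python has no CA bundle to verify against, which is consistent with the CERTIFICATE_VERIFY_FAILED error.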

Original Issue

city-scrapers-core module not found

Problem: The city-scrapers-core module, which provides helper methods for running spiders, is missing.

Solution:

  1. First, make sure you are in the correct directory; otherwise the module may not be found. You should be in the top-level directory of your city-scrapers-pitt checkout.
  2. Try uninstalling and reinstalling city-scrapers-core: run pipenv uninstall city-scrapers-core, then pipenv install city-scrapers-core.
  3. Sync your fork:
    git remote add upstream https://github.com/pgh-public-meetings/city-scrapers-pitt.git
    git fetch upstream
    git merge upstream/master
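
To confirm whether a module is visible to the interpreter you are actually running, a sketch along these lines can be run inside the pipenv shell. The helper name `missing_modules` is hypothetical, and the import names in the list (for example `city_scrapers_core` for the city-scrapers-core package) are assumptions about how each dependency is imported.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the current interpreter cannot find."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the dependencies discussed on this page.
    wanted = ["city_scrapers_core", "scrapy", "twisted", "pathlib2"]
    missing = missing_modules(wanted)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All listed modules can be found.")
```

If a module shows up as not importable here but you believe it is installed, you are probably running a different interpreter than the one pipenv manages.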
    

Original Issue

Different behavior between Scrapy and Pytest on an ASP.NET Site

Problem: A spider passes its Pytest tests against the test HTML file generated by scrapy genspider... but fails when run with Scrapy on the real website.

Solution: Sites served from Windows, such as ASP.NET sites, often end lines with \r\n, whereas Mac OS and Linux use \n alone. When the spider runs against the live site instead of the test file, it therefore needs to strip the \r characters itself to get Unix-style line endings. Run scrapy shell <YOUR URL> against the site; if the response HTML contains \r\n, the site likely uses Windows-style line endings and will present this problem.
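
One way to handle the \r characters is to normalize line endings on any text you extract before parsing it. The sketch below is a minimal illustration; `normalize_newlines` is a hypothetical helper, not part of Scrapy or city-scrapers-core.

```python
def normalize_newlines(text: str) -> str:
    """Convert Windows (\\r\\n) and bare \\r line endings to Unix-style \\n."""
    # Replace the two-character sequence first so it does not become "\n\n".
    return text.replace("\r\n", "\n").replace("\r", "\n")


if __name__ == "__main__":
    raw = "Board Meeting\r\nJanuary 5\r\n"
    print(repr(normalize_newlines(raw)))
```

Inside a spider, you would apply the helper to the strings pulled out of the response before splitting them into lines or matching them with regular expressions, so the same parsing code works for both the saved test file and the live site.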

Original Issue